VoxForge
Hi,
I have a question about using filler words in pocketsphinx. For example, I have the following entries in the filler dictionary:
++noise++ +noise+
++breath++ +breath+
Then my transcript is as follows:
<s> where ++breath++ is my car ++noise++ </s>
My question: is the above statement true? And if I have continuous noise, such as speech recorded outside, how can I map it in the transcripts? Is the following statement true or not?
<s> ++noise++ where is my car ++noise++ </s>
Please let me know your comments.
--- (Edited on 8/7/2009 10:01 am [GMT-0500] by ercani) ---
The idea of the filler dictionary is to map different types of sounds to different filler phones. So if the same type of noise occurs twice, you can indeed use
<s> ++noise++ where is my car ++noise++ </s>
Please note that the sentence above is not a statement, so you can't ask whether it's true or not.
Also please note that for noisy environments, where noise is present all the time, marking noises in the transcripts is not sensible. Different techniques, like noise cancellation, should be used.
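The mapping described above can be checked mechanically before training. The following is only a minimal sketch, not part of any Sphinx tool: it assumes the filler dictionary has one "word phone" pair per line, as in the entries quoted earlier, and it assumes (as a convention for this example only) that filler words in transcripts start with "++". The token ++cough++ is hypothetical and is used only to show an undefined filler.

```python
# Sketch: report filler tokens used in a transcript but missing from the
# filler dictionary. The dictionary text mirrors the entries quoted above.
FILLER_DICT_TEXT = """\
++noise++ +noise+
++breath++ +breath+
"""


def load_filler_dict(text):
    """Map each filler word to its filler phone (one 'word phone' pair per line)."""
    fillers = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            fillers[parts[0]] = parts[1]
    return fillers


def undefined_fillers(transcript, fillers):
    """Return tokens that look like fillers but are not in the dictionary.

    Assumption for this sketch: a filler word starts with '++'.
    """
    missing = []
    for token in transcript.split():
        if token.startswith("++") and token not in fillers:
            missing.append(token)
    return missing


fillers = load_filler_dict(FILLER_DICT_TEXT)
# The same filler type may appear more than once in a transcript.
print(undefined_fillers("<s> ++noise++ where is my car ++noise++ </s>", fillers))
# '++cough++' is a hypothetical filler with no dictionary entry.
print(undefined_fillers("<s> where ++cough++ is my car </s>", fillers))
```

A transcript that only uses fillers defined in the dictionary yields an empty list; anything else is listed so it can be added to the dictionary or removed from the transcript.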
--- (Edited on 8/11/2009 2:40 pm [GMT-0500] by nsh) ---
Thank you very much for your reply.
I understand your point about using the same type of noise twice.
On the other hand, I am a bit confused about continuous noise. If the noise is continuous, how can I map it in the transcripts?
<s> [start_noise] where is my car [end_noise] </s>
How can I make the trainer understand "start_noise" and "end_noise"? Some documents map them in the filler dictionary as follows:
[start_noise] +noise+
[end_noise] +noise+
Can I use the tags [start_noise] and [end_noise] in my transcripts as above?
I recorded some speech in a noisy environment intentionally, since I will use pocketsphinx in car environments. I need to do something about mapping continuous noise.
Could you please let me know your comments?
Ercan
--- (Edited on 8/12/2009 1:20 pm [GMT-0500] by ercani) ---
> can I use tags [start_noise] [end_noise] in my transcripts as above?
No, it has no meaning
> I need to do something about mapping of cont noise
Noise cancellation is widely covered by textbooks. I suggest you read some of them, like this one:
http://books.google.com/books?id=qfMq0Wy6ZnkC
--- (Edited on 8/12/2009 1:36 pm [GMT-0500] by nsh) ---
You mean it does not make sense if I record speech in noise intentionally?
thanks
ercan
--- (Edited on 8/12/2009 1:42 pm [GMT-0500] by ercani) ---
> You mean it does not make sense if I record speech in noise intentionally
I never wrote anything like that. If you read the post above, you'll see that "it has no meaning" is the answer to "can I use [start_noise] and [end_noise]".
--- (Edited on 8/12/2009 1:59 pm [GMT-0500] by nsh) ---
ok, thanks for your help.
--- (Edited on 8/12/2009 3:58 pm [GMT-0500] by Visitor) ---
Hi,
I have one more question, but it is about the language model.
I am preparing a Turkish language model for pocketsphinx. There will be two language models: one for a command and control application, and another for continuous recognition. So:
1) Can I use the turtle class language model, which is available on the CMU web page, for the command and control application? Is it compatible with pocketsphinx? If so, what settings are needed to use it with pocketsphinx?
2) For continuous speech recognition, I will use a 3-gram model and need to split words into stems and endings, since Turkish words are long and have many derived forms. For example, I want to split the following Turkish word:
The word: "Arkadaşlarını" (it means "all of your friends").
Can I split it in the language model as: Arkadaş +larını? (Arkadaş is the base form and means "friend"; larını means "all of your".)
"+larını" is an ending and can be attached to other words in the language model that are in their base form.
If so, how should I use the + sign? Will the CMU SLM kit recognize the + sign in the 3-gram language model?
Since the device has limited resources, I don't want to split words into more than two parts. Writing each word in two parts will be enough.
I have looked at finite state transducers, but they are more complicated and split words into 3-4 or more parts, which is not good for CPU usage.
I know I have asked many questions, but I am new to speech recognition. Sorry for the inconvenience.
Could you let me know your advice on this?
Br,
ercan
--- (Edited on 8/16/2009 5:59 am [GMT-0500] by ercani) ---
> 1) Can I use the turtle class language model, which is available on the CMU web page, for the command and control application? Is it compatible with pocketsphinx? If so, what settings are needed to use it with pocketsphinx?
You can use it. The script needed to get started is listed at http://www.speech.cs.cmu.edu/sphinx/models/lm/turtle-class-lm/. You can just replace
S2CONTINUOUS=/usr/local/bin/sphinx2-continuous with pocketsphinx_continuous
> Can I parse it in the language model as: Arkadaş +larını.
You can do that, with Morfessor CatMAP http://www.cis.hut.fi/projects/morpho/ for example.
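To make the two-part split concrete, here is a minimal sketch. It is not Morfessor and does not use its algorithm: it assumes a small hand-written suffix list (SUFFIXES is hypothetical and far from complete) and simply strips the longest matching suffix, so every word ends up in at most two parts.

```python
# Hypothetical, hand-written suffix inventory for illustration only.
# A real system would learn its segmentation from data instead.
SUFFIXES = ["larını", "lerini", "lar", "ler"]


def split_two_parts(word, suffixes=SUFFIXES, min_stem=2):
    """Split a word into [stem, '+suffix'] using the longest matching suffix.

    Returns [word] unchanged when no suffix matches or the stem would be
    shorter than min_stem characters.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return [word[:-len(suffix)], "+" + suffix]
    return [word]


# Write the split form the way it would appear in the language model text.
print(" ".join(split_two_parts("arkadaşlarını")))
print(" ".join(split_two_parts("komsu")))
```

The leading "+" on the second part is only a naming convention that marks the unit as an ending; it keeps "+larını" distinct from any ordinary word spelled "larını".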
> Will the CMU SLM kit recognize the + sign in the 3-gram language model?
I'm not sure what you mean by "recognize"; it doesn't care about + or any other symbol. It only cares about spaces.
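The point about spaces can be illustrated with a short sketch. This is not the toolkit's code, only an assumed illustration of whitespace tokenization: each space-separated string, including "+larını", becomes one vocabulary unit, and n-grams are counted over those units.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams over whitespace-separated tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # whitespace is the only separator
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


corpus = ["<s> arkadaş +larını </s>"]
unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)

# '+larını' is treated like any other token; the '+' has no special role.
print(sorted(token for (token,) in unigrams))
print(bigrams[("arkadaş", "+larını")])
```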
--- (Edited on 8/16/2009 9:21 am [GMT-0500] by nsh) ---
--- (Edited on 8/16/2009 9:23 am [GMT-0500] by nsh) ---
Thanks for your reply.
What I mean is this: if I split the word "arkadaslarini" as "arkadas +larini", you said that it doesn't care about the + sign. I then tried a 3-gram model and obtained the following:
+larini </s> -0.00..
<s> arkadas -0.0...
So, do you mean something like the following example?
Say I have two words in the language model, split as:
<s> ..... komsu </s>
<s> ...... arkadas +larini </s>
Then, if the user utters the word "komsularini", the language model can attach "larini" to "komsu" as follows:
"komsu +larini"
In that case, should I keep "larini" as a vocabulary unit in the language model?
Thanks and Br,
ercan
--- (Edited on 8/16/2009 3:57 pm [GMT-0500] by ercani) ---
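If "+larini" is kept as its own vocabulary unit, as the last question proposes, a hypothesis such as "komsu +larini" is just two tokens, and turning it back into a written word is a post-processing step. The following is a minimal sketch under that assumption; join_morphs is a hypothetical helper, and the token list stands in for recognizer output. Whether the decoder actually produces this sequence depends on the language model and dictionary, which the sketch does not address.

```python
def join_morphs(tokens):
    """Attach every token that starts with '+' to the word before it."""
    words = []
    for token in tokens:
        if token.startswith("+") and words:
            words[-1] += token[1:]  # drop the '+' marker and glue the ending on
        else:
            words.append(token)
    return words


# Hypothetical recognizer output for the utterance "komsularini".
hypothesis = ["komsu", "+larini"]
print(" ".join(join_morphs(hypothesis)))
```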