Simon Dialog Manager and Julian Speech Recognition
User: kmaclean
Date: 4/19/2007 10:24 am
Views: 13582
Rating: 45

My post to the Simon SourceForge Forum

Hi bedahr, 
 
I managed to translate enough of the GUI (using Google translation, and recompiling the source) to get a basic understanding of what Simon does/will do. 
 
Some questions/clarifications (note: these comments are based on my rough translations from German to English using Google, so I may be misinterpreting things because of this ...) : 
 
juliusd 
 
The Julius Daemon (juliusd) seems like it starts *Julian* in server mode, and then opens up a console that is essentially a replacement for jcontrol. Does juliusd essentially act as an API to Julian for Simon? i.e. Simon has no direct contact with Julian, and only gets Julian recognition results from juliusd?
 
Juliusd has a configuration setting that points to a julian.jconf. I guess that this is where julian gets pointed to its Acoustic Model. I had to modify the juliusd settings as follows to get things to work: 
Command: julian 
Arguments: -input mic -C julian.jconf  
 
I put the julian.jconf configuration file in the juliusd/bin directory, with the following configurations: 
-h acoustic_model_files/hmmdefs 
-hlist acoustic_model_files/tiedlist 
 
I then copied the most current VoxForge Acoustic Models to the juliusd/bin/acoustic_model_files directory, and started juliusd in its own console as follows: 
$cd juliusd/bin/ 
$juliusd 
 
The output in the juliusd console looked like some of the output usually seen when Julian starts up, so I think I got things set properly.
 
How does the “Send one Test Word” work? I'm not sure I understand what it is supposed to do... 
 
 
Simon 
 
System tab 
In the Simon System tab, you set your Julian grammar files (the .voca and .grammar files), and the corresponding system commands. So it seems like you could speak a command, and Simon would send the request to the operating system or application. Will Simon be sending the commands to x-windows (like x-voice), or will it use some other method? Do you have any sample command files? - I'd like to get an idea of what format they are supposed to be in. 
 
You can also point to a pronunciation dictionary and a prompts file. It seems like Simon is a GUI front that can permit someone to add new words to a Julian Grammar file, using pronunciation information from the pronunciation dictionary (so the user does not have to enter phonemes by hand). 
 
I seem to be able to connect to juliusd, but am not quite sure how everything is supposed to work. When I click the Connect link in Simon, I get a message in juliusd that the server was connected, but I am not sure how recognition, and corresponding command execution is to take place. 
 
Word add 
In the main window for Simon, there is a Word icon that seems to let you add a new word, and record speech audio corresponding to that word. I assume that this is for the purpose of gathering acoustic data so that the julian acoustic model can be adapted using audio for the new word. 
 
I'm wondering how any new words might be trained into the Acoustic Model. If you need to use HTK, it would add a level of complexity for new users, because they would have to download the HTK source themselves and compile it (since HTK has distribution restrictions on the source and binaries). 
 
Word List 
This seems like the repository for all the words added in word add ... i.e. a place where you can manage all the words that have been added to the system.  
 
It also seems like it is set up to point to an Acoustic Model trainer (like HTK ...?) - is that what it is for?
 
Train 
Seems like a way to prompt users to record sentences – using either their own text that they import or some predefined text. But I am not clear how this is to be used to update the acoustic models – since there is no sentence repository GUI front-end, as there is for Word Adds. 
 
Implement 
This seems like where you select the programs that Simon will be sending recognized commands to. 
 
 
Thanks, 
 
Ken

--- (Edited on 4/19/2007 11:24 am [GMT-0400] by kmaclean) ---

Re: Simon Dialog Manager and Julian Speech Recognition
User: kmaclean
Date: 4/19/2007 10:25 am
Views: 414
Rating: 37

post from bedahr :

 

Hi Ken! 
 
> I managed to translate enough of the GUI (using Google translation, and recompiling the source) to get a basic understanding of what Simon does/will do.  
 
Wow, that was fast. Have you replaced the German text, or have you added the translation using Qt 4's translation features? We have to provide a German version too - our target users don't even know English (yet).
 
 
> juliusd  
>  
> The Julius Daemon (juliusd) seems like it starts *Julian* in server mode, and then opens up a console  
> that is essentially a replacement for jcontrol. Does juliusd essentially act as an API to Julian for  
> Simon? i.e. Simon has no direct contact with Julian, and only gets Julian recognition results from
> juliusd?  
 
Simon provides a very basic socket-based network connection to the real recognizer. Juliusd is more or less a sample daemon that uses this socket, parses julius/julian output and writes it to that socket.
Simon doesn't really care if the recognition is done by julius, julian or for example sphinx. 
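
A minimal sketch of that idea, not the actual juliusd code: a daemon-side function picks the recognition results out of the recognizer's output and writes them to a socket, and the client side reads them back. The "RESULT: " line prefix and the newline-terminated UTF-8 wire format are assumptions made for illustration; the thread does not describe the real protocol.

```python
import socket

def forward_results(lines, conn, prefix="RESULT: "):
    """Send the payload of every recognizer line that starts with `prefix`.

    The prefix is a hypothetical marker, not real julius/julian output.
    """
    for line in lines:
        if line.startswith(prefix):
            payload = line[len(prefix):].strip()
            conn.sendall((payload + "\n").encode("utf-8"))

def read_results(conn):
    """Read newline-terminated results until the peer closes the connection."""
    with conn.makefile("r", encoding="utf-8") as stream:
        return [line.rstrip("\n") for line in stream]

# A connected socket pair stands in for the daemon <-> client connection.
daemon_end, client_end = socket.socketpair()
recognizer_output = ["loading model", "RESULT: simon Google", "RESULT: simon Home"]
forward_results(recognizer_output, daemon_end)
daemon_end.close()                      # signals end of stream to the reader
received = read_results(client_end)
client_end.close()
```

Because the client only sees the forwarded text, it does not need to know which recognizer produced it.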
 
> Juliusd has a configuration setting that points to a julian.jconf. I guess that this is where julian  
> gets pointed to its Acoustic Model. I had to modify the juliusd settings as follows to get things to  
> work:  
> Command: julian  
> Arguments: -input mic -C julian.jconf  
 
The current settings are just for testing purposes and actually exploit a bug: if the process exits immediately, juliusd probably won't notice and thinks it is still running. We keep this minor bug to simplify the dev. process, as we don't need a real recognizer that way. We just have to supply a command that will immediately fail - for example a program that doesn't even exist. That is why we used "juliano" - it's just a command that will return fast enough.
 
> The output in the juliusd console looked like some of the output usually seen when Julian starts up,
> so I think I got things set properly.  
 
Generally juliusd just starts a process and monitors its output - it's really that simple.
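
As a rough illustration of "start a process and monitor its output" (a sketch under assumptions, not the juliusd source), the following starts a child process and collects each line it prints. A small Python one-liner stands in for the recognizer here, and the "RECOGNIZED ..." text is made up for the example.

```python
import subprocess
import sys

def monitor(argv):
    """Start a child process and yield each line it writes to stdout/stderr."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()  # reap the child so its exit is actually noticed

# Stand-in for the recognizer: a child that prints two lines and exits.
fake_recognizer = [
    sys.executable,
    "-c",
    "print('RECOGNIZED simon Google'); print('done')",
]
lines = list(monitor(fake_recognizer))
```

In this sketch the loop ends as soon as the child closes its output, so an immediately exiting command is detected rather than assumed to be still running.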
 
> How does the “Send one Test Word” work? I'm not sure I understand what it is supposed to do...
It pretends that julius has recognized a word, namely the word/sequence that you enter in the dialog.
(This lets you test simon without needing a working recognition setup.)
 
> Simon  
>  
> System tab  
> In the Simon System tab, you set your Julian grammar files (the .voca and .grammar files), and the  
> corresponding system commands. So it seems like you could speak a command, and Simon would send the  
> request to the operating system or application. Will Simon be sending the commands to x-windows (like  
> x-voice), or will it use some other method? Do you have any sample command files? - I'd like to get an  
> idea of what format they are supposed to be in.  
 
We currently have 3 types of commands: 
Exec (executes commands) 
Place (tells the OS to open the given place - works also with urls and stuff) 
Special Keyword (used for escaping - for example, saying "simon simon" writes the word "simon", just as "\\" in a text writes a "\").
The commands will be stored in an XML format (we are currently working on that). 
 
The .voca and .grammar files in the settings dialog are just stubs for now. In the future the client (simon) would negotiate the language model with the server (juliusd) to simplify the training process between different computers. 
 
 
> You can also point to a pronunciation dictionary and a prompts file. It seems like Simon is a GUI  
> front that can permit someone to add new words to a Julian Grammar file, using pronunciation  
> information from the pronunciation dictionary (so the user does not have to enter phonemes by hand). 
 
Correct. As a failsafe we will also provide a method to add words from scratch. But that's not yet implemented. 
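
A minimal sketch of the dictionary lookup described above, assuming a plain-text lexicon with one "WORD phone phone ..." entry per line. The real dictionary format and the function names here are illustrative assumptions, not Simon's code.

```python
def parse_lexicon(text):
    """Parse lines of the assumed form 'WORD phone phone ...' into a dict."""
    lexicon = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            lexicon[parts[0].upper()] = parts[1:]
    return lexicon

def voca_entry(word, lexicon):
    """Return a .voca-style line for `word`, or None if it is not in the lexicon."""
    phones = lexicon.get(word.upper())
    if phones is None:
        return None  # caller would have to ask for a pronunciation by hand
    return f"{word.upper()} {' '.join(phones)}"

lexicon = parse_lexicon("SIMON s ay m ax n\nKEN k eh n\n")
known = voca_entry("simon", lexicon)
unknown = voca_entry("google", lexicon)
```

The `None` result marks the case where a word would have to be added from scratch.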
 
> I seem to be able to connect to juliusd, but am not quite sure how everything is supposed to work.  
> When I click the Connect link in Simon, I get a message in juliusd that the server was connected, but  
> I am not sure how recognition, and corresponding command execution is to take place.  
 
Try to say something. If julius/julian recognizes it, it should send it to simon. 
Simon should start typing or execute a command. (try "simon Texteditor" or "simon Google" for example). 
If not, try the "Send one Test Word" in juliusd to send these commands. Simon should act accordingly. 
 
 
> Word add  
> In the main window for Simon, there is a Word icon that seems to let you add a new word, and record speech audio corresponding to that word. I assume that this is for the  
> purpose of gathering acoustic data so that the julian acoustic model can be adapted using audio for the new word.  

> I'm wondering how any new words might be trained into the Acoustic Model. If you need to use HTK, it would add a level of complexity for new users, because they would have  
> to download the HTK source themselves and compile it (since HTK has distribution restrictions on the source and binaries). 
 
Yes. The dialog tries to add a new word to the model. (We haven't discussed how to deal with words which are not in the lexicon - we have to add an option to add a custom pronunciation to the word - which will be difficult if we want to keep it simple). 
HTK seems the only option for now. Writing something similar from scratch is out of reach (at least for now ^^). 
 
 
> Word List  
> This seems like the repository for all the words added in word add ... i.e. a place where you can manage all the words that have been added to the system.  
>  
> It also seems like it is set up to point to an Acoustic Model trainer (like HTK ...?) - is that what it is for?
 
We want to provide a way to train certain words alone. For example, when we have the word "sample" and it isn't included in a training text, we probably want to train just this single word.
So we can put together a custom "training text" with just the words that we select. 
Try to fill the training list (top, center) with a few words and hit the "Train" button. 
 
 
> Train  
> Seems like a way to prompt users to record sentences – using either their own text that they import or some predefined text. But I am not clear how this is to be used to
> update the acoustic models – since there is no sentence repository GUI front-end, as there is for Word Adds.
 
What do you mean by "sentence repository GUI front-end"?
We can import texts and even put together our custom training model. Then we record the needed utterances and train the language model with the collected data.
The texts are stored in an XML format. You can find one sample text in trunk/texts/.
 
> Implement  
> This seems like where you select the programs that Simon will be sending recognized commands to. 
 
You mean "Ausführen"? 
In that dialog we collect all the commands for the user to see. These are the commands that simon knows and reacts to.
Please note that we have a "magic word" which needs to be put in front of each command. ATM this magic word is hardcoded to "simon".
Like this: "Google" will do nothing. "simon Google" will open Google. 
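
To make the matching rule concrete, here is a small sketch of how a recognized utterance could be mapped to a command. It is an illustration under assumptions, not Simon's implementation: the command table, the tuple return values and the handling of a doubled magic word (the escaping mentioned for Special Keywords) are all hypothetical.

```python
MAGIC_WORD = "simon"

# Hypothetical command table: spoken name -> (command type, target).
COMMANDS = {
    "google": ("place", "http://google.at"),
    "texteditor": ("exec", "kwrite"),
}

def interpret(utterance):
    """Map a recognized utterance to an (action, argument) pair.

    Returns ("ignore", None) when the magic word is missing or the
    command name is unknown.
    """
    words = utterance.split()
    if len(words) < 2 or words[0].lower() != MAGIC_WORD:
        return ("ignore", None)
    rest = words[1:]
    if rest[0].lower() == MAGIC_WORD:
        # Assumed escaping rule: a doubled magic word means "type the word".
        return ("type", " ".join(rest))
    name = " ".join(rest).lower()
    return COMMANDS.get(name, ("ignore", None))

without_magic = interpret("Google")
with_magic = interpret("simon Google")
escaped = interpret("simon simon")
```

With this table, "Google" alone is ignored, while "simon Google" resolves to a Place command for the URL.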
 
Thanks for your help! 
 
--bedahr

--- (Edited on 4/19/2007 11:25 am [GMT-0400] by kmaclean) ---

Re: Simon Dialog Manager and Julian Speech Recognition
User: kmaclean
Date: 4/19/2007 10:26 am
Views: 412
Rating: 26
>Have you replaced the German text or have you added the translation, using Qt 4's translation features?
I don't have much experience with Qt, so I replaced German words in the Qt ui xml files, and throughout the code - just to get a feel for Simon. I could take a look at Qt 4's translation.
 
>Simon provides a very basic network socket based connection to the real recognizer. Juliusd is more/less a sample daemon that uses this socket, parses julius/julian output and writes it to that socket.
>Simon doesn't really care if the recognition is done by julius, julian or for example sphinx.  
So each implementation of Simon/Julian would require 2 sockets: 
Julian in server mode ==> juliusd (for sending commands to and receiving recognition output from Julian) 
juliusd ==> Simon (for forwarding recognition results to Simon - basically acting as an abstraction layer so that Simon does not have to worry about which ASR is being used)
 
>What do you mean by "sentence repository GUI front-end"?
I was thinking that a user might want to manage which sentences and the corresponding audio they have submitted for training ... but I guess that they would really be more interested in particular words or word sequences, and that is why you have the "Word List" icon.  
 
So it seems like the process for a new user is to submit a bunch of sentences, record speech audio, have Simon train a general Acoustic Model. For specific words that might not be easily recognized with the general AM, allow the user to record those words, and retrain or adapt the general Acoustic Model. 
 
>ATM this magic word is hardcoded to "simon"  
>Like this: "Google" will do nothing. "simon Google" will open Google.  
I'll give it a try 
 
thanks, 
 
Ken

--- (Edited on 4/19/2007 11:26 am [GMT-0400] by kmaclean) ---

Re: Simon Dialog Manager and Julian Speech Recognition
User: kmaclean
Date: 4/20/2007 12:40 pm
Views: 355
Rating: 29
Hi bedahr, 
 
I think I got Simon to recognize “simon google”! Although a browser with Google in it did not actually start up, a pop-up window appeared in Simon with the word “Google” in it. 
 
I'm a little confused as to how a grammar file set up in Simon would be recognized by Julian.  
 
The way I got Simon to recognize is as follows: 
 
I pointed to a grammar file in my 'julian.jconf' file, as follows: 
-dfa grammar/sample.dfa 
-v grammar/sample.dict 
 
These were grammar files from the VoxForge nightly build. I then added the following to the sample.voca file:
% SIMON 
SIMON s ay m ax n 
 
% COMMAND 
GOOGLE k eh n  
 
(the VoxForge Acoustic Model does not have enough speech audio, and one of the triphones that makes up Google is not in the Acoustic Model – so I just used the phones for the word “ken” to represent Google in this case – i.e. I say “Simon Ken” and Julian will return “Simon Google” to juliusd) 
 
and added the following to the sample.grammar file: 
 
S : NS_B SIMON COMMAND NS_E 
 
and compiled them both into a Julian grammar using the mkdfa.pl script supplied with Julian, as follows: 
 
$mkdfa.pl sample 
 
Does Simon use this process yet, or is that still in the works? 
 
Thanks, 
 
Ken

--- (Edited on 4/20/2007 1:40 pm [GMT-0400] by kmaclean) ---

Re: Simon Dialog Manager and Julian Speech Recognition
User: kmaclean
Date: 4/20/2007 12:41 pm
Views: 386
Rating: 29
> I don't have much experience with Qt, so I replaced German words in the Qt ui xml files, and throughout the code - just to get a feel for Simon. I could take a look at Qt 4's
> translation.  
That'd be great. 
 
 
> So each implementation of Simon/Julian would require 2 sockets:  
> Julian in server mode ==> juliusd (for sending commands to and receiving recognition output from Julian)  
> juliusd ==> Simon (for forwarding recognition results to Simon - basically acting as an abstraction layer so the Simon does not have to worry about which ASR is being used)  
 
Even simpler. Juliusd spawns julius and just watches its output. 
 
> So it seems like the process for a new user is to submit a bunch of sentences, record speech audio, have Simon train a general Acoustic Model. For specific words that  
> might not be easily recognized with the general AM, allow the user to record those words, and retrain or adapt the general Acoustic Model.  
Yes. We could provide a training text for an initial adaptation, with a couple of utterances, which pops up at first start.
 
> >ATM this magic word is hardcoded to "simon"  
> >Like this: "Google" will do nothing. "simon Google" will open Google.  
> I'll give it a try  
Every command of the type Exec or Place should work (at least they work for me...). 
 
 
> I think I got Simon to recognize “simon google”! 
> Although a browser with Google in it did not actually start up, a pop-up window appeared in Simon with the word “Google” in it.  
 
:) 
Yes, that part should be working pretty well. Google is a Place command and is internally passed to the system's URL handler. The popup states that simon has recognized the command named Google (that means it already matched the input, checked the available commands, and found a matching command whose name is "Google").
The OS URL handler is managed by Qt and is used simply by calling, for example:
"QDesktopServices::openUrl(QUrl( "http://google.at" ));" 
 
 
> I'm a little confused as to how a grammar file set up in Simon would be recognized by Julian. 
That is going to be a task of juliusd. Simon will most probably tell juliusd over the existing socket that it has a new model ready and transmit it to juliusd. Juliusd will then replace the model in use with the new one.
 
> Does Simon use this process yet, or is that still in the works? 
What do you mean? Command execution? Yes. That should be working (try for example "simon Home" or "simon Texteditor"; you may have to change kwrite to whatever you use).
The creation of the language model? No. Simon does not touch the language model for now (well, it parses the wordlist for display in the Word List, but that's it).
 
I'll try to finish up the command part (at least reading the commands from the XML file) ASAP. 
In the meantime: you can always define your own "magic word": just change the "#define COMMANDIDENT simon" to e.g. "#define COMMANDIDENT ken" in simoncontrol.cpp. 
 
--bedahr

--- (Edited on 4/20/2007 1:41 pm [GMT-0400] by kmaclean) ---

Re: Simon Dialog Manager and Julian Speech Recognition
User: kmaclean
Date: 4/20/2007 12:41 pm
Views: 412
Rating: 24
Hi bedahr, 
 
>> I could take a look at Qt 4's translation.
>That'd be great.  
Note that I won't be able to look at this for a few weeks, because I have to finish some stuff on the VoxForge site. 
 
>> Does Simon use this process yet, or is that still in the works?  
>What do you mean? Command execution? Yes. That should be working (try for example "simon Home" "simon Texteditor" (you may have to change kwrite to whatever you use)).
>The creation of the language model? No. Simon does not touch the language model for now (well it parses the wordlist for displaying in the wordlist but that's it).
Sorry, I should have been clearer ... what I was asking is how will Simon send an updated grammar file to juliusd (which you answered), and how would juliusd convert it into a usable form so that Julian can use it (I assume juliusd will perform a mkdfa.pl command to convert the .grammar and .voca files to a form that is usable by Julian). 
 
> In the meantime: you can always define your own "magic word": just change the "#define COMMANDIDENT simon" to e.g. "#define  
> COMMANDIDENT ken" in simoncontrol.cpp.  
I actually think that "Simon" is a good magic word, and I will keep it!  
 
thanks, 
 
Ken

--- (Edited on 4/20/2007 1:41 pm [GMT-0400] by kmaclean) ---

Re: Simon Dialog Manager and Julian Speech Recognition
User: kmaclean
Date: 4/20/2007 12:42 pm
Views: 2895
Rating: 32
Hi bedahr,  
 
>>> I could take a look at Qt 4's translation.
>>That'd be great.  
> Note that I won't be able to look at this for a few weeks, because I have to finish some stuff on the VoxForge site.  
No problem. It's just so that we know that this task has been assigned (to you in this particular case, if you want it). A translation by a native speaker would most probably be much more accurate than anything we could produce.
 
 
> Sorry, I should have been clearer ... what I was asking is how will Simon send an updated grammar file  
> to juliusd (which you answered), and how would juliusd convert it into a usable form so that Julian  
> can use it (I assume juliusd will perform a mkdfa.pl command to convert the .grammar and .voca files  
> to a form that is usable by Julian).  
Most probably: yes. 
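
A rough sketch of what that conversion step could look like on the daemon side, assuming the grammar arrives as two text blobs: write them out as <base>.voca and <base>.grammar, then build the mkdfa.pl command line for that base name. The function name and the staging directory are hypothetical, and the example deliberately does not run mkdfa.pl. The voca text contains only the entries quoted earlier in the thread, so the NS_B and NS_E categories used by the grammar rule are assumed to be defined in the rest of the real file.

```python
import tempfile
from pathlib import Path

def stage_grammar(directory, base, voca_text, grammar_text):
    """Write <base>.voca and <base>.grammar; return the mkdfa.pl argv to run there."""
    directory = Path(directory)
    (directory / f"{base}.voca").write_text(voca_text, encoding="utf-8")
    (directory / f"{base}.grammar").write_text(grammar_text, encoding="utf-8")
    # mkdfa.pl is given the base name, as in "mkdfa.pl sample".
    return ["mkdfa.pl", base]

VOCA = "% SIMON\nSIMON s ay m ax n\n\n% COMMAND\nGOOGLE k eh n\n"
GRAMMAR = "S : NS_B SIMON COMMAND NS_E\n"

with tempfile.TemporaryDirectory() as workdir:
    argv = stage_grammar(workdir, "sample", VOCA, GRAMMAR)
    staged = sorted(p.name for p in Path(workdir).iterdir())
```

A caller would then run the returned command in that directory (for example with subprocess.run(argv, cwd=directory, check=True)) and point the recognizer at the resulting .dfa and .dict files.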
 
>> In the meantime: you can always define your own "magic word": just change the "#define COMMANDIDENT simon" to e.g. "#define  
>> COMMANDIDENT ken" in simoncontrol.cpp.  
> I actually think that "Simon" is a good magic word, and I will keep it!  
:) 
 
-- bedahr

--- (Edited on 4/20/2007 1:42 pm [GMT-0400] by kmaclean) ---
