VoxForge
> So wouldn't it be better to randomize the submitted prompts from a larger
> text file of sentences for the Java applet?
At VoxForge, the SpeechSubmission app picks a random start point in a larger text file of English prompt sentences and presents prompts to the user from there. For example, if there are 500 prompts, the app randomly selects a number from 1 to 500 and reads 10 prompts starting at the line in the prompts file that corresponds to that number. We do this because we are trying to get good triphone coverage. This will help us create a *general purpose* acoustic model, since we don't know how a potential user might use the acoustic model.
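As a rough illustration of the start-point selection described above, here is a minimal Python sketch. It is not the SpeechSubmission app's actual code (the app is a Java applet); the function name `select_prompts`, the `rng` parameter, and the wrap-around behaviour near the end of the file are all assumptions made for the example.

```python
import random

def select_prompts(prompts, count=10, rng=random):
    """Pick a random start line and return `count` consecutive prompts.

    Assumption: if the start point is close to the end of the list,
    the selection wraps around to the beginning. How the real app
    handles this case is not stated here.
    """
    if not prompts:
        return []
    count = min(count, len(prompts))
    start = rng.randrange(len(prompts))  # 0-based index of the random start line
    return [prompts[(start + i) % len(prompts)] for i in range(count)]


if __name__ == "__main__":
    # 500 placeholder prompts standing in for a real prompts file.
    prompts = [f"prompt sentence {n}" for n in range(1, 501)]
    for line in select_prompts(prompts):
        print(line)
```

Because each session starts at a different line, many sessions together should end up covering most of the file, which is the point of the randomization.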
If you don't have a good idea of where you want to use your acoustic model, then you should also aim for a *general purpose* acoustic model. This means that yes, you *should* randomize the prompts from a larger text file of sentences.
The SpeechSubmission app already does this. You just need to create a file containing a list of prompt sentences (10-15 words long), with a "prompt ID" as the first word on each line (the prompt ID is used to name the wav file the user records from the prompt).
Hope this clarifies things,
Ken
>Is there already a simple script that splits a text file and formats it into
>prompt sentences with a prompt ID?
Sorry, I forgot that the SpeechSubmission app will automatically add prompt IDs to a text file (using a pre-defined prefix, and an incrementing number). So there is no need to add your prompt ID to each line.
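To make the "prefix plus incrementing number" idea concrete, here is a small Python sketch of that kind of ID assignment. The prefix `en`, the zero-padding width, and the function name `add_prompt_ids` are made up for the example; the format the SpeechSubmission app actually uses may differ.

```python
def add_prompt_ids(sentences, prefix="en", width=4):
    """Prepend an ID (prefix + incrementing number) to each sentence.

    The prefix and the zero-padded numbering are illustrative
    assumptions, not the app's documented format.
    """
    lines = []
    for number, sentence in enumerate(sentences, start=1):
        prompt_id = f"{prefix}{number:0{width}d}"
        lines.append(f"{prompt_id} {sentence.strip()}")
    return lines


if __name__ == "__main__":
    examples = [
        "the birch canoe slid on the smooth planks",
        "glue the sheet to the dark blue background",
    ]
    for line in add_prompt_ids(examples):
        print(line)
```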
There is no script to split a text file ... though this can be done fairly quickly by hand. Try to break the sentences where there might be a natural pause in the original (e.g. at commas or periods).
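If splitting by hand becomes tedious, something along these lines could serve as a starting point. This is only a sketch, not an existing VoxForge tool: the function name `split_into_prompts`, the 15-word cap (taken from the 10-15 word guideline mentioned earlier), and the choice of break characters are assumptions, and the output would still need a manual check.

```python
import re

def split_into_prompts(text, max_words=15):
    """Split text into prompt-sized chunks at punctuation.

    Sentences are split at '.', '!' and '?'. A sentence longer than
    max_words is further broken at commas, semicolons and colons.
    Chunks that are still too long are left as they are for manual editing.
    """
    prompts = []
    # Break into sentences at terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        if len(sentence.split()) <= max_words:
            prompts.append(sentence)
            continue
        # Long sentence: break at clause punctuation, then greedily
        # re-join neighbouring clauses while they fit under max_words.
        current = ""
        for clause in re.split(r"(?<=[,;:])\s+", sentence):
            candidate = f"{current} {clause}".strip()
            if current and len(candidate.split()) > max_words:
                prompts.append(current)
                current = clause
            else:
                current = candidate
        if current:
            prompts.append(current)
    return prompts
```

This only caps the length; it does not merge very short fragments, so chunks well under 10 words would need to be joined or dropped by hand.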
Ken