VoxForge
Hi nsh,
The pronunciations generated by my Festival implementation (Fedora FC4) do not always match those in the VoxForge Dict (about 90% do match ...). They should be the same but they are not. I think Festival uses an older version of the CMU dictionary (VoxForge uses release 0.6), or I've somehow managed to diverge from the CMU pronunciations with my manual additions to the dictionary.
I currently use Festival to provide draft pronunciations for out-of-vocabulary words. However, Festival's letter-to-sound rules are sometimes incomplete, and the pronunciations it generates can omit phones altogether.
Since the phones don't match exactly, and some of the word pronunciations generated by Festival are incomplete, I figured that creating a new rule set using the VoxForge Dictionary was the easiest approach...
Ken
--- (Edited on 6/26/2007 12:18 pm [GMT-0400] by kmaclean) ---
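For anyone who wants to measure the mismatch described above, here is a minimal sketch in Python. It is not a VoxForge or Festival tool. It assumes both sources have been exported as plain text with one `WORD phone phone ...` entry per line (an optional HTK-style `[WORD]` field is skipped) and that both use the same phone set; the sample entries, `parse_dict`, and `compare` are hypothetical names made up for illustration.

```python
import re


def parse_dict(lines):
    """Map each word to the set of its pronunciations (tuples of phones)."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and CMUdict-style comments
            continue
        fields = line.split()
        # Drop an alternate-pronunciation suffix such as "(2)" from the word.
        word = re.sub(r"\(\d+\)$", "", fields[0]).upper()
        # Skip an HTK-style "[WORD]" output-symbol field if present.
        phones = [f for f in fields[1:] if not f.startswith("[")]
        # Assumption: lower-casing and stripping stress digits is enough
        # to make the two phone sets comparable.
        phones = tuple(re.sub(r"\d", "", p).lower() for p in phones)
        entries.setdefault(word, set()).add(phones)
    return entries


def compare(reference, candidate):
    """Return (match rate, mismatched words) over the words both dicts share."""
    shared = sorted(set(reference) & set(candidate))
    # A word matches if at least one pronunciation is common to both sources.
    mismatches = [w for w in shared if not (reference[w] & candidate[w])]
    rate = 1.0 - len(mismatches) / len(shared) if shared else 0.0
    return rate, mismatches


# Made-up sample entries, for illustration only.
voxforge_lines = [
    "ABOUT [ABOUT] ax b aw t",
    "HELLO [HELLO] hh ax l ow",
    "ZEBRA [ZEBRA] z iy b r ax",
]
festival_lines = [
    "ABOUT ax b aw t",
    "HELLO hh l ow",  # a phone is missing, as in the incomplete-rules case
    "ZEBRA z iy b r ax",
]

rate, mismatches = compare(parse_dict(voxforge_lines), parse_dict(festival_lines))
print(f"{rate:.0%} match; mismatches: {mismatches}")
```

The list of mismatched words is probably more useful than the percentage, since it shows whether the differences come from missing phones or from a different dictionary version.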
Ah, I see. Festival actually uses CMUdict 0.4 because it's targeted at speech synthesis, not speech recognition. Alan commented on it earlier:
http://lists.berlios.de/pipermail/festlang-talk/2007-April/001708.html
Well, really one should convert CMUdict-0.6 and train new rules with festvox. festvox/src/lts includes a training script.
--- (Edited on 6/26/2007 11:44 am [GMT-0500] by nsh) ---
Hm, I just came across this:
https://lists.berlios.de/pipermail/festlang-talk/2007-August/001974.html
Bad thing is that we didn't know about that. Good thing is that we'll be able to segment librivox audio faster.
--- (Edited on 8/9/2007 12:14 am [GMT-0500] by nsh) ---
Hi nsh,
Very interesting!
BTW the link you posted is dead ... here is the updated link:
http://lists.berlios.de/pipermail/festlang-talk/2007-August/001970.html
From Kishore Prahallad's post:
We call this project as Interslice - to be released under Festvox (Alan
would have more comments).
The basic idea of interslice is to automatically build synthetic voices
from large speech databases typically available from public domain such
as librivox.org and loudlit.org.
Interslice comes with a segmentation tool capable to handling infinitely
large corpora and chunking them into utterances and *.lab files.
This is great news! Especially if it can easily supply pronunciations for words not already included in the VoxForge dictionary.
It also sounds similar to Cepstral's commercial product offering called "VoiceForge". From the press release:
Cepstral LLC announced the release of VoiceForge(tm), a web 2.0 product that can turn a set of recorded audio prompts into a Text-to-Speech (TTS) voice capable of saying anything. With VoiceForge(tm), companies or actors can capture or "bank" their voices on their own. Once a voice is synthetically forged, it can be used to speak dynamic information for Entertainment, Telephony, Navigation, Education, or Reminder applications.
Ken
--- (Edited on 8/9/2007 11:17 pm [GMT-0400] by kmaclean) ---