VoxForge
Hi!
I was looking at the submitted French speech data, and I saw that only part of it is in the VoxForge repository for download. The rest seems to be in the upload directory, which has restricted access, so the only practical way to recover the whole corpus is manually, from the download page.
Is there a specific reason for that, or is there an easy way to get the corpus? I saw this post where Ken says:
Unfortunately I have not moved any German audio to subversion.
However, here is a quick and dirty way to get the audio:
1. $ wget -r -l2 http://www.voxforge.org/home/downloads/speech/german-speech-files -A "ralfherzog*"
This will create a directory called www.voxforge.org.
2. Search the directory for *.zip files using Gnome's search tool, and drag the results to the directory you want.
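Step 2 can also be done without a GUI. The following is a minimal Python sketch, assuming the wget mirror from step 1 sits in a local directory named www.voxforge.org; the function name `collect_zips` and the destination directory are hypothetical. It walks the mirror recursively and copies every .zip file it finds into one flat directory.

```python
import shutil
from pathlib import Path


def collect_zips(mirror_dir: str, dest_dir: str) -> list[Path]:
    """Copy every *.zip file found under mirror_dir (recursively) into dest_dir.

    Returns the destination paths in sorted order. Files with the same name in
    different subdirectories overwrite each other; VoxForge submission names
    (user-date-id.zip) are assumed to be unique.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materialises the search before any copying starts.
    for src in sorted(Path(mirror_dir).rglob("*.zip")):
        if src.is_file():
            target = dest / src.name
            shutil.copy2(src, target)  # copy2 keeps the original timestamps
            copied.append(target)
    return copied
```

For example, `collect_zips("www.voxforge.org", "german-speech")` run from the directory where wget was started would gather the archives into german-speech.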
I'm not a wget expert, but I don't think it will fetch files that are not in the specified directory. Any help?
Thanks a lot!
Marion
That's what I did, but as I said before, most of the corpus is in the upload directory, and to access it you need the complete address of each zip file, like http://voxforge.org/uploads/q0/0Q/q00QgKBqYb4KK6_qzhITig/phil_be-20090310-mif.zip, so you can't just do:
$ wget -r -l3 http://www.voxforge.org/uploads
I found a solution using WinHTTrack, but wget should have worked too; you just have to download a lot of material and then delete everything except the zip files you want.
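Since the upload addresses cannot be guessed, one way around mirroring everything is to read them from the download page itself. The following is a minimal Python sketch, assuming the page links each archive through an ordinary `<a href=...>` tag; `ZipLinkParser`, `zip_links` and `page_zip_links` are hypothetical names. It collects the absolute URLs of all .zip links on a page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class ZipLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links whose path ends in .zip."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links such as /uploads/... against the page URL.
        absolute = urljoin(self.base_url, href)
        # Check the path only, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(".zip") and absolute not in self.links:
            self.links.append(absolute)


def zip_links(html: str, base_url: str) -> list[str]:
    """Return the .zip links in html, in page order, without duplicates."""
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def page_zip_links(page_url: str) -> list[str]:
    """Download one page and return the .zip links it contains."""
    with urlopen(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return zip_links(html, page_url)
```

The resulting URLs could be written one per line to a text file and passed to `wget -i urls.txt`, which fetches only the archives. If the download page is split over several pages, each one would need to be parsed.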
I just wanted to point out that not all French data is in the repository, but I understand perfectly that you don't have time to process all of it!
Thanks anyway for the answer and for this project; it's great!
Marion
Hi Ken,
Oops, I wanted to download the French corpus today, but I made a mistake and downloaded much more than expected.
I hope I didn't cause any trouble for your server.
Sorry about that.
Samuel-
Hello
I am a French amateur roboticist and I would like to include voice recognition in a robot.
I wonder if the project was still open? (before contributions)
I can contribute to several tens of hours and use my network, however, before you can use my network I will need to show concretely that his works a minimum.
If I contribute 10 hours, will it be possible to make a sufficiently compelling demo?
PS: Sorry, I do not speak English very well.
>I wonder if the project was still open? (before contributions)
I don't understand what you are asking...
> I will need to show concretely that his works a minimum.
Please clarify...
If it is easier for you, post in French; Google Translate can help me with my rudimentary knowledge of French...
There are not many messages, and no new files have been added since last year.
I was wondering whether there were still people active here, because recording audio messages takes time, and I don't want that time to be of no use to anyone.
I would like to work in 2 steps:
Step 1
Speak sentences and write them out to a text file (even if there are many errors, the goal is to get a small result to motivate friends to help me).
Step 2
I don't know yet how to do it, but I would like to retrieve the spoken sentences with word or syllable probabilities … so as to obtain coherent sentences.
I am thinking of using pocketsphinx installed on a (dedicated) Raspberry Pi. For now, I have no idea how to use pocketsphinx or how the acoustic models are used. I hope to make enough recordings to have a French acoustic model to validate step 1 next month.
>I wondered if there were still people working here,
Yes, there is a backlog in speech processing (basically downsampling the speech to 16 kHz, 16-bit), but speech is still being collected.
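To illustrate what such downsampling involves, here is a minimal pure-Python sketch. It is not VoxForge's actual processing pipeline. It assumes mono samples stored as floats in [-1.0, 1.0] and an input rate that is an integer multiple of 16 kHz (for example 48 kHz); `lowpass_fir` and `downsample_to_16k` are hypothetical names. The signal is first low-pass filtered below the new Nyquist frequency (8 kHz) with a Hamming-windowed sinc filter to avoid aliasing, then every n-th sample is kept and quantised to signed 16-bit.

```python
import math


def lowpass_fir(cutoff: float, num_taps: int) -> list[float]:
    """Hamming-windowed sinc low-pass FIR filter.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5).
    Taps are normalised so the filter has unity gain at DC.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def downsample_to_16k(samples: list[float], in_rate: int,
                      out_rate: int = 16000, num_taps: int = 101) -> list[int]:
    """Low-pass filter, decimate by an integer factor, quantise to 16-bit ints."""
    if in_rate % out_rate != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = in_rate // out_rate
    # Cut off a little below the new Nyquist frequency (out_rate / 2).
    taps = lowpass_fir(0.45 * out_rate / in_rate, num_taps)
    half = num_taps // 2
    out = []
    # Evaluate the (centred) convolution only at the samples we keep.
    for center in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            idx = center + half - k
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        # Clip to the signed 16-bit range before storing.
        out.append(max(-32768, min(32767, round(acc * 32767))))
    return out
```

This direct convolution is slow in pure Python and does not handle ratios such as 44.1 kHz to 16 kHz; tools such as SoX perform arbitrary-ratio resampling and would be the practical choice for a real corpus.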
>I think using pocketsphinx installed on Raspberry Pi
Not sure that a Raspberry Pi can power speech recognition... best to ask on the Pocket Sphinx forum.