VoxForge
Hi Ken
Recently I tried to start retraining the sphinx models with the recent improvements that were made. The hardest step in training is actually preparing the data: putting it into the right folders and organizing it in the proper format.
The first issue I've run into is the following: some archives available for download have the submission name as the top folder:
Aaron-20080318-liy
Aaron-20080318-liy/etc
Aaron-20080318-liy/wav
Some others have etc and wav directly as top folders, like
AdrianMcNear-20091016-psv
This creates some trouble for scripts that it would be better to avoid. What's the best way to fix that, should we just modify the script and repackage everything?
--- (Edited on 1/20/2010 02:20 [GMT+0300] by nsh) ---
Hi nsh,
>What's the best way to fix that, should we just modify the script and
>repackage everything?
The problem originates with the move from a set of scripts containing a hideous combination of Perl and make commands (to execute Linux Gzip/Tar commands), to a Perl script that only uses the Perl Tar/GZip/Zip packages for creating tar files (revision 2691) on April 19, 2009.
Therefore anything before April 19, 2009 has the submission name as a root directory (and etc & wav as subdirectories), whereas anything on or after that date has etc and wav as root directories.
I am assuming that this makes things a bit more complicated if you want to extract a bunch of files all at once in the same directory, so the preferred approach would be to have the submission name as the root directory for all submissions...
Should not be a big change, but the uploading could take a long time (a few days to a week at a throttled bandwidth so as not to kill response time on the VoxForge webserver, and I'll have to watch my upload bandwidth limits... might have to split it across Jan/Feb).
Please let me know if this makes sense,
thanks,
Ken
--- (Edited on 1/20/2010 10:02 pm [GMT-0500] by kmaclean) ---
--- (Edited on 1/20/2010 10:51 pm [GMT-0500] by kmaclean) ---
As a quick work-around to this issue: use Nautilus to extract the tarfiles that don't have a root directory... Nautilus will create one for you. You can do a multiple select and extract (right-click) for multiple tarfiles.
I can't figure out a way to do this from the command line using the tar command (i.e. something like "tar -zxf"), so I will fix the ones on the repository server using a script (so they will be consistent), and rsync them with the acoustic model creation server some other time.
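For anyone who wants to script the extraction instead of using Nautilus, here is a minimal sketch in Python of the same behaviour: if an archive has no root directory named after the submission, one is created for it. It assumes the archive file name (minus .tgz, .tar.gz or .tar) is the submission name; the function names are made up for illustration and are not part of any VoxForge script.

```python
import tarfile
from pathlib import Path, PurePosixPath


def archive_stem(archive: Path) -> str:
    """Submission name, assumed to be the file name without its tar suffix."""
    name = archive.name
    for suffix in (".tar.gz", ".tgz", ".tar"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return archive.stem


def top_level_names(tar: tarfile.TarFile) -> set:
    """First path component of every member, ignoring a leading './'."""
    tops = set()
    for member in tar.getmembers():
        parts = [p for p in PurePosixPath(member.name).parts if p not in (".", "/")]
        if parts:
            tops.add(parts[0])
    return tops


def extract_with_root(archive: Path, dest: Path) -> Path:
    """Extract so the result is always dest/<submission name>/..."""
    name = archive_stem(archive)
    with tarfile.open(archive, "r:*") as tar:
        # Old-style archives already contain <name>/etc and <name>/wav.
        has_root = top_level_names(tar) == {name}
        target = dest if has_root else dest / name
        target.mkdir(parents=True, exist_ok=True)
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return dest / name
```

Calling extract_with_root(Path("AdrianMcNear-20091016-psv.tgz"), Path("corpus")) would then give corpus/AdrianMcNear-20091016-psv/etc and corpus/AdrianMcNear-20091016-psv/wav, the same layout as the older archives.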
Ken
--- (Edited on 1/27/2010 11:53 pm [GMT-0500] by kmaclean) ---
Thanks Ken
Please let me know when it is done, I need to proceed with training.
--- (Edited on 1/31/2010 05:16 [GMT+0300] by nsh) ---
>Please let me know when it is done, I need to proceed with training.
completed for all languages (see Ticket #473)
Ken
--- (Edited on 2/2/2010 9:18 pm [GMT-0500] by kmaclean) ---
Ok, here is the next problem. The following files:
atterer-01202007-a
atterer-01202007-b
atterer-02052007-vf5
atterer-21012007-vf22
granthulbert-ar-01032007
granthulbert-cc-01032007
granthulbert-rp-01032007
ilopezc-20060321-rainbow
jaiger-20061231-vf7
jaiger-20061231-vf8
jaiger-20070103-vf10
jaiger-20070103-vf9
jaiger-20070209-vf11
jaiger-20070209-vf12
jaiger-20070209-vf13
jaiger-20070209-vf14
jaiger-20070209-vf15
jaiger-vf16-20070214
jaiger-vf17-20070214
jaiger-vf18-20070214
jaiger-vf19-20070220
jaiger-vf20-20070220
jimmowatt-20070308-hoe
kmaclean-12062006
kmaclean-12062006-a
robin-20030302-vf10
robin-20070201
robin-20070211
robin-20070212
robin-20070212-vf1
robin-20070212-vf2
robin-20070217-vf3
robin-20070224-vf4
robin-20070224-vf5
robin-20070224-vf6
robin-20070226-vf7
robin-20070301-vf8
robin-20070301-vf9
robin-20070302-vf11
robin-20070310-vf12
robin-20070310-vf13
robin-20070326-vf14
robin-20070326-vf15
robin-20070330-vf16
robin-20070330-vf17
robin-20070401-vf18
robin-20070401-vf19
robin-20070402-vf20
robin-20070405-vf21
robin-20070409-vf22
robin-20070411-vf23
trevarthan-070403
trevarthan-070403-vf3
Have ../../../Audio... in their PROMPTS file. It would be nice to repack them.
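A list like the one above can be produced with a short scan. This is only a sketch: it assumes the submissions are already extracted side by side under one directory, each with an etc/PROMPTS file, and it matches only the "../../../Audio" prefix quoted above (the rest of the path is not known here). The function name is made up for illustration.

```python
from pathlib import Path

# Prefix exactly as quoted in the post; whatever follows it is not assumed.
BAD_PREFIX = "../../../Audio"


def submissions_with_bad_prompts(root: Path, prefix: str = BAD_PREFIX) -> list:
    """Return sorted submission names whose etc/PROMPTS has a line starting with prefix."""
    hits = []
    for prompts in sorted(root.glob("*/etc/PROMPTS")):
        text = prompts.read_text(encoding="utf-8", errors="replace")
        if any(line.startswith(prefix) for line in text.splitlines()):
            # <root>/<submission>/etc/PROMPTS -> <submission>
            hits.append(prompts.parent.parent.name)
    return hits
```

Printing the result of submissions_with_bad_prompts(Path("corpus")) one name per line gives a list in the same form as above.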
--- (Edited on 2/7/2010 04:01 [GMT+0300] by nsh) ---
>Ok, here is the next problem. The following files:
>Have ../../../Audio... in their PROMPTS file. It would be nice to repack
>them.
Fixed.
See ticket 21 for details.
Ken
--- (Edited on 2/10/2010 3:37 pm [GMT-0500] by kmaclean) ---
Oh great! Thank you so much
Here is the next problem. The following files are named prompts instead of PROMPTS:
./csawtell-10112006/etc/prompts
./jaiger-10212006-NR/etc/prompts
./jaiger-11052006/etc/prompts
./jaiger-11282006/etc/prompts
./Adminvox-05232006/etc/prompts
./an4/etc/prompts
./crxssi-10112006/etc/prompts
./jaiger-12032006-5/etc/prompts
./kmaclean-06122006/etc/prompts
./kmaclean-06152006/etc/prompts
./Adminvox-05262006/etc/prompts
./jaiger-10212006/etc/prompts
./jaiger-12032006-4/etc/prompts
./cmu_us_bdl_arctic/etc/prompts
./kmaclean-06092006/etc/prompts
./jaiger-12032006-3/etc/prompts
./jaiger-12032006-6/etc/prompts
./cmu_us_jmk_arctic/etc/prompts
./jaiger-10272006/etc/prompts
./cmu_com_kal_ldom/etc/prompts
./cmu_us_rms_arctic/etc/prompts
./cmu_us_ksp_arctic/etc/prompts
./cmu_us_slt_arctic/etc/prompts
./cmu_us_awb_arctic/etc/prompts
./cmu_us_clb_arctic/etc/prompts
For script simplicity and consistency it would be nice to convert them to upper case. I wanted to do it myself but gave up on checking out a few gigs from svn.
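On an extracted copy the rename itself is simple. A minimal sketch, assuming the layout <root>/<submission>/etc/prompts shown in the list above; the function name and the dry_run flag are made up for illustration.

```python
from pathlib import Path


def uppercase_prompts(root: Path, dry_run: bool = True) -> list:
    """Rename <submission>/etc/prompts to <submission>/etc/PROMPTS under root.

    Returns the sorted names of the submissions that need (or got) the rename.
    """
    renamed = []
    for etc in sorted(root.glob("*/etc")):
        # Look at the real directory entries rather than testing for existence,
        # so the check also behaves on case-insensitive filesystems.
        names = {p.name for p in etc.iterdir()}
        if "prompts" in names and "PROMPTS" not in names:
            if not dry_run:
                (etc / "prompts").rename(etc / "PROMPTS")
            renamed.append(etc.parent.name)
    return renamed
```

With dry_run left at True it only reports what would change. Note that a plain rename like this is not recorded by svn; inside a working copy the rename would have to go through svn itself.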
--- (Edited on 2/11/2010 01:44 [GMT+0300] by nsh) ---
Ken, please at least check the svn access setup, because it's hard to commit a new model into svn. I just get a 403 error.
--- (Edited on 2/15/2010 02:15 [GMT+0300] by nsh) ---
>Ken, please at least check the svn access setup, because it's hard to
>commit a new model into svn. I just get a 403 error.
I'm having some problems committing the last changes you requested... it should be fixed soon.
Ken
--- (Edited on 2/14/2010 10:05 pm [GMT-0500] by kmaclean) ---