VoxForge
One of the issues that arises when your collection of audio samples grows is that compiling the model takes longer. I have a fairly fast computer, and running HTK_Compile_Model.sh with my collection of 16K or so audio samples takes about 23 minutes.
That's not a problem until you find that the audio is OK but you made a mistake in your lexicon. You don't find this out until the compile is done, so the process starts over again with another 23 minute wait. It's not very efficient.
So I was wondering whether we can treat Step 5 as a separate item, in the same way as Step 3, the recording of audio. If Step 5 is completely independent of anything that happens in Steps 1-4, then the existing mfc files can be left intact for any unmodified wavs. You just have to make mfc files from your new audio, update the contents of ./train/wav and ./train/mfcc, and skip Step 5 in the script.
If this is correct, this makes a big difference to the process. My 23 minute compile is cut to 12 minutes.
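As a side check on the idea that only new audio needs converting, here is a rough sketch of how one might list the wavs that have no up-to-date mfc file. It assumes that a wav and its mfc share the same file stem and that the directories are train/wav and train/mfcc as above. The function name wavs_needing_mfc is made up for illustration and is not part of HTK or the VoxForge scripts.

```python
import os


def wavs_needing_mfc(wav_dir='train/wav', mfc_dir='train/mfcc'):
    """Return wav file names whose mfc is missing or older than the wav.

    Hypothetical helper: the name and default paths are assumptions
    based on the directory layout described in this post.
    """
    stale = []
    for name in sorted(os.listdir(wav_dir)):
        if not name.endswith('.wav'):
            continue
        wav_path = os.path.join(wav_dir, name)
        mfc_path = os.path.join(mfc_dir, name[:-4] + '.mfc')
        # A missing mfc, or one older than its wav, needs to be (re)made.
        if (not os.path.exists(mfc_path)
                or os.path.getmtime(mfc_path) < os.path.getmtime(wav_path)):
            stale.append(name)
    return stale
```

An empty list would suggest that every wav already has a current mfc, so Step 5 has nothing left to do for that set.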
For those interested, here is the Python script I am using to do Step 5 independently. I have a special directory, wavtmp, into which I dump my new audio. HCopy puts the new mfc files into the same directory. The log, mfcproc.log, is recreated each time a batch of wavs is processed, and the HCopy output for each file is appended to it. Once the batch is finished, I move the new mfc files into train/mfcc and the wavs into train/wav, leaving wavtmp empty for the next batch.
So far it seems to work well. If anyone can see any reason why this is not a good idea, or can see improvements, by all means let me know.
==================
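# Utility to create mfc files from the wavs in wavtmp (written for Python 3).
import os
import subprocess as sp

os.chdir('./wavtmp')
# 'wb' recreates the log for each batch; check_output returns bytes.
with open('mfcproc.log', 'wb') as f:
    for name in os.listdir('.'):
        # Match on the extension only, not '.wav' anywhere in the name.
        if not name.endswith('.wav'):
            continue
        src = name
        dst = name[:-4] + '.mfc'
        args = ['HCopy', '-A', '-D', '-T', '1',
                '-C', '../scripts/input_files/wav_config',
                src, dst]
        try:
            # Append HCopy's output for this file to the log.
            f.write(sp.check_output(args))
        except (OSError, sp.CalledProcessError) as err:
            # Stop at the first failure so the bad wav can be inspected.
            print("problem with", name, err)
            break
os.chdir('../')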
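
The move at the end of a batch (new mfc files into train/mfcc, wavs into train/wav) could be scripted along these lines. This is only a sketch under the directory layout described in this post. The function name move_batch and its arguments are made up for illustration, the destination directories are assumed to exist already, and a wav is only moved if HCopy actually produced an mfc for it.

```python
import os
import shutil


def move_batch(tmp_dir='wavtmp', wav_dir='train/wav', mfc_dir='train/mfcc'):
    """Move each wav that has a matching mfc out of tmp_dir, with its mfc.

    Hypothetical helper: the name and default paths are assumptions
    based on the directory layout described in this post.
    Returns the list of file stems that were moved.
    """
    moved = []
    for name in sorted(os.listdir(tmp_dir)):
        if not name.endswith('.wav'):
            continue
        stem = name[:-4]
        mfc = os.path.join(tmp_dir, stem + '.mfc')
        if not os.path.isfile(mfc):
            # No mfc was produced for this wav; leave it for inspection.
            continue
        shutil.move(os.path.join(tmp_dir, name), os.path.join(wav_dir, name))
        shutil.move(mfc, os.path.join(mfc_dir, stem + '.mfc'))
        moved.append(stem)
    return moved
```

Run from the directory that contains wavtmp and train, move_batch() returns the stems it moved. Anything left in wavtmp besides mfcproc.log is a wav that did not convert.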
--- (Edited on 6/25/2012 11:32 am [GMT-0500] by colbec) ---