Audio and Prompts Discussions

Volume levelling
User: colbec
Date: 9/23/2011 9:26 am
Views: 7743
Rating: 14

OK I made a mistake.

With one particular device I recorded a set of prompts over a few days to get different voice qualities, and now I have a set of recorded prompts that differ widely in volume. I did not find this out until I tried to build an acoustic model and the VoxForge automated process quit at step 9, complaining that it could not find the results of step 8. Step 8 could not deal with a particular label file. When I listened to the corresponding wav it was hardly audible, so HTK must have treated it as all silence and balked.

Running sox with stat -v gives me an idea of the range of volumes I am dealing with. The reported volume adjustment runs from 2.5 to about 150 across 1,300 wav files. About 75% of the set needs a volume adjustment of at least 40 according to sox.
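
For anyone wanting to repeat this survey, here is a minimal Python sketch. It assumes sox is on the PATH and that `sox FILE -n stat -v` writes a single number (the volume adjustment) to stderr. The function names and the directory argument are made up for illustration, they are not part of any VoxForge script.

```python
import subprocess
from pathlib import Path


def parse_adjustment(stderr_text):
    """Parse the number printed by `sox FILE -n stat -v`.

    Assumption: the value is the last non-empty line of stderr.
    """
    lines = [ln.strip() for ln in stderr_text.splitlines() if ln.strip()]
    return float(lines[-1])


def volume_adjustment(wav_path):
    """Ask sox for the volume adjustment of one file."""
    result = subprocess.run(
        ["sox", str(wav_path), "-n", "stat", "-v"],
        capture_output=True, text=True, check=True,
    )
    # sox prints stat output on stderr, not stdout
    return parse_adjustment(result.stderr)


def survey(directory):
    """Map each wav file name in a directory to its adjustment."""
    wavs = sorted(Path(directory).glob("*.wav"))
    return {p.name: volume_adjustment(p) for p in wavs}


def fraction_at_or_above(adjustments, threshold):
    """Share of files whose adjustment is at least `threshold`."""
    if not adjustments:
        return 0.0
    hits = sum(1 for v in adjustments.values() if v >= threshold)
    return hits / len(adjustments)
```

Calling `survey("wav")` and then `fraction_at_or_above(result, 40)` would give the kind of figure quoted above (the share of files needing an adjustment of 40 or more).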

I experimented a bit with sox to fix this automatically, but found that even at 50% of the adjustment sox suggests it still adds some fuzziness to the recording.

I guess I have 3 choices: dump the lot and re-record, keep the loudest and record the rest again, or just increment all the wavs with sox to half the volume sox thinks would be good.
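
The third option could be scripted roughly as below. This is only a sketch: it assumes "half" means half of the factor sox reports, and it never goes below 1.0 so that files that are already loud are not turned down. The `-v FACTOR` option is the sox input volume option and must come before the input file name. The function names are hypothetical.

```python
import subprocess


def half_gain_command(src, dst, sox_adjustment, fraction=0.5):
    """Build a sox command applying a fraction of the suggested gain.

    Assumption: the gain is fraction * sox_adjustment, floored at 1.0
    so that no file is ever made quieter.
    """
    factor = max(1.0, sox_adjustment * fraction)
    return ["sox", "-v", f"{factor:.3f}", str(src), str(dst)]


def apply_half_gain(src, dst, sox_adjustment):
    """Run sox to write an amplified copy of src to dst."""
    subprocess.run(half_gain_command(src, dst, sox_adjustment), check=True)
```

Writing to a separate output file keeps the original recordings untouched, which matters here because amplification also raises the noise in a quiet recording.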

Others must have run into this issue. Suggestions?

--- (Edited on 9/23/2011 9:26 am [GMT-0500] by colbec) ---

Re: Volume levelling
User: kmaclean
Date: 9/26/2011 7:17 pm
Views: 140
Rating: 12

> 3 choices: dump the lot and re-record, keep the loudest and
> record the rest again, or just increment all the wavs with sox
> to half the volume sox thinks would be good.

There is actually a 4th choice: just drop the recordings that HTK does not like. Even at low volume, there may be recordings that HTK is OK with, and these can still be used for training.

P.S. I did not know about the sox volume parameter. Cool, thanks.

--- (Edited on 9/26/2011 8:17 pm [GMT-0400] by kmaclean) ---

Re: Volume levelling
User: colbec
Date: 9/27/2011 2:04 am
Views: 154
Rating: 13

That's interesting. That's how I started addressing the problem, by running the script to find out which recording was the issue, deleting it and running the script again. But I gave up after two, since 1300 is a long way to go to find out through brute force. Perhaps there is a method of getting HTK to loop through all the wavs to test them for suitability?

In the end I went with the re-record option. I think it is important to have a basic reliable set of prompts. The model is now working correctly. However I have saved the bad set for further analysis and re-addition to the wav pool once I have more info.

Assuming there is a direct correspondence between volume level and acceptability to HTK, I ought to be able to estimate a critical volume level from the sox figures, import only the prompts that are louder than that level, and run again. I will put this test on my list of things to do.
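
A rough Python sketch of that test, assuming the sox volume adjustment for each file has already been collected into a dict. A lower adjustment means a louder recording, so "better than the critical level" is taken here to mean "adjustment at or below the critical value". The critical value itself is a guess to be tuned by experiment, and the function name is hypothetical.

```python
def split_by_critical_level(adjustments, critical):
    """Split file names into (keep, hold_back) lists.

    adjustments: dict of file name -> sox volume adjustment.
    critical:    guessed adjustment above which HTK is assumed to fail.
    """
    keep = sorted(n for n, v in adjustments.items() if v <= critical)
    hold_back = sorted(n for n, v in adjustments.items() if v > critical)
    return keep, hold_back
```

If the model builds with the `keep` list, the critical value can be raised and the run repeated. If it fails, the value can be lowered.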

I ran into a similar issue with the same device in my testing routine. While running through a batch of tests on a correctly assembled model, all of a sudden I started getting lots of errors where there should have been none. My computer system somewhere had decided to reduce the input volume without my knowledge or concurrence. Once I restored the normal input volume the errors went away.

--- (Edited on 9/27/2011 2:04 am [GMT-0500] by colbec) ---

Re: Volume levelling
User: kmaclean
Date: 9/27/2011 12:43 pm
Views: 170
Rating: 11

> Perhaps there is a method of getting HTK to loop through all
> the wavs to test them for suitability?

Look at the HVite_log in Step 8 - Realigning the Training Data; look for instances of the message: "No tokens survived to final node of network at beam xxx" where xxx is your threshold. Then listen to the audio to see what the problem might be. This is what I use in the VoxForge acoustic model build script to flag bad recordings.
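
The log search can be automated along these lines. This is a sketch, not the actual VoxForge build script: it assumes the log names each file on a line containing "Aligning File:" before any warning about it, which depends on the trace options HVite was run with. Adjust `file_marker` if your log looks different.

```python
NO_TOKENS = "No tokens survived to final node of network"


def find_failed_files(log_text, file_marker="Aligning File:"):
    """Return the files whose alignment produced the no-tokens message.

    Assumption: each file is announced on a line containing
    `file_marker`, and any warning that follows belongs to that file.
    """
    current = None
    failed = []
    for line in log_text.splitlines():
        if file_marker in line:
            # Remember which file the following lines refer to
            current = line.split(file_marker, 1)[1].strip()
        elif NO_TOKENS in line and current is not None:
            if current not in failed:
                failed.append(current)
    return failed
```

Reading the log with `open("HVite_log").read()` and passing the text to `find_failed_files` gives a list of recordings to listen to, or to drop from the training list, in one pass instead of one failed build at a time.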

Re: using the bad recordings: that is just my opinion. If you have _data_ showing that deleting and re-recording works best in your environment, then I can't argue with that :)

--- (Edited on 9/27/2011 1:43 pm [GMT-0400] by kmaclean) ---

Re: Volume levelling
User: colbec
Date: 10/3/2011 9:34 am
Views: 3578
Rating: 14

I think I may have found one element of the volume issue. I was checking my CPU usage and found that at least one of four cores was constantly maxed out by the kmix process. There are a number of posts about this issue on the internet, together with a fix. The common elements appear to be the kmix configuration files and kmix being used alongside pulseaudio.

I implemented the fix and CPU usage has returned to normal. Two thoughts on this. First, pulseaudio seems much happier working inside Gnome than KDE, so next time I have to restore my system I will switch to Gnome. Second, it might have been kmix and pulseaudio fighting for control of my volume that caused the original issue. Just one of many possible reasons, I guess.

Edit: after the fix, kmix went wild again, so I have stopped it altogether. It's not helping.

--- (Edited on 10/3/2011 9:34 am [GMT-0500] by colbec) ---

--- (Edited on 10/3/2011 9:46 am [GMT-0500] by colbec) ---
