VoxForge
When I run autoconf, I get this error:
[cui@localhost cmuclmtk]$ autoconf
configure.ac:6: error: possibly undefined macro: AM_INIT_AUTOMAKE
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
So I commented that macro out in configure.ac:
#AM_INIT_AUTOMAKE
Then I ran autoconf again, and it passed.
But when I run ./configure, it fails:
[cui@localhost cmuclmtk]$ ./configure
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for ranlib... ranlib
configure: error: cannot find install-sh or install.sh in "." "./.." "./../.."
I have not been able to resolve this.
--- (Edited on 8/24/2008 3:29 am [GMT-0500] by chn) ---
You are certainly not using the latest version; this is an older one. Check out the latest with Subversion:
svn checkout https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmuclmtk
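Both errors above are what a bare autoconf run produces in a source tree that has not been bootstrapped: AM_INIT_AUTOMAKE is defined by automake and only becomes visible to autoconf after aclocal has run, and install-sh is one of the helper files that automake --add-missing copies in. A minimal sketch of regenerating everything in one step, assuming autoconf and automake (plus libtool, if configure.ac asks for it) are installed; bootstrap_tree is a hypothetical helper name, not something shipped with cmuclmtk:

```shell
#!/bin/sh
# Sketch only: regenerate the build system of an automake-based source tree.
# bootstrap_tree is a hypothetical helper, not part of cmuclmtk.
bootstrap_tree() {
    srcdir=${1:-.}
    # Refuse to run outside a tree that has an autoconf input file.
    if [ ! -f "$srcdir/configure.ac" ] && [ ! -f "$srcdir/configure.in" ]; then
        echo "no configure.ac or configure.in in $srcdir" >&2
        return 1
    fi
    # autoreconf runs aclocal, autoconf and, for automake-based trees,
    # automake --add-missing, which installs install-sh and friends.
    (cd "$srcdir" && autoreconf --install)
}
```

After it succeeds (for example bootstrap_tree cmuclmtk), ./configure inside that directory should no longer complain about install-sh, and AM_INIT_AUTOMAKE can stay uncommented.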
--- (Edited on 8/24/2008 3:42 am [GMT-0500] by nsh) ---
Excuse me!
When I generate the *.arpa file with -n 3, the 3-grams are the same as what the online tool produces, but the 2-grams are not!
What I got is:
\data\
ngram 1=13
ngram 2=12
ngram 3=13
\1-grams:
-99.0000 </s> 0.0000
-99.0000 <s> 0.0688
-1.1206 BACKWARD 0.0000
-1.1206 BROWSER 0.0000
-1.1206 EMAIL 0.0000
-1.1206 FORWARD 0.0000
-1.0414 LAST -0.2668
-1.0414 MUSIC -0.2668
-1.0414 NEW -0.2668
-1.0414 NEXT -0.2668
-0.7404 OPEN -0.2218
-1.1206 PLAYER 0.0000
-1.1206 WINDOW 0.0000
\2-grams:
-1.1139 <s> BACKWARD -0.3010
-1.1139 <s> FORWARD -0.3010
-1.1139 <s> LAST 0.0000
-1.1139 <s> NEW 0.0000
-1.1139 <s> NEXT 0.0000
-0.8129 <s> OPEN 0.0000
-0.3010 LAST WINDOW -0.3010
-0.3010 MUSIC PLAYER -0.3010
-0.3010 NEW EMAIL -0.3010
-0.3010 NEXT WINDOW -0.3010
-0.6021 OPEN BROWSER -0.3010
-0.6021 OPEN MUSIC 0.0000
\3-grams:
-0.3010 <s> BACKWARD </s>
-0.3010 <s> FORWARD </s>
-0.3010 <s> LAST WINDOW
-0.3010 <s> NEW EMAIL
-0.3010 <s> NEXT WINDOW
-0.6021 <s> OPEN BROWSER
-0.6021 <s> OPEN MUSIC
-0.3010 LAST WINDOW </s>
-0.3010 MUSIC PLAYER </s>
-0.3010 NEW EMAIL </s>
-0.3010 NEXT WINDOW </s>
-0.3010 OPEN BROWSER </s>
-0.3010 OPEN MUSIC PLAYER
\end\
What I got from the online tool is:
\data\
ngram 1=13
ngram 2=18
ngram 3=13
\1-grams:
-0.8873 </s> -0.3010
-0.8873 <s> -0.2407
-1.7324 BACKWARD -0.2407
-1.7324 BROWSER -0.2407
-1.7324 EMAIL -0.2407
-1.7324 FORWARD -0.2407
-1.7324 LAST -0.2846
-1.7324 MUSIC -0.2929
-1.7324 NEW -0.2929
-1.7324 NEXT -0.2846
-1.4314 OPEN -0.2846
-1.7324 PLAYER -0.2407
-1.4314 WINDOW -0.2407
\2-grams:
-1.1461 <s> BACKWARD 0.0000
-1.1461 <s> FORWARD 0.0000
-1.1461 <s> LAST 0.0000
-1.1461 <s> NEW 0.0000
-1.1461 <s> NEXT 0.0000
-0.8451 <s> OPEN 0.0000
-0.3010 BACKWARD </s> -0.3010
-0.3010 BROWSER </s> -0.3010
-0.3010 EMAIL </s> -0.3010
-0.3010 FORWARD </s> -0.3010
-0.3010 LAST WINDOW 0.0000
-0.3010 MUSIC PLAYER 0.0000
-0.3010 NEW EMAIL 0.0000
-0.3010 NEXT WINDOW 0.0000
-0.6021 OPEN BROWSER 0.0000
-0.6021 OPEN MUSIC 0.0000
-0.3010 PLAYER </s> -0.3010
-0.3010 WINDOW </s> -0.3010
\3-grams:
-0.3010 <s> BACKWARD </s>
-0.3010 <s> FORWARD </s>
-0.3010 <s> LAST WINDOW
-0.3010 <s> NEW EMAIL
-0.3010 <s> NEXT WINDOW
-0.6021 <s> OPEN BROWSER
-0.6021 <s> OPEN MUSIC
-0.3010 LAST WINDOW </s>
-0.3010 MUSIC PLAYER </s>
-0.3010 NEW EMAIL </s>
-0.3010 NEXT WINDOW </s>
-0.3010 OPEN BROWSER </s>
-0.3010 OPEN MUSIC PLAYER
\end\
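Comparing the two listings, the six 2-grams that appear only in the online tool's output are all of the form X </s> (BACKWARD, BROWSER, EMAIL, FORWARD, PLAYER and WINDOW, each followed by </s>). For larger models the same comparison can be scripted. A rough sketch, assuming both models are saved as ARPA text files; arpa_ngrams and arpa_missing are helper names made up for illustration:

```shell
#!/bin/sh
# arpa_ngrams FILE ORDER: print the word sequences of the ORDER-gram section,
# one per line, sorted. Hypothetical helper, not part of any toolkit.
arpa_ngrams() {
    awk -v n="$2" '
        # Section markers (\data\, \2-grams:, \end\) start with a backslash.
        /^\\/ { in_section = ($0 == ("\\" n "-grams:")); next }
        # Entry layout: logprob word1 ... wordN [backoff]
        in_section && NF >= n + 1 {
            out = $2
            for (i = 3; i <= n + 1; i++) out = out " " $i
            print out
        }
    ' "$1" | LC_ALL=C sort
}

# arpa_missing A B ORDER: print ORDER-grams that are in B but not in A.
arpa_missing() {
    a=$(mktemp)
    b=$(mktemp)
    arpa_ngrams "$1" "$3" > "$a"
    arpa_ngrams "$2" "$3" > "$b"
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
}
```

With the first listing saved as mine.arpa and the second as online.arpa (placeholder file names), arpa_missing mine.arpa online.arpa 2 should print the six X </s> bigrams.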
Thanks!
--- (Edited on 8/24/2008 8:14 pm [GMT-0500] by chn) ---
Hm, indeed there is a problem. The QuickLM script generates exactly the correct output:
http://www.speech.cs.cmu.edu/tools/download/quick_lm.pl
but it's not efficient. I'll try to find out what the problem with cmuclmtk is.
--- (Edited on 8/25/2008 2:41 am [GMT-0500] by nsh) ---
Hey, after installing cmuclmtk using
make install
I am unable to run any of its programs, such as text2wfreq.
Could you give me the exact steps to follow for building a language model using cmuclmtk?
Thanks
--- (Edited on 5/20/2009 7:35 am [GMT-0500] by sarvesh) ---
CMU-Cambridge Statistical Language Modeling Toolkit v2
======================================================
Documentation:
--------------
For installation and usage instructions for the toolkit, see
doc/toolkit_documentation.html
(for the sake of convenience, the installation instructions are also
given below).
Installation:
-------------
For "big-endian" machines (eg those running HP-UX, IRIX, SunOS,
Solaris) the installation procedure is simply to type
cd src
make install
The executables will then be copied into the bin/ directory, and the
library file SLM2.a will be copied into the lib/ directory.
For "little-endian" machines (eg those running Ultrix, Linux) the
variable "BYTESWAP_FLAG" will need to be set in the Makefile. This can
be done by editing src/Makefile directly, so that the line
#BYTESWAP_FLAG = -DSLM_SWAP_BYTES
is changed to
BYTESWAP_FLAG = -DSLM_SWAP_BYTES
Then the program can be installed as before.
If you are unsure of the "endian-ness" of your machine, then the shell
script endian.sh should be able to provide some assistance.
In case of problems, then more information can be found by examining
src/Makefile.
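If endian.sh cannot be used (it needs to see gcc), the byte order can also be read off with standard tools. A small sketch, assuming a POSIX printf and od are available; host_endianness is a hypothetical helper name and not part of the toolkit:

```shell
#!/bin/sh
# Sketch only: report the byte order of the machine this runs on.
host_endianness() {
    # Emit the bytes 0x01 0x00 and let od decode them as one unsigned 16-bit
    # integer in host byte order: 1 on little-endian, 256 on big-endian.
    value=$(printf '\001\000' | od -An -tu2 | tr -d '[:space:]')
    if [ "$value" = "1" ]; then
        echo little
    else
        echo big
    fi
}
```

On a machine where host_endianness prints little, the instructions above say to enable BYTESWAP_FLAG = -DSLM_SWAP_BYTES in src/Makefile before running make install.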
Files:
------
endian.sh Shell script to report "endian-ness" (see installation
instructions). Not terribly robust; needs to be able to see gcc,
for example.
doc/toolkit_documentation.html The standard html documentation for the
toolkit. View using netscape or equivalent.
doc/toolkit_documentation_no_tables.html As above, but doesn't use
tables, so is suitable for use with browsers which don't support
tables (eg lynx).
doc/toolkit_documentation.txt The documentation in flat text.
doc/change_log.html List of changes from version to version.
doc/change_log.txt The above in flat text.
src/*.c src/*.h The toolkit source files
src/Makefile The standard make file.
src/install-sh Shell script to install executables in the appropriate
directory. An improvement on cp, as it will check to see whether it is
about to overwrite an executable which is already in use.
include/SLM2.h File containing all of src/*.h, allowing
functions from the toolkit to be included in new software.
bin/ Directory where executables will be installed.
lib/ Directory where SLM2.a will be stored (useful in conjunction with
include/SLM2.h for including functions from the toolkit in new
software).
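# Input corpus is $INPUT_DIR/$INPUT_NAME.text; generated files use the $OUTPUT_NAME prefix
$INPUT_NAME = "ec-asr_train.transcription";
$INPUT_DIR = "/root/Desktop";
$OUTPUT_NAME = "ec-asr.word.transcription";
$OUTPUT_DIR = "/root/Desktop";
# Directory holding the toolkit executables
$BIN_DIR = "/root/tutorial/CMU-Cam_Toolkit_v2/bin";
# Count word frequencies, then derive the vocabulary from them
system("$BIN_DIR/text2wfreq <$INPUT_DIR/$INPUT_NAME.text >$OUTPUT_DIR/$OUTPUT_NAME.wfreq");
system("$BIN_DIR/wfreq2vocab <$OUTPUT_DIR/$OUTPUT_NAME.wfreq >$OUTPUT_DIR/$OUTPUT_NAME.temp.vocab");
# N-gram order
$n = 3;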
######################################### generate .idngram directly
system("$BIN_DIR/text2idngram -n $n -vocab $OUTPUT_DIR/$OUTPUT_NAME.temp.vocab <$INPUT_DIR/$INPUT_NAME.text >$OUTPUT_DIR/$OUTPUT_NAME.dir.idngram ");
###################################generate .dir.arpa
system("$BIN_DIR/idngram2lm -vocab_type 0 -idngram $OUTPUT_DIR/$OUTPUT_NAME.dir.idngram -vocab $OUTPUT_DIR/$OUTPUT_NAME.temp.vocab -arpa $OUTPUT_DIR/$OUTPUT_NAME.dir.${n}gram.arpa -n $n -witten_bell -context $INPUT_DIR/a.ccs ");
system("$BIN_DIR/idngram2lm -vocab_type 0 -idngram $OUTPUT_DIR/$OUTPUT_NAME.dir.idngram -vocab $OUTPUT_DIR/$OUTPUT_NAME.temp.vocab -binary $OUTPUT_DIR/$OUTPUT_NAME.dir.${n}gram.binlm -n $n -witten_bell -context $INPUT_DIR/a.ccs ");
############# evallm .binlm
#system("$BIN_DIR/evallm -binary $OUTPUT_DIR/$OUTPUT_NAME.dir.${n}gram.binlm");
##########################################generate .DMP file
system("$BIN_DIR/lm3g2dmp $OUTPUT_DIR/$OUTPUT_NAME.dir.${n}gram.arpa $OUTPUT_DIR");
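The same sequence of steps, reduced to the essentials, can be sketched as a shell function. It assumes the four toolkit programs live in the directory named by BIN_DIR (an assumption; adjust it to wherever make install put them), leaves out the -vocab_type, -witten_bell and -context options and the lm3g2dmp step, and uses the made-up name build_lm:

```shell
#!/bin/sh
# build_lm CORPUS OUTDIR [N]: hypothetical wrapper around the toolkit programs.
# CORPUS is expected to end in .text; output files are written to OUTDIR.
build_lm() {
    corpus=$1
    outdir=$2
    n=${3:-3}
    # BIN_DIR is an assumption: the directory that holds text2wfreq etc.
    bin=${BIN_DIR:-/usr/local/bin}
    for tool in text2wfreq wfreq2vocab text2idngram idngram2lm; do
        if [ ! -x "$bin/$tool" ]; then
            echo "missing $bin/$tool" >&2
            return 1
        fi
    done
    base=$outdir/$(basename "$corpus" .text)
    # 1. word frequencies -> vocabulary
    "$bin/text2wfreq" < "$corpus" | "$bin/wfreq2vocab" > "$base.vocab" &&
    # 2. corpus + vocabulary -> id n-gram counts
    "$bin/text2idngram" -n "$n" -vocab "$base.vocab" < "$corpus" > "$base.idngram" &&
    # 3. id n-grams + vocabulary -> ARPA language model
    "$bin/idngram2lm" -n "$n" -vocab "$base.vocab" -idngram "$base.idngram" -arpa "$base.arpa"
}
```

For example, with BIN_DIR set to /root/tutorial/CMU-Cam_Toolkit_v2/bin, build_lm /root/Desktop/ec-asr_train.transcription.text /root/Desktop would write the .vocab, .idngram and .arpa files into /root/Desktop. If it reports a missing program, the first thing to check is whether BIN_DIR points at the directory the executables were installed into.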
--- (Edited on 5/20/2009 8:10 pm [GMT-0500] by chn) ---
I built the "open vocabulary model (type 2)" using cmuclmtk, but I don't know why no values are assigned to
2-gram discounting ratios :
3-gram discounting ratios :
Only this one gets a value:
1-gram discounting ratios : 0.81
Can you figure out the cause of this?
--- (Edited on 9/3/2009 3:25 pm [GMT+0530] by sarvesh) ---