Speech Recognition in the News

SpinVox Names Dr. Tony Robinson as Director

By kmaclean - 3/22/2008

You may have seen some posts on this site by Dr. Tony Robinson. I'd like to congratulate Tony on his recent appointment as Director of SpinVox's Advanced Speech Group. From the press release on the SpinVox site:

Robinson's remit will be to further build a team from the ASR expertise that is concentrated in the Cambridge area. Under his leadership, the SpinVox ASG will further develop the Voice Message Conversion System that is at the heart of SpinVox services.

We'd like to wish him luck, and thank him for all his help to the VoxForge community in the past year and a half.

Ken

Zero Crossings as an Effective Feature In Speech Recognition for Embedded Applications

By kmaclean - 3/21/2008

This is an interesting article on using zero crossings instead of the feature vectors traditionally used in speech recognition (such as the MFCCs we use with HTK/Julius). Shubhendu Trivedi was looking to create a speaker-dependent, isolated-word speech recognizer for an 8051 microcontroller, but traditional HMM approaches using MFCC-based feature vectors were too computationally intensive to run on this controller.

He found a paper that provided the solution. In it, the authors describe a way of only using zero crossings of the speech signal to determine the feature vector. Shubhendu says in his article:

This feature vector is basically the histogram of the time interval between successive zero-crossings of the utterance in a short time window. These feature vectors for each window are then combined together to form a feature matrix. Since we are dealing with only small time series (isolated words), we can employ Dynamic Time Warping to compare the input matrix with the reference matrices stored.
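
To make the idea concrete, here is a minimal Python sketch of the approach as described in the quote above: a histogram of zero-crossing intervals per window, stacked into a feature matrix, then compared with Dynamic Time Warping. It is not the code from the article or the paper. The bin edges, frame length, sample rate and the synthetic sine-tone "words" are all assumptions chosen purely for illustration.

```python
import math

# Assumed values for illustration only (not taken from the article or paper).
BIN_EDGES = [1, 3, 6, 12, 24, 48, 96]   # histogram bins, in samples
FRAME_LEN = 400                          # samples per analysis window


def zero_crossing_intervals(frame):
    """Distances (in samples) between successive sign changes in a frame."""
    crossings = [i for i in range(1, len(frame))
                 if (frame[i - 1] < 0) != (frame[i] < 0)]
    return [b - a for a, b in zip(crossings, crossings[1:])]


def interval_histogram(frame, bin_edges=BIN_EDGES):
    """Feature vector: bin k counts intervals in [edges[k], edges[k + 1])."""
    hist = [0] * (len(bin_edges) - 1)
    for interval in zero_crossing_intervals(frame):
        for k in range(len(hist)):
            if bin_edges[k] <= interval < bin_edges[k + 1]:
                hist[k] += 1
                break
    return hist


def feature_matrix(signal, frame_len=FRAME_LEN):
    """Stack one histogram per non-overlapping frame."""
    return [interval_histogram(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, frame_len)]


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two feature matrices,
    using Euclidean distance between individual feature vectors."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame in a
                                 cost[i][j - 1],      # skip a frame in b
                                 cost[i - 1][j - 1])  # match frames
    return cost[len(a)][len(b)]


def sine(freq_hz, n, rate=8000):
    """Synthetic stand-in for a recorded word (hypothetical test signal)."""
    return [math.sin(2 * math.pi * freq_hz * t / rate + 0.1) for t in range(n)]


# Stored reference templates, one feature matrix per "word".
references = {
    "low": feature_matrix(sine(200, 1600)),
    "high": feature_matrix(sine(1000, 1600)),
}

# A shorter, slightly detuned input is matched against every template.
query = feature_matrix(sine(210, 1200))
best = min(references, key=lambda word: dtw_distance(query, references[word]))
print(best)
```

Everything here is integer counting and comparisons except the final distance computation, which hints at why this kind of feature is attractive on a small microcontroller; a real 8051 implementation would presumably use fixed-point arithmetic and a cheaper distance measure.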

Using Speech Recognition to Display Contextual Ads While Watching a Video

By kmaclean - 2/11/2008

Microsoft is working on a system to allow contextual ads to be served alongside video. The system uses speech recognition to identify the topic of a video. This would allow advertisers to display ads for sports gear alongside a video about soccer or furniture with a video for home improvement.

See video at this link: One To Watch: Microsoft's Video Advertising Systems

Ken

Simon on Slashdot

By kmaclean - 1/19/2008

From the Slashdot article Open Source Speech Recognition:

"The first version of the open source speech recognition suite simon was released. It uses the Julius large vocabulary continuous speech recognition to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from wiktionary (a subproject of wikipedia) or from files formated in the HADIFIX- or HTK format and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words."

Cell Phone Voice Transcription Services

By kmaclean - 12/13/2007

Here is an interesting list of services that let you convert your speech to an email or text message over your telephone.

Jott - You call Jott's toll-free number from your cell phone, specify who should receive your message (from a predetermined list of people, or yourself - for note taking), and dictate for up to 30 seconds. Your message or reminder is transcribed and e-mailed or text-messaged to the appropriate parties. They use a combination of machine and human transcription to convert voice to text.
SpinVox - A service that can be accessed from any phone. Speak freely and SpinVox will convert your words into text and send them wherever you decide: mobile, inbox, TV screen, blog. They use a Voice Message Conversion System called ‘D2’ (the Brain) that takes spoken words and converts them to text.
SimulScribe - Uses voice recognition technology to convert voicemail messages into text. They deliver your transcribed voicemail, along with the original audio, to your mobile phone, PDA, and/or email account.

Has anyone tried these services? What were your experiences?

MIT's speech recognition interface to Google maps

By Visitor - 11/27/2007

MIT's AddressBrowser provides a speech recognition interface to Google Maps. From the site:

The AddressBrowser is a prototype speech-based interface that allows users to speak any city or address in the United States. You can say any valid address that follows a simple pattern, e.g., 32 Vassar Street in Cambridge, Massachusetts, the intersection of Main Street and Vassar Street in Cambridge, Massachusetts, or just Cambridge, Massachusetts.
In order to be heard you will need a microphone connected to your computer.
Press and hold the big green button and say your address. The button will turn red when it is recording your voice. Just speak your address naturally. Release the button after you have finished talking.
...

It appears to use a Java applet client as the front end and a speech recognition server as the back end.

Early look at Android

By kmaclean - 11/13/2007 - 2 Replies

Android is a new operating system for cell phones, designed by Google engineers. Unlike most existing cell phone operating systems, it'll be friendly to applications created by outside software developers.

Basically the phone (scheduled for 2008) will run on a Linux core, with Java-based apps.

Setup of the "early look" SDK is quite easy (if you know Eclipse). I was able to create the HelloAndroid app without much problem. When you press Run, the Android Emulator starts up, and you can see the results on the screen.

Here is what I have gleaned from comments from Dan Morrill on the Android Developer Google Group list:

Text-to-Speech

Currently there is no support for text-to-speech. We are considering the general problem of accessibility, but don't yet have any concrete plans in this area.

Speech Recognition

Android will include voice-recognition software that can (and will) be used to create voice dialers. You'll be able to use the same APIs to build speech-enabled applications. However, the APIs for that are disabled in the current early look, because they aren't ready for use yet; they'll be enabled in a future SDK version.

Licensing:

The core (Linux) is GPL, with LGPL components.
Other code (I am assuming the application framework and included apps) uses the Apache Software License.

Nabaztag robotic rabbit

By kmaclean - 11/9/2007

The Nabaztag robotic rabbit is a wireless Internet contraption that can speak, move its ears and flash its lights in response to user inputs, and includes speech recognition.

From the site's "Voice Recognition FAQ" (sic):

Services available with voice recognition :

Weather
Air Quality
Paris Traffic
Stock ticker (free and full)
Radio

The commands it recognizes are pretty simplistic:

Weather
Air or Smog
Traffic
Market
Radio

but it seems to be an interesting harbinger of what consumer speech recognition appliances might look like in the not too distant future.

GPhone to include Open Source Speech Recognition?

By kmaclean - 11/6/2007 - 1 Reply

In a PC World interview with Android co-founder Rich Miner, Miner says:

When we looked at the other [mobile] Linux activities out there, oftentimes they're initiatives that are based on Linux but their resulting platforms aren't completely open. Or they're completely open and they're Linux, but they're missing most of the things that [Android has]. They probably don't have video codecs, Midi sequencer, speech recognition. So they're not a complete phone stack. The goal with Android was to build into it everything you needed to release a phone: an entire stack to build a competitive smartphone or high-end feature phone.

Although Android is to be released under the Apache License, the speech recognition component likely will *not* be, since Nuance is also an Open Handset Alliance partner. The Android™ SDK is set to be released on November 12, 2007.

Voice recognition technology nabs Colombian drug kingpin

By kmaclean - 8/10/2007

From a Globe and Mail article:

A reputed leader of Colombia's biggest drug cartel radically altered his facial appearance with repeated plastic surgeries. But his own words gave him away, thanks to advanced voice recognition technology that has become a key tool in the war against drugs and terrorism.

U.S. agents confirmed the identity of Juan Carlos Ramirez Abadia using the equivalent of a vocal fingerprint, his attorney said Friday.

Background on voice recognition (from Wikipedia):

Speaker recognition, or voice recognition is the task of recognizing people from their voices. Such systems extract features from speech, model them and use them to recognize the person from his/her voice.

Note that strictly speaking there is a difference between speaker recognition (recognizing who is speaking) and speech recognition (recognizing what is being said). Generally these two terms are frequently confused and voice recognition is used as a synonym for speech recognition instead.

Ken

--- (Edited on 8/10/2007 10:47 pm [GMT-0400] by kmaclean) ---
