VoxForge
Mycroft AI, Inc. has released an open source platform called Mycroft core that promises to allow users to "use natural language to control the Internet of Things". The Mycroft framework also includes an intent parser called Adapt and a TTS engine (based on CMU's Flite) called Mimic. For speech recognition they are currently using Google's cloud-based speech recognition service.
They've also created a reference hardware implementation based on Raspberry Pi and Arduino, and have run successful Kickstarter and Indiegogo campaigns to raise funds.
Since their stated goal is to provide an open source alternative to the likes of Amazon Echo, the Mycroft AI group has started a new initiative called OpenSTT (an Open Source Speech To Text project), looking to create "open source speech-to-text models"... likely for Kaldi.
From a Toronto Star article: http://www.thestar.com/news/world/2016/01/27/deepmind-computer-program-beats-humans-at-go.html
"As Hassabis told reporters, the same principles AlphaGo uses have many applications, from better digital personal assistants to improved medical diagnostics and far, far beyond. Because the algorithm is general-purpose, it could respond nimbly to complex information like voice instructions, for example."
From general reading, it looks like AlphaGo uses two neural networks working together: one to prune the search space and one to evaluate the candidate next moves.
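To make the "prune, then evaluate" idea concrete, here is a minimal sketch in Python. It is only an illustration of the general pattern, not AlphaGo's code: the toy game, `toy_policy`, `toy_value`, `toy_apply` and `choose_move` are all made-up names, and the two "networks" are plain functions standing in for trained models. The real system also combines its networks with tree search, which is left out here.

```python
from typing import Callable, Dict, List, Tuple


def choose_move(
    state: int,
    legal_moves: List[int],
    apply_move: Callable[[int, int], int],
    policy: Callable[[int, List[int]], Dict[int, float]],
    value: Callable[[int], float],
    top_k: int = 3,
) -> Tuple[int, List[int]]:
    """Keep the top_k moves by policy prior, then pick the one whose
    resulting state scores highest under the value function."""
    priors = policy(state, legal_moves)
    # Pruning step: only the most promising moves survive.
    candidates = sorted(legal_moves, key=lambda m: priors[m], reverse=True)[:top_k]
    # Evaluation step: score the position each surviving move leads to.
    best = max(candidates, key=lambda m: value(apply_move(state, m)))
    return best, candidates


# Toy stand-ins (assumptions for illustration only).
TARGET = 10  # the toy "game" is to land exactly on this number


def toy_policy(state: int, moves: List[int]) -> Dict[int, float]:
    # Pretend policy network: prefers bigger steps.
    total = sum(moves)
    return {m: m / total for m in moves}


def toy_value(state: int) -> float:
    # Pretend value network: closer to TARGET is better.
    return -abs(TARGET - state)


def toy_apply(state: int, move: int) -> int:
    return state + move


best, kept = choose_move(6, [1, 2, 3, 4, 5], toy_apply, toy_policy, toy_value)
print(best, kept)  # prints: 4 [5, 4, 3]
```

The policy throws away moves 1 and 2 without ever evaluating them, and the value function then picks 4 from what is left.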
New speech recognition cloud services:
HP: IDOL (Intelligent Data Operating Layer) Speech Recognition API
Amazon: Alexa Voice Service (AVS)
NTT Com: SkyWay
more:
HPE Haven OnDemand Speech Recognition
IBM Watson Dialog service. (github dialog tool)
Code-Q O is working on a speech recognition API for Qt using Pocketsphinx. Source repository.
Looks like Mozilla is working on a speech recognition front end called Vaani that will allow users to submit speech in different languages directly from Firefox. This is amazing news for open source speech recognition.
Kelly Davis says that they will make the speech corpus and acoustic models available by the end of this year (2015).
MOVI (My Own Voice Interface) is an offline speech recognizer and voice synthesizer that adds voice control functionality to any Arduino project.
What is interesting is their approach to training the on-board acoustic model:
Training: MOVI’s Arduino API sends the training sentences in textual form over the serial connection to the shield. The shield phonetizes sentences using a 2GB dictionary. The phoneme sequences are used to create a temporal model that assigns higher probabilities to phoneme sequences that occurred in the trained sentences than to those that didn’t.
Given that they say they are using open source algorithms which they intend to provide when the shield is released, it will be interesting to see how they've implemented this.
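Until the source is out, one plausible reading of that description is a simple phoneme n-gram model. The sketch below is a guess at the general idea, not MOVI's implementation: the tiny dictionary `TOY_DICT`, the function names, the bigram order and the add-alpha smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List

# Hypothetical toy pronunciation dictionary (ARPAbet-style symbols).
# A real dictionary would be vastly larger.
TOY_DICT: Dict[str, List[str]] = {
    "lights": ["L", "AY", "T", "S"],
    "on": ["AA", "N"],
    "off": ["AO", "F"],
}
# Every phoneme in the dictionary plus the end-of-sentence marker.
VOCAB = {p for phones in TOY_DICT.values() for p in phones} | {"</s>"}


def phonetize(sentence: str) -> List[str]:
    """Turn a sentence into one flat phoneme sequence via dictionary lookup."""
    phones: List[str] = []
    for word in sentence.lower().split():
        phones.extend(TOY_DICT[word])
    return phones


def train_bigrams(sentences: List[str]) -> Dict[str, Dict[str, int]]:
    """Count phoneme-to-phoneme transitions seen in the training sentences."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        seq = ["<s>"] + phonetize(sentence) + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts


def log_prob(counts: Dict[str, Dict[str, int]], phones: List[str], alpha: float = 0.1) -> float:
    """Log-probability of a phoneme sequence under the smoothed bigram model."""
    seq = ["<s>"] + phones + ["</s>"]
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        row = counts.get(prev, {})
        # Add-alpha smoothing keeps unseen transitions possible but unlikely.
        numerator = row.get(cur, 0) + alpha
        denominator = sum(row.values()) + alpha * len(VOCAB)
        total += math.log(numerator / denominator)
    return total


model = train_bigrams(["lights on", "lights off"])
trained = phonetize("lights on")
scrambled = list(reversed(trained))
print(log_prob(model, trained) > log_prob(model, scrambled))  # prints: True
```

A recognizer could use scores like these to favour hypotheses that look like the trained sentences, which matches the behaviour the quote describes.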
Verbis Virtus is a game by Indomitus Games (an Italian game studio) where you use your voice to cast spells.
They use CMU Sphinx for speech recognition.
Are there any others?
A personal digital voice assistant based on Sphinx-4 (also supports Google and Pocketsphinx). It's offline (if you want), highly customizable, and you can teach it(/her/him ^^) new commands. Besides that, it comes with a nice GUI and runs on Linux, Mac and Windows.
https://sites.google.com/site/ilavoiceassistant/
cu,
Florian
From the Sirius site:
Sirius is an open end-to-end standalone speech and vision based intelligent personal assistant (IPA) service similar to Apple’s Siri, Google’s Google Now, Microsoft’s Cortana, and Amazon’s Echo. Sirius implements the core functionalities of an IPA including speech recognition, image matching, natural language processing and a question-and-answer system.
It can be run using Sphinx (sphinxbase and pocketsphinx) or Kaldi.
Check it out on GitHub.
From http://www.washingtonpost.com/news/speaking-of-science/wp/2014/08/07/ibm-announces-the-most-brain-like-computer-chip-to-date/
"The chip, IBM researchers wrote, will help computers handle tasks such as image and voice recognition with the alacrity of humans."