VoxForge
GigaSpeech is:
An evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, and sports.
paper
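For anyone who wants to poke at the data, here is a minimal sketch of loading a GigaSpeech subset through the Hugging Face datasets library; the hub id "speechcolab/gigaspeech", the subset name "xs", and the field names are assumptions to verify against the corpus documentation:

from datasets import load_dataset

# "xs" is assumed to be the smallest labeled subset; the corpus is gated,
# so a Hugging Face access token is typically required.
gs = load_dataset("speechcolab/gigaspeech", "xs", split="train")

sample = gs[0]
print(sample["text"])                             # transcript
audio = sample["audio"]                           # decoded by the Audio feature
print(audio["sampling_rate"], len(audio["array"]))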
from the Facebook AI website:
Facebook AI is releasing Multilingual LibriSpeech (MLS), a large-scale, open source data set designed to help advance research in automatic speech recognition (ASR).
MLS provides more than 50,000 hours of audio across eight languages: English, German, Dutch, French, Spanish, Italian, Portuguese, and Polish. It also provides language-model training data and pretrained language models along with baselines to help researchers compare different ASR systems. Because it leverages public domain audiobooks from the LibriVox project, MLS offers a large data set with a broad range of different speakers, and it can be released with a nonrestrictive license.
MLS is available on OpenSLR:
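A minimal sketch of walking one MLS language pack once downloaded; the directory layout assumed here (a tab-separated transcripts.txt next to audio/{speaker}/{book}/{id}.flac) should be checked against the archive you actually fetch:

from pathlib import Path

def load_mls_split(split_dir: Path):
    """Yield (audio_path, transcript) pairs for one MLS split."""
    with open(split_dir / "transcripts.txt", encoding="utf-8") as f:
        for line in f:
            utt_id, text = line.rstrip("\n").split("\t", 1)
            speaker, book, _ = utt_id.split("_", 2)
            audio = split_dir / "audio" / speaker / book / (utt_id + ".flac")
            yield audio, text

# Hypothetical local path to an extracted language pack.
for path, text in load_mls_split(Path("mls_english/train")):
    print(path, "->", text[:60])
    break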
From the MLCommons website:
The People’s Speech Dataset is the world’s largest labeled open speech dataset and includes 87,000+ hours of transcribed speech in 59 different languages with a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and crucially will be available with a permissive license. Just as ImageNet catalyzed machine learning for vision, the People’s Speech will unleash innovation in speech research and products that are available to users across the globe.
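At 87,000+ hours, downloading the whole dataset is impractical for a quick look; here is a sketch of streaming a few examples instead, where the hub id "MLCommons/peoples_speech", the "clean" configuration, and the "text" field are assumptions:

from datasets import load_dataset

# streaming=True iterates over remote shards without downloading them all.
ps = load_dataset("MLCommons/peoples_speech", "clean",
                  split="train", streaming=True)

for example in ps.take(3):       # just peek at the first few transcripts
    print(example["text"])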
Facebook to release huge multilingual corpus of unlabelled speech data (paper - pub date: Jan 2021):
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning as well as semi-supervised learning. VoxPopuli also contains 1.8K hours of transcribed speeches in 16 languages and their aligned oral interpretations into 5 other languages totaling 5.1K hours. We provide speech recognition baselines and validate the versatility of VoxPopuli unlabelled data in semi-supervised learning under challenging out-of-domain settings. We will release the corpus at https://github.com/facebookresearch/voxpopuli under an open license.
Google has created a free and open dataset called the Speech Commands Dataset. It is aimed at neural network beginners, letting them build models for simple keyword detection.
from the Googleblog website:
The dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license, and will continue to grow in future releases as more contributions are received. The dataset is designed to let you build basic but useful voice interfaces for applications, with common words like “Yes”, “No”, digits...
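That is exactly the kind of task a small model can handle. Here is a sketch of a basic keyword classifier over the dataset's one-second 16 kHz clips; the spectrogram settings and network shape are illustrative choices, not Google's reference tutorial model:

import tensorflow as tf

NUM_WORDS = 30          # 30 short words in the initial release
SAMPLE_RATE = 16000     # one-second utterances at 16 kHz

def to_spectrogram(waveform):
    """Turn a [16000] float waveform into a magnitude spectrogram."""
    spec = tf.signal.stft(waveform, frame_length=255, frame_step=128)
    return tf.abs(spec)[..., tf.newaxis]   # add a channel axis

# For a 16000-sample clip, to_spectrogram yields shape (124, 129, 1).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(124, 129, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_WORDS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()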
Mozilla has created a new project called "Common Voice" with the goal of collecting 10,000 hours of speech, and has built a very nice web app to collect and validate submitted speech. From their website:
[...] Common Voice is a project to make voice recognition technology easily accessible to everyone. People donate their voices to a massive database that will let anyone quickly and easily train voice-enabled apps. All voice data will be available to developers.
[...]
Mozilla aims to begin to capture voices in June and release the open source database later in 2017.
All speech submitted will be released under the CC-0 license (public domain). They are starting with English, and acknowledge the need to add more languages.
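Once releases start shipping, the clips will presumably come with per-clip metadata; here is a sketch of reading that metadata and keeping clips the community voted up, where the TSV path and the column names (path, sentence, up_votes, down_votes) are assumptions about the release format:

import csv

def validated_clips(tsv_path):
    """Yield (audio_path, sentence) for clips with net-positive votes."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if int(row["up_votes"]) > int(row["down_votes"]):
                yield row["path"], row["sentence"]

for path, sentence in validated_clips("cv-corpus/en/validated.tsv"):
    print(path, "->", sentence)
    break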
From Human Interact's website:
Starship Commander is a virtual reality title driven by human speech. The audience is given agency in the middle of a sci-fi story, as part of a military embroiled in a dark intergalactic war. You’re in command of a secretive mission, and your decisions have deadly consequences.
The site has a very cool trailer, showing how you can command your very own ship like Captain Kirk.
Speech recognition and understanding are supplied by Microsoft's Custom Speech Service, a new speech service that lets you create customized acoustic and language models... Looks like Open Source has had it right all along, because we've always known that customized acoustic and language models work best for users...
In Uganda, researchers are using speech recognition to analyze local issues discussed on the radio, monitoring broadcasts in locally accented English as well as native languages.
http://pulselabkampala.ug/radiomining/
From the article: "We are focussing on the open source software HTK as a platform for [the speech recognition component]"
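For those unfamiliar with HTK, a recognition run typically means extracting features and then decoding against trained HMMs. Here is a rough sketch driving the standard tools from Python; every path and model file is hypothetical, and the HMMs must already be trained:

import subprocess

# HTK-style feature configuration: MFCCs plus deltas and accelerations.
with open("hcopy.conf", "w") as f:
    f.write("TARGETKIND = MFCC_0_D_A\n")
    f.write("TARGETRATE = 100000.0\n")   # 10 ms frame shift (100 ns units)
    f.write("WINDOWSIZE = 250000.0\n")   # 25 ms analysis window
    f.write("NUMCEPS = 12\n")
    f.write("SOURCEFORMAT = WAV\n")

# Extract features, then decode against a word network and dictionary.
subprocess.run(["HCopy", "-C", "hcopy.conf", "clip.wav", "clip.mfc"],
               check=True)
subprocess.run(["HVite", "-H", "hmmdefs", "-C", "hcopy.conf",
                "-w", "wdnet", "dict", "phonelist", "clip.mfc"],
               check=True)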
https://www.yahoo.com/news/apple-jack-ax-ushers-voice-191944728.html
Mozilla has pivoted Vaani to be the voice of IoT. Vaani was originally an "on-device" virtual assistant for FirefoxOS. Now they have three new projects related to creating a virtual assistant for the Internet of Things:
DeepSpeech: an open source speech recognition engine. It is based on Baidu's Deep Speech research and uses Google's TensorFlow machine learning framework. It's currently in early development (see the transcription sketch after this list).
Pipsqueak: a longer-term effort to create a new speech recognition engine that implements cutting-edge technology, allowing Vaani to work completely offline while still delivering the high-quality speech recognition users have come to expect.
Murmur: a simple webapp for collecting speech samples to train speech recognition engines. They want to slowly build a speech corpus to train their open source models.
One thing to note is that although they want to create their own speech corpus, for now they plan to use a purchased speech corpus for their acoustic models.
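As promised above, here is a sketch of what transcription with DeepSpeech looks like; the Model/stt API shown matches later released versions of the deepspeech Python package rather than the early-development code described here, and the model path is hypothetical:

import wave
import numpy as np
import deepspeech

model = deepspeech.Model("deepspeech-model.pbmm")   # hypothetical model file

# DeepSpeech expects 16 kHz, 16-bit mono PCM samples.
with wave.open("clip.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(model.stt(audio))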