25 audio datasets for research
- Jun 28
FMA is designed for music analysis and consists of full-length high-quality audio, pre-computed features, and track- and user-level metadata. This open dataset was created for evaluating several music information retrieval (MIR) tasks. Its size is almost 1,000 GB.
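A minimal sketch of inspecting FMA's pre-computed metadata with pandas is below; the fma_metadata/tracks.csv path and its two-row header are assumptions about how the downloaded metadata archive is laid out, so adjust them to the local copy.

```python
import pandas as pd

# Assumed location of the extracted FMA metadata archive (hypothetical path).
# tracks.csv is expected to use a two-level column header (group, field).
tracks = pd.read_csv("fma_metadata/tracks.csv", index_col=0, header=[0, 1])

print(tracks.shape)          # (number of tracks, number of metadata columns)
print(tracks.columns[:10])   # first few (group, field) column pairs
```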
An open collection of features and metadata for a million tracks. The set contains no audio, only the extracted features. The audio itself can be obtained from services such as 7digital using code provided by Columbia University. The size of the set is about 280 GB.
This open dataset was created for recognizing digits spoken in audio recordings. At the moment it contains 3 speakers and 1,500 recordings (50 of each digit per speaker), with varying English pronunciations. The whole package is only about 10 MB.
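If this is the Free Spoken Digit Dataset, the recordings can be iterated over directly; a minimal sketch follows, assuming the common {digit}_{speaker}_{index}.wav file naming and an 8 kHz sampling rate, both of which should be checked against the local copy.

```python
import os
import librosa

# Hypothetical local path to the extracted recordings.
DATA_DIR = "free-spoken-digit-dataset/recordings"

for fname in sorted(os.listdir(DATA_DIR))[:5]:
    digit = int(fname.split("_")[0])  # label assumed to be encoded in the file name
    audio, sr = librosa.load(os.path.join(DATA_DIR, fname), sr=8000)
    print(f"{fname}: digit={digit}, {len(audio) / sr:.2f} s at {sr} Hz")
```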
A large-scale corpus of about 1,000 hours of English speech. The data is taken from audiobooks of the LibriVox project and is about 60 GB in size.
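Assuming this refers to the LibriSpeech corpus, torchaudio ships a dataset wrapper that downloads individual subsets; a minimal sketch using the small dev-clean subset:

```python
import os
import torchaudio

# Hypothetical local root; download=True fetches the chosen subset into it.
os.makedirs("./data", exist_ok=True)
dataset = torchaudio.datasets.LIBRISPEECH(root="./data", url="dev-clean", download=True)

# Each item is (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id).
waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
print(transcript)
print(waveform.shape, sample_rate)  # mono waveform at 16 kHz
```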
An extensive 150 MB dataset consisting of almost 2,000 hours of speech, designed for speaker identification. It contains about 100,000 utterances from 1,251 celebrities, taken from YouTube videos. The data is roughly balanced by gender (55% of the speakers are male). The celebrities span a range of accents, professions and ages. The development and test sets do not overlap.
This is a corpus of aligned spoken articles from the English, German and Dutch editions of Wikipedia. Hundreds of hours of audio and annotations can be mapped back to the original HTML. The whole set is about 38 GB.
Includes 40,000 spoken captions for 8,000 images, 4.2 GB in total. This corpus was collected in 2015 to study multimodal learning for unsupervised discovery of speech patterns.
The set consists of 1,495 audio recordings of TED Talks and their full transcripts, created by the LIUM laboratory (Laboratoire d'Informatique de l'Université du Maine).
This 1.4 GB dataset includes 65,000 one-second utterances of 30 short words, spoken by thousands of different people. Released under a Creative Commons BY 4.0 license, it is designed for building simple but useful voice interfaces around common words such as "yes", "no", digits and directions of movement.
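If this is Google's Speech Commands dataset, torchaudio also provides a ready-made wrapper; a minimal sketch, assuming the default download location is acceptable:

```python
import os
import torchaudio

# Hypothetical local root; download=True fetches the Speech Commands archive into it.
os.makedirs("./data", exist_ok=True)
dataset = torchaudio.datasets.SPEECHCOMMANDS(root="./data", download=True)

# Each item is (waveform, sample_rate, label, speaker_id, utterance_number).
waveform, sample_rate, label, speaker_id, utterance_number = dataset[0]
print(label, waveform.shape, sample_rate)  # e.g. a one-second clip at 16 kHz
```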
A 12 GB corpus of speech data. The text was taken from several public sources, such as blogs, books and films, and read aloud by users of the Common Voice website. Its main goal is to provide data for training and testing automatic speech recognition systems.
A Persian speech corpus for speech and speaker recognition. It contains audio recordings of consonant-vowel combinations of modern Persian, each consisting of exactly one consonant and one vowel. The set covers 23 consonants and 6 vowels and includes all of their possible combinations (138 consonant-vowel syllables per speaker), about 30,000 recordings in total.
A set of accented English data, useful for building robustness to different accents and intonations.
This roughly 4 GB set targets speech recognition in noisy environments. It contains real, simulated and clean voice recordings. The real recordings consist of 9,000 utterances from 4 speakers in 4 noisy locations; the simulated ones are created by overlaying several environments on top of speech utterances; and the clean ones are recorded without background noise. You can download this set here.
English speech data used in Baidu's Deep Speech system.
Contains training data from 20 patients with Parkinson's disease and 20 healthy people: 26 types of sound recordings, about 20 MB in total.
The ultimate goal of the Zero Resource Speech Challenge is to build a system that can learn a spoken dialogue system end-to-end in an unknown language, using only the information available to a child learning that language. "Zero resource" refers to the absence of linguistic expertise (for example, orthographic or linguistic transcriptions), not to the absence of information other than audio (visual observations, feedback, etc.). The fact that four-year-old children learn a language on their own, without help from language experts, shows that this goal is theoretically achievable.
This 38.7 GB dataset is used for a simple classification problem: predicting which letter of the alphabet is being spoken.
A 1.5 GB speech corpus of Modern Standard Arabic (MSA) used for speech synthesis. It contains phonetic and orthographic transcriptions of more than 3.7 hours of MSA speech, aligned with the recorded audio at the phoneme level. The annotations include stress marks on individual phonemes.
The corpus was developed as part of the PhD work of Nawar Halabi at the University of Southampton. The recordings were made in South Levantine Arabic with a Damascus accent in a professional studio. Speech synthesized from this corpus sounds high-quality and natural.
The corpus is 440 MB in size. Its data can be used for acoustic-phonetic studies as well as for developing and evaluating automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. It includes time-aligned orthographic, phonetic and word transcriptions, as well as a 16-bit, 16 kHz speech waveform file for each utterance.
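The time-aligned phonetic transcriptions are plain text files; below is a minimal sketch of parsing one, assuming the usual "start_sample end_sample phone" layout of TIMIT's .PHN files and a purely illustrative path.

```python
def read_phn(path):
    """Return a list of (start_sample, end_sample, phone) tuples from a .PHN file."""
    segments = []
    with open(path) as f:
        for line in f:
            start, end, phone = line.split()
            segments.append((int(start), int(end), phone))
    return segments

# Hypothetical file path inside an extracted TIMIT copy.
segments = read_phn("TIMIT/TRAIN/DR1/FCJF0/SA1.PHN")
for start, end, phone in segments[:5]:
    # Samples are at 16 kHz, so dividing by 16000 gives times in seconds.
    print(f"{phone}: {start / 16000:.3f}-{end / 16000:.3f} s")
```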
MELD is an improved and extended version of the EmotionLines dataset. It contains the same dialogue instances as EmotionLines, with audio and visual modalities in addition to the text. It holds more than 1,400 dialogues and 13,000 utterances from the TV series Friends, each labeled with one of the emotions: anger, disgust, sadness, joy, neutral, surprise or fear. You can download this set here.
Contains 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips taken from YouTube videos. To download this set, follow the link on GitHub.
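Assuming this refers to Google's AudioSet, the labels are distributed as CSV segment lists rather than audio; a minimal sketch of reading one follows, where the file name and the YTID/start/end/labels layout are assumptions to verify against the downloaded files.

```python
import csv

# Hypothetical file name; AudioSet segment CSVs are assumed to start with a few
# "#" comment lines followed by rows of: YTID, start_seconds, end_seconds, labels.
with open("balanced_train_segments.csv") as f:
    reader = csv.reader(f, skipinitialspace=True)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip header/comment lines
        ytid, start, end, labels = row[0], float(row[1]), float(row[2]), row[3]
        print(ytid, start, end, labels.split(","))
        break  # just show the first segment
```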
It includes 6,000 events, such as breaking glass, gunshots and screams, divided into a training set of 4,200 events and a test set of 1,800. To download this dataset, you need to register on the MIVIA website.
The page lists datasets for studying environmental sounds. Besides open sets, it also covers private and commercial ones, and at the end it lists several online sound services that can be used to build new datasets for specific research needs.
The datasets are divided into two tables:
A hierarchical collection of more than 600 sound classes, populated with 297,159 audio samples from Freesound. This mapping produced 678,511 candidate annotations reflecting the potential presence of sound sources in the audio clips. FSD includes many everyday sounds: human speech, animal sounds, music and sounds made by objects, all under Creative Commons licenses. The dataset is intended to support research that will allow machines to hear and interpret sound the way people do.
Freesound Datasets is a platform for the collaborative creation of audio collections that are manually labeled and based on Freesound content.
This roughly 6 GB dataset contains 8,732 labeled sound excerpts from 10 classes: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gunshot, jackhammer, siren and street music. Each excerpt is about 4 seconds long. The data includes attributes such as ID (a unique identifier of the sound excerpt) and class (the type of sound).
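Assuming this is the UrbanSound8K dataset, a common starting point for the classification task is to pair the metadata table with simple MFCC features; a minimal sketch follows, where the metadata file name and its slice_file_name/fold/class columns are assumptions about the distributed layout.

```python
import numpy as np
import pandas as pd
import librosa

# Hypothetical paths; adjust to where the archive was extracted.
meta = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")

row = meta.iloc[0]
wav_path = f"UrbanSound8K/audio/fold{row['fold']}/{row['slice_file_name']}"

audio, sr = librosa.load(wav_path, sr=22050)             # ~4-second excerpt
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)   # 13 coefficients per frame
features = np.mean(mfcc, axis=1)                         # simple clip-level summary

print(row["class"], features.shape)
```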
This set includes 1,302 sound recordings, each annotated with the start and end of sound events from 10 classes: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gunshot, jackhammer, siren and street music. Some recordings contain several sound events, but for each file only events of a single class are labeled. The classes are taken from the urban sound taxonomy.
The set is designed for building a reliable and scalable bird detection algorithm. It provides 5.4 GB of data collected from real bioacoustic monitoring projects, along with an objective, standardized evaluation framework.
#machinelearning #artificialintelligence #ai #datascience #programming #technology #deeplearning #coding