The Speech Corpus of Reading-Style Standard Korean (NIKL 2005; https://github.com/homink/speech.ko)
- 120 hours (unverified)
- Read speech
- 120 speakers -- gender balanced (60 male, 60 female); ages ranged from 19 to 71 at the time of recording in 2003.
- Region: Seoul metropolitan area -- speakers of Seoul dialect
- Content: 19 well-known short stories and essays containing a total of 930 sentences
- Available format: Each sentence is stored as a separate wav file in the corpus.
- 120 speakers
- Around 88,800 audio files
- In order to get the material, you need to contact the authors.
- 40 hours
- 40 speakers (age and gender -- balanced)
- Interview speech (effectively a monologue): one hour per speaker, in a sociolinguistic interview format
- Similar to the Buckeye corpus
- All the utterances are transcribed.
Pansori-TEDxKR (https://github.com/yc9701/pansori-tedxkr-corpus)
- 3 hours
- Talk speech (monologue)
- 41 speakers -- gender not balanced (32 male and 9 female)
- Regions: Seoul (14), Busan (14) and Daejeon / Daedeok (13)
- The ASR corpus is drawn from a pool of 11,704 fragments and is close to 3 hours long (2 hours 48 minutes); this corresponds to 26.4% of the total number of fragments and 23.6% of the total audio length.
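The percentages above also imply the approximate size of the full fragment pool. The following is a rough back-of-the-envelope check, assuming the quoted percentages are rounded and apply to the fragment count and audio length exactly as stated; the variable names are illustrative only.

```python
# Figures quoted above for Pansori-TEDxKR.
total_fragments = 11_704
asr_minutes = 2 * 60 + 48   # 2 hours 48 minutes = 168 minutes
fragment_share = 0.264      # ASR share of the total number of fragments
audio_share = 0.236         # ASR share of the total audio length

# Approximate number of fragments that ended up in the ASR corpus.
asr_fragments = round(total_fragments * fragment_share)

# Approximate audio length of the full fragment pool, in hours.
total_hours = asr_minutes / audio_share / 60

print(asr_fragments)
print(round(total_hours, 1))
```

Under these assumptions the ASR corpus holds roughly 3,090 fragments, taken from a pool of about 11.9 hours of audio.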
ClovaCall (https://github.com/ClovaAI/ClovaCall)
- 50 hours
- Cleaned: 60,000 utterances, selected from a pool recorded by 11,000 people, each reading 10 unique sentences (repeated once or twice).
- Read speech: 60,000 pairs of a short sentence and its corresponding spoken utterance in a restaurant reservation domain (only speakers' requests).
- 11,000 speakers (age and gender distribution unknown), but only 10 utterances per speaker.
AIHub (https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123)
- Large-scale Korean open domain dialog speech corpus from AIHub
- 610 hours: 510 hours (pre-training), 100 hours (fine-tuning)
- Description:
1. Around 1,000 hours
2. Spontaneous speech
3. 2,000 speakers
4. Conversation between two people about various topics (e.g., weather, economics)
5. ETRI transcription rules
6. Files: segmented at the utterance level (at long pauses); audio is 16 kHz/16-bit, headerless (endian) linear PCM, and the transcripts are EUC-KR encoded
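Because the audio is headerless, ordinary wav readers will not open it directly. Below is a minimal sketch of loading one utterance and its transcript using only the Python standard library. The byte order is not spelled out above, so little-endian is an assumption here, and the function names and file paths are hypothetical.

```python
import array
import sys

SAMPLE_RATE = 16_000  # 16 kHz, as stated in the description above


def read_raw_pcm(path, byteorder="little"):
    """Read headerless 16-bit linear PCM into an array of signed samples.

    byteorder="little" is an assumption; pass "big" if the data turns out
    to be big-endian.
    """
    samples = array.array("h")  # signed 16-bit integers
    with open(path, "rb") as f:
        samples.frombytes(f.read())
    # array uses the machine's native byte order, so swap when they differ.
    if byteorder != sys.byteorder:
        samples.byteswap()
    return samples


def read_transcript(path):
    """Read an EUC-KR encoded transcript and return it as a Python string."""
    with open(path, encoding="euc-kr") as f:
        return f.read().strip()


def duration_seconds(samples):
    """Utterance length in seconds, given the 16 kHz sampling rate."""
    return len(samples) / SAMPLE_RATE
```

For example, `duration_seconds(read_raw_pcm("utt.pcm"))` would give the length of a (hypothetical) file `utt.pcm`. If the decoded audio sounds like noise, the byte-order assumption is the first thing to revisit.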
Zeroth (https://github.com/goodatlas/zeroth)
- 95.7 hours
- Read speech
- 46,347 utterances, 181 speakers, 27,330 unique sentences
KsponSpeech (https://aihub.or.kr/aidata/105)
- 969 hours
- General open-domain dialog utterances
- 2,000 native Korean speakers, recorded in a clean environment
- Dialogues between two people conversing freely on a variety of topics, with the utterances transcribed manually
- Dual transcription (orthography and pronunciation), plus disfluency tags that mark spontaneous-speech phenomena such as filler words, repeated words, and word fragments
- For preprocessing, use the script at https://github.com/sooftware/ksponspeech
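To illustrate what the dual transcription means in practice, here is a minimal sketch that splits one transcript line into its orthographic and pronunciation versions. It assumes the two versions are written inline as `(orthography)/(pronunciation)`; that notation, the sample sentence, and the function name are assumptions for illustration, not details taken from the description above. Disfluency tags are left untouched, and this is not a substitute for the preprocessing script linked above.

```python
import re

# Assumed inline notation: "(orthographic form)/(pronunciation form)".
DUAL = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def split_dual(text):
    """Return (orthographic, pronunciation) versions of one transcript line."""
    orthographic = DUAL.sub(r"\1", text)   # keep the first alternative
    pronunciation = DUAL.sub(r"\2", text)  # keep the second alternative
    return orthographic, pronunciation


# Hypothetical sample line, not taken from the corpus.
ortho, pron = split_dual("(7시)/(일곱 시)에 만나")
print(ortho)
print(pron)
```

Which version to train on depends on the target: the orthographic form suits readable ASR output, while the pronunciation form stays closer to what was actually said.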