dc.contributor.author | Mena, Carlos |
dc.contributor.author | Borsky, Michal |
dc.contributor.author | Mollberg, David Erik |
dc.contributor.author | Guðmundsson, Smári Freyr |
dc.contributor.author | Hedström, Staffan |
dc.contributor.author | Pálsson, Ragnar |
dc.contributor.author | Jónsson, Ólafur Helgi |
dc.contributor.author | Þorsteinsdóttir, Sunneva |
dc.contributor.author | Guðmundsdóttir, Jóhanna Vigdís |
dc.contributor.author | Magnúsdóttir, Eydís Huld |
dc.contributor.author | Þórhallsdóttir, Ragnheiður |
dc.contributor.author | Gudnason, Jon |
dc.date.accessioned | 2022-01-31T13:20:41Z |
dc.date.available | 2022-01-31T13:20:41Z |
dc.date.issued | 2021-09-01 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/185 |
dc.description | This release of data from the Samrómur collection focuses on children aged 4 to 17. It contains more than 137,000 validated speech recordings (131 hours) of children speaking Icelandic. The corpus is the result of a crowdsourcing effort run by the Language and Voice Lab (LVL) at the Artificial Intelligence center of Reykjavik University, in cooperation with Almannarómur, Center for Language Technology in Iceland. Recording began in October 2019 and was still ongoing as of January 2022. The corpus contains 3,175 speakers and is split into train, dev, and test sets of the following lengths: train = 127h25m, test = 1h50m, dev = 1h50m. Each subset contains one folder per speaker ID, and the audio files inside follow the naming convention {speaker_ID}-{utterance_ID}.flac. The average recording length is 3.4 seconds. |
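The folder layout and naming convention described above can be walked with a few lines of Python. This is a minimal sketch, not part of the release: the function names `parse_name` and `index_subset` are hypothetical, and it assumes that speaker IDs contain no hyphen, so the first hyphen in a file name separates the two IDs.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path


def parse_name(filename: str) -> tuple[str, str]:
    """Split '{speaker_ID}-{utterance_ID}.flac' into its two IDs.

    Assumption: the speaker ID itself contains no hyphen.
    """
    stem = Path(filename).stem  # drop the '.flac' extension
    speaker_id, sep, utterance_id = stem.partition("-")
    if not sep or not speaker_id or not utterance_id:
        raise ValueError(f"unexpected file name: {filename!r}")
    return speaker_id, utterance_id


def index_subset(subset_dir: str | Path) -> dict[str, list[str]]:
    """Map each speaker ID to its utterance IDs for one subset folder."""
    index: dict[str, list[str]] = defaultdict(list)
    # One folder per speaker, with the FLAC files directly inside it.
    for flac in sorted(Path(subset_dir).glob("*/*.flac")):
        speaker_id, utterance_id = parse_name(flac.name)
        index[speaker_id].append(utterance_id)
    return dict(index)
```

Calling `index_subset` on an unpacked subset folder (for example a hypothetical `samromur_children_21.09/train` path) would give a per-speaker list of utterance IDs; `len(index)` then counts the speakers in that subset.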
dc.language.iso | isl |
dc.publisher | Reykjavík University |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/cadia-lvl/samromur |
dc.subject | audio corpus |
dc.subject | speech recognition |
dc.subject | automatic speech recognition |
dc.subject | children |
dc.title | Samromur Children 21.09 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | audio |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Jon Gudnason; jg@ru.is; Reykjavík University |
sponsor | Ministry of Education, Science and Culture; Data recording using Eyra/Samrómur (H1); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info | 8 GB |
size.info | 131 hours |
size.info | 137597 utterances |
files.size | 6829364691 |
files.count | 4 |
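The figures in this record can be cross-checked against each other with simple arithmetic. The sketch below uses only numbers stated in the record; the variable names are illustrative, and it assumes that the repository reports download size in binary units (1 GB = 1024^3 bytes).

```python
from datetime import timedelta

# Split lengths as stated in dc.description.
splits = {
    "train": timedelta(hours=127, minutes=25),
    "test": timedelta(hours=1, minutes=50),
    "dev": timedelta(hours=1, minutes=50),
}
total = sum(splits.values(), timedelta())
total_hours = total.total_seconds() / 3600

# files.size is given in bytes; convert using binary units (assumption).
files_size_bytes = 6829364691
size_gib = files_size_bytes / 1024**3

# Mean recording length implied by the total duration and utterance count.
utterances = 137597
mean_seconds = total.total_seconds() / utterances

print(f"total duration: {total_hours:.2f} h")
print(f"download size:  {size_gib:.2f} GiB")
print(f"mean length:    {mean_seconds:.1f} s")
```

Under these assumptions the split lengths add up to 131 h 5 min, the byte count corresponds to about 6.36 GiB, and the implied mean length rounds to the stated 3.4 seconds.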
Files in this item
Total download size: 6.36 GB
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name | Size | Format | Description | MD5 |
samromur_children_21.09.zip | 512.99 MB | application/zip | main file for corpus | c89cbabaafaf68f71f6d01aed1030826 |
samromur_children_21.09.z01 | 1.95 GB | Unknown | part 1 | 43ad12f3eaadb25a24ed1dfa9c5090a3 |
samromur_children_21.09.z02 | 1.95 GB | Unknown | part 2 | ad0def6d195d53e02de4323b729299f2 |
samromur_children_21.09.z03 | 1.95 GB | Unknown | part 3 | f5ae6da99404fd5260ede7de68170a78 |
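Since the corpus is distributed as a four-part split archive, it may be worth checking each downloaded part against the MD5 values listed above before unpacking. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `verify_downloads` are hypothetical, and the files are assumed to sit together in one directory under their original names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "samromur_children_21.09.zip": "c89cbabaafaf68f71f6d01aed1030826",
    "samromur_children_21.09.z01": "43ad12f3eaadb25a24ed1dfa9c5090a3",
    "samromur_children_21.09.z02": "ad0def6d195d53e02de4323b729299f2",
    "samromur_children_21.09.z03": "f5ae6da99404fd5260ede7de68170a78",
}


def md5_of_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(
    directory: str | Path, expected: dict[str, str] = EXPECTED_MD5
) -> dict[str, bool]:
    """Report, per expected file, whether it exists and matches its checksum."""
    directory = Path(directory)
    results = {}
    for name, want in expected.items():
        path = directory / name
        results[name] = path.is_file() and md5_of_file(path) == want
    return results
```

A call such as `verify_downloads(".")` returns a mapping from file name to `True` or `False`. Note that Python's standard `zipfile` module does not handle multi-disk archives, so the verified parts would need to be combined or extracted with an external tool that supports split ZIP files.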