dc.contributor.author Hedström, Staffan
dc.contributor.author Fong, Judy Y.
dc.contributor.author Þórhallsdóttir, Ragnheiður
dc.contributor.author Mollberg, David Erik
dc.contributor.author Guðmundsson, Smári Freyr
dc.contributor.author Jónsson, Ólafur Helgi
dc.contributor.author Þorsteinsdóttir, Sunneva
dc.contributor.author Magnúsdóttir, Eydís Huld
dc.contributor.author Gudnason, Jon
dc.date.accessioned 2022-09-26T14:15:57Z
dc.date.available 2022-09-26T14:15:57Z
dc.date.issued 2022-07-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/265
dc.description This release of data from the Samrómur collection contains all collected data not present in other releases. The data is mostly UNVERIFIED. It contains 2,159,314 speech recordings in Icelandic (2,233 hours), of which 84,161 have been verified. About 700,000 utterances have been scored with marosijo; the score indicates how likely the audio is to match the transcript. For more information about marosijo, see [1] and [2]. The corpus contains 17,984 unique speakers. The corpus is NOT split into train, dev and test subsets; for such subsets, see other Samrómur releases. All demographic information is self-reported. The dataset contains folders that correspond to speaker IDs, and the audio files inside use the following naming convention: {speaker_ID}-{utterance_ID}.flac. The average utterance length is 3.7 seconds. [1] Gudnason et al., "Building ASR corpora using Eyra", https://www.isca-speech.org/archive/pdfs/interspeech_2017/gunason17_interspeech.pdf [2] Guðmundsson, Smári Freyr, "Samrómur automated verification wrapup", https://github.com/cadia-lvl/samromur-tools/tree/master/QualityCheck
dc.language.iso isl
dc.publisher Reykjavik University
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/cadia-lvl/samromur
dc.subject audio
dc.subject corpus
dc.subject automatic speech recognition
dc.subject speaker verification
dc.subject speaker identification
dc.title Samromur Unverified 22.07
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
has.files yes
branding Clarin IS Repository
demo.uri http://openslr.org/128/
contact.person Jon Gudnason, jg@ru.is, Reykjavík University
sponsor Ministry of Education, Science and Culture; Data recording using Eyra/Samrómur (H1); Language Technology for Icelandic 2019-2022; nationalFunds
size.info 2233 hours
size.info 2159314 utterances
size.info 125 GB
files.size 116369268880
files.count 12
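
The description above states that audio files sit in one folder per speaker and are named {speaker_ID}-{utterance_ID}.flac. The following is a minimal Python sketch of how such a path could be split into its two IDs. It assumes that neither ID contains a hyphen, which the record does not state; the function name, the tuple type, and the example IDs are hypothetical.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class UtteranceRef(NamedTuple):
    speaker_id: str
    utterance_id: str


def parse_utterance_path(path: str) -> UtteranceRef:
    """Split '<speaker_ID>/<speaker_ID>-<utterance_ID>.flac' into its IDs.

    Assumption (not stated in the record): neither ID contains a hyphen,
    so the file stem is split at its first hyphen.
    """
    p = PurePosixPath(path)
    if p.suffix != ".flac":
        raise ValueError(f"not a .flac file: {path}")
    speaker_id, sep, utterance_id = p.stem.partition("-")
    if not sep or not speaker_id or not utterance_id:
        raise ValueError(f"name does not match speaker-utterance pattern: {path}")
    # When a parent folder is given, it should be the speaker ID.
    folder = p.parent.name
    if folder and folder != speaker_id:
        raise ValueError(f"folder {folder!r} does not match speaker {speaker_id!r}")
    return UtteranceRef(speaker_id, utterance_id)


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(parse_utterance_path("000123/000123-0456789.flac"))
```

Checking the folder name against the speaker ID is optional, but it can catch files that were moved to the wrong folder after extraction.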


 Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name        Size     Format     Description  MD5
README.txt  8.27 KB  Text file  Readme       15c9ed914997d9b270c0d3a1a53f8199

File preview (README.txt):
--------------------------------------------------------------------------------
                  Samrómur Unverified 22.07
--------------------------------------------------------------------------------

Language        : Icelandic

Authors         : Staffan Hedström, Judy Y. Fong, Ragnheiður Þórhallsdóttir,
                  David Erik Mollberg, Smári Freyr Guðmundsson, Ólafur Helgi
                  Jónsson, Sunneva Þorsteinsdóttir, Eydís Huld Magnúsdóttir,
                  Jon Gudnason

Recommended use : speech recognition, speaker verification, speaker
                  identification and speaker enrollment

--------------------------------------------------------------------------------
Description
--------------------------------------------------------------------------------

This release of data from the Samrómur collection contains all available 
utterances from native Icelandic speakers. Only parts of the data have been 
validated. The corpus contains . . .
                                            
Name                           Size     Format           Description  MD5
samromur_unverified_22.07.zip  8.38 GB  application/zip  main         153e2b3a4b2df668a6fefddb33e0e7b1
samromur_unverified_22.07.z01  10 GB    Unknown          part 1       946eab26f38d64255ce1a1fb36279634
samromur_unverified_22.07.z02  10 GB    Unknown          part 2       a6151869209ad44edf18d7c3ad6fc224
samromur_unverified_22.07.z03  10 GB    Unknown          part 3       f6a738e32d7350adf4e15fea8a1a7371
samromur_unverified_22.07.z04  10 GB    Unknown          part 4       1e25d01ca1bef77c345b6b5059eabe53
samromur_unverified_22.07.z05  10 GB    Unknown          part 5       ca82c4d87402dc2fdd03e1d019d08c37
samromur_unverified_22.07.z06  10 GB    Unknown          part 6       1984ca305f35faa5d802852884f70649
samromur_unverified_22.07.z07  10 GB    Unknown          part 7       60bf661650292e501f89dba14c2e7fb0
samromur_unverified_22.07.z08  10 GB    Unknown          part 8       97e2601e8222277636693668c2f41aca
samromur_unverified_22.07.z09  10 GB    Unknown          part 9       dcc2ca46998ceec4ccbc3c54dba27188
samromur_unverified_22.07.z10  10 GB    Unknown          part 10      9f1a7cbac890f263d992cae85fa4ebf4
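
Each file in this item is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The sketch below shows one way to do this in Python with the standard library. The function names and the "downloads" directory are hypothetical, and the dictionary holds only three of the twelve listed checksums; the remaining entries would be added the same way.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing of this record (subset only).
EXPECTED_MD5 = {
    "README.txt": "15c9ed914997d9b270c0d3a1a53f8199",
    "samromur_unverified_22.07.zip": "153e2b3a4b2df668a6fefddb33e0e7b1",
    "samromur_unverified_22.07.z01": "946eab26f38d64255ce1a1fb36279634",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == want) if path.is_file() else None
    return results


if __name__ == "__main__":
    # "downloads" is a hypothetical directory holding the downloaded files.
    for name, ok in verify_downloads("downloads").items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use flat, which matters here because most parts are listed at 10 GB each.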
