
 
dc.contributor.author Nikulásdóttir, Anna Björk
dc.date.accessioned 2022-01-25T10:05:19Z
dc.date.available 2022-01-25T10:05:19Z
dc.date.issued 2022-01-31
dc.identifier.uri http://hdl.handle.net/20.500.12537/182
dc.description This test set contains sentences for intelligibility testing of a TTS system. It is a set of 50 sentences where each sentence occurs twice: once in its correct version and once containing one spelling error. Half of the 50 sentences are constructed in the form of Semantically Unpredictable Sentences (SUS), and the other half consists of sentences extracted from the Icelandic Error Corpus and from development data for a text normalizer. The spelling errors for the SUS are taken from a list of search-query errors. The package also contains audio files with generated speech for all test sentences, with and without spelling errors. The audio files were used to measure the impact of spelling errors on intelligibility. This package contains two versions of the test set. The first version consists of two sets, testset_1 and testset_2 (files testset1.txt and testset2.txt), where half of the sentences in each set contain spelling errors. If a sentence A contains a spelling error in testset_1, it is correctly spelled in testset_2. The second version consists of two files: one with the correct version of each sentence and one with the spelling-error version of each sentence. Thus, the test set can also be used independently of spelling errors.
List of files in this package:
- Readme.md : about this package
- test_1.zip : audio files produced from testset1.txt
- test_2.zip : audio files produced from testset2.txt
- testset_correct.txt : the 50 test sentences in their correct version
- testset_spelling_errors.txt : the 50 test sentences each containing a spelling error
- testset1.txt : randomly mixed list of 25 sentences containing a spelling error and 25 sentences in their correct version
- testset2.txt : randomly mixed list of 25 sentences containing a spelling error and 25 sentences in their correct version
The project is funded by the Icelandic Government as a part of the Language Technology Programme for Icelandic 2019–2023, which is described in the following publication: Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson. 2020. Language Technology Programme for Icelandic 2019–2023. Proceedings of LREC 2020 (https://arxiv.org/pdf/2003.09244.pdf)
This test set contains sentences for intelligibility tests of speech synthesizers. It consists of 50 sentences, each occurring twice: once spelled correctly and once with one spelling error. Half of the sentences are constructed as Semantically Unpredictable Sentences, and the other half is taken from the Icelandic Error Corpus and from development data for a text normalization system. The spelling errors for the semantically unpredictable sentences are taken from a list of search-query errors. The repository also contains audio files with synthesized speech for all sentences, with and without spelling errors. The audio files were used to measure the effect of spelling errors on the intelligibility of synthesized speech.
Two versions of the test set are available. testset_1 and testset_2 are the sets that were used as input to the speech synthesizer; each set contains 25 correctly spelled sentences and 25 sentences with a spelling error. The sentences that are spelled correctly in testset_1 contain an error in testset_2, and vice versa. The other version consists of two files, with the correctly spelled sentences in one file and the sentences containing a spelling error in the other. The set can therefore be used without regard to spelling errors, even though measuring their effect was the original purpose of the data.
Contents of the repository:
- Readme.md : about the test set
- test_1.zip : audio files for testset1.txt
- test_2.zip : audio files for testset2.txt
- testset_correct.txt : 50 correctly spelled sentences
- testset_spelling_errors.txt : 50 sentences with a spelling error
- testset1.txt : mixed list of 25 sentences with a spelling error and 25 sentences without an error
- testset2.txt : mixed list of 25 sentences with a spelling error and 25 sentences without an error
The project is part of the Language Technology Programme for Icelandic 2019–2023, described in the publication cited above.
dc.language.iso isl
dc.publisher Grammatek ehf
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject test set
dc.subject intelligibility test
dc.subject text-to-speech
dc.subject speech synthesis
dc.subject spell checking
dc.subject spelling correction
dc.title Test Set for TTS Intelligibility Tests 22.01
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding Clarin IS Repository
contact.person Anna Björk Nikulásdóttir anna@grammatek.com Grammatek ehf
sponsor Ministry of Education, Science and Culture; Spell correction in language technology software (L10); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 100 sentences
files.size 4480774
files.count 1
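
The dc.description above says that testset_1 and testset_2 are complementary: each holds 25 misspelled and 25 correct sentences, and a sentence misspelled in one set is spelled correctly in the other. A minimal Python sketch of such a split follows. It is not the script used to build this package; the function name `complementary_split`, the `seed` parameter, and the assumption that the correct and misspelled sentences are available as two aligned lists (same order, same length) are illustrative only.

```python
import random


def complementary_split(correct, misspelled, seed=0):
    """Build two mixed test sets from aligned correct/misspelled sentence lists.

    Assumption (not confirmed by the record): correct[i] and misspelled[i]
    are the two versions of the same sentence.
    """
    if len(correct) != len(misspelled):
        raise ValueError("sentence lists must be aligned and of equal length")
    n = len(correct)
    rng = random.Random(seed)
    # Pick the half of the sentence indices that carries the error in set 1.
    error_in_first = set(rng.sample(range(n), n // 2))
    set_1, set_2 = [], []
    for i in range(n):
        if i in error_in_first:
            set_1.append(misspelled[i])
            set_2.append(correct[i])
        else:
            set_1.append(correct[i])
            set_2.append(misspelled[i])
    # Shuffle each set independently so the presentation order is mixed.
    rng.shuffle(set_1)
    rng.shuffle(set_2)
    return set_1, set_2
```

With 50 sentence pairs as input, each returned list contains 50 sentences, 25 of them misspelled, and every sentence appears misspelled in exactly one of the two lists.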


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name testset_TTS_intelligibility.zip
Size 4.27 MB
Format application/zip
Description Zip file containing the test set
MD5 6aab76712e0cfbf800befc3740c6f25f
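
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following Python sketch uses only the standard library `hashlib` module; the helper names `md5_of_file` and `matches_record` and the local file path in the usage note are assumptions for illustration, not part of this record.

```python
import hashlib

# Checksum as listed in this record for testset_TTS_intelligibility.zip.
EXPECTED_MD5 = "6aab76712e0cfbf800befc3740c6f25f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Compare a local file's MD5 digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved under its listed name in the current directory, `matches_record("testset_TTS_intelligibility.zip")` should return `True` for an uncorrupted download.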
 File Preview  
  • testset_TTS_intelligibility
    • testset1.txt
    • Readme.md
    • test_1.zip
    • .DS_Store
    • testset2.txt
    • testset_spelling_errors.txt
    • test_2.zip
    • testset_correct.txt
  • __MACOSX
    • testset_TTS_intelligibility
      • ._testset_spelling_errors.txt
      • ._testset_correct.txt
      • ._.DS_Store
      • ._testset1.txt
      • ._testset2.txt
