dc.contributor.author Nikulásdóttir, Anna Björk
dc.contributor.author Ármannsson, Bjarki
dc.contributor.author Bergþórsdóttir, Bryndís
dc.date.accessioned 2024-05-15T11:43:43Z
dc.date.available 2024-05-15T11:43:43Z
dc.date.issued 2024-05-15
dc.identifier.uri http://hdl.handle.net/20.500.12537/331
dc.description The Icelandic Pronunciation Dictionary contains manually revised transcriptions in four pronunciation variants of Icelandic: the standard pronunciation, the northern post-aspiration variant ("harðmæli"), the north-eastern variant (post-aspiration plus voiced pronunciation), and the southern hv-variant. For descriptions of the Icelandic pronunciation variants, see 'IPA_Pronunciation.pdf' or 'SAMPA_Pronunciation.pdf' (the two documents have identical content, but the former uses the IPA phonetic alphabet and the latter the SAMPA phonetic alphabet). The file 'sampa_ipa_single.tsv' contains the set of SAMPA symbols used in the dictionaries and their mappings to IPA on the one hand and to a custom single-character alphabet, developed for use in end-to-end speech synthesis, on the other. The repository also contains data for training and testing grapheme-to-phoneme (g2p) models, as well as test data for automatic syllabification and stress labeling algorithms. The project is funded by the Icelandic Government as a part of the Language Technology Programme for Icelandic 2019–2023, which is described in the following publication: Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson. 2020. Language Technology Programme for Icelandic 2019–2023. Proceedings of LREC 2020 (https://arxiv.org/pdf/2003.09244.pdf)
dc.description The Icelandic Pronunciation Dictionary contains manually reviewed transcriptions in four pronunciation variants of Icelandic: what may be called the standard pronunciation, northern "harðmæli" (post-aspiration), harðmæli plus the voiced pronunciation characteristic of north-eastern Iceland, and the southern hv-pronunciation. More detailed descriptions of pronunciation and pronunciation variants in Icelandic can be found in the documents 'IPA_Pronunciation.pdf' or 'SAMPA_Pronunciation.pdf' (these documents are identical in every respect, except that the former uses the IPA phonetic alphabet and the latter the SAMPA alphabet to describe pronunciation). The file 'sampa_ipa_single.tsv' contains the list of SAMPA transcription symbols used in the dictionary, with mappings to IPA symbols on the one hand and to a new alphabet on the other, in which every sound is always represented by a single letter or digit; such an alphabet can be useful in end-to-end speech synthesis. The repository also contains training and test data for training grapheme-to-phoneme (g2p) models, as well as test sets for automatic syllabification and stress labeling. The project is part of the Language Technology Programme for Icelandic 2019–2023. Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson. 2020. Language Technology Programme for Icelandic 2019–2023. Proceedings of LREC 2020 (https://arxiv.org/pdf/2003.09244.pdf)
dc.language.iso isl
dc.publisher Grammatek ehf
dc.relation.replaces http://hdl.handle.net/20.500.12537/154
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/grammatek/iceprondict
dc.subject phonetics
dc.subject pronunciation
dc.subject dialectal variation
dc.subject phonetic transcription
dc.subject pronunciation dictionary
dc.subject ipa
dc.subject sampa
dc.title Icelandic Pronunciation Dictionary for Language Technology 22.01
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Anna Björk Nikulásdóttir anna@grammatek.com Grammatek ehf
sponsor Ministry of Education, Science and Culture Pronunciation dictionary (G6) Language Technology for Icelandic 2019-2023 nationalFunds
size.info 65282 entries
files.size 4669688
files.count 1
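
The description above says that 'sampa_ipa_single.tsv' maps each SAMPA symbol to IPA and to a single-character alphabet. A minimal sketch of how such a mapping could be applied is given here. It rests on assumptions not confirmed by this record: that the file has three tab-separated columns in the order SAMPA, IPA, single character, and that transcriptions are written as space-separated symbols. The sample rows are illustrative placeholders, not the contents of the real file, and the helper names `load_mapping` and `convert` are hypothetical.

```python
def load_mapping(tsv_text, source_col=0, target_col=1):
    """Build a symbol-to-symbol mapping from tab-separated text.

    Assumed column order (not confirmed by the record): SAMPA, IPA, single character.
    """
    mapping = {}
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) <= max(source_col, target_col):
            continue  # skip blank or short lines
        mapping[fields[source_col]] = fields[target_col]
    return mapping


def convert(transcription, mapping):
    """Convert a transcription symbol by symbol (assumes space-separated symbols)."""
    return " ".join(mapping[symbol] for symbol in transcription.split())


# Illustrative rows only; the real file may differ and may include a header row.
SAMPLE = "a\ta\tA\nt_h\ttʰ\tT\n"

sampa_to_ipa = load_mapping(SAMPLE)
sampa_to_single = load_mapping(SAMPLE, target_col=2)

print(convert("t_h a", sampa_to_ipa))
print(convert("t_h a", sampa_to_single))
```

With the real file, the text would be read from disk with `open(..., encoding="utf-8")`, and the column indices adjusted to match the actual layout described in the archive's README.md.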


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name iceprondict-1.1.1.zip
Size 4.45 MB
Format application/zip
Description repository
MD5 40b9229a622b14842e2f487d0cd261d4
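
The record states both an MD5 checksum and a size in bytes (files.size 4669688) for iceprondict-1.1.1.zip, so a downloaded copy can be checked against them. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_download` are hypothetical, and the local file path is whatever the download was saved as.

```python
import hashlib
from pathlib import Path

# Values copied from this record.
EXPECTED_MD5 = "40b9229a622b14842e2f487d0cd261d4"
EXPECTED_SIZE = 4669688  # bytes, from files.size


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Check that the file has both the expected size and the expected MD5 digest."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5
```

A call such as `verify_download("iceprondict-1.1.1.zip")` returns `True` only when both values match. MD5 is adequate for detecting a corrupted download but is not a guarantee against deliberate tampering.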
 File Preview  
  • iceprondict-1.1.1
    • SAMPA_Pronunciation.pdf (717 kB)
    • sampa_ipa_single.tsv (473 B)
    • README.md (3 kB)
    • train_dev_test
      • dev
        • standard_clear_dev.tsv (30 kB)
        • north_clear_dev.tsv (31 kB)
        • northeast_clear_dev.tsv (31 kB)
        • south_clear_dev.tsv (30 kB)
      • train
        • north_clear_train.tsv (182 kB)
        • northeast_clear_train.csv (183 kB)
        • standard_clear_train.tsv (180 kB)
        • south_clear_train.tsv (180 kB)
      • test
        • north_clear_test.tsv (31 kB)
        • south_clear_test.tsv (30 kB)
        • standard_clear_test.tsv (30 kB)
        • northeast_clear_test.tsv (31 kB)
    • processing
      • README.md (3 kB)
      • check_agreement.py (16 kB)
    • cc-by-4-0.txt (18 kB)
    • IPA_Pronunciation.pdf (713 kB)
    • syllab_stress
      • testset_syllab_stress.csv (95 kB)
    • dictionaries
      • ice_pron_dict_north_clear.csv (1 MB)
      • ice_pron_dict_northeast_clear.csv (1 MB)
      • ice_pron_dict_south_clear.csv (1 MB)
      • ice_pron_dict_standard_clear_v2.csv (1 MB)
      • ice_pron_dict_north_clear_v2.csv (1 MB)
      • ice_pron_dict_complete.csv (4 MB)
      • ice_pron_dict_standard_clear.csv (1 MB)
      • ice_pron_dict_small.csv (2 MB)
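
The train_dev_test folder in the listing holds one file per split (train, dev, test) and pronunciation variant (standard, north, northeast, south). A small sketch of how the archive's member names could be indexed by split and variant follows. It assumes only the directory layout and file name pattern visible in the listing, and it matches on the file name prefix rather than the extension, since the listing shows the north-eastern training file with a .csv extension while the others use .tsv. The function name `index_splits` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

VARIANTS = ("standard", "north", "northeast", "south")
SPLITS = ("train", "dev", "test")


def index_splits(member_names):
    """Map split -> variant -> archive member name for files under train_dev_test."""
    index = defaultdict(dict)
    for name in member_names:
        path = PurePosixPath(name)
        # Expect .../train_dev_test/<split>/<variant>_clear_<split>.<ext>
        if path.parent.parent.name != "train_dev_test":
            continue
        split = path.parent.name
        variant = path.name.split("_", 1)[0]
        if split in SPLITS and variant in VARIANTS:
            index[split][variant] = name
    return {split: dict(files) for split, files in index.items()}
```

The member names can be obtained with `zipfile.ZipFile("iceprondict-1.1.1.zip").namelist()`; the internal column layout of the individual files is not described in this record and should be checked against the archive's README.md.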
