
 
dc.contributor.author Nikulásdóttir, Anna Björk
dc.date.accessioned 2024-04-18T11:32:24Z
dc.date.available 2024-04-18T11:32:24Z
dc.date.issued 2024-04-16
dc.identifier.uri http://hdl.handle.net/20.500.12537/327
dc.description This labeled corpus of Icelandic homographs contains (a) a list of all labeled homographs (ice_homographs.txt) and (b) a corpus of sentences containing these homographs, extracted from a selection of the Icelandic Gigaword Corpus (IGC-2022 - annotated version, http://hdl.handle.net/20.500.12537/254) and labeled by pronunciation. The folder ice_homographs_labeled contains one TSV file per homograph. Each file holds a selection of sentences in which the homograph in question is marked with double square brackets, [[homograph]], followed at the end of the line by a tab and a label, 0 or 1. All homographs follow the pattern V-ll-(V|$), i.e. a vowel, then 'll', then a vowel or the end of the word. Examples: 'villa', 'holl'. These words are pronounced differently depending on their meaning, either with /tl/ or with /l/, and the labels are set accordingly: /tl/ = 1, /l/ = 0. The corpus can be used, for example, to train a classifier that assigns homographs to a pronunciation, and for linguistic research.
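The line layout described above can be read with a few lines of Python. This is a minimal sketch, not code shipped with the corpus: it assumes each line is `sentence<TAB>label` with no header row, that the vowel set is the standard Icelandic vowel letters, and the names `parse_line`, `fits_pattern`, and `Example` are hypothetical. The sample line in the usage note is a placeholder, not a sentence from the corpus.

```python
import re
from typing import NamedTuple

# Marker format from the description: the target word is wrapped as [[word]].
MARKED = re.compile(r"\[\[(.+?)\]\]")

# Label values from the description: 1 = /tl/, 0 = /l/.
PRONUNCIATION = {0: "/l/", 1: "/tl/"}

# Assumption: the Icelandic vowel letters, used for the V-ll-(V|$) pattern.
VOWELS = "aáeéiíoóuúyýæö"
PATTERN = re.compile(rf"[{VOWELS}]ll(?:[{VOWELS}]|$)")


class Example(NamedTuple):
    sentence: str   # sentence with the [[ ]] markers removed
    homograph: str  # the marked word
    label: int      # 0 or 1


def parse_line(line: str) -> Example:
    """Parse one 'sentence<TAB>label' line (assumed layout)."""
    text, label_str = line.rstrip("\n").rsplit("\t", 1)
    label = int(label_str)
    if label not in PRONUNCIATION:
        raise ValueError(f"unexpected label: {label_str!r}")
    match = MARKED.search(text)
    if match is None:
        raise ValueError("no [[homograph]] marker found")
    return Example(MARKED.sub(r"\1", text), match.group(1), label)


def fits_pattern(word: str) -> bool:
    """True if the word contains vowel + 'll' + (vowel or end of word)."""
    return PATTERN.search(word.lower()) is not None
```

For a placeholder line such as `"xx [[villa]] yy\t1"`, `parse_line` returns the sentence without markers, the word `villa`, and the label `1`, which `PRONUNCIATION` maps to `/tl/`. Note that the file preview in this record lists the folder as `ice_homographs_labelled`, so code that locates the TSV files should not rely on one spelling of the folder name.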
dc.language.iso isl
dc.publisher Grammatek ehf.
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject homographs
dc.subject tts
dc.title Labeled Corpus of Icelandic Homographs (24.04)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Anna Björk Nikulásdóttir, anna@grammatek.com, Grammatek ehf.
sponsor Ministry of Culture and Business Affairs; Prosody and Intonation Analysis (T11); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 9 MB
files.size 8726681
files.count 1


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name: ice_homographs.zip
Size: 8.32 MB
Format: application/zip
Description: corpus
MD5: 99eee02e5394112a9249305b42c02309
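
The MD5 value listed for ice_homographs.zip can be used to check a downloaded copy. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of` and `verify` and the local file path are hypothetical, and only the expected digest is taken from this record.

```python
import hashlib

# Digest listed for ice_homographs.zip in this record.
EXPECTED_MD5 = "99eee02e5394112a9249305b42c02309"


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected value."""
    return md5_of(path) == expected.lower()
```

Calling `verify("ice_homographs.zip")` on a local download (the path is an assumption) returns `True` only when the file matches the listed digest. MD5 is adequate for detecting a corrupted download, not for guarding against deliberate tampering.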
 File Preview  
  • ice_homographs_labelled
    • gellur.tsv (159 kB)
    • villuna.tsv (177 kB)
    • pollanna.tsv (5 kB)
    • villunni.tsv (54 kB)
    • pollinn.tsv (155 kB)
    • halla.tsv (1 MB)
    • gallanna.tsv (9 kB)
    • ella.tsv (1 MB)
    • pollarnir.tsv (21 kB)
    • grilli.tsv (266 kB)
    • ullar.tsv (220 kB)
    • kalli.tsv (1 MB)
    • holla.tsv (350 kB)
    • halló.tsv (328 kB)
    • villunnar.tsv (14 kB)
    • göllum.tsv (27 kB)
    • drolla.tsv (45 kB)
    • villu.tsv (674 kB)
    • malla.tsv (430 kB)
    • gulla.tsv (343 kB)
    • gallann.tsv (116 kB)
    • villum.tsv (231 kB)
    • gallinn.tsv (159 kB)
    • möllum.tsv (1 kB)
    • mallar.tsv (76 kB)
    • böll.tsv (328 kB)
    • palli.tsv (676 kB)
    • holl.tsv (479 kB)
    • grilla.tsv (522 kB)
    • kalla.tsv (2 MB)
    • dalla.tsv (159 kB)
    • polli.tsv (177 kB)
    • ollu.tsv (845 kB)
    • bollunum.tsv (86 kB)
    • gallanum.tsv (65 kB)
    • böllum.tsv (235 kB)
    • kolla.tsv (260 kB)
    • galli.tsv (165 kB)
    • villurnar.tsv (153 kB)
    • palla.tsv (402 kB)
    • villi.tsv (354 kB)
    • dillum.tsv (7 kB)
    • pollum.tsv (91 kB)
    • polla.tsv (118 kB)
    • villunum.tsv (33 kB)
    • lalli.tsv (276 kB)
    • pollana.tsv (20 kB)
    • bolla.tsv (327 kB)
    • bollum.tsv (203 kB)
    • villan.tsv (148 kB)
    • galla.tsv (261 kB)
    • villanna.tsv (1 kB)
    • villa.tsv (1 MB)
    • gullu.tsv (158 kB)
    • alla.tsv (164 kB)
    • lalla.tsv (261 kB)
    • gallana.tsv (124 kB)
    • grillir.tsv (31 kB)
    • pollar.tsv (108 kB)
    • gallans.tsv (26 kB)
    • villur.tsv (540 kB)
    • hollum.tsv (397 kB)
    • halli.tsv (998 kB)
    • elli.tsv (593 kB)
    • gella.tsv (137 kB)
    • holli.tsv (118 kB)
    • kollu.tsv (207 kB)
    • dill.tsv (262 kB)
    • gallar.tsv (153 kB)
    • malli.tsv (89 kB)
    • gulli.tsv (810 kB)
    • README.txt (2 kB)
    • ice_homographs.txt (647 B)
