dc.contributor.author | Nikulásdóttir, Anna Björk |
dc.date.accessioned | 2024-04-18T11:32:24Z |
dc.date.available | 2024-04-18T11:32:24Z |
dc.date.issued | 2024-04-16 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/327 |
dc.description | This labeled corpus of Icelandic homographs contains a) a list of all labeled homographs (ice_homographs.txt) and b) a corpus of sentences containing these homographs, labeled by pronunciation. The sentences were extracted from a selection of the Icelandic Gigaword Corpus (IGC-2022 - annotated version, http://hdl.handle.net/20.500.12537/254 ). The folder ice_homographs_labeled contains one tsv-file for each homograph. Each file contains a selection of sentences in which the homograph in question is marked with double square brackets, [[homograph]], and each line ends with a tab followed by a label, 0 or 1. The homographs all follow the pattern V-ll-(V|$), i.e. vowel-ll-vowel_or_end. Examples: 'villa', 'holl'. Each homograph has two possible pronunciations, /tl/ and /l/, depending on its meaning, and the labels are set accordingly: /tl/ = 1, /l/ = 0. The corpus can be used e.g. for training a homograph classifier and for linguistic research. |
dc.language.iso | isl |
dc.publisher | Grammatek ehf. |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.subject | homographs |
dc.subject | tts |
dc.title | Labeled Corpus of Icelandic Homographs (24.04) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Anna Björk Nikulásdóttir anna@grammatek.com Grammatek ehf. |
sponsor | Ministry of Culture and Business Affairs Prosody and Intonation Analysis (T11) Language Technology for Icelandic 2019-2023 nationalFunds |
size.info | 9 mb |
files.size | 8726681 |
files.count | 1 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: ice_homographs.zip
- Size: 8.32 MB
- Format: application/zip
- Description: corpus
- MD5: 99eee02e5394112a9249305b42c02309
- ice_homographs_labelled
- gellur.tsv (159 kB)
- villuna.tsv (177 kB)
- pollanna.tsv (5 kB)
- villunni.tsv (54 kB)
- pollinn.tsv (155 kB)
- halla.tsv (1 MB)
- gallanna.tsv (9 kB)
- ella.tsv (1 MB)
- pollarnir.tsv (21 kB)
- grilli.tsv (266 kB)
- ullar.tsv (220 kB)
- kalli.tsv (1 MB)
- holla.tsv (350 kB)
- halló.tsv (328 kB)
- villunnar.tsv (14 kB)
- göllum.tsv (27 kB)
- drolla.tsv (45 kB)
- villu.tsv (674 kB)
- malla.tsv (430 kB)
- gulla.tsv (343 kB)
- gallann.tsv (116 kB)
- villum.tsv (231 kB)
- gallinn.tsv (159 kB)
- möllum.tsv (1 kB)
- mallar.tsv (76 kB)
- böll.tsv (328 kB)
- palli.tsv (676 kB)
- holl.tsv (479 kB)
- grilla.tsv (522 kB)
- kalla.tsv (2 MB)
- dalla.tsv (159 kB)
- polli.tsv (177 kB)
- ollu.tsv (845 kB)
- bollunum.tsv (86 kB)
- gallanum.tsv (65 kB)
- böllum.tsv (235 kB)
- kolla.tsv (260 kB)
- galli.tsv (165 kB)
- villurnar.tsv (153 kB)
- palla.tsv (402 kB)
- villi.tsv (354 kB)
- dillum.tsv (7 kB)
- pollum.tsv (91 kB)
- polla.tsv (118 kB)
- villunum.tsv (33 kB)
- lalli.tsv (276 kB)
- pollana.tsv (20 kB)
- bolla.tsv (327 kB)
- bollum.tsv (203 kB)
- villan.tsv (148 kB)
- galla.tsv (261 kB)
- villanna.tsv (1 kB)
- villa.tsv (1 MB)
- gullu.tsv (158 kB)
- alla.tsv (164 kB)
- lalla.tsv (261 kB)
- gallana.tsv (124 kB)
- grillir.tsv (31 kB)
- pollar.tsv (108 kB)
- gallans.tsv (26 kB)
- villur.tsv (540 kB)
- hollum.tsv (397 kB)
- halli.tsv (998 kB)
- elli.tsv (593 kB)
- gella.tsv (137 kB)
- holli.tsv (118 kB)
- kollu.tsv (207 kB)
- dill.tsv (262 kB)
- gallar.tsv (153 kB)
- malli.tsv (89 kB)
- gulli.tsv (810 kB)
- README.txt (2 kB)
- ice_homographs.txt (647 B)
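
The line layout given in dc.description (a sentence with the homograph marked as [[homograph]], then a tab, then a label 0 or 1) could be read roughly as in the sketch below. This is a minimal illustration, not code shipped with the corpus: the names `Example`, `parse_line` and `matches_pattern` are hypothetical, and the sketch assumes the label is the last tab-separated field of each line and that the listed characters cover the vowels relevant to the V-ll-(V|$) pattern.

```python
import re
from typing import NamedTuple

# Marker around the target word, as described in the record: [[homograph]]
MARKED = re.compile(r"\[\[(.+?)\]\]")

# V-ll-(V|$): a vowel, "ll", then a vowel or the end of the word.
# ASSUMPTION: this vowel inventory is ours, not taken from the corpus README.
VOWELS = "aáeéiíoóuúyýæö"
PATTERN = re.compile(rf"[{VOWELS}]ll(?:[{VOWELS}]|$)", re.IGNORECASE)


class Example(NamedTuple):
    sentence: str   # sentence text with the [[ ]] markers removed
    homograph: str  # the marked word form
    label: int      # 1 = /tl/, 0 = /l/ (per dc.description)


def parse_line(line: str) -> Example:
    """Parse one line, assuming the label is the last tab-separated field."""
    fields = line.rstrip("\r\n").rsplit("\t", 1)
    if len(fields) != 2:
        raise ValueError(f"expected '<sentence>\\t<label>', got: {line!r}")
    text, label = fields
    match = MARKED.search(text)
    if match is None:
        raise ValueError(f"no [[homograph]] marker in: {text!r}")
    if label not in ("0", "1"):
        raise ValueError(f"label must be 0 or 1, got: {label!r}")
    sentence = MARKED.sub(r"\1", text, count=1)
    return Example(sentence, match.group(1), int(label))


def matches_pattern(word: str) -> bool:
    """Check a word form against the V-ll-(V|$) pattern."""
    return PATTERN.search(word) is not None
```

As a usage illustration with an invented line (not taken from the corpus), `parse_line("Þetta er [[villa]] í kóðanum.\t1")` yields the plain sentence, the word form `villa` and the integer label `1`. Both example words named in the description, 'villa' and 'holl', satisfy `matches_pattern`.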