dc.contributor.author |
Barkarson, Starkaður |
dc.contributor.author |
Andrésdóttir, Þórdís Dröfn |
dc.contributor.author |
Hafsteinsdóttir, Hildur |
dc.contributor.author |
Magnússon, Árni Davíð |
dc.contributor.author |
Rúnarsson, Kristján |
dc.contributor.author |
Steingrímsson, Steinþór |
dc.contributor.author |
Jónsson, Haukur Páll |
dc.contributor.author |
Loftsson, Hrafn |
dc.contributor.author |
Sigurðsson, Einar Freyr |
dc.contributor.author |
Rögnvaldsson, Eiríkur |
dc.contributor.author |
Helgadóttir, Sigrún |
dc.date.accessioned |
2021-06-03T07:32:36Z |
dc.date.available |
2021-06-03T07:32:36Z |
dc.date.issued |
2021-06-02 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/113 |
dc.description |
[ENGLISH] MIM-GOLD 21.05 is a gold standard for PoS-tagging and lemmatizing Icelandic texts. This new version contains the same texts as version 20.05 but lemmas have been added and some corrections have been made to the PoS-tagging. The gold standard contains approximately 1 million running words with manually annotated PoS-tags and lemmas. The texts are from The Tagged Icelandic Corpus (MÍM), which was published in 2013. The tagset was revised in 2019-2020. It builds upon a tagging scheme created for the Icelandic Frequency Dictionary in 1991. The tagging scheme is described in the package.
[ICELANDIC] MÍM-GULL 21.05 er gullstaðall fyrir mörkun og lemmun íslenskra texta. Þessi nýja útgáfa inniheldur sama texta og útgáfa 20.05 en lemmum hefur verið bætt við og einhver mörk leiðrétt. Gullstaðallinn inniheldur u.þ.b. 1 milljón orða og mörkin eru handyfirfarin. Textarnir eru úr Markaðri íslenskri málheild (MÍM), sem var gefin út 2013. Markamengið var endurskoðað 2019-2020. Það byggir á markaskrá sem var gerð fyrir Íslenska orðtíðnibók árið 1991. Markamenginu er lýst í skrá sem fylgir gullstaðlinum. |
dc.language.iso |
isl |
dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
dc.relation.isreferencedby |
http://www.ru.is/~hrafn/papers/corpusTagging.final.pdf |
dc.rights |
Icelandic Mim Gold Standard for PoS Tagging |
dc.rights.uri |
https://repository.clarin.is/repository/xmlui/page/license-mim-gold |
dc.rights.label |
PUB |
dc.subject |
gold standard |
dc.subject |
pos-tagging |
dc.subject |
morphosyntactic tagging |
dc.subject |
lemmatization |
dc.title |
MIM-GOLD 21.05 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies |
sponsor |
Ministry of Education, Science and Culture; A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info |
1000218 tokens |
size.info |
58412 sentences |
files.size |
9284697 |
files.count |
1 |
Files in this item
This item is Publicly Available and licensed under: Icelandic Mim Gold Standard for PoS Tagging
- Name
- MIM-GOLD-21.05.zip
- Size
- 8.85 MB
- Format
- application/zip
- Description
- MIM-GOLD-21.05
- MD5
- 46aefe051f79300c58fd2b63056749b5
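The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, assuming the file was saved locally under the name shown in this record; the function names `md5_of_file` and `verify_download` and the local path are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in this record for MIM-GOLD-21.05.zip.
EXPECTED_MD5 = "46aefe051f79300c58fd2b63056749b5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("MIM-GOLD-21.05.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is adequate here for detecting a corrupted or truncated download, but it is not a guarantee against deliberate tampering.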
Preview
- MIM-GOLD21.05
- MIM_GOLD_DESCRIPTION_EN.pdf
- MIM_GOLD_DESCRIPTION_IS.pdf
- mim.tsv
- README
- data
- fbl.tsv
- websites.tsv
- blog.tsv
- webmedia.tsv
- school-essays.tsv
- scienceweb.tsv
- emails.tsv
- written-to-be-spoken.tsv
- books.tsv
- adjucations.tsv
- laws.tsv
- mbl.tsv
- radio-tv-news.tsv
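Once the archive is unpacked, the token and sentence counts of the .tsv files can be compared with the size.info values in this record (1000218 tokens, 58412 sentences). The sketch below rests on assumptions that this record does not confirm: that each non-empty line holds tab-separated fields for one token (for example token, PoS tag, lemma) and that blank lines separate sentences. Consult the README and the description PDFs in the package for the actual layout. The function `read_sentences` and the sample data are hypothetical placeholders, not corpus content.

```python
import io


def read_sentences(handle):
    """Yield sentences as lists of field tuples.

    Assumed layout (not confirmed by the record): one token per line,
    fields separated by tabs, sentences separated by blank lines.
    """
    sentence = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(tuple(line.split("\t")))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Placeholder data standing in for a real .tsv file from the package.
sample = io.StringIO("a\tX\ta\nb\tY\tb\n\nc\tZ\tc\n")
sentences = list(read_sentences(sample))
token_count = sum(len(sentence) for sentence in sentences)
print(len(sentences), "sentences,", token_count, "tokens")
```

To run this on a real file, replace the `io.StringIO` sample with `open(path, encoding="utf-8")`, where `path` points at one of the unpacked .tsv files.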