dc.contributor.author |
Barkarson, Starkaður |
dc.contributor.author |
Steingrímsson, Steinþór |
dc.contributor.author |
Hafsteinsdóttir, Hildur |
dc.contributor.author |
Andrésdóttir, Þórdís Dröfn |
dc.contributor.author |
Eiríksdóttir, Inga Guðrún |
dc.contributor.author |
Magnússon, Bolli |
dc.contributor.author |
Ingimundarson, Finnur |
dc.date.accessioned |
2022-02-14T09:30:36Z |
dc.date.available |
2022-02-14T09:30:36Z |
dc.date.issued |
2021-12-31 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/192 |
dc.description |
[ENGLISH]
The IGC project (Icelandic Gigaword Corpus) aims to collect as much Icelandic text as possible that can be published under an open or restricted licence. The project is divided into eight individual corpora, listed below. Each corpus comes in two formats: one contains the texts untokenized and untagged, with each paragraph enclosed in a <p> tag; the other has been tokenized, POS-tagged and lemmatized.
[ICELANDIC]
IGC-verkefnið (Íslenska risamálheildin - Icelandic Gigaword corpus) hefur að markmiði að safna eins miklum texta og mögulegt er sem gefa má út með opnu eða takmörkuðu leyfi. Verkefnið samanstendur af átta sjálfstæðum málheildum sem eru listaðar hér að neðan. Hver málheild er gefin út í tveimur hlutum. Annar hlutinn inniheldur skjöl með hreinum texta, án þess að hann hafi verið tókaður. Hinn hlutinn inniheldur textann tókaðan, markaðan og lemmaðan.
IGC-Adjud: http://hdl.handle.net/20.500.12537/101
IGC-Books: http://hdl.handle.net/20.500.12537/126
IGC-Journals: http://hdl.handle.net/20.500.12537/166
IGC-Laws: http://hdl.handle.net/20.500.12537/116
IGC-News1: http://hdl.handle.net/20.500.12537/141
IGC-News2: http://hdl.handle.net/20.500.12537/142
IGC-Parla: http://hdl.handle.net/20.500.12537/179
IGC-Social: http://hdl.handle.net/20.500.12537/138 |
dc.language.iso |
isl |
dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
dc.relation.isreferencedby |
https://www.aclweb.org/anthology/L18-1690.pdf |
dc.rights.label |
PUB |
dc.source.uri |
http://igc.arnastofnun.is |
dc.subject |
igc |
dc.subject |
gigaword corpus |
dc.subject |
lemmatized |
dc.subject |
pos-tagged |
dc.subject |
pos |
dc.title |
The Icelandic Gigaword Corpus (IGC) 2021 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
hasMetadata |
false |
has.files |
no |
branding |
Clarin IS Repository |
demo.uri |
https://malheildir.arnastofnun.is |
contact.person |
Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies |
sponsor |
Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið); The Icelandic Gigaword Corpus (G1); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info |
1871000000 words |
files.size |
0 |
files.count |
0 |
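
The description in this record says that the untokenized format keeps each paragraph inside a <p> tag. The following is a minimal sketch of how such paragraphs might be read, assuming the files are well-formed XML. The sample document, the wrapping <text> element and the function names are hypothetical and are not taken from the corpus itself; the actual files may use namespaces or additional markup, which is why the sketch matches on the local tag name only.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Reduce a namespaced tag such as '{http://example.org/ns}p' to 'p'."""
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(xml_text: str) -> List[str]:
    """Return the text of every <p> element, with whitespace normalized."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for element in root.iter():
        if local_name(element.tag) == "p":
            # itertext() also collects text nested in child elements.
            text = " ".join("".join(element.itertext()).split())
            if text:
                paragraphs.append(text)
    return paragraphs


# Hypothetical sample; the real corpus files are not reproduced here.
SAMPLE = """<text>
  <p>Þetta er fyrsta málsgreinin.</p>
  <p>Þetta er   önnur
  málsgreinin.</p>
</text>"""

paragraphs = extract_paragraphs(SAMPLE)
for paragraph in paragraphs:
    print(paragraph)
```

Matching on the local name keeps the sketch usable whether or not the files declare an XML namespace; for very large files, a streaming parser such as `ET.iterparse` would likely be a better fit than loading a whole document at once.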