dc.contributor.author |
Barkarson, Starkaður |
dc.contributor.author |
Steingrímsson, Steinþór |
dc.date.accessioned |
2021-09-29T15:02:40Z |
dc.date.available |
2021-09-29T15:02:40Z |
dc.date.issued |
2021-09-30 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/141 |
dc.description |
[ENGLISH]
IGC-News1 is part of the IGC project (Icelandic Gigaword Corpus), which aims to collect as much Icelandic text as possible that can be published under an open or restricted license. IGC-News1 contains texts from news media. IGC-News1 is published under CC BY, while IGC-News2 is published under a restricted license. The corpus comes in two formats: one contains the texts untokenized and untagged, while the other has been tokenized, POS-tagged and lemmatized.
[ICELANDIC]
IGC-News1 er hluti af IGC-verkefninu (Íslenska risamálheildin - Icelandic Gigaword corpus) sem hefur að markmiði að safna eins miklum texta og mögulegt er sem gefa má út með opnu eða takmörkuðu leyfi. IGC-News1 inniheldur texta fréttamiðla. IGC-News1 er gefin út með CC_BY leyfi en IGC-News2 er gefin út með takmörkuðu leyfi. Málheildin er tvískipt. Annar hluti hennar inniheldur skjöl með hreinum texta, án þess að hann hafi verið tókaður. Hinn hlutinn inniheldur textann tókaðan, markaðan og lemmaðan.
The corpus file is temporarily available at https://repository.clarin.is/opin_gogn/documents/IGC-News1-21.05.zip but will later be uploaded to the repository. |
dc.language.iso |
isl |
dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
dc.relation.isreferencedby |
https://www.aclweb.org/anthology/L18-1690.pdf |
dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/237 |
dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/236 |
dc.rights |
Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri |
https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label |
PUB |
dc.source.uri |
http://igc.arnastofnun.is |
dc.subject |
corpora |
dc.subject |
news |
dc.subject |
pos-tagged |
dc.subject |
lemmatized |
dc.subject |
tei |
dc.title |
IGC-News1-21.05 (The Icelandic Gigaword Corpus: News 1) |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
no |
branding |
Clarin IS Repository |
demo.uri |
https://malheildir.arnastofnun.is |
contact.person |
Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies |
sponsor |
Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) Language Technology for Icelandic 2019-2023 Icelandic Gigaword Corpus (G1) nationalFunds |
size.info |
20952712 sentences |
size.info |
354459688 words |
size.info |
390196047 tokens |
files.size |
0 |
files.count |
0 |