dc.contributor.author |
Barkarson, Starkaður |
dc.contributor.author |
Steingrímsson, Steinþór |
dc.date.accessioned |
2021-09-29T15:10:36Z |
dc.date.available |
2021-09-29T15:10:36Z |
dc.date.issued |
2021-09-30 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/142 |
dc.description |
[ENGLISH]
IGC-News2 is part of the IGC project (Icelandic Gigaword Corpus), which aims to collect as much Icelandic text as possible that can be published under an open or restricted license. IGC-News2 contains texts from news media. IGC-News2 is published under a restricted license while IGC-News1 is published under CC_BY. The corpus comes in two formats. One contains the texts untokenized and untagged while the other has been tokenized, POS-tagged and lemmatized.
[ICELANDIC]
IGC-News2 er hluti af IGC-verkefninu (Íslenska risamálheildin - Icelandic Gigaword corpus) sem hefur að markmiði að safna eins miklum texta og mögulegt er sem gefa má út með opnu eða takmörkuðu leyfi. IGC-News2 inniheldur texta fréttamiðla. IGC-News2 er gefin út með takmörkuðu leyfi en IGC-News1 er gefin út með CC_BY-leyfi. Málheildin er tvískipt. Annar hluti hennar inniheldur skjöl með hreinum texta, án þess að hann hafi verið tókaður. Hinn hlutinn inniheldur textann tókaðan, markaðan og lemmaðan.
The corpus-file is temporarily available at https://repository.clarin.is/opin_gogn/documents/IGC-News2-21.05.zip but will later be uploaded to the repository. |
dc.language.iso |
isl |
dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/239 |
dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/238 |
dc.rights |
Icelandic Gigaword Corpus |
dc.rights.uri |
https://repository.clarin.is/repository/xmlui/page/license-gigaword-corpus |
dc.rights.label |
PUB |
dc.source.uri |
http://igc.arnastofnun.is |
dc.subject |
corpora |
dc.subject |
news |
dc.subject |
pos-tagged |
dc.subject |
lemmatized |
dc.subject |
tei |
dc.title |
IGC-News2-21.05 (The Icelandic Gigaword Corpus: News 2) |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
no |
branding |
Clarin IS Repository |
demo.uri |
https://malheildir.arnastofnun.is |
contact.person |
Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies |
sponsor |
Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) Language Technology for Icelandic 2019-2023 Icelandic Gigaword Corpus (G1) nationalFunds |
size.info |
49246907 sentences |
size.info |
855480334 words |
size.info |
952174584 tokens |
files.size |
0 |
files.count |
0 |