dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.date.accessioned 2021-09-29T15:02:40Z
dc.date.available 2021-09-29T15:02:40Z
dc.date.issued 2021-09-30
dc.identifier.uri http://hdl.handle.net/20.500.12537/141
dc.description IGC-News1 is part of the IGC project (Icelandic Gigaword Corpus; in Icelandic, Íslenska risamálheildin), which aims to collect as much Icelandic text as possible that can be published under an open or restricted license. IGC-News1 contains texts from news media. IGC-News1 is published under a CC BY license, while IGC-News2 is published under a restricted license. The corpus comes in two formats: one contains the texts untokenized and untagged, while the other has been tokenized, POS-tagged, and lemmatized. The corpus file is temporarily available at https://repository.clarin.is/opin_gogn/documents/IGC-News1-21.05.zip but will later be uploaded to the repository.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby https://www.aclweb.org/anthology/L18-1690.pdf
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/237
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/236
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://igc.arnastofnun.is
dc.subject corpora
dc.subject news
dc.subject pos-tagged
dc.subject lemmatized
dc.subject tei
dc.title IGC-News1-21.05 (The Icelandic Gigaword Corpus: News 1)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files no
branding Clarin IS Repository
demo.uri https://malheildir.arnastofnun.is
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) Language Technology for Icelandic 2019-2023 Icelandic Gigaword Corpus (G1) nationalFunds
size.info 20952712 sentences
size.info 354459688 words
size.info 390196047 tokens
files.size 0
files.count 0
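The description says one format of the corpus is tokenized, POS-tagged, and lemmatized, and the record lists TEI among its subjects. Below is a minimal Python sketch of reading such a file, assuming TEI P5 conventions: sentences in `<s>`, tokens in `<w>` (words) and `<pc>` (punctuation), with the lemma in a `lemma` attribute and the POS tag in a `pos` attribute. These element and attribute names are assumptions not confirmed by this record; the tag attribute may be named differently in the actual IGC files, hence the `tag_attr` parameter. The `SAMPLE` snippet and its tags are hypothetical illustrations, not corpus content.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace

def read_tagged_sentences(xml_text, tag_attr="pos"):
    """Yield each <s> element as a list of (form, lemma, tag) tuples.

    Assumption: tokens are <w>/<pc> children of <s>, with the lemma in a
    `lemma` attribute and the POS tag in the attribute named by `tag_attr`.
    Check the real IGC files and adjust `tag_attr` if they differ.
    """
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    for s in root.iterfind(".//tei:s", ns):
        tokens = []
        for tok in s:
            local = tok.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
            if local in ("w", "pc"):
                form = (tok.text or "").strip()
                # Fall back to the surface form when no lemma is given (e.g. punctuation).
                tokens.append((form, tok.get("lemma", form), tok.get(tag_attr, "")))
        yield tokens

# Hypothetical input, written for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<s><w lemma="hundur" pos="nkeng">Hundurinn</w><w lemma="gelta" pos="sfg3eþ">gelti</w><pc pos="pl">.</pc></s>
</p></body></text></TEI>"""

for sentence in read_tagged_sentences(SAMPLE):
    print(sentence)
# [('Hundurinn', 'hundur', 'nkeng'), ('gelti', 'gelta', 'sfg3eþ'), ('.', '.', 'pl')]
```

To read a file from disk instead of a string, `ET.parse(path).getroot()` can stand in for `ET.fromstring(xml_text)`; for very large files, `ET.iterparse` keeps memory use bounded.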

