dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.date.accessioned 2021-09-29T15:10:36Z
dc.date.available 2021-09-29T15:10:36Z
dc.date.issued 2021-09-30
dc.identifier.uri http://hdl.handle.net/20.500.12537/142
dc.description IGC-News2 is part of the IGC project (Icelandic Gigaword Corpus, Íslenska risamálheildin), which aims to collect as much Icelandic text as possible that can be published under an open or restricted license. IGC-News2 contains texts from news media. IGC-News2 is published under a restricted license, while IGC-News1 is published under CC BY. The corpus comes in two formats: one contains the plain texts, untokenized and untagged, while the other has been tokenized, POS-tagged and lemmatized. The corpus file is temporarily available at https://repository.clarin.is/opin_gogn/documents/IGC-News2-21.05.zip but will later be uploaded to the repository.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/239
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/238
dc.rights Icelandic Gigaword Corpus
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-gigaword-corpus
dc.rights.label PUB
dc.source.uri http://igc.arnastofnun.is
dc.subject corpora
dc.subject news
dc.subject pos-tagged
dc.subject lemmatized
dc.subject tei
dc.title IGC-News2-21.05 (The Icelandic Gigaword Corpus: News 2)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files no
branding Clarin IS Repository
demo.uri https://malheildir.arnastofnun.is
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) Language Technology for Icelandic 2019-2023 Icelandic Gigaword Corpus (G1) nationalFunds
size.info 49246907 sentences
size.info 855480334 words
size.info 952174584 tokens
files.size 0
files.count 0