
 
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.date.accessioned 2022-07-20T11:24:54Z
dc.date.available 2022-07-20T11:24:54Z
dc.date.issued 2022-10-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/237
dc.description ENGLISH: IGC-News1 and IGC-News2 are part of the IGC project (https://igc.arnastofnun.is), which aims to collect as many Icelandic texts as possible that can be published under an open or restricted licence. IGC-News1 has an open licence, while IGC-News2 has a restricted licence. The two news corpora contain texts from news media, both online and written, as well as some from TV and radio. Each text collection is published in two versions, as two independent corpora. One is unannotated, with each paragraph contained in a separate <p> tag; the other is tokenized, lemmatized and morphosyntactically tagged. This corpus is the tokenized and annotated version of IGC-News1, in which each paragraph is contained in a <p> tag. The unannotated version can be found here: http://hdl.handle.net/20.500.12537/236.
dc.description ICELANDIC (translated): IGC-News1 and IGC-News2 are part of the IGC project (https://igc.arnastofnun.is), which aims to collect as much Icelandic text as possible that can be published under an open or restricted licence. IGC-News1 has an open licence, while IGC-News2 has a restricted licence. These two news corpora contain texts from news media, both online and written, as well as some from television and radio. Each text collection is published in two versions, as two independent parts. One is unannotated, with each paragraph in a separate tag (p), while the other is tokenized, lemmatized and grammatically tagged. This corpus contains the tokenized and tagged version of IGC-News1. The unannotated version can be found here: http://hdl.handle.net/20.500.12537/236.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.254.pdf
dc.relation.replaces http://hdl.handle.net/20.500.12537/141
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://igc.arnastofnun.is
dc.subject corpora
dc.subject news
dc.subject pos
dc.subject lemmas
dc.subject lemmatized
dc.subject pos-tagged
dc.subject annotated
dc.title IGC-News1 22.10 (annotated version)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hasMetadata false
has.files yes
branding Clarin IS Repository
demo.uri https://malheildir.arnastofnun.is
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) The Icelandic Gigaword Corpus (G1) Language Technology for Icelandic 2019-2023 nationalFunds
size.info 1787715 articles
size.info 23435693 sentences
size.info 396651451 words
size.info 436672313 tokens
files.size 8038944784
files.count 3


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0). All files in the item total 7.49 GB.

Name: IGC-News1-22.10.ana.zip
Size: 2.49 GB
Format: application/zip
Description: IGC-News1-22.10.ana
MD5: 322643e5b8ca0a72be4dd095e4bee387

Name: IGC-News1-22.10.ana.z01
Size: 5 GB
Format: Unknown
Description: IGC-News1-22.10.ana (part 2 of zip-file)
MD5: 6b0e740463e996c75e622c53060d3dde

Name: readme
Size: 715 bytes
Format: Unknown
Description: instructions on how to unzip
MD5: d5457a14c854ddb8078d0bd890e99aeb
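
Because the archive is split across two large parts, it may be worth checking each download against the MD5 values listed above before unpacking. The following is a minimal sketch in Python, not part of the repository's own tooling; the function names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the three files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "IGC-News1-22.10.ana.zip": "322643e5b8ca0a72be4dd095e4bee387",
    "IGC-News1-22.10.ana.z01": "6b0e740463e996c75e622c53060d3dde",
    "readme": "d5457a14c854ddb8078d0bd890e99aeb",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the downloads are in the current working directory.
    for name, status in verify_downloads(".").items():
        print(f"{name}: {status}")
```

A file reported as "mismatch" was most likely truncated or corrupted in transit and should be downloaded again before following the unzip instructions in the readme.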
