dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.date.accessioned 2022-07-20T11:25:11Z
dc.date.available 2022-07-20T11:25:11Z
dc.date.issued 2022-10-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/239
dc.description ENGLISH: IGC-News1 and IGC-News2 are part of the IGC project (https://igc.arnastofnun.is), which aims to collect as many Icelandic texts as possible that can be published under an open or restricted licence. IGC-News1 has an open licence, while IGC-News2 has a restricted licence. The two news corpora contain texts from online and written news media, as well as some from TV and radio. Each text collection is published in two versions, as two independent corpora. One is unannotated, with each paragraph in a separate <p> tag; the other is tokenized, lemmatized and morphosyntactically tagged. This corpus contains the tokenized and annotated version of IGC-News2, where each paragraph is contained inside a <p> tag. The unannotated version can be found here: http://hdl.handle.net/20.500.12537/238.
dc.description ÍSLENSKA (translated into English): IGC-News1 and IGC-News2 are part of the IGC project (https://igc.arnastofnun.is), which aims to collect as much Icelandic text as possible that can be published under an open or restricted licence. IGC-News1 has an open licence, while IGC-News2 has a restricted licence. These two news corpora contain texts from news media, online and written, as well as some from TV and radio. Each text collection is published in two versions, as two independent parts. One is unannotated, with each paragraph in a separate tag (p); the other is tokenized, lemmatized and tagged with grammatical tags. This corpus contains the tokenized and annotated version of IGC-News2. The unannotated version can be found here: http://hdl.handle.net/20.500.12537/238.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.254.pdf
dc.rights Icelandic Gigaword Corpus
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-gigaword-corpus
dc.rights.label PUB
dc.source.uri http://igc.arnastofnun.is
dc.subject corpora
dc.subject news
dc.subject annotated
dc.subject pos
dc.subject pos-tagged
dc.subject lemmas
dc.subject lemmatized
dc.subject TEI
dc.title IGC-News2-22.10 (annotated version)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hasMetadata false
has.files yes
branding Clarin IS Repository
demo.uri https://malheildir.arnastofnun.is
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) Language Technology for Icelandic 2019-2023 The Icelandic Gigaword Corpus (G1) nationalFunds
size.info 3225606 articles
size.info 51915950 sentences
size.info 899836406 words
size.info 1001582774 tokens
files.size 16924104569
files.count 5
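
The description above states that each paragraph of the annotated corpus sits inside a <p> tag, and the subject keywords name TEI. The following is a minimal sketch of how such paragraphs might be read with Python's standard library. The sample document, the element names inside each paragraph (s, w, pc) and the lemma attribute are assumptions made for illustration; the record itself does not specify them, so check the actual files before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <p> wrapper is confirmed by the record.
# The <s>, <w>, <pc> elements and the lemma attribute are assumptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><s><w lemma="þetta">Þetta</w> <w lemma="vera">er</w> <w lemma="frétt">frétt</w><pc>.</pc></s></p>
    <p><s><w lemma="annar">Önnur</w> <w lemma="málsgrein">málsgrein</w><pc>.</pc></s></p>
  </body></text>
</TEI>"""


def local_name(tag):
    """Strip an XML namespace such as '{uri}p' down to 'p'."""
    return tag.rsplit("}", 1)[-1]


def paragraphs(root):
    """Yield the token strings of every <p> element, in document order."""
    for elem in root.iter():
        if local_name(elem.tag) == "p":
            # Treat every childless element with text as one token (assumption).
            yield [t.text for t in elem.iter() if len(t) == 0 and t.text]


root = ET.fromstring(SAMPLE)
paras = list(paragraphs(root))
print(len(paras))  # number of paragraphs in the sample
print(paras[0])    # tokens of the first paragraph
```

Matching on the local tag name keeps the sketch independent of whichever namespace the files declare. For files of the size listed in this record, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit than loading a whole document at once.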


 Files in this item

 Download all files in item (15.76 GB)
This item is Publicly Available and licensed under: Icelandic Gigaword Corpus
Name                      Size        Format            Description                     MD5
IGC-News2-22.10.ana.zip   1.11 GB     application/zip   zip file 1/4                    c80c8617c24bdb6d220fca0ae93d0b17
IGC-News2-22.10.ana.z01   4.88 GB     Unknown           zip file 2/4                    d3fc513f81e101d292b3aa079255396a
IGC-News2-22.10.ana.z02   4.88 GB     Unknown           zip file 3/4                    9997764e4d3d051fc30cdd5530b646c2
IGC-News2-22.10.ana.z03   4.88 GB     Unknown           zip file 4/4                    06caf1ac72902491a30b7c8243db0afe
readme                    716 bytes   Unknown           instructions for how to unzip   e2577385ef9e54eedd890d43918f3c77
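
Since the listing above gives an MD5 checksum for every file, the downloads can be checked before unzipping. The following is a minimal sketch using Python's standard hashlib module. The file names and checksums are copied from the listing; the function names `md5_of` and `verify`, and the assumption that all five files sit in one directory, are illustrative only.

```python
import hashlib
from pathlib import Path

# File names and MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "IGC-News2-22.10.ana.zip": "c80c8617c24bdb6d220fca0ae93d0b17",
    "IGC-News2-22.10.ana.z01": "d3fc513f81e101d292b3aa079255396a",
    "IGC-News2-22.10.ana.z02": "9997764e4d3d051fc30cdd5530b646c2",
    "IGC-News2-22.10.ana.z03": "06caf1ac72902491a30b7c8243db0afe",
    "readme": "e2577385ef9e54eedd890d43918f3c77",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the downloads are in the current working directory.
    for name, status in verify(".").items():
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte archive parts. MD5 here only guards against corrupted or incomplete downloads; it is not a defence against deliberate tampering.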
