dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Andrésdóttir, Þórdís Dröfn
dc.contributor.author Hafsteinsdóttir, Hildur
dc.contributor.author Ingimundarson, Finnur Ágúst
dc.contributor.author Magnússon, Árni Davíð
dc.date.accessioned 2022-09-22T12:52:39Z
dc.date.available 2022-09-22T12:52:39Z
dc.date.issued 2022-10-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/254
dc.description The IGC project (Icelandic Gigaword Corpus, Íslenska risamálheildin) aims to collect as much Icelandic text as possible that can be published under an open or restricted licence. The project is divided into nine individual corpora, listed below. Each corpus comes in two versions: one contains the texts untokenized and untagged, with each paragraph enclosed in a <p> tag; the other has been tokenized, POS-tagged and lemmatized. The corpora listed below are the annotated versions. The unannotated versions can be found at http://hdl.handle.net/20.500.12537/253.
Adjud: http://hdl.handle.net/20.500.12537/241
Books: http://hdl.handle.net/20.500.12537/317
Journals: http://hdl.handle.net/20.500.12537/246
Law: http://hdl.handle.net/20.500.12537/248
News1: http://hdl.handle.net/20.500.12537/237
News2: http://hdl.handle.net/20.500.12537/239
Parla: http://hdl.handle.net/20.500.12537/216
Social: http://hdl.handle.net/20.500.12537/243
Wiki: http://hdl.handle.net/20.500.12537/252
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.254.pdf
dc.source.uri https://igc.arnastofnun.is
dc.subject igc
dc.subject annotated
dc.subject pos-tagged
dc.subject lemmatized
dc.title Icelandic Gigaword Corpus (IGC-2022) - annotated version
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hasMetadata false
has.files no
branding Clarin IS Repository
demo.uri https://malheildir.arnastofnun.is
contact.person Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið); The Icelandic Gigaword Corpus (G1); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 2428573565 words
size.info 156052431 sentences
files.size 0
files.count 0

