dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Hafsteinsdóttir, Hildur
dc.contributor.author Andrésdóttir, Þórdís Dröfn
dc.contributor.author Eiríksdóttir, Inga Guðrún
dc.contributor.author Magnússon, Bolli
dc.contributor.author Ingimundarson, Finnur
dc.date.accessioned 2022-02-14T09:30:36Z
dc.date.available 2022-02-14T09:30:36Z
dc.date.issued 2021-12-31
dc.identifier.uri http://hdl.handle.net/20.500.12537/192
dc.description The IGC project (Icelandic Gigaword Corpus; Icelandic: Íslenska risamálheildin) aims to collect as much Icelandic text as possible that can be published under an open or restricted licence. The project is divided into eight individual corpora, listed below. Each corpus is released in two formats: one contains the texts untokenized and untagged, with each paragraph enclosed in a <p> tag; the other contains the texts tokenized, POS-tagged and lemmatized.
IGC-Adjud: http://hdl.handle.net/20.500.12537/101
IGC-Books: http://hdl.handle.net/20.500.12537/126
IGC-Journals: http://hdl.handle.net/20.500.12537/166
IGC-Laws: http://hdl.handle.net/20.500.12537/116
IGC-News1: http://hdl.handle.net/20.500.12537/141
IGC-News2: http://hdl.handle.net/20.500.12537/142
IGC-Parla: http://hdl.handle.net/20.500.12537/179
IGC-Social: http://hdl.handle.net/20.500.12537/138
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby https://www.aclweb.org/anthology/L18-1690.pdf
dc.rights.label PUB
dc.source.uri http://igc.arnastofnun.is
dc.subject igc
dc.subject gigaword corpus
dc.subject lemmatized
dc.subject pos-tagged
dc.subject pos
dc.title The Icelandic Gigaword Corpus (IGC) 2021
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files no
branding Clarin IS Repository
demo.uri https://malheildir.arnastofnun.is
contact.person Steinþór Steingrímsson, steinthor.steingrimsson@arnastofnun.is, The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið); The Icelandic Gigaword Corpus (G1); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 1,871,000,000 words
files.size 0
files.count 0
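The description states only that the untokenized version of each corpus wraps every paragraph in a <p> tag. The minimal sketch below shows how such paragraphs could be pulled out of a single document. It assumes (this record does not say so) that each file is well-formed XML, possibly TEI with a default namespace, so it matches <p> elements by local name regardless of namespace. The function names `extract_paragraphs` and `local_name` are hypothetical illustrations, not part of any IGC tooling.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Strip a "{namespace}" prefix if present, e.g. "{http://www.tei-c.org/ns/1.0}p" -> "p"
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(xml_text: str) -> list[str]:
    # Assumption: the document is well-formed XML with paragraphs in <p> elements.
    root = ET.fromstring(xml_text)
    paragraphs = []
    for elem in root.iter():
        # Skip non-element nodes (their tag is not a string) and non-<p> elements.
        if isinstance(elem.tag, str) and local_name(elem.tag) == "p":
            # Join all text inside the <p>, including nested elements,
            # and collapse runs of whitespace into single spaces.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
    <p>Fyrsta málsgrein.</p>
    <p>Önnur   málsgrein.</p>
    </body></text></TEI>"""
    for paragraph in extract_paragraphs(sample):
        print(paragraph)
```

Matching on the local name rather than a fixed namespace keeps the sketch working whether or not the files declare a TEI namespace; if the actual files use a different structure, the matching rule would need to change.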