dc.contributor.author |
Barkarson, Starkaður |
dc.contributor.author |
Steingrímsson, Steinþór |
dc.date.accessioned |
2025-01-09T10:34:04Z |
dc.date.available |
2025-01-09T10:34:04Z |
dc.date.issued |
2024-12-31 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/359 |
dc.description |
ENGLISH:
This version, IGC-2024ext, is an extension of IGC-2022 [http://hdl.handle.net/20.500.12537/253] and in most cases contains only texts from 2022 and 2023 (see the README for details).
The IGC project (Icelandic Gigaword Corpus) aims to collect as much Icelandic text as possible that can be published under an open or restricted licence. The project is divided into nine individual corpora, but this version contains new data for only five of them. Each corpus comes in two versions: one contains the texts untokenized and untagged, with each paragraph enclosed in a <p> tag, while the other has been tokenized, POS-tagged and lemmatized. The corpora listed below are the unannotated versions. The annotated corpora can be found at http://hdl.handle.net/20.500.12537/358. |
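The unannotated files are described only as wrapping each paragraph in a <p> tag. The following is a minimal Python sketch for pulling paragraph text out of such a file, assuming each file is well-formed XML whose <p> elements may sit in a namespace (for example TEI's). The function name and the sample markup are hypothetical; check the README for the actual file layout before relying on it.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_paragraphs(xml_text: str) -> Iterator[str]:
    """Yield the non-empty text of every <p> element, ignoring XML namespaces.

    Hypothetical helper: assumes the document is well-formed XML.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Namespaced tags look like "{uri}p", so compare only the local name.
        if isinstance(elem.tag, str) and elem.tag.rsplit("}", 1)[-1] == "p":
            # itertext() also collects text from inline children such as <hi>.
            text = "".join(elem.itertext()).strip()
            if text:
                yield text


# Hypothetical sample in a TEI-style namespace; whitespace-only paragraphs are skipped.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    "<p>Fyrsta málsgrein.</p><p>Önnur <hi>málsgrein</hi>.</p><p> </p>"
    "</body></text></TEI>"
)
print(list(iter_paragraphs(sample)))  # ['Fyrsta málsgrein.', 'Önnur málsgrein.']
```

For a file on disk, `ET.parse(path).getroot()` gives the same element tree to iterate over.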
dc.description |
ICELANDIC (English translation):
This version, IGC-2024ext, contains an extension to the IGC-2022 release [http://hdl.handle.net/20.500.12537/253], and in most cases the corpora contain texts from 2022 and 2023 (see the README for details).
The IGC project (Íslenska risamálheildin, Icelandic Gigaword Corpus) [https://igc.arnastofnun.is] aims to collect as much text as possible that can be published under an open or restricted licence. The project consists of nine independent corpora, but this version only contains data for five of them. Each corpus is released in two versions: one contains documents with plain, untokenized text, and the other contains the text tokenized, tagged and lemmatized. The corpora below contain untagged text. The tagged version is available at http://hdl.handle.net/20.500.12537/358.
Adjud http://hdl.handle.net/20.500.12537/333
Law http://hdl.handle.net/20.500.12537/353
News1 http://hdl.handle.net/20.500.12537/340
News2 http://hdl.handle.net/20.500.12537/338
Parla http://hdl.handle.net/20.500.12537/354 |
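Each corpus above, like the record itself, is cited by a Handle URI on hdl.handle.net. The following is a minimal Python sketch for resolving one programmatically. It assumes the Handle.Net proxy's JSON REST endpoint at https://hdl.handle.net/api/handles/<handle>, and assumes that the returned record holds a `values` list whose URL-type entries carry the target in `data.value`. The helper names are hypothetical and are not part of the CLARIN-IS repository or any library.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumed Handle.Net proxy REST endpoint; the handle is appended to it.
HANDLE_API = "https://hdl.handle.net/api/handles/"


def handle_from_uri(uri: str) -> str:
    """Strip the resolver host, e.g. http://hdl.handle.net/20.500.12537/359 -> 20.500.12537/359."""
    return urlparse(uri).path.lstrip("/")


def url_values(record: dict) -> list[str]:
    """Return the target of every URL-type entry in a handle record (assumed layout)."""
    return [
        entry["data"]["value"]
        for entry in record.get("values", [])
        if entry.get("type") == "URL"
    ]


def resolve(uri: str, timeout: float = 10.0) -> list[str]:
    """Fetch the handle record over HTTP and return its URL targets (needs network access)."""
    with urlopen(HANDLE_API + handle_from_uri(uri), timeout=timeout) as resp:
        return url_values(json.load(resp))
```

For example, `resolve("http://hdl.handle.net/20.500.12537/340")` should return the landing page URL(s) registered for News1. The actual target is not shown here because it depends on the live registry.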
dc.language.iso |
isl |
dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
dc.relation.isreferencedby |
http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.254.pdf |
dc.relation.replaces |
http://hdl.handle.net/20.500.12537/253 |
dc.source.uri |
http://igc.arnastofnun.is |
dc.subject |
igc |
dc.subject |
rmh |
dc.subject |
risamálheild |
dc.subject |
gigaword |
dc.subject |
icelandic |
dc.title |
Icelandic Gigaword Corpus (IGC-2024ext) - unannotated version |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
has.files |
yes |
branding |
Clarin IS Repository |
demo.uri |
https://malheildir.arnastofnun.is |
contact.person |
Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies |
size.info |
162171483 words |
files.size |
2222 |
files.count |
1 |
Files in this item
Name | Size | Format | Description | MD5
README | 2.17 KB | Unknown | Readme file | 78a4ab15a78a7f511b8484408c2a2923
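The listing gives an MD5 checksum and a size for the README, so a download can be checked against both. The following is a minimal Python sketch. It assumes that `files.size` (2222) is in bytes, since 2222 / 1024 ≈ 2.17, which matches the "2.17 KB" shown. The function names are hypothetical.

```python
import hashlib
import os

# Values copied from the file listing; the size is assumed to be in bytes.
EXPECTED_MD5 = "78a4ab15a78a7f511b8484408c2a2923"
EXPECTED_SIZE = 2222


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str = EXPECTED_MD5,
                    expected_size: int = EXPECTED_SIZE) -> bool:
    """Return True only if both the byte size and the MD5 digest match."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

After downloading, `verify_download("README")` (assuming the file was saved under that name) returns True only for an intact copy. Checking the size first is a cheap way to reject truncated downloads before hashing.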