
IGC-News1-21.05 (The Icelandic Gigaword Corpus: News 1)

 
Clarin IS Repository
  Authors
Barkarson, Starkaður and Steingrímsson, Steinþór
  Item identifier
http://hdl.handle.net/20.500.12537/141
 Project URL
http://igc.arnastofnun.is
 Demo URL
https://malheildir.arnastofnun.is
 Referenced by
https://www.aclweb.org/anthology/L18-1690.pdf
 Date issued
2021-09-30
 Type
corpus, text
 Size
20,952,712 sentences, 354,459,688 words, 390,196,047 tokens
 Language(s)
Icelandic
 Description
IGC-News1 is part of the IGC project (the Icelandic Gigaword Corpus, Íslenska risamálheildin), which aims to collect as much Icelandic text as possible that can be published under an open or restricted license. IGC-News1 contains texts from news media. IGC-News1 is published under a CC BY license, while IGC-News2 is published under a restricted license. The corpus comes in two formats: one contains the texts untokenized and untagged, while the other has been tokenized, POS-tagged, and lemmatized. The corpus file is temporarily available at https://repository.clarin.is/opin_gogn/documents/IGC-News1-21.05.zip but will later be uploaded to the repository.
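The subject keywords indicate that the tokenized, POS-tagged, and lemmatized version is TEI-encoded. As a minimal sketch, assuming TEI P5 markup in which each word is a `<w>` element carrying `lemma` and `pos` attributes and punctuation is a `<pc>` element, the (form, lemma, tag) triples could be read with Python's standard library as shown below. The element and attribute names, the hypothetical function `read_tagged_tokens`, and the sample sentence with its tags are illustrative assumptions; check them against the actual files in the archive before relying on this.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; assumed to be declared on the root element of each file.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def read_tagged_tokens(xml_text):
    """Return (form, lemma, pos) triples for every <w> and <pc> element.

    Hypothetical helper. Assumes words are <w lemma="..." pos="..."> and
    punctuation is <pc>; the real IGC attribute names may differ.
    """
    root = ET.fromstring(xml_text)
    tokens = []
    # iter() walks the tree in document order, so token order is preserved.
    for elem in root.iter():
        if elem.tag in (TEI_NS + "w", TEI_NS + "pc"):
            form = (elem.text or "").strip()
            # Fall back to the surface form when no lemma is given (e.g. punctuation).
            lemma = elem.get("lemma", form)
            pos = elem.get("pos")  # None if the element has no pos attribute
            tokens.append((form, lemma, pos))
    return tokens


# Illustrative sample only; not taken from the corpus.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p><s>
<w lemma="hundur" pos="nkeng">Hundurinn</w>
<w lemma="gelta" pos="sfg3eþ">gelti</w>
<pc>.</pc>
</s></p></body></text></TEI>"""

for form, lemma, pos in read_tagged_tokens(sample):
    print(form, lemma, pos)
```

For a file on disk, the same function can be called on the decoded file contents, e.g. `read_tagged_tokens(open(path, encoding="utf-8").read())`; for very large files, a streaming parser such as `ET.iterparse` would avoid loading the whole tree into memory.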
 Publisher
The Árni Magnússon Institute for Icelandic Studies
 Acknowledgement

Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið)

Project code: Language Technology for Icelandic 2019-2023

Project name: Icelandic Gigaword Corpus (G1)

 Subject(s)
corpora, news, POS-tagged, lemmatized, TEI
 Collection(s)
Clarin IS
 
This item is replaced by newer submissions:
http://hdl.handle.net/20.500.12537/237
http://hdl.handle.net/20.500.12537/236
 
 

Partners, Coordination, Funding

  • Arni Magnusson Institute for Icelandic Studies
  • Ministry of Culture and Business Affairs

CLARIN-IS is fully supported by the Ministry of Culture and Business Affairs

Copyright (c) 2023. Arni Magnusson Institute for Icelandic Studies. All rights reserved.