
Text Normalization Corpus 21.10 (2021-10-25)

 
Clarin IS Repository
  Authors
Sigurðardóttir, Helga Svala
  Item identifier
http://hdl.handle.net/20.500.12537/158
 Project URL
https://github.com/cadia-lvl/regina_normalizer
 Referenced by
https://aclanthology.org/2021.nodalida-main.45.pdf
 Date issued
2021-10-01
 Type
corpus, text
 Size
6 files
 Language(s)
Icelandic
 Description
A corpus of:

  • 70,000 sentences taken from general text, both before normalization and after normalization with the Regína normalizer
  • 70,000 sentences taken from sports news, both before normalization and after normalization with the Regína normalizer
  • 40,000 sentences taken from all domains, manually normalized
 Publisher
Reykjavik University
 Acknowledgement

Ministry of Education, Science and Culture

Project code: Text Normalization Corpus (T9)

Project name: Language Technology for Icelandic 2019-2023

 Subject(s)
text-normalization, normalization
 Collection(s)
Clarin IS

CLARIN-IS is fully supported by the Ministry of Education, Science and Culture.

Copyright (c) 2019 Stofnun Árna Magnússonar. All rights reserved.