dc.contributor.author Sigurðardóttir, Helga Svala
dc.date.accessioned 2021-10-11T15:21:07Z
dc.date.available 2021-10-11T15:21:07Z
dc.date.issued 2021-10-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/155
dc.description A corpus of: * 70,000 sentences taken from general text (general news, per the Icelandic description), both before normalization and normalized using the Regína normalizer * 70,000 sentences taken from sports news, both before normalization and normalized using the Regína normalizer * 40,000 sentences taken from all domains, manually normalized
dc.language.iso isl
dc.publisher Reykjavik University
dc.relation.isreferencedby https://aclanthology.org/2021.nodalida-main.45.pdf
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/158
dc.rights Icelandic Gigaword Corpus
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-gigaword-corpus
dc.rights.label PUB
dc.source.uri https://github.com/cadia-lvl/regina_normalizer
dc.subject text-normalization
dc.subject normalization
dc.title Text Normalization Corpus 21.10
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Helga Svala Sigurðardóttir helgas@ru.is Reykjavik University
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið); Máltækniáætlun; nationalFunds
size.info 6 files
files.size 17796824
files.count 1
withdrawn.reason Duplication of http://hdl.handle.net/20.500.12537/158


 Files in this item

This item is Publicly Available and licensed under: Icelandic Gigaword Corpus
Name NormalizationCorpus21.10.zip
Size 16.97 MB
Format application/zip
Description NormalizationCorpus21.10
MD5 04dfd3cb59ec0dbd51caba6c01b0ee39
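
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, not part of the original record: the local path `NormalizationCorpus21.10.zip` and the helper names `md5_of_file` and `matches_record` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 value as listed in this record.
EXPECTED_MD5 = "04dfd3cb59ec0dbd51caba6c01b0ee39"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive (hypothetical path).
    archive = Path("NormalizationCorpus21.10.zip")
    if archive.exists():
        print("checksum OK" if matches_record(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

MD5 is suitable here only as an integrity check against transfer errors, which is how the repository presents it.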
 File Preview  
  • NormalizationCorpus21.10
    • normalized_other_aligned.txt
    • normalized_manual.txt
    • README.md
    • original_other_aligned.txt
    • original_manual.txt
    • normalized_sport_aligned.txt
    • original_sport_aligned.txt
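
The file names above suggest that each `original_*` file pairs with a `normalized_*` file. The following sketch reads such a pair side by side. It rests on assumptions this record does not confirm: that the files are UTF-8, hold one sentence per line, and that line i in the original file corresponds to line i in the normalized file. The README.md in the archive should be consulted for the actual layout. The function name `read_aligned_pairs` and the extraction directory are hypothetical.

```python
from pathlib import Path


def read_aligned_pairs(original_path, normalized_path, encoding="utf-8"):
    """Return (original, normalized) sentence pairs from two files.

    Assumes one sentence per line and line-by-line alignment;
    this layout is an assumption, not stated in the record.
    """
    with open(original_path, encoding=encoding) as src:
        original = [line.rstrip("\n") for line in src]
    with open(normalized_path, encoding=encoding) as tgt:
        normalized = [line.rstrip("\n") for line in tgt]
    if len(original) != len(normalized):
        raise ValueError(
            f"line counts differ: {len(original)} vs {len(normalized)}"
        )
    return list(zip(original, normalized))


if __name__ == "__main__":
    # Assumed directory where the archive was extracted (hypothetical).
    root = Path("NormalizationCorpus21.10")
    if root.is_dir():
        pairs = read_aligned_pairs(
            root / "original_sport_aligned.txt",
            root / "normalized_sport_aligned.txt",
        )
        print(len(pairs), "sentence pairs read")
```

Raising an error on differing line counts is a deliberate choice: a silent truncation by `zip` would hide a misaligned or partially extracted file.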
