dc.contributor.author Sigurðardóttir, Helga Svala
dc.date.accessioned 2021-10-25T16:39:28Z
dc.date.available 2021-10-25T16:39:28Z
dc.date.issued 2021-10-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/158
dc.description A corpus of:
* 70,000 sentences taken from general text (general news in the Icelandic version of this description), both before normalization and normalized using the Regína normalizer
* 70,000 sentences taken from sports news, both before normalization and normalized using the Regína normalizer
* 40,000 sentences taken from all domains, manually normalized
dc.language.iso isl
dc.publisher Reykjavik University
dc.relation.isreferencedby https://aclanthology.org/2021.nodalida-main.45.pdf
dc.relation.replaces http://hdl.handle.net/20.500.12537/155
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/cadia-lvl/regina_normalizer
dc.subject text-normalization
dc.subject normalization
dc.title Text Normalization Corpus 21.10 (2021-10-25)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Helga Svala Sigurðardóttir; helgas@ru.is; Reykjavik University
sponsor Ministry of Education, Science and Culture; Text Normalization Corpus (T9); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 6 files
files.size 17798265
files.count 1


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name: NormalizationCorpus21.10.zip
Size: 16.97 MB
Format: application/zip
Description: Normalization Corpus
MD5: 50e4a0f50f281de9d1e0c7a793421871
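
The MD5 value listed for the archive can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_download` are hypothetical helpers introduced for illustration, and the local path of the downloaded file is an assumption.

```python
import hashlib

# MD5 value as listed in this record for NormalizationCorpus21.10.zip.
EXPECTED_MD5 = "50e4a0f50f281de9d1e0c7a793421871"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved under its listed name in the working directory, `verify_download("NormalizationCorpus21.10.zip")` should return `True` for an uncorrupted download.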
 File Preview  
  • NormalizationCorpus21.10
    • README.md
    • normalized_manual.txt
    • normalized_other_aligned.txt
    • original_other_aligned.txt
    • normalized_sport_aligned.txt
    • original_manual.txt
    • original_sport_aligned.txt
  • __MACOSX
    • ._NormalizationCorpus21.10
    • NormalizationCorpus21.10
      • ._original_manual.txt
      • ._original_other_aligned.txt
      • ._normalized_other_aligned.txt
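
The paired `original_*` and `normalized_*` files in the listing suggest that each original sentence has a normalized counterpart. The sketch below shows one way to iterate over such pairs. It rests on assumptions this record does not confirm: that the files are UTF-8 encoded, contain one sentence per line, and are aligned line by line (the README.md in the archive is the place to check). `read_aligned_pairs` is a hypothetical helper name.

```python
from itertools import zip_longest


def read_aligned_pairs(original_path, normalized_path, encoding="utf-8"):
    """Yield (original, normalized) sentence pairs from two files.

    Assumes one sentence per line and line-by-line alignment; raises
    ValueError if one file ends before the other.
    """
    with open(original_path, encoding=encoding) as orig, \
            open(normalized_path, encoding=encoding) as norm:
        for line_no, (o, n) in enumerate(zip_longest(orig, norm), start=1):
            if o is None or n is None:
                raise ValueError(f"files differ in length at line {line_no}")
            # Strip only the trailing newline so other whitespace is preserved.
            yield o.rstrip("\n"), n.rstrip("\n")
```

After extracting the archive, the generator could be called with, for example, `NormalizationCorpus21.10/original_sport_aligned.txt` and `NormalizationCorpus21.10/normalized_sport_aligned.txt`; the extraction directory is an assumption.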
