dc.contributor.author | Sigurðardóttir, Helga Svala |
dc.date.accessioned | 2021-10-25T16:39:28Z |
dc.date.available | 2021-10-25T16:39:28Z |
dc.date.issued | 2021-10-01 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/158 |
dc.description | A corpus of: * 70,000 sentences taken from general text, both before normalization and after normalization with the Regína normalizer * 70,000 sentences taken from sports news, both before normalization and after normalization with the Regína normalizer * 40,000 manually normalized sentences taken from all domains |
dc.language.iso | isl |
dc.publisher | Reykjavik University |
dc.relation.isreferencedby | https://aclanthology.org/2021.nodalida-main.45.pdf |
dc.relation.replaces | http://hdl.handle.net/20.500.12537/155 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/cadia-lvl/regina_normalizer |
dc.subject | text-normalization |
dc.subject | normalization |
dc.title | Text Normalization Corpus 21.10 (2021-10-25) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Helga Svala Sigurðardóttir, helgas@ru.is, Reykjavik University |
sponsor | Ministry of Education, Science and Culture; Text Normalization Corpus (T9); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info | 6 files |
files.size | 17798265 |
files.count | 1 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: NormalizationCorpus21.10.zip
- Size: 16.97 MB
- Format: application/zip
- Description: Normalization Corpus
- MD5: 50e4a0f50f281de9d1e0c7a793421871
- NormalizationCorpus21.10
  - README.md (1 B)
  - normalized_manual.txt (1 B)
  - normalized_other_aligned.txt (1 B)
  - original_other_aligned.txt (1 B)
  - normalized_sport_aligned.txt (1 B)
  - original_manual.txt (1 B)
  - original_sport_aligned.txt (1 B)
- __MACOSX
  - ._NormalizationCorpus21.10 (1 B)
  - NormalizationCorpus21.10
    - ._original_manual.txt (1 B)
    - ._original_other_aligned.txt (1 B)
    - ._normalized_other_aligned.txt (1 B)
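
The record publishes an MD5 value for the archive, and the file names suggest parallel original/normalized files. The following is a minimal Python sketch, not part of the record itself, of how a download might be checked against that MD5 value and how two such files might be read side by side. The local archive path, the helper names `md5_of_file` and `read_aligned_pairs`, the UTF-8 encoding, and the one-sentence-per-line alignment are all assumptions; the README.md in the archive is the authoritative description of the file layout.

```python
import hashlib
from pathlib import Path
from typing import Iterator, Tuple

# MD5 value listed in the record for NormalizationCorpus21.10.zip.
EXPECTED_MD5 = "50e4a0f50f281de9d1e0c7a793421871"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_aligned_pairs(original: Path, normalized: Path) -> Iterator[Tuple[str, str]]:
    """Yield (original, normalized) sentence pairs.

    Assumption (hypothetical, not stated in the record): both files are
    UTF-8 with one sentence per line, and line i of one file corresponds
    to line i of the other. Iteration stops at the end of the shorter file.
    """
    with original.open(encoding="utf-8") as src, normalized.open(encoding="utf-8") as tgt:
        for raw, norm in zip(src, tgt):
            yield raw.rstrip("\n"), norm.rstrip("\n")


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("NormalizationCorpus21.10.zip")
    if archive.exists():
        matches = md5_of_file(archive) == EXPECTED_MD5
        print("checksum matches" if matches else "checksum mismatch")
```

After unpacking, `read_aligned_pairs` could be pointed at, for example, `original_sport_aligned.txt` and `normalized_sport_aligned.txt`. MD5 is used here only because it is the checksum the record provides; it detects corrupted downloads but is not a safeguard against tampering.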