dc.contributor.author | Steingrímsson, Steinþór |
dc.date.accessioned | 2021-10-01T08:22:19Z |
dc.date.available | 2021-10-01T08:22:19Z |
dc.date.issued | 2021-10-01 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/152 |
dc.description | A training set for a classifier that uses one or more of the following scores to separate good-quality parallel segments from poorer ones in comparable corpora, or to filter parallel corpora: LASER, LaBSE and WAScore. The package contains 51,000 sentence pairs in the news domain. Of these, 1,000 are true parallel segments; the sentences in the remaining 50,000 pairs were randomly selected from news corpora. |
dc.language.iso | isl |
dc.language.iso | eng |
dc.publisher | The Árni Magnússon Institute for Icelandic Studies |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.subject | comparable corpora |
dc.subject | parallel corpora |
dc.subject | parallel data |
dc.subject | machine translation |
dc.subject | filtering |
dc.subject | classification |
dc.subject | labse |
dc.subject | laser |
dc.subject | wascore |
dc.title | Icelandic-English Classification Training Set for Parallel Sentence Alignment Filtering (21.10) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies |
size.info | 51000 sentences |
files.size | 4866874 |
files.count | 1 |
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Files in this item
- Name: parallel-sentence-classification.zip
- Size: 4.64 MB
- Format: application/zip
- Description: Unknown
- MD5: e4dfa4275b43f68cac2535beb08f4b0e
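
The size and MD5 values listed for the file can be used to check a download before unpacking it. The following is a minimal sketch, assuming the archive has been saved in the current directory under the name shown in the listing; the helper names `md5_of_file` and `verify_download` are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# Values copied from this record (files.size and the MD5 entry in the file listing).
EXPECTED_MD5 = "e4dfa4275b43f68cac2535beb08f4b0e"
EXPECTED_SIZE = 4866874  # bytes

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file has both the expected size and the expected MD5."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()

if __name__ == "__main__":
    # Assumption: the archive was saved locally under its listed name.
    archive = Path("parallel-sentence-classification.zip")
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is adequate here for detecting a truncated or corrupted download; it is not a defence against deliberate tampering.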