dc.contributor.author Steingrímsson, Steinþór
dc.date.accessioned 2021-10-01T08:22:19Z
dc.date.available 2021-10-01T08:22:19Z
dc.date.issued 2021-10-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/152
dc.description This is a training set for a classifier that uses one or more of the following scores to separate good-quality parallel segments from poorer ones in comparable corpora, or to filter parallel corpora: LASER, LaBSE and WAScore. The package contains 51,000 sentence pairs from the news domain. Of these, 1,000 are true parallel sentence pairs; in the remaining 50,000 pairs, the sentences are randomly selected from news corpora.
dc.language.iso isl
dc.language.iso eng
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.subject comparable corpora
dc.subject parallel corpora
dc.subject parallel data
dc.subject machine translation
dc.subject filtering
dc.subject classification
dc.subject labse
dc.subject laser
dc.subject wascore
dc.title Icelandic-English Classification Training Set for Parallel Sentence Alignment Filtering (21.10)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
size.info 51000 sentence pairs
files.size 4866874 bytes
files.count 1
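The record does not say which classifier the scores are meant to feed. As a minimal sketch, assuming each sentence pair has already been reduced to a vector of its three scores (LASER, LaBSE, WAScore) with label 1 for a true parallel pair and 0 for a randomly paired one, a class-weighted logistic regression could be trained as below. Logistic regression, the feature layout, and the names `train_logistic` and `predict` are assumptions for illustration; they are not the author's method or the format of `sentence-scores.txt`. The positive class is weighted by 50 to offset the 1,000 : 50,000 imbalance described above.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (scores, label), where scores = (laser, labse, wascore)
# and label is 1 for a true parallel pair, 0 for a randomly paired one.
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data: List[Example], epochs: int = 500, lr: float = 1.0,
                   pos_weight: float = 50.0) -> Tuple[List[float], float]:
    """Full-batch gradient descent on class-weighted log loss.

    pos_weight = 50.0 mirrors 1,000 positive vs. 50,000 random pairs,
    so both classes contribute equally to the loss.
    """
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    total = sum(pos_weight if y == 1 else 1.0 for _, y in data)
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # d(weighted log loss)/dz = weight * (p - y)
            err = (pos_weight if y == 1 else 1.0) * (p - y)
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the weighted gradient and take one step
        w = [wi - lr * g / total for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float],
            threshold: float = 0.5) -> int:
    """Return 1 if the pair is predicted to be parallel, else 0."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return int(p >= threshold)
```

Usage: `w, b = train_logistic(examples)` followed by `predict(w, b, (laser, labse, wascore))`. This pure-Python loop is only meant to show the technique; for all 51,000 pairs, an optimized library such as scikit-learn's `LogisticRegression` with `class_weight="balanced"` would be the practical choice.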


Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Name: parallel-sentence-classification.zip
Size: 4.64 MB
Format: application/zip
Description: Unknown
MD5: e4dfa4275b43f68cac2535beb08f4b0e

Archive contents:
    • sentence-scores.txt
    • README
    • EN-IS_sentence_pairs.txt
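The record publishes an MD5 checksum for the archive. A minimal sketch for checking a downloaded copy against it and listing the archive's members, using only Python's standard `hashlib` and `zipfile` modules; the function names are illustrative, and the local file name is assumed to match the one in the record.

```python
import hashlib
import zipfile
from pathlib import Path
from typing import List

# Checksum published in the record for parallel-sentence-classification.zip
EXPECTED_MD5 = "e4dfa4275b43f68cac2535beb08f4b0e"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


def list_members(path: Path) -> List[str]:
    """Names of the entries stored in the zip archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Usage: `verify_download(Path("parallel-sentence-classification.zip"))` returns `True` for an intact download, and `list_members(...)` should show the three files listed above; the exact member paths inside the archive are not stated in the record.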
