dc.contributor.author Símonarson, Haukur Barri
dc.contributor.author Jónsson, Haukur Páll
dc.contributor.author Ragnarsson, Pétur Orri
dc.contributor.author Ingólfsdóttir, Svanhvít Lilja
dc.contributor.author Þorsteinsson, Vilhjálmur
dc.contributor.author Snæbjarnarson, Vésteinn
dc.date.accessioned 2022-09-23T09:19:45Z
dc.date.available 2022-09-23T09:19:45Z
dc.date.issued 2022-09-19
dc.identifier.uri http://hdl.handle.net/20.500.12537/259
dc.description This bidirectional Icelandic-Polish translation model was trained with fairseq (https://github.com/facebookresearch/fairseq) using semi-supervised translation, starting from the mBART50 model. Training followed a multi-task curriculum with three tasks: the model first learned to denoise sentences, then to translate using aligned parallel texts, and finally it was given monolingual texts in both Icelandic and Polish, from which it iteratively created back-translations during training. For the PL-IS direction the model achieves a BLEU score of 27.60 on true parallel data held out from training and 15.30 on the out-of-domain Flores devset. For the IS-PL direction it achieves a BLEU score of 27.70 on the held-out true parallel data and 13.30 on the Flores devset.
dc.language.iso isl
dc.language.iso pol
dc.publisher Miðeind ehf
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://velthyding.is
dc.subject nmt
dc.subject machine translation
dc.subject model
dc.subject neural machine translation
dc.title Semi-supervised Icelandic-Polish Translation System (22.09)
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding Clarin IS Repository
demo.uri https://velthyding.is
contact.person Haukur Barri Símonarson <haukur@mideind.is>, Miðeind ehf
sponsor Ministry of Education, Science and Culture; Baseline for open-source multilingual translation to and from Icelandic - V6; Language Technology for Icelandic 2019-2023; nationalFunds
files.size 5285141115
files.count 7
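
The back-translation step mentioned in dc.description can be illustrated with a minimal sketch. This is not the authors' training code, and the record does not describe their fairseq setup. The function names and the stand-in translators below are hypothetical and only show how monolingual text is turned into synthetic sentence pairs for each direction.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(monolingual_target: List[str],
                   reverse_translate: Callable[[str], str]) -> List[Pair]:
    """Pair each real target sentence with a synthetic source sentence.

    The synthetic source comes from translating the real target sentence
    with the model for the opposite direction.
    """
    return [(reverse_translate(sentence), sentence)
            for sentence in monolingual_target]


def back_translation_round(mono_is: List[str],
                           mono_pl: List[str],
                           translate_is_pl: Callable[[str], str],
                           translate_pl_is: Callable[[str], str]):
    """Build one round of synthetic training pairs for both directions."""
    # Real Icelandic targets with synthetic Polish sources (PL-IS training pairs).
    pl_is_pairs = back_translate(mono_is, translate_is_pl)
    # Real Polish targets with synthetic Icelandic sources (IS-PL training pairs).
    is_pl_pairs = back_translate(mono_pl, translate_pl_is)
    return pl_is_pairs, is_pl_pairs


if __name__ == "__main__":
    # Hypothetical stand-ins for the current model; a real system would
    # call the translation model here instead of tagging the string.
    fake_is_pl = lambda s: "[pl] " + s
    fake_pl_is = lambda s: "[is] " + s
    pl_is, is_pl = back_translation_round(
        ["Ég vil kaupa bát."], ["Chcę kupić łódź."], fake_is_pl, fake_pl_is)
    print(pl_is)
    print(is_pl)
```

In an iterative scheme, the pairs from one round would be used to update the model before the next round, so the synthetic sources improve as the model does.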


Files in this item

All files in item: 4.92 GB
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).

Name: infer_is_pl.sh
Size: 446 bytes
Format: Unknown
Description: Unknown
MD5: 6f5a2b7e131da0f72421f0f93583d08c

Name: requirements.txt
Size: 37 bytes
Format: Text file
Description: Unknown
MD5: a8ead855a1c56135523174631c5de703
File preview:
    fairseq=0.12.2
    sentencepiece==0.1.97

Name: some_is_sentences.txt
Size: 66 bytes
Format: Text file
Description: Unknown
MD5: 6077ba6ea0246d7b913bb59123ba7e60
File preview:
    Ég vil kaupa bát.
    Hann verður gulur og blár.
    Enginn flottari.

Name: infer_pl_is.sh
Size: 446 bytes
Format: Unknown
Description: Unknown
MD5: 0e90de262bc7217e76cece2168def572

Name: sentence.bpe.model
Size: 4.83 MB
Format: Unknown
Description: Unknown
MD5: bf25eb5120ad92ef5c7d8596b5dc4046

Name: README
Size: 2.11 KB
Format: Unknown
Description: Unknown
MD5: 6179bf91f300f95dbefea6afce67f3c8

Name: model_ispl.pt.zip
Size: 4.92 GB
Format: application/zip
Description: Unknown
MD5: 2f7f6f392ef5ecf9447456cd4c0c85d4
File preview:
    model_ispl.pt (8 GB)
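
The MD5 values listed for each file can be used to check a download. The following is a minimal sketch using only the Python standard library. The checksums are copied from the listing in this record, while the function names and the assumption that all files sit in one local directory are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "infer_is_pl.sh": "6f5a2b7e131da0f72421f0f93583d08c",
    "requirements.txt": "a8ead855a1c56135523174631c5de703",
    "some_is_sentences.txt": "6077ba6ea0246d7b913bb59123ba7e60",
    "infer_pl_is.sh": "0e90de262bc7217e76cece2168def572",
    "sentence.bpe.model": "bf25eb5120ad92ef5c7d8596b5dc4046",
    "README": "6179bf91f300f95dbefea6afce67f3c8",
    "model_ispl.pt.zip": "2f7f6f392ef5ecf9447456cd4c0c85d4",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks matters mainly for model_ispl.pt.zip, which is too large to load into memory comfortably in one call.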
