dc.contributor.author | Símonarson, Haukur Barri |
dc.contributor.author | Jónsson, Haukur Páll |
dc.contributor.author | Ragnarsson, Pétur Orri |
dc.contributor.author | Ingólfsdóttir, Svanhvít Lilja |
dc.contributor.author | Þorsteinsson, Vilhjálmur |
dc.contributor.author | Snæbjarnarson, Vésteinn |
dc.date.accessioned | 2022-09-23T09:19:45Z |
dc.date.available | 2022-09-23T09:19:45Z |
dc.date.issued | 2022-09-19 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/259 |
dc.description | This bidirectional Icelandic-Polish translation model was trained with fairseq (https://github.com/facebookresearch/fairseq) using semi-supervised translation, starting from the mBART50 model. Training followed a multi-task curriculum: the model first learned to denoise sentences, then to translate from aligned parallel texts, and finally received monolingual texts in both Icelandic and Polish, from which it iteratively created back-translations during training. For the PL-IS direction the model achieves a BLEU score of 27.60 on held-out true parallel data and 15.30 on the out-of-domain Flores devset. For the IS-PL direction it achieves a BLEU score of 27.70 on the held-out true parallel data and 13.30 on the Flores devset. |
dc.language.iso | isl |
dc.language.iso | pol |
dc.publisher | Miðeind ehf |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://velthyding.is |
dc.subject | nmt |
dc.subject | machine translation |
dc.subject | model |
dc.subject | neural machine translation |
dc.title | Semi-supervised Icelandic-Polish Translation System (22.09) |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | Clarin IS Repository |
demo.uri | https://velthyding.is |
contact.person | Haukur Barri Símonarson haukur@mideind.is Miðeind ehf |
sponsor | Ministry of Education, Science and Culture Baseline for open-source multilingual translation to and from Icelandic - V6 Language Technology for Icelandic 2019-2023 nationalFunds |
files.size | 5285141115 |
files.count | 7 |
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name | Size | Format | Description | MD5 |
infer_is_pl.sh | 446 bytes | Unknown | Unknown | 6f5a2b7e131da0f72421f0f93583d08c |
requirements.txt | 37 bytes | Text file | Unknown | a8ead855a1c56135523174631c5de703 |
some_is_sentences.txt | 66 bytes | Text file | Unknown | 6077ba6ea0246d7b913bb59123ba7e60 |
infer_pl_is.sh | 446 bytes | Unknown | Unknown | 0e90de262bc7217e76cece2168def572 |
sentence.bpe.model | 4.83 MB | Unknown | Unknown | bf25eb5120ad92ef5c7d8596b5dc4046 |
README | 2.11 KB | Unknown | Unknown | 6179bf91f300f95dbefea6afce67f3c8 |
model_ispl.pt.zip | 4.92 GB | application/zip | Unknown | 2f7f6f392ef5ecf9447456cd4c0c85d4 |
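Because the model archive is several gigabytes, it can be worth checking the downloaded files against the MD5 values listed above before unpacking. The following is a minimal sketch, not part of the item itself: the file names and digests are copied from the listing, while the helper names `md5_of` and `verify` and the assumption that all files sit in one local directory are illustrative only.

```python
import hashlib
from pathlib import Path

# File names and MD5 digests copied from the file listing of this item.
EXPECTED_MD5 = {
    "infer_is_pl.sh": "6f5a2b7e131da0f72421f0f93583d08c",
    "requirements.txt": "a8ead855a1c56135523174631c5de703",
    "some_is_sentences.txt": "6077ba6ea0246d7b913bb59123ba7e60",
    "infer_pl_is.sh": "0e90de262bc7217e76cece2168def572",
    "sentence.bpe.model": "bf25eb5120ad92ef5c7d8596b5dc4046",
    "README": "6179bf91f300f95dbefea6afce67f3c8",
    "model_ispl.pt.zip": "2f7f6f392ef5ecf9447456cd4c0c85d4",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify("path/to/download")` (a hypothetical directory) returns one status per file; any `mismatch` suggests a corrupted or incomplete download. MD5 is used here only because it is the digest the repository publishes; it detects transfer errors, not deliberate tampering.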