dc.contributor.author | Jasonarson, Atli |
dc.contributor.author | Steingrímsson, Steinþór |
dc.contributor.author | Sigurðsson, Einar Freyr |
dc.contributor.author | Daðason, Jón Friðrik |
dc.date.accessioned | 2022-12-12T09:25:34Z |
dc.date.available | 2022-12-12T09:25:34Z |
dc.date.issued | 2022-12-01 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/302 |
dc.description | This Universal Dependencies (UD) parser for Icelandic was trained with Diaparser [1]. This version was trained on v2.11 of UD_Icelandic-IcePaHC [2] and UD_Icelandic-Modern [3]. The texts in UD_Icelandic-Modern [3] labeled RUV_TGS_2017 and RUV_ESP_2017 were excluded from training because they were originally parsed with COMBO-based UD Parser 22.10 [4] and the output was subsequently corrected. The parser uses information from an ELECTRA language model [5]. Its unlabeled attachment score (UAS) is 89.58 and its labeled attachment score (LAS) is 86.46. [1] Diaparser: https://github.com/Unipisa/diaparser [2] UD_Icelandic-IcePaHC: https://github.com/UniversalDependencies/UD_Icelandic-IcePaHC/ [3] UD_Icelandic-Modern: https://github.com/UniversalDependencies/UD_Icelandic-Modern/ [4] COMBO-based UD Parser 22.10: http://hdl.handle.net/20.500.12537/272 [5] electra-base-igc-is: https://huggingface.co/jonfd/electra-base-igc-is |
dc.language.iso | isl |
dc.publisher | The Árni Magnússon Institute for Icelandic Studies |
dc.relation.replaces | http://hdl.handle.net/20.500.12537/273 |
dc.rights | Apache License 2.0 |
dc.rights.uri | https://opensource.org/license/apache2-0-php/ |
dc.rights.label | PUB |
dc.subject | universal dependencies |
dc.subject | parsing |
dc.title | Biaffine-based UD Parser for Icelandic 22.12 |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies |
sponsor | Ministry of Education, Science and Culture (Mennta- og menningamálaráðuneytið); I5 – Parsers; Language Technology for Icelandic 2019-2023; nationalFunds |
files.size | 1281221820 |
files.count | 6 |
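
The description reports a UAS of 89.58 and an LAS of 86.46. As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed from gold and predicted dependency arcs. It is not the evaluation script used for this item; the function name `attachment_scores` and the toy arcs are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 conventionally denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token sequences.

    UAS counts tokens whose predicted head matches the gold head.
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    full_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


if __name__ == "__main__":
    # Toy four-token sentence (hypothetical data, not taken from the treebanks).
    gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
    pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
    uas, las = attachment_scores(gold, pred)
    print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In the toy example, three of four heads are correct and two of four arcs are correct in both head and label, so LAS can never exceed UAS.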
Files in this item
- Name
- parse_file.py
- Size
- 609 bytes
- Format
- Unknown
- Description
- Usage example
- MD5
- 8c05e9ea1eac7f05e322cacae0b12855
- Name
- requirements.txt
- Size
- 34 bytes
- Format
- Text file
- Description
- requirements
- MD5
- 3d6e6d73fa2dc7fe8f59bd559cc75f8b
- Name
- test_file.txt
- Size
- 103 bytes
- Format
- Text file
- Description
- File for usage example
- MD5
- e826929ab7cdf131075abb15f710b634
- Name
- diap.zip
- Size
- 438.74 MB
- Format
- application/zip
- Description
- The parser itself
- MD5
- 3b10b3f539aa179a773094bf1e52b6a6
- diaparser-is-combined-v211
- diaparser.train.log (120 kB)
- diaparser.model (474 MB)
- Name
- electra.zip
- Size
- 783.13 MB
- Format
- application/zip
- Description
- The transformer model
- MD5
- a94b9e389a64e58af1d2af01a5c56e69
- transformer_models
- electra-base-igc-is
- config.json (466 B)
- README.md (666 B)
- tokenizer_config.json (73 B)
- special_tokens_map.json (112 B)
- .git (Git repository metadata: logs, info, refs, hooks, objects, branches, lfs)
- pytorch_model.bin (422 MB)
- .gitattributes (1 kB)
- vocab.txt (253 kB)
- electra-base-igc-is
- Name
- README.md
- Size
- 2.91 KB
- Format
- Unknown
- Description
- readme
- MD5
- ff066b0557db17f9282cb59e8bd4ea6f
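
The MD5 values listed above can be used to check that the files downloaded intact. The following is a minimal sketch, assuming all six files were saved under their listed names in a single directory; the helper names `md5_of` and `verify` are hypothetical and not part of this item.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict, Optional

# Checksums copied from the file listing of this item.
EXPECTED_MD5: Dict[str, str] = {
    "parse_file.py": "8c05e9ea1eac7f05e322cacae0b12855",
    "requirements.txt": "3d6e6d73fa2dc7fe8f59bd559cc75f8b",
    "test_file.txt": "e826929ab7cdf131075abb15f710b634",
    "diap.zip": "3b10b3f539aa179a773094bf1e52b6a6",
    "electra.zip": "a94b9e389a64e58af1d2af01a5c56e69",
    "README.md": "ff066b0557db17f9282cb59e8bd4ea6f",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that the large zip archives are never held in memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> Dict[str, Optional[bool]]:
    """Map each expected file name to True (checksum matches),
    False (checksum differs), or None (file not found)."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    # Assumption: the download directory is passed as the first argument,
    # defaulting to the current directory.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, status in verify(target).items():
        label = {True: "OK", False: "MISMATCH", None: "MISSING"}[status]
        print(f"{label:8} {name}")
```

MD5 is adequate here for detecting corrupted or truncated downloads; it is not a guarantee against deliberate tampering.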