dc.contributor.author Jasonarson, Atli
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Sigurðsson, Einar Freyr
dc.contributor.author Daðason, Jón Friðrik
dc.date.accessioned 2022-12-05T11:09:52Z
dc.date.available 2022-12-05T11:09:52Z
dc.date.issued 2022-12-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/301
dc.description This Universal Dependencies (UD) parser for Icelandic was trained with COMBO [1]. This version was trained on v2.11 of UD_Icelandic-IcePaHC [2] and UD_Icelandic-Modern [3]. The texts in UD_Icelandic-Modern [3] labeled RUV_TGS_2017 and RUV_ESP_2017 were excluded from training because they were originally parsed with COMBO-based UD Parser 22.10 [4] and the output was subsequently corrected. The parser uses information from an ELECTRA language model [5]. Its unlabeled attachment score (UAS) is 88.80 (89.00 on a pre-tokenized text file) and its labeled attachment score (LAS) is 85.52 (85.71 on a pre-tokenized text file).
[1] COMBO: https://gitlab.clarin-pl.eu/syntactic-tools/combo/
[2] UD_Icelandic-IcePaHC: https://github.com/UniversalDependencies/UD_Icelandic-IcePaHC/
[3] UD_Icelandic-Modern: https://github.com/UniversalDependencies/UD_Icelandic-Modern/
[4] COMBO-based UD Parser 22.10: http://hdl.handle.net/20.500.12537/272
[5] electra-base-igc-is: https://huggingface.co/jonfd/electra-base-igc-is
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.replaces http://hdl.handle.net/20.500.12537/272
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/license/apache2-0-php/
dc.rights.label PUB
dc.subject universal dependencies
dc.subject parsing
dc.title COMBO-based UD Parser for Icelandic 22.12
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) I5 – Parsers Language Technology for Icelandic 2019-2023 nationalFunds
files.size 1274449053
files.count 6
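
The UAS and LAS figures in the description can be illustrated with a minimal sketch. It assumes the gold and predicted tokens are already aligned one-to-one (the pre-tokenized case); the record does not say which evaluation script produced the reported scores, and the example sentence, heads, and labels below are hypothetical.

```python
from typing import List, Tuple

# One token is (head_index, dependency_label). All values here are hypothetical.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over one-to-one aligned tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token sequences must be aligned")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS counts tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * full_hits / n


gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=75.00 LAS=50.00
```

When the parser tokenizes the text itself, system and gold tokens may not line up, which is one plausible reason the scores on raw text are slightly lower than on pre-tokenized input.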


 Files in this item

All files in this item total 1.19 GB. The item is publicly available and licensed under the Apache License 2.0.

Name: requirements.txt
Size: 64 bytes
Format: Text file
Description: requirements.txt
MD5: be0282df09b0db5a3027501d0b8844dc
File preview:
allennlp==2.10.1
combo==0.1.3
diaparser==1.1.2
Tokenizer==3.4.2

Name: test_file.txt
Size: 103 bytes
Format: Text file
Description: File for usage example
MD5: e826929ab7cdf131075abb15f710b634
File preview:
Komið þið sæl.
Þetta skjal er ætlað til að sýna hvernig þáttarinn virkar.
Njótið dagsins.

Name: parse_file.py
Size: 1.33 KB
Format: Unknown
Description: Usage example
MD5: 0bc41452dbf9f674b67c41ea78c2ea11

Name: combo-is.zip
Size: 432.27 MB
Format: application/zip
Description: The parser itself
MD5: 8ca428cb10ec8054d7d46b577698ce61
File preview:
  • combo-is-combined-v211
    • best.th (466 MB)
    • metrics.json (3 kB)
    • config.json (9 kB)
    • vocabulary
      • feats_labels.txt (565 B)
      • non_padded_namespaces.txt (12 B)
      • xpostag_labels.txt (1 kB)
      • deprel_labels.txt (240 B)
      • .lock (0 B)
      • lemma_characters.txt (229 B)
      • token_characters.txt (260 B)
      • upostag_labels.txt (77 B)

Name: electra.zip
Size: 783.13 MB
Format: application/zip
Description: The transformer model
MD5: 388275369c97baa9ffac333386b5bcc4
File preview:
  • transformer_models
    • electra-base-igc-is
      • config.json (466 B)
      • README.md (666 B)
      • tokenizer_config.json (73 B)
      • special_tokens_map.json (112 B)
      • .git
        • logs
        • info
          • exclude (240 B)
        • config (304 B)
        • index (625 B)
        • packed-refs (112 B)
        • HEAD (21 B)
        • refs
        • description (73 B)
        • hooks
          • post-commit (278 B)
          • push-to-checkout.sample (2 kB)
          • applypatch-msg.sample (478 B)
          • pre-push.sample (1 kB)
          • commit-msg.sample (896 B)
          • pre-rebase.sample (4 kB)
          • post-checkout (282 B)
          • post-update.sample (189 B)
          • pre-receive.sample (544 B)
          • pre-push (272 B)
          • pre-applypatch.sample (424 B)
          • update.sample (3 kB)
          • pre-commit.sample (1 kB)
          • pre-merge-commit.sample (416 B)
          • fsmonitor-watchman.sample (4 kB)
          • post-merge (276 B)
          • prepare-commit-msg.sample (1 kB)
        • objects
          • 0a
            • 436181c565848a6acde7a8d56b3d0083065d4d (127 B)
          • a1
            • febe62ff74744a3bdc90765101a93f8165f96c (446 B)
          • 01
            • 3c0d5067a7209f20d1483e98daf266743c3716 (265 B)
          • info
            • cc
              • af6b68a21f6293f20e96decf431584115c3206 (126 B)
            • e7
              • b0375001f109a6b8873d756ad4f7bbb15fbaa5 (82 B)
            • 21
              • b29ab864793dc86d157902941b2ff4bbe2bbca (58 B)
            • b6
              • e1921af19d17e863490c057c473d1bbe5ece81 (79 B)
            • e2
              • 921de06b441e2a3066da485d6fa31cf5c816a8 (170 B)
            • 42
              • a65ff035a31364a5df021edbca71fc835f8f53 (133 kB)
            • pack
              • d8
                • 8c1c6bf57ec076a4c43dac202f23f71d6cbdad (277 B)
              • 6d
                • 34772f5ca361021038b404fb913ec8dc0b1a5a (193 B)
            • branches
              • lfs
            • pytorch_model.bin (422 MB)
            • .gitattributes (1 kB)
            • vocab.txt (253 kB)

Name: README.md
Size: 5.23 KB
Format: Unknown
Description: readme
MD5: 9c3780b884fd4b06ffc2158ebf2db2c3
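
The MD5 checksums in the file listing above can be used to check downloads. The following is a minimal sketch, assuming the six files were saved under their listed names in a single directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of the item.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "requirements.txt": "be0282df09b0db5a3027501d0b8844dc",
    "test_file.txt": "e826929ab7cdf131075abb15f710b634",
    "parse_file.py": "0bc41452dbf9f674b67c41ea78c2ea11",
    "combo-is.zip": "8ca428cb10ec8054d7d46b577698ce61",
    "electra.zip": "388275369c97baa9ffac333386b5bcc4",
    "README.md": "9c3780b884fd4b06ffc2158ebf2db2c3",
}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so the large archives are never fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> dict:
    """Map each listed file to True/False, or None if it is not present."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


# Check the current working directory (assumed download location).
for name, ok in verify_downloads(Path(".")).items():
    print(f"{name}: {'missing' if ok is None else 'ok' if ok else 'MISMATCH'}")
```

A mismatch usually indicates an incomplete or corrupted download, so re-downloading the affected file is the first thing to try.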
