dc.contributor.author Guðjónsson, Ásmundur Alma
dc.contributor.author Loftsson, Hrafn
dc.contributor.author Daðason, Jón Friðrik
dc.date.accessioned 2021-11-23T02:26:11Z
dc.date.available 2021-11-23T02:26:11Z
dc.date.issued 2021
dc.identifier.uri http://hdl.handle.net/20.500.12537/159
dc.description A dockerized Named Entity Recognition (NER) API for Icelandic. It uses the IceBERT language model from Miðeind as its primary model, and can optionally run three other transformer language models alongside it (ELECTRA-base, convbert-small, and multilingual-BERT), combining the predictions of all four with CombiTagger. All four models were fine-tuned for NER on the MIM-GOLD-NER named entity corpus. IceBERT was the best individual model, with an F1 score of ~92.73 on the MIM-GOLD-NER test set, while the combination of the four models through CombiTagger achieved an F1 score of 93.21. The code for the API is available at https://github.com/cadia-lvl/Icelandic-NER-API and the files for the fine-tuned models are available in this submission.
dc.language.iso isl
dc.publisher Reykjavík University
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/license/apache2-0-php/
dc.rights.label PUB
dc.source.uri https://github.com/cadia-lvl/Icelandic-NER-API/releases/tag/1.9
dc.subject named entity recognition
dc.subject transformer
dc.subject webservice
dc.subject api
dc.title Icelandic NER API - Ensemble model (21.09)
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding Clarin IS Repository
demo.uri https://electra-ner-icelandic-gwafmrdfha-ez.a.run.app
contact.person Ásmundur Alma Guðjónsson asmundurg@gmail.com Reykjavík University
sponsor Ministry of Education, Science and Culture; Support tools: Named Entity Recognition (I7); Language Technology for Icelandic 2019-2023; nationalFunds
files.size 1602264849
files.count 1
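
The description above states that the outputs of the four fine-tuned models are combined through CombiTagger, but this record does not say which combination scheme is used. As an illustration only, the sketch below shows simple per-token majority voting, one common way to combine taggers. The function name `majority_vote`, the `priority` tie-breaking rule, and the example labels are hypothetical and are not taken from the API or from CombiTagger.

```python
from collections import Counter


def majority_vote(predictions, priority=0):
    """Combine per-token labels from several taggers by majority vote.

    predictions: list of label sequences, one per model, all the same length.
    priority: index of the model whose label wins a tie, provided that label
              is among the tied candidates (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("at least one label sequence is required")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all label sequences must have the same length")

    combined = []
    for labels in zip(*predictions):  # labels proposed for one token
        counts = Counter(labels)
        top = max(counts.values())
        winners = [label for label in counts if counts[label] == top]
        if labels[priority] in winners:
            combined.append(labels[priority])
        else:
            # Fall back to the tied label that was seen first.
            combined.append(winners[0])
    return combined


# Hypothetical per-token outputs of the four models for one sentence.
icebert = ["B-Person", "I-Person", "O", "B-Location"]
electra = ["B-Person", "O", "O", "B-Location"]
convbert = ["B-Person", "I-Person", "O", "O"]
mbert = ["O", "I-Person", "O", "B-Organization"]

combined = majority_vote([icebert, electra, convbert, mbert])
print(combined)
```

Placing the strongest individual model first and letting it break ties is a design choice of this sketch; it reflects the record's statement that IceBERT is the best single model, not documented behavior of the API.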


 Files in this item

This item is publicly available and licensed under the Apache License 2.0.
Name models.zip
Size 1.49 GB
Format application/zip
Description Unknown
MD5 5babea450aad376b5570ad62b493c425
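
Since the archive is about 1.49 GB, it can be worth checking a download against the MD5 value listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` and the local file name `models.zip` in the usage note are assumptions for illustration.

```python
import hashlib

# MD5 checksum listed for models.zip in this record.
EXPECTED_MD5 = "5babea450aad376b5570ad62b493c425"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

After downloading, `verify_download("models.zip")` should return `True` if the file arrived intact. MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.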
 File Preview  
  • model.multilingual
    • pytorch_model.bin (676 MB)
    • tokenizer_config.json (552 B)
    • test_results.txt (359 B)
    • merges.txt (581 kB)
    • config.json (1 kB)
    • vocab.json (1010 kB)
    • training_args.bin (2 kB)
    • vocab.txt (972 kB)
    • special_tokens_map.json (112 B)
    • eval_results.txt (378 B)
    • test_predictions.txt (903 kB)
  • model.ELECTRA
    • pytorch_model.bin (420 MB)
    • tokenizer_config.json (438 B)
    • test_results.txt (366 B)
    • config.json (1 kB)
    • training_args.bin (2 kB)
    • vocab.txt (253 kB)
    • special_tokens_map.json (112 B)
    • eval_results.txt (380 B)
    • test_predictions.txt (904 kB)
  • model.IceBERT
    • README.md (2 kB)
    • pytorch_model.bin (472 MB)
    • all_results.json (844 B)
    • tokenizer_config.json (1 kB)
    • merges.txt (581 kB)
    • train_results.json (198 B)
    • config.json (1 kB)
    • vocab.json (912 kB)
    • runs
    • predictions.txt (274 kB)
    • training_args.bin (2 kB)
    • predict_results.json (324 B)
    • tokenizer.json (1 MB)
    • special_tokens_map.json (772 B)
    • trainer_state.json (5 kB)
    • eval_results.json (345 B)
  • model.convbert-small
    • README.md (2 kB)
    • all_results.json (849 B)
    • pytorch_model.bin (82 MB)
    • tokenizer_config.json (436 B)
    • train_results.json (197 B)
    • config.json (1 kB)
    • runs
    • predictions.txt (273 kB)
    • training_args.bin (2 kB)
    • predict_results.json (327 B)
    • tokenizer.json (1 MB)
    • vocab.txt (875 kB)
    • special_tokens_map.json (112 B)
    • trainer_state.json (5 kB)
    • eval_results.json (348 B)
