dc.contributor.author | Sigurðardóttir, Helga Svala |
dc.contributor.author | Helgadóttir, Inga Rún |
dc.date.accessioned | 2020-09-14T12:03:39Z |
dc.date.available | 2020-09-14T12:03:39Z |
dc.date.issued | 2020-09-14 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/52 |
dc.description | A Python package that punctuates Icelandic text. The input is unpunctuated text, and punctuated text is returned. The user can choose between two punctuation models: a BERT-based Transformer and a bidirectional RNN ([Punctuator 2](www.github.com/ottokart/punctuator2)) in TensorFlow 2. |
dc.language.iso | isl |
dc.publisher | Reykjavík University |
dc.relation.replaces | http://hdl.handle.net/20.500.12537/49 |
dc.rights | The MIT License (MIT) |
dc.rights.uri | https://opensource.org/licenses/mit-license.php |
dc.rights.label | PUB |
dc.source.uri | http://github.com/cadia-lvl/punctuation-prediction |
dc.subject | punctuation prediction |
dc.subject | punctuation |
dc.title | Punctuation model (20.09) |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Helga Svala Sigurðardóttir helgas@ru.is Reykjavík University |
sponsor | Ministry of Education, Science and Culture Punctuation and sentence boundary detection (H12) Language Technology for Icelandic 2019-2023 nationalFunds |
files.size | 166751710 |
files.count | 8 |
Files in this item
- Name
- Model_tf2_isl_big_1009_h256_lr0.02.pcl
- Size
- 105.43 MB
- Format
- Unknown
- Description
- A bidirectional RNN model.
- MD5
- eac231f06d311be7764fefea93154ad4
- Name
- vocabulary
- Size
- 1.09 MB
- Format
- Unknown
- Description
- A list of the vocabulary used to train the biRNN model.
- MD5
- 8f8f19b584ba5e09a6558ea22305fb0c
- Name
- punctuations
- Size
- 35 bytes
- Format
- Unknown
- Description
- A list of the punctuation marks used to train the biRNN model.
- MD5
- e0661aff981fb8082ce316e091fe048e
- Name
- pytorch_model.bin
- Size
- 52.26 MB
- Format
- Unknown
- Description
- The Electra model
- MD5
- 70f7791cb759674d2dda1665d160a1cf
- Name
- vocab.txt
- Size
- 254.19 KB
- Format
- Text file
- Description
- A list of the vocabulary used in the Electra model
- MD5
- 50d2b6ba7979f6d0f4b17bc27854af37
- Name
- config.json
- Size
- 658 bytes
- Format
- Unknown
- Description
- Configuration file for the Electra model
- MD5
- b4f432f1e85789e98ada42e2d3fcb421
- Name
- tokenizer_config.json
- Size
- 210 bytes
- Format
- Unknown
- Description
- Configuration file for the tokenizer in the Electra model
- MD5
- 82a0aa0e14596b79da6dceb9c7bc3d98
- Name
- special_tokens_map.json
- Size
- 112 bytes
- Format
- Unknown
- Description
- A map of special tokens for the Electra model
- MD5
- 8b3fb1023167bb4ab9d70708eb05f6ec
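The MD5 values listed above can be used to check that each file downloaded intact. The following is a minimal sketch, not part of the package itself: the helper names `md5_of` and `verify_downloads` and the directory name `punctuation-model` are hypothetical, and the file names and digests are copied from the listing above.

```python
import hashlib
from pathlib import Path

# File names and MD5 digests copied from the file listing of this item.
EXPECTED_MD5 = {
    "Model_tf2_isl_big_1009_h256_lr0.02.pcl": "eac231f06d311be7764fefea93154ad4",
    "vocabulary": "8f8f19b584ba5e09a6558ea22305fb0c",
    "punctuations": "e0661aff981fb8082ce316e091fe048e",
    "pytorch_model.bin": "70f7791cb759674d2dda1665d160a1cf",
    "vocab.txt": "50d2b6ba7979f6d0f4b17bc27854af37",
    "config.json": "b4f432f1e85789e98ada42e2d3fcb421",
    "tokenizer_config.json": "82a0aa0e14596b79da6dceb9c7bc3d98",
    "special_tokens_map.json": "8b3fb1023167bb4ab9d70708eb05f6ec",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "punctuation-model" is a hypothetical download directory.
    for name, status in verify_downloads("punctuation-model").items():
        print(f"{status:9s} {name}")
```

Reading in chunks keeps memory use low for the larger model files. A `mismatch` status usually means the download was interrupted and the file should be fetched again.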