Punctuation model (20.09)

Sigurðardóttir, Helga Svala; Helgadóttir, Inga Rún

dc.contributor.author	Sigurðardóttir, Helga Svala
dc.contributor.author	Helgadóttir, Inga Rún
dc.date.accessioned	2020-09-14T12:03:39Z
dc.date.available	2020-09-14T12:03:39Z
dc.date.issued	2020-09-14
dc.identifier.uri	http://hdl.handle.net/20.500.12537/52
dc.description	A Python package that punctuates Icelandic text. It takes unpunctuated text as input and returns punctuated text. The user can choose between two punctuation models: a BERT-based Transformer and a bidirectional RNN ([Punctuator 2](www.github.com/ottokart/punctuator2)) in TensorFlow 2.
dc.language.iso	isl
dc.publisher	Reykjavík University
dc.relation.replaces	http://hdl.handle.net/20.500.12537/49
dc.rights	The MIT License (MIT)
dc.rights.uri	https://opensource.org/licenses/mit-license.php
dc.rights.label	PUB
dc.source.uri	https://github.com/icelandic-lt/punctuation-prediction
dc.subject	punctuation prediction
dc.subject	punctuation
dc.title	Punctuation model (20.09)
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
has.files	yes
branding	Clarin IS Repository
contact.person	Helga Svala Sigurðardóttir helgas@ru.is Reykjavík University
sponsor	Ministry of Education, Science and Culture; Punctuation and sentence boundary detection (H12); Language Technology for Icelandic 2019-2023; nationalFunds
files.size	166751710
files.count	8

Files in this item

This item is publicly available and licensed under The MIT License (MIT).

Name: Model_tf2_isl_big_1009_h256_lr0.02.pcl
Size: 105.43 MB
Format: Unknown
Description: A bidirectional RNN model.
MD5: eac231f06d311be7764fefea93154ad4

Name: vocabulary
Size: 1.09 MB
Format: Unknown
Description: A list of the vocabulary used to train the biRNN model.
MD5: 8f8f19b584ba5e09a6558ea22305fb0c

Name: punctuations
Size: 35 bytes
Format: Unknown
Description: A list of the punctuation marks used to train the biRNN model.
MD5: e0661aff981fb8082ce316e091fe048e

Name: pytorch_model.bin
Size: 52.26 MB
Format: Unknown
Description: The Electra model
MD5: 70f7791cb759674d2dda1665d160a1cf

Name: vocab.txt
Size: 254.19 KB
Format: Text file
Description: A list of the vocabulary used in the Electra model
MD5: 50d2b6ba7979f6d0f4b17bc27854af37

File Preview

[SEP]
[UNK]
[CLS]
[PAD]
[MASK]
[unused0]
[unused1]
[unused2]
[unused3]
[unused4]
[unused5]
[unused6]
[unused7]
[unused8]
[unused9]
[unused10]
[unused11]
[unused12]
[unused13]
[unused14]
[unused15]
[unused16]
[unused17]
[unused18]
[unused19]
[unused20]
[unused21]
[unused22]
[unused23]
[unused24]
[unused25]
[unused26]
[unused27]
[unused28]
[unused29]
[unused30]
[unused31]
[unused32]
[unused33]
[unused34]
[unused35]
[unused36]
[unused37]
[unused38]
[unused39]
[unused40]
[unused41]
[unused42]
[unused43]
[unused44]
[unused45]
[unused46]
[unused47]
[unused48]
[unused49]
[unused50]
[unused51]
[unused52]
[unused53]
[unused54]
[unused55]
[unused56]
[unused57]
[unused58]
[unused59]
[unused60]
[unused61]
[unused62]
[unused63]
[unused64]
[unused65]
[unused66]
[unused67]
[unused68]
[unused69]
[unused70]
[unused71]
[unused72]
[unused73]
[unused74]
[unused75]
[unused76]
[unused77]
[unused78]
[unused79]
[unused80]
[unused81]
[unused82]
[unused83]
[unused84]
[unused85]
[unused86]
[unused87]
[unused88] . . .
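
The preview above shows one token per line. As a minimal sketch, assuming the file follows the common BERT-style convention in which a token's ID is its zero-based line index (this record does not state that), the vocabulary could be read into a lookup table as follows. The function name `load_vocab` is hypothetical and is not part of the published package.

```python
def load_vocab(lines):
    """Map each token to its zero-based line index.

    Assumption: one token per line, ID equals line position.
    `lines` can be an open text file or any iterable of strings.
    """
    vocab = {}
    for index, line in enumerate(lines):
        token = line.rstrip("\n")
        vocab[token] = index
    return vocab


# The first six lines of the preview shown in this record.
preview = ["[SEP]", "[UNK]", "[CLS]", "[PAD]", "[MASK]", "[unused0]"]
vocab = load_vocab(preview)
print(vocab["[MASK]"])  # 4
```

For the downloaded file, the same function could be called as `load_vocab(open("vocab.txt", encoding="utf-8"))`.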

Name: config.json
Size: 658 bytes
Format: Unknown
Description: Configuration file for the Electra model
MD5: b4f432f1e85789e98ada42e2d3fcb421

Name: tokenizer_config.json
Size: 210 bytes
Format: Unknown
Description: Configuration file for the tokenizer in the Electra model
MD5: 82a0aa0e14596b79da6dceb9c7bc3d98

Name: special_tokens_map.json
Size: 112 bytes
Format: Unknown
Description: A map of special tokens for the Electra model
MD5: 8b3fb1023167bb4ab9d70708eb05f6ec
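
Each file above is listed with an MD5 checksum, which can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_downloads` are hypothetical, and the sketch assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "Model_tf2_isl_big_1009_h256_lr0.02.pcl": "eac231f06d311be7764fefea93154ad4",
    "vocabulary": "8f8f19b584ba5e09a6558ea22305fb0c",
    "punctuations": "e0661aff981fb8082ce316e091fe048e",
    "pytorch_model.bin": "70f7791cb759674d2dda1665d160a1cf",
    "vocab.txt": "50d2b6ba7979f6d0f4b17bc27854af37",
    "config.json": "b4f432f1e85789e98ada42e2d3fcb421",
    "tokenizer_config.json": "82a0aa0e14596b79da6dceb9c7bc3d98",
    "special_tokens_map.json": "8b3fb1023167bb4ab9d70708eb05f6ec",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {file name: True, False, or None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results
```

Calling `verify_downloads(".")` in the download directory returns one entry per file; any `False` value indicates a file whose contents do not match the checksum listed here.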
