Show simple item record

 
dc.contributor.author Þorsteinsson, Vilhjálmur
dc.date.accessioned 2020-01-14T09:45:05Z
dc.date.available 2020-01-14T09:45:05Z
dc.date.issued 2020-01-14
dc.identifier.uri http://hdl.handle.net/20.500.12537/11
dc.description Tokenizer is a compact pure-Python (2 and 3) executable program and module for tokenizing Icelandic text. It converts input text into a stream of tokens, where each token is a separate word, punctuation mark, number/amount, date, e-mail address, URL/URI, etc. It also segments the token stream into sentences, handling corner cases such as abbreviations and dates in the middle of sentences.
dc.language.iso isl
dc.publisher Miðeind ehf.
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/65
dc.rights The MIT License (MIT)
dc.rights.uri http://opensource.org/licenses/mit-license.php
dc.rights.label PUB
dc.source.uri https://github.com/mideind/Tokenizer/releases/tag/2.0.3
dc.subject tokenization
dc.subject token detection
dc.subject sentence detection
dc.title Tokenizer for Icelandic text (2.0.3)
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
hidden false
hasMetadata false
has.files yes
branding Clarin IS Repository
contact.person Vilhjálmur Þorsteinsson vthorsteinsson@mideind.is Miðeind ehf
files.size 245368
files.count 1
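
The behaviour summarised in dc.description (typed tokens, then sentence segmentation that is not fooled by periods inside abbreviations and dates) can be illustrated with a small sketch. This is not the Tokenizer package's code or API: the names `Token`, `tokenize`, `split_sentences`, the token kind labels and the three-entry abbreviation list are all hypothetical, and the patterns are far cruder than a real tokenizer would need.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical label such as "WORD" or "DATE"
    text: str


# Tiny illustrative sample; a real tokenizer would ship a much larger list.
ABBREVIATIONS = {"t.d.", "o.s.frv.", "þ.e."}
_ABBREV = "|".join(
    re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True)
)

# Alternatives are tried in order, so specific patterns precede generic ones.
TOKEN_PATTERN = re.compile(
    "|".join(
        [
            r"(?P<URL>https?://\S*[^\s.,;:!?])",
            r"(?P<EMAIL>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)",
            r"(?P<DATE>\d{1,2}\.\d{1,2}\.\d{4})",
            rf"(?P<ABBREV>(?<!\w)(?:{_ABBREV}))",
            r"(?P<NUMBER>\d+(?:[.,]\d+)*)",
            r"(?P<WORD>[^\W\d_]+)",
            r"(?P<PUNCTUATION>[^\w\s]|_)",
        ]
    )
)


def tokenize(text: str) -> Iterator[Token]:
    """Yield typed tokens; whitespace between tokens is skipped."""
    for match in TOKEN_PATTERN.finditer(text):
        yield Token(match.lastgroup, match.group())


def split_sentences(tokens: Iterable[Token]) -> List[List[Token]]:
    """Close a sentence only at a standalone '.', '!' or '?' token.

    Periods inside DATE and ABBREV tokens were absorbed during
    tokenization, so they never end a sentence here.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for token in tokens:
        current.append(token)
        if token.kind == "PUNCTUATION" and token.text in ".!?":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = "Fundurinn var 14.1.2020, t.d. í Reykjavík. Sendu póst á info@example.is!"
    for sentence in split_sentences(tokenize(sample)):
        print([(t.kind, t.text) for t in sentence])
```

The design point of the sketch is that sentence boundaries are decided on the token stream rather than on raw characters, which is why the date and the abbreviation in the sample do not split the first sentence.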


 Files in this item

Name Tokenizer-2.0.3.zip
Size 239.62 KB
Format application/zip
Description Python Tokenizer version 2.0.3
MD5 6f555389c399d7c16b1cb67d153c44eb
