dc.contributor.author
Þorsteinsson, Vilhjálmur
dc.contributor.author
Óladóttir, Hulda
dc.date.accessioned
2020-09-30T16:29:42Z
dc.date.available
2020-09-30T16:29:42Z
dc.date.issued
2020-09-25
dc.identifier.uri
http://hdl.handle.net/20.500.12537/80
dc.description
Icegrams is a Python 3 package that encapsulates a large trigram library for Icelandic. Its 14 million unique trigrams and their frequency counts are heavily compressed using radix tries and quasi-succinct indices based on Elias-Fano encoding. This allows the ~43 megabyte compressed trigram file to be mapped directly into memory, with no prior decompression, for fast queries (typically ~10 microseconds per lookup). More information at: https://github.com/mideind/Icegrams
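The description names Elias-Fano encoding as the basis of the quasi-succinct indices. The following is a minimal, illustrative Python sketch of that general technique for a sorted list of integers; it is not taken from the Icegrams source, the class name `EliasFano` is hypothetical, and it uses a linear-scan select where a production index would keep auxiliary select structures.

```python
from typing import List


class EliasFano:
    """Illustrative Elias-Fano encoding of a nondecreasing list of ints.

    Each value x is split into l low bits, stored verbatim, and a high
    part x >> l, stored in unary in the `high` array. Space is roughly
    2 + log2(u / n) bits per value for n values below universe u.
    """

    def __init__(self, values: List[int]) -> None:
        if any(b < a for a, b in zip(values, values[1:])):
            raise ValueError("values must be nondecreasing")
        if values and values[0] < 0:
            raise ValueError("values must be non-negative")
        self.n = len(values)
        universe = values[-1] + 1 if values else 1
        # l = floor(log2(universe / n)), or 0 when universe < n.
        self.l = max(0, (universe // max(self.n, 1)).bit_length() - 1)
        mask = (1 << self.l) - 1
        # Low parts: packed into a single Python int, l bits per element.
        self.low = 0
        for i, x in enumerate(values):
            self.low |= (x & mask) << (i * self.l)
        # High parts: element i sets position (x >> l) + i. One byte per
        # bit here for readability; a real index would pack these bits.
        self.high = bytearray(self.n + (universe >> self.l) + 1)
        for i, x in enumerate(values):
            self.high[(x >> self.l) + i] = 1

    def __len__(self) -> int:
        return self.n

    def _select1(self, i: int) -> int:
        """Position of the i-th (0-based) set bit, found by linear scan."""
        seen = -1
        for pos, bit in enumerate(self.high):
            seen += bit
            if bit and seen == i:
                return pos
        raise IndexError(i)

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.n:
            raise IndexError(i)
        # Zeros before the i-th one bit give the high part of value i.
        high = self._select1(i) - i
        low = (self.low >> (i * self.l)) & ((1 << self.l) - 1)
        return (high << self.l) | low

    def size_in_bits(self) -> int:
        """Bits a packed encoding would need (low parts + high bit array)."""
        return self.n * self.l + len(self.high)
```

For example, `EliasFano([3, 4, 7, 13, 14, 15, 21, 43])` stores 2 low bits per value plus a 20-bit high array, 36 bits in total, and still supports random access to any element without decoding the others.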
Icegrams is a Python 3 package containing a large collection of word trigrams for Icelandic. The collection holds about 14 million distinct trigrams along with frequency information. The entire collection has been compressed to about 43 megabytes, which are mapped directly into memory so that lookups are very fast (~10 microseconds per lookup). More information at: https://github.com/mideind/Icegrams
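Both descriptions stress that the compressed file is mapped directly into memory rather than decompressed first. A minimal sketch of that general idea using only Python's standard `mmap` and `struct` modules: a file of sorted fixed-width keys is binary-searched in place, so only the pages the search touches need to be read. The file layout and the helper names (`write_keys`, `contains`, `lookup_all`) are hypothetical; this is not the `trigrams.bin` format.

```python
import mmap
import os
import struct
import tempfile
from typing import List

# Hypothetical layout: sorted unsigned 64-bit little-endian keys, back to back.
RECORD = struct.Struct("<Q")


def write_keys(path: str, keys: List[int]) -> None:
    """Write keys in sorted order as fixed-width records."""
    with open(path, "wb") as f:
        for key in sorted(keys):
            f.write(RECORD.pack(key))


def contains(buf: mmap.mmap, key: int) -> bool:
    """Binary-search the mapped records without copying the file."""
    count = len(buf) // RECORD.size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        (value,) = RECORD.unpack_from(buf, mid * RECORD.size)
        if value < key:
            lo = mid + 1
        else:
            hi = mid
    return lo < count and RECORD.unpack_from(buf, lo * RECORD.size)[0] == key


def lookup_all(keys: List[int], queries: List[int]) -> List[bool]:
    """Write keys to a temporary file, map it read-only and query it."""
    if not keys:
        return [False] * len(queries)  # mmap cannot map an empty file
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "keys.bin")
        write_keys(path, keys)
        with open(path, "rb") as f, mmap.mmap(
            f.fileno(), 0, access=mmap.ACCESS_READ
        ) as buf:
            return [contains(buf, q) for q in queries]
```

The design point is that lookup cost depends on the number of probes (here O(log n)), not on the file size, because nothing is decoded up front.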
dc.language.iso
isl
dc.publisher
Miðeind ehf.
dc.relation.replaces
http://hdl.handle.net/20.500.12537/55
dc.relation.isreplacedby
http://hdl.handle.net/20.500.12537/176
dc.rights
The MIT License (MIT)
dc.rights.uri
https://opensource.org/licenses/mit-license.php
dc.rights.label
PUB
dc.source.uri
https://github.com/mideind/Icegrams/releases/tag/1.0.2
dc.subject
language model
dc.subject
trigrams
dc.subject
ngrams
dc.title
Icegrams (2020-09-30)
dc.type
languageDescription
metashare.ResourceInfo#ContentInfo.detailedType
other
metashare.ResourceInfo#ContentInfo.mediaType
text
has.files
yes
branding
Clarin IS Repository
contact.person
Vilhjálmur Þorsteinsson mideind@mideind.is Miðeind ehf.
sponsor
Ministry of Education, Science and Culture; Word lists and language models (L4); Language Technology for Icelandic 2019-2023; nationalFunds
size.info
14M trigrams
files.size
43217919
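The `files.size` value is a byte count. The short calculation below only restates numbers from this record: the same total is about 43.2 MB in decimal megabytes, matching the "~43 megabyte" figure in the description, and 41.22 in binary megabytes (2^20 bytes), the unit the file listing appears to use. The bits-per-trigram figure is only a rough estimate: it divides the total by the rounded 14M count, so it also includes the frequency counts and the two small source archives.

```python
# Figures copied from this record.
FILES_SIZE_BYTES = 43_217_919  # files.size (all three files together)
TRIGRAMS = 14_000_000          # size.info "14M trigrams" (rounded)

mb_decimal = FILES_SIZE_BYTES / 10**6  # 1 MB = 1,000,000 bytes
mb_binary = FILES_SIZE_BYTES / 2**20   # 1 MiB = 1,048,576 bytes
# Rough upper bound: includes frequency counts and the two source archives.
bits_per_trigram = FILES_SIZE_BYTES * 8 / TRIGRAMS

print(f"{mb_decimal:.1f} MB, {mb_binary:.2f} MiB, {bits_per_trigram:.1f} bits/trigram")
# prints: 43.2 MB, 41.22 MiB, 24.7 bits/trigram
```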
files.count
3
Files in this item
This item is publicly available and licensed under the MIT License (MIT).
Name
trigrams.bin
Size
41.07 MB
Format
Unknown
Description
Binary file containing trigrams
MD5
b94753ceb209da31c1fbf5952655a1b3
Name
Icegrams-1.0.2.tar.gz
Size
69.59 KB
Format
application/gzip
Description
Unknown
MD5
9bcf9458970203b59d29d91832bd6834
Preview (archive contents; directory hierarchy is flattened, and entries without a size are directories): Icegrams-1.0.2, src, icegrams, trie.h (3 kB), trie_build.py (4 kB), trie.cpp (24 kB), __init__.py (1 kB), resources, correct.txt (57 kB), trigrams.bin (133 B), split.txt (8 kB), delete.txt (1 kB), py.typed (0 B), trie.py (12 kB), ngrams.py (65 kB), setup.py (4 kB), .gitignore (890 B), README.rst (12 kB), .travis.yml (575 B), test.py (1 kB), test, .gitattributes (72 B), wheels.sh (751 B), utils, doc, release.sh (522 B), LICENSE (1 kB), MANIFEST.in (227 B), build_wheels.sh (745 B)
Name
Icegrams-1.0.2.zip
Size
79.94 KB
Format
application/zip
Description
Unknown
MD5
5c35b8106fa9eef399073e12f1467a5a
Preview (archive contents): identical to the listing for Icegrams-1.0.2.tar.gz.
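Each file in this item lists an MD5 checksum. A minimal sketch of checking a downloaded file against those values, using only Python's standard `hashlib` and `pathlib`; the helper names `md5_of` and `verify` are hypothetical, and the script assumes the downloaded file keeps the name shown in the listing.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "trigrams.bin": "b94753ceb209da31c1fbf5952655a1b3",
    "Icegrams-1.0.2.tar.gz": "9bcf9458970203b59d29d91832bd6834",
    "Icegrams-1.0.2.zip": "5c35b8106fa9eef399073e12f1467a5a",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's MD5 matches the value listed for its name."""
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no checksum listed for {path.name}")
    return md5_of(path) == expected
```

Usage, after downloading: `verify(Path("trigrams.bin"))` should return `True` for an intact copy. MD5 detects accidental corruption but is not a defence against deliberate tampering.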