dc.contributor.author | Arnardóttir, Þórunn |
dc.contributor.author | Ingason, Anton Karl |
dc.date.accessioned | 2020-04-22T12:51:30Z |
dc.date.available | 2020-04-22T12:51:30Z |
dc.date.issued | 2020-04-22 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/17 |
dc.description | The Icelandic Neural Parsing Pipeline (IceNeuralParsingPipeline) includes all steps necessary for parsing plain Icelandic text: preprocessing, parsing and postprocessing. The preprocessing step consists of tokenization, both by punctuation and by matrix clause. The parsing step uses an Icelandic model of the Berkeley Neural Parser, trained on IcePaHC, which reports an F1 score of 84.74. The output follows IcePaHC's annotation scheme, except that neither empty phrases (e.g. traces and zero subjects) nor lemmas are shown. The postprocessing step consists of minor steps for cleaning and formatting the parsed text. |
dc.description | The Icelandic Neural Parsing Pipeline (IceNeuralParsingPipeline) is a parsing pipeline containing all steps needed to parse plain Icelandic text, i.e. steps for preprocessing, parsing and postprocessing the text. The preprocessing step consists of tokenization, both by punctuation and by main clause. The parsing step contains an Icelandic model of the Berkeley Neural Parser that was trained on the IcePaHC treebank and yields an F-measure of 84.74%. The annotation scheme of the output resembles that of IcePaHC, but neither empty phrases, i.e. traces or null subjects, nor lemmas are shown. The postprocessing step contains minor steps for cleaning and reformatting the parsed text. |
dc.language.iso | isl |
dc.publisher | Háskóli Íslands |
dc.rights | The MIT License (MIT) |
dc.rights.uri | https://opensource.org/licenses/mit-license.php |
dc.rights.label | PUB |
dc.source.uri | https://github.com/antonkarl/iceParsingPipeline |
dc.subject | parsing |
dc.subject | neural parsing |
dc.subject | parsing pipelines |
dc.subject | berkeley neural parser |
dc.title | IceNeuralParsingPipeline 20.04 |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Þórunn Arnardóttir tha86@hi.is Háskóli Íslands |
files.size | 1483935083 |
files.count | 1 |
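The punctuation part of the preprocessing step named in dc.description can be illustrated with a minimal sketch. This is not the code in the archive's preprocess.py or splitter.py; the regular expression and the function name `split_punctuation` are assumptions made for illustration, and matrix clause splitting is not attempted.

```python
import re

# Assumed pattern (not taken from the pipeline): a token is either a run of
# word characters or a single non-space, non-word character. Python's \w is
# Unicode-aware for str, so Icelandic letters such as þ, ð and æ stay in words.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def split_punctuation(text):
    """Return the tokens of text with punctuation marks as separate tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(split_punctuation("Hún kom, en hann fór."))
    # prints ['Hún', 'kom', ',', 'en', 'hann', 'fór', '.']
```

A pattern this simple also splits abbreviations such as "þ.e." into several tokens, which a real tokenizer would need to handle separately.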
Files in this item
Name | IceNeuralParsingPipeline.zip |
Size | 1.38 GB |
Format | application/zip |
Description | Zip file containing the parsing pipeline |
MD5 | f40c4d6469a14648aabec9b786d0bfc3 |
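The MD5 value listed for the archive can be used to check a download. The following is a minimal sketch using Python's standard `hashlib`; the local file name is assumed to match the Name field, and the helper names `md5_of_file` and `matches_record` are illustrative only.

```python
import hashlib
import os

# MD5 value copied from this record.
EXPECTED_MD5 = "f40c4d6469a14648aabec9b786d0bfc3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = "IceNeuralParsingPipeline.zip"  # assumed local download path
    if os.path.exists(archive):
        print("checksum OK" if matches_record(archive) else "checksum MISMATCH")
    else:
        print(archive + " not found")
```

Reading in chunks keeps memory use low for an archive of this size. MD5 is adequate for detecting a corrupted download but is not a safeguard against deliberate tampering.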
- IceNeuralParsingPipeline
- LICENSE
- README.md
- .DS_Store
- demoOutput.psd
- runallNeural.sh
- tools
- scripts
- postprocess.py~
- postprocess.py
- preprocess.py~
- preprocess.py
- postprocessNeural.sh
- cs
- CS_2.002.75.jar
- formatpsd.sh
- donothing.q
- formatpsd.sh~
- donothing.q~
- .DS_Store
- neuralParser
- _dev=84.91.pt
- src
- chart_helper.pyx
- evaluate.py
- __pycache__
- parse_nk.cpython-36.pyc
- nkutil.cpython-38.pyc
- evaluate.cpython-38.pyc
- vocabulary.cpython-38.pyc
- trees.cpython-38.pyc
- evaluate.cpython-36.pyc
- nkutil.cpython-36.pyc
- vocabulary.cpython-36.pyc
- trees.cpython-36.pyc
- parse_nk.cpython-38.pyc
- parse_nk.py
- vocabulary.py
- main.py
- trees.py
- transliterate.py
- nkutil.py
- viz.py
- splitter
- wordtags.tsv
- icepunct.gz
- iceconj.gz
- __pycache__
- splitter.cpython-36.pyc
- splitter.py
- scripts
- demoTextOutput.txt
- demoinput.txt
- __MACOSX (macOS AppleDouble `._*` metadata files accompanying the entries above; not part of the pipeline)