
 
dc.contributor.author Friðriksdóttir, Steinunn Rut
dc.contributor.author Jasonarson, Atli
dc.date.accessioned 2021-08-12T13:25:35Z
dc.date.available 2021-08-12T13:25:35Z
dc.date.issued 2021-08-12
dc.identifier.uri http://hdl.handle.net/20.500.12537/124
dc.description The total list of stop words includes 59,664 words or non-words that were handpicked from the Icelandic Gigaword Corpus (IGC). The sublists are: 6,576 abbreviations; 27,144 foreign words (especially proper names); 588 function words; 147 last names or company names; 978 mislemmatized words; 9,736 outdated words; 12,473 typos and OCR errors. The list is compiled from the 2019 version of the IGC and should not be considered exhaustive.
dc.description ICELANDIC (translated): The complete list contains 59,664 words or non-words that were handpicked from the Icelandic Gigaword Corpus (Risamálheildin). The sublists are: 6,576 abbreviations, acronyms and the like, including both abbreviations such as Alþingisfrv (for frumvarp, "bill") and A-Skaftafellssýsla (for austur, "east") and acronyms such as LHÍ (Listaháskóli Íslands, the Iceland University of the Arts); 27,144 foreign words (especially proper names); 588 function words (e.g. sér, hann, í, hvenær); 147 patronymics (some abbreviated) or company names (e.g. Friðleifsd, hannesson, Essó); 978 mislemmatized words (e.g. guðspjallur, notönd, allsher); 9,736 outdated words (e.g. íslenzkir, rjettur); 12,473 misspelled words and OCR errors (e.g. klukkka, komuþeir, skattakerfl). The list is compiled from the 2019 version of the Icelandic Gigaword Corpus and should not be considered exhaustive.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/license/apache2-0-php/
dc.rights.label PUB
dc.source.uri https://github.com/steinunnfridriks/rmh_filters
dc.subject stop-words
dc.subject word list
dc.subject filters
dc.title Stopporðalisti fyrir Risamálheildina / Stop-words for the Icelandic Gigaword Corpus (21.08)
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinunn Rut Friðriksdóttir srf2@hi.is The Árni Magnússon Institute for Icelandic Studies
files.size 1024789
files.count 1
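The description above frames the list as a filter for the Icelandic Gigaword Corpus. A minimal sketch of that use follows, assuming the list files hold one UTF-8 entry per line (the record does not state the layout). The helper names `load_stopwords` and `filter_tokens` are hypothetical. Matching is case-sensitive because the entries mix capitalised and lower-case forms (e.g. Essó, hannesson).

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop-word set from lines of text.

    Assumption: one entry per line; blank lines are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def filter_tokens(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Return the tokens that are not in the stop-word set (case-sensitive)."""
    return [tok for tok in tokens if tok not in stopwords]


if __name__ == "__main__":
    # Small sample built from examples in the item description; in practice
    # the lines would be read from IGC_filters_all.txt.
    sample = ["klukkka", "rjettur", "LHÍ", "", "Essó"]
    stop = load_stopwords(sample)
    tokens = ["klukkka", "hestur", "rjettur", "dagur", "Essó"]
    print(filter_tokens(tokens, stop))  # ['hestur', 'dagur']
```

Using a set keeps each lookup constant-time on average, which matters when filtering a corpus of this size against roughly 60,000 entries.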


 Files in this item

This item is publicly available and licensed under the Apache License 2.0.
Name: rmh_filters.zip
Size: 1000.77 KB
Format: application/zip
Description: Zipped .txt files
MD5: 3b020d6f334f027440a325abacc17b1c
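The record publishes an MD5 checksum for the archive. One way to check a downloaded copy against it is sketched below using Python's standard `hashlib`; the local file name `rmh_filters.zip` in the current directory is an assumption, and `md5_of_file` and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Checksum published in this item record for rmh_filters.zip.
EXPECTED_MD5 = "3b020d6f334f027440a325abacc17b1c"


def md5_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive was saved under its original name here.
    archive = Path("rmh_filters.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

MD5 is adequate for detecting a corrupted or truncated download, but it is not a defence against deliberate tampering.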
Archive contents:
  • rmh_filters
    • README.md
    • LICENSE
    • IGC_filters_all.txt
    • mislemmatized.txt
    • lastnames_companies.txt
    • other.txt
    • abbrevs.txt
    • function_words.txt
    • foreign.txt
    • typos_ocr.txt
    • outdated.txt
    • .git (Git repository metadata)
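The archive ships each sublist as a separate text file next to the combined IGC_filters_all.txt. The sketch below reads the sublists straight from the zip with the standard `zipfile` module. It assumes UTF-8 text with one entry per line, matches files by base name (so the top-level folder inside the zip does not matter), and skips anything under `.git`. The category labels in `SUBLISTS` and the function `read_sublists` are our own hypothetical names, not the authors'.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Dict, Set

# Hypothetical labels mapped to the sublist file names in the archive listing.
SUBLISTS = {
    "abbreviations": "abbrevs.txt",
    "foreign": "foreign.txt",
    "function_words": "function_words.txt",
    "lastnames_companies": "lastnames_companies.txt",
    "mislemmatized": "mislemmatized.txt",
    "outdated": "outdated.txt",
    "typos_ocr": "typos_ocr.txt",
    "other": "other.txt",
}


def read_sublists(zip_path: str) -> Dict[str, Set[str]]:
    """Read each sublist in the archive into a set of entries.

    Assumptions: UTF-8 text, one entry per line, blank lines ignored.
    Sublists missing from the archive are simply left out of the result.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Map base file names to archive member names, ignoring Git internals.
        by_name: Dict[str, str] = {}
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or ".git" in parts:
                continue
            by_name.setdefault(parts[-1], info.filename)

        result: Dict[str, Set[str]] = {}
        for label, fname in SUBLISTS.items():
            member = by_name.get(fname)
            if member is None:
                continue
            text = zf.read(member).decode("utf-8")
            result[label] = {ln.strip() for ln in text.splitlines() if ln.strip()}
    return result
```

For example, `{k: len(v) for k, v in read_sublists("rmh_filters.zip").items()}` gives the number of distinct entries per sublist; these may differ slightly from the counts in the description if a file contains duplicates or blank lines.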
