############################################################
############ Daglegt mál and Íslenskt mál [HJH] ############
############ http://hdl.handle.net/20.500.12537/323 ########
############################################################

The texts by Helgi J. Halldórsson were originally broadcast on the radio programs Daglegt mál and Íslenskt mál between 1973 and 1986.

THE RADIO SHOW

Helgi J. Halldórsson’s series Daglegt mál was broadcast in four cycles:

First cycle: 26.04.1973 – 26.09.1974 (142 episodes)
Second cycle: 02.06.1975 – 21.11.1975 (50 episodes)
Third cycle: 11.06.1976 – 28.07.1977 (113 episodes)
Fourth cycle: 04.05.1981 – 14.01.1982 (72 episodes)

In total, these four cycles comprised 377 episodes, each containing more than 700 words. Helgi also presented 24 episodes under the title Íslenskt mál from 04.09.1985 to 26.02.1986. Sigrún Helgadóttir wrote and presented three of these. The total number of episodes is therefore 401.

THE TEXTS

The original scripts were typed on a manual typewriter and stored in folders. In the spring of 2020, Sigrún Helgadóttir digitized the series using equipment at the Árni Magnússon Institute. The process involved:

Scanning the original pages with a specialized scanner
Optical character recognition (OCR) with ABBYY FineReader to convert the page images into digital text

For each episode, two files were created: a PDF containing an image of the original text, and a corresponding text document produced from it.

Guðný Helgadóttir then corrected the text documents with the spelling checker Skrambi (http://skrambi.arnastofnun.is/), developed by Jón Friðrik Daðason, and compared them against the original text. She carried out this work during the summer and autumn of 2020.

FONT CHANGES

Because Helgi used a typewriter, underlining was the only form of emphasis available to him. He did not follow any consistent pattern in its use. The underlining was probably intended for emphasis, but it does not reliably mark specific words. For this reason, underlining has not been retained in the modified text files.

CORRECTIONS

Helgi’s original documents contain handwritten corrections, and these were followed during digitization. The scanning system sometimes had difficulty with heavily annotated pages, so some content had to be transcribed manually.

SPELLING

Helgi followed the official spelling conventions of his time and did not use the letter ‘z’, which was no longer part of the standard orthography. However, some of the texts discuss this letter. Obvious spelling and typographical errors have been corrected.

THIS PUBLICATION

The texts are published in TEI format and have been tokenized, part-of-speech (POS) tagged, and lemmatized. The root document (HJH-22.02.ana.xml) contains the main corpus information and references to the individual episode files.

Tokenization: Tokenizer from Miðeind (https://github.com/mideind/Tokenizer)
POS tagging: the grammatical tagger ABLTagger (http://hdl.handle.net/20.500.12537/98)
Lemmatization: Nefnir (https://github.com/jonfd/Nefnir)
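The following Python sketch shows one way to read the word-level annotations (form, lemma, POS tag) out of a TEI file using only the standard library. It assumes that tokens are encoded as TEI `<w>` and `<pc>` elements carrying `lemma` and `pos` attributes, which is the generic TEI att.linguistic convention. The actual element and attribute names used in the HJH episode files have not been verified here. The `SAMPLE` fragment, its tag values, and the function name `read_tokens` are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace; ElementTree reports tags as "{namespace}localname".
TEI_NS = "http://www.tei-c.org/ns/1.0"
# ASSUMPTION: words are <w> and punctuation is <pc>, as in generic TEI.
TOKEN_TAGS = {f"{{{TEI_NS}}}w", f"{{{TEI_NS}}}pc"}

# Hypothetical fragment; lemma/pos attribute names and tag values are illustrative.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="daglegur" pos="lhensf">Daglegt</w>
    <w lemma="mál" pos="nhen">mál</w>
    <pc pos="pl">.</pc>
  </s></p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (form, lemma, pos) triples for every token element, in document order."""
    root = ET.fromstring(xml_text)
    tokens = []
    for el in root.iter():  # iter() walks the tree depth-first in document order
        if el.tag in TOKEN_TAGS:
            form = (el.text or "").strip()
            # Punctuation has no lemma in this sketch, so fall back to the form itself.
            tokens.append((form, el.get("lemma", form), el.get("pos")))
    return tokens


if __name__ == "__main__":
    toks = read_tokens(SAMPLE)
    for form, lemma, pos in toks:
        print(form, lemma, pos)
    # Simple lemma frequency count over the fragment.
    print(Counter(lemma for _, lemma, _ in toks))
```

To apply the function to a real episode file, pass it the file's contents, for example `open(path, encoding="utf-8").read()`. If the files turn out to use different attribute names, such as `msd` instead of `pos`, only the `el.get(...)` calls need to change.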