###########################################################################
############   Textar af Vísindavef og Evrópuvef [VV_EV]    ###############
############     http://hdl.handle.net/20.500.12537/361     ###############
###########################################################################

[DESCRIPTION]
The corpus contains questions and answers from the Icelandic Web of Science
(www.visindavefur.is) and the European Web (www.evropuvefur.is), both run by
the University of Iceland. The corpus does not contain all the texts from the
websites, only those whose authors have authorized their inclusion.

This package contains the untokenized and untagged version of the corpus. The
tokenized and PoS-tagged version can be accessed at
http://hdl.handle.net/20.500.12537/362.

The file VV_EV-2502.xml contains information about the corpus, including its
size and a list of all the authors and categories used on the websites. The
file also lists the paths to all TEI files, each of which contains a single
article (a question and its answer).

Each TEI file contains at least two div elements. Each div has a 'type'
attribute whose value is either 'question' or 'answer', depending on whether
the div contains a question or an answer. In some cases the value of 'type' is
'question_long', in which case the div holds a longer version of the question.
Where possible, information that appears after the answer (references,
footnotes ...) is stored in a div with type='rest'.

[LICENCE]
https://repository.clarin.com/repository/xmlui/page/license-gigaword-corpus

[PUBLISHER]
Árni Magnússon Institute for Icelandic Studies.

[STATISTICS]
Articles: 11,431
Running words: ca. 4.6 million
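
[EXAMPLE]
The div layout described in the description can be read with a few lines of
Python. The following is a minimal sketch, not an official tool shipped with
the corpus. It assumes only what is stated above: that each article has div
elements carrying a 'type' attribute. The helper names local_name and
divs_by_type and the miniature SAMPLE document are hypothetical and were
written for illustration. Real corpus files will contain more markup. Elements
are matched by local name because it is an assumption, not a documented fact,
that the files declare the standard TEI namespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def divs_by_type(xml_text):
    """Map each div 'type' value to the list of texts found in such divs."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for elem in root.iter():
        if local_name(elem.tag) == "div" and "type" in elem.attrib:
            # Collect all nested text and collapse runs of whitespace.
            text = " ".join("".join(elem.itertext()).split())
            grouped[elem.get("type")].append(text)
    return dict(grouped)


# Hypothetical miniature article, NOT taken from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="question"><p>Why is the sky blue?</p></div>
    <div type="answer"><p>Because of Rayleigh scattering.</p></div>
    <div type="rest"><p>References: ...</p></div>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    for div_type, texts in divs_by_type(SAMPLE).items():
        print(div_type, "->", texts)
```

To apply the same function to a file from the package, read the file's
contents and pass them to divs_by_type. Articles with a long question would
then show an additional 'question_long' key next to 'question' and 'answer'.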