dc.contributor.author | Einarsson, Hafsteinn |
dc.contributor.author | Friðriksdóttir, Steinunn Rut |
dc.contributor.author | Arnardóttir, Þórunn |
dc.date.accessioned | 2024-11-07T09:13:13Z |
dc.date.available | 2024-11-07T09:13:13Z |
dc.date.issued | 2024-10-24 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/352 |
dc.description | The Icelandic Sentiment Corpus (Hotter and Colder) is a collection of annotated blog comments with contextual information and metadata, comprising over 19,000 annotations. Each entry represents a single annotation task performed by one user on a specific comment, together with the full comment history and the blog post the comment was written on. The dataset must be hydrated before use: run the hydration.py script to fetch the comments and blog posts associated with each annotation.
The annotation tasks cover different aspects of online discourse: sentiment analysis, toxicity detection, hate speech identification, social acceptance evaluation, emotion detection, sarcasm recognition, constructiveness assessment, encouragement and sympathy detection, politeness evaluation, trolling identification, mansplaining detection, and analysis of group generalizations. After hydration, the main fields are user_id (identifies the annotator), annotation_task_name (the type of task), label_given_by_user (the annotator's response), comment_annotated (the full text of the annotated comment), comment_author_name, comment_datetime (the time the comment was posted), previous_comments (the earlier comments in the thread), and blog_post (the blog post the comments respond to). |
dc.language.iso | isl |
dc.publisher | Háskóli Íslands |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://www.ummælagreining.is |
dc.subject | Sentiment Analysis |
dc.subject | Toxicity detection |
dc.subject | Hate speech detection |
dc.subject | Emotion Analysis |
dc.subject | Social Acceptability Analysis |
dc.title | Icelandic Sentiment Corpus (Hotter and Colder) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
demo.uri | https://www.ummælagreining.is |
contact.person | Hafsteinn Einarsson hafsteinne@hi.is Háskóli Íslands |
sponsor | Ministry of Culture and Business Affairs Bias and Toxicity (G12) Language Technology for Icelandic nationalFunds |
size.info | 19828 entries |
files.size | 294803 |
files.count | 1 |
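The field list in dc.description suggests a simple way to summarize the hydrated data. The following is a minimal sketch, not the authors' tooling: it assumes that hydration yields CSV rows carrying the eight fields named in the description, which the record does not confirm. The helper names `read_hydrated` and `label_counts_by_task` are hypothetical.

```python
import csv
from collections import Counter, defaultdict

# Field names as listed in dc.description. That the hydrated output is a CSV
# with exactly these column headers is an assumption, not stated in the record.
FIELDS = [
    "user_id",
    "annotation_task_name",
    "label_given_by_user",
    "comment_annotated",
    "comment_author_name",
    "comment_datetime",
    "previous_comments",
    "blog_post",
]


def read_hydrated(handle):
    """Read hydrated rows from an open text handle and check the expected columns."""
    reader = csv.DictReader(handle)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return list(reader)


def label_counts_by_task(rows):
    """Count how often each label was given within each annotation task."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["annotation_task_name"]][row["label_given_by_user"]] += 1
    return {task: dict(labels) for task, labels in counts.items()}
```

With a hydrated file at hand, `label_counts_by_task(read_hydrated(open(path, newline="", encoding="utf-8")))` would give one label tally per task. The path and the actual task and label values depend on what hydration.py writes.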
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
- Name: Icelandic_Sentiment_Corpus.zip
- Size: 287.89 KB
- Format: application/zip
- Description: labels and hydration script
- MD5: 6f26a58c5771158c0f9492096222ad6c
- clarin_submission
  - hydration.py (10 kB)
  - README.md (3 kB)
  - data_unhydrated.csv (1 MB)
  - requirements.txt (37 B)
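
The MD5 value listed for Icelandic_Sentiment_Corpus.zip can be used to check a download before unpacking it. The sketch below uses only Python's standard `hashlib`. The function names and the local file path are assumptions made for illustration, not part of the distributed package.

```python
import hashlib
import os

# MD5 checksum as listed in the file metadata for Icelandic_Sentiment_Corpus.zip.
EXPECTED_MD5 = "6f26a58c5771158c0f9492096222ad6c"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the archive was saved.
    archive = "Icelandic_Sentiment_Corpus.zip"
    if os.path.exists(archive):
        print("checksum ok" if verify_download(archive) else "checksum mismatch")
```

A mismatch usually indicates an incomplete or corrupted download. MD5 is suitable here as an integrity check only, not as protection against deliberate tampering.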