Portuguese Hate Speech Twitter Dataset

Title: Hate speech dataset annotated for Portuguese
Author: Paula Fortuna
Contact: paula.fortuna@fe.up.pt
Organization: INESC TEC
Location: Porto, Portugal
Date: 2017-07-26T15:33:20.918Z
Size: 1.10 MB
Format: *.CSV
Language: EN
License: Creative Commons Attribution Share-Alike
Keywords: Hate speech, Automatic detection, Social Network
Content: Tweets and classes taxonomy
Reference: Master's thesis: FORTUNA, Paula (2017). Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes. Porto: Faculdade de Engenharia da Universidade do Porto.

Description

The Portuguese Hate Speech Twitter Dataset is a collection of Twitter messages manually annotated for hate speech using a hierarchical structure of classes. It contains 5,668 messages collected on Twitter from 1,156 distinct users. Annotation followed a multiclass and multilabel approach. The dataset is provided in two different formats, together with the hierarchy of classes. The text of the tweets is omitted because of the terms and conditions of the Twitter API; tweets are identified by their Twitter ID only.

Folder

hate speech folder
  URL: http://127.0.0.1:3001/project/hatespeech/data/hate speech folder
  Timestamps: 2017-12-28T16:02:35.373Z, 2017-12-28T16:47:11.355Z
  Types: http://www.semanticdesktop.org/ontologies/2007/01/19/nie#InformationElement, http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Folder, http://dendro.fe.up.pt/ontology/0.1/Resource
  Resources: /r/file/76a8ccc3-4607-4a3c-81ac-3b883384e74e, /r/file/e9df21c0-588d-438c-a123-9a95a5538006, /r/file/ff3ec916-04c7-4025-87d9-e3f175225d6f, /r/file/5f823edc-bf72-40e5-955e-8f2a8712b92b, /r/file/10de71fe-4168-465d-a1e5-90b920c5ca8f, /r/folder/e0b490aa-6734-442e-ad8b-f489217a0748

Files

All files below are stored in /r/folder/bf7dd361-074a-48df-8bfe-9dd986ca710a and carry the types http://www.semanticdesktop.org/ontologies/2007/01/19/nie#InformationElement, http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject and http://dendro.fe.up.pt/ontology/0.1/Resource.

graph_hierarchical_classes.csv (graph hierarchical classes)
  The classes follow a hierarchical organization. The hierarchy is represented as a Directed Acyclic Graph (DAG) in CSV format, one edge per row, with the source node in the first column (named 'Source') and the destination node in the second column (named 'Target'). 100 lines.
  Path: /r/folder/bf7dd361-074a-48df-8bfe-9dd986ca710a/graph_hierarchical_classes.csv
  Timestamps: 2017-12-28T16:03:52.354Z, 2017-12-28T16:38:15.358Z

dataset_dummy_classes.csv (dataset dummy classes)
  CSV file containing the dataset as a matrix with one dummy variable per class. The first column (named 'tweet_id') contains the Twitter ID of each tweet; the remaining 79 columns represent all classes, converted to dummy variables. 5,669 lines.
  Path: /r/folder/bf7dd361-074a-48df-8bfe-9dd986ca710a/dataset_dummy_classes.csv
  Timestamps: 2017-12-28T16:03:52.112Z, 2017-12-28T16:37:03.504Z

annotator_classes.csv (dataset annotator classes)
  CSV file containing the tweets, represented by Twitter ID (first column, named 'tweet_id'), together with the annotator classification (second column, named 'class'). 5,669 lines.
  Path: /r/folder/bf7dd361-074a-48df-8bfe-9dd986ca710a/annotator_classes.csv
  Timestamps: 2017-12-28T16:03:51.945Z, 2017-12-28T16:36:26.138Z

README.txt
  A readme file containing the full description of the "Hate speech dataset annotated for Portuguese".
  Path: /r/folder/bf7dd361-074a-48df-8bfe-9dd986ca710a/README.txt
  Timestamps: 2017-12-28T16:49:22.823Z, 2017-12-28T16:53:12.876Z
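
The file layouts described above (a 'Source'/'Target' edge list for the class hierarchy, and a 'tweet_id' column followed by one dummy column per class) could be read along the following lines. This is a minimal sketch using only the Python standard library, not the author's own tooling. It assumes that 'Source' is the parent class and 'Target' the child, and that dummy columns hold the strings "0" and "1"; neither detail is confirmed by the record. The function names and the sample rows (including the class names) are hypothetical and stand in for the real files.

```python
import csv
import io
from collections import defaultdict


def load_hierarchy(handle):
    """Read Source/Target edges and return a child -> set(parents) mapping.

    Assumption: 'Source' is the parent class and 'Target' the child class.
    """
    parents = defaultdict(set)
    for row in csv.DictReader(handle):
        parents[row["Target"]].add(row["Source"])
    return parents


def ancestors(label, parents):
    """Collect every class reachable by walking child -> parent edges."""
    seen = set()
    stack = [label]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def labels_from_dummies(handle):
    """Map each tweet_id to the set of class columns whose value is "1".

    Tweet IDs are kept as strings so they are never coerced to floats.
    """
    result = {}
    for row in csv.DictReader(handle):
        tweet_id = row.pop("tweet_id")
        result[tweet_id] = {
            name for name, value in row.items() if value.strip() == "1"
        }
    return result


# Hypothetical sample rows; the real class names live in the CSV files.
SAMPLE_GRAPH = "Source,Target\nHate speech,Sexism\nSexism,Women\n"
SAMPLE_DUMMIES = (
    "tweet_id,Hate speech,Sexism,Women\n"
    "1001,1,1,1\n"
    "1002,0,0,0\n"
)

hierarchy = load_hierarchy(io.StringIO(SAMPLE_GRAPH))
labels = labels_from_dummies(io.StringIO(SAMPLE_DUMMIES))
```

To work with the actual files, replace the `io.StringIO` objects with `open("graph_hierarchical_classes.csv", newline="")` and `open("dataset_dummy_classes.csv", newline="")`. Because the hierarchy is a DAG, a class may have more than one parent, which is why the mapping stores a set of parents rather than a single value.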