Terms of use:

The resources listed here are available free for research purposes only. If you use some of these resources, please cite the corresponding papers.
The papers associated with each resource are listed below, and also in the READMEs for individual resource.
If you are interested in commercial use of any of these resources, please contact me. A nominal one-time licensing fee may apply.
Do not redistribute the data. Direct interested parties to this page.
National Research Council Canada (NRC) disclaims any responsibility for the use of the resources listed here and does not provide technical support.

Available Resources:

Ethics
- Equity Evaluation Corpus
Best-Worst Scaling
- Scripts for Best-Worst Scaling
Sentiment and Emotion Analysis

Equity Evaluation Corpus

Equity Evaluation Corpus: 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. The sentences were constructed from templates using the pre-defined sets of noun phrases corresponding to males and females as well as first names associated with African American and European American males and females. We used the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018 Task 1 'Affect in Tweets'. We found that several of the systems showed statistically significant bias; that is, they consistently provide slightly higher sentiment intensity predictions for one race or one gender. We make the EEC freely available, and encourage its use to evaluate biases in sentiment and other NLP tasks. [download]
Svetlana Kiritchenko and Saif M. Mohammad. Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems In Proceedings of *Sem, New Orleans, LA, USA, June 2018 [pdf]

Scripts for Best-Worst Scaling

Best-Worst Scaling (BWS), also sometimes referred to as Maximum Difference Scaling (MaxDiff), is an annotation scheme that exploits the comparative approach to annotation (Louviere and Woodworth, 1990; Cohen, 2003; Louviere et al., 2015). Annotators are given four items (4-tuple) and asked which item is the Best (highest in terms of the property of interest) and which is the Worst (least in terms of the property of interest). These annotations can then be easily converted into real-valued scores of association between the items and the property, which eventually allows for creating a ranked list of items as per their association with the property of interest.

We showed that BWS ranks terms more reliably than rating scales (RS) do; that is, when comparing the term rankings obtained from two groups of annotators for the same set of terms, the correlation between the two sets of ranks produced by BWS is significantly higher than the correlation for the two sets obtained with RS. The difference in reliability is more marked when about 5N (or less) total annotations are obtained, which is the case in many NLP annotation projects (Strapparava and Mihalcea, 2007; Socher et al., 2013; Mohammad and Turney, 2013).

We provide scripts to assist with Best-Worst Scaling annotations. The package includes:
- a script to produce 4-tuples with desired term distributions,
- a script to produce real-valued scores from Best-Worst annotations,
- a script to calculate split-half reliability of the annotations.

We have used BWS to annotate single words, short phrases, and whole tweets for emotion and sentiment intensity.

Kiritchenko, S., and Mohammad, S. (2016) Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best-Worst Scaling. Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), San Diego, California, 2016 [pdf]

Kiritchenko, S. and Mohammad, S. (2017) Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017 [pdf]

WikiArt Emotions Dataset

WikiArt Emotions: a dataset of 4,105 pieces of art (mostly paintings) that has annotations for emotions evoked in the observer. The pieces of art were selected from WikiArt.org's collection for twenty-two categories from four western styles (Renaissance Art, Post-Renaissance Art, Modern Art, and Contemporary Art). In addition to emotions, the art was also annotated for whether it includes the depiction of a face and how much the observers like the art. We do not redistribute the art images, we provide only the annotations. [download]

Each piece of art was annotated by at least 10 people. The annotations include:
- all emotions that the image of art brings to mind;
- all emotions that the title of art brings to mind;
- all emotions that the art as a whole (title and image) brings to mind;
- a rating on a scale from -3 (dislike it a lot) to 3 (like it a lot);
- whether the image shows the face or the body of at least one person or animal;
- whether the art is a painting or something else (e.g., sculpture).

We asked annotators to identify the emotions that the art evokes in three scenarios:
Scenario I: we present only the image (no title), and ask the annotator to identify the emotions it evokes;
Scenario II: we present only the title of the art (no image), and ask the annotator to identify the emotions it evokes;
Scenario III: we present both the title and the image of the art, and ask the annotator to identify the emotions that the art as a whole evokes.
Saif M. Mohammad and Svetlana Kiritchenko. WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan [pdf]

Manually Created Sentiment Lexicons Using Best-Worst Scaling

Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA), aka SemEval-2016 General English Sentiment Modifiers Lexicon: a list of ~3,2000 single words and multi-word phrases and their associations with positive and negative sentiment. Single words come from the list of positive and negative words collected by Osgood for his seminal study on word meaning, available in General Inquirer. The phrases are formed by combining an Osgood word and a modifier, where a modifier is a negator, an auxilary verb, a degree adverb, or a combination of those. The complete lists of negators, modals, and degree adverbs used in this lexicon are included into this distribution. The sentiment associations were obtained manually through crowdsourcing using the Best-Worst Scaling annotation technique. [download] [interactive visualization]

Parts of this lexicon were used as development and test sets for SemEval-2016 shared task on Determining Sentiment Intensity of English and Arabic Phrases (Task 7) -- General English Sentiment Modifiers Set. All terms from SCL-NMA except the ones that were used in the previous competition (SemEval-2015 Task 10 Subtask E) were included into the SemEval-2016 dataset (2,999 terms). The sentiment association scores were converted into the range 0..1 for the SemEval competition.
Kiritchenko, S., and Mohammad, S. (2016) The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition. Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), San Diego, California, 2016 [pdf]

Kiritchenko, S., Mohammad, S., and Salameh, M. (2016) SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2016), San Diego, California, 2016 [pdf]
Sentiment Composition Lexicon of Opposing Polarity Phrases (SCL-OPP) aka SemEval-2016 English Twitter Mixed Polarity Lexicon: a list of ~1,200 single words and multi-word phrases and their associations with positive and negative sentiment. The phrases consist of two or three words. Each phrase includes at least one positive word and at least one negative word. The single words are taken from the set of words that are part of multi-word phrases. The words and phrases are drawn from tweets, and therefore include a small number of hashtag words and creatively spelled words. The sentiment associations were obtained manually through crowdsourcing using the Best-Worst Scaling annotation technique. [download] [interactive visualization]

Parts of this lexicon were used as development and test sets for SemEval-2016 shared task on Determining Sentiment Intensity of English and Arabic Phrases (Task 7) -- English Twitter Mixed Polarity Set. All terms from SCL-OPP except the ones that were used in the previous competition (SemEval-2015 Task 10 Subtask E) and in other datasets of SemEval-2016 Task 7 were included. The SemEval-2016 set additionally included some same polarity phrases. In total, there were 1,269 terms. The sentiment association scores were converted into the range 0..1 for the SemEval competition.
Kiritchenko, S., and Mohammad, S. (2016) Happy Accident: A Sentiment Composition Lexicon for Opposing Polarities Phrases. Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC), Portoroz, Slovenia, 2016 [pdf]

Kiritchenko, S., and Mohammad, S. (2016) Sentiment Composition of Words with Opposing Polarities. Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), San Diego, California, 2016 [pdf]

Kiritchenko, S., Mohammad, S., and Salameh, M. (2016) SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2016), San Diego, California, 2016 [pdf]
SemEval-2015 English Twitter Sentiment Lexicon: a list of ~ 1,500 single words and simple two-word negated expressions and their associations with positive and negative sentiment. The terms are drawn from English Twitter and include general English words, misspellings, hashtags, and other categories frequently used in Twitter. The negated expressions are in the form of 'negator w', where 'negator' is a negation trigger. The complete list of the negation triggers is included into this distribution. The sentiment associations were obtained manually through crowdsourcing using the Best-Worst Scaling annotation technique. [download]

This lexicon was used in SemEval-2015 shared task on Sentiment Analysis in Twitter (Task 10 Subtask E). The sentiment association scores were converted into the range 0..1 for the SemEval competition.
Kiritchenko, S., Zhu, X., Mohammad, S. (2014). Sentiment Analysis of Short Informal Texts. Journal of Artificial Intelligence Research, 50:723-762, 2014 [pdf]

Rosenthal, S., Nakov, P., Kiritchenko, S., Mohammad, S., Ritter, A., and Stoyanov, V. (2015) SemEval-2015 Task 10: Sentiment Analysis in Twitter. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval-2015), Denver, Colorado, USA, 2015 [pdf]
SemEval-2016 Arabic Twitter Sentiment Lexicon: a list of ~1,400 single words and simple two-word negated expressions and their associations with positive and negative sentiment. The terms are drawn from Arabic Twitter and include both standard and dialectal Arabic, Romanized words, misspellings, hashtags, and other categories frequently used in Twitter. The negated expressions are in the form of 'negator w', where 'negator' is a negation trigger from a list of 16 common Arabic negation words. The complete list of the negation triggers is included into this distribution. The sentiment associations were obtained manually through crowdsourcing using the Best-Worst Scaling annotation technique. [download]

This lexicon was used in SemEval-2016 shared task on Determining Sentiment Intensity of English and Arabic Phrases (Task 7) -- Arabic Twitter Set. The sentiment association scores were converted into the range 0..1 for the SemEval competition.
Kiritchenko, S., Mohammad, S., and Salameh, M. (2016) SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2016), San Diego, California, 2016 [pdf]

Automatically Generated Sentiment Lexicons

NRC Emoticon Affirmative Context Lexicon and NRC Emoticon Negated Context Lexicon: associations of ~55,000 single English words and ~275,000 two-word phrases with sentiment automatically generated from 1.6 million tweets with emoticons collected by Go and colleagues (Go, A., Bhayani, R., and Huang, L. Twitter sentiment classication using distant supervision. Tech. rep., Stanford University, 2009.). The word-sentiment association scores are calculated using the following formula: score = PMI(w, pos) - PMI(w, neg), where PMI stands for Point-wise Mutual Information between a word w and the positive class (tweets with positive emoticons) or the negative class (tweets with negative emoticons). The word-sentiment association scores are calculated separately for words appearing in affirmative contexts and for words appearing in negated contexts. [download]

These lexicons were an integral part of our sentiment analysis system that was ranked first in both subtasks of the SemEval-2014 Sentiment Analysis in Twitter shared task (Task 9).
Kiritchenko, S., Zhu, X., Mohammad, S. (2014). Sentiment Analysis of Short Informal Texts. Journal of Artificial Intelligence Research, 50:723-762, 2014 [pdf]

Zhu, X., Kiritchenko, S., and Mohammad, S. (2014) NRC-Canada-2014: Recent Improvements in the Sentiment Analysis of Tweets. Proceedings of the 8th International Workshop on Semantic Evaluation Exercises (SemEval-2014), Dublin, Ireland, 2014 [pdf]
NRC Hashtag Affirmative Context Sentiment Lexicon and NRC Hashtag Negated Context Sentiment Lexicon: associations of ~45,000 single English words and ~185,000 two-word phrases with sentiment automatically generated from 775,000 tweets with sentiment-word hashtags collected by NRC. The word-sentiment association scores are calculated using the following formula: score = PMI(w, pos) - PMI(w, neg), where PMI stands for Point-wise Mutual Information between a word w and the positive class (tweets with positive hashtags) or the negative class (tweets with negative hashtags). The word-sentiment association scores are calculated separately for words appearing in affirmative contexts and for words appearing in negated contexts. [download]

These lexicons were an integral part of our sentiment analysis system that was ranked first in both subtasks of the SemEval-2014 Sentiment Analysis in Twitter shared task (Task 9).
Kiritchenko, S., Zhu, X., Mohammad, S. (2014). Sentiment Analysis of Short Informal Texts. Journal of Artificial Intelligence Research, 50:723-762, 2014 [pdf]

Zhu, X., Kiritchenko, S., and Mohammad, S. (2014) NRC-Canada-2014: Recent Improvements in the Sentiment Analysis of Tweets. Proceedings of the 8th International Workshop on Semantic Evaluation Exercises (SemEval-2014), Dublin, Ireland, 2014 [pdf]
NRC Yelp Restaurant Sentiment Lexicons: associations of ~40,000 single English words and ~275,000 two-word phrases with sentiment automatically generated from 185,000 customer reviews for food-related businesses from the Yelp Phoenix Academic Dataset. The word-sentiment association scores are calculated using the following formula: score = PMI(w, pos) - PMI(w, neg), where PMI stands for Point-wise Mutual Information between a word w and the positive class (four- and five-star reviews) or the negative class (one- and two-star reviews). The word-sentiment association scores are calculated separately for words appearing in affirmative contexts and for words appearing in negated contexts. [download]

These lexicons were an integral part of our aspect-based sentiment analysis system that was ranked first in both sentiment subtasks of the SemEval-2014 Aspect Based Sentiment Analysis shared task (Task 4).
Kiritchenko, S., Zhu, X., Cherry, C., and Mohammad, S. (2014) NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews. Proceedings of the 8th International Workshop on Semantic Evaluation Exercises (SemEval-2014), Dublin, Ireland, 2014 [pdf]
NRC Amazon Laptop Sentiment Lexicons: associations of ~25,000 single English words and ~155,000 two-word phrases with sentiment automatically generated from 125,000 customer reviews on electronic products collected from Amazon.com by McAuley and Leskovec (Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 165-172.). Only reviews that mention 'laptop' or 'notebook' were used. The word-sentiment association scores are calculated using the following formula: score = PMI(w, pos) - PMI(w, neg), where PMI stands for Point-wise Mutual Information between a word w and the positive class (four- and five-star reviews) or the negative class (one- and two-star reviews). The word-sentiment association scores are calculated separately for words appearing in affirmative contexts and for words appearing in negated contexts. [download]

These lexicons were an integral part of our aspect-based sentiment analysis system that was ranked first in both sentiment subtasks of the SemEval-2014 Aspect Based Sentiment Analysis shared task (Task 4).
Kiritchenko, S., Zhu, X., Cherry, C., and Mohammad, S. (2014) NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews. Proceedings of the 8th International Workshop on Semantic Evaluation Exercises (SemEval-2014), Dublin, Ireland, 2014 [pdf]
NRC Yelp Word-Aspect Association Lexicons: associations of single English words with the aspect categories of food, price, service, ambiance, and anecdotes automatically generated from 185,000 customer reviews for food-related businesses from the Yelp Phoenix Academic Dataset. Each sentence of the reviews was labeled with zero, one, or more of the five aspect categories by our aspect category classification system (described in the corresponding paper). Then, for each term w and each category c an association score is calculated as follows: score(w,c) = PMI(w,c) - PMI(w,not c), where PMI stands for Point-wise Mutual Information between a word w and the positive class (sentences automatically labeled with the corresponding aspect category c) or the negative class (sentences automatically labeled as not belonging to the corresponding aspect category c). [download]

These lexicons were an integral part of our aspect category detection system that was ranked first in the SemEval-2014 Aspect Based Sentiment Analysis shared task (Task 4).
Kiritchenko, S., Zhu, X., Cherry, C., and Mohammad, S. (2014) NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews. Proceedings of the 8th International Workshop on Semantic Evaluation Exercises (SemEval-2014), Dublin, Ireland, 2014 [pdf]

Data Manually Annotated for Sentiment and Emotion

SemEval-2018 Shared Task on Affect in Tweets: ~22,000 tweets in English, Arabic, and Spanish annotated for emotion and sentiment. The data is annotated for coarse classes (i.e., no anger, low anger, moderate anger, high anger) as well as for fine-grained real-valued scores indicating the intensity of emotion. The annotations were done manually via crowdsourcing using the Best-Worst Scaling method.
Mohammad, S., Bravo-Marquez, F., Salameh, M., and Kiritchenko, S. (2018). Semeval-2018 Task 1: Affect in tweets. In Proceedings of the International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA, June 2018 [pdf]

Mohammad, S. and Kiritchenko, S. (2018). Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories. In Proceedings of the 11th edition of the Language Resources and Evaluation Conference, May 2018, Miyazaki, Japan [pdf]
SemEval-2016 Shared Task on Stance Detection in Tweets: 4870 English tweets annotated for stance towards six commonly known targets in the United States ('Atheism', 'Climate Change is a Real Concern', 'Feminist Movement', 'Hillary Clinton', 'Legalization of Abortion', and 'Donald Trump'). The data were also annotated for 'target of opinion' and sentiment. The annotations were done manually via crowdsourcing.
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., and Cherry, C. (2016) SemEval-2016 Task 6: Detecting Stance in Tweets. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2016), San Diego, California, 2016 [pdf] [interactive visualization]

Mohammad, S., Sobhani, P., Kiritchenko, S. (2017). Stance and Sentiment in Tweets. ACM Transactions on Internet Technology, 17(3), 2017 [pdf] [interactive visualization]

Svetlana Kiritchenko, Ph.D.

National Research Council Canada