14 Dictionary-Based Word Counts

In Chapter 13, we transformed our corpus into a DFM with counts of each word in each document. But not all words are created equal; some words are much more psychologically interesting than others. The simplest way to count relevant words while ignoring others is by using a dictionary.

This chapter introduces the basics of dictionary-based methodology. Chapter 15 and Chapter 16 will build on this chapter, exploring more advanced ways to use token counting for measurement.

14.1 Dictionaries

A dictionary is a list of words (or other tokens) associated with a given psychological or other construct. For example, a dictionary for depression might include words like “sleepy” and “down.” We can use the dictionary to count construct-related words in each text—texts that use more construct-related words are then assumed to be more construct-related overall.
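As a minimal sketch of what dictionary counting involves, the following base R code counts dictionary words in two made-up sentences. The word list, the texts, and the helper tokenize_simple() are all invented for illustration; they are not taken from any published dictionary or package.

```r
# Toy dictionary and texts (both invented for illustration)
depression_words <- c("sleepy", "down", "tired")
texts <- c(
  doc1 = "I feel down and sleepy today",
  doc2 = "We went down to the beach and had a great time"
)

# Hypothetical helper: lowercase, then split on anything
# that is not a letter or an apostrophe
tokenize_simple <- function(x) {
  strsplit(tolower(x), "[^a-z']+")
}

toks <- tokenize_simple(texts)

# Count how many tokens in each document appear in the dictionary
dict_counts <- vapply(
  toks,
  function(tk) sum(tk %in% depression_words),
  integer(1)
)
dict_counts
```

Both documents match the dictionary, even though only the first is about low mood; the second uses “down” as a direction. Section 14.2.2 returns to this problem.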

Let’s give a more concrete example: Recall that in the Hippocorpus data, the memType variable indicates whether the participant was told to tell a story that happened to them recently (“recalled”), a story that they had already told a few months earlier (“retold”), or an entirely fictional story (“imagined”).

Sap et al. (2022) hypothesized that true autobiographical stories would include more surprising events than imagined stories. To test this hypothesis, we could use a dictionary of surprise-related words. Where could we find such a dictionary? Perhaps we could try making one up?

surprise_dict <- dictionary(
    list(
      surprise = c("surprise", "wow", "suddenly", "bang")
    )
  )
surprise_dict

#> Dictionary object with 1 key entry.
#> - [surprise]:
#>   - surprise, wow, suddenly, bang

Generating a sentiment dictionary is not easy. Luckily, other researchers have done the work for us: The NRC Word-Emotion Association Lexicon (S. M. Mohammad & Turney, 2013; S. Mohammad & Turney, 2010), included in the quanteda.sentiment package, has a list of 534 surprise words.

surprise_dict <- quanteda.sentiment::data_dictionary_NRC["surprise"]
surprise_dict

#> Dictionary object with 1 key entry.
#> Polarities: pos = "positive"; neg = "negative" 
#> - [surprise]:
#>   - abandonment, abduction, abrupt, accident, accidental, accidentally, accolade, advance, affront, aghast, alarm, alarming, alertness, alerts, allure, amaze, amazingly, ambush, angel, anomaly [ ... and 514 more ]

The NRC Word-Emotion Association Lexicon is a crowdsourced dictionary; S. M. Mohammad & Turney (2013) generated it by presenting individual words to thousands of online participants and asking them to rate how much each word is “associated with the emotion surprise.” The final dictionary includes all the words that were consistently reported to be at least moderately associated with surprise.

14.2 Understand Your Dictionary

In Chapter 11, we emphasized the importance of reading through your data before conducting any analyses. The same is true for dictionaries: Before using any dictionary-based methods, always look through your dictionary and ask yourself two questions:

How was my dictionary constructed?
How context-dependent are the words in my dictionary?

Let’s expand on each of these questions.

14.2.1 How Was Your Dictionary Constructed?

The surprise dictionary we are using was generated by asking participants how much each word was “associated with the emotion surprise” (S. M. Mohammad & Turney, 2013). A word can be “associated with” surprise because it reflects surprise (e.g. “suddenly”), but it can also be “associated with” surprise because it reflects the exact opposite of surprise. Indeed, if we look through the dictionary, we find words like “leisure” and “lovely”.

set.seed(8)
sample(surprise_dict$surprise, 20)

#>  [1] "outburst"    "godsend"     "alarming"    "intense"     "lawsuit"    
#>  [6] "leisure"     "scrimmage"   "curiosity"   "reappear"    "placard"    
#> [11] "diversion"   "receiving"   "thirst"      "lovely"      "frenetic"   
#> [16] "perfection"  "playground"  "fearfully"   "guess"       "unfulfilled"

This means that we are not, in fact, measuring how surprising each story is. At best, we are measuring how much each story deals with surprise (or lack thereof) one way or another.

As you look through your dictionary, make sure you are aware of the process used to construct the dictionary. If it was generated by asking participants about individual words, how was the question formulated? How might that question have been interpreted by the participants?

14.2.2 How Context-Dependent are the Words in Your Dictionary?

The participants generating our dictionary were asked about one word at a time. People presented with words out of context often fail to consider how those words are actually used in natural discourse. For example, imagine that you are an online participant, and you are asked about your associations with the word “guess”. Seeing “guess” by itself might sound like an imperative, calling to mind a situation in which someone is asking you to guess something about which you are unsure, perhaps on a game show. Since this sort of situation generally results in a surprise when the truth is revealed, you report that “guess” is associated with surprise. In fact, though, “guess” is much more frequently used in the phrase “I guess”, which signifies reluctance and has very little to do with surprise. We can check how “guess” is used in our corpus by using Quanteda’s kwic() function, which gives a dataframe of Key Words In Context (KWIC).

hippocorpus_tokens |> 
  kwic("guess") |> 
  mutate(text = paste(pre, keyword, post)) |> 
  pull(text)

#> [1] "his 30th birthday and I guess that's why he decided to"            
#> [2] "healthier after a month I guess it was the stress of"              
#> [3] "already made cake So i guess it wasn't that bad"                   
#> [4] "wrong Was she serious I guess so When I finished packing"          
#> [5] "up our unit And I guess that's it I never saw"                     
#> [6] "I'm not sure yet I guess I will see how the"                       
#> [7] "FINALLY got admitted D I guess all those crazy contractions worked"
#> [8] "we made it safely I guess even the car got tired"

With the possible exception of #6, none of these examples give the impression of an impending surprise. Nevertheless, “guess” does appear in the NRC surprise dictionary.

As you look through your dictionary, think about how each word might really be used in context. Are there ways to use the word that do not have to do with your construct?
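One rough way to quantify this kind of check is to ask how often a word occurs in a particular phrase. The sketch below does this in base R for a made-up token vector; the tokens and the helper share_preceded_by() are hypothetical and only illustrate the idea.

```r
# Made-up token sequence (not from the Hippocorpus data)
toks <- c("well", "i", "guess", "so", "can", "you", "guess",
          "the", "answer", "i", "guess", "not")

# Hypothetical helper: share of occurrences of `target`
# that directly follow the token `before`
share_preceded_by <- function(tokens, target, before) {
  hits <- which(tokens == target)
  if (length(hits) == 0) return(NA_real_)
  # A hit in first position has no preceding token
  prev <- ifelse(hits > 1, tokens[pmax(hits - 1, 1)], NA_character_)
  mean(!is.na(prev) & prev == before)
}

guess_share <- share_preceded_by(toks, "guess", "i")
guess_share
```

If most occurrences of a dictionary word fall inside a phrase that has little to do with the construct, that is a reason to consider dropping the word or treating its counts with caution.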

14.3 Raw Word Counts

At this point, you might be pretty skeptical about using the NRC surprise dictionary to measure surprise. Even so, let’s try it out. To count how many times surprise words appear in each of our texts, we use the dfm_lookup() function.

hippocorpus_surprise <- hippocorpus_dfm |> 
  dfm_lookup(surprise_dict)

hippocorpus_surprise

#> Document-feature matrix of: 6,854 documents, 1 feature (5.09% sparse) and 6 docvars.
#>                                 features
#> docs                             surprise
#>   32RIADZISTQWI5XIVG5BN0VMYFRS4U        2
#>   3018Q3ZVOJCZJFDMPSFXATCQ4DARA2        0
#>   3IRIK4HM3B6UQBC0HI8Q5TBJZLEC61        4
#>   3018Q3ZVOJCZJFDMPSFXATCQG04RAI        3
#>   3MTMREQS4W44RBU8OMP3XSK8NMJAWZ        4
#>   3018Q3ZVOJCZJFDMPSFXATCQG06AR3        6
#> [ reached max_ndoc ... 6,848 more documents ]

14.3.1 Modeling Raw Word Counts

Recall that we wanted to test whether true autobiographical stories include more surprise than imagined stories. Now that we have counted the number of surprise words in each document, how do we test our hypothesis?

A good first step is to reattach the word counts to our original corpus. As we do this, we convert both to dataframes.

hippocorpus_surprise_df <- hippocorpus_surprise |> 
  convert("data.frame") |> # convert to dataframe
  right_join(
    hippocorpus_corp |> 
      convert("data.frame"), # convert to dataframe
    by = "doc_id" # join on the shared document ID column
    )

It makes sense to control for the total number of words in each text, since longer texts have more opportunities to use surprise words¹. To count the total number of tokens in each text, we can use the ntoken() function on our DFM and add the result directly to the new dataframe.

hippocorpus_surprise_df <- hippocorpus_surprise_df |> 
  mutate(wc = ntoken(hippocorpus_dfm))

We are now ready for modeling! When your dependent variable is a count of words, we recommend using negative binomial regression, available in R with the MASS package². For extra sensitivity to the variable rates at which word frequencies grow with text length (see Baayen, 2001), we include wc as both a predictor and an offset, offset(log(wc)) (an offset is just a predictor with its coefficient fixed at 1). We take the log() because negative binomial regression links the predictors to the outcome variable through a log link. This means that including offset(log(wc)) is equivalent to modeling the ratio of surprise words to total words (for a more detailed explanation of this dynamic, see the discussion here).

surprise_mod <- MASS::glm.nb(surprise ~ memType + wc + offset(log(wc)),
                             data = hippocorpus_surprise_df)
summary(surprise_mod)

#> 
#> Call:
#> MASS::glm.nb(formula = surprise ~ memType + wc + offset(log(wc)), 
#>     data = hippocorpus_surprise_df, init.theta = 6.070929358, 
#>     link = log)
#> 
#> Coefficients:
#>                   Estimate Std. Error  z value Pr(>|z|)    
#> (Intercept)     -3.9065113  0.0258623 -151.050  < 2e-16 ***
#> memTyperecalled -0.0324360  0.0176595   -1.837  0.06625 .  
#> memTyperetold   -0.0614152  0.0219399   -2.799  0.00512 ** 
#> wc              -0.0008833  0.0000876  -10.082  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Negative Binomial(6.0709) family taken to be 1)
#> 
#>     Null deviance: 7490.2  on 6853  degrees of freedom
#> Residual deviance: 7370.5  on 6850  degrees of freedom
#> AIC: 30997
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#> 
#>               Theta:  6.071 
#>           Std. Err.:  0.270 
#> 
#>  2 x log-likelihood:  -30987.333

Looking at the p-values for the coefficients, we see that there was no significant difference between recalled and imagined stories (p = 0.066). There was, however, a significant difference between retold and imagined stories, such that retold stories used fewer surprise words (p = 0.005).
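To see why a log offset amounts to modeling a rate, consider the simplest possible case. The sketch below uses made-up counts and swaps in Poisson regression (base R's glm()) for negative binomial regression, because with Poisson the equivalence is exact: in an intercept-only model with offset(log(wc)), the exponentiated intercept equals the pooled ratio of surprise words to total words.

```r
# Made-up counts for six documents (not the Hippocorpus data)
toy <- data.frame(
  surprise = c(2, 0, 4, 3, 4, 6),
  wc       = c(150, 80, 310, 240, 260, 400)
)

# Intercept-only Poisson model with a log word-count offset.
# Poisson is used here instead of negative binomial only
# because it makes the identity below exact.
rate_mod <- glm(surprise ~ 1 + offset(log(wc)),
                family = poisson, data = toy)

# Exponentiate the intercept to return to the rate scale
estimated_rate <- unname(exp(coef(rate_mod)))

# Pooled proportion of surprise words across all documents
pooled_rate <- sum(toy$surprise) / sum(toy$wc)

c(estimated = estimated_rate, pooled = pooled_rate)
```

The same log scale applies to the coefficients reported above: exp(-0.061) ≈ 0.94, so the retold coefficient corresponds to roughly 6% fewer surprise words per token than in imagined stories, holding the other predictors constant.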

An example of using raw word counts in research: Simchon et al. (2023) collected Twitter activity over a three-month period from over 2.7 million users. Using a dictionary, they then counted the number of passive auxiliary verbs (e.g. “they were analyzed”; “my homework will be completed”) in each user’s activity. They found that users with more followers (indicating higher social status) used far fewer passive auxiliary verbs, controlling for total word count.

14.4 Polarity

How can we improve our measurement of surprise? As we saw above, one problem with the dictionary approach is that a word might be associated with a construct because it reflects the opposite of that construct. One solution to this problem is to measure the ratio between the target dictionary and its opposite. In sentiment analysis, this approach is called polarity. Polarity is most commonly used to analyze the overall valence of a text by comparing positive words (e.g. “happy”, “great”) with negative words (e.g. “disappointed”, “terrible”). In principle though, we can use it to compare any sort of opposites.

What is the opposite of surprise? Plutchik (1962) argues that the opposite of surprise is anticipation. Luckily, the NRC Word-Emotion Association Lexicon also includes a dictionary of anticipation-associated words. Using this dictionary, we can measure how much a text is associated with surprise as opposed to anticipation.

Quanteda’s built-in function for polarity is textstat_polarity(). To use this function, we first have to set the “positive” and “negative” polarities of the dictionary, and then call textstat_polarity() on our DFM. By default, this outputs the log ratio of positive to negative counts for each document:

library(quanteda.sentiment)

#> 
#> Attaching package: 'quanteda.sentiment'

#> The following object is masked from 'package:quanteda':
#> 
#>     data_dictionary_LSD2015

# subset dictionary
surprise_anticipation_dict <- data_dictionary_NRC[c("surprise", "anticipation")]

# set surprise and anticipation as polarity
polarity(surprise_anticipation_dict) <- list(pos = "surprise", neg = "anticipation")

# get polarity
hippocorpus_surprise_polarity <- 
  textstat_polarity(hippocorpus_dfm, surprise_anticipation_dict) |> 
  rename(surprise_vs_anticipation = sentiment)

While textstat_polarity() can sometimes be useful for visualizations or downstream analyses, it is not helpful for modeling polarity as an outcome variable.
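For intuition, a smoothed log ratio can be computed by hand. The function below is a hypothetical helper, not quanteda.sentiment's own code, and the smoothing constant of 0.5 is an assumption chosen so that documents with zero counts still give a finite value.

```r
# Hypothetical helper: smoothed log ratio of positive to negative counts.
# `smooth` is an assumed constant that keeps the ratio finite
# when either count is zero.
log_ratio_polarity <- function(pos, neg, smooth = 0.5) {
  log((pos + smooth) / (neg + smooth))
}

# Made-up surprise (pos) and anticipation (neg) counts for three documents
polarity_scores <- log_ratio_polarity(pos = c(4, 0, 2), neg = c(4, 3, 0))
polarity_scores
```

Equal counts give a score of zero, more anticipation than surprise gives a negative score, and more surprise than anticipation gives a positive score.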

14.4.1 Modeling Polarity

To test whether true autobiographical stories include more surprise relative to anticipation than imagined stories, we first count the surprise and anticipation words in each document, and rejoin the results to the full dataset.

# count surprise/anticipation words
hippocorpus_surprise_anticipation <- hippocorpus_dfm |> 
  dfm_lookup(surprise_anticipation_dict)

# convert to dataframe and join to full data
hippocorpus_surprise_anticipation_df <- 
  hippocorpus_surprise_anticipation |> 
  convert("data.frame") |> 
  right_join(
    hippocorpus_corp |> 
      convert("data.frame"), # convert to dataframe
    by = "doc_id"
    ) |> 
  mutate(wc = ntoken(hippocorpus_dfm))

Since we are still modeling word counts as an outcome, we again use negative binomial regression. Rather than controlling for the total word count, however, we can control for the total number of surprise words plus the number of anticipation words. Because of the log link function (along with the endlessly useful properties of logarithms), entering this sum as a log offset (offset(log(surprise + anticipation))) is equivalent to modeling the proportion of surprise words among all surprise and anticipation words, which rises and falls together with the ratio of surprise-related to anticipation-related words.

# remove documents with no surprise or anticipation words, since log(0) is -Inf
hippocorpus_surprise_anticipation_df <- 
  hippocorpus_surprise_anticipation_df |> 
    filter(surprise + anticipation > 0)

set.seed(2024)
surprise_anticipation_mod <- MASS::glm.nb(
  surprise ~ memType + wc + offset(log(surprise + anticipation)),
  data = hippocorpus_surprise_anticipation_df,
  # increase iterations to ensure model converges
  control = glm.control(maxit = 10000) 
  )

summary(surprise_anticipation_mod)

#> 
#> Call:
#> MASS::glm.nb(formula = surprise ~ memType + wc + offset(log(surprise + 
#>     anticipation)), data = hippocorpus_surprise_anticipation_df, 
#>     control = glm.control(maxit = 10000), init.theta = 2.949221746e+17, 
#>     link = log)
#> 
#> Coefficients:
#>                   Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)     -1.107e+00  1.990e-02 -55.659   <2e-16 ***
#> memTyperecalled -1.128e-02  1.356e-02  -0.831    0.406    
#> memTyperetold   -1.966e-02  1.697e-02  -1.158    0.247    
#> wc              -5.675e-05  6.462e-05  -0.878    0.380    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Negative Binomial(2.949222e+17) family taken to be 1)
#> 
#>     Null deviance: 4884.6  on 6843  degrees of freedom
#> Residual deviance: 4882.1  on 6840  degrees of freedom
#> AIC: 10
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#> 
#>               Theta:  2.949222e+17 
#>           Std. Err.:  6.158994e+14 
#> 
#>  2 x log-likelihood:  0

There is no significant difference between true and imagined stories in the ratio of surprise to anticipation words.
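If the quantity of interest is the surprise-to-anticipation ratio itself, one alternative (not the approach used above) is binomial logistic regression on the two counts, which models the log odds that a dictionary word is a surprise word rather than an anticipation word. The sketch below uses made-up counts; with a single grouping factor, the exponentiated intercept reproduces the pooled surprise-to-anticipation ratio of the reference group.

```r
# Made-up counts for six documents (not the Hippocorpus data)
toy <- data.frame(
  surprise     = c(3, 1, 4, 2, 5, 2),
  anticipation = c(6, 4, 9, 5, 8, 7),
  memType      = factor(rep(c("imagined", "recalled", "retold"), each = 2))
)

# Binomial model: "successes" are surprise words,
# "failures" are anticipation words
odds_mod <- glm(cbind(surprise, anticipation) ~ memType,
                family = binomial, data = toy)

# exp(intercept) is the surprise:anticipation ratio
# in the reference group ("imagined")
ref_ratio <- unname(exp(coef(odds_mod)["(Intercept)"]))

# Pooled ratio computed directly from the counts
pooled_ref <- with(
  subset(toy, memType == "imagined"),
  sum(surprise) / sum(anticipation)
)

c(model = ref_ratio, pooled = pooled_ref)
```

The remaining coefficients are log odds ratios relative to imagined stories, so exponentiating them gives the factor by which the surprise-to-anticipation ratio changes in each group.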

14.5 Lexical Norms

So far we have covered raw word counts, which use one list of words to represent a construct, and we have covered polarities, which use two lists of words to represent a construct and its opposite. The third and final dictionary-based method takes a more nuanced approach than either of these: In lexical norms, words are allowed to represent the construct or its opposite to continuously varying degrees, represented by numbers on a scale. In quanteda.sentiment, this scale is called “valence”, though elsewhere it can be called “lexical affinity” or “lexical association”.

The same group that created the NRC Word-Emotion Association Lexicon also created a parallel dictionary with continuous scores: the NRC Hashtag Emotion Lexicon (S. M. Mohammad & Kiritchenko, 2015). Whereas the NRC Word-Emotion Association Lexicon was crowdsourced, the NRC Hashtag Emotion Lexicon was generated algorithmically from a corpus of Twitter posts which contained hashtags like “#anger” and “#surprise”. The dictionary includes the words that were most predictive of each hashtag, with scores indicating the strength of their statistical connection with the category (a higher score indicates a more representative word). We can access the NRC Hashtag surprise dictionary from GitHub:

path <- "https://raw.githubusercontent.com/bwang482/emotionannotate/master/lexicons/NRC-Hashtag-Emotion-Lexicon-v0.2.txt"

hashtag <- read_tsv(path, col_names = c("emotion", "token", "score"))

#> Rows: 32389 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (2): emotion, token
#> dbl (1): score
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

hashtag |> 
  filter(emotion == "surprise") |> 
  head()

#> # A tibble: 6 × 3
#>   emotion  token         score
#>   <chr>    <chr>         <dbl>
#> 1 surprise yada           1.49
#> 2 surprise #preoccupied   1.49
#> 3 surprise jaden          1.49
#> 4 surprise #easilyamused  1.49
#> 5 surprise #needtofocus   1.49
#> 6 surprise #amazement     1.49

# Create dictionary
surprise_dict_hashtag <- dictionary(
  list(surprise = hashtag$token[hashtag$emotion == "surprise"])
)

# Set dictionary valence
valence(surprise_dict_hashtag) <- list(
  surprise = hashtag$score[hashtag$emotion == "surprise"]
  )

To measure surprise in the Hippocorpus data, we find the surprise score of each token and compute the average score for the tokens of each document. With quanteda.sentiment, we can do this by calling the textstat_valence() function on our DFM. Since a score of zero in the NRC Hashtag Emotion Lexicon represents zero surprise, we will add normalization = "all" to code non-dictionary words as zero by default.

# compute valence
hippocorpus_valence <- textstat_valence(
  hippocorpus_dfm, # data
  surprise_dict_hashtag, # dictionary
  normalization = "all"
  )

# rejoin to original data
hippocorpus_valence <- hippocorpus_valence |> 
  rename(surprise = sentiment) |> 
  right_join(
    hippocorpus_corp |> 
      convert("data.frame"), # convert to dataframe
    by = "doc_id" # join on the shared document ID column
    )
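To make the averaging concrete, the sketch below computes a norm score by hand for one made-up document, once averaging over all tokens (non-dictionary tokens scored as zero) and once averaging over dictionary tokens only. The scores and tokens are invented, and the code is only meant to illustrate the two kinds of normalization, not to reproduce textstat_valence() exactly.

```r
# Invented norm scores (not from the NRC Hashtag Emotion Lexicon)
norms <- c(amazing = 1.2, sudden = 0.9, calm = 0.1)

# One made-up tokenized document
toks <- c("the", "sudden", "storm", "was", "amazing")

# Look up each token; tokens missing from the dictionary come back as NA
scores <- unname(norms[toks])

# Average over all tokens, scoring non-dictionary tokens as zero
scores[is.na(scores)] <- 0
mean_all <- mean(scores)

# Average over dictionary tokens only
mean_dict_only <- mean(norms[toks[toks %in% names(norms)]])

c(all_tokens = mean_all, dictionary_only = mean_dict_only)
```

Averaging over all tokens gives a lower score here because the three non-dictionary tokens pull the mean toward zero, which is appropriate only when zero really means "none of the construct."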

14.5.1 Modeling Norms

Norm scores, unlike raw word counts and polarities, can be reasonably modeled using standard linear regression. Furthermore, because the score is an average rather than a sum or count, there is no need to control for total word count. Let’s test one more time whether true autobiographical stories include more surprise-related language than imagined stories:

surprise_score_mod <- lm(surprise ~ memType, hippocorpus_valence)

summary(surprise_score_mod)

#> 
#> Call:
#> lm(formula = surprise ~ memType, data = hippocorpus_valence)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -0.085708 -0.015726 -0.000448  0.015093  0.104459 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     0.1402018  0.0004433 316.300  < 2e-16 ***
#> memTyperecalled 0.0029688  0.0006256   4.746 2.12e-06 ***
#> memTyperetold   0.0021648  0.0007791   2.779  0.00548 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.02327 on 6851 degrees of freedom
#> Multiple R-squared:  0.003406,   Adjusted R-squared:  0.003116 
#> F-statistic: 11.71 on 2 and 6851 DF,  p-value: 8.388e-06

We found a significant difference between recalled and imagined stories (p < .001), such that recalled stories have more surprise-related language! This supports Sap et al.’s hypothesis that true autobiographical stories would include more surprising events than imagined stories. The new model also indicated a significant difference between retold and imagined stories (p = 0.005), such that retold stories used more surprise-related language. This is the opposite direction relative to our original finding with the crowdsourced dictionary.

14.6 Sources of Dictionaries

So far we have seen the NRC Word-Emotion Association Lexicon, which used a crowdsourcing approach to generate the dictionary, and the NRC Hashtag Emotion Lexicon, which used a corpus-based approach, relying on hashtags for labeling. Crowdsourcing and algorithmic corpus-based generation are far from the only ways to generate a dictionary. Here we review various types of dictionaries and where to find them.

14.6.1 Crowdsourced Dictionaries

Besides the surprise dictionary, the NRC Word-Emotion Association Lexicon includes dictionaries for anger, fear, anticipation, trust, sadness, joy, and disgust. The same group has also produced other crowdsourced emotion dictionaries:

NRC VAD (S. M. Mohammad, 2018a) contains 20,007 words with ratings between 0 and 1 for valence, arousal and dominance.
NRC Affect Intensity (S. M. Mohammad, 2018b) contains 4192 words with ratings between 0 and 1 for anger, fear, sadness and joy.

Psychologists have used crowdsourcing questionnaires to create dictionaries (especially norms) for decades. As such, crowdsourced dictionaries exist for many psychologically interesting constructs:

Brysbaert et al. (2014) used an internet questionnaire to obtain norms for concreteness (i.e. the extent to which a word refers to a perceptible entity). The result, including nearly 40,000 words and 2-grams, is available as an Excel file here.
Kuperman et al. (2012) asked participants at what age they learned each word, resulting in age-of-acquisition norms for 30,000 English words.
Warriner et al. (2013) crowdsourced norms for valence, arousal, and dominance, expanding on the ANEW dictionary included in quanteda.sentiment. The expanded norms are available as a zip file here.
Stadthagen-Gonzalez & Davis (2006) collected norms for age-of-acquisition, familiarity, and imageability (the ease with which a word evokes mental images) by surveying undergraduates.
Diveica et al. (2023) asked online participants to rate the social relevance of words. The resulting “socialness” norms are available here.

14.6.2 Expert-Generated Dictionaries

Words are used in many contexts, sometimes with many possible meanings. To take these into account, some groups rely on experts to generate their dictionaries. By far the most prominent collection of expert-generated dictionaries is LIWC (pronounced “Luke”), which includes word lists for grammatical patterns, emotional content, cognitive processes, and more. With its rigorous approach, LIWC has dominated the field of dictionary-based analysis in psychology for decades. The most recent version of LIWC (Boyd et al., 2022) was generated by a team of experts who went through numerous stages of brainstorming, voting, and reliability analysis before arriving at the final word lists.

14.6.3 Corpus-Based Dictionaries

Human raters are much better at judging full texts than individual words. Corpus-based dictionaries take advantage of this by extracting their word lists from corpora of full texts that have been rated by humans. We have already seen the NRC Hashtag Emotion Lexicon (S. M. Mohammad & Kiritchenko, 2015), which used Twitter hashtags to gather a corpus of tweets labeled with emotions by their original authors. A more classic example of corpus-based dictionary generation is Rao et al. (2014), who used a corpus of 1,246 news headlines, each rated manually for anger, disgust, fear, joy, sadness, and surprise on a scale from 0 to 100 (Strapparava & Mihalcea, 2007). By correlating these ratings with frequencies of words (see Chapter 15), they extracted the words that were most representative of high ratings in each category. Araque et al. (2018) used a similar technique to create DepecheMood, which includes ratings for each word on eight emotional dimensions: afraid, amused, angry, annoyed, don’t care, happy, inspired, and sad. This base dictionary was updated with additional resources by Badaro et al. (2018) to create EmoWordNet, which can be accessed through the Internet Archive.

Many statistical techniques have been used to extract dictionaries from labeled corpora, some of which will be covered briefly in Chapter 15 and Chapter 18 of this book. For a recent review of methods, see Bandhakavi et al. (2021).
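The general idea of correlating human ratings with word frequencies can be sketched in a few lines of base R. The counts and ratings below are invented, and this is a simplified illustration of the idea rather than the procedure used by any of the papers cited above.

```r
# Made-up document-term counts for five headlines (rows) and three words
dtm <- cbind(
  shock  = c(2, 0, 1, 0, 3),
  win    = c(0, 1, 0, 1, 0),
  report = c(1, 1, 1, 2, 1)
)

# Made-up human surprise ratings for the same five headlines (0 to 100)
ratings <- c(80, 10, 50, 5, 90)

# Correlate each word's counts with the ratings
word_cor <- apply(dtm, 2, cor, y = ratings)

# Words with the strongest positive correlation are dictionary candidates
sort(word_cor, decreasing = TRUE)
top_word <- names(which.max(word_cor))
top_word
```

In a real application the correlations would be computed over thousands of texts, and some cutoff or weighting scheme would decide which words enter the dictionary and with what score.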

14.6.4 Other Approaches to Dictionary Generation

Thesaurus Mining: Strapparava & Valitutti (2004) started with a short list of strongly affect-related words (e.g. “anger”, “doubt”, “cry”), and used WordNet, a database of conceptual relations between words, to find close synonyms of the original words on the list. The result was WordNet Affect. Strapparava & Mihalcea (2007) used WordNet Affect to generate short lists of words associated with anger, disgust, fear, joy, sadness, and surprise, downloadable from here.
Decontextualized Embeddings: In Chapter 18, we will cover a family of methods for measuring the similarities between words based on how frequently they appear together in text: decontextualized embeddings. These methods can be used on their own for measuring psychological constructs, but they can also be used as a tool for building dictionaries. For example, Buechel et al. (2020) started with a small seed lexicon and used word embeddings (Section 18.3) to find other words that are likely to appear in texts of the same topic. The result, which includes dictionaries for valence, arousal, dominance, joy, anger, sadness, fear, and disgust, is available for download online.
Combined Methods: Vegt et al. (2021) used a combination of expert input, thesaurus data from WordNet, word embeddings (Section 18.3), and crowdsourcing from online participants to generate norms for numerous constructs associated with grievance-fueled violence (e.g. desperation, fixation, frustration, hate, weapons). The final product is available here.

Advantages of Dictionary-Based Word Counts

Efficient Processing: Counting is a simple operation for computers. For very large datasets, this can make a big difference.
Easy to Interpret: Dictionaries for sentiment analysis are usually not more than a few hundred words long. This means that they are easy to read through and understand intuitively. The intuitive appeal is also good for explaining your research to others—“we counted the number of anger-related words” is a method that any non-expert can understand.

Disadvantages of Dictionary-Based Word Counts

No Context: Dictionary-based word counts treat texts as bags of words. This means they entirely ignore word order (aside from the order of any n-grams that might be included in the dictionary).
May Reflect Various Constructs: Dictionaries are often generated by asking participants to identify associations with words. These associations do not necessarily reflect the construct in which the researcher is interested.
Unnuanced: Words are either in a dictionary or they are not. Raw counts carry no nuance about the varying degrees to which different words may reflect the construct of interest. Norms can fix this problem, but are not available for many psychological dimensions.
Unnaturalistic Generation Process: Dictionaries are generally crowdsourced by asking participants to report their associations with individual words. People presented with words out of context often fail to consider how words are actually used in natural discourse.
Limited Dictionaries Available: Dictionaries are expensive and labor intensive to produce. Researchers are generally reliant on dictionaries already produced by other teams, which may not reflect the construct of interest precisely.

Araque, O., Gatti, L., Staiano, J., & Guerini, M. (2018). DepecheMood++: A bilingual emotion lexicon built through simple yet powerful techniques. CoRR, abs/1810.03660. http://arxiv.org/abs/1810.03660

Baayen, R. H. (2001). Word frequency distributions. Springer Netherlands. https://link.springer.com/book/10.1007/978-94-010-0844-0

Badaro, G., Jundi, H., Hajj, H., & El-Hajj, W. (2018). EmoWordNet: Automatic expansion of emotion lexicon using English WordNet. In M. Nissim, J. Berant, & A. Lenci (Eds.), Proceedings of the seventh joint conference on lexical and computational semantics (pp. 86–93). Association for Computational Linguistics. https://doi.org/10.18653/v1/S18-2009

Bandhakavi, A., Wiratunga, N., Massie, S., & P., D. (2021). Emotion‐aware polarity lexicons for twitter sentiment analysis. Expert Systems, 38(7).

Boyd, R., Ashokkumar, A., Seraj, S., & Pennebaker, J. (2022). The development and psychometric properties of LIWC-22. https://doi.org/10.13140/RG.2.2.23890.43205

Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known english word lemmas. Behavior Research Methods, 46, 904–911.

Buechel, S., Rücker, S., & Hahn, U. (2020). Learning and evaluating emotion lexicons for 91 languages. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 1202–1217). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.112

Diveica, V., Pexman, P. M., & Binney, R. J. (2023). Quantifying social semantics: An inclusive definition of socialness and ratings for 8388 english words. Behavior Research Methods, 55(2), 461–473.

Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 english words. Behavior Research Methods, 44, 978–990.

Mohammad, S. M. (2018a). Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 english words. Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).

Mohammad, S. M. (2018b). Word affect intensities. Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018).

Mohammad, S. M., & Kiritchenko, S. (2015). Using hashtags to capture fine emotion categories from tweets. Computational Intelligence, 31, 301–326. https://api.semanticscholar.org/CorpusID:2498838

Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3), 436–465.

Mohammad, S., & Turney, P. (2010). Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, 26–34. https://aclanthology.org/W10-0204

Plutchik, R. (1962). The emotions. Random House.

Rao, Y., Lei, J., Wenyin, L., Li, Q., & Chen, M. (2014). Building emotional dictionary for sentiment analysis of online news. World Wide Web (Bussum), 17(4), 723–742.

Sap, M., Jafarpour, A., Choi, Y., Smith, N. A., Pennebaker, J. W., & Horvitz, E. (2022). Quantifying the narrative flow of imagined versus autobiographical stories. Proceedings of the National Academy of Sciences, 119(45), e2211715119. https://doi.org/10.1073/pnas.2211715119

Simchon, A., Hadar, B., & Gilead, M. (2023). A computational text analysis investigation of the relation between personal and linguistic agency. Communications Psychology, 1–9. https://doi.org/10.1038/s44271-023-00020-1

Stadthagen-Gonzalez, H., & Davis, C. J. (2006). The bristol norms for age of acquisition, imageability, and familiarity. Behavior Research Methods, 38(4), 598–605.

Strapparava, C., & Mihalcea, R. (2007). SemEval-2007 task 14: Affective text. In E. Agirre, L. Màrquez, & R. Wicentowski (Eds.), Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 70–74). Association for Computational Linguistics. https://aclanthology.org/S07-1013

Strapparava, C., & Valitutti, A. (2004). Wordnet affect: An affective extension of wordnet. Lrec, 4, 40.

Vegt, I. van der, Mozes, M., Kleinberg, B., & Gill, P. (2021). The grievance dictionary: Understanding threatening language use. Behavior Research Methods, 1–15.

Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 english lemmas. Behavior Research Methods, 45, 1191–1207.

¹ We use total word count here for the sake of the example, but total word count may not always be the appropriate measure of text length. For example, you may want to measure the amount of surprise relative to other emotional content. In this case, it would be more appropriate to control for the total number of emotion-related words, as opposed to the total word count. Similarly, if you were measuring the number of first person singular pronouns, you may want to control for the total number of pronouns rather than the total word count.
² We use a simple count of words as the dependent variable here, but keep in mind that it may be more appropriate to apply a transformation such as Simple Good-Turing frequency estimation (Section 16.6).