21  Topic Modeling

This page is still under construction. Come back soon!

LDA (Blei et al., 2003)

Dirichlet is generally pronounced either “Deereekleh” or “Deerishleh”.

Poldrack et al. (2012)


library(seededlda)  # provides textmodel_lda()
lda <- textmodel_lda(dfm, k = 10, verbose = TRUE)  # fit a 10-topic LDA to a quanteda dfm

For larger corpora, set batch_size to a value below its default of 1, so that the corpus is split into smaller batches of documents.
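
As a rough sketch of the workflow around the call above, the following builds a small document-feature matrix with quanteda and fits LDA with a reduced batch_size. The toy texts, the object names (txt, toks, dfmat), and the settings k = 2, max_iter = 200, and batch_size = 0.5 are illustrative assumptions, not values from any analysis cited here.

```r
library(quanteda)
library(seededlda)

# Toy corpus, made up purely for illustration
txt <- c(
  d1 = "The brain imaging study measured memory and attention.",
  d2 = "Memory tasks activate the hippocampus in imaging studies.",
  d3 = "The parliament voted on the military intervention.",
  d4 = "Parties debated the intervention and the defence budget."
)

# Tokenize, drop punctuation and English stopwords, then build the dfm
toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("en"))
dfmat <- dfm(toks)

# Fit LDA; batch_size is a proportion of the corpus, so 0.5 means
# the documents are processed in batches of half the corpus each
lda <- textmodel_lda(dfmat, k = 2, max_iter = 200, batch_size = 0.5)

terms(lda, 5)  # the five highest-ranked terms for each topic
topics(lda)    # the most likely topic for each document
```

With a real corpus, k and batch_size would need tuning; a corpus this small is only enough to check that the pipeline runs.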


21.1 Supervised LDA

Blei & McAuliffe (2010)

sLDA in R

21.2 Semi-Supervised LDA

seededLDA in R
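
A minimal sketch of seeded LDA, assuming the seededlda package is used together with quanteda: a dictionary of seed words defines the topics in advance, and the model assigns the remaining vocabulary around those seeds. The toy texts, the dictionary keys and seed words, and the object names are hypothetical and chosen only for illustration.

```r
library(quanteda)
library(seededlda)

# Toy corpus, made up purely for illustration
txt <- c(
  d1 = "The brain imaging study measured memory and attention.",
  d2 = "Memory tasks activate the hippocampus in imaging studies.",
  d3 = "The parliament voted on the military intervention.",
  d4 = "Parties debated the intervention and the defence budget."
)

toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("en"))
dfmat <- dfm(toks)

# Each dictionary key becomes one topic; its values are the seed words
dict <- dictionary(list(
  neuroscience = c("brain", "memory", "imaging"),
  politics = c("parliament", "parties", "intervention")
))

slda <- textmodel_seededlda(dfmat, dictionary = dict, max_iter = 200)

terms(slda, 5)  # top terms per seeded topic, labelled by dictionary key
topics(slda)    # the most likely seeded topic for each document
```

Because the topics are named by the dictionary keys, the output can be read directly as theory-driven categories rather than interpreted after the fact.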

An Example of Semi-Supervised LDA in Research: Curini & Vignoli (2021)

21.3 BERTopic: Neural Topic Modeling

Grootendorst (2022)

Advantages of Topic Modeling
  • :
Disadvantages of Topic Modeling
  • :
Blei, D. M., & McAuliffe, J. D. (2010). Supervised topic models. https://arxiv.org/abs/1003.0783
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. https://doi.org/10.5555/944919.944937
Curini, L., & Vignoli, V. (2021). Committed Moderates and Uncommitted Extremists: Ideological Leaning and Parties’ Narratives on Military Interventions in Italy. Foreign Policy Analysis, 17(3), orab016. https://doi.org/10.1093/fpa/orab016
Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. https://arxiv.org/abs/2203.05794
Poldrack, R. A., Mumford, J. A., Schonberg, T., Kalar, D., Barman, B., & Yarkoni, T. (2012). Discovering Relations Between Mind, Brain, and Mental Disorders Using Topic Mapping. PLOS Computational Biology, 8(10), e1002707. https://doi.org/10.1371/journal.pcbi.1002707