Data Science for Psychology: Natural Language


This is the website for Data Science for Psychology: Natural Language.

This book will teach you state-of-the-art methods for analyzing psychological properties of text. It will also cover the fundamentals of data visualization, data collection, and scientific methodology necessary to produce meaningful work in the field.

At every step of the way, we will give examples in R, following the tools and conventions of the tidyverse. This book will also teach you the basics of the quanteda and text packages for natural language processing (NLP), and the vosonSML package for collecting data from popular social media sites.

Unless otherwise noted, we generated all figures in this book with ggplot2. The full, reproducible code that produces them can be viewed by clicking the “View Source” button at the bottom of each page.

To obtain practice assignments that go along with the book, or for any other questions, contact us at

How to Cite This Book:

Teitelbaum, L., & Simchon, A. (2024). Data Science for Psychology: Natural Language.

This book was written for the “Data Science for Psychology Lab” course in the psychology department at Ben-Gurion University of the Negev (BGU), and is supported by funding from BGU.

This book is an open-source project and uses a Contributor Code of Conduct. By contributing to this book, you agree to abide by its terms.

On the Cover: Posts on Reddit’s r/relationship_advice, plotted by their emotional content according to the Pleasure-Arousal-Dominance model. The x-axis represents pleasure, the y-axis represents dominance, and the color scale represents arousal. The size of each point represents the number of comments responding to that post. For more details, see the source code.