Cite as:

Vincent Wong and Yaneer Bar-Yam, How do people differ? A social media approach, arXiv:1708.02900 (August 9, 2017).


Research from a variety of fields, including psychology and linguistics, has found correlations and patterns in personal attributes and behavior, but efforts to understand the broader heterogeneity in human behavior have not yet integrated these approaches and perspectives into a cohesive methodology. Here we extract patterns in behavior and relate those patterns to one another in a high-dimensional picture. We use dimension reduction to analyze word usage in text data from the online discussion platform Reddit. We find that pronouns can be used to characterize the space of the two most prominent dimensions, which capture the greatest differences in word usage, even though pronouns were not included in the determination of those dimensions. These patterns overlap with patterns of topics of discussion to reveal relationships between pronouns and topics that can describe the user population. This analysis corroborates findings from past research that identified word-use differences across populations and synthesizes them relative to one another. We believe this is a step toward understanding how differences between people are related to each other.
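As a rough illustration of the approach described above, the sketch below projects users onto the two leading dimensions of word usage while holding pronouns out of the fit, then checks how strongly each held-out pronoun tracks those dimensions. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract says only "dimension reduction," so the use of principal component analysis (via SVD), the function names `top_two_dimensions` and `held_out_correlations`, the toy vocabulary, and the randomly generated counts are all hypothetical.

```python
import numpy as np


def top_two_dimensions(counts, vocab, held_out):
    """Project users onto the two leading principal components of word usage.

    counts:   (n_users, n_words) array of raw word counts.
    vocab:    list of words, one per column of counts.
    held_out: set of words (e.g. pronouns) excluded when fitting the dimensions.
    """
    counts = np.asarray(counts, dtype=float)
    # Per-user relative frequencies, so that prolific users do not dominate.
    # Assumption: frequencies are normalised over the full vocabulary.
    freqs = counts / counts.sum(axis=1, keepdims=True)
    keep = np.array([w not in held_out for w in vocab])
    # Centre each retained word column; the SVD of the centred matrix gives PCA.
    centred = freqs[:, keep] - freqs[:, keep].mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    coords = u[:, :2] * s[:2]  # user coordinates on the first two components
    return freqs, coords


def held_out_correlations(freqs, coords, vocab, held_out):
    """Pearson correlation of each held-out word's frequency with each dimension."""
    result = {}
    for j, word in enumerate(vocab):
        if word in held_out:
            result[word] = [
                float(np.corrcoef(freqs[:, j], coords[:, k])[0, 1])
                for k in range(2)
            ]
    return result


# Synthetic data for illustration only; these are not Reddit counts.
rng = np.random.default_rng(0)
vocab = ["game", "hockey", "government", "money", "I", "we"]
pronouns = {"I", "we"}
counts = rng.poisson(5.0, size=(50, len(vocab))) + 1  # +1 avoids empty users

freqs, coords = top_two_dimensions(counts, vocab, pronouns)
corr = held_out_correlations(freqs, coords, vocab, pronouns)
print(coords.shape)
print(corr)
```

Holding the pronoun columns out of the SVD mirrors the abstract's point that pronouns were not used to determine the dimensions; any correlation found afterwards comes from how pronoun use co-varies with the other words.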

Topics of discussion in the space of the two most prominent dimensions of word usage differences. A. Hockey-related words ('NHL,' 'hockey'). B. Video-game-related words ('game,' 'enemy,' 'kill,' 'points'). Both A and B exemplify the way that many of the sport- and game-related topics extracted with latent Dirichlet allocation (LDA) coincide in the population. C. Global politics ('world,' 'politics,' 'government,' 'money'). D. An assortment of common words determined by LDA to be a distinct topic, labeled by the word 'actually.' This analysis shows that this word group is meaningful even though it is non-trivial to determine the concept underpinning it.