In part 1 of this blog series, “A mini-history of author analysis”, I pointed out that attempting to draw conclusions about the author of a text based on traits of the text alone has a long tradition in forensics (identifying perpetrators or revealing forgeries), literary studies (authorship identification) and psychology (from psychoanalysis to modern customer/consumer behavior studies). In its modern, machine learning version, psychological author profiling is often based on the “Big Five” model (see figure 1; cf. McCrae & Costa 1989).
But how do human readers decode concrete features of a text and interpret them as clues to its author’s personality?
In order to find preliminary answers to this question, I performed a mini-experiment in my lesson on “Communicative conventions” in the class “Social Media and Communication”, part of the new ARTS supplementary program “Social Minds” at AU, in spring 2021. Twenty-one students (7 male, 14 female), both Danish and international, assessed the personality of the author of a now-deleted (and hence anonymity-protected) anti-vaccination Facebook post, based on the Big Five taxonomy.
The scores were to be given on an 11-point scale running from ‘minus 5’ through ‘zero’ to ‘plus 5’ for each trait. The assignment, as I intended it, was to take specific linguistic and paralinguistic dimensions (see figure 2) into account and to give an explanation for each score (i.e. a reflection-based assessment, not an automatic, more or less subconscious one as in everyday reading). Some students had misunderstood the assignment (and I apologize for not having been clear enough) as being about what the writer wanted us to believe about their personality. This did not turn out to be a confound: the explanations accompanying these assessments made the misunderstanding evident (“the writer wants to appear open-minded”), and such assessments were discarded. One group (1m, 2f) gave only scores, without explanations. The results below are therefore based on the relevant explanations given by 18 informants.
It should be noted that some of the dimensions to be taken into account in the assignment (register, punctuation, capitals, text structure) go beyond what standard machine analyses are (for now?) able to process successfully. Texts are usually prepared for machine analysis by lowercasing them and removing punctuation.
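To make concrete what such preparation throws away, here is a minimal Python sketch of a normalization step. It assumes lowercasing, deleting ASCII punctuation and, in addition, collapsing all whitespace including line breaks; real pipelines vary, and the `normalize` function is a hypothetical illustration. The example strings are invented, except for the post’s “Listen.”.

```python
import string

# Hypothetical normalization step (an assumption, not a specific library's
# pipeline): lowercase, delete ASCII punctuation, collapse all whitespace.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalize(text: str) -> str:
    """Return a lowercased, punctuation-free, single-spaced version of text."""
    no_punct = text.lower().translate(_STRIP_PUNCT)
    return " ".join(no_punct.split())  # also erases line breaks and layout

# Illustrative inputs: a caps-lock word, and "Listen." set on its own line.
print(normalize("Do your OWN research."))  # do your own research
print(normalize("Do your OWN research.") == normalize("Do your own research"))  # True
print(normalize("First paragraph.\n\nListen.\n\nNext paragraph."))  # first paragraph listen next paragraph
```

After this step, caps-lock emphasis and a one-word line with a full stop are indistinguishable from ordinary running text. Emojis are not part of `string.punctuation`, so they happen to survive this particular step.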
Before going into the details, I want to present the central findings in an overview:
- The students gave a much more complete picture of the assessment processes than the original assignment was aiming at. They also referred to clues from the content (self-description by the author: personality and actions; rhetorical strategies), the text genre per se (argumentative) and its traits (argumentation patterns), the publication channel (Facebook post), the personal agenda of the author (being an activist), and finally the bigger societal context of the author’s agenda (being against vaccination = going against the consensus of the majority, incl. the reader).
- The same textual traits (e.g. the use of emojis) were often taken by different readers to mirror contradictory personality traits.
- When the author’s statements left room for interpreting their underlying intention, that intention could be read differently and hence associated with different personality traits (e.g. “I do not do this to make friends” read as “non-interest in social bonding” vs. “honest intentions”).
- The overall scores roughly agreed for the personality traits ‘conscientiousness’ (only positive scores) and ‘extroversion’ (zero to +5), but were highly contradictory for ‘openness’ (-3 to +5), ‘agreeableness’ (-4 to +5) and ‘neuroticism’ (-5 to +5).
- The extremely varied scores that single textual traits received on the personality scales made it clear that the absolute scores were not very informative. Many students calculated their final scores – in a reasonable way! – by adding and subtracting single textual traits, e.g. expressing empathy (adding to ‘agreeableness’) and one-sided/critical argumentation (subtracting from ‘agreeableness’). Two different students performed exactly this calculation, ending up at ‘+2’ and ‘-2’ for agreeableness respectively. What seemed more informative than the absolute scores was therefore whether a feature was assessed as adding to or subtracting from the score, not by how much.
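The difference between absolute scores and mere direction can be sketched in a few lines of Python. The cue weights of the two students below are hypothetical: only their end results (+2 and -2 for agreeableness) are reported, not how each cue was weighted.

```python
# Hypothetical cue weights for the two students (invented for illustration;
# only the final results +2 and -2 come from the experiment).
student_a = {"expresses empathy": +4, "one-sided/critical argumentation": -2}
student_b = {"expresses empathy": +2, "one-sided/critical argumentation": -4}

def clamp(score: int, low: int = -5, high: int = 5) -> int:
    """Keep a summed score on the assignment's -5..+5 scale."""
    return max(low, min(high, score))

def total(cues: dict[str, int]) -> int:
    """Absolute agreeableness score: the clamped sum of cue contributions."""
    return clamp(sum(cues.values()))

def directions(cues: dict[str, int]) -> dict[str, int]:
    """Keep only whether each cue adds (+1) to or subtracts (-1) from the trait."""
    return {cue: (w > 0) - (w < 0) for cue, w in cues.items()}

print(total(student_a), total(student_b))              # 2 -2
print(directions(student_a) == directions(student_b))  # True
```

Both students agree on the direction of every cue and disagree only on the magnitudes, which is what makes the absolute end scores diverge so strongly.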
In the following account, I turn the usual analysis upside down: instead of making an inventory of the communicative traits that suggest a specific personality trait, I start from the single communicative traits and look at how the students invoked them to argue for (often very different) personality trait scores, i.e. mirroring the direction ‘decoding > interpretation’ in the reading process.
I will present the most illustrative ones with graphic representations, sometimes merging smaller phenomena into more global ones for a better overview.
The following textual dimensions will be discussed: grammar, orthography, punctuation, choice of words, use of emojis, sentiment analysis, argumentation style, rhetorical devices as well as the author’s mission.
Most surface dimensions of the text were assessed as standard, neutral, traditional, norm-conforming and appropriate: grammar, (most of the) orthography, punctuation and choice of words. The text was perceived as well-structured (topic-related paragraphs with emojis as bullet points; a long reference list), the only exception being the word “Listen.”, strategically presented as a single line. Taken together, these perceptions resulted in the following personality assessments (numbers in brackets give the number of incidences in which a textual trait was interpreted as an indicator of a personality trait): ‘plus openness’ (2), ‘plus conscientiousness’ (11), ‘score 0 extroversion’ (due to “neutral vocabulary”: 1), ‘plus agreeableness’ (2) and ‘minus neuroticism’ (8); however, the word “Listen.” presented as a single line with a full stop was regarded as a sign of ‘plus neuroticism’ (2; for more on “style” and “choice of words”, see below under “sentiment analysis”), see figure 3.
One orthographic strategy stood out, though, as having very different effects on different readers. The use of caps lock (whole words spelled in capital letters) was regarded as an indication of ‘plus openness’ (3), but also of ‘minus openness’ (1), of ‘plus extroversion’ (2), of ‘plus agreeableness’ (1), but also of ‘minus agreeableness’ (1), while its “moderate use” was seen as a sign of ‘minus neuroticism’ (2), see figure 4.
As mentioned above, emojis are used in the text instead of bullet points to separate five paragraphs/topics (see figure 5). The use of these emojis attracted a lot of attention and divided readers even more than the use of caps lock. The “praying” emoji was seen as a sign of ‘plus openness’ (1), as was, more concretely, the “choice” of emojis (1); the use of emojis in general was read the same way, but with a suspicion that the author merely intended to appear open (1/discarded). The “praying” and “hand to heart” emojis (the latter a non-official interpretation of the “raised hand” emoji) were taken as an indication of ‘plus conscientiousness’ (1), as was emoji use in itself (1). Emojis in general were also seen as a sign of ‘plus extroversion’ (4), and once as a sign of a suspected intention to appear extroverted (1/discarded). Another interesting opposition was the reference to “unnecessary emojis” as a sign of ‘minus agreeableness’ (1) versus “only a few emojis” as a sign of ‘minus neuroticism’ (1). The sum of the emoji-related personality assessments is shown in figure 6.
So much for the more or less form-related clues. The dimension “choice of words” led the students to take into account aspects of semantics that went beyond the register analysis I had intended, namely sentiment analysis, a typical dimension of machine learning analyses that scores the positive or negative polarity of a linguistic unit (a word, a sentence, a Facebook post). The expression of negativity in our Facebook post (regarding the topic and entities such as doctors and politicians, as well as disagreement as such), together with the perception (by some) of non-aggressiveness in several dimensions (stance, argumentation, writing style), gave the following summed-up sentiment scores: ‘plus agreeableness’ (1), ‘minus agreeableness’ (2), ‘plus neuroticism’ (7) and ‘minus neuroticism’ (5), see figure 7.
Different readers read the same text as showing what might be subsumed under high-quality vs. low-quality argumentation. The respective interpretations were then taken as input for the personality assessment, which inevitably differed (and also concerned different personality dimensions). High-quality argumentation was described as two-sided, elaborate but firm, and backed up by references (whose soundness was taken at face value). Low-quality argumentation was described as one-sided and superficial, rebutting counterarguments and building on unsupported claims and doubtful references (Facebook posts, blogs etc.). Reading the text as high-quality argumentation resulted in an assessment of the author as ‘plus agreeableness’ (4) and ‘minus neuroticism’ (1), see figure 8. Reading it as low-quality argumentation resulted in an assessment of ‘minus openness’ (4), ‘minus conscientiousness’ (3), ‘minus agreeableness’ (1) and ‘minus extroversion’ (1), see figure 9.
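For readers unfamiliar with sentiment analysis, a minimal lexicon-based sketch in Python shows the basic idea of scoring polarity. It is a deliberately simplified stand-in, not a machine learning model: the tiny `LEXICON` and its weights are invented for illustration, and the example sentences are not taken from the post.

```python
import re

# Toy polarity lexicon (hypothetical entries and weights, for illustration only).
LEXICON = {"love": 1, "care": 1, "hope": 1, "honest": 1,
           "fear": -1, "lie": -1, "lies": -1, "dangerous": -1}

def polarity(unit: str) -> int:
    """Net polarity of a unit (word, sentence, post): sum of lexicon weights."""
    tokens = re.findall(r"[a-z']+", unit.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

def label(score: int) -> str:
    """Map a net score onto a coarse positive/negative/neutral label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

for unit in ["dangerous",
             "Doctors and politicians lie to us.",
             "I write this with love, not fear."]:
    print(f"{label(polarity(unit)):8} {polarity(unit):+d}  {unit}")
```

A count like this registers negative words but has no notion of stance or tone (it even ignores the negation in “not fear”), so the non-aggressiveness some students picked up on would be invisible to it.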
The author’s use of rhetorical devices made a predominantly positive impression on the readers. I analyzed the results according to the categories “rhetorical moves” (i.e. rhetorical micro-strategies that serve the overall rhetorical goal of a text), “tropes” (i.e. recurring motifs or clichés) and the appeal forms “ethos”, “logos” and “pathos”. If one boldly sums up all results for how the readers received the author’s rhetorical devices, a very variegated picture of the author’s assessed personality emerges: ‘plus openness’ (3), ‘plus conscientiousness’ (15), ‘plus extroversion’ (13), ‘minus extroversion’ (1), ‘plus agreeableness’ (16) and ‘minus agreeableness’ (2), see figure 10. The relatively high number of scores shows that the readers, even though the assignment did not ask them to, put considerable weight on the rhetorical traits of the text.
Self-presentation that goes beyond the mere mention of credentials, e.g. mentioning one’s own actions and characteristics, is in essence a rhetorical move. However, since it is a much more complex phenomenon than a trope like “do your own research”, and at the same time depends on concrete, case-specific content, it seems more informative to treat it as a textual dimension of its own. Some readers took the author’s respective statements (e.g. “it took me years of daily investigation and research into this issue”; “I do not take this issue lightly”) at face value, which gave scores for ‘plus conscientiousness’ (1), ‘plus extroversion’ (1) and ‘minus neuroticism’ (2), see figure 11.
The very fact that the author had a mission and was publicly fighting for a cause (by posting an opinion piece, expressing concern/commitment, exposing themselves on social media, sharing beliefs and findings) resulted in an assessment of ‘plus extroversion’ (9). However, the fact that the author expressed and promoted an opinion diverging from the consensus of the majority of the population (the readers included), i.e. against vaccinations, and called for action against this consensus resulted in the assessments ‘minus agreeableness’ (8) and ‘plus neuroticism’ (1). The overall scores for the author’s mission are shown in figure 12.
If we add up the scores from all dimensions into one aggregate personality profile, the author of the text was assessed by their collective audience as follows: ‘plus openness’ (10), ‘minus openness’ (5), ‘plus conscientiousness’ (29), ‘minus conscientiousness’ (3), ‘plus extroversion’ (29), ‘score 0 extroversion’ (1), ‘minus extroversion’ (2), ‘plus agreeableness’ (24), ‘minus agreeableness’ (15), ‘plus neuroticism’ (10) and ‘minus neuroticism’ (19), see figure 13.
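This aggregate can be reproduced from the per-dimension incidences reported in the preceding paragraphs. The Python sketch below copies those counts (leaving out the discarded items) and sums them; the data layout and names are illustrative choices, not part of the original analysis.

```python
from collections import Counter

# Incidence counts per textual dimension, copied from the paragraphs above.
# "+"/"-" = trait read as high/low, "0" = explicit zero score; discarded items omitted.
TALLIES = {
    "surface dimensions": {"+openness": 2, "+conscientiousness": 11,
                           "0 extroversion": 1, "+agreeableness": 2,
                           "-neuroticism": 8, "+neuroticism": 2},
    "caps lock": {"+openness": 3, "-openness": 1, "+extroversion": 2,
                  "+agreeableness": 1, "-agreeableness": 1, "-neuroticism": 2},
    "emojis": {"+openness": 2, "+conscientiousness": 2, "+extroversion": 4,
               "-agreeableness": 1, "-neuroticism": 1},
    "sentiment": {"+agreeableness": 1, "-agreeableness": 2,
                  "+neuroticism": 7, "-neuroticism": 5},
    "high-quality argumentation": {"+agreeableness": 4, "-neuroticism": 1},
    "low-quality argumentation": {"-openness": 4, "-conscientiousness": 3,
                                  "-agreeableness": 1, "-extroversion": 1},
    "rhetorical devices": {"+openness": 3, "+conscientiousness": 15,
                           "+extroversion": 13, "-extroversion": 1,
                           "+agreeableness": 16, "-agreeableness": 2},
    "self-presentation": {"+conscientiousness": 1, "+extroversion": 1,
                          "-neuroticism": 2},
    "author's mission": {"+extroversion": 9, "-agreeableness": 8,
                         "+neuroticism": 1},
}

def aggregate(tallies: dict[str, dict[str, int]]) -> Counter:
    """Sum the incidence counts of every dimension into one overall tally."""
    overall = Counter()
    for counts in tallies.values():
        overall.update(counts)  # Counter.update adds counts rather than replacing them
    return overall

overall = aggregate(TALLIES)
for trait in ("openness", "conscientiousness", "extroversion",
              "agreeableness", "neuroticism"):
    print(f"{trait:18} +{overall['+' + trait]:<3} 0:{overall['0 ' + trait]:<3} -{overall['-' + trait]}")
```

The sums match the totals given for figure 13, so the per-dimension figures and the aggregate profile are consistent with each other.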
What can we conclude from this quick (and not super-clean) experiment?
Asked to assess the personality of the author of a Facebook post according to the Big Five taxonomy, 18 students based their ratings on clues from the following (con-)textual dimensions: grammar, orthography, punctuation, choice of words, use of emojis, sentiment analysis, argumentation style, rhetorical devices as well as the author’s mission. Even if agreement tends to be higher for ‘conscientiousness’ and ‘extroversion’, the very variegated results suggest that reading and interpreting a text is to a non-negligible extent a subjective matter, and drawing conclusions about the personality of its author even more so. While ‘conscientiousness’ seems to be a personality trait that is fairly directly accessible through, among other clues, norm-observing linguistic behavior, the other four personality traits have to be inferred largely from textual clues whose decoding is already subject to personal biases, such as taste, text interpretation literacy and sensitivity/attention to specific phenomena. Furthermore, contextual knowledge plays an important role, especially in the absence of the extra-linguistic clues that support decoding in oral communication settings; irony detection is one notorious example. Another factor is the writing skill of the author of a given text. A skilled, aware writer can deliberately veil or enhance the appearance of certain of their personality traits in their writing: in our concrete example, for instance, some readers took the author’s positive self-descriptions at face value. Last, but not least, no human has a monolithic, stable personality structure. We are all full of contradictions, even at the same point in time and in the same situation; we are, furthermore, subject to mood swings, and personality traits can change throughout life. So – how much can we trust ourselves as readers?
Machine learning experts sometimes hold that automated text processing is more reliable than processing by human readers, partly because, as writers, we leave behind clues that escape both our own and our readers’ attention. However, as Markham (1998) pointed out, the (biased) reconstruction of an author persona is a process that happens more or less automatically. As conscious readers, we might want to keep that in mind.
“I read your Facebook post and I know who you are”? – Hmm … I guess I think that I do …
BIG 5 score figures designed by Anastasia Kratschmer
Head image made by Gordon Johnson
Argamon, Shlomo, Koppel, Moshe, Pennebaker, James W. & Schler, Jonathan. 2009. Automatically Profiling the Author of an Anonymous Text. Communications of the ACM 52(2):119-123, DOI: 10.1145/1461928.1461959
Kaye, Linda K., Malone, Stephanie A. & Wall, Helen J. 2017. Emojis: Insights, Affordances, and Possibilities for Psychological Science. Trends in Cognitive Sciences 21(2): 66–68.
Kelly, John M. 2019. “Emojiology: Folded Hands.” Emojipedia. https://blog.emojipedia.org/emojiology-folded-hands/
Markham, Annette. 1998. Life online: Researching Real Experience in Virtual Space. Boulder: Altamira Press.
McCrae, Robert R. & Costa, Paul T. 1989. Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality 57(1): 17–40. https://doi.org/10.1111/j.1467-6494.1989.tb00759.x
Pennebaker, James W., Mehl, Matthias R. & Niederhoffer, Kate G. 2003. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology 54: 547–577.
Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Dziurzynski, L., Ramones, S.M., Agrawal, M., et al. 2013. Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLoS ONE 8(9): e73791. https://doi.org/10.1371/journal.pone.0073791
Tausczik, Yla R. & Pennebaker, James W. 2010. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology 29(1) 24–54, DOI: 10.1177/0261927X09351676
Alexandra Kratschmer is an associate professor at the Department of Linguistics, Aarhus University, specializing in, among other things, text linguistics and discourse analysis.