We analyse a novel, balanced panel of 99,274 academics whose 138 million English-language tweets (January 2016 – December 2022) are linked to detailed OpenAlex publication records. Large-language-model (LLM) pipelines let us study both what scholars talk about (policy stances and narratives) and how they talk (tone, civility, emotion).
Starting from a seed of ≈300k Twitter–OpenAlex matches, we used Twitter's Academic API to pull complete timelines—tweets, replies, quote-retweets, and retweets—for each scholar. OpenAlex contributes publication, citation, field-of-study, and affiliation metadata.
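Timeline collection reduces to a pagination loop over cursor-carrying responses. A minimal sketch, in which `fetch_page` is a hypothetical stand-in for the real HTTP call to Twitter's v2 full-archive search (whose responses carry a `meta.next_token` cursor):

```python
from typing import Callable, Dict, Iterator, Optional

def pull_timeline(fetch_page: Callable[[str, Optional[str]], Dict],
                  username: str) -> Iterator[Dict]:
    """Yield every tweet on one scholar's timeline, page by page.

    `fetch_page(query, next_token)` is a hypothetical stand-in for the
    HTTP call; it must return a dict with a "data" list of tweets and a
    "meta" dict that may carry a "next_token" cursor.
    """
    query = f"from:{username}"  # covers tweets, replies, quotes, retweets
    token: Optional[str] = None
    while True:
        page = fetch_page(query, token)
        yield from page.get("data", [])
        token = page.get("meta", {}).get("next_token")
        if token is None:
            break
```

Injecting the fetcher keeps the loop testable offline and independent of any particular HTTP client.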
GPT-4 generated year-specific keyword sets for seven public-policy themes (e.g. climate action, immigration, welfare/tax). Filtering on those terms reduces the corpus to ≈22 million topical tweets (3.6 % of all posts), ensuring high recall for evolving Twitter jargon.
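The keyword filter itself is a word-boundary match against the year's term list. A minimal sketch, with a tiny illustrative keyword set standing in for the GPT-4-generated lists:

```python
import re
from typing import Dict, Set

# Illustrative year-specific keyword sets; the real lists were generated
# by GPT-4 per theme and per year.
KEYWORDS: Dict[int, Set[str]] = {
    2016: {"carbon tax", "paris agreement"},
    2022: {"carbon tax", "net zero", "heat pump"},
}

def compile_matcher(terms: Set[str]) -> re.Pattern:
    # Longest terms first so multi-word phrases win inside the alternation.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)

def is_topical(text: str, year: int) -> bool:
    """True if the tweet matches any of that year's theme keywords."""
    return bool(compile_matcher(KEYWORDS[year]).search(text))
```

Regenerating the keyword sets per year is what keeps recall high as Twitter jargon drifts; the matcher itself stays fixed.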
Each topical tweet is fed to GPT-3.5-Turbo with a zero-shot prompt that returns one of four labels: pro, anti, neutral, or unrelated. On the SemEval-2016 stance benchmark the classifier scores F1 = 84–92 %, matching state-of-the-art performance.
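A zero-shot setup needs the model's free-text reply mapped onto the closed label set. A sketch of a prompt template and a conservative parser; both are illustrative assumptions, not the manuscript's exact wording:

```python
# Hypothetical prompt template; the study's actual prompt may differ.
PROMPT = (
    "Classify the stance of the tweet below toward {theme}.\n"
    "Answer with exactly one word: pro, anti, neutral, or unrelated.\n\n"
    "Tweet: {tweet}"
)

def parse_label(raw: str) -> str:
    """Map a free-form model reply onto the four-way label set.

    Falls back to 'unrelated' when no valid label appears, a
    conservative default for downstream stance shares.
    """
    reply = raw.strip().lower()
    for label in ("unrelated", "neutral", "anti", "pro"):
        if label in reply:
            return label
    return "unrelated"
```

Checking "anti" before "pro" matters: substring matching would otherwise misread replies like "anti" less often than it helps, and a strict fallback keeps malformed replies from inflating any stance category.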
Within climate-action tweets we further assign one of four mutually exclusive frames—techno-optimism, behavioural adjustment, both, or neither—capturing the balance between technological and lifestyle solutions.
Egocentrism: share of first-person-singular pronouns.
Toxicity: Google Perspective API probability.
Emotionality / Reasoning: LIWC affect-word count ÷ cognitive-word count.
Gender: LLM name inference (Male / Female / Unclear).
Affiliation & Rank: article-level affiliations from OpenAlex, mapped to Times Higher Education World University Rankings (bands 1-100, 101-500, 501-1500, 1501+).
Field: OpenAlex root concepts grouped into STEM, Social Sciences, Humanities.
Location: modal country of affiliation per year.
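The lexical tone metrics above reduce to token counting. A minimal sketch of the egocentrism share and the emotionality/reasoning ratio, with tiny illustrative word lists standing in for the full pronoun and LIWC lexicons:

```python
import re
from typing import List, Set

FIRST_PERSON_SINGULAR: Set[str] = {
    "i", "me", "my", "mine", "myself", "i'm", "i've", "i'd", "i'll",
}
# Toy stand-ins for the LIWC affect and cognition dictionaries.
AFFECT: Set[str] = {"happy", "sad", "angry", "love", "fear"}
COGNITION: Set[str] = {"think", "know", "because", "reason", "therefore"}

def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())

def egocentrism(text: str) -> float:
    """Share of first-person-singular pronouns among all tokens."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return sum(t in FIRST_PERSON_SINGULAR for t in toks) / len(toks)

def emotionality_reasoning(text: str) -> float:
    """Affect-word count divided by cognitive-word count (LIWC-style)."""
    toks = tokens(text)
    cog = sum(t in COGNITION for t in toks)
    aff = sum(t in AFFECT for t in toks)
    return aff / cog if cog else float("nan")
```

Toxicity is not recomputed locally; each tweet is scored by a call to the Perspective API as stated above.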
Only scholars who posted at least once in both Jan–Jun 2016 and Jul–Dec 2022 are retained—yielding the 99,274-scholar balanced panel. The same rule produces a comparison sample of 61,259 U.S. Twitter users (after bot filtering) for Figure 6.
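The retention rule can be sketched as a set intersection over the two posting windows, assuming a stream of (user, post date) pairs:

```python
from datetime import date
from typing import Iterable, Set, Tuple

def balanced_panel(posts: Iterable[Tuple[str, date]]) -> Set[str]:
    """Keep users who post in both Jan-Jun 2016 and Jul-Dec 2022."""
    early: Set[str] = set()  # posted in the opening window
    late: Set[str] = set()   # posted in the closing window
    for user, day in posts:
        if day.year == 2016 and day.month <= 6:
            early.add(user)
        elif day.year == 2022 and day.month >= 7:
            late.add(user)
    return early & late
```

The same function applied to the bot-filtered U.S. user stream yields the public comparison sample.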
The pipeline delivers month-level panels of stance scores, narrative shares, and tone metrics—powering the longitudinal trends in Figure 5 and the academic-versus-public contrasts in Figure 6. All code and anonymised aggregates are open on Zenodo; raw tweets remain API-restricted.
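Collapsing tweet-level scores into the month-level panel is a grouped mean; a sketch assuming records of (scholar, date, score):

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def monthly_panel(
    records: Iterable[Tuple[str, date, float]],
) -> Dict[Tuple[str, str], float]:
    """Collapse tweet-level scores to a scholar x month panel of means."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for user, day, score in records:
        buckets[(user, day.strftime("%Y-%m"))].append(score)
    return {key: mean(vals) for key, vals in buckets.items()}
```

The same grouping applies to stance labels (coded numerically), narrative shares, and tone metrics alike.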
For full technical details, see Methods in the manuscript.