We have built a novel dataset linking the Twitter profiles of 100,000 academics to their scholarly profiles, covering academics at 12,000 institutions across 174 countries and 19 disciplines. The dataset captures all posts, comments, shares, metadata, and social following networks of these accounts from 2016 to 2022.
Using scalable classification techniques based on large language models (LLMs), we categorize and label the content, focusing on both its substance (the "what") and its language and tone of expression (the "how"). Our analysis covers topics such as the climate crisis, economic policy, and cultural dimensions, as well as the tone of the language used.
This work is licensed under the Apache License 2.0 [LICENSE]. The following paper must be cited in all publications that use or refer in any way to the files provided:
Garg, P. & Fetzer, T. (2024). Political Expression of Academics on Social Media. In review. Preprint available at https://www.researchsquare.com/article/rs-4480504/v1
We also kindly ask researchers who use this dataset to send us their paper, or a link to it. We will continuously upload these papers, or add links to them, to keep the research community informed about ongoing progress and related work.
Our paper comes with a replication package for reproducing the main findings (available soon). The replication data contain aggregate measures at the academic-by-time-period level, along with OpenAlex identifiers. These data should suffice for most exploratory analyses and replication purposes.
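To illustrate what "aggregate measures at the academic-by-time-period level" can mean, here is a minimal sketch that collapses post-level topic labels into a panel keyed by academic and month. It is a hypothetical example, not the structure of the released files: the record layout, the placeholder author IDs, the "YYYY-MM" month format, the topic labels, and the function `aggregate_panel` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical post-level records: (OpenAlex-style author ID, "YYYY-MM" month, topic label).
# IDs, labels, and layout are made-up placeholders, not the released schema.
posts = [
    ("A0000000001", "2020-01", "climate"),
    ("A0000000001", "2020-01", "economy"),
    ("A0000000001", "2020-02", "climate"),
    ("A0000000002", "2020-01", "culture"),
]


def aggregate_panel(records, topic):
    """Collapse post-level labels into an academic-by-month panel.

    Returns {(author_id, month): (n_posts, share_of_posts_on_topic)}.
    """
    counts = defaultdict(lambda: [0, 0])  # [total posts, posts labeled with `topic`]
    for author_id, month, label in records:
        cell = counts[(author_id, month)]
        cell[0] += 1
        if label == topic:
            cell[1] += 1
    return {key: (total, on_topic / total) for key, (total, on_topic) in counts.items()}


panel = aggregate_panel(posts, "climate")
for (author_id, month), (n_posts, share) in sorted(panel.items()):
    print(author_id, month, n_posts, round(share, 2))
```

Each cell of the resulting panel holds the number of posts an academic made in a month and the share of those posts carrying the chosen label. A panel of this kind can then be joined to other sources on the author identifier.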
The full dataset is large and stored on cloud servers. Because of its complexity and legal restrictions, the raw data cannot be made publicly available. However, we are very open to ideas for academic research collaboration. If you are interested in accessing the raw data or the replication data, or if you have ideas for a research collaboration, please contact us by filling out the form at this link: https://forms.gle/3LjVbm4e3wprBfko7