There have never been as many things to talk about, or channels through which to talk about them, as there are today. The academic exercise of tracking, collecting and analyzing what is said is an established practice, but its form is in constant and accelerating mutation. In this dissertation, I empirically explore the impact of elite communication on two grand outcomes - national elections and asset pricing - with a strong methodological focus that explores different data collection and treatment procedures and incorporates recent developments in large language model (LLM) technology. I further design and describe an original method for audiovisual data collection that aims to greatly facilitate access to an extremely large and rich, but relatively hard-to-explore corpus of data, allowing for the extension of existing studies and the undertaking of many new ones. By exploring and applying computational tools such as machine learning, LLMs, and face and voice recognition, this work stresses the richness of political statement data and strives to demonstrate how to extract the most from it. In the first chapter, through an original dataset of tweets by Portuguese politicians and a natural experiment derived from the outbreak of the Russo-Ukrainian war, I evaluate how stigmatizing behavior towards a radical left party can impact electoral results. Using regression discontinuity and difference-in-differences designs for inference, I obtain results indicating that the stigmatized party suffers persistent losses in vote intention. The second chapter looks at a Twitter dataset covering German publicly traded firms and Swedish activist Greta Thunberg. It investigates how vocal activism by renowned opinion leaders can affect firms' stock market performance, and how firm behavior might influence this relationship. Results suggest that companies that align themselves with opinion leaders can pass through this process unscathed, while others suffer a stock price decrease. In both the first and second chapters, machine learning and LLM classifiers are employed to refine the data by extracting meaning indicators from the raw text inputs. The third chapter, finally, describes a method for building an analyzable transcript from audiovisual political data, such as broadcast debates or interviews, allowing for individual speaker diarization and recognition with minimal manual preparation. This framework can be applied to material in any language, while its agile nature means it is easily adaptable to the specificities of different formats (e.g. short-form social media videos, multi-participant debates, live broadcasts). Through these avenues, it bears the potential to massively expand the amount of available data for political analysis. This work makes several contributions. Firstly, it employs different methods for the collection and treatment of text, exemplifying their usage, allowing for their comparison, and using them for robust inference. Secondly, it approaches these exercises in a constructive way, aiming to provide better means of obtaining raw data and refining it into its most useful state. It thus shows how to employ LLM-based approaches to improve on mainstay methods such as machine learning classifiers or interpolation processes. Thirdly, it introduces a method that provides easily implementable and mostly automatic access to a particularly rich type of data that was previously hidden behind either notoriously laborious or methodologically complex processes - debate and interview transcripts - and is ready to be adapted to alternative inputs. Insofar as political elites use the different channels at their disposal to convey a cohesive message, these approaches are likely to provide full coverage of politician stances and interventions. They go further still by tracking each individual at a level of granularity that easily allows for intra-politician message analysis.

A communist, an environmentalist and an android walk into a bar: the measurement and measurable effects of elite communication

MARQUES UCHA MEIRELES ALPALHAO, HENRIQUE
2026

21 January 2026
English
CAVALLI, NICOLO'
STANIG, PIERO
Università Bocconi
Files in this item:
Revised thesis_Alpalhao_Henrique.pdf
Access: open access
License: All rights reserved
Size: 11.4 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/355886
The NBN code of this thesis is URN:NBN:IT:UNIBOCCONI-355886