Reflections on Distributions: Human, Machines, Languages

GENG, MINGMENG
2025

Abstract

Humans create and continuously improve machines, while machines in turn influence human life. Language, one of the key differences between humans and animals, is also one of the main ways in which humans and machines communicate. Many behaviors of machines differ considerably from those of humans, and I aim to analyze them from the perspective of "distribution". Survey data is a central focus of my research. Using tree-based models, including Random Forests, XGBoost, and LightGBM, I tried to identify potentially more efficient combinations of variables for survey sample selection and post-stratification. Large language models (LLMs) play an important role in my projects. Using European Social Survey (ESS) data as a benchmark, I measured the biases and stereotypes of several LLMs on different subjective questions with respect to the corresponding demographic variables, as well as the influence of different prompts. The widespread use of LLMs is affecting human society. I proposed a model based on word frequency and simulation to estimate the impact of LLMs on academic writing and presentations. Across more than a million papers and over 1,000 conference presentations, this impact has increased over time. I have also observed the co-evolution of humans and LLMs. Adopting the lens of "distribution" has proven beneficial for these tasks, and this perspective deserves further attention and reflection. Other impacts of artificial intelligence (AI) on human society, such as ethical issues, model collapse, and paradigm shifts, are also analyzed and discussed.
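To illustrate how word frequencies can be turned into an estimate of LLM impact, here is a minimal sketch under a simplifying assumption: the observed frequency of each marker word is taken to be a linear mixture of a pre-LLM baseline and a frequency measured on simulated LLM-processed text, with an unknown mixing fraction eta. The function names, the naive tokenization, the least-squares estimator, and the example words and numbers are all illustrative assumptions, not the model proposed in the thesis.

```python
import re
from typing import Dict, Iterable, List


def word_frequencies(texts: Iterable[str], vocab: List[str]) -> Dict[str, float]:
    """Share of texts that contain each vocabulary word (document frequency)."""
    counts = {w: 0 for w in vocab}
    n_texts = 0
    for text in texts:
        n_texts += 1
        # Naive tokenization (assumption): lowercase alphabetic runs only.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for w in vocab:
            if w in tokens:
                counts[w] += 1
    if n_texts == 0:
        raise ValueError("at least one text is required")
    return {w: counts[w] / n_texts for w in vocab}


def estimate_llm_fraction(
    f_obs: Dict[str, float],
    f_human: Dict[str, float],
    f_llm: Dict[str, float],
) -> float:
    """Least-squares estimate of eta in f_obs = (1 - eta) * f_human + eta * f_llm.

    f_human: baseline frequencies from pre-LLM texts.
    f_llm: frequencies measured on simulated LLM-processed versions of those texts.
    """
    num = 0.0
    den = 0.0
    for w in f_obs:
        shift = f_llm[w] - f_human[w]  # how much an LLM moves this word's frequency
        num += (f_obs[w] - f_human[w]) * shift
        den += shift * shift
    if den == 0.0:
        raise ValueError("marker words must have different human and LLM frequencies")
    # Clip to [0, 1] because eta is interpreted as a fraction of texts.
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    # Illustrative, made-up frequencies (not data from the thesis).
    f_human = {"delve": 0.01, "intricate": 0.02, "showcase": 0.03}
    f_llm = {"delve": 0.11, "intricate": 0.12, "showcase": 0.08}
    true_eta = 0.2
    f_obs = {w: (1 - true_eta) * f_human[w] + true_eta * f_llm[w] for w in f_human}
    print(round(estimate_llm_fraction(f_obs, f_human, f_llm), 3))  # → 0.2
```

The closed-form least-squares fit is used only to keep the sketch short; a likelihood-based fit, or comparing observed frequencies against simulations run over a grid of eta values, would be natural alternatives.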
3 Feb 2025
English
Trotta, Roberto
Rozza, Gianluigi
SISSA
Trieste
Files in this item:
Mingmeng Geng PhD Thesis.pdf (open access, Adobe PDF, 27.48 MB)

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/190121
The NBN code of this thesis is URN:NBN:IT:SISSA-190121