Reflections on Distributions: Human, Machines, Languages
GENG, MINGMENG
2025
Abstract
Humans create and continuously improve machines, while machines in turn influence human life. Language, one of the key differences between humans and animals, is also one of the main ways humans and machines communicate with each other. Many behaviors of machines are quite different from those of humans, and I aim to analyze them from the perspective of "distribution". Survey data is a central focus of my research. I tried to identify potentially more efficient combinations of variables for survey sample selection and post-stratification, with the help of tree-based models, including Random Forests, XGBoost, and LightGBM. Large language models (LLMs) play an important role in my projects. Using the European Social Survey (ESS) data as a benchmark, I measured biases and stereotypes in several LLMs for different subjective questions with the corresponding demographic variables, as well as the influence of different prompts. The widespread use of LLMs is impacting human society. I proposed a model based on word frequency and simulation to estimate the impact of LLMs on academic writing and presentations. Across over a million papers and more than 1,000 conference presentations, the impact of LLMs has increased over time. I have also observed the co-evolution of humans and LLMs. Adopting the lens of "distribution" has proven beneficial for these tasks and deserves further attention and reflection. Other impacts of artificial intelligence (AI) on human society, such as ethical issues, model collapse, and paradigm shifts, are also analyzed and discussed.

File | Size | Format |
---|---|---|
Mingmeng Geng PhD Thesis.pdf (open access) | 27.48 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/190121
URN:NBN:IT:SISSA-190121