Digital Sentinels: Unraveling the Societal Implications and Social Media Defence Strategies Against Large Language Models
GAMBINI, MARGHERITA
2024
Abstract
OpenAI's ChatGPT, part of the family of Transformer-based Large Language Models (LLMs), has gained popularity for its advanced text generation capabilities. However, LLMs also raise concerns, especially in the context of social media, where they can amplify the spread of misinformation, potentially threatening democracy. Malicious actors may use LLMs to create fake social media personas or to disseminate harmful content. This thesis explores methods to detect LLM-generated texts and harmful content on social media, focusing on the challenges posed by the short texts typical of these platforms. Initial work involved developing detectors for "deepfake" texts, leading to the creation of the TweepFake dataset, which contains a mix of genuine and machine-generated tweets. The most effective detection method achieved approximately 93.4% accuracy by fine-tuning a pre-trained LLM with a neural network classification head. Further contributions include techniques for identifying the source model of generated texts, emphasizing the role of linguistic features. Additionally, the thesis introduces an unsupervised framework for detecting users' stances on various topics through their Twitter activity and identifies linguistic patterns distinguishing conspiracy theorists from ordinary users. These efforts aim to enhance the detection of fake content and to understand the impact of LLMs, offering insights into mitigating the risks associated with these powerful technologies.
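The fine-tuning setup described in the abstract — a Transformer encoder topped with a small neural classification head, trained to label a tweet as human- or machine-written — can be sketched as follows. This is a minimal, self-contained illustration, not the thesis's actual code: the vocabulary size, model dimensions, mean-pooling choice, and the `DeepfakeTweetDetector` name are assumptions for the sketch; the real system fine-tunes a large pre-trained LLM rather than training an encoder from scratch.

```python
import torch
import torch.nn as nn

class DeepfakeTweetDetector(nn.Module):
    """Binary tweet classifier: a stand-in Transformer encoder plus a small
    feed-forward head, mirroring the fine-tuning setup described above.
    (Illustrative only; a pre-trained LLM would replace the encoder.)"""
    def __init__(self, vocab_size=30522, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # classification head: 2 logits (human vs. machine-generated)
        self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))   # (batch, seq_len, d_model)
        pooled = h.mean(dim=1)                    # mean-pool over token positions
        return self.head(pooled)                  # (batch, 2) class logits

# Toy batch: 4 "tweets" of 16 random token ids each.
logits = DeepfakeTweetDetector()(torch.randint(0, 30522, (4, 16)))
print(logits.shape)  # torch.Size([4, 2])
```

In practice the head's logits would be passed through a softmax and the model trained with cross-entropy on labeled pairs of genuine and generated tweets, as in the TweepFake experiments.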
File | Size | Format | Access
---|---|---|---
reportFinaleAttivitaPhDETD.pdf | 431.04 kB | Adobe PDF | not available
thesis.pdf | 7.55 MB | Adobe PDF | open access
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/216334
URN:NBN:IT:UNIPI-216334