
Privacy Compliance Analysis in Mobile and Wearable Applications

NGUYEN, TRAN THANH LAM
2026

Abstract

In recent years, smartphones and wearable devices have formed two of the most dynamic ecosystems, with billions of users and millions of applications driving their growth. According to Datareportal, as of July 2025 there are 7.4 billion smartphones in use globally, while wearable devices have reached 600 million units according to Statista. Furthermore, 8.93 million applications (apps) have been released worldwide, including 3.553 million in the Google Play Store and 1.642 million in the Apple App Store, as reported by Bankmycell. On average, each user installs more than 40 apps on their device. However, the growth of these two ecosystems rests on a trade-off with user privacy, as 65.83% of the ecosystems' revenue comes from advertising. This raises serious concerns about the invasion of users' privacy, as app developers and attackers continuously exploit sensitive information for revenue. Although the European Union and the USA have enacted privacy laws, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), that require apps to notify users and obtain explicit consent before collecting and processing sensitive data, violations remain widespread and increasingly sophisticated. These violations exploit users' common smartphone usage habits and weaknesses of smartphone operating systems (OS), especially Android. In this thesis, we first introduce a novel attack vector that demonstrates how sharing images containing sensitive metadata can, intentionally or unintentionally, leak users' personal or confidential information. To validate this finding and assess its prevalence, we apply traditional analysis. While the results confirm that the newly discovered attack vector has a significant impact, they also highlight the inherent limitations of traditional analysis.
Therefore, we propose a new solution based on Large Language Models (LLMs) to build an early-warning system capable of detecting potential leaks of sensitive metadata embedded in images. Our evaluation, conducted on datasets from the traditional analysis, shows highly promising results. Finally, we extend our LLM-based solution into a more general framework by assessing privacy non-compliance in wearable apps. Specifically, we evaluate whether these apps respect users' privacy when sharing sensitive data, and its destinations, across the 14 sensitive data categories defined by Google.
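The attack vector above hinges on EXIF metadata that travels with a shared image (GPS position, capture time, device identifiers). A minimal sketch of such an early-warning check is shown below; the tag list and the sample metadata are illustrative assumptions, not the detector actually built in the thesis:

```python
# Illustrative sketch: flag EXIF fields that commonly leak personal
# information when an image is shared. The tag list is an assumption
# for demonstration, not the thesis's actual detection logic.

SENSITIVE_EXIF_TAGS = {
    "GPSLatitude", "GPSLongitude",   # precise location of the shot
    "DateTimeOriginal",              # when the photo was taken
    "Make", "Model",                 # device fingerprinting
    "Artist", "OwnerName",           # identity of the owner
}

def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) values to decimal degrees."""
    deg, minutes, seconds = (float(v) for v in dms)
    value = deg + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value

def flag_sensitive(exif):
    """Return the subset of EXIF tags that may leak personal data."""
    return {k: v for k, v in exif.items() if k in SENSITIVE_EXIF_TAGS}

# Example: metadata as it might be extracted from a hypothetical shared photo.
exif = {
    "Model": "Pixel 8",
    "DateTimeOriginal": "2025:07:01 09:30:00",
    "GPSLatitude": (45.0, 48.0, 36.0), "GPSLatitudeRef": "N",
    "GPSLongitude": (8.0, 49.0, 12.0), "GPSLongitudeRef": "E",
}
leaks = flag_sensitive(exif)
lat = dms_to_decimal(exif["GPSLatitude"], exif["GPSLatitudeRef"])
```

Even this naive rule-based pass recovers a decimal latitude/longitude from the shared file; the thesis's LLM-based system generalizes this idea beyond a fixed tag list.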
8 January 2026
English
mobile and wearable; privacy and security; hybrid analysis; LLM; RAG; GraphRAG; FSL; EXIF metadata
FERRARI, ELENA
CARMINATI, BARBARA
Università degli Studi dell'Insubria
Files in this product:
File: Lam_thesis_final.pdf
Access: open access
License: All rights reserved
Size: 13.4 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/354787
The NBN code of this thesis is URN:NBN:IT:UNINSUBRIA-354787