Privacy Compliance Analysis in Mobile and Wearable Applications
NGUYEN, TRAN THANH LAM
2026
Abstract
In recent years, smartphones and wearable devices have formed two of the most dynamic ecosystems, with billions of users and millions of applications driving their growth. According to Datareportal, as of July 2025 there are 7.4 billion smartphones in use globally, while wearable devices have reached 600 million units according to Statista. Furthermore, 8.93 million applications (apps) have been released worldwide, with 3.553 million apps in the Google Play Store and 1.642 million in the Apple App Store, as reported by Bankmycell; on average, each user installs more than 40 apps on their device. However, the growth of these two ecosystems is built on a trade-off in user privacy, as 65.83% of their revenue comes from advertising. This raises serious concerns, as app developers and attackers continuously exploit users' sensitive information for profit. Although the European Union and the USA have enacted privacy laws such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), which require apps to notify users and obtain explicit consent before collecting and processing sensitive data, violations remain widespread and increasingly sophisticated. These violations often exploit users' common smartphone usage habits and weaknesses of smartphone operating systems (OS), especially Android. In this thesis, we first introduce a novel attack vector that demonstrates how sharing images containing sensitive metadata can, unintentionally or intentionally, leak users' personal or confidential information. To validate this finding and assess its prevalence, we use traditional analysis techniques. While the results confirm that the newly discovered attack vector has a significant impact, they also highlight the inherent limitations of traditional analysis.
Therefore, we propose a new solution based on Large Language Models (LLMs) to build an early-warning system capable of detecting potential leaks of sensitive metadata embedded in images. Our evaluation, conducted on datasets from the traditional analysis, shows highly promising results. Finally, we extend our LLM-based solution toward a more general framework by assessing privacy non-compliance in wearable apps. Specifically, we evaluate whether these apps respect users' privacy when sharing sensitive data, and its destinations, across the 14 sensitive data categories defined by Google.
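To illustrate the kind of metadata leakage the thesis studies, the sketch below writes a small JPEG carrying EXIF metadata and then scans it for tags from a sensitive list, the way an early-warning check might inspect an image before it is shared. This is a minimal illustration using Pillow, not the thesis's actual method: the `SENSITIVE_TAGS` list is an assumed subset (the real EXIF tag space, including the GPS IFD, is far larger).

```python
import io
from PIL import Image

# Tag IDs and names are standard EXIF tags; treating exactly these as
# "sensitive" is an illustrative assumption, not the thesis's tag set.
SENSITIVE_TAGS = {271: "Make", 306: "DateTime", 34853: "GPSInfo"}

def find_sensitive_metadata(img):
    """Return EXIF entries whose tag IDs appear in SENSITIVE_TAGS."""
    exif = img.getexif()
    return {name: exif[tag] for tag, name in SENSITIVE_TAGS.items() if tag in exif}

# Build a tiny JPEG in memory that carries EXIF metadata, then scan it
# as a warning system might before the image leaves the device.
src = Image.new("RGB", (8, 8))
exif = Image.Exif()
exif[306] = "2025:07:01 12:00:00"  # DateTime: when the photo was taken
exif[271] = "ExampleCam"           # Make: camera vendor
buf = io.BytesIO()
src.save(buf, format="JPEG", exif=exif)

leaks = find_sensitive_metadata(Image.open(io.BytesIO(buf.getvalue())))
print(leaks)
```

In practice the same scan would flag the GPS IFD (tag 34853), which can pin a shared photo to a precise location; stripping or warning about such tags before upload is the intuition behind the early-warning system described above.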
| File | License | Size | Format |
|---|---|---|---|
| Lam_thesis_final.pdf (open access) | All rights reserved | 13.4 MB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/354787
URN:NBN:IT:UNINSUBRIA-354787