Advancing Patent Analytics in Circular Economy and Critical Raw Materials: A Deep Learning and NLP Approach

MANERA, MARIA
2025

Abstract

This thesis advances patent analytics in the domains of the circular economy (CE) and critical raw materials (CRMs) using deep learning and natural language processing (NLP) techniques. Specifically, it introduces a two-step framework for classifying CE patents: first, large language models (LLMs) such as GPT-3.5 and a specialized pretrained model (BERT for Patents) identify CE-related patents; second, a topic modeling approach refines this classification into key subcategories. To improve classification accuracy, the method integrates a retrieval-augmented generation (RAG) strategy. The thesis then extends this approach to CRMs by distinguishing patents based on material substitutability and applying natural language understanding (NLU) techniques to systematically detect CRM-related innovations. This yields a more nuanced mapping of CRM technologies and overcomes the limitations of traditional keyword or classification searches. Finally, an empirical analysis examines how exposure to CRM price shocks, exemplified by volatility in copper prices, affects firm-level innovation and patenting. The findings reveal that firms initially curtail innovation under resource stress but later adapt by intensifying technological development, underscoring the resilience of innovation systems. Methodologically, this work demonstrates how LLM-based NLP (including retrieval-augmented approaches and topic modeling) can improve patent analytics. From a policy perspective, it enables better tracking of green innovation, informs strategic industrial policy on critical materials, and supports the resilience of innovation ecosystems in the face of raw material dependencies.
13 June 2025
English
QUATRARO, Francesco
Università degli Studi di Torino
Files in this item:
File: Thesis.pdf
Access: open access
Size: 3.03 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/212805
The NBN code of this thesis is URN:NBN:IT:UNITO-212805