Advancing Patent Analytics in Circular Economy and Critical Raw Materials: A Deep Learning and NLP Approach
MANERA, MARIA
2025
Abstract
This thesis advances patent analytics in the domains of the circular economy (CE) and critical raw materials (CRMs) using state-of-the-art deep learning and natural language processing (NLP) techniques. Specifically, it introduces a novel two-step framework for classifying circular economy patents: first, large language models (LLMs) such as GPT-3.5 and a specialized pretrained model (BERT for Patents) are employed to identify CE-related patents, and second, an advanced topic modeling approach refines this classification into key subcategories. To enhance classification accuracy, the method integrates a retrieval-augmented generation (RAG) strategy. The thesis then extends this approach to CRMs by distinguishing patents based on material substitutability and deploying advanced natural language understanding (NLU) techniques to systematically detect CRM-related innovations. This yields a more nuanced mapping of CRM technologies and overcomes the limitations of traditional keyword or classification searches. Finally, an empirical analysis examines how exposure to CRM price shocks, exemplified by volatility in copper, affects firm-level innovation and patenting. The findings reveal that firms initially curtail innovation under resource stress but later adapt by intensifying technological development, underscoring the resilience of innovation systems. Methodologically, this work demonstrates the power of LLM-based NLP (including retrieval-augmented approaches and topic modeling) in improving patent analytics. From a policy perspective, it enables better tracking of green innovation, informs strategic industrial policy on critical materials, and supports the resilience of innovation ecosystems in the face of raw material dependencies.

File | Size | Format | |
---|---|---|---|
Thesis.pdf (open access) | 3.03 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/212805
URN:NBN:IT:UNITO-212805