Generative models for inference: an application to authorship attribution
TANI RAFFAELLI, GIULIO
2022
Abstract
Computer-aided stylometry is a powerful tool in authorship attribution. Recent models can identify the author of an anonymous text among thousands of candidates, or distinguish the different contributors to a single text. However, most methods are complex and language-dependent. We propose a new authorship attribution method based on inference using a stochastic process. Every author is associated with the process that is most likely to reproduce their known corpus. We assign a text to the author whose process gives the highest probability of producing that text. We find high attribution rates regardless of the language of the text or the tokenisation used. Inference using stochastic processes offers exciting opportunities for stylometry and information retrieval.

File | Size | Format |
---|---|---|
Tesi_dottorato_TaniRaffaelli.pdf (open access) | 5.69 MB | Adobe PDF |
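The attribution rule stated in the abstract (fit one generative process per author, then assign a text to the author whose process gives it the highest probability) can be illustrated with a minimal sketch. This record does not specify the stochastic process used in the thesis, so the sketch swaps in a much simpler stand-in: a unigram model with add-one smoothing. The function names, the toy corpora, and the smoothing parameter `alpha` are all assumptions made for illustration, not the author's method.

```python
import math
from collections import Counter

def fit_unigram(tokens):
    """Summarise one author's known corpus as token counts plus a total."""
    return Counter(tokens), len(tokens)

def log_prob(tokens, model, vocab_size, alpha=1.0):
    """Log-probability of `tokens` under a smoothed unigram model.

    Stand-in for the thesis's stochastic process (an assumption):
    each token is drawn independently, with add-alpha smoothing so
    that unseen tokens keep a non-zero probability.
    """
    counts, total = model
    denom = total + alpha * vocab_size
    return sum(math.log((counts[t] + alpha) / denom) for t in tokens)

def attribute(tokens, models, vocab_size):
    """Return the author whose model assigns the text the highest probability."""
    return max(models, key=lambda a: log_prob(tokens, models[a], vocab_size))

# Hypothetical toy corpora, one per candidate author.
corpora = {
    "author_a": "the sea the sea the ship and the sea".split(),
    "author_b": "a garden a rose a garden and a wall".split(),
}
vocab = {t for toks in corpora.values() for t in toks}
vocab_size = len(vocab) + 1  # one extra slot reserved for unseen tokens

models = {author: fit_unigram(toks) for author, toks in corpora.items()}
predicted = attribute("the ship and the sea".split(), models, vocab_size)
print(predicted)
```

Because the rule only compares probabilities of token sequences, the same code runs unchanged on any tokenisation (words, characters, or n-grams), which is in the spirit of the language independence claimed in the abstract.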
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/97109
URN:NBN:IT:UNIROMA1-97109