Compression and indexing of genomic data with confidentiality protection

MONTECUOLLO, FERDINANDO
2015

Abstract

Data compression with security properties that guarantee users' privacy remains an open problem. At the same time, high-throughput systems in genomics (e.g. the so-called Next Generation Sequencers) generate massive amounts of genetic data at affordable costs. As a consequence, huge DBMSs integrating many types of genomic information, clinical data and other information (personal, environmental, historical, etc.) are on the way. This will allow an unprecedented capability for large-scale, comprehensive and in-depth analysis of human beings and diseases; however, it will also constitute a formidable threat to users' privacy. Whilst the confidential storage of clinical data can be achieved with well-known methods from the field of relational databases, the same is not true of genomic data; the main goal of my research work was therefore the design of new compressed indexing schemes for the management of genomic data with confidentiality protection. For the effective processing of huge amounts of such data, a key requirement is the ability to perform high-speed search operations in secondary storage, operating directly on the data in compressed and encrypted form; I therefore devoted considerable effort to obtaining algorithms and data structures that enable pattern search on compressed and encrypted data in secondary storage, so that there is no need to preload the data into main memory before starting those operations. [edited by Author]
30 April 2015
English
Confidentiality protection
Genomic sequences
Indexed data compression
TAGLIAFERRI, Roberto
LEONE, Antonietta
Università degli Studi di Salerno
Files in this item:

108029604304619694906716462172027622479.pdf
Access: open access
License: All rights reserved
Size: 2.86 MB
Format: Adobe PDF

137810161322407549620429259355793787776.pdf
Access: open access
License: All rights reserved
Size: 29.67 kB
Format: Adobe PDF

166527093388294444938734498415714173450.pdf
Access: open access
License: All rights reserved
Size: 29.57 kB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/312445
The NBN code of this thesis is URN:NBN:IT:UNISA-312445