Machine learning in engineering applications

2011

Abstract

Computing and information-storage resources have grown to a level that makes it easy to collect and preserve huge amounts of data. However, many organizations still lack the knowledge or the tools to process these data into useful information. In this thesis we investigate several problems that can be solved effectively by means of machine learning techniques, ranging from web defacement detection to electricity price forecasting, and from Support Vector Machines to Genetic Programming.

We investigate a framework for web defacement detection that allows any organization to join the service by simply providing the URLs of the resources to be monitored, along with the contact point of an administrator. Our approach is based on anomaly detection and monitors the integrity of many remote web resources automatically while remaining fully decoupled from them; in particular, it requires no prior knowledge about those resources and is thus an unsupervised system. Furthermore, we test several machine learning algorithms normally used for anomaly detection on the web defacement detection problem.

We present a scrolling system for mobile devices that provides a more natural and effective user experience on small screens. We detect device motion by analyzing the video stream generated by the camera and then translate that motion into scrolling of the content rendered on the screen. In this way the user experiences the device screen as a small movable window on a larger virtual view, without requiring any dedicated motion-detection hardware.

As regards information retrieval, we present an approach to information extraction from multi-page printed documents. The approach is designed for scenarios in which the set of possible document classes, i.e., sets of documents sharing similar content and layout, is large and may evolve over time. Our approach is based on probability: we derive a general form for the probability that a sequence of blocks contains the searched information. A key step in the understanding of printed documents is their classification based on the nature of the information they contain and on their layout. We consider both a static scenario, in which document classes are known a priori and no new classes can appear, and a dynamic scenario, in which classes are not known a priori and new classes can appear at any time.

Finally, we move to the edge of machine learning: Genetic Programming. The electric power market increasingly relies on competitive mechanisms that take the form of day-ahead auctions, in which buyers and sellers submit their bids in terms of prices and quantities for each hour of the next day. We propose a novel forecasting method based on Genetic Programming; a key feature of our proposal is the handling of outliers, i.e., regions of the input space rarely seen during learning.
en
Defacement Detection
Genetic Programming
Machine Learning
SCUOLA DI DOTTORATO DI RICERCA DI INGEGNERIA DELL'INFORMAZIONE
Università degli Studi di Trieste
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/245875
The NBN code of this thesis is URN:NBN:IT:UNITS-245875