Continuous time Bayesian network classifiers

CODECASA, DANIELE
2014

Abstract

Streaming data are relevant to finance, computer science, and engineering, and they are becoming increasingly important in medicine and biology. Continuous time Bayesian networks are designed to analyze multivariate streaming data efficiently by exploiting the conditional independencies in continuous time homogeneous Markov processes. Continuous time Bayesian network classifiers are a specialization of continuous time Bayesian networks designed for multivariate streaming data classification when the duration of events matters and the class occurs in the future. Continuous time Bayesian network classifiers are presented and analyzed. Structural learning is introduced for this class of models when complete data are available. A conditional log-likelihood score is derived to improve on marginal log-likelihood structural learning for continuous time Bayesian network classifiers. The expectation maximization algorithm is developed to address the unsupervised learning of continuous time Bayesian network classifiers when the class is unknown. The performance of continuous time Bayesian network classifiers in classification and clustering is analyzed through an extensive set of numerical experiments on synthetic and real data sets. Continuous time Bayesian network classifiers learned by maximizing marginal log-likelihood and conditional log-likelihood are compared with continuous time naive Bayes and dynamic Bayesian networks. Results show that conditional log-likelihood scoring combined with Bayesian parameter estimation outperforms marginal log-likelihood scoring and dynamic Bayesian networks in supervised classification. Conditional log-likelihood scoring becomes even more effective when the amount of available data is limited. Continuous time Bayesian network classifiers outperform dynamic Bayesian networks even on data sets generated from discrete time models.
Clustering results show that, in unsupervised learning, the marginal log-likelihood score is the most effective way to learn continuous time Bayesian network classifiers. Continuous time models again outperform dynamic Bayesian networks even when applied to discrete time data sets. A Java software toolkit implementing the main theoretical achievements of the thesis has been designed and developed under the name CTBNCToolkit. It provides a free stand-alone toolkit for multivariate trajectory classification and an open source library, which can be extended in accordance with the GPL v2.0 license. The CTBNCToolkit allows classification and clustering of multivariate trajectories using continuous time Bayesian network classifiers. Structural learning that maximizes the marginal log-likelihood and conditional log-likelihood scores is provided.
17 Feb 2014
English
STELLA, FABIO ANTONIO
Università degli Studi di Milano-Bicocca
Files in this item:
File: phd_unimib_063161.pdf (open access)
Size: 1.93 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/73832
The NBN code of this thesis is URN:NBN:IT:UNIMIB-73832