PREDICTION OF THE PHYSICO-CHEMICAL PROPERTIES OF LOW AND HIGH MOLECULAR WEIGHT COMPOUNDS

2010

Abstract

This Ph.D. thesis investigates an innovative approach to deriving Quantitative Structure-Property/Activity Relationships (QSPRs/QSARs) and applies it to several predictive problems. The approach is based on the direct and adaptive treatment of molecular structure by means of a Recursive Neural Network (RNN). Chemical compounds are represented through appropriate graphical tools, and no numerical descriptors are needed.
In the first part, the RNN-QSPR method was applied to predicting the melting point (Tm) of a set of 126 pyridinium bromides and the glass transition temperature (Tg) of a set of 337 (meth)acrylic homopolymers. Particular emphasis was placed on the representation of cyclic moieties, which can be achieved in different ways by exploiting the flexibility of the structured approach. Various representations were devised, each with different advantages and sampling requirements. Performance did not vary significantly when passing from a more specific representation to a more general one. For the Tm of pyridinium bromides, the best result on the test set of 37 molecules had a mean absolute residual (MAR) of 25 K, a standard error of prediction (S) of 29.6 K and a squared correlation coefficient (R2) of 0.62. For the Tg of poly(meth)acrylates, the best result on the test set of 54 molecules had MAR, S and R2 values of 15.8 K, 20.4 K and 0.85, respectively.
In the second part, the representation used for homopolymers was extended to treat copolymers. A data set containing the Tg of 275 random (meth)acrylic copolymers was investigated, either alone or mixed with homopolymer data. The prediction on copolymers was excellent, with MAR, S and R2 of 4.9 K, 6.1 K and 0.98, respectively, for the 57 compounds in the test set. The method also performed well on the total data set comprising homopolymers and copolymers together.
In the last part, the RNN approach was employed to model and predict the toxicity of two sets of aromatic molecules. The first data set involved the median growth impairment concentration (IGC50) of 221 phenols towards Tetrahymena pyriformis. The results were good for the training set, but the performance on the test set (41 molecules) was not on par with that of other methods in the literature. It must be stressed, however, that those methods employ a priori information encoded in appropriate numerical descriptors, whereas our method does not use any background knowledge. The second data set concerned the median lethal concentration (LC50) of 69 substituted benzenes towards Pimephales promelas. This data set was also investigated by means of a descriptor-based multiple linear regression (MLR) technique. The performance was good for both calculations, yielding MAR ≈ 0.22, S ≈ 0.25 and R2 ≈ 0.80 on the test set of 18 molecules. The results obtained by RNN and MLR were very similar, despite the radically different approaches of the two methods.
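The three figures of merit quoted in the abstract (MAR, S and R2) can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions: MAR is taken as the mean of the absolute residuals, S as the root mean square of the residuals (the thesis may use a different denominator), and R2 as the squared Pearson correlation between observed and predicted values. The function name `prediction_metrics` and the sample temperatures are hypothetical and are not data from the thesis.

```python
import math


def prediction_metrics(observed, predicted):
    """Return (MAR, S, R2) for paired observed/predicted values.

    Assumed definitions (not confirmed by the source):
      MAR = mean absolute residual
      S   = root mean square of the residuals
      R2  = squared Pearson correlation coefficient
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    mar = sum(abs(r) for r in residuals) / n
    s = math.sqrt(sum(r * r for r in residuals) / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant values")
    r2 = cov * cov / (var_o * var_p)

    return mar, s, r2


# Made-up temperatures in kelvin, for illustration only.
observed_tg = [300.0, 320.0, 340.0, 360.0]
predicted_tg = [310.0, 315.0, 345.0, 350.0]
mar, s, r2 = prediction_metrics(observed_tg, predicted_tg)
print(f"MAR = {mar:.1f} K, S = {s:.1f} K, R2 = {r2:.2f}")
```

Under these assumed definitions, MAR and S carry the units of the property (kelvin for Tm and Tg), while R2 is dimensionless, which matches how the values are reported in the abstract.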
12 Feb 2010
Italian
Tinè, Maria Rosaria
Baratta, Walter
Marongiu, Bruno
Fuoco, Roger
Micheli, Alessio
Università degli Studi di Pisa
Files in this item (name: access; type; size; format):
Appendice1_2_3_4.pdf: embargoed until 18/02/2050; other attached material; 273.33 kB; Adobe PDF
Capitolo1.pdf: embargoed until 18/02/2050; other attached material; 177.15 kB; Adobe PDF
Capitolo2.pdf: embargoed until 18/02/2050; other attached material; 904.48 kB; Adobe PDF
Capitolo3.pdf: embargoed until 18/02/2050; other attached material; 2.58 MB; Adobe PDF
Conclusioni.pdf: embargoed until 18/02/2050; other attached material; 78.38 kB; Adobe PDF
Copertina_Indice_Abstract.pdf: embargoed until 18/02/2050; other attached material; 113.53 kB; Adobe PDF
Appendice5.pdf: embargoed until 18/02/2050; other attached material; 600.72 kB; Adobe PDF
Capitolo4.pdf: embargoed until 18/02/2050; other attached material; 2.85 MB; Adobe PDF
Capitolo5.pdf: embargoed until 18/02/2050; other attached material; 1.65 MB; Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/150982
The NBN code of this thesis is URN:NBN:IT:UNIPI-150982