Preoperative prediction of post hepatectomy liver failure after surgery for hepatocellular carcinoma on CT-scan by machine learning and radiomics analyses.

SIMONE, FAMULARO
2024

Abstract

Introduction. Post-hepatectomy liver failure (PHLF) is associated with a high risk of failure-to-rescue and mortality after liver resection for hepatocellular carcinoma (HCC). Its diagnosis is predominantly clinical and is made during the postoperative course. The aim of this study was to train machine-learning models to predict the occurrence of PHLF from the preoperative CT scan combined with clinical parameters.

Methods. Clinical data and three-phase CT scans were retrospectively collected from 12 Italian centres. DICOM files were manually segmented to delineate the liver parenchyma. Radiomics features were extracted after placing a region of interest (ROI) in the healthy liver area. The data obtained were explored, and principal component analysis (PCA) was performed to reduce the dimensionality of the dataset, retaining only the principal components (PCs) that together explained 75% of the variability. After normalization, the data were split into training (70%) and test (30%) sets. Adaptive synthetic oversampling (ADASYN) was applied to the training set to address the class imbalance of the target variable. Random forest (RF), extreme gradient boosting (XGB) and support vector machine (SVM) models were fitted to predict PHLF. The hyperparameters of each model were tuned by grid search to minimize the out-of-bag error. Models were trained on the training set, and the best parameters were selected on validation folds generated by 10-fold cross-validation. The final metrics were evaluated on the test set. Isotonic calibration was applied to the predictions of each model. The area under the curve (AUC), estimated by ROC analysis, was set as the primary endpoint. The models with the best discrimination and calibration were combined in an averaging ensemble model (AEM).

Results. Between 2008 and 2022, 500 consecutive preoperative CT scans of patients with HCC who underwent surgery were collected together with the corresponding clinical data. Of these, 17 (3.4%) developed PHLF. First- and second-order radiomics features were extracted, yielding 672 variables per patient. After data exploration, PCA retained 19 components explaining >75% of the variance. The clinical variables selected by logistic regression were size, macrovascular invasion, cirrhosis, major resection and MELD score. Splitting the dataset produced a training cohort (n=351) and a test cohort (n=149). The grid search selected the following tuning parameters for XGB: gamma=3, eta=0.3, max_depth=7, min_child_weight=5, subsample=1, colsample=0.8; and for RF: mtry=1, n. rounds=2500. On the test set, the XGB model obtained an AUC=85.3% (Spec.=62.5%, Sens.=100%, accuracy=97.9%, PPV=20%, NPV=97.9%). The RF model obtained an AUC=89.1% (Spec.=70.1%, Sens.=100%, accuracy=71.1%, PPV=10.4%, NPV=100%). The SVM model showed an AUC=87.8% (Spec.=88.9%, Sens.=60.0%, accuracy=87.8%, PPV=15.7%, NPV=98.4%). The AEM, combining the XGB and RF models, obtained an AUC=90.1% (Spec.=89.5%, Sens.=80.0%, accuracy=89.2%, PPV=21.0%, NPV=99.2%).

Conclusion. The AEM obtained the best results in terms of discrimination and true-positive identification. It could inform treatment allocation, the extent of surgical resection and the postoperative management of these patients. The algorithm will be made freely available online for medical purposes.
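The averaging ensemble and the threshold-based test-set metrics reported in the abstract can be illustrated with a minimal Python sketch. It assumes that each model already outputs calibrated probabilities of PHLF, for example after isotonic calibration. It also assumes that the ensemble is an unweighted mean and that predictions are dichotomized at a 0.5 threshold. The threshold, the equal weighting and the function names (`average_probabilities`, `confusion_metrics`, `roc_auc`) are hypothetical, not the thesis's implementation. AUC is computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen PHLF case scores higher than a randomly chosen non-PHLF case.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    """Threshold-dependent metrics of the kind reported in the abstract."""
    sensitivity: float
    specificity: float
    accuracy: float
    ppv: float
    npv: float


def average_probabilities(*model_probs: Sequence[float]) -> list[float]:
    """Hypothetical unweighted averaging ensemble: per-patient mean of each model's P(PHLF)."""
    if not model_probs:
        raise ValueError("at least one model is required")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same patients")
    return [sum(p[i] for p in model_probs) / len(model_probs) for i in range(n)]


def _ratio(num: int, den: int) -> float:
    # A metric is undefined (NaN) when its denominator is zero,
    # e.g. PPV when no patient is predicted to develop PHLF.
    return num / den if den else float("nan")


def confusion_metrics(
    y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5
) -> BinaryMetrics:
    """Dichotomize probabilities at an assumed threshold and derive confusion-matrix metrics."""
    if len(y_true) != len(probs):
        raise ValueError("labels and probabilities must have the same length")
    tp = fp = tn = fn = 0
    for label, p in zip(y_true, probs):
        predicted_phlf = p >= threshold
        if predicted_phlf and label == 1:
            tp += 1
        elif predicted_phlf:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return BinaryMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        accuracy=_ratio(tp + tn, len(y_true)),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(PHLF case scores higher than non-PHLF case), ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with six hypothetical patients (not data from the thesis).
y = [1, 1, 0, 0, 0, 0]
p_xgb = [0.9, 0.3, 0.6, 0.2, 0.1, 0.4]
p_rf = [0.7, 0.5, 0.6, 0.1, 0.3, 0.2]
p_aem = average_probabilities(p_xgb, p_rf)
print(confusion_metrics(y, p_aem), roc_auc(y, p_aem))
```

At a PHLF prevalence of 3.4%, only about 5 of the 149 test patients would be expected to be positive. Even a few false positives therefore pull the PPV down sharply. This is why every model above combines NPVs of roughly 98-100% with PPVs of only 10-21%.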
28-feb-2024
English
HCC; machine learning; PHLF; AI
TORZILLI, Guido
DONADON, Matteo Davide
Humanitas University

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/121782
The NBN code of this thesis is URN:NBN:IT:HUNIMED-121782