Random forest regression for predicting healthcare costs using administrative databases
SALA, ISABELLA MARIA
2026
Abstract
Background: Longer life expectancies and the increasing prevalence of chronic diseases drive up demand for healthcare services and the related costs. In Italy, 32% of people aged 65 and over and 48% of those over 85 have major chronic conditions and multimorbidity. In 2019, individuals aged 65 and over accounted for 46% of hospital admissions and 60% of pharmaceutical expenditures, highlighting the burden of aging on the healthcare system. Population segments with a high prevalence of chronic disease account for a large portion of healthcare spending. Accurate predictions of future costs, for both the whole population and subgroups of patients with high clinical and economic impact, are crucial for healthcare planning.
Aim: To predict yearly direct healthcare costs from data on past utilization of Italian National Health Service (NHS) resources, for the whole population and for high-impact segments.
Methods: Administrative databases of the Health Protection Agency of Bergamo (2011–2023) were used to trace NHS resource utilization (inpatient and outpatient services, drug dispensations) and associated costs for individuals aged ≥18 years. A supervised machine learning approach (random forest) was applied, using individual-level data on 5 years of NHS resource utilization to predict each individual’s healthcare costs in the following year. We evaluated different outcome measures, including total cost (TC), defined as the sum of all inpatient, outpatient, and drug dispensation costs. Individual cost predictions were aggregated to derive total and mean cost estimates for the whole population and for specific high-impact subgroups, namely patients undergoing dialysis and patients with type 2 diabetes, heart failure, Parkinson’s disease or parkinsonisms, or active neoplasms.
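The aggregation of individual predictions into total and mean costs can be illustrated with a minimal sketch. The record layout, subgroup labels, and cost values below are hypothetical and are not taken from the Bergamo databases. The sketch also assumes that a person with several conditions is counted in every matching subgroup, which the abstract does not state.

```python
from collections import defaultdict

# Hypothetical individual-level predictions (not real data): each record holds
# a predicted yearly total cost in euros and the subgroups the person belongs to.
predictions = [
    {"id": 1, "predicted_tc": 45000.0, "subgroups": {"dialysis"}},
    {"id": 2, "predicted_tc": 3000.0, "subgroups": {"type_2_diabetes"}},
    {"id": 3, "predicted_tc": 5000.0, "subgroups": {"type_2_diabetes", "heart_failure"}},
    {"id": 4, "predicted_tc": 500.0, "subgroups": set()},
]


def aggregate_costs(records):
    """Return {group: (total, mean)} for the whole population and each subgroup."""
    costs_by_group = defaultdict(list)
    for record in records:
        costs_by_group["whole_population"].append(record["predicted_tc"])
        # Assumption: overlapping subgroups, so one person may count in several.
        for subgroup in record["subgroups"]:
            costs_by_group[subgroup].append(record["predicted_tc"])
    return {
        group: (sum(costs), sum(costs) / len(costs))
        for group, costs in costs_by_group.items()
    }


summary = aggregate_costs(predictions)
for group, (total, mean) in sorted(summary.items()):
    print(f"{group}: total = {total:,.0f} EUR, mean = {mean:,.0f} EUR")
```

The same function could be applied to actual costs, so that predicted and actual totals and means are computed in an identical way before being compared.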
Prediction error (PE) was calculated as the difference between predicted and actual cost divided by the actual cost, and variability intervals for the mean predicted costs were derived from the 2.5th and 97.5th percentiles of the distribution of the mean costs predicted across trees.
Results: PE values were generally negative, indicating a tendency to underestimate mean costs. Excluding 2020–2021, when predictions were affected by the COVID-19 pandemic, PEs for the whole population were close to zero, indicating only a slight discrepancy between predicted and actual costs. In 2023, for example, predicted total TC was €1,103,322,372 compared to an actual of €1,111,657,382 (PE = -0.7%). Among the high-impact groups, dialysis and diabetic patients showed a consistent temporal trend in PE values, which were generally negative and did not exceed 10% in absolute value. For instance, in 2023, the predicted mean TC for dialysis patients was €44,128 [variability interval: €40,421-47,605] compared to an actual of €46,254 (PE = -4.6%), while for diabetic patients it was €3,455 [€3,337-3,571] vs. €3,646 (PE = -5.2%). For patients with heart failure, a substantial underestimation of mean costs in 2021 was followed by a gradual reduction in the deviation between predicted and actual costs in 2022 and 2023. Finally, for patients with Parkinson’s disease and active neoplasms, predicted mean costs were markedly lower than actual costs, with PE values as low as -30%, although they remained consistent over time. For instance, in 2023, the predicted mean TC for patients with Parkinson’s disease was €3,768 [€3,349-4,434] compared to an actual of €4,698 (PE = -19.8%), while for patients with active neoplasms it was €5,537 [€5,252-5,798] vs. €6,773 (PE = -18.2%).
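The PE definition can be checked against the 2023 figures reported above with a short sketch. The `prediction_error` function follows the stated formula. The `variability_interval` function is only one possible reading of the across-tree interval: it assumes each tree yields a mean predicted cost and takes percentiles of those means. The per-tree predictions are synthetic, and the quantile interpolation method is an assumption, since the abstract does not specify it.

```python
import random
import statistics


def prediction_error(predicted, actual):
    """Relative prediction error: (predicted - actual) / actual."""
    return (predicted - actual) / actual


# 2023 predicted vs. actual costs in euros, as reported in the abstract.
reported_2023 = {
    "whole population (total TC)": (1_103_322_372, 1_111_657_382),
    "dialysis (mean TC)": (44_128, 46_254),
    "type 2 diabetes (mean TC)": (3_455, 3_646),
    "Parkinson's disease (mean TC)": (3_768, 4_698),
    "active neoplasms (mean TC)": (5_537, 6_773),
}

for label, (predicted, actual) in reported_2023.items():
    print(f"{label}: PE = {prediction_error(predicted, actual):.1%}")


def variability_interval(per_tree_predictions):
    """2.5th and 97.5th percentiles of the per-tree mean predicted cost.

    per_tree_predictions: one list of individual predictions per tree.
    """
    tree_means = [statistics.fmean(tree) for tree in per_tree_predictions]
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5% (interpolation is assumed).
    cut_points = statistics.quantiles(tree_means, n=40, method="inclusive")
    return cut_points[0], cut_points[-1]


# Synthetic per-tree predictions (hypothetical, for illustration only).
rng = random.Random(0)
synthetic_trees = [[rng.gauss(4000, 800) for _ in range(200)] for _ in range(500)]
lower, upper = variability_interval(synthetic_trees)
print(f"variability interval of the mean: {lower:.0f} to {upper:.0f} EUR")
```

Rounded to one decimal place, the five computed PE values match those quoted in the Results (-0.7%, -4.6%, -5.2%, -19.8%, and -18.2%).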
Conclusions: Mean costs were systematically underestimated, more markedly in heterogeneous groups, such as patients with Parkinson’s disease or active neoplasms, and less so in homogeneous subgroups, such as dialysis patients. Nevertheless, prediction accuracy remained consistent over time within each subgroup, supporting the robustness and validity of the predictive algorithm.

| File | Size | Format |
|---|---|---|
| phd_unimib_802991.pdf (open access; license: all rights reserved) | 14.03 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/368701
URN:NBN:IT:UNIMIB-368701