DIETARY PATTERNS AND ESOPHAGEAL CANCER: A POSTERIORI DIETARY PATTERNS IDENTIFIED THROUGH FACTOR ANALYSIS AND CLUSTER ANALYSIS
BRAVI, FRANCESCA
2013
Abstract
Background: Because diet is complex and its components may interact, dietary patterns have been proposed as a way to describe variation in overall dietary intake in a specific population and to analyze the relationship between diet and cancer risk. In the present work, factor analysis and cluster analysis were used in combination to identify groups of subjects with similar dietary patterns.

Patients and methods: We analyzed data from an Italian case–control study including 304 cases of squamous cell carcinoma of the esophagus and 743 hospital controls. Dietary habits were assessed using a food frequency questionnaire. A posteriori dietary patterns were identified through principal component factor analysis (PCFA) performed on 28 selected nutrients. A varimax rotation was applied to achieve a simpler loading structure. Nutrients with an absolute rotated factor loading greater than or equal to 0.63 on a given pattern were used to name that pattern. For each pattern, participants were grouped into categories according to quartiles of factor scores among the controls, and odds ratios (ORs) and corresponding 95% confidence intervals (CIs) were estimated using unconditional multiple logistic regression models adjusted for potential confounding variables. Cluster analysis was then performed on the factor scores obtained from the factor analysis. The main analysis used the k-means method with Euclidean distance. The initial seeds were obtained by first running a hierarchical method (Ward’s) and cutting the resulting dendrogram at the level corresponding to 6 clusters. Results from the main analysis were compared with those from other clustering solutions, obtained using the k-means method with Manhattan and Lagrange distances and the correlation coefficient similarity measure, and using the Partitioning Around Medoids method with both Euclidean and Manhattan distances.
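The seeding strategy described above (Ward's hierarchical clustering to obtain starting centroids, followed by k-means with Euclidean distance) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the software or code used in the thesis: the function names `ward_seeds` and `kmeans`, the naive Ward implementation, and the toy two-dimensional data with three groups are all hypothetical, whereas the study clustered scores on five factors into 6 clusters.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ward_seeds(points, k):
    """Naive Ward agglomeration stopped at k clusters; returns the k centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward criterion: increase in within-cluster sum of squares
                # caused by merging clusters a and b.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return [centroid(c) for c in clusters]

def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means with Euclidean distance, started from the given seeds."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers

# Toy data (hypothetical): three well-separated groups of 2-D "factor scores".
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in true_centers for _ in range(10)]

seeds = ward_seeds(points, k=3)
labels, centers = kmeans(points, seeds)
print(sorted(labels.count(c) for c in set(labels)))
```

Starting k-means from the Ward centroids rather than from random points makes the result reproducible and less dependent on the starting configuration, which is the motivation the abstract gives for this choice.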
The identified clusters were characterized by examining, within each cluster, the distribution of several sociodemographic and lifestyle variables and the average consumption of selected nutrients and food groups. ORs were estimated for each identified cluster, and the corresponding 95% CIs were obtained using the floating absolute risks method.

Results: PCFA identified five major dietary patterns, which together explained about 80% of the total variance in the original nutrients. The Animal products and related components pattern (with high factor loadings on calcium, phosphorus, riboflavin, animal protein, saturated fatty acids, cholesterol, and zinc) was positively related to esophageal cancer risk (OR=1.64, 95% CI: 1.06-2.55). The Vitamins and fiber pattern (with high loadings on vitamin C, total fiber, beta-carotene equivalents, soluble carbohydrates, and total folate) and the Other polyunsaturated fatty acids and vitamin D pattern (with high loadings on other polyunsaturated fatty acids, vitamin D, and niacin) were inversely related to esophageal cancer risk (OR=0.50, 95% CI: 0.32-0.78, and OR=0.48, 95% CI: 0.31-0.74, respectively). No relationship with this cancer was observed for the Starch-rich pattern (with high loadings on starch, vegetable protein, and sodium; OR=0.80, 95% CI: 0.50-1.28) or the Other fats pattern (with high loadings on linoleic acid, linolenic acid, and vitamin E; OR=1.04, 95% CI: 0.67-1.63). The naming of the factors, based on the high factor loadings characterizing each pattern, was confirmed by the distributions of selected nutrients and food groups. The subsequent cluster analysis, based on differences in the dietary patterns, yielded 6 clusters. One of them (C3) was characterized by the lowest intakes of all nutrients and food groups considered, while each of the remaining clusters was defined by an extreme value of a single dietary pattern.
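As a quick arithmetic check on the pattern odds ratios reported in the Results, the sketch below assumes that the 95% CIs are Wald-type intervals, that is, symmetric around log(OR). The abstract does not state this explicitly, so treat it as an assumption. Under it, the geometric mean of the two limits should reproduce the OR up to rounding, and the width of the interval on the log scale gives the implied standard error. The helper names are hypothetical; the numbers are copied from the abstract.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(OR) implied by a Wald-type CI (assumption)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def midpoint_or(lower, upper):
    """Geometric mean of the limits; equals the OR if the CI is log-symmetric."""
    return math.sqrt(lower * upper)

# (pattern, reported OR, lower limit, upper limit), copied from the abstract.
reported = [
    ("Animal products and related components", 1.64, 1.06, 2.55),
    ("Vitamins and fiber", 0.50, 0.32, 0.78),
    ("Other polyunsaturated fatty acids and vitamin D", 0.48, 0.31, 0.74),
    ("Starch-rich", 0.80, 0.50, 1.28),
    ("Other fats", 1.04, 0.67, 1.63),
]

checks = []
for name, odds_ratio, lower, upper in reported:
    mid = midpoint_or(lower, upper)
    se = log_se_from_ci(lower, upper)
    checks.append((name, odds_ratio, mid, se))
    print(f"{name}: OR={odds_ratio:.2f}, midpoint={mid:.3f}, SE(log OR)={se:.3f}")
```

For all five patterns the geometric midpoint agrees with the reported OR to within rounding, which is what one would expect if the intervals were computed on the log scale.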
Subjects in the C1 cluster had the highest values of the Vitamins and fiber pattern; those in C2, of the Other polyunsaturated fatty acids and vitamin D pattern; those in C4, of the Animal products and related components pattern; and those in C5, of the Other fats pattern. The C6 cluster had the highest scores on the Starch-rich pattern and the highest intakes of bread and of pasta and rice. Compared with the C3 cluster, significantly reduced risks were observed for the C1, C5, and C6 clusters (OR=0.59, 95% CI: 0.40-0.88; OR=0.42, 95% CI: 0.20-0.86; and OR=0.60, 95% CI: 0.42-0.86, respectively), which were characterized by high values of the Vitamins and fiber, Other fats, and Starch-rich patterns, respectively. No significant association was observed for the C2 and C4 clusters (OR=0.76, 95% CI: 0.51-1.13, and OR=1.29, 95% CI: 0.80-2.07, respectively).

Conclusion: The combined application of factor and cluster analyses makes it possible to identify key dietary aspects in a specific population and to obtain mutually exclusive groups of subjects who are similar with respect to these characteristics. Both techniques have limitations that arise from the subjective decisions involved in the analyses. In this application, several alternative options were tried to check the robustness and stability of the solutions. Among these complementary analyses, results from PCFA were compared with those from an alternative method, principal axis factoring, and with those from PCFA performed separately in strata of center and gender and in randomly generated split samples. Moreover, the internal consistency of the identified patterns was evaluated using Cronbach’s coefficient alpha. All these checks supported the decisions adopted in the main analyses.
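Cronbach's coefficient alpha, mentioned above as the internal-consistency check, can be computed from the item variances and the variance of the summed score: alpha = k/(k-1) * (1 - sum of item variances / variance of the total), where k is the number of items. The sketch below is a minimal illustration only. It assumes that the "items" for a pattern are the standardized intakes of the nutrients loading highly on it, which the abstract does not spell out, and the six data rows are invented for the example.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of values per item)."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(row) for row in zip(*items)]  # per-subject sum across items
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical standardized intakes of three nutrients for six subjects.
calcium    = [-1.2, -0.5, 0.0, 0.3, 0.9, 1.5]
phosphorus = [-1.0, -0.6, 0.1, 0.2, 1.0, 1.4]
riboflavin = [-1.1, -0.3, -0.1, 0.4, 0.8, 1.6]

alpha = cronbach_alpha([calcium, phosphorus, riboflavin])
print(f"alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items vary together across subjects, which is the sense in which a pattern is said to be internally consistent.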
Regarding cluster analysis, to limit the influence of the starting point, the initial seeds for the k-means method were obtained by performing a hierarchical clustering (Ward’s method) and cutting the corresponding dendrogram at the level k=6. Moreover, alternative solutions identified through different methods and distances were comparable to the main one. Another limitation of cluster analysis is its sensitivity to outliers; however, excluding 8 potential outliers did not materially change the results.

File | Size | Format
---|---|---
phd_unim_R08573.pdf (Open Access from 30/06/2014) | 1.48 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/81790
URN:NBN:IT:UNIMI-81790