Track: Big Data and Analytics
Abstract
The purpose of this STEM project is to determine which Starbucks coffee and tea drinks are best for cardiovascular disease (CVD) prevention. To do this, a health index was constructed from eight nutritional variables: saturated fat, cholesterol, sodium, carbohydrates, dietary fiber, sugars, protein, and caffeine. Each variable was assigned a weighting coefficient, with lower coefficients for the more harmful factors and higher coefficients for the more beneficial ones, so drinks with the highest health index are considered the most beneficial for CVD prevention. Principal Components Analysis (PCA) was used to explore all factors in the analysis and to assess how well the health index reflects CVD prevention. PCA decomposed the dominant sources of variability in relation to the health index: 66.4% and 12.6% of the variation were attributable to Principal Components 1 (Prin 1) and 2 (Prin 2), respectively, so the first two principal components together explained 79% of the total variation. Prin 1 separated the drinks into two groups, with Frappuccino Blended and Espresso beverages in one cluster and mainly Cold Brew, Freshly Brewed, and Tea in the other. Prin 2 grouped the data largely by cholesterol and fat content and held less explanatory power than Prin 1. The health index, originally derived from the scientific research, largely corroborated the results of Prin 1 plotted against drink and drink category. Hierarchical clustering was used to form 3 clusters across drink categories, and the results were considered together with the health index and PCA to investigate which combined set of factors contributed most to CVD prevention. This project sheds light on smarter ordering at Starbucks, making people more aware of how diet ultimately affects health and, more specifically, how smart drink choices can promote CVD prevention.
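The weighted health index described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual method: the weights in `WEIGHTS`, the three placeholder drinks in `DRINKS`, the min-max scaling step, and the function names are all assumptions introduced here, since the abstract does not state the real coefficients or nutrition values.

```python
from typing import Dict, List

# Hypothetical weights (assumption): lower values for more harmful factors,
# higher values for more beneficial ones. NOT the project's coefficients.
WEIGHTS: Dict[str, float] = {
    "saturated_fat_g": -2.0,
    "cholesterol_mg": -1.5,
    "sodium_mg": -1.0,
    "carbohydrates_g": -0.5,
    "sugars_g": -2.0,
    "caffeine_mg": -0.5,
    "dietary_fiber_g": 1.5,
    "protein_g": 1.0,
}

# Made-up placeholder drinks (assumption), not real Starbucks nutrition data.
DRINKS: Dict[str, Dict[str, float]] = {
    "placeholder tea": {
        "saturated_fat_g": 0, "cholesterol_mg": 0, "sodium_mg": 0,
        "carbohydrates_g": 0, "sugars_g": 0, "caffeine_mg": 25,
        "dietary_fiber_g": 0, "protein_g": 0,
    },
    "placeholder latte": {
        "saturated_fat_g": 4, "cholesterol_mg": 25, "sodium_mg": 150,
        "carbohydrates_g": 18, "sugars_g": 17, "caffeine_mg": 75,
        "dietary_fiber_g": 0, "protein_g": 12,
    },
    "placeholder blended drink": {
        "saturated_fat_g": 9, "cholesterol_mg": 45, "sodium_mg": 250,
        "carbohydrates_g": 60, "sugars_g": 55, "caffeine_mg": 95,
        "dietary_fiber_g": 1, "protein_g": 5,
    },
}


def min_max_scale(drinks: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Rescale every variable to [0, 1] across drinks so that grams and
    milligrams become comparable before weighting (an assumed preprocessing step)."""
    variables = next(iter(drinks.values())).keys()
    scaled: Dict[str, Dict[str, float]] = {name: {} for name in drinks}
    for var in variables:
        values = [d[var] for d in drinks.values()]
        lo, hi = min(values), max(values)
        for name, d in drinks.items():
            # A variable that is constant across drinks carries no information.
            scaled[name][var] = (d[var] - lo) / (hi - lo) if hi > lo else 0.0
    return scaled


def health_index(drinks: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of the scaled variables; a higher score means a better drink."""
    scaled = min_max_scale(drinks)
    return {
        name: sum(weights[var] * value for var, value in row.items())
        for name, row in scaled.items()
    }


def rank_drinks(drinks: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> List[str]:
    """Drink names sorted from highest to lowest health index."""
    scores = health_index(drinks, weights)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for name in rank_drinks(DRINKS, WEIGHTS):
        print(name, round(health_index(DRINKS, WEIGHTS)[name], 3))
```

With real nutrition data, the same variable table would also be the natural input to PCA and hierarchical clustering; the scaling choice matters there too, because unscaled milligram columns such as sodium would otherwise dominate the variance.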