TCORS: Center for the Assessment of Tobacco Regulations (CAsToR)

CAsToR Pilot Project Program Lightning Talk Sessions

“Automating the detection of American adolescents at risk of e-cigarette dependence using machine learning” with Dr. Rui “Ray” Fu (University of Toronto)

Contact the presenter
Dr. Rui “Ray” Fu <rui.fu@mail.utoronto.ca>
Abstract
INTRODUCTION: American adolescents are showing a concerning trend toward frequent e-cigarette use (vaping) that might indicate dependence. Although risk factors for vaping dependence have been assessed using conventional regression, these approaches are limited when dealing with a large number of potential predictors and complex non-linear interactions. Furthermore, a practical model capable of accurately identifying adolescents at risk of vaping dependence has yet to be developed to allow for timely intervention. The overarching aim of this project is therefore to develop and validate a random forest-based machine learning model to predict frequent vaping status (defined as nicotine-containing vaping on 20 or more of the past 30 days) 6 months after baseline. Using this model, we further identify the top predictors of frequent vaping and important interactions among sociodemographic variables to characterize vulnerable subgroups.

METHODS: Using longitudinal survey data from the Los Angeles-based Happiness and Health Study, we focused on 12th-grade students who had ever tried an e-cigarette and followed them for 6 months. More than 130 candidate predictors were entered into a cross-validation process to construct and validate a random forest model. In a post-hoc analysis, we identified the top individual predictors of 6-month frequent vaping and depicted interactions among sociodemographic variables using a partial dependence-based method.

KEY FINDINGS: Among the 1281 ever-vaping 12th-grade students in the cohort, 40 (3.1%) reported frequent vaping at the 6-month follow-up. Compared with their infrequently vaping counterparts, frequent vapers were more likely to be male (80.0% vs. 46.3%), Asian (25.6% vs. 11.5%) or Native American/Pacific Islander (15.4% vs. 5.1%), and to receive full-cost (61.8% vs. 43.8%) rather than reduced-cost or free lunch at school.
Mothers of frequent vapers were more educated (percentage of mothers with a high school diploma: 89.1% vs. 75.8%), although no such difference was observed for fathers (p-value=0.2). Frequent vapers also reported experiencing discrimination more often, scoring higher on average on the Everyday Discrimination Scale (mean score=1.21 vs. 0.82). The random forest model demonstrated high predictive performance, reaching a test C-index of 0.87. Using 0.25 as the decision threshold, the model had a sensitivity of 0.88 (95% CI 0.62-0.98), a specificity of 0.80 (95% CI 0.76-0.83), and an accuracy of 0.80 (95% CI 0.76-0.83). The ten most important individual predictors of 6-month frequent vaping were higher past-month nicotine concentration in vape, more daily vaping sessions, greater self-reported nicotine dependence, increased willingness to vape, more past-month puffs per vape, high perceived discrimination, negative cigarette smoking expectancies, past-month use of nicotine in vape, higher percentage of students receiving reduced-cost lunch at school, and past-month use of marijuana in vape. Interactions were found between age and perceived discrimination and between age and race/ethnicity; specifically, students who were younger than their classmates and either reported experiencing discrimination often or identified as Asian or Native American/Pacific Islander were at increased risk of becoming frequent vapers within 6 months.

IMPLICATIONS: This study demonstrates the utility of machine learning in predicting frequent vaping status over 6 months and in understanding predictors and nuanced intersectionality by sociodemographic attributes. The high performance of the random forest model has practical implications for a personalized risk calculator that supports a vaping prevention program. Public health officials need to recognize the importance of social factors that contribute to frequent vaping, particularly perceived discrimination.
Youth subpopulations, including younger high school students and Asians or Native Americans/Pacific Islanders, might require specially designed interventions to help prevent vaping from becoming a habit.

PUBLICATIONS: Dr. Fu was the lead author of an article resulting from her pilot project work. The article, published in Nicotine & Tobacco Research, is entitled “A Machine Learning Approach to Identify Predictors of Frequent Vaping and Vulnerable Californian Youth Subgroups”.
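
As an illustration of how the threshold-based metrics in the key findings (sensitivity, specificity, and accuracy at a 0.25 decision threshold) are typically derived from a model's predicted probabilities, here is a minimal Python sketch. It is not the study's code: the function name, the toy labels and probabilities, and the choice to classify a student as positive when the predicted probability is greater than or equal to the threshold are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.25,
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy at a given decision threshold.

    Assumption: a case is predicted positive when y_prob >= threshold.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")

    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    return {
        # Share of true frequent vapers that the model flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of non-frequent vapers that the model correctly leaves unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Share of all students classified correctly.
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Made-up toy data (NOT study data): 1 = frequent vaping at follow-up.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.60, 0.20, 0.30, 0.10, 0.05, 0.22, 0.40, 0.01, 0.15, 0.08]
toy_metrics = threshold_metrics(toy_labels, toy_probs, threshold=0.25)
print(toy_metrics)
```

When the outcome is rare, as it is here (40 of 1281 students, or 3.1%), accuracy is dominated by the majority class, so it is worth reading it alongside sensitivity and specificity rather than on its own.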
Bio
Dr. Rui “Ray” Fu, University of Toronto
Dr. Rui “Ray” Fu is a Postdoctoral Fellow in Evaluative Clinical Sciences at Sunnybrook Research Institute, University of Toronto, Canada. She is also a Postdoctoral Researcher with the Centre for Addiction and Mental Health (Toronto, Canada), where she studies e-cigarette use and addiction among youth and young adults. As a health services researcher, Ray has a passion for developing and creatively applying statistical methodology, including machine learning, to analyze real-world data. She has worked across many fields of application as the primary statistician responsible on her teams. Her overarching goal is to produce theory-driven, interpretable, and reproducible findings that can advance healthcare policy making and quality of care.