TCORS: CAsToR Pilot Project Program Lightning Talk Sessions

“Predicting smoking behaviors using machine learning” with Dr. Mona Issabakhsh (Georgetown University) and Dr. Thuy Le (University of Michigan)

Slides (PDF)
Contact the presenters: Dr. Mona Issabakhsh <mi416@georgetown.edu>, Dr. Thuy Le <thuyttle@umich.edu>
Abstract: Smoking prevalence in the United States has decreased significantly over time, from 23.3% in 2000 to 13.7% in 2018. Cigarette smoking nevertheless remains a major public health issue and is currently responsible for about 480,000 deaths annually. Identifying the factors and policies that drive the transition of individuals between never smokers, current smokers, and former smokers is a critical need.
Machine learning has been widely investigated over the last decades in various research studies. It can recognize patterns and detect complicated relationships among data features that humans cannot readily discern, and use them to make accurate estimations, predictions, and decisions. Several studies have recently started to apply machine learning algorithms in tobacco research, for example to classify smoker status from narrative clinical texts and to predict tobacco-related outcomes from administrative, survey, or clinical trial data.
Current literature on smoking prevalence mainly employs mathematical and statistical models that account for few predictors (e.g., age, sex, and race). More complex multistate Markov models have been established for smoking prevalence prediction and consider more predictors (e.g., age, sex, race, education, and income). Even so, these models focus on only a limited number of factors and policies to explain the transition of individuals between never, current, and former smokers. Another disadvantage of these models is that the state transition rates must be estimated, which increases the complexity of model development. Machine learning algorithms use a flexible model structure with little or no parameter estimation, allowing for rapid updates and modifications. Machine learning also enables more efficient use of massive data for tobacco research: it can account for multiple predictors and policies and discover complex patterns in large datasets to produce high-quality estimations and predictions.
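To make concrete what "the state transition rates must be estimated" involves, here is a minimal sketch of the simplest case: a discrete-time, three-state chain with no covariates, where each transition probability is estimated as a row-normalized count. The function name and the toy observation pairs are made up for illustration; the multistate Markov models referred to above are more elaborate, and this is not the presenters' method.

```python
from collections import Counter

STATES = ("never", "current", "former")


def estimate_transition_matrix(pairs):
    """Estimate P(next status | previous status) from observed pairs.

    `pairs` is an iterable of (previous_status, next_status) tuples.
    Each row is the observed transition counts divided by the row total.
    """
    counts = Counter(pairs)
    matrix = {}
    for prev in STATES:
        row_total = sum(counts[(prev, nxt)] for nxt in STATES)
        matrix[prev] = {
            nxt: (counts[(prev, nxt)] / row_total if row_total else 0.0)
            for nxt in STATES
        }
    return matrix


# Made-up (previous, next) observations, for illustration only.
toy_pairs = (
    [("never", "never")] * 9 + [("never", "current")] * 1
    + [("current", "current")] * 8 + [("current", "former")] * 2
    + [("former", "former")] * 9 + [("former", "current")] * 1
)

transition_matrix = estimate_transition_matrix(toy_pairs)
for prev in STATES:
    print(prev, transition_matrix[prev])
```

Even in this stripped-down form there is one parameter per pair of states. Letting the rates depend on age, sex, race, education, or income multiplies the number of quantities to estimate, which is the complexity the abstract points to.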
Given the enormous amount of data on tobacco use, both cross-sectional and longitudinal, machine learning is a promising tool for leveraging all the available data. For this study, we will apply machine learning classification models to group individuals by smoking status (never, current, and former cigarette use). Classification is a supervised machine learning methodology that determines which class the dependent variable (response) belongs to, based on one or more independent variables (predictors). In other words, a classification algorithm predicts a qualitative response for an observation by assigning the observation to a category or class. A classification algorithm learns from labeled data: it determines how best to map input data to specific class labels, then applies the patterns it has learned to assign labels to new, unlabeled data. The dataset, therefore, must sufficiently represent the problem and have multiple examples of each class label. Many classification techniques, or classifiers, are available in the literature.
The specific aims of this project are:
Aim 1: Develop and train machine learning classifiers using the PATH data from earlier waves (waves 1-2) to find the key variables involved in the transition of individuals between never smokers, current smokers, and former smokers. Expected Outcomes: The model of each classifier will be developed with relevant features from the PATH survey, and the classifiers will be trained on the training dataset to detect the smoking status of individuals.
Aim 2: Compare the performance of the trained classifiers and select the best one(s). Expected Outcomes: The best classifier(s) will be selected to predict the smoking behavior of individuals.
Aim 3: Validate and test the performance of the selected classifiers using the PATH data from later waves (waves 3-5). Expected Outcomes: This study produces an exploratory model to understand how individuals’ transitions between never, current, and former smokers happen over time, and to provide insights into which attributes and policies are relevant to an individual's decision to initiate or quit smoking.
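The workflow in the three aims (train on earlier waves, compare classifiers, evaluate on later waves) can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the records are invented rather than PATH data, the two features (age in decades and a household-smoker indicator) are hypothetical, and the two toy classifiers stand in for whichever techniques the project actually uses. For brevity the sketch collapses Aims 2 and 3 into a single evaluation on the later waves; a real study would select among classifiers on data kept separate from the final test set.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroidClassifier:
    """Predicts the label whose mean feature vector is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda lab: sq_dist(row, self.centroids_[lab]))
            for row in X
        ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented records: (wave, (age_in_decades, household_smoker), status).
records = [
    (1, (2.0, 0.0), "never"), (1, (2.5, 0.0), "never"),
    (1, (3.0, 1.0), "current"), (2, (3.5, 1.0), "current"),
    (2, (6.0, 0.0), "former"), (2, (6.5, 0.0), "former"),
    (2, (2.2, 0.0), "never"),
    (3, (2.1, 0.0), "never"), (3, (3.2, 1.0), "current"),
    (4, (6.2, 0.0), "former"), (5, (5.8, 1.0), "former"),
    (5, (4.0, 0.0), "current"),
]

# Split by survey wave rather than at random: earlier waves train, later waves test.
train = [r for r in records if r[0] in (1, 2)]
test = [r for r in records if r[0] in (3, 4, 5)]
X_train, y_train = [r[1] for r in train], [r[2] for r in train]
X_test, y_test = [r[1] for r in test], [r[2] for r in test]

classifiers = {
    "majority": MajorityClassifier(),
    "nearest_centroid": NearestCentroidClassifier(),
}
scores = {
    name: accuracy(y_test, clf.fit(X_train, y_train).predict(X_test))
    for name, clf in classifiers.items()
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Splitting by wave instead of shuffling all records mirrors the design in the aims: the later waves act as genuinely unseen, later-in-time data.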
Bios: Dr. Mona Issabakhsh (Georgetown University) and Dr. Thuy Le (University of Michigan)
Dr. Issabakhsh is currently a Research Instructor in the Department of Oncology at Georgetown University’s School of Medicine and a member of the Center for the Assessment of Tobacco Regulations (CAsToR). Her work focuses on tobacco simulation modeling and on using machine learning algorithms to predict and analyze tobacco use behaviors. Dr. Issabakhsh holds BS, MS, and PhD degrees in Industrial Engineering. During her undergraduate and graduate studies, she gained experience in developing operations research, data analytics, and machine learning tools for studying medical decision-making and healthcare planning problems. She received her PhD in Industrial Engineering from the University of Miami, focusing on simulation-based optimization for outpatient scheduling.
Dr. Le is an assistant research scientist in the University of Michigan Department of Health Management and Policy and a member of the UM/Georgetown TCORS Center for the Assessment of Tobacco Regulations (CAsToR). Dr. Le is interested in mathematical modeling for cancer- and tobacco-related problems and in machine learning applications in tobacco regulatory science. Dr. Le has developed mathematical models to evaluate the benefits and harms of breast cancer mammography and to predict the number of white blood cells during acute lymphoblastic leukemia maintenance therapy in children. Dr. Le's recent work focuses on employing mathematical models to quantify the burden of menthol cigarettes on public health and to estimate the smoking cessation rate. Dr. Le is currently working on applying machine learning techniques to predict and understand smoking behaviors.