Students tackle drug resistance by teaching machine learning
![Two students talking to a third student sitting at a microscope](/sites/default/files/styles/sf_state_1440x564/public/images/BiologyLab_1200x700.jpg?h=33ada183&itok=qVzxPg5g)
SFSU researchers have published a step-by-step tutorial for applying machine learning to drug resistance
Antimicrobial resistance is a growing health crisis that could lead to millions of deaths by 2050, according to the World Health Organization. Antibiotics are critical for human health, but many microbes are evolving resistance to one or more drugs. San Francisco State University researchers are among those using machine learning to predict drug resistance in patients. And they’re trying to remedy a related problem, too: the lack of resources that teach how to use machine learning to detect antibiotic resistance.
In a new paper in PLOS Computational Biology, the SFSU team published a step-by-step machine learning tutorial for beginners. Besides Biology Professor Pleuni Pennings, the seven other researchers on the paper were undergraduate, graduate and post-baccalaureate students; many were first-time researchers, and nearly all were new to machine learning.
“We wanted to do a tutorial paper instead [of a research paper] because we thought it was more important to put out a teachable resource. We struggled to find one, so we wanted to make our own,” said co-first author Faye Orcales (B.S., ’21), who worked on the project as a post-bac.
As beginners from a variety of backgrounds, the team made sure the paper would be accessible to their student peers and educators in biology and chemistry as well as anyone in health sciences. Though the lesson is beginner-friendly, the authors recommend that readers have introductory coding knowledge, since teaching it is beyond the scope of the paper.
“Because it’s in a peer-reviewed journal, it makes it feel real because other scientists — not just your professor or friends — reviewed the article. The peer review process was crucial because it gives other perspectives,” said co-first author Lucy Moctezuma, a Statistics graduate student at CSU East Bay who has a background in psychology. She joined Pennings’ SFSU lab through a friend and was part of the lab for nearly three years. She and Orcales led the effort to write the manuscript and address any feedback. “We were a bunch of students trying to figure it out and we were able to! I think that we should all be proud of that,” Moctezuma said.
Using a previously published data set, comprising 1,936 E. coli strains from patients that were tested against 12 antibiotics, the students developed a step-by-step tutorial for four popular machine-learning models to predict drug resistance in E. coli. To improve accessibility, they used Google Colab, a free, cloud-based platform for writing and running Python code, which means users don’t have to install software to follow the tutorial. The SFSU team provided six free Google Colab “notebooks” with tutorials: one for each of the four models (logistic regression, random forests, extreme gradient-boosted trees and neural networks) plus two for data preparation and result visualization.
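To give a rough sense of what the simplest of the four models does, here is a minimal sketch of logistic regression predicting resistance from gene presence or absence. It does not reproduce the team's notebooks or data: the strains are synthetic, the rule that "gene 0" drives resistance is an assumption made up for illustration, and all function names are hypothetical. It uses only the Python standard library so it can run anywhere, including in a Colab cell.

```python
import math
import random


def make_synthetic_strains(n_strains=200, n_genes=8, seed=0):
    """Build toy data: each strain is a list of 0/1 gene-presence flags.

    Illustrative assumption only: strains carrying gene 0 are resistant
    90% of the time, and strains without it 10% of the time.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_strains):
        genes = [rng.randint(0, 1) for _ in range(n_genes)]
        p_resistant = 0.9 if genes[0] == 1 else 0.1
        X.append(genes)
        y.append(1 if rng.random() < p_resistant else 0)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that a strain with gene vector x is resistant."""
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights with plain batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += error * xij
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, X, y):
    """Fraction of strains whose predicted label matches the true label."""
    correct = sum(
        (predict_proba(weights, bias, xi) >= 0.5) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return correct / len(X)


# Hold out the last 50 synthetic strains to check the model on unseen data.
X, y = make_synthetic_strains()
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

weights, bias = train_logistic_regression(X_train, y_train)
print("Held-out accuracy:", accuracy(weights, bias, X_test, y_test))
print("Learned weight per gene:", [round(w, 2) for w in weights])
```

The learned weights are what make this kind of model easy to interpret: a large positive weight flags a gene whose presence is associated with resistance in the training data.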
![Eight SFSU students and Professor Pleuni Pennings wearing CODE lab sweatshirts](/sites/default/files/images/CODELab_600x350.jpg)
Left to right: Students MaryGracy Antony, Faye Orcales, Lucy Moctezuma, John Matthew Suntay, Florentine van Nouhuijs, Meris Johnson-Hagler, Jameel Ali, Kristiene Recto and Professor Pleuni Pennings (sitting). Photo courtesy of Faye Orcales.
“The students may not realize that it’s sort of bold [to submit this paper to PLOS]. It just shows that we do very high-quality work,” said Pennings, adding that the students really took ownership of writing the manuscript and pushing it forward.
Collaborating with faculty in Biology, Computer Science and Chemistry & Biochemistry, Pennings is the director or co-director of the undergraduate Promoting Inclusivity in Computing (PINC) program; its graduate complement, Graduate Opportunities to Learn Data Science (GOLD); and the Science Coding Immersion Program (SCIP), an all-virtual, self-paced coding program for students, staff and faculty. All the student researchers initially learned coding and/or machine learning from one of these programs and then continued to develop their skills via longer-term research experiences.
“One of my motivations to making all of these materials is because I’m teaching these classes and I wish there was a book about machine learning for health or biology. Something that is doable, fun and relevant. Something that’s intuitive, practical and discusses the ethical side,” said Pennings, noting that she’s already using this published tutorial in her classes.
“When I joined the PINC program, I could see that the instructors were motivated to teach coding in a very accessible way to Biology students. I felt really comfortable in the program because my peers were fellow biologists eager to learn,” said Orcales, now a computational scientist at UCSF applying to Ph.D. programs. She hopes this new tutorial will help introduce more of her peers into the machine-learning space. “I hope our readers take away that machine learning isn’t this daunting difficult thing to learn when you have the right resources.”
Visit SFSU’s Department of Biology to learn more about student opportunities like Promoting Inclusivity in Computing (PINC), Graduate Opportunities to Learn Data Science (GOLD) and Science Coding Immersion Program (SCIP).