Defense: Predicting and Understanding Outcomes by Leveraging Chemical Structures and Molecular Information with Interpretable Deep Learning

Speaker Name
Ioannis Anastopoulos
Speaker Title
Ph.D. Candidate
Speaker Organization
Biomolecular Engineering and Bioinformatics Ph.D.
Start Time
End Time
Location
Virtual Event

Join us on Zoom: https://ucsc.zoom.us/j/95877681270?pwd=bVdtbkEyT3F4Y0ExSEZhSVNGdTUrdz09 - Passcode: 534385

Abstract: As biological data become more readily available and more complex, equally sophisticated methods are needed to predict and understand outcomes in biological systems. Classical machine learning methods are not well suited to prediction tasks that must integrate heterogeneous sources of information. Deep learning can integrate such disparate inputs with impressive results.

Here, I present my work on integrating cancer cell line transcriptomic information with the chemical structure of the perturbagens the cell lines were treated with. This work leverages recent developments in deep learning for aligning domains (here, cell lines and patients) in a data-driven way, together with advanced featurization of molecules. I show that integrating these methods improves the prediction of drug response in patients compared with more conventional methods. The model can be used to identify therapies a patient is more likely to respond to, using only transcriptomic and chemical information.

Further, I developed a model that leverages chemical information from a variable number of drugs, along with demographic and genetic information, on a per-patient basis to predict probability of death and length of hospitalization. I experimented with traditional molecular representations and compared the model's performance against graph convolution featurization. I found that graph convolution provides a superior encoding to the static encodings produced by Morgan fingerprints, leading to more accurate predictions.

Lastly, I present my work on applying chemical featurization to nanopore sequences to model nucleotide modifications de novo. Given the polynomial nature of the possible modifications, producing gold-standard training data to identify such events is a daunting task. I show that knowledge learned from chemical features in the canonical (unmodified) context can be transferred to identify nucleotide modifications with a high degree of accuracy.

Graduate Program
Biomolecular Engineering and Bioinformatics Ph.D.