Computational Approaches for Cancer Workshop 2023
CAFCW23
Hyperparameter Optimization for deep-learning models predictive of anti-cancer drug responses
Rylie Weaver, Rohan Gnanaolivu, Rajeev Jain, Chen Wang, Oleksandr Narykov
The inputs of the developed HPO framework include DL models containerized in a Singularity instance, the hyper-parameters to tune, and value ranges that constrain the optimization search space. Within the HPO framework, we devised a genetic algorithm (GA), which first initializes a population of hyper-parameter configurations inside the hyper-parameter space and then simulates the evolution of configurations through iterations of mutation, mating, and selection. Using a fitness function based on the DL model's validation loss, the GA iteratively improves configurations and chooses the best HPO solution over multiple generations of evolution.
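The GA loop described above can be sketched as follows. This is a minimal illustration, not the authors' framework: the hyper-parameter names, ranges, and the mock loss function are stand-ins for a real containerized DL training run.

```python
import random

# Hypothetical hyper-parameter space; names and ranges are illustrative.
SPACE = {
    "learning_rate": (1e-5, 1e-1),
    "batch_size": [16, 32, 64, 128],
    "dropout": (0.0, 0.5),
}

def sample_config():
    """Draw one random configuration from the search space."""
    return {
        "learning_rate": 10 ** random.uniform(-5, -1),
        "batch_size": random.choice(SPACE["batch_size"]),
        "dropout": random.uniform(*SPACE["dropout"]),
    }

def mutate(cfg, rate=0.3):
    """Resample each hyper-parameter with probability `rate`."""
    new, fresh = dict(cfg), sample_config()
    for key in new:
        if random.random() < rate:
            new[key] = fresh[key]
    return new

def mate(a, b):
    """Uniform crossover: pick each hyper-parameter from one of the two parents."""
    return {k: random.choice([a[k], b[k]]) for k in a}

def evolve(fitness, pop_size=20, generations=10):
    """Minimize `fitness` (e.g. validation loss) over the search space."""
    pop = [sample_config() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                  # selection: keep the fittest half
        parents = pop[: pop_size // 2]
        children = [mutate(mate(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)

# Stand-in for a real DL training run: a smooth loss with a known optimum.
mock_loss = lambda c: (c["learning_rate"] - 1e-3) ** 2 + c["dropout"] ** 2
best = evolve(mock_loss)
```

In the real framework the fitness call would launch a containerized training job and return its validation loss, which is why the GA's population size and generation count directly control compute cost.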
As a use-case evaluation, we applied the developed HPO framework to the DrugCell and DeepTTA DL models. DrugCell is an interpretable method that constructs a visible NN following gene ontology relationships and predicts anti-cancer drug responses by incorporating the biochemical properties and structures of compounds. DeepTTA focuses on creating robust drug representations, achieved via a transformer architecture applied to Explainable Substructure Partition Fingerprints (ESPF). This method also utilizes gene expression data from tumors and creates a separate embedding for biological samples; both the drug and tumor representations are then used for regression. We examined a set of critical hyper-parameters of these models (learning rate, batch size, optimizer, and dropout) using a training-testing-validation schema on the CCLE and GDSC cancer cell line / drug datasets. With HPO, we improved the concordance of predicted versus measured CCL-drug responses compared to the default settings and a random search for both models. Through a landscape analysis of the HPO results, we identified the learning rate and batch size as the most influential of the chosen hyper-parameters for performance. Our study demonstrates the importance of HPO and the robustness of GAs in finding optimal hyper-parameters for anti-cancer drug research, and offers hyper-parameter configuration guidance for researchers with similar models. Moreover, after improvement with HPO, the interpretability and effectiveness of our models give further insights into the effectiveness of cancer drugs and the mechanisms behind cancer-drug prediction.
https://clinicalunitmapping.com/show/COVID19_Ensemble_Latest.html
Jacob Barhak
The model performs simulation at the individual level while modeling entire populations using the MIcro-Simulation Tool (MIST), employing High Performance Computing (HPC), and using machine learning techniques to combine models.
The Reference Model technology was transformed to model COVID-19 near the start of the epidemic. The model is now composed of multiple models from multiple contributors that represent different phenomena: it includes infectiousness models, transmission models, human response / behavior models, mortality models, and observation models. These models were calculated at different scales, including the cell, organ, individual, and population scales.
The Reference Model is therefore the first known multi-scale ensemble model for COVID-19. This project is ongoing, and this presentation is updated for each venue. To access the most recent publication, please use this link: https://www.clinicalunitmapping.com/show/COVID19_Ensemble_Latest.html
This is an interactive presentation: please explore the tabs above and interact with the figures, which have sliders, widgets, and hover information. Following the tabs in order from left to right will tell the story.
Finding Novel Drug Discovery Experiments with QSAR
John Marinelli, Thomas Passaro, Rida Saifullah, Daniel Salinas Duron
Influencing factors on false positive rates when classifying tumor cell line response to drug treatment
Priyanka Vasanthakumari, Thomas Brettin, Yitan Zhu, Hyunseung Yoo, Maulik Shukla, Alexander Partin, Fangfang Xia, Oleksandr Narykov, Rick L. Stevens
Building an Online Interactive Volumetric Surface Viewer to Visualize the Spatial Distribution of Brain Metastases
William Delery, Ricky Savjani
Current management and treatment of cancers rely on individual physician expertise, years of training and experience, and integrated feedback from successful treatments versus those that induced significant toxicities. This clinical gestalt, however, is difficult to access, quantify, and teach. What if vital information from every cancer patient ever treated could be accessed instantly?
We propose to build an interactive visual search database for brain metastases. This transcends text-based spreadsheets to allow interrogation of population-based responses in an intuitive web viewer. Clinicians would be able to dynamically view the spatial distribution of tumors and corresponding radiotherapy treatments on a 3D surface. We propose to build the backend of the database to inspect clinical outcomes: One could click a region of the brain and know how likely patients with brain metastases in this region are to experience seizures, radiation necrosis, local/distant recurrence, and death. All of this information would be available in an intuitive visual online portal, facilitating dynamic data-driven treatment decisions.
Materials/Methods: We first built a custom VoxelMorph model to non-linearly register patients' pre-contrast T1-weighted MRIs onto a standard template brain. This framework integrates velocity fields into a deep U-Net to register brain MRIs onto an atlas in under 1 second on a GPU. We used an NVIDIA DGX A100 Station to train models on over 1,000 patients to normalize brains onto a common atlas.
Results/Conclusion: We have piloted our workflow on 647 patients with intracranial brain lesions treated with radiotherapy and created a population surface viewer in PyCortex. We thresholded the dose maps at 95%, binarized them to create masks, and generated a spatial distribution of brain metastases. We have launched this interactive viewer here.
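The threshold-and-binarize step can be sketched as below. This is an illustrative reconstruction, not the authors' code; it assumes the 95% cutoff is relative to each patient's prescription dose and that all maps are already registered to the common atlas.

```python
import numpy as np

def dose_mask(dose, prescription):
    """Binary mask of voxels receiving at least 95% of the prescription dose."""
    return (dose >= 0.95 * prescription).astype(np.uint8)

def spatial_distribution(dose_maps, prescriptions):
    """Sum per-patient masks (assumed registered to a common atlas)."""
    total = np.zeros_like(dose_maps[0], dtype=np.int32)
    for dose, rx in zip(dose_maps, prescriptions):
        total += dose_mask(dose, rx)
    return total  # voxel value = number of patients treated at that location

# Toy 3-voxel example with two patients and a 20 Gy prescription
maps = [np.array([20.0, 10.0, 1.0]), np.array([19.5, 2.0, 0.5])]
freq = spatial_distribution(maps, [20.0, 20.0])  # -> [2, 0, 0]
```

The resulting frequency volume is what a surface viewer such as PyCortex would project onto the template brain.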
This will allow, for example, a clinician to quickly see which areas of the brain are most likely to induce seizures. Combined, these data will also help differentiate radiation-induced changes from tumor progression. Studies on autosegmentation of brain metastases can be rapidly deployed on our data and improved upon, and recent work on predicting the primary site of origin can also be extended. We have collected longitudinal imaging from patients' standard-of-care three-month follow-ups and plan to evaluate and visualize the spatial evolution of these tumors. Our data would be the largest of its class, involving over 1,000 patients treated over 10 years, extending the work of small recent studies showing the spatial distribution of brain lesions.
Exploration of the containerized ATOM Modeling Pipeline's accessibility and cross-compatibility
J. Jedediah Smith
Data Modeling and Analytics Towards Patient Selection For Cancer End-Of-Life Care Study
Denise S. Davis, PhD Candidate in Health Informatics
Context: Our multidisciplinary research team comprised a medical oncologist, a palliative care physician, a statistician, a health informatician, and a data engineer (doctoral student). Data acquisition for this study was driven by the team's collaborative efforts.
Methods: First, the research grant requirements were analyzed and compared to existing EPIC Cogito data elements. Next, a manual retrospective chart review was conducted on 121 decedents from September 2022 to December 2022. Taking both into consideration, a multidimensional relational database was designed to enforce integrity, simplify querying, reporting, and visualization, and speed up aggregation. Extract-Transform-Load (ETL) processes were written in SQL to transform cancer registry data into this analytical clinical schema. The ETL processes included one-hot encoding to process the sparse data that EHRs (electronic health records) are notorious for, and capture of timed events used as indicators of EOL care. Patient inclusion and exclusion criteria were based on measures of aggressive EOL care defined in the grant, such as more than one hospital stay in the last thirty days. Additional factors included anomalies and missing values discovered during the data harmonization process. Incomplete data elements were analyzed by the medical oncologist and palliative care physician for suitability of inclusion based on their clinical knowledge. The ETL process was developed iteratively according to the system development life cycle.
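The one-hot encoding step can be illustrated as follows. The column names and values are hypothetical, and the sketch uses pandas rather than the SQL ETL described above; the `dummy_na` flag keeps an explicit indicator for the missingness that is pervasive in EHR data.

```python
import pandas as pd

# Hypothetical sparse categorical EHR fields for three decedents
records = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "primary_site": ["lung", "breast", None],     # missing values are common
    "hospice_referral": ["yes", None, "no"],
})

# Expand each categorical field into indicator columns for the analytical schema
encoded = pd.get_dummies(
    records,
    columns=["primary_site", "hospice_referral"],
    dummy_na=True,     # keep an explicit column flagging missingness
    dtype=int,
)
```

In the real pipeline the same expansion would be expressed as CASE/WHEN columns in the SQL ETL, but the indicator-plus-missingness structure is the same.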
Results: The entire last quarter of 2022 was extracted, transformed, and analyzed, yielding 789 decedents. Then the last seven years, 2016-2022, were processed, yielding 15,306 decedents. Large imbalances with respect to race and ethnicity were observed in both datasets; however, a gender balance was observed in both. Preliminary hypothesis testing demonstrated that the proportion of decedents in the last quarter of 2022 with more than one hospital stay in the last 30 days was not significantly different from the 53.9% observed over the last seven years.
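A comparison like the one reported above is typically a two-proportion z-test, sketched below. Only the 53.9% seven-year proportion and the two cohort sizes come from the abstract; the Q4-2022 count (430) is a hypothetical stand-in.

```python
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal CDF
    return z, p_value

# Hypothetical Q4-2022 count (430 of 789) vs. 53.9% of 15,306 over seven years
z, p = two_proportion_z(430, 789, round(0.539 * 15306), 15306)
```

With similar proportions in the two cohorts, the p-value is well above 0.05, matching the "not significantly different" finding.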
Conclusion: Real-world data can be useful for integrating into clinical research. There is much planning and research upfront to prepare the data before it can be used. Multidisciplinary collaborations can help demystify patient records to avoid misuse and biases associated with secondary uses. The clinical research development framework using real-world data involves analytical algorithms, data management systems, data harmonization, human-centered applications, visualization tools, clinical knowledge, shadowing, and conversations. Data modeling and analytics are important preparation steps towards unlocking knowledge in real-world data.
Disclaimer: This work was supported by a grant from BDHSC and received approval from institutional IRB to use decedent data as the source of research.
Diabetes and the Social Link to Cancer
Victoria M Conerly
Predicting HOMO-LUMO Gap Values Using An Advanced Machine Learning Platform for Drug Discovery: ATOM Modeling PipeLine (AMPL)
Renate Toldo, Sarah Norris, Justin Overhulse, Ph.D., Chloe Thangavelu, Ph.D.
Comparison of neural networks with tree-based machine learning approaches for predictive drug response models
Vineeth Gutta, Satish Ranganathan, Sara Jones, Matthew Beyers , Sunita Chandrasekaran
Prediction of key metastatic genes in head and neck squamous cell carcinoma (HNSCC) using a deep learning based context-aware foundation model for network biology
Tarak Nandi, Christina Theodoris, Alex Rodriguez and Ravi Madduri
Geneformer is pretrained on ~30 million diverse human single-cell transcriptomes (using cells with a low mutational burden) to learn fundamental properties of gene network dynamics and gene hierarchy, and fine-tuned versions of the model have been shown to perform well on several genomic tasks, including identification of therapeutic targets for cardiomyopathy [1]. For each single-cell transcriptome, the model takes as input a list of genes sorted by their normalized expression values and embeds each gene into a 256-dimensional space that encodes the characteristics of the gene specific to the context of that cell. Subsequently, the embeddings of the genes expressed in each cell are integrated to generate cell-level embeddings that encode the characteristics of the cell state.
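The input construction described above, a Geneformer-style rank-value encoding, can be sketched as follows. Gene names, values, and the gene-wise medians are made up for illustration; the real model normalizes against medians computed over the pretraining corpus.

```python
def rank_value_encode(expression, median_expression):
    """Order expressed genes by expression normalized against gene-wise medians.

    The ranked gene list (not the raw values) is what the model consumes,
    so highly cell-specific genes rise above ubiquitous housekeeping genes.
    """
    normalized = {g: v / median_expression[g]
                  for g, v in expression.items() if v > 0}
    return sorted(normalized, key=normalized.get, reverse=True)

# Hypothetical single-cell counts and corpus-wide medians
cell = {"TP53": 4.0, "GAPDH": 50.0, "MYC": 9.0, "CDH1": 0.0}
medians = {"TP53": 2.0, "GAPDH": 100.0, "MYC": 3.0, "CDH1": 1.0}
tokens = rank_value_encode(cell, medians)  # -> ["MYC", "TP53", "GAPDH"]
```

Note how GAPDH, despite the highest raw count, ranks last once normalized, while the unexpressed gene is dropped entirely.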
We extend the scope of the Geneformer model by re-training it with a transfer-learning approach on a corpus of approximately 6 million single cells (normal and cancer cells combined) spanning 10 cancer types [2], teaching the model to delineate healthy and cancerous states. Subsequent fine-tuning on much smaller amounts of scRNA-Seq data collected from both primary tumor sites and (early) metastatic sites (cervical lymph nodes) in HNSCC [3] enabled Geneformer to distinguish primary tumor cells from metastatic tumor cells and to learn the gene hierarchy involved in metastasis. The Geneformer model is particularly effective in sparse, disease-specific data settings, thanks to the knowledge of gene network dynamics acquired through extensive pretraining across a wide variety of cells.
We present results from an in-silico perturbation approach aimed at identifying genes that drive HNSCC metastasis, particularly the epithelial-to-mesenchymal transition (EMT) that confers migratory and invasive properties on cancer cells. Genes from cells collected at the metastatic site whose deletion from the model input shifts the cell-level output embeddings towards those corresponding to the non-metastatic state (or whose over-expression shifts the embeddings of the primary cells towards those of the metastatic cells) are considered potential therapeutic targets.
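The deletion screen can be sketched as below. Here `toy_embed` is a made-up stand-in for the fine-tuned model (a real transformer embedding is contextual, not a mean of fixed vectors), and the gene names and target state are hypothetical; only the scoring logic, embedding shift toward the non-metastatic state, follows the approach described above.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def deletion_shift(embed, cell_tokens, gene, target_embedding):
    """Positive score: deleting `gene` moves the cell toward `target_embedding`."""
    before = embed(cell_tokens)
    after = embed([g for g in cell_tokens if g != gene])
    return cosine(after, target_embedding) - cosine(before, target_embedding)

# Toy embedding: mean of fixed per-gene vectors (illustrative only)
vecs = {"SNAI2": np.array([1.0, 0.0]),
        "CDH1": np.array([0.0, 1.0]),
        "VIM": np.array([1.0, 0.2])}
toy_embed = lambda genes: np.mean([vecs[g] for g in genes], axis=0)

non_metastatic = np.array([0.0, 1.0])  # hypothetical non-metastatic state
score = deletion_shift(toy_embed, ["SNAI2", "CDH1", "VIM"], "SNAI2", non_metastatic)
```

Ranking genes by this score across many metastatic cells is what surfaces candidate therapeutic targets.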
The next phase of our work will involve creating a more extensive and refined cancer-cell dataset to improve cancer-cell-specific predictions, studying the genes identified as key for EMT in more detail to identify the affected pathways and confirm their relevance, and analyzing the model's attention weights to predict gene-gene interactions driving the metastatic transition.
References:
1. C. V. Theodoris, L. Xiao, A. Chopra, M. D. Chaffin, Z. R. Al Sayed, M. C. Hill, H. Mantineo et al., "Transfer learning enables predictions in network biology," Nature, vol. 618, pp. 616-624, 2023.
2. Chan Zuckerberg Initiative, "CZ CELLxGENE Discover," Online: https://cellxgene.cziscience.com/, Accessed: Aug. 11, 2023.
3. H. S. Quah, E. Y. Cao, L. Suteja, C. H. Li, H. S. Leong, F. T. Chong, S. Gupta et al., "Single cell analysis in head and neck cancer reveals potential immune evasion mechanisms during early metastasis," Nature Communications, vol. 14, no. 1, p. 1680, 2023.
Transformer Based Reinforcement Learner for Dynamic Cancer Treatment
Sarang Gawane, Xinhua Zhang, Guadalupe Canahuate, Andrew Wentzel, Clifton Fuller, Mohamed Naser, Elisa Tardini, Lisanne Van Djik, Abdallah Mohammed
The Transformer-based Meta-Reinforcement Learner (TMRL) is a novel solution for dynamic treatment regimes (DTR) for cancer patients that harnesses the computational prowess of the encoder Transformer architecture. Coupled with a meta-learning framework and reinforcement learning algorithms, we can train our model on the medical history of patients previously subjected to treatment. The idea is to have a framework with a bilayered understanding of the various medical decisions taken while treating a patient. We utilised a dataset curated by the University of Texas MD Anderson Cancer Center between 2005 and 2013. The dataset contains the toxicity levels, medical history, and treatment procedures of 536 patients suffering from head and neck cancer. In the existing setup, decisions on whether to proceed with medical procedures such as radiotherapy, concurrent chemotherapy, and induction chemotherapy are generally taken by a diverse board of clinicians, who analyse a large number of metrics and pre-treatment variables. Because treatment is conducted mostly on a case-by-case basis, the meta-RL framework could be considerably effective in tackling such a problem.
We therefore frame this problem as a Markov Decision Process, where the patient's condition represents the state of the agent, the medical interventions its actions, and the reward is a weighted sum of two conflicting outcomes: survival and quality of life. The former refers to the longevity of the patient's life, which could be improved at the cost of the latter, or compromised, leading to a relatively painless but shorter life span. Patients generally have different tradeoff preferences.
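The reward described above can be written as a one-line sketch. The function name, the [0, 1] normalization of both outcomes, and the example preference weights are assumptions for illustration, not the paper's exact formulation.

```python
def reward(survival, quality_of_life, preference=0.5):
    """Patient-specific weighted sum of two conflicting outcomes.

    `preference` is the weight on survival; (1 - preference) weights
    quality of life. Both outcomes are assumed normalized to [0, 1].
    """
    return preference * survival + (1 - preference) * quality_of_life

# Same outcomes, different patient preferences
aggressive = reward(0.9, 0.4, preference=0.8)   # longevity-focused: 0.80
palliative = reward(0.9, 0.4, preference=0.2)   # quality-of-life-focused: 0.50
```

The preference weight is exactly the "tradeoff" dimension the meta-learner must infer from a patient's cohort, since the optimal policy changes as the weight moves.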
The meta-Transformer architecture is premised on the idea that, using its attention mechanism, it can create a memory contextualization of the various treatment scenarios, along with the tradeoff preferences of a similar cohort of patients. Employing proximal policy optimization, we trained the Transformer to find a policy that delivers superior cumulative rewards compared with the state of the art, over multiple epochs and across various difficulty levels of distinguishing the preference factor. The attention weights also allowed us to locally interpret and visualize the learned treatment policy based on similar patients' experiences. Due to the large state space, we conducted the experiments on the COMPaaS DLV system, a high-performance computing platform at the Electronic Visualization Laboratory at the University of Illinois Chicago, which is equipped with 64 NVIDIA V100 and T4 GPUs.
Towards Physiology and Synthesis-Informed Generative Modeling in Drug Discovery
Nolan English, Belinda Akpa, Zach Fox
Coupling generative AI with graph-based retrosynthetic and differential equation-based human systems models could address these shortcomings. Retrosynthetic pathway models establish whether a path exists to synthesize a given molecule from known building blocks, and these models can score synthesis pathways based on important factors such as cost. Human systems models, especially physiologically based pharmacokinetic (PBPK) models, are crucial in predicting drug disposition in specific tissues, including tumors, and their potential effect on disease. While retrosynthesis and human systems models could address two key barriers to successful molecular design, these models are computationally expensive, and thus they create bottlenecks in generative modeling workflows.
In this presentation, I will discuss the challenges faced with embedding cross-discipline knowledge into a generative modeling framework. I will introduce how physiologically based pharmacokinetic (PBPK) models can better inform the optimization criteria for generative drug discovery. I will discuss how retrosynthetic analysis can pre-filter drug candidates by synthetic accessibility as part of the generation process. Lastly, I will present how we have integrated these models into our generative modeling framework with a client-server approach meant to compensate for the significant differences in runtime. This modular framework not only paves the way for more effective drug design through the inclusion of future problem-specific models but also allows for easier interdisciplinary collaboration within the same framework.
GDC-GPT (v0.2): A large language model for querying the Genomic Data Commons
Aarti Venkat, Anirudh Subramanyam, Robert Grossman
Large language models (LLMs) have revolutionized the field of natural language processing, but their potential for cancer research is only beginning to be understood. Here, we propose a simple framework for continual pre-training of GPT-2 on data from the Genomic Data Commons (GDC), which contains genomic data from 78 projects spanning over 86,000 cases. Utilizing various API endpoints in the GDC, we demonstrate a method of composing a training corpus of clinical and genomic data, consisting of gene-level somatic mutations and their impact on transcript function, exposure information such as alcohol history, demographics and ethnicity, pathological stage, primary diagnosis, and associated treatment. Preliminary evaluations on randomized chunk-completion prompts yield an accuracy of 99.6%, suggesting the model is capable of generating accurate variant annotations and clinical descriptions of chromosomal mutations, and that it outperforms the baseline GPT-2 model. The framework we propose can easily be extended with additional training data from other multi-omic API endpoints in the GDC, such as gene expression, copy number variants, and survival, and provides insights into strategies for fine-tuning an LLM on data from a commons. The GDC-GPT (v0.2) model could provide a simple means to search and explore data in the GDC, including for researchers unfamiliar with its APIs.
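Corpus composition of the kind described above starts from filtered, field-selected queries against the GDC API. The sketch below only builds a query payload in the style of the public GDC `/cases` endpoint; the specific field names are plausible examples from the GDC data model and should be checked against the current field mapping, and nothing is actually sent.

```python
import json

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"

def gdc_query(project_id, fields, size=100):
    """Build the parameters for a filtered, field-selected /cases request."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    return {
        "filters": json.dumps(filters),   # the GDC API expects JSON-encoded filters
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }

# Example field names (verify against the GDC mapping before use)
payload = gdc_query(
    "TCGA-BRCA",
    ["demographic.ethnicity", "exposures.alcohol_history",
     "diagnoses.primary_diagnosis", "diagnoses.ajcc_pathologic_stage"],
)
# e.g. requests.get(GDC_CASES_ENDPOINT, params=payload)
```

Each returned case record would then be serialized into a text chunk for the continual pre-training corpus.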
Evaluating Algorithmic Bias on Triple-Negative Breast Cancer Data in Six SEER Registries
Jordan Tschida∗, Mayanka Chandrashekar∗, Alina Peluso∗, Zachary Fox∗, Charles Wiggins†, Antoinette M. Stroup‡, Stephen M. Schwartz§, Eric B. Durbin¶, Xiao-Cheng Wu∥, Heidi A. Hanson∗
Clinical relevance: Investigating AI model performance on minority populations ensures racial disparities in data are not exacerbated when models are deployed in a clinical setting.
Enhancing Authenticity in Cancer-Related Information Retrieval Using Retrieval Augmented Generation LLM Framework
Ashish Mahabal(1), Asitang Mishra(2), Kristen Anton(3), Maureen Ryan Colbert(3), Sean Kelly(2), Heather Kincaid(2), Daniel Crichton(2), and the EDRN Team
Importantly, our methodology's intrinsic design allows for adaptability, rendering it applicable across various domains in the broader scientific spectrum. Through this strategy, we aim to ensure that the information generated is both authentic and reflective of established scientific knowledge, thereby aiding in the quest for trustworthy computational insights in the fight against cancer. Our method also offers an avenue for illuminating gaps in our current understanding. By examining the inconsistencies and areas where the model struggles, we can identify uncharted territories in the literature that may require further empirical scrutiny. This may motivate relevant areas for future research, driving innovation in the field.
Our initial application centers on the pivotal research on biomarkers for the early detection of cancer. We are utilizing the large body of data and published results generated by the NIH-funded Early Detection Research Network (EDRN) consortium, for which we serve as the informatics center. In particular, in the current study we can point to the underuse of certain data types (e.g., genomic and proteomic) by comparing EDRN output with that of broader NCI initiatives for which data is publicly accessible, such as the Human Tumor Atlas Network (HTAN). Our method may point to lacunae and/or contradictions within the corpora; we will explore these as future steps. We welcome feedback that might further solidify the authenticity of generated content and pave the way for groundbreaking discoveries in computational cancer research, including early detection.
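The retrieval-augmented pattern underlying this framework can be sketched minimally as below. The bag-of-words cosine similarity is a stand-in for a real embedding model, and the corpus snippets are invented; the point is only that the prompt handed to the LLM is grounded in retrieved documents rather than generated free-form.

```python
import math
from collections import Counter

def cosine_bow(a, b):
    """Bag-of-words cosine similarity between two text strings."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[t] * wb[t] for t in wa)
    norm = (math.sqrt(sum(v * v for v in wa.values()))
            * math.sqrt(sum(v * v for v in wb.values())))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Return the k documents most similar to the query."""
    return sorted(corpus, key=lambda d: cosine_bow(query, d), reverse=True)[:k]

def build_prompt(query, corpus):
    """Ground the LLM prompt in retrieved context (the 'augmented' step)."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Invented EDRN-style snippets for illustration
corpus = [
    "Biomarker panels for early detection of pancreatic cancer.",
    "Proteomic biomarkers validated in EDRN reference sets.",
    "Imaging protocols for lung nodule follow-up.",
]
prompt = build_prompt("early detection biomarkers", corpus)
```

Queries for which no document scores well are exactly the "gaps in current understanding" the abstract proposes to surface.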
Quantum-Assisted Prediction of Pharmacokinetic Parameters for Plant-Based Small Molecules Targeting Cancer Protein using ATOM Modeling PipeLine (AMPL)
Priyanka Banerjee3, Vijay P Bhatkar10, Anagha Bhuvanagiri6, Saanvi Gadila6, Jaspreet Kaur Dhanjal2, Dimple Khona6, H Kim Lyerly11 , Asheet Kumar Nath1, Ana Maria Lopez5, Amita Pathak6, Koninika Ray6, *Amit Saxena1, Smita Saxena4, Akshay Seetharam6, Anil Srivastava6, Eric Stahlberg9, Aanya Tiwari6, Richa Tripathi7, Zhao Zheng8
The initial phase of our approach involves the meticulous curation and standardization of plant-based small molecules, or phytochemicals, known to possess anti-cancer activity mediated through specific protein targets. Subsequent data refinement and exploratory analysis of the chemical space have contributed to an improved understanding of the dataset's underlying characteristics. To augment ligand representations and optimize model performance, we employ quantum feature mapping using prominent quantum computing libraries.
Building upon the enriched ligand representations, we proceed to train the ML Regressor. By seamlessly integrating quantum-enhanced features, our approach harnesses the inherent power of quantum computing to capture intricate ligand-protein interactions, significantly elevating prediction accuracy. The results of our study showcase the immense potential of quantum machine learning in the domain of drug discovery and development. The fusion of quantum feature mapping with classical machine learning enables more accurate predictions of critical safety and pharmacokinetic parameters for phytochemicals targeting the cancer proteins. This methodology not only advances our understanding of quantum-assisted drug discovery but also presents a transformative avenue for the identification and optimization of therapeutic agents derived from natural products.
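A minimal classically-simulated quantum feature map, illustrating the kind of ligand-feature embedding described above, can be written as follows. This is an assumption-laden sketch: it simulates simple angle encoding (one qubit per feature, RY rotations on |0>) in NumPy rather than using a quantum computing library, and the descriptor values are hypothetical.

```python
import numpy as np

def angle_encode(x):
    """|phi(x)> = tensor product of RY(x_i)|0> over the features of x."""
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)])  # RY(xi)|0>
        state = np.kron(state, qubit)
    return state

def quantum_kernel(x, y):
    """Fidelity kernel |<phi(x)|phi(y)>|^2 between two feature vectors."""
    return float(np.dot(angle_encode(x), angle_encode(y)) ** 2)

a = np.array([0.1, 1.2, 0.7])   # hypothetical scaled ligand descriptors
same = quantum_kernel(a, a)     # identical inputs give fidelity ~1
```

A kernel matrix built this way (or from a richer entangling feature map) is what a downstream classical regressor would consume to predict pharmacokinetic parameters.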
PieVal: an Open-Source, Efficient, Secure, Gamified, Rapid Document Classification Annotation Tool
Albert William Riedl, MS; Aaron Seth Rosenberg, MD; JP Graff, DO; Matthew S Renquist; Joseph M Cawood; Nicholas R Anderson, PhD
ClinicalUnitMapping.Com Takes a Small Step Towards Machine Comprehension of Clinical Trial Data
Jacob Barhak & Joshua Schertz
The intention is to unify unit standards and to build machine learning tools able to map all units reported by clinical trials. With such capabilities, the data in this important clinical trials database would become machine comprehensible.
This is an interactive presentation: please explore the tabs above and interact with the figures, which have sliders, widgets, and hover information. Following the tabs in order from left to right will tell the story.