CoDEx 2026 Poster Session

10:30 a.m. to Noon - Louis Room

Accelerating LLM Inference by Training Custom Speculative Decoding Models

Doğaç Eldenk, Master's Student, McCormick School of Engineering and Applied Science, et al.

Abstract: This research investigates methods to accelerate LLM inference by training custom "draft" models that leverage a target model's internal representations through speculative decoding. We evaluate how different architectural decisions influence performance, while highlighting the significant challenges of training more complex designs. By analyzing where various configurations succeed or fail, our work provides a deeper understanding of the underlying mechanisms that affect token acceptance and the practicalities of model training.
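For readers new to the draft-and-verify idea, the following is a minimal sketch of one greedy speculative decoding step. It is an illustration under stated assumptions, not the authors' method: `draft_next` and `target_next` are hypothetical stand-ins for the draft and target models, each mapping a token sequence to its greedy next token, and acceptance here uses exact-match verification rather than the probabilistic rejection sampling of the general algorithm.

```python
from typing import Callable, List, Sequence

NextToken = Callable[[Sequence[int]], int]  # hypothetical model interface


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int = 4) -> List[int]:
    """Run one greedy draft-and-verify step and return the newly emitted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft_next(prefix + proposed))

    # 2. The target model checks each proposal. (A real system would score
    #    all k positions in one batched forward pass; this loop is for clarity.)
    accepted: List[int] = []
    for token in proposed:
        expected = target_next(prefix + accepted)
        if token != expected:
            # First mismatch: emit the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # 3. All k proposals accepted: the target contributes one bonus token.
    accepted.append(target_next(prefix + accepted))
    return accepted


# Toy models (assumptions for illustration): the target counts upward, and
# the draft agrees except that it stumbles whenever the context length is 5.
def toy_target(seq: Sequence[int]) -> int:
    return seq[-1] + 1


def toy_draft(seq: Sequence[int]) -> int:
    return 0 if len(seq) == 5 else seq[-1] + 1


new_tokens = speculative_step([1, 2, 3], toy_draft, toy_target, k=4)
```

In this toy run the first two drafted tokens are accepted and the third is replaced by the target's own token, so the output always matches what the target model alone would have produced. The speedup depends on how often the draft's proposals are accepted.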

Ambient air quality and health impacts of PM2.5 from US residential wood combustion

Kyan Shlipak, Undergraduate Student, McCormick School of Engineering and Applied Science, et al.

Abstract: We model the air quality and health impacts of winter fine particulate matter (PM2.5) from residential wood combustion (RWC). Using the two-way coupled Weather Research and Forecasting (WRF) and Community Multiscale Air Quality (CMAQ) model at 4 km resolution for the contiguous United States, we find that RWC contributes 2.43 µg/m³ (21.9%) of winter population-weighted mean PM2.5 concentrations and is associated with ~8,600 (95% CI 6,500–9,600) premature deaths per year.
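As a brief illustration of the population-weighted mean referred to in this abstract, the sketch below weights each grid cell's concentration by the number of people living in it. The function name and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import Sequence


def population_weighted_mean(conc: Sequence[float], pop: Sequence[float]) -> float:
    """Average grid-cell concentrations, weighting each cell by its population."""
    if len(conc) != len(pop):
        raise ValueError("conc and pop must have the same length")
    total_pop = sum(pop)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(c * p for c, p in zip(conc, pop)) / total_pop


# Made-up toy values (not from the study): three grid cells.
toy_conc = [1.0, 4.0, 10.0]       # concentration, micrograms per cubic meter
toy_pop = [1000.0, 3000.0, 0.0]   # residents; the empty cell gets no weight
toy_pwm = population_weighted_mean(toy_conc, toy_pop)
```

The unpopulated cell has the highest concentration but does not affect the result, which is why a population-weighted mean reflects exposure rather than area-average air quality.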

BiomechGPT: Towards a Biomechanically Fluent Multimodal Foundation Model for Clinically Relevant Motion Tasks

Ruize Yang, PhD Student, Feinberg School of Medicine, et al.

Abstract: We developed BiomechGPT, a multimodal motion-language model that treats human movement as an additional “language,” enabling descriptions and clinically meaningful queries in natural language over raw biomechanical motion data. It demonstrates high performance, shows positive transfer learning, and provides a new interface for the clinical analysis of biomechanical movement.

Computational Design of Sequence-Encoded Elasticity in Disordered Protein Materials

Gabrielle Leinbach, PhD Student, McCormick School of Engineering and Applied Science, et al.

Abstract: Nature evolved some of the highest-performing elastomers known today by harnessing sequence-level protein disorder. Sequence-specific coarse-grained MD enables high-throughput sequence design and analysis, yielding novel insights into how the quantity and ordering of disorder-inducing residues affect network-level behavior. The understanding gained will help establish molecular design rules for tunable biomimetic materials with exceptional mechanical properties that rival, or even surpass, those of natural materials.

Conversational AI for Voice-Based Medication Adherence Monitoring: NLP-Driven Extraction of Patient-Reported Barriers for Longitudinal, Data-Driven Care

Pranav Bajaj, Medical Student, Feinberg School of Medicine, et al.

Abstract: This project develops a conversational AI chatbot that engages patients through voice-based dialogue to gather rich, continuous data on barriers to medication adherence. Using natural language processing and machine learning, the system captures and analyzes both predefined and emerging themes, generating high-value datasets that enhance clinical insight and decision-making. Future evaluation will apply the 8-Dimensional Socio-Technical Model to assess how this data-driven approach impacts clinical workflows, physician support, and patient engagement. The chatbot is available at https://patient-communication.vercel.app/ using RID001.

Curriculum Learning for Resource-Efficient Training of Vision–Language Foundation Models on Large-Scale Medical Imaging

Aryan Dwivedi, Master's Student, McCormick School of Engineering and Applied Science

Abstract: I study how curriculum learning can improve the efficiency and stability of Vision–Language Foundation Models trained on large-scale medical imaging data (~7TB of paired chest CTs and reports). By structuring training from easier to harder examples, this work shows how intelligent learning schedules can reduce computational overhead while maintaining competitive representation quality, reframing scale as a learning-design problem rather than purely a hardware challenge.
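The easy-to-hard ordering described above can be sketched as a staged sampler. This is an illustrative sketch, not the author's training pipeline: the per-example `difficulty` score (for instance, report length) and the linear stage schedule are assumptions introduced here.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_batches(examples: Sequence[T], difficulty: Sequence[float],
                       num_stages: int, batch_size: int,
                       seed: int = 0) -> List[List[T]]:
    """Return batches stage by stage, widening the pool from easy to hard."""
    rng = random.Random(seed)
    # Sort example indices from easiest to hardest.
    order = sorted(range(len(examples)), key=lambda i: difficulty[i])
    batches: List[List[T]] = []
    for stage in range(1, num_stages + 1):
        # Stage s trains on the easiest s/num_stages fraction of the data.
        cutoff = max(batch_size, len(order) * stage // num_stages)
        pool = order[:cutoff]
        rng.shuffle(pool)  # shuffle within the stage to avoid ordering bias
        for start in range(0, len(pool), batch_size):
            batches.append([examples[i] for i in pool[start:start + batch_size]])
    return batches


# Toy usage with hypothetical difficulty scores.
toy_batches = curriculum_batches(["a", "b", "c", "d"], [0.9, 0.1, 0.5, 0.3],
                                 num_stages=2, batch_size=2)
```

Early stages see only the easiest examples, so they touch less data per pass; the final stage covers the full dataset.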

DS2-INSTRUCT: Domain-Specific Data Synthesis for Large Language Models Instruction Tuning

Ruiyao Xu, PhD Student, Weinberg College of Arts and Sciences, et al.

Abstract: DS2-Instruct is a zero-shot framework that generates high-quality, domain-specific instruction datasets for LLM fine-tuning without requiring human supervision, seed examples, or specialized corpora.

Defining the Keloid Signature: An Integrated Single-Cell Transcriptomic Atlas of Fibroblast Heterogeneity and Immune Dysregulation in Fibrotic Skin

Shoshana Bar-Meir, Medical Student, Feinberg School of Medicine, et al.

Abstract: Single-cell RNA sequencing (scRNA-seq) technology has elucidated cellular mechanisms with extreme granularity and has successfully been applied to keloid formation to identify cell types and functional states that are associated with the pathological phenotype. To elucidate these diverse driving interactions with the resolution of an individual cell, we employed integration methods to provide new insights into the mechanisms of the two varying pathologic scar formations.

Exploring Aberrant Vimentin Expression and Localization in Glioblastoma

Rebecca Chen, Undergraduate Student, Feinberg School of Medicine and Weinberg College of Arts and Sciences, et al.

Abstract: My research investigates how glioblastoma cells adapt to chemotherapy by altering expression and localization of the intermediate filament protein vimentin (VIM). Using spatial transcriptomics, single-cell RNA sequencing, and protein-level assays, I show that VIM expression increases and redistributes following temozolomide treatment, particularly within tumor cell states associated with therapeutic resistance. This work suggests that VIM may act as a dynamic mediator of chemotherapy resistance and represents a potential target for future therapeutic strategies.

Exploring the farthest black hole collisions using the gravitational-wave background

Nico Bers, Undergraduate Student, Weinberg College of Arts and Sciences, et al.

Abstract: We present a method to learn about some of the farthest black hole collisions in the universe from a simulated population study of LIGO gravitational-wave data. We simulate over a thousand binary black hole mergers along with detector noise, and perform Bayesian inference. We determine that while the farthest and weakest collisions don't contribute significantly to the stochastic background, these do inform our constraints on the merger redshift distribution.

Hybrid LLM-LSTM Framework for Stock Market Prediction Using Textual and Price Data

Dana Monzer, PhD Student, McCormick School of Engineering and Applied Science, et al.

Abstract: This work proposes a hybrid LLM–LSTM model that integrates financial news–derived textual impact scores with historical stock price data for time-series forecasting and directional movement prediction. Experiments show that while regression gains are mixed, the model consistently improves classification performance over a price-only LSTM baseline, highlighting the value of language-informed signals in financial prediction.

Interfacial Engineering of Polymer Grafted Nanoparticle Composites via Loop-Linear Topological Graft Blends

Sri Maddukuri, PhD Student, McCormick School of Engineering and Applied Science, et al.

Abstract: This molecular research explores the use of loop-linear graft mixtures as a novel strategy for tailoring the interphases of polymer nanocomposites. Through computational and data-intensive conformational and entanglement analyses, this work provides a physical foundation for using the loop graft architectures to overcome dispersion challenges and enhance mechanical reinforcement through improved miscibility.

Is SpeechLLM Safety Sufficient?

Kris Yun, Undergraduate Student, McCormick School of Engineering and Applied Science, et al.

Abstract: This project investigates the safety of Speech Large Language Models (SpeechLLMs) by subjecting a diverse set of open and closed models to a rigorous pipeline of audio-conditioned adversarial attacks and then evaluating defense mechanisms against those attacks. We quantify robustness using the AABench dataset, measuring the Attack Success Rate (ASR) and Defense Success Rate (DSR). This work seeks to secure SpeechLLMs by balancing defense viability with computational utility.

Orthogonal Gaussian Processes for Noisy Multidimensional Outputs

Evan Barnett, Undergraduate Student, McCormick School of Engineering and Applied Science, et al.

Abstract: MOOGP is a scalable surrogate modeling method that replaces expensive multi-output computer simulations with fast statistical predictions and calibrated uncertainty estimates. It enforces an orthogonality constraint to separate interpretable trend effects from residual variation and exploits low-rank latent structure for efficient covariance computations, enabling training on large simulation ensembles with many correlated outputs.

Phase Mapping in Electron Backscattered Diffraction With Deep Learning-based Anomaly Detection

Alfred Yan, PhD Student, McCormick School of Engineering and Applied Science, et al.

Abstract: Diffraction patterns are collected from a material sample through scanning electron microscopy. Next, deep learning-based anomaly detection methods are used for identifying potentially novel materials in the sample from the diffraction patterns.

“Quarterbacky”: An Analysis of Racial Narratives Through NFL Reddit Discourse

Pooja Kantemneni, Undergraduate Student, Weinberg College of Arts and Sciences, et al.

Abstract: We investigate whether Reddit fan sentiment toward NFL quarterbacks differs based on the racial background of the quarterback. Using around 20 million Reddit comments from the 2024-2025 NFL season, we analyzed the relationship between performance metrics and fan sentiment. We found that while fan sentiment generally tracks performance, the relationship is significantly stronger for white quarterbacks than for non-white quarterbacks.

Resolving time-dependent phenotypes of stem cell-derived neurons harboring KCNQ2 mutations using machine learning

Syed Wafa, PhD Student, Feinberg School of Medicine, et al.

Abstract: We perform large-scale electrical recordings from human stem cell-derived neurons harboring mutations in the KCNQ2 gene, which is implicated in severe brain disorders, including epilepsy and autism. By analyzing tens of millions of electrical spikes using supervised and unsupervised machine learning algorithms, we identify temporal biomarkers of clinically distinct KCNQ2-associated disorders. Our work establishes an open-source, publicly available translational machine learning platform for studying disease mechanisms and therapeutic responses in stem cell models of neurological disorders.

Simulating The Frontier of The Universe: The Formation of Galaxy Clusters and their Central Galaxies

Gideon McFarland, PhD Student, Weinberg College of Arts and Sciences, et al.

Abstract: Understanding how our Universe evolved from a hot soup of fundamental particles into the web of galaxies we observe today requires knowledge of how the largest structures in our Universe formed: galaxy clusters. The massive galaxy at the center of the cluster, the Central Cluster Galaxy (CCG), is a fossil record of its history, allowing us to learn about the formation and growth of the cluster. To better probe the properties of CCGs, we make use of Argonne National Laboratory's Frontier Exascale simulation (Frontier-E), the largest simulation of the Universe to date. The size of Frontier-E enables us to study over 150,000 massive galaxy clusters, making statistical comparisons to real data possible for the first time.

Survivorship Navigator: Personalized Survivorship Care Plan Generation using Large Language Models

Veronica Boratyn, Medical Student, Feinberg School of Medicine, et al.

Abstract: Survivorship Navigator is a computational framework that automates the generation of personalized cancer survivorship care plans (SCPs) by synthesizing information from complex structured and unstructured EHR data. Our methodology utilizes a two-step pipeline involving task-focused prompting for data extraction and retrieval-augmented generation (RAG) to ground recommendations in a curated knowledge base of clinical guidelines, as benchmarked across various open-source and proprietary large language models. This high-throughput system reduces manual clinician labor from hours to minutes and provides a scalable model for standardizing care, supported by a dedicated interface for granular, side-by-side clinical validation.

Viral Haplotype Reconstruction from Amplicon-Based Deep Long-Read Sequencing

Natalie Stegman, PhD Student, Feinberg School of Medicine, et al.

Abstract: My research investigates the drivers of rapid viral rebound in people with HIV-1 when antiretroviral therapy fails or is interrupted, focusing on the genetic diversity preserved within persistent viral reservoirs. I developed and applied a computational pipeline that reconstructs complex viral populations from deep long-read sequencing to reveal how genetically diverse viral populations persist during therapy and drive rebound.

Weight Similarity in Neural Networks via the RV Coefficient

Feihong Xu, PhD Student, McCormick School of Engineering and Applied Science, et al.

Abstract: We propose a new model similarity metric that overcomes the calibration weaknesses of current measures and better predicts functional similarity. The metric is based on the weight parameters of neural networks, enabling data-agnostic cross-model meta-analysis.
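For reference, the standard RV coefficient between two matrices with the same number of rows can be computed as sketched below. This shows the textbook definition only; how the authors map network weights to the compared matrices is not stated in the abstract, and the column centering and function names here are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so the RV coefficient ignores offsets."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def gram(m: Matrix) -> Matrix:
    """Return the n x n row Gram matrix M M^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def frobenius_inner(a: Matrix, b: Matrix) -> float:
    """Return tr(A^T B), the elementwise inner product of two matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_coefficient(x: Matrix, y: Matrix) -> float:
    """RV = tr(Sx Sy) / sqrt(tr(Sx Sx) tr(Sy Sy)), with S = M M^T."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    sx, sy = gram(center_columns(x)), gram(center_columns(y))
    denom = math.sqrt(frobenius_inner(sx, sx) * frobenius_inner(sy, sy))
    return frobenius_inner(sx, sy) / denom


# Toy matrices (hypothetical): w_b is w_a with columns swapped and doubled.
w_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
w_b = [[4.0, 2.0], [8.0, 6.0], [14.0, 10.0]]
w_c = [[1.0, 0.0], [0.0, 1.0], [-1.0, 3.0]]
rv_same = rv_coefficient(w_a, w_b)
rv_other = rv_coefficient(w_a, w_c)
```

The coefficient lies between 0 and 1 and is unchanged by orthogonal transformations of the columns or by uniform rescaling, which is why `w_a` and `w_b` score as identical up to floating-point error.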