
Research Talks

10:10 to 10:35 a.m.

Systematic Content Analysis of Litigation EventS Open Knowledge Network (SCALES – OKN) - Arch Room

David Schwartz, William G. and Virginia K. Karnes Professor of Law, Pritzker School of Law

SCALES-OKN is building a platform that allows anyone to easily query federal court records. The project focuses on growing a community and equipping it with the tools it needs to understand and engage with the workings of the federal judiciary from the beginning to the end of every case.

Understanding the Gut Microbiome Via Non-Human Dietary Patterns - Big Ten Room

Katie Amato, Associate Professor, Director of Graduate Studies, Department of Anthropology, Weinberg College of Arts and Sciences

Interactions with the gut microbiota represent an important pathway through which diet can affect health. While targeted studies of the impacts of specific foods on the gut microbiota are critical for better understanding this pathway, data from non-human primates can provide complementary insights. This discussion will focus on how comparisons of non-human primate populations can help us understand how broad dietary patterns affect the gut microbiome. These patterns provide an important foundation for future studies that target specific aspects of diet, both in non-human primates and humans.

High-Performance Inferencing on Multi-Gigapixel Pathology Images - Lake Room

Lee A.D. Cooper, Associate Professor of Pathology, Feinberg School of Medicine

Healthcare operations and research produce millions of glass slides of human tissue annually. Digital imaging is increasingly used to generate high-resolution images of these slides, each containing several billion pixels. Image datasets range in size from terabytes to petabytes, and inferencing over these data with machine learning and computer vision models presents significant challenges. This talk will describe software tools developed at Northwestern to simplify and accelerate inference for pathology imaging data.
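
To make the scale of the problem concrete, here is a minimal sketch (Python, standard library only) of the tiling step that whole-slide inference typically relies on; the tile size, helper names, and example dimensions are illustrative assumptions, not the tools described in the talk.

    from typing import Iterator, Tuple

    def iter_tiles(width: int, height: int, tile: int = 1024,
                   overlap: int = 0) -> Iterator[Tuple[int, int, int, int]]:
        """Yield (x, y, w, h) patch coordinates covering a width x height slide."""
        step = tile - overlap
        for y in range(0, height, step):
            for x in range(0, width, step):
                yield x, y, min(tile, width - x), min(tile, height - y)

    # A 100,000 x 80,000 pixel slide yields roughly 7,700 patches at 1024 x 1024,
    # which is why batched, accelerated inference matters at this scale.
    n_patches = sum(1 for _ in iter_tiles(100_000, 80_000))
    print(n_patches)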

10:40 to 11:05 a.m.

The Linguistic Features of the Anthropocene: Bringing Big Data and Supercomputers to the Humanities - Arch Room

Aerith Netzer, Digital Publishing and Repository Librarian, Northwestern Libraries

How can the age of big data and supercomputing aid cultural and literary analysis? We investigate the challenges, setbacks, insights, techniques, and tools used to generate massive models in terabyte-scale text analysis of literature relating to the Anthropocene. Specifically, we will discuss the difficulties of relying on external data vendors, using the Quest supercomputing cluster, and managing data when attempting to compute at national-lab scale on the budget of a library.

Efficient Genome Indexing for Large-Scale Linked Interval Data - Big Ten Room

Richard Schaefer, Research Associate, Department of Urology, Feinberg School of Medicine

Efficiently querying specific genomic regions is fundamental in bioinformatics, allowing relevant feature information to be extracted from large genomic datasets. While existing tools provide query capabilities, they are limited to interval manipulation and do not natively support linked interval data or complex relationships between genomic features. We introduce genogrove, a hybrid graph data structure designed to facilitate scalable interval queries. We demonstrate how it serves as an efficient interval search structure for large-scale datasets and lays the foundation for more advanced genomic analyses, supporting a wide range of applications in bioinformatics.
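
As a point of reference for the interval-query problem the talk addresses, the sketch below shows a naive sorted-list index in Python; it is not genogrove's hybrid graph structure, and all names and data are hypothetical.

    import bisect

    class IntervalIndex:
        """Naive index: intervals sorted by start; queries scan everything that starts before the query end."""
        def __init__(self, intervals):
            # intervals: (start, end, feature_id) tuples, half-open [start, end)
            self.intervals = sorted(intervals)
            self.starts = [s for s, _, _ in self.intervals]

        def query(self, q_start, q_end):
            """Return feature_ids of intervals overlapping [q_start, q_end)."""
            stop = bisect.bisect_left(self.starts, q_end)
            # a real interval search structure (an interval tree, or genogrove) avoids this linear scan
            return [fid for s, e, fid in self.intervals[:stop] if e > q_start]

    idx = IntervalIndex([(100, 200, "geneA"), (150, 400, "geneB"), (500, 600, "geneC")])
    print(idx.query(180, 520))  # ['geneA', 'geneB', 'geneC']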

Streamlining Missing Data Identification Across REDCap Databases Through Automation - Lake Room

Deborah Zemlock, Director of Data Management, Mesulam Center for Cognitive Neurology and Alzheimer's Disease, Feinberg School of Medicine

Efficient data management is critical for high-quality research, yet identifying and handling missing data remains a time-intensive challenge. This presentation introduces an automated R-based methodology that enhances missing data identification within REDCap databases. By leveraging REDCap metadata and branching logic, this tool differentiates between true omissions and non-applicable data, significantly improving accuracy and efficiency. Validated using Northwestern ADRC’s Uniform Data Set, this solution surpasses REDCap’s built-in tools in speed and scalability. Future advancements will focus on expanding functionality, including automatic missing data coding and packaging the tool as an R library for broader research applications.
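
The tool itself is written in R; purely as an illustration of the underlying logic, the Python sketch below treats a blank field as missing only when its branching logic indicates the field should have been shown. The toy logic parser, field names, and data are assumptions, not REDCap's API or the presented tool.

    def field_is_shown(branching_logic, record):
        """Evaluate a trivial "[field] = 'value'" rule; no rule means the field is always shown."""
        if not branching_logic:
            return True
        field, value = branching_logic.replace("[", "").replace("]", "").split(" = ")
        return str(record.get(field, "")) == value.strip("'")

    def find_true_missing(records, metadata):
        """Return (record_id, field) pairs that are blank AND applicable."""
        missing = []
        for rec in records:
            for field, logic in metadata.items():
                if rec.get(field, "") == "" and field_is_shown(logic, rec):
                    missing.append((rec["record_id"], field))
        return missing

    metadata = {"smoking_years": "[smoker] = '1'"}                       # shown only for smokers
    records = [{"record_id": 1, "smoker": "1", "smoking_years": ""},     # true omission
               {"record_id": 2, "smoker": "0", "smoking_years": ""}]     # not applicable
    print(find_true_missing(records, metadata))  # [(1, 'smoking_years')]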

11:15 to 11:40 a.m.

Westernization, Nationalism, Confucianism: The Reception of Western Classical Music in Contemporary Urban China - Arch Room

Xi Cheng, PhD Student in Sociology, Weinberg College of Arts and Sciences

What cultural ambitions drive the popularity of Western classical music in contemporary urban China? While global cultural flows and localized conditions have been discussed separately or only partially, little effort has been made to synthesize them into a comprehensive framework. This study offers an integrative approach, demonstrating how global influences, state-driven cultural policies, and historical legacies interact to shape cultural reception. By specifying variations in media narratives, this study illustrates how global and local forces interact to produce cultural hierarchies. It highlights how cultural diffusion and reception are active sites of negotiation, where cultural forces shape legitimacy, meaning, and consumption.

Balancing Precision and Retention in Experimental Design - Big Ten Room

Gustavo Diaz, Assistant Professor of Instruction, Political Science, Weinberg College of Arts and Sciences

In experimental social science, precise treatment effect estimation is of utmost importance, and researchers can make design choices to increase precision. Specifically, block-randomized and pre-post designs are promoted as effective means to increase precision. However, implementing these designs requires pre-treatment covariates, and collecting this information may decrease sample sizes, which in and of itself harms precision. Therefore, despite the literature's recommendation to use block-randomized and pre-post designs, it remains unclear when to expect these designs to increase precision in applied settings. In this article, we present guidelines to assist researchers in navigating these design decisions. Using replication and simulated data, we demonstrate a counterintuitive result: precision gains from block-randomized or pre-post designs can withstand significant sample loss that may arise during implementation. Our findings underscore the importance of incorporating researchers’ practical concerns into existing experimental design advice.
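
As a back-of-the-envelope illustration of the tradeoff (not the authors' replication or simulation code), the sketch below compares the variance of a simple difference-in-means estimator against a change-score (pre-post) estimator under sample loss, assuming equal arms, equal outcome variances, and an assumed pre/post correlation.

    def var_simple(n_total, sigma2=1.0):
        """Variance of difference-in-means with n_total units split evenly across two arms."""
        return 4 * sigma2 / n_total

    def var_prepost(n_retained, rho, sigma2=1.0):
        """Variance of difference-in-means on change scores (post - pre) with correlation rho."""
        return 8 * sigma2 * (1 - rho) / n_retained

    n = 1000
    rho = 0.8                                 # assumed pre/post correlation
    breakeven = 2 * (1 - rho) * n             # retained sample at which the two designs tie
    print(breakeven)                          # 400: the pre-post design tolerates losing 60% of units
    print(var_prepost(600, rho) < var_simple(n))  # True: still more precise with only 600 retained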

Learning the Galaxies to Learn the Universe - Lake Room

Tjitske Starkenburg, Research Assistant Professor, CIERA and Department of Physics and Astronomy, Weinberg College of Arts and Sciences

How did the universe form? What is dark energy? What is dark matter? These are questions astrophysicists try to answer by looking at the visible components of our universe: stars and galaxies. With large simulation suites they can compare possible universes to the observed large-scale structure as traced by millions of galaxies. However, the formation of galaxies and how they appear to us come with their own big questions and uncertainties, which are often not feasible to include in large-scale simulations. In this talk I will present a collaborative effort to incorporate uncertainties in the fundamental parameters of the universe as well as uncertainties in galaxy formation physics and galaxy observations into combined statistical inference methods. This combined inference illustrates how small-scale physics and large-scale structure are connected to produce universes like the one we observe.

11:45 a.m. to 12:10 p.m.

Computational Methods for Large-Scale Analysis of Web Content in AI Training Data - Arch Room

Nick Hagar, Postdoctoral Scholar, Generative AI in the Newsroom Initiative, School of Communication

Large language models (LLMs) are trained on vast amounts of web data, but understanding what content is included—and how filtering decisions shape these datasets—remains a challenge. This talk introduces computational methods for analyzing large-scale web corpora used in AI training. I will present a data pipeline that processes and standardizes URLs from multiple LLM training sets, alongside a dataset capturing domain-level statistics across 96 Common Crawl snapshots. These resources enable researchers to audit training data composition, investigate content diversity, and explore filtering effects. By combining scalable data processing techniques with structured analysis, this work provides a foundation for more transparent and reproducible research on AI training data. 
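
As a small illustration of the kind of processing involved (not the speaker's pipeline), the Python fragment below normalizes URLs and aggregates domain-level counts; a production audit would typically use a public-suffix list rather than the bare hostname used here, and the example URLs are made up.

    from urllib.parse import urlparse
    from collections import Counter

    def normalize_domain(url: str) -> str:
        """Lowercase the URL, take the hostname, and drop a leading 'www.'."""
        host = urlparse(url.strip().lower()).netloc
        return host[4:] if host.startswith("www.") else host

    urls = [
        "https://www.example.com/articles/1",
        "http://example.com/articles/2?utm_source=x",
        "https://news.example.org/story",
    ]
    counts = Counter(normalize_domain(u) for u in urls)
    print(counts.most_common())  # [('example.com', 2), ('news.example.org', 1)]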

Reliable and Scalable Computational Image-Based Research Using the NU Research Image Processing System (NURIPS) - Big Ten Room

Todd Parrish, Professor of Radiology and Biomedical Engineering, McCormick School of Engineering and Applied Science

This presentation will describe a local cloud environment called the NU Research Image Processing System (NURIPS) that allows NU researchers to archive, review, process, and share medical imaging data. These data come from many different modalities and subject populations, often with longitudinal sessions to track treatment in pathologic studies, normal development, or aging. Medical imaging datasets can be large and often require intensive preprocessing before the final analysis can be conducted. Users can create their own processing pipelines, use public versions, or adopt those shared by other NURIPS users. The pipelines are containerized for robust and repeatable results. NURIPS incorporates the RDSS storage system and the computational capabilities of Quest, which are maintained by Northwestern IT rather than by individual labs. NURIPS can also push data to a cloud-based system for larger computational projects. Analysis pipelines can be set up to run automatically when data arrive in the project folder. NURIPS allows researchers to easily scale their data processing and gives smaller labs access to state-of-the-art data analysis.
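
Purely as an illustration of the "pipelines run automatically when data arrive in the project folder" idea, the sketch below polls a hypothetical project folder and launches a containerized pipeline on each new session; the folder layout, container name, and polling approach are assumptions, not NURIPS's actual trigger mechanism.

    import subprocess
    import time
    from pathlib import Path

    PROJECT = Path("/projects/imaging_study/incoming")  # hypothetical project folder
    seen = set()

    while True:
        for session in sorted(p for p in PROJECT.iterdir() if p.is_dir()):
            if session.name not in seen:
                seen.add(session.name)
                # launch a containerized preprocessing pipeline on the new session
                subprocess.run(["docker", "run", "--rm",
                                "-v", f"{session}:/data",
                                "preproc-pipeline:latest"], check=False)
        time.sleep(60)  # poll once a minute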

Impacts of Global Snowpack-Induced NOx Emissions on Atmospheric Chemistry and Air Quality - Lake Room

Debatosh Banik Partha, Postdoctoral Fellow, Department of Earth, Environmental, and Planetary Science, Weinberg College of Arts and Sciences

Nitrogen oxides (NOx) play an important role in tropospheric ozone (O3) formation, and NO2 and O3 are criteria air pollutants that can harm air quality and human health through different pathways. These compounds, particularly NOx, are emitted from both natural and anthropogenic sources, including the photolysis of nitrate aerosols deposited onto snow-covered land in the presence of sunlight. However, current chemical transport models do not include NOx emissions from snowpack nitrate photolysis on a global scale. In this study, we developed a novel parameterization of NOx emission in the global atmospheric chemistry model GEOS-Chem as a function of the nitrate concentration deposited in the top 2 cm of snow- and ice-covered regions, the nitrate photolysis rate, the NOx yield, and the fraction of global land covered with snow and ice. Using this approach, we found that atmospheric NOx and O3 levels attributable to snowpack nitrate photolysis varied seasonally and regionally, explaining previously unresolved O3 distribution discrepancies in extreme polar regions. Additionally, we improved the O3 simulation performance of GEOS-Chem by 1.57% by resolving O3 chemistry at the surface and at higher altitudes. This novel implementation in GEOS-Chem reduces bias and has crucial implications for understanding atmospheric NOx and O3 chemistry. It may ultimately help us understand how to mitigate tropospheric NOx and O3 pollution in different regions through planned interventions with potential long-term health benefits.
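
Read schematically, the parameterization described above multiplies the nitrate burden in the top 2 cm of snow by the photolysis rate, a NOx yield, and the snow/ice-covered fraction. The sketch below is only that schematic reading, with assumed units and made-up values, not the GEOS-Chem implementation.

    def snowpack_nox_flux(nitrate_conc, j_no3, nox_yield, snow_frac):
        """
        nitrate_conc: NO3- burden in the top 2 cm of snow (molecules per cm^2, assumed units)
        j_no3:        nitrate photolysis frequency (1/s)
        nox_yield:    fraction of photolyzed NO3- released as NOx
        snow_frac:    fraction of the grid cell covered by snow or ice
        returns an NOx emission flux (molecules per cm^2 per s, under the assumed units)
        """
        return nitrate_conc * j_no3 * nox_yield * snow_frac

    # Example with made-up values for a polar grid cell
    print(snowpack_nox_flux(nitrate_conc=1.0e15, j_no3=2.0e-7, nox_yield=0.5, snow_frac=0.9))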