A statistics-first approach to biological discovery, including phenotype-to-genotype maps
Julia Salzman comes to Bergen to talk about assigning function to raw sequencing reads using the deep learning framework FLASH and thereby bypassing genome alignment.
Julia Salzman in a nutshell
A.B. in Mathematics, Princeton University (Magna Cum Laude)
Ph.D. in Statistics, Stanford University (supervised by Persi Diaconis)
Dr. Salzman’s work bridges statistical methodology and genomics.
She developed statistical algorithms that led to the discovery of ubiquitous circular RNA expression, which other computational and experimental approaches had missed for decades. Her research aims to use data-driven experiments to uncover organizing principles of biological regulation, historically focused on RNA processing and, more recently, on microbial phenotypes.
Recently, she has pivoted to introducing a new approach to sequencing analysis, “statistics-first,” which performs inference on raw sequencing data, bypassing genome alignment. The focus is on developing and applying these approaches to provide new insights into genome regulation in several biological domains.
Talk Overview
Genomic data is now acquired at a scale that provides an unprecedented opportunity to link systems-level gene and variant expression to phenotypes across the tree of life, from single-cell RNA-seq to microbial DNA. Most current analyses filter this rich raw data through the lens of assembly algorithms and/or alignment to annotated gene models, attenuating, and sometimes distorting, the potential to discover links between genotype and phenotype.
We will discuss a variety of approaches to improve the scope of biological discovery by directly analyzing raw sequence data. We will primarily discuss FLASH: a new interpretable, statistically based deep learning framework that operates directly on raw sequencing reads.
In over 35,000 isolates of bacteria, fungi and viruses, FLASH achieves uniformly high accuracy on independent test data, including variation never seen in training, meeting or exceeding bespoke state-of-the-art methods. FLASH identifies canonical drug targets ab initio and new pan-species predictors of virulence, including those lacking annotation and those only partially aligned to NCBI reference databases.
Further, FLASH can predict phenotypes beyond the reach of GWAS, such as the bacterial host range of phage, a task that, to our knowledge, is impossible today. FLASH is a highly general approach to mapping genotype to phenotype, and is efficient and easy to use.
Why Attend?
- Explore collaboration opportunities
- Discuss applications in your research domain
- Connect across biology, statistics and data science
Researchers from all relevant fields are warmly encouraged to join and engage!