A statistics-first approach to biological discovery, including phenotype-to-genotype maps

Lectures and conversations

Julia Salzman comes to Bergen to talk about assigning function to raw sequencing reads with the deep learning framework FLASH, thereby bypassing genome alignment.

Julia Salzman in a nutshell

A.B. in Mathematics, Princeton University (Magna Cum Laude)
Ph.D. in Statistics, Stanford University (supervised by Persi Diaconis)

Dr. Salzman’s work bridges statistical methodology and genomics.

She developed statistical algorithms that led to the discovery of the ubiquitous expression of circular RNA, which other computational and experimental approaches had missed for decades. Her research aims to use data-driven experiments to uncover organizing principles of biological regulation, historically focused on RNA processing and, more recently, on microbial phenotypes.

Recently, she has pivoted to introduce a new, “statistics-first” approach to sequencing analysis, which performs inference on raw sequencing data and bypasses genome alignment. This work focuses on developing and applying these approaches to provide new insights into genome regulation in several biological domains.

Talk Overview

Genomic data is now acquired at a scale that provides an unprecedented opportunity to link systems-level gene and variant expression to phenotypes across the tree of life, from single-cell RNA-seq to microbial DNA. Most current analyses filter this rich raw data through the lens of assembly algorithms and/or alignment to annotated gene models, attenuating, and sometimes distorting, the potential to discover links between genotype and phenotype.

We will discuss a variety of approaches that broaden the scope of biological discovery by analyzing raw sequence data directly. The main focus will be FLASH: a new, interpretable, statistically based deep learning framework that operates directly on raw sequencing reads.

In over 35,000 isolates of bacteria, fungi and viruses, FLASH achieves uniformly high accuracy on independent test data, including variation never seen in training, meeting or exceeding bespoke state-of-the-art methods. FLASH identifies canonical drug targets ab initio, as well as new pan-species predictors of virulence, including those lacking annotation and those only partially aligned to NCBI reference databases.

Further, FLASH can predict phenotypes beyond the reach of GWAS, such as the bacterial host range of phages, a task that, to our knowledge, is impossible today. FLASH is a highly general approach to mapping between genotype and phenotype, and it is highly efficient and easy to use.

Why Attend?

Explore collaboration opportunities
Discuss applications in your research domain
Connect across biology, statistics and data science

Researchers from all relevant fields are warmly encouraged to join and engage!

Did you know?

Over the last decade, the annual generation rate of sequencing data has increased tenfold and is now approximately 180 zettabytes per year, where 1 zettabyte = 10²¹ bytes.
This growth is explained by the dramatic drop in sequencing costs over the same period. We also sequence more deeply than ever before.
As a result, today's datasets can be so large that they require supercomputers with thousands of interconnected processors, or large cloud platforms, operated at a national scale.

Contact person

Nina Langeland
