Best Roman Cheplyaka Podcasts (2024)

#70 Prioritizing drug target genes with Marie Sadler 52:20

10M ago

In this episode, Marie Sadler talks about her recent Cell Genomics paper, Multi-layered genetic approaches to identify approved drug targets. Previous studies have found that drugs that target a gene linked to the disease are more likely to be approved. Yet there are many ways to define what it means for a gene to be linked to the disease. Perhaps …

#69 Suffix arrays in optimal compressed space and δ-SA with Tomasz Kociumaka and Dominik Kempa 56:46

1y ago

Today on the podcast we have Tomasz Kociumaka and Dominik Kempa, the authors of the preprint Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space. The suffix array is one of the foundational data structures in bioinformatics, serving as an index that allows fast substring searches in a large text. However, i…

#68 Phylogenetic inference from raw reads and Read2Tree with David Dylus 49:11

1y ago

In this episode, David Dylus talks about Read2Tree, a tool that builds alignment matrices and phylogenetic trees from raw sequencing reads. By leveraging the database of orthologous genes called OMA, Read2Tree bypasses traditional, time-consuming steps such as genome assembly, annotation and all-versus-all sequence comparisons. Links: Inference of phylo…

#67 AlphaFold and variant effect prediction with Amelie Stein 35:25

1y ago

This is the third and final episode in the AlphaFold series, originally recorded on February 23, 2022, with Amelie Stein, now an associate professor at the University of Copenhagen. In the episode, Amelie explains what ΔΔG is, how it informs us whether a particular protein mutation affects its stability, and how AlphaFold 2 helps in this analysis. A n…

#66 AlphaFold and shape-mers with Janani Durairaj 20:51

1+ y ago

This is the second episode in the AlphaFold series, originally recorded on February 14, 2022, with Janani Durairaj, a postdoctoral researcher at the University of Basel. Janani talks about how she used shape-mers and topic modelling to discover classes of proteins assembled by AlphaFold 2 that were absent from the Protein Data Bank (PDB). The bioinform…

#65 AlphaFold and protein interactions with Pedro Beltrao 52:23

1+ y ago

In this episode, originally recorded on February 9, 2022, Roman talks to Pedro Beltrao about AlphaFold, the software developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. Pedro is an associate professor at ETH Zurich and the coordinator of the structural biology community assessment of AlphaFold2 applications projec…

#64 Enformer: predicting gene expression from sequence with Žiga Avsec 59:41

3y ago

In this episode, Jacob Schreiber interviews Žiga Avsec about a recently released model, Enformer. Their discussion begins with the differences between life in academia and in industry, specifically how research is conducted in the two settings. Then, they discuss the Enformer model, how it builds on previous work, and the potential that models like it have…

#63 Bioinformatics Contest 2021 with Maksym Kovalchuk and James Matthew Holt 1:00:47

3y ago

The Bioinformatics Contest is back this year, and we are back to discuss it! This year’s contest winners Maksym Kovalchuk (1st prize) and Matt Holt (2nd prize) talk about how they approach participating in the contest and what strategies have earned them the top scores. Timestamps and links for the individual problems: 00:10:36 Genotype Imputation 00:21:…

#62 Steady states of metabolic networks and Dingo with Apostolos Chalkis 38:25

3y ago

In this episode, Apostolos Chalkis presents sampling steady states of metabolic networks as an alternative to the widely used flux balance analysis (FBA). We also discuss dingo, a Python package written by Apostolos that employs geometric random walks to sample steady states. You can see dingo in action here. Links: Dingo on GitHub Searching for COVID-1…

#61 3D genome organization and GRiNCH with Da-Inn Erika Lee 1:09:41

3+ y ago

In this episode, Jacob Schreiber interviews Da-Inn Erika Lee about data and computational methods for making sense of 3D genome structure. They begin their discussion by talking about 3D genome structure at a high level and the challenges in working with such data. Then, they discuss a method recently developed by Erika, named GRiNCH, that mines thi…

#60 Differential gene expression and DESeq2 with Michael Love 1:31:15

3+ y ago

In this episode, Michael Love joins us to talk about differential gene expression analysis from bulk RNA-Seq data. We talk about the history of Mike’s own differential expression package, DESeq2, as well as other packages in this space, like edgeR and limma, and the theory they are based upon. Mike also shares his experience of being the author and…

#59 Proteomics calibration with Lindsay Pino 48:26

3+ y ago

In this episode, Lindsay Pino discusses the challenges of making quantitative measurements in the field of proteomics. Specifically, she discusses the difficulties of comparing measurements across different samples, potentially acquired in different labs, as well as a method she has developed recently for calibrating these measurements without the need…

#58 B cell maturation and class switching with Hamish King 1:29:11

3+ y ago

In this episode, we learn about B cell maturation and class switching from Hamish King. Hamish recently published a paper on this subject in Science Immunology, where he and his coauthors analyzed gene expression and antibody repertoire data from human tonsils. In the episode, Hamish talks about some of the interesting B cell states he uncovered and shar…

#57 Enhancers with Molly Gasperini 46:57

3+ y ago

In this episode, Jacob Schreiber interviews Molly Gasperini about enhancer elements. They begin their discussion by talking about Octant Bio, and then dive into the surprisingly difficult task of defining enhancers and determining the mechanisms that enable them to regulate gene expression. Links: Octant Bio Towards a comprehensive catalogue of valida…

#56 Polygenic risk scores in admixed populations with Bárbara Bitarello 1:30:12

3+ y ago

Polygenic risk scores (PRS) rely on genome-wide association studies (GWAS) to predict the phenotype based on the genotype. However, the prediction accuracy suffers when GWAS from one population are used to calculate PRS within a different population, which is a problem because the majority of GWAS are done on cohorts of European ancestry. In th…

#55 Phylogenetics and the likelihood gradient with Xiang Ji 57:02

3+ y ago

In this episode, we chat about phylogenetics with Xiang Ji. We start with a general introduction to the field and then go deeper into the likelihood-based methods (maximum likelihood and Bayesian inference). In particular, we talk about the different ways to calculate the likelihood gradient, including a linear-time exact gradient algorithm recently pu…

#54 Seeding methods for read alignment with Markus Schmidt 1:00:46

4y ago

In this episode, Markus Schmidt explains how seeding in read alignment works. We define and compare k-mers, minimizers, MEMs, SMEMs, and maximal spanning seeds. Markus also presents his recent work on computing variable-sized seeds (MEMs, SMEMs, and maximal spanning seeds) from fixed-sized seeds (k-mers and minimizers) and his Modular Aligner. Links: A…

#53 Real-time quantitative proteomics with Devin Schweppe 1:03:13

4y ago

In this episode, Jacob Schreiber interviews Devin Schweppe about the analysis of mass spectrometry data in the field of proteomics. They begin by delving into the different types of mass spectrometry methods, including MS1, MS2, and MS3, and the reasons for using each. They then discuss a recent paper from Devin, Full-Featured, Real-Time Database Sea…

#52 How 23andMe finds identical-by-descent segments with William Freyman 42:40

4y ago

In this episode, Will Freyman talks about identity-by-descent (IBD): how it’s used at 23andMe, and how the templated positional Burrows-Wheeler transform can find IBD segments in the presence of genotyping and phasing errors. Links: Fast and robust identity-by-descent inference with the templated positional Burrows-Wheeler transform (William A. Freyma…

#51 Basset and Basenji with David Kelley 1:13:58

4y ago

In this episode, Jacob Schreiber interviews David Kelley about machine learning models that can yield insight into the consequences of mutations on the genome. They begin their discussion by talking about Calico Labs, and then delve into a series of papers that David has written about using models, named Basset and Basenji, that connect genome sequenc…

#50 ENCODE3 with Jill Moore 56:02

4y ago

In this episode, Jacob Schreiber interviews Jill Moore about recent research from the ENCODE Project. They begin their discussion with an overview of the ENCODE Project and its goals, and then discuss a bundle of papers that were recently published in various Nature journals and the flagship paper, Expanded encyclopaedias of DNA elements in the human and m…

#49 Most Permissive Boolean Networks with Loïc Paulevé 1:04:01

4y ago

In systems biology, Boolean networks are a way to model interactions such as gene regulation or cell signaling. The standard interpretations of Boolean networks are the synchronous, asynchronous, and fully asynchronous semantics. In this episode, Loïc Paulevé explains how the same Boolean networks can be interpreted in a new, “most permissive” way. Loïc…

#48 Machine learning for drug development with Marinka Zitnik 1:25:08

4y ago

In this episode, Jacob Schreiber interviews Marinka Zitnik about applications of machine learning to drug development. They begin their discussion with an overview of open research questions in the field, including limiting the search space of high-throughput testing methods, designing drugs entirely from scratch, predicting ways that existing drugs ca…

#47 Reproducible pipelines and NGLess with Luis Pedro Coelho 57:34

4+ y ago

NGLess is a programming language specifically targeted at next generation sequencing (NGS) data processing. In this episode we chat with its main developer, Luis Pedro Coelho, about the benefits of domain-specific languages, pros and cons of Haskell in bioinformatics, reproducibility, and of course NGLess itself. Links: NGLess on GitHub NG-meta-profiler…

#46 HiFi reads and HiCanu with Sergey Nurk and Sergey Koren 1:09:08

4+ y ago

In this episode, I continue to talk (but mostly listen) to Sergey Koren and Sergey Nurk. If you missed the previous episode, you should probably start there. Otherwise, join us to learn about HiFi reads, the tradeoff between read length and quality, and what tricks HiCanu employs to resolve highly similar repeats. Links: HiCanu: accurate assembly of s…

#45 Genome assembly and Canu with Sergey Koren and Sergey Nurk 1:16:34

4+ y ago

In this episode, Sergey Nurk and Sergey Koren from the NIH share their thoughts on genome assembly. The two Sergeys tell the stories behind their amazing careers as well as behind some of the best-known genome assemblers: Celera assembler, Canu, and SPAdes. Links: Canu on GitHub SPAdes on GitHub If you enjoyed this episode, please consider supporting …

#44 DNA tagging and Porcupine with Kathryn Doroschak 45:00

4+ y ago

Porcupine is a molecular tagging system: a way to tag physical objects with pieces of DNA called molecular bits, or molbits for short. These DNA tags can then be rapidly sequenced on an Oxford Nanopore MinION device without any need for library preparation. In this episode, Katie Doroschak explains how Porcupine works: how molbits are designed and prepar…

#43 Generalized PCA for single-cell data with William Townes 59:44

4+ y ago

Will Townes proposes a new, simpler way to analyze scRNA-seq data with unique molecular identifiers (UMIs). Observing that such data is not zero-inflated, Will has designed a PCA-like procedure inspired by generalized linear models (GLMs) that, unlike the standard PCA, takes into account statistical properties of the data and avoids spurious correlatio…

#42 Spectrum-preserving string sets and simplitigs with Amatur Rahman and Karel Břinda 53:20

4+ y ago

In this episode, we hear from Amatur Rahman and Karel Břinda, who independently of one another released preprints on the same concept, called simplitigs or spectrum-preserving string sets. Simplitigs offer a way to efficiently store and query large sets of k-mers or, equivalently, large de Bruijn graphs. Links: Simplitigs as an efficient and scalable re…

#41 Epidemic models with Kris Parag 1:08:08

4+ y ago

Kris Parag is here to teach us about the mathematical modeling of infectious disease epidemics. We discuss the SIR model, the renewal models, and how insights from information theory can help us predict where an epidemic is going. Links: Optimising Renewal Models for Real-Time Epidemic Prediction and Estimation (KV Parag, CA Donnelly) Adaptive Estimat…

#40 Plasmid classification and binning with Sergio Arredondo-Alonso and Anita Schürch 45:04

5y ago

Does a given bacterial gene live on a plasmid or the chromosome? What other genes live on the same plasmid? In this episode, we hear from Sergio Arredondo-Alonso and Anita Schürch, whose projects mlplasmids and gplas answer these types of questions. Links: mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for singl…

#39 Amplicon sequence variants and bias with Benjamin Callahan 1:01:57

5y ago

In this episode, Benjamin Callahan talks about some of the issues faced by microbiologists when conducting amplicon sequencing and metagenomic studies. The two main themes are: why one should probably avoid using OTUs (operational taxonomic units) and use exact sequence variants (also called amplicon sequence variants, or ASVs), and how DADA2 manage…

#38 Issues in legacy genomes with Luke Anderson-Trocmé 1:01:13

5y ago

In this episode, Luke Anderson-Trocmé talks about his findings from the 1000 Genomes Project. Namely, the early sequenced genomes sometimes contain specific mutational signatures that haven’t been replicated from other sources and can be found via their association with lower base quality scores. Listen to Luke telling the story of how he stumbled upon …

#37 Causality and potential outcomes with Irineo Cabreros 40:46

5y ago

In this episode, I talk with Irineo Cabreros about causality. We discuss why causality matters, what does and does not imply causality, and two different mathematical formalizations of causality: potential outcomes and directed acyclic graphs (DAGs). Causal models are usually considered external to and separate from statistical models, whereas Irineo’s …

#36 scVI with Romain Lopez and Gabriel Misrachi 1:20:08

5y ago

In this episode, we hear from Romain Lopez and Gabriel Misrachi about scVI (Single-cell Variational Inference). scVI is a probabilistic model for single-cell gene expression data that combines a hierarchical Bayesian model with deep neural networks encoding the conditional distributions. scVI scales to over one million cells and can be used for scRNA-seq …

#35 The role of the DNA shape in transcription factor binding with Hassan Samee 1:01:45

5y ago

Even though double-stranded DNA has the famous regular helical shape, there are small variations in the geometry of the helix depending on what exact nucleotides it’s made of at that position. In this episode of the bioinformatics chat, Hassan Samee talks about the role the DNA shape plays in recognition of the DNA by DNA-binding proteins, such as t…

#34 Power laws and T-cell receptors with Kristina Grigaityte 1:26:36

5+ y ago

An αβ T-cell receptor is composed of two highly variable protein chains, the α chain and the β chain. However, based only on bulk DNA or RNA sequencing it is impossible to determine which of the α chain and β chain sequences were paired in the same receptor. In this episode, Kristina Grigaityte talks about her analysis of 200,000 paired αβ sequences, w…

#33 Genome assembly from long reads and Flye with Mikhail Kolmogorov 1:12:56

5+ y ago

Modern genome assembly projects are often based on long reads in an attempt to bridge longer repeats. However, due to the higher error rate of the current long read sequencers, assemblers based on de Bruijn graphs do not work well in this setting, and the approaches that do work are slower. In this episode, Mikhail Kolmogorov from Pavel Pevzner’s lab j…

#32 Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber 1:15:14

5+ y ago

In this episode, we hear from Jacob Schreiber about his algorithm, Avocado. Avocado uses deep tensor factorization to break a three-dimensional tensor of epigenomic data into three orthogonal dimensions corresponding to cell types, assay types, and genomic loci. Avocado can extract a low-dimensional, information-rich latent representation from the weal…

#31 Bioinformatics Contest 2019 with Alexey Sergushichev and Gennady Korotkevich 1:46:23

5+ y ago

The third Bioinformatics Contest took place in February 2019. Alexey Sergushichev, one of the organizers of the contest, and Gennady Korotkevich, the 1st prize winner, join me to discuss this year’s problems. Timestamps and links for the individual problems: Qualification round 00:07:14 Bee Population 00:14:12 Sequencing Errors 00:30:20 Transposable E…

#30 Bayesian inference of chromatin structure from Hi-C data with Simeon Carstens 1:05:42

5+ y ago

Hi-C is a sequencing-based assay that provides information about the 3-dimensional organization of the genome. In this episode, Simeon Carstens explains how he applied the Inferential Structure Determination (ISD) framework to build a 3D model of chromatin and fit that model to Hi-C data using Hamiltonian Monte Carlo and Gibbs sampling. Links: Bayesian…

#29 Haplotype-aware genotyping from long reads with Trevor Pesout 1:12:08

5+ y ago

Long read sequencing technologies, such as Oxford Nanopore and PacBio, produce reads from thousands to a million base pairs in length, at the cost of an increased error rate. Trevor Pesout describes how he and his colleagues leverage long reads for simultaneous variant calling/genotyping and phasing. This is possible thanks to a clever use of a hidden …

#28 Space-efficient variable-order Markov models with Fabio Cunial 1:09:17

6y ago

This time you’ll hear from Fabio Cunial on the topic of Markov models and space-efficient data structures. First we recall what a Markov model is and why variable-order Markov models are an improvement over the standard, fixed-order models. Next we discuss the various data structures and indexes that allowed Fabio and his collaborators to represent the…

#27 Classification of CRISPR-induced mutations and CRISPRpic with HoJoon Lee and Seung Woo Cho 56:36

6y ago

In this episode, HoJoon Lee and Seung Woo Cho explain how to perform a CRISPR experiment and how to analyze its results. HoJoon and Seung Woo developed an algorithm that analyzes sequenced amplicons containing the CRISPR-induced double-strand break site and figures out what exactly happened there (e.g. a deletion, insertion, substitution, etc.). Links: C…

#26 Feature selection, Relief and STIR with Trang Lê 1:08:43

6y ago

Relief is a statistical method to perform feature selection. It could be used, for instance, to find genomic loci that correlate with a trait or genes whose expression correlates with a condition. Relief can also be made sensitive to interaction effects (known in genetics as epistasis). In this episode, Trang Lê joins me to talk about Relief and her ver…

#25 Transposons and repeats with Kaushik Panda and Keith Slotkin 1:40:56

6y ago

Kaushik Panda and Keith Slotkin come on the podcast to educate us about repetitive DNA and transposable elements. We talk LINEs, SINEs, LTRs, and even Sleeping Beauty transposons! Kaushik and Keith explain why repeats matter for your whole-genome analysis and answer listeners’ questions. Links: Keith’s paper: The case for not masking away repetitive D…

#24 Read correction and Bcool with Antoine Limasset 59:44

6y ago

Antoine Limasset joins me to talk about NGS read correction. Antoine and his colleagues built the read correction tool Bcool based on the de Bruijn graph, and it corrects reads far better than any of the current methods like Bloocoo, Musket, and Lighter. We discuss why and when read correction is needed, how Bcool works, and why it performs better but …

#23 RNA design, EteRNA and NEMO with Fernando Portela 1:31:10

6y ago

In this episode, I talk to Fernando Portela, a software engineer and amateur scientist who works on RNA design, the problem of composing an RNA sequence that has a specific secondary structure. We talk about how Fernando and others compete and collaborate in designing RNA molecules in the online game EteRNA and about Fernando’s new RNA design algorithm,…

#22 smCounter2: somatic variant calling and UMIs with Chang Xu 1:04:15

6+ y ago

In this episode I’m joined by Chang Xu. Chang is a senior biostatistician at QIAGEN and an author of smCounter2, a low-frequency somatic variant caller. To distinguish rare somatic mutations from sequencing errors, smCounter2 relies on unique molecular identifiers, or UMIs, which help identify multiple reads resulting from the same physical DNA fragmen…

#21 Linear mixed models, GWAS, and lme4qtl with Andrey Ziyatdinov 50:50

6+ y ago

Linear mixed models are used to analyze GWAS data and detect QTLs. Andrey Ziyatdinov recently released an R package, lme4qtl, that can be used to formulate and fit these models. In this episode, Andrey and I discuss linear mixed models, genome-wide association studies, and strengths and weaknesses of lme4qtl. Links: Paper: lme4qtl: linear mixed models…
