Ian Holmes

Professor of Bioengineering

UC Berkeley

Abstract:

Continuous-time Markov chain models of point substitution on individual sites in DNA and protein sequences are commonplace in statistical phylogenetics, and can be solved by matrix exponentiation of a small rate matrix. However, the analogous insertion-deletion processes on whole sequences are harder to solve. The only known exact solution is for the special case where instantaneous indel events only ever add or remove one residue at a time (the "TKF91" model, named after the Thorne, Kishino & Felsenstein 1991 paper where it was introduced). The TKF91 model reduces to a linear birth-death process with immigration, and its solution yields a three-state Pair HMM as its finite-time distribution over pairwise alignments. Unfortunately, the assumption of single-residue indel events causes various biases when used for inference, including poor-quality alignments (cf. alignment with a linear gap penalty versus an affine gap penalty). Previous attempts to relax this constraint have relied on approximations that break down quickly at longer evolutionary times, such as the idea that the sequence is composed of indivisible multi-residue fragments (which underpins the TKF92 model, a sequel to TKF91), or the approximation that alignment gaps only ever correspond to one or two indel events (the Miklos, Lunter & Holmes 2004 model), or the closed-form Pair HMM approximations used by programs like BAli-Phy and PRANK.
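As a small illustration of the "matrix exponentiation of a small rate matrix" mentioned above, the following sketch computes finite-time substitution probabilities P(t) = exp(Qt) for the Jukes-Cantor DNA model and checks them against that model's known closed form. The choice of Jukes-Cantor, the function names, and the simple scaling-and-squaring routine are all assumptions made for this example; they are not taken from the talk.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=20):
    """Approximate exp(q * t) by a truncated Taylor series on a scaled-down
    matrix, followed by repeated squaring (illustrative, not optimized)."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def jukes_cantor_rate_matrix(rate=1.0):
    """4x4 rate matrix: each base changes to each other base at rate/3."""
    return [[-rate if i == j else rate / 3 for j in range(4)]
            for i in range(4)]

def jukes_cantor_closed_form(t, rate=1.0):
    """Known closed-form solution of the Jukes-Cantor model."""
    e = math.exp(-4 * rate * t / 3)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

if __name__ == "__main__":
    p = mat_exp(jukes_cantor_rate_matrix(), 0.5)
    for row in p:
        print(" ".join(f"{x:.6f}" for x in row))
```

Each row of the result is a probability distribution over the four bases after time t. No comparably small matrix exists for indels, because the state space is the set of all sequences.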

In this talk, Ian Holmes will present a new method of approximating indel processes by using a differential calculus of state machines. He will first review how we can think of input-output state machines as being like sequence-indexed matrices, including a well-defined concept of "multiplication" of state machines that allows us to construct them in a modular, rationally-engineered way. Holmes will then describe the application of this automata-multiplication approach to study continuous-time Markov process models of indel evolution. By multiplying an approximating HMM with an infinitesimal evolutionary HMM, we obtain a set of differential equations for the transition probabilities of the approximating HMM, which (in the special case of single-residue indels) exactly recovers the solvable TKF91 model. This builds on recent work by Nicola De Maio based on moment evolution equations for the indel process, and yields an improved fit over a broader parameter regime than both De Maio's approach and the 2004 approach of Miklos et al. (which explicitly enumerates indel trajectories), without the need to introduce extra parameters or latent information such as indivisible fragment boundaries in the sequence. This work addresses a long-standing problem in computational molecular evolution, with potential applications to richer models that include both indels and heterogeneous selection pressure.
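For readers unfamiliar with the solvable TKF91 case referred to above, here is a minimal sketch of its standard finite-time solution. The symbols alpha, beta, and gamma follow a convention common in the later literature; they and the function names are assumptions of this example, not notation from the talk. Here alpha is the probability that a residue survives for time t, beta is the probability of a further insertion after a surviving residue or an earlier insertion, and gamma is the probability of an insertion immediately after a deleted residue. These three quantities parameterize the transitions of the three-state Pair HMM.

```python
import math

def tkf91_parameters(lam, mu, t):
    """Return (alpha, beta, gamma) for the TKF91 model with insertion rate
    lam, deletion rate mu (requires 0 < lam < mu) and elapsed time t > 0."""
    if not 0 < lam < mu:
        raise ValueError("TKF91 requires 0 < lam < mu")
    if t <= 0:
        raise ValueError("t must be positive")
    e = math.exp((lam - mu) * t)
    alpha = math.exp(-mu * t)
    beta = lam * (1 - e) / (mu - lam * e)
    gamma = 1 - mu * beta / (lam * (1 - alpha))
    return alpha, beta, gamma

def mean_descendants(alpha, beta, gamma):
    """Expected number of residues at time t descended from one ancestral
    residue: the residue itself if it survives, plus a geometric run of
    insertions whose first step has probability beta (survivor) or gamma
    (deleted ancestor)."""
    return (alpha + (1 - alpha) * gamma) / (1 - beta)

if __name__ == "__main__":
    lam, mu, t = 1.0, 2.0, 0.7
    a, b, g = tkf91_parameters(lam, mu, t)
    print(a, b, g)
    # For a linear birth-death process this mean should be exp((lam - mu) * t).
    print(mean_descendants(a, b, g), math.exp((lam - mu) * t))
```

The final check uses the fact that, for a linear birth-death process, the expected number of descendants of a single residue is exp((lam - mu) * t), which gives a quick sanity test of the three formulas.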

Speaker bio:

Ian Holmes cut his teeth programming as an indie game developer in Cambridge, England, in the 80s (Holmes and Reeve, 1989; Holmes, 1992), studied physics as an undergraduate at Cambridge's Cavendish Laboratory in the early 90s, and learned bioinformatics at the Sanger Centre in Hinxton (just outside Cambridge) in the late 90s. He then left Cambridge for good, and worked for a year at Los Alamos National Laboratory, undeniably benefitting from the lax security standards of the pre-9/11 era. He then did a postdoc in Berkeley (a place far more tolerant of someone unlikely ever to get a security clearance), went back to the UK to spend a couple of years in Oxford (a place completely unlike Cambridge), and ended up back in Berkeley, where he now works in the Bioengineering Department and chairs the joint Berkeley/UCSF Bioengineering Ph.D. program. During that time, he has talked and published papers about lots of things (cellular automata, finite-state automata, molecular evolution, transformational grammars, statistical deep learning, genome browsers, the RNA world, nanopore sequencing, and synthetic biology, to name a few), but mostly he just likes building little worlds inside the computer.