Is it hard to update trees when data change?

Online phylogenomics


Alex Gavryushkin


24 February 2020

"Static" algorithms

Data -> Algorithm -> Solution

Basically equivalent to computing a function $f(x_1, \ldots, x_n) = y$

Online algorithms

Given a finite set of "requests" $R = \{r_i\}$, we want to compute a series of functions

$$ r_0(), r_1(r_0), r_2(r_0, r_1),\ldots, r_{i+1}(r_0,\ldots, r_i),\ldots $$

The (worst case) complexity (denoted by $c$) of an online algorithm is then defined as (a function of $n$) $$ \frac 1 n \max_{r_i \in R}\sum_{i = 0}^{n-1} c(r_i) $$

When online algorithms make sense

It is convenient to say that a problem has an efficient online solution if its online complexity is smaller than the "static" complexity of the worst $r_i$.

The central question for us in this project is what methods in computational biology have efficient online solutions.

Hint: Greedy methods are good candidates

When online algorithms are possible

Stability analysis

We say that an algorithm is (Lyapunov) stable if small perturbations to the input (data) result in small changes to the solution.
It is very hard to find efficient online versions of non-stable static algorithms.

Online Phylogenetics

Tree inference methods

Distance-based methods
Model-based methods

Treespace analysis methods

Distance-based methods
Path-based methods

Your ideas?

Distance-based tree inference methods

Neighbour-joining, UPGMA. Possible requests: add taxon, remove taxon, change distance (sequence). Stability considered with respect to the distance matrix

Model-based tree inference methods

Large space of possible requests.

Computing distances and paths in treespace

Paths are not stable in the BHV and tau-space but they could be those rare exceptions and have online versions.
Open question: t-space.
Local rearrangement based distances, such as RNNI, are stable for appropriate requests. So these are our focus now.

Applications (future work)

Consensus methods: Fréchet mean (centroid) based methods are stable if paths between trees (geodesics) are.

Tree proposals: (our recent algorithms) FindPath, MDistance.

Analyses of samples of trees, e.g. Rob Lanfear's proxy for ESS using focal trees (plus our FindPath algorithm).

Acknowledgements

Royal Society of New Zealand — Rutherford Discovery Fellowships; Catalyst Fund — funding

New Zealand Ministry of Business, Innovation, and Employment — Endeavour Fund; Strategic Science Investment Fund (Data Science programmes) — funding

bioDS lab and CS dept @Otago — support and helpful discussions

You — coming to my talk and listening :)