by aaron

## Success....kinda.

June 25, 2013

As promised a more mathematically-formulated description of the latest assignment...

Once we were confident about the program we previously constructed, it was time to expand the horizon, so to speak. A brief recap: given a certain branch length, "time," we wanted to calculate the expected number of transitions of nucleotides within that branch. For example, if you start with a state, "A" and end in state, "G", how many times does the nucleotide substitute in a given time period should be proportional to the time period. So for a branch length of 1, we expect one substitution.

Now for the current project, what if instead we decided to work with codons? Instead of dealing with a set, R={A,G,C,T} and corresponding 4x4 rate matrix, the new state space has 64 elements, and has a 64x64 rates matrix(Q). The extension into 64 states isn't as straight forward as one might hope, due to the structure of the rates matrix.

To summarize; 4 nucleotides, group in triplets generating 64 codons, each corresponding to an amino acid, of which there are 20. So for every codon, there is an amino acid(stop codons excluded, there are 3), but every amino acid could have a different codon. Therefore the rates matrix must have a structure that displays the suppression of certain mutations, the mutations that,

1.  have multiple mutations, i.e., TTT $\rightarrow$ TAA

2. have a mutation that turns one amino acid to another, i.e. TTT corresponds to the amino acid, Phe, while TTA corresponds to the amino acid Leu.

In the first instance, the rates matrix should have "0" at its entry, in the second, this mutation would obviously not be as favoured as one in which a mutation would not result in a change in the amino acid.

Long story short, after plenty of mistakes, countless hours of sleep deprivation, and of course the expertise of Dr. D!, problem solved.

Implementing this new matrix into the old code, results should not have changed, and thankfully they didn't; the number of mutations was equal to the branch length. There are 2 instances in the Q-matrix; one that is population based, and the other without. In the latter case, we expect the results to mirror the results in the first assignment, which they do. In the former case, the results are almost identical, with a slight deviation. For example, in the model without population-size, if I specify a branch length of "1", I in turn get a expected mutation number of "1." In the other case, if I input "1," the mutation number is now, "1.002."

by aaron

## On to "bigger" things...

June 18, 2013

After finally being able to implement an equation(and know that it works) in matlab, the journey has only begun.

It's now time to expand this idea to instead of a state space of just 4 nucleotides, how about deal with codons. Not to worry, its not like they're 64 or anything like that. But which each passing day, and hours of failure, success can only be waiting around the corner.

Stay tuned for a detailed, more mathematically formulated post!

Aaron.

by aaron

## Circles

June 14, 2013

First week on the job, and it has been somewhat of a roller coaster.

Day 1: My assignment is simple: Implement equation 54 from Minin and Suchard's paper, "Counting labeled transitions in continous-time Markov models of evolution":

$m^[t]_(ij)(t)=E(N_t | X_0=i, X_t =j)$

in matlab. The equation reads; "the first factorial moment is equal to the expectation of the number of transitions, given that you start in a state i and end in a state j."

And to find the first moment, the equation is:

eqn 36

$M^{[1]}(t) = \sum_{i=1}^m \sum_{j=1}^m B_i \Lambda_R B_j I_{ij}(t)$

where,

eqn37.

$I_{ij}(t) = \frac{\exp{h_it} - \exp{h_jt}}{h_i - h_j} if h_i \ne h_j$

$I_{ij}(t) = t \exp^{h_i t} if h_i =h_j$

Well it doesn't seem bad, but M is a matrix, and what we want is a single value. But how do we interpret that? Dr. deKoning and I have come up with several ideas, but honestly, it just feels like we're going in circles.