## Success... kinda.

June 25, 2013 in Aaron's Blog

As promised, a more mathematically formulated description of the latest assignment...

Once we were confident about the program we previously constructed, it was time to expand the horizon, so to speak. A brief recap: given a certain branch length ("time"), we wanted to calculate the expected number of nucleotide substitutions along that branch. For example, if a site starts in state "A" and ends in state "G", how many substitutions happened in between? The expected number should be proportional to the time period, so for a branch length of 1, we expect one substitution.
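To make the "branch length of 1 means one expected substitution" idea concrete, here is a minimal sketch. It is not our actual program: it assumes a Jukes-Cantor style 4x4 matrix (every substitution equally likely), and the function names are made up for illustration. It only computes the average over starting states, not the number conditional on a particular start and end state.

```python
# Illustrative sketch only: assumes a Jukes-Cantor style rate matrix,
# which may differ from the matrix used in the actual assignment.

def jukes_cantor_q():
    """4x4 rate matrix over A, G, C, T with equal off-diagonal rates."""
    n = 4
    q = [[(1.0 / 3.0 if i != j else 0.0) for j in range(n)] for i in range(n)]
    for i in range(n):
        # Each row of a rate matrix sums to zero.
        q[i][i] = -sum(q[i])
    return q

def expected_substitutions(q, pi, t):
    """Expected substitutions on a branch of length t, averaged over
    start states drawn from the stationary distribution pi."""
    rate = -sum(pi[i] * q[i][i] for i in range(len(pi)))
    return rate * t

q = jukes_cantor_q()
pi = [0.25, 0.25, 0.25, 0.25]
print(expected_substitutions(q, pi, 1.0))
```

Because this matrix is scaled so that the average rate is 1, the expected count grows in direct proportion to the branch length.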

Now for the current project: what if we decided to work with codons instead? Rather than dealing with a set R = {A, G, C, T} and its corresponding 4x4 rate matrix, the new state space has 64 elements and a 64x64 rate matrix (Q). The extension to 64 states isn't as straightforward as one might hope, due to the structure of the rate matrix.

To summarize: 4 nucleotides group in triplets to generate 64 codons, each corresponding to one of the 20 amino acids. So every codon maps to a single amino acid (excluding the 3 stop codons), but a given amino acid can be encoded by more than one codon. Therefore the rate matrix must have a structure that reflects the suppression of certain mutations, namely the mutations that,

1. require more than one nucleotide change, e.g., TTT → TAA

2. change one amino acid into another, e.g., TTT corresponds to the amino acid Phe, while TTA corresponds to the amino acid Leu.

In the first case, the rate matrix should have a "0" at the corresponding entry. In the second, the mutation is allowed, but it would obviously not be as favoured as one that leaves the amino acid unchanged.
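The two rules above can be sketched in code. This is a rough illustration, not the matrix we actually ended up with: the penalty `omega` for amino-acid-changing mutations, the rate of 1 for silent ones, and the choice to forbid any change into or out of a stop codon are all assumptions made for the example, as are the function names.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO_ACIDS))

def codon_rate(src, dst, omega):
    """Rate from codon src to codon dst (hypothetical parameterization)."""
    # Rule 1: changes at more than one position get rate 0.
    if sum(a != b for a, b in zip(src, dst)) != 1:
        return 0.0
    # Assumption for this sketch: no changes into or out of stop codons.
    if CODE[src] == "*" or CODE[dst] == "*":
        return 0.0
    # Rule 2: amino-acid-changing mutations are down-weighted by omega.
    return 1.0 if CODE[src] == CODE[dst] else omega

def build_codon_q(omega=0.2):
    """64x64 rate matrix whose rows sum to zero."""
    n = len(CODONS)
    q = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(CODONS):
        for j, dst in enumerate(CODONS):
            if i != j:
                q[i][j] = codon_rate(src, dst, omega)
        q[i][i] = -sum(q[i])
    return q

q = build_codon_q(omega=0.2)
ttt = CODONS.index("TTT")
print(q[ttt][CODONS.index("TAA")])  # two positions differ
print(q[ttt][CODONS.index("TTC")])  # Phe -> Phe, silent
print(q[ttt][CODONS.index("TTA")])  # Phe -> Leu, amino acid changes
```

With `omega` below 1, the Phe to Leu change ends up with a smaller rate than the silent Phe to Phe change, and the TTT to TAA entry stays at zero.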

Long story short, after plenty of mistakes, countless hours of sleep deprivation, and of course the expertise of Dr. D!, problem solved.

After implementing this new matrix in the old code, the results should not have changed, and thankfully they didn't: the number of mutations was equal to the branch length. There are two versions of the Q matrix, one that takes population size into account and one that does not. In the latter case, we expect the results to mirror those of the first assignment, which they do. In the former case, the results are almost identical, with a slight deviation. For example, in the model without population size, if I specify a branch length of "1", I get an expected mutation number of "1". In the other case, if I input "1", the mutation number is now "1.002".

Dr. D, sound about right??