Wednesday, 7 December 2016

Building the Mutation History Tree - Grouping

The process of generating the Mutation History Tree for Lineage II (L2 MHT) is not an easy one. It involves many steps and hours of work. I have made a video of the actual process to give you an idea of what is involved and to help other Project Administrators who might be interested in undertaking the same exercise. You can watch it via the embedded video at the end of this post, or directly on YouTube.

The Results Spreadsheet
The basis of the work is an Excel spreadsheet generated from the DNA Results page on the FTDNA or WFN webpages. Luckily for us, we use both webpages in the Gleason/Gleeson DNA Project; each has its pros and cons. The Results Spreadsheet generated from these results pages is below (for the first 37 markers only; the spreadsheet for the full 111 marker dataset can be downloaded from Dropbox here).

Lineage II DNA Results in the Results Spreadsheet

So what's the difference between this and the results as seen on FTDNA or WFN?

Well, FTDNA & WFN appear to sort the Y-DNA results by ascending marker value, column by column. So for example, on the FTDNA results page, the 2nd column is arranged by marker value in ascending order - the 23's first, the 24's after. The ascending order is again seen in the third column - 14's first, 15's after. This biases the listing of members in favour of the values of the markers that occur at the start of the row. In other words, the order of values in any given column is dependent on the order of values in the preceding columns. This does not give the best representation of who is most closely related to whom.

Lineage II DNA Results on the FTDNA website
Lineage II DNA Results on the WFN website
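
To illustrate why column-by-column sorting can mislead, here is a minimal Python sketch. It assumes the results pages behave like a simple left-to-right sort (which is only how they appear to behave), and the member labels and marker values are made up for illustration; they are not real project data.

```python
# Hypothetical four-marker haplotypes, invented for illustration only.
members = {
    "M1": (13, 24, 14, 11),
    "M2": (13, 23, 15, 10),
    "M3": (13, 24, 15, 10),
    "M4": (13, 23, 14, 11),
}

def column_sorted(members):
    """Order members by marker value, column by column (left to right)."""
    return sorted(members, key=lambda name: members[name])

def differing_markers(a, b):
    """Count the markers on which two haplotypes have different values."""
    return sum(x != y for x, y in zip(a, b))

order = column_sorted(members)
print(order)

# M4 and M1 differ on a single marker, yet the sort places M2 between them,
# because M2 shares the value 23 in the second column with M4.
print(differing_markers(members["M4"], members["M1"]))
print(differing_markers(members["M4"], members["M2"]))
```

In this toy example the two members who differ on only one marker are not listed next to each other, simply because that one difference happens to fall in an early column.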

The Grouping Process
In my Results Spreadsheet, the project members are organised into specific sub-branches, with a number of ungrouped members at the end of the spreadsheet. The 6 distinct branches (A to F) identified thus far are relatively clearly defined. The grouping process relies on several distinct pieces of information, namely ...
1) Known Relationship
2) Downstream SNP markers
3) Genetic Distance (GD) & GD Demarcation
4) Y-STR Signatures (relatively unique marker values)

1. Known Relationship
Several members are known to be related (highlighted in yellow) and can therefore be grouped together. For example, members G-75 (371160, MG) and G-91 (438302, DG) are known to be second cousins.

Known Relationships (yellow highlight) & Downstream SNPs (red & orange highlight)

2. Downstream SNP markers
The first four branches include members who have tested positive for specific SNP markers (namely A660, Y16880, BY5706, & BY5707). These SNP markers help place these specific members on specific branches. They also help anchor the upper reaches (the earlier, more distant parts) of the tree. Of note, Alex Williamson's Big Tree places Branch D members G-05 (86192, RLG) & G-68 (411177, CG) on the same branch, and so I have done the same in the Results Spreadsheet.

Gleeson Lineage II on Alex Williamson's Big Tree

3. Genetic Distance & GD Demarcation
Some of the members who have not tested for downstream SNPs can still be confidently grouped together. For example, in Branch E, member G-84 (412320, THG) has a Genetic Distance of 1/37 to G-75 & G-91 (known second cousins). In addition, there is a clear demarcation between the Genetic Distance these three people have to each other and the Genetic Distance they have to the rest of the group. In the last line in the text box in the diagram below, you can see that the Genetic Distance jumps from a value of 0 or 1 for all three of them to a value of 3 or 4, demarcating them from the rest of the members in the DNA Project. This information, in combination with relatively unique marker values that they all share, supports their being grouped together.

GD & GD Demarcation
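
The idea of a GD demarcation can be sketched in a few lines of Python. This is a simplified illustration, not the calculation FTDNA performs: it assumes Genetic Distance is just the sum of the step differences at each marker (FTDNA's own figure may treat certain multi-copy markers differently), and the labels and five-marker haplotypes are hypothetical.

```python
def genetic_distance(a, b):
    """Simplified GD: sum of absolute step differences across markers."""
    return sum(abs(x - y) for x, y in zip(a, b))

def demarcation(target, haplotypes):
    """Find the largest jump in GD from `target` to everyone else.

    Returns (names on the near side of the jump, size of the jump).
    """
    distances = sorted(
        (genetic_distance(haplotypes[target], values), name)
        for name, values in haplotypes.items()
        if name != target
    )
    best_gap, cut = 0, len(distances)
    for i in range(1, len(distances)):
        gap = distances[i][0] - distances[i - 1][0]
        if gap > best_gap:
            best_gap, cut = gap, i
    return [name for _, name in distances[:cut]], best_gap

# Hypothetical haplotypes, invented for illustration only.
haplotypes = {
    "P": (13, 24, 14, 11, 12),
    "Q": (13, 24, 14, 11, 12),  # GD 0 from P
    "R": (13, 24, 14, 11, 13),  # GD 1 from P
    "S": (13, 23, 15, 10, 12),  # GD 3 from P
    "T": (14, 23, 15, 10, 12),  # GD 4 from P
}

close, gap = demarcation("P", haplotypes)
print(close, gap)
```

Here the distances from P run 0, 1, 3, 4, so the largest jump falls between 1 and 3 and separates Q and R from S and T, much like the jump from 0 or 1 to 3 or 4 described above.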

4. Y-STR Signature (relatively unique marker values)
If you look carefully at the spreadsheet, you can see certain patterns associated with specific sub-branches. Thus, for example, Branch B is the only sub-branch that has a value of 15 for marker 30 (aka DYS 456). And Branch E has distinctive values of 17, 14 & 17 for markers 23, 31 & 32 (aka DYS 464b, 607, & 576). A similar distinctive pattern of marker values appears in Branch F, with values of 10, 17, 9, 9, & 17 for markers 4, 13, 14, 15, & 32. These distinctive Y-STR Signatures (together with Genetic Distance data) help to group these people together into their relevant sub-branches. This process is explained in some detail in the video below.

Unique marker values (outlined in bold) define a unique Y-STR Signature for Branch F
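
Spotting a Y-STR Signature by eye can be mimicked with a short Python sketch. It uses a stricter rule than the "relatively unique" judgement described above: a marker only counts if every member of the branch shares the value and nobody outside the branch has it. The branch names and four-marker haplotypes are hypothetical.

```python
def branch_signature(branches, branch):
    """Return {marker position: value} for markers where all members of
    `branch` share one value that no member of any other branch has."""
    inside = branches[branch]
    outside = [
        haplotype
        for name, group in branches.items()
        if name != branch
        for haplotype in group
    ]
    signature = {}
    for i in range(len(inside[0])):
        values = {haplotype[i] for haplotype in inside}
        if len(values) == 1:
            value = values.pop()
            if all(haplotype[i] != value for haplotype in outside):
                signature[i + 1] = value  # marker positions count from 1
    return signature

# Hypothetical branches, invented for illustration only.
branches = {
    "X": [(13, 24, 15, 11), (13, 24, 15, 10)],
    "Y": [(13, 23, 14, 11), (13, 24, 14, 11)],
}

print(branch_signature(branches, "X"))
print(branch_signature(branches, "Y"))
```

In practice a strict rule like this would miss signatures where one member has a later mutation on a signature marker, which is why the manual process also leans on Genetic Distance and SNP data.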

Ungrouped people
There are a number of people who have not been confidently allocated to sub-branches as yet. Most of these have only tested out to 37 markers. For these people, using Fluxus software (which I used to generate the first version of the Mutation History Tree) or Dave Vance's SAPP programme or Robert Casey's methodology can help give some indication of where they might sit, and we will look at that in a subsequent post. But ultimately, additional SNP marker testing will provide definitive answers for these particular individuals. And because we already have a lot of information from the people who have previously undertaken the Big Y test, a cost-efficient SNP-testing strategy can be devised for the rest of the group and future members.

Once the members have been grouped into sub-branches, the next step is to see how these various sub-branches are placed on the larger Tree of Mankind. More on that in the next post.

Maurice Gleeson
Dec 2016

The Mutation History Tree for Lineage II (L2 MHT)
