The Gleason / Gleeson DNA Project: 2017

Friday 17 November 2017

FTDNA Holiday Sale until Dec 31 2017

FamilyTreeDNA have launched their Annual Holiday Sale. This runs from the last day of the Annual FTDNA Conference (Nov 12th 2017) until the end of the year. So now is the time to buy FTDNA tests and take advantage of some of their lowest prices ever. They also make perfect Birthday, Thanksgiving & Christmas gifts for friends and family.

2017 Holiday Sale Discounts

There are discounts on many of their products including upgrades on mtDNA and Y-DNA. The discounts represent approximately a 10-30% reduction from the usual price.

There is a special offer regarding the Big Y test. The usual price is $575 but there is a $100 discount in the sale. Further discounts are possible with the vouchers described below. But everyone who buys a Big Y test will automatically get a FREE upgrade to the Y-DNA-111 test. So if you have only tested your Y-DNA to the 37 marker level, buying the Big Y will get you a free upgrade to 111 markers (which would normally cost you $188).

Even if you haven't done a Y-DNA-37 test yet, you can order it at the Sale Price, and use a voucher for a further discount, and then once it has registered on the system, you can order the Big Y test and get the $100 Sale Price discount, and any additional voucher discount, and a free upgrade to 111 markers. This is a very good deal indeed!

So if you were very lucky, you could get the Y-DNA-37 for $109 (using a $20 voucher) plus the Big Y for $375 (using a $100 voucher) and the free upgrade to 111 markers. This would normally cost $169 + $575 + $188 = $932 but you would be getting it for $484. This is only 52% of the price you would normally pay.
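The arithmetic behind this scenario can be checked with a short sketch. The prices are the ones quoted in this post; the variable names are illustrative only.

```python
# Prices quoted in the post (US dollars); names are illustrative only.
usual = {"Y-DNA-37": 169, "Big Y": 575, "Y-DNA-111 upgrade": 188}
paid = {
    "Y-DNA-37": 109,            # sale price less a $20 voucher
    "Big Y": 575 - 100 - 100,   # $100 sale discount plus a $100 voucher
    "Y-DNA-111 upgrade": 0,     # free with the Big Y during the sale
}

usual_total = sum(usual.values())
paid_total = sum(paid.values())
fraction = paid_total / usual_total

print(f"usual ${usual_total}, paid ${paid_total}, {fraction:.0%} of the usual price")
```

Changing the voucher amounts in `paid` shows how much of the saving depends on being lucky with the weekly vouchers.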

As mentioned above, you can use Holiday Reward vouchers to lower the sale prices even further. These will be issued every Monday until the end of the Sale but each voucher only lasts for 7 days so you have to use them quickly. In effect, this may reduce the cost of the Family Finder atDNA test to $49 and Y-DNA-37 to $109.

A $20 voucher for the Y-DNA-67 test

To access your voucher, simply log on to your FTDNA account and click on the Holiday Reward icon on your home page. If you make a purchase during the Sale, you frequently get a Bonus Reward as well. This gives further discounts on other tests.

And if you want to use the voucher for yourself, simply click on the Enjoy Rewards button and the product will be added to your Cart and the discount applied. Alternatively you can give the voucher to friends or family by clicking on the Share Rewards button. Each voucher can only be used once, and must be used before the weekly deadline.

A lot of people donate any vouchers they are not using so check the ISOGG Facebook group and Genetic Genealogy Ireland Facebook group for any unused vouchers that you might be able to take advantage of. Be warned, they go fast so you might have to try several before you find one that works.

Enjoy the Sale!

Maurice Gleeson

Nov 2017

Tuesday 15 August 2017

Version 3 of the Mutation History Tree for Lineage II

Below is the updated version of the Mutation History Tree for Gleeson Lineage II (the North Tipperary Gleesons). Previous versions were published in December 2015 (version 1) and December 2016 (version 2). A pdf version of the tree can be downloaded from Dropbox via this link ... L2 MHT v3a

To see where you sit in the tree, find your G-number from the table at the bottom of this post (taken from our WFN Results page).

So what does it tell us?

The Gleeson Lineage II family tree currently has 11 major branches. And there are likely to be a lot more.
It looks like the Gleeson surname has been around for quite some time. The first branch to split off was Branch F (far right). This is a pretty ancient branch and dates (very roughly) from about 1050 AD, not too far away from the presumed date of origin of the Gleeson surname.
There are probably some branches that have simply died out over the passage of time ... and what we are seeing here is simply a modern day snapshot of the remains of the "clan" that once was. In times past, some of the branches might have been much more prominent, and others much less prominent.
Age estimates of the branching points are very crude because the dating methodology has severe limitations. It is hoped that these can be improved with time.
Some branches are associated with a particular area or townland in North Tipperary (e.g. Branch C1 - Garryard; Branch E - Curraghneddy). It is hoped that as more people join the project and supply their MDKA information (particularly birth location) that more and more branches will be associated with specific locations. This in turn will help members with their individual genealogical research.

The Tree

The Pedigrees (and Key)

The (previously) Unique SNP markers
(a high-quality pdf version is also available for download)

The tree consists of several parts:

the tree itself, illustrating the branching pattern based on SNP & STR marker data
the pedigrees associated with each member in the tree (plus details of their MDKA / EKA)
a key to the tree, and numbered footnotes
the unique SNPs identified for those members who undertook the Big Y test

The tree has expanded considerably since the last version. The results of the tenth Big Y test are now included (from our Clan Gathering Chairman, Michael G. Gleeson). These came back from the lab in late December 2016 and underwent additional analysis by Alex Williamson for inclusion in the Gleeson portion of his Big Tree. These results confirmed the existence of Branch F (which had previously been merely predicted to exist on the basis of STR marker data). They also split up the "A5629 SNP Block" which up to that point consisted of 4 SNP markers. Thereafter it was split into an upstream branch characterised by the SNP A5631, and a downstream SNP block characterised by the 3 SNPs A5627, A5629, & A5630.

These results made A5631 the apparent over-arching Gleeson-specific SNP for Lineage II (i.e. only Gleesons have been discovered to share this particular SNP marker). Thus, A5631 could be the DNA marker that defines membership of the larger Gleeson "Clan".

Lineage II Gleesons on the Big Tree illustrating the old "A5629 SNP Block"

(from Nov 2016)

The current version of the Lineage II Gleeson portion of the Big Tree
showing how the previous A5629 SNP Block is now split in two (Aug 2017)

In addition to the 10th set of Big Y results, fifteen people expressed an interest in doing the new Z255 SNP Pack and the results of 13 of these people have now come back from the lab. This revised SNP Pack contains almost 50 SNP markers that are either shared only by Lineage II members or are unique to Lineage II members, and represents over 95% of all shared and unique Lineage II-specific SNPs (see this previous blog post). So the Pack is very specific for Lineage II. These SNP markers were identified via the 10 Big Y tests previously undertaken by our project members and were incorporated into the revised SNP Pack by the team at FTDNA.

A review of some preliminary results of these SNP Pack tests was discussed in a previous blog post. The updated results are included in a table at the bottom of this post.

The data from these 13 sets of new results have been added to the tree and as a result, the branching pattern has expanded considerably. The previous version of the tree consisted of 6 branches (known or predicted) but the new version contains 11 branches:

Branch A has been split in two (A1 & A2) and two new members added (see red G-numbers: G95 & G113).
Branch B has remained intact and has gained a new member (G107).
Branch E was previously thought to be more closely related to Branch B but the new SNP Pack results indicate that it is in fact more closely related to Branch C. Thus Branch E's attachment to the tree has been changed.
Branch C has been split into two (C1 & C2) - the latter has gained a new member (G89) thanks to the new SNP Pack results.
Branch D has also split into two (D1 & D2). This is not a big surprise as the anticipated common ancestor of the original 2 members of this branch was some 14 generations ago. This branch has also gained a new member (G106) due to the SNP Pack results.
Branch F has also remained intact and has gained a new member (G104), again due to the new SNP Pack results. This is an unusual branch and appears to be the oldest branch in the project so far. Its connection to the rest of the group is some 30-32 generations ago, approximately 1050 AD, taking it very far back in time, almost to the predicted origin of the Gleeson surname.
Branch G is a new branch within the tree. It consists of just two people and they are not particularly closely related to each other. Both tested with the new Z255 SNP Pack and only tested positive for the more upstream Lineage II SNP markers (A5631 & the A5627/29/30 SNP Block). This too is a relatively old branch and its connection to the rest of the tree is some 25 generations ago (about 1200 AD).
Branch H is also a new branch and may be a similar age to Branch G (i.e. about 25 generations ago). However, the members of this branch have tested positive for marker BY5706 (which is one step further downstream than Branch G). None of the 4 members in this branch are particularly closely related, so I would expect this branch to split up into further sub-branches in due course.

Version 1 of this Mutation History Tree contained 16 of the project members of Lineage II, version 2 placed 20 of the 31 members (65%) on the tree, and version 3 is the most comprehensive to date and contains 32 of the 36 members currently in Lineage II (89%). The remaining 4 members cannot be placed with reasonable accuracy and will require further testing to enable placement.

Altogether, of the 36 members in Lineage II, 23 (64%) have downstream SNP data available - 10 via the Big Y test, and 13 via the new Z255 SNP Pack. The SNP Pack proved to be a great success and an 89% placement rate is quite impressive. The placement rate increased from 65% to 89% as a result of the SNP Pack testing.

Interestingly, some members were sufficiently closely related to other members of the group that SNP testing was not necessary. In some cases a definite relationship was already known, and in other cases the STR-based Genetic Distance was sufficiently close that placement was possible ... with reasonable confidence. The caveat here is that there may be a degree of Convergence obscuring the true relationship between certain members. And as a result, some people who have not undergone SNP-testing may need to be moved onto a different branch in the future.

There were several questions that I had hoped the revised Z255 SNP Pack testing would answer:

Are 10 Big Y tests enough to identify all/most of the downstream SNPs associated with Lineage II?
How many future members are likely to be placed on the tree by just using the revised Z255 SNP Pack?
Will there be a need for future Big Y testing within the group? Or has the testing undertaken by group members so far helped reduce the cost for future members?

Now that the results of the SNP Pack testing are in, we can look at these questions one by one and see to what extent we have an answer for each.

The 10 Big Y tests certainly did identify a lot of the downstream branches of the tree, but not all of them. If we take "downstream" to mean (crudely) less than 18 generations ago (i.e. less than 600 years), then between the Big Y testing and the Z255 SNP Pack testing, six (6) downstream branches were identified (Branches A1, B, E, C1, C2, F). The remaining 5 branches did not have a "sufficiently downstream" SNP identified (Branches A2, D1, D2, G, H).
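The conversion between generations and calendar years used in this post can be sketched as follows. The only figure taken from the text is that 18 generations corresponds to roughly 600 years; the reference year and the function name are assumptions made for illustration.

```python
# Assumption: dates are counted back from the year of writing.
REFERENCE_YEAR = 2017

# From the text: 18 generations is treated as roughly 600 years.
YEARS_PER_GENERATION = 600 / 18

def approx_year(generations, years_per_gen=YEARS_PER_GENERATION, ref=REFERENCE_YEAR):
    """Very rough calendar year (AD) for a branching point."""
    return round(ref - generations * years_per_gen)

for gens in (18, 25, 31):
    print(gens, "generations ->", approx_year(gens), "AD")
```

With these assumptions, 25 and 31 generations land within a century of the "about 1200 AD" and "approximately 1050 AD" estimates given above for Branches G and F, which is well inside the uncertainty of the dating methods.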

Also, the exercise identified new branches that were not predicted from the original Big Y testing. It is therefore likely that additional new branches will continue to be identified over time as more people join the project and undertake SNP testing.

So, although the SNP Pack testing did provide a lot of additional useful information, and has improved the structure of the Mutation History Tree, its coverage of "downstream SNPs" (using the arbitrary threshold of approximately 18 generations) is only about 50%. This fact alone indicates that there will be a need for Big Y testing in the future, but perhaps much more selectively (thus saving money for project members).

Now that the structure of the tree is quite developed, and as it continues to "mature", it will become easier and easier to place future members on a particular branch of the tree and in many instances will obviate the need for SNP testing. At this stage it is difficult to know how often this will happen.

For future members who are not easy to place, the options will be a 67 or 111 STR upgrade, the Z255 SNP Pack, or the Big Y test. In most instances, the SNP Pack might be the test of choice if the new member appears to be a possible match to one of the "downstream branches". But if it is not possible to place the new member anywhere on the existing tree, then the Big Y test might be preferred.

Accurately dating when each branch arose remains a problem and there are several reasons for this:

In order for the dating to be accurate, the branching structure must be accurate. And for some people there is insufficient data to place them confidently on the tree. In such cases, it may be necessary to upgrade to 67 or 111 STR markers, or do the Z255 SNP Pack, or do the Big Y test.
There is an inherent problem with any dating methodology used. Statistically, it may produce very accurate results. But from a genealogical perspective, the results are very inexact. Even at the 111 STR marker level, one can often expect to find a range of +/- 300 years on either side of the midpoint estimate. The same is true for dating using SNPs.
Dating using STRs appears to work best for people who are relatively closely related (say within the last 500 years) and dating using SNP markers may be more "exact" for people who are related 500-1000 years ago. Only further research will help clarify this.
FTDNA's TiP tool uses proprietary information and its methodology is not public knowledge. As a result there is no way of checking the science behind it. It may be that its estimations are incorrect. Last year (2016) the algorithms were adjusted and new TMRCA estimates were generated for the same results. But there is no way of knowing if this was an improvement or not. I suspect that the TiP tool may underestimate the age of more distant (upstream) branching points because it does not accurately take into account the extent of parallel and back mutations.
Dave Vance's SAPP programme uses Ken Nordtvedt's Interclade Ageing methodology. I don't know much about this method but it may be a better way of using STR data to estimate TMRCA. And as the SAPP Programme is automated, it takes a lot of the hard work out of the calculations. Potentially.
Ultimately, dating the branching points will involve a mixture of the above techniques and the best that can be achieved may simply be a "best guess".

So the take home message is that all time points in the tree should only be taken as a very rough guide.

As the tree grows and expands, more and more people will be able to use it to help their own genealogical research. Already we are making connections and breaking down Brick Walls for members in Branches B and C1.

More will follow in time.

Maurice Gleeson

Aug 2017

The members of Gleeson Lineage II (from the WFN Results page)
... find your G-number above and then locate yourself on the tree

Below is the revised spreadsheet of the results of the recent Z255 SNP Pack testing. The previous blog post only included 12 sets of results - the 13th set of results effectively split Branch D into two separate branches.

(a high-quality pdf version can be downloaded via this Dropbox link)

Note that some SNP markers have more than one name (e.g. A5631 is also called Y17108). This confusing situation arises because different institutions give the same SNP different names. The best place to see which SNPs have alternative names is to go to the Gleeson portion of the tree on YFULL. Just search for A5631 (use Cmd+F on a Mac or Ctrl+F on a PC). Note that the YFULL tree does not have as many datapoints as the Big Tree or FTDNA's haplotree.

Thursday 25 May 2017

Convergence - quantifying Parallel & Back Mutations (Part 1)

In a recent post I explored the concept of Convergence and made the point that the mechanism by which Convergence arises is via a combination of Parallel Mutations and Back Mutations in the STR marker values. These mutations are changes that occurred at some time in the past but because they remain hidden to us in the present, we cannot tell when they occurred or how frequently they occurred just by looking at two sets of STR results from people living today.

However, there is a way around this problem. Or at least a partial solution.

By using a combination of STR data and SNP data we have been able to build a Mutation History Tree for the North Tipperary Gleesons (Lineage II of the Gleason DNA Project). This tree is a "best fit" tree, by which I mean a tree constructed in such a way as to explain the STR & SNP data in the most parsimonious way i.e. with the fewest number of branches that will accommodate or "fit" the data. This approach is also called the "maximum parsimony" approach and is often used when building cladograms or phylogenetic trees. The Mutation History Tree (MHT) is simply another type of cladogram.

But a key point here is that this "best fit" tree is likely to change as more data becomes available. And to illustrate this point, I'm going to compare the current version of the tree (Dec 2016) with the next version that is being prepared following the recent availability of new data from 12 sets of Z255 SNP Pack results.

Below is the current version of the MHT for Lineage II. By comparing each mutation in the tree with every other one, we can identify which mutations are Back Mutations (occurring on a single line of descent) and which are Parallel Mutations (occurring on two or more lines of descent). I have highlighted the Back Mutations in yellow and the Parallel Mutations in green.

Back Mutations in yellow, Parallel Mutations in green
from Gleeson Lineage II MHT (version Dec 2016)

Parallel Mutations occur in the following lines of descent:

CDYb 40-39 ... A, E, D, F (4 times)
CDYa 39-38 ... A, B, C, F (4 times)
464c 17-16 ... A x2, D (3 times)
461 12-11 ... A, B (2 times)
576 18-19 ... A, D (2 times)
390 23-24 ... A, B, C (3 times)
390 24-23 ... B, C (2 times)
456 16-15 ... B, D (2 times)
and so on ...
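The counting rule used for this list can be sketched in a few lines: a mutation counts as parallel when the same change appears on two or more lines of descent. The data below is transcribed from the eight entries listed above (the "and so on" entries are not included); the variable names are illustrative.

```python
# Mutation -> lines of descent on which it was observed (from the list above).
# "A" appears twice for 464c 17-16 because it arose on two lines within Branch A.
listed = {
    "CDYb 40-39": ["A", "E", "D", "F"],
    "CDYa 39-38": ["A", "B", "C", "F"],
    "464c 17-16": ["A", "A", "D"],
    "461 12-11": ["A", "B"],
    "576 18-19": ["A", "D"],
    "390 23-24": ["A", "B", "C"],
    "390 24-23": ["B", "C"],
    "456 16-15": ["B", "D"],
}

# A mutation is "parallel" if it occurs on at least two lines of descent.
parallel = {m: len(lines) for m, lines in listed.items() if len(lines) >= 2}
listed_total = sum(parallel.values())

for mutation, times in parallel.items():
    print(f"{mutation}: {times} times")
print("parallel mutations in the listed entries:", listed_total)
```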

Back Mutations are more difficult to count, and to conceptualise. Whether you consider the value as mutating forward or back is entirely dependent on your reference point. If our anchor is the upstream Z255 branch, then the original value of marker 390 (for example) is 24, mutating (forward) to 23 on the Z16438 branch, and then back to 24 (in parallel) on Branches A, B & C, and then back to 23 (again in parallel) on Branches B & C. So there are several points to make here:

this is in fact a Back Mutation that occurs in parallel in 3 separate lines of descent. It is thus both a Back Mutation (relative to its earlier value of 24 on the Z255 branch) and a Parallel Mutation, occurring at (presumably) different time points in Branches A, B & C. It is thus coloured yellow and green.
It can also be considered a Triple Mutation relative to the Z255 branch - in the sense that it mutates forward to 23 then back to 24, then back to 23 again. But what happens if it flips forward and back 5 times? What would we call that? And what do we call it if it goes two steps forward and one step back? This is where terminology fails us. I'm not sure if there is a standardised way of describing these different kinds of mutation (if there is, please leave a comment below).
the mutation 390 24-23 occurs in Branches B & C ... relative to its value of 24 in the Z255 branch, this could be considered a Parallel Forward Back Forward Mutation ... for Pete's Sake!!
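The dependence on the reference point can be made concrete with a small sketch. The labelling rule here is an assumption chosen for illustration (a step is "back" if it moves the marker closer to the reference value, otherwise "forward"); it is not an established standard, and `classify_steps` is a hypothetical helper.

```python
def classify_steps(values, reference):
    """Label each change along one line of descent relative to a reference value.

    Assumed rule: a step is 'back' if it brings the marker closer to the
    reference value, otherwise it is 'forward'.
    """
    labels = []
    for before, after in zip(values, values[1:]):
        if abs(after - reference) < abs(before - reference):
            labels.append("back")
        else:
            labels.append("forward")
    return labels

# Marker 390 as described above: 24 on Z255, 23 on Z16438, 24 again, then 23 again.
path_390 = [24, 23, 24, 23]

# Anchored on the Z255 value of 24.
print(classify_steps(path_390, reference=24))

# Anchored further downstream, where the starting value is 23.
print(classify_steps(path_390[1:], reference=23))
```

The same final step (24 to 23) is labelled differently depending on which anchor is chosen, which is exactly the ambiguity described above.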

But let us just focus on the Back Mutations that occur downstream of the branch characterised by the STR mutation (710 36-37), just above the A5627 SNP Block. This "710 branch" incorporates all the Gleesons of Lineage II, from Branch A to F.* On this overarching branch for Lineage II, the value of the STR marker 390 is 23 and Back Mutations are as follows:

390 24-23 ... B, C ... this is the only Back Mutation below the "710 branch"
And it is also a Parallel Mutation
All the other yellow Back Mutations are relative to the upstream Z255 branch, and not our downstream "710 branch", and so are not counted in this particular exercise.

So, let's generate some statistics from these numbers:

The total number of mutations below the "710 branch" (irrespective of whether they are forward or back) is 71.
There are 69 Forward Mutations (i.e. away from the original value of the relevant marker on the "710 branch")
There are 2 Back Mutations
There are 26 Parallel Mutations
Forward Mutations outnumber Back Mutations by a ratio of 34.5 : 1
Parallel Mutations outnumber Back Mutations by a ratio of 13 : 1
There are 16 people in this tree, and if we make the big assumption that the "710 branch" starts 1000 years ago (i.e. roughly at the time of the introduction of the Gleeson surname), then over the course of 1000 years, the rate of each type of mutation is (crudely) as follows:

Forward Mutations = 69/16 = 4.3125 mutations per "line of descent" per 1000 years
Back Mutations = 2/16 = 0.125 mutations per "line of descent" per 1000 years
Parallel Mutations = 26/16 = 1.625 mutations per "line of descent" per 1000 years
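These crude rates and ratios can be reproduced from the counts given above. The sketch below uses only the numbers in this post; note that Parallel Mutations overlap with the other two categories, so the three counts are not meant to add up to the total of 71.

```python
# Counts below the "710 branch", as given above.
counts = {"forward": 69, "back": 2, "parallel": 26}
lines_of_descent = 16   # people in this version of the tree
# Assumption from the text: the "710 branch" starts about 1000 years ago,
# so each rate is "per line of descent per 1000 years".

rates = {kind: n / lines_of_descent for kind, n in counts.items()}
forward_to_back = counts["forward"] / counts["back"]
parallel_to_back = counts["parallel"] / counts["back"]

for kind, rate in rates.items():
    print(f"{kind}: {rate} per line of descent per 1000 years")
print("forward : back =", forward_to_back)
print("parallel : back =", parallel_to_back)
```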

These are crude estimates but they give some idea of the relative importance of Parallel Mutations compared to Back Mutations. And applying this information to the phenomenon of Convergence, it would seem that Back Mutations play a very minor role compared to Parallel Mutations.

In a subsequent post we will see how these calculations stand up when we add in additional data from 12 SNP Pack results and reconfigure the tree into the next version of the "best fit" model. And we will also attempt to quantify the total number of Back & Parallel Mutations below the upstream marker Z255.

Maurice Gleeson

May 2017

* the Big Y results of a 10th member of the group indicate that this branch is characterised by the SNP A5631 although this result is not reflected in this version of the MHT

Wednesday 24 May 2017

Z255 SNP Pack Results - a first look

Last month (April), 14 members of the Gleason DNA Project underwent testing with the newly revised Z255 SNP Pack. FamilyTreeDNA were very quick to process the requests and the results of 12 of these members have already been returned from the lab.

Below are the top-line results of these first 12 tests. Only the most relevant SNP marker data has been extracted below and individual sections can be enlarged by clicking on the image. A pdf version of the complete data can be downloaded using this Dropbox link here.

To orientate you to the table, the SNP marker results of each of the 12 members are arranged in columns B to M. Each column has the initials, G-number, and kit number for each member. SNP markers highlighted in pink tested positive in that particular individual. The various branches of the Mutation History Tree for the North Tipperary Gleesons (Lineage II) are indicated at the top and at the sides and are coloured in the same colours as in the current diagram of the tree (see below).

So what do the results tell us?

First off, working down the tree from marker Z255, all 12 members tested positive for the following markers: Z255, Z16437, Z16438, and BY2852. All but one person tested positive for BY2853 and BY2854.

Then we arrive at the Gleeson-specific markers, starting off with A5631 - everyone tested positive for this marker, as previously predicted from the results of the 10th Big Y test (from our Clan Gathering Chairman, Michael G Gleeson, 371202). These results came in after the tree diagram below was drawn and effectively confirmed that Branch F (predicted solely from STR data) did in fact exist.

Below A5631, the Gleeson Tree splits into 2 branches. The first of these branches is Branch F and the first impressive results of the SNP Pack testing have revealed that 2 of the 12 members (G-104, G-97) belong to this branch. Their SNP data has helped define a new SNP block below A5631 consisting of the following SNPs (which up until now were "Private SNPs", present only in our Chairman, MGG-371202):

BY14189
BY14193
BY14194
BY14195
BY14197

The other branch below A5631 is characterised by a SNP block consisting of 3 SNPs: A5627, A5629 & A5630. These 3 SNP markers are shared by Branches A through E in the Lineage II group. The remaining ten of the 12 members tested positive for these 3 SNPs, with one exception: member G-75, whose result came back negative for SNP A5627. This could be a back mutation in this SNP, and so FTDNA are retesting this single SNP marker to be sure.

Current version of the MHT for Gleeson Lineage II

Below the A5627 Block, there are two branches:

A5628, which in turn splits into Branch A and Branch B ... and possibly Branch E
BY5706, which in turn splits into Branch C and Branch D ... and possibly Branch E

From the above, you will see that the placement of Branch E on the current Gleeson Tree is in some doubt. This is because we have had no SNP data available for Branch E up until now and its placement in the tree has been based on STR data alone. Its ambivalent placement became apparent when we ran Dave Vance's SAPP programme using the STR data and this showed that it could equally be predicted to be nearest to Branch B as Branch C - this was discussed in a previous post (Dec 2016). However, a second major benefit of the SNP Pack testing is the revelation of the correct placement of Branch E.

Member G-75 (MG, 371160) from Branch E tests positive for the SNPs BY5706, BY5707, BY5708 & BY5709.
The first (BY5706) is common to both Branches C & D, and the latter three are SNPs that characterise Branch C.
Thus Branch E is now identified as a sub-branch of Branch C.

And on the topic of Branch C, two of the 12 members (G-89 & G-22) tested positive for Branch C SNP markers (BY5707, BY5708 & BY5709). Not only that, but their results effectively split Branch C into 2 different sub-branches: C1, C2

C1 - characterised by the SNP A13116 and shared by members G-89 and G-66
C2 - characterised by the SNP Block A13110, A13112 & A13113, and shared by members G-22 & G-71

This new categorisation leaves member G-71 with the Private SNPs A13111 and FGC19590; and leaves member G-66 with the Private SNPs A13114 & A13115.

None of the 12 members tested positive for the SNPs characterising Branch A, Branch B or Branch D. However ...

1 member tested positive for A5628 and therefore sits on a new branch adjacent to Branch A and Branch B
4 members (G-81, G-108, G-77 & G-18) tested positive for BY5706 but no downstream markers - they therefore sit on a new branch (or branches) adjacent to Branch C and Branch D
2 members (G-94 & G-110) only tested positive for the A5627 SNP Block and nothing further downstream. They therefore sit on a new, relatively upstream branch (or branches) on the Tree.
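The placement logic used throughout this post (find the most downstream SNP for which a member tests positive) can be sketched as below. The parent/child relationships are taken from the description in this post and cover only the Gleeson-specific part of the tree, with each SNP block represented by one of its SNPs; the dictionary and function names are hypothetical.

```python
# Parent of each SNP (or SNP block, named by one of its SNPs); None marks the top.
PARENT = {
    "A5631": None,
    "BY14189": "A5631",   # Branch F block
    "A5627": "A5631",     # A5627 / A5629 / A5630 block
    "A5628": "A5627",     # leads to Branches A and B
    "BY5706": "A5627",    # leads to Branches C and D
    "BY5707": "BY5706",   # Branch C block (BY5707 / BY5708 / BY5709)
    "A13116": "BY5707",   # C1
    "A13110": "BY5707",   # C2 block
}

def depth(snp):
    """Number of steps from this SNP up to the top of the modelled tree."""
    steps = 0
    while PARENT[snp] is not None:
        snp = PARENT[snp]
        steps += 1
    return steps

def most_downstream(positive_snps):
    """Return the deepest modelled SNP among a member's positive results."""
    known = [snp for snp in positive_snps if snp in PARENT]
    return max(known, key=depth) if known else None

# Member G-75: positive for BY5706 to BY5709, negative for A5627 itself.
g75 = ["A5631", "A5629", "A5630", "BY5706", "BY5707", "BY5708", "BY5709"]
print(most_downstream(g75))

# A member positive only for the A5627 block and nothing further downstream.
print(most_downstream(["A5631", "A5627", "A5629", "A5630"]))
```

This simple version relies on positive results only, so a single negative result within a block (as with G-75 and A5627) does not prevent placement.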

In the next post we will look at the revised Mutation History Tree for Gleeson Lineage II, incorporating these new SNP Pack results.

We'll also look at what this SNP Pack testing has told us about the nature of the evolution of the Gleeson surname and how it has helped individual members with their genealogical research.

Maurice Gleeson

May 2017

Saturday 20 May 2017

Convergence - what is it?

There are several phenomena encountered in the analysis of Y-DNA STR data that can throw a genetic spanner in the works, and Convergence is one of them!

In genetic genealogy, Convergence occurs when two men have DNA signatures that are exactly or nearly identical, but have evolved that way purely by chance. As a result, the two men will show up in each other's lists of matches and will give the false impression that they may be closely related (e.g. within the last several hundred years) when in fact they are much more distantly related (e.g. within the last several thousand years). The problem is we cannot tell that Convergence has occurred simply by looking at the two men's STR results. It is hidden from our view. We cannot see it just by looking at the present-day STR data. And the danger is that if the two men think they are closely related, they may start chasing their common connection, thinking that they will find the answer via further documentary research, when in fact there is little hope of that at all. Their "close match" is a red herring. And their pursuit of the Common Ancestor is a wild goose chase.

So what can we do about it? How can we recognise it? How can we avoid it wasting our precious research time?

Confusion

The concept is occasionally discussed in Facebook groups or on various blogs, but there tends to be quite a lot of confusion around what it actually means. And there are a variety of quite understandable reasons for this.

Firstly, there isn't a standard definition for Convergence, so how it is used varies from person to person. Some people apply it only to exact matches, others apply it to exact and close matches. Moreover, the concept of Convergence is closely tied up with the concept of lack of Divergence. They are different phenomena, but their effects and consequences are very similar. Another contributing factor is the fact that it is difficult to see it or detect it in practice. We know that it exists, but we have no way of identifying it just by comparing two sets of STR results. In other words, it's largely a hidden phenomenon (like Black Holes). It is only when we do SNP testing that the extent of Convergence becomes apparent. And the problem is that not enough people have done SNP testing.

The good news is that more and more people are doing SNP testing and as they do, the extent of Convergence becomes more apparent. The Lineage II members in the Gleason DNA Project are trailblazers in this regard and we will explore the results of the recent Z255 SNP Pack testing in subsequent blog posts.

But in this post, we will look at an example of Convergence from the Gleason DNA Project in order to illustrate some of the key characteristics and consequences of Convergence. In later posts, we will look at clues that may indicate that Convergence is present, attempt to quantify the number of Back Mutations & Parallel Mutations that occur over time (using the Mutation History Tree that we have previously constructed for Lineage II - the North Tipperary Gleesons), and finally we will attempt to quantify Convergence itself.

But first of all, let's look at some of the aspects of the definition of the term.

Definition

A general definition for the term convergence from the Concise Oxford English Dictionary illustrates some general characteristics of convergence that are worth exploring because they are of relevance to how the term is applied in genetic genealogy and to the analysis of Y-DNA STR data in particular:

converge 1. come together from different directions so as eventually to meet
convergent 2. Biology (of unrelated animals and plants) showing a tendency to evolve superficially similar characteristics ...

There are several important aspects to these definitions that we can apply to the analysis of STR data (e.g. your 37 marker data). First of all, the sense that things were initially apart, but then they come together. Secondly, the idea that two things can look the same or similar on the surface, but in fact they have come from very different directions. And thirdly, the idea that two things can evolve from something different into something the same.

Let's look at how this more general concept can be applied to the analysis of Y-STR data.

And a good starting point is the description of Convergence on the ISOGG Wiki:

Convergence (also known as evolutionary convergence) is a term used in genetic genealogy to describe the process whereby two different genetic signatures (usually Y-STR-based haplotypes) have mutated over time to become identical or near identical resulting in an accidental or coincidental match.

One can think of convergence as producing misleading matches – two men appear to be more closely related than they actually are. The same situation may result (very occasionally) if there is an exceptional lack of divergence. In other words, so few mutations occurred in the descendants of a common ancestor over the course of time that the common ancestor may appear to have lived only a few hundred years ago when in fact he lived much further back than that, perhaps several thousand years ago.

So let's pick apart some of the key elements of this definition. You might like to refamiliarise yourself with some basic concepts, such as the different types of DNA markers (STRs and SNPs), and what you are actually seeing when you look at the DNA Results page.

Basic Concepts

Firstly, the above description of Convergence refers to the genetic signature - the Y-STR haplotype. This is the string of numbers you see associated with your results on the DNA Results page of the project. I like to think of it as if the Y-chromosomes of all the men in the group were stacked up on top of each other, in such a way that the individual markers along the chromosome were aligned, with one column for each marker. Thus in the diagram below, each of the men has a value of 13 for the first marker. The values for the second marker are a mixture of 23 and 24. And so on.

The Y-STR results for the men of Lineage II
(click to enlarge)

Another key point in the above description is the concept that some markers mutate over time e.g. the number changes from 14 to 15. These mutations are identified by comparing the value in each square to the modal value for the entire group (i.e. the most frequent value among the men in that group). The most frequent values for each of the markers are used to generate the "modal haplotype" which is a virtual signature constructed from these most frequent values (and is represented by the row marked "MODE", the 3rd row from the top in the diagram above).

Mutations are indicated by coloured squares. If the value for any marker is the same as the modal value for that marker (i.e. the most common value among the men in that group), then the square that the value is in will not have a colour. If however, the value is higher than the norm, it will be coloured pink; if it is lower than the norm, it will be coloured purple.

If you and someone else have exactly the same string of numbers, you will have the same coloured squares and the same "no-colour" squares. If you are not exactly identical, you will have some coloured squares that the other person does not have ... and vice versa. In other words, the sequence of numbers, and hence colours, will be different. Each coloured square represents a mutation - a small increase or decrease in the number (compared to the norm) for that particular marker, in that particular individual.
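To make the idea of the modal haplotype and the coloured squares concrete, here is a minimal sketch in Python. The four men and their five marker values are invented purely for illustration (they are not taken from the Lineage II results), and the function names are my own; the pink/purple rule is the one described above.

```python
from collections import Counter

# Hypothetical 5-marker haplotypes (values invented for illustration only).
haplotypes = {
    "man_1": [13, 24, 14, 11, 15],
    "man_2": [13, 23, 14, 11, 15],
    "man_3": [13, 24, 15, 11, 14],
    "man_4": [13, 24, 14, 10, 15],
}

def modal_haplotype(rows):
    """Return the most frequent value in each marker column."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*rows)]

def colour_row(row, mode):
    """Label each value: 'pink' if above the mode, 'purple' if below, '' if equal."""
    colours = []
    for value, modal_value in zip(row, mode):
        if value > modal_value:
            colours.append("pink")
        elif value < modal_value:
            colours.append("purple")
        else:
            colours.append("")
    return colours

mode = modal_haplotype(list(haplotypes.values()))
print("MODE ", mode)
for name, row in haplotypes.items():
    print(name, colour_row(row, mode))
```

In this made-up group, man_3 ends up with one pink and one purple square, so he sits two steps away from the modal haplotype.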

Convergence in theory

Let's imagine that some distant ancestor living 10,000 years ago gave rise to four distinct lines of descent surviving today (represented by the men A, B, C, and D in the diagram below). Let's look at what happened to their first 37 STR markers over time, and let's assume that mutations only occurred in 5 of these STR markers, as shown in the diagram below. How did the values change over the passage of time, from 10,000 years ago to the present day? And how many of the descendants of this ancestor "match" each other today?

In descendant A, only one of these 5 STR markers mutated. It underwent a single mutation (from 13 to 14) about 6000 years ago, and that was the only mutation over the span of 10,000 years. This is a rather extreme example of "lack of Divergence".

Descendant B had several mutations in his line of descent, but only affecting the first and the fifth markers. These show progressive "forward mutations" away from their original values. With the first marker, the mutations go forward in an upward direction (14,15,16,17) whilst with the fifth marker they go forward in a downward direction (15,14,13,12). This latter may seem counterintuitive but it serves to emphasise that "forward" means "away from" the original value, no matter if it is up numerically or down numerically.

Descendant C also has experienced mutations in only the first and fifth marker. But here we see two examples of a Back Mutation. The first marker shows a forward mutation 6000 years ago (13 becomes 12) but this has gone back to 13 by 4000 years ago. It then undergoes another forward mutation by the time of the present day (13 to 14). Similarly, the fifth marker undergoes a forward mutation (16 to 17) by 4000 years ago but a Back Mutation by 2000 years ago.

Descendant D undergoes mutations on all 5 of his STR markers. A Back Mutation occurs with the second marker between 2000 years ago and the present day (15 to 14); and likewise with the third marker (12 to 13); and likewise with the fifth marker (17 to 16). Two Back Mutations occur with the fourth marker (29 to 30 by 4000 years ago; and 31 to 30 by the present day).

Mutations over time in 4 distinct lines of descendants

Remember, these are four distinct lines of descent, with the MRCA (Most Recent Common Ancestor) represented by the first row of 5 STR markers in the diagram above. So now let's look to see if any of the mutations that occurred in these four individual lines of descent occurred in parallel i.e. the same mutational change occurred in two completely separate lines of descent.

Have a look at the first marker in A, B and C. All three men developed the same mutation on this marker - a change from a value of 13 to 14. In Lines A and B this change occurred in parallel around 6000 years ago. In Line C, the change occurred in parallel around about the present day.

There is a similar parallel mutation between Line C and D. Look at the fifth marker - it increases in value from 16 to 17 around about 6000 years ago in Line D and 4000 years ago in Line C.

And there is a parallel back mutation present in Lines C and D also - the fifth marker switches from 17 to 16 about 2000 years ago in Line C and around about the present day in Line D.

With Back Mutations you are only looking at a single line of descent. With Parallel Mutations we are comparing two or more lines of descent. And we will see that in practice Parallel Mutations are much more common than Back Mutations and have a much greater role to play in the development of Convergence.
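The distinction can be sketched in a few lines of Python. The list below holds a subset of the mutation events described above for Lines A to D (line, marker, old value, new value); the function name and the data layout are my own assumptions. A change counts as a Parallel Mutation here if the same marker makes the same change in more than one line of descent.

```python
from collections import defaultdict

# A subset of the mutation events described in the text:
# (line of descent, marker number, old value, new value)
events = [
    ("A", 1, 13, 14),
    ("B", 1, 13, 14),
    ("C", 1, 13, 14),
    ("C", 5, 16, 17),
    ("D", 5, 16, 17),
    ("C", 5, 17, 16),
    ("D", 5, 17, 16),
    ("B", 1, 14, 15),  # seen in one line only, so not a parallel mutation
]

def parallel_mutations(events):
    """Group events by (marker, old, new) and keep changes seen in 2+ lines."""
    lines_by_change = defaultdict(set)
    for line, marker, old, new in events:
        lines_by_change[(marker, old, new)].add(line)
    return {
        change: sorted(lines)
        for change, lines in lines_by_change.items()
        if len(lines) > 1
    }

for (marker, old, new), lines in parallel_mutations(events).items():
    print(f"marker {marker}: {old} to {new} in Lines {', '.join(lines)}")
```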

The STR results of people living today tell us nothing about their evolutionary history

Which brings us to Convergence itself. Let's look at the Genetic Distance between each of these lines of descent. This helps to make the point that the DNA results from living people are only a snapshot in time. They do not tell us anything about how those STR values have evolved over the past 10,000 years:

A and B have a Genetic Distance (GD) of 7. This is made up of a 3-step difference on the first marker (14 vs 17) and a 4-step difference on the fifth marker (16 vs 12). And as these were the only changes on their first 37 markers, the GD would be written as 7/37. This exceeds FTDNA's threshold for declaring a match (i.e. 4 steps or less over the first 37 markers; written as 0-4/37) and so A and B would not appear in each other's list of matches.
A and C have a GD of zero. They are an exact match. Their GD for the first 37 markers is thus 0/37. They appear in each other's match list and the match looks really close. They think they have a common ancestor in the last few hundred years. They start comparing family trees, looking for the elusive ancestor. They will never find him. This is a wild goose chase. This is the consequence of Convergence.
A and D have a GD of 2 (or 2/37). This GD falls within the threshold for declaring a match. They both appear in the other's match list. They email each other, looking for the common ancestor - another wild goose chase. Another example of Convergence and its consequences.
B and C have a GD of 7/37. No match.
B and D have a GD of 9/37. No match.
C and D have a GD of 2/37. It's a match. It's Convergence. They don't know that. They spend months researching their connection. It's a wild goose chase.
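The comparisons above can be reproduced with a simple step count. This is only a sketch: it adds up the absolute differences marker by marker, which is how the GDs are counted in this example, and it uses only the present-day values of the first and fifth markers for A, B and C as given in the text (their other 35 markers are identical, so they add nothing). Line D is left out because its full present-day values are only shown in the diagram. The names used are my own.

```python
from itertools import combinations

# Present-day values of (first marker, fifth marker), taken from the text.
# The remaining 35 markers are the same in A, B and C.
present_day = {
    "A": (14, 16),
    "B": (17, 12),
    "C": (14, 16),
}

MATCH_THRESHOLD_37 = 4  # a match is declared at 4 steps or fewer over 37 markers

def genetic_distance(h1, h2):
    """Sum of the step differences between two haplotypes, marker by marker."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

for x, y in combinations(sorted(present_day), 2):
    gd = genetic_distance(present_day[x], present_day[y])
    status = "match" if gd <= MATCH_THRESHOLD_37 else "no match"
    print(f"{x} vs {y}: GD {gd}/37 ({status})")
```

Notice that nothing in the calculation knows about the 10,000 years of history behind each line. It only sees the present-day numbers.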

The STR results of people living today tell us virtually nothing about how those STR marker values have evolved over time. They may have come from a relatively recent common source, or they may have come from widely differing directions.

Below is another way of conceptualising how the numerical value of a single STR marker might evolve over time. This marker started out with a value of 8 for the common ancestor of 4 distinct lines of descent. But by the time of the present day, two lines had a value of 9, one had a value of 13 and one had a value of 5. But the evolutionary history of these 4 lines of descent is peppered with Back Mutations and Parallel Mutations:

Back Mutations

Line 2 (red) - 14 becomes 13 some time between 1000 years ago and the present day (0)
Line 4 (purple) - 4 to 5 between 1000 and 0 years ago
Line 3 (green) - 5 to 6, 6 to 7, and 7 to 8 between 7000 (7K) and 4000 (4K) years ago

Parallel Mutations

8 to 9 in Line 2 (10K to 9K), Line 1 (7K to 6K), and Line 3 (2K to 1K)
8 to 7 in Line 3 (10K to 9K) and Line 4 (9K to 8K)
7 to 6 in Line 3 (9K to 8K) and Line 4 (7K to 6K)
6 to 5 in Line 3 (8K to 7K) and Line 4 (4K to 3K)

The evolution of values in a single STR marker over time in 4 descendant lines
of a common ancestor who lived some 10,000 years ago

The consequence of all these Parallel & Back Mutations is that the present day descendants of two of the lines (green Line 3 & blue Line 1) have exactly the same numerical value for this STR marker despite the fact that their evolutionary histories are so different.
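Here is a rough sketch of that comparison in Python. The values for Line 1 and Line 3 are reconstructed from the lists above, one value per thousand years from 10K to the present day. The text only says that the three Back Mutations in Line 3 happened between 7K and 4K, so spacing them one per thousand years is my assumption, as is treating every interval not mentioned as "no change". A step counts as a Back Mutation here if it moves the value closer to the ancestral value of 8.

```python
ANCESTRAL_VALUE = 8

# Marker value at 10K, 9K, 8K, ... 1K and 0 years ago.
# Exact timing of the steps between 7K and 4K in Line 3 is assumed.
line_1 = [8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9]
line_3 = [8, 7, 6, 5, 6, 7, 8, 8, 8, 9, 9]

def classify_steps(values, ancestral):
    """Count forward steps (away from the ancestral value) and back steps."""
    counts = {"forward": 0, "back": 0}
    for old, new in zip(values, values[1:]):
        if new == old:
            continue  # no mutation in this interval
        if abs(new - ancestral) < abs(old - ancestral):
            counts["back"] += 1
        else:
            counts["forward"] += 1
    return counts

for name, values in (("Line 1", line_1), ("Line 3", line_3)):
    print(name, "ends at", values[-1], classify_steps(values, ANCESTRAL_VALUE))
```

Under these assumptions Line 1 gets to 9 with a single forward step, whereas Line 3 takes four forward steps and three back steps to arrive at exactly the same value.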

This is an example of the evolutionary history for a single STR marker. And if this is representative of all STR markers, then the chances that the values for a particular marker will converge over time are really quite high. But our DNA results usually consist of 37 markers (the standard test most people start with) so what are the chances of the first 37 markers evolving in such a way as to result in convergence of a sufficient number of STR values to cause a coincidental match? ... well, the probability of that happening would be a lot lower. And the probability would be lower still with 67 markers, and lower still with 111 markers. But because so many people have tested (over 600,000 currently), we do see the phenomenon occurring even at higher marker levels (67 and 111).

And in a subsequent post we will look at clues to the presence of Convergence, so that you can look at your own or anyone's list of matches and adjust your suspicion level accordingly.

Convergence in practice

And to illustrate these points, I have temporarily moved one of the ungrouped project members into Lineage II, namely member Jim Treacy (B38804)*. He is third from the end in the diagram below. Don't worry about not being able to read the text (you can click to enlarge the diagram if you like) - just focus on the coloured squares.

The Y-STR results for the men of Lineage II (with a Treacy third from the end)
(click to enlarge)

And Jim has no coloured squares for the first half of the markers. It is only when we reach the 19th marker in the row that he has a pink square with the value 16 inside it - everyone else in that column has a value of 15 for that marker, except for one person who has a value of 14. And as we continue along Jim's row, there are 4 other coloured squares, bringing the total to 5. This can be expressed as a Genetic Distance of 5/37 from the modal haplotype (i.e. the 3rd row from the top, which - to remind you - is a virtual signature constructed from the most frequent values for each of the markers).

Now a GD of 5/37 between two men would mean that they do not appear in each other's list of matches (because FTDNA have set the threshold for "declaring" a match to be 4/37 or less). But among Jim's list of matches at the 37 marker level, there are two members of Lineage II (with a GD of 4/37). And at the 67 marker level, Jim has 6 members of Lineage II among his matches (with a GD of 6 to 7/67). So it looks (on the surface) as if Jim is relatively closely related to our Lineage II group. And this suggests (on the surface) that there may be a common ancestor some time in the past several hundred years, maybe somewhere between 1700 and 1850 (on the basis of TMRCA calculations based on the TiP Report).

So what do we do next? Do we start looking for documentary evidence? Do we go back to the church records and land records and old newspapers to see if there is mention of a Gleeson-Treacy connection?

We could do. But it would be a wild goose chase. Because the Treacy-Gleeson connection is a red herring. And we know this because we have done SNP testing.

Jim has done the Big Y test, as have 10 of the members of Lineage II. Both Jim and the Lineage II members belong to Haplogroup R, and they share some SNP markers in common. Each marker characterises a branching point in the Tree of Mankind and a SNP Progression is a list of these SNP markers down to the finer "more downstream" branches of the Tree. Here are the SNP Progressions for Jim (first line) and for the Lineage II Gleesons (second line):

R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > A557 > Z29008 > A10891
R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > Z16438 > BY2852 > A5631

You can see that the branching points are exactly the same ... until marker Z16437. Thereafter, Jim goes down one branch and the Gleesons go down another one. Now, let's be clear: the Gleesons and Jim do share a common ancestor. And if he was around today he would test positive for the SNP marker Z16437. But his descendants would have evolved along different paths - one path taking us down to our present-day Jim Treacy, the other taking us down to our present-day Gleesons. You can see where Jim and the Gleesons are placed on the Tree of Mankind in the diagram below.
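Finding the branching point is simply a matter of walking along both lists until they differ. Here is a small sketch using the two SNP Progressions quoted above (the variable and function names are my own):

```python
# The two SNP Progressions from the text, as ordered lists of SNP markers.
jim_treacy = ["R-P312", "Z290", "L21", "DF13", "ZZ10", "Z255",
              "Z16437", "A557", "Z29008", "A10891"]
lineage_ii = ["R-P312", "Z290", "L21", "DF13", "ZZ10", "Z255",
              "Z16437", "Z16438", "BY2852", "A5631"]

def last_shared_snp(progression_a, progression_b):
    """Return the most downstream SNP that both progressions have in common."""
    shared = None
    for snp_a, snp_b in zip(progression_a, progression_b):
        if snp_a != snp_b:
            break
        shared = snp_a
    return shared

branch_point = last_shared_snp(jim_treacy, lineage_ii)
next_index = jim_treacy.index(branch_point) + 1
print("Last shared SNP:", branch_point)
print("Jim's branch continues with:", jim_treacy[next_index])
print("Lineage II continues with:", lineage_ii[next_index])
```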

Gleesons to the left, Treacys to the right, & about 1500 years in between

And when did this common ancestor live? YFull dates the formation of Z16437 to 1650 years ago. The two markers downstream of this, A557 (Jim Treacy) and A5631 (Gleeson), both have formation dates of 1400 years ago. So from this we can say that the common ancestor of Treacy & the Gleesons lived somewhere between 1400 and 1650 years ago. Or to give it an actual date (by subtracting from 1950, the approximate birth year for members of Lineage II: 1950 - 1650 = 300 and 1950 - 1400 = 550), sometime between 300 and 550 AD.

This is clearly a lot further back in time than the 1700-1850 AD estimate suggested by the STR data.

So this is a great example of Convergence. By chance, Jim's STR signature has evolved over time to approximate that of the Gleesons of Lineage II and as a result, he looks a lot more closely related to the group than he actually is.

Maurice Gleeson

May 2017

* a big thank you to Jim for allowing me to use his name and his results in this example

Tuesday 2 May 2017

Origin of Thomas Gleason 1609-1687

Everyone who uses John Barber White’s book, Genealogy of the Descendants of Thomas Gleason…, as a research source should be aware that White himself had doubts about the origin of Thomas Gleason that he suggested in his work. “Of the parentage and birthplace of this Thomas Gleason no positive knowledge has been obtained,” he states in his brief foreword. One cannot fault him for trying, however, although his hypothesis has been found to be incorrect. The book was written in 1909 and certainly genealogical research was a challenge in those times. Today we are fortunate that resources are available everywhere, often digitized and online. One hundred and five years after White, the true origin of Thomas was published, and is summarized here:

THOMAS GLEASON was christened as Thomas Gleson in Cockfield, Suffolk, England, 3 September 1609, the son of Thomas and Anne (Armesby) Gleson.¹ He died in Cambridge, Massachusetts, about 1687.² On 31 July 1634 he married Susan[na] Page in Cockfield.³ She was baptized on 4 December 1614 in Ingham, Suffolk, the daughter of Thomas and Susanna (___) Page of Ingham and Hawstead, Suffolk.⁴ Susanna (Page) Gleason is believed to have died in Boston, Massachusetts, 24 January 1691.⁵

For an overview of all the relevant Suffolk records concerning this family, including Thomas’s parents and four children, see: Judith Gleason Claassen, “The Origin of Thomas Gleason of Watertown and Cambridge, Massachusetts,” NEHGS Register, Vol. 168 (January 2014), 5-15.

---------------------------------------------------------

¹Original parish registers of Cockfield, Suffolk, at Suffolk Record Office, Bury St. Edmunds (SROB), FL552/4/3; FL552/4/1; Transcripts of parish registers of Cockfield, Suffolk, 1561-1922 [FHL 0,993,235], 42, 49. The name Armesby is written variously in the register as: Armsby, Armsbye, Armsbie, and Arnsby. The baptisms of four children are also found in these records.

²Thomas is presumed to have been living at the time of Anna (Hanna) Winn’s death in 1686, since she left him a bequest in her will, yet no longer living when his daughter Ann was baptized as an adult in January of 1687/8.

³SROB FL552/4/4; Cockfield Transcripts [note 1], 74.

⁴William Brig, The Parish Register of Ingham, Co. Suffolk: Baptisms 1538 to 1804, Marriages 1539 to 1787, Burials 1538 to 1811 (Leeds: Knight, 1909), 9. The E-book is available online at archive.org; SROB IC500/2/57; Suffolk Family History Society, Suffolk Burial Index, CD-ROM (2005), Index by Parish, Noz-Pee: 532. Wife Susanna was buried 1631, a second wife, Elizabeth, in 1645.

⁵SC1/series 45X, Massachusetts Archives Collection, vol. 37: 64; Daniel Angell Gleason, “Thomas Gleason (Leson) and Susanna Page,” manuscript in R. Stanton Avery Special Collections Department of NEHGS, Mss A4060, Part I:24. "Widow Gleason" was included in the list of charges of the keeper of the Suffolk County gaol. If not Susanna, this widow may have been the relict of son Philip.

Judith Gleason Claassen

May 2017