Thursday, 25 May 2017

Convergence - quantifying Parallel & Back Mutations (Part 1)

In a recent post I explored the concept of Convergence and made the point that Convergence arises through a combination of Parallel Mutations and Back Mutations in the STR marker values. These mutations occurred at some time in the past, but because they are hidden from us in the present, we cannot tell when or how often they occurred just by looking at two sets of STR results from people living today.

However, there is a way around this problem. Or at least a partial solution.

By using a combination of STR data and SNP data we have been able to build a Mutation History Tree for the North Tipperary Gleesons (Lineage II of the Gleason DNA Project). This tree is a "best fit" tree, by which I mean a tree constructed in such a way as to explain the STR & SNP data in the most parsimonious way, i.e. with the fewest branches that will accommodate or "fit" the data. This approach is also called the "maximum parsimony" approach and is often used when building cladograms or phylogenetic trees. The Mutation History Tree (MHT) is simply another type of cladogram.

But a key point here is that this "best fit" tree is likely to change as more data becomes available. And to illustrate this point, I'm going to compare the current version of the tree (Dec 2016) with the next version that is being prepared following the recent availability of new data from 12 sets of Z255 SNP Pack results.

Below is the current version of the MHT for Lineage II. By comparing each mutation in the tree with every other one, we can identify which mutations are Back Mutations (occurring on a single line of descent) and which are Parallel Mutations (occurring on two or more lines of descent). I have highlighted the Back Mutations in yellow and the Parallel Mutations in green.

Back Mutations in yellow, Parallel Mutations in green
from Gleeson Lineage II MHT (version Dec 2016)

Parallel Mutations occur in the following lines of descent:
  • CDYb 40-39 ... A, E, D, F (4 times)
  • CDYa 39-38 ... A, B, C, F (4 times)
  • 464c 17-16 ... A x2, D (3 times)
  • 461 12-11 ... A, B (2 times)
  • 576 18-19 ... A, D (2 times)
  • 390 23-24 ... A, B, C (3 times)
  • 390 24-23 ... B, C (2 times)
  • 456 16-15 ... B, D (2 times)
  • and so on ...
Back Mutations are more difficult to count, and to conceptualise. Whether you consider a value as mutating forward or back depends entirely on your reference point. If our anchor is the upstream Z255 branch, then the original value of marker 390 (for example) is 24, mutating (forward) to 23 on the Z16438 branch, then back to 24 (in parallel) on Branches A, B & C, and then back to 23 (again in parallel) on Branches B & C. So there are several points to make here:
  • This is in fact a Back Mutation that occurs in parallel in 3 separate lines of descent. It is both a Back Mutation (relative to its earlier value of 24 on the Z255 branch) and a Parallel Mutation, occurring at (presumably) different time points in Branches A, B & C. That is why it is coloured both yellow and green.
  • It can also be considered a Triple Mutation relative to the Z255 branch - in the sense that it mutates forward to 23 then back to 24, then back to 23 again. But what happens if it flips forward and back 5 times? What would we call that? And what do we call it if it goes two steps forward and one step back? This is where terminology fails us. I'm not sure if there is a standardised way of describing these different kinds of mutation (if there is, please leave a comment below).
  • The mutation 390 24-23 occurs in Branches B & C ... relative to its value of 24 in the Z255 branch, this could be considered a Parallel Forward Back Forward Mutation ... for Pete's sake!!

But suppose we focus only on the Back Mutations that occur downstream of the branch characterised by the STR mutation (710 36-37), just above the A5627 SNP Block. This "710 branch" incorporates all the Gleesons of Lineage II, from Branch A to F.* On this overarching branch for Lineage II, the value of STR marker 390 is 23, and the Back Mutations are as follows:
  • 390 24-23 ... B, C ... this is the only Back Mutation below the "710 branch" (it occurs twice, once in Branch B and once in Branch C)
  • And it is also a Parallel Mutation
  • All the other yellow Back Mutations are relative to the upstream Z255 branch, and not our downstream "710 branch", and so are not counted in this particular exercise.
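Because "forward" and "back" only have meaning relative to a chosen anchor value, it can help to make the anchor explicit. Below is a minimal Python sketch, assuming a change counts as a Back Mutation when it moves the marker value closer to the anchor and as a Forward Mutation otherwise; the function name `classify_steps` is our own invention, not part of any existing tool. It applies this rule to the marker 390 values described above for the line leading to Branch B.

```python
from __future__ import annotations


def classify_steps(values: list[int], anchor: int) -> list[str]:
    """Label each successive change in a marker's value as 'forward' or 'back'.

    A change is treated as a Back Mutation if it moves the value closer to
    the anchor (reference) value, and as a Forward Mutation otherwise.
    """
    labels = []
    for old, new in zip(values, values[1:]):
        if new == old:
            continue  # no mutation on this stretch of the tree
        labels.append("back" if abs(new - anchor) < abs(old - anchor) else "forward")
    return labels


# Marker 390 on the line leading to Branch B:
# 24 on Z255 -> 23 on Z16438 -> 24 (in parallel in A, B, C) -> 23 (in parallel in B, C)
print(classify_steps([24, 23, 24, 23], anchor=24))  # ['forward', 'back', 'forward']

# Re-anchored on the "710 branch", where marker 390 = 23
print(classify_steps([23, 24, 23], anchor=23))  # ['forward', 'back']
```

Re-anchoring on the "710 branch" turns the final 24 to 23 change from a Forward Mutation into a Back Mutation, which is why the counts below depend on the reference point chosen.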

So, let's generate some statistics from these numbers:
  • The total number of mutations below the "710 branch" (irrespective of whether they are forward or back) is 71.
  • There are 69 Forward Mutations (i.e. away from the original value of the relevant marker on the "710 branch")
  • There are 2 Back Mutations 
  • There are 26 Parallel Mutations (a separate classification, so these overlap with the Forward and Back counts above)
  • Forward Mutations outnumber Back Mutations by a ratio of 34.5 : 1
  • Parallel Mutations outnumber Back Mutations by a ratio of 13 : 1
  • There are 16 people in this tree, and if we make the big assumption that the "710 branch" starts 1000 years ago (i.e. roughly at the time of the introduction of the Gleeson surname), then over the course of 1000 years, the rate of each type of mutation is (crudely) as follows:
    • Forward Mutations = 69/16 = 4.3125 mutations per "line of descent" per 1000 years
    • Back Mutations = 2/16 = 0.125 mutations per "line of descent" per 1000 years
    • Parallel Mutations = 26/16 = 1.625 mutations per "line of descent" per 1000 years
These are crude estimates, but they give some idea of the relative importance of Parallel Mutations compared to Back Mutations. And applying this information to the phenomenon of Convergence, it would seem that Back Mutations play a very minor role compared to Parallel Mutations.
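The ratios and rates above are simple arithmetic on the counts. The sketch below (the function name `summarise` is hypothetical) just makes the calculation explicit, under the same assumptions as the text: 16 lines of descent and a "710 branch" that is 1000 years old. Note that Forward : Back is 69/2 = 34.5, whereas 71/2 = 35.5 is the ratio of all mutations to Back Mutations.

```python
from __future__ import annotations


def summarise(forward: int, back: int, parallel: int, people: int,
              years: float = 1000.0) -> dict[str, float]:
    """Crude mutation statistics for one tree, per line of descent per 1000 years."""
    per_line_per_1000y = 1000.0 / years
    return {
        "total": forward + back,               # Parallel overlaps these, so it is not added
        "forward_to_back": forward / back,
        "parallel_to_back": parallel / back,
        "forward_rate": forward / people * per_line_per_1000y,
        "back_rate": back / people * per_line_per_1000y,
        "parallel_rate": parallel / people * per_line_per_1000y,
    }


# Counts below the "710 branch", as given above
stats = summarise(forward=69, back=2, parallel=26, people=16)
for name, value in stats.items():
    print(name, value)
```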

In a subsequent post we will see how these calculations stand up when we add in additional data from 12 SNP Pack results and reconfigure the tree into the next version of the "best fit" model. And we will also attempt to quantify the total number of Back & Parallel Mutations below the upstream marker Z255.

Maurice Gleeson
May 2017

* The Big Y results of a 10th member of the group indicate that this branch is characterised by the SNP A5631, although this result is not reflected in this version of the MHT

Wednesday, 24 May 2017

Z255 SNP Pack Results - a first look

Last month (April), 14 members of the Gleason DNA Project underwent testing with the newly revised Z255 SNP Pack. FamilyTreeDNA were very quick to process the requests and the results of 12 of these members have already been returned from the lab.

Below are the top-line results of these first 12 tests. Only the most relevant SNP marker data has been extracted below and individual sections can be enlarged by clicking on the image. A pdf version of the complete data can be downloaded using this Dropbox link here.

To orientate you to the table, the SNP marker results of each of the 12 members are arranged in columns B to M. Each column has the initials, G-number, and kit number for each member. SNP markers highlighted in pink tested positive in that particular individual. The various branches of the Mutation History Tree for the North Tipperary Gleesons (Lineage II) are indicated at the top and at the sides and are coloured in the same colours as in the current diagram of the tree (see below).

So what do the results tell us?

First off, working down the tree from marker Z255, all 12 members tested positive for the following markers: Z255, Z16437, Z16438, and BY2852. All but one person tested positive for BY2853 and BY2854.

Then we arrive at the Gleeson-specific markers, starting off with A5631 - everyone tested positive for this marker, as previously predicted from the results of the 10th Big Y test (from our Clan Gathering Chairman, Michael G Gleeson, 371202). These results came in after the tree diagram below was drawn and effectively confirmed that Branch F (predicted solely from STR data) did in fact exist. 

Below A5631, the Gleeson Tree splits into 2 branches. The first of these is Branch F, and the first notable result of the SNP Pack testing is that 2 of the 12 members (G-104, G-97) belong to this branch. Their SNP data has helped define a new SNP block below A5631, consisting of the following SNPs (which up until now were "Private SNPs", present only in our Chairman, MGG-371202):
  • BY14189
  • BY14193
  • BY14194
  • BY14195
  • BY14197

The other branch below A5631 is characterised by a SNP block consisting of 3 SNPs: A5627, A5629 & A5630. These 3 SNP markers are shared by Branches A through E in the Lineage II group. The remaining 10 members all tested positive for these 3 SNPs, with one exception: member G-75, whose result came back negative for SNP A5627. This could be a back mutation in this SNP, and so FTDNA are retesting this single SNP marker to be sure.

Current version of the MHT for Gleeson Lineage II

Below the A5627 Block, there are two branches:
  • A5628, which in turn splits into Branch A and Branch B ... and possibly Branch E
  • BY5706, which in turn splits into Branch C and Branch D ... and possibly Branch E

From the above, you will see that the placement of Branch E on the current Gleeson Tree is in some doubt. This is because no SNP data had been available for Branch E until now, so its placement in the tree was based on STR data alone. Its ambiguous placement became apparent when we ran Dave Vance's SAPP programme on the STR data: this showed that Branch E could equally well be placed nearest to Branch B or to Branch C, as discussed in a previous post (Dec 2016). However, a second major benefit of the SNP Pack testing is that it has revealed the correct placement of Branch E.
  • Member G-75 (MG, 371160) from Branch E tests positive for the SNPs BY5706, BY5707, BY5708 & BY5709. 
  • The first (BY5706) is common to both Branches C & D, and the latter three are SNPs that characterise Branch C. 
  • Thus Branch E is now identified as a sub-branch of Branch C.
And on the topic of Branch C, two of the 12 members (G-89 & G-22) tested positive for Branch C SNP markers (BY5707, BY5708 & BY5709). Not only that, but their results split Branch C into 2 different sub-branches, C1 and C2:
  • C1 - characterised by the SNP A13116 and shared by members G-89 and G-66
  • C2 - characterised by the SNP Block A13110, A13112 & A13113, and shared by members G-22 & G-71

This new categorisation leaves member G-71 with the Private SNPs A13111 and FGC19590; and leaves member G-66 with the Private SNPs A13114 & A13115.

None of the 12 members tested positive for the SNPs characterising Branch A, Branch B or Branch D.  However ... 
  • 1 member tested positive for A5628 and therefore sits on a new branch adjacent to Branch A and Branch B
  • 4 members (G-81, G-108, G-77 & G-18) tested positive for BY5706 but no downstream markers - they therefore sit on a new branch (or branches) adjacent to Branch C and Branch D
  • 2 members (G-94 & G-110) only tested positive for the A5627 SNP Block and nothing further downstream. They therefore sit on a new, relatively upstream branch (or branches) on the Tree.
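Placing a tester on the tree from his SNP Pack results comes down to finding the most downstream branch where he tests positive, then checking which SNPs above that point did not come back positive (as with G-75 and A5627). Below is a minimal sketch, assuming a hand-coded subset of the Lineage II tree built only from the SNPs named in this post; the node labels (such as "A5627 block") and the function `place_tester` are hypothetical, and Branches A, B and D are left out because their defining SNPs are not listed here. It treats a branch as reached if any one of its SNPs is positive, which is a simplification.

```python
from __future__ import annotations

# Subset of the Lineage II tree as described in this post:
# node label -> (parent label, SNPs defining that node). Labels are our own shorthand.
TREE: dict[str, tuple[str | None, list[str]]] = {
    "A5631": (None, ["A5631"]),
    "Branch F": ("A5631", ["BY14189", "BY14193", "BY14194", "BY14195", "BY14197"]),
    "A5627 block": ("A5631", ["A5627", "A5629", "A5630"]),
    "A5628": ("A5627 block", ["A5628"]),
    "BY5706": ("A5627 block", ["BY5706"]),
    "Branch C": ("BY5706", ["BY5707", "BY5708", "BY5709"]),
    "C1": ("Branch C", ["A13116"]),
    "C2": ("Branch C", ["A13110", "A13112", "A13113"]),
}


def depth(node: str) -> int:
    """Number of steps from a node up to the top of TREE (A5631)."""
    steps = 0
    parent = TREE[node][0]
    while parent is not None:
        steps += 1
        parent = TREE[parent][0]
    return steps


def place_tester(positives: set[str]) -> tuple[str, list[str]]:
    """Return the deepest node with at least one positive SNP, plus any SNPs
    on the path from that node up to A5631 that did not test positive."""
    hits = [node for node, (_, snps) in TREE.items() if positives & set(snps)]
    if not hits:
        raise ValueError("no positive SNPs in this part of the tree")
    deepest = max(hits, key=depth)
    not_positive: list[str] = []
    node: str | None = deepest
    while node is not None:
        parent, snps = TREE[node]
        not_positive += [snp for snp in snps if snp not in positives]
        node = parent
    return deepest, not_positive


# G-75 as reported: positive for BY5706-BY5709, negative for A5627
g75 = {"A5631", "A5629", "A5630", "BY5706", "BY5707", "BY5708", "BY5709"}
print(place_tester(g75))  # ('Branch C', ['A5627'])

# A tester assumed positive for the A5627 block and BY5706 but nothing further downstream
print(place_tester({"A5631", "A5627", "A5629", "A5630", "BY5706"}))  # ('BY5706', [])
```

The list of non-positive SNPs is what flags a possible back mutation, or a result worth retesting, rather than a misplaced tester.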

In the next post we will look at the revised Mutation History Tree for Gleeson Lineage II, incorporating these new SNP Pack results.

We'll also look at what this SNP Pack testing has told us about the nature of the evolution of the Gleeson surname and how it has helped individual members with their genealogical research.

Maurice Gleeson
May 2017

Saturday, 20 May 2017

Convergence - what is it?

There are several phenomena encountered in the analysis of Y-DNA STR data that can throw a genetic spanner in the works, and Convergence is one of them!

In genetic genealogy, Convergence occurs when two men have DNA signatures that are exactly or nearly identical, but have evolved that way purely by chance. As a result, the two men will show up in each other's list of matches and will give the false impression that they may be closely related (e.g. within the last several hundred years) when in fact they are much more distantly related (e.g. within the last several thousand years). The problem is we cannot tell that Convergence has occurred simply by looking at the two men's STR results. It is hidden from our view. We cannot see it just by looking at the present-day STR data. And the danger is that if the two men think they are closely related, they may start chasing their common connection, thinking that they will find the answer via further documentary research, when in fact there is little hope of that at all. Their "close match" is a red herring. And their pursuit of the Common Ancestor is a wild goose chase.

So what can we do about it? How can we recognise it? How can we avoid it wasting our precious research time?


The concept is occasionally discussed in Facebook groups or on various blogs, but there tends to be quite a lot of confusion around what it actually means. And there are a variety of quite understandable reasons for this. 

Firstly, there isn't a standard definition for Convergence, so how it is used varies from person to person. Some people apply it only to exact matches, others apply it to exact and close matches. Moreover, the concept of Convergence is closely tied up with the concept of lack of Divergence. They are different phenomena, but their effects and consequences are very similar. Another contributing factor is that Convergence is difficult to see or detect in practice. We know that it exists, but we have no way of identifying it just by comparing two sets of STR results. In other words, it's largely a hidden phenomenon (like Black Holes). It is only when we do SNP testing that the extent of Convergence becomes apparent. And the problem is that not enough people have done SNP testing.

The good news is that more and more people are doing SNP testing and as they do, the extent of Convergence becomes more apparent. The Lineage II members in the Gleason DNA Project are trailblazers in this regard and we will explore the results of the recent Z255 SNP Pack testing in subsequent blog posts.

But in this post, we will look at an example of Convergence from the Gleason DNA Project in order to illustrate some of the key characteristics and consequences of Convergence. In later posts, we will look at clues that may indicate that Convergence is present, attempt to quantify the number of Back Mutations & Parallel Mutations that occur over time (using the Mutation History Tree that we have previously constructed for Lineage II - the North Tipperary Gleesons), and finally we will attempt to quantify Convergence itself.

But first of all, let's look at some of the aspects of the definition of the term.


A general definition for the term convergence from the Concise Oxford English Dictionary illustrates some general characteristics of convergence that are worth exploring because they are of relevance to how the term is applied in genetic genealogy and to the analysis of Y-DNA STR data in particular:
converge  1. come together from different directions so as eventually to meet
convergent  2. Biology (of unrelated animals and plants) showing a tendency to evolve superficially similar characteristics ...
There are several important aspects to these definitions that we can apply to the analysis of STR data (e.g. your 37 marker data). First of all, the sense that things were initially apart, but then they come together. Secondly, the idea that two things can look the same or similar on the surface, but in fact they have come from very different directions. And thirdly, the idea that two things can evolve from something different into something the same.

Let's look at how this more general concept can be applied to the analysis of Y-STR data.

And a good starting point is the description of Convergence on the ISOGG Wiki:
Convergence (also known as evolutionary convergence) is a term used in genetic genealogy to describe the process whereby two different genetic signatures (usually Y-STR-based haplotypes) have mutated over time to become identical or near identical resulting in an accidental or coincidental match.
One can think of convergence as producing misleading matches – two men appear to be more closely related than they actually are. The same situation may result (very occasionally) if there is an exceptional lack of divergence. In other words, so few mutations occurred in the descendants of a common ancestor over the course of time that the common ancestor may appear to have lived only a few hundred years ago when in fact he lived much further back than that, perhaps several thousand years ago.

So let's pick apart some of the key elements of this definition. You might like to refamiliarise yourself with some basic concepts, such as the different types of DNA markers (STRs and SNPs), and what you are actually seeing when you look at the DNA Results page.

Basic Concepts

Firstly, the above description of Convergence refers to the genetic signature - the Y-STR haplotype. This is the string of numbers you see associated with your results on the DNA Results page of the project. I like to think of it as if the Y-chromosomes of all the men in the group were stacked on top of each other, so that the individual markers along each chromosome line up, with one column for each marker. Thus in the diagram below, each of the men has a value of 13 for the first marker. The values for the second marker are a mixture of 23 and 24. And so on.

The Y-STR results for the men of Lineage II
(click to enlarge)

Another key point in the above description is the concept that some markers mutate over time e.g. the number changes from 14 to 15. These mutations are identified by comparing the value in each square to the modal value for the entire group (i.e. the most frequent value among the men in that group). The most frequent values for each of the markers are used to generate the "modal haplotype" which is a virtual signature constructed from these most frequent values (and is represented by the row marked "MODE", the 3rd row from the top in the diagram above).

Mutations are indicated by coloured squares. If the value for any marker is the same as the modal value for that marker (i.e. the most common value among the men in that group), then the square containing that value will not be coloured. If, however, the value is higher than the modal value, it will be coloured pink; if it is lower than the modal value, it will be coloured purple.

If you and someone else have exactly the same string of numbers, you will have the same coloured squares and the same "no-colour" squares. If you are not exactly identical, you will have some coloured squares that the other person does not have ... and vice versa. In other words, the sequence of numbers, and hence colours, will be different. Each coloured square represents a mutation - a small increase or decrease in the number (compared to the modal value) for that particular marker, in that particular individual.
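The modal haplotype and the colour coding are easy to reproduce. The following sketch uses a small, made-up set of 4-marker haplotypes (not real project data) and assumes that ties for the most frequent value go to whichever value appears first in the column; the function names are our own.

```python
from __future__ import annotations

from collections import Counter


def modal_haplotype(haplotypes: list[list[int]]) -> list[int]:
    """Most frequent value at each marker position (ties go to the value seen first)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def colour_row(haplotype: list[int], mode: list[int]) -> list[str]:
    """'pink' above the modal value, 'purple' below it, '' (no colour) if equal."""
    return ["pink" if v > m else "purple" if v < m else "" for v, m in zip(haplotype, mode)]


# Made-up 4-marker haplotypes, one row per man (illustrative only)
rows = [
    [13, 23, 14, 15],
    [13, 24, 14, 16],
    [13, 24, 15, 15],
    [13, 24, 14, 14],
]
mode = modal_haplotype(rows)
print(mode)  # [13, 24, 14, 15]
for row in rows:
    print(row, colour_row(row, mode))
```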

Convergence in theory

Let's imagine that some distant ancestor living 10,000 years ago gave rise to four distinct lines of descent surviving today (represented by the men A, B, C, and D in the diagram below). Let's look at what happened to their first 37 STR markers over time, and let's assume that mutations only occurred in 5 of these STR markers, as shown in the diagram below. How did the values change over the passage of time, from 10,000 years ago to the present day? And how many of the descendants of this ancestor "match" each other today?

In descendant A, only one of these 5 STR markers mutated. It underwent a single mutation (from 13 to 14) about 6000 years ago, and that was the only mutation over the span of 10,000 years. This is a rather extreme example of "lack of Divergence".

Descendant B had several mutations in his line of descent, but only affecting the first and the fifth markers. These show progressive "forward mutations" away from their original values. With the first marker, the mutations go forward in an upward direction (14,15,16,17) whilst with the fifth marker they go forward in a downward direction (15,14,13,12). The latter may seem counterintuitive but it serves to emphasise that "forward" means "away from" the original value, no matter if it is up numerically or down numerically.

Descendant C has also experienced mutations in only the first and fifth markers. But here we see two examples of a Back Mutation. The first marker shows a forward mutation 6000 years ago (13 becomes 12) but this has gone back to 13 by 4000 years ago. It then undergoes another forward mutation by the time of the present day (13 to 14). Similarly, the fifth marker undergoes a forward mutation (16 to 17) by 4000 years ago but a Back Mutation by 2000 years ago.

Descendant D undergoes mutations on all 5 of his STR markers. A Back Mutation occurs with the second marker between 2000 years ago and the present day (15 to 14); and likewise with the third marker (12 to 13); and likewise with the fifth marker (17 to 16). Two Back Mutations occur with the fourth marker (29 to 30 by 4000 years ago; and 31 to 30 by the present day).

Mutations over time in 4 distinct lines of descendants

Remember, these are four distinct lines of descent, with the MRCA (Most Recent Common Ancestor) represented by the first row of 5 STR markers in the diagram above. So now let's look to see if any of the mutations that occurred in these four individual lines of descent occurred in parallel i.e. the same mutational change occurred in two completely separate lines of descent.

Have a look at the first marker in A, B and C. All three men developed the same mutation on this marker - a change from a value of 13 to 14. In Lines A and B this change occurred in parallel around 6000 years ago. In Line C, the change occurred in parallel around about the present day.

There is a similar parallel mutation between Line C and D. Look at the fifth marker - it increases in value from 16 to 17 around about 6000 years ago in Line D and 4000 years ago in Line C.

And there is a parallel back mutation present in Lines C and D also - the fifth marker switches from 17 to 16 about 2000 years ago in Line C and around about the present day in Line D.

With Back Mutations we are only looking at a single line of descent. With Parallel Mutations we are comparing two or more lines of descent. And we will see that in practice Parallel Mutations are much more common than Back Mutations and play a much greater role in the development of Convergence.

The STR results of people living today tell us nothing about their evolutionary history

Which brings us to Convergence itself. Let's look at the Genetic Distance between each of these lines of descent. This helps to make the point that the DNA results from living people are only a snapshot in time. They do not tell us anything about how those STR values have evolved over the past 10,000 years:
  • A and B have a Genetic Distance (GD) of 7. This is made up of a 3-step difference on the first marker (14 vs 17) and a 4-step difference on the fifth marker (16 vs 12). And as these were the only changes on their first 37 markers, the GD would be written as 7/37. This exceeds FTDNA's threshold for declaring a match (i.e. 4 steps or less over the first 37 markers; written as 0-4/37) and so A and B would not appear in each other's list of matches.
  • A and C have a GD of zero. They are an exact match. Their GD for the first 37 markers is thus 0/37. They appear in each other's match list and the match looks really close. They think they have a common ancestor in the last few hundred years. They start comparing family trees, looking for the elusive ancestor. They will never find him. This is a wild goose chase. This is the consequence of Convergence.
  • A and D have a GD of 2 (or 2/37). This GD falls within the threshold for declaring a match. They both appear in the other's match list. They email each other, looking for the common ancestor - another wild goose chase. Another example of Convergence and its consequences.
  • B and C have a GD of 7/37. No match.
  • B and D have a GD of 9/37. No match.
  • C and D have a GD of 2/37. It's a match. It's Convergence. They don't know that. They spend months researching their connection. It's a wild goose chase.
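The Genetic Distances for A, B and C can be reproduced with a simple step-wise count: the sum, over all markers, of the absolute difference in repeat values. The sketch below is an illustration only, assuming every marker is scored step-wise (FamilyTreeDNA scores some markers, such as the multi-copy ones, differently, so real GDs can differ). It uses only markers 1 and 5, the only markers that changed in A, B and C, so the other 35 markers add nothing to the GD. D is left out because his individual marker values are not spelled out in the text. The names `genetic_distance` and `is_match` are hypothetical.

```python
from __future__ import annotations

MATCH_THRESHOLD_37 = 4  # FTDNA's 37-marker threshold quoted above: a GD of 4 or less


def genetic_distance(a: list[int], b: list[int]) -> int:
    """Step-wise GD: sum of absolute repeat differences, marker by marker."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def is_match(a: list[int], b: list[int], threshold: int = MATCH_THRESHOLD_37) -> bool:
    return genetic_distance(a, b) <= threshold


# Present-day values of markers 1 and 5 (the other 35 markers are identical in A, B and C)
present_day = {"A": [14, 16], "B": [17, 12], "C": [14, 16]}

for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
    gd = genetic_distance(present_day[x], present_day[y])
    print(f"{x}-{y}: GD {gd}/37, match={is_match(present_day[x], present_day[y])}")
```

A and C come out as an exact match even though their lines of descent took quite different routes to get there, which is the Convergence trap described above.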

The STR results of people living today tell us virtually nothing about how those STR marker values have evolved over time. They may have come from a relatively recent common source, or they may have come from widely differing directions.

Below is another way of conceptualising how the numerical value of a single STR marker might evolve over time. This marker started out with a value of 8 for the common ancestor of 4 distinct lines of descent. But by the time of the present day, two lines had a value of 9, one had a value of 13 and one had a value of 5. But the evolutionary history of these 4 lines of descent is peppered with Back Mutations and Parallel Mutations:

  • Back Mutations
    • Line 2 (red) - 14 becomes 13 some time between 1000 years ago and the present day (0)
    • Line 4 (purple) - 4 to 5 between 1000 and 0 years ago
    • Line 3 (green) - 5 to 6, 6 to 7, and 7 to 8 between 7000 (7K) and 4000 (4K) years ago
  • Parallel Mutations
    • 8 to 9 in Line 2 (10K to 9K), Line 1 (7K to 6K), and Line 3 (2K to 1K)
    • 8 to 7 in Line 3 (10K to 9K) and Line 4 (9K to 8K)
    • 7 to 6 in Line 3 (9K to 8K) and Line 4 (7K to 6K)
    • 6 to 5 in Line 3 (8K to 7K) and Line 4 (4K to 3K)

The evolution of values in a single STR marker over time in 4 descendant lines
of a common ancestor who lived some 10,000 years ago

The consequence of all these Parallel & Back Mutations is that the present day descendants of two of the lines (green Line 3 & blue Line 1) have exactly the same numerical value for this STR marker despite the fact that their evolutionary histories are so different.
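The same bookkeeping can be done mechanically. A Back Mutation is a change that moves the value back towards the ancestral 8, and a Parallel Mutation is the same change (from, to) turning up independently in two or more lines. The sketch below assumes single-step changes and fills in steps the text implies but does not list (9 up to 14 in Line 2, and 5 to 4 in Line 4). The function names are hypothetical. Time points are ignored, since only the sequence of values matters for this classification.

```python
from __future__ import annotations

from collections import defaultdict

ANCESTRAL = 8  # value of the marker in the common ancestor ~10,000 years ago

# Successive values in each line of descent, oldest first
lines: dict[int, list[int]] = {
    1: [8, 9],
    2: [8, 9, 10, 11, 12, 13, 14, 13],  # steps 10-14 assumed single steps
    3: [8, 7, 6, 5, 6, 7, 8, 9],
    4: [8, 7, 6, 5, 4, 5],              # 5 -> 4 assumed (implied by the later 4 -> 5)
}


def transitions(values: list[int]) -> list[tuple[int, int]]:
    """Successive (from, to) changes along one line of descent."""
    return list(zip(values, values[1:]))


def back_mutations(values: list[int], ancestral: int = ANCESTRAL) -> list[tuple[int, int]]:
    """Changes that move the value closer to the ancestral value."""
    return [(a, b) for a, b in transitions(values) if abs(b - ancestral) < abs(a - ancestral)]


def parallel_mutations(all_lines: dict[int, list[int]]) -> dict[tuple[int, int], list[int]]:
    """Changes (from, to) that occur in two or more separate lines of descent."""
    seen: dict[tuple[int, int], set[int]] = defaultdict(set)
    for line_id, values in all_lines.items():
        for step in transitions(values):
            seen[step].add(line_id)
    return {step: sorted(ids) for step, ids in seen.items() if len(ids) > 1}


for line_id, values in lines.items():
    print(line_id, back_mutations(values))
print(parallel_mutations(lines))
```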

This is an example of the evolutionary history for a single STR marker. And if this is representative of all STR markers, then the chances that the values for a particular marker will converge over time are really quite high. But our DNA results usually consist of 37 markers (the standard test most people start with), so what are the chances of the first 37 markers evolving in such a way as to result in convergence of a sufficient number of STR values to cause a coincidental match? ... well, the probability of that happening would be a lot lower. And the probability would be lower still with 67 markers, and lower still with 111 markers. But because so many people have tested (over 600,000 currently), we do see the phenomenon occurring even at higher marker levels (67 and 111).
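To see why more markers make a coincidental match less likely, while a very large pool of testers still produces some, consider a deliberately crude model: each marker differs between two unrelated men independently with the same probability q, and every difference counts as one step. Both assumptions are unrealistic, and q = 0.35 below is a hypothetical value chosen purely for illustration, as are the 67- and 111-marker thresholds; only the 4/37 threshold and the 600,000 figure come from this post.

```python
from math import comb


def prob_at_most(k: int, n: int, q: float) -> float:
    """P(at most k of n markers differ), if each marker differs independently with probability q."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k + 1))


Q = 0.35           # hypothetical per-marker probability of a difference
TESTERS = 600_000  # "over 600,000" testers, as quoted in the post
# (markers, GD threshold): 4/37 is quoted in the post; 7/67 and 10/111 are assumed here
PANELS = [(37, 4), (67, 7), (111, 10)]

for n, k in PANELS:
    p = prob_at_most(k, n, Q)
    print(f"{n} markers: P(GD <= {k}) = {p:.2e}; "
          f"expected coincidental matches among {TESTERS:,} unrelated men = {p * TESTERS:.3g}")
```

The absolute numbers mean little, but the pattern is the point: the probability shrinks quickly as markers are added, yet multiplying even a small probability by a very large number of testers can still leave some coincidental matches.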

And in a subsequent post we will look at clues to the presence of Convergence, so that you can look at your own or anyone's list of matches and adjust your suspicion level accordingly.

Convergence in practice

And to illustrate these points, I have temporarily moved one of the ungrouped project members into Lineage II, namely member Jim Treacy (B38804)*. He is third from the end in the diagram below. Don't worry about not being able to read the text (you can click to enlarge the diagram if you like) - just focus on the coloured squares. 

The Y-STR results for the men of Lineage II (with a Treacy third from the end)
(click to enlarge)

And Jim has no coloured squares for the first half of the markers. It is only when we reach the 19th marker in the row that he has a pink square with the value 16 inside it - everyone else in that column has a value of 15 for that marker, except for one person who has a value of 14. And as we continue along Jim's row, there are 4 other coloured squares, bringing the total to 5. This can be expressed as a Genetic Distance of 5/37 from the modal haplotype (i.e. the 3rd row from the top, which - to remind you - is a virtual signature constructed from the most frequent values for each of the markers).

Now a GD of 5/37 between two men would mean that they do not appear in each other's list of matches (because FTDNA have set the threshold for "declaring" a match at 4/37 or less). But among Jim's list of matches at the 37 marker level, there are two members of Lineage II (with a GD of 4/37). And at the 67 marker level, Jim has 6 members of Lineage II among his matches (with a GD of 6 to 7/67). So on the surface it looks as though Jim is relatively closely related to our Lineage II group. And this suggests (again on the surface) that there may be a common ancestor some time in the past several hundred years, maybe somewhere between 1700 and 1850 (on the basis of TMRCA calculations from the TiP Report).

So what do we do next? Do we start looking for documentary evidence? Do we go back to the church records and land records and old newspapers to see if there is mention of a Gleeson-Treacy connection? 

We could do. But it would be a wild goose chase. Because the Treacy-Gleeson connection is a red herring. And we know this because we have done SNP testing.

Jim has done the Big Y test, as have 10 of the members of Lineage II. Both Jim and the Lineage II members belong to Haplogroup R, and they share some SNP markers in common. Each marker characterises a branching point in the Tree of Mankind, and a SNP Progression is a list of these SNP markers down to the finer, "more downstream" branches of the Tree. Here are the SNP Progressions for Jim and for the Lineage II Gleesons:
  • Jim Treacy: R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > A557 > Z29008 > A10891
  • Lineage II Gleesons: R-P312 > Z290 > L21 > DF13 > ZZ10 > Z255 > Z16437 > Z16438 > BY2852 > A5631
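Finding the point where two SNP Progressions part company is just a walk down both lists until they differ. A minimal sketch (the function name `last_shared_snp` is our own), using the two progressions above:

```python
from __future__ import annotations


def last_shared_snp(a: list[str], b: list[str]) -> str | None:
    """Most downstream SNP that two progressions (listed upstream first) have in common."""
    shared = None
    for x, y in zip(a, b):
        if x != y:
            break
        shared = x
    return shared


treacy = ["R-P312", "Z290", "L21", "DF13", "ZZ10", "Z255", "Z16437", "A557", "Z29008", "A10891"]
gleeson = ["R-P312", "Z290", "L21", "DF13", "ZZ10", "Z255", "Z16437", "Z16438", "BY2852", "A5631"]
print(last_shared_snp(treacy, gleeson))  # Z16437
```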

You can see that the branching points are exactly the same ... until marker Z16437. Thereafter, Jim goes down one branch and the Gleesons go down another. Now, let's be clear: the Gleesons and Jim do share a common ancestor. And if he were around today he would test positive for the SNP marker Z16437. But his descendants evolved along different paths - one path taking us down to our present-day Jim Treacy, the other taking us down to our present-day Gleesons. You can see where Jim and the Gleesons are placed on the Tree of Mankind in the diagram below.

Gleesons to the left, Treacy to the right, & about 1500 years in between

And when did this common ancestor live? YFull date the formation of Z16437 to 1650 years ago. The two markers downstream of this, A557 (Jim Treacy) and A5631 (Gleeson), both have formation dates of 1400 years ago. So from this we can say that the common ancestor of Treacy & the Gleesons lived somewhere between 1400 and 1650 years ago. Or to give it an actual date (by subtracting from 1950, the approximate birth year for members of Lineage II), sometime between 300 and 550 AD.
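The conversion from "years ago" to calendar dates is a subtraction from the reference year. Here is a quick check of the arithmetic, assuming 1950 as the reference year as stated above (the helper name `to_calendar_year` is hypothetical):

```python
REFERENCE_YEAR = 1950  # approximate birth year for members of Lineage II, as used above


def to_calendar_year(years_before: int, reference_year: int = REFERENCE_YEAR) -> int:
    """Convert an age in years before the reference year to a calendar year (AD)."""
    return reference_year - years_before


# YFull formation estimates quoted above, in years before present
earliest = to_calendar_year(1650)  # formation of Z16437
latest = to_calendar_year(1400)    # formation of A557 and A5631
print(f"common ancestor lived c. {earliest}-{latest} AD")  # common ancestor lived c. 300-550 AD
```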

This is clearly a lot further back in time than the 1700-1850 AD estimate suggested by the STR data.

So this is a great example of Convergence. By chance, Jim's STR signature has evolved over time to approximate that of the Gleeson's of Lineage II and as a result, he looks a lot more closely related to the group than he actually is.

Maurice Gleeson
May 2017

* A big thank you to Jim for allowing me to use his name and his results in this example

Tuesday, 2 May 2017

Origin of Thomas Gleason 1609-1687

      Everyone who uses John Barber White’s book, Genealogy of the Descendants of Thomas Gleason…, as a research source should be aware that White himself had doubts about the origin of Thomas Gleason that he suggested in his work. “Of the parentage and birthplace of this Thomas Gleason no positive knowledge has been obtained,” he states in his brief foreword. One cannot fault him for trying, however, although his hypothesis has been found to be incorrect. The book was written in 1909 and certainly genealogical research was a challenge in those times. Today we are fortunate that resources are available everywhere, often digitized and online. One hundred and five years after White, the true origin of Thomas was published, and is summarized here:

THOMAS GLEASON was christened as Thomas Gleson in Cockfield, Suffolk, England, 3 September 1609, the son of Thomas and Anne (Armesby) Gleson.¹ He died in Cambridge, Massachusetts, about 1687.² On 31 July 1634 he married Susan[na] Page in Cockfield.³ She was baptized on 4 December 1614 in Ingham, Suffolk, the daughter of Thomas and Susanna (___) Page of Ingham and Hawstead, Suffolk.⁴ Susanna (Page) Gleason is believed to have died in Boston, Massachusetts, 24 January 1691.⁵

For an overview of all the relevant Suffolk records concerning this family, including Thomas’s parents and four children, see: Judith Gleason Claassen, “The Origin of Thomas Gleason of Watertown and Cambridge, Massachusetts,” NEHGS Register, Vol. 168 (January 2014), 5-15.

1 Original parish registers of Cockfield, Suffolk, at Suffolk Record Office, Bury St. Edmunds (SROB), FL552/4/3; FL552/4/1; Transcripts of parish registers of Cockfield, Suffolk, 1561-1922 [FHL 0,993,235], 42, 49. The name Armesby is written variously in the register as: Armsby, Armsbye, Armsbie, and Arnsby. The baptisms of four children are also found in these records.
2 Thomas is presumed to have been living at the time of Anna (Hanna) Winn’s death in 1686, since she left him a bequest in her will, yet no longer living when his daughter Ann was baptized as an adult in January of 1687/8.
3 SROB FL552/4/4; Cockfield Transcripts [note 1], 74.
4 William Brig, The Parish Register of Ingham, Co. Suffolk: Baptisms 1538 to 1804, Marriages 1539 to 1787, Burials 1538 to 1811 (Leeds: Knight, 1909), 9. The E-book is available online at; SROB IC500/2/57; Suffolk Family History Society, Suffolk Burial Index, CD-ROM (2005), Index by Parish, Noz-Pee: 532. Wife Susanna was buried 1631, a second wife, Elizabeth, in 1645.
5 SC1/series 45X, Massachusetts Archives Collection, vol. 37: 64; Daniel Angell Gleason, “Thomas Gleason (Leson) and Susanna Page,” manuscript in R. Stanton Avery Special Collections Department of NEHGS, Mss A4060, Part I:24. "Widow Gleason" was included in the list of charges of the keeper of the Suffolk County gaol. If not Susanna, this widow may have been the relict of son Philip.

Judith Gleason Claassen
May 2017