Thursday, 30 March 2017

The Cork Gleeson's - analysis of Surname Distribution Maps

In the previous post, evaluation of the surnames of matches of the two members of the Cork Gleeson's group (Lineage V) returned an array of relatively unusual surnames. In this post, we explore where these surnames were distributed within Ireland and the UK based on 1881 (UK) and 1901 (Ireland) census data available from the Forebears website.

The matches of the two members included the following surnames and their relative frequencies:

Matches of DG-8367 (at 37 marker level)
Anglin/Anglen ... x14
Conner ... x3 (1 from Carlow)
Enderton ... x2
Enright ... x3
Guthrie ... x3 (same ancestor)
Mauck ... x2
McEachern ... x2 (1 from Islay, Scotland)
O'Donoghue ... x2
Popkins ... x2 (1 from Orkney)
Rippere ... x2
Roddy/Ruddy ... x9 (mainly from Co. Down; most have same ancestor)
Sinclair ... x2 (1 from Islay, Scotland)
Spain ... x7
Wright ... x2

Matches of PMG-6085 (at 67 marker level)
Anglin ... x4
Conner ... x3 (as above)
Dever ... x2 (both from Kilmacrennan, Donegal)
Devine ... x6 (all from Conwal, Co. Donegal; most have same ancestor)
Enderton ... x2
Guthrie ... x3 (as above)
Leonard ... x2
Mauck ... x2
McCauley ... x3 (1 from Islay, Scotland)
Moore ... x3 (same ancestor)
Roddy/Ruddy ... x9 (as above)
Wright ... x2

Below are distribution maps relating to each of the surnames in the list (click to enlarge individual maps).

The maps above show that some of these surnames (specifically Anglin, Enright, O'Donoghue and Spain) are clearly concentrated in the southern half of Ireland. This distribution is not exclusive to Ireland: there are reasonable numbers of all these surnames within the UK in 1881, which is suggestive of emigration from Ireland to these areas.

Other surnames are clearly dominant in the northern part of Ireland (examples being Dever, Devine, McCauley, Roddy & Ruddy), whilst others are predominantly seen in Scotland (Guthrie, McEachern, Sinclair) or England (Conner, Popkins, Wright). Some surnames are fairly ubiquitous throughout the islands (Leonard, Moore).

So all in all, there is no clear signal that the Cork Gleeson's are associated with any particular geographical area. There is some evidence that supports a southern Irish (SI) association, some to support a northern Irish (NI) association, and some to support a possible origin in Scotland (SC) or England (EN).
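The regional spread can be put into rough numbers with a short Python sketch. It takes the match counts for DG-8367 from the list above and the region labels (SI, NI, SC, EN) used in this post; the function name `tally_by_region` and the "unassigned" label are my own for illustration. Bear in mind that many matches share the same ancestor (e.g. the Roddy/Ruddy and Guthrie groups), so raw counts overweight those families and the totals are indicative only.

```python
from collections import Counter

# Match counts for DG-8367 at the 37-marker level, copied from the list above.
# Variant spellings (Anglin/Anglen, Roddy/Ruddy) are keyed by the first spelling.
matches = {
    "Anglin": 14, "Conner": 3, "Enderton": 2, "Enright": 3, "Guthrie": 3,
    "Mauck": 2, "McEachern": 2, "O'Donoghue": 2, "Popkins": 2, "Rippere": 2,
    "Roddy": 9, "Sinclair": 2, "Spain": 7, "Wright": 2,
}

# Region given to each surname in the text above.
region_of = {
    "Anglin": "SI", "Enright": "SI", "O'Donoghue": "SI", "Spain": "SI",
    "Dever": "NI", "Devine": "NI", "McCauley": "NI", "Roddy": "NI",
    "Guthrie": "SC", "McEachern": "SC", "Sinclair": "SC",
    "Conner": "EN", "Popkins": "EN", "Wright": "EN",
}

def tally_by_region(matches, region_of):
    """Add up the number of matches per region; unknown surnames go to 'unassigned'."""
    totals = Counter()
    for surname, count in matches.items():
        totals[region_of.get(surname, "unassigned")] += count
    return totals

totals = tally_by_region(matches, region_of)
for region, count in totals.most_common():
    print(region, count)
```

On these figures the southern Irish surnames account for 26 of the 55 matches, but no single region is exclusive, which is consistent with the lack of a clear geographical signal.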

It is not immediately apparent why there is such a geographical spread of surnames associated with the genetic signature of the Cork Gleeson's. However, there are three possible explanations. Some (in all probability) represent pre-surname cousins of the Cork Gleeson's whilst others could be the result of Surname or DNA Switches (SDSs or NPEs*) in the last several hundred years. It is also possible that the genetic signature of the Cork Gleeson's is itself the result of an SDS - and everybody has (on average) a 50% chance that this is the case. At this stage of the game it is not possible to tell which came first - the Gleeson chicken or the Popkins egg. Only further recruits to this genetic family will help answer the question.

A third possibility is that there is a degree of Convergence within this group of apparently unrelated surnames and that some of these matches are merely coincidental i.e. the common ancestor is a lot further back than might be suspected from the closeness of the match. Is there any evidence of this? Can we tie in some of the SNP data to some of the surnames? Possibly ...

The Cork Gleeson's are likely to sit somewhere below FGC13411

The analysis of the terminal SNPs of the matches was discussed in the previous post, and concluded that there were two possible downstream branches on which the Cork Gleeson's might sit, characterised by the (currently) terminal SNPs FGC32363 or BY11289, both crudely dated to 1650 years ago. According to Mike Walsh's descendancy tree for the L513 subclade, FGC32363 is associated with the surnames Roddy (NI) and Anglin (SI), whilst BY11289 is associated with Dever & Devine (both NI). Additionally, reviewing the matches list of the two Cork Gleeson members reveals that the surnames McCauley (NI), Roddy (NI) and Wright (EN) are associated with FGC32363 (or one of its downstream branches), whilst Dever (NI), Devine (NI), Moore, and Rooney (NI) are associated with BY11289. Thus there is currently too little data to suggest that either branch is likely to be associated with a particular geographic region.

However, we will have the answer soon enough - one of the members has just ordered the new S6365 SNP Pack and the results will reveal where the Cork Gleeson's sit on the Human Evolutionary Tree. 

Maurice Gleeson 
March 2017

*NPE, Non-Paternal Event

Monday, 27 March 2017

The Cork Gleeson's - birth of a new genetic family

At the Gleeson Clan Gathering in August last year, I presented an evaluation of the DNA evidence thus far and asked the question: does the DNA evidence support what we know of the origins of the Gleeson surname from historical texts and documentary evidence?

You can watch a video of the presentation here (the relevant section starts at 21:25). The review of historical texts included an evaluation of some of the following strands of evidence. This analysis needs to be completed and I will attempt to summarise the findings in subsequent blog posts but for now I include links (where available) to the main sources in case anybody wants to explore these themselves:
  • Surname Dictionaries written by MacLysaght, Woulfe and O'Hart (you can read about these in a previous blog post here)
  • accounts of the Gleeson name in Ancient Texts such as ... 
  • secondary sources such as books by the following authors:
    • Dermot F. Gleeson - The Last Lords of Ormond (1938) and A History of the Diocese of Killaloe (1962) ... (watch a video of a lecture about him by local historian Danny Grace here, from the Gleeson Clan Gathering, August 2016)
    • Rev. John Gleeson - History of the Ely O'Carroll Territory (1915)
    • David Austin Larkin - Irish Septs (2007)
  • articles from academic journals, including ...
    • JCHAS, Journal of the Cork Historical & Archaeological Society (1895-1945)
    • THJ, Tipperary Historical Journal (1988+)
    • NMAJ, North Munster Antiquarian Journal (1936+)
  • Surname Distribution maps
  • other miscellaneous sources (e.g. Clans of Ireland website, etc)

One of the main conclusions from this overview was that there are potentially three separate origins to the Gleeson surname and its variants - one in North Tipperary (represented in all probability by Lineage II within the DNA project), one in Ulster (McGlashin), and one in East Cork, centred around the area known as Imokilly from the Gaelic Uí Macc Caille or Uí Meic Caille (a group I have dubbed "the Imokilly Gleeson's"). The Imokilly Gleeson's have been present in Cork since the 1100s when they were land stewards ("betaghs") for the FitzGerald family (1). However, one of the enduring questions that has continued to intrigue me is: did the Imokilly Gleeson's and the North Tipperary Gleeson's arise from the same stock ... and therefore, will they bear the same genetic signature?

Well, it would appear from the recent results of a new project member that the answer to that question is: No - the North Tipperary and Imokilly Gleeson's are separate and arose from different origins ... at least for now (until further DNA evidence emerges to the contrary).

The Uí Glaisín were a sub-sept of the Uí Meic Caille who were a sub-sept of the Uí Líatháin

Up till now, there has been no one with the surname Gleeson from East Cork in the project. But that changed recently thanks to the efforts of Phyl Irwin from Cork and David Gleeson from Thurles who are helping to recruit new members to the project from the Cork area. David paid a visit to Youghal (in the heart of Imokilly) and recruited a new member (DG-8367*) who has a Gleeson pedigree that goes back to 1833 in Pilmore, Youghal, Co. Cork. This new member's Y-DNA-37 results came back from the lab last week and they are very revealing indeed!

STR marker analysis

DG-8367 (G-109) has 98 matches at the 37 marker level and 286 at the 25 marker level. But his closest match is another project member who until now has been residing in the Ungrouped section. DG-8367 matches PMG-6085 (G-101) with a Genetic Distance of 1/37 (i.e. one step away from an exact match). Together these two close matches have formed a new genetic family, Lineage V (the Cork Gleeson's). 

Neither of these two individuals has any other close matches within the project.

The single mutation between these two individuals is on the marker CDYb, which is a very fast mutating marker, and this supports a very close relationship between the two project members. The second project member (PMG-6085) has a contradictory family history - some family members maintaining a Belfast origin for their Gleeson ancestors and some maintaining a Cork origin. The DNA results would lean strongly toward the latter.
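For readers who want to see how a Genetic Distance such as 1/37 arises, here is a minimal Python sketch. It assumes a simple stepwise count (the sum of the differences in repeat numbers across the markers both men have tested); the testing company's own calculation may treat some markers differently. The marker values are hypothetical and are not the members' real results.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of absolute repeat-count differences over the markers both men tested."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared_markers)

def differing_markers(haplotype_a, haplotype_b):
    """Names of the shared markers on which the two haplotypes differ."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sorted(m for m in shared_markers if haplotype_a[m] != haplotype_b[m])

# Hypothetical repeat counts for a handful of markers (illustration only).
member_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "CDYa": 36, "CDYb": 38}
member_2 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "CDYa": 36, "CDYb": 39}

print(genetic_distance(member_1, member_2))
print(differing_markers(member_1, member_2))
```

In this made-up example the only difference is a single step on CDYb, giving a Genetic Distance of 1, which mirrors the situation described above.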

The TiP Report for these two people suggests that there is a 95% chance of them being related in the last 7 generations and a 74% chance within the last 3 generations. Taking the former estimate, this would mean that the two families of Lineage V have carried the Gleeson surname since the mid-1700s (i.e. 7 generations x 30 years = 210 years; counting back from a birth year of about 1950 gives 1740). However, it is not clear if the surname was associated with the same genetic signature prior to this time. We would need more Cork Gleeson's (with a more distant relationship to these two members) to join the project in order to get a better estimate of how far back in time the Gleeson name and the particular DNA signature of Lineage V are associated. It may only go back as far as the 1700s, or it might go all the way back to the origin of the surname about 1000 AD. At this stage, we simply cannot say one way or the other.
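The back-of-the-envelope date calculation can be written out as a small Python sketch. The 30 years per generation and the reference birth year of 1950 are the figures used in the text; the function name is my own, and both figures are assumptions that can be varied to see how sensitive the estimate is.

```python
def estimated_ancestor_year(generations, years_per_generation=30, reference_birth_year=1950):
    """Count back a number of generations from a reference birth year."""
    return reference_birth_year - generations * years_per_generation

# 95% chance of a common ancestor within 7 generations
print(estimated_ancestor_year(7))

# 74% chance of a common ancestor within 3 generations
print(estimated_ancestor_year(3))

# The same 7 generations with a shorter assumed generation length of 25 years
print(estimated_ancestor_year(7, years_per_generation=25))
```

With 30-year generations the 7-generation estimate lands on 1740; shortening the generation length to 25 years moves it to 1775, so the "mid-1700s" figure should be read as approximate.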

Comparing the pedigrees of both men does not reveal a common ancestor. The MDKA (Most Distant Known Ancestor) for DG-8367 is a Michael Gleeson born about 1833 in Pilmore, Youghal, Co. Cork; and for PMG-6085, the MDKA is Patrick Gleeson born about 1852 in "Ireland" (possibly Cork). These two MDKAs may have been brothers or first cousins, making our two project members 3rd or 4th cousins. This could be explored further if the two members both undertook autosomal DNA testing and/or upgraded to 67 or 111 STR markers.

Surnames of Matches

Looking through the list of surnames in each of these two men's matches, what stands out is a number of repeating surnames, some of them rather unusual (e.g. Anglin, Mauck, Popkins, Rippere). Where available, the place of origin of the MDKA throws up interesting conundrums - Islay, Orkney, Co. Down, Co. Donegal ... but no obvious origins in Cork.

Matches of DG-8367 (at 37 marker level)
Anglin/Anglen ... x14
Conner ... x3 (1 from Carlow)
Enderton ... x2
Enright ... x3
Guthrie ... x3 (same ancestor)
Mauck ... x2
McEachern ... x2 (1 from Islay, Scotland)
O'Donoghue ... x2
Popkins ... x2 (1 from Orkney)
Rippere ... x2
Roddy/Ruddy ... x9 (mainly from Co. Down; most have same ancestor)
Sinclair ... x2 (1 from Islay, Scotland)
Spain ... x7
Wright ... x2

Matches of PMG-6085 (at 67 marker level)
Anglin ... x4
Conner ... x3 (as above)
Dever ... x2 (both from Kilmacrennan, Donegal)
Devine ... x6 (all from Conwal, Co. Donegal; most have same ancestor)
Enderton ... x2
Guthrie ... x3 (as above)
Leonard ... x2
Mauck ... x2
McCauley ... x3 (1 from Islay, Scotland)
Moore ... x3 (same ancestor)
Roddy/Ruddy ... x9 (as above)
Wright ... x2

In addition to the above, there are two Norman names that stand out, although these are only single occurrences - FitzPatrick and de Burca. Nevertheless, the area around Imokilly was for many centuries a Norman stronghold and both the Barry family and the FitzGerald family held sway here.

Even though these surnames do not appear to be quintessentially Gaelic, there is not a huge amount that can be concluded currently from this interesting array of names. However, a more complete surname analysis (using surname distribution maps) will be included in a subsequent blog post.

SNP marker analysis

Here is the list of "downstream" terminal SNPs associated with the matches of these two members:

Matches of DG-8367 (at 37 marker level)
BY11270 ... x1 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411
BY390 ... x2 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411 > FGC32363
CTS10651 ... x1 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411 > FGC32363
CTS3087 ... x2 ... sub L513 > S6365 > BY16
FGC13422 ... x1 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411 > FGC32363
FGC32363 ... x1 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411
L513 ... x6
Z17626 ... x1 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411 > BY11289

Matches of PMG-6085 (at 67 marker level)
BY11270 ... x1 ... (as above)
BY11284 ... x1 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411 > BY11289
BY11286 ... x1 ... sub L513 > S6365 > BY16 > CTS3087 > FGC13411 > BY11289
BY390 ... x2 ... (as above)
CTS10651 ... x1 ... (as above)
CTS3087 ... x2 ... (as above)
FGC13422 ... x1 ... (as above)
FGC32363 ... x1 ... (as above)
L513 ... x4
Z17626 ... x1 ... (as above)

This analysis strongly suggests that these project members sit on the L513 branch of the Y-Haplotree (Human Evolutionary Tree) and furthermore, sit on one of the branches downstream of the SNP marker FGC13411. The overall SNP Progression for Lineage V is therefore likely to be as follows:
R-P312/S116 > Z290 > L21/S145 > DF13 > L513/S215/DF1 > S6365 > BY16 > CTS3087 > FGC13411 > ... and either FGC32363 or BY11289
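To illustrate the reasoning, here is a Python sketch that rebuilds the relevant part of the tree from the "sub ..." paths listed above and counts how many of DG-8367's matches sit at or below a given SNP. The parent links are read off those paths (FGC32363 is placed under FGC13411, as in the BY390 path); the function names are my own. Matches reported only as L513 or CTS3087 may simply not have tested any further downstream, so they neither support nor contradict the placement.

```python
# Parent of each SNP, read off the "sub ..." paths listed above.
PARENT = {
    "S6365": "L513", "BY16": "S6365", "CTS3087": "BY16", "FGC13411": "CTS3087",
    "BY11270": "FGC13411", "FGC32363": "FGC13411", "BY11289": "FGC13411",
    "BY390": "FGC32363", "CTS10651": "FGC32363", "FGC13422": "FGC32363",
    "Z17626": "BY11289",
}

# Terminal SNPs among DG-8367's 37-marker matches, with the number of matches.
TERMINAL_COUNTS = {
    "BY11270": 1, "BY390": 2, "CTS10651": 1, "CTS3087": 2,
    "FGC13422": 1, "FGC32363": 1, "L513": 6, "Z17626": 1,
}

def path_from_root(snp):
    """Return the list of SNPs from L513 down to the given SNP."""
    path = [snp]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

def matches_at_or_below(snp):
    """Number of matches whose terminal SNP is the given SNP or downstream of it."""
    return sum(
        count
        for terminal, count in TERMINAL_COUNTS.items()
        if snp in path_from_root(terminal)
    )

print(" > ".join(path_from_root("Z17626")))
for snp in ("L513", "FGC13411", "FGC32363", "BY11289"):
    print(snp, matches_at_or_below(snp))
```

Of the 15 matches with a reported terminal SNP, 7 sit at or below FGC13411, with 5 of those on the FGC32363 side and 1 on the BY11289 side, which is why both downstream branches remain candidates.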

The Cork Gleeson's are likely to sit somewhere below FGC13411

This could be confirmed by further SNP marker testing. The L513 SNP Pack has recently been overhauled and split into 3 downstream SNP packs. The new S6365 SNP Pack includes all the SNP markers relevant to Lineage V and for those interested in further downstream SNP testing, this new SNP Pack or the Big Y test are the preferred options.

YFULL provides the following approximate dates for the emergence of the various SNP markers (and hence the branching points they represent):
  • L513 ... 4300 years ago
  • S6365 ... 3900 years ago
  • CTS3087 ... 3600 years ago
  • FGC32363 ... 1650 years ago
  • Z17626 ... 1650 years ago
So further SNP marker testing will move these project members further downstream along the Haplotree to about 1650 years ago, which is about 350 AD, a time when the Gaelic clans were just beginning to emerge.

Looking at the L513 section of the Haplotree on Alex Williamson's Big Tree, and in particular at the FGC13411 sub-branch, reveals that it contains a mixture of Scottish and Irish names. There are too few people tested at this point to draw any firm conclusions about the origins of the different branches below FGC13411.

The FGC13411 sub-branch of the L513 portion of the Y-Haplotree

Turning to the L513 Haplogroup Project, the Administrator Mike Walsh has produced an excellent diagram of the various branches of this particular subgroup and you can view it and download it here. Mike's version incorporates more data than the Big Tree diagram above.

The FGC13411 portion of Mike Walsh’s experimental L513 Descendant Tree (version Mar 25, 2017).
The current version is maintained at
Use permitted courtesy of Mike Walsh. 

Interestingly, most of the names below FGC13411 (top left of the diagram above) appear to be of Irish origin, with some being assigned to "Ireland West" (Butler, Scanlan, Sears, Scott, Anglin, Breen), some being assigned to "Ireland North" (Roddy), and some to both (Devine, Dever, Devins). Possibly the most interesting name among this group is Butler, a well-known Norman name. The Butlers of Ormond held sway over large parts of Munster during the Middle Ages and their castles are still evident to this day. In fact, I visited one just last week!

Me in front of Cahir Castle - owned by the Butlers from 1375 to 1961


There is little doubt that these two matches form a new genetic family within the project. Several questions remain, however: where did they come from, and are we seeing the genetic signature of the Imokilly Gleeson's?

Their genetic signature is quite unique and separate from the North Tipperary Gleeson's of Lineage II.  There is good evidence from their respective genealogies that their Gleeson ancestors go back to Cork in the early 1800s. But as the two individuals are very closely connected, could we be looking at a relatively recent DNA switch (NPE**) within the last several hundred years in this particular Gleeson branch? It certainly is a possibility. Only by recruiting additional Cork Gleeson's can we further elucidate how long the Gleeson name has been associated with this particular DNA signature, and whether or not it goes back to the original Imokilly Gleeson's of the 1100s. 

The surnames of their matches are an interesting array of rather unusual names, some of them originating in Scotland, others in the north of Ireland, and few (apparently) from the Cork area. The project administrators of the relevant haplogroup projects and geographic projects might be able to throw some more light on this.

The SNP marker analysis suggests a pretty exact position on the Y-Haplotree (i.e. somewhere below FGC13411) and further SNP testing with the R1b-S6365 SNP Pack (or Big Y) should help clarify this. But the mixture of Scottish and Irish surnames associated with this particular portion of the Y-Haplotree is not currently illuminating.

So overall, we could be looking at the genetic signature of the Imokilly Gleeson's ... or we could be looking at a relatively recent DNA switch within the last 200-400 years. Hopefully we can recruit more Gleeson's from Cork and the nature of this new genetic family will become more clear over time. Nonetheless, this is an important and exciting advance for the DNA Project.

Maurice Gleeson
March 2017

(1) The Pipe Roll of Cloyne. JCHAS, 1915, Vol. 21, No. 107, pages 136-145 ... the territory of Oglassyn is mentioned on page 143. I need to dig out the specific reference to the Gleeson's being local land stewards.

* for added security and privacy, the ID number consists of the individual's initials and the last 4 digits of the kit number. The G numbers used on our WFN website are also included.

** NPE, Non-Paternity Event, refers to a Surname or DNA Switch (SDS). The causes of such a switch are manifold and may include adoption, legal name change, a young widow remarrying, infidelity, illegitimacy, social climbing, swearing of allegiance, etc. See the video below from 6:30 onwards for a full explanation.

Here is the video that explores whether the DNA evidence supports what we know of the origins of the Gleeson surname from historical texts and documentary evidence. The section that deals with the Imokilly Gleeson's starts at 21:25.

Friday, 16 December 2016

L2 MHT - Recommendations for further SNP Testing

So far in Lineage II, we have SNP marker data from 10 "Big Y" tests - the tenth one came in last week and is currently being assessed (member G65, 371202, MGG). Together with the STR marker data (i.e. the Y-DNA-37 test most people took when joining the project), this has allowed us to group people into separate branches of the Mutation History Tree for Lineage II (the L2 MHT for short; as discussed in the recent posts).(1) The pioneers of Lineage II who undertook this Big Y testing have paved the way for a more cost-effective, money-saving SNP testing strategy for the rest of the group, so a debt of gratitude is owed to these project members. Thank you, guys! 

As a result, only in rare circumstances will it be necessary for new members to do Big Y testing ... this will only happen if there is a new distant outlier to the rest of the Lineage II group. And this will happen from time to time as more people join our group.

The new SNP Testing Strategy involves collaboration with another DNA testing company - YSEQ. This is a specialist company headed by husband and wife team, Thomas and Astrid Krahn. Both of them used to work at FamilyTreeDNA but left to form their own company to address a gap in the market. Their niche company offers bespoke SNP testing. In discussions with Thomas, we have come up with a highly cost-effective strategy for further SNP testing in Lineage II. This will allow us to achieve our goals and save money at the same time.

The SNP Testing Strategy is in three phases:
  • Phase 1 - the YSEQ Z255 SNP Panel
  • Phase 2 - SNP Block testing
  • Phase 3 - Private SNP testing
All three phases are fluid and a person may jump between Phases 2 and 3 depending on what the results of new testing reveal.

Phase 1 - the YSEQ Z255 SNP Panel
This will be the starting point for most new members, for most people who have only tested to the 37 marker level, and for the 9 members who are currently ungrouped in our L2 MHT.  The YSEQ Z255 SNP Panel includes 6 SNP markers that are immediately relevant to Gleeson Lineage II.

First off, YSEQ is offering us discounted testing with their Z255 SNP Panel. Luckily for us, their Z255 SNP Panel offers more Lineage II-relevant SNP markers than the corresponding Z255 SNP Pack from FTDNA. YSEQ's Panel covers all the six "most downstream" branches identified so far for Lineage II. These branches are represented by the SNPs in the green boxed portion of the diagram below, namely A5631 (aka Y17108), A5628 (aka Y17112), A660, Y16880, A10634 (aka BY5706), & A10640 (aka BY5707). In addition, new branches can be added to this panel over time, so it will adapt as Gleeson Lineage II grows.

YSEQ's Z255 SNP Panel covers all 6 confirmed branches of Lineage II
FTDNA's Z255 SNP Pack (Dec 2016) - only covers 1 of the 6 branches of Lineage II

The usual price of the YSEQ Z255 SNP Panel is $88 (cheaper than FTDNA's panel) but we are getting it for a considerable discount. Not everyone within Lineage II needs to do this as the revised Mutation History Tree predicts which branch many people are likely to fall on. And simply doing a single SNP test with YSEQ (for $17.50 or less) might be all that is needed to confirm the prediction. More money savings! 

So only the following people need do the Z255 Panel (because their placement on the tree is uncertain and needs clarification):
  • the 9 people who are currently ungrouped in the L2 MHT - see footnote (2)
  • Branch E - any one of the 3 members (G91, G75, G84) could do the test - this will tell us whether or not this Branch is more closely related to Branch B or Branch C
  • Branch F - one of G97 or G98 could do the test to confirm or refute their supposed close connection to G65 (371202, MGG)
So at the moment, the recommendation is for 11 people in total to take the YSEQ Z255 SNP Panel. I will be writing to these people individually over the course of the next week in order to organise funding and ordering of the new kits (they will have to swab again as it is a different company).

Phase 2 - SNP Block Testing
For those people whose sub-branch is easily predicted from their STR results, the Z255 SNP Panel is not necessary. Instead they can jump straight to single SNP testing to confirm which of the 6 major branches they belong to. Once this has been confirmed, the next step would be to test for any "phylo-equivalent SNPs" - in other words, SNPs that occur in the same "SNP block" ... and there are 4 of these in Lineage II, highlighted in the diagram below. This stage of the testing is likely to split these 4 blocks into smaller blocks or even single SNPs, and in this way the number of branches associated with Lineage II will increase, possibly to as many as 18 unique "Gleeson" branches.

The 4 current SNP Blocks of Lineage II

Branch A of the L2 MHT
Currently, there are two people in the project that are at this stage of testing, both of them in Branch A (G54 & G79). These two people are predicted to belong to Branch A (both in the L2 MHT & the SAPP Tree). Both members should test for the single SNP A660 and if positive (95% probability) they should then go on to test for the remaining five SNPs in that particular SNP Block. The chances are that they will be positive for some of them and negative for others. This will "split" the block into two pieces. For example, G54 & G79 may test positive for the first 3 SNPs and negative for the other three. The block will thus be split into 1 section with the first 3 SNPs (with G54 & G79 sitting underneath it) and a second section beneath that with the last 3 SNPs, and the current members (the 3 brothers - G39, G51, G73) sitting underneath that.

In addition, this would identify a new branching point within Branch A, and one (or more) of the defining SNPs could then be added to YSEQ's Z255 SNP Panel. This would benefit future testers.
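The block-splitting logic can be sketched in a few lines of Python. This assumes that the existing Branch A members are positive for every SNP in the block (which is what makes it a block in the first place) and that a new tester's results are simply positive or negative for each SNP. Only A660 is named in the text; the other five labels are placeholders, and the function name is my own.

```python
def split_block(block, new_tester_results):
    """Split a SNP block using a new tester's results (True = positive).

    SNPs the new tester shares form the upstream section; SNPs he lacks form
    the downstream section, which stays with the existing members.
    """
    upstream = [snp for snp in block if new_tester_results[snp]]
    downstream = [snp for snp in block if not new_tester_results[snp]]
    return upstream, downstream

# A660 plus five placeholder labels standing in for the other SNPs in the block.
block = ["A660", "snp_2", "snp_3", "snp_4", "snp_5", "snp_6"]

# Hypothetical results: positive for the first 3 SNPs, negative for the last 3.
results = {
    "A660": True, "snp_2": True, "snp_3": True,
    "snp_4": False, "snp_5": False, "snp_6": False,
}

upstream, downstream = split_block(block, results)
print("Shared by new tester and existing members:", upstream)
print("Existing members only:", downstream)
```

If either list comes back empty, the block has not been split: the new tester either shares the whole block or branches off above it.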

Phase 3 - Private SNP testing
The third phase is testing of Private / Unique SNPs (i.e. SNPs that are currently unique to one or more specific individuals). These Unique SNPs have been identified via a collaborative effort of many individuals, including Thomas Krahn himself, Alex Williamson, James Kane, the teams at FTDNA and YFULL, various Haplogroup Project Administrators, and many others. You can read previous blog posts of how these SNPs were identified below:

The current set of Unique SNPs associated with Lineage II members (3)
(red dots indicate SNPs covered by YSEQ's Z255 SNP Panel)

Currently, there is no one in the project at this phase of testing. But if the Z255 Panel testing of Branch F members confirms a connection between G65 (371202, MGG) and either G97 (524241, PG) or G98 (437908, WG), then G97 & G98 would undertake single SNP testing of the Private / Unique SNPs of G65 (our 10th & most recent Big Y tester). He appears to have 9 Private SNPs (based on preliminary analyses). Testing all 9 SNPs individually would usually cost $351 at FTDNA ($39 per SNP), and $157.50 at YSEQ ($17.50 per SNP = 55% cheaper) but Thomas is offering substantial reductions from the usual YSEQ price if we order in bulk. This SNP Testing Strategy is a lot less expensive than doing the Big Y test at FTDNA (usually $575).

Similarly, if any of the 9 currently-ungrouped individuals (doing the Z255 Panel) test positive for BY5706 as a terminal SNP (and therefore fall into Branch D), they can start single SNP testing of the Private SNPs of members G05 (86192, RLG) & G68 (411177, CG). G68 has 5 Private SNPs and G05 has 1, so it would usually cost $234 at FTDNA, and $105 at YSEQ, but again, Thomas is offering a substantial discount on the usual YSEQ prices if we order in bulk.
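The cost comparisons above can be checked with a short Python sketch. It uses the usual per-SNP prices quoted in the text ($39 at FTDNA, $17.50 at YSEQ); the bulk discount is not modelled because its size is not stated, and the function name is my own.

```python
FTDNA_PRICE_PER_SNP = 39.00   # usual single-SNP price quoted above
YSEQ_PRICE_PER_SNP = 17.50    # usual single-SNP price quoted above

def single_snp_costs(number_of_snps):
    """Return (FTDNA cost, YSEQ cost) for testing the given number of SNPs individually."""
    return number_of_snps * FTDNA_PRICE_PER_SNP, number_of_snps * YSEQ_PRICE_PER_SNP

scenarios = [
    ("Private SNPs of G65", 9),
    ("Private SNPs of G68 and G05", 5 + 1),
]

for label, number_of_snps in scenarios:
    ftdna_cost, yseq_cost = single_snp_costs(number_of_snps)
    saving = 1 - yseq_cost / ftdna_cost
    print(f"{label}: FTDNA ${ftdna_cost:.2f}, YSEQ ${yseq_cost:.2f}, saving {saving:.0%}")
```

Because both prices are per SNP, the percentage saving is the same (about 55%) however many SNPs are tested; any bulk discount would come on top of that.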

Thomas proposes to test for these SNPs sequentially, as this will allow substantial reductions in the cost of doing these tests. And the more people who test, the greater the reductions will be.

Thomas will not test all the SNPs at once. Instead he does sequential testing of single SNPs, usually 1 SNP every 1-2 weeks, and thus the results will trickle in over the course of several weeks or months, and decisions on further testing of the group will be based on the results of the previous test.

In addition, Thomas will add every reasonable new branch to the YSEQ Z255 Panel where new results (e.g. from additional Big Y testing or the above SNP Testing Strategy)  identify at least two people who share the same new branching point (characterised by a new SNP discovery) and are different in at least one other SNP.

So to recap, the new SNP Testing Strategy involves the following potential steps:
  1. Test with the YSEQ Z255 SNP Panel for those whose predicted placement on the Mutation History Tree for Lineage II needs further clarification
  2. Test for phylo-equivalent SNPs (within the same SNP Block) - this may split the block
  3. Test for Private / Unique SNPs - this may identify new downstream branching points
I will write to all 29 Lineage II project members individually with a bespoke testing plan for each member. I will calculate the price, ask all project members (who are willing to take part) to pay the relevant amount to the General Fund, and then I will pay for the tests via bulk order to get the maximum discount. Thomas will then send new (YSEQ) kits to the relevant people so that they can swab their cheeks and get tested.

In this way, the MHT for Lineage II will continue to grow progressively and will increase in complexity and accuracy. This will allow people to see how closely they are related to everyone else within the group and at what timepoint the common ancestor they share with any of the other members is likely to have been born. This information will facilitate more focussed genealogical research using documentary records. (4)

I will also be writing to individual project members to review the genealogical information they have supplied. Now is a good time for everyone to update that information. At the very least everyone should have supplied their basic MDKA information and their Gleeson pedigree. But to really optimise the chances of breaking through your Brick Wall, you should have the information detailed in my previous post about the MDKA Profile. I will review this information with each project member over the course of the next several weeks.

Maurice Gleeson
Dec 2016

(1) the current pdf version of the L2 MHT is available in three parts:

(2) Here are the 9 currently-ungrouped members who should test with the YSEQ Z255 SNP Panel (listed by G-number, kit number, and initials for ease of reference):
  • G99 ... 437986 ... MIG
  • G81 ... 458407 ... BEG
  • G89 ... 498597 ... OMG
  • G77 ... B78262 ... SG
  • G70 ... 371145 ... DG
  • G78 ... 446153 ... AG
  • G92 ... 438324 ... TOG
  • G95 ... 437988 ... TEG
  • G94 ... 437987 ... GMG

(3) These are based on Alex Williamson's analysis from the Big Tree 

(4) Incidentally, Judy has rearranged the order of project members displayed on the DNA Results page of our WFN website to match the current new groupings in the L2 MHT. See below ... 

The DNA Results table for Lineage II on our WorldFamilies.Net website
(click to enlarge)

Wednesday, 14 December 2016

L2 MHT - "difficult to place" people

When we put together the revised Mutation History Tree for Lineage II (L2 MHT), there were 9 people who could not be confidently allocated to sub-branches of the tree. We are going to use two techniques to attempt to place them on the MHT and the first of these is Dave Vance's SAPP programme. We will explore Robert Casey's methodology in a subsequent post.

The SAPP programme is like a turbo-charged version of Fluxus especially designed for genetic genealogists. Fluxus is the software programme I used to help generate the first version of the MHT last year. It is a programme that uses STR data to generate a phylogenetic tree (a.k.a. cladogram or phylogram or Mutation History Tree). The SAPP programme also generates a phylogenetic tree based on STR data, but it has some additional features that make it way superior and far more elegant and user-friendly: 
  • the output is more like a family tree, and less like assembly instructions for Swedish furniture (it's an oldie but a goodie)
  • unlike Fluxus, it incorporates SNP data so that the upper branches of the tree can be anchored effectively 
  • it recognises similar STR signatures and takes these into account when grouping people 
  • it recognises people with known genealogical relationships and groups them together
These features make the SAPP programme a great time-saver and an excellent way of double-checking your work if you have created your MHT manually, as I have. It takes a lot of trial and error (40 minutes in my case) to get the data input "just right" but once you have done it correctly, the output is impressive.

Below is the SAPP Tree output from the SAPP programme for L2 (with some of my own graphic additions) and below it (for comparison) the output from my manually created L2 MHT. You can download higher quality pdf versions of these files from Dropbox by clicking on the captions below each individual graph. And that's the first point - the detail in the images is not easy to see. There is a lot of information concentrated in a small space and that makes reading it very challenging. We have the same problem when trying to navigate through our family tree - it will never all fit on the same page. Best to click on the Dropbox link (the caption below each graph) so you can view it in a separate browser window, or download the file and open it in a separate programme on your computer.

This SAPP Tree includes all 29 members of Lineage II. The 9 previously ungrouped individuals are indicated by a dashed red border around the relevant boxes. Most of them are sitting away from the SNP-confirmed branches of the tree - the exceptions are G77 under Branch A and G99 & G81 under Branch D.

What you can just about make out without enlarging the image is the colour-coded branches in each version of the tree. There is good concordance between the two trees with regards to Branch A & Branch B (both SNP-confirmed branches) - both have the same membership, the same (or complementary) STR mutations listed, and similar placement on the larger tree in relation to each other.

However, although Branch C (also SNP-confirmed) & Branch E have the same membership in both trees, & the same (or comparable) STR mutations listed, their placement in the SAPP Tree is different - SAPP says Branch E could be genetically closer to Branch C than Branch B. Further SNP testing will be needed to determine this.

Branch F is split in two in the SAPP Tree, with members G97 & G98 sitting quite distantly away from G65. Which tree has the correct placement can only be decided by additional SNP testing. The split in Branch F has also caused a split in Branch D - this is not too surprising as the two members on this branch (G05 & G68) are very distantly related and I am sure this particular branch will split into several smaller branches in due course as more SNP testing is undertaken. In addition, Branch E has also split Branch D and has been placed very differently compared to the other tree.

The SAPP Tree for Gleeson L2
(click image to enlarge, click caption to download pdf)
My L2 MHT for Gleeson Lineage II
(click image to enlarge, click caption to download pdf)

The SAPP Tree raises some important considerations for the Gleeson L2 MHT:
  • It revealed a few data omissions on my part (so it was a good way of verifying my data)
  • It generated Genetic Distance tables which I found very useful (see below). The maximum Genetic Distance (GD) between any two members in L2 was 12/37, 12/67, and 18/111 … and Adjusted GDs (taking into account Back & Parallel Mutations) were a staggering 30/37, 29/67, & 29/111
  • In Branch A, the SAPP programme identified a "more parsimonious" configuration of the branch, as a result of which I have slightly modified my version of Branch A (in other words, it identified a better configuration that made better sense of the data - see diagram below). The revised L2 MHT is available from Dropbox here.

The Old & New Versions of Branch A
  • In addition, all 9 people who could not be placed on the tree previously have now been allocated (provisionally) to specific sub-branches. This allows us to see to whom they are (potentially) most closely related. However, the confidence with which these 9 members have been placed on the SAPP Tree is relatively low (compared to the other members, who have either been SNP-tested or have relatively unique Y-STR Signatures and/or supportive Genetic Distance data). Thus their positions on the SAPP Tree have to be taken with a grain of salt. To confirm whether or not these 9 members have been accurately placed on the tree will require additional SNP testing. 

I have been in discussions with Thomas Krahn from YSEQ and have negotiated a specially-priced SNP Testing Strategy for the Gleesons of Lineage II. That will be outlined in detail in the next post.

Maurice Gleeson
Dec 2016

Outputs of the SAPP Programme for Gleeson Lineage II

Wednesday, 7 December 2016

Building the Mutation History Tree - Placement

In the previous post we looked at how we can group people together within Lineage II to form sub-branches. Once we have our sub-branches, the next step is to place each of them on the larger Tree of Mankind.

The starting point is the Modal Haplotype (MH) of the Z255 subclade. I obtained this from two sources - the R-L21 Haplogroup Project and Nigel McCarthy's Group E. Both are in complete agreement apart from marker CDYb which has a value of 39 in Nigel's version and a value of 40 in the Z255 Project's version. I arbitrarily chose the value of 40 for my version of the tree.

The Z255 Modal Haplotype (red text) & Branch Modal Haplotype (mutations in green highlight)

I then defined the modal haplotype for each individual branch (which I called the Branch Modal Haplotype or BMH in the Results Spreadsheet) and highlighted the differences between it and the Z255 MH (in green highlight). This identified those mutations which were common to all or some of the sub-branches, and which therefore potentially occurred quite early on (i.e. relatively far upstream) on the Tree, after the Z255 mutation/marker. These included the following:

  • a value of 19 on marker 20 (dys448) for all sub-branches
  • a value of 13 on marker 9 (dys439) for all sub-branches except Branch D
  • a value of 16 & 8 on markers 13 & 14 (dys458 & dys459a) for all sub-branches except Branch F
  • and so on ...
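The comparison described above (take the most common value at each marker within a branch, then list where it departs from the Z255 MH) can be sketched in a few lines of Python. This is only an illustration: the four-marker haplotypes and the reference values below are made up for the example and are not taken from the Results Spreadsheet, and the function names are hypothetical.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Return the most common value at each marker position."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]

def differences(haplotype, reference):
    """Map marker number (1-based) to (reference value, observed value)
    for every marker where the two haplotypes differ."""
    return {
        marker: (ref, value)
        for marker, (value, ref) in enumerate(zip(haplotype, reference), start=1)
        if value != ref
    }

# Made-up example: three members of one branch, four markers each.
branch_members = [
    [13, 24, 14, 10],
    [13, 23, 14, 10],
    [13, 23, 14, 11],
]
reference_mh = [13, 24, 14, 11]   # stand-in for the subclade modal haplotype

branch_mh = modal_haplotype(branch_members)
print(branch_mh)                             # [13, 23, 14, 10]
print(differences(branch_mh, reference_mh))  # {2: (24, 23), 4: (11, 10)}
```

The markers reported by `differences` are the ones that would be highlighted in green in a spreadsheet laid out this way. Where two values are equally common at a marker, a choice has to be made by hand, just as with CDYb above.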

I then visually inspected each sub-branch in turn, marker by marker, and identified which marker values differed from the Z255 MH and whether these marker values were unique to that branch. Thus marker 4 (dys391) has a value of 10 in Branch F, a value that is unique to this group and therefore helps define this branch. Similarly, marker 2 (dys390) has a value of 23 for each member of Branch F and thus is also potentially branch-defining (although it is also present in Branch A and thus could potentially have occurred further up the tree as a mutation common to both).

Branch-defining mutations on Branch F (bold outline)

I also identified mutations that were specific to individuals and were therefore not branch-defining. Examples include ...

  • the marker value of 14 for marker 1 (dys393) in the results of member G-68 (row 28)
  • a value of 15 for marker 3 (dys19) in member G-79 (row 15)
  • a value of 14 for marker 9 (dys439) in member G-79 (row 15)
  • and so on ... 

The end result is a draft "tree" for each sub-branch. This exercise is best done with a paper and pen initially because there will be a lot of crossing out and moving markers around. You can see in the diagram below that the marker values that are shared by all members of Branch F are written in the upper part of the tree, and the values that are unique to specific individuals result in a branching pattern in the lower part of the tree. Marker values that also occur in other branches and might therefore be better placed further up the tree are indicated with arrows pointing upwards.

Identifying Branch-specific & Individual mutations for Branch F
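The split between values shared by the whole branch (upper part of the draft tree) and values carried by individuals (lower part) can also be checked mechanically. The Python sketch below assumes a simple rule: a value that every member shares and that differs from the reference is treated as branch-wide, and any other off-reference value is listed against the member who carries it. The member IDs and marker values are invented for the example. A value carried by some but not all members may really mark a sub-branch, which this sketch does not try to detect.

```python
def split_mutations(members, reference):
    """Separate off-reference marker values into branch-wide and individual ones.

    members   -- dict mapping member ID to a list of marker values
    reference -- list of reference (modal) values of the same length
    """
    branch_wide = {}                       # marker number -> value shared by all
    individual = {m: {} for m in members}  # member ID -> {marker number: value}
    for pos, ref in enumerate(reference):
        marker = pos + 1
        values = {m: hap[pos] for m, hap in members.items()}
        if len(set(values.values())) == 1:
            shared = next(iter(values.values()))
            if shared != ref:
                branch_wide[marker] = shared
        else:
            for m, value in values.items():
                if value != ref:
                    individual[m][marker] = value
    return branch_wide, individual

# Made-up example data (not real project members).
reference_mh = [13, 24, 14, 11]
members = {
    "X1": [13, 23, 14, 10],
    "X2": [13, 23, 15, 10],
    "X3": [14, 23, 14, 10],
}

branch_wide, individual = split_mutations(members, reference_mh)
print(branch_wide)   # {2: 23, 4: 10}
print(individual)    # {'X1': {}, 'X2': {3: 15}, 'X3': {1: 14}}
```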

Once the mutations for each sub-branch had been defined, the next step was to try to hook the various sub-branches together. This was a game of chicken and egg, trying to figure out if some mutations could have occurred earlier in the tree than others. If placing them earlier in the tree resulted in a simpler version of the tree, then the particular mutation was moved up accordingly (this is analogous to the "maximum parsimony" approach used in the Fluxus software programme). Doing so often required additional upstream branches to be created in order to "fit them in".
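To see why moving a shared mutation up the tree gives a "simpler" tree, it helps to count mutation events. The toy Python sketch below is not how Fluxus works internally; it only illustrates the counting idea, using invented branch names and mutation labels. It handles just the simplest case, where a mutation is shared by every branch being joined.

```python
def events_if_independent(branches):
    """Every branch acquires each of its mutations separately."""
    return sum(len(mutations) for mutations in branches.values())

def events_if_shared_moved_up(branches):
    """Mutations common to all branches occur once, on a shared upstream node."""
    shared = set.intersection(*branches.values())
    return len(shared) + sum(len(mutations - shared) for mutations in branches.values())

# Invented example: each mutation is a (marker, value) pair.
branches = {
    "P": {("marker 20", 19), ("marker 9", 13), ("marker 4", 10)},
    "Q": {("marker 20", 19), ("marker 9", 13)},
    "R": {("marker 20", 19), ("marker 9", 13), ("marker 30", 15)},
}

print(events_if_independent(branches))      # 8
print(events_if_shared_moved_up(branches))  # 4
```

The second arrangement needs half as many mutation events to explain the same data, so it is the more parsimonious of the two. Mutations shared by only some of the branches are what force the extra upstream branches mentioned above.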

And lastly, once the tree had been accurately defined on paper, it could be easily transferred into a digital format using Excel to draw the tree.

Maurice Gleeson
Dec 2016

The Mutation History Tree for Lineage II (L2 MHT)
(click to enlarge)

A more detailed account of the Grouping & Placement process can be found in this YouTube video.

Building the Mutation History Tree - Grouping

The process of generating the Mutation History Tree for Lineage II (L2 MHT) is not an easy one. It involves many steps and hours of work. I have made a video of the actual process to give you an idea of what is involved and to help other Project Administrators who might be interested in undertaking the same exercise. You can watch it via the embedded video at the end of this post, or directly on YouTube.

The Results Spreadsheet
The basis of the work is an Excel spreadsheet generated from the DNA Results page on the FTDNA or WFN webpages. Luckily for us we use both webpages in the Gleason/Gleeson DNA Project - each has its pros and cons. The Results Spreadsheet generated from these results pages is below (for the first 37 markers only; the spreadsheet for the full 111 marker dataset can be downloaded from Dropbox here).

Lineage II DNA Results in the Results Spreadsheet

So what's the difference between this and the results as seen on FTDNA or WFN?

Well, FTDNA & WFN appear to group the Y-DNA results by ascending marker value, column by column. So for example, on the FTDNA results page, the 2nd column is arranged by marker value in ascending order - the 23's first, the 24's after. The ascending order is again seen in the third column - 14's first, 15's after. This biases the listing of members in favour of the values of the markers that occur at the start of the row. In other words, the order of values in any given column is dependent on the order of values in the preceding column. This does not give the best representation of who is most closely related to whom.

Lineage II DNA Results on the FTDNA website
Lineage II DNA Results on the WFN website
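If the ordering on those pages really is a plain column-by-column (lexicographic) sort, which is an assumption based on how the listings appear, the effect is easy to reproduce. In the Python sketch below the member IDs and values are invented. Two members who differ on only the first marker end up separated by someone who differs from both of them on several markers.

```python
# Invented haplotypes: four markers each.
results = {
    "X1": [13, 24, 14, 10],
    "X2": [14, 23, 15, 11],
    "X3": [14, 24, 14, 10],   # differs from X1 on the first marker only
}

def total_difference(a, b):
    """Sum of absolute per-marker differences between two haplotypes."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Python compares lists element by element, so sorting on the haplotype
# itself gives the column-by-column ascending order.
listing = sorted(results, key=lambda member: results[member])

print(listing)                                         # ['X1', 'X2', 'X3']
print(total_difference(results["X1"], results["X3"]))  # 1
print(total_difference(results["X1"], results["X2"]))  # 4
```

X1 and X3 are the closest pair, yet the listing puts X2 between them simply because of the value in the second column.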

The Grouping Process
In my Results Spreadsheet, the project members are organised into specific sub-branches, with a number of ungrouped members at the end of the spreadsheet. The 6 distinct branches (A to F) identified thus far are relatively clearly defined. The grouping process relies on several distinct pieces of information, namely ...
1) Known Relationship
2) Downstream SNP markers
3) Genetic Distance (GD) & GD Demarcation
4) Y-STR Signatures (relatively unique marker values)

1. Known Relationship
Several members are known to be related (highlighted in yellow) and can therefore be grouped together. For example, members G-75 (371160, MG) and G-91 (438302, DG) are known to be second cousins.

Known Relationships (yellow highlight) & Downstream SNPs (red & orange highlight)

2. Downstream SNP markers
The first four branches include members who have tested positive for specific SNP markers (namely A660, Y16880, BY5706, & BY5707). These SNP markers help place these specific members on specific branches. They also help anchor the upper reaches (the earlier, more distant parts) of the tree. Of note, on Alex Williamson's Big Tree, he places Branch D members G-05 (86192, RLG) & G-68 (411177, CG) on the same branch and so I have done the same in the Results Spreadsheet.

Gleeson Lineage II on Alex Williamson's Big Tree

3. Genetic Distance & GD Demarcation
Some of the members who have not tested for downstream SNPs can still be confidently grouped together. For example, in Branch E, member G-84 (412320, THG) is a Genetic Distance of 1/37 from G-75 & G-91 (known second cousins), but in addition, there is a clear demarcation in the Genetic Distance these three people have to each other and to the rest of the group. In the last line in the text box in the diagram below, you can see that the Genetic Distance jumps from a value of 0 or 1 for all three of them to a value of 3 or 4, demarcating them from the rest of the members in the DNA Project. This information, in combination with relatively unique marker values that they all share, supports their being grouped together.

GD & GD Demarcation
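The demarcation idea can be expressed as "sort everyone by Genetic Distance from a given member and look for the biggest jump". The Python sketch below assumes a simplified Genetic Distance (the sum of absolute differences at each marker); the testing company's own calculation may treat some markers differently. The member IDs and marker values are invented for the example.

```python
def genetic_distance(a, b):
    """Simplified GD: sum of absolute per-marker differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def demarcation(target, others):
    """Rank the other members by GD from the target and cut at the largest jump.

    Returns (IDs on the near side of the jump, size of the jump)."""
    ranked = sorted((genetic_distance(target, hap), member)
                    for member, hap in others.items())
    distances = [d for d, _ in ranked]
    jumps = [later - earlier for earlier, later in zip(distances, distances[1:])]
    if not jumps:
        return [member for _, member in ranked], 0
    cut = jumps.index(max(jumps)) + 1
    return [member for _, member in ranked[:cut]], max(jumps)

# Invented example: distances from the target come out as 0, 1, 3 and 4.
target = [13, 23, 14, 10]
others = {
    "P": [13, 23, 14, 10],
    "Q": [13, 23, 15, 10],
    "R": [13, 24, 15, 11],
    "S": [14, 24, 15, 11],
}

close, gap = demarcation(target, others)
print(close, gap)   # ['P', 'Q'] 2
```

Here the distances jump from 1 to 3, so P and Q are set apart from R and S, much like the jump from 0 or 1 to 3 or 4 described above. A jump like this is only one piece of supporting evidence and would still need to be read alongside the shared marker values.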

4. Y-STR Signature (relatively unique marker values)
If you look carefully at the spreadsheet, you can see certain patterns associated with specific subgroups. Thus, for example, Branch B is the only sub-branch that has a value of 15 for marker 30 (aka DYS 456). And Branch E has distinctive values of 17, 14 & 17 for markers 23, 31 & 32 (aka DYS 464b, 607, & 576). A similar distinctive pattern of marker values appears in Branch F with values of 10, 17, 9, 9, & 17 for markers 4, 13, 14, 15, & 32. These distinctive Y-STR Signatures (together with Genetic Distance data) help to group these people together into their relevant sub-branches. This process is explained in some detail in the video below.

Unique marker values (outlined in bold) define a unique Y-STR Signature for Branch F
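Spotting a signature by eye amounts to asking, for each marker, "does everyone in this branch share a value that nobody outside the branch has?". A minimal Python sketch of that test follows, using invented branch names and three-marker haplotypes. It applies a strict rule (shared by all members, absent from all others), whereas in practice "relatively unique" values may be tolerated.

```python
def branch_signature(branches, name):
    """Return {marker number: value} for values shared by every member of the
    named branch and carried by no member of any other branch."""
    members = branches[name]
    outsiders = [hap for other, haps in branches.items() if other != name
                 for hap in haps]
    signature = {}
    for pos in range(len(members[0])):
        values = {hap[pos] for hap in members}
        if len(values) != 1:
            continue                  # members disagree at this marker
        value = values.pop()
        if all(hap[pos] != value for hap in outsiders):
            signature[pos + 1] = value
    return signature

# Invented example with two branches.
branches = {
    "X": [[13, 23, 10], [13, 23, 10]],
    "Y": [[13, 24, 11], [13, 23, 11]],
}

print(branch_signature(branches, "X"))   # {3: 10}
print(branch_signature(branches, "Y"))   # {3: 11}
```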

Ungrouped people
There are a number of people who have not been confidently allocated to sub-branches as yet. Most of these have only tested out to 37 markers. For these people, using Fluxus software (which I used to generate the first version of the Mutation History Tree) or Dave Vance's SAPP programme or Robert Casey's methodology can help give some indication of where they might sit, and we will look at that in a subsequent post. But ultimately, additional SNP marker testing will provide definitive answers for these particular individuals. And because we already have a lot of information from the people who have previously undertaken the Big Y test, a cost-efficient SNP-testing strategy can be devised for the rest of the group and future members.

Once the members have been grouped into sub-branches, the next step is to see how these various sub-branches are placed on the larger Tree of Mankind. More on that in the next post.

Maurice Gleeson
Dec 2016

The Mutation History Tree for Lineage II (L2 MHT)
(click to enlarge)