Raw notes – Sale at the end.
Katherine welcomed everyone to the ISOGG meeting. She gave some background on ISOGG and how it was founded. She showed photos of ISOGG members at booths or on outings in places such as Burbank, London, Dublin, and Glasgow. Katherine talked about the American Society of Human Genetics (ASHG) meetings.
Last night the new ISOGG website debuted. There is also a mobile-friendly version.
Katherine introduced Alice Fairhurst, who started the Y-SNP tree in 2005 with Richard Kenyon. It was started because people who were interested in SNPs noticed that academic publications were reporting the same SNPs for people in different subclades, and the YCC was not publishing an updated tree every year. Over the years, lots of changes have occurred. Testing started with Sanger sequencing, then moved to microarrays and Next Generation Sequencing, and the requirements for the tree have had to change a lot along the way. The ISOGG tree is public, and ISOGG states what its requirements are. They tend to be on the conservative side because academics are then more likely to accept the tree. If your personal SNP is not on the tree, there is a reason. This is a tree of mankind, not a personal tree.
The Family Tree DNA tree and the ISOGG tree are not the same because Family Tree DNA will not release its private data to ISOGG. That would not be ethical. ISOGG works with the information that people give it, and there are many sources.
Alice reminded everyone that they need to know their SNP names.
Alice has retired from doing all of the work of updating the tree, although she will continue to work on it. Ray Banks is now the lead. People must prove why something should be added to the tree. The team is looking at SNP quality and making sure that nothing is on the tree that shouldn’t be there.
Katherine presented Alice with a plaque on behalf of ISOGG.
Max and Bennett presented Alice with a plaque on behalf of Family Tree DNA.
A panel including Katherine Borges, Steven Perkins, Dr. Tim Janzen, Jennifer Zinck, and Debbie Parker Wayne spoke about the American Society of Human Genetics and the Genetic Genealogy Standards Committee. Since I was on the panel, I was not taking notes. Please feel free to ask questions about this session if you have any.
Next, Brad Larkin introduced Dr. Maurice Gleeson, who presented “Combining SNPs, STRs, & Genealogy to build a Surname Origins Tree.” Brad presented Maurice with the Genetic Genealogist of the Year award from the Surname DNA Journal. Maurice has the ability to speak with beginners and also with people on the cutting edge of the field.
Maurice shared his vision of a combined mutation and family history tree. He wanted to know if it was possible to build the tree with mutations in the absence of family history records. This will help everyone see where they sit in the project in relation to others and can be applied to any surname project.
Maurice presented the Gleason/Gleeson DNA Project results, which are hosted on World Families. Although he uses both WFM and FTDNA formats, Maurice particularly likes the WFM format. In the example, there are lots of parallel mutations, and back mutations remain hidden. Maurice looked at the Fluxus cladogram. If you can, it’s useful to compare it against a hand-drawn tree. Moving from 37 to 111 markers provides much more data and many more branches, which causes repositioning. A back mutation also became evident. The problem is that the 111-marker cladogram still has no weighting.
James Irvine developed an algorithm for weighting markers, and Ralph Taylor ran it in Fluxus. For more information, see www.isogg.org/wiki/Cladogram.
Maurice noted that some markers behave unusually. These include marker 389, the multi-copy markers 464abcd, and others. Removing those markers caused major changes to the tree, and Maurice pondered which version was more accurate. How likely is it that 464 and CDY will screw things up? It is less of a problem in branches related within the last 200-300 years and more of a problem in branches more distantly related, 600-1,000 years. Be more aware of these mutations in the more distant branches of the tree.
Maurice has opted for the weighted 111-marker version of the tree. Caveats and limitations include missing data, because some people only tested to 67 markers, and the lack of adequate mutation rates for many markers. The tree is not yet anchored. Perhaps these problems will lessen as more people test and upgrade. The tree may also be skewed by more recent mutations. Every family branch should have two known distant cousins tested. There is also a risk of convergence in the tree, e.g. 3/111.
Maurice also talked about building a tree with SNPs and doing it as cheaply and efficiently as possible. There are many opportunities and challenges when working with SNPs: false positives, false negatives, constant change, and SNPs with no name. Sometimes there is no coverage or poor coverage, or the detection filter or threshold is too strict or set too high. This raises the question: is the SNP really present, or is it really absent?
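To make the idea of weighting concrete, here is a minimal Python sketch of one simple scheme: each repeat difference is divided by the marker’s relative mutation rate, so slow markers count for more, and a chosen set of markers (for example a multi-copy marker) can be left out entirely. This is not James Irvine’s algorithm; the marker list, the relative rates, and the exclusion set are hypothetical values chosen only for illustration.

```python
# Hypothetical relative mutation rates (1.0 = average); not published values.
RELATIVE_RATE = {
    "DYS393": 0.5,
    "DYS390": 1.0,
    "DYS19": 1.0,
    "DYS389i": 1.5,
    "DYS464a": 3.0,
}

# Markers left out of the comparison, e.g. a multi-copy marker (illustrative choice).
EXCLUDED = {"DYS464a"}


def weighted_distance(a, b, rates=RELATIVE_RATE, excluded=EXCLUDED):
    """Sum of repeat-count differences, each divided by the marker's relative rate.

    A one-step difference at a slow marker counts for more than the same
    difference at a fast marker. Markers that are excluded, or that either
    person has not tested (e.g. someone who stopped at 67 markers), are skipped.
    """
    total = 0.0
    for marker, rate in rates.items():
        if marker in excluded or marker not in a or marker not in b:
            continue
        total += abs(a[marker] - b[marker]) / rate
    return total


a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS389i": 13, "DYS464a": 15}
b = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS389i": 14, "DYS464a": 17}
print(weighted_distance(a, b))                  # DYS464a left out
print(weighted_distance(a, b, excluded=set()))  # DYS464a included
```

Comparing the two printed values shows how much a single fast, multi-copy marker can add to the distance between two people, which is one reason dropping such markers can reshape a cladogram.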
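The present/absent/no-call dilemma can be sketched as a simple read-count rule. This is a hypothetical illustration, not how any particular lab or pipeline calls SNPs: the minimum depth and minimum allele fraction are assumed parameters, and the Y chromosome is treated as haploid, so a real variant should show up in nearly all reads.

```python
from enum import Enum


class Call(Enum):
    DERIVED = "present"
    ANCESTRAL = "absent"
    NO_CALL = "no call"


def call_snp(derived_reads, ancestral_reads, min_depth=10, min_fraction=0.9):
    """Hypothetical haploid SNP call from read counts at one position."""
    depth = derived_reads + ancestral_reads
    if depth < min_depth:
        return Call.NO_CALL  # too little coverage to say either way
    fraction = derived_reads / depth
    if fraction >= min_fraction:
        return Call.DERIVED
    if fraction <= 1 - min_fraction:
        return Call.ANCESTRAL
    return Call.NO_CALL  # mixed reads: possible error or misalignment


print(call_snp(20, 0))                 # enough reads, all derived
print(call_snp(20, 0, min_depth=30))   # same data, stricter depth threshold
print(call_snp(3, 0))                  # poor coverage
```

Raising `min_depth` turns a borderline positive into a no-call, which is one way a threshold set too high can produce an apparent false negative.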
Many times SNPs that are considered private are moved to shared as more people test. Alex Williamson’s tree has a feature where if you click on a name, it brings up more information for analysis. In an analysis by Alex Williamson and YFull, they both identified some of the same SNPs but each identified several SNPs that the other did not.
Maurice made some predictions. Despite NGS, Sanger Sequencing will still be required. Chip-based SNP testing will still be needed to confirm or refute discoveries made by NGS. Multiple Deep Clade Panels will need to be created for subclades, surnames, and genetic clusters. The audience applauded.
We are approaching the combined family and mutation tree but we are not quite there yet. We have opportunities for the future.
James Irvine presented “Surname Projects – Some Fresh Ideas.” James touched on his ideas about the roles of a group administrator and the background of their project. Their project includes the largest genetic family in any surname project. They also now have an associated but separate autosomal DNA project.
James has been working on traditional genealogy since about 1950. A book, The Original of the Family of the Irvines or Erinvines, is the source of the traditional tree. It claims that all Irwins have a single origin in Scotland.
James said he is not a fan of genetic distance because it lacks weighting. He prefers TiP calculations. He uses “TiP Score”, which is a simple, arbitrary tool for project management. If you use the matches pages, keep in mind that they are useful for newbies but are an arbitrary compromise. For comparing similar surnames, they are too stringent. 7% of the Irwins in his project show as “False Negatives.”
For grouping, you can group by haplogroup, which is superficial, or by SNP. You can also group by GD matrix, GD from the mode, or TiP Score from the modal participant.
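Grouping by genetic distance from the mode can be sketched in a few lines of Python. This assumes the simple step-wise count of repeat differences and that every participant tested the same markers; the marker values are made up, and TiP calculations, which weight markers by mutation rate, would need more than this.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common value at each marker (ties go to the value seen first).

    Assumes every participant was tested on the same markers.
    """
    markers = haplotypes[0].keys()
    return {m: Counter(h[m] for h in haplotypes).most_common(1)[0][0] for m in markers}


def genetic_distance(a, b):
    """Simple step-wise GD: sum of absolute repeat differences over shared markers."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def gd_from_mode(group):
    mode = modal_haplotype(group)
    return [genetic_distance(h, mode) for h in group]


group = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 23, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
]
print(modal_haplotype(group))
print(gd_from_mode(group))
```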
Popular tables and graphs can predict the number of generations or years back to the common ancestor of two participants but all TMRCAs are probabilities. TMRCAs based on genetic distance assume some single average mutation rate, ignore back mutations, and can be very misleading.
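The single-average-rate assumption is easy to see in a naive point estimate. In the sketch below, the expected genetic distance between two men after t generations is taken as 2 × n × μ × t (both lines mutate independently across n markers at one average rate μ), which gives t ≈ GD / (2nμ). The rate value is hypothetical, and the Poisson function shows why the answer is really a probability distribution rather than a single number.

```python
import math


def naive_tmrca_generations(gd, n_markers, mean_rate):
    """Point estimate of generations back to the common ancestor of two men.

    Assumes every marker mutates at one average rate per generation, that each
    step of genetic distance is exactly one mutation (no back or parallel
    mutations), and that both lines mutated independently, hence the factor 2.
    """
    if n_markers <= 0 or mean_rate <= 0:
        raise ValueError("n_markers and mean_rate must be positive")
    return gd / (2 * n_markers * mean_rate)


def prob_gd(k, generations, n_markers, mean_rate):
    """Poisson probability of seeing GD == k after `generations`, same assumptions."""
    lam = 2 * n_markers * mean_rate * generations
    return math.exp(-lam) * lam**k / math.factorial(k)


HYPOTHETICAL_RATE = 0.004  # per marker per generation; an assumed value, not a published one
print(naive_tmrca_generations(2, 37, HYPOTHETICAL_RATE))
print([round(prob_gd(2, t, 37, HYPOTHETICAL_RATE), 3) for t in (3, 7, 15)])
```

With these assumptions, halving the assumed rate doubles the estimate, and a small GD is consistent with a wide range of generation counts. Back and parallel mutations, which the formula ignores, make the real picture murkier still.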
James talked about NPEs, alternate names for NPEs, and possible causes. Some of these are more insidious, like infidelity, but some are as simple as remarriage, informal name changes, or errors in genealogy. He gave some tips for recognizing and handling NPEs. James has handled 50 of these and has not had a single complaint.
STR tests are analog and SNP tests are probabilistic.
I’m sorry I missed the second half of this talk about Big Y. I got involved in a conversation about my rare mitochondrial DNA and didn’t want to pass up the opportunity to chat with Miguel Vilar of National Geographic.
Matt Dexter presented “Surveying Autosomal DNA Results for Ancestry.” Matt has 24 Family Finder tests in his immediate family. Matt reviewed the basics of autosomal DNA. He showed the chromosome browser and explained that a child matches each parent along the entire length of every chromosome, because the child gets one copy of each chromosome from the mother and one from the father. He showed some examples of inheritance through generations.
Matt explained fully identical regions and half identical regions. Full siblings have fully identical regions, meaning they match on both sides, in some places. Half siblings will not have fully identical regions because they only share one side of their chromosomes.
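A toy version of the fully identical versus half identical distinction, assuming unphased genotypes written as two-letter strings such as "AG". Real tools only report a region when a long run of consecutive SNPs agrees; a single matching SNP, like the ones below, can happen by chance, so the run lengths here are purely illustrative.

```python
from itertools import groupby


def classify(g1, g2):
    """Compare two unphased genotypes such as 'AG' and 'GA' at one SNP."""
    if sorted(g1) == sorted(g2):
        return "FIR"  # fully identical: both copies agree
    if set(g1) & set(g2):
        return "HIR"  # half identical: at least one allele in common
    return "none"


def regions(genotypes1, genotypes2):
    """Collapse per-SNP labels into runs of (label, number of SNPs)."""
    labels = [classify(a, b) for a, b in zip(genotypes1, genotypes2)]
    return [(label, len(list(run))) for label, run in groupby(labels)]


print(regions(["AG", "AG", "CC", "TT"], ["AG", "GA", "CT", "CC"]))
```

Full siblings are expected to show long FIR runs in places, while half siblings, who share only one parent, are expected to show HIR runs without long FIR runs.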
Roberta Estes presented “Crumley Y DNA to Autosomal Case Study – Kicking it Up a Notch.” Roberta writes the blog DNA-Explained and participates in the 52 Ancestors series of Geneabloggers. The Crumley family members wanted to know if George and James shared a common ancestor and this was the reason to create the Crumley Y DNA project in 2004. They did not share a common ancestor or even a common haplogroup. After that, the project became stagnant except for the occasional person who wanted to know if they were from the George or James family.
Roberta asked the members to take autosomal tests. Many of the project members knew each other and other people outside the project. They sent out feelers to find more participants.
Roberta included one of my very favorite lines in her talk. “You don’t know what you don’t know.” It’s really important that we all remember that!
They asked all existing members to test and then to invite cousins. They checked project members for Crumley matches by surname, ancestral surname field, ICW, ICW plus surname, and matrix. She found Ancestry matches and invited them to transfer and join the project. She also looked on Rootsweb lists and boards as well as Google.
The benefits of joining are intra-project matching, female inclusion, non-Crumley surname inclusion, collaboration, new opportunities, and unknown discoveries.
Once the females learned that they would be included, they got excited and went out and helped recruit for the project. Roberta received a lot of questions about the different types of DNA; the project brought up a lot of questions she didn’t expect. They also got a great new co-administrator from this.
In advanced matching, you can choose people who match you only in a specific project and select autosomal. By using this method, she was able to find her autosomal matches who were also members of the Crumley project.
Several people brought family groups to the project, and several people tested more family members. In total, 50 people descend from two of James Crumley’s sons. This became the perfect research project.
One of the questions was: would the descendants of John and William match each other despite being seven generations apart at the closest point? How about 8th cousins? How much DNA would they share? Would it triangulate? How much of James Crumley’s genome can we reconstruct?
James Crumley was born in 1712 in a location unknown. He was in Chester Co., PA in 1732. He married Catherine and had five children, four of whom were males. One son died and one has not been located. Known descendants were placed onto a tree, although some were unconnected.
Roberta created a relationship chart: a grid of all of the people and how closely each pair is connected. On the relationship chart she bolded the people who were FTDNA matches. There are currently four generations testing. The most distant relationship is 7C3R = 8C1R. The closest are 11 descendants who are 6th cousins. To match, their DNA had to survive a combined total of 16 divisions.
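The cousin shorthand in a relationship chart can be computed from how many generations each person is below the common ancestor. One reading of the 7C3R = 8C1R note is that both relationships span the same total number of generational steps (meioses), and the sketch below checks that. The expected-sharing function uses the standard average for descent from an ancestral couple; actual sharing varies widely, and distant cousins often share nothing detectable.

```python
def relationship(gen_a, gen_b):
    """Cousin degree, times removed, and total meioses from generation counts.

    gen_a and gen_b count generations down from the common ancestor
    (child = 1, grandchild = 2, ...). Both must be at least 2 to be cousins.
    """
    if min(gen_a, gen_b) < 2:
        raise ValueError("both people must be at least grandchildren of the ancestor")
    degree = min(gen_a, gen_b) - 1
    removed = abs(gen_a - gen_b)
    meioses = gen_a + gen_b
    return degree, removed, meioses


def expected_shared_fraction(gen_a, gen_b, ancestral_couple=True):
    """Average autosomal fraction shared; doubles when descent is from a couple."""
    single_line = 0.5 ** (gen_a + gen_b)
    return 2 * single_line if ancestral_couple else single_line


print(relationship(8, 11))  # 7C3R
print(relationship(9, 10))  # 8C1R
print(relationship(7, 7))   # 6th cousins
print(expected_shared_fraction(7, 7))
```

Both 7C3R (8 and 11 generations down) and 8C1R (9 and 10 generations down) come out at 19 meioses, while 6th cousins are 14 meioses apart.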
Roberta created a James Crumley spreadsheet. She did over 1,250 individual comparisons manually, which took her about three weeks, and then removed the duplicates. The result was 8,300 rows, including segments down to 3 cM and 300 SNPs.
She made a second spreadsheet that was a subset of the first. It narrows the focus to known lines and excludes others. She removed everyone except John, William, and BH. The resulting DNA is James’ and Catherine’s, so no wife’s DNA is causing interference. There were 3,000 rows in the James-only spreadsheet.
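Two small pieces of bookkeeping sit behind a project spreadsheet like this: the number of pairwise comparisons grows as n(n - 1)/2, and each shared segment tends to appear twice, once from each person’s kit. The sketch assumes a hypothetical row layout and treats rows as duplicates only when the coordinates match exactly; in practice the two kits can report slightly different boundaries.

```python
def pair_count(n):
    """Number of distinct pairwise comparisons among n kits."""
    return n * (n - 1) // 2


def dedupe_segments(rows):
    """Keep one row per segment when the same match is reported from both kits.

    Each row is (person1, person2, chromosome, start, end, cm), a hypothetical
    layout. (A, B, ...) and (B, A, ...) with identical coordinates are duplicates.
    """
    seen = set()
    unique = []
    for row in rows:
        p1, p2, chrom, start, end, _cm = row
        key = (frozenset((p1, p2)), chrom, start, end)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


print(pair_count(50))
```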
For matching and triangulating, some of the tools are used the same way you would use them for a personal spreadsheet. Begin with individual matches. There is a difference between individual matching, group matching, and triangulation, especially for group projects. She used the ICW tool with Surname. To do this, combine the two tools so that you’re using the ICW tool as well as the ancestral surnames option. Then she ran the normal matrix but also ran the project admin matrix for all members of the project. Next she used the chromosome browser to examine segment matches. Roberta showed a match group where they all match on the same chromosome. If you are in the project, you can then verify that each person matches the others through their own kit, which you can view as the administrator.
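The core test for a triangulated group is that every pair in the group matches on the same chromosome and all of those segments overlap. Here is a minimal sketch, assuming one reported segment per pair per chromosome and a hypothetical dictionary layout for the match data; it is not FTDNA’s tool or Roberta’s spreadsheet method.

```python
from itertools import combinations


def overlap(seg1, seg2):
    """Overlap of two (start, end) intervals, or None if they do not overlap."""
    start, end = max(seg1[0], seg2[0]), min(seg1[1], seg2[1])
    return (start, end) if start < end else None


def triangulated_region(people, matches, chromosome):
    """Region shared by every pair in `people` on `chromosome`, or None.

    `matches` maps frozenset({a, b}) -> {chromosome: (start, end)}, a hypothetical
    layout holding one segment per pair per chromosome.
    """
    if len(people) < 3:
        raise ValueError("triangulation needs at least three people")
    common = None
    for a, b in combinations(people, 2):
        segment = matches.get(frozenset((a, b)), {}).get(chromosome)
        if segment is None:
            return None  # this pair does not match here
        common = segment if common is None else overlap(common, segment)
        if common is None:
            return None  # pairwise segments do not all overlap
    return common


matches = {
    frozenset(("A", "B")): {1: (10, 40)},
    frozenset(("A", "C")): {1: (15, 50)},
    frozenset(("B", "C")): {1: (20, 35)},
}
print(triangulated_region(["A", "B", "C"], matches, 1))
```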
Roberta introduced the Confidence Spectrum. Sometimes you can’t triangulate everyone. There may be a mishmash of match types. Roberta talked about her article “Autosomal DNA Matching Confidence Spectrum,” which provides more information on this topic. http://dna-explained.com/2015/09/25/autosomal-dna-matching-confidence-spectrum/
The spreadsheet for recreating James contained many of the traditional columns to start, and she added some columns for triangulation. She did not triangulate every one of the 3,000 rows, but she did triangulate some of them to check things. There was also a column for matches found only at FTDNA, with no GEDmatch data. This makes it apparent that the match is subject to FTDNA’s 20 cM floor threshold.
The match groups of BH suggested, but did not prove, a common ancestor. Some of the match groups are very large. Triangulation provides the highest confidence of proof. Ancestor reconstruction includes a lot of stepped matches, also called heel-to-toe or staggered matches. Sometimes the first and third people on the list won’t match each other, but they are joined by the second person, who matches both.
During this process, several discoveries were made. There was an incorrect genealogy. There was also a third Crumley line. They reconnected an NPE using the Y and autosomal results. They also found an unknown connection in a surprise location.
The men connected on the Y but had no idea when. The Y + FF approach used the Advanced Matches feature. It depends on most of the project members having done both Y and autosomal tests.
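Stepped (staggered) matches can be illustrated with interval arithmetic on the middle person’s chromosome. In this sketch the positions are assumed to be in cM and `min_len` stands in for whatever matching threshold applies; both are illustrative assumptions. Even a long overlap is only a candidate, because A and C could be matching different parental copies of B’s chromosome.

```python
def classify_through_b(seg_ab, seg_bc, min_len):
    """How A and C relate through B's matches on one chromosome.

    seg_ab and seg_bc are (start, end) positions, assumed here to be in cM, of
    B's segments shared with A and with C. min_len stands in for a matching
    threshold (an illustrative assumption).
    """
    shared = min(seg_ab[1], seg_bc[1]) - max(seg_ab[0], seg_bc[0])
    if shared >= min_len:
        return "triangulation candidate"  # still needs A and C to match each other
    if shared >= 0:
        return "staggered"  # touching or short overlap: joined only through B
    return "separate segments"


print(classify_through_b((10, 30), (25, 50), 7))
print(classify_through_b((10, 30), (12, 40), 7))
```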
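Conceptually, the Y + FF approach is a set intersection over match lists. This sketch uses hypothetical kit IDs and is not how FTDNA implements Advanced Matching; the coverage function reflects the point that the method only works when most members have taken both tests.

```python
def y_and_ff_matches(y_matches, ff_matches, project_members):
    """Project members who show up in both your Y-DNA and Family Finder match lists."""
    return sorted(set(y_matches) & set(ff_matches) & set(project_members))


def both_tests_coverage(project_members, y_tested, ff_tested):
    """Fraction of project members who have taken both a Y-DNA and a Family Finder test."""
    members = set(project_members)
    if not members:
        return 0.0
    return len(members & set(y_tested) & set(ff_tested)) / len(members)


print(y_and_ff_matches(["K1", "K2", "K3"], ["K2", "K3", "K4"], ["K2", "K4", "K5"]))
```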
BH is an autosomal Crumley match from circa 1850. The line daughtered out and Roberta didn’t pay attention. Their Crumley ancestor on the 1850 Harrison County, OH census was born in Ireland. He matches the James Crumley line. This means the common ancestor is likely in Ireland. This may help to determine where the 1850 James Crumley was actually from.
Max introduced Connie Bormans, PhD, and Arjan Bormans, PhD. Connie is the lab manager. In the past year the lab experienced some growing pains; it is now over 30 people. Originally, STRs were run using fragment analysis: PCR amplification followed by analysis based on size, run on a 3730XL using capillary electrophoresis. The STR panels included a commercial kit that was discontinued in 2014, and order volume was surpassing lab capacity. They decided to move the STRs to NGS to increase throughput and get better coverage. NGS is still PCR-based, but alleles are called by sequence rather than size. The move required R&D for every marker, with new primers, new analysis, and a new scoring method.
Among other things, they discovered that NGS is more sensitive to stutter, and that microallele changes in size (.1, .2, .3) were sometimes not in the repeat itself. The STR repeat motifs are imperfect, which creates a challenge for the caller when it looks for a specific sequence.
Arjan introduced the Agena MassARRAY. There are now three primers: the forward and reverse primers, plus a third primer that sits right in front of the SNP. They do a single base extension so that the third primer gains one C, G, A, or T. Basically, the single base extension product is fed in, the machine shoots it with a laser, and small fragments come off first and larger fragments second. Each single base extension product has a different mass, and the machine can determine the difference. The data looks like a series of peaks, starting at about 4,500 daltons and going all the way up to 9,000. The limitation is that the primers must be separated by molecular weight so that they do not cross over into another section.
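The difference between size-based and sequence-based calling can be shown with a toy amplicon. In the sketch below, one extra base sits in the flanking sequence rather than in the repeat: a call from fragment length alone reports a .1 microallele, while counting the repeat motif does not. The flanks, motif, and sequence are hypothetical, and counting only the longest uninterrupted run is a simplification that breaks down on imperfect repeats.

```python
import re


def size_based_call(amplicon, flank_length, motif_length):
    """Fragment-analysis style: infer the repeat count from total length alone.

    Any bases beyond the expected flanks are assumed to be repeat units, so a
    leftover of 1-3 bases is reported as a microallele such as "11.1".
    """
    repeat_bases = len(amplicon) - flank_length
    full, extra = divmod(repeat_bases, motif_length)
    return f"{full}.{extra}" if extra else str(full)


def sequence_based_call(amplicon, motif):
    """Sequence style: longest uninterrupted run of the motif, in repeat units."""
    runs = re.findall(f"(?:{re.escape(motif)})+", amplicon)
    longest = max((len(run) for run in runs), default=0)
    return longest // len(motif)


# Hypothetical amplicon: the left flank is normally "ACGTAC" (6 bases) but this
# sample carries one extra "T" in it; the right flank is "TTGCA" (5 bases).
left_flank, right_flank = "ACGTTAC", "TTGCA"
amplicon = left_flank + "GATA" * 11 + right_flank
EXPECTED_FLANK_LENGTH = 6 + 5

print(size_based_call(amplicon, EXPECTED_FLANK_LENGTH, 4))  # length alone suggests a microallele
print(sequence_based_call(amplicon, "GATA"))                # motif count ignores the flank base
```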
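The mass-to-base idea can be sketched with textbook average residue masses for DNA. The primer mass uses the common oligo formula (sum of residue masses minus 61.96 Da for an oligo without a 5’ phosphate), and the extension product is approximated as the primer plus one ordinary nucleotide residue. Real MassARRAY chemistry uses mass-modified terminators, so these absolute numbers are illustrative only; the primer sequence and tolerance are hypothetical.

```python
# Average masses (Da) of DNA nucleotide residues as used in common oligo calculators.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def oligo_mass(seq):
    """Approximate average mass (Da) of an unmodified oligo without a 5' phosphate."""
    return sum(RESIDUE_MASS[base] for base in seq) - 61.96


def call_extension(primer_seq, observed_mass, tolerance=3.0):
    """Return the base whose expected product mass is closest to the observed peak.

    Expected product = primer + one ordinary residue, an approximation: real
    assays use mass-modified terminators. Returns None if no base is within
    `tolerance` Da, e.g. when the peak is an unextended primer.
    """
    primer = oligo_mass(primer_seq)
    best_base, best_err = None, tolerance
    for base, residue in RESIDUE_MASS.items():
        err = abs(observed_mass - (primer + residue))
        if err <= best_err:
            best_base, best_err = base, err
    return best_base


primer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer extension primer
print(round(oligo_mass(primer), 2))
print(call_extension(primer, oligo_mass(primer) + 329.21))
```

The closest pair of residues, A and T, differ by only about 9 Da, which gives a feel for the mass resolution the instrument needs.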
So far they have created about 40 panels, which takes a large amount of vetting. All of the primers have to be able to work together to create single base reactions. They pull out the weaker ones. Once they have the whole mapping down they can’t remove one and add another. They are now trying to develop a system to help them with redesigning all of the panels.
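Panel vetting includes making sure every possible peak in a multiplex lands in its own part of the mass window. A minimal sketch of that check, assuming hypothetical primer masses, a hypothetical minimum gap between peaks, and a 4,500 to 9,000 Da window, with each product approximated as the primer plus one ordinary residue:

```python
from itertools import combinations

# Average DNA residue masses (Da); real assays use mass-modified terminators instead.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def peaks(primer_mass, alleles):
    """Unextended primer plus one extension product per expected allele."""
    return [primer_mass] + [primer_mass + RESIDUE_MASS[a] for a in alleles]


def conflicts(assays, min_gap=30.0, window=(4500.0, 9000.0)):
    """List problems in a hypothetical multiplex design.

    `assays` maps assay name -> (primer_mass, alleles). Flags peaks outside the
    detection window and pairs of peaks from different assays closer than min_gap.
    """
    all_peaks = [(name, m) for name, (pm, alleles) in assays.items() for m in peaks(pm, alleles)]
    problems = []
    for name, m in all_peaks:
        if not window[0] <= m <= window[1]:
            problems.append(f"{name}: peak {m:.1f} outside window")
    for (n1, m1), (n2, m2) in combinations(all_peaks, 2):
        if n1 != n2 and abs(m1 - m2) < min_gap:
            problems.append(f"{n1}/{n2}: peaks {m1:.1f} and {m2:.1f} too close")
    return problems


design = {"snp1": (5000.0, "AG"), "snp2": (6000.0, "CT")}
print(conflicts(design))
print(conflicts({**design, "snp3": (5010.0, "AG")}))
```

Because every assay’s peaks interact with every other assay’s, swapping one primer can create new collisions elsewhere in the panel, which fits the point that a finished mapping cannot simply have one assay removed and another added.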
The 40 panels are primarily for the largest haplogroups. They are built from information provided by administrators; the more information they have, the more comfortable they are moving forward. Other haplogroups covered are G, J1, J2, N, and I. Additional panels are slated for release in December. In total, 80-100 panels are expected to cover the existing Y-Tree.
Max noted that FTDNA is the only company that has its own lab. He noted the complexity that goes into the panels and into analyzing the results.
Max introduced Mike Alexander for an Engineering Update. Mike is head of the IT department right now. Mike has been in IT for 30 years and is from Houston. He spent 10 years at NASA on the Space Shuttle program, working with high performance computing and on the operating system for the space station. After that he worked in telecommunications for 8 years at WorldCom, where he managed all of the circuits carrying large quantities of data. For the last 10 years he has been in the financial markets. He helped to build trading systems, using high performance computing to make sure things are fast. He is hoping to spend the next 10-15 years working with Max and Bennett.
Mike follows the three P’s: People, Processes, and Products. When Mike started at FTDNA, he felt that the balance of staff was off. They doubled the number of people on staff doing testing. Everyone was doing a good job, but there was heavy manual testing and it was hard to keep up. Mitch did a tremendous amount of system administration, and they have also doubled that staff. They created a Database Administration team and also hired a Technical Writer. They intend to do a better job of communicating changes to the project administrators.
Some improvements are computer programs that test Family Tree DNA, computer programs that tell them when Family Tree DNA is too slow, and they will replace computers before they break. They have replaced pictures on the wall with monitors to keep up with things.
They have rebuilt Family Finder, using high performance computing techniques. They use these techniques throughout FTDNA. These techniques make FTDNA more responsive and this helps them deliver new features quicker. They’re updating all of their computers and network hardware.
Mike gave his email address: firstname.lastname@example.org. He would like us to contact him if we see something bothersome that he can put into the system.
Max introduced the final department head to speak, the head of Customer Support, Tom Richard. Tom started October 20, 2014, and the sale started the next week. During that time, he noted many flaws. The ratio of emails to phone calls is 2:1, two emails for every phone call. He implemented a philosophy change. People are being promoted a lot within the department. Richard started working as the team leader, and Meagan started as a trainer and created a two-week training program. Now new employees are prepared to get on the phone. They changed their hours so that they have weekly training on Fridays, bringing in experts on Friday afternoons to help the team get better at customer support. Each of the line managers (FF, MY, Y) has presented on how samples are handled. The most frequent trainer is Bennett, for Big Y, SNPs, and customer skills.
They have improved their hiring process and only hire people with college degrees. They are holding on to people longer and keeping employees engaged and feeling valued. They are working on organizational structure, building a tiered structure with specialists for different customers, such as group projects, advanced, and beginner.
One of the biggest changes was that they started Saturday email hours in August. Most emails are getting done in about 7 hours now. There are still many complex emails that take longer because they need to find the correct person to be able to answer it.
In the future they hope to improve email response time, have much more training, and answer the phones faster. They are currently answering them faster than when Tom got there. When he started, they answered 3 out of 4 calls before the caller hung up. Now they are at 95%.
How does this new testing protocol impact Geno 2.0 results? It doesn’t at all.
8 people in my project have ordered BigY. Is there an easy way to find out who they are? Bennett and Max do not understand the question. If the question was how do we know if they have placed orders and been batched, check pending shipments to lab section for status.
How can we be sure that new Y-STR results are comparable to old ones, since the methodology is different? The method is different, and the scoring is a little different. When you move to any new platform there’s a chance results will vary slightly, but before they moved, they ran hundreds of samples and compared all of the results. When they found a few discrepancies, they went back to the new caller and used the old results to fine-tune it.
It would be most helpful to be able to quickly ID the other matching segments on the chromosome browser. Bennett says that’s right and it will happen.
How are you counting imperfect repeats? Ignoring the SNP.
How are you reporting the partials? They report micro alleles but do not use them for matching.
Max noted that the company is now at 100 people and offers more opportunities. With the hiring of a real manager for this department, there is more training also. That leads to the kinds of changes that Tom was mentioning.
Could you reduce the number of clicks required per member – too many menus, toggles, etc? Mike said that Michael Davila is an expert in that area and will help define what the product should be. Michael has made some progress on this and has been redesigning some features for the website. They have been bringing in end users to take a look at it. Tom has put together a program where people are leaving comments. Many comments are about user interface so Mike went to Michael and had some sessions to see what people are doing online and better figure out how end users are using the system. There are improvements to be made.
I have a micro allele that does not show up on FTDNA that I had at Ancestry. How do I get it to show up? Connie said that Ancestry could have been using different primers and it could be using a primer that doesn’t let it show up in the repeat. Connie will follow up with kit number.
Will there be more than 111 STRs offered? Bennett says that he doesn’t think so. People seem to think Y-67 or Y-111 will get you close enough, and if you want to go further, you can use Big Y, or the SNPs harvested from Big Y results and put into a cost-effective SNP panel. Rather than doubling the number of STRs, it’s less expensive to run Big Ys, determine cladistic SNPs, and then curate those into panels that can be used to make the genealogical determination.
Will the new complexity discovered by NGS be added to the dataset? If they find something that happens all the time, they will incorporate it. They need to do a lot more research into the complexities. If they can find meaning that will help you, then yes.
What can be done other than buy more kits to promote new customers for FTDNA? Max said that they have several initiatives for 2016 that he is not able to talk about now. They are different from what they have done in the past, and they should bring in a lot of new customers. In support of those initiatives, they think it’s important to convey to people the enthusiasm about what the DNA test can tell you. It is important to convey success stories.
Bennett noted that there is a history book written in your DNA. It is your family story or your lineage story. When he goes to dinner or a cocktail party with new people, he tries to stop talking about what he does because then no one else has a chance to talk. People tend to be interested in this. The only way we will get to a genetic census is for all of us to show our enthusiasm. We are all having a hand in rewriting the history of the western world.
Bennett said this conference came about three months too early, because their IT people are newer: Michael Davila has been there for only about four sprints, or two-week periods. They have a couple of new things coming in the next few sprints, and they will show up on the website. Janine will likely email group admins before that happens.
Michael Davila took a question about Y Search and Mitosearch. He said it’s really old code and they will work on it but he doesn’t have a timeline. Please send in questions through the normal question pipeline.
Are you going to have a discount code for the conference? Are you going to have a holiday sale? YES! They plan on having one. It will have two parts. All prices on the website will be marked down. In addition, every Monday they will email everyone in the database a Christmas discount offer. It will be some number of dollars you will be able to take off any product in the store, and it will expire in 7 days. If your coupon is used, they’ll send you another coupon, which will be a different one. It’s kind of like potluck.
As of 5:00, right now, the sale is turned on and has started! It will run through December 31st.
To purchase your kit, go to Family Tree DNA’s website now! If you need a spare coupon code, please feel free to ask me. I will have a few dozen to share each week.
Another great conference on the books! It’s been great to see everyone. See you soon, I hope! Safe travels home to all attendees.