By Prathi Chowdri
Forensic genetic genealogy (FGG) is a valuable tool for law enforcement, one that draws on multiple disciplines (genetics, genealogy and traditional investigations). Also known as investigative genetic genealogy (IGG), FGG involves searching genetic genealogy databases for clues to the source of an unknown DNA sample, with the goal of identifying or eliminating potential criminal suspects or identifying human remains.
While the techniques involving FGG are relatively new, they have been responsible for some high-profile wins for law enforcement. But care must be taken to use these methods ethically and responsibly to help ensure that courts (including the court of public opinion) continue to accept the FGG process and evidence as fair and reliable.
In this article, we will discuss the origin and development of forensic genetic genealogy searches and guidelines for ethical and responsible investigations in the absence of controlling law. We’ll also provide an example of how the technique might be used by law enforcement.
Origins of forensic genetic genealogy
DNA has been used as evidence in criminal cases since the late 1980s. One of the first U.S. cases resulted in the conviction of Tommie Lee Andrews for sexual assault based on a genetic test. Recognizing the potential for DNA to solve crimes, the FBI developed the Combined DNA Index System (CODIS), which began as a pilot project in 1990; its national index became operational in 1998. This database collects DNA profiles and allows different law enforcement agencies to compare them. One of the main purposes of CODIS is to solve cases in which there is no suspect: law enforcement can search CODIS for a match to a DNA profile developed from DNA obtained at a crime scene.
Since CODIS was established, millions of profiles from across the U.S. have been added to the database, which hit 20 million records (of both offenders and arrestees) in April 2021. Nevertheless, if a suspect’s DNA is not in CODIS, there will be no match to the evidence. Moreover, CODIS profiles are remarkably simple by today’s standards, using 13 to 20 “short tandem repeat” (STR) genetic markers to match samples to potential suspects. Because these profiles are built for direct matching and carry too little information to reveal distant relatives, an investigation that produces no CODIS match can go cold. But what if law enforcement could match the profile to the suspect’s relatives as a starting point?
The rise of direct-to-consumer genomics
The concept of matching evidentiary DNA to suspects reportedly occurred to investigators in the Andrews case after they saw a magazine ad for a private DNA laboratory that offered paternity tests. While this type of genetic testing has been available since the 1980s, direct-to-consumer genomics didn’t really hit its stride until 2007, with the launch of 23andMe. Other companies followed, including Ancestry and FamilyTreeDNA.
In sharp contrast to the STR markers stored in CODIS, these modern consumer profiles typically use between 500,000 and a million single nucleotide polymorphisms (SNPs) from across the whole genome. A SNP profile allows private companies to efficiently tell whether two people are related, and much more: it can also identify national or ethnic origins, point to potential disease-causing genes and predict genetic traits like hair and eye color and even “cilantro taste aversion.”
The DNA databases owned by companies like 23andMe and AncestryDNA are private and limit law enforcement access. However, since individuals “own” their personal DNA profiles, they can choose to upload them to an open-source database such as GEDmatch. Doing this allows people to find distant (or even not-so-distant) relatives who may have used a different DNA testing company but also uploaded their profiles.
As consumer DNA testing has become more popular, individuals have uploaded more and more profiles to these open-source databases. While police require a warrant to get access to private DNA databases, they can generally search open-source databases without one.
Use of open-source data by law enforcement
One early FGG success involved the identification of the “Buckskin Girl” in Miami County, Ohio. In April 1981, the body of a young woman wearing a fringed suede jacket was found in a roadside ditch near Troy, Ohio. She had been beaten and strangled to death, but nobody knew who she was. Over the years, efforts to identify her using her DNA had been unsuccessful. Police failed to find a match for her genetic profile in CODIS or the National Missing and Unidentified Persons System (NamUs). Fast forward to 2018, when a search in an open-source DNA database enabled police to identify the woman as Marcia L. King. Though the victim now has a name, the identity of her murderer is still unknown.
Not long after the Buckskin Girl case, the FBI and a genealogy expert turned to forensic genetic genealogy for clues to the identity of the Golden State Killer (GSK), an unknown subject whom police believed to have committed 13 murders and dozens of rapes and burglaries between 1974 and 1986. The genetic profile developed from a decades-old “rape kit” was found to match several third cousins of the still-unknown perpetrator. Genealogists then developed family trees, which were narrowed down further and further to find closer relatives of the suspect. Reports from surviving victims suggested that the killer had blue eyes and was 5’9” and 165 pounds when he committed his crimes. Further investigative work led investigators to Joseph DeAngelo, then 72, who was arrested in 2018 and in 2020 pled guilty to 26 counts of murder and kidnapping.
Prior to the use of FGG, more than a dozen different law enforcement agencies spent 43 years and $10 million trying to identify a suspect. DeAngelo is now serving multiple life sentences in a California state prison.
Another high-profile case involving FGG was that of the “NorCal Rapist,” an unknown man who had violently assaulted dozens of young women between 1991 and 2006. During one of the sexual assaults, a victim had managed to stab the rapist with a pair of scissors. Investigators extracted a DNA sample of the suspect from the scissors, allowing them to create a profile and run it through open-source DNA databases. The searches turned up several potential relatives, which led investigators to one possible suspect. In September 2018, police arrested 58-year-old Roy Charles Waller after DNA obtained from items he had discarded matched DNA from multiple crime scenes. He was tried in 2020 and convicted of 21 counts of rape, among other offenses. He is currently in prison serving a sentence of 897 years.
Forensic genetic genealogy — the nuts and bolts
Forensic genetic genealogy relies on the properties of inherited genetics. In simple terms, we get about half of our DNA from one parent and half from the other. This also means we carry, on average, about 25% of each of our grandparents’ DNA and half as much (about 12.5%) from each of our great-grandparents. Full brothers and sisters each receive 50% of their DNA from each parent and, on average, share about half of their DNA with each other. But with the exception of identical twins, the specific sections of genetic code each sibling inherits can be very different.
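These averages follow from a simple rule: each parent-to-child transmission passes on about half of a person’s DNA, and relatives who descend from the same couple share DNA through both ancestors. The Python sketch below computes the resulting averages (what geneticists call the coefficient of relationship). It is an illustration under that simplified model, not a tool any database uses, and the `expected_shared` helper and `RELATIONSHIPS` table are hypothetical; actual sharing between two particular relatives varies around these averages.

```python
from fractions import Fraction

def expected_shared(meioses: int, common_ancestors: int = 1) -> Fraction:
    """Expected fraction of DNA shared by two relatives (simplified model).

    meioses: number of parent-to-child transmissions on the path that links
             the two people through their common ancestor(s).
    common_ancestors: 1 for direct-line or half relationships, 2 when both
             people descend from the same couple (full siblings, cousins).
    """
    return common_ancestors * Fraction(1, 2) ** meioses

# (meioses, common ancestors) for relationships mentioned in the text
RELATIONSHIPS = {
    "parent/child": (1, 1),
    "grandparent": (2, 1),
    "great-grandparent": (3, 1),
    "full sibling": (2, 2),
    "first cousin": (4, 2),
    "third cousin": (8, 2),
}

for name, (m, a) in RELATIONSHIPS.items():
    share = expected_shared(m, a)
    print(f"{name:18} {float(share):.2%}")
```

Running it prints 50.00% for parent/child and full sibling, 25.00% for grandparent, 12.50% for great-grandparent and first cousin, and 0.78% for third cousin, in line with the figures discussed here.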
GEDmatch says its database currently contains about 1.5 million genetic profiles, and other open genetic databases have fewer than that. With 330 million or so people in the United States, the chance of finding a perfect match between a genetic sample and a publicly accessible profile is quite small. However, even relatively distant matches make it possible to estimate how “far away” (in kinship terms) a relative might be. A match sharing about 50% of its DNA with the sample suggests a parent, child or full sibling. About 12.5% is consistent with a first cousin (although other relationships, such as a great-grandparent, share a similar amount), whereas a third cousin shares less than 1%.
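Reading a match works in the other direction: given an observed percentage, which relationship is most plausible? The sketch below is a deliberate simplification that assumes the approximate averages quoted above and a hypothetical `nearest_relationship` helper; genealogy sites actually compare the number and length of shared DNA segments, and several relationships can share similar amounts. It also computes the rough fraction of the U.S. population represented by a 1.5-million-profile database, assuming (unrealistically) that uploaders are a random sample.

```python
import math

# Approximate average shares of DNA, as quoted in the text. Real matching
# relies on shared segment lengths, not a single percentage.
EXPECTED_SHARE = {
    "parent/child or full sibling": 0.50,
    "first cousin (or similar)": 0.125,
    "third cousin (or similar)": 0.0078,
}

def nearest_relationship(observed_share: float) -> str:
    """Return the label whose expected share is closest on a log2 scale.

    A log scale is used because each additional generation of separation
    roughly halves the expected share of DNA.
    """
    if not 0 < observed_share <= 1:
        raise ValueError("observed_share must be in (0, 1]")
    return min(
        EXPECTED_SHARE,
        key=lambda label: abs(math.log2(observed_share / EXPECTED_SHARE[label])),
    )

# Rough coverage of the U.S. population by one open database (figures from the text)
coverage = 1_500_000 / 330_000_000

print(nearest_relationship(0.49))   # parent/child or full sibling
print(nearest_relationship(0.11))   # first cousin (or similar)
print(nearest_relationship(0.009))  # third cousin (or similar)
print(f"{coverage:.2%}")            # 0.45%
```

A coverage of well under 1% is why investigators generally work outward from relatives rather than expecting the source of the sample to appear in the database directly.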
Concerns about forensic genetic genealogy
There is a public interest in public safety and solving crimes, and the utility of FGG searches is clear from the cases it has helped solve. But the technique also raises legitimate concerns. We have broken down these concerns into three categories: efficiency, accuracy and privacy.
1. Efficiency: Forensic genealogy is not a quick process; it takes time and money to do it right. The cost of SNP profiling can vary depending on the lab and may include expenses related to sample processing and data analysis. Even with today’s faster sequencing services, creating a genetic profile takes time. So does the traditional genealogy work.
If potential relatives come from backgrounds where genealogical records are easier to locate (like the United States and much of Europe), then building a family tree can be straightforward. But when relatives come from countries or cultures where record-keeping is sparse or nonexistent, kinship maps can be difficult to create. Once a family tree is created, it takes time and money to investigate and home in on potential suspects. For these reasons, agencies should carefully consider whether a case is appropriate for, and worth, the FGG process.
2. Accuracy: When an investigation involves DNA, there is often an assumption that any conclusions must be accurate. DNA testing is highly precise when conducted properly, but the associated investigation requires diligence before and after testing to ensure a reliable outcome. That diligence includes careful consideration of the quality, quantity and value of the DNA. Investigators have a lot to think about, starting with the sample itself:
- Is the sample itself reliable? Is it pure and unadulterated?
- If the material may be a mixture of DNA from more than one person, can the other source(s), such as any victim(s), be reliably ruled out?
- Has time degraded the genetic material in the sample?
- Is there enough DNA to develop a profile and corroborate the result?
- Given that a crime scene could have many sources of DNA, will this sample have probative value to identification?
Here are some of the external factors that need to be considered:
- Is the testing lab reputable and proficient?
- What is the skill level and reputation of the forensic and traditional genealogist(s) contracted for the case?
Investigators need to factor in these and other concerns as they work through the process. It is vital to use trusted and traditional investigative techniques to eliminate suspects as well as to identify them.
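An agency may find it useful to record the answers to these questions case by case, so that unresolved concerns are visible before a sample is sent for profiling. The Python sketch below shows one hypothetical way to do that; the `SampleAssessment` fields and the `open_concerns` helper are illustrative names, and in practice each answer reflects expert judgment rather than a simple yes or no.

```python
from dataclasses import dataclass, fields

@dataclass
class SampleAssessment:
    """Hypothetical answers to the accuracy questions.
    True means the concern has been addressed for this sample."""
    sample_reliable: bool          # pure and unadulterated
    mixture_resolved: bool         # no mixture, or other sources (e.g., victims) ruled out
    not_degraded: bool             # time has not degraded the material beyond use
    sufficient_quantity: bool      # enough DNA to profile AND to corroborate later
    probative_value: bool          # likely to come from the person to be identified
    lab_proficient: bool           # testing lab is reputable and proficient
    genealogists_qualified: bool   # forensic and traditional genealogists are qualified

def open_concerns(a: SampleAssessment) -> list[str]:
    """List every question that has not been answered satisfactorily."""
    return [f.name for f in fields(a) if not getattr(a, f.name)]

assessment = SampleAssessment(
    sample_reliable=True,
    mixture_resolved=True,
    not_degraded=True,
    sufficient_quantity=False,  # e.g., only enough for one round of testing
    probative_value=True,
    lab_proficient=True,
    genealogists_qualified=True,
)
print(open_concerns(assessment))  # ['sufficient_quantity']
```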
3. Privacy: When it comes to privacy, it is important to consider three factors in the context of FGG searches. First, DNA profiles developed in this process reveal personal information (health and other genetic features) that is sensitive and can be irrelevant to a law enforcement investigation.
Second, the consumers who have uploaded their profiles to a genealogy database are generally trying to locate relatives or learn about their health or ancestry; they are not participating in this process to assist government investigations nor are they connected to the criminal justice system in any way. (This is an important distinction from individuals whose profiles are uploaded into CODIS, for example.)
Third, the government — as opposed to private individuals — has different responsibilities when it comes to searches. Assuming the FGG search identifies a close relative of the perpetrator or a suspect themself, the agency must carefully consider how to collect a DNA sample from third parties and suspects for confirmatory testing within the parameters of the Fourth Amendment. The difficulty for agencies is that these privacy implications are largely unaddressed by controlling law and there is no perfectly analogous case law to provide guidance either.
In the absence of federal law pertaining to FGG investigations, the Department of Justice has issued guidelines (not controlling) for federal law enforcement agencies. Currently, only Maryland, Montana and Utah have enacted state laws on FGG searches. Maryland’s law requires judicial authorization and that consumers expressly consent to law enforcement accessing their information. Montana’s law requires a warrant and that consumers waive their right to privacy in their information. Utah’s law restricts law enforcement access to genetic genealogy companies that explicitly notify consumers and allow them to opt in or out of having their data accessible to law enforcement. Utah and Maryland restrict FGG searches to serious crimes and the identification of human remains.
This leaves the majority of agencies in jurisdictions without controlling law. These agencies can nevertheless learn from the common themes in the DOJ guidance and state laws and create responsible guidelines that address efficiency, accuracy and privacy concerns.
Responsible guidelines
Given the time and money involved, agencies should be selective about which cases they submit for FGG searches. One approach is to limit the technique to specific investigations (e.g., violent crimes and the identification of unknown human remains) and to cases in which all other investigative leads, including CODIS, have failed to produce an identification.
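Selection criteria like these can be written down as an explicit screening step, which also documents why a case was or was not submitted. The sketch below is hypothetical: the `QUALIFYING_CATEGORIES` set, the `Case` fields and the `eligible_for_fgg` function are assumptions modeled on the examples in the text, and an agency’s actual criteria would come from its policy, applicable state law and the database’s terms of service.

```python
from dataclasses import dataclass

# Hypothetical qualifying categories, modeled on the examples in the text
# (violent crimes and unidentified human remains). Adjust to agency policy.
QUALIFYING_CATEGORIES = {"homicide", "sexual assault", "unidentified human remains"}

@dataclass
class Case:
    category: str
    codis_searched: bool         # evidence profile has been run through CODIS
    codis_hit: bool              # CODIS produced an identification
    other_leads_exhausted: bool  # conventional leads have not produced an identification

def eligible_for_fgg(case: Case) -> tuple[bool, str]:
    """Screen a case against the selection criteria and return (decision, reason)."""
    if case.category not in QUALIFYING_CATEGORIES:
        return False, "case type is outside the qualifying categories"
    if not case.codis_searched:
        return False, "search CODIS first"
    if case.codis_hit:
        return False, "CODIS already produced an identification"
    if not case.other_leads_exhausted:
        return False, "other investigative leads remain open"
    return True, "meets selection criteria"

print(eligible_for_fgg(Case("homicide", True, False, True)))
print(eligible_for_fgg(Case("burglary", True, False, True)))
```

Returning a reason alongside the decision keeps a written record of the screening, which supports the documentation practices recommended elsewhere in this article.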
To address concerns about accuracy, it is critical to diligently investigate these cases before and after DNA testing. This includes choosing a reputable genealogist to develop family trees, performing confirmatory testing, and using traditional investigative techniques to determine whether the other evidence in the case corroborates any positive identification from FGG.
Out of respect for the privacy implications involved, the genetic genealogy company’s terms of service, at a minimum, must be followed. For instance, some companies prohibit law enforcement use without a warrant whereas others permit law enforcement use for specific qualifying cases. In the spirit of transparency, agencies should use companies that permit users to opt in or opt out of allowing law enforcement to access their names (their DNA and genetic data are not accessible to law enforcement or other users). The names of potential relatives — who, after all, are innocent parties — should be kept out of official reports when reasonable.
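The opt-in and report practices described above can be expressed as two small checks. This is an illustrative sketch only: `DatabaseMatch` and its fields are hypothetical and do not reflect any real database’s interface, and a site that offers an opt-in setting should already exclude users who have not opted in, so the filter acts as a defensive double check. The point is that the agency’s own report carries a neutral identifier rather than a relative’s name.

```python
from dataclasses import dataclass

@dataclass
class DatabaseMatch:
    match_id: str                 # neutral identifier assigned by the investigator
    name: str                     # visible to law enforcement only if the user opted in
    shared_fraction: float        # share of DNA in common with the evidence profile
    law_enforcement_opt_in: bool  # user's setting for law enforcement matching

def usable_matches(matches: list[DatabaseMatch]) -> list[DatabaseMatch]:
    """Drop any match from a user who has not opted in to law enforcement use."""
    return [m for m in matches if m.law_enforcement_opt_in]

def report_line(match: DatabaseMatch) -> str:
    """Summarize a match for an official report without naming the relative."""
    return f"Relative {match.match_id}: {match.shared_fraction:.2%} shared DNA"

# Hypothetical example data
matches = [
    DatabaseMatch("R-1", "Alex Example", 0.125, True),
    DatabaseMatch("R-2", "Sam Example", 0.0078, False),
]
for m in usable_matches(matches):
    print(report_line(m))  # Relative R-1: 12.50% shared DNA
```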
In all cases, but particularly in jurisdictions where there is no law on point, it is important that investigators work closely with the prosecutor to determine how to collect a DNA sample from a third party or from a suspect to confirm an identification. If a suspect is located out of state, this could implicate laws in multiple jurisdictions. Likewise, investigators should consult with prosecutors regarding applicable laws and procedures to manage situations where there is a reasonable belief that the integrity of the investigation would be compromised by seeking consent from the third party prior to collection. Finally, agencies should follow records retention laws and work with a prosecutor to determine which records need to be retained and which need to be discarded or destroyed at the conclusion of the criminal case.
Although it may be legally possible to circumvent some of these guidelines, law enforcement agencies should resist the temptation. Not only is it unethical, but it could also erode public trust in, and support for, FGG searches.
Forensic genetic genealogy — the investigation
Let’s assume a case meets the criteria to begin the FGG process, whether under state law or within the genetic genealogy company’s terms of service. Here is an example (albeit highly simplified) of an FGG investigation applying the guidelines above.
It all starts with the DNA sample in evidence:
1. Develop a profile: Before creating a genetic profile, the investigator needs a DNA sample. A SNP profile cannot be created from an STR profile, so the investigator needs to go back to the original evidence and obtain DNA to develop the SNP profile. This requires assessing the DNA sample for quality, quantity and value. If the sample is uncorrupted, sufficient both to create a genetic profile and for any subsequent required testing, and likely to provide valuable information about the person to be identified, it can be submitted to a lab for profiling.
2. Search open-source genetic databases: Once the profile has been developed, either police or contractors upload it to one or more open-source DNA databases (such as GEDmatch). Searches are conducted to identify possible relatives — no matter how distant. If any are found, site algorithms help determine how close or far away those relationships are.
3. Create family trees for any potential relatives: Next, a traditional genealogist creates family trees for each person whose genetic profile has a family relationship with the source of the original sample.
4. Identify possible candidates: Working with the genealogist, investigators will narrow down the list of subjects who fit the genetic criteria based on degree of kinship (that is, the percentage of DNA each relative shares with the evidence profile).
5. Consider other evidence: At this point, investigators ask, “What else do we know about the unknown suspect?” They may have other evidence that can rule individuals in or out. For example, eyewitness descriptions (or other sources, like surveillance video) may suggest a subject’s approximate height, weight, hair and eye color. Logic dictates that the person had to have been in the area where the crime occurred within a specific time frame. Considering the totality of the evidence can often eliminate some possible subjects while highlighting others.
6. Further investigation: Investigators will use conventional investigative techniques to research potential suspects and determine whether they can be ruled out. This is an important step, particularly if the investigation identifies multiple individuals as potential suspects. This may include talking to family members, obtaining work documents, searching for arrest records and other traditional investigation methods. Following and documenting this part of the process shows courts and critics the many possibilities that were considered and ruled out, countering claims of prejudgment and confirmation bias.
7. Confirm through further testing: If a viable suspect emerges from the research, law enforcement should work with a prosecutor to determine how to obtain a DNA sample to corroborate the identification.
If the second test confirms the match, and the rest of the investigation is in order, police will usually make an arrest.
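Steps 4 through 6 amount to narrowing a list of candidates while keeping a record of why each person was ruled out. The Python sketch below illustrates that bookkeeping under stated assumptions: the `Candidate` fields, the `screen` function and the 2-inch height tolerance are hypothetical, and the example description (blue eyes, about 5’9”) is borrowed from the Golden State Killer case discussed earlier. Eyewitness descriptions can be wrong, so a real investigation would weigh such factors rather than apply hard cutoffs, and any remaining candidate still requires confirmatory testing.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Candidate:
    label: str                              # position in the family tree, not a name
    eye_color: Optional[str] = None         # None means unknown
    height_in: Optional[int] = None
    in_area_at_time: Optional[bool] = None  # placed in the area when the crime occurred?

@dataclass
class Screening:
    remaining: list[str] = field(default_factory=list)       # still viable
    eliminated: dict[str, str] = field(default_factory=dict)  # label -> documented reason

def screen(candidates, eye_color, min_height_in, max_height_in):
    """Compare each candidate with what is known about the unknown subject.

    Unknown attributes never eliminate anyone; only a documented conflict does.
    """
    result = Screening()
    for c in candidates:
        if c.eye_color is not None and c.eye_color != eye_color:
            result.eliminated[c.label] = f"eye color {c.eye_color}, description says {eye_color}"
        elif c.height_in is not None and not min_height_in <= c.height_in <= max_height_in:
            result.eliminated[c.label] = (
                f"height {c.height_in} in, outside {min_height_in}-{max_height_in} in"
            )
        elif c.in_area_at_time is False:
            result.eliminated[c.label] = "documented as outside the area at the time"
        else:
            result.remaining.append(c.label)
    return result

candidates = [
    Candidate("A", "blue", 69, True),
    Candidate("B", "brown", 70, True),
    Candidate("C", "blue", 74),
    Candidate("D"),
]
r = screen(candidates, "blue", 67, 71)
print(r.remaining)   # ['A', 'D']
print(r.eliminated)
```

Candidate A fits every known attribute and candidate D has no recorded attributes, so both remain; B and C are eliminated, each with a written reason that can later be shown to a court.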
Key takeaways for law enforcement
Here is some final advice to help law enforcement investigators get the most out of their FGG investigations:
- Work with your prosecutor and/or agency counsel: The sooner you involve them in the investigation, the better. They can help guide you through the process to ensure any results you find will be valid and admissible.
- Follow state laws: You don’t want the results of your investigation to be tossed out because you broke the rules.
- Follow database policies: Don’t be tempted to try “workarounds” to search databases that prohibit the exact activities you’re trying to do.
- Work with reputable contractors: When your case goes to trial, the people doing the genetic testing, the searching and the traditional genealogy work will be called into question by the defense. Be sure they are highly qualified and well-regarded in their fields.
- Do your due diligence: Be sure to conduct confirmatory testing to ensure that your eventual suspect is a match for your evidentiary sample. Also, never make an arrest solely based on the results of a genetic test. Confirm the results by using traditional investigative techniques to rule out or confirm the viability of suspects.
Forensic genetic genealogy is a powerful tool for criminal and missing persons investigations, enabling the identification of individuals based on DNA samples. Members of law enforcement need to be careful to use this tool effectively and judiciously to ensure it remains available as a useful investigatory method.
Note: If your agency is a Lexipol policy subscriber, watch for updates to your policy manual for the forthcoming Forensic Genetic Genealogy policy.
Notes
In July 2019, a Circuit Court in Florida approved a warrant to override GEDmatch’s privacy settings. The law enforcement community should weigh the benefits of crime solving against the long-term problem of eroding public trust. The decline in public trust is more than just talk; it has the potential to result in real outcomes, including laws that might limit the use of FGG in investigations or cause people to stop using open-source genetic databases.
While this example is written in the context of a no-suspect case, this process could be applied to the identification of victims or unknown human remains as well.
Sources
- Average Percent DNA Shared Between Relatives. 23andMe. Accessed via https://customercare.23andme.com/hc/en-us/articles/212170668-Average-Percent-DNA-Shared-Between-Relatives
- Chamary JV. How Genetic Genealogy Helped Catch The Golden State Killer. Forbes, 6/30/2020. Accessed 3/1/2024 via https://www.forbes.com/sites/jvchamary/2020/06/30/genetic-genealogy-golden-state-killer/
- Comprehensive solutions for genetic genealogy and family tree search. GEDmatch.com. Accessed 3/1/2024 via https://www.gedmatch.com/
- The FBI’s Combined DNA Index System (CODIS) Hits Major Milestone. FBI.gov, 5/21/2021. Accessed 3/1/2024 via https://www.fbi.gov/news/press-releases/the-fbis-combined-dna-index-system-codis-hits-major-milestone
- CNN, 12/18/2020. Accessed 3/1/2024 via https://www.cnn.com/2020/12/18/us/norcal-rapist-roy-waller-sentenced-trnd/index.html
- Jeanguenat A. Switching to 20 Core CODIS Loci and the Impact on SAKI Testing. Bureau of Justice Assistance, U.S. Department of Justice, 8/2018. Accessed 3/1/2024 via https://bja.ojp.gov/library/publications/switching-20-core-codis-loci-and-impact-saki-testing
- Lynch J. Forensic Genetic Genealogy Searches: What Defense Attorneys & Policy Makers Need to Know. Electronic Frontier Foundation, 7/26/2023. Accessed 3/1/2024 via https://www.eff.org/wp/forensic-genetic-genealogy-searches-what-defense-attorneys-need-know
- St. John P. The untold story of how the Golden State Killer was found: A covert operation and private DNA. Los Angeles Times, 12/8/2020. Accessed 3/1/2024 via https://www.latimes.com/california/story/2020-12-08/man-in-the-window
- Sweat M. A Square Double Helix in a Round Hole: Forensic Genetic Genealogy Searches and the Fourth Amendment. Georgia State University Law Review, 3/2/2023. Accessed 3/1/2024 via https://readingroom.law.gsu.edu/cgi/viewcontent.cgi?article=3190&context=gsulr
- United States Department of Justice Interim Policy: Forensic Genetic Genealogical DNA Analysis and Searching. U.S. Department of Justice, 9/2/2019. Accessed 3/1/2024 via https://www.justice.gov/olp/page/file/1204386/download
- Vallieu M. ‘Buckskin Girl’ identified. Lima News, 4/16/2018. Accessed 3/1/2024 via https://www.limaohio.com/top-stories/2018/04/16/buckskin-girl-identified/
- Roy Waller. Genealogy Explained, 2/9/2024. Accessed 3/1/2024 via https://www.genealogyexplained.com/igg-cases/roy-waller/
- Williams J, Kramer S. Investigative Genetic Genealogy, Golden State Killer. JerriWilliams.com, 1/10/2024. Accessed 3/1/2024 via https://jerriwilliams.com/306-steve-kramer-and-steve-busch-investigative-genetic-genealogy-golden-state-killer/
About the author
Prathi Chowdri served as a federal trial attorney defending the NYPD against claims of false arrest, excessive force, wrongful conviction and malicious prosecution in the Southern and Eastern Districts of New York from 2005 to 2010. Many of her cases as Senior Counsel involved complex civil litigation with extensive e-discovery and high-profile claims against the city, its police officers, prosecutors, and corrections officers. She also worked as an associate at a private law firm in New York City where she continued federal trial litigation in both 42 USC § 1983 and medical malpractice. Prathi has also served as an Adjunct Professor in Constitutional Law at Florida Atlantic University. From 2015 to 2024 she worked as a member of Lexipol’s legal team, ensuring that policy and other content conformed to state and federal standards. She is now chief legal advisor and director of strategy for Polis Solutions, focusing on law, policy and civil rights from a policing and AI perspective.