U.S. flag

An official website of the United States government, Department of Justice.

Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images

NCJ Number
Scientific Reports Volume: 10 Issue: 1 Dated: 2020
Date Published

This article presents a new approach and an accompanying open-source toolbox that facilitates a robust integrative analysis for population structure and genomic ancestry estimates for heterogeneous datasets.


Estimates of individual-level genomic ancestry are routinely used in human genetics, and related fields. The analysis of population structure and genomic ancestry can yield insights in terms of modern and ancient populations, enabling the addressing of issues regarding admixture and the numbers and identities of the parental source populations. Unrecognized population structure is also an important confounder to correct for in genome-wide association studies; however, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. The approach explained in this article shows robustness against individual outliers and various protocols for the projection of new samples into a reference ancestry space, and the ability to reveal and adjust for population structure in a simulated case–control admixed population. Given that visually evident and easily recognizable patterns of human facial characteristics co-vary with genomic ancestry, and based on the integration of three different sources of genome data, the project generates average 3D faces to illustrate genomic ancestry variations within the 1,000 Genome project and for eight ancient-DNA profiles, respectively. (publisher abstract modified)

Date Published: January 1, 2020