It is important to use care in labeling the populations when publishing or presenting the findings of studies that used the samples. This document provides guidelines on how to refer to the populations.
Rationale
The way that a population is named in studies of genetic variation, such as in the HapMap or 1000 Genomes Projects, has important ramifications scientifically, culturally, and ethically. From a scientific standpoint, precision in describing the population from which the samples were collected is an essential component of sound study design; the source of the data must be accurately described in order for the data to be interpreted correctly. From a cultural standpoint, precision in labeling reflects respect for the local norms of the communities that agreed to participate in the research, and an acknowledgement that populations in one part of the world are not all the same. From an ethical standpoint, precision is part of the obligation of researchers to participants, and helps to ensure that the research findings are neither under-generalized nor over-generalized. The use of careless or inconsistent terminology when describing the populations represents a failure in all three of these areas.
The populations whose samples are included in the NHGRI Repository should not be named in a way that singles out small, discrete communities or implies that those communities are somehow genetically unique or of special interest. Labels that are too specific could also invade the privacy interests of communities (or even of individual sample donors).
On the other hand, describing the populations in terms that are too broad could result in inappropriate over-generalization. Over-generalization could lead those who interpret data from studies that use the samples to erroneously equate ancestry with race (an imprecise and socially constructed category, which has very different meanings in various parts of the world). Equating the two could in turn reinforce social and historical stereotypes, and lead to group stigmatization and discrimination in places where members of the named populations or of closely related communities are minorities.
Recommended Descriptors
Recommended language has been developed for naming each population whose samples are included in the NHGRI Repository. Each recommended descriptor reflects the principles discussed above, as well as input from the sample donor communities about how they wished to be described.
The complete recommended language for naming the populations whose samples are included in the NHGRI Repository reflects both the ancestral geography or ethnicity of each population and the geographic location where the samples from that population were collected. Below are the official, approved descriptors for each of the populations whose samples are in the Repository. After the complete descriptor for a population has been provided, it is acceptable to use the abbreviation for that population (e.g., “YRI,” “JPT,” “CHB,” “CEU”) in the remainder of the article or presentation. However, the full descriptor for each population should be provided before the abbreviations are used; this will help to avoid the risks associated with over-generalization of findings.
The sample sets should not be described as having come from “normal controls.” No phenotypic information was collected with the samples, so we do not know what medical conditions the donors had.
In some cases, in addition to providing the complete descriptor for each population when first describing the populations, it may be appropriate to describe the criteria that were used to assign membership in each population. This information can be found in the Population Descriptions for each specific population (follow links for each specific population above).