Guidelines for Referring to Populations

Care is needed in labeling the populations when publishing or presenting the findings of studies that used samples from the NHGRI Repository. This document provides guidelines on how to refer to those populations.


The way a population is named in studies of genetic variation, such as the HapMap or 1000 Genomes Projects, has important scientific, cultural, and ethical ramifications. From a scientific standpoint, precision in describing the population from which the samples were collected is an essential component of sound study design: the source of the data must be accurately described for the data to be interpreted correctly. From a cultural standpoint, precision in labeling reflects respect for the local norms of the communities that agreed to participate in the research, and an acknowledgement that populations in one part of the world are not all the same. From an ethical standpoint, precision is part of researchers' obligation to participants, and it helps to ensure that research findings are neither under-generalized nor over-generalized. Careless or inconsistent terminology in describing the populations represents a failure in all three of these areas.

The populations whose samples are included in the NHGRI Repository should not be named in a way that singles out small, discrete communities and implies that those communities are somehow genetically unique, of special interest, or very different from their close neighbors. Labels that are too specific could also invade the privacy interests of communities (or even of individual sample donors).

On the other hand, describing the populations in terms that are too broad could result in inappropriate over-generalization. It could lead those who interpret data from studies that use the samples to equate ethnicity or ancestral geography with race (an imprecise and largely socially constructed category that has very different meanings in different parts of the world). Such conflation could reinforce social and historical stereotypes and lead to group stigmatization and discrimination in places where members of the named populations, or of closely related populations, are minorities.

Recommended Descriptors 

Recommended language has been developed for naming each population whose samples are included in the NHGRI Repository. Each recommended descriptor reflects the principles discussed above, as well as input from the sample donor communities about how they wished to be described.

The complete recommended language for naming the populations whose samples are included in the NHGRI Repository reflects both the ancestral geography or ethnicity of each population and the geographic location where the samples from that population were collected. The official, approved descriptors for each population whose samples are in the Repository are listed below. After the complete descriptor for a population has been given, it is acceptable to use the designated shorthand label (e.g., “Yoruba,” “Japanese,” “Han Chinese,” “CEPH”) or the abbreviation (e.g., “YRI,” “JPT,” “CHB,” “CEU”) for that population in the remainder of the article or presentation. The full descriptor should always come first, however, because it helps to avoid the risks associated with over-generalizing findings.

Full Population Descriptor                        Shorthand Label       Abbreviation
------------------------------------------------  --------------------  ------------
African Ancestry in Southwest USA                 African Ancestry SW   ASW
African Caribbean in Barbados                     African Caribbean     ACB
Bengali in Bangladesh                             Bengali               BEB
British from England and Scotland, UK             British               GBR
Chinese Dai - Xishuangbanna, China                Dai Chinese           CDX
Chinese in Metropolitan Denver, Colorado, USA     Denver Chinese        CHD
Colombian in Medellin, Colombia                   Colombian             CLM
Esan in Nigeria                                   Esan                  ESN
Finnish in Finland                                Finnish               FIN
Gujarati Indians in Houston, Texas, USA           Gujarati              GIH
Han Chinese in Beijing, China                     Han Chinese           CHB
Han Chinese South, China                          Southern Han Chinese  CHS
Iberian populations in Spain                      Iberian               IBS
Indian Telugu in the UK                           Telugu                ITU
Japanese in Tokyo, Japan                          Japanese              JPT
Kinh in Ho Chi Minh City, Vietnam                 Kinh Vietnamese       KHV
Luhya in Webuye, Kenya                            Luhya                 LWK
Maasai in Kinyawa, Kenya                          Maasai                MKK
Mende in Sierra Leone                             Mende                 MSL
Mexican Ancestry in Los Angeles, California, USA  Mexican               MXL
Peruvian in Lima, Peru                            Peruvian              PEL
Puerto Rican in Puerto Rico                       Puerto Rican          PUR
Punjabi in Lahore, Pakistan                       Punjabi               PJL
Sri Lankan Tamil in the UK                        Tamil                 STU
Toscani in Italia                                 Toscani               TSI
Yoruba in Ibadan, Nigeria                         Yoruba                YRI

The sample sets should not be described as having come from “normal controls.” No phenotypic information was collected with the samples, so it is not known what medical conditions the donors may have had.

In some cases, in addition to providing the complete descriptor for each population when first describing the populations, it may be appropriate to describe the criteria that were used to assign membership in each population. This information can be found in the Population Descriptions for each specific population.