HapMap Project

General Background  
In 2001, the International HapMap Consortium launched the International HapMap Project to develop a haplotype map (“HapMap”) of the human genome, a resource that describes the common patterns of human DNA sequence variation. The HapMap has become an important tool for finding genes that affect health, disease, and response to drugs and environmental factors. All HapMap data are freely available to the public through the database dbSNP. A graphical browser for HapMap genotypes is also available at http://www.hapmap.org/cgi-perl/gbrowse/gbrowse. Further information can be found at the International HapMap Project website. The Project is also described in Nature 426: 789-796, 2003 [PMID: 14685227]. The associated ethical issues are described in Nature Reviews Genetics 5: 467-475, 2004 [PMID: 15153999]. The general process of community engagement used in collecting samples for the Project is described in Community Genetics 10(3): 186-198, 2007 [PMID: 17575464].

Phase I  
In 2005, the International HapMap Consortium released the Phase I HapMap, a resource consisting of over a million accurate and complete SNP genotypes generated in 269 individuals from four geographically diverse populations: the Yoruba in Ibadan, Nigeria; Japanese in Tokyo, Japan; Han Chinese in Beijing, China; and the CEPH (U.S. Utah residents with ancestry from northern and western Europe). The Phase I HapMap includes data from ten 500-kb regions (the “HapMap ENCODE I regions”) that were sequenced to assess the genotyping. The Phase I HapMap documents the generality of recombination hotspots, a block-like structure of linkage disequilibrium, and low haplotype diversity, which together lead to substantial correlations of SNPs with many of their neighbors. This resource guides the design and analysis of genetic association studies. It also sheds light on structural variation and recombination and identifies loci that may have been subject to natural selection during human evolution. The Phase I HapMap is described in Nature 437: 1299-1320, 2005 [PMID: 16255080].
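The correlation between neighboring SNPs mentioned above is commonly summarized with the r² statistic of linkage disequilibrium. The following is a minimal sketch, not taken from the HapMap publications, assuming phased haplotypes with alleles coded as 0 and 1; the function name `ld_r2` and the input layout are illustrative assumptions.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two biallelic SNPs.

    hap_a and hap_b are equal-length lists of 0/1 alleles, one entry per
    phased haplotype (an assumed encoding, chosen for illustration).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype lists must be non-empty and of equal length")

    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at the first SNP
    p_b = sum(hap_b) / n  # frequency of allele 1 at the second SNP
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom
```

Two SNPs whose alleles always travel together give an r² of 1, while SNPs that segregate independently in the sample give a value of 0.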

Phase II  
In 2007, the International HapMap Consortium released the Phase II HapMap, which added over 2.1 million SNPs to the original map in the same 269 individuals. The Phase II HapMap enables an improved choice of tag SNPs, a better understanding of how well studies capture patterns of genetic variation, and the potential to increase the power of association experiments that use fixed marker sets through imputation. It also reveals novel aspects of the structure of linkage disequilibrium, including the importance of recent co-ancestry among individuals and the distribution and causes of untaggable SNPs. In addition, it improves the resolution of the fine-scale genetic map and the location of recombination hotspots, and it provides new information about the influence of natural selection on protein-changing variants. The Phase II HapMap is described in Nature 449: 851-861, 2007 [PMID: 17943122].
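One way to picture tag SNP selection is a greedy pass over pairwise r² values. The sketch below is an illustration only and is not the Consortium's method: the 0.8 threshold, the function names, the SNP identifiers, and the 0/1 haplotype encoding are all assumptions.

```python
def pairwise_r2(x, y):
    """r^2 between two SNPs given as equal-length lists of 0/1 alleles."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(a & b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated
    return (pxy - px * py) ** 2 / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag.

    haplotypes maps a SNP identifier to its list of 0/1 alleles.
    The threshold default is an assumed, commonly used value.
    """
    ids = list(haplotypes)
    # covers[s]: the SNPs that s would capture if chosen as a tag (itself included)
    covers = {
        s: {t for t in ids
            if s == t or pairwise_r2(haplotypes[s], haplotypes[t]) >= threshold}
        for s in ids
    }
    untagged = set(ids)
    tags = []
    while untagged:
        # choose the SNP that captures the most still-untagged SNPs
        best = max(ids, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags
```

With hypothetical input such as `{"snp1": [0, 0, 1, 1, 0, 1], "snp2": [0, 0, 1, 1, 0, 1], "snp3": [0, 1, 0, 1, 0, 1]}`, the first two SNPs are perfectly correlated and share one tag, while the third must tag itself. A SNP that no neighbor captures in this way corresponds loosely to the "untaggable" case described above.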

Analysis of Samples from Additional Populations – HapMap 3  
U.K. and U.S. investigators expanded the Phase I/II HapMap by genotyping and sequencing additional samples from the HapMap populations and samples from seven additional populations: Maasai in Kinyawa, Kenya; Luhya in Webuye, Kenya; Chinese in metropolitan Denver, CO, USA; Gujarati Indians in Houston, TX, USA; Toscani in Italia (Tuscans in Italy); African ancestry in the Southwest USA; and Mexican ancestry in Los Angeles, CA, USA. Most of these samples were genotyped across the genome for 1.6 million SNPs. A subset of the samples from these populations was sequenced in 2 Mb of the ENCODE II regions (20 regions of 100 kb each). This combination of genotyping and sequencing allowed comparison of genome-wide patterns of variation. It also made possible the assessment of the transferability of the tag SNPs based on the initial 269 samples to the new populations.

The HapMap 3 results are described in Nature 467: 52-58, 2010 [PMID: 20811451].

HapMap Samples  
No identifying or phenotype information is available for the HapMap samples that are housed in the NHGRI Repository. All of the samples were collected with extensive community engagement, including discussions with members of the donor communities about the ethical and social implications of human genetic variation research. Donors gave broad consent to future uses of the samples, including their use for extensive genotyping and sequencing, gene expression and proteomics studies, and all other types of genetic variation research, with the data publicly released. Investigators can order individual DNA samples or individual cell cultures. The biomaterials currently available are shown in the tables below:

Populations Included in Phase I/II HapMap 

| Population | Approved Individual DNA Samples | Approved Individual Cell Cultures |
|---|---|---|
| Yoruba in Ibadan, Nigeria [YRI] | 229 | 229 |
| Han Chinese in Beijing, China [CHB] | 162 | 162 |
| Japanese in Tokyo, Japan [JPT] | 131 | 131 |
| CEPH/Utah Collection [CEU] [NIGMS Human Genetic Cell Repository] | 186 | 186 |

Additional Populations 

| Population | Approved Individual DNA Samples | Approved Individual Cell Cultures |
|---|---|---|
| Maasai in Kinyawa, Kenya [MKK] | 205 | 205 |
| Luhya in Webuye, Kenya [LWK] | 122 | 122 |
| Chinese in Metropolitan Denver, CO, USA [CHD] | 129 | 129 |
| Gujarati Indians in Houston, TX, USA [GIH] | 117 | 117 |
| Toscani in Italia [TSI] | 117 | 117 |
| Mexican Ancestry in LA, CA, USA [MXL] | 104 | 104 |
| African Ancestry in SW USA [ASW] | 106 | 106 |