Status |
Public on Apr 20, 2010 |
Title |
Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions. |
Organism |
Homo sapiens |
Experiment type |
Genome variation profiling by genome tiling array
|
Summary |
The high level of human genome structural variation among individuals suggests that there must be portions of the genome that have yet to be discovered, annotated and characterized at the sequence level. Using clone resources developed as part of the Human Genome Structural Variation Sequencing Project, we focused on the characterization of 2,363 novel sequence contigs not present in the human reference genome. We determined that these contigs corresponded to 720 distinct loci of which 400 now have an anchored position in the reference genome. We investigated the sequence properties of these loci and determined that 37% of these novel insertions are copy-number polymorphic. We find that they are significantly enriched within the last 5 Mb of chromosomes (a 2.9-fold enrichment, p=1.0e-18, binomial test) and that most arose as a result of deletions in the human lineage after separation from the African great apes. A subset of these sites shows evidence of marked population stratification among Asian, African and European populations, including a 3.9-kb insertion within the first intron of the lactase gene. Complete sequencing of clones from 192 genomic loci, including 156 completely spanned insertions, provides a detailed and contextual view of 1.67 Mb of inserted sequence. Analysis of this sequence identified 477 elements that show evidence of sequence constraint over evolutionary time, as well as matches to 22 RefSeq gene segments. Twenty-six of the insertions contain matches against mRNA-seq data indicating the potential presence of functionally important, unannotated human sequences. Taking advantage of this high-quality sequence, we develop a method to accurately genotype these novel insertions using next-generation whole-genome sequencing datasets.
|
|
|
Overall design |
29 samples, including the reference sample (NA15110), which was used in both channels in a single self-self experiment.
|
|
|
Contributor(s) |
Kidd JM, Sampas N, Antonacci F, Graves T, Fulton R, Hayden HS, Alkan C, Malig M, Ventura M, Giannuzzi G, Kallicki J, Anderson P, Tsalenko A, Yamada NA, Tsang P, Kaul R, Wilson RK, Bruhn L, Eichler EE |
Citation(s) |
20440878 |
Submission date |
Mar 04, 2010 |
Last update date |
Mar 22, 2012 |
Contact name |
Nick Sampas |
E-mail(s) |
nick_sampas@agilent.com
|
Organization name |
Agilent Technologies
|
Department |
Life Science and Nanotechnology Department
|
Lab |
Molecular Technology Laboratory
|
Street address |
5301 Stevens Creek Blvd
|
City |
Santa Clara |
State/province |
CA |
ZIP/Postal code |
95051 |
Country |
USA |
|
|
Platforms (1) |
GPL10118 |
Agilent Custom Human 244K CGH Array |
|
Samples (29)
|
|
Relations |
BioProject |
PRJNA124851 |
Supplementary file | Size | Download | File type/resource |
GSE20634_RAW.tar | 808.2 Mb | (http)(custom) | TAR (of TXT) |
Processed data included within Sample table |
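
The supplementary archive is listed as a TAR of TXT files. A minimal sketch for inspecting such an archive with Python's standard tarfile module might look like the following. It assumes GSE20634_RAW.tar has already been downloaded to the working directory; the local path and the helper name list_tar_members are illustrative, and the member file names are not stated in this record.

```python
import tarfile
from pathlib import Path


def list_tar_members(archive_path):
    """Return (name, size_in_bytes) for each regular file in a TAR archive."""
    # mode "r:*" lets tarfile detect compression (none, gzip, bz2, xz) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


if __name__ == "__main__":
    # Assumed local path: the archive must already be on disk.
    local_archive = Path("GSE20634_RAW.tar")
    if local_archive.exists():
        for name, size in list_tar_members(local_archive):
            print(f"{size:>12d}  {name}")
```

Listing members before extraction is a cheap way to check that the number of raw files matches the 29 samples in the series without unpacking the 808.2 Mb archive.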