Series GSE20634
Status Public on Apr 20, 2010
Title Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions.
Organism Homo sapiens
Experiment type Genome variation profiling by genome tiling array
Summary The high level of human genome structural variation among individuals suggests that there must be portions of the genome that have yet to be discovered, annotated and characterized at the sequence level. Using clone resources developed as part of the Human Genome Structural Variation Sequencing Project, we focused on the characterization of 2,363 novel sequence contigs not present in the human reference genome. We determined that these contigs corresponded to 720 distinct loci of which 400 now have an anchored position in the reference genome. We investigated the sequence properties of these loci and determined that 37% of these novel insertions are copy-number polymorphic. We find that they are significantly enriched within the last 5 Mb of chromosomes (a 2.9-fold enrichment, p=1.0e-18, binomial test) and that most arose as a result of deletions in the human lineage after separation from the African great apes. A subset of these sites shows evidence of marked population stratification among Asian, African and European populations, including a 3.9-kb insertion within the first intron of the lactase gene. Complete sequencing of clones from 192 genomic loci, including 156 completely spanned insertions, provides a detailed and contextual view of 1.67 Mb of inserted sequence. Analysis of this sequence identified 477 elements that show evidence of sequence constraint over evolutionary time, as well as matches to 22 RefSeq gene segments. Twenty-six of the insertions contain matches against mRNA-seq data indicating the potential presence of functionally important, unannotated human sequences. Taking advantage of this high-quality sequence, we develop a method to accurately genotype these novel insertions using next-generation whole-genome sequencing datasets.
 
Overall design 29 samples, including the reference sample (NA15510), which was used in both channels in a single self-self experiment.
 
Contributor(s) Kidd JM, Sampas N, Antonacci F, Graves T, Fulton R, Hayden HS, Alkan C, Malig M, Ventura M, Giannuzzi G, Kallicki J, Anderson P, Tsalenko A, Yamada NA, Tsang P, Kaul R, Wilson RK, Bruhn L, Eichler EE
Citation(s) 20440878
Submission date Mar 04, 2010
Last update date Mar 22, 2012
Contact name Nick Sampas
E-mail(s) nick_sampas@agilent.com
Organization name Agilent Technologies
Department Life Science and Nanotechnology Department
Lab Molecular Technology Laboratory
Street address 5301 Stevens Creek Blvd
City Santa Clara
State/province CA
ZIP/Postal code 95051
Country USA
 
Platforms (1)
GPL10118 Agilent Custom Human 244K CGH Array
Samples (29)
GSM518151 NA10847/NA15510
GSM518152 NA10851/NA15510
GSM518153 NA11832/NA15510
Relations
BioProject PRJNA124851

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE20634_RAW.tar 808.2 Mb (http)(custom) TAR (of TXT)
Processed data included within Sample table
