Series GSE20634

Status

Public on Apr 20, 2010

Title

Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions.

Organism

Homo sapiens

Experiment type

Genome variation profiling by genome tiling array

Summary

The high level of human genome structural variation among individuals suggests that there must be portions of the genome that have yet to be discovered, annotated and characterized at the sequence level. Using clone resources developed as part of the Human Genome Structural Variation Sequencing Project, we focused on the characterization of 2,363 novel sequence contigs not present in the human reference genome. We determined that these contigs corresponded to 720 distinct loci of which 400 now have an anchored position in the reference genome. We investigated the sequence properties of these loci and determined that 37% of these novel insertions are copy-number polymorphic. We find that they are significantly enriched within the last 5 Mb of chromosomes (a 2.9-fold enrichment, p=1.0e-18, binomial test) and that most arose as a result of deletions in the human lineage after separation from the African great apes. A subset of these sites shows evidence of marked population stratification among Asian, African and European populations, including a 3.9-kb insertion within the first intron of the lactase gene. Complete sequencing of clones from 192 genomic loci, including 156 completely spanned insertions, provides a detailed and contextual view of 1.67 Mb of inserted sequence. Analysis of this sequence identified 477 elements that show evidence of sequence constraint over evolutionary time, as well as matches to 22 RefSeq gene segments. Twenty-six of the insertions contain matches against mRNA-seq data indicating the potential presence of functionally important, unannotated human sequences. Taking advantage of this high-quality sequence, we develop a method to accurately genotype these novel insertions using next-generation whole-genome sequencing datasets.

Overall design

29 samples, including the reference sample (NA15510), which was used in both channels in a single self-self experiment.

Contributor(s)

Kidd JM, Sampas N, Antonacci F, Graves T, Fulton R, Hayden HS, Alkan C, Malig M, Ventura M, Giannuzzi G, Kallicki J, Anderson P, Tsalenko A, Yamada NA, Tsang P, Kaul R, Wilson RK, Bruhn L, Eichler EE

Citation(s)

20440878

Submission date

Mar 04, 2010

Last update date

Mar 22, 2012

Contact name

Nick Sampas

E-mail(s)

nick_sampas@agilent.com

Organization name

Agilent Technologies

Department

Life Science and Nanotechnology Department

Lab

Molecular Technology Laboratory

Street address

5301 Stevens Creek Blvd

City

Santa Clara

State/province

ZIP/Postal code

95051

Country

USA

Platforms (1)

GPL10118

Agilent Custom Human 244K CGH Array

Samples (29)

GSM518151	NA10847/NA15510
GSM518152	NA10851/NA15510
GSM518153	NA11832/NA15510
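
The sample titles above appear to encode the two array channels as a pair of sample identifiers separated by a slash. A minimal Python sketch of splitting the listing into its parts follows. It assumes the order is test sample first and reference sample second, which this record does not state. The function name parse_samples and the field names are hypothetical, chosen only for illustration.

```python
# Rows copied from the sample listing above: accession, tab, title.
listing = """GSM518151\tNA10847/NA15510
GSM518152\tNA10851/NA15510
GSM518153\tNA11832/NA15510"""


def parse_samples(text):
    """Split each 'accession<TAB>first/second' row into labeled fields.

    Assumption: the first identifier is the test sample and the second
    is the reference sample. The record itself does not state the order.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        accession, title = line.split("\t", 1)
        test, reference = title.split("/", 1)
        rows.append({
            "accession": accession,
            "test": test,
            "reference": reference,
            # True when the same sample is hybridized in both channels
            "self_self": test == reference,
        })
    return rows


samples = parse_samples(listing)
for row in samples:
    print(row["accession"], row["test"], row["reference"], row["self_self"])
```

Under that assumption, the self-self experiment mentioned under Overall design would be the one row in the full 29-sample listing where both identifiers match. None of the three rows shown here is that row.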

Relations

BioProject

PRJNA124851

Download family	Format
SOFT formatted family file(s)	SOFT
MINiML formatted family file(s)	MINiML
Series Matrix File(s)	TXT

Supplementary file	Size	Download	File type/resource
GSE20634_RAW.tar	808.2 Mb	(http)(custom)	TAR (of TXT)
Processed data included within Sample table