dbVar Excel Submission Guidelines

The dbVar Excel submission template contains detailed instructions for completing your submission, regardless of the format you choose (Excel, tab-delimited, or XML). Please download a copy of the Excel template and keep it handy for reference as you complete your submission.

The template contains:

  • An informational tab labeled PLEASE READ FIRST, containing a general overview of the template
  • Seven (7) mandatory sheets (CONTACTS, STUDY, SAMPLESETS, SAMPLES, EXPERIMENTS, VARIANT CALLS, and VARIANT REGIONS)
  • Six (6) informational sheets: one each for SAMPLES (INFO), EXPERIMENTS (INFO), VARIANT CALLS (INFO), and VARIANT REGIONS (INFO); a list of controlled vocabularies used throughout the submission (CV (INFO)); and a list of external databases to which you may link from various places throughout your submission (LINKS (INFO)). These informational sheets are locked and are not intended to accept data input.

The following is a brief description of each section of the template. When viewing the Excel template, please note that required fields are indicated throughout with bright yellow highlighting.

CONTACTS

Accepts contact information for the submitter, first author, and last author (all required). Includes name, email address, at least one pda_login, and institutional information. Additional persons may be added as needed.

STUDY

Accepts basic information about your study, including study ID in the format Smith2012 (last name of first author followed by year of publication), study description, study type (e.g., Case-Control, Control Set, etc.), and PubMed IDs of any publication(s) associated with data in the submission. You may also indicate a "hold date"; we will process your data but not display it publicly until after this date. (This is commonly used to delay release until a publication comes out.)
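The study ID format described above can be sketched in a few lines of Python. This is only an illustration: the function name make_study_id is hypothetical, and the choices to strip whitespace from the last name and to require a four-digit year are assumptions, not rules stated by dbVar.

```python
def make_study_id(first_author_last_name: str, publication_year: int) -> str:
    """Build a study ID such as 'Smith2012' (last name + year of publication)."""
    # Assumption: whitespace inside the last name is removed.
    name = "".join(first_author_last_name.split())
    if not name:
        raise ValueError("last name must not be empty")
    # Assumption: the year is written with four digits.
    if not 1000 <= publication_year <= 9999:
        raise ValueError("year must have four digits")
    return f"{name}{publication_year}"
```

For example, make_study_id("Smith", 2012) returns "Smith2012".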

SAMPLESETS

Accepts information about the criteria by which samples or subjects were grouped in your study and the number of such groupings. Each sampleset is assigned a unique ID which may be used later in the submission to indicate groups of samples. If your study is based on human samples, a specific phenotype may be provided; other common discriminants include sex and population (e.g., 'ethnicity' if samples are of human origin, 'strain' if they are from mouse, 'breed' if from dog, etc.).

SAMPLES

Detailed description of samples and subjects in the study, including data on cell type (if applicable), karyotype, membership in any public collections (e.g., HapMap, Human Genome Diversity Project, etc.), and parental IDs (e.g., in the case of trios). NOTE: If your data contains sensitive clinical information or is from individuals who have not fully consented to having their genetic information displayed publicly online, you must first submit your data to dbGaP. After that data has been anonymized, it will be forwarded for processing and display at dbVar. A prime example of this is the International Standards for Cytogenomic Arrays (ISCA) project, listed at dbGaP as phs000205 and at dbVar as nstd37.

EXPERIMENTS

Details about the methods and analyses you used to process samples and produce structural variation data. A complete listing of accepted methods and analyses can be found here. (If your methods and analyses do not match any items in these lists, please contact dbVar to have them added.) Each EXPERIMENT receives a unique ID and represents a distinct combination of wet-lab methods, data analyses, and detection procedures. Validation and genotyping experiments are given separate IDs.

VARIANT CALLS

This section and the next will contain the bulk of your data. VARIANT CALLS details the individual variant calls produced by your experiments. This includes individual call identifiers, type of structural variation (e.g., insertion, deletion, copy number gain, etc.), links to any external resources that contain information about the calls (e.g., TRACE archives of sequence traces), and placement information. The specific data present in the Placement portion of the submission will depend on the methods you used for your experiments. For example, each call made in an array experiment may be described with the coordinates outer_start, inner_start, inner_stop, and outer_stop, indicating the uncertainty in precise localization of breakpoints that is inherent to array-based experiments.
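To make the four array-style coordinates concrete, here is a minimal Python sketch of a sanity check you might run before filling in the Placement columns. The function name placement_is_ordered is hypothetical, and the ordering rule it enforces (outer_start, then inner_start, then inner_stop, then outer_stop) is an assumption based on the meaning of the field names, not a validation rule quoted from the template.

```python
def placement_is_ordered(outer_start: int, inner_start: int,
                         inner_stop: int, outer_stop: int) -> bool:
    """Return True if the breakpoint coordinates are in non-decreasing order.

    Assumption: the outer coordinates bound the largest possible extent of
    the call and the inner coordinates bound the smallest, so each value
    should be less than or equal to the next.
    """
    return outer_start <= inner_start <= inner_stop <= outer_stop
```

A call whose start lies somewhere between 1000 and 1200 and whose stop lies between 5000 and 5300 would pass as placement_is_ordered(1000, 1200, 5000, 5300), while swapping the first two values would fail the check.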

Submitting Clinical Assertions: We accept assertions of clinical relevance (e.g., pathogenic, benign, etc.) on variant calls (SSVs). If your submission includes assertions of pathogenicity, you must indicate the specific phenotype(s) to which each assertion applies. You should still provide phenotype information on the SUBJECT and/or SAMPLESET as applicable for your submission.

For example, if you have a sampleset composed of subjects all of whom have a phenotype of "developmental delay" (as you indicated in SAMPLESETS/sampleset_phenotype), and you identify a variant in a patient who has the additional, more specific phenotype "speech delay" (as you indicated in SAMPLES/subject_phenotype), then in VARIANT CALLS you must specify two items: (1) the nature of the assertion (select from the pull-down menu in the clinical_significance field) and (2) the phenotype(s) to which the assertion applies (developmental delay, speech delay, or both), by indicating the phenotype again in the phenotype field. Without this information, dbVar cannot accurately represent your data because we do not know which aspects of phenotype are included in your assertion.

VARIANT REGIONS

You are required to merge your Variant Calls into Variant Regions. Variant Regions represent a consolidated, non-redundant set of genomic regions in which you have observed structural variation of a given type and size. For example, you might merge calls with identical coordinates and variant call types, or those which have 50% reciprocal overlap, etc. Like VARIANT CALLS, the VARIANT REGIONS section accepts Placement data to describe the location of your regions. It also contains the required field, assertion_method, to describe how you performed your merging.
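As an illustration of the 50% reciprocal overlap criterion mentioned above, the following Python sketch merges calls into regions. It is one possible approach, not a method prescribed by dbVar: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, all calls are assumed to share the same chromosome and variant type, and the greedy single-pass merge is a simplification.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, stop); assumed 1-based, inclusive


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 0.0
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return min(overlap / len_a, overlap / len_b)


def merge_calls(calls: List[Interval], threshold: float = 0.5) -> List[Interval]:
    """Greedily merge sorted calls into regions.

    A call is folded into the most recent region when the two share at
    least `threshold` reciprocal overlap; otherwise it starts a new region.
    """
    regions: List[Interval] = []
    for call in sorted(calls):
        if regions and reciprocal_overlap(regions[-1], call) >= threshold:
            last = regions[-1]
            regions[-1] = (min(last[0], call[0]), max(last[1], call[1]))
        else:
            regions.append(call)
    return regions
```

Whatever procedure you actually use, describe it in the assertion_method field so that users of your data can understand how the regions were derived.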

Additional information about structural variation and the data used to describe it can be found by exploring dbVar webpages and documentation contained in the (INFO) sections of the Excel template. Any remaining questions may be emailed to dbvar@ncbi.nlm.nih.gov.

Last updated: 2017-10-29T15:22:46Z