TBL2ASN

What is tbl2asn?

Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is generally driven by data files rather than by interactive editing. Tbl2asn generates .sqn files for submission to GenBank, and these files do not require additional manual editing before submission.

Tbl2asn is available by anonymous FTP. Download the version for your platform, uncompress the file, rename it to "tbl2asn", and set the file permissions as required by your platform.

NEW: Several command-line arguments were changed in version 10.0 of tbl2asn to make it more flexible and expandable. Please read the updated list of arguments and the example command lines below.

Six types of input data files
  1. Template file containing a text ASN.1 Submit-block object (suffix .sbt).
  2. Nucleotide sequence data in FASTA format (suffix .fsa).
  3. Feature Table (suffix .tbl).
  4. Protein sequence (suffix .pep). (These sequences replace the conceptual translations generated by tbl2asn and serve as a check that the CDS intervals are correct.)
  5. Source Table (suffix .src).
  6. Quality Scores (suffix .qvl).
Creating the template file (.sbt)
  • Start a new submission in Sequin.
  • Enter the manuscript title, if desired.
  • Enter the contact, author, and affiliation information.
  • Return to the submission tab and use File->Export Submitter Info.
  • Save the file as template.sbt.
Generating the .sqn file for submission

  • The minimum requirements to generate a .sqn file with tbl2asn are one .sbt file and one or more .fsa files.
  • The files are placed in a source directory, and command-line arguments tell tbl2asn how to generate the .sqn files.
  • Tbl2asn generates a .sqn file for every .fsa file in the directory, incorporating any corresponding optional files that are present. Each optional file must have the same file name prefix as its corresponding .fsa file (for example, helicase.fsa and helicase.tbl).
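Because the optional files are matched to their .fsa file by prefix, a misnamed file is silently left out. The sketch below is a hypothetical pre-flight helper (companion_files is not part of tbl2asn) that lists, for each .fsa file in a directory, which same-prefix optional files are present. It illustrates the naming rule only and does not reproduce how tbl2asn itself scans the directory.

```python
import tempfile
from pathlib import Path

# Optional companion suffixes described in this document.
OPTIONAL_SUFFIXES = (".tbl", ".pep", ".src", ".qvl")


def companion_files(source_dir):
    """Return {prefix: [suffixes]} for every .fsa file in source_dir,
    listing which optional files with the same prefix are present."""
    source = Path(source_dir)
    result = {}
    for fsa in sorted(source.glob("*.fsa")):
        prefix = fsa.stem  # "helicase" for helicase.fsa
        result[prefix] = [s for s in OPTIONAL_SUFFIXES
                          if (source / (prefix + s)).exists()]
    return result


if __name__ == "__main__":
    # Demo on a throwaway directory; the file names are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("helicase.fsa", "helicase.tbl", "kinase.fsa", "orphan.tbl"):
            Path(tmp, name).write_text("")
        print(companion_files(tmp))
```

In the demo, orphan.tbl is not reported because there is no orphan.fsa for it to pair with.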

  • Command Line Arguments

    Typing "tbl2asn -" displays the full list of command-line arguments. Here is a partial list of commonly used arguments:

    -p  Path to the directory containing the input files. If the files are in the current directory, use -p .
    -r  Path for the resulting .sqn file(s). If -r is not used, the .sqn files are saved in the source directory.
    -t  Specifies the template file (.sbt). If the .sbt file is in a different directory, the full path must be specified.
    -i  Creates a single submission from the indicated .fsa file in a directory of multiple .fsa files.
    -a  Specifies the file type:
      s : FASTA Set (s Batch, s1 Pop, s2 Phy, s3 Mut, s4 Eco)
      l : FASTA+Gap Alignment
      z : FASTA with Gap Lines
      e : PHRAP/ACE
      d : FASTA Delta, di FASTA Delta with Implicit Gaps
      a : Any (default)
    Sample command line: -a s
    -s  Instructs tbl2asn to read multiple FASTA records in one file as a set of unrelated sequences. Equivalent to "-a s". This creates a single file containing multiple submissions. (1000 sequences per file is the usual maximum.)
    -j  Adds source qualifiers that will be the same for each submission. Example: -j "[organism=Saccharomyces cerevisiae] [strain=S288C]".
    -V  Verification (combine any of the following letters):
      v : Validates the data records. The output is saved to files with a .val suffix.
      b : Generates GenBank flatfiles with a .gbf suffix.
      r : Validates without the country check.
    Sample command line: -V vb
    -k  CDS flags (combine any of the following letters):
      c : Instructs tbl2asn to annotate the longest open reading frame (ORF) if a .tbl file is not provided. The product name will be "unknown" unless a product name is included in the FASTA definition line, [product=xyz].
      m : Allows alternative start codons to be used in ORF searches.
      r : Allows run-on ORFs.
    Sample command line: -k c
    -y  Adds a COMMENT to each submission. Example: -y "Contigs larger than 2kb have been annotated, representing approx. 87% of the total genome".
    -Y  Like -y, but reads the COMMENT text from a file.
    -Z  Runs the Discrepancy Report. An output file name must be supplied. Recommended only for annotated genome submissions, complete or WGS. See the Discrepancy Report page for information about its output.
    -o  Creates a single submission from multiple FASTA files.
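Since about 1000 sequences per file is the usual maximum for a FASTA set, a large batch may need to be divided before it is run. The sketch below is a hypothetical helper (split_fasta is not a tbl2asn feature) that splits multi-record FASTA text into chunks of at most max_records records, treating each line that begins with '>' as the start of a new record. The 1000 default simply mirrors the figure quoted above.

```python
def split_fasta(text, max_records=1000):
    """Split FASTA text into chunks of at most max_records records.
    A record starts at any line beginning with '>'."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])      # start a new record
        elif records:
            records[-1].append(line)    # sequence line of the current record
        # lines before the first '>' are ignored
    chunks = []
    for start in range(0, len(records), max_records):
        group = records[start:start + max_records]
        chunks.append("\n".join("\n".join(r) for r in group) + "\n")
    return chunks
```

If the set also has a .tbl, .src, .pep, or .qvl file, those entries would have to be divided the same way so that every chunk keeps matching file name prefixes.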

    Example Command Lines:

  • Single submission: one sequence per .fsa file
      tbl2asn -t template.sbt -p path_to_files -V v
  • Batch submission: multiple sequences per .fsa file
      tbl2asn -t template.sbt -p path_to_files -a s -V v
  • Single submission: one .fsa file in directory of multiple .fsa files
      tbl2asn -t template.sbt -i x.fsa -V v
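For scripted pipelines, the example command lines can be assembled from the documented options. The sketch below uses a hypothetical function, tbl2asn_command, that builds the argument list using only the -t, -p, -i, -r, -a, -j, and -V arguments described above. It assumes tbl2asn is on your PATH, and it prefers -i over -p when both are given because the -i example above does not use -p.

```python
import shlex


def tbl2asn_command(template, source_dir=None, single_fsa=None,
                    file_type=None, verification="v",
                    source_quals=None, result_dir=None):
    """Build a tbl2asn argument list from the arguments documented above."""
    args = ["tbl2asn", "-t", template]
    if single_fsa is not None:
        args += ["-i", single_fsa]      # one .fsa file from a directory of many
    elif source_dir is not None:
        args += ["-p", source_dir]      # directory containing the input files
    if result_dir is not None:
        args += ["-r", result_dir]      # where the .sqn files are written
    if file_type is not None:
        args += ["-a", file_type]       # e.g. "s" for a FASTA set
    if source_quals:
        # {"organism": "Saccharomyces cerevisiae"} -> "[organism=Saccharomyces cerevisiae]"
        args += ["-j", " ".join(f"[{k}={v}]" for k, v in source_quals.items())]
    if verification:
        args += ["-V", verification]    # e.g. "v" or "vb"
    return args


if __name__ == "__main__":
    cmd = tbl2asn_command("template.sbt", source_dir="path_to_files", file_type="s")
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would execute it if tbl2asn is installed.
```

Passing the arguments as a list, rather than as one string, keeps the spaces inside the -j value from being split by a shell.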

    Before submitting your .sqn files to GenBank, review the .val files and correct any ERROR-level messages. Taxonomy-related errors about missing lineages can generally be ignored. However, if the sequences are annotated and do not use the standard genetic code, specify the correct code in the .fsa definition line (for example, [gcode=4]) or with the -j argument on the command line to avoid errors.
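When many .val files are produced, scanning them by hand is tedious. The sketch below assumes that each validator message line in a .val file begins with its severity level followed by a colon (for example "ERROR:" or "WARNING:"). That line format is an assumption, not something stated on this page, so check it against your own .val output. The functions error_lines and report are hypothetical helpers.

```python
from pathlib import Path


def error_lines(val_text, levels=("ERROR",)):
    """Return the lines whose severity prefix is one of `levels`.
    ASSUMPTION: message lines start with e.g. "ERROR:" or "WARNING:"."""
    prefixes = tuple(level + ":" for level in levels)
    return [line for line in val_text.splitlines() if line.startswith(prefixes)]


def report(source_dir, levels=("ERROR",)):
    """Print matching lines from every .val file in source_dir."""
    for val in sorted(Path(source_dir).glob("*.val")):
        for line in error_lines(val.read_text(), levels):
            print(f"{val.name}: {line}")
```

Messages about missing taxonomic lineages will still be listed; as noted above, those can generally be ignored.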

Nucleotide sequence and FASTA defline formats (.fsa)

  • There is no size limit on the nucleotide sequence.
  • Each FASTA record should consist of a single definition line beginning with '>', followed by the sequence.
  • Minimum requirements for the FASTA defline are:
    • SeqID (sequence identifier), which is the text between the '>' and the first space.
    • Organism and related information (unless organism information is supplied with -j on the command line or in a .src file).
  • Optional defline information that may be included:
      Biological
      • strain [strain=S288C]
      • isolate [isolate=CWS1]
      • chromosome [chromosome=XVI]
      Other elements
      • topology [topology=circular]
      • location [location=mitochondrion]
      • molecule [moltype=mRNA] (DNA is the default)
      • technique [tech=wgs]
      • protein name [protein=helicase] (if using -k c)
      • genetic code [gcode=4]
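The defline rules above (SeqID up to the first space, then bracketed [key=value] modifiers) can be checked mechanically. The sketch below is a hypothetical parser, parse_defline, that returns the SeqID and a dictionary of modifiers. It is an illustration of the format described here, not tbl2asn's own parser, and it does not check whether a modifier name is valid.

```python
import re

# One [key=value] modifier; the value may contain spaces.
MODIFIER = re.compile(r"\[\s*([^=\]]+?)\s*=\s*([^\]]*?)\s*\]")


def parse_defline(line):
    """Split a FASTA defline into its SeqID and a dict of [key=value] modifiers."""
    if not line.startswith(">"):
        raise ValueError("a FASTA definition line must begin with '>'")
    body = line[1:].strip()
    seq_id, _, rest = body.partition(" ")   # SeqID ends at the first space
    modifiers = dict(MODIFIER.findall(rest))
    return seq_id, modifiers


if __name__ == "__main__":
    seq_id, mods = parse_defline(">Sc_16 [organism=Saccharomyces cerevisiae] [strain=S288C]")
    print(seq_id, mods)
    # A missing "organism" key is acceptable only if -j or a .src file supplies it.
    print("organism" in mods)
```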

    For a complete list of source modifiers, see the NCBI source modifier documentation. See the Taxonomy Browser for the genetic code values.

    Example FASTA:

    >Sc_16 [organism=Saccharomyces cerevisiae]
    tataggcgaatcgagtatattattttttctcaacatatgtat
    atgaacatgagaatatatttataggaatgtataaaattgtga
    cctctcctgctattttagttactgattttatgtatgtagggg
    gaataggggctgcctttcttaatgcagttttaattttttctt
    ttaattttttcttagtaaaattatttaaagtaaagattaatg
    gaataaccattgcgcttttttttacagtttttggtttttcat
    tttttggaaaaaatattttaaatattttacctttttatttag
    ggggtattttatatagtatctatacttcaacagatttttctg
    aacatatagttcctattgctttttcaagtgcattagcccctt
    ttgtaagcagtgttgctttttatggagaaatatcctatgaaa
    catcatatataaatgcaattttaattggtattttaattggtt
    ttatagtggttcctttgtctaaaagtctttatgactttcatg
    agggatatgatttatataatttaggttttacagcaggtt
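A record like the example above can also be generated programmatically. The sketch below uses a hypothetical function, fasta_record, that writes one record: a '>' defline with the SeqID and [key=value] modifiers, followed by the sequence wrapped at a fixed width. The default width of 60 is an arbitrary choice for readability. This page does not state a required line length, and the example above uses 42.

```python
def fasta_record(seq_id, sequence, modifiers=None, width=60):
    """Format one FASTA record: a '>' defline with [key=value] modifiers,
    then the sequence split into lines of at most `width` characters."""
    mods = " ".join(f"[{k}={v}]" for k, v in (modifiers or {}).items())
    defline = f">{seq_id} {mods}".rstrip()   # no trailing space if no modifiers
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([defline, *lines]) + "\n"


if __name__ == "__main__":
    print(fasta_record("Sc_16", "tataggcgaatcgagtatattattttttctcaacatatgtat",
                       {"organism": "Saccharomyces cerevisiae"}, width=21), end="")
```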
    

Feature table format (.tbl)

    Tbl2asn reads features from a five-column, tab-delimited table called a feature table. The feature table specifies the location and type of each feature. Tbl2asn processes the feature intervals and translates any CDS features into proteins. The first line of the table should contain the following information:

    >Feature SeqID table_name
    

    The SeqID must match the nucleotide sequence SeqID in the corresponding .fsa file.

    Example Feature Table:

    >Feature Sc_16 Table1
    69      543     gene
                            gene        sde3p
    69      543     CDS
                            product     SDE3P
                            protein_id  WS1030
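In the file itself, the columns are separated by tab characters. Feature lines fill columns 1 to 3 (start, stop, feature key), and qualifier lines leave those columns empty and fill columns 4 and 5 (qualifier name, value). The sketch below is a hypothetical writer, feature_table, that produces the example table above with real tabs. It covers only simple single-interval features like the ones shown.

```python
def feature_table(seq_id, table_name, features):
    """Format a five-column, tab-delimited feature table.

    features: list of (start, stop, key, [(qualifier, value), ...]).
    """
    lines = [f">Feature {seq_id} {table_name}"]
    for start, stop, key, qualifiers in features:
        lines.append(f"{start}\t{stop}\t{key}")          # columns 1-3
        for qual, value in qualifiers:
            lines.append(f"\t\t\t{qual}\t{value}")        # columns 4-5
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(feature_table("Sc_16", "Table1", [
        (69, 543, "gene", [("gene", "sde3p")]),
        (69, 543, "CDS", [("product", "SDE3P"), ("protein_id", "WS1030")]),
    ]), end="")
```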
                            

Protein sequence format (.pep)

    • Formatted as a FASTA file containing the protein sequences.
    • The provided protein sequences replace the automatically translated products of the CDS features.
    • Serves as a check that the conceptual translation of the nucleotide sequence is as predicted.
    • The SeqID must match the protein_id qualifier in the .tbl file.

    Example FASTA:

    >WS1030 [gene=sde3p] [protein=SDE3P]
    MYKIVTSPAILVTDFMYVGGIGAAFLNAVLIFSFNFFL
    VKLFKVKINGITIAAFFTVFGFSFFGKNILNILPFYLG
    GILYSIYTSTDFSEHIVPIAFSSALAPFVSSVAFYGEI
    SYETSYINAILIGILIGFIVVPLSKSLYDFHEGYDLYN
    LGFTAG
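Because each .pep SeqID must match a protein_id in the .tbl file, a quick cross-check can catch typos before tbl2asn runs. The sketch below uses hypothetical helper names. It reads protein_id values from qualifier lines (column 4 equal to protein_id) and compares them with the .pep SeqIDs. It assumes that the two identifiers are written identically in both files and uses exact string comparison.

```python
def protein_ids(tbl_text):
    """Collect protein_id qualifier values from a tab-delimited feature table."""
    ids = set()
    for line in tbl_text.splitlines():
        cols = line.split("\t")
        # Qualifier lines: empty columns 1-3, name in column 4, value in column 5.
        if len(cols) >= 5 and cols[3] == "protein_id":
            ids.add(cols[4].strip())
    return ids


def pep_ids(pep_text):
    """Collect SeqIDs (text between '>' and the first space) from a .pep file."""
    return {line[1:].split()[0] for line in pep_text.splitlines()
            if line.startswith(">") and line[1:].split()}


def unmatched(tbl_text, pep_text):
    """Return (.pep SeqIDs lacking a protein_id, protein_ids lacking a .pep entry)."""
    tbl, pep = protein_ids(tbl_text), pep_ids(pep_text)
    return pep - tbl, tbl - pep
```

Two empty sets mean every protein sequence has a matching CDS and vice versa.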
    

Source table format (.src)

    For sets of sequences, a source modifier table can be placed in a tab-delimited file with a .src extension. The first column must contain each sequence's SeqID. The first row gives the names of the source qualifiers being added, separated by tabs. Each additional row lists the SeqID and the source qualifier values for one sequence in the corresponding .fsa file.

    SeqID     organism     strain     isolate
    Sc_16     Zea mays     A69Y       JH90.6-2x12
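A .src table maps naturally onto a dictionary keyed by SeqID. The sketch below, read_src, is a hypothetical reader built on Python's standard csv module with a tab delimiter. Quoting is turned off so that quote characters in values are kept literally, which is a choice made for this sketch rather than a documented .src rule.

```python
import csv
import io


def read_src(src_text):
    """Parse a tab-delimited .src table into {SeqID: {qualifier: value}}.
    The header row names the qualifiers; column 1 holds the SeqID."""
    rows = list(csv.reader(io.StringIO(src_text), delimiter="\t",
                           quoting=csv.QUOTE_NONE))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    table = {}
    for row in body:
        if not row:
            continue  # skip blank lines
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table
```

Comparing the keys of the result with the SeqIDs in the matching .fsa file is a simple way to spot rows that will not be applied.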
    

Quality scores table format (.qvl)

  • Provides Phrap/Consed quality scores.
  • Has a defline with the corresponding SeqID from the .fsa file.
  • Generates Seq-graph data that will be included with the nucleotide sequence of the .fsa file in the final .sqn file.
  • The quality scores appear below the sequence in the .sqn file, and are shown in the Quality format option when the .sqn file is viewed in Sequin.

    >Sc_16
    51 63 70 82 82 82 90 90 90 90 86 86  
    86 86 86 86 90 90 90 90 90 86 86 78...
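The sketch below shows one way to write a .qvl record from a list of integer scores: a '>' defline carrying the SeqID, then whitespace-separated scores. The function names are hypothetical, and the 12 scores per line simply mirror the example above. The length check assumes one Phrap quality score per base, which is the usual Phrap convention but is not stated on this page.

```python
def qvl_record(seq_id, scores, per_line=12):
    """Format quality scores as a .qvl record: '>' + SeqID, then
    space-separated integer scores, `per_line` scores per line."""
    lines = [f">{seq_id}"]
    for i in range(0, len(scores), per_line):
        lines.append(" ".join(str(s) for s in scores[i:i + per_line]))
    return "\n".join(lines) + "\n"


def check_length(scores, sequence):
    """ASSUMPTION: one score per base, so the counts should be equal."""
    return len(scores) == len(sequence)
```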

Tbl2asn Update Notification

    To receive email notification about tbl2asn updates, along with a description of what each update includes, follow the subscription directions provided by NCBI.


    Revised February 12, 2008