Semi-automatic in silico gap closure enabled de novo assembly of two Dehalobacter genomes from metagenomic data

PLoS One. 2012;7(12):e52038. doi: 10.1371/journal.pone.0052038. Epub 2012 Dec 21.

Abstract

Typically, the assembly and closure of a complete bacterial genome requires substantial additional effort spent in a wet lab for gap resolution and genome polishing. Assembly is further confounded by subspecies polymorphism when starting from metagenome sequence data. In this paper, we describe an in silico gap-resolution strategy that can substantially improve assembly. This strategy resolves assembly gaps in scaffolds using pre-assembled contigs, followed by verification with read mapping. It is capable of resolving assembly gaps caused by repetitive elements and subspecies polymorphisms. Using this strategy, we realized the de novo assembly of the first two Dehalobacter genomes from the metagenomes of two anaerobic mixed microbial cultures capable of reductive dechlorination of chlorinated ethanes and chloroform. Only four additional PCR reactions were required even though the initial assembly with Newbler v. 2.5 produced 101 contigs within 9 scaffolds belonging to two Dehalobacter strains. By applying this strategy to the re-assembly of a recently published genome of Bacteroides, we demonstrate its potential utility for other sequencing projects, both metagenomic and genomic.
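The two steps named in the abstract, filling a scaffold gap from a pre-assembled contig and then checking the join against reads, can be illustrated with a small sketch. This is not the authors' pipeline: the function names `close_gap` and `junction_support`, the exact-match anchoring, and the `anchor_len` and `min_flank` parameters are all assumptions made for illustration. A real workflow would use an aligner and tolerate mismatches.

```python
from typing import Iterable, Optional


def close_gap(left_flank: str, right_flank: str,
              candidates: Iterable[str], anchor_len: int = 8) -> Optional[str]:
    """Return the sequence that fills the gap between two flanking contigs.

    Hypothetical helper: a candidate contig qualifies if it contains the
    last `anchor_len` bases of the left flank and, downstream of that,
    the first `anchor_len` bases of the right flank (exact matches only).
    """
    left_anchor = left_flank[-anchor_len:]
    right_anchor = right_flank[:anchor_len]
    fills = set()
    for contig in candidates:
        start = contig.find(left_anchor)
        if start == -1:
            continue
        fill_start = start + anchor_len
        end = contig.find(right_anchor, fill_start)
        if end == -1:
            continue
        fills.add(contig[fill_start:end])
    # Accept only an unambiguous fill; conflicting fills (for example from
    # polymorphic variants) return None and are left for manual review.
    return fills.pop() if len(fills) == 1 else None


def junction_support(joined: str, junction: int,
                     reads: Iterable[str], min_flank: int = 3) -> int:
    """Count reads that match `joined` exactly and cross position `junction`
    with at least `min_flank` bases on each side (assumed verification rule)."""
    support = 0
    for read in reads:
        pos = joined.find(read)  # first exact occurrence only
        if pos == -1:
            continue
        if pos + min_flank <= junction <= pos + len(read) - min_flank:
            support += 1
    return support
```

In this sketch a gap is closed only when every anchoring contig agrees on the fill, and the join is then accepted only if enough reads cross each junction. How many reads count as enough is left to the caller.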

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Alleles
  • Computational Biology*
  • Computer Simulation
  • Contig Mapping*
  • Genetic Variation
  • Genome, Bacterial*
  • Metagenomics*
  • Peptococcaceae / genetics*
  • RNA, Ribosomal, 16S
  • Reproducibility of Results

Substances

  • RNA, Ribosomal, 16S

Grants and funding

Metagenome sequencing of the ACT-3 culture was provided by the U.S. Department of Energy Joint Genome Institute through the Community Sequencing Program (CSP 2010). Support was provided by the Government of Canada through Genome Canada and the Ontario Genomics Institute (2009-OGI-ABC-1405). Support was also provided by the Government of Ontario through the ORF-GL2 program and the United States Department of Defense through the Strategic Environmental Research and Development Program (SERDP) under contract W912HQ-07-C-0036 (project ER-1586). S.T. received awards from the Government of Ontario through the Ontario Graduate Scholarships in Science and Technology (OGSST) and the Natural Sciences and Engineering Research Council of Canada (NSERC PGS B). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.