The initial draft assembly yielded 5 large (>1,500 bp), non-redundant contigs with an N50 of 379,608 bp. It combined 831,945 Roche/454 reads (3 kb and 8 kb insert libraries) at 166.93× coverage, 3,514,850 normalized Illumina reads [33] at 107.95× coverage, and 10,798 corrected PacBio reads [34] at 7.81× coverage in a hybrid assembly with the Mira assembler [28]. The resulting maximal base-error rate (

crescens, and were manually corrected with the CLC Genomics Workbench (CLCbio, Katrinebjerg, Denmark). Intrascaffold gaps were closed by further passes of the Mira hybrid assembly, each combining the current scaffold with a different combination of read data. Omitting certain read technologies in these later iterations allowed more successful assemblies at different points of the genome. Pseudo 454-like paired-end reads were generated from the scaffold so that very large contigs could be used as input to further iterations of the Mira hybrid assembly. The pseudo 454-like reads conformed to the 19 kb upper limit of Mira read length and had a 34 kb insert size. Additionally, subsets of the original Illumina paired-end reads and normalized Roche/454 reads were entered into the read pool to avoid problematic reads.
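The pseudo-read step described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the `step` parameter, and the convention that the second read of each pair is reverse-complemented are all assumptions made for illustration, and the orientation Mira expects for a given 454 paired-end library may differ. The idea is simply to slide a window of one insert size along the scaffold and take a read from each end of the window.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def pseudo_paired_reads(scaffold: str, read_len: int, insert_size: int, step: int):
    """Tile a scaffold with artificial read pairs.

    insert_size is treated as the outer distance spanned by a pair
    (an assumption); step is a hypothetical tiling offset between
    consecutive pairs.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if insert_size < read_len:
        raise ValueError("insert_size must be at least read_len")

    pairs = []
    # Each window covers one full insert; stop when a window no longer fits.
    for start in range(0, len(scaffold) - insert_size + 1, step):
        fragment = scaffold[start:start + insert_size]
        forward = fragment[:read_len]
        # Assumed convention: the mate is read from the opposite strand.
        reverse = reverse_complement(fragment[-read_len:])
        pairs.append((forward, reverse))
    return pairs


if __name__ == "__main__":
    # Toy values; the text reports reads up to 19 kb with a 34 kb insert size.
    for fwd, rev in pseudo_paired_reads("ACGTACGTAA", read_len=3, insert_size=8, step=2):
        print(fwd, rev)
```

With real data the scaffold would be read from a FASTA file and the pairs written out in whatever format the assembler accepts; both steps are omitted here to keep the sketch self-contained.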

Contigs of each hybrid assembly pass were manually corrected for misjoined contigs and combined by Minimus2 [29] to yield a circular genomic sequence.

Genome annotation

Genome annotation was performed by the Rapid Annotation using Subsystem Technology (RAST) pipeline [36]. RAST employs tRNAscan-SE [37] to identify tRNA genes, Niels Larsen's “search_for_rnas” (available from the author) to identify rRNA-encoding genes, and GLIMMER [38] to identify candidate protein-encoding genes. RAST compares the set of candidate protein-encoding genes to a collection of protein families, referred to as FIGfams [36], in order to correct CDS starting positions and place the genome in a phylogenetic context.

The candidate protein set was compared to the National Center for Biotechnology Information (NCBI) non-redundant (nr) database, the SwissProt database, the European Bioinformatics Institute (EBI) phage database, and the COG subset of the NCBI Conserved Domain Database (CDD) through the NCBI BLAST suite. Additionally, predicted proteins were annotated through the Kyoto Encyclopedia of Genes and Genomes (KEGG) automatic annotation server (KAAS). KAAS employs NCBI BLAST to search the KEGG Orthology database [39].

Genome properties

The genome consists of one circular chromosome of 1,504,659 bp (35.35% GC content). A total of 1,433 genes were predicted, 1,379 of which are protein-coding.
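For readers who want to recompute summary figures of this kind from a sequence and an annotation table, the following is a minimal sketch. The function names are hypothetical, and excluding ambiguous bases such as N from the GC denominator is an assumption; the source does not say how its GC content was calculated.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) of a sequence."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def coding_fraction(protein_coding: int, total_genes: int) -> float:
    """Percent of predicted genes that are protein-coding."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 100.0 * protein_coding / total_genes


if __name__ == "__main__":
    # Gene counts taken from the text above.
    print(f"{coding_fraction(1379, 1433):.1f}% of predicted genes are protein-coding")
```

Applied to the counts reported here, 1,379 of 1,433 predicted genes corresponds to about 96.2% protein-coding.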
