Tuesday, February 25, 2020

Assembling and annotating genomes: long reads, short reads, optical maps, DNA modifications, and more.


I finally put out my preprint on assembling the genome of Sciara (Bradysia) coprophila, a black fungus gnat. I have been working on it, on and off, for several years.

Single-molecule sequencing of long DNA molecules allows high contiguity de novo genome assembly for the fungus fly, Sciara coprophila:
https://www.biorxiv.org/content/10.1101/2020.02.24.963009v1
https://doi.org/10.1101/2020.02.24.963009

Along the way I learned a lot, and also helped a lot of others assemble genomes. For my preprint, I added excruciating detail (e.g. commands, workflows) in the Supplemental Materials on how to:
- assemble genomes with short reads (Illumina)
- assemble genomes with long reads (MinION, PacBio)
- generate consensuses or polish genomes with external programs
- assemble transcriptomes de novo or with a reference
- filter haplotigs from genomes
- filter contaminating contigs out
- identify X-linked contigs (when the X is present in only 1 copy for every 2 copies of the autosomes)
- analyze dosage compensation with RNA-seq
- analyze DNA modifications using single-molecule long reads
- evaluate genome assemblies to choose a "best" one
- scaffold with BioNano optical maps
- and more
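
To give a flavor of one item in the list above, here is a minimal sketch of the idea behind flagging X-linked contigs by read depth: when the X is present in 1 copy and the autosomes in 2, X-linked contigs should sit near half the typical depth. This is only an illustration, not the workflow from the Supplemental Materials. The function name `classify_contigs`, the 0.35 to 0.65 ratio window, and the example depths are all hypothetical, and it assumes you already have a mean depth per contig and that most contigs are autosomal.

```python
from statistics import median


def classify_contigs(mean_depth, low=0.35, high=0.65):
    """Flag contigs whose depth is roughly half the median depth.

    mean_depth: dict mapping contig name -> mean read depth.
    low, high: hypothetical bounds on depth / median depth for a
               contig to be called a candidate X-linked contig.
    """
    # Assumes most contigs are autosomal, so the median depth
    # approximates the 2-copy (autosomal) depth.
    autosomal_depth = median(mean_depth.values())
    labels = {}
    for contig, depth in mean_depth.items():
        ratio = depth / autosomal_depth
        labels[contig] = "candidate_X" if low <= ratio <= high else "other"
    return labels


# Made-up depths purely for illustration.
depths = {"ctg1": 60.0, "ctg2": 58.0, "ctg3": 31.0, "ctg4": 62.0, "ctg5": 29.5}
labels = classify_contigs(depths)
for contig, label in sorted(labels.items()):
    print(contig, label)
```

In practice you would probably want to weight by contig length and look at the whole depth distribution before picking cutoffs, since repeats and collapsed or uncollapsed haplotigs also shift depth.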

If you find it useful, please cite the preprint (or the forthcoming publication when it arrives).


Urban JM, Foulk MS, Bliss JE, Coleman CM, Lu N, Mazloom R, Brown SJ, Spradling AC, Gerbi SA. 2020. Single-molecule sequencing of long DNA molecules allows high contiguity de novo genome assembly for the fungus fly, Sciara coprophila. bioRxiv 2020.02.24.963009.


I also generated a lot of tools for working with assemblies, annotations, MinION reads, and so on. Please feel free to explore and use them any way you see fit, and if you do, please cite the preprint or forthcoming publication:

Working with MinION data:
https://github.com/JohnUrban/poreminion
https://github.com/JohnUrban/fast5tools

Tools to help evaluate genome assemblies using a battery of metrics:
https://github.com/JohnUrban/battery
https://github.com/JohnUrban/lave

Many, many general tools generated during my work with Sciara genomic datasets:
https://github.com/JohnUrban/sciara-project-tools
https://github.com/JohnUrban/fftDnaMods



Feel free to get in touch with me for direct help with your genome project(s) in exchange for an authorship position on the resulting paper(s).

I am glad to answer comments below or emails.

Best of luck to you! Happy assembling!



-------------------------------------------------------------------------------------------------------------------------
NOTE: The assembly, annotation, and associated datasets will be made available between now and when the peer-reviewed publication is available.

Check the NCBI BioProject database (http://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA123456 for:
- raw Illumina (DNA and RNA-seq)
- PacBio data
- MinION data
- BioNano data 
- BioNano CMAPs 
- PacBio kinetics and DNA modification results 

Also look within DDBJ/ENA/GenBank "Whole Genome Shotgun projects" where the Bcop_v1.0 genome assembly will be (or has been) deposited under accession: VSDI00000000 (Bcop_v1.0 = version VSDI01000000).

The automated Bcop_v1.0 annotation will be (or is) available at the i5k Workspace (i5k.nal.usda.gov) where updates via community-based manual curation will/can be made.
