I finally put out my preprint on assembling the genome of Sciara (Bradysia) coprophila, a black fungus gnat. I have been working on it on and off for several years.
Single-molecule sequencing of long DNA molecules allows high contiguity de novo genome assembly for the fungus fly, Sciara coprophila:
https://www.biorxiv.org/content/10.1101/2020.02.24.963009v1
https://doi.org/10.1101/2020.02.24.963009
Along the way I learned a lot, and also helped a lot of others assemble genomes. For my preprint, I added excruciating detail (e.g. commands, workflows) in the Supplemental Materials on how to:
- assemble genomes with short reads (Illumina)
- assemble genomes with long reads (MinION, PacBio)
- generate consensuses or polish genomes with external programs
- assemble transcriptomes de novo or with a reference
- filter haplotigs from genomes
- filter contaminating contigs out
- identify X-linked contigs (when the sample carries only 1 copy of the X per 2 copies of each autosome)
- analyze dosage compensation with RNA-seq
- analyze DNA modifications using single-molecule long reads
- evaluate genome assemblies to choose a "best" one
- scaffold with BioNano optical maps
- and more
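As a rough illustration of the X-linked contig idea in the list above: when a sample has one X per two copies of each autosome, X-linked contigs should show roughly half the autosomal read depth. The sketch below is not the workflow from the Supplemental Materials. The function name, the thresholds, and the example depths are all hypothetical, and it assumes you already have a mean read depth per contig and that most contigs are autosomal.

```python
from statistics import median


def classify_contigs(mean_depth, x_ratio=0.5, tolerance=0.15):
    """Label contigs as 'X-linked', 'autosomal', or 'ambiguous' by read depth.

    mean_depth: dict of contig name -> mean read depth, from a sample with
    one X copy per two autosome copies. x_ratio and tolerance are
    illustrative defaults, not values taken from the preprint.
    """
    # Assumption: most contigs are autosomal, so the median depth
    # approximates the autosomal depth.
    autosomal_depth = median(mean_depth.values())
    labels = {}
    for contig, depth in mean_depth.items():
        ratio = depth / autosomal_depth
        if abs(ratio - x_ratio) <= tolerance:
            labels[contig] = "X-linked"
        elif abs(ratio - 1.0) <= tolerance:
            labels[contig] = "autosomal"
        else:
            # Very low or very high depth: repeats, contaminants, collapsed regions, etc.
            labels[contig] = "ambiguous"
    return labels


# Made-up depths for illustration only
depths = {"ctg1": 60.0, "ctg2": 58.0, "ctg3": 31.0, "ctg4": 62.0, "ctg5": 5.0}
labels = classify_contigs(depths)
for contig, label in labels.items():
    print(contig, label)
```

In practice you would want to check ambiguous contigs by other means and tune the tolerance to the depth distribution of your own data.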
If you find it useful, please cite the preprint (or the forthcoming publication when it arrives).
Urban JM, Foulk MS, Bliss JE, Coleman CM, Lu N,
Mazloom R, Brown SJ, Spradling AC, Gerbi SA. 2020. Single-molecule sequencing
of long DNA molecules allows high contiguity de novo genome assembly for the
fungus fly, Sciara coprophila. bioRxiv 2020.02.24.963009.
I also generated a lot of tools for working with assemblies, annotations, MinION reads, and so on. Please feel free to explore and use them any way you see fit, and if you do, please cite the preprint or forthcoming publication:
Working with MinION data:
- https://github.com/JohnUrban/poreminion
- https://github.com/JohnUrban/fast5tools
Tools to help evaluate genome assemblies using a battery of metrics:
- https://github.com/JohnUrban/battery
- https://github.com/JohnUrban/lave
Many, many general tools generated during my work with Sciara genomic datasets:
- https://github.com/JohnUrban/sciara-project-tools
- https://github.com/JohnUrban/fftDnaMods
Feel free to get in touch with me for direct help with your genome project(s) in exchange for an authorship position on the resulting paper(s).
I am glad to answer comments below or emails.
Best of luck to you! Happy assembling!
-------------------------------------------------------------------------------------------------------------------------
NOTE: The assembly, annotation, and associated datasets will be made available between now and the release of the peer-reviewed publication.
Check NCBI BioProject database (http://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA123456 for:
- raw Illumina (DNA and RNA-seq)
- PacBio data
- MinION data
- BioNano data
- BioNano CMAPs
- PacBio kinetics and DNA modification results
Also look within DDBJ/ENA/GenBank "Whole Genome Shotgun projects" where the Bcop_v1.0 genome assembly will be (or has been) deposited under accession: VSDI00000000 (Bcop_v1.0 = version VSDI01000000).
The automated Bcop_v1.0 annotation will be (or is) available at the i5k Workspace (i5k.nal.usda.gov), where updates can be made through community-based manual curation.