Genome assembly using Shasta

The third assembler we introduce is Shasta, a long-read assembler with a very fast run time that produces contigs (in contrast to miniasm, which outputs unitigs).

Change into the directory assembler_practical/shasta. In contrast to the other assemblers, Shasta expects FASTA files as input, not FASTQ. Before you can run Shasta, you therefore have to convert the Guppy FASTQ file to FASTA using the Shasta script FastqToFasta.py, and then run shasta on the resulting FASTA file.

FastqToFasta.py \
~/course_data/practicals/basecalling_practical/all_guppy.fastq \
all_guppy.fasta

shasta --input all_guppy.fasta

This will create an output directory ShastaRun containing, among other files, the assembly in FASTA format (Assembly.fasta).
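A quick sanity check on the conversion step is to confirm that the FASTA file holds as many reads as the FASTQ file it was made from. The sketch below is not part of Shasta; the function names count_fastq_reads and count_fasta_reads are made up for illustration, and the FASTQ count assumes the usual four-lines-per-read layout (no wrapped sequence or quality lines).

```shell
# Hypothetical helper: count reads in a FASTQ file.
# Assumes exactly four lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: count reads in a FASTA file by counting header lines.
count_fasta_reads() {
    grep -c '^>' "$1"
}
```

Calling count_fastq_reads on the Guppy FASTQ file and count_fasta_reads on all_guppy.fasta should print the same number; a mismatch suggests the conversion was interrupted or the FASTQ file does not follow the four-line layout.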

As before, assess the assembly using assembly-stats, dnadiff and Assemblytics.
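To make the headline numbers less of a black box, here is a minimal sketch of how the contig count, total length and N50 can be computed from a FASTA file with standard tools. It is an illustration only, not how assembly-stats is implemented; the function name fasta_n50 is hypothetical, and N50 is taken here as the length of the contig at which the running sum of lengths, sorted from longest to shortest, first reaches half of the total.

```shell
# Hypothetical helper: print contig count, total length and N50 for a FASTA file.
fasta_n50() {
    awk '
        # Header line: emit the length of the previous record and reset.
        /^>/ { if (len > 0) print len; len = 0; next }
        # Sequence line: a record may span several lines, so accumulate.
        { len += length($0) }
        END { if (len > 0) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            # Walk contigs from longest to shortest until half the total is covered.
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running >= half) { n50 = lens[i]; break }
            }
            printf "contigs=%d total=%d N50=%d\n", NR, total, n50
        }
    '
}
```

Running fasta_n50 ShastaRun/Assembly.fasta gives a one-line summary that can be compared with the assembly-stats output for the same file.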

Which of the assemblies/assemblers would you use for your project?
What are the strengths/weaknesses of the different assemblers?

Answers