Genome assembly using Flye

Another assembler that can be used for long reads, such as those from PacBio and Oxford Nanopore, is Flye. In contrast to the minimap and miniasm pipeline, Flye also produces a polished consensus sequence for the assembly, which significantly reduces the error rate (more about consensus sequences and polishing in the next practicals).

Change into the Flye directory in the assembler_practical folder and run flye on the raw basecalled reads:

flye --nano-raw \
~/course_data/precompiled/all_guppy.fastq \
--genome-size 1m --out-dir ./flye_output

As you can see, flye requires the input reads (--nano-raw), an output directory (--out-dir), and the expected size of the final assembly (--genome-size), which in this case is set to 1 megabase (1,000,000 bases). Flye writes several output files, including the assembly in FASTA format.
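
Once the run has finished, a quick check of the output directory can confirm that the main result files were written. The sketch below assumes the file names used by current Flye releases (assembly.fasta, assembly_graph.gfa, assembly_info.txt); the helper name check_flye_output is made up for this example.

```shell
# Report which of the main Flye result files exist and are non-empty.
# File names are an assumption based on current Flye releases.
check_flye_output() {
    dir="$1"
    for f in assembly.fasta assembly_graph.gfa assembly_info.txt; do
        if [ -s "$dir/$f" ]; then
            echo "found: $f"
        else
            echo "missing: $f"
        fi
    done
}

# Check the output directory used in the flye command above.
check_flye_output ./flye_output
```

If assembly.fasta is reported as missing, the run has probably not completed yet or was interrupted.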

Flye takes significantly more time and resources to run than miniasm. On an average laptop this assembly will take approximately 10 minutes. If you don’t want to wait, just stop flye (Ctrl-C) and copy the precompiled result files from the directory ~/course_data/precompiled/flye_output into this directory.


When Flye is finished, use assembly-stats to get a first overview of the finished assembly.

  1. Does the assembly differ from the miniasm assembly, e.g., with respect to total length, number of contigs, and contig lengths?
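
To see what these numbers mean, the same basic figures can be recomputed by hand. The following is a minimal sketch using awk, not a replacement for assembly-stats; the function name fasta_summary and the toy FASTA file are invented for illustration.

```shell
# Print contig count, total length and longest contig for a FASTA file.
fasta_summary() {
    awk '
        /^>/  { n++; next }               # header line starts a new contig
        n > 0 { len[n] += length($0) }    # add sequence line to current contig
        END {
            for (i = 1; i <= n; i++) {
                total += len[i]
                if (len[i] > longest) longest = len[i]
            }
            printf "contigs=%d total=%d longest=%d\n", n, total, longest
        }
    ' "$1"
}

# Toy example: two contigs of 12 and 3 bases.
toy=$(mktemp)
printf '>contig_1\nACGTACGT\nACGT\n>contig_2\nACG\n' > "$toy"
fasta_summary "$toy"    # prints: contigs=2 total=15 longest=12
rm -f "$toy"
```

Running fasta_summary on the Flye and miniasm assembly FASTA files gives two lines that can be compared directly.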


Now align the Flye assembly to the reference chromosome using dnadiff:

dnadiff -p flye_dnadiff ~/course_data/precompiled/chr17.fasta \

Open the dnadiff report file (e.g. by double-clicking on it).

  1. How many contigs aligned with the reference? What is the error rate?
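
As a rough aid for the error-rate question: if the report gives an average identity (in percent) for the 1-to-1 alignments, the error rate is approximately 100 minus that value. The sketch below only does this arithmetic; the function name identity_to_error and the input value 99.20 are hypothetical and not taken from the course data.

```shell
# Convert an average identity (percent) into an approximate error rate (percent).
identity_to_error() {
    awk -v id="$1" 'BEGIN { printf "%.2f\n", 100 - id }'
}

# Hypothetical identity value, for illustration only.
identity_to_error 99.20    # prints: 0.80
```

Replace the example value with the identity reported for your own alignment.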


Now upload the dnadiff alignment output to Assemblytics and inspect the dot plots.

  1. How many contigs align well with the reference?
  2. Is the Flye assembly more or less fragmented than the miniasm assembly? Why?
  3. Does the alignment differ from the reference, e.g., does the Flye assembly extend the start or stop of the reference? Are there inversions?


You can zoom in on parts of the dot plot by