Quality Control using PycoQC

PycoQC is a data visualisation and quality control tool for nanopore data. In contrast to FastQC, it needs a specific sequencing_summary.txt file generated by Oxford Nanopore basecallers such as Guppy or the older Albacore basecaller.
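
To get a feeling for what such a summary file contains, you can list its column names. The sketch below assumes the file is tab-separated with a single header row; the function name summary_columns is made up for this example and is not part of PycoQC.

```shell
# List the column names of a sequencing summary file, one per line.
# Assumption: the file is tab-separated and its first line is the header.
# Usage: summary_columns summaries/run_1/sequencing_summary.txt
summary_columns() {
    head -n 1 "$1" | tr '\t' '\n'
}
```

The exact set of columns depends on the basecaller and its version, so treat the output as a description of your file rather than a fixed specification.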

In the directory qc_practical/summaries you will find three run directories, each containing a sequencing summary file. From within the qc_practical directory, call PycoQC on one of the summary files, e.g., run_1:

pycoQC -f summaries/run_1/sequencing_summary.txt -o run_1.html
Do not copy-paste the commands shown on the tutorial pages! Depending on the font used, some characters, e.g., hyphens, are replaced by look-alike characters, which causes seemingly random errors in the commands. Even if it seems silly, please type the commands yourself.
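
If a command fails and you suspect a look-alike character, you can test the text for anything outside printable ASCII. This is only a sketch: has_non_ascii is a hypothetical helper name, and it relies on grep treating every byte as a character in the C locale.

```shell
# Succeed (exit status 0) if the given text contains any byte outside
# printable ASCII, e.g. a typographic dash pasted from a web page.
# Note: tab characters are reported as well.
has_non_ascii() {
    printf '%s\n' "$1" | LC_ALL=C grep -q '[^ -~]'
}
```

For example, has_non_ascii "$cmd" && echo "suspicious character found" prints the message when the variable cmd holds a command with a typographic dash, and prints nothing for a command typed with plain hyphens.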

The above command will create the file run_1.html with multiple plots and summary statistics. This file can be opened in your web browser, e.g., via the “Open File” menu button in Firefox.

Inspect the different plots and statistics:

  1. How many reads do you have in total?
  2. What are the median, minimum and maximum read lengths, and what is the N50?
  3. What do the mean quality and the quality distribution of the run look like? Remember: Q10 corresponds to a 10% error rate.
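
The two quantities behind these questions can be sketched on the command line. A PHRED score Q corresponds to an error probability of 10^(-Q/10), and the N50 is the read length at which, counting from the longest read downwards, half of all sequenced bases have been covered. The function names below are made up for this example, and the column name sequence_length_template is an assumption about the summary file; check the header of your own file before relying on it.

```shell
# Convert a PHRED quality score Q to the expected error rate in percent,
# using P(error) = 10^(-Q/10).
phred_to_error_pct() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 100 * 10 ^ (-q / 10) }'
}

# Print read count, minimum, median, maximum and N50 of the read lengths.
# Assumptions: tab-separated file with a header row and a column named
# sequence_length_template.
read_length_stats() {
    awk -F '\t' '
        # Header line: find the index of the read length column.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "sequence_length_template") col = i; next }
        # Data lines: print the read length if the column was found.
        col     { print $col }
    ' "$1" | sort -n | awk '
        { len[NR] = $1; total += $1 }
        END {
            if (NR == 0) exit 1
            # Median: middle value, or mean of the two middle values.
            median = (NR % 2) ? len[(NR + 1) / 2] : (len[NR / 2] + len[NR / 2 + 1]) / 2
            # N50: walk from the longest read down until the running sum
            # reaches half of all bases.
            for (i = NR; i >= 1; i--) {
                running += len[i]
                if (running >= total / 2) { n50 = len[i]; break }
            }
            printf "reads=%d min=%d median=%g max=%d n50=%d\n", NR, len[1], median, len[NR], n50
        }'
}
```

Calling phred_to_error_pct 20 prints 1, i.e. a 1% error rate. The values from read_length_stats summaries/run_1/sequencing_summary.txt are meant as a cross-check and may not match the PycoQC report exactly, depending on which reads PycoQC includes in its statistics.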

Answers

In addition to read statistics, PycoQC also gives a lot of information about the sequencing run and the flowcell itself, such as the yield over time and the number of active pores.

  1. Have a look at the “Basecalled Reads PHRED Quality” and “Read length vs PHRED quality” plots. Is there a link between read length and PHRED score?
  2. Have a look at the “Read Length over Experiment time” plot. Did the read length change over time? What could the reason be?
  3. Given the number of active pores, yield over time, and channel activity over time, do you think this was a successful sequencing run? Why/why not?
  4. Inspect the “output over experiment time” graph. Can you explain the pattern of the curve? Would you have stopped the run earlier? Think about how the MinION works, especially with regard to the adjustment of the applied currents.
  5. If you want to, you can generate the PycoQC report for run_3/sequencing_summary.txt and compare it to the one for run_1. What are the differences?
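
If you would like reports for all runs at once, a small loop saves some typing. This is a sketch, assuming each run directory holds a file named sequencing_summary.txt as in this practical; the function name pycoqc_all_runs is made up for this example, and only the -f and -o options from the command above are used.

```shell
# Run pycoQC once per run directory below the given folder, writing
# <run_name>.html into the current directory.
# Usage: pycoqc_all_runs summaries
pycoqc_all_runs() {
    for summary in "$1"/*/sequencing_summary.txt; do
        # Skip the unexpanded pattern if no summary file was found.
        [ -e "$summary" ] || continue
        # Name the report after the run directory, e.g. run_1.html.
        run_name=$(basename "$(dirname "$summary")")
        pycoQC -f "$summary" -o "${run_name}.html"
    done
}
```

Run from within the qc_practical directory, pycoqc_all_runs summaries creates one HTML report per run, which you can then open side by side in your browser.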

Answers

One of the strengths of PycoQC is that its reports are interactive and highly customizable: plots can be cropped, you can zoom in and out, sub-select areas and export figures. For detailed usage and examples, see the [PycoQC documentation](https://a-slide.github.io/pycoQC/).