Adapter Removal using Porechop

Similar to data from 2nd generation sequencing platforms, data from most 3rd generation sequencing platforms contain linker, barcode or adapter sequences at the beginning or end of the reads. Additionally, artificial sequences are sometimes included in the middle of reads, e.g., for 2D library preparations of Oxford Nanopore sequencers.

Although ONT’s new basecaller Guppy can trim adapters (if told to), it might be a good idea to check for and remove additional or overlooked adapters. Multiple tools exist for adapter trimming and filtering of 2nd generation Illumina reads, e.g. Trimmomatic. In contrast, currently only one tool exists for trimming Oxford Nanopore adapters: Porechop. Unfortunately, the development of Porechop has been discontinued and it might not work for new library preparation kits. However, for completeness I still include it in this practical, as it may be useful for the older “standard” library preparation kits. Let’s hope that the developer (Ryan Wick) will find someone to continue the development.

Although adapter removal is included in this tutorial, recent comparisons such as the one by [Wick & Holt (2019)](https://f1000research.com/articles/8-2138) indicate that it is not necessary, as adapters have no impact on current assemblers.

Create a directory porechop in the directory ~/course_data/practicals/trimming_practical and use porechop to remove adapters from the (pre-compiled) guppy reads we used in the basecalling tutorial.

mkdir ~/course_data/practicals/trimming_practical/porechop

porechop -i ~/course_data/precompiled/guppy_output/all_guppy.fastq \
-o ~/course_data/practicals/trimming_practical/porechop/porechopped.fastq --discard_middle

The “\” at the end of a line is only a convenience for writing a long command over several lines. It tells the command line that the lines still belong together although they are separated by “enter” keys. However, if you type the whole command, i.e., paths etc., on one line, don’t use the backslash.
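As a small illustration of the line continuation described here, the sketch below runs the same harmless command once on a single line and once split over several lines. The words passed to echo and the variable names are placeholders chosen for this example only.

```shell
# The command written on a single line
one_line=$(echo alpha beta gamma)

# The same command split over three lines; the backslash must be the
# very last character on the line (no trailing spaces after it)
multi_line=$(echo alpha \
    beta \
    gamma)

# Both variables now hold the same text
echo "$one_line"
echo "$multi_line"
```

If a space sneaks in after the backslash, the shell treats the next line as a new command, which is a common source of "command not found" errors when copying long commands.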

The above command will use the default values of porechop to search for adapters in the input fastq file, trim the reads and write them to the file porechopped.fastq in the created porechop directory. The “--discard_middle” option (two plain hyphens) will remove reads with internal adapters (needed for 2D libraries and downstream use of nanopolish).

How many adapters did porechop remove?
Did it discard any reads? Why not?
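Besides reading the porechop log, one way to check whether reads were discarded is to compare the number of reads before and after trimming. The sketch below assumes the usual FASTQ layout of exactly four lines per read and the file locations used in this practical; the helper name count_reads is made up for this example, so adjust the paths if you wrote the output somewhere else.

```shell
# Count reads in a FASTQ file, assuming exactly four lines per record
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# File locations as used in this practical (adjust if yours differ)
raw=~/course_data/precompiled/guppy_output/all_guppy.fastq
trimmed=~/course_data/practicals/trimming_practical/porechop/porechopped.fastq

# Report the read count for each file that exists
for f in "$raw" "$trimmed"; do
    if [ -f "$f" ]; then
        echo "$f: $(count_reads "$f") reads"
    else
        echo "$f: not found" >&2
    fi
done
```

If both counts are identical, no reads were discarded; a lower count for the trimmed file would point to reads removed by the middle-adapter filter.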

Answers

QC after trimming

Use the tool FastQC to look at your data and compare it to the un-chopped data (see the QC_practical for an introduction to FastQC).

Open FastQC by typing fastqc on the command line
Use the File->Open menu in FastQC to first open the raw (concatenated) guppy fastq file
When FastQC is done with the first file, open the chopped fastq file
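If you prefer not to use the graphical interface, FastQC can also be run non-interactively by passing the files on the command line. The following is a minimal sketch, assuming the file locations used in this practical; the output directory name fastqc_reports is a hypothetical choice for this example. The check at the start simply skips the run when fastqc or the input files are not available.

```shell
# File locations as used in this practical (adjust if yours differ)
raw=~/course_data/precompiled/guppy_output/all_guppy.fastq
trimmed=~/course_data/practicals/trimming_practical/porechop/porechopped.fastq

# Hypothetical output directory for the HTML reports
outdir=~/course_data/practicals/trimming_practical/fastqc_reports

if command -v fastqc >/dev/null 2>&1 && [ -f "$raw" ] && [ -f "$trimmed" ]; then
    mkdir -p "$outdir"
    # Writes one HTML report and one zip archive per input file
    fastqc --outdir "$outdir" "$raw" "$trimmed"
else
    echo "fastqc or the input files are not available; skipping" >&2
fi
```

The two HTML reports can then be opened side by side in a browser to compare the raw and chopped reads.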

Inspect the different graphs. Did the porechop step improve the data? If so, which parts were improved?
Are there still areas of the sequences that you’d like to improve?

Answers