Adapter Removal using Porechop
Similar to 2nd generation sequencing platforms, data from most 3rd generation sequencing platforms contain linker, barcode or adapter sequences at the beginning or end of the reads. Additionally, artificial sequences are sometimes also included in the middle of reads, e.g., for 2D library preparations of Oxford Nanopore sequencers.
Although ONT’s new basecaller Guppy can trim adapters (if told to), it might be a good idea to check for and remove additional or overlooked adapters. Multiple tools exist for adapter trimming and filtering of 2nd generation Illumina reads, e.g. Trimmomatic. In contrast, currently only one tool exists for trimming Oxford Nanopore adapters: Porechop. Unfortunately, the development of Porechop has been discontinued and it might not work for new library preparation kits. However, for completeness I still include it in this practical, as it may be useful for the older “standard” library preparation kits. Let’s hope that the developer (Ryan Wick) will find someone to continue the development.
Create a directory porechop in the directory ~/course_data/practicals/trimming_practical and use porechop to remove adapters from the (pre-compiled) guppy reads we used in the basecalling tutorial.
mkdir -p ~/course_data/practicals/trimming_practical/porechop
porechop -i ~/course_data/precompiled/guppy_output/all_guppy.fastq \
-o ~/course_data/practicals/trimming_practical/porechop/porechopped.fastq --discard_middle
The above command will use the default values of porechop to search for adapters in the input fastq file, trim the reads and write them to the file porechopped.fastq in the newly created porechop directory. The “--discard_middle” option will remove reads with internal adapters instead of splitting them at the adapter, which is the default behaviour (needed for 2D libraries and downstream use of nanopolish).
- How many adapters did porechop remove?
- Did it discard any reads? Why not?
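To double-check what porechop reports, you can count the reads and bases in the input and output files yourself. The following is only a minimal sketch: the function names count_fastq_reads and count_fastq_bases are made up for this practical, and it assumes uncompressed fastq files with exactly four lines per read (no wrapped sequence lines).

```shell
# Hypothetical helper: number of reads in a fastq file.
# Assumes exactly 4 lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: total number of bases in a fastq file.
# Sums the length of every sequence line (2nd line of each 4-line record).
count_fastq_bases() {
    awk 'NR % 4 == 2 { total += length($0) } END { print total + 0 }' "$1"
}
```

Calling both functions first on all_guppy.fastq and then on porechopped.fastq shows whether the number of reads changed (discarded reads) and how many bases were trimmed in total.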
QC after trimming
Use the tool FastQC to look at your data and compare it to the un-chopped data (see the QC_practical for an introduction to FastQC).
- Open FastQC by typing fastqc on the command line
- Use the File->Open menu in FastQC to first open the raw (concatenated) guppy fastq file
- When FastQC is done with the first file, open the chopped fastq file
- Inspect the different graphs. Did the porechop step improve the data? If so, which parts were improved?
- Are there still areas of the sequences that you’d like to improve?
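If you work on a remote machine without a graphical display, FastQC can also be run non-interactively by passing the fastq files on the command line. The sketch below wraps this in a small function; the name run_fastqc_pair and the output directory are assumptions made for illustration, not part of the practical's setup. FastQC does not create the output directory itself, so the function creates it first.

```shell
# Hypothetical helper: run FastQC without the GUI on the raw and the trimmed reads.
# $1 = raw fastq, $2 = trimmed fastq, $3 = output directory for the reports
run_fastqc_pair() {
    # FastQC expects the output directory to exist already
    mkdir -p "$3"
    # Passing files on the command line makes FastQC run non-interactively
    fastqc --outdir "$3" "$1" "$2"
}
```

For example, run_fastqc_pair followed by the paths to all_guppy.fastq, porechopped.fastq and a directory of your choice should write one HTML report per input file, which you can then open side by side in a browser.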