Error Correction using Pilon
Another popular error correction tool is Pilon. As medaka, Pilon can be run after Racon to further improve assembly quality by correcting Insertion/Deletion (Indel) errors as well as Single Nucleotide Polymorphisms (SNPs).
To correct errors Pilon uses two common bioinformatic packages/tools:
- BWA (short for Burrows-Wheeler Aligner) for read mapping
- Samtools, for mapping file convertion and sorting
The Pilon workflow consists of 4 steps
- Map your reads (PacBio or Oxford Nanopore) back to your (consensus) assembly using BWA. This will create a mapping file in SAM format
- Convert your SAM file to the binary BAM format using samtools
- Sort and index the BAM file for faster access using samtools
- Run Pilon using the genome assembly and the sorted and indexed bam file
First, change into the pilon directory in your error_correction_practical folder. For convenience the miniasm assembly as well as the guppy-basecalled reads are already in that directory.
To map your reads to the assembly using BWA you will first have to index the assembly file (for faster access for BWA) and then run bwa mem for read mapping
# index the genome Bwa index miniasm.racon_consensus.fasta # map reads bwa mem –t 2 miniasm.racon_consensus.fasta –x ont2d \ all_guppy.fastq > bwa_mapping.sam
The above statement will run bwa-mem with 2 processors (-t 2) using oxford nanopore reads (-x ont2d) and redirect the output into the output file bwa_mapping.sam
After mapping the reads we have to do some true bioinformatics: file conversion! First, convert the sam file to bam using “samtools view” (-bS means the input is sam and output is bam), then sort it with “samtools sort” and finally index the bam file using “samtools index”
samtools view -Sb bwa_mapping.sam > bwa_mapping.bam samtools sort –o bwa_mapping.sorted.bam bwa_mapping.bam samtools index bwa_mapping.sorted.bam
Now run Pilon using the assembly and sorted bam file
pilon --genome miniasm.consensus_sequence.fasta \ --bam bwa_mapping.sorted.bam --threads 2
The above command will run pilon with 2 threads and use the mapped read information for polishing of the miniasm assembly.
In the pilon directory you will find the corrected assembly in file pilon.fasta. As before, use dnadiff and check the assembly report.