These tutorials aim to give people without bioinformatics experience an entry-level introduction to sequence analysis, specifically the analysis of whole-genome sequencing (WGS) long-read data, with a focus on Oxford Nanopore data.
As an introduction, the tutorials do not cover advanced tool use or workflows; they are intended to give people a starting point and a basic understanding of general sequence analysis workflows. This includes the general steps, and it also aims to raise awareness of potential problems and pitfalls.
The tutorials are organised in the order in which they would typically be executed in a common whole-genome sequencing project. However, they don’t have to be done in that order: each tutorial stands alone and can be completed without having done the previous ones.
Although the focus of these tutorials is on the analysis of long-read Oxford Nanopore data, most steps and tools also apply to other long-read technologies such as PacBio, and even to Illumina short reads. All steps and/or tools are marked according to the technologies they can be used for:
- for Oxford Nanopore-specific tools/steps
- for general long-read technologies
- for steps/tools that can be used for long-read as well as short-read data
Workflow and steps covered
The general workflows of long-read and short-read whole-genome sequence analysis are very similar, especially with regard to quality control and the often iterative assembly process, in which different assemblers are tried to find the best possible genome assembly. The tutorials on this page cover most steps of the general workflow. However, so far only a few post-assembly downstream analyses are covered:
Questions and Answers
Throughout the tutorials you will find questions that are intended to make you think about what exactly you are doing and what to look for, and to help you check your own understanding. Answers are provided here but can also be found through the “Answers” link below the questions.
If you would like to help improve the tutorials or make them more user-friendly, or if you find any bugs or have issues with them, please go to the GitHub page and open a new issue or bug report.
The data used for the tutorials is real nanopore data from several sequencing runs of the diatom (microalga) Thalassiosira pseudonana. It is far from perfect data, which, from my point of view, makes it perfect for a tutorial.
Additional information about the tutorials (e.g. the software list, references and tutorial data), as well as background information about (long-read) sequence analysis, can be found in the Appendix section.