CaSe-Group

What the Phage: A scalable workflow for the identification and analysis of phage sequences

M. Marquet, M. Hölzer, M. W. Pletz, A. Viehweger, O. Makarewicz, R. Ehricht, C. Brandt

doi: https://doi.org/10.1101/2020.07.24.219899


System Requirements

Components | Minimum | Recommended
OS | Linux-like | Linux-like
Cores | 4 | 8
Memory | 4 GB RAM | 8 GB RAM
Storage | 75 GB available space | 128-256 GB available space
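
As a quick sanity check before a first run, the minimum values from the requirements table above can be compared against the local machine. The following is only a minimal sketch and not part of WtP: the function names `check_requirements` and `detect_local_resources` are hypothetical, and the memory detection assumes a Linux-like system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`.

```python
import os
import shutil

# Minimum values copied from the requirements table above.
MINIMUM = {"cores": 4, "memory_gb": 4, "storage_gb": 75}


def check_requirements(cores, memory_gb, storage_gb, required=MINIMUM):
    """Return {component: True/False}, True if the requirement is met."""
    available = {"cores": cores, "memory_gb": memory_gb, "storage_gb": storage_gb}
    return {name: available[name] >= needed for name, needed in required.items()}


def detect_local_resources(path="."):
    """Best-effort detection of cores, total RAM (GiB) and free disk (GiB).

    Assumption: Linux-like OS. The kernel usually reports slightly less
    RAM than the nominal size, so a "4 GB" machine may land just under 4.
    """
    cores = os.cpu_count() or 0
    memory_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    return cores, memory_bytes / 1024**3, free_bytes / 1024**3


if __name__ == "__main__":
    cores, memory_gb, storage_gb = detect_local_resources()
    for name, ok in check_requirements(cores, memory_gb, storage_gb).items():
        print(f"{name}: {'ok' if ok else 'below minimum'}")
```

Passing the directory intended for databases and results as `path` gives a more meaningful storage figure than the current working directory.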

Why so much space? -.-


What the Phage

Phages are among the most abundant and diverse biological entities on Earth. Identifying them from sequence data is a crucial first step toward understanding their impact on the environment. A variety of bacteriophage identification tools have been developed over the years; they differ in algorithmic approach, results, and ease of use. We therefore developed “What the Phage” (WtP), an easy-to-use, parallel multi-tool approach for phage identification combined with a downstream annotation and classification strategy. This supports the user’s decision-making when the phage identification tools do not agree with each other. WtP is reproducible and scales to thousands of datasets through the use of a workflow manager (Nextflow).
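
To make the idea of disagreeing tools concrete, the sketch below tallies how many tools flagged each contig and separates unanimous calls from disputed ones. It is an illustration only, not WtP's implementation: the tool names, contig IDs, and the functions `tally_tool_agreement` and `split_by_agreement` are all hypothetical.

```python
from collections import Counter


def tally_tool_agreement(hits_by_tool):
    """Count, for each contig, how many tools flagged it as phage."""
    counts = Counter()
    for contigs in hits_by_tool.values():
        # set() guards against a tool reporting the same contig twice.
        counts.update(set(contigs))
    return counts


def split_by_agreement(hits_by_tool):
    """Return (unanimous, disputed) contig lists, each sorted by name."""
    counts = tally_tool_agreement(hits_by_tool)
    n_tools = len(hits_by_tool)
    unanimous = sorted(c for c, n in counts.items() if n == n_tools)
    disputed = sorted(c for c, n in counts.items() if n < n_tools)
    return unanimous, disputed


# Hypothetical per-tool results: tool name -> contigs called as phage.
hits = {
    "tool_a": {"contig_1", "contig_2", "contig_3"},
    "tool_b": {"contig_1", "contig_3"},
    "tool_c": {"contig_1", "contig_4"},
}

unanimous, disputed = split_by_agreement(hits)
print("flagged by all tools:", unanimous)
print("flagged by some tools only:", disputed)
```

The disputed contigs are the ones where additional evidence, such as annotation and classification, helps the user decide.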


Example result output

Example result report


Under the hood


Figure 3: Simplified flowchart of WtP, showing what happens behind the scenes.


Included tools

Identification

Toolname/Gitlink | Reference
VirFinder | VirFinder: R package for identifying viral sequences from metagenomic data using sequence signatures
PPR-Meta | PPR-Meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learning
VirSorter | VirSorter: mining viral signal from microbial genomic data
MetaPhinder | MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Sets
DeepVirFinder | Identifying viruses from metagenomic data by deep learning
Sourmash | sourmash: a library for MinHash sketching of DNA
VIBRANT | Automated recovery, annotation and curation of microbial viruses, and evaluation of virome function from genomic sequences
VirNet | Deep attention model for viral reads identification
Phigaro | Phigaro: high throughput prophage sequence annotation
Virsorter2 | VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses
Seeker | Seeker: alignment-free identification of bacteriophage genomes by deep learning

Annotation & classification

Toolname/Git | Reference
prodigal | Prodigal: prokaryotic gene recognition and translation initiation site identification
hmmer | nhmmer: DNA homology search with profile HMMs
chromomap |
CheckV | CheckV: assessing the quality of metagenome-assembled viral genomes

Other tools

Toolname/Git | Reference
samtools | The Sequence Alignment/Map format and SAMtools
seqkit | SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation
UpSetR | UpSetR: an R package for the visualization of intersecting sets and their properties
