nf-core/proteinfold

Protein 3D structure prediction pipeline

alphafold2, colabfold, esmfold, protein-fold-prediction, protein-folding, protein-sequences, protein-structure

This is the development version of the pipeline.

Launch development version https://github.com/nf-core/proteinfold

Introduction

This document describes the user-facing output produced by the pipeline.

Pipeline overview

The pipeline is built using Nextflow and predicts protein structures using one or more selectable prediction methods (modes).

See main README.md for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.

The directories listed below will be created in the output directory after the pipeline has finished. All paths are relative to the top-level results directory.

Exact subdirectories depend on the selected mode(s). In a multi-mode run (for example alphafold2,boltz,rosettafold_all_atom) you will typically see top-level directories such as alphafold2/, boltz/, rosettafold_all_atom/, multiqc/, reports/, compare/, and pipeline_info/.
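Because the set of top-level directories varies with the selected mode(s), it can help to discover them programmatically. The following is a minimal sketch, not part of the pipeline: the function name `find_mode_dirs` is hypothetical, and it assumes that a mode directory can be recognised by the presence of a top_ranked_structures/ subdirectory, as in the common output patterns listed on this page.

```python
from pathlib import Path


def find_mode_dirs(results_dir):
    """Return the names of top-level directories that look like mode outputs.

    Heuristic (an assumption of this sketch): a mode directory is any
    top-level directory that contains a top_ranked_structures/ subdirectory.
    Shared directories such as multiqc/ or pipeline_info/ are skipped.
    """
    root = Path(results_dir)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "top_ranked_structures").is_dir()
    )
```

Calling `find_mode_dirs("results")` on a multi-mode run would be expected to return the mode names only, leaving out reports/, compare/ and similar shared directories.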

Prediction outputs (all modes)

User-facing outputs are largely consistent across modes.

Common output patterns

<MODE>/top_ranked_structures/<SEQUENCE NAME>.pdb
<MODE>/<SEQUENCE NAME>/<SEQUENCE NAME>_plddt.tsv
<MODE>/<SEQUENCE NAME>/paes/<SEQUENCE NAME>_<RANK>_pae.tsv (when available)
<MODE>/<SEQUENCE NAME>/<SEQUENCE NAME>_*msa.tsv (mode-specific MSA summary)
<MODE>/<SEQUENCE NAME>/<SEQUENCE NAME>_{ptm,iptm}.tsv and chainwise summaries (where applicable)
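The patterns above can be turned into concrete paths for a given mode and sequence name. This is a rough sketch under stated assumptions: the helper names (`expected_outputs`, `pae_files`, `missing_outputs`) are hypothetical, the mode-specific MSA and chain-wise summaries are left out because their exact file names vary, and files marked "when available" or "where applicable" may legitimately be absent.

```python
import glob
from pathlib import Path


def expected_outputs(results_dir, mode, sequence_name):
    """Build paths for the fixed-name outputs of one mode and sequence."""
    mode_dir = Path(results_dir) / mode
    seq_dir = mode_dir / sequence_name
    return {
        "top_ranked_structure": mode_dir / "top_ranked_structures" / f"{sequence_name}.pdb",
        "plddt": seq_dir / f"{sequence_name}_plddt.tsv",
        "ptm": seq_dir / f"{sequence_name}_ptm.tsv",    # where applicable
        "iptm": seq_dir / f"{sequence_name}_iptm.tsv",  # where applicable
    }


def pae_files(results_dir, mode, sequence_name):
    """List per-rank PAE tables, or an empty list if there are none."""
    pae_dir = Path(results_dir) / mode / sequence_name / "paes"
    if not pae_dir.is_dir():
        return []
    # Escape the sequence name so characters like "[" are matched literally.
    pattern = f"{glob.escape(sequence_name)}_*_pae.tsv"
    return sorted(pae_dir.glob(pattern))


def missing_outputs(results_dir, mode, sequence_name):
    """Return the keys of expected outputs that do not exist on disk."""
    paths = expected_outputs(results_dir, mode, sequence_name)
    return [name for name, path in paths.items() if not path.exists()]
```

A missing "ptm" or "iptm" entry is not necessarily an error, since those summaries are only written for modes and inputs that produce them.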

Canonical .tsv metric formats (including pLDDT, MSA, (i)pTM, chain-wise (i)pTM and PAE) are defined in the contributor documentation: Processable structure prediction metrics.

Example report plots

The report exports include key visualisations such as sequence coverage, predicted Local Distance Difference Test (pLDDT), and Predicted Aligned Error (PAE).

Sequence coverage

Predicted Local Distance Difference Test (pLDDT)

[Figure: example pLDDT plot]

Predicted Aligned Error (PAE)

[Figure: example PAE plot]

Per-mode reports and comparisons

Output files

reports/
- <SEQUENCE NAME>_<MODE>_report.html (single-mode report per sequence/mode)
compare/
- <SEQUENCE NAME>_comparison_report.html (present when running multiple modes)
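Since both sequence names and mode names may contain underscores, a report file name cannot be split on "_" alone. The sketch below shows one way to recover the sequence name and mode from a single-mode report file name, assuming the caller supplies the list of modes that were run. `KNOWN_MODES` and `split_report_name` are hypothetical names, and the mode list is simply the multi-mode example used earlier on this page.

```python
# Illustrative mode list (an assumption); pass the modes you actually ran.
KNOWN_MODES = ("alphafold2", "boltz", "rosettafold_all_atom")

REPORT_SUFFIX = "_report.html"


def split_report_name(filename, modes=KNOWN_MODES):
    """Split '<SEQUENCE NAME>_<MODE>_report.html' into (sequence_name, mode).

    Returns None when the name does not end with a known mode, which is
    also the case for '<SEQUENCE NAME>_comparison_report.html'.
    """
    if not filename.endswith(REPORT_SUFFIX):
        return None
    stem = filename[: -len(REPORT_SUFFIX)]
    # Try longer mode names first so a mode that is a suffix of another
    # mode cannot shadow it.
    for mode in sorted(modes, key=len, reverse=True):
        tail = "_" + mode
        if stem.endswith(tail) and len(stem) > len(tail):
            return stem[: -len(tail)], mode
    return None
```

For example, `split_report_name("my_seq_boltz_report.html")` yields `("my_seq", "boltz")`, while a comparison report yields `None` and can be handled separately.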

MultiQC report

Output files

multiqc/
- *_multiqc_report.html: Standalone HTML report(s) that can be viewed in your web browser.
- *_multiqc_report_data/: Parsed report data for each corresponding MultiQC report.

MultiQC is a visualisation tool that generates HTML report(s) summarising samples in your project. Most QC results are visualised in the report and further statistics are available within each corresponding *_multiqc_report_data/ directory.

The MultiQC report collates QC metrics from the selected structure-prediction mode(s), together with the software versions for traceability. For more information about how to use MultiQC reports, see http://multiqc.info.

Pipeline information

Output files

pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files are only present if the --email or --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.

Nextflow can generate several reports about the execution of the pipeline. These help you troubleshoot errors and also provide information such as launch commands, run times and resource usage.
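As a starting point for troubleshooting, the trace file can be summarised with a few lines of Python. This is a minimal sketch, assuming the trace is a tab-separated file with a header row that includes `name` and `status` columns (the Nextflow defaults, which can be changed through trace configuration). The function name `summarise_trace` is hypothetical.

```python
import csv
from collections import Counter


def summarise_trace(trace_path):
    """Count tasks per status and collect the names of failed tasks.

    Assumes a tab-separated trace with 'name' and 'status' columns;
    rows lacking a status are counted as 'UNKNOWN'.
    """
    with open(trace_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    status_counts = Counter(row.get("status") or "UNKNOWN" for row in rows)
    failed = [row.get("name", "") for row in rows if row.get("status") == "FAILED"]
    return status_counts, failed
```

For example, `summarise_trace("pipeline_info/execution_trace.txt")` returns a per-status counter plus the list of failed task names, which can then be looked up in the execution report.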

Additional intermediate outputs

Depending on the selected mode(s) and options, additional top-level directories may be present, for example:

fasta2yaml/ (for YAML conversion inputs/outputs)
mmseqs/results/ (for MMseqs2 outputs such as .a3m files)
split/output_msa/ (for split-MSA intermediate CSV outputs)

`--save_intermediates`

If --save_intermediates is enabled, extra raw intermediate files are published in mode-specific raw/ directories.

Examples include:

alphafold2/<MODE>/<SEQUENCE NAME>/raw/
colabfold/<SEQUENCE NAME>/raw/
boltz/<SEQUENCE NAME>/boltz_results_*/
rosettafold_all_atom/<SEQUENCE NAME>/raw/
alphafold3/<SEQUENCE NAME>/raw/
helixfold3/<SEQUENCE NAME>/raw/
rosettafold2na/<SEQUENCE NAME>/raw/
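To see which raw directories a particular run actually published, the results tree can be searched for the two naming patterns shown in the examples above. This is only a sketch: `find_raw_dirs` is a hypothetical helper, and it assumes that every raw output directory is named either raw or boltz_results_*, which may not hold for all modes.

```python
from pathlib import Path


def find_raw_dirs(results_dir):
    """Return directories named 'raw' or 'boltz_results_*' under results_dir."""
    root = Path(results_dir)
    hits = set()
    # "**" matches any number of intermediate directories, so the search
    # covers layouts with and without an extra mode-level subdirectory.
    for pattern in ("**/raw", "**/boltz_results_*"):
        hits.update(path for path in root.glob(pattern) if path.is_dir())
    return sorted(hits)
```

An empty result most likely means the run was launched without --save_intermediates.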

These raw outputs are intended for advanced debugging, reproducibility and method-specific downstream analyses. For detailed, canonical specifications of each tool's native outputs, see the documentation of the respective tool.
