AlignNFSeq
End-to-end RNA-seq pipeline orchestration from SRA to differential expression
The Challenge
RNA-seq experiments generate enormous volumes of raw sequencing data stored in public repositories like SRA and GEO, but getting from accession numbers to analyzable count matrices requires navigating a complex chain of bioinformatics tools: downloading FASTQs, quality control, adapter trimming, genome alignment, and transcript quantification. Each step demands specialized software, correct parameterization, and substantial compute resources.
Cloud computing solves the resource problem but introduces its own complexity: configuring GCP Batch executors, managing storage buckets, handling Spot VM preemptions, and monitoring distributed jobs across dozens of samples. Researchers shouldn’t need to become cloud infrastructure engineers to analyze their RNA-seq data.
How AlignNFSeq Helps
AlignNFSeq orchestrates the entire upstream RNA-seq workflow through a single interactive Shiny interface. Paste your SRA accession IDs, pick your organism, and launch — the platform handles everything from FASTQ download through alignment and quantification using battle-tested nf-core Nextflow pipelines.
The platform runs nf-core/fetchngs (v1.12.0) to download FASTQs from SRA/ENA/DDBJ, then automatically chains into nf-core/rnaseq (v3.14.0) for STAR alignment and Salmon quantification. All Nextflow parameters are pre-configured with production-proven defaults refined through hundreds of pipeline runs.
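As a rough illustration of the chaining described above, the following Python sketch builds the two Nextflow command lines. It is not AlignNFSeq's actual code (the platform itself is an R package); the function names, the `docker` profile default, and the assumed samplesheet location under the fetchngs output directory are illustrative assumptions.

```python
from typing import List

# Pipeline versions as stated in the text above.
FETCHNGS_VERSION = "1.12.0"
RNASEQ_VERSION = "3.14.0"


def fetchngs_command(ids_file: str, outdir: str, profile: str = "docker") -> List[str]:
    """Build a command that downloads FASTQs for the accession IDs in ids_file."""
    return [
        "nextflow", "run", "nf-core/fetchngs",
        "-r", FETCHNGS_VERSION,
        "-profile", profile,
        "--input", ids_file,
        "--outdir", outdir,
        # Ask fetchngs to emit a samplesheet shaped for nf-core/rnaseq.
        "--nf_core_pipeline", "rnaseq",
    ]


def rnaseq_command(fetch_outdir: str, outdir: str, genome: str,
                   profile: str = "docker") -> List[str]:
    """Build the follow-on nf-core/rnaseq command from the fetchngs output."""
    # Assumption: fetchngs writes its samplesheet to this relative path.
    samplesheet = f"{fetch_outdir}/samplesheet/samplesheet.csv"
    return [
        "nextflow", "run", "nf-core/rnaseq",
        "-r", RNASEQ_VERSION,
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,
        # STAR alignment followed by Salmon quantification.
        "--aligner", "star_salmon",
    ]


if __name__ == "__main__":
    print(" ".join(fetchngs_command("ids.csv", "results/fetchngs")))
    print(" ".join(rnaseq_command("results/fetchngs", "results/rnaseq", "GRCh38")))
```

The second command only makes sense once the first has finished, so a real orchestrator would wait for fetchngs to exit successfully before launching rnaseq.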
A dual-mode interface serves two audiences: wet-lab scientists get a 4-step wizard that abstracts away the underlying complexity, while bioinformaticians get full parameter control, log streaming, and resume/cancel capabilities. Both modes support GCP cloud execution (via the Batch API) and local Docker for development or smaller datasets.
Built-in differential expression analysis via limma-voom (powered by AlignRNAseqFlow) completes the pipeline — from accession numbers to DE results and pathway enrichment without leaving the application.
What You Receive
FASTQ Downloads
Raw sequencing data automatically downloaded from SRA/ENA/DDBJ with MD5 validation. Organized samplesheet generated for downstream processing.
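For readers curious what MD5 validation of a downloaded file involves, here is a minimal Python sketch. It is a generic checksum comparison, not the code the pipeline runs; the function names are hypothetical, and the expected checksum is assumed to come from the repository's metadata.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ files never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 against the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```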
Aligned BAMs & Counts
STAR-aligned BAM files and Salmon-quantified gene count matrices (TPM and raw counts). Ready for any downstream analysis tool.
MultiQC Reports
Comprehensive quality control reports aggregating FastQC, STAR alignment statistics, Salmon mapping rates, and sample-level metrics in interactive HTML.
DE & Enrichment Results
Optional differential expression analysis with volcano plots, top gene tables, and ORA pathway enrichment — all from within the same interface.
Methodology & Infrastructure
nf-core/rnaseq
STAR
Salmon
Nextflow
GCP Batch
limma-voom
Docker
AlignNFSeq builds on the nf-core framework — community-curated, peer-reviewed Nextflow pipelines used by thousands of genomics labs worldwide. Every tool runs in a versioned Docker container ensuring complete reproducibility.
Cloud execution uses Google Cloud Batch with production-hardened configurations: e2-highmem-16 machine types, automatic retry on Spot VM preemption and related VM-level failures (Batch exit codes 50001-50006), and intelligent resource allocation. QC steps that have proven problematic in practice (QualiMap, dupRadar) are skipped by default, based on operational experience from hundreds of pipeline runs.
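The settings described above could be expressed in a Nextflow config along the following lines. This Python sketch renders such a config as text; it is an illustration under assumptions, not AlignNFSeq's generated config. The function name, the `maxRetries` default of 3, the `'finish'` fallback strategy, and the use of Spot VMs via `batch.spot` are all assumed for the example.

```python
# Batch exit codes named in the text above (50001 through 50006 inclusive).
RETRY_EXIT_CODES = range(50001, 50007)


def render_batch_config(project: str, location: str, work_dir: str,
                        max_retries: int = 3) -> str:
    """Render a Nextflow config fragment for Google Cloud Batch execution."""
    codes = ", ".join(str(code) for code in RETRY_EXIT_CODES)
    lines = [
        f"workDir = '{work_dir}'",
        "process {",
        "    executor = 'google-batch'",
        "    machineType = 'e2-highmem-16'",
        # Retry only on the listed exit codes; otherwise let running tasks finish.
        f"    errorStrategy = {{ task.exitStatus in [{codes}] ? 'retry' : 'finish' }}",
        f"    maxRetries = {max_retries}",
        "}",
        "google {",
        f"    project = '{project}'",
        f"    location = '{location}'",
        "    batch.spot = true",
        "}",
        "params {",
        # nf-core/rnaseq switches for the two QC steps named above.
        "    skip_qualimap = true",
        "    skip_dupradar = true",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder project, region, and bucket; substitute real values.
    print(render_batch_config("my-project", "us-central1", "gs://my-bucket/work"))
```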
The R package architecture separates concerns cleanly: processx for non-blocking pipeline execution, glue for config generation, and bslib for the responsive Shiny interface. Pipeline state persists across browser disconnects — Nextflow manages the actual compute, and AlignNFSeq reconnects to running jobs on session restore.
Cost transparency: Per-sample cost estimates (~$0.50 for fetchngs, ~$8.50 for rnaseq on GCP) are displayed before every launch, with confirmation dialogs to prevent accidental cloud spend.
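The arithmetic behind such an estimate is simple. The sketch below uses the approximate per-sample figures quoted above; the function name and the flat linear model (no storage, egress, or retry overhead) are assumptions made for illustration.

```python
# Approximate per-sample costs in USD, as quoted in the text above.
FETCHNGS_COST_PER_SAMPLE = 0.50
RNASEQ_COST_PER_SAMPLE = 8.50


def estimate_cost(n_samples: int, run_fetchngs: bool = True,
                  run_rnaseq: bool = True) -> float:
    """Estimate total GCP cost in USD for the selected pipeline stages."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    per_sample = 0.0
    if run_fetchngs:
        per_sample += FETCHNGS_COST_PER_SAMPLE
    if run_rnaseq:
        per_sample += RNASEQ_COST_PER_SAMPLE
    return round(n_samples * per_sample, 2)


if __name__ == "__main__":
    # 24 samples through both stages: 24 * (0.50 + 8.50) = 216.00
    print(f"${estimate_cost(24):.2f}")
```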
Ideal For
- Researchers reanalyzing public RNA-seq datasets from GEO/SRA without command-line bioinformatics
- Labs processing new sequencing runs through a standardized, reproducible pipeline
- Core facilities needing a consistent interface for RNA-seq processing requests
- Studies requiring STAR+Salmon alignment with nf-core best practices
- Teams wanting cloud-scale processing (GCP Batch) without infrastructure expertise
- Projects that need end-to-end traceability from raw accessions to differential expression
- Any bulk RNA-seq experiment where you have SRA IDs and need count matrices
Start Your Analysis
Ready to analyze your data with AlignNFSeq? Submit your project and we'll scope a plan tailored to your experimental design.