# nf-artist: Multiplexed Tissue Imaging Visualization Pipeline

# About

nf-artist is a Nextflow pipeline developed to automate the generation of interactive visualizations from high-dimensional tissue imaging data. Originally built for the Human Tumor Atlas Network (HTAN) Data Portal, this tool transforms complex microscopy datasets into accessible, web-based explorations that enable researchers to investigate tumor microenvironments without specialized software.

Multiplexed tissue imaging generates massive, multi-channel datasets that are difficult to visualize and share. Researchers working with technologies like Cyclic Immunofluorescence (CyCIF) produce images with 40+ protein markers per tissue section, stored in specialized formats (SVS, CZI, OME-TIFF) that require expert software to view. This creates barriers to data sharing, collaboration, and reproducibility in cancer research.

The pipeline addresses this by orchestrating four key processing stages:

  • Format Standardization: Automated conversion of proprietary bioformats to OME-TIFF standard, preserving full image pyramid structure for multi-resolution access

  • Interactive Story Generation: Integration with the Minerva ecosystem to create self-contained HTML visualizations with automatic channel thresholding and tiled image pyramids for efficient web streaming

  • Thumbnail Generation: Dimensionality reduction (UMAP/t-SNE/PCA) to compress 40+ channels into intuitive RGB thumbnails with configurable colormaps optimized for perceptual uniformity

  • Batch Processing at Scale: CSV-driven sample sheet input for processing hundreds of images with parallel execution and configurable resource allocation

The pipeline supports deployment on local Docker environments, AWS Batch for production workloads, and Nextflow Tower for interactive monitoring. All dependencies are version-pinned in multi-stage Docker builds, ensuring reproducibility across environments and over time.

# My Role

I designed and implemented nf-artist from the ground up, making key architectural decisions around modularity, reproducibility, and scalability. My contributions included:

  • Designing the modular subworkflow architecture that separates conversion, visualization, and thumbnail generation into independent, testable components

  • Implementing conditional processing logic that intelligently routes images through appropriate pipelines based on metadata flags, handling both multiplexed fluorescence and H&E brightfield images through a unified interface

  • Building the CI/CD infrastructure with three GitHub Actions workflows: automated Docker builds with semantic versioning, integration testing against representative datasets, and weekly security scanning with Trivy

  • Configuring AWS Batch deployment profiles with graduated retry logic and resource scaling for fault-tolerant cloud execution

  • Integrating four external specialized tools (Miniature, Auto-Minerva, Minerva-Author, bioformats2raw) into a cohesive pipeline with version-pinned dependencies

The pipeline now supports the HTAN consortium's mission to make tumor atlas data publicly accessible, reducing the time from data acquisition to shareable visualization from days to hours.

# Resources