12.1 Commercial vs. Open Source/Public Health Tools

Public health agencies often operate with constrained budgets while managing sensitive health data. This chapter compares commercial enterprise tools with open source and public health-specific alternatives, helping you select the right tools for your context.

12.1.1 Selection Criteria

When evaluating tools, consider:

| Criterion | Commercial Advantage | OSS/PH Advantage |
| --- | --- | --- |
| Cost | Predictable licensing | No license fees |
| Support | Vendor SLAs | Community + self-reliance |
| Features | Polished, integrated | Customizable, extensible |
| Compliance | Often pre-certified | Full control over data |
| Data Sovereignty | Vendor-managed | Organization-controlled |
| Sustainability | Vendor roadmap | Community-driven |
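
One way to apply these criteria is a weighted scoring matrix. The sketch below is illustrative only: the weights, the 1-5 scores, and the two tool names are hypothetical placeholders that each agency would replace with its own judgments.

```python
# Hypothetical weights reflecting one agency's priorities (they sum to 1.0).
WEIGHTS = {
    "cost": 0.25,
    "support": 0.15,
    "features": 0.15,
    "compliance": 0.20,
    "data_sovereignty": 0.15,
    "sustainability": 0.10,
}

# Hypothetical 1-5 scores for two candidate tools (5 = best fit).
SCORES = {
    "commercial_tool": {"cost": 2, "support": 5, "features": 4,
                        "compliance": 4, "data_sovereignty": 2, "sustainability": 3},
    "oss_tool": {"cost": 5, "support": 3, "features": 3,
                 "compliance": 3, "data_sovereignty": 5, "sustainability": 3},
}


def weighted_score(scores, weights):
    """Return the weighted sum of criterion scores for one tool."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_tools(all_scores, weights):
    """Return (tool, total) pairs sorted from best to worst total."""
    totals = {tool: weighted_score(s, weights) for tool, s in all_scores.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_tools(SCORES, WEIGHTS)
for tool, total in ranking:
    print(f"{tool}: {total:.2f}")
```

Changing the weights (for example, raising support for a team without technical staff) can reverse the ranking, which is the point of making the trade-offs explicit.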

12.1.2 Tool Categories

12.1.2.1 Project Management

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| Full PM Suite | Jira, Azure DevOps, MS Project | OpenProject, Redmine, Taiga |
| Kanban Boards | Trello (paid), Monday.com | Trello (free tier), Wekan, Kanboard |
| Agile Planning | Jira, Rally, VersionOne | Taiga, OpenProject |
| Grant Management | Smartsheet, Asana | OpenProject with custom fields |

Recommendation for Public Health:

  • Small teams (<10): Trello free tier or Taiga for simple Kanban/Scrum
  • Larger programs: OpenProject for full PM capabilities with data sovereignty
  • CDC/Federal projects: Often require Azure DevOps or Jira per contract

Tip: When to Choose OSS

Choose open source when:

  • Budget is constrained
  • Data sovereignty is critical (cannot store project data externally)
  • Technical staff can support installation and maintenance
  • Customization is needed beyond commercial options

12.1.2.2 Requirements and Documentation

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| Wiki/Docs | Confluence, SharePoint | BookStack, MediaWiki, GitHub Wiki |
| Requirements Management | Jama, Helix RM, DOORS | GitHub Issues, GitLab, Notion (free) |
| Collaborative Editing | MS 365, Google Workspace | Nextcloud, CryptPad, HedgeDoc |

Recommendation for Public Health:

  • Documentation: BookStack provides a Confluence-like experience without license fees
  • Requirements: GitHub Issues is sufficient for most projects and integrates with development workflows
  • Collaboration: Consider data sensitivity; use Nextcloud for on-premises control

12.1.2.3 Diagramming

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| General Diagramming | Visio, Lucidchart | diagrams.net (draw.io), Mermaid |
| Process Modeling (BPMN) | Visio, Bizagi | diagrams.net, Camunda Modeler |
| Architecture | Lucidchart, Visio | diagrams.net, PlantUML |

Recommendation for Public Health:

  • diagrams.net is widely used in the public sector: it is free, web-based, exports to multiple formats, and works offline
  • Mermaid for diagrams in documentation (renders from text, version-controllable)
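
Because a Mermaid diagram is plain text, it can be generated by a script and reviewed in version control like any other file. The sketch below assumes a hypothetical helper, to_mermaid, and made-up workflow step names; it emits Mermaid flowchart text from a list of (source, target) pairs.

```python
def to_mermaid(edges, direction="LR"):
    """Build Mermaid flowchart text from (source, target) label pairs.

    Labels must not contain double quotes; this sketch does no escaping.
    """
    ids = {}  # label -> short node id, assigned in first-seen order
    node_lines = [f"flowchart {direction}"]
    link_lines = []

    def node_id(label):
        # Declare each node once, the first time its label is seen.
        if label not in ids:
            ids[label] = f"n{len(ids) + 1}"
            node_lines.append(f'    {ids[label]}["{label}"]')
        return ids[label]

    for source, target in edges:
        link_lines.append(f"    {node_id(source)} --> {node_id(target)}")
    return "\n".join(node_lines + link_lines)


# Hypothetical reporting workflow, used only for illustration.
diagram = to_mermaid([
    ("Lab report", "Case review"),
    ("Case review", "Registry"),
])
print(diagram)
```

The printed text can be pasted into any Mermaid-aware renderer; a change to the workflow then shows up as a small, readable text diff.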

12.1.2.4 Data Collection

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| Surveys | Qualtrics, SurveyMonkey | LimeSurvey, KoBoToolbox |
| Clinical/Research Data | REDCap (free for research) | REDCap, ODK, DHIS2 |
| Forms | Microsoft Forms, Google Forms | KoBoToolbox, ODK Collect |
| Case Management | Salesforce | DHIS2, CommCare |

REDCap: The Public Health Standard

REDCap (Research Electronic Data Capture) deserves special mention:

  • Free for non-profit research institutions
  • Supports HIPAA-compliant deployments when hosted and configured appropriately; 21 CFR Part 11 capable
  • Supports complex branching logic and validation
  • Built-in audit trails
  • Consortium of 6,000+ institutions
  • Widely used in NIH- and CDC-funded projects

Note: CancerSurv Example

For the CancerSurv project, data collection tools include:

  • REDCap: Pilot site feedback surveys, user satisfaction assessments
  • KoBoToolbox: Field data collection for mobile cancer screening events
  • Native CancerSurv: Case abstraction (built into the platform)

12.1.2.5 Data Analysis

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| Statistical Analysis | SAS, SPSS, Stata | R, Python (pandas, scipy) |
| Epidemiological Analysis | SAS, Stata | R (epitools), Epi Info |
| Data Wrangling | Alteryx, Trifacta | R (tidyverse), Python (pandas) |
| Notebooks | Databricks, SAS Studio | Jupyter, RStudio, Quarto |

Epi Info: CDC’s Free Epidemiology Tool

Epi Info is developed by CDC specifically for outbreak investigation:

  • Free to download and use, with no license fees
  • Built-in epidemiological statistics (odds ratios, relative risks)
  • Epidemic curve generation
  • Geographic mapping
  • Survey development and analysis
  • 7-day moving averages, case fatality rates
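
For readers who want to see what these statistics involve, the sketch below implements the textbook formulas in plain Python. It is not Epi Info's implementation; the function names and the 2x2 counts are hypothetical, and the confidence interval uses the Woolf (log) approximation as one common choice.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval for the odds ratio (Woolf method)."""
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio(a, b, c, d))
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


def moving_average(values, window=7):
    """Trailing moving average; returns one value per complete window."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def case_fatality_rate(deaths, cases):
    """Proportion of confirmed cases that died."""
    return deaths / cases


# Hypothetical outbreak data: 30 of 100 exposed and 10 of 100 unexposed fell ill.
or_estimate = odds_ratio(30, 70, 10, 90)
rr_estimate = relative_risk(30, 70, 10, 90)
ci_low, ci_high = odds_ratio_ci(30, 70, 10, 90)
smoothed = moving_average([1, 2, 3, 4, 5, 6, 7, 8])
cfr = case_fatality_rate(deaths=2, cases=40)
```

None of these functions handle zero cells in the 2x2 table, which a dedicated tool typically addresses with a continuity correction or an exact method.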

R for Public Health

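R has become a standard tool for public health analytics:

# Example: calculate crude and age-adjusted incidence rates per 100,000
# Assumes cancer_data has one row per case with county, year, age_group, and
# the population of that stratum; us_std_pop has age_group and std_pop columns.
library(epitools)
library(tidyverse)

cancer_data %>%
  # Age-specific counts (strata with zero cases are absent and must be added)
  group_by(county, year, age_group) %>%
  summarize(cases = n(), population = first(population), .groups = "drop") %>%
  left_join(us_std_pop, by = "age_group") %>%
  group_by(county, year) %>%
  summarize(
    crude_rate = sum(cases) / sum(population) * 100000,
    # Direct age adjustment using the standard population
    adj_rate = ageadjust.direct(count = cases, pop = population,
                                stdpop = std_pop)[["adj.rate"]] * 100000,
    .groups = "drop"
  )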
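
To make the arithmetic behind direct age adjustment concrete, here is a small Python sketch. The three age groups, their counts, and the standard population figures are all hypothetical; the adjusted rate is the sum of age-specific rates weighted by each group's share of the standard population.

```python
def crude_rate(cases, populations, per=100_000):
    """Total cases divided by total population, scaled to `per` people."""
    return sum(cases) / sum(populations) * per


def direct_adjusted_rate(cases, populations, std_populations, per=100_000):
    """Directly age-standardized rate.

    Each age-specific rate (cases / population) is weighted by that age
    group's share of the standard population, then the products are summed.
    """
    if not (len(cases) == len(populations) == len(std_populations)):
        raise ValueError("all inputs need one value per age group")
    std_total = sum(std_populations)
    weighted = sum(
        (c / p) * (s / std_total)
        for c, p, s in zip(cases, populations, std_populations)
    )
    return weighted * per


# Hypothetical county with three age groups (young, middle, older).
cases = [5, 20, 75]
populations = [50_000, 30_000, 20_000]
std_populations = [40_000, 35_000, 25_000]  # made-up standard population

crude = crude_rate(cases, populations)
adjusted = direct_adjusted_rate(cases, populations, std_populations)
print(f"crude: {crude:.1f}, adjusted: {adjusted:.1f} per 100,000")
```

In this made-up county the crude rate is 100 per 100,000 while the adjusted rate is about 121, because the standard population gives more weight to the older, higher-rate group than the county's own age structure does.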

12.1.2.6 Data Platform Architecture

Modern public health data systems benefit from structured data architectures that organize information from raw ingestion through analytics-ready outputs. The medallion architecture (Bronze → Silver → Gold) provides a framework for designing scalable, maintainable data platforms.

While often discussed in cloud contexts, medallion architecture works equally well on desktop computers and local servers. The key is the logical separation of data by refinement stage, not the specific technology.

Warning: Communication Tip

“Medallion architecture” and “Bronze/Silver/Gold” are IT jargon unfamiliar to most public health professionals. When discussing data workflows with epidemiologists or program staff, use terms like “raw data,” “cleaned data,” and “final reports” instead. See the Terminology Dictionary for a complete translation guide.

Commercial vs. Open Source Data Platforms

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| Data Lake / Lakehouse | Databricks, Snowflake, Azure Synapse | Apache Spark + Delta Lake, Apache Iceberg, DuckDB |
| ETL/Orchestration | Azure Data Factory, Informatica, Talend | Apache Airflow, Dagster, Prefect, dbt |
| Data Catalog | Alation, Collibra | Apache Atlas, DataHub, Amundsen |
| Data Quality | Informatica DQ, Talend | Great Expectations, dbt tests, Soda |

Implementing Medallion Architecture

The medallion architecture can be implemented with various tool combinations, from enterprise cloud platforms to desktop applications:

| Layer | Purpose | Cloud/Server Options | Desktop/Local Options |
| --- | --- | --- | --- |
| Bronze | Raw data landing, preserve source fidelity | Object storage (S3, Azure Blob), PostgreSQL staging tables | File folders, SQLite database, Excel “Raw Data” sheets |
| Silver | Cleansing, standardization, deduplication | dbt transformations, Apache Spark, Python/pandas | Excel Power Query, Python scripts, Access queries |
| Gold | Analytics-ready datasets, aggregations | Dimensional models, materialized views, OLAP cubes | Pivot tables, final Excel reports, exported CSVs for tools |

Tip: Starting Small

You don’t need Databricks or Snowflake to implement medallion architecture. Even a well-organized folder structure with clear naming conventions implements the same principle:

project/
├── 01_bronze/          # Raw files as received
│   ├── lab_results_2024-01-15.csv
│   └── ehr_export_raw.xlsx
├── 02_silver/          # Cleaned and standardized
│   ├── cases_cleaned.csv
│   └── patients_deduplicated.xlsx
└── 03_gold/            # Ready for analysis/reporting
    ├── outbreak_line_list.xlsx
    └── monthly_summary_report.xlsx

Many state health departments successfully run medallion architectures on modest infrastructure, including single PostgreSQL databases with three schemas or even organized Excel workbooks.
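
The folder-based approach can be scripted with nothing beyond the Python standard library. The sketch below is a minimal illustration under stated assumptions: the file names, the case_id and county columns, and the cleaning rules are hypothetical, and it runs in a temporary directory so it leaves no files behind.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path

LAYERS = ("01_bronze", "02_silver", "03_gold")


def init_project(root):
    """Create the three layer folders under root and return their paths."""
    paths = {}
    for name in LAYERS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths


def bronze_to_silver(raw_file, silver_file):
    """Trim whitespace, upper-case county names, and drop repeated case_ids."""
    seen = set()
    with open(raw_file, newline="") as src, open(silver_file, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["case_id", "county"])
        writer.writeheader()
        for row in csv.DictReader(src):
            case_id = row["case_id"].strip()
            if not case_id or case_id in seen:
                continue  # skip blank ids and duplicates; the raw file is untouched
            seen.add(case_id)
            writer.writerow({"case_id": case_id, "county": row["county"].strip().upper()})


def silver_to_gold(silver_file, gold_file):
    """Aggregate cleaned cases into a count per county."""
    with open(silver_file, newline="") as src:
        counts = Counter(row["county"] for row in csv.DictReader(src))
    with open(gold_file, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["county", "cases"])
        writer.writerows(sorted(counts.items()))
    return dict(counts)


# Demonstration with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    layers = init_project(tmp)
    raw = layers["01_bronze"] / "lab_results_raw.csv"
    raw.write_text("case_id,county\n A1 ,adams\nA1,Adams\nB2,Baker \nC3,adams\n")
    silver = layers["02_silver"] / "cases_cleaned.csv"
    gold = layers["03_gold"] / "county_counts.csv"
    bronze_to_silver(raw, silver)
    county_counts = silver_to_gold(silver, gold)

print(county_counts)
```

The key design choice is that each step reads from one layer and writes to the next, so the raw file is never modified and every output can be regenerated from it.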

Open Source Lakehouse Stack for Public Health

For organizations seeking data sovereignty and cost control:

| Component | Tool | Notes |
| --- | --- | --- |
| Storage | MinIO or local filesystem | S3-compatible object storage |
| Table format | Delta Lake or Apache Iceberg | ACID transactions, time travel |
| Compute | Apache Spark or DuckDB | DuckDB for smaller workloads |
| Orchestration | Apache Airflow | Workflow scheduling |
| Transformation | dbt | SQL-based transformations |
| Quality | Great Expectations | Data validation |
| Catalog | DataHub | Metadata management |

Data Architecture for Different Scales

| Organization Size | Recommended Approach | Key Tools | Typical Staffing |
| --- | --- | --- | --- |
| Individual analyst | Organized folders with naming conventions | Excel, Python/R scripts, SQLite | Single epidemiologist or data manager handles all layers |
| Small program | Single PostgreSQL database with layered schemas | PostgreSQL, dbt, Python | 1-2 staff share responsibilities across layers |
| Medium health department | Data warehouse with ETL pipeline | PostgreSQL/Snowflake, Airflow, dbt | Dedicated data team with some role specialization |
| Large state/federal | Full lakehouse architecture | Spark/Databricks, Delta Lake, Airflow, dbt | Specialized roles: data engineers (Bronze/Silver), analysts (Silver/Gold), BI developers (Gold) |

Note: CancerSurv Example

CancerSurv implements a medallion architecture using open source tools:

| Layer | Implementation | Gold Layer Outputs |
| --- | --- | --- |
| Bronze | PostgreSQL raw schema; HL7 messages stored as JSON; CSV uploads preserved verbatim | |
| Silver | PostgreSQL staging schema; dbt models for deduplication and ICD-O-3 standardization | |
| Gold | PostgreSQL analytics schema; pre-computed incidence rates, survival metrics, NPCR submission tables | Line lists for case follow-up, incidence reports, survival dashboards |
| Orchestration | Apache Airflow schedules daily Bronze→Silver→Gold pipeline | |
| Quality | Great Expectations validates data at Silver layer before promotion to Gold | |
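
The pattern of validating Silver data before promoting it to Gold can be sketched at small scale. The example below is an assumption-laden stand-in, not the CancerSurv implementation: it uses an in-memory SQLite database with table-name prefixes in place of PostgreSQL schemas, plain SQL checks in place of Great Expectations, and hypothetical column names and a hypothetical plausible-year range.

```python
import sqlite3


def run_pipeline(rows):
    """Load raw rows, clean them into staging, validate, then build analytics.

    Raises ValueError (and builds no Gold table) if any quality check fails.
    """
    con = sqlite3.connect(":memory:")
    try:
        # Bronze: keep the values exactly as received, all as text.
        con.execute("CREATE TABLE raw_cases (case_id TEXT, site_code TEXT, dx_year TEXT)")
        con.executemany("INSERT INTO raw_cases VALUES (?, ?, ?)", rows)

        # Silver: trim, standardize case, cast the year, drop exact duplicates.
        con.execute("""
            CREATE TABLE staging_cases AS
            SELECT DISTINCT TRIM(case_id) AS case_id,
                            UPPER(TRIM(site_code)) AS site_code,
                            CAST(dx_year AS INTEGER) AS dx_year
            FROM raw_cases
        """)

        # Quality gate: each check counts the offending rows.
        checks = {
            "blank case_id": "SELECT COUNT(*) FROM staging_cases "
                             "WHERE case_id IS NULL OR case_id = ''",
            "duplicate case_id": "SELECT COUNT(*) FROM (SELECT case_id FROM staging_cases "
                                 "GROUP BY case_id HAVING COUNT(*) > 1)",
            "implausible dx_year": "SELECT COUNT(*) FROM staging_cases "
                                   "WHERE dx_year IS NULL OR dx_year NOT BETWEEN 1990 AND 2100",
        }
        failures = [name for name, sql in checks.items()
                    if con.execute(sql).fetchone()[0] > 0]
        if failures:
            raise ValueError("Silver validation failed: " + ", ".join(failures))

        # Gold: aggregate only after every check has passed.
        con.execute("""
            CREATE TABLE analytics_cases_by_site AS
            SELECT site_code, dx_year, COUNT(*) AS cases
            FROM staging_cases
            GROUP BY site_code, dx_year
        """)
        return con.execute(
            "SELECT site_code, dx_year, cases FROM analytics_cases_by_site "
            "ORDER BY site_code, dx_year"
        ).fetchall()
    finally:
        con.close()


# Made-up records; the second row duplicates the first once standardized.
gold_rows = run_pipeline([
    ("1001", "c50.9", "2021"),
    ("1001 ", "C50.9", "2021"),
    ("1002", "C34.1", "2021"),
    ("1003", "C50.9", "2022"),
])
print(gold_rows)
```

Failing loudly at the gate keeps a bad batch out of reports; in a scheduled pipeline the same exception would mark the run as failed so staff can inspect the Silver data.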

12.1.2.7 Visualization

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| Dashboards | Tableau, Power BI | R Shiny, Dash (Python), Apache Superset |
| Static Visualization | Tableau, Excel | R (ggplot2), Python (matplotlib, plotly) |
| Interactive Charts | Tableau, Power BI | Plotly, Highcharts (free for non-commercial) |

R Shiny for Public Health Dashboards

R Shiny enables interactive dashboards without JavaScript expertise:

  • Free and open source
  • Integrates with R analysis pipelines
  • Can be deployed on-premises or on shinyapps.io
  • Many public health templates available

12.1.2.8 GIS and Mapping

| Capability | Commercial Options | OSS/PH Options |
| --- | --- | --- |
| Desktop GIS | ArcGIS Pro | QGIS |
| Web Mapping | ArcGIS Online, Mapbox | Leaflet, OpenLayers |
| Spatial Analysis | ArcGIS (Esri) | QGIS, R (sf package), PostGIS |
| Geocoding | Google, Esri | Nominatim, US Census Geocoder |

QGIS for Disease Mapping

QGIS is essential for spatial epidemiology:

  • Free, cross-platform
  • Full-featured GIS capabilities
  • Disease mapping and cluster detection
  • Integrates with R for spatial statistics
  • Active public health user community

Note: CancerSurv Example

CancerSurv analytics stack:

| Function | Tool | Rationale |
| --- | --- | --- |
| Case data storage | PostgreSQL | Open source, HIPAA-capable |
| ETL/Data pipeline | Apache Airflow | Orchestration of data flows |
| Statistical analysis | R (tidyverse, survival) | Standard for cancer epidemiology |
| Dashboards | R Shiny | Interactive, deployable on-premise |
| Geographic mapping | QGIS + Leaflet | Cancer cluster visualization |
| Ad-hoc queries | Apache Superset | Self-service for epidemiologists |

12.1.2.9 Data Standards and Interoperability

| Standard | Commercial Tools | OSS Tools |
| --- | --- | --- |
| HL7 FHIR | Rhapsody, Corepoint | HAPI FHIR, LinuxForHealth |
| HL7 v2.x | Rhapsody, Mirth | Mirth Connect (open source), HAPI |
| CDA/C-CDA | Various EHR vendors | MDHT, Reference CDA |

Mirth Connect

Mirth Connect is widely used in public health for health information exchange:

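  • Historically open source (maintained by NextGen Healthcare); recent releases have moved to commercial licensing, so confirm current terms before adopting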
  • HL7 v2, FHIR, CDA support
  • Visual interface builder
  • Used by many state health departments
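
To show what an integration engine handles, the sketch below splits a pipe-delimited HL7 v2 message into segments and fields using only plain Python. It is a teaching aid, not a substitute for Mirth Connect or an HL7 library: the sample message, facility names, and the local test code are fabricated, and the parser ignores escape sequences, repetitions (~), and subcomponents.

```python
# Fabricated ORU^R01 (lab result) message; segments are separated by carriage returns.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|LAB_SYS|CITY_LAB|ELR|STATE_DOH|20240115120000||ORU^R01|MSG0001|P|2.5.1",
    "PID|1||12345^^^CITY_LAB^MR||DOE^JANE||19800101|F",
    "OBX|1|ST|TEST01^Example test^L||Detected",
])


def parse_hl7(message):
    """Split an HL7 v2 message into {segment_id: [list_of_fields, ...]}."""
    segments = {}
    for line in filter(None, message.replace("\n", "\r").split("\r")):
        fields = line.split("|")
        if fields[0] == "MSH":
            # MSH-1 is the field separator itself, so re-insert it to keep
            # list indexes aligned with HL7 field numbers.
            fields.insert(1, "|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def field(segments, segment_id, index, component=None):
    """Return field `index` (HL7 numbering) from the first matching segment.

    If `component` is given (1-based), return that ^-delimited component.
    Missing fields or components come back as empty strings.
    """
    fields = segments[segment_id][0]
    value = fields[index] if index < len(fields) else ""
    if component is None:
        return value
    parts = value.split("^")
    return parts[component - 1] if component <= len(parts) else ""


segments = parse_hl7(SAMPLE_MESSAGE)
message_type = field(segments, "MSH", 9)     # MSH-9: message type
family_name = field(segments, "PID", 5, 1)   # PID-5.1: family name
result_value = field(segments, "OBX", 5)     # OBX-5: observation value
print(message_type, family_name, result_value)
```

Real feeds add acknowledgments, character escaping, repeating fields, and sender-specific quirks, which is why a dedicated engine with routing and monitoring is usually worth its setup cost.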

12.1.3 Building Your Stack

12.1.3.1 Small Public Health Program

| Function | Recommended Tool | Notes |
| --- | --- | --- |
| Project Management | Trello or Taiga | Free tier sufficient |
| Documentation | GitHub Wiki or BookStack | Version-controlled |
| Diagramming | diagrams.net | Free, export to any format |
| Data Collection | REDCap | Standard for research |
| Analysis | R + RStudio | Free, extensive packages |
| Visualization | R Shiny or Excel | Depends on technical capacity |

12.1.3.2 Large State Health Department

| Function | Recommended Tool | Notes |
| --- | --- | --- |
| Project Management | Azure DevOps or OpenProject | Enterprise scale |
| Documentation | Confluence or BookStack | Team collaboration |
| Requirements | Jira or GitHub | Integrated with development |
| Data Collection | REDCap + DHIS2 | Research + program monitoring |
| Data Platform | PostgreSQL + Airflow | Scalable, HIPAA-capable |
| Analysis | R + Python | Comprehensive capabilities |
| Visualization | R Shiny + Superset | Dashboards + self-service |
| GIS | QGIS + PostGIS | Full spatial capabilities |
| Integration | Mirth Connect | HL7/FHIR integration |

12.1.4 Considerations for Tool Selection

12.1.4.1 Total Cost of Ownership

Free software is not always cheaper:

| Cost Factor | Commercial | Open Source |
| --- | --- | --- |
| License fees | Yes | No |
| Implementation | Vendor/partner | Internal/consultant |
| Training | Often included | Self-directed or purchased |
| Support | Included in license | Community or purchased |
| Customization | Limited | Unlimited but costly |
| Infrastructure | Cloud included or on-prem | You manage |
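
A multi-year comparison makes these cost factors concrete. In the sketch below every dollar figure is a hypothetical placeholder chosen only to illustrate the arithmetic; the CostProfile class and its field names are likewise assumptions, not a standard costing model.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    name: str
    implementation: float          # one-time setup (vendor, partner, or consultant)
    annual_license: float          # recurring license or subscription fees
    annual_training: float
    annual_support: float          # support contract, or staff time for self-support
    annual_infrastructure: float   # hosting and administration

    def total_cost(self, years):
        """One-time implementation cost plus recurring costs over `years`."""
        recurring = (self.annual_license + self.annual_training
                     + self.annual_support + self.annual_infrastructure)
        return self.implementation + recurring * years


# Hypothetical figures: support and hosting are bundled into the commercial license.
commercial = CostProfile("Commercial", implementation=20_000, annual_license=40_000,
                         annual_training=2_000, annual_support=0,
                         annual_infrastructure=0)
open_source = CostProfile("Open source", implementation=35_000, annual_license=0,
                          annual_training=8_000, annual_support=15_000,
                          annual_infrastructure=12_000)

for years in (1, 3, 5):
    print(f"{years} yr: commercial ${commercial.total_cost(years):,.0f}, "
          f"open source ${open_source.total_cost(years):,.0f}")
```

With these made-up numbers the open source option costs more in year one and less from year three onward, so the planning horizon matters as much as the license fee.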

12.1.4.2 Compliance and Security

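| Consideration | Commercial | Open Source |
| --- | --- | --- |
| HIPAA compliance | Vendor often attests compliance and signs a business associate agreement (no official HIPAA certification exists) | Your responsibility to configure |
| SOC 2 certification | Common | Rare; your responsibility |
| Security updates | Vendor manages | You monitor and apply |
| Audit trails | Built-in | May require configuration |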

12.1.4.3 Sustainability

Consider long-term viability:

  • Commercial: Vendor may be acquired, change pricing, sunset product
  • Open Source: Community may lose momentum; check activity levels
  • Hybrid: Consider tools with both commercial and open source options

12.1.5 Summary

The choice between commercial and open source tools depends on your context: budget, technical capacity, data sensitivity, and compliance requirements. Public health has excellent open source options, particularly for data collection (REDCap), analysis (R), mapping (QGIS), and integration (Mirth Connect). Evaluate total cost of ownership, not just license fees.