12.1 Commercial vs. Open Source/Public Health Tools
Public health agencies often operate with constrained budgets while managing sensitive health data. This chapter compares commercial enterprise tools with open source and public health-specific alternatives, helping you select the right tools for your context.
12.1.1 Selection Criteria
When evaluating tools, consider:
| Criterion | Commercial Advantage | OSS/PH Advantage |
|---|---|---|
| Cost | Predictable licensing | No license fees |
| Support | Vendor SLAs | Community + self-reliance |
| Features | Polished, integrated | Customizable, extensible |
| Compliance | Often pre-certified | Full control over data |
| Data Sovereignty | Vendor-managed | Organization-controlled |
| Sustainability | Vendor roadmap | Community-driven |
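One way to turn the criteria above into a comparable number is a weighted scoring matrix. The sketch below is illustrative only: the weights, the 1-5 scores, and the two candidate names are hypothetical placeholders, and your agency should set its own based on its priorities.

```python
# Hypothetical weights (they sum to 1.0); adjust to your agency's priorities.
WEIGHTS = {
    "cost": 0.25,
    "support": 0.15,
    "features": 0.15,
    "compliance": 0.20,
    "data_sovereignty": 0.15,
    "sustainability": 0.10,
}


def weighted_score(scores: dict[str, int], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine 1-5 criterion scores into a single weighted score."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return round(sum(weights[c] * scores[c] for c in weights), 2)


# Placeholder candidates and scores, for illustration only.
candidates = {
    "Commercial tool A": {"cost": 2, "support": 5, "features": 4,
                          "compliance": 4, "data_sovereignty": 2, "sustainability": 3},
    "Open source tool B": {"cost": 5, "support": 3, "features": 3,
                           "compliance": 3, "data_sovereignty": 5, "sustainability": 3},
}

ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name])}")
```

The ranking is only as good as the weights. Agreeing on them with stakeholders before scoring any product helps keep the comparison honest.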
12.1.2 Tool Categories
12.1.2.1 Project Management
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| Full PM Suite | Jira, Azure DevOps, MS Project | OpenProject, Redmine, Taiga |
| Kanban Boards | Trello (paid), Monday.com | Trello (free tier), Wekan, Kanboard |
| Agile Planning | Jira, Rally, VersionOne | Taiga, OpenProject |
| Grant Management | Smartsheet, Asana | OpenProject with custom fields |
Recommendation for Public Health:
- Small teams (<10): Trello free tier or Taiga for simple Kanban/Scrum
- Larger programs: OpenProject for full PM capabilities with data sovereignty
- CDC/Federal projects: Often require Azure DevOps or Jira per contract
Choose open source when:
- Budget is constrained
- Data sovereignty is critical (cannot store project data externally)
- Technical staff can support installation and maintenance
- Customization is needed beyond commercial options
12.1.2.2 Requirements and Documentation
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| Wiki/Docs | Confluence, SharePoint | BookStack, MediaWiki, GitHub Wiki |
| Requirements Management | Jama, Helix RM, DOORS | GitHub Issues, GitLab, Notion (free) |
| Collaborative Editing | MS 365, Google Workspace | Nextcloud, CryptPad, HedgeDoc |
Recommendation for Public Health:
- Documentation: BookStack provides a Confluence-like experience without license fees
- Requirements: GitHub Issues is sufficient for most projects and integrates with development workflows
- Collaboration: Match the tool to the sensitivity of the data; Nextcloud gives on-premise control
12.1.2.3 Diagramming
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| General Diagramming | Visio, Lucidchart | diagrams.net (draw.io), Mermaid |
| Process Modeling (BPMN) | Visio, Bizagi | diagrams.net, Camunda Modeler |
| Architecture | Lucidchart, Visio | diagrams.net, PlantUML |
Recommendation for Public Health:
- diagrams.net is widely used in the public sector: free, web-based, exports to multiple formats, works offline
- Mermaid for diagrams in documentation (renders from text, version-controllable)
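Because Mermaid diagrams are plain text, they can be generated from data and kept under version control alongside the documentation. The following is a minimal sketch, assuming a simple left-to-right flowchart. The helper name `to_mermaid` and the stage labels are illustrative and are not part of Mermaid itself.

```python
def to_mermaid(edges: list[tuple[str, str]], direction: str = "LR") -> str:
    """Render (source, target) label pairs as a Mermaid flowchart definition."""
    ids: dict[str, str] = {}

    def node(label: str) -> str:
        # Assign short stable ids (n0, n1, ...) so labels may contain spaces.
        if label not in ids:
            ids[label] = f"n{len(ids)}"
        return f'{ids[label]}["{label}"]'

    lines = [f"flowchart {direction}"]
    for source, target in edges:
        lines.append(f"    {node(source)} --> {node(target)}")
    return "\n".join(lines)


diagram = to_mermaid([
    ("Raw data", "Cleaned data"),
    ("Cleaned data", "Final reports"),
])
print(diagram)
```

The printed text can be pasted into any Mermaid-aware renderer. A change to the diagram then shows up as an ordinary text diff in code review.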
12.1.2.4 Data Collection
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| Surveys | Qualtrics, SurveyMonkey | LimeSurvey, KoBoToolbox |
| Clinical/Research Data | REDCap (free for research) | REDCap, ODK, DHIS2 |
| Forms | Microsoft Forms, Google Forms | KoBoToolbox, ODK Collect |
| Case Management | Salesforce | DHIS2, CommCare |
REDCap: The Public Health Standard
REDCap (Research Electronic Data Capture) deserves special mention:
- Free for non-profit research institutions
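- Supports HIPAA-compliant deployments and can be configured for 21 CFR Part 11 (compliance depends on how your institution hosts and configures it)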
- Supports complex branching logic, validation
- Built-in audit trails
- Consortium of 6,000+ institutions
- Widely used in NIH- and CDC-funded projects
For the CancerSurv project, data collection tools include:
- REDCap: Pilot site feedback surveys, user satisfaction assessments
- KoBoToolbox: Field data collection for mobile cancer screening events
- Native CancerSurv: Case abstraction (built into the platform)
12.1.2.5 Data Analysis
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| Statistical Analysis | SAS, SPSS, Stata | R, Python (pandas, scipy) |
| Epidemiological Analysis | SAS, Stata | R (epitools), Epi Info |
| Data Wrangling | Alteryx, Trifacta | R (tidyverse), Python (pandas) |
| Notebooks | Databricks, SAS Studio | Jupyter, RStudio, Quarto |
Epi Info: CDC’s Free Epidemiology Tool
Epi Info is developed by CDC for public health practitioners, with particular strength in outbreak investigation:
- Free download, no license fees
- Built-in epidemiological statistics (odds ratios, relative risks)
- Epidemic curve generation
- Geographic mapping
- Survey development and analysis
- 7-day moving averages, case fatality rates
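The statistics listed above are simple to compute by hand, which helps when you need to check a tool's output. The sketch below shows the textbook formulas in plain Python. It is not Epi Info's implementation, and the function names and example counts are illustrative.

```python
from collections import deque


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """2x2 table: a=exposed cases, b=exposed non-cases,
    c=unexposed cases, d=unexposed non-cases."""
    return (a * d) / (b * c)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def case_fatality_rate(deaths: int, cases: int) -> float:
    """Proportion of confirmed cases that died."""
    return deaths / cases


def moving_average(daily_counts: list[float], window: int = 7) -> list[float]:
    """Trailing moving average; emits a value only once a full window exists."""
    buffer: deque[float] = deque(maxlen=window)
    averages = []
    for count in daily_counts:
        buffer.append(count)
        if len(buffer) == window:
            averages.append(sum(buffer) / window)
    return averages


# Made-up counts for illustration.
print(odds_ratio(30, 70, 10, 90))
print(relative_risk(30, 70, 10, 90))
print(case_fatality_rate(5, 200))
print(moving_average([1, 2, 3, 4, 5, 6, 7, 8]))
```

The sketch omits confidence intervals and does not guard against zero cells. Both matter in real analyses, so rely on a vetted package for reportable results.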
R for Public Health
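R is widely used for public health analytics. The example below computes crude and age-adjusted incidence rates:

```r
# Example: crude and age-adjusted incidence rates per 100,000
library(epitools)
library(tidyverse)

# Assumes cancer_data has one row per case with the stratum population
# repeated on each row, and us_std_pop is a data frame with columns
# age_group and std_pop (standard population counts).
# Age groups with zero cases must be added back before adjusting.
rates <- cancer_data %>%
  group_by(county, year, age_group) %>%
  summarize(cases = n(), population = first(population), .groups = "drop") %>%
  left_join(us_std_pop, by = "age_group") %>%
  group_by(county, year) %>%
  summarize(
    crude_rate = sum(cases) / sum(population) * 100000,
    # Direct age adjustment against the standard population
    adj_rate = ageadjust.direct(count = cases, pop = population,
                                stdpop = std_pop)[["adj.rate"]] * 100000,
    .groups = "drop"
  )
```

12.1.2.6 Data Platform Architecture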
Modern public health data systems benefit from structured data architectures that organize information from raw ingestion through analytics-ready outputs. The medallion architecture (Bronze → Silver → Gold) provides a framework for designing scalable, maintainable data platforms.
While often discussed in cloud contexts, medallion architecture works equally well on desktop computers and local servers. The key is the logical separation of data by refinement stage, not the specific technology.
“Medallion architecture” and “Bronze/Silver/Gold” are IT jargon unfamiliar to most public health professionals. When discussing data workflows with epidemiologists or program staff, use terms like “raw data,” “cleaned data,” and “final reports” instead. See the Terminology Dictionary for a complete translation guide.
Commercial vs. Open Source Data Platforms
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| Data Lake / Lakehouse | Databricks, Snowflake, Azure Synapse | Apache Spark + Delta Lake, Apache Iceberg, DuckDB |
| ETL/Orchestration | Azure Data Factory, Informatica, Talend | Apache Airflow, Dagster, Prefect, dbt |
| Data Catalog | Alation, Collibra | Apache Atlas, DataHub, Amundsen |
| Data Quality | Informatica DQ, Talend | Great Expectations, dbt tests, Soda |
Implementing Medallion Architecture
The medallion architecture can be implemented with various tool combinations, from enterprise cloud platforms to desktop applications:
| Layer | Purpose | Cloud/Server Options | Desktop/Local Options |
|---|---|---|---|
| Bronze | Raw data landing, preserve source fidelity | Object storage (S3, Azure Blob), PostgreSQL staging tables | File folders, SQLite database, Excel “Raw Data” sheets |
| Silver | Cleansing, standardization, deduplication | dbt transformations, Apache Spark, Python/pandas | Excel Power Query, Python scripts, Access queries |
| Gold | Analytics-ready datasets, aggregations | Dimensional models, materialized views, OLAP cubes | Pivot tables, final Excel reports, exported CSVs for tools |
You don’t need Databricks or Snowflake to implement medallion architecture. Even a well-organized folder structure with clear naming conventions implements the same principle:
```text
project/
├── 01_bronze/                 # Raw files as received
│   ├── lab_results_2024-01-15.csv
│   └── ehr_export_raw.xlsx
├── 02_silver/                 # Cleaned and standardized
│   ├── cases_cleaned.csv
│   └── patients_deduplicated.xlsx
└── 03_gold/                   # Ready for analysis/reporting
    ├── outbreak_line_list.xlsx
    └── monthly_summary_report.xlsx
```
Medallion architectures run well on modest infrastructure, such as a single PostgreSQL database with three schemas or even a set of organized Excel workbooks.
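The folder-based approach can be scripted with nothing more than the Python standard library. The sketch below is a minimal illustration, assuming a CSV with `case_id` and `county` columns. The file names, column names, cleaning rules, and sample rows are all hypothetical. The point it shows is that each layer reads from the one before it and never edits the raw file.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def to_silver(bronze_file: Path, silver_file: Path) -> None:
    """Clean and deduplicate a raw file without modifying the original."""
    seen: set[str] = set()
    with bronze_file.open(newline="") as src, silver_file.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["case_id", "county"])
        writer.writeheader()
        for row in csv.DictReader(src):
            case_id = row["case_id"].strip()
            if case_id in seen:  # keep only the first row per case_id
                continue
            seen.add(case_id)
            writer.writerow({"case_id": case_id,
                             "county": row["county"].strip().title()})


def to_gold(silver_file: Path, gold_file: Path) -> None:
    """Aggregate cleaned rows into a per-county summary."""
    with silver_file.open(newline="") as src:
        counts = Counter(row["county"] for row in csv.DictReader(src))
    with gold_file.open("w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["county", "cases"])
        writer.writerows(sorted(counts.items()))


# Demo in a temporary directory with synthetic rows.
root = Path(tempfile.mkdtemp())
for layer in ("01_bronze", "02_silver", "03_gold"):
    (root / layer).mkdir()

raw = root / "01_bronze" / "lab_results_raw.csv"
raw_text = "case_id,county\n1, adams \n1,Adams\n2,BROWN\n"
raw.write_text(raw_text)

silver = root / "02_silver" / "cases_cleaned.csv"
gold = root / "03_gold" / "county_summary.csv"
to_silver(raw, silver)
to_gold(silver, gold)
gold_rows = gold.read_text().splitlines()
print(gold_rows)
```

Because the Bronze file is left untouched, any cleaning rule can be changed later and the Silver and Gold outputs regenerated from the original data.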
Open Source Lakehouse Stack for Public Health
For organizations seeking data sovereignty and cost control:
| Component | Tool | Notes |
|---|---|---|
| Storage | MinIO or local filesystem | S3-compatible object storage |
| Table format | Delta Lake or Apache Iceberg | ACID transactions, time travel |
| Compute | Apache Spark or DuckDB | DuckDB for smaller workloads |
| Orchestration | Apache Airflow | Workflow scheduling |
| Transformation | dbt | SQL-based transformations |
| Quality | Great Expectations | Data validation |
| Catalog | DataHub | Metadata management |
Data Architecture for Different Scales
| Organization Size | Recommended Approach | Key Tools | Typical Staffing |
|---|---|---|---|
| Individual analyst | Organized folders with naming conventions | Excel, Python/R scripts, SQLite | Single epidemiologist or data manager handles all layers |
| Small program | Single PostgreSQL database with layered schemas | PostgreSQL, dbt, Python | 1-2 staff share responsibilities across layers |
| Medium health department | Data warehouse with ETL pipeline | PostgreSQL/Snowflake, Airflow, dbt | Dedicated data team with some role specialization |
| Large state/federal | Full lakehouse architecture | Spark/Databricks, Delta Lake, Airflow, dbt | Specialized roles: data engineers (Bronze/Silver), analysts (Silver/Gold), BI developers (Gold) |
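The "single database with layered schemas" approach can be prototyped without a server. The sketch below uses SQLite from the Python standard library. SQLite has no `CREATE SCHEMA`, so attached in-memory databases named `bronze`, `silver`, and `gold` stand in for PostgreSQL schemas. The table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Attached databases play the role of PostgreSQL schemas in this sketch.
for layer in ("bronze", "silver", "gold"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

# Bronze: land the data exactly as received (synthetic rows).
conn.execute("CREATE TABLE bronze.lab_results (case_id TEXT, county TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO bronze.lab_results VALUES (?, ?, ?)",
    [("1", " adams", "POS"), ("1", "Adams", "POS"),
     ("2", "brown ", "neg"), ("3", "Brown", "pos")],
)

# Silver: standardize text values and keep one row per case_id.
conn.execute("""
    CREATE TABLE silver.cases AS
    SELECT case_id,
           MIN(UPPER(TRIM(county))) AS county,
           MIN(UPPER(TRIM(result))) AS result
    FROM bronze.lab_results
    GROUP BY case_id
""")

# Gold: aggregate into a reporting table.
conn.execute("""
    CREATE TABLE gold.positives_by_county AS
    SELECT county, COUNT(*) AS positives
    FROM silver.cases
    WHERE result = 'POS'
    GROUP BY county
""")

summary = conn.execute(
    "SELECT county, positives FROM gold.positives_by_county ORDER BY county"
).fetchall()
print(summary)
```

In PostgreSQL the same statements would target real schemas, and a tool such as dbt could manage the Silver and Gold queries as versioned models.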
CancerSurv implements a medallion architecture using open source tools:
| Layer | Implementation | Gold Layer Outputs |
|---|---|---|
| Bronze | PostgreSQL raw schema; HL7 messages stored as JSON; CSV uploads preserved verbatim | N/A |
| Silver | PostgreSQL staging schema; dbt models for deduplication and ICD-O-3 standardization | N/A |
| Gold | PostgreSQL analytics schema; pre-computed incidence rates, survival metrics, NPCR submission tables | Line lists for case follow-up, incidence reports, survival dashboards |
| Orchestration | Apache Airflow schedules daily Bronze→Silver→Gold pipeline | N/A |
| Quality | Great Expectations validates data at Silver layer before promotion to Gold | N/A |
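The idea of validating Silver data before promoting it to Gold can be shown without any particular library. The sketch below is plain Python. It does not use the Great Expectations API, and the check names, field names, and year bounds are hypothetical. It blocks promotion whenever any check fails.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class Check:
    name: str
    passes: Callable[[Row], bool]


def _year_is_plausible(row: Row) -> bool:
    year = row.get("diagnosis_year")
    return isinstance(year, int) and 1900 <= year <= 2100


# Hypothetical rules; real ones would come from your registry's data standards.
CHECKS = [
    Check("case_id present", lambda row: bool(row.get("case_id"))),
    Check("diagnosis_year plausible", _year_is_plausible),
]


def validate(rows: list[Row], checks: list[Check] = CHECKS) -> dict[str, int]:
    """Return the number of failing rows for each check."""
    return {check.name: sum(not check.passes(row) for row in rows) for check in checks}


def promote_to_gold(rows: list[Row]) -> list[Row]:
    """Return the rows for Gold processing only if every check passes."""
    failures = {name: count for name, count in validate(rows).items() if count}
    if failures:
        raise ValueError(f"Silver data failed validation: {failures}")
    return list(rows)


silver_rows = [
    {"case_id": "A1", "diagnosis_year": 2021},
    {"case_id": "A2", "diagnosis_year": 2022},
]
gold_rows = promote_to_gold(silver_rows)
print(validate(silver_rows))
```

Raising an error stops the pipeline instead of silently dropping bad rows. This is a deliberate choice here, because a gap in a surveillance report is easier to notice and explain than a subtly wrong number.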
12.1.2.7 Visualization
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| Dashboards | Tableau, Power BI | R Shiny, Dash (Python), Apache Superset |
| Static Visualization | Tableau, Excel | R (ggplot2), Python (matplotlib, plotly) |
| Interactive Charts | Tableau, Power BI | Plotly, Highcharts (free for non-commercial) |
R Shiny for Public Health Dashboards
R Shiny enables interactive dashboards without JavaScript expertise:
- Free and open source
- Integrates with R analysis pipelines
- Can be deployed on-premise or Shinyapps.io
- Many public health templates available
12.1.2.8 GIS and Mapping
| Capability | Commercial Options | OSS/PH Options |
|---|---|---|
| Desktop GIS | ArcGIS Pro | QGIS |
| Web Mapping | ArcGIS Online, Mapbox | Leaflet, OpenLayers |
| Spatial Analysis | ArcGIS (Esri) | QGIS, R (sf package), PostGIS |
| Geocoding | Google Maps Platform, Esri | Nominatim, US Census Geocoder |
QGIS for Disease Mapping
QGIS is essential for spatial epidemiology:
- Free, cross-platform
- Full-featured GIS capabilities
- Disease mapping and cluster detection
- Integrates with R for spatial statistics
- Active public health user community
CancerSurv analytics stack:
| Function | Tool | Rationale |
|---|---|---|
| Case data storage | PostgreSQL | Open source, HIPAA-capable |
| ETL/Data pipeline | Apache Airflow | Orchestration of data flows |
| Statistical analysis | R (tidyverse, survival) | Standard for cancer epidemiology |
| Dashboards | R Shiny | Interactive, deployable on-premise |
| Geographic mapping | QGIS + Leaflet | Cancer cluster visualization |
| Ad-hoc queries | Apache Superset | Self-service for epidemiologists |
12.1.2.9 Data Standards and Interoperability
| Standard | Commercial Tools | OSS Tools |
|---|---|---|
| HL7 FHIR | Rhapsody, Corepoint | HAPI FHIR, LinuxForHealth |
| HL7 v2.x | Rhapsody, Mirth | Mirth Connect (open source), HAPI |
| CDA/C-CDA | Various EHR vendors | MDHT, Reference CDA |
Mirth Connect
Mirth Connect is widely used in public health for health information exchange:
- Historically open source (maintained by NextGen Healthcare); verify the license terms of the version you plan to deploy
- HL7 v2, FHIR, CDA support
- Visual interface builder
- Used by many state health departments
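To make the HL7 v2 format concrete, the sketch below splits a message into segments and fields by hand. It assumes the default delimiters (`|` for fields, `^` for components) and ignores escape sequences and field repetitions, so it is a teaching aid, not a substitute for an integration engine. The sample message is synthetic and the function name is illustrative.

```python
def parse_hl7_v2(message: str) -> dict[str, list[list[str]]]:
    """Split an HL7 v2 message into fields, grouped by segment id."""
    segments: dict[str, list[list[str]]] = {}
    # Segments are separated by carriage returns; tolerate newlines too.
    for line in message.replace("\n", "\r").split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


# Synthetic message, not real patient data.
SAMPLE = "\r".join([
    "MSH|^~\\&|LAB|FACILITY|REGISTRY|STATE|202401151200||ORU^R01|MSG0001|P|2.5.1",
    "PID|1||12345^^^FACILITY^MR||DOE^JANE||19700101|F",
    "OBX|1|ST|TEST^Example result||Positive",
])

parsed = parse_hl7_v2(SAMPLE)
msh = parsed["MSH"][0]
pid = parsed["PID"][0]

# In MSH the field separator itself counts as MSH-1, so MSH-9 is at index 8.
message_type = msh[8]
# In other segments, field N is at index N: PID-5 is the patient name.
family_name, given_name = pid[5].split("^")[:2]
birth_date = pid[7]

print(message_type, family_name, given_name, birth_date)
```

The off-by-one indexing in MSH is a common source of bugs in hand-written parsers, which is one reason teams prefer an engine or library that handles it for them.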
12.1.3 Building Your Stack
12.1.3.1 Small Public Health Program
| Function | Recommended Tool | Notes |
|---|---|---|
| Project Management | Trello or Taiga | Free tier sufficient |
| Documentation | GitHub Wiki or BookStack | Version-controlled |
| Diagramming | diagrams.net | Free, export to any format |
| Data Collection | REDCap | Standard for research |
| Analysis | R + RStudio | Free, extensive packages |
| Visualization | R Shiny or Excel | Depends on technical capacity |
12.1.3.2 Large State Health Department
| Function | Recommended Tool | Notes |
|---|---|---|
| Project Management | Azure DevOps or OpenProject | Enterprise scale |
| Documentation | Confluence or BookStack | Team collaboration |
| Requirements | Jira or GitHub | Integrated with development |
| Data Collection | REDCap + DHIS2 | Research + program monitoring |
| Data Platform | PostgreSQL + Airflow | Scalable, HIPAA-capable |
| Analysis | R + Python | Comprehensive capabilities |
| Visualization | R Shiny + Superset | Dashboards + self-service |
| GIS | QGIS + PostGIS | Full spatial capabilities |
| Integration | Mirth Connect | HL7/FHIR integration |
12.1.4 Considerations for Tool Selection
12.1.4.1 Total Cost of Ownership
Free software is not always cheaper:
| Cost Factor | Commercial | Open Source |
|---|---|---|
| License fees | Yes | No |
| Implementation | Vendor/partner | Internal/consultant |
| Training | Often included | Self-directed or purchased |
| Support | Included in license | Community or purchased |
| Customization | Limited | Unlimited but costly |
| Infrastructure | Cloud included or on-prem | You manage |
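A simple multi-year calculation makes the cost factors above comparable. The sketch below is a worked example under stated assumptions. Every figure is a placeholder chosen for illustration, not a market price, and the `CostProfile` fields are one possible way to group the cost factors.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    license_per_year: float
    implementation_once: float
    training_per_year: float
    support_per_year: float
    infrastructure_per_year: float
    staff_hours_per_year: float  # internal maintenance effort
    hourly_rate: float

    def total(self, years: int) -> float:
        """One-time implementation plus recurring costs over the period."""
        recurring = (
            self.license_per_year
            + self.training_per_year
            + self.support_per_year
            + self.infrastructure_per_year
            + self.staff_hours_per_year * self.hourly_rate
        )
        return self.implementation_once + recurring * years


# Placeholder figures for illustration only.
commercial = CostProfile(
    license_per_year=20_000, implementation_once=10_000, training_per_year=0,
    support_per_year=0, infrastructure_per_year=0,
    staff_hours_per_year=50, hourly_rate=60,
)
open_source = CostProfile(
    license_per_year=0, implementation_once=15_000, training_per_year=3_000,
    support_per_year=5_000, infrastructure_per_year=4_000,
    staff_hours_per_year=300, hourly_rate=60,
)

for label, profile in (("Commercial", commercial), ("Open source", open_source)):
    print(f"{label}: 5-year total = {profile.total(5):,.0f}")
```

With these placeholder inputs the open source option costs more over five years, driven mainly by staff time. Different inputs can reverse the result, so run the calculation with your own estimates.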
12.1.4.2 Compliance and Security
| Consideration | Commercial | Open Source |
|---|---|---|
| HIPAA compliance | Vendor may sign a business associate agreement (BAA) and document its controls | Your responsibility to configure |
| SOC 2 certification | Common | Rare; your responsibility |
| Security updates | Vendor manages | You monitor and apply |
| Audit trails | Built-in | May require configuration |
12.1.4.3 Sustainability
Consider long-term viability:
- Commercial: Vendor may be acquired, change pricing, sunset product
- Open Source: Community may lose momentum; check activity levels
- Hybrid: Consider tools with both commercial and open source options
12.1.5 Summary
The choice between commercial and open source tools depends on your context: budget, technical capacity, data sensitivity, and compliance requirements. Public health has excellent open source options, particularly for data collection (REDCap), analysis (R), mapping (QGIS), and integration (Mirth Connect). Evaluate total cost of ownership, not just license fees.