16 Glossary

Key terms used throughout this book, organized alphabetically.

16.1 A

API (Application Programming Interface)
A set of rules and protocols that allows software programs to communicate with each other. In public health automation, APIs are commonly used to send data to or receive data from services like Google Sheets, REDCap, or CDC data portals.
Automation
The use of scripts, tools, or software to perform tasks that would otherwise require manual effort. In this book, automation refers primarily to R and Python scripts that handle repetitive data and file operations.

16.2 B

Batch processing
Running a single operation across many files or records at once, rather than processing them one at a time. Example: renaming 500 files using a script instead of renaming each manually.
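
A minimal Python sketch of the renaming example, assuming the files to rename all end in `.csv` and should become `prefix_001.csv`, `prefix_002.csv`, and so on. The function name `batch_rename` and the naming scheme are illustrative, not a method prescribed by this book.

```python
from pathlib import Path


def batch_rename(folder, prefix, suffix=".csv"):
    """Rename every file ending in `suffix` to prefix_001.csv, prefix_002.csv, ...

    Files are processed in sorted order so the numbering is predictable.
    Returns a list of (old_name, new_name) pairs as a simple audit log.
    """
    folder = Path(folder)
    # Collect the full list first so files renamed during the loop are not revisited
    files = sorted(p for p in folder.iterdir() if p.is_file() and p.suffix == suffix)
    renamed = []
    for i, path in enumerate(files, start=1):
        new_path = path.with_name(f"{prefix}_{i:03d}{suffix}")
        # Refuse to overwrite an unrelated file that already has the target name
        if new_path != path and new_path.exists():
            raise FileExistsError(new_path)
        path.rename(new_path)
        renamed.append((path.name, new_path.name))
    return renamed
```

The same loop handles 5 files or 500; only the folder contents change.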

16.3 D

Data pipeline
A sequence of automated steps that moves data from its source through cleaning, transformation, and analysis to a final output such as a report or dashboard.
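
The stages in this definition can be sketched as a chain of small Python functions, one per stage. The county names, case counts, and function names are hypothetical. A real pipeline would read from a file, database, or API and produce a report or dashboard rather than a string.

```python
import csv
import io

# Hypothetical raw extract with messy names and a missing value
RAW_CSV = """county,cases
 Adams ,12
Baker,
adams,5
Cedar,7
"""


def extract(text):
    """Source step: read raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning step: standardize county names and drop rows with no case count."""
    cleaned = []
    for row in rows:
        cases = (row["cases"] or "").strip()
        if not cases:
            continue
        cleaned.append({"county": row["county"].strip().title(), "cases": int(cases)})
    return cleaned


def transform(rows):
    """Transformation step: total cases by county."""
    totals = {}
    for row in rows:
        totals[row["county"]] = totals.get(row["county"], 0) + row["cases"]
    return totals


def analyze(totals):
    """Analysis step: rank counties by total cases, highest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def report(ranked):
    """Output step: format a plain-text summary."""
    return "\n".join(f"{county}: {cases}" for county, cases in ranked)


def run_pipeline(text):
    return report(analyze(transform(clean(extract(text)))))


print(run_pipeline(RAW_CSV))
# prints:
# Adams: 17
# Cedar: 7
```

Keeping each stage in its own function makes it possible to test or replace one step without touching the others.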
Deduplication
The process of identifying and resolving duplicate records in a dataset, using either deterministic matching (exact agreement on selected fields) or probabilistic matching (scoring how likely two records are to refer to the same entity).
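
A minimal Python sketch with hypothetical field names (`first`, `last`, `dob`). `deduplicate` is deterministic: it drops any record whose normalized name and date of birth exactly match an earlier record. `possible_duplicates` uses `difflib.SequenceMatcher` as a crude similarity screen. It is much simpler than a true probabilistic linkage model, so the pairs it flags still need human review.

```python
from difflib import SequenceMatcher


def match_key(record):
    """Deterministic key: normalized last name, first name, and date of birth."""
    return (record["last"].strip().lower(), record["first"].strip().lower(), record["dob"])


def deduplicate(records):
    """Keep the first record for each exact key; return (kept, removed)."""
    seen = set()
    kept, removed = [], []
    for rec in records:
        key = match_key(rec)
        if key in seen:
            removed.append(rec)
        else:
            seen.add(key)
            kept.append(rec)
    return kept, removed


def possible_duplicates(records, threshold=0.85):
    """Flag same-DOB pairs whose names are similar but not identical.

    A crude string-similarity screen, NOT a real probabilistic linkage model.
    The 0.85 threshold is an arbitrary illustration. Compares every pair (O(n^2)).
    """
    flagged = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if a["dob"] != b["dob"]:
                continue
            name_a = f"{a['first'].strip()} {a['last'].strip()}".lower()
            name_b = f"{b['first'].strip()} {b['last'].strip()}".lower()
            score = SequenceMatcher(None, name_a, name_b).ratio()
            if threshold <= score < 1.0:
                flagged.append((i, j, round(score, 2)))
    return flagged
```

Running `deduplicate` first and `possible_duplicates` on the survivors separates the certain duplicates from the ones that need review.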

16.4 E

Edit rules
Validation checks applied to data to ensure logical consistency. Example: “diagnosis date must precede treatment date” or “age at diagnosis must be between 0 and 120.”
Exception report
A report that lists records failing one or more validation checks, enabling targeted review and correction.
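
The two ideas fit together: edit rules are the checks, and the exception report lists the records that fail them. Below is a minimal Python sketch that uses the two example rules from the edit-rules entry. The field names and sample records are hypothetical, and the sketch assumes every field is present (a missing date would raise an error rather than be reported).

```python
from datetime import date

# Each edit rule pairs a readable name with a check that returns True when a record passes.
# Field names (record_id, diagnosis_date, ...) are hypothetical.
EDIT_RULES = [
    ("diagnosis date must precede treatment date",
     lambda r: r["diagnosis_date"] < r["treatment_date"]),
    ("age at diagnosis must be between 0 and 120",
     lambda r: 0 <= r["age_at_diagnosis"] <= 120),
]

SAMPLE_RECORDS = [
    {"record_id": 1, "diagnosis_date": date(2024, 1, 5),
     "treatment_date": date(2024, 2, 1), "age_at_diagnosis": 54},
    {"record_id": 2, "diagnosis_date": date(2024, 3, 10),
     "treatment_date": date(2024, 3, 1), "age_at_diagnosis": 130},
]


def exception_report(records, rules=EDIT_RULES):
    """List every (record, failed rule) pair so reviewers can target corrections."""
    exceptions = []
    for rec in records:
        for name, passes in rules:
            if not passes(rec):
                exceptions.append({"record_id": rec["record_id"], "failed_rule": name})
    return exceptions


for row in exception_report(SAMPLE_RECORDS):
    print(row["record_id"], row["failed_rule"])
# prints:
# 2 diagnosis date must precede treatment date
# 2 age at diagnosis must be between 0 and 120
```

Keeping the rules in a list means a new check can be added without changing the reporting code.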

16.5 G

GPS format (Given-Person-Should)
A submission format for describing automation needs: “Given [context], the [role] should [action] to [outcome].”

16.6 L

Local-first
A design philosophy prioritizing solutions that run on the user’s own computer or local server, without requiring cloud services, AI subscriptions, or remote API access.

16.7 P

Parameterized report
A report template (typically in Quarto or R Markdown) that accepts input parameters (e.g., jurisdiction, time period) and generates customized output for each set of parameters.
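
A minimal Python sketch that renders one report per jurisdiction by calling the Quarto CLI, passing each value with `-P key:value` and naming each output with `--output`. The template name `surveillance_report.qmd`, the parameter names, and the jurisdiction list are hypothetical. The template must declare those parameters: in a `params:` YAML block for the knitr (R) engine, or in a cell tagged `parameters` for the Jupyter (Python) engine.

```python
import shlex
import subprocess

# Hypothetical list of jurisdictions; one report is produced for each
JURISDICTIONS = ["North County", "South County", "East County"]


def build_render_commands(template, jurisdictions, period):
    """Build one `quarto render` command per jurisdiction, each with its own output file."""
    commands = []
    for j in jurisdictions:
        slug = j.lower().replace(" ", "_")
        commands.append([
            "quarto", "render", template,
            "-P", f"jurisdiction:{j}",
            "-P", f"period:{period}",
            "--output", f"report_{slug}_{period}.html",
        ])
    return commands


def render_all(commands, dry_run=True):
    """Print each command when dry_run is True; otherwise run it and stop on the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


render_all(build_render_commands("surveillance_report.qmd", JURISDICTIONS, "2024-Q1"))
```

Defaulting to a dry run lets you check the generated commands before producing dozens of files.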
Posit
The company (formerly RStudio) behind RStudio, Quarto, Shiny, and related open source tools widely used in data science and public health analytics. Website: posit.co

16.8 Q

Quarto
An open source publishing system for creating reproducible documents, reports, presentations, and websites. Supports R, Python, Julia, and Observable. Website: quarto.org

16.9 R

Reproducible analysis
An analysis that can be re-run by anyone with access to the code and data, producing the same results. Quarto and R Markdown facilitate reproducibility by combining code and narrative in a single document.
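
A small, concrete part of reproducibility is making any random step repeatable. This Python sketch draws a random audit sample of record IDs. Because it uses a fixed, recorded seed, re-running it on the same data returns the same sample. The function name and the idea of an audit sample are illustrative assumptions, not part of the definition above.

```python
import random


def draw_audit_sample(record_ids, k, seed):
    """Draw k record IDs at random, identically on every run with the same seed.

    Sorting the input first means the result does not depend on the order
    (or container type) in which the IDs arrive.
    """
    rng = random.Random(seed)  # private generator: unaffected by other random calls
    return sorted(rng.sample(sorted(record_ids), k))
```

Recording the seed in the report itself, for example as a parameter, lets a reviewer regenerate the exact sample later.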

16.10 S

Shiny
An R framework for building interactive web applications. In the clinic context, Shiny apps are provided as code that users run locally; hosting is not included. Website: shiny.posit.co
Situational Protocol format
A submission format for describing automation needs: “When [trigger], the [process/system] shall [action] within [constraint].”

16.11 U

User Story format
A submission format for describing automation needs: “As a [role], I want [action], so that [outcome].”
Note: Placeholder

This glossary will be expanded as new terms are introduced in the solutions library and mindset chapters.