8  Requirements & Data Analysis

8.1 Requirements Analysis & Data Analysis

Raw elicitation findings must be organized, analyzed, and specified in detail. In business analysis, this is Requirements Analysis and Design Definition. In public health, it maps to Data Analysis and Logic Model Development. Both processes transform unstructured input into structured, actionable specifications.

8.1.1 The Dual Framework

| BA Perspective | PH Perspective |
| --- | --- |
| Requirements Analysis | Data Analysis |
| Requirements Specification | Logic Model / Theory of Change |
| Functional Requirements | Program Activities |
| Non-Functional Requirements | Implementation Characteristics |
| Data Requirements | Case Definitions, Data Dictionaries |
| Business Rules | Clinical Guidelines, Protocols |

8.1.2 Types of Requirements

8.1.2.1 Functional Requirements

BA Definition: What the system must do. Capabilities, features, functions.

PH Equivalent: Program activities, intervention components, service delivery specifications.

Note: CancerSurv Example

Functional Requirement (BA format):

FR-101: The system shall allow users to search for cases by patient name, medical record number, or social security number.

Program Activity (PH format):

Cancer registrars will abstract and code incident cases from hospital pathology reports within 6 months of diagnosis date.

Both describe “what happens” but at different levels of specificity.

8.1.2.2 Non-Functional Requirements (NFRs)

BA Definition: Quality attributes, constraints, performance characteristics.

PH Equivalent: Implementation characteristics (per CFIR framework).

| NFR Category | BA Focus | PH Focus (CFIR Domain) |
| --- | --- | --- |
| Performance | Response time, throughput | Efficiency of intervention delivery |
| Security | Access control, encryption | HIPAA compliance, trust |
| Scalability | Growth capacity | Outbreak surge response |
| Usability | User interface design | Complexity, ease of adoption |
| Reliability | Uptime, fault tolerance | Service continuity |
| Interoperability | API standards, data exchange | Health information exchange |

Note: CancerSurv Example

NFR (BA format):

NFR-201: The system shall maintain 99.9% uptime during business hours (8 AM to 6 PM Eastern).

Implementation Characteristic (PH format):

The CancerSurv platform must demonstrate high reliability to maintain registrar confidence and ensure continuous data collection, critical during cancer awareness campaigns when reporting volumes increase.
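To make NFR-201 verifiable, it can help to translate the percentage into a downtime budget. The sketch below is only illustrative: it assumes business hours apply Monday through Friday for 52 weeks a year, which NFR-201 does not state, and the function name and parameters are hypothetical.

```python
def downtime_budget_minutes(uptime_pct: float, hours_per_day: float,
                            days_per_week: int, weeks: int = 52) -> float:
    """Return the downtime allowed (in minutes) over the covered window."""
    covered_minutes = hours_per_day * 60 * days_per_week * weeks
    return covered_minutes * (1 - uptime_pct / 100)


# NFR-201: 99.9% uptime, 8 AM to 6 PM (10 hours per day).
# Assumption (not in the requirement): weekdays only, 52 weeks per year.
budget = downtime_budget_minutes(99.9, hours_per_day=10, days_per_week=5)
print(f"Allowed downtime per year: {budget:.0f} minutes")
```

Under these assumptions the budget is about 156 minutes (2.6 hours) per year. Stating the covered days explicitly in the requirement removes the need for this kind of guess.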

8.1.2.3 Data Requirements

Data specifications are central to both domains:

BA Data Model:

  • Entity-Relationship diagrams
  • Database schemas
  • Data dictionaries
  • Validation rules

PH Case Definitions:

  • Diagnostic criteria
  • Inclusion/exclusion criteria
  • Coding standards (ICD-O-3, TNM)
  • Data quality metrics

erDiagram
    PATIENT ||--o{ TUMOR : has
    TUMOR ||--o{ TREATMENT : receives
    TUMOR ||--|| DIAGNOSIS : "classified by"
    FACILITY ||--o{ TUMOR : reports
    
    PATIENT {
        string patient_id PK
        string name
        date birth_date
        string ssn
        string address
    }
    
    TUMOR {
        string tumor_id PK
        string patient_id FK
        date diagnosis_date
        string primary_site
        string histology
        string stage
    }
Figure 8.1: CancerSurv Simplified Data Model
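As a rough illustration of how the entities in Figure 8.1 might translate into validation rules, the sketch below mirrors the PATIENT and TUMOR attributes as Python dataclasses and checks the foreign-key relationship between them. The helper `orphan_tumors` and all sample values are hypothetical and synthetic; this is one possible reading of the diagram, not the CancerSurv schema itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patient:
    patient_id: str  # PK
    name: str
    birth_date: date
    ssn: str
    address: str


@dataclass(frozen=True)
class Tumor:
    tumor_id: str    # PK
    patient_id: str  # FK -> Patient.patient_id
    diagnosis_date: date
    primary_site: str
    histology: str
    stage: str


def orphan_tumors(patients, tumors):
    """Return tumors whose patient_id matches no known patient (FK violation)."""
    known = {p.patient_id for p in patients}
    return [t for t in tumors if t.patient_id not in known]


# Synthetic records for illustration only.
patients = [Patient("P1", "Test Patient", date(1960, 1, 1), "000-00-0000", "1 Example St")]
tumors = [
    Tumor("T1", "P1", date(2023, 5, 1), "C50.9", "8500/3", "II"),
    Tumor("T2", "P9", date(2023, 6, 1), "C34.1", "8140/3", "I"),  # P9 does not exist
]
print([t.tumor_id for t in orphan_tumors(patients, tumors)])  # ['T2']
```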

8.1.2.4 Data Architecture Requirements

Modern public health data systems require an architecture that handles data from ingestion through analytics. The medallion architecture provides a framework for specifying data flow requirements across three progressive layers: Bronze (raw), Silver (cleansed), and Gold (curated).

Specifying Requirements by Layer

When documenting data requirements, specify which layer each requirement applies to:

| Requirement Type | Bronze Layer | Silver Layer | Gold Layer |
| --- | --- | --- | --- |
| Primary Focus | Completeness, lineage | Accuracy, consistency | Timeliness, usability |
| Data State | Raw, as-received | Cleansed, standardized | Aggregated, analytics-ready |
| Schema | Schema-on-read (flexible) | Enforced schema | Dimensional models |
| Retention | Long-term archive | Medium-term | Purpose-specific |

Note: CancerSurv Example

Bronze Layer Requirements:

  • REQ-DATA-001: The system shall ingest HL7 v2.x ADT messages from hospital interfaces within 15 minutes of receipt
  • REQ-DATA-002: The system shall preserve original message content with timestamp and source metadata for audit purposes
  • REQ-DATA-003: The system shall support ingestion of CSV files from facilities without HL7 capability

Silver Layer Requirements:

  • REQ-DATA-010: The system shall deduplicate patient records using probabilistic matching (≥95% precision)
  • REQ-DATA-011: The system shall map incoming diagnosis codes to ICD-O-3 standard within 24 hours
  • REQ-DATA-012: The system shall apply NAACCR edit checks and flag records failing validation

Gold Layer Requirements:

  • REQ-DATA-020: The system shall generate NPCR-compliant annual submission files by January 31
  • REQ-DATA-021: The system shall calculate age-adjusted incidence rates by county, updated monthly
  • REQ-DATA-022: The system shall provide self-service query access for approved epidemiologists
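A Gold-layer requirement such as REQ-DATA-021 is easier to test when the calculation it names is spelled out. The sketch below shows direct standardization, one common way to compute an age-adjusted rate; the requirement does not say which method CancerSurv uses, so treat that as an assumption. The counts and standard-population weights are made up for illustration and are not real standard population proportions.

```python
def age_adjusted_rate(strata, per=100_000):
    """Direct standardization: weight each age-specific rate by the
    standard population's share of that age group."""
    total_weight = sum(s["std_weight"] for s in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return per * sum(s["cases"] / s["population"] * s["std_weight"] for s in strata)


# Made-up counts for one hypothetical county; weights are illustrative only.
county = [
    {"age": "0-49",  "cases": 10,  "population": 50_000, "std_weight": 0.6},
    {"age": "50-69", "cases": 60,  "population": 30_000, "std_weight": 0.3},
    {"age": "70+",   "cases": 100, "population": 20_000, "std_weight": 0.1},
]
rate = age_adjusted_rate(county)
crude = 100_000 * sum(s["cases"] for s in county) / sum(s["population"] for s in county)
print(f"Age-adjusted: {rate:.1f} per 100,000; crude: {crude:.1f} per 100,000")
```

With these made-up inputs the adjusted rate (122 per 100,000) is lower than the crude rate (170 per 100,000) because the hypothetical county is older than the hypothetical standard population. A requirement should name the standard population so that results are reproducible.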

Data Lineage and Traceability

Public health reporting requires demonstrable data provenance. Requirements should specify:

  • How data flows from source to final output
  • Which transformations are applied at each layer
  • How to trace any Gold-layer value back to its Bronze-layer source

This is equivalent to the “chain of custody” concept in laboratory settings.

flowchart LR
    subgraph Bronze["Bronze (Raw)"]
        B1["HL7 Messages"]
        B2["Lab Reports"]
        B3["Vital Records"]
    end
    
    subgraph Silver["Silver (Cleansed)"]
        S1["Deduplicated<br/>Patient Records"]
        S2["Standardized<br/>Case Abstracts"]
    end
    
    subgraph Gold["Gold (Curated)"]
        G1["Incidence<br/>Reports"]
        G2["Analytics<br/>Dashboards"]
        G3["Research<br/>Datasets"]
    end
    
    B1 --> S1
    B2 --> S1
    B3 --> S1
    S1 --> S2
    S2 --> G1
    S2 --> G2
    S2 --> G3
Figure 8.2: Data Flow Through Medallion Layers
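The traceability requirement (tracing any Gold-layer value back to its Bronze-layer source) can be sketched as a walk over parent links. The lineage store, record IDs, and function name below are hypothetical; a production system would likely keep this metadata in a catalog or lineage tool rather than a dictionary.

```python
# Hypothetical lineage store: record ID -> (layer, IDs it was derived from).
LINEAGE = {
    "gold:incidence-2023-county-A": ("gold", ["silver:case-17", "silver:case-18"]),
    "silver:case-17": ("silver", ["bronze:hl7-0001", "bronze:csv-0042"]),
    "silver:case-18": ("silver", ["bronze:hl7-0002"]),
    "bronze:hl7-0001": ("bronze", []),
    "bronze:hl7-0002": ("bronze", []),
    "bronze:csv-0042": ("bronze", []),
}


def bronze_sources(record_id, lineage=LINEAGE):
    """Walk parent links and return the sorted Bronze record IDs behind a record."""
    layer, parents = lineage[record_id]
    if layer == "bronze":
        return [record_id]
    found = set()
    for parent in parents:
        found.update(bronze_sources(parent, lineage))
    return sorted(found)


print(bronze_sources("gold:incidence-2023-county-A"))
# ['bronze:csv-0042', 'bronze:hl7-0001', 'bronze:hl7-0002']
```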

8.1.2.5 Business Rules / Clinical Guidelines

Rules governing system behavior and data processing:

| BA Business Rule | PH Clinical Guideline |
| --- | --- |
| “Order cannot be placed if credit limit exceeded” | “Case is reportable if primary site is within state jurisdiction” |
| “Discount applies if quantity > 100” | “Stage is unknown if pathology report unavailable within 4 months” |
| “Manager approval required for refunds > $500” | “Multiple primary rules apply per SEER guidelines” |
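Rules like these are straightforward to express as small, testable functions. The sketch below encodes one reading of the “stage is unknown” guideline, measuring the 4 months from the diagnosis date; the guideline text does not say where the clock starts, so that is an assumption. The function names and the day-clamping shortcut in `add_months` are also illustrative simplifications.

```python
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months (day clamped to 28 to stay valid)."""
    index = d.month - 1 + months
    return date(d.year + index // 12, index % 12 + 1, min(d.day, 28))


def derive_stage(reported_stage: str, diagnosis_date: date,
                 pathology_received: Optional[date]) -> str:
    """Return 'Unknown' if no pathology report arrived within 4 months of diagnosis."""
    deadline = add_months(diagnosis_date, 4)
    if pathology_received is None or pathology_received > deadline:
        return "Unknown"
    return reported_stage


print(derive_stage("II", date(2023, 1, 15), date(2023, 3, 1)))   # II
print(derive_stage("II", date(2023, 1, 15), None))               # Unknown
print(derive_stage("II", date(2023, 11, 15), date(2024, 4, 1)))  # Unknown
```

Keeping each rule in its own function makes it easy to give the rule an ID in a business rules catalog and link it to test cases.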

8.1.3 Data Standards as Primary Requirements

In commercial software projects, data standards (file formats, API specifications, integration protocols) are often treated as technical details to be resolved by developers during implementation. In public health IT, this approach fails.

Data standards are primary business requirements, not optional technical details.

Health information systems operate within a regulatory and interoperability landscape where specific standards are mandated, not merely preferred. These standards should be identified and documented early, during requirements analysis, not deferred to design or implementation.

| Standard | Purpose | Requirement Implication |
| --- | --- | --- |
| HIPAA | Privacy and security | Security architecture, access controls, audit logging |
| HL7 v2 | Message-based data exchange | Interface specifications for lab results, ADT events |
| HL7 FHIR | Modern API-based exchange | RESTful API design for EHR integration |
| USCDI | Federal data interoperability | Required data classes for ONC certification |
| ICD-10 / ICD-O-3 | Diagnosis and oncology coding | Validation rules, lookup tables, code mapping |
| SNOMED CT | Clinical terminology | Concept mapping specifications |
| LOINC | Laboratory test coding | Interface specifications for electronic lab reporting |
| NAACCR | Cancer registry standards | Data dictionary, edit checks, submission formats |

Warning: Common Pitfall

When data standards are not identified as requirements, projects encounter costly surprises during integration testing. A system that functions correctly in isolation may fail when connected to external systems that expect specific data formats, codes, or protocols.

For business analysts entering public health IT: treat data standards as "Must Have" requirements from day one. Interview stakeholders about external data exchanges early, and document the specific standards each interface requires.

Note: CancerSurv Example

Standards-Based Requirements for CancerSurv:

| Standard | CancerSurv Requirement |
| --- | --- |
| HIPAA | All PHI encrypted at rest and in transit; role-based access; 6-year audit log retention |
| HL7 FHIR | Patient, Condition, and Observation resources for hospital EHR integration |
| NAACCR v24 | All required data items; automated EDITS validation; annual submission file generation |
| ICD-O-3 | Validated primary site and histology codes with cross-validation rules |
| LOINC | Mapping table for incoming electronic pathology reports |

These standards-based requirements appeared in the CancerSurv requirements specification alongside functional requirements, with the same priority and traceability as any other "Must Have" item.
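To show how a standards-based requirement becomes a concrete validation rule, the sketch below checks only the written format of ICD-O-3 codes: topography as `Cnn.n` and morphology as a four-digit histology plus a one-digit behavior code (`nnnn/n`). It does not confirm that a code exists or that a site and histology combination is plausible; a real validator would look codes up in the ICD-O-3 tables and apply NAACCR edits. The function name is hypothetical.

```python
import re
from typing import List

# Format-only patterns; membership in the ICD-O-3 code lists is not checked.
TOPOGRAPHY = re.compile(r"C\d{2}\.\d")  # e.g. C50.9
MORPHOLOGY = re.compile(r"\d{4}/\d")    # e.g. 8500/3


def format_errors(primary_site: str, histology: str) -> List[str]:
    """Return a list of format problems; an empty list means both codes look well-formed."""
    errors = []
    if not TOPOGRAPHY.fullmatch(primary_site):
        errors.append(f"primary_site {primary_site!r} is not in Cnn.n form")
    if not MORPHOLOGY.fullmatch(histology):
        errors.append(f"histology {histology!r} is not in nnnn/n form")
    return errors


print(format_errors("C50.9", "8500/3"))  # []
print(format_errors("50.9", "8500"))     # two format errors
```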

8.1.4 The Logic Model as Requirements Framework

Public health uses the Logic Model to specify program components. This structure maps directly to requirements categories:

flowchart LR
    subgraph Inputs[" "]
        I["**Inputs**<br/>(Resources)<br/>───────<br/>Funding<br/>Staff<br/>Data feeds<br/>Infrastructure"]
    end
    
    subgraph Activities[" "]
        A["**Activities**<br/>(Functions)<br/>───────<br/>Case abstraction<br/>Data quality<br/>Reporting<br/>Analytics"]
    end
    
    subgraph Outputs[" "]
        O["**Outputs**<br/>(Deliverables)<br/>───────<br/>Case records<br/>Quality reports<br/>NPCR submissions<br/>Dashboards"]
    end
    
    subgraph Outcomes[" "]
        OC["**Outcomes**<br/>(Success Metrics)<br/>───────<br/>95% completeness<br/>Timely reporting<br/>User satisfaction"]
    end
    
    I --> A --> O --> OC
Figure 8.3: Logic Model Components Mapped to Requirements

| Logic Model Component | Requirements Category |
| --- | --- |
| Inputs | Constraints, Assumptions, Dependencies |
| Activities | Functional Requirements |
| Outputs | System Deliverables, Features |
| Outcomes | Success Metrics, Acceptance Criteria |
| Impact | Strategic Objectives, Business Value |

8.1.5 Prioritization

8.1.5.1 Methods for Ranking Requirements

Not all requirements are equal. Prioritization ensures critical needs are addressed first:

MoSCoW Method:

  • Must have: Essential for go-live
  • Should have: Important but not critical
  • Could have: Desirable if time permits
  • Won’t have: Out of scope for this release

Weighted Scoring:

Assign weights to criteria (business value, regulatory requirement, user impact) and score each requirement.
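As a minimal sketch of weighted scoring, the example below combines the three criteria named above into a single score per requirement. The weights and the 1 to 5 scores are invented for illustration; in practice stakeholders would agree on both.

```python
# Illustrative weights (must be agreed with stakeholders); they sum to 1.
WEIGHTS = {"business_value": 0.4, "regulatory": 0.4, "user_impact": 0.2}


def weighted_score(scores, weights=WEIGHTS):
    """Sum of criterion score times criterion weight."""
    return sum(weights[c] * scores[c] for c in weights)


# Made-up scores on a 1 (low) to 5 (high) scale.
candidates = {
    "NPCR data submission": {"business_value": 4, "regulatory": 5, "user_impact": 3},
    "Real-time dashboard":  {"business_value": 4, "regulatory": 1, "user_impact": 4},
    "Patient portal":       {"business_value": 2, "regulatory": 1, "user_impact": 3},
}

ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{weighted_score(candidates[name]):.1f}  {name}")
```

With these inputs the regulatory item ranks first (4.2), followed by the dashboard (2.8) and the portal (1.8). Changing the weights changes the ranking, which is why the weights themselves should be documented alongside the scores.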

Kano Model:

  • Basic needs (expected, cause dissatisfaction if missing)
  • Performance needs (more is better)
  • Delighters (unexpected features that excite)

Note: CancerSurv Example

Must Have:

  • Case entry and coding functionality
  • HIPAA-compliant security
  • NPCR data submission capability

Should Have:

  • Real-time analytics dashboard
  • Mobile-friendly interface
  • Automated duplicate detection

Could Have:

  • Machine learning for coding assistance
  • Patient portal for self-reported outcomes
  • Integration with research databases

8.1.6 Requirements Traceability

8.1.6.1 Linking Requirements to Objectives

Traceability ensures every requirement connects to a business need or program goal:

flowchart TB
    BN[Business Need /<br/>Program Goal] --> FR[Functional<br/>Requirement]
    BN --> NFR[Non-Functional<br/>Requirement]
    FR --> TC[Test Case]
    NFR --> TC
    FR --> US[User Story]
    TC --> TR[Test Result]
Figure 8.4: Requirements Traceability Structure

Traceability Matrix Example:

| Requirement ID | Description | Source | Priority | Test Case |
| --- | --- | --- | --- | --- |
| FR-101 | Case search functionality | Registrar interviews | Must | TC-101, TC-102 |
| FR-102 | ICD-O-3 coding validation | NAACCR standards | Must | TC-103 |
| NFR-201 | 99.9% uptime | SLA requirements | Must | TC-201 |
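A traceability matrix is most useful when gaps in it can be found automatically. The sketch below holds the matrix rows as plain dictionaries and flags any requirement that lacks a source or a test case. The field names and the `FR-999` row are hypothetical, added only to show what a gap looks like.

```python
matrix = [
    {"id": "FR-101", "source": "Registrar interviews", "priority": "Must",
     "tests": ["TC-101", "TC-102"]},
    {"id": "FR-102", "source": "NAACCR standards", "priority": "Must",
     "tests": ["TC-103"]},
    {"id": "NFR-201", "source": "SLA requirements", "priority": "Must",
     "tests": ["TC-201"]},
    # Hypothetical row added to demonstrate a gap; not part of the matrix above.
    {"id": "FR-999", "source": "", "priority": "Should", "tests": []},
]


def traceability_gaps(rows):
    """Return {requirement ID: [problems]} for rows missing a source or test case."""
    gaps = {}
    for row in rows:
        problems = []
        if not row["source"]:
            problems.append("no source")
        if not row["tests"]:
            problems.append("no test case")
        if problems:
            gaps[row["id"]] = problems
    return gaps


print(traceability_gaps(matrix))  # {'FR-999': ['no source', 'no test case']}
```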

8.1.7 Specification Formats

8.1.7.1 Writing Good Requirements

Regardless of format, good requirements share characteristics:

| Characteristic | Description | Example |
| --- | --- | --- |
| Complete | Contains all necessary information | Includes error handling, edge cases |
| Consistent | Does not contradict other requirements | Uses standard terminology |
| Unambiguous | Only one interpretation possible | “Within 3 seconds” not “quickly” |
| Verifiable | Can be tested | Measurable acceptance criteria |
| Traceable | Links to source and test | Includes requirement ID |

8.1.7.2 User Story Format

For Agile projects:

As a [role], I want [feature], so that [benefit].

Acceptance Criteria:

  • Given [context], when [action], then [result]

8.1.7.3 GPS Format for Clinical Contexts

Given [clinical context], the [health worker role] should [specific action] to [health outcome].

Note: CancerSurv Example

User Story:

As a cancer registrar, I want to search for existing cases before creating a new record, so that I avoid creating duplicate entries.

GPS Format:

Given a new pathology report, the registrar should search existing cases by patient identifiers before abstracting, to maintain data integrity and accurate incidence counts.

Acceptance Criteria:

  • Given a patient name, when the registrar searches, then matching cases display within 3 seconds
  • Given a patient with no existing cases, when the registrar searches, then a “No matches found” message displays with option to create new case
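Acceptance criteria written in Given/When/Then form map almost directly onto automated tests. The sketch below shows that mapping against a hypothetical in-memory `search_cases` function; the real CancerSurv search interface, its return shape, and the case data are not specified in this chapter, so everything here is a stand-in.

```python
import time

# Hypothetical in-memory stand-in for the case store (synthetic data).
CASES = [{"case_id": "C-1", "patient_name": "Test Patient"}]


def search_cases(name):
    """Return matching cases, a user-facing message, and whether to offer 'create new'."""
    matches = [c for c in CASES if c["patient_name"].lower() == name.lower()]
    message = "" if matches else "No matches found"
    return {"matches": matches, "message": message, "can_create_new": not matches}


def test_matching_cases_display_within_3_seconds():
    start = time.perf_counter()            # Given a patient name
    result = search_cases("Test Patient")  # When the registrar searches
    elapsed = time.perf_counter() - start
    assert result["matches"]               # Then matching cases display
    assert elapsed < 3.0                   # within 3 seconds


def test_no_existing_cases_offers_create():
    result = search_cases("Nobody Here")   # Given a patient with no existing cases
    assert result["matches"] == []
    assert result["message"] == "No matches found"
    assert result["can_create_new"]        # option to create a new case


test_matching_cases_display_within_3_seconds()
test_no_existing_cases_offers_create()
```

Each test name can be recorded as a test case ID in the traceability matrix, which closes the loop from user story to verification.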

8.1.8 Deliverables from This Phase

| BA Deliverable | PH Deliverable | Purpose |
| --- | --- | --- |
| Requirements Specification | Logic Model | Document what must be built |
| Data Dictionary | Case Definition / Data Standards | Specify data structures |
| Business Rules Catalog | Clinical Protocol | Define processing rules |
| Traceability Matrix | Evaluation Framework | Link requirements to objectives |
| Prioritized Backlog | Workplan | Order implementation work |

8.1.9 Moving Forward

With requirements analyzed, prioritized, and specified, the next phase focuses on Design: defining how the solution will be built to meet these requirements.