A guide to accounting for reporting delays in state, local, and territorial public health surveillance data

Authors

Affiliations

Laura Jones

Massachusetts Department of Public Health, Boston, MA, USA

Sam Abbott

Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, United Kingdom

Published

February 2, 2026

Abstract

State, local, and territorial surveillance systems are essential for public health decision making, but inherent delays between disease occurrence and reporting create challenges for real-time analysis. Other issues such as data revisions, site drop-in and drop-out, and data quality problems can also manifest as apparent delays in aggregate data. This guide provides practical guidance for epidemiologists, public health practitioners, and modellers working with reporting delays and related challenges in surveillance data. We describe challenges in real-time use of surveillance data, opportunities from modelling approaches, guidance on choosing appropriate methods, considerations for communicating results, and practical case studies with implementation resources. We also highlight gaps where new modelling methods could address unmet needs in public health practice.


^✉ Correspondence: Sam Abbott <sam.abbott@lshtm.ac.uk>

Introduction

Section lead: Sam Abbott Section support: TBD

State, local, and territorial surveillance systems play a central role in public health decision making in the United States. Sub-national jurisdictions in other countries play a similarly significant role. However, several challenges complicate the use of surveillance data in real time. Reporting delays arise from lab confirmation requirements, differences between electronic and manual reporting, and weekend and holiday effects. Data revisions occur as duplicates are removed, cases are reclassified, and dates are corrected. Site-specific variations in reporting capabilities and intermittent reporting from some facilities create additional uncertainty. Other data quality issues such as missing or incorrect dates and incompatible formats can further limit the utility of recent data. Many of these issues can manifest as apparent delays when viewed in aggregate data, even when the underlying cause is not a true reporting delay.

Common approaches to handling these challenges include pruning recent data or using incomplete data with caveats. However, statistical methods exist that can adjust for these issues by learning from historical patterns. Addressing these challenges can improve situational awareness of true disease trends, support decision making (such as the timing of public health interventions), and improve forecasting of disease trends and resource needs (e.g. hospitalisations). Publicly available examples of modelling used to account for data reporting challenges include the Massachusetts Department of Public Health’s respiratory illness dashboard, New York City’s nowcasting during the mpox and COVID-19 emergencies, California’s nowcast of the COVID-19 effective reproduction number, and the CDC’s COVID-19 variant nowcast.

In this guide, we provide practical guidance for public health practitioners and modellers planning to account for reporting delays and other data quality issues in their analyses. We describe the challenges that arise when using surveillance data in real time, review modelling approaches that can address these challenges, and provide guidance on choosing and implementing appropriate methods. We also highlight gaps where new methods could address unmet needs in practice. This guide is accompanied by an interactive website with a decision tree for method selection and a community code repository with implementation examples.

Challenges in real-time use of surveillance data

Section overview lead: TBD Section overview support: TBD

Table 1 summarises the challenges described in this section.

Table 1: Summary of challenges in real-time surveillance data

Challenge	Description	Examples
Reporting delays	Time between event and report	Lab confirmation, manual entry
Data revisions	Changes to previously reported data	Duplicate removal, reclassification
Site drop-in/out	Facilities joining or leaving	New sites, closures, intermittent reporting
Other data quality	Additional data issues	Missing dates, format incompatibilities

Reporting delays

Section lead: TBD Section support: TBD

What it is

Right-truncation
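As a rough sketch of what right-truncation means in practice, the expected number of events observed so far can be written in terms of the eventual total and the delay distribution. The notation here is an assumption introduced for illustration and is not taken from the guide: N_t is the final count for reference date t, n_t(T) is the count for reference date t as observed on analysis date T, and F(d) is the proportion of events reported within d days, assumed not to change over time.

```latex
\mathbb{E}\left[ n_t(T) \right] = N_t \, F(T - t), \qquad t \le T
```

Under these assumptions F(T - t) is below 1 for recent reference dates, so the most recent counts are biased downwards even when the true trend is flat or rising.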

Why it happens

Lab confirmation requirements
Electronic vs manual reporting differences
Weekend and holiday effects

How it affects analysis

How to identify it

Data revisions beyond reporting delays

Section lead: TBD Section support: TBD

What it is

Why it happens

Downward corrections from duplicate removal
Case reclassifications
Date corrections
De-duplication

How it affects analysis

How to identify it

Surveillance site drop-in and drop-out

Section lead: TBD Section support: TBD

What it is

Why it happens

System-to-system delays
Hospital/facility intermittent reporting
Urban vs rural reporting capabilities
EHR integration disparities

How it affects analysis

How to identify it

Other data quality issues

Section lead: TBD Section support: TBD

What it is

Why it happens

Duplicate entries across systems
Missing or incorrect dates
Missing strata variables of interest (e.g., race/ethnicity)
Incompatible formats

How it affects analysis

How to identify it

Opportunities from modelling

Section overview lead: TBD Section overview support: TBD

Table 2 provides an overview of modelling approaches that can address the challenges described in the previous section.

Table 2: Modelling approaches for addressing surveillance data challenges

Challenge	Approach category	Examples	Data requirements
Reporting delays	Chain ladder	baselinenowcast, ChainLadder	(placeholder?)
	GAMs	nowcaster, UKHSA GAMs	(placeholder?)
	Bayesian hierarchical	NobBS	(placeholder?)
	Ad-hoc	EpiNow2	(placeholder?)
	Semi-mechanistic	EpiNow2, epinowcast	(placeholder?)
Data revisions	Chain ladder	baselinenowcast	(placeholder?)
Site drop-in/out	(placeholder?)	(placeholder?)	(placeholder?)
Other data quality	(placeholder?)	(placeholder?)	(placeholder?)

Correcting delayed data

Section lead: TBD Section support: TBD

Data requirements

Defining report and reference dates
Event date hierarchy (NYC mpox example)
Ability to assess data revision history
Volume of data available

Methods

Chain ladder approaches (baselinenowcast, ChainLadder)
GAMs (nowcaster, UKHSA GAMs)
Bayesian hierarchical methods (NobBS)
Ad-hoc methods (EpiNow2)
Semi-mechanistic approaches (EpiNow2, epinowcast)
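To illustrate the general idea behind chain ladder approaches, the following is a minimal sketch in plain Python. It is not the implementation used by baselinenowcast or ChainLadder; the function name `chain_ladder_nowcast`, the input layout (rows are reference dates, columns are delays, `None` marks cells not yet observed), and the example numbers are all assumptions made for illustration. It returns point estimates only and does not quantify uncertainty.

```python
def chain_ladder_nowcast(triangle):
    """Point nowcast of final counts per reference date from a reporting triangle.

    triangle[i][d] is the number of reports for reference date i with delay d,
    or None if that cell has not been observed yet (hypothetical layout).
    """
    n_delays = len(triangle[0])

    # Cumulative counts along each row, stopping at the first unobserved cell.
    cumulative = []
    for row in triangle:
        running, cum_row = 0, []
        for cell in row:
            if cell is None:
                break
            running += cell
            cum_row.append(running)
        cumulative.append(cum_row)

    # Development factor from delay d to d + 1, pooled over rows observed at both.
    factors = []
    for d in range(n_delays - 1):
        rows = [c for c in cumulative if len(c) > d + 1]
        numerator = sum(c[d + 1] for c in rows)
        denominator = sum(c[d] for c in rows)
        if denominator == 0:
            raise ValueError(f"cannot estimate development factor for delay {d}")
        factors.append(numerator / denominator)

    # Scale each row's latest cumulative count up to the maximum delay.
    nowcast = []
    for c in cumulative:
        if not c:
            raise ValueError("a reference date has no observed cells")
        estimate = float(c[-1])
        for d in range(len(c) - 1, n_delays - 1):
            estimate *= factors[d]
        nowcast.append(estimate)
    return nowcast


# Made-up example: two complete reference dates and two partially reported ones.
example_triangle = [
    [10, 5, 5],
    [8, 4, 4],
    [12, 6, None],
    [6, None, None],
]
example_nowcast = chain_ladder_nowcast(example_triangle)
```

One limitation of this simple multiplicative form is that a reference date with zero reports so far is always nowcast as zero, which is one reason low-count settings may call for a different approach.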

Implementation

Technical adjustments to capture data revision history
Aggregating data by report/reference date
Considerations for stratified nowcasts
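Aggregating by report and reference date can be sketched as building a reporting triangle from a line list. This is a minimal illustration in plain Python under stated assumptions: the function name `build_reporting_triangle`, its arguments, and the example records are hypothetical, delays are measured in whole days, and each record is a (reference date, report date) pair. Cells that could not yet have been observed on the analysis date are stored as `None` so that "not yet reported" stays distinct from "zero reports".

```python
from datetime import date, timedelta


def build_reporting_triangle(records, first_ref, last_ref, max_delay, as_of):
    """Count reports by reference date (rows) and delay in days (columns)."""
    n_days = (last_ref - first_ref).days + 1
    triangle = []
    for i in range(n_days):
        ref = first_ref + timedelta(days=i)
        # A cell is observable only if reference date + delay is on or before as_of.
        row = [
            0 if ref + timedelta(days=d) <= as_of else None
            for d in range(max_delay + 1)
        ]
        triangle.append(row)

    for ref, rep in records:
        i = (ref - first_ref).days
        d = (rep - ref).days
        # Skip records outside the window, beyond max_delay, or reported after as_of.
        if 0 <= i < n_days and 0 <= d <= max_delay and rep <= as_of:
            triangle[i][d] += 1
    return triangle


# Made-up line list of (reference_date, report_date) pairs.
example_records = [
    (date(2025, 1, 1), date(2025, 1, 1)),
    (date(2025, 1, 1), date(2025, 1, 2)),
    (date(2025, 1, 1), date(2025, 1, 3)),
    (date(2025, 1, 2), date(2025, 1, 2)),
    (date(2025, 1, 2), date(2025, 1, 3)),
    (date(2025, 1, 2), date(2025, 1, 3)),
    (date(2025, 1, 3), date(2025, 1, 3)),
    (date(2025, 1, 3), date(2025, 1, 4)),  # reported after as_of, so excluded
]
example_triangle = build_reporting_triangle(
    example_records,
    first_ref=date(2025, 1, 1),
    last_ref=date(2025, 1, 3),
    max_delay=2,
    as_of=date(2025, 1, 3),
)
```

Filtering on `as_of` is what allows the same line list to be replayed as it would have looked on earlier dates, provided the report date for each record has been retained.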

Managing data revisions beyond reporting delays

Section lead: TBD Section support: TBD

Data requirements

Methods

Implementation

Handling site drop-in and drop-out

Section lead: TBD Section support: TBD

Data requirements

Methods

Implementation

Managing other data quality issues

Section lead: TBD Section support: TBD

Data requirements

Methods

Implementation

Choosing a modelling method

Section overview lead: TBD Section overview support: TBD

The decision tree figure (fig-decision-tree) links methods to data characteristics and use case considerations.

Create decision tree figure linking methods to considerations
Link methods to considerations with examples of common PH data sources
Considerations for when NOT to nowcast

The following sections provide detailed guidance for specific aspects of method selection, referring back to the decision tree where relevant.

Adoption and sustainability

Section lead: TBD Section support: TBD

Advocating to public health leadership
Explaining nowcasts to decision makers
Justifying system modifications

Data analysis

Section lead: TBD Section support: TBD

Analysing delay distributions
Choosing maximum delay and training window
Representativeness
What historic data is most informative?
Determining variables needed, strata of interest
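As a starting point for analysing delay distributions and choosing a maximum delay, the sketch below computes an empirical delay distribution and the smallest delay by which a target share of reports has arrived. The function names, the 95% default, and the example records are assumptions for illustration. Because recent reference dates are themselves incompletely reported, this kind of summary is only meaningful when restricted to reference dates old enough to be considered complete.

```python
from collections import Counter
from datetime import date


def delay_distribution(records, max_delay):
    """Empirical proportion of reports at each delay from 0 to max_delay days.

    records: iterable of (reference_date, report_date) pairs.
    Negative delays and delays beyond max_delay are dropped before normalising.
    """
    delays = [(rep - ref).days for ref, rep in records]
    kept = [d for d in delays if 0 <= d <= max_delay]
    if not kept:
        raise ValueError("no records with a delay in [0, max_delay]")
    counts = Counter(kept)
    return [counts.get(d, 0) / len(kept) for d in range(max_delay + 1)]


def smallest_delay_covering(pmf, target=0.95):
    """Smallest delay d with cumulative proportion reported by d >= target."""
    cumulative = 0.0
    for d, p in enumerate(pmf):
        cumulative += p
        if cumulative >= target - 1e-12:  # small tolerance for float rounding
            return d
    return len(pmf) - 1


# Made-up records with delays of 0, 1, 1, and 3 days.
example_records = [
    (date(2025, 1, 1), date(2025, 1, 1)),
    (date(2025, 1, 1), date(2025, 1, 2)),
    (date(2025, 1, 2), date(2025, 1, 3)),
    (date(2025, 1, 2), date(2025, 1, 5)),
]
example_pmf = delay_distribution(example_records, max_delay=3)
example_max_delay = smallest_delay_covering(example_pmf, target=0.95)
```

Comparing this summary across time periods or strata is one simple way to check whether delays are stable enough for historical patterns to be informative.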

Implementation considerations

Section lead: TBD Section support: TBD

Considerations for emergencies vs chronic delays
Software/infrastructure requirements
Failure modes
Maintaining systems over time
Tradeoff between flexibility and model complexity
Model specific (link back to modelling opportunities implementation section)

Validating and evaluating a model

Section lead: TBD Section support: TBD

Practical qualitative methods (flipbooking, alignment with trends)
Alignment with overall trend vs inflection points
Applied PH quantitative methods (coverage, correlation, residuals)
Advanced methods (WIS, MAE/MSE) for method comparison
Tension between domain expertise and theoretical scores
Evaluation for public health utility
Against a baseline and other common methods
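The quantitative measures listed above can be computed with a few lines of code once nowcasts and final counts are available side by side. The sketch below shows interval coverage, mean absolute error, and the weighted interval score (WIS) in its usual form based on a median and a set of central prediction intervals. Function names, argument layouts, and the example values are assumptions for illustration rather than part of any particular package.

```python
def interval_coverage(observed, lowers, uppers):
    """Share of final counts that fall inside their prediction interval."""
    hits = sum(1 for y, lo, up in zip(observed, lowers, uppers) if lo <= y <= up)
    return hits / len(observed)


def mean_absolute_error(observed, point_estimates):
    """Average absolute difference between final counts and point nowcasts."""
    errors = [abs(y - m) for y, m in zip(observed, point_estimates)]
    return sum(errors) / len(errors)


def interval_score(y, lower, upper, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    score = upper - lower
    if y < lower:
        score += (2 / alpha) * (lower - y)
    elif y > upper:
        score += (2 / alpha) * (y - upper)
    return score


def weighted_interval_score(y, median, intervals):
    """WIS for one target; intervals maps alpha -> (lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2) * interval_score(y, lower, upper, alpha)
    return total / (len(intervals) + 0.5)


# Made-up example: three reference dates with 90% intervals and point nowcasts.
example_observed = [100, 50, 30]
example_points = [90, 57, 30]
example_lowers = [70, 55, 20]
example_uppers = [120, 60, 40]
example_coverage = interval_coverage(example_observed, example_lowers, example_uppers)
example_mae = mean_absolute_error(example_observed, example_points)

# WIS for the first reference date using 50% and 90% central intervals.
example_wis = weighted_interval_score(
    100, median=90, intervals={0.5: (80, 95), 0.1: (70, 120)}
)
```

Lower WIS and MAE indicate better performance, while coverage is judged against the nominal level of the interval, so a 90% interval that covers far more or far less than 90% of final counts suggests miscalibration.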

Communicating and visualising results

Section lead: TBD Section support: TBD

Presenting uncertainty to decision makers

Section lead: TBD Section support: TBD

Communicating uncertainty
What prediction/confidence intervals to show or use for decisions

Public-facing output considerations

Section lead: TBD Section support: TBD

Data presentation
Placing nowcasts in context (e.g., seasonal intensity thresholds)

Common pitfalls

Section lead: TBD Section support: TBD

How-to case studies

Section overview lead: TBD Section overview support: TBD

Regardless of the specific challenge or method, implementing a model to account for reporting challenges in surveillance data involves common steps:

Assessing the data source and predictability of the reporting challenge
Analysing any delay and delay distributions
Understanding the meaningful strata (e.g. ethnicity)
Implementation
Validation
Communication and visualisation

The following case studies illustrate how these steps apply in practice.

Case study: Syndromic surveillance

Section lead: Laura Jones Section support: TBD

Case study: To be determined

Section lead: TBD Section support: TBD

Additional tools and resources

Section lead: TBD Section support: TBD

Interactive website with decision tree for method selection
Community code repository with implementation snippets
Links to software packages and documentation
Example datasets for testing and learning

Discussion

Section lead: Sam Abbott Section support: TBD

Note: Subtitles in this section are for organisation during writing and will be removed at submission.

Summary

We presented a practical guide for public health practitioners and modellers working with surveillance data affected by reporting delays and other data quality issues. We described the challenges that arise when using surveillance data in real time, reviewed modelling approaches that can address these challenges, and provided guidance on choosing and implementing appropriate methods. We also provided case studies demonstrating how these methods can be applied in practice.

Strengths and limitations

This guide brings together practical experience from public health practitioners and modellers working with delayed surveillance data. We focus on methods that have been applied in real-world settings rather than purely theoretical approaches. However, our coverage of methods is not exhaustive and the field continues to evolve.

Future directions

Key areas for methodological development include methods for handling site drop-in and drop-out, approaches for managing data revisions beyond reporting delays, and tools that integrate multiple data quality adjustments. Improved software implementations that lower barriers to adoption would also benefit practitioners.

Conclusions

Reporting delays and data quality issues are inherent to surveillance systems but need not prevent timely public health decision making. Statistical methods exist that can adjust for these issues, improving situational awareness and supporting evidence-based responses. We hope this guide helps practitioners identify when such methods may be useful and provides a starting point for implementation. The companion website and code repository provide additional resources for those implementing these methods. We also hope this guide highlights gaps where new methods could address unmet needs, encouraging further methodological development in this area.
