A guide to accounting for reporting delays in state, tribal, local, and territorial public health surveillance data

Author
Affiliation

Sam Abbott

Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, United Kingdom

Published

March 16, 2026

Abstract

State, local, and territorial surveillance systems are essential for public health decision making, but inherent delays between disease occurrence and reporting create challenges for real-time analysis. Other issues such as data revisions, site drop-in and drop-out, and data quality problems can also manifest as apparent delays in aggregate data. This guide provides practical guidance for epidemiologists, public health practitioners, and modellers working with reporting delays and related challenges in surveillance data. We describe challenges in real-time use of surveillance data, opportunities from modelling approaches, guidance on choosing appropriate methods, considerations for communicating results, and practical case studies with implementation resources. We also highlight gaps where new modelling methods could address unmet needs in public health practice.


Correspondence: Sam Abbott <sam.abbott@lshtm.ac.uk>

Introduction

Section lead: Sam Abbott Section support: Laura Jones

State, local, and territorial surveillance systems play a central role in public health decision making in the United States, and sub-national jurisdictions in other countries play a similarly significant role. However, several challenges complicate the use of surveillance data in real time. Reporting delays arise from lab confirmation requirements, differences between electronic and manual reporting, and weekend and holiday effects. Data revisions occur as duplicates are removed, cases are reclassified, and dates are corrected. Site-specific variations in reporting capabilities and intermittent reporting from some facilities create additional uncertainty. Other data quality issues such as missing or incorrect dates and incompatible formats can further limit the utility of recent data. Many of these issues can manifest as apparent delays when viewed in aggregate data, even when the underlying cause is not a true reporting delay.

Common approaches to handling these challenges include discarding recent data, presenting frozen values that strip out delayed reports, or aggregating by report date rather than event date. These simple approaches are often the first and most accessible options and may be sufficient when delays are short and stable or when recent data does not directly drive time-sensitive decisions. When delays are long or variable, when timely estimates matter, or when uncertainty quantification is needed, statistical methods that learn from historical reporting patterns can improve estimates. Addressing these challenges can improve situational awareness of true disease trends, support decision making (such as the timing of public health interventions), and improve forecasting of disease trends and resource needs (e.g. hospitalisations). Examples of publicly available uses of modelling to account for data reporting challenges include the Massachusetts Department of Public Health’s respiratory illness dashboard, New York City’s nowcasting during mpox and COVID-19 emergencies, California’s nowcast of the COVID-19 effective reproduction number, and the CDC’s COVID-19 variant nowcast.

In this guide, we provide practical guidance for public health practitioners and modellers planning to account for reporting delays and other data quality issues in their analyses. We describe the challenges that arise when using surveillance data in real time, review modelling approaches that can address these challenges, and provide guidance on choosing and implementing appropriate methods. We also highlight gaps where new methods could address unmet needs in practice. This guide is accompanied by an interactive website with a decision tree for method selection and a community code repository with implementation examples. It is intended as a companion to (placeholder?), which provides a detailed technical treatment of nowcasting methods including their mathematical formulation and computational properties. Here we focus on practical guidance for choosing and implementing these methods in STLT settings, with emphasis on adoption, communication, and real-world case studies.

Challenges in real-time use of surveillance data

Section overview lead: Sharon Greene Section overview support: Laura Jones

Table 1 summarises the challenges described in this section.

Table 1: Summary of challenges in real-time surveillance data
Challenge Description Examples
Reporting delays Time between event and report Lab confirmation, manual entry
Data revisions Changes to previously reported data Duplicate removal, reclassification
Site drop-in/out Facilities joining or leaving New sites, closures, intermittent reporting
Other data quality Additional data issues Missing dates, format incompatibilities

Reporting delays

Section lead: TBD Section support: TBD

What it is

  • Right-truncation

Why it happens

  • Lab confirmation requirements
  • Electronic vs manual reporting differences
  • Delays in facility coding of clinical encounters
  • Weekend and holiday effects (fewer reports filed on weekends, lab closures)
  • Holiday periods (extended closures affect both reporting and care-seeking)
  • Delays may shorten as reporting improves or lengthen when systems are under strain

How it affects analysis

  • High variability in count data can make delay patterns harder to estimate
  • Seasonal patterns in disease incidence interact with delay patterns
  • Day-of-week effects appear in both disease incidence and reporting

How to identify it

Data revisions beyond reporting delays

Section lead: TBD Section support: TBD

What it is

Why it happens

  • Downward corrections from duplicate removal
  • Case reclassifications
  • Date corrections
  • De-duplication

How it affects analysis

How to identify it

Surveillance site drop-in and drop-out

Section lead: TBD Section support: TBD

What it is

Why it happens

  • System-to-system delays
  • Hospital/facility intermittent reporting
  • Urban vs rural reporting capabilities
  • EHR integration disparities

How it affects analysis

How to identify it

Other data quality issues

Section lead: TBD Section support: TBD

What it is

Why it happens

  • Duplicate entries across systems
  • Missing or incorrect dates
  • Missing strata variables of interest (e.g., race/ethnicity)
  • Incompatible formats
  • Changes in testing or admission practices over time
  • Changes in case definitions or reimbursement schemes

How it affects analysis

How to identify it

Opportunities from modelling

Section overview lead: TBD Section overview support: TBD

For each approach below, we describe what it does, what data it needs, and what it gives you in plain language. For mathematical details and computational properties, see (placeholder?).

Table 2 provides an overview of modelling approaches that can address the challenges described in the previous section.

Table 2: Modelling approaches for addressing surveillance data challenges
Challenge Approach What it does
Reporting delays Simple Flag recent data as provisional, use report dates, or freeze snapshots
Multiplicative Scale up partial counts by historical reporting proportions
Regression Jointly model the epidemic curve and delay as a smooth surface
Generative Model expected events and their delay distribution explicitly
Other statistical Apply general ML or time series methods to the reporting triangle
Data revisions (placeholder?) (placeholder?)
Site drop-in/out (placeholder?) (placeholder?)
Other data quality (placeholder?) (placeholder?)

Correcting delayed data

Section lead: Sam Abbott Section support: Daniel Mcdonald

Data requirements

Any nowcasting method that accounts for reporting delays needs data organised by both reference date (when the event occurred) and reporting delay (or equivalently, report date). This is often represented as a reporting triangle, where rows are reference dates, columns are delays, and cells contain the number of events first reported at each delay, but can equally be stored as a data frame with one row per reference-date-by-delay combination.

If you have individual line list records with both a reference date and a report date, this structure is built by counting events within each combination. If instead you receive periodic snapshots of aggregate counts (for example, a daily download of cumulative totals by event date), it is obtained by differencing successive snapshots (Wolffram et al. 2023; Johnson et al. 2025). Following a fixed data extraction schedule makes modelling substantially easier, because irregular download times introduce artificial variation in what appears as a new report.
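
As a minimal sketch of this counting step (the dates are invented for illustration; real pipelines would typically use a data frame library):

```python
from collections import Counter
from datetime import date

# Illustrative line list: one (reference date, report date) pair per event.
line_list = [
    (date(2024, 1, 1), date(2024, 1, 2)),
    (date(2024, 1, 1), date(2024, 1, 3)),
    (date(2024, 1, 2), date(2024, 1, 2)),
    (date(2024, 1, 2), date(2024, 1, 5)),
]

# Count events by (reference date, delay in days): the reporting triangle
# in long format, one entry per reference-date-by-delay combination.
triangle = Counter((ref, (rep - ref).days) for ref, rep in line_list)

print(triangle[(date(2024, 1, 1), 1)])  # 1 event at delay 1 for 1 January
```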

There are several common data cleaning issues. Differencing snapshots can produce negative cell values when earlier counts are revised downward, for example through deduplication or reclassification (see Section 3.2). Some surveillance systems report rolling sums (for example, 7-day totals) rather than daily counts. These typically need to be decomposed back into daily or weekly values before the reporting triangle can be constructed, but some methods can target rolling sums directly, which has the advantage of avoiding the need to model correlations between the constituent counts (Johnson et al. 2025; Wolffram et al. 2023).
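
To illustrate the snapshot-differencing case (counts are made up; the final check mirrors the negative-value issue described above):

```python
# Two successive cumulative snapshots of counts by event date,
# e.g. daily downloads from a surveillance system.
snapshot_mon = {"2024-01-01": 10, "2024-01-02": 4}
snapshot_tue = {"2024-01-01": 12, "2024-01-02": 9, "2024-01-03": 3}

# Differencing successive snapshots gives the counts newly reported
# between downloads, i.e. the new entries for the reporting triangle.
new_reports = {
    d: snapshot_tue[d] - snapshot_mon.get(d, 0) for d in snapshot_tue
}

# Negative values signal downward revisions (deduplication,
# reclassification) and need handling before modelling.
negatives = {d: v for d, v in new_reports.items() if v < 0}
print(new_reports)  # {'2024-01-01': 2, '2024-01-02': 5, '2024-01-03': 3}
```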

Methods

Not all situations require statistical correction. When delays are short relative to the decision timescale, or when the primary concern is monitoring trends rather than estimating levels, simpler strategies may be enough. One option is to flag recent dates as provisional and exclude them from interpretation until enough reports have accumulated (placeholder?). Another is to show only the counts available at each snapshot in time, removing any later additions, so that all dates are treated consistently even though each underestimates the true level (placeholder?). A third is to work with report dates instead of event dates, accepting a lagged and smoothed picture of the underlying signal (placeholder?). Each of these trades timeliness or completeness for simplicity, and each avoids the modelling choices that statistical methods require. They become problematic when delays are long or changing, when the gap between reported and actual counts matters for decisions, or when decision makers need to understand how uncertain current estimates are.

Multiplicative methods estimate current counts by scaling up partially reported values according to the proportion of cases historically observed at each delay (placeholder?). The approach originates from actuarial claims reserving, where it is known as the chain ladder method, and remains one of the most widely used approaches to nowcasting because it is fast, transparent, and easy to explain to decision makers. It works well when the delay pattern is reasonably stable over the fitting window and counts are large enough that the scaling factors are well estimated. The main limitations are that the basic form assumes a fixed delay distribution, cannot produce meaningful estimates when event-date counts are zero, and does not account for systematic day-of-week effects in reporting (Wolffram et al. 2023). Variants that address some of these issues exist, and uncertainty can be added either through distributional assumptions or by evaluating past prediction errors (Johnson et al. 2025).
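
The core scaling idea can be sketched in a few lines (the reporting proportions below are invented for a clean example; packages such as baselinenowcast estimate them from the reporting triangle and add uncertainty):

```python
# Cumulative proportion of reports historically received by each delay
# (days 0, 1, 2, 3). Illustrative values only.
prop_reported = [0.25, 0.5, 0.75, 1.0]

def multiplicative_nowcast(partial_count, delay):
    """Scale the count observed `delay` days after the event date by the
    inverse of the proportion historically reported by that delay."""
    p = prop_reported[min(delay, len(prop_reported) - 1)]
    return partial_count / p

# 20 cases reported so far for yesterday's event date:
print(multiplicative_nowcast(20, 1))  # 40.0 expected once reporting completes
```

Note that when the partial count is zero the nowcast is zero regardless of the scaling factor, which is the zero-count limitation described above.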

Regression approaches go further by jointly modelling the epidemic curve and the delay distribution within a single model, treating the reporting triangle as a smooth surface over event time and delay (Kassteele, Eilers, and Wallinga 2019; Schneble et al. 2021; Bastos et al. 2019; Mellor et al. 2025). This means models can be specified that borrow information from neighbouring time points to stabilise estimates where counts are low, allow the delay pattern to change over time, account for systematic day-of-week effects, and incorporate additional covariates such as age group or geography. Uncertainty can be estimated as part of the model rather than as a postprocessing step. These are marginal approaches, treating each cell of the reporting triangle as independent (Stoner and Economou 2020). The regression structure makes it difficult to encode mechanistic components such as laboratory capacity or test-seeking behaviour, the delay component is not constrained to produce reporting proportions that sum to one, and parametric delay distributions are not supported, which in practice means that nowcasts can behave unpredictably at long delays where data are sparse.

Generative models are a more general class than regression approaches, specifying an explicit model for both the expected number of events at each event time and how those events distribute across reporting delays (Höhle and Heiden 2014; McGough et al. 2020; Günther et al. 2021; Lison et al. 2024; Sam Abbott et al. 2025). This structure makes it straightforward to incorporate mechanistic components, such as a renewal process linking expected counts to a reproduction number, or to bring in auxiliary data sources like leading epidemiological indicators (Lison et al. 2024; Bergström et al. 2022). It also makes it straightforward to add constraints to the delay distribution, such as requiring reporting proportions to sum to one or specifying a parametric form. Two main variants exist: conditional and marginal. Conditional generative models separate variability in incidence from variability in reporting by modelling total counts directly and then distributing them across delays, producing well-calibrated uncertainty for the quantities decision makers most care about (Höhle and Heiden 2014; Stoner and Economou 2020; Stoner, Halliday, and Economou 2023; Seaman et al. 2022). Marginal generative models instead treat each cell of the reporting triangle as an independent draw (McGough et al. 2020; Günther et al. 2021; Lison et al. 2024; Sam Abbott et al. 2025). Both variants can support parametric delay distributions, time-varying delays through covariates such as day-of-week effects, hierarchical pooling across regions or age groups, joint estimation of the reproduction number, and incorporation of leading indicators (Stoner, Halliday, and Economou 2023; Seaman et al. 2022; Bergström et al. 2022; Sam Abbott et al. 2025).

Because the nowcasting problem can be expressed as a regression model, any generalisation of regression can in principle be applied, including ARIMA models with covariates, gradient-boosted trees, neural networks, and other machine learning approaches (placeholder?). These methods can be easier to set up and may learn flexible structure from the data without requiring the analyst to specify it. The downside is that they typically lack the constraints and mechanistic components that purpose-built nowcasting models provide, and they carry the common limitations of black-box approaches.

Implementation

Software

Table 3 compares available software packages across key features.

For multiplicative methods, baselinenowcast (Johnson et al. 2025) provides a straightforward implementation with options for Poisson or negative binomial count models, separate day-of-week adjustment, and a correction for zero counts that ad hoc baseline methods cannot handle. ChainLadder implements the classical actuarial chain ladder and its variants. EpiNow2 (Abbott et al. 2020) implements a truncation model that can be used as input to its more flexible epidemic model, which can also accept nowcasts from any method expressed as a delay distribution.

For regression approaches, nowcaster (Bastos et al. 2019) uses smooth functions of event time and delay, with user-specified count distributions, support for day-of-week effects, and stratification by age or geography. A benefit of the regression framework is that it builds on widely used statistical software. Kassteele, Eilers, and Wallinga (2019) provide accompanying scripts for their constrained P-spline approach, and the UK Health Security Agency (UKHSA) nowcasting pipelines are available as public code repositories (Overton et al. 2023; Mellor et al. 2025; Tang et al. 2025).

For generative models, NobBS (McGough et al. 2020) implements a simple marginal generative model with a random walk expectation model. EpiLPS (Sumalinab et al. 2024) uses a marginal generative approach with smooth functions of event time and delay and user-specified count distributions. surveillance (Meyer, Held, and Höhle 2017) implements the conditional approach of Höhle and Heiden (2014) with a log-linear expectation model and support for additional data sources. epinowcast (Sam Abbott et al. 2025) implements a marginal generative approach with a flexible expectation model, multiple count distributions, parametric delays, hierarchical pooling across strata, joint effective reproduction number estimation, missing reference date imputation, user-defined reporting schedules, missing data handling, forecasting, and support for additional data sources.

Table 3: Available software packages for correcting delayed surveillance data. This table covers released packages; methods without dedicated software are described in the text.
Package Method Expectation model Time-varying delays Count distribution Day-of-week effects Proper delay Parametric delay Strata pooling Additional data
baselinenowcast Multiplicative None Rolling window Poisson, NegBin Separate models Yes No No No
ChainLadder Multiplicative None Rolling window Various No Yes No No No
EpiNow2 Multiplicative None Partial Poisson, NegBin Yes Yes Yes No No
nowcaster Regression Smooth Yes User-specified Yes No No Yes Possible
NobBS Generative (marginal) Random walk Rolling window NegBin No Yes No No No
EpiLPS Generative (marginal) Smooth Yes User-specified Yes No No No No
surveillance Generative (conditional) Log-linear Yes Poisson, NegBin Yes Yes No No Yes
epinowcast Generative (marginal) Flexible Yes Multiple built-in Yes Yes Yes Yes Yes

Starting simple

Before choosing a method, visualise the reporting triangle to understand the structure of your data: how long delays typically are, whether they change over time, and whether there are systematic day-of-week patterns. This informs which features a model actually needs. A good strategy is to start simple and build up complexity only when you can see that it is needed. Having a baseline nowcast adapted to your specific problem provides a reference point for judging whether additional model features add value (Johnson et al. 2025); a multiplicative method is a sensible choice for this. From there, you can assess whether the residual errors point to a specific deficiency and choose a more flexible method that addresses it. A framework that allows the model to be built up step by step, such as epinowcast (Sam Abbott et al. 2025), makes this incremental approach easier to manage.

Model specification decisions

Regardless of which method you choose, several specification decisions will likely need to be made.

  • Maximum delay. This should be set at the point that captures the vast majority of the target reporting (for example, the delay by which 95% of reports have historically arrived), which you can estimate from historical data. Unless computational performance is a priority, there is generally little reason to vary this as part of model development.
  • Data requirements. More complex models need more data to fit their additional flexibility. Most methods need at least as many snapshots of data as the target delay length, though some Bayesian methods can work with less data by relying on prior models or parametric delay distributions.
  • Training window. Most methods fit to a window of recent data rather than the full history. A shorter window lets the model adapt quickly to changes in reporting behaviour but provides fewer data points for estimating delay proportions, which matters when counts are low. The appropriate window length depends on data frequency and the stability of reporting patterns (Johnson et al. 2025; Wolffram et al. 2023). Some methods support time-varying delays directly (see Table 3), which makes optimising the training window less important because the model itself can adapt. In settings where the delay distribution is expected to change, these methods are likely to perform better than those that assume a fixed delay within each window.
  • Stratification. When nowcasts are needed by age group, geography, or another variable, you can either fit separate models to each stratum or fit a single model that pools information across strata. Separate models are simpler but can be unstable when individual strata have low counts. Pooled or hierarchical models borrow strength across strata to stabilise estimates, but require software that supports this feature (see Table 3) and take longer to fit (Stoner, Halliday, and Economou 2023). A middle ground is to assume that the delay distribution is shared across strata while allowing the epidemic curve to vary (Seaman et al. 2022), which reduces the number of parameters without forcing strata to have identical trajectories.
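
For the maximum-delay decision above, a simple empirical check is to find the smallest delay that captures a target share of historical reports (the 95% target and the delays below are illustrative):

```python
# Historical report delays in days, one entry per event (illustrative).
observed_delays = [0, 0, 1, 1, 1, 2, 2, 3, 5, 7]

def max_delay_for_coverage(delays, target=0.95):
    """Smallest delay by which at least `target` of reports had arrived."""
    n = len(delays)
    for d in sorted(set(delays)):
        if sum(x <= d for x in delays) / n >= target:
            return d
    return max(delays)

print(max_delay_for_coverage(observed_delays, target=0.95))  # 7
print(max_delay_for_coverage(observed_delays, target=0.80))  # 3
```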

Practical considerations

  • Computation. Multiplicative methods typically run in seconds. Regression and generative models fitted with Bayesian inference take longer, and fitting time depends heavily on the inference method. Full MCMC sampling via Stan (Stan Development Team 2021) is the most flexible but slowest option, with run times ranging from minutes to days depending on model complexity, the number of strata, and the length of the time series (Stoner, Halliday, and Economou 2023; Sam Abbott et al. 2025). Approximate inference methods can be substantially faster: nowcaster uses INLA (Rue, Martino, and Chopin 2009), EpiLPS uses Laplacian P-splines (Sumalinab et al. 2024), and epinowcast supports variational inference via Pathfinder (Zhang et al. 2022) as an alternative to MCMC. Plan for this when deciding how to integrate nowcasting into an operational workflow, for instance by scheduling model runs overnight.
  • Software availability. Public health departments often have limited ability to install software, with access restricted to packages available as pre-compiled binaries from CRAN. This rules out tools that depend on cmdstanr (Gabry and Češnovar 2021), which requires a local C++ toolchain: for example, epinowcast is cmdstanr-based and cannot be installed in such environments, whereas EpiNow2 uses rstan and is available from CRAN. nowcaster depends on INLA (Rue, Martino, and Chopin 2009), which may face similar installation barriers depending on the computing environment.
  • Monitoring. Once a pipeline is running, inspect the nowcast visually against the raw data after each run to confirm that estimates are plausible and uncertainty intervals are reasonable. Watch for estimates that swing sharply between runs, which can indicate changes in reporting practice, data quality events, or a model that is poorly suited to the current data. Compare the nowcast against what was eventually observed for recent past dates; persistent over- or under-prediction suggests the model needs recalibration (Wolffram et al. 2023).
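
The retrospective comparison in the monitoring step can be automated as an empirical coverage check (intervals and finally observed counts here are invented for illustration):

```python
# 90% prediction intervals from past nowcast runs, and the counts
# eventually observed for those dates (all values illustrative).
intervals = [(90, 120), (80, 110), (100, 140), (95, 125)]
finals = [100, 115, 120, 90]

covered = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, finals))
coverage = covered / len(finals)

# Over many dates, well-calibrated 90% intervals should cover roughly
# 90% of finally observed values; persistent shortfalls suggest the
# model needs recalibration.
print(coverage)
```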

Managing data revisions beyond reporting delays

Section lead: TBD Section support: TBD

Data requirements

Methods

  • Excluding data from known backlog or major revision periods
  • Flagging and annotating revision events so downstream analyses can account for them
  • Redistributing negative entries in reporting triangles across neighbouring cells
  • Nowcasting methods that model cumulative counts without enforcing a proper CDF (i.e. monotonically non-decreasing) can accommodate negative increments directly
  • Methods that allow an improper PMF, such as baselinenowcast, can also handle negative counts in the reporting triangle
  • Many methods can be used without modification when individual cells contain negative counts, provided the cumulative count for each reference date remains non-negative
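
One simple redistribution scheme (a sketch of the general idea, not any specific package's approach) absorbs a negative increment into earlier delays for the same reference date so that cumulative counts stay non-negative:

```python
def redistribute_negatives(increments):
    """Zero out negative increments by subtracting their magnitude from
    counts at earlier delays, working backwards. Preserves the total
    when earlier counts are large enough to absorb the revision."""
    out = list(increments)
    for i, v in enumerate(out):
        if v < 0:
            out[i] = 0
            j = i - 1
            while v < 0 and j >= 0:
                take = min(out[j], -v)
                out[j] -= take
                v += take
                j -= 1
    return out

# One reference date's new reports at delays 0..3; the -4 at delay 2
# reflects a downward revision (e.g. deduplication).
print(redistribute_negatives([5, 3, -4, 2]))  # [4, 0, 0, 2]
```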

Implementation

Handling site drop-in and drop-out

Section lead: TBD Section support: TBD

Data requirements

Methods

  • Restricting analysis to consistently reporting sites
  • Weighting or normalising by reporting completeness
  • Monitoring site participation over time to detect changes

Implementation

Managing other data quality issues

Section lead: TBD Section support: TBD

Data requirements

Methods

  • De-duplication rules applied before analysis
  • Imputing missing reference dates from available date fields
  • Standardising formats across data sources before aggregation

Implementation

Choosing a modelling method

Section overview lead: Ian Painter & Allie Warren Section overview support: TBD

?@fig-decision-tree provides a decision tree linking methods to data characteristics and use case considerations.

  • Create decision tree figure linking methods to considerations
  • Link methods to considerations with examples of common PH data sources
  • Considerations for when NOT to nowcast

The following sections provide detailed guidance for specific aspects of method selection, referring back to the decision tree where relevant.

Adoption and sustainability

Section lead: Tomas Leon & Natalie Linton Section support: Laura Jones

  • Advocating to public health leadership
  • Explaining nowcasts to decision makers
  • Justifying system modifications

Data analysis

Section lead: Tomas Leon & Natalie Linton Section support: Laura Jones

  • Analysing delay distributions
  • Choosing maximum delay and training window
  • Representativeness
  • What historic data is most informative?
  • Determining variables needed, strata of interest

Implementation considerations

Section lead: Tomas Leon & Natalie Linton Section support: Laura Jones

  • Considerations for emergencies vs chronic delays
  • Software/infrastructure requirements
  • Failure modes
  • Maintaining systems over time
  • Tradeoff between flexibility and model complexity
  • Model specific (link back to modelling opportunities implementation section)

Validating and evaluating a model

Section lead: Kaitlyn Johnson & Laura Jones Section support: TBD

  • What to check first: visual inspection of past nowcasts against final observed values (flipbooking, overlay plots)
  • Alignment with overall trend vs inflection points
  • What “good enough” looks like: prediction intervals cover observed values at roughly the expected rate (e.g. 90% intervals contain ~90% of final values)
  • When to investigate further: persistent directional bias, intervals that are consistently too narrow or too wide, sudden changes in performance
  • Applied PH quantitative methods (coverage, correlation, residuals)
  • Advanced methods (WIS, MAE/MSE) for method comparison; see (placeholder?) for scoring rules and formal comparative evaluation
  • Tension between domain expertise and theoretical scores
  • Evaluation for public health utility
  • Against a baseline and other common methods

Communicating and visualising results

Section lead: TBD Section support: TBD

Presenting uncertainty to decision makers

Section lead: TBD Section support: TBD

  • Communicating uncertainty
  • What prediction/confidence intervals to show or use for decisions

Public-facing output considerations

Section lead: TBD Section support: TBD

  • Data presentation
  • Placing nowcasts in context (e.g., seasonal intensity thresholds)

Common pitfalls

Section lead: TBD Section support: TBD

How-to case studies

Section overview lead: Laura Jones & Kaitlyn Johnson Section overview support: TBD

Regardless of the specific challenge or method, implementing a model to account for reporting challenges in surveillance data involves common steps:

  • Assessing the data source and predictability of the reporting challenge
  • Analysing any delay and delay distributions
  • Understanding the meaningful strata (e.g. ethnicity)
  • Implementation
  • Validation
  • Communication and visualisation

The following case studies illustrate how these steps apply in practice.

Case study: Syndromic surveillance

Section lead: Laura Jones Section support: TBD

Case study: To be determined

Section lead: TBD Section support: TBD

Additional tools and resources

Section lead: TBD Section support: TBD

  • Interactive website with decision tree for method selection
  • Community code repository with implementation snippets
  • Links to software packages and documentation
  • Example datasets for testing and learning

Discussion

Section lead: Sam Abbott Section support: TBD

Note: Subtitles in this section are for organisation during writing and will be removed at submission.

Summary

We presented a practical guide for public health practitioners and modellers working with surveillance data affected by reporting delays and other data quality issues. We described the challenges that arise when using surveillance data in real time, reviewed modelling approaches that can address these challenges, and provided guidance on choosing and implementing appropriate methods. We also provided case studies demonstrating how these methods can be applied in practice.

Strengths and limitations

This guide brings together practical experience from public health practitioners and modellers working with delayed surveillance data. We focus on methods that have been applied in real-world settings rather than purely theoretical approaches. Compared to the companion technical overview (placeholder?), this guide adds practical coverage of site drop-in/drop-out and data quality issues beyond reporting delays, guidance on adoption and making the case to leadership, communication and visualisation strategies, and case studies with actual STLT data and workflows. The companion website with its decision tree and community code repository provides interactive resources that a static paper cannot. However, our coverage of methods is not exhaustive and the field continues to evolve.

Future directions

Nowcasting is often an intermediate step toward other analytical goals. Nowcast-corrected incidence can feed into reproduction number estimators, and joint estimation avoids losing uncertainty between steps. Nowcasts can also be coupled with forecasting models to improve timeliness, and reporting delays can reduce the sensitivity of outbreak detection systems where approaches exist to adjust thresholds rather than correct data. See (placeholder?) for detailed treatment of these downstream applications.

Key areas for methodological development include methods for handling site drop-in and drop-out, approaches for managing data revisions beyond reporting delays, and tools that integrate multiple data quality adjustments. Improved software implementations that lower barriers to adoption would also benefit those working with these data.

Conclusions

Reporting delays and data quality issues are inherent to surveillance systems but need not prevent timely public health decision making. Statistical methods exist that can adjust for these issues, improving situational awareness and supporting evidence-based responses. We hope this guide helps practitioners identify when such methods may be useful and provides a starting point for implementation. The companion website and code repository provide additional resources for those implementing these methods. We also hope this guide highlights gaps where new methods could address unmet needs, encouraging further methodological development in this area.

References

Abbott, Sam, Joel Hellewell, Katharine Sherratt, Katelyn Gostic, Joe Hickson, Hamada S. Badr, Michael DeWitt, Robin Thompson, EpiForecasts, and Sebastian Funk. 2020. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters. https://doi.org/10.5281/zenodo.3957489.
Bastos, Leonardo S, Theodoros Economou, Marcelo F C Gomes, Daniel A M Villela, Flavio C Coelho, Oswaldo G Cruz, Oliver Stoner, Trevor Bailey, and Claudia T Codeço. 2019. “A Modelling Approach for Correcting Reporting Delays in Disease Surveillance Data.” Statistics in Medicine 38 (22): 4363–77. https://doi.org/10.1002/sim.8303.
Bergström, Fanny, Felix Günther, Michael Höhle, and Tom Britton. 2022. “Bayesian Nowcasting with Leading Indicators Applied to COVID-19 Fatalities in Sweden.” PLOS Computational Biology 18 (12): e1010767. https://doi.org/10.1371/journal.pcbi.1010767.
Gabry, Jonah, and Rok Češnovar. 2021. Cmdstanr: R Interface to 'CmdStan'.
Günther, Felix, Andreas Bender, Katharina Katz, Helmut Küchenhoff, and Michael Höhle. 2021. “Nowcasting the COVID-19 Pandemic in Bavaria.” Biometrical Journal 63 (3): 490–502. https://doi.org/10.1002/bimj.202000112.
Höhle, Michael, and Matthias an der Heiden. 2014. “Bayesian Nowcasting During the STEC O104:H4 Outbreak in Germany, 2011.” Biometrics 70 (4): 993–1002. https://doi.org/10.1111/biom.12194.
Johnson, Kaitlyn E, Maria L Tang, Emily Tyszka, Laura Jones, Barbora Nemcova, Daniel Wolffram, Rosa Ergas, et al. 2025. “Baseline Nowcasting Methods for Handling Delays in Epidemiological Data.” Wellcome Open Research 10: 614. https://wellcomeopenresearch.org/articles/10-614.
Kassteele, Jan van de, Paul H C Eilers, and Jacco Wallinga. 2019. “Nowcasting the Number of New Symptomatic Cases During Infectious Disease Outbreaks Using Constrained P-Spline Smoothing.” Epidemiology 30 (5): 737–45. https://doi.org/10.1097/EDE.0000000000001050.
Lison, Adrian, Sam Abbott, Jana Huisman, and Tanja Stadler. 2024. “Generative Bayesian Modeling to Nowcast the Effective Reproduction Number from Line List Data with Missing Symptom Onset Dates.” Edited by Tom Britton. PLOS Computational Biology 20 (4): e1012021. https://doi.org/10.1371/journal.pcbi.1012021.
McGough, Sarah F, Michael A Johansson, Marc Lipsitch, and Nicolas A Menzies. 2020. “Nowcasting by Bayesian Smoothing: A Flexible, Generalizable Model for Real-Time Epidemic Tracking.” PLOS Computational Biology 16 (4): e1007735. https://doi.org/10.1371/journal.pcbi.1007735.
Mellor, Jonathon, Maria L Tang, Emilie Finch, Rachel Christie, Oliver Polhill, Christopher E Overton, Ann Hoban, Amy Douglas, Sarah R Deeny, and Thomas Ward. 2025. “An Application of Nowcasting Methods: Cases of Norovirus During the Winter 2023/2024 in England.” PLOS Computational Biology. https://doi.org/10.1371/journal.pcbi.1012849.
Meyer, Sebastian, Leonhard Held, and Michael Höhle. 2017. “Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance.” Journal of Statistical Software 77 (11): 1–55. https://doi.org/10.18637/jss.v077.i11.
Overton, Christopher E, Sam Abbott, Rachel Christie, Fergus Cumming, Julie Day, Owen Jones, Rob Sherlock Paton, Charlie Turner, and Thomas Ward. 2023. “Nowcasting the 2022 Mpox Outbreak in England.” PLOS Computational Biology 19 (9): e1011463. https://doi.org/10.1371/journal.pcbi.1011463.
Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. https://doi.org/10.1111/j.1467-9868.2008.00700.x.
Abbott, Sam, Adrian Lison, Sebastian Funk, Carl Pearson, Hugo Gruson, Felix Guenther, Michael DeWitt, James Mba Azam, and Jessalyn Sebastian. 2025. Epinowcast: A Bayesian Framework for Real-Time Infectious Disease Surveillance. https://doi.org/10.5281/zenodo.5637165.
Schneble, Marc, Giacomo De Nicola, Göran Kauermann, and Ursula Berger. 2021. “Nowcasting Fatal COVID-19 Infections on a Regional Level in Germany.” Biometrical Journal 63 (3): 471–89. https://doi.org/10.1002/bimj.202000143.
Seaman, Shaun R, Pantelis Samartsidis, Meaghan Kall, and Daniela De Angelis. 2022. “Nowcasting COVID-19 Deaths in England by Age and Region.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 71 (5): 1266–81. https://doi.org/10.1111/rssc.12576.
Stan Development Team. 2021. Stan Modeling Language Users Guide and Reference Manual, 2.28.1.
Stoner, Oliver, and Theo Economou. 2020. “Multivariate Hierarchical Frameworks for Modeling Delayed Reporting in Count Data.” Biometrics 76 (3): 789–98. https://doi.org/10.1111/biom.13188.
Stoner, Oliver, Allison Halliday, and Theo Economou. 2023. “Correcting Delayed Reporting of COVID-19 Using the Generalized-Dirichlet-Multinomial Method.” Biometrics 79 (3): 2537–50. https://doi.org/10.1111/biom.13810.
Sumalinab, Bryan, Oswaldo Gressani, Niel Hens, and Christel Faes. 2024. “Bayesian Nowcasting with Laplacian-P-Splines.” Journal of Computational and Graphical Statistics. https://doi.org/10.1080/10618600.2024.2395414.
Tang, Maria L, Ian S McFarlane, Christopher E Overton, Erjola Hani, Vanessa Saliba, Gareth J Hughes, Paul Crook, Thomas Ward, and Jonathon Mellor. 2025. “Nowcasting Cases and Trends During the Measles 2023/24 Outbreak in England.” Journal of Infection. https://doi.org/10.1016/j.jinf.2025.106473.
Wolffram, Daniel, Sam Abbott, Matthias an der Heiden, Sebastian Funk, Felix Günther, Davide Haase, Stefan Heyder, et al. 2023. “Collaborative Nowcasting of COVID-19 Hospitalization Incidences in Germany.” PLOS Computational Biology 19 (8): e1011394. https://doi.org/10.1371/journal.pcbi.1011394.
Zhang, Lu, Bob Carpenter, Andrew Gelman, and Aki Vehtari. 2022. “Pathfinder: Parallel Quasi-Newton Variational Inference.” Journal of Machine Learning Research 23 (306): 1–49.