Overview

EpiInvert is a seasonal-trend decomposition method for the daily incidence, \(i_t\), based on the following renewal equation:

\((1) \ \ \ \ s_ti_t=\sum_k s_{t-k}i_{t-k}R_{t-k}\Phi_k\)

where \(s_t\) is the seasonal component, \(R_t\) is the case reproduction number and \(\Phi_k\) is the serial interval. We assume that the serial interval is known and \(R_t\), \(s_t\) are computed by inverting the above renewal equation using signal processing techniques. The mathematical foundation of EpiInvert has been published in PNAS, 2021 and Biology, 2022. Once \(R_t\) and \(s_t\) are computed, the trend component (or restored incidence) \(T_t\) and the normalized error \(\varepsilon_t\) are defined as

\((2) \ \ \ \ T_t=\sum_ks_{t-k}i_{t-k}R_{t-k}\Phi_k\)

\((3) \ \ \ \ \varepsilon_t = \frac{s_ti_t-T_t}{(T_t)^a}\)

where \(a\) is computed by minimizing the quadratic error

\((4) \ \ \ \ E(a,b)=\sum_t \left( log(|s_ti_t-T_t|) - a\cdot log(T_t)-b\right)^2\)

EpiInvert also corrects the bias introduced by the festive days (or any anomalous day) where the user expects that the incidence value es biased. To correct the bias introduced by the festive days EpiInvert automatically update the incidence value in each festive day and the next 2 days. Typically, in the festive (or anomalous day), one observes a sharp decrease in the number of registered incidence that is compensated by increased incidence numbers in the next few days. This bias is corrected by redistributing the number of cases in the festive day and the next 2 days.

In a nutshell, EpiInvert is based on estimating all the variables involved, that is \(s_t\), \(R_t\) and the values of \(i_t\) in the festive days and the next 2 days in order to minimize the difference between the incidence and its expected value using the renewal equation.

To model the serial interval, EpiInvert allows a shifted log-normal parametric formulation. The shift can be negative, reflecting the fact that secondary cases may present symptoms earlier than the primary case. The user can also provide a non-parametric serial interval given by a numeric vector.

owid dataset

We use, owid, a dataset containing COVID-19 epidemiological indicators for Canada, France, Germany, Italy, UK and the USA obtained from Our World in data up to 2022-11-28. In the case a data value is not available for a given day we assign the value 0 to the indicator. owid is a dataframe containing the following variables :

Not all countries have recorded values for all indicators in this database. For example, France and Italy have data for all the indicators, but the rest of the countries do not. Therefore, before using the data from a country, it is convenient to analyze, by exploring the dataset, which indicators have values other than zero.

Package installation

You can install EpiInvert from the CRAN repository or from the GitHub repository with:

 install.packages("devtools")
 devtools::install_github("lalvarezmat/EpiInvert")

The EpiInvert installation from the source file requires a C++ compilation. To avoid this compilation step you can use the pre-compiled versions available at the CRAN repository.

We attach some required packages

library(EpiInvert)
library(ggplot2)
library(dplyr)
library(grid)

We load the owid dataset:

data(owid)
summary(owid)

We filter the owid dataset to keep the data up to 2022-05-05:

owid <- filter(owid,date<=as.Date("2022-05-05"))

Loading some festive days for the same countries:

data(festives)
head(festives)

Germany incidence decomposition using EpiInvert

We show the execution of EpiInvert using Germany data. The first parameter is a numerical vector with the daily incidence, the second parameter is the date of the last incidence value and the third parameter is a character vector with the dates of the festive (or anomalous) days (this parameter is not mandatory)

sel <- filter(owid, iso_code=="DEU")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$DEU)

Plotting the results:

EpiInvert_plot(res)

EpiInvert returns a list with the following elements:

Modifying the regularity parameters

EpiInvert allows to control the regularity of the seasonal component \(s_t\) and/or \(R_t\) using regularity parameters, the default values of both parameters are \(5\). Next we apply EpiInvert to Germany incidence modifying these parameters:

res <- EpiInvert(sel$new_cases,"2022-05-05",festives$DEU,select_params(list(seasonality_regularization_weight = 10,Rt_regularization_weight=1)))
EpiInvert_plot(res)

With respect to the previous experiment we observe a larger variability of \(R_t\) and a lower variation of \(s_t-s_{t-7}\).

Controlling the number of days in the past used by EpiInvert

EpiInvert execution for France using 365 days in the past. If you are not constrained by the computational cost of the algorithm, you can choose a large value of this parameter (for instance 9999), to ensure that EpiInvert will use the whole available sequence in the estimation.

sel <- filter(owid, iso_code=="FRA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$FRA,
                 select_params(list(max_time_interval = 365)))

Plot of the incidence between “2021-12-15” and “2022-01-15”. Observe that the festive days bias correction only modifies the original incidence in the festive days and the following 2 days.

 EpiInvert_plot(res,"incid","2021-12-15","2022-01-15")

Using non-parametric serial interval

EpiInvert execution for UK using a non-parametric serial interval shifted -2 days

load data of a serial interval

data(si_distr_data)
head(si_distr_data)
## [1] 3.285609e-06 3.401902e-04 3.904441e-03 1.543537e-02 3.466818e-02
## [6] 5.608451e-02
sel <- filter(owid, iso_code=="GBR")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$UK,
       select_params(list(si_distr = si_distr_data,
       shift_si_distr=-2)))

Plot of the serial interval used (including the shift)

 EpiInvert_plot(res,"SI")

Modifying parametric serial interval

EpiInvert execution for the USA changing the default values of the parametric serial interval (using a shifted log-normal)

sel <- filter(owid, iso_code=="USA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$USA,
       select_params(list(mean_si = 11,sd_si=6,shift_si=-1)))

Plot of the reproduction number Rt including an empiric 95% confidence interval of the variation of EpiInvert Rt estimation as a function of the number of future days available.

 EpiInvert_plot(res,"R")