EpiInvert is a seasonal-trend decomposition method for the daily incidence, \(i_t\), based on the following renewal equation:
\((1) \ \ \ \ s_ti_t=\sum_k s_{t-k}i_{t-k}R_{t-k}\Phi_k\)
where \(s_t\) is the seasonal component, \(R_t\) is the case reproduction number and \(\Phi_k\) is the serial interval. We assume that the serial interval is known and \(R_t\), \(s_t\) are computed by inverting the above renewal equation using signal processing techniques. The mathematical foundation of EpiInvert has been published in PNAS, 2021 and Biology, 2022. Once \(R_t\) and \(s_t\) are computed, the trend component (or restored incidence) \(T_t\) and the normalized error \(\varepsilon_t\) are defined as
\((2) \ \ \ \ T_t=\sum_ks_{t-k}i_{t-k}R_{t-k}\Phi_k\)
\((3) \ \ \ \ \varepsilon_t = \frac{s_ti_t-T_t}{(T_t)^a}\)
where \(a\) is computed by minimizing the quadratic error
\((4) \ \ \ \ E(a,b)=\sum_t \left( log(|s_ti_t-T_t|) - a\cdot log(T_t)-b\right)^2\)
EpiInvert also corrects the bias introduced by the festive days (or any anomalous day) where the user expects that the incidence value es biased. To correct the bias introduced by the festive days EpiInvert automatically update the incidence value in each festive day and the next 2 days. Typically, in the festive (or anomalous day), one observes a sharp decrease in the number of registered incidence that is compensated by increased incidence numbers in the next few days. This bias is corrected by redistributing the number of cases in the festive day and the next 2 days.
In a nutshell, EpiInvert is based on estimating all the variables involved, that is \(s_t\), \(R_t\) and the values of \(i_t\) in the festive days and the next 2 days in order to minimize the difference between the incidence and its expected value using the renewal equation.
To model the serial interval, EpiInvert allows a shifted log-normal parametric formulation. The shift can be negative, reflecting the fact that secondary cases may present symptoms earlier than the primary case. The user can also provide a non-parametric serial interval given by a numeric vector.
We use, owid, a dataset containing COVID-19
epidemiological indicators for Canada, France, Germany, Italy, UK and
the USA obtained from Our
World in data up to 2022-11-28. In the case a data value is not
available for a given day we assign the value 0 to the indicator.
owid is a dataframe
containing the
following variables :
iso_code
: iso code of the countrylocation
: country namedate
: date of the indicator valuenew_cases
: new confirmed casesnew_cases_smoothed
: new confirmed cases smoothed (as
provided by Our
World in data)new_cases_restored_EpiInvert
: new confirmed cases
restored using EpiInvert
new_deaths
: new deaths attributed to COVID-19new_deaths_smoothed
: new deaths smoothed (as provided
by Our
World in data)new_deaths_restored_EpiInvert
: new deaths restored
using EpiInvert
icu_patients
: number of COVID-19 patients in intensive
care units (ICUs) on a given dayhosp_patients
: number of COVID-19 patients in hospital
on a given dayweekly_icu_admissions
: number of COVID-19 patients
newly admitted to intensive care units (ICUs) in a given week (reporting
date and the preceding 6 days)weekly_hosp_admissions
: number of COVID-19 patients
newly admitted to hospitals in a given week (reporting date and the
preceding 6 days)Not all countries have recorded values for all indicators in this database. For example, France and Italy have data for all the indicators, but the rest of the countries do not. Therefore, before using the data from a country, it is convenient to analyze, by exploring the dataset, which indicators have values other than zero.
You can install EpiInvert from the CRAN repository or from the GitHub repository with:
install.packages("devtools")
devtools::install_github("lalvarezmat/EpiInvert")
The EpiInvert installation from the source file requires a C++ compilation. To avoid this compilation step you can use the pre-compiled versions available at the CRAN repository.
We attach some required packages
library(EpiInvert)
library(ggplot2)
library(dplyr)
library(grid)
We load the owid dataset:
data(owid)
summary(owid)
We filter the owid dataset to keep the data up to 2022-05-05:
owid <- filter(owid,date<=as.Date("2022-05-05"))
Loading some festive days for the same countries:
data(festives)
head(festives)
We show the execution of EpiInvert using Germany data. The first parameter is a numerical vector with the daily incidence, the second parameter is the date of the last incidence value and the third parameter is a character vector with the dates of the festive (or anomalous) days (this parameter is not mandatory)
sel <- filter(owid, iso_code=="DEU")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$DEU)
Plotting the results:
EpiInvert_plot(res)
EpiInvert returns a list with the following elements:
i_original : the original daily incidence curve. Notice that EpiInvert does not allow missing values. On days when a country does not report data, a zero must be registered as the value associated with the incidence of that day.
i_festive : the festive days bias free incidence, \(i_t\), with the correction of the bias introduced by the declared festive days.
i_bias_free : the festive days and weekly biases free incidence given by \(s_ti_t\) where \(s_t\) is the seasonal component and \(i_t\) the festive days bias free incidence
i_restored : the restored incidence (or trend), \(T_t\) (see equation (2)).
Rt : time varying reproduction number \(R_t\) (see equation (1)).
Rt_CI95 : to estimate Rt on each day t, EpiInvert uses the past days (t’<=t) and the future days (t’>t) when available. Therefore, the EpiInvert estimate of Rt varies when more days are available. Rt_CI95 represents the radius of an empiric 95% confidence interval of the expected variation of Rt as a function of the number of days after t available (see the plot of Rt above).
seasonality : the seasonal component \(s_t\) (see equation (1).
dates : the date associated with each incidence value.
festive : a Boolean vector to indicate the days considered as festive.
epsilon : the normalized error given by equation (3).
power_a : the power a in equation (3). Note that this value strongly depends on the size of the incidence curve used by the EpiInvert estimation. As we shall see later, this size is an EpiInvert parameter. The estimated value of a only has sense in the case of a large incidence sequence.
si_distr : numeric vector with the serial interval distribution used, it can be computed from the shifted parametric log-normal or it can be uploaded by the user
shift_si_distr : shift of the serial interval used.
EpiInvert allows to control the regularity of the seasonal component \(s_t\) and/or \(R_t\) using regularity parameters, the default values of both parameters are \(5\). Next we apply EpiInvert to Germany incidence modifying these parameters:
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$DEU,select_params(list(seasonality_regularization_weight = 10,Rt_regularization_weight=1)))
EpiInvert_plot(res)
With respect to the previous experiment we observe a larger variability
of \(R_t\) and a lower variation of
\(s_t-s_{t-7}\).
EpiInvert execution for France using 365 days in the past. If you are not constrained by the computational cost of the algorithm, you can choose a large value of this parameter (for instance 9999), to ensure that EpiInvert will use the whole available sequence in the estimation.
sel <- filter(owid, iso_code=="FRA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$FRA,
select_params(list(max_time_interval = 365)))
Plot of the incidence between “2021-12-15” and “2022-01-15”. Observe that the festive days bias correction only modifies the original incidence in the festive days and the following 2 days.
EpiInvert_plot(res,"incid","2021-12-15","2022-01-15")
EpiInvert execution for UK using a non-parametric serial interval shifted -2 days
load data of a serial interval
data(si_distr_data)
head(si_distr_data)
## [1] 3.285609e-06 3.401902e-04 3.904441e-03 1.543537e-02 3.466818e-02
## [6] 5.608451e-02
sel <- filter(owid, iso_code=="GBR")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$UK,
select_params(list(si_distr = si_distr_data,
shift_si_distr=-2)))
Plot of the serial interval used (including the shift)
EpiInvert_plot(res,"SI")
EpiInvert execution for the USA changing the default values of the parametric serial interval (using a shifted log-normal)
sel <- filter(owid, iso_code=="USA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$USA,
select_params(list(mean_si = 11,sd_si=6,shift_si=-1)))
Plot of the reproduction number Rt including an empiric 95% confidence interval of the variation of EpiInvert Rt estimation as a function of the number of future days available.
EpiInvert_plot(res,"R")