The COVID-19 pandemic has created a radically new situation where most countries provide raw measurements of their daily incidence and disclose them in real time. This enables new machine learning forecast strategies where the prediction no longer needs to rely only on the past values of the current incidence curve, but can take advantage of observations in many countries. We present such a simple global machine learning procedure using all past daily incidence trend curves. Each of the 27,418 COVID-19 incidence trend curves in our database contains the values of 56 consecutive days extracted from observed incidence curves across 61 world regions and countries. Given a current incidence trend curve observed over the past four weeks, its forecast for the next four weeks is computed by matching it with the first four weeks of all samples and ranking them by their similarity to the query curve. The 28-day forecast is then obtained by a statistical estimation combining the values of the last 28 observed days in those similar samples. Using the comparison performed by the European Covid-19 Forecast Hub with the current state-of-the-art forecast methods, we verify that the proposed global learning method, EpiLearn, compares favorably to methods forecasting from a single past curve. In the R package implementation, EpiLearn corresponds to the EpiInvertForecast functionality. For a more detailed description of the method, see EpiInvertForecast, 2022.
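The matching-and-combining idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the EpiInvert implementation: the function name forecast_by_similar_curves, the number k of retained samples and the similarity measure (mean absolute difference) are all hypothetical choices, and the database is a synthetic set of exponential curves.

```r
# Hypothetical helper for illustration; this is NOT the EpiInvert implementation.
forecast_by_similar_curves <- function(query, database, k = 100,
                                       type = c("median", "mean")) {
  type <- match.arg(type)
  stopifnot(length(query) == 28, ncol(database) == 56)

  # Normalize the query so that its 28-day average is 1, as in the database.
  query_mean <- mean(query)
  query_norm <- query / query_mean

  past   <- database[, 1:28, drop = FALSE]   # matched against the query
  future <- database[, 29:56, drop = FALSE]  # used to build the forecast

  # Assumed similarity measure: mean absolute difference over the 28 days.
  distance <- rowMeans(abs(sweep(past, 2, query_norm)))

  # Rank the samples and keep the k most similar ones.
  nearest <- order(distance)[seq_len(min(k, nrow(database)))]

  # Combine the continuations day by day, then undo the normalization.
  combine <- if (type == "mean") mean else median
  query_mean * apply(future[nearest, , drop = FALSE], 2, combine)
}

# Toy database of normalized exponential curves (synthetic, not real data).
set.seed(1)
rates <- runif(500, -0.05, 0.05)
toy_db <- t(sapply(rates, function(r) {
  x <- exp(r * (1:56))
  x / mean(x[1:28])
}))

# Query curve: 28 days of 2% daily growth.
query <- 1000 * exp(0.02 * (1:28))
toy_forecast <- forecast_by_similar_curves(query, toy_db, k = 10)
```

On this toy database the forecast stays close to the continuation of the query's exponential growth, because the retained samples are the ones whose growth rate is nearest to that of the query.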
We use owid, a dataset containing COVID-19 epidemiological indicators for Canada, France, Germany, Italy, the UK and the USA, obtained from Our World in Data up to 2022-11-28. When a data value is not available for a given day, we assign the value 0 to the indicator. owid is a data frame containing the following variables:
iso_code : ISO code of the country
location : country name
date : date of the indicator value
new_cases : new confirmed cases
new_cases_smoothed : new confirmed cases smoothed (as provided by Our World in Data)
new_cases_restored_EpiInvert : new confirmed cases restored using EpiInvert
new_deaths : new deaths attributed to COVID-19
new_deaths_smoothed : new deaths smoothed (as provided by Our World in Data)
new_deaths_restored_EpiInvert : new deaths restored using EpiInvert
icu_patients : number of COVID-19 patients in intensive care units (ICUs) on a given day
hosp_patients : number of COVID-19 patients in hospital on a given day
weekly_icu_admissions : number of COVID-19 patients newly admitted to intensive care units (ICUs) in a given week (reporting date and the preceding 6 days)
weekly_hosp_admissions : number of COVID-19 patients newly admitted to hospitals in a given week (reporting date and the preceding 6 days)
Not all countries have recorded values for all indicators in this database. For example, France and Italy have data for all the indicators, but the rest of the countries do not. Therefore, before using the data from a country, it is advisable to explore the dataset and check which indicators have values other than zero.
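One possible way to run this check is to compute, per country, the share of days on which each numeric indicator is different from zero. The helper nonzero_share below is a hypothetical name introduced for illustration, and the data frame toy is synthetic; it only mimics the layout of owid with two made-up country codes.

```r
# Hypothetical helper: share of non-zero values per numeric column and country.
nonzero_share <- function(df, group_col = "iso_code") {
  num_cols <- names(df)[sapply(df, is.numeric)]
  aggregate(df[num_cols], by = df[group_col],
            FUN = function(x) mean(x != 0))
}

# Synthetic data frame with the same layout idea as owid (made-up values).
toy <- data.frame(
  iso_code     = rep(c("AAA", "BBB"), each = 4),
  new_cases    = c(10, 12, 0, 15, 3, 4, 5, 6),
  icu_patients = c(1, 2, 3, 4, 0, 0, 0, 0)
)

shares <- nonzero_share(toy)
shares
```

A share of 0 for an indicator means that the country has no recorded values for it. Once the owid dataset is loaded, nonzero_share(owid) should produce the same kind of table for the real data.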
EpiInvertForecast is included in the EpiInvert CRAN package, so you can install EpiInvert directly from CRAN. You can also install the development version of EpiInvert from GitHub with:
install.packages("devtools")
devtools::install_github("lalvarezmat/EpiInvert")
We attach some required packages
library(ggplot2)
library(grid)
library(EpiInvert)
library(tidyverse)
devtools::load_all(".")
We load the owid dataset:
data(owid)
summary(owid)
## iso_code location date new_cases
## Length:6007 Length:6007 Length:6007 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 2058
## Mode :character Mode :character Mode :character Median : 11336
## Mean : 37533
## 3rd Qu.: 41396
## Max. :1355242
## new_cases_smoothed new_cases_restored_EpiInvert new_deaths
## Min. : 0 Min. : 7 Min. : 0.0
## 1st Qu.: 3180 1st Qu.: 3244 1st Qu.: 31.0
## Median : 15233 Median : 15750 Median : 104.0
## Mean : 37318 Mean : 37499 Mean : 305.9
## 3rd Qu.: 42573 3rd Qu.: 43033 3rd Qu.: 310.0
## Max. :806898 Max. :854609 Max. :4389.0
## new_deaths_smoothed new_deaths_restored_EpiInvert icu_patients
## Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 40.0 1st Qu.: 41.0 1st Qu.: 326
## Median : 113.0 Median : 112.0 Median : 932
## Mean : 305.1 Mean : 305.8 Mean : 2642
## 3rd Qu.: 333.0 3rd Qu.: 342.5 3rd Qu.: 2755
## Max. :3380.0 Max. :3375.0 Max. :28891
## hosp_patients weekly_icu_admissions weekly_hosp_admissions
## Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 977.5 1st Qu.: 0.0 1st Qu.: 552
## Median : 6629.0 Median : 0.0 Median : 4590
## Mean : 13940.8 Mean : 324.7 Mean : 10666
## 3rd Qu.: 19734.0 3rd Qu.: 351.0 3rd Qu.: 10618
## Max. :154497.0 Max. :4838.0 Max. :153977
We filter the owid dataset to keep the data up to 2022-05-05:
owid <- owid %>%
  filter(date <= as.Date("2022-05-05"))
Loading some festive days for the same countries:
data(festives)
head(festives)
## USA DEU FRA UK
## 1 2020-01-01 2020-01-01 2020-01-01 2020-01-01
## 2 2020-01-20 2020-04-10 2020-04-10 2020-04-10
## 3 2020-02-17 2020-04-13 2020-04-13 2020-04-13
## 4 2020-05-25 2020-05-01 2020-05-01 2020-05-08
## 5 2020-06-21 2020-05-21 2020-05-08 2020-05-25
## 6 2020-07-03 2020-06-01 2020-05-21 2020-06-21
We load the restored incidence curve database used by EpiInvertForecast. This database contains the last 56 values of the restored incidence curves obtained from 27,418 executions of EpiInvert on real data. It is stored as a 27,418 x 56 matrix. Each restored incidence curve in the database is normalized (multiplied by a scale factor) so that the average of its first 28 values is equal to 1. To compare the curves of the database with the current curve, we normalize the current curve in the same way.
data(restored_incidence_database)
head(restored_incidence_database)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## [1,] 1.838 1.759 1.675 1.585 1.491 1.400 1.318 1.245 1.181 1.124 1.072 1.022
## [2,] 1.815 1.748 1.677 1.602 1.519 1.431 1.346 1.267 1.198 1.136 1.080 1.028
## [3,] 1.742 1.697 1.647 1.594 1.537 1.474 1.402 1.323 1.245 1.172 1.105 1.045
## [4,] 1.719 1.680 1.634 1.583 1.528 1.470 1.407 1.337 1.261 1.185 1.115 1.050
## [5,] 1.677 1.645 1.607 1.563 1.514 1.461 1.405 1.345 1.277 1.204 1.131 1.063
## [6,] 1.626 1.600 1.570 1.535 1.493 1.447 1.397 1.344 1.286 1.221 1.151 1.082
## V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24
## [1,] 0.974 0.928 0.882 0.839 0.798 0.760 0.725 0.693 0.664 0.638 0.614 0.592
## [2,] 0.979 0.930 0.883 0.838 0.795 0.755 0.717 0.683 0.652 0.624 0.599 0.577
## [3,] 0.989 0.935 0.884 0.836 0.790 0.747 0.707 0.670 0.637 0.608 0.582 0.560
## [4,] 0.992 0.938 0.888 0.840 0.795 0.752 0.711 0.674 0.640 0.610 0.583 0.559
## [5,] 1.002 0.946 0.894 0.847 0.802 0.760 0.719 0.681 0.646 0.615 0.587 0.561
## [6,] 1.017 0.959 0.906 0.858 0.813 0.771 0.731 0.693 0.657 0.625 0.595 0.568
## V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36
## [1,] 0.572 0.554 0.537 0.520 0.504 0.488 0.471 0.455 0.440 0.424 0.409 0.395
## [2,] 0.556 0.538 0.522 0.506 0.491 0.477 0.462 0.447 0.433 0.418 0.404 0.390
## [3,] 0.541 0.525 0.510 0.496 0.483 0.470 0.458 0.446 0.433 0.419 0.404 0.389
## [4,] 0.539 0.520 0.504 0.490 0.477 0.465 0.453 0.442 0.429 0.417 0.403 0.390
## [5,] 0.539 0.520 0.503 0.487 0.474 0.461 0.449 0.437 0.426 0.413 0.400 0.387
## [6,] 0.544 0.522 0.504 0.487 0.472 0.458 0.445 0.433 0.421 0.408 0.396 0.383
## V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48
## [1,] 0.381 0.369 0.358 0.349 0.340 0.332 0.323 0.318 0.317 0.322 0.332 0.346
## [2,] 0.377 0.364 0.353 0.342 0.334 0.326 0.317 0.309 0.303 0.301 0.305 0.312
## [3,] 0.374 0.360 0.346 0.333 0.321 0.309 0.299 0.290 0.281 0.273 0.266 0.264
## [4,] 0.376 0.363 0.349 0.337 0.324 0.313 0.303 0.294 0.286 0.278 0.271 0.265
## [5,] 0.374 0.360 0.347 0.334 0.322 0.310 0.299 0.289 0.281 0.274 0.266 0.259
## [6,] 0.369 0.356 0.342 0.329 0.316 0.304 0.293 0.282 0.273 0.265 0.258 0.251
## V49 V50 V51 V52 V53 V54 V55 V56
## [1,] 0.363 0.383 0.405 0.430 0.454 0.479 0.506 0.539
## [2,] 0.323 0.335 0.350 0.367 0.385 0.401 0.416 0.430
## [3,] 0.265 0.270 0.278 0.288 0.300 0.313 0.327 0.342
## [4,] 0.263 0.265 0.270 0.278 0.288 0.299 0.311 0.323
## [5,] 0.253 0.251 0.253 0.258 0.265 0.274 0.284 0.294
## [6,] 0.244 0.238 0.236 0.238 0.242 0.248 0.256 0.265
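The normalization can be checked on the printed output: the first 28 values of row [1,] above average to 1, up to the rounding of the printed digits. The short sketch below repeats that check and applies the same scaling to a raw curve. The function normalize_curve and the values in raw_curve are hypothetical and serve only as an illustration.

```r
# First 28 values of row [1,] as printed by head(restored_incidence_database).
row1_first28 <- c(
  1.838, 1.759, 1.675, 1.585, 1.491, 1.400, 1.318, 1.245, 1.181, 1.124,
  1.072, 1.022, 0.974, 0.928, 0.882, 0.839, 0.798, 0.760, 0.725, 0.693,
  0.664, 0.638, 0.614, 0.592, 0.572, 0.554, 0.537, 0.520
)
mean(row1_first28)  # close to 1, up to rounding of the printed values

# Hypothetical helper: scale a curve so that its first n_ref values average 1.
normalize_curve <- function(curve, n_ref = 28) {
  scale_factor <- 1 / mean(curve[seq_len(n_ref)])
  list(curve = curve * scale_factor, scale_factor = scale_factor)
}

# Synthetic raw curve (made-up values) normalized the same way.
raw_curve  <- 5000 * exp(-0.03 * (1:28))
normalized <- normalize_curve(raw_curve)
mean(normalized$curve)
```

Keeping the scale factor makes it possible to map a normalized forecast back to the original units by dividing by it.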
First, we apply EpiInvert to the France incidence data (for more information about EpiInvert usage, see the EpiInvert vignette):
sel <- filter(owid, iso_code=="FRA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$FRA)
We plot the results of the obtained incidences in the last 28 days
EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")
Next we execute EpiInvertForecast. Note that EpiInvertForecast takes three main arguments: (1) the outcome of the EpiInvert execution, (2) the restored incidence database and (3) the forecast option, which can be “mean” or “median”.
forecast <- EpiInvertForecast(res,restored_incidence_database,"mean")
We plot the forecast results.
EpiInvertForecast_plot(res,forecast)
Next, we use the “median” forecast option
forecast <- EpiInvertForecast(res,restored_incidence_database,"median")
EpiInvertForecast_plot(res,forecast)
Note that the predictions obtained with the mean and median options can be quite different because the distribution of the expected value for each forecast day is asymmetric. This asymmetry is visible in the confidence interval shown as the shaded area in the figures.
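The effect of asymmetry on the two options can be illustrated with a synthetic right-skewed sample. The values below are simulated from a log-normal distribution chosen only for the example; they are not taken from the database or from any EpiInvertForecast output.

```r
set.seed(42)

# Synthetic, right-skewed set of candidate values for a single forecast day.
day_values <- rlnorm(1000, meanlog = 0, sdlog = 0.8)

# The mean is pulled upwards by the long right tail; the median is not.
estimates <- c(mean = mean(day_values), median = median(day_values))
estimates

# The 2.5th and 97.5th percentiles are not symmetric around the median.
interval <- quantile(day_values, c(0.025, 0.975))
interval
```

In a sample like this one the mean exceeds the median, and the upper limit of the interval lies much further from the median than the lower limit does.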
EpiInvertForecast returns a list with the following elements:
i_restored_forecast : the forecast estimate of the restored incidence curve (the green line in the figure).
i_restored_forecast_CI025 : the 2.5th percentile of the distribution of the expected error between the incidence prediction and the true value, computed using the database (it corresponds to the lower limit of the shaded area in the figure).
i_restored_forecast_CI975 : the 97.5th percentile of the distribution of the expected error between the incidence prediction and the true value, computed using the database (it corresponds to the upper limit of the shaded area in the figure).
dates : the date of each forecast value.
i_original_forecast : the forecast of the original incidence curve obtained by dividing the forecast of the restored incidence curve by the weekly bias correction multiplicative factors obtained by EpiInvert (the blue line in the figure).
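The relation between the restored and the original forecast can be sketched as follows. The factors in weekly_factor are made-up values, and the sketch assumes for simplicity that they repeat with an exact 7-day period; the factors actually estimated by EpiInvert are not reproduced here.

```r
# Synthetic restored forecast over 14 days (made-up values).
restored_forecast <- seq(1000, 1260, by = 20)

# Hypothetical weekly bias correction factors, one per day of the week.
weekly_factor <- c(1.10, 0.90, 0.95, 1.00, 1.05, 1.30, 1.60)

# Repeat the 7 factors along the forecast horizon (simplifying assumption).
factors <- rep(weekly_factor, length.out = length(restored_forecast))

# Original-scale forecast: restored forecast divided by the factors.
original_forecast <- restored_forecast / factors
```

With a real result, the corresponding vectors are available as forecast$i_restored_forecast and forecast$i_original_forecast, with the matching dates in forecast$dates.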
Next we apply the same procedure to the Germany data:
EpiInvert execution:
sel <- filter(owid, iso_code=="DEU")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$DEU)
Plotting the results:
EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")
EpiInvertForecast execution with the “mean” option
forecast <- EpiInvertForecast(res,restored_incidence_database,"mean")
EpiInvertForecast_plot(res,forecast)
EpiInvertForecast execution with the “median” option
forecast <- EpiInvertForecast(res,restored_incidence_database,"median")
EpiInvertForecast_plot(res,forecast)
Next we apply the same procedure to the USA data:
EpiInvert execution:
sel <- filter(owid, iso_code=="USA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$USA)
Plotting the results:
EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")
EpiInvertForecast execution with the “mean” option
forecast <- EpiInvertForecast(res,restored_incidence_database,"mean")
EpiInvertForecast_plot(res,forecast)
EpiInvertForecast execution with the “median” option
forecast <- EpiInvertForecast(res,restored_incidence_database,"median")
EpiInvertForecast_plot(res,forecast)
Next we apply the same procedure to the UK data:
EpiInvert execution:
sel <- filter(owid, iso_code=="GBR")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$UK)
Plotting the results:
EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")
EpiInvertForecast execution with the “mean” option
forecast <- EpiInvertForecast(res,restored_incidence_database,"mean")
EpiInvertForecast_plot(res,forecast)
EpiInvertForecast execution with the “median” option
forecast <- EpiInvertForecast(res,restored_incidence_database,"median")
EpiInvertForecast_plot(res,forecast)
Next we show an example that includes an “a priori” trend sentiment. Assume that we believe, for whatever reason, that the future incidence is going to be higher than the one predicted by EpiInvertForecast using the whole curve database. We can use the trend_sentiment parameter to add this information to the forecast. This parameter represents the percentage of database curves (expressed as a fraction, for instance 0.25 for 25%) that we remove before computing the forecast. The curves removed from the database are the ones with the lowest growth in the last 28 days.
We use the case of the USA and set trend_sentiment=0.25, which means that we remove 25% of the database curves initially selected before computing the median of the curves.
trend_sentiment <- 0.25
sel <- filter(owid, iso_code=="USA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$USA)
forecast <- EpiInvertForecast(res,restored_incidence_database,"median",trend_sentiment)
EpiInvertForecast_plot(res,forecast)
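The filtering step behind trend_sentiment can be sketched in base R as follows. This is an illustration under assumptions, not the package code: the function apply_trend_sentiment is a hypothetical name, and growth is measured here as the ratio between the average of the last 28 days and the average of the first 28 days, which may differ from the measure used by EpiInvertForecast.

```r
# Hypothetical helper: drop the given fraction of curves with the lowest growth.
apply_trend_sentiment <- function(curves, trend_sentiment) {
  stopifnot(ncol(curves) == 56, trend_sentiment >= 0, trend_sentiment < 1)

  # Assumed growth measure: mean of days 29-56 over mean of days 1-28.
  growth <- rowMeans(curves[, 29:56, drop = FALSE]) /
    rowMeans(curves[, 1:28, drop = FALSE])

  # Sort by growth and discard the n_drop curves with the lowest values.
  n_drop <- floor(trend_sentiment * nrow(curves))
  keep   <- order(growth)[seq_len(nrow(curves) - n_drop) + n_drop]
  curves[keep, , drop = FALSE]
}

# Synthetic set of 8 exponential curves with increasing growth rates.
rates      <- seq(-0.03, 0.04, by = 0.01)
toy_curves <- t(sapply(rates, function(r) exp(r * (1:56))))

kept <- apply_trend_sentiment(toy_curves, 0.25)

# Day-by-day median forecast before and after the filtering.
median_all  <- apply(toy_curves[, 29:56], 2, median)
median_kept <- apply(kept[, 29:56], 2, median)
```

In this toy example the two curves with the lowest growth are removed, so the median of the remaining curves is higher on every forecast day than the median over all curves.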