Overview

The COVID-19 pandemic has created a radically new situation where most countries provide raw measurements of their daily incidence and disclose them in real time. This enables new machine learning forecast strategies where the prediction no longer has to be based only on the past values of the current incidence curve, but can take advantage of observations in many countries. We present a simple global machine learning procedure of this kind that uses all past daily incidence trend curves. Each of the 27,418 COVID-19 incidence trend curves in our database contains the values of 56 consecutive days extracted from observed incidence curves across 61 world regions and countries. Given a current incidence trend curve observed over the past four weeks, its forecast for the next four weeks is computed by matching it with the first four weeks of all samples and ranking the samples by their similarity to the query curve. The 28-day forecast is then obtained by a statistical estimation combining the values of the last 28 observed days in those similar samples. Using the comparison performed by the European Covid-19 Forecast Hub with the current state-of-the-art forecast methods, we verify that the proposed global learning method, EpiLearn, compares favorably to methods forecasting from a single past curve. In the R package implementation, EpiLearn corresponds to the EpiInvertForecast functionality. For a more detailed description of the method see EpiInvertForecast, 2022.
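The matching idea described above can be illustrated with a minimal base R sketch. This is not the EpiInvertForecast implementation: the function name `knn_forecast`, the Euclidean distance, the fixed number `k` of neighbours and the synthetic exponential database are all assumptions made for illustration only.

```r
# Hypothetical sketch of nearest-curve forecasting (NOT the EpiInvert code).
# db:    matrix with one 56-day curve per row, normalized so that the mean
#        of its first 28 values is 1.
# query: the last 28 observed incidence values.
knn_forecast <- function(query, db, k = 10, type = c("mean", "median")) {
  type <- match.arg(type)
  stopifnot(length(query) == 28, ncol(db) == 56, k <= nrow(db))
  scale <- mean(query)
  q <- query / scale                      # normalize the query like the database
  # Euclidean distance to the first 28 days of every curve (assumed measure)
  d <- sqrt(rowSums(sweep(db[, 1:28, drop = FALSE], 2, q)^2))
  nearest <- order(d)[seq_len(k)]         # indices of the k most similar curves
  future <- db[nearest, 29:56, drop = FALSE]
  est <- if (type == "mean") colMeans(future) else apply(future, 2, median)
  est * scale                             # back to the scale of the query
}

# Synthetic database of exponential curves with random growth rates
set.seed(1)
rates <- runif(200, -0.05, 0.05)
db <- t(sapply(rates, function(r) {
  curve <- exp(r * (1:56))
  curve / mean(curve[1:28])
}))
query <- 1000 * exp(0.02 * (1:28))
fc <- knn_forecast(query, db, k = 10, type = "median")
length(fc)
```

The sketch returns one value per forecast day. The real method may differ in how similarity is measured and in how the selected curves are combined.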

owid dataset

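We use owid, a dataset containing COVID-19 epidemiological indicators for Canada, France, Germany, Italy, the UK and the USA obtained from Our World in Data up to 2022-11-28. When a data value is not available for a given day, we assign the value 0 to the indicator. owid is a dataframe containing the following variables: iso_code, location, date, new_cases, new_cases_smoothed, new_cases_restored_EpiInvert, new_deaths, new_deaths_smoothed, new_deaths_restored_EpiInvert, icu_patients, hosp_patients, weekly_icu_admissions and weekly_hosp_admissions.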

Not all countries have recorded values for all indicators in this database. For example, France and Italy have data for all the indicators, but the rest of the countries do not. Therefore, before using the data from a country, it is advisable to explore the dataset and check which indicators have values other than zero.
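One possible way to do this check is to count, for each country, how many non-zero values every numeric indicator has. The helper `nonzero_counts` below is a hypothetical name, not part of EpiInvert, and the toy dataframe only mimics the layout of owid.

```r
# Hypothetical helper: number of non-zero values per numeric column and country
nonzero_counts <- function(df, group_col = "iso_code") {
  is_num <- vapply(df, is.numeric, logical(1))
  aggregate(df[is_num], by = df[group_col],
            FUN = function(x) sum(x != 0, na.rm = TRUE))
}

# Toy data with the same layout as owid (values are made up)
toy <- data.frame(
  iso_code     = c("FRA", "FRA", "DEU", "DEU"),
  new_cases    = c(10, 20, 5, 0),
  icu_patients = c(3, 4, 0, 0),
  stringsAsFactors = FALSE
)
counts <- nonzero_counts(toy)
counts
```

Once the dataset is loaded, the same call can be applied to it with `nonzero_counts(owid)`; an indicator whose count is 0 for a country has no recorded data there.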

Package installation

EpiInvertForecast is included in the EpiInvert CRAN package, so you can install EpiInvert directly from CRAN. You can also install the development version of EpiInvert from GitHub with:

 install.packages("devtools")
 devtools::install_github("lalvarezmat/EpiInvert")

We attach some required packages:

library(ggplot2)
library(grid)
library(EpiInvert)
library(tidyverse)
# Only needed when running from the package source directory
devtools::load_all(".")

We load the owid dataset:

data(owid)
summary(owid)
##    iso_code           location             date             new_cases      
##  Length:6007        Length:6007        Length:6007        Min.   :      0  
##  Class :character   Class :character   Class :character   1st Qu.:   2058  
##  Mode  :character   Mode  :character   Mode  :character   Median :  11336  
##                                                           Mean   :  37533  
##                                                           3rd Qu.:  41396  
##                                                           Max.   :1355242  
##  new_cases_smoothed new_cases_restored_EpiInvert   new_deaths    
##  Min.   :     0     Min.   :     7               Min.   :   0.0  
##  1st Qu.:  3180     1st Qu.:  3244               1st Qu.:  31.0  
##  Median : 15233     Median : 15750               Median : 104.0  
##  Mean   : 37318     Mean   : 37499               Mean   : 305.9  
##  3rd Qu.: 42573     3rd Qu.: 43033               3rd Qu.: 310.0  
##  Max.   :806898     Max.   :854609               Max.   :4389.0  
##  new_deaths_smoothed new_deaths_restored_EpiInvert  icu_patients  
##  Min.   :   0.0      Min.   :   0.0                Min.   :    0  
##  1st Qu.:  40.0      1st Qu.:  41.0                1st Qu.:  326  
##  Median : 113.0      Median : 112.0                Median :  932  
##  Mean   : 305.1      Mean   : 305.8                Mean   : 2642  
##  3rd Qu.: 333.0      3rd Qu.: 342.5                3rd Qu.: 2755  
##  Max.   :3380.0      Max.   :3375.0                Max.   :28891  
##  hosp_patients      weekly_icu_admissions weekly_hosp_admissions
##  Min.   :     0.0   Min.   :   0.0        Min.   :     0        
##  1st Qu.:   977.5   1st Qu.:   0.0        1st Qu.:   552        
##  Median :  6629.0   Median :   0.0        Median :  4590        
##  Mean   : 13940.8   Mean   : 324.7        Mean   : 10666        
##  3rd Qu.: 19734.0   3rd Qu.: 351.0        3rd Qu.: 10618        
##  Max.   :154497.0   Max.   :4838.0        Max.   :153977

We filter the owid dataset to keep the data up to 2022-05-05:

owid <- owid %>%
  filter(date<=as.Date("2022-05-05"))

Loading some festive days for the same countries:

data(festives)
head(festives)
##          USA        DEU        FRA         UK
## 1 2020-01-01 2020-01-01 2020-01-01 2020-01-01
## 2 2020-01-20 2020-04-10 2020-04-10 2020-04-10
## 3 2020-02-17 2020-04-13 2020-04-13 2020-04-13
## 4 2020-05-25 2020-05-01 2020-05-01 2020-05-08
## 5 2020-06-21 2020-05-21 2020-05-08 2020-05-25
## 6 2020-07-03 2020-06-01 2020-05-21 2020-06-21

We load the restored incidence curve database used by EpiInvertForecast. This database contains the last 56 values of the restored incidence curves obtained by 27,418 executions of EpiInvert using real data. The database is stored as a 27,418 × 56 matrix. Each restored incidence curve in the database is normalized (multiplied by a scale factor) so that the average of its first 28 values is equal to 1. To compare the curves of the database with the current curve, we normalize the current curve in the same way.

data(restored_incidence_database)
head(restored_incidence_database)
##         V1    V2    V3    V4    V5    V6    V7    V8    V9   V10   V11   V12
## [1,] 1.838 1.759 1.675 1.585 1.491 1.400 1.318 1.245 1.181 1.124 1.072 1.022
## [2,] 1.815 1.748 1.677 1.602 1.519 1.431 1.346 1.267 1.198 1.136 1.080 1.028
## [3,] 1.742 1.697 1.647 1.594 1.537 1.474 1.402 1.323 1.245 1.172 1.105 1.045
## [4,] 1.719 1.680 1.634 1.583 1.528 1.470 1.407 1.337 1.261 1.185 1.115 1.050
## [5,] 1.677 1.645 1.607 1.563 1.514 1.461 1.405 1.345 1.277 1.204 1.131 1.063
## [6,] 1.626 1.600 1.570 1.535 1.493 1.447 1.397 1.344 1.286 1.221 1.151 1.082
##        V13   V14   V15   V16   V17   V18   V19   V20   V21   V22   V23   V24
## [1,] 0.974 0.928 0.882 0.839 0.798 0.760 0.725 0.693 0.664 0.638 0.614 0.592
## [2,] 0.979 0.930 0.883 0.838 0.795 0.755 0.717 0.683 0.652 0.624 0.599 0.577
## [3,] 0.989 0.935 0.884 0.836 0.790 0.747 0.707 0.670 0.637 0.608 0.582 0.560
## [4,] 0.992 0.938 0.888 0.840 0.795 0.752 0.711 0.674 0.640 0.610 0.583 0.559
## [5,] 1.002 0.946 0.894 0.847 0.802 0.760 0.719 0.681 0.646 0.615 0.587 0.561
## [6,] 1.017 0.959 0.906 0.858 0.813 0.771 0.731 0.693 0.657 0.625 0.595 0.568
##        V25   V26   V27   V28   V29   V30   V31   V32   V33   V34   V35   V36
## [1,] 0.572 0.554 0.537 0.520 0.504 0.488 0.471 0.455 0.440 0.424 0.409 0.395
## [2,] 0.556 0.538 0.522 0.506 0.491 0.477 0.462 0.447 0.433 0.418 0.404 0.390
## [3,] 0.541 0.525 0.510 0.496 0.483 0.470 0.458 0.446 0.433 0.419 0.404 0.389
## [4,] 0.539 0.520 0.504 0.490 0.477 0.465 0.453 0.442 0.429 0.417 0.403 0.390
## [5,] 0.539 0.520 0.503 0.487 0.474 0.461 0.449 0.437 0.426 0.413 0.400 0.387
## [6,] 0.544 0.522 0.504 0.487 0.472 0.458 0.445 0.433 0.421 0.408 0.396 0.383
##        V37   V38   V39   V40   V41   V42   V43   V44   V45   V46   V47   V48
## [1,] 0.381 0.369 0.358 0.349 0.340 0.332 0.323 0.318 0.317 0.322 0.332 0.346
## [2,] 0.377 0.364 0.353 0.342 0.334 0.326 0.317 0.309 0.303 0.301 0.305 0.312
## [3,] 0.374 0.360 0.346 0.333 0.321 0.309 0.299 0.290 0.281 0.273 0.266 0.264
## [4,] 0.376 0.363 0.349 0.337 0.324 0.313 0.303 0.294 0.286 0.278 0.271 0.265
## [5,] 0.374 0.360 0.347 0.334 0.322 0.310 0.299 0.289 0.281 0.274 0.266 0.259
## [6,] 0.369 0.356 0.342 0.329 0.316 0.304 0.293 0.282 0.273 0.265 0.258 0.251
##        V49   V50   V51   V52   V53   V54   V55   V56
## [1,] 0.363 0.383 0.405 0.430 0.454 0.479 0.506 0.539
## [2,] 0.323 0.335 0.350 0.367 0.385 0.401 0.416 0.430
## [3,] 0.265 0.270 0.278 0.288 0.300 0.313 0.327 0.342
## [4,] 0.263 0.265 0.270 0.278 0.288 0.299 0.311 0.323
## [5,] 0.253 0.251 0.253 0.258 0.265 0.274 0.284 0.294
## [6,] 0.244 0.238 0.236 0.238 0.242 0.248 0.256 0.265
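The normalization described above can be sketched as follows. The function `normalize_curve` is a hypothetical helper, not part of EpiInvert, and the input values are made up.

```r
# Hypothetical helper: rescale a curve so that the mean of its first
# n_ref values is 1, and return the scale factor that was applied.
normalize_curve <- function(curve, n_ref = 28) {
  stopifnot(length(curve) >= n_ref)
  scale <- 1 / mean(curve[seq_len(n_ref)])
  list(curve = curve * scale, scale = scale)
}

# Made-up 28-day curve: two weeks at 200 cases, then two weeks at 400
x <- c(rep(200, 14), rep(400, 14))
nx <- normalize_curve(x)
mean(nx$curve[1:28])
```

If the database is normalized as described, `rowMeans(restored_incidence_database[, 1:28])` should return values close to 1. Dividing a normalized forecast by the stored scale factor brings it back to the units of the original curve.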

Short-term incidence forecasting in France

First, we apply EpiInvert to the France incidence data (for more information about the EpiInvert usage see the EpiInvert vignette).

sel <- filter(owid, iso_code=="FRA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$FRA)

We plot the incidence curves obtained for the last 28 days:

 EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")

Next we execute EpiInvertForecast. Here EpiInvertForecast takes 3 parameters: (1) the outcome of the EpiInvert execution, (2) the restored incidence database and (3) the forecast option, which can be “mean” or “median”.

forecast <-  EpiInvertForecast(res,restored_incidence_database,"mean")

We plot the forecast results.

 EpiInvertForecast_plot(res,forecast)

Next, we use the “median” forecast option

forecast <-  EpiInvertForecast(res,restored_incidence_database,"median")
 EpiInvertForecast_plot(res,forecast)

We note that the predictions using the mean and median options can be quite different because the distribution of the forecast values on each forecast day is asymmetric. This asymmetry can be seen in the confidence interval shown as the shaded area in the figures.
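The effect of asymmetry on the two estimators can be reproduced with synthetic numbers. The sketch below assumes, purely for illustration, that the values of the matched curves on a given forecast day follow a right-skewed log-normal distribution; it does not use EpiInvert output.

```r
set.seed(42)
# Hypothetical values of 500 matched curves on one forecast day (right-skewed)
samples <- rlnorm(500, meanlog = 0, sdlog = 0.8)

# The long right tail pulls the mean above the median
estimates <- c(mean = mean(samples), median = median(samples))
estimates

# An asymmetric 90% interval around the median
quantile(samples, c(0.05, 0.5, 0.95))
```

With a right-skewed distribution, a few very large values raise the mean while the median stays near the bulk of the curves, so the "mean" forecast lies above the "median" one.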

EpiInvertForecast returns a list with the following elements:

Short-term incidence forecasting in Germany

Next we apply the same procedure to the Germany data:

EpiInvert execution:

sel <- filter(owid, iso_code=="DEU")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$DEU)

Plotting the results:

 EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")

EpiInvertForecast execution with the “mean” option

forecast <-  EpiInvertForecast(res,restored_incidence_database,"mean")
 EpiInvertForecast_plot(res,forecast)

EpiInvertForecast execution with the “median” option

forecast <-  EpiInvertForecast(res,restored_incidence_database,"median")
 EpiInvertForecast_plot(res,forecast)

Short-term incidence forecasting in the USA

Next we apply the same procedure to the USA data:

EpiInvert execution:

sel <- filter(owid, iso_code=="USA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$USA)

Plotting the results:

 EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")

EpiInvertForecast execution with the “mean” option

forecast <-  EpiInvertForecast(res,restored_incidence_database,"mean")
 EpiInvertForecast_plot(res,forecast)

EpiInvertForecast execution with the “median” option

forecast <-  EpiInvertForecast(res,restored_incidence_database,"median")
 EpiInvertForecast_plot(res,forecast)

Short-term incidence forecasting in the UK

Next we apply the same procedure to the UK data:

EpiInvert execution:

sel <- filter(owid, iso_code=="GBR")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$UK)

Plotting the results:

 EpiInvert_plot(res,"incid","2022-04-08","2022-05-05")

EpiInvertForecast execution with the “mean” option

forecast <-  EpiInvertForecast(res,restored_incidence_database,"mean")
 EpiInvertForecast_plot(res,forecast)

EpiInvertForecast execution with the “median” option

forecast <-  EpiInvertForecast(res,restored_incidence_database,"median")
 EpiInvertForecast_plot(res,forecast)

Short-term incidence forecasting including trend sentiment

Next we show an example including the “a priori” trend sentiment. Assume that we believe, for any reason, that the future evolution of the incidence is going to be higher than the one expected using EpiInvertForecast with the full curve database. We can use the trend_sentiment parameter to add this information to the forecast. This parameter represents the fraction of database curves that we remove before computing the forecast. The curves that we remove from the database are the ones with the lowest growth in the last 28 days.

We use the case of the USA and set trend_sentiment=0.25, which means that we remove 25% of the database curves initially selected before computing the median of the curves.

trend_sentiment <- 0.25
sel <- filter(owid, iso_code=="USA")
res <- EpiInvert(sel$new_cases,"2022-05-05",festives$USA)
forecast <-  EpiInvertForecast(res,restored_incidence_database,"median",trend_sentiment)
 EpiInvertForecast_plot(res,forecast)
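The filtering step behind trend_sentiment can be sketched in base R. This is an illustration under assumptions, not the EpiInvert code: the helper name `drop_lowest_growth` is hypothetical, and growth is measured here as the ratio between the mean of the last 28 days and the mean of the first 28 days of each curve, which may differ from the measure used by the package.

```r
# Hypothetical sketch: remove the fraction `trend_sentiment` of curves with
# the lowest growth over their last 28 days (growth measure is an assumption).
drop_lowest_growth <- function(curves, trend_sentiment) {
  stopifnot(ncol(curves) == 56, trend_sentiment >= 0, trend_sentiment < 1)
  growth <- rowMeans(curves[, 29:56, drop = FALSE]) /
            rowMeans(curves[, 1:28, drop = FALSE])
  n_drop <- floor(trend_sentiment * nrow(curves))
  # order() sorts by increasing growth; skip the first n_drop positions
  keep <- order(growth)[seq_len(nrow(curves)) > n_drop]
  curves[sort(keep), , drop = FALSE]
}

# Eight synthetic exponential curves with growth rates from -0.03 to 0.04
rates <- (-3:4) / 100
curves <- t(sapply(rates, function(r) exp(r * (1:56))))
kept <- drop_lowest_growth(curves, 0.25)

med_all  <- median(curves[, 56])   # day-56 median using all curves
med_kept <- median(kept[, 56])     # day-56 median after removing 25%
c(all = med_all, kept = med_kept)
```

Removing the slowest-growing curves shifts the median of the remaining ones upwards, which is how the sentiment that the incidence will be higher than expected enters the forecast.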