Predict the growth rate at horizon \(k\) using the growth rate observed over the last \(l\) days
The motivation for using a functional regression model to predict COVID-19 cases comes from the classic SIR epidemiological model, whose equation for infections is \[\frac{dI}{dt}=\beta S I-\gamma I\] where \(\beta\) is the infection rate, \(\gamma\) is the recovery rate, \(S\) is the population susceptible to infection and \(I\) is the number of infections. Dividing both sides by \(I\) gives \[GR=\frac{dI/dt}{I}=\beta S - \gamma\] which shows that the growth rate of infections (\(GR\)) is a function of \(S\). Extending this idea and discretizing the equation, we can pose the following functional regression model: \[ GR_{t+h}=f(S_{t-l}^t,GR_{t-l}^t,I_{t-l}^t,\ldots)+\epsilon_{t+h} \] where the notation \(X_{t-l}^t\) refers to the process \(X\) on the interval \([t-l,t]\), \(h\) is the prediction horizon, and \[GR_{t+h}=\frac{I_{t+h}-I_{t}}{I_{t}+c}\] The constant \(c\) is added to the denominator to avoid division by zero.
The function \(f\) links the scalar response to the functional covariates; the literature on functional data provides several possibilities for it.
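As an illustration of the discretized growth rate \(GR_{t+h}=(I_{t+h}-I_{t})/(I_{t}+c)\), the following is a minimal Python sketch. The function name `growth_rate`, the default value `c=1.0` and the sample counts are assumptions made for the example; they are not taken from the original analysis.

```python
def growth_rate(infections, h, c=1.0):
    """Return GR_{t+h} = (I_{t+h} - I_t) / (I_t + c) for every t with t + h in range.

    infections: sequence of infection counts I_0, I_1, ...
    h: prediction horizon (positive integer)
    c: constant added to the denominator to avoid division by zero
       (the value 1.0 is an assumption for this sketch)
    """
    if h < 1:
        raise ValueError("h must be a positive integer")
    return [
        (infections[t + h] - infections[t]) / (infections[t] + c)
        for t in range(len(infections) - h)
    ]


# Hypothetical infection counts, used only to exercise the formula.
infections = [0, 1, 3, 7, 15]
gr = growth_rate(infections, h=1, c=1.0)
print(gr)
```

With these made-up counts the first term is \((1-0)/(0+1)\), which would be undefined without the constant \(c\).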
Spanish region dataset. Confirmed, hospitalised, intensive care unit (ICU), death and recovered cases by Autonomous Community of Spain, available at Situation of COVID-19 in Spain from Instituto de Salud Carlos III. Data are updated daily in DATA2. The structure of this file is not stable over time. The current variables are: CCAA, FECHA, CASOS, PCR+, TestAc+, Hospitalizados, UCI, Fallecidos, Recuperados. Please read the notes at the end of the CSV.
Italian region dataset. Confirmed, hospitalised, intensive care unit (ICU), death and recovered cases by region of Italy, available at COVID-19 Italia - Monitoraggio situazione (Dipartimento della Protezione Civile) from Presidenza del Consiglio dei Ministri - Dipartimento della Protezione Civile. Data are updated daily in DATA3.
Catalonia region dataset. These data come from the RSAcovid19 register of the Health Department and contain the accumulated positive cases, that is, people who tested positive on a diagnostic test (PCR or rapid test). They also include the accumulated suspected cases: people who presented symptoms at some point and were classified by a health professional as a possible case, but who do not have a positive diagnostic test (PCR or rapid test). The surveillance service registered all cases and assigned each one to the residence zone indicated on the person’s health card. The information is updated daily as open data at Dades obertes de Catalunya.
Madrid region dataset. Portal de Datos Abiertos de la Comunidad de Madrid and new app.
The availability of up-to-date, good-quality data determines which training sample can be selected and the resolution at which predictions, estimates or forecasts can be made.
Note: the number of new active cases can be negative on days when new recoveries \(+\) new deaths exceed new confirmed cases.
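The note above implies that new active cases are obtained as new confirmed cases minus new recoveries and new deaths. A minimal Python sketch of that arithmetic follows; the function name and the daily figures are hypothetical and serve only to show how a negative value arises.

```python
def new_active_cases(new_confirmed, new_recovered, new_deaths):
    """Daily change in active cases: confirmed minus (recovered plus deaths)."""
    return [
        c - r - d
        for c, r, d in zip(new_confirmed, new_recovered, new_deaths)
    ]


# Hypothetical daily figures for three consecutive days.
confirmed = [100, 50, 20]
recovered = [10, 40, 30]
deaths = [5, 5, 5]

active = new_active_cases(confirmed, recovered, deaths)
print(active)  # the third day is negative: 20 - 30 - 5
```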
Related to the idea of “flattening the curve”, we consider the curve (\(r_{1}^{(j)}(t)\)) that captures how the growth rate changes over time. We also smooth this signal to dampen sudden changes in reporting (such as the weekend effect).
Objective: Predict the growth rate at horizon \(k\) using the growth rate observed over the last 15 days, H\(_1\):
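The text does not state which smoother is applied to the growth-rate curve. Purely as an illustration of how a weekly reporting pattern can be dampened, the sketch below uses a centred 7-day moving average; the choice of smoother, the window length and the sample series are all assumptions.

```python
def moving_average(values, window=7):
    """Centred moving average; the window is truncated at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical series with a weekend effect: under-reporting on day 6,
# catch-up on day 7, repeated over three weeks.
raw = [1, 1, 1, 1, 1, 0, 2] * 3
smoothed = moving_average(raw, window=7)
print(smoothed[3:-3])  # interior points, where the full 7-day window is available
```

Any full 7-day window covers exactly one weekly cycle, so the weekend dip and the following catch-up cancel out in the interior of the series.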
\[R_{1}(0)=\{r_1^{(j)}(-14),\ldots,r_1^{(j)}(0)\}\]
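To make the set \(R_{1}(0)\) concrete, the following Python sketch slides a 15-day window over a growth-rate series and pairs each window with the value \(k\) days after its last point. The function name, the use of the series value at \(t+k\) as the response, and the sample series are assumptions made for illustration.

```python
def make_training_pairs(r, k, l=15):
    """Pair each window {r(t-l+1), ..., r(t)} with the value k days ahead, r(t+k).

    r: growth-rate series for one region
    k: prediction horizon in days
    l: window length in days (15 in the text)
    """
    pairs = []
    for t in range(l - 1, len(r) - k):
        window = r[t - l + 1 : t + 1]
        pairs.append((window, r[t + k]))
    return pairs


# Hypothetical 30-day series; the values are just day indices so the
# alignment of windows and targets is easy to read.
series = list(range(30))
pairs = make_training_pairs(series, k=7)
first_window, first_target = pairs[0]
print(len(pairs), first_window[0], first_window[-1], first_target)
```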
Data: incidences only, from Instituto de Salud Carlos III (ISCIII)
The file obtained from Instituto de Salud Carlos III (ISCIII) has undergone changes over time in the units of its variables. Typically, the historical data are not reconstructed.
All these models are implemented in the fda.usc package (Febrero-Bande and Oviedo de la Fuente 2012).
fregre.lm, Linear Model (FLM) (Cardot, Ferraty, and Sarda 1999): A linear operator on the functional space is used: \(f(X_{a}^b,Y_{c}^d,\ldots)=\alpha+\int_{a}^b\beta_X(t)X(t)dt+\int_c^d\beta_Y(t)Y(t)dt+\ldots\)
fregre.gsam, Spectral Additive Model (FSAM) (Müller and Yao 2008): Given a finite representation of the curves in a basis of the Hilbert space, \(X_{a}^b\approx\sum_{k=1}^{K_X} c_k^X\phi_k^X\), then \(f(X_{a}^b,Y_{c}^d,\ldots)=\alpha+\sum_{k=1}^{K_X}f_k^X(c_k^X)+\sum_{k=1}^{K_Y}f_k^Y(c_k^Y)+\ldots\) where the functions \(f_k^X(c_k^X)\) and \(f_k^Y(c_k^Y)\) of the scalar coefficients of the basis representation are smooth.
fregre.gkam, Additive Kernel Model (FKAM) (Febrero-Bande and González-Manteiga 2013): \(f(X_{a}^b,Y_{c}^d,\ldots)=\alpha+f_X(X_{a}^b)+f_Y(Y_{c}^d)+\ldots\) where the function \(f_X\) (resp. \(f_Y\)) is estimated using a functional kernel.
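The formulas above can be made more tangible with a small numerical sketch in plain Python. It is not the fda.usc implementation and does not estimate any coefficients: `flm_predict` only evaluates \(\alpha+\int\beta_X(t)X(t)dt\) for a given \(\beta_X\) using the trapezoidal rule, and `kernel_predict` is a single-covariate Nadaraya-Watson estimator with a Gaussian kernel on the \(L_2\) distance between discretized curves. The function names, the Gaussian kernel, the bandwidth and all sample values are assumptions.

```python
import math


def trapezoid(y, t):
    """Trapezoidal approximation of the integral of y over the grid t."""
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(t) - 1))


def flm_predict(alpha, beta, x, t):
    """Evaluate alpha + integral of beta(t) * x(t) dt on a common grid t."""
    return alpha + trapezoid([b * xi for b, xi in zip(beta, x)], t)


def l2_distance(x, y):
    """Euclidean distance between two curves observed on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_predict(train_curves, train_y, new_curve, bandwidth):
    """Nadaraya-Watson prediction: kernel-weighted mean of the training responses."""
    weights = [
        math.exp(-0.5 * (l2_distance(c, new_curve) / bandwidth) ** 2)
        for c in train_curves
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Linear model: beta(t) = 1 and x(t) = t on [0, 1], so the integral is 1/2.
grid = [0.0, 0.5, 1.0]
flm_value = flm_predict(alpha=2.0, beta=[1.0, 1.0, 1.0], x=grid, t=grid)

# Kernel model: the new curve lies halfway between the two training curves.
curves = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
responses = [0.0, 10.0]
kernel_value = kernel_predict(curves, responses, [0.5, 0.5, 0.5], bandwidth=1.0)

print(flm_value, kernel_value)
```

The additive kernel model combines one such kernel component per functional covariate; that fitting step is deliberately left out of this sketch.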
Other models
More information available in Informest
Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions, see R Shiny
Shiny apps are easy to write. No web development skills are required.
Shinyapps.io: Host your Shiny apps on the web in minutes with Shinyapps.io. It is easy to use, secure, and scalable. No hardware, installation, or annual purchase contract required. Free and paid options available.
Deploy your Shiny apps and interactive documents on-premises with open source Shiny Server, which offers features such as multiple apps on a single server and deployment of apps behind firewalls.
RStudio Server: Enables you to provide a browser-based interface to a version of R running on a remote Linux server, bringing the power and productivity of the RStudio IDE to server-based deployments of R.
R Markdown: Analyze. Share. Reproduce.
htmlwidgets: Embed widgets in R Markdown documents and Shiny web applications
readxl: Read Excel files
jsonlite: A reasonably fast JSON parser and generator, optimized for statistical data and the web
DT: DT::datatable creates an HTML widget to display R data objects
foreign: Reading and writing data stored by some versions of ‘Epi Info’, ‘Minitab’, ‘S’, ‘SAS’, ‘SPSS’, ‘Stata’, ‘Systat’, ‘Weka’, and for reading and writing some ‘dBase’ files.
tabulizer: Extracts tables from PDFs in R
dygraphs: Automatically plots xts time series objects (or any object convertible to xts).
leaflet: Embed maps in knitr/R Markdown documents and Shiny apps
plotly: Plotly’s R graphing library makes interactive, publication-quality graphs
dplyr: A grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges
rgdal: Bindings for the ‘Geospatial’ Data Abstraction Library
ggplot2: Pyramid plot in R
googleAnalyticsR: R library for working with Google Analytics data
maptools: Exploratory spatial data analysis is a set of techniques to describe and visualize spatial distributions, identify atypical locations or spatial outliers, discover patterns of spatial association, clusters or hot spots, and suggest spatial regimes or other forms of spatial heterogeneity (Dell’arba, 2005; Anselin, 1988)
This work has been supported by Project MTM2016-76969-P from Ministerio de Economía y Competitividad - Agencia Estatal de Investigación and European Regional Development Fund (ERDF) and IAP network StUDyS from Belgian Science Policy.
Cardot, Hervé, Frédéric Ferraty, and Pascal Sarda. 1999. “Functional Linear Model.” Statistics & Probability Letters 45 (1): 11–22.
Chiou, Jeng-Min, Hans-Georg Müller, and Jane-Ling Wang. 2004. “Functional Response Models.” Statistica Sinica 14 (3): 675–94.
Febrero-Bande, Manuel, and Wenceslao González-Manteiga. 2013. “Generalized Additive Models for Functional Data.” Test 22 (2): 278–92. http://dx.doi.org/10.1007/s11749-012-0308-0.
Febrero-Bande, Manuel, and Manuel Oviedo de la Fuente. 2012. “Statistical Computing in Functional Data Analysis: The R Package fda.usc.” Journal of Statistical Software 51 (4): 1–28.
Müller, Hans-Georg, and Fang Yao. 2008. “Functional Additive Model.” Journal of the American Statistical Association 103: 1534–44.
Oviedo de la Fuente, Manuel, Manuel Febrero-Bande, María Pilar Muñoz, and Àngela Domínguez. 2018. “Predicting Seasonal Influenza Transmission Using Functional Regression Models with Temporal Dependence.” PLoS ONE 13 (4): e0194250.