This paper presents a block-based deep neural architecture for univariate time series point forecasting, similar in philosophy to the very deep models (e.g. ResNet) used in more common deep learning applications such as image recognition. The authors also demonstrate how their approach can be used to build predictive models that are interpretable.
What can we learn from this paper?
That it is possible to build a pure deep learning model for time series forecasting that captures long-term trend and seasonality, and that beats the accuracy of existing models combining ML and statistical approaches on common benchmark datasets.
Prerequisites (to understand the paper, what does one need to be familiar with?)
- Time series forecasting
- Residual neural networks
The goal is to automate time-series forecasting within a fully ML-based framework, while retaining the interpretability of statistical models (expressed, for example, through explicit seasonality and trend components).
The entire suggested deep architecture is built from the same type of basic building block (shown in blue in the picture). Each block has one input and two outputs (called the forecast and the backcast). The global model output is the sum of the forecasts of all blocks in the network, while each block's backcast is subtracted from that block's input to create the input of the following block. As a result, the next input no longer contains the part already explained by the previous block, so subsequent blocks can concentrate on what remains unexplained.
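The doubly residual flow described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the "blocks" here are fixed random linear maps rather than trained fully connected networks, and the window sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
lookback, horizon, n_blocks = 12, 4, 3

# Hypothetical "blocks": each is a pair of linear maps producing
# a backcast (same length as the input) and a forecast.
blocks = [
    (rng.standard_normal((lookback, lookback)) * 0.1,  # backcast weights
     rng.standard_normal((lookback, horizon)) * 0.1)   # forecast weights
    for _ in range(n_blocks)
]

def run_network(x, blocks, horizon):
    """Sum block forecasts; subtract each backcast from the next input."""
    residual = x
    global_forecast = np.zeros(horizon)
    for w_back, w_fore in blocks:
        backcast = residual @ w_back
        forecast = residual @ w_fore
        residual = residual - backcast  # next block sees only the unexplained part
        global_forecast += forecast     # global output is the sum of all forecasts
    return global_forecast

x = rng.standard_normal(lookback)
y_hat = run_network(x, blocks, horizon)
print(y_hat.shape)  # (4,)
```

In the actual model, each linear map above is replaced by a small fully connected network, but the residual bookkeeping is the same.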
Each block has four fully connected layers with ReLU activations before splitting into forecast and backcast branches. Each branch has one more fully connected layer without activation, and then a linear basis layer that can be either learned or instead engineered to account for different effects such as trend and seasonality. Since the overall global output is a simple sum of partial outputs of each block, knowing the nature of each basis layer allows the user to estimate the contribution of each component, thus providing interpretability.
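The interpretable basis layers can be sketched like this. The polynomial degree, number of Fourier harmonics, and coefficient values below are arbitrary choices for illustration; in the model, the coefficients (theta) would come from the block's final fully connected layer.

```python
import numpy as np

horizon = 8
t = np.arange(horizon) / horizon  # normalized forecast time steps

# Trend basis: low-degree polynomials (degree 2 chosen for illustration)
trend_basis = np.stack([t**p for p in range(3)])           # shape (3, horizon)

# Seasonality basis: Fourier terms (2 harmonics chosen for illustration)
seasonal_basis = np.stack(
    [np.cos(2 * np.pi * k * t) for k in range(1, 3)]
    + [np.sin(2 * np.pi * k * t) for k in range(1, 3)]
)                                                           # shape (4, horizon)

# Example coefficients standing in for the learned theta outputs
theta_trend = np.array([1.0, 0.5, -0.2])
theta_seasonal = np.array([0.3, 0.1, -0.1, 0.05])

trend_part = theta_trend @ trend_basis
seasonal_part = theta_seasonal @ seasonal_basis
forecast = trend_part + seasonal_part  # each component is separately inspectable
```

Because the forecast decomposes into a sum over known bases, the user can plot `trend_part` and `seasonal_part` separately, which is the source of the interpretability claimed above.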
Essentially, the suggested model's M identical stacks of K blocks each could be collapsed into a simple sequence of M×K blocks. However, separating them into stacks lets all blocks within a stack share learnable parameters, which improves performance. In addition, the network can be given a fixed structure (e.g. a trend stack followed by a seasonality stack), serving both interpretability and forecasting accuracy.
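Weight sharing within a stack can be sketched by reusing one parameter set across a stack's K blocks. As before, this is an illustrative sketch with random linear maps standing in for trained blocks; all dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
lookback, horizon = 12, 4
n_stacks, blocks_per_stack = 2, 3

# One shared (backcast, forecast) parameter set per stack
stacks = [
    (rng.standard_normal((lookback, lookback)) * 0.1,
     rng.standard_normal((lookback, horizon)) * 0.1)
    for _ in range(n_stacks)
]

residual = rng.standard_normal(lookback)
global_forecast = np.zeros(horizon)
for w_back, w_fore in stacks:
    for _ in range(blocks_per_stack):  # all K blocks in a stack reuse the same weights
        global_forecast += residual @ w_fore
        residual = residual - residual @ w_back
```

Note that although the 2 stacks of 3 blocks unroll into a sequence of 6 block applications, only 2 parameter sets exist, which is what distinguishes the stacked arrangement from a plain M×K block sequence.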
The developed model was applied to a number of benchmark datasets, such as the M3, M4, and Tourism (also here) datasets, in each case obtaining prediction accuracy better than or comparable with the current state of the art. To improve performance, the authors ensembled multiple predictions obtained with different random initializations, input windows of different lengths, and different training loss metrics.
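The ensembling step reduces to aggregating the per-step forecasts of the individual models; a median is one robust choice of aggregation. The member forecasts below are made-up numbers for illustration.

```python
import numpy as np

# Rows: forecasts from models trained with different seeds / window
# lengths / loss metrics (values are invented for this example).
member_forecasts = np.array([
    [10.0, 11.0, 12.0],
    [ 9.5, 11.5, 12.5],
    [10.5, 10.5, 13.0],
])

# Aggregate per forecast step across ensemble members
ensemble_forecast = np.median(member_forecasts, axis=0)
print(ensemble_forecast)  # [10.  11.  12.5]
```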
A third-party Python implementation of the suggested model (not by the paper authors) can be found here.
Overall, it seems like a very important paper that advances our ability to perform time-series predictions using deep learning.