
Financial Time Series Use Case 

Developing accurate forecasting methodologies for financial time series remains a key research topic from both a theoretical and an applied viewpoint. Traditionally, researchers have aimed to construct a causal model, based on econometric modelling, that explains the variations in a specific time series as a function of other inputs. Yet traditional approaches often struggle to model high-dimensional, non-linear landscapes, which are often characterized by missing or sparse inputs.

The Need for a New Approach to Explainability for Financial Time Series

Recently, deep learning (DL) has gained wide popularity across data science and is increasingly applied to forecasting financial and economic time series. Recurrent networks are well suited to time series modelling because of their memory state and their ability to learn relations through time; convolutional neural networks (CNNs) can also capture temporal relationships.


The literature offers various examples of the application of DL methods to stock and forex market forecasting, with results that significantly outperform traditional counterparts. This was also confirmed in more recent installments of the Makridakis Forecasting Competitions, which have been held roughly once a decade since 1982 with the objective of comparing the accuracy of different forecasting methods. A recurring conclusion from these competitions had been that traditional, simpler methods often perform as well as their more complex counterparts. This changed in the latest editions of the series, M4 and M5, which were won by a hybrid Exponential Smoothing Recurrent Neural Network method and by LightGBM, respectively.

The introduction of DL methods for financial time series forecasting potentially enables higher predictive accuracy, but this comes at the cost of higher complexity and thus lower interpretability. DL models are often referred to as “black boxes” because it is difficult to understand how variables are jointly related to arrive at a certain output. This reduced ability to understand the inner workings and mechanisms of DL models unavoidably affects their trustworthiness and the willingness of practitioners to deploy such methods in sensitive domains such as finance. As a result, scientific interest in the field of explainable AI (XAI) has grown tremendously within the last few years.

The growing popularity of the topic notwithstanding, research on XAI in finance remains limited, and most existing explainability techniques are not suited to time series, let alone to non-stationary financial time series and their somewhat notorious stylized facts. Many state-of-the-art XAI methods were originally tailored to certain input types such as images (e.g. saliency maps) or text (e.g. LRP) and were later adjusted to suit tabular data as well. However, the temporal dimension is often omitted, and the literature currently offers only limited consideration of the topic. Notable examples are interpretable decision trees for time series classification and the use of attention mechanisms, with none of the applications looking specifically at explainability for financial data.

So, what is the SOLUTION? 

We propose a family of explainability functions, which we call X functions, for assigning a meaning to the outputs of complex models (in this case, neural nets) over time. In order to preserve data integrity as well as model integrity, we propose to analyze the effect of infinitesimal changes of the explanatory variables on some function of the net output at each time point t = 1, ..., T.

To do this, we start with the simplest case, in which the X function is the identity, and we look at the derivative of the output with respect to the input data. This gives us the sensitivities of the outputs to each explanatory variable in the net over time. To complete the explanation derived from the identity, we add an intercept to each output neuron, so that the output at each time point is written as an intercept plus a weighted sum of the inputs.
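The relation described above can be written out explicitly. What follows is a reconstruction from the surrounding description, not the authors' own notation: the symbols o_t (net output at time t), x_{jt} (the j-th explanatory variable at time t), w_{jt} (its time-varying weight) and b_t (the intercept) are assumed names, and a single output neuron is considered.

```latex
o_t = b_t + \sum_{j=1}^{n} w_{jt}\, x_{jt},
\qquad
w_{jt} = \frac{\partial o_t}{\partial x_{jt}},
\qquad
b_t = o_t - \sum_{j=1}^{n} w_{jt}\, x_{jt},
\qquad
t = 1, \ldots, T.
```

Under this reading, the intercept is simply whatever is needed for the linear expression to reproduce the net output exactly at each time point.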

Put simply, what we are interested in are these weights. Unlike in a linear regression, where they are fixed, here they are generally functions of time. These weights are what we call linear parameter data (LPD). The LPD is essentially a matrix of dimensions T * (n+1), where n is the number of explanatory variables, and these dimensions stay the same irrespective of the net’s architecture.
In essence, the LPD can be interpreted as an exact replication of the net by a linear model at each time point: we preserve the natural ordering of the series and extract the sensitivities of the outputs to small changes in the inputs at each time point.
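To make the T * (n+1) structure concrete, here is a minimal sketch in plain Python. It is not the project's implementation: it assumes a one-hidden-layer net with sigmoid hidden units, a linear output and random (untrained) parameters, and the helper names make_net, forward, lpd_row and lpd_matrix are hypothetical. The gradient is obtained by the chain rule, and the intercept is chosen so that the linear model reproduces the net output at each time point.

```python
import math
import random

def make_net(n_inputs=6, n_hidden=100, seed=0):
    """Random one-hidden-layer net (assumed: sigmoid hidden units, linear output)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
    v = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
    b2 = rng.gauss(0, 0.5)
    return W, b1, v, b2

def hidden(net, x):
    """Sigmoid activations of the hidden layer for input vector x."""
    W, b1, _, _ = net
    return [1.0 / (1.0 + math.exp(-(b + sum(w * xj for w, xj in zip(row, x)))))
            for row, b in zip(W, b1)]

def forward(net, x):
    """Net output for input vector x."""
    _, _, v, b2 = net
    return b2 + sum(vk * hk for vk, hk in zip(v, hidden(net, x)))

def lpd_row(net, x):
    """Return [intercept, w_1, ..., w_n] for one time point."""
    W, _, v, _ = net
    h = hidden(net, x)
    # Chain rule: d o / d x_j = sum_k v_k * h_k * (1 - h_k) * W[k][j]
    grads = [sum(vk * hk * (1.0 - hk) * row[j] for vk, hk, row in zip(v, h, W))
             for j in range(len(x))]
    # Intercept chosen so that intercept + grads . x equals the net output.
    intercept = forward(net, x) - sum(g * xj for g, xj in zip(grads, x))
    return [intercept] + grads

def lpd_matrix(net, X):
    """Stack one LPD row per time point: T rows, n + 1 columns."""
    return [lpd_row(net, x) for x in X]

# Toy usage: T = 5 time points, n = 6 inputs (synthetic data, not Bitcoin returns).
net = make_net()
rng = random.Random(1)
X = [[rng.gauss(0, 0.03) for _ in range(6)] for _ in range(5)]
lpd = lpd_matrix(net, X)
print(len(lpd), len(lpd[0]))
```

The shape of the result depends only on T and n, not on the number of hidden units, which matches the statement that the LPD dimensions are independent of the architecture.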

Application: Bitcoin Returns

Let's look at a specific example: we train a simple neural network model to predict Bitcoin returns at time t, taking 6 lagged values as inputs. In terms of architecture, we start with a simple model with one hidden layer of 100 neurons; the parameters to be estimated are therefore 100*6 + 100 weights and 100 + 1 biases.
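The parameter count can be checked with a few lines of Python. This is only an arithmetic sketch for a fully connected feedforward net; the helper name count_parameters is hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected feedforward net."""
    # One weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases

# 6 lagged returns -> 100 hidden neurons -> 1 output neuron
weights, biases = count_parameters([6, 100, 1])
print(weights, biases, weights + biases)  # 700 101 801
```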











Here we optimize the net 100 times, based on different random initializations of its parameters, and we compute the trading performance of each random net based on the simple sign rule: go long or short on tomorrow’s return depending on the sign of today’s forecast. The resulting cohort of cumulated (out-of-sample) trading performances is displayed in Figure 2, with the mean performance in the center (bold blue line).
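The sign rule can be sketched as follows. This is an illustrative reading rather than the project's code: it assumes that returns are log-returns (so performances cumulate by summation), that forecasts[t] is the forecast made at time t for time t + 1, and that transaction costs are ignored. The function names are hypothetical.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_rule_performance(forecasts, returns):
    """Cumulated performance of the sign rule.

    The position held over day t + 1 is sign(forecasts[t]); the realized
    profit is that position times returns[t + 1].
    """
    cumulated, total = [], 0.0
    for t in range(len(returns) - 1):
        total += sign(forecasts[t]) * returns[t + 1]
        cumulated.append(total)
    return cumulated

def mean_performance(paths):
    """Pointwise mean of several cumulated performance paths (one per random net)."""
    return [sum(values) / len(values) for values in zip(*paths)]

# Toy usage with made-up numbers.
returns = [0.0, 0.03, -0.01, 0.02]
forecasts = [0.01, -0.02, 0.005, 0.0]
path = sign_rule_performance(forecasts, returns)
print(path)
```

Applying sign_rule_performance to the forecasts of each of the 100 random nets and passing the resulting paths to mean_performance would give a mean path of the kind drawn in bold blue in Figure 2.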






Next, we turn to the explainability function, i.e. the linear first-order approximation of the net. With the algorithm developed, we obtain the derivatives for each output neuron (these are the new data flow we call LPDs). To give an intuition for the sensitivities/explanations we want to obtain, let's imagine the following brute-force approach:

  1. We start by training a neural network (NN) on the specified inputs and response and store the results  

  2. Next, we perturb a selected input slightly 

  3. We use the trained NN and make the predictions for the changed inputs

  4. For each changed variable, we collect the perturbed data and the corresponding NN-output

  5. We fit a linear model and obtain the weights. 

  6. We train the net for 100 different random initializations and we observe the dependency of the LPDs across the different random nets.
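Steps 2 to 5 of the brute-force procedure can be sketched as below. The code is illustrative only: toy_net is a hypothetical stand-in for a trained net (any smooth function of the inputs would do), and the step size and number of perturbation points are arbitrary assumptions.

```python
import math

def toy_net(x):
    """Hypothetical stand-in for a trained net with two inputs."""
    return math.tanh(0.8 * x[0] - 0.5 * x[1]) + 0.3 * x[1]

def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def brute_force_sensitivities(net, x, step=1e-4, n_points=5):
    """Perturb each input in turn, re-evaluate the net, and fit a line."""
    offsets = [step * (k - n_points // 2) for k in range(n_points)]
    slopes = []
    for j in range(len(x)):
        perturbed_inputs, outputs = [], []
        for d in offsets:
            x_pert = list(x)
            x_pert[j] += d                       # step 2: perturb one input slightly
            perturbed_inputs.append(x_pert[j])
            outputs.append(net(x_pert))          # steps 3-4: predict and collect
        slopes.append(ols_slope(perturbed_inputs, outputs))  # step 5: linear fit
    return slopes

x0 = [0.1, -0.2]
slopes = brute_force_sensitivities(toy_net, x0)
print(slopes)
```

For this toy function the fitted slopes agree closely with the analytic partial derivatives, which is the quantity the derivative-based approach delivers directly and without any refitting.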

With the derivatives approach, we instead obtain the exact sensitivities of the outputs to each explanatory variable in the net over time, without recomputing perturbed outputs from perturbed inputs and without refitting.

What do we learn from the LPDs? Our initial insights suggest that the time-varying dependency of the data measured by the LPDs is indicative of different states of the market. In particular, weak dependency (small absolute LPD) is an indicator of randomness or ‘chaos’. We therefore propose a simple rule for managing risks: exit markets at times tagged as chaotic by the LPD (see Figure 3 for an illustration).












As Figure 3 shows, exits (shaded in grey) occur if today’s out-of-sample mean-LPD (green) drops below the 1/7-quantile computed over a rolling window of length one quarter of its own history.
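The exit rule can be sketched as a rolling empirical quantile. Several details below are assumptions rather than facts from the text: the window is taken as 90 daily observations (one quarter), the window contains only past values and excludes today, no exit is signalled until a full window is available, and the quantile is taken as a lower order statistic of the sorted window. The function name is hypothetical.

```python
def rolling_quantile_exit(lpd, window=90, q=1 / 7):
    """Return a list of flags; True at index t means 'exit the market at t'."""
    exits = []
    for t in range(len(lpd)):
        history = sorted(lpd[max(0, t - window):t])  # past values only
        if len(history) < window:
            exits.append(False)  # assumed: stay invested until a full window exists
            continue
        # Assumed quantile convention: lower order statistic of the window.
        threshold = history[int(q * (len(history) - 1))]
        exits.append(lpd[t] < threshold)
    return exits

# Toy usage with a short window so the behaviour is easy to follow.
toy_lpd = [5, 6, 7, 8, 9, 10, 11, 4, 12]
flags = rolling_quantile_exit(toy_lpd, window=7)
print(flags)
```

In the toy series, the only exit is triggered by the value 4, which falls below the lowest seventh of the preceding seven observations.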

Table 1 provides additional evidence and an alternative insight into the connection between the LPD and the next day’s return. The table reports the proportions of positive signs and the average next-day returns at critical time points (LPD small: weak dependence), neutral time points (normal-sized LPD: normal dependence), auspicious time points (LPD large: strong dependence) and all time points.


As Table 1 shows, the LPD is not conclusive about the sign of the next day’s return (first column), but it is informative about the sign and the size of the average next-day return (second column), i.e. the LPD carries information about the skewness of the distribution.
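The two columns of such a table could be computed along the following lines. This is a sketch under stated assumptions: the thresholds low and high that separate critical, neutral and auspicious time points are hypothetical parameters (the text does not specify them), and the numbers in the usage example are made up.

```python
def regime_summary(lpd, returns, low, high):
    """Per regime: (proportion of positive next-day returns, mean next-day return)."""
    groups = {"critical": [], "neutral": [], "auspicious": [], "all": []}
    for t in range(len(returns) - 1):
        if lpd[t] < low:
            label = "critical"      # small LPD: weak dependence
        elif lpd[t] > high:
            label = "auspicious"    # large LPD: strong dependence
        else:
            label = "neutral"
        groups[label].append(returns[t + 1])
        groups["all"].append(returns[t + 1])
    summary = {}
    for label, next_returns in groups.items():
        if next_returns:  # skip regimes that never occur
            share_positive = sum(r > 0 for r in next_returns) / len(next_returns)
            mean_return = sum(next_returns) / len(next_returns)
            summary[label] = (share_positive, mean_return)
    return summary

# Toy usage with made-up numbers.
toy_lpd = [0.1, 0.5, 0.9, 0.5]
toy_returns = [0.0, -0.02, 0.01, 0.03]
table = regime_summary(toy_lpd, toy_returns, low=0.2, high=0.8)
print(table)
```

A regime can have a share of positive signs close to one half while its mean return still differs clearly from zero, which is the kind of asymmetry the skewness remark refers to.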

The X functions discussed here are made available through two channels: (i) the VA tool and (ii) the project’s public Git repository, which contains the complete code, documentation and data.

Figure 1. Neural net BTC: feedforward net with a single hidden-layer of dimension 100 and an input layer of dimension 6 comprising the last six lagged returns
Figure 2. Cumulated log-performances out-of-sample based on sign-rule (buy or sell depending on sign of forecasted return): ’random’ neural nets (colored) vs. buy-and-hold (bold black) and mean-net performance (bold blue)
Table 1. Proportions of positive signs and average next days’ returns at critical, neutral, auspicious and all time points.
Figure 3. Buy-and-hold (black) vs. out-of-sample (mean-) LPD market-exit strategy (blue): exits (shaded in grey) occur if today’s out-of-sample mean-LPD (green) drops below the 1/7-quantile based on a rolling-window of length one quarter of its own history. The LPD corresponding to the lag-6 BTC-value is used.