Financial Time Series Use Case
Developing accurate forecasting methodologies for financial time series remains a key research topic from both a theoretical and an applied viewpoint. Traditionally, researchers aimed to construct a causal model, based on econometric modelling, that explains the variation in a specific time series as a function of other inputs. Yet traditional approaches often struggle to model high-dimensional, nonlinear landscapes, which are frequently characterized by missing or sparse input data.
The Need for a New Approach to Explainability for Financial Time Series
Recently, deep learning (DL) has become highly popular across data science and is increasingly applied to forecasting financial and economic time series. Recurrent methods are well suited to time series modelling thanks to their memory state and their ability to learn relations through time; convolutional neural networks (CNNs) are also able to capture temporal relationships.
The literature offers various examples of DL methods applied to stock and forex market forecasting, with results that significantly outperform traditional counterparts. This was also confirmed in recent installments of the Makridakis Forecasting Competitions, which have been held roughly once a decade since 1982 with the objective of comparing the accuracy of different forecasting methods. A recurring conclusion from these competitions has been that traditional, simpler methods often perform as well as their more complex counterparts. This changed in the latest editions of the series, M4 and M5, which were won by a hybrid Exponential Smoothing Recurrent Neural Network method and by LightGBM, respectively.
The introduction of DL methods for financial time series forecasting potentially enables higher predictive accuracy, but this comes at the cost of higher complexity and thus lower interpretability. DL models are referred to as "black boxes" because it is often difficult to understand how variables are jointly related to arrive at a certain output. This reduced ability to understand the inner workings and mechanisms of DL models unavoidably affects their trustworthiness and the willingness among practitioners to deploy such methods in sensitive domains such as finance. As a result, scientific interest in the field of explainable AI (XAI) has grown tremendously within the last few years.
The growing popularity of the topic notwithstanding, research on XAI in finance remains limited, and most existing explainability techniques are not suited to time series, let alone to nonstationary financial time series and their somewhat notorious stylized facts. Many state-of-the-art XAI methods were originally tailored to particular input types such as images (e.g., saliency maps) or text (e.g., layer-wise relevance propagation, LRP) and were later adjusted to suit tabular data as well. However, the temporal dimension is often omitted, and the literature currently offers only limited consideration of the topic. Notable examples are interpretable decision trees for time series classification and the use of attention mechanisms, with none of these applications looking specifically at explainability for financial data.
We propose a family of explainability functions, which we call X-functions, for assigning meaning to the outputs of complex models (in this case, neural nets) over time. In order to preserve data integrity as well as model integrity, we propose to analyze the effect of infinitesimal changes in the explanatory variables on some function of the net output at each time point t = 1, ..., T.
To do this, we start with the simplest case, in which the X-function is the identity, and we look at the derivative of the output with respect to the input data, which in turn gives us the sensitivities of the outputs to each explanatory variable in the net over time. To complete the explanation derived from the identity, we add an intercept to each output neuron.
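In symbols, the identity X-function amounts to a local linear (first-order) replication of the net at each time point. A plausible way to write it (the notation below is ours, as the original equation is not reproduced in this excerpt) is:

```latex
\hat{y}_t \;\approx\; \beta_{0,t} \;+\; \sum_{j=1}^{n} \beta_{j,t}\, x_{j,t},
\qquad
\beta_{j,t} \;=\; \frac{\partial \hat{y}_t}{\partial x_{j,t}},
\qquad t = 1,\dots,T,
```

where the intercept beta_{0,t} is the term added to each output neuron and the beta_{j,t} are the time-varying sensitivities.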
Put simply, we are interested in these weights, which are now generally functions of time: unlike in a linear regression, where they are fixed, here they generally vary over time. These weights are what we call linear parameter data (LPD). The LPD is a matrix of dimensions T × (n+1), and these dimensions remain the same irrespective of the net's architecture.
In essence, the LPD can be interpreted as an exact replication of the net by a linear model at each time point: we preserve the natural ordering of the series while extracting the sensitivities of the outputs to small changes in the inputs at each time point.
Application: Bitcoin Returns
Let's look at a specific example: we train a simple neural network to predict Bitcoin returns at time t, taking 6 lagged values as inputs. In terms of architecture, we start with a simple model with one hidden layer of 100 neurons; the parameters to be estimated are therefore 100·6 + 100 weights and 100 + 1 biases.
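The parameter count above can be verified with a quick sketch (variable names here are illustrative, not taken from the original code base):

```python
# Parameter count for a one-hidden-layer net: 6 inputs -> 100 hidden -> 1 output.
n_inputs, n_hidden, n_outputs = 6, 100, 1

hidden_weights = n_inputs * n_hidden    # input-to-hidden weights: 600
output_weights = n_hidden * n_outputs   # hidden-to-output weights: 100
biases = n_hidden + n_outputs           # one bias per hidden neuron, plus the output bias: 101

total_params = hidden_weights + output_weights + biases
print(total_params)  # 801
```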
We optimize the net 100 times, based on different random initializations of its parameters, and compute the trading performance of each random net using a simple sign rule: buy or sell tomorrow's return depending on the sign of today's forecast. The resulting cohort of cumulated (out-of-sample) trading performances is displayed below (Figure 2), with the mean performance in the center (bold blue line).
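The sign rule can be sketched as follows; both the returns and the forecasts below are synthetic stand-ins, since the point is only the mechanics of the rule, not the actual Bitcoin data:

```python
import numpy as np

# Sign-rule backtest sketch (hypothetical data): go long when the one-step-ahead
# forecast is positive, short when it is negative, and cumulate the resulting returns.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, size=250)               # stand-in for out-of-sample returns
forecasts = returns + rng.normal(0.0, 0.02, size=250)   # noisy stand-in forecasts of those returns

positions = np.sign(forecasts)          # +1 = long, -1 = short (today's forecast sets tomorrow's position)
strategy_returns = positions * returns  # realized daily P&L of the rule
cumulated = np.cumsum(strategy_returns) # cumulated trading performance, as plotted in Figure 2
print(cumulated[-1])
```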
[Figure 2: cohort of cumulated out-of-sample trading performances across 100 random nets, mean performance in bold blue]
Next, we turn to the explainability function, i.e., the linear first-order approximation of the net. With the algorithm we developed, for each output neuron we obtain the derivatives (which in turn form the new data flow we call LPDs). To give an intuition for the sensitivities/explanations we want to obtain, let's imagine the following brute-force approach:

1. Train a neural network (NN) on the specified inputs and response, and store the results.

2. Perturb a selected input slightly.

3. Use the trained NN to make predictions for the changed inputs.

4. For each changed variable, collect the perturbed data and the corresponding NN output.

5. Fit a linear model to these pairs and obtain its weights.

6. Train the net for 100 different random initializations and observe how the LPDs vary across the random nets.
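The perturbation steps above can be sketched for a single time point as follows. To keep the sketch self-contained we skip the training step and use a fixed random one-hidden-layer tanh net; all names and the architecture are our illustrative assumptions:

```python
import numpy as np

# Brute-force sensitivity sketch at one time point: perturb each input in turn,
# re-run the (here untrained) net, and read off the local linear slope.
rng = np.random.default_rng(1)
n_in, n_hid = 6, 100
W1, b1 = rng.normal(size=(n_hid, n_in)), rng.normal(size=n_hid)
W2, b2 = rng.normal(size=n_hid), rng.normal()

def net(x):
    """One-hidden-layer tanh net, scalar output."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

x = rng.normal(size=n_in)   # input vector at one time point t (6 lagged values)
eps = 1e-6
lpd = np.empty(n_in)
for j in range(n_in):       # perturb one explanatory variable at a time
    x_plus = x.copy()
    x_plus[j] += eps
    # slope of the local linear fit through (x, net(x)) and (x_plus, net(x_plus))
    lpd[j] = (net(x_plus) - net(x)) / eps
print(lpd)
```

Repeating this over all t = 1, ..., T (and adding the intercept column) fills the T × (n+1) LPD matrix.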
By using the derivatives approach, we instead obtain the exact sensitivities of the outputs to each explanatory variable in the net over time, without recomputing outputs for perturbed inputs and without refitting.
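For a one-hidden-layer tanh net, these derivatives even have a closed form via the chain rule, so the whole slope part of the LPD matrix comes out in a single vectorized pass (again a sketch under our own notation and architecture; the intercept column and the output bias are omitted since the bias does not affect the derivative):

```python
import numpy as np

# Closed-form sensitivities for a one-hidden-layer tanh net:
# d y_t / d x_{t,j} = sum_i W2_i * (1 - tanh^2(W1 x_t + b1)_i) * W1_{i,j}.
rng = np.random.default_rng(2)
T, n_in, n_hid = 200, 6, 100
W1, b1 = rng.normal(size=(n_hid, n_in)), rng.normal(size=n_hid)
W2 = rng.normal(size=n_hid)

X = rng.normal(size=(T, n_in))      # T time points, n_in lagged inputs each
H = np.tanh(X @ W1.T + b1)          # hidden activations, shape (T, n_hid)
# chain rule, vectorized over all time points at once -- no perturbation, no refitting
LPD = ((1.0 - H ** 2) * W2) @ W1    # slope part of the LPD, shape (T, n_in)
print(LPD.shape)
```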
What do we learn from the LPDs? Our initial insights suggest that the time-varying dependency of the data, as measured by the LPDs, is indicative of different states of the market. In particular, weak dependency (small absolute LPD) is an indicator of randomness or 'chaos'. We therefore propose a simple rule for managing risk: exit the market at times tagged as chaotic by the LPD; see Figure 3 for an illustration.
[Figure 3: out-of-sample mean LPD (green) with market exits shaded in grey]
As visible in Figure 3, exits (shaded in grey) occur whenever today's out-of-sample mean LPD (green) drops below the 1/7-quantile computed on a rolling window, of length one quarter, of its own history.
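A minimal sketch of this exit rule, using a synthetic mean-LPD series and taking 63 trading days as a stand-in for "one quarter":

```python
import numpy as np

# Exit-rule sketch: stay out of the market whenever today's mean LPD drops below
# the 1/7-quantile of its own trailing history (synthetic data, illustrative window).
rng = np.random.default_rng(3)
mean_lpd = rng.normal(size=500)  # stand-in for the daily out-of-sample mean LPD
window = 63                      # roughly one quarter of trading days

in_market = np.ones(len(mean_lpd), dtype=bool)   # fully invested until enough history exists
for t in range(window, len(mean_lpd)):
    threshold = np.quantile(mean_lpd[t - window:t], 1 / 7)
    in_market[t] = mean_lpd[t] >= threshold      # exit when below the rolling 1/7-quantile
print(in_market.mean())
```

By construction, roughly one day in seven gets tagged as an exit once the rolling window is full.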
Table 1 provides additional evidence and an alternative view of the connection between the LPD and the next day's return. The table reports the proportion of positive signs and the average next-day return at critical time points (small LPD: weak dependence), neutral time points (normal-sized LPD: normal dependence), auspicious time points (large LPD: strong dependence), and all time points.
[Table 1: sign proportions and average next-day returns by LPD regime]
As can be noted from Table 1, the LPD is not conclusive about the sign of the next day's return (first column), but it is about the sign and size of the average next-day return (second column); i.e., the LPD conveys information about the skewness of the distribution.
The X-functions discussed here are made available through two channels: (i) the VA tool and (ii) the project's public Git repository, which contains the complete code, documentation, and data.