Publish AI, ML & data-science insights to a global community of data professionals.

Introducing ShaTS: A Shapley-Based Method for Time-Series Models

Why you should not explain your time-series data with tabular Shapley methods

Image by the author

Introduction

Shapley-based methods are among the most popular tools for explaining Machine Learning (ML) and Deep Learning (DL) models. However, for time-series data, these methods often fall short because they do not account for the temporal dependencies inherent in such datasets. In a recent article, we (Ángel Luis Perales Gómez, Lorenzo Fernández Maimó, and I) introduced ShaTS, a novel Shapley-based explainability method specifically designed for time-series models. ShaTS addresses the limitations of traditional Shapley methods by incorporating grouping strategies that enhance both computational efficiency and explainability.

Shapley values: The foundation

Shapley values originate in cooperative game theory and fairly distribute the total gain among players based on their individual contributions to a collaborative effort. The Shapley value for a player is calculated by considering all possible coalitions of players and determining the marginal contribution of that player to each coalition.

Formally, the Shapley value φi for player i is:

\[ \varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left( v(S \cup \{i\}) - v(S) \right) \]

where:

  • N is the set of all players.
  • S is a coalition of players not including i.
  • v(S) is the value function that assigns a value to each coalition (i.e., the total gain that coalition S can achieve).

This formula averages the marginal contributions of player i across all possible coalitions, weighted by the likelihood of each coalition forming.
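To make the formula concrete, here is a minimal exact computation for a hypothetical three-player game (the players and value function below are illustrative, not from the article):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: weighted average of each player's
    marginal contribution over all coalitions of the others."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi[i] = total
    return phi

# Hypothetical game: A and B are jointly worth 10, C adds nothing
def v(S):
    return 10.0 if {"A", "B"} <= S else 0.0

phi = shapley_values(["A", "B", "C"], v)
# phi["A"] == phi["B"] == 5.0 and phi["C"] == 0.0 (efficiency: they sum to 10)
```

Note the efficiency property at work: the three values add up exactly to the payoff of the grand coalition.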

From Game Theory to xAI: Shapley values in Machine Learning

In the context of explainable AI (xAI), Shapley values attribute a model’s output to its input features. This is particularly useful for understanding complex models, such as deep neural networks, where the relationship between input and output is not always clear.

Shapley-based methods can be computationally expensive, especially as the number of features increases, because the number of possible coalitions grows exponentially. However, approximation methods, particularly those implemented in the popular SHAP library, have made them feasible in practice. These methods estimate the Shapley values by sampling a subset of coalitions rather than evaluating all possible combinations, significantly reducing the computational burden.
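The sampling idea can be sketched generically: instead of enumerating every coalition, draw random player orderings and average each player's marginal contribution. This is a simplified Monte Carlo sketch, not the SHAP library's actual implementation:

```python
import random

def shapley_monte_carlo(players, v, n_samples=2000, seed=0):
    """Estimate Shapley values by averaging each player's marginal
    contribution over randomly sampled player orderings."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = v(coalition)
        for p in order:
            coalition.add(p)        # add players one by one, in random order
            cur = v(coalition)
            phi[p] += cur - prev    # marginal contribution of p
            prev = cur
    return {p: total / n_samples for p, total in phi.items()}

# Hypothetical game: A and B are jointly worth 10, C adds nothing
est = shapley_monte_carlo(["A", "B", "C"],
                          lambda S: 10.0 if {"A", "B"} <= S else 0.0)
```

As the number of sampled orderings grows, the estimates converge to the exact values (5, 5, 0) while evaluating far fewer coalitions than the exhaustive formula.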

Consider an industrial scenario with three components: a water tank, a thermometer, and an engine. Suppose we have an Anomaly Detection (AD) ML/DL model that detects malicious activity based on the readings from these components. Using SHAP, we can determine how much each component contributes to the model’s prediction of whether the activity is malicious or benign.

Integration of SHAP in an industrial Anomaly Detection scenario. Image created by the authors

However, in more realistic scenarios the model uses not only the current reading from each sensor but also previous readings (a temporal window) to make predictions. This approach allows the model to capture temporal patterns and trends, thereby improving its performance. Applying SHAP in this scenario to assign responsibility to each physical component becomes more challenging because there is no longer a one-to-one mapping between features and sensors. Each sensor now contributes multiple features associated with different time steps. The common approach here is to calculate the Shapley value of each feature at each time step and then post-hoc aggregate these values.

Integration of SHAP in an industrial Anomaly Detection scenario with windowed sensor data and post-hoc aggregation. Image created by the authors.
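Numerically, the post-hoc aggregation amounts to reshaping the flat attribution vector back to (time, sensor) and summing along one axis. A sketch with hypothetical shapes, where a random vector stands in for real SHAP output:

```python
import numpy as np

# Hypothetical setup: a window of 10 time steps over 3 sensors is
# flattened into 30 features, so tabular SHAP returns 30 attributions
window_len, n_sensors = 10, 3
rng = np.random.default_rng(0)
flat_shap = rng.normal(size=window_len * n_sensors)  # stand-in for SHAP output

# Reshape back to (time, sensor) and aggregate along one axis
per_step = flat_shap.reshape(window_len, n_sensors)
per_sensor = per_step.sum(axis=0)   # one value per physical component
per_instant = per_step.sum(axis=1)  # or: one value per time step
```

The aggregation preserves the total attribution, but it is applied after the fact to values that were computed as if the 30 features were independent.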

This approach has two main drawbacks:

  • Computational Complexity: The computational cost increases exponentially with the number of features, making it impractical for large time-series datasets.
  • Ignoring Temporal Dependencies: SHAP explainers are designed for tabular data without temporal dependencies. Post-hoc aggregation can lead to inaccurate explanations because it fails to capture temporal relationships between features.

The ShaTS Approach: Grouping Before Computing Importance

In the Shapley framework, a player’s value is determined solely by comparing the performance of a coalition with and without that player. Although the method is defined at the individual level, nothing prevents applying it to groups of players rather than to single individuals. Thus, if we consider a set of players N divided into p groups G = {G1, … , Gp}, we can compute the Shapley value for each group Gi by evaluating the marginal contribution of the entire group to all possible coalitions of the remaining groups. Formally, the Shapley value for group Gi can be expressed as:

\[ \varphi(G_i) = \sum_{T \subseteq G \setminus \{G_i\}} \frac{|T|!\,(|G| - |T| - 1)!}{|G|!} \left( v(T \cup \{G_i\}) - v(T) \right) \]

where:

  • G is the set of all groups.
  • T is a coalition of groups not including Gi.
  • v(T) is the value function that assigns a value to each coalition of groups.
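The group-level formula is the same computation with groups as the players; the value function simply receives the union of the groups in a coalition. A toy sketch with a hypothetical value function (not the ShaTS implementation):

```python
from itertools import combinations
from math import factorial

def group_shapley(groups, v):
    """Shapley values where the players are groups of features.
    `groups` maps group name -> list of features; `v` takes a set
    of features (the union of a coalition of groups)."""
    names = list(groups)
    g = len(names)
    phi = {}
    for name in names:
        others = [n for n in names if n != name]
        total = 0.0
        for r in range(len(others) + 1):
            for T in combinations(others, r):
                members = set()
                for n in T:                     # union of the coalition's groups
                    members |= set(groups[n])
                weight = factorial(r) * factorial(g - r - 1) / factorial(g)
                total += weight * (v(members | set(groups[name])) - v(members))
        phi[name] = total
    return phi

# Hypothetical game: four features split into two groups; the value is
# the number of active features, so each two-feature group is worth 2
groups = {"G1": ["x1", "x2"], "G2": ["x3", "x4"]}
phi = group_shapley(groups, lambda S: float(len(S)))
```

With two groups there are only two coalitions to evaluate per group, instead of the eight feature-level coalitions the individual formula would require.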

Building on this idea, ShaTS operates on time windows and provides three distinct levels of grouping, depending on the explanatory goal:

Temporal

Each group contains all measurements recorded at a specific instant within the time window. This strategy is useful for identifying critical instants that significantly influence the model’s prediction.

Example of temporal grouping strategy. Image created by the authors.

Feature

Each group represents the measurements of an individual feature over the time window. This strategy isolates the impact of specific features on the model’s decisions.

Example of feature grouping strategy. Image created by the authors.

Multi-Feature

Each group includes the combined measurements over the time window of features that share a logical relationship or represent a cohesive functional unit. This approach analyzes the collective impact of interdependent features, ensuring their combined influence is captured.

Example of multi-feature grouping strategy. Image created by the authors.

Once groups are defined, Shapley values are computed exactly as in the individual case, but using group-level marginal contributions instead of per-feature contributions.
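Conceptually, each strategy is just a different partition of the window's (time step, feature) cells. The sketch below builds the three partitions for a hypothetical 4-step, 3-feature window (illustrative only; ShaTS handles this internally):

```python
# Partition a window of shape (window_len, n_features) into groups
# of (t, f) index pairs, one dict per grouping strategy
window_len, n_features = 4, 3
feature_names = ["X", "Y", "Z"]

# Temporal: one group per instant, spanning all features
temporal = {f"t{t}": [(t, f) for f in range(n_features)]
            for t in range(window_len)}

# Feature: one group per feature, spanning the whole window
feature = {name: [(t, f) for t in range(window_len)]
           for f, name in enumerate(feature_names)}

# Multi-feature: logically related features share one group
# (the X/Y pairing here is an assumed example)
multi = {"X+Y": [(t, f) for t in range(window_len) for f in (0, 1)],
         "Z":   [(t, 2) for t in range(window_len)]}
```

Each partition covers all 12 cells of the window exactly once; only the number of players (4, 3, or 2 here) and the question each group answers change.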

ShaTS methodology overview. Image created by the authors.

ShaTS custom visualization

ShaTS includes a visualization designed specifically for sequential data and for the three grouping strategies above. The horizontal axis shows consecutive windows. The left vertical axis lists the groups, and the right vertical axis overlays the model’s anomaly score for each window. Each heatmap cell at (i, Gj) represents the importance of group Gj for window i. Warmer reds indicate a stronger positive contribution to the anomaly, cooler blues indicate a stronger negative contribution, and near-white means negligible influence. A purple dashed line traces the anomaly score across windows, and a horizontal dashed line at 0.5 marks the decision threshold between anomalous and normal windows.
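A plot with this layout can be sketched in a few lines of matplotlib; the numbers below are synthetic stand-ins, not real ShaTS output:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt

# Synthetic stand-ins: importance of 3 groups across 20 windows,
# plus one anomaly score per window
rng = np.random.default_rng(1)
group_names = ["X", "Y", "Z"]
n_windows = 20
importance = rng.normal(scale=0.3, size=(len(group_names), n_windows))
score = rng.uniform(size=n_windows)

fig, ax = plt.subplots()
im = ax.imshow(importance, aspect="auto", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_yticks(range(len(group_names)))
ax.set_yticklabels(group_names)          # left axis: the groups
ax.set_xlabel("window")
fig.colorbar(im, ax=ax, label="group importance")

ax2 = ax.twinx()                         # right axis: anomaly score overlay
ax2.plot(score, "--", color="purple", label="anomaly score")
ax2.axhline(0.5, color="gray", linestyle="--")  # decision threshold
ax2.set_ylim(0, 1)
fig.savefig("shats_style_plot.png")
```

The diverging colormap plays the role described above (red positive, blue negative, near-white negligible), and the twin axis overlays the score without rescaling the heatmap.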

To illustrate, imagine a model that processes windows of length 10 built from three features, X, Y, and Z. When an operator receives an alert and wants to know which signal triggered it, they inspect the feature grouping results. In the next figure, around windows 10–11 the anomaly score rises above the threshold, while the attribution for X intensifies. This pattern indicates that the decision is being driven primarily by X.

ShaTS custom visualization for Feature Strategy. Image generated by ShaTS library.

If the next question is when, within each window, the anomaly occurs, the operator switches to the temporal grouping view. The next figure shows that the final instant of each window (t9) consistently carries the strongest positive attribution, revealing that the model has learned to rely on the last time step to classify the window as anomalous.

ShaTS custom visualization for Temporal Strategy. The left y-axis lists the window’s time slots t0 (earliest) to t9 (most recent). Image generated by ShaTS library.

Experimental Results: Testing ShaTS on the SWaT Dataset

In our recent publication, we validated ShaTS on the Secure Water Treatment (SWaT) testbed, an industrial water facility with 51 sensors/actuators organized into six plant stages (P1–P6). A stacked Bi-LSTM trained on windowed signals served as the detector, and we compared ShaTS with post hoc KernelSHAP using three viewpoints: Temporal (which instant in the window matters), Sensor/Actuator (which device), and Process (which of the six stages).

Across attacks, ShaTS yielded tight, interpretable bands that pinpointed the true source, down to the individual sensor/actuator or plant stage, whereas post hoc SHAP tended to diffuse importance across many groups, complicating root-cause analysis. ShaTS was also faster and more scalable: grouping shrinks the player set, so the coalition space drops dramatically; run time stays nearly constant as the window length grows because the number of groups does not change; and GPU execution further accelerates the method, making near-real-time use practical.

Hands-on Example: Integrating ShaTS into Your Workflow

This walkthrough shows how to plug ShaTS into a typical Python workflow: import the library, choose a grouping strategy, initialize the explainer with your trained model and background data, compute group-wise Shapley values on a test set, and visualize the results. The example assumes a PyTorch time-series model and that your data is windowed (e.g., shape [window_len, n_features] per sample).
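If your data is not yet windowed, a simple stride-1 slicer produces samples of the expected shape (a generic helper, not part of the ShaTS API):

```python
import numpy as np

def make_windows(series, window_len):
    """Slice a (T, n_features) array into overlapping windows of
    shape (window_len, n_features), advancing one step at a time."""
    T = len(series)
    return np.stack([series[t:t + window_len]
                     for t in range(T - window_len + 1)])

# Hypothetical raw data: 100 time steps from 3 sensors
raw = np.random.default_rng(0).normal(size=(100, 3))
windows = make_windows(raw, window_len=10)
# windows.shape == (91, 10, 3): 91 samples of shape [window_len, n_features]
```

In a PyTorch workflow you would typically convert each window to a tensor before handing it to the model or to the explainer.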

1. Import ShaTS and configure the Explainer

In your Python script or notebook, begin by importing the necessary components from the ShaTS library. While the repository exposes the abstract ShaTS class, you will typically instantiate one of its concrete implementations (e.g., FastShaTS).

import shats
from shats.grouping import TimeGroupingStrategy
from shats.grouping import FeaturesGroupingStrategy
from shats.grouping import MultifeaturesGroupingStrategy

2. Initialize the Model and Data

Assume you have a pre-trained PyTorch time-series model and a background dataset: a list of tensors representing typical data samples that the model saw during training. If you want to better understand the role of the background dataset, see this blog post by Christoph Molnar.

import random

model = MyTrainedModel()  # your pre-trained PyTorch time-series model

# Draw 100 random training samples to serve as the background dataset
random_samples = random.sample(range(len(trainDataset)), 100)
background = [trainDataset[idx] for idx in random_samples]

shaTS = shats.FastShaTS(
    model,
    support_dataset=background,
    grouping_strategy=FeaturesGroupingStrategy(names=variable_names),
)

3. Compute Shapley Values

Once the explainer is initialized, compute the ShaTS values for your test dataset. The test dataset should be formatted similarly to the background dataset.

shats_values = shaTS.compute(testDataset)

4. Visualize Results

Finally, use the built-in visualization function to plot the ShaTS values. You can specify which class (e.g., anomalous or normal) you want to explain.

shaTS.plot(shats_values, test_dataset=testDataset, class_to_explain=1)

Key Takeaways

  • Focused Attribution: ShaTS provides more focused attributions than post hoc SHAP, making it easier to identify the root cause in time-series models.
  • Efficiency: By reducing the number of players to groups, ShaTS significantly decreases the coalitions to evaluate, leading to faster computation times.
  • Scalability: ShaTS maintains consistent performance even as window size increases, thanks to its fixed group structure.
  • GPU Acceleration: ShaTS can leverage GPU resources, further enhancing its speed and efficiency.

Try it yourself

Interactive demo

Compare ShaTS with post hoc SHAP on synthetic time series here. You can also find a tutorial in the following video.

Open source

The ShaTS module is fully documented and ready to plug into your ML/DL pipeline. Find the code on GitHub.

I hope you liked it! You’re welcome to contact me if you have questions, want to share feedback, or simply feel like showcasing your own projects.

