Analysis of Perturbation Methods in Explainable AI

Abstract

In critical fields such as healthcare, finance, and industrial operations, the reliability of artificial intelligence (AI) models depends on both their predictive accuracy and their ability to explain their decisions. This has led to growing attention toward Explainable AI (XAI), which aims to provide transparency and accountability in model predictions. Among the various XAI approaches, feature attribution methods (AMs) are widely employed. They assign importance scores to input features, highlighting the factors influencing a model's output. However, the usefulness of these methods is closely tied to their faithfulness, that is, whether the identified features truly reflect what the model relies upon. Faithfulness is typically evaluated through perturbation-based techniques, where input features are modified according to their attributed importance and the resulting changes in model performance are analyzed.
Despite their popularity, existing evaluation practices, particularly those relying on the Area Under Perturbation Curve (AUPC), show notable shortcomings when applied to time-series data. Our earlier investigations revealed that such metrics can lead to misleading conclusions, especially when the choice of perturbation method or region size distorts the evaluation. To address these issues, this work introduces a new metric, the Consistency-Magnitude Index (CMI), which integrates two complementary measures: the Decaying Degradation Score (DDS) and the Perturbation Effect Size (PES). Together, they provide a more reliable and consistent assessment of attribution faithfulness.
Furthermore, we propose an adapted evaluation methodology that leverages a diverse set of perturbation strategies rather than depending on a single one, ensuring robustness across varying datasets and model architectures. Our experimental study, conducted on multiple time-series datasets and deep learning models, highlights the significant role of perturbation methods and region size in faithfulness evaluation. Based on these insights, we offer practical guidelines for selecting suitable attribution and perturbation methods.

Proposed Model

To address the limitations of existing evaluation strategies for attribution methods (AMs), this project proposes a robust perturbation-based evaluation framework combined with a novel metric, the Consistency-Magnitude Index (CMI). The technique is designed to reliably measure the faithfulness of AMs in the context of neural time-series classification models, where inconsistencies between explanation methods are particularly problematic.
The framework begins by applying multiple Perturbation Methods (PMs) to input features based on the importance scores provided by different AMs. Instead of relying on a single perturbation strategy, the methodology systematically explores a diverse set of PMs, reducing the risk of biased or misleading evaluations. Both highly relevant and low-relevance features are perturbed to observe their influence on model predictions.
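The perturbation step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the helper names (`region_scores`, `perturbation_curve`, `pm_zero`, `pm_sample_mean`, `pm_uniform_noise`), the way regions are scored, and the toy `predict` function are all assumptions. Only the PM names Zero, SampleMean, and UniformNoise come from this README, and their bodies here are guesses at the simplest possible behaviour.

```python
import numpy as np

def region_scores(attribution, region_size):
    """Sum absolute attribution over consecutive, non-overlapping regions."""
    n_regions = len(attribution) // region_size
    trimmed = np.abs(np.asarray(attribution, dtype=float)[: n_regions * region_size])
    return trimmed.reshape(n_regions, region_size).sum(axis=1)

# Hypothetical perturbation methods: each returns replacement values
# for the slice series[start:stop].
def pm_zero(series, start, stop, rng):
    return np.zeros(stop - start)

def pm_sample_mean(series, start, stop, rng):
    return np.full(stop - start, series.mean())

def pm_uniform_noise(series, start, stop, rng):
    return rng.uniform(series.min(), series.max(), size=stop - start)

def perturbation_curve(predict, series, attribution, pm, region_size,
                       most_relevant_first=True, seed=0):
    """Perturb regions one by one in attribution order and record the model output."""
    rng = np.random.default_rng(seed)
    order = np.argsort(region_scores(attribution, region_size))
    if most_relevant_first:
        order = order[::-1]
    series = np.asarray(series, dtype=float)
    perturbed = series.copy()
    curve = [predict(perturbed)]  # output on the unperturbed input
    for r in order:
        start, stop = r * region_size, (r + 1) * region_size
        perturbed[start:stop] = pm(series, start, stop, rng)
        curve.append(predict(perturbed))
    return np.array(curve)

# Toy stand-in for a classifier score: it only looks at the first four time steps.
predict = lambda x: float(x[:4].sum())
series = np.arange(1, 9, dtype=float)
attribution = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)

# Perturbing the most relevant region first versus the least relevant region first.
morf = perturbation_curve(predict, series, attribution, pm_zero, region_size=4)
lerf = perturbation_curve(predict, series, attribution, pm_zero, region_size=4,
                          most_relevant_first=False)
```

With a faithful attribution, the curve that perturbs the most relevant regions first should drop earlier than the curve that perturbs the least relevant regions first. Swapping `pm_zero` for another PM changes only the replacement values, which is what makes it straightforward to compare many PMs under the same protocol.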
To quantify AM performance, the proposed CMI metric combines two complementary measures: the Decaying Degradation Score (DDS), which captures the degree of separation between relevant and irrelevant features, and the Perturbation Effect Size (PES), which evaluates how consistently an AM distinguishes important from unimportant features. Integrating these measures ensures that evaluations reflect both the magnitude and the reliability of feature attributions.
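This README does not give formulas for DDS, PES, or CMI, so the sketch below uses illustrative stand-ins rather than the authors' definitions. It assumes two perturbation curves are available, one from perturbing the most relevant features first (`morf`) and one from perturbing the least relevant first (`lerf`). The functions `consistency`, `decayed_magnitude`, and `combined_index`, the `decay` parameter, and the choice to combine the two scores by multiplication are all assumptions made only to show how a consistency measure and a magnitude measure can be merged into one index.

```python
import numpy as np

def consistency(morf, lerf):
    """Stand-in for a consistency measure: fraction of perturbation steps
    where the most-relevant-first curve lies below the least-relevant-first curve."""
    morf, lerf = np.asarray(morf, dtype=float), np.asarray(lerf, dtype=float)
    return float(np.mean(morf[1:] < lerf[1:]))

def decayed_magnitude(morf, lerf, decay=0.5):
    """Stand-in for a magnitude measure: weighted mean gap between the curves,
    with weights that decay geometrically so early steps count more."""
    morf, lerf = np.asarray(morf, dtype=float), np.asarray(lerf, dtype=float)
    gap = lerf[1:] - morf[1:]
    weights = decay ** np.arange(len(gap))
    return float(np.sum(weights * gap) / np.sum(weights))

def combined_index(morf, lerf, decay=0.5):
    """Assumed combination: product of the consistency and magnitude stand-ins."""
    return consistency(morf, lerf) * decayed_magnitude(morf, lerf, decay)

# Hypothetical model outputs after each perturbation step (step 0 is unperturbed).
morf_curve = [1.0, 0.2, 0.1, 0.1]
lerf_curve = [1.0, 0.9, 0.8, 0.1]

c = consistency(morf_curve, lerf_curve)
m = decayed_magnitude(morf_curve, lerf_curve)
index = combined_index(morf_curve, lerf_curve)
```

A product is used here so that the index stays low unless the separation is both consistent and large; whether the actual CMI is combined this way is not stated in this document.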
Through this methodology, the model provides practitioners with actionable insights into selecting the most faithful attribution methods for a given dataset and architecture, thereby improving the interpretability and trustworthiness of deep learning systems in high-stakes domains.

Experimental Results

Ranking of the best PMs by mean CMI for each binary dataset, across all models and region sizes

Ranking of the best PMs by mean CMI for each multiclass dataset, across all models and region sizes

Ranking of the best PMs by mean CMI for each anomaly detection dataset, across all models and region sizes

Mean AM rankings across all datasets, models, and region sizes

Mean AM rankings for the ResNet and Inception architectures, across all datasets and region sizes

Conclusion

This study highlights the critical influence of perturbation method (PM) selection on the faithfulness evaluation of attribution methods (AMs) in neural time-series classification. Through a comprehensive set of experiments, we demonstrated that the interplay between dataset characteristics and model architectures substantially affects the suitability of different PMs. Importantly, our results emphasize that incorporating both highly relevant and low-relevance features into evaluation metrics is essential for a fair assessment of AM performance.
To address the limitations of existing approaches, we introduced the Consistency-Magnitude Index (CMI), a novel metric that reliably estimates the suitability of PMs for specific datasets and models. Combined with our proposed methodology, CMI enables robust and consistent evaluation of AM faithfulness. Our large-scale experimental setup—covering five datasets, five deep learning architectures, twelve attribution methods, twenty-three perturbation methods, two region sizes, and two perturbation orders—represents, to the best of our knowledge, the most extensive investigation of its kind in the time-series domain.
The results show that careful PM selection is indispensable, as no single perturbation strategy is universally optimal. While SampleMean, Zero, and the newly introduced Laplace PM serve as strong default choices across most scenarios, other PMs may outperform them in specific cases. Notably, perturbations based on neighboring regions showed promising results, whereas the commonly used UniformNoise PM exhibited inconsistent behavior, sometimes leading to misleading outcomes.
Regarding attribution methods, FeatureAblation generally provides the most faithful explanations across different datasets and models. GradCAM is a suitable alternative for convolutional architectures, while Integrated Gradients offers a practical balance between faithfulness and execution time. Conversely, methods such as GuidedBackprop, KernelSHAP, and LIME were unsuitable for raw time-series data.
Overall, our proposed methodology and the CMI metric establish a reliable framework for evaluating the faithfulness of attribution methods in time-series classification. Beyond providing practical guidelines for practitioners seeking trustworthy explanations, this work also equips AM developers with a rigorous tool for validating new methods. By ensuring more faithful explanations, our contributions enhance transparency and trust in deep learning systems, paving the way for their safer deployment in high-stakes domains.

