Uncertainty Quantification of Surrogate Explanations: An Ordinal Consensus Approach


  • Jonas Schulz, Dresden University of Technology
  • Raul Santos-Rodriguez
  • Rafael Poyiadzi




Surrogate Explanations, Uncertainty Quantification, Bootstrapping, Ordinal Statistics


Explainability of black-box machine learning models is crucial, in particular when they are deployed in critical applications such as medicine or autonomous cars. Existing approaches produce explanations for the predictions of models; however, how to assess the quality and reliability of such explanations remains an open question. In this paper we take a step further and provide the practitioner with tools to judge the trustworthiness of an explanation. To this end, we estimate the uncertainty of a given explanation by measuring the ordinal consensus amongst a set of diverse bootstrapped surrogate explainers. We encourage diversity through ensemble techniques, and we propose and analyse metrics that aggregate the information contained within the set of explainers through a rating scheme. We empirically illustrate the properties of this approach through experiments on state-of-the-art Convolutional Neural Network ensembles. Furthermore, through tailored visualisations, we show specific examples of situations where uncertainty estimates offer concrete actionable insights to the user beyond those arising from standard surrogate explainers.
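The core idea, fitting a set of bootstrapped surrogate explainers and measuring how strongly their feature orderings agree, can be sketched as follows. This is a minimal, hypothetical pipeline rather than the paper's implementation: it perturbs an input locally, fits a least-squares linear surrogate per bootstrap, ranks features by absolute coefficient, and scores agreement with Kendall's coefficient of concordance W, one possible ordinal consensus statistic. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for m rankings of n items.

    rankings: (m, n) array where rankings[j, i] is the rank that
    explainer j assigns to feature i (ranks 1..n, no ties).
    Returns W in [0, 1]; W = 1 means perfect ordinal agreement.
    """
    rankings = np.asarray(rankings, dtype=float)
    m, n = rankings.shape
    col_sums = rankings.sum(axis=0)                # total rank per feature
    s = ((col_sums - col_sums.mean()) ** 2).sum()  # spread of rank totals
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def consensus_of_surrogates(x, model_predict, n_boot=20,
                            n_samples=500, scale=0.3, seed=0):
    """Ordinal consensus of bootstrapped linear surrogates around x.

    For each bootstrap: sample Gaussian perturbations of x, query the
    black-box model, fit a linear surrogate by least squares, and rank
    features by |coefficient|.  Low W signals an uncertain explanation.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    rankings = []
    for _ in range(n_boot):
        Z = x + rng.normal(scale=scale, size=(n_samples, d))
        y = model_predict(Z)  # black-box responses on the perturbations
        coef, *_ = np.linalg.lstsq(
            np.column_stack([Z, np.ones(len(Z))]), y, rcond=None)
        order = np.argsort(-np.abs(coef[:d]))      # most important first
        ranks = np.empty(d)
        ranks[order] = np.arange(1, d + 1)
        rankings.append(ranks)
    return kendalls_w(np.array(rankings))
```

A deterministic black box with well-separated feature effects, e.g. `model_predict = lambda Z: Z @ np.array([3.0, 0.5, 0.01])`, yields W close to 1, while a model whose local behaviour changes across perturbation sets drives W towards 0.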

