Surrogate Explainers (bLIMEy)
Surrogate explainers construct an inherently interpretable model in a desired – local, cohort or global – subspace to approximate a more complex, black-box decision boundary (Sokol et al. 2019).
By using different surrogate models we can generate a wide array of explanation types; e.g., counterfactuals with decision trees (van der Waa et al. 2018; Sokol and Flach 2020) and feature influence with linear classifiers (Ribeiro, Singh, and Guestrin 2016).
Property | Surrogate Explainers |
---|---|
relation | post-hoc |
compatibility | model-agnostic ([semi-]supervised) |
modelling | regression, crisp and probabilistic classification |
scope | local, cohort, global |
target | prediction, sub-space, model |
data | text, image, tabular |
features | numerical and categorical (tabular data) |
explanation | type depends on the surrogate model |
caveats | random sampling, explanation faithfulness & fidelity |
If desired, data are transformed from their original domain into a human-intelligible representation, which is used to communicate the explanations. This step is required for image and text data, but optional – albeit helpful – for tabular data.
Interpretable representations tend to be binary spaces encoding the presence (fact, denoted by \(1\)) or absence (foil, denoted by \(0\)) of certain human-understandable concepts generated for the data point selected to be explained.
Tabular data: discretisation of continuous features followed by binarisation.
Images: super-pixel segmentation.
Text: tokenisation, e.g., a bag-of-words representation.
\(x^\star_0\): This
\(x^\star_1\): sentence
\(x^\star_2\): has
\(x^\star_3\): a
\(x^\star_4\): positive
\(x^\star_5\): sentiment
\(x^\star_6\): ,
\(x^\star_7\): maybe
\(x^\star_8\): .
\[ x^\star = [1, 1, 1, 1, 1, 1, 1, 1, 1] \]
\[ x^\star = [1, 0, 0, 1, 0, 0, 1, 0, 1] \]
\(x^\star_0\): This
\(x^\star_1\):
\(x^\star_2\):
\(x^\star_3\): a
\(x^\star_4\):
\(x^\star_5\):
\(x^\star_6\): ,
\(x^\star_7\):
\(x^\star_8\): .
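The mapping between the interpretable representation and raw text is simple enough to sketch in a few lines of plain Python. The tokeniser and function names below are illustrative, not part of any particular library:

```python
# Illustrative sketch: a bag-of-words interpretable representation for text.
# A binary vector marks which tokens of the explained sentence are kept.
tokens = ["This", "sentence", "has", "a", "positive", "sentiment", ",", "maybe", "."]

def ir_inverse(x_star, tokens):
    """IR^{-1}: map a binary vector back to text by keeping tokens marked 1."""
    return " ".join(t for t, keep in zip(tokens, x_star) if keep)

print(ir_inverse([1, 1, 1, 1, 1, 1, 1, 1, 1], tokens))  # the full sentence
print(ir_inverse([1, 0, 0, 1, 0, 0, 1, 0, 1], tokens))  # "This a , ."
```

The naive space-join leaves gaps before punctuation; a real implementation would de-tokenise more carefully.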
The interpretable representation need not map tokens one-to-one: uninformative words and punctuation can be left out, and related tokens can be merged into a single concept, e.g.,
\(x^\star_0\): sentence
\(x^\star_1\): positive sentiment
\(x^\star_2\): maybe
which yields a three-dimensional binary space:
\[ x^\star = [1, 1, 1] \]
For images, each component of the binary interpretable representation encodes whether a super-pixel is preserved (\(1\)) or occluded (\(0\)); e.g., for a segmentation into 14 super-pixels:
\[ x^\star = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] \]
\[ x^\star = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1] \]
Coarser segmentations yield lower-dimensional representations:
\[ x^\star = [1, 1, 1, 1] \]
\[ x^\star = [1, 1, 1] \]
For tabular data the instance is first discretised and then binarised, with each component of \(x^\star\) encoding whether the corresponding (discretised) feature value falls into the same bin as the explained instance:
\[ x = [1.3, 0.2] \;\;\longrightarrow\;\; x^\prime = [0, 0] \;\;\longrightarrow\;\; x^\star = [1, 1] \]
A point sampled in the interpretable space, e.g., \(x^\star = [1, 0]\), does not correspond to a unique point in the original domain, so the inverse transformation is ambiguous:
\[ x^\star = [1, 0] \;\;\;\; \longrightarrow \;\;\;\; x = [?, ?] \]
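A minimal sketch of this discretise-then-binarise pipeline in plain Python. The bin edges below are made up for illustration (chosen so that \(x = [1.3, 0.2]\) lands in bin \(0\) for both features); in practice they are typically derived from, e.g., quartiles of the training data:

```python
import bisect

# Hypothetical per-feature bin edges, purely for illustration.
edges = [[2.0, 4.0], [0.5, 1.0]]

def discretise(x):
    """x -> x': replace each continuous value with the index of its bin."""
    return [bisect.bisect_right(e, v) for v, e in zip(x, edges)]

def binarise(x, x_explained):
    """x' -> x*: 1 iff a feature falls into the same bin as the explained instance."""
    return [int(a == b) for a, b in zip(discretise(x), discretise(x_explained))]

x_explained = [1.3, 0.2]
print(discretise(x_explained))             # [0, 0]
print(binarise(x_explained, x_explained))  # [1, 1]
print(binarise([3.0, 0.2], x_explained))   # [0, 1] -- first feature in a different bin
```

Note that `binarise` discards the bin identities, which is exactly why \(x^\star\) cannot be uniquely inverted back to the original domain.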
Data sampling allows us to capture the behaviour of a predictive model in a desired subspace. To this end, a data sample is generated and predicted by the explained model, offering granular insight into its decision surface.
Explanatory insights are extracted from an inherently transparent model fitted to the sampled data (in interpretable representation), using their black-box predictions as the target.
Additional processing steps can be applied to tune and tweak the surrogate model, and hence the explanation. For example, the sample can be weighted based on its proximity to the explained instance when dealing with local explanations, and a feature selection procedure may be used to introduce sparsity, thereby improving the accessibility and comprehensibility of the explanatory insights.
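The sampling and weighting steps for a local explanation can be put together as follows. This is a self-contained sketch in plain Python; the exponential kernel, its width and the uniform binary sampler are illustrative choices, not a prescribed recipe:

```python
import math
import random

random.seed(42)

def sample_ir(d, n):
    """Draw n binary vectors uniformly from a d-dimensional interpretable space."""
    return [[random.randint(0, 1) for _ in range(d)] for _ in range(n)]

def kernel_weight(x_star, z, width=0.75):
    """Exponential kernel over the Hamming distance to the explained instance."""
    dist = sum(a != b for a, b in zip(x_star, z))
    return math.exp(-(dist ** 2) / (width ** 2 * len(x_star)))

x_star = [1] * 9                 # explained instance in the IR: all tokens present
sample = sample_ir(len(x_star), 1000)
weights = [kernel_weight(x_star, z) for z in sample]
# Next steps (not shown): predict IR^{-1}(z) with the black box for each z in the
# sample, then fit a weighted, sparse linear surrogate to the triples
# (sample, black-box predictions, weights).
```

Points closer to the explained instance receive weights near \(1\), and distant points are down-weighted towards \(0\), localising the surrogate's fit.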
Independent surrogate models explaining one class at a time:
A single model explaining a selected subset of classes:
\[ \def\IR{\mathit{IR}} \def\argmin{\mathop{\operatorname{arg\,min}}\limits} \def\argmax{\mathop{\operatorname{arg\,max}}\limits} \]
\[ \mathcal{O}(\mathcal{G}; \; f) = \argmin_{g \in \mathcal{G}} \overbrace{\Omega(g)}^{\text{complexity}} \; + \;\;\; \overbrace{\mathcal{L}(f, g)}^{\text{fidelity loss}} \]
\[ \Omega(g) = \frac{\sum_{\theta \in \Theta_g} {\Large\mathbb{1}} \left(\theta\right)}{|\Theta_g|} \]
\[ \Omega(g; \; d) = \frac{\text{depth}(g)}{d} \;\;\;\;\text{or}\;\;\;\; \Omega(g; \; d) = \frac{\text{width}(g)}{2^d} \]
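For a decision-tree surrogate, these size-based complexity measures are straightforward to compute. A toy sketch, where the nested-tuple encoding of the tree is made up purely for illustration and \(d\) is a reference depth bound:

```python
# A toy decision tree encoded as nested tuples: (split_feature, left, right);
# bare integers are leaves carrying class predictions.
tree = (0, (1, 0, 1), 1)

def depth(node):
    """Depth of the (sub)tree; a bare leaf has depth 0."""
    if not isinstance(node, tuple):
        return 0
    return 1 + max(depth(node[1]), depth(node[2]))

def omega(tree, d):
    """Omega(g; d): tree depth relative to the reference depth bound d."""
    return depth(tree) / d

print(depth(tree))     # 2
print(omega(tree, 4))  # 0.5
```

The width-based variant would instead count the tree's leaves against the maximum \(2^d\) possible at depth \(d\).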
\[ \mathcal{L}(f, g ; \; \mathring{x}, X^\prime, \mathring{c}) = \sum_{x^\prime \in X^\prime} \; \underbrace{\omega\left( \IR(\mathring{x}), x^\prime \right)}_{\text{weighting factor}} \; \times \; \underbrace{\left(f_\mathring{c}\left(\IR^{-1}(x^\prime)\right) - g(x^\prime)\right)^{2}}_{\text{individual loss}} \]
\[ \omega\left(\IR(\mathring{x}), x^\prime \right) = k\left(L\left(\IR(\mathring{x}), x^\prime\right)\right) \]
\[ \omega\left( \mathring{x}, x \right) = k\left(L\left(\mathring{x}, x\right)\right) \]
\[ \mathcal{L}(f, g ; \; \mathring{x}, X^\prime, \mathring{c}) = \sum_{x^\prime \in X^\prime} \; \omega\left( \IR(\mathring{x}), x^\prime \right) \; \times \; \underline{ {\Large\mathbb{1}} \left(f_\mathring{c}\left(\IR^{-1}(x^\prime)\right), \; g(x^\prime)\right)} \]
\[ \begin{split} f_{\mathring{c}}(x) = \begin{cases} 1, & \text{if} \;\; f(x) \equiv \mathring{c}\\ 0, & \text{if} \;\; f(x) \not\equiv \mathring{c} \end{cases} \text{ .} \end{split} \]
\[ \begin{split} {\Large\mathbb{1}}\left(f_{\mathring{c}}(x), g(x^\prime)\right) = \begin{cases} 1, & \text{if} \;\; f_{\mathring{c}}(x) \equiv g(x^\prime)\\ 0, & \text{if} \;\; f_{\mathring{c}}(x) \not\equiv g(x^\prime) \end{cases} \text{ ,} \end{split} \]
\(f(x)\) | \(f_\beta(x)\) | \(g(x^\prime)\) | \({\Large\mathbb{1}}\) |
---|---|---|---|
\(\alpha\) | \(0\) | \(1\) | \(0\) |
\(\beta\) | \(1\) | \(0\) | \(0\) |
\(\gamma\) | \(0\) | \(0\) | \(1\) |
\(\beta\) | \(1\) | \(1\) | \(1\) |
\(\alpha\) | \(0\) | \(0\) | \(1\) |
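The table above follows directly from the two definitions; a minimal sketch, using the same class names (\(\alpha\), \(\beta\), \(\gamma\)) with \(\mathring{c} = \beta\) as the explained class:

```python
def f_c(prediction, c):
    """f_c: collapse a multi-class black-box prediction to a one-vs-rest label."""
    return int(prediction == c)

def indicator(f_out, g_out):
    """The indicator term: 1 when the binarised black box and the surrogate agree."""
    return int(f_out == g_out)

# Columns: f(x) and g(x') from the table above; explained class is beta.
rows = [("alpha", 1), ("beta", 0), ("gamma", 0), ("beta", 1), ("alpha", 0)]
for f_pred, g_out in rows:
    print(f_pred, f_c(f_pred, "beta"), g_out, indicator(f_c(f_pred, "beta"), g_out))
```

Running this reproduces the four columns of the table row by row.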
\[ \mathcal{L}(f, g ; \; \mathring{x}, X^\prime, \mathring{C}) = \sum_{x^\prime \in X^\prime} \omega( \IR(\mathring{x}) , x^\prime ) \; \times \; \underline{ \frac{1}{|\mathring{C}|} \sum_{\mathring{c} \in \mathring{C}} {\Large\mathbb{1}} \left( f_\mathring{c}\left(\IR^{-1}(x^\prime)\right), \; g_\mathring{c}(x^\prime) \right) } \]
\[ \mathcal{L}(f, g ; \; \mathring{x}, X^\prime, \mathring{C}) = \sum_{x^\prime \in X^\prime} \omega( \IR(\mathring{x}) , x^\prime ) \; \times \; \underline{ \frac{1}{2} \sum_{\mathring{c} \in \mathring{C}} \left( f_\mathring{c}\left(\IR^{-1}(x^\prime)\right) - g_\mathring{c}(x^\prime) \right)^2 } \]
As many as you wish to construct.
Python | R |
---|---|
LIME | lime |
interpret | iml |
Skater | |
AIX360 | |