Chapter 7: Anchor and Counterfactual Explanations
In previous chapters, we have learned how to attribute model decisions to features and their interactions with state-of-the-art global and local model interpretation methods. However, the decision boundaries are not always easy to define or interpret with these methods. Wouldn't it be nice to be able to derive human-interpretable rules from model interpretation methods? In this chapter, we will cover a few human-interpretable, local, classification-only model interpretation methods. We will first learn how to use scoped rules called anchors to explain complex models with statements such as "if X conditions are met, then Y is the outcome." Then, we will explore counterfactual explanations, which describe the smallest changes to an instance's features that would flip the model's prediction.
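To make the idea of an anchor concrete before we dive in, the following is a minimal sketch of how such a rule could be produced in code. It assumes the open source alibi library and a scikit-learn classifier; the dataset, model, and threshold chosen here are illustrative, not necessarily the ones used later in the chapter.

# A minimal sketch of an anchor explanation, assuming the alibi library
# and a fitted scikit-learn classifier; dataset and parameters are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from alibi.explainers import AnchorTabular

# Fit any black-box classifier
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The explainer only needs a prediction function and the feature names
explainer = AnchorTabular(model.predict, feature_names=list(data.feature_names))
explainer.fit(X_train)  # discretizes numerical features into bins

# Explain a single prediction with a high-precision anchor rule
explanation = explainer.explain(X_test[0], threshold=0.95)
print('IF  ', ' AND '.join(explanation.anchor))
print('THEN prediction =', model.predict(X_test[:1])[0])
print('precision: %.2f, coverage: %.2f'
      % (explanation.precision, explanation.coverage))

The printed rule reads as a plain if-then statement: its precision tells us how often instances satisfying the rule receive the same prediction, and its coverage tells us how much of the data the rule applies to.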