What to evaluate
How to evaluate it
Approaches
Approach | Humans | Task |
---|---|---|
Application-grounded Evaluation | Real Humans | Real Tasks |
Human-grounded Evaluation | Real Humans | Simple Tasks |
Functionally-grounded Evaluation | No Real Humans | Proxy Tasks |
See the taxonomy module for a review of explainability desiderata.
Evaluation is task-specific and context-dependent
It should account for both aspects of XML systems
Overall, it should assess human understanding