Philosophy and Psychology of Explanation

Instructor info

Office hours: Fridays 5:30–6:30 PM, Nora Suppes Building, Room 103

Course description

What makes something a good explanation? This graduate seminar examines explanation from three complementary perspectives: philosophy of science, cognitive psychology, and artificial intelligence. In the first sessions we look at classical accounts from the philosophy of science that attempt to define normative criteria for what counts as a good scientific explanation. The second part of the seminar explores the cognitive science of explanation, which tries to capture the criteria underlying people's commonsense explanations. We examine the parallels between these criteria and the normative principles discussed by philosophers. Finally, the last part of the course applies some of these insights to the emerging challenges of AI interpretability. We will see how many of the strategies deployed to make black-box models amenable to human understanding can be understood as applications of the criteria for good explanations studied by philosophers and psychologists.

Schedule

Theories of Scientific Explanation

Sessions 1–5 · Weeks 1–3

Session 1 Mon, Mar 30

Introduction: Theories of explanation

Woodward, "Scientific Explanation" (SEP)

Session 2 Wed, Apr 1

Statistical relevance model

Salmon, "Statistical Explanation" (1971), §§1–3, 7–8. Skim §13.

Hempel & Oppenheim, "Studies in the Logic of Explanation" (1948)

Session 3 Mon, Apr 6

Unificationist model

Friedman, "Explanation and Scientific Understanding" (1974)

Session 4 Wed, Apr 8

Pragmatics of explanation

van Fraassen, "The Pragmatics of Explanation" (1980)

Session 5 Mon, Apr 13

The interventionist account of explanation

Woodward, "Explanation, Invariance, and Intervention" (1997)

Woodward, "Interventionist Theories of Causation in Psychological Perspective" (2007)

Cognitive Science of Explanation

Sessions 6–12 · Weeks 3–6

Session 6 Wed, Apr 15

Causal explanation in cognition

Lombrozo, "Simplicity and Probability in Causal Explanation" (2007)

Lombrozo & Vasilyeva, "Causal Explanation" (2017)

Session 7 Mon, Apr 20

The causal hierarchy

Bareinboim et al., "On Pearl's Hierarchy and the Foundations of Causal Inference" (2022), READ ONLY §§1.1–1.3. OPTIONAL ADDITION: §§1.4–1.5.

Session 8 Wed, Apr 22

Actual causation

Halpern & Pearl, "Causes and Explanations" (2005)

Halpern, Actual Causality Ch. 1–2

Session 9 Mon, Apr 27

Causal selection & normality

Icard, Kominsky & Knobe, "Normality and Actual Causal Strength" (2017)

Session 10 Wed, Apr 29

Counterfactual effect sizes

Quillien & Lucas, "Counterfactuals and the Logic of Causal Selection" (2023). Read up to the end of study 2; you may also skip the "Reanalysis of existing data" section in the introduction.

Quillien & Barlev, "Causal Judgment in the Wild: Evidence from the 2020 U.S. Presidential Election" (2022)

Session 11 Mon, May 4

Pragmatics & communication

Kirfel, Icard & Gerstenberg, "Inference from Explanation" (2022)

Session 12 Wed, May 6

Communication-first explanation

Harding, Gerstenberg & Icard, "A Communication-First Account of Explanation" (2025). Focus on sections 1, 2.3, 3, 4.1, 4.4, and 5 (≈19 pages). Sections 2.1–2.2 recap material from earlier sessions (Hempel/Salmon, Woodward); skim only if you want a refresher. Sections 4.2, 4.3, and 4.5 apply the model to additional explanatory virtues — optional.

Interpretable AI

Sessions 13–17 · Weeks 7–9

Session 13 Mon, May 11

Foundations of interpretable AI

Erasmus, Brunet & Fisher, "What Is Interpretability?" (2021). Focus on §1, §2.2, §3, and §4.1–§4.3 (≈12 pages).

Session 14 Wed, May 13

Local model-agnostic explanations (LIME)

Ribeiro, Singh & Guestrin, "Why Should I Trust You?: Explaining the Predictions of Any Classifier" (2016).

Session 15 Mon, May 18

Feature attribution methods

Lundberg & Lee, "A Unified Approach to Interpreting Model Predictions" (SHAP) (2017).

Note: The paper is technically heavy. As an introduction, you can instead read and write your reaction post on Hart, "Shapley Value" (The New Palgrave Dictionary of Economics, 2nd ed., 2008), which discusses Shapley values in general, not just in xAI, although the course will focus on the AI application.

Session 16 Wed, May 20

Causal abstraction & mechanistic interpretability

Geiger, Lu, Icard & Potts, "Causal Abstractions of Neural Networks" (2021) — listed here as the conceptual scaffold for the session; this is not the paper I recommend you read.

Please read this one instead: Ameisen et al. (Anthropic, 2025), "Circuit Tracing: Revealing Computational Graphs in Language Models" — sections 1–3 only (Introduction, Building an Interpretable Replacement Model, Attribution Graphs).

No Class Mon, May 25

Memorial Day — No class

Session 17 Wed, May 27

Causal mechanisms in LLMs

Refresher: quickly skim sections 1–3 of Ameisen et al. (Anthropic, 2025), "Circuit Tracing: Revealing Computational Graphs in Language Models" (the Session 16 reading) for context. Then read just the short section "Chain-of-thought Faithfulness" (≈3 pages) from its companion paper: Lindsey et al. (Anthropic, 2025), "On the Biology of a Large Language Model".

Optionally, you can also read the section "Uncovering Hidden Goals in a Misaligned Model", which immediately follows it in the same paper.

Student Presentations

Sessions 18–19 · Week 10

Session 18 Mon, Jun 1

Student presentations I

Session 19 Wed, Jun 3

Student presentations II

General information

What to expect?

What you can expect from me

I will …

Provide an introduction and context for the papers discussed during the seminar
Facilitate class discussions
Provide feedback on final papers
Be available during office hours

What I expect from you

You will …

Attend class and participate in discussions
Lead one discussion session
Submit reaction posts (due 8pm the night before class)
Write a final paper
Present your work in Sessions 18–19

Grading

1/3 class participation
1/3 reaction posts
1/3 final paper

Reaction posts

Reaction posts should be submitted by 8pm the night before class. They should be 1–2 paragraphs and engage critically with the assigned readings. You should express your opinion rather than summarize the contents of the paper. You may raise questions, identify connections to other material, or offer a brief argument.

Final paper

The final paper should be 1000–2000 words and engage substantively with topics covered in the course. It may take one of the following three forms:

An empirical project proposal (can include computational modelling projects)
A literature review based on one of the class topics
A theoretical essay

You will also be expected to give a short presentation of your project during the last two sessions of the course. The presentation counts toward the final paper grade.

Policies

Please familiarize yourself with Stanford’s honor code. We will adhere to it and follow through on its penalty guidelines.

Access and accommodations

Stanford is committed to providing equal educational opportunities for disabled students. Disabled students are a valued and essential part of the Stanford community. We welcome you to our class. If you experience disability, please register with the Office of Accessible Education (OAE). Professional staff will evaluate your needs, support appropriate and reasonable accommodations, and prepare an Academic Accommodation Letter for faculty. To get started, or to re-initiate services, please visit oae.stanford.edu. If you already have an Academic Accommodation Letter, we invite you to share your letter with us. Academic Accommodation Letters should be shared at the earliest possible opportunity so we may partner with you and OAE to identify any barriers to access and inclusion that might be encountered in your experience of this course.