Philosophy and Psychology of Explanation

Graduate seminar

Instructor info

Can Konuk
Office hours: Fridays 5:30–6:30 PM, Nora Suppes Building, Room 103

Course description

What makes something a good explanation? This graduate seminar examines explanation from three complementary perspectives: philosophy of science, cognitive psychology, and artificial intelligence. In the first sessions we look at classical accounts from the philosophy of science that attempt to define normative criteria for what counts as a good scientific explanation. The second part of the seminar explores the cognitive science of explanation, which tries to capture the criteria underlying people's commonsense explanations, and looks into the analogies between those criteria and the normative principles discussed by philosophers. Finally, the last part of the course applies some of these insights to the emerging challenges of AI interpretability. We will see how many of the strategies deployed to make black-box models amenable to human understanding can be understood as applications of the criteria for good explanations studied by philosophers and psychologists.

Schedule

1

Theories of Scientific Explanation

Sessions 1–5 · Weeks 1–3
Session 1 Mon, Mar 30
Introduction: Theories of explanation
Woodward, "Scientific Explanation" (SEP)
Session 2 Wed, Apr 1
Statistical relevance model
Salmon, "Statistical Explanation" (1971), §§1–3, 7–8. Skim §13.
Hempel & Oppenheim, "Studies in the Logic of Explanation" (1948)
Session 3 Mon, Apr 6
Unificationist model
Session 4 Wed, Apr 8
Pragmatics of explanation
van Fraassen, "The Pragmatics of Explanation" (1980)
Session 5 Mon, Apr 13
The interventionist account of explanation
2

Cognitive Science of Explanation

Sessions 6–12 · Weeks 3–6
Session 6 Wed, Apr 15
Causal explanation in cognition
Lombrozo & Vasilyeva, "Causal Explanation" (2017)
Session 7 Mon, Apr 20
The causal hierarchy
Bareinboim et al., "On Pearl's Hierarchy and the Foundations of Causal Inference" (2022). Read only §§1.1–1.3; §§1.4–1.5 are optional.
Session 8 Wed, Apr 22
Actual causation
Halpern & Pearl, "Causes and Explanations" (2005)
Halpern, Actual Causality Ch. 1–2
Session 9 Mon, Apr 27
Causal selection & normality
Icard, Kominsky & Knobe, "Normality and Actual Causal Strength" (2017)
Session 10 Wed, Apr 29
Counterfactual effect sizes
Quillien & Lucas, "Counterfactuals and the Logic of Causal Selection" (2023). Read up to the end of Study 2; you may also skip the "Reanalysis of existing data" section in the introduction.
Session 11 Mon, May 4
Pragmatics & communication
Kirfel, Icard & Gerstenberg, "Inference from Explanation" (2022)
Session 12 Wed, May 6
Communication-first explanation
Harding, Gerstenberg & Icard, "A Communication-First Account of Explanation" (2025). Focus on sections 1, 2.3, 3, 4.1, 4.4, and 5 (≈19 pages). Sections 2.1–2.2 recap material from earlier sessions (Hempel/Salmon, Woodward); skim them only if you want a refresher. Sections 4.2, 4.3, and 4.5, which apply the model to additional explanatory virtues, are optional.
3

Interpretable AI

Sessions 13–17 · Weeks 7–9
Session 13 Mon, May 11
Foundations of interpretable AI
Erasmus, Brunet & Fisher, "What Is Interpretability?" (2021). Focus on §1, §2.2, §3, and §4.1–§4.3 (≈12 pages).
Session 14 Wed, May 13
Local model-agnostic explanations (LIME)
Session 15 Mon, May 18
Feature attribution methods
Note: The paper is heavy on technicalities. As an introduction, you can instead read and write your reaction post on Hart, "Shapley Value" (The New Palgrave Dictionary of Economics, 2nd ed., 2008), which discusses Shapley values in general, not just in xAI, although the course will focus on the AI application.
Session 16 Wed, May 20
Causal abstraction & mechanistic interpretability
Geiger, Lu, Icard & Potts, "Causal Abstractions of Neural Networks" (2021) — listed here as the conceptual scaffold for the session; this is not the paper I recommend you read.
Please read this one instead: Ameisen et al. (Anthropic, 2025), "Circuit Tracing: Revealing Computational Graphs in Language Models", sections 1–3 only (Introduction, Building an Interpretable Replacement Model, Attribution Graphs).
No Class Mon, May 25
Memorial Day — No class
Session 17 Wed, May 27
Causal mechanisms in LLMs
Refresher: quickly skim sections 1–3 of Ameisen et al. (Anthropic, 2025), "Circuit Tracing: Revealing Computational Graphs in Language Models" (the Session 16 reading) for context. Then read just the short section "Chain-of-thought Faithfulness" (≈3 pages) from its companion paper, Lindsey et al. (Anthropic, 2025), "On the Biology of a Large Language Model".
Optionally, you can also read the section "Uncovering Hidden Goals in a Misaligned Model", which immediately follows it in the same paper.
 

Student Presentations

Sessions 18–19 · Week 10
Session 18 Mon, Jun 1
Student presentations I
Session 19 Wed, Jun 3
Student presentations II

General information

What to expect?

What you can expect from me

I will …

What I expect from you

You will …

Grading

Reaction posts

Reaction posts should be submitted by 8 PM the night before class. They should be 1–2 paragraphs long and engage critically with the assigned readings. Express your own opinion rather than summarizing the contents of the paper. You may raise questions, identify connections to other material, or offer a brief argument.

Final paper

The final paper should be 1000-2000 words and engage substantively with topics covered in the course. It may be one of the following three:

You will also be expected to give a short presentation of your project during the last two sessions of the course. The presentation counts toward the grade for the final paper.

Policies

Please familiarize yourself with Stanford’s honor code. We will adhere to it and follow through on its penalty guidelines.

Access and accommodations

Stanford is committed to providing equal educational opportunities for disabled students. Disabled students are a valued and essential part of the Stanford community. We welcome you to our class. If you experience disability, please register with the Office of Accessible Education (OAE). Professional staff will evaluate your needs, support appropriate and reasonable accommodations, and prepare an Academic Accommodation Letter for faculty. To get started, or to re-initiate services, please visit oae.stanford.edu. If you already have an Academic Accommodation Letter, we invite you to share your letter with us. Academic Accommodation Letters should be shared at the earliest possible opportunity so we may partner with you and OAE to identify any barriers to access and inclusion that might be encountered in your experience of this course.