Portrait of Can Konuk

Can Konuk

Postdoctoral researcher, Stanford University

I'm a postdoctoral researcher working with Thomas Icard at Stanford University. I completed my PhD at Institut Jean Nicod (École Normale Supérieure) under the supervision of Salvador Mascarenhas.

My research investigates human causal understanding—our ability to represent and reason over causal relations. I examine the relationship between our category of cause and the graded notions of causal strength and responsibility that underlie our intuition that some contributions matter more than others. I'm interested in how these notions inform our judgments about causes as well as our ability to acquire causal knowledge from experience.

Research

Causal explanations provide a window into human causal understanding. When we explain why something happened, we externalize aspects of our causal representations, revealing not just what we know but how that knowledge is organized. My research pursues two complementary directions.

Producing explanations. Consider a forest fire sparked by lightning. The fire required both lightning and oxygen; without either, no combustion. Yet we cite the lightning as the cause, never the oxygen. This asymmetry—causal selection—reflects how our cognitive machinery represents and weighs causes. Our representations encode two things simultaneously: structure (discrete commitments about what causes what) and gradedness (continuous distinctions in importance). This duality is what Smolensky (1986) called the "Structure/Statistics Dilemma"—cognition appears both rule-governed and context-sensitive. Structural causal models (SCMs), the dominant framework in philosophy and AI, capture structure through directed acyclic graphs but remain silent on how importance is computed over that structure. I address this gap through two lines of work: experiments on plural causal judgments that reveal the structure of our causal representations, and neurosymbolic models that compile logical causal structure into neural network architectures where gradedness emerges from continuous weights.

Learning from explanations. The same explanations that reveal our causal representations also shape them. How do sparse causal selection explanations, which mention just one or two relevant variables in a system, guide learners toward correct causal rules? My dissertation work proposes an attention-based account: explanations direct attention during learning, biasing gradient updates toward mentioned variables. Currently, my work focuses on extending this to the self-explanation effect, i.e., the well-documented finding that explaining material to oneself improves learning across a variety of tasks. I ask what cognitive mechanisms underlie this effect and how computational models can capture the benefit that explaining things to oneself confers on learning.

Part I: Producing Explanations

Plural Causal Selection

Experiments on how people judge multiple causes cited together, revealing patterns incompatible with existing theories.

Figure 1. Counterfactual simulation models of causal judgment. Causes receive credit to the extent that the outcome covaries with them across alternative scenarios imagined by the subject. Different theories propose different sampling distributions over counterfactual worlds.

Prior work established that causal selection depends on normality: abnormal causes receive different treatment than routine ones. Conjunctive structures show abnormal inflation (rare causes gain credit), while disjunctive structures show abnormal deflation (common causes gain credit). But this work focused exclusively on singular causal claims—"E happened because of A." What happens when multiple causes are cited together?

We conducted the first systematic experiments on plural causal judgments—statements like "E happened because of A and B" (Konuk, Goodale, Quillien, & Mascarenhas). Such judgments had received little study, perhaps because of a tempting deflationary hypothesis: that plural strength simply aggregates singular judgments. In a first experiment, we ruled this out directly. Participants evaluated both singular and plural causal claims for the same scenarios, and plural judgments were not predictable from singular ones. People evaluate plural causes as bona fide candidates whose counterfactual profile is apprehended directly rather than recomposed from parts.

Figure 2. Experiment 1: ruling out the deflationary account. Left: plural causal judgments across conditions. Right: predicted vs. observed plural ratings if plural strength were simply the product of singular strengths. The poor correlation confirms that plurals are not reducible to singulars.

Having established that plural judgments are genuinely holistic, we designed a second experiment to probe their structure more precisely. Participants played a game with an explicitly disjunctive rule: winning required either \((A \land B)\) or \((C \land D)\)—two routes to the same outcome.

Figure 3. Experiment 2 design. Participants played a game with an explicitly disjunctive rule: winning required either \((A \land B)\) or \((C \land D)\). We varied normality (expected vs. surprising values) and outcome valence (win vs. loss).

A striking pattern emerged: participants strongly preferred "same-side" pairs (plurals on the same route, like A&B) over "cross-side" pairs that mix variables from different routes (like A&C). But the results for negative outcomes were deeply puzzling—indeed, incompatible with existing theories of causal judgment. Classical theories predict abnormal deflation for disjunctive structures, yet participants showed the opposite pattern: preference for surprising failures. Moreover, the same-side dominance that characterized wins disappeared entirely for losses.

Figure 4. Experiment 2 results. For positive outcomes (wins), participants strongly prefer same-side plurals and show abnormal inflation. For negative outcomes (losses), the same-side preference disappears and the normality effect reverses—patterns incompatible with existing theories.

I account for these patterns through the homogeneity hypothesis, inspired by the linguistic observation that natural language quantifiers resist mixed readings. Just as "The boys didn't leave" typically means "None of the boys left" (not merely "At least one didn't"), people interpret losing as all routes failing—collapsing the route structure. Formally, the standard negation target LOSE requires only that the active route fails:

$$ \text{LOSE} = \neg(A \land B) \land \neg(C \land D) $$

But under homogeneity, the strengthened target \(\text{LOSE}_{\text{strong}}\) requires that every variable on every route takes a negative value:

$$ \text{LOSE}_{\text{strong}} = \neg A \land \neg B \land \neg C \land \neg D $$

This strengthened target is conjunctive—it flips the causal structure from disjunctive to conjunctive, explaining both the reversal of the normality effect (inflation instead of deflation) and the collapse of route structure (no more same-side preference, since all variables now participate in a single "route").
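The logical relation between the two targets can be checked by brute force. A minimal sketch, enumerating all sixteen assignments of the four variables:

```python
from itertools import product

def lose(a, b, c, d):
    # Standard negation target: neither route succeeds.
    return not (a and b) and not (c and d)

def lose_strong(a, b, c, d):
    # Homogeneity-strengthened target: every variable fails.
    return not a and not b and not c and not d

worlds = list(product([False, True], repeat=4))

# The strengthened target entails the standard one...
assert all(lose(*w) for w in worlds if lose_strong(*w))

# ...and is strictly stronger: exactly 1 of 16 worlds satisfies it,
# versus 9 satisfying the standard target.
assert sum(lose_strong(*w) for w in worlds) == 1
assert sum(lose(*w) for w in worlds) == 9
```

The strengthened target's single satisfying world is the one where all four variables are false, which is exactly the conjunctive reading described above.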

Computational Modeling: A Neurosymbolic Account of Causal Selection

Neural networks constrained by logic, counterfactual simulation, and relevance propagation to compute causal importance.

Causal judgments depend not just on objective causal structure but on how that structure is mentally represented. Two systems with identical input-output behavior can yield different causal judgments if their internal representations encode different intermediate structure. Structural causal models (SCMs)—the dominant framework in philosophy and AI—miss this point: a DAG with edges from {A, B, C, D} to E treats all causes as participating in a single flat function, erasing the route structure that distinguishes \((A \land B) \lor (C \land D)\) from other four-variable rules.

I propose modeling internal causal structure using neural networks whose architecture is constrained by logic programming. The key idea is to represent causal rules as Horn clauses—logical formulas of the form Head ← Body:

$$\begin{aligned} E &\leftarrow A, B \\ E &\leftarrow C, D \end{aligned}$$

Each clause represents one "route" to the outcome. Crucially, proving that the outcome obtains is existential (find any route that succeeds), while proving that it does not obtain is universal (show that every route fails). This existential–universal asymmetry in logic programming mirrors the asymmetry we observe between win and loss judgments.

The CILP algorithm (Garcez, Broda, & Gabbay, 2002) compiles these logic programs into neural networks with one hidden node per clause. Each hidden node computes a conjunction (AND of its inputs); the output node computes a disjunction (OR of hidden nodes). This creates a network architecture that is isomorphic to the logical structure of the rule—encoding route structure as a structural feature of the representation itself.

Figure 5. Three representations of the same causal system. Left: A structural causal model (DAG) treats all input variables symmetrically. Center: A logic program preserves route structure via separate clauses. Right: The CILP-compiled neural network creates one hidden node per clause, with each hidden node computing conjunction and the output computing disjunction. The neural representation makes route structure explicit as network topology.
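As a concrete sketch, the compilation of this two-clause program can be written with step-activation units. This simplifies CILP, which uses bipolar sigmoid units; the particular weights and thresholds below are an illustrative choice, not those of the published algorithm:

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

# One hidden node per clause: weight 1 on each body literal and a
# threshold of (number of literals) - 0.5, so the node fires only
# when ALL of its literals are true (conjunction).
W_hidden = np.array([[1.0, 1.0, 0.0, 0.0],   # clause E <- A, B
                     [0.0, 0.0, 1.0, 1.0]])  # clause E <- C, D
b_hidden = np.array([-1.5, -1.5])

# Output node: weight 1 on each clause and threshold 0.5, so it
# fires when ANY hidden node fires (disjunction).
w_out = np.array([1.0, 1.0])
b_out = -0.5

def E(x):
    h = step(W_hidden @ x + b_hidden)
    return step(w_out @ h + b_out)

assert E(np.array([1, 1, 0, 0])) == 1   # route A,B complete
assert E(np.array([0, 1, 1, 1])) == 1   # route C,D complete
assert E(np.array([1, 0, 1, 0])) == 0   # no complete route
```

The hidden layer makes the route structure explicit: each clause is a separate node, so the network distinguishes \((A \land B) \lor (C \land D)\) from other four-variable rules with the same input-output behavior.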

Counterfactual Simulation and Causal Importance

Given a neural network encoding causal structure, how does the cognitive system compute causal importance? I propose a three-stage process: (1) sample counterfactual worlds via MCMC, (2) update connection weights based on how each counterfactual changes the network's behavior, and (3) propagate relevance backward through the updated network to assign credit to input variables.

Stage 1: MCMC sampling. Starting from the observed world, the system explores neighboring counterfactual states by flipping individual variables. Transition probabilities depend on event normality (abnormal events are more likely to be flipped) and on whether flipping a variable would change the activation of any hidden node. This means the sampling process is sensitive to route structure—a variable that participates in an active route is harder to "undo" than one that is merely present.

Figure 6. Counterfactual sampling via MCMC. Starting from the observed world (center), the system explores neighboring states by flipping variables. Transition probabilities depend on event normality and on whether the flip would alter hidden-node activations, making the walk sensitive to route structure.
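A stripped-down sketch of such a sampler, keeping only the normality term (the full model also conditions transition probabilities on hidden-node activations; the priors below are hypothetical):

```python
import random

# Hypothetical priors P(variable = 1). A variable's observed value is
# abnormal when it is improbable under this prior.
PRIOR = {"A": 0.1, "B": 0.9, "C": 0.5, "D": 0.5}

def abnormality(var, world):
    # How surprising the variable's current value is under its prior.
    p = PRIOR[var]
    return 1 - p if world[var] == 1 else p

def sample_counterfactuals(observed, n_steps, seed=0):
    rng = random.Random(seed)
    world = dict(observed)
    samples = []
    for _ in range(n_steps):
        # Abnormal events are more likely to be "undone" by a flip.
        vars_, weights = zip(*((v, abnormality(v, world)) for v in world))
        flipped = rng.choices(vars_, weights=weights)[0]
        world[flipped] = 1 - world[flipped]
        samples.append(dict(world))
    return samples
```

Each step flips exactly one variable, so the walk moves through the space of nearby counterfactual worlds, preferentially undoing surprising events first.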

Stage 2: Weight updates via Layer-wise Feedback Propagation (LFP). Each sampled counterfactual triggers a weight update. When flipping a variable changes the output, the connection weights along the affected pathway are strengthened; when it does not, they are weakened. Over many samples, weights accumulate evidence about each connection's causal relevance. This is a feedback-driven process analogous to Hebbian learning: connections that consistently participate in output-changing counterfactuals grow stronger.

Figure 7. Weight updates via Layer-wise Feedback Propagation. Each counterfactual simulation adjusts connection weights. Connections along pathways where variable flips change the output are strengthened; others are weakened. After many simulations, accumulated weights encode each connection's causal relevance.
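The spirit of the update can be sketched as follows. This is an illustrative rule in the style of LFP, not the exact published update, and the learning rates are arbitrary:

```python
def rule(w):
    # The disjunctive game rule: (A and B) or (C and D).
    return (w["A"] and w["B"]) or (w["C"] and w["D"])

def lfp_step(weights, world, flipped_var, lr=0.1):
    # One feedback update from a single counterfactual simulation.
    flipped = dict(world, **{flipped_var: 1 - world[flipped_var]})
    output_changed = rule(world) != rule(flipped)
    new = dict(weights)
    if output_changed:
        new[flipped_var] += lr        # strengthen the affected pathway
    else:
        new[flipped_var] -= lr * 0.5  # weaken it
    return new

# Observed world: the A,B route succeeded, the C,D route did not.
w0 = {v: 1.0 for v in "ABCD"}
obs = {"A": 1, "B": 1, "C": 0, "D": 0}

# Flipping A changes the outcome (the win is undone), so A's pathway
# is strengthened; flipping C leaves the outcome intact, so C's decays.
assert lfp_step(w0, obs, "A")["A"] > 1.0
assert lfp_step(w0, obs, "C")["C"] < 1.0
```

Iterated over many sampled counterfactuals, the weights accumulate a Hebbian-style record of which connections consistently matter for the outcome.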

Stage 3: Layer-wise Relevance Propagation (LRP). Finally, credit is distributed backward through the network using LRP. Starting from the output, each layer redistributes its relevance to the layer below in proportion to the (updated) connection weights. The final relevance scores at the input layer represent each variable's causal importance. The overall measure is:

$$ \kappa(C, O) = \frac{\sum_{c \in C} R_c}{\mathcal{C}(C, O)} $$

where \(R_c\) is the LRP relevance of input \(c\), and \(\mathcal{C}(C, O)\) counts the number of edge-disjoint active routes from the candidate set \(C\) to the outcome \(O\). This parsimony term is what explains the same-side preference: causes operating through a single shared route score higher than causes that spread their contribution across multiple routes.
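A toy instantiation of this measure, using the basic LRP redistribution rule over the two-route network, with route counts supplied by hand (uniform weights and relevances, for illustration only):

```python
import numpy as np

# Two-route network for (A and B) or (C and D): one hidden node per route.
W = np.array([[1.0, 1.0, 0.0, 0.0],   # route 1: A, B
              [0.0, 0.0, 1.0, 1.0]])  # route 2: C, D

def lrp_input_relevance(x, hidden_relevance):
    # Basic LRP rule: each hidden node shares its relevance among its
    # inputs in proportion to activation * weight.
    z = W * x                           # per-input contributions
    denom = z.sum(axis=1, keepdims=True)
    denom[denom == 0] = 1.0             # guard inactive routes
    return (z / denom * hidden_relevance[:, None]).sum(axis=0)

def kappa(candidate_idx, R, n_active_routes):
    # Summed relevance of the candidate set, divided by the number of
    # edge-disjoint active routes it draws on (the parsimony term).
    return R[list(candidate_idx)].sum() / n_active_routes

# Observed world: all four variables present, both routes active.
x = np.array([1.0, 1.0, 1.0, 1.0])
R = lrp_input_relevance(x, hidden_relevance=np.array([0.5, 0.5]))

same_side = kappa([0, 1], R, n_active_routes=1)   # A,B share one route
cross_side = kappa([0, 2], R, n_active_routes=2)  # A,C span two routes
assert same_side > cross_side
```

Even with identical per-variable relevance, the parsimony denominator penalizes the cross-side pair, reproducing the same-side preference qualitatively.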

Figure 8. LRP and model fit. Left: Layer-wise Relevance Propagation distributes credit backward through the network. Right: The full model (MCMC sampling + LFP weight updates + LRP) provides excellent quantitative fit to the experimental data from both win and loss conditions.

Part II: Learning from Explanations

Attention-Based Learning

How sparse causal explanations guide rule learning—an attention-based account outperforms Gricean inference.

How do causal explanations guide the acquisition of causal knowledge? An explanation like "E happened because of C" deliberately omits most of the causal picture—it says nothing about the other variables or the functional form of the rule. Yet such sparse signals seem remarkably effective at guiding learners. How can mentioning a single cause help someone infer a rule involving several variables?

Figure 9. Rule inference paradigm. Participants observe outcomes across trials and, in some conditions, receive causal selection explanations identifying the cause. They then infer the underlying rule governing the system.

We developed a new paradigm to study this (Navarre, Konuk, Bramley, & Mascarenhas). Key findings: (i) causal selection explanations significantly help participants infer the correct rule; (ii) explanations citing any relevant variable (the "actual cause" condition) performed worse than observations alone—a surprising result given that these explanations provide strictly more information; (iii) participants showed a striking preference for simple (conjunctive) rules even when some explanations should have ruled them out.

Two competing accounts explain these patterns. The reverse-engineering account treats explanations as Gricean signals: the learner infers what rule hypotheses would lead a rational speaker to produce that particular explanation. This is essentially a pragmatic inference—"If the speaker chose to mention C, what must the underlying rule be for C to be the most relevant cause?"

Figure 10. The reverse-engineering account. Learners treat explanations as rational speech acts and infer which rule hypotheses would make the speaker's choice of explanation optimal. This Gricean approach requires maintaining and evaluating a space of possible rules.

I propose an alternative attention-based account: explanations direct attention to certain variables during learning. Mentioned variables are amplified in the learner's input representation; unmentioned variables are attenuated. When the learner updates their internal model via gradient descent, more gradient signal flows through attended (amplified) inputs, biasing the learned weights toward rules that assign those variables greater importance.

Figure 11. Attention-based learning. Rather than enumerating hypotheses, the learner applies an attention mask that amplifies mentioned variables. During backpropagation, gradient updates flow more strongly through amplified pathways, biasing the learned representation toward rules that weight those variables highly.
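A minimal sketch of the mechanism, using a single logistic unit trained by gradient descent (the mask values and learning rate are hypothetical; the point is only that amplified inputs receive proportionally larger gradient updates):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def attention_update(w, x, y, mask, lr=0.5):
    # The explanation enters as a multiplicative mask on the input:
    # mentioned variables are amplified, unmentioned ones attenuated.
    x_att = x * mask
    pred = sigmoid(w @ x_att)
    grad = (pred - y) * x_att   # gradient of cross-entropy loss
    return w - lr * grad

# Hypothetical trial: the outcome occurred (y = 1) and the explanation
# mentioned variable 0, so the mask amplifies it and attenuates the rest.
w = np.zeros(4)
x = np.array([1.0, 1.0, 1.0, 1.0])
mask = np.array([2.0, 0.5, 0.5, 0.5])
for _ in range(20):
    w = attention_update(w, x, 1.0, mask)

# The mentioned variable accrues the largest learned weight, even though
# all four variables took identical values on every trial.
assert w[0] > max(w[1], w[2], w[3])
```

This also shows why "any relevant variable" explanations can fail to help: if the mask is flat across variables, no differential gradient signal is produced.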

Three considerations favor the attention account. First, reverse-engineering cannot explain the simple-rule preference: when explanations contradicted simple rules, participants still preferred them—suggesting they weren't performing rational hypothesis elimination. Second, reverse-engineering predicts that more information should always help, but "any relevant variable" explanations actually hurt performance—attention naturally explains this, since diffuse attention across all variables provides no differential learning signal. Third, attention is computationally tractable: it integrates directly into gradient-based learning without requiring the learner to maintain and evaluate an exponentially growing space of rule hypotheses.

Figure 12. Model comparison. Quantitative fit of the attention-based and reverse-engineering models to experimental data. The attention model provides superior fit, particularly for conditions where explanations cite any relevant variable.

Publications

Interactive Models

R Markdown documents with runnable code and detailed explanations.
