Quantum common causes and quantum causal models
Abstract
Reichenbach’s principle asserts that if two observed variables are found to be correlated, then there should be a causal explanation of these correlations. Furthermore, if the explanation is in terms of a common cause, then the conditional probability distribution over the variables given the complete common cause should factorize. The principle is generalized by the formalism of causal models, in which the causal relationships among variables constrain the form of their joint probability distribution. In the quantum case, however, the observed correlations in Bell experiments cannot be explained in the manner Reichenbach’s principle would seem to demand. Motivated by this, we introduce a quantum counterpart to the principle. We demonstrate that under the assumption that quantum dynamics is fundamentally unitary, if a quantum channel with input $A$ and outputs $B$ and $C$ is compatible with $A$ being a complete common cause of $B$ and $C$, then it must factorize in a particular way. Finally, we show how to generalize our quantum version of Reichenbach’s principle to a formalism for quantum causal models, and provide examples of how the formalism works.
I Introduction
It is a general principle of scientific thought—and indeed of everyday common sense—that if physical variables are found to be statistically correlated, then there ought to be a causal explanation of this fact. If the dog barks every time the telephone rings, we do not ascribe this to coincidence. A likely explanation is that the sound of the telephone ringing is causing the dog to bark. This is a case where one of the variables is a cause of the other. If sales of ice cream are high on the same days of the year that many people get sunburned, a likely explanation is that the sun was shining on these days and that the hot sun causes both sunburns and the desire to have an ice cream. Here the explanation is not that buying ice cream causes people to get sunburned, nor vice versa, but instead that there is a common cause of both: the hot sun.
That the principle is highly natural is most apparent when it is expressed in its contrapositive form: if there is no causal relationship between two variables (i.e. neither is a cause of the other and there is no common cause) then the variables will not be correlated. In particular, without a general commitment to this latter statement, it would be impossible ever to regard two different experiments as independent from one another, or for the results of one scientific team to be regarded as an independent confirmation of the results of another.
This principle of causal explanation was first made explicit by Reichenbach (1956). It is key in scientific investigations that aim to find causal accounts of phenomena from observed statistical correlations.
Despite the central role of causal explanations in science, there are significant challenges to providing them for the correlations that are observed in quantum experiments Wood and Spekkens (2015). In a Bell experiment, a pair of systems are prepared together, then removed to distant locations where a measurement is implemented on each. The choice of the measurement made at one wing of the experiment is presumed to be made at spacelike separation from that at the other wing. The natural causal explanation of the correlations that one observes in such experiments is that each measurement outcome is influenced by the local measurement setting as well as by a common cause located in the joint past of the two measurement events. But Bell’s theorem (Bell, 1964) famously rules out this possibility: within the standard framework of causal models, if the correlations violate a Bell inequality Clauser et al. (1969), as is predicted by quantum theory and verified experimentally Hensen et al. (2015); Shalm et al. (2015); Giustina et al. (2015), then a common cause explanation of the correlations is ruled out. Furthermore, Ref. Wood and Spekkens (2015) proves that it is not possible to explain Bell correlations with classical causal models without unwelcome fine-tuning of the parameters. This includes any attempt to explain Bell correlations with exotic causal influences, such as retrocausality and superluminal signalling. In the study of classical causation, it is typically assumed that causal explanations should not be fine-tuned Pearl (2009).
However, the verdict of fine-tuning applies only to classical models of causation. It was suggested in Ref. Wood and Spekkens (2015) that it might be possible to provide a satisfactory causal explanation of Bell inequality violations, in particular one that preserves the spirit of Reichenbach’s principle and does not require fine-tuning, using a quantum generalisation of the notion of a causal model. This article seeks to develop such a generalization by first suggesting an intrinsically quantum version of Reichenbach’s principle.
Specifically, we consider the case of a quantum system $A$ in the causal past of a bipartite quantum system $BC$ and ask what constraints on the channel from $A$ to $BC$ follow from the assumption that $A$ is the complete common cause of $B$ and $C$. In this scenario we are able to find a natural quantum analogue to Reichenbach’s principle. This analogue can be expressed in several equivalent forms, each of which naturally generalises a corresponding classical expression. In particular, one of these conditions states that $A$ is a complete common cause of $BC$ if one can dilate the channel from $A$ to $BC$ to a unitary by introducing two ancillary systems, contained in the causal past of $BC$, such that each ancillary system can influence only one of $B$ and $C$. This unitary dilation codifies the causal relationship between $A$ and $BC$ and illustrates the fact that no other system can influence both $B$ and $C$. Moreover, our quantum Reichenbach’s principle contains the classical version as a special case in the appropriate limit. This suggests that our quantum version is the correct way to generalise Reichenbach’s principle.
The mathematical framework of causal models Pearl (2009); Spirtes et al. (2001) can be seen as a direct generalisation of Reichenbach’s principle to arbitrary causal structures. By following this classical example, we are able to generalise our quantum Reichenbach’s principle to a framework for quantum causal models. In each case, the original Reichenbach’s principle becomes a special case of the framework. Just as with classical causal models, the framework of quantum causal models allows us to analyse the causal structure of arbitrary quantum experiments. It also does so while preserving an appropriate form of Reichenbach’s principle (by construction) and avoiding fine-tuning.
Although our main motivation for developing quantum causal models is the possibility of finding a satisfactory (i.e., non-fine-tuned) causal explanation of Bell inequality violations Wood and Spekkens (2015); Chaves et al. (2015a), they are also likely to have practical applications. For instance, finding quantum-classical separations in the correlations achievable in novel causal scenarios might lead to new device-independent protocols Chaves et al. (2015b), such as randomness extraction and secure key distribution. Quantum causal models may also provide novel schemes for simulating many-body systems in condensed matter physics Leifer and Poulin (2008) and novel means for inferring the underlying causal structure from quantum correlations Fitzsimons et al. (2015); Ried et al. (2015).
The structure of the paper is as follows. Section II provides a formal statement of Reichenbach’s principle and shows how it can be rigorously justified under certain philosophical assumptions. The main body of results is in Sec. III. Here our quantum generalisation of Reichenbach’s principle is presented and justified by reasoning parallel to that of the classical case. This is then fleshed out with alternative characterisations of our quantum version of conditional independence and some specific examples. We return to the classical world in Sec. IV, discussing classical causal models and providing a rigorous justification of the Markov condition, which plays the role of Reichenbach’s principle for general causal structures. Sec. V then generalizes these ideas to the quantum sphere, and presents our proposal for quantum causal models. Finally, in Sec. VI we describe the relationship of our proposal to prior work on quantum causal models, and in Sec. VII we summarize and describe some directions for future work.
II Reichenbach’s principle
II.1 Statement
Reichenbach gave his principle a formal statement in Ref. Reichenbach (1956). Following Ref. (Cavalcanti and Lal, 2014), we here distinguish two parts of the formalized principle. First is the qualitative part which expresses the intuitions described at the beginning of the introduction. The other is the quantitative part which constrains the sorts of probability distributions one should assign in the case of a common cause explanation.
The qualitative part of Reichenbach’s principle may be stated as follows: if two physical variables $Y$ and $Z$ are found to be statistically dependent, then there should be a causal explanation of this fact, either:
1. $Y$ is a cause of $Z$;
2. $Z$ is a cause of $Y$;
3. there is no causal link between $Y$ and $Z$, but there is a common cause, $X$, influencing $Y$ and $Z$;
4. $Y$ is a cause of $Z$ and there is a common cause, $X$, influencing $Y$ and $Z$; or
5. $Z$ is a cause of $Y$ and there is a common cause, $X$, influencing $Y$ and $Z$.
Note that the causal influences considered here may be indirect (mediated by other variables). If none of these causal relations hold between $Y$ and $Z$, then we refer to them as ancestrally independent (because their respective causal ancestries constitute disjoint sets). Using this terminology, the qualitative part of Reichenbach’s principle can be expressed particularly succinctly in its contrapositive form as: ancestral independence implies statistical independence, i.e., $P(YZ) = P(Y)P(Z)$.
The quantitative part of Reichenbach’s principle applies only to the case where the correlation between $Y$ and $Z$ is due purely to a common cause (case 3 above). It states that, in that case, if $X$ is a complete common cause for $Y$ and $Z$, meaning that $X$ is the collection of all variables acting as common causes, then $Y$ and $Z$ must be conditionally independent given $X$, so the joint probability distribution satisfies
$$P(YZ|X) = P(Y|X)\,P(Z|X). \tag{1}$$
II.2 Justifying the quantitative part of Reichenbach’s principle
Within the philosophy of causality, providing an adequate justification of Reichenbach’s principle is a delicate issue. It rests on controversy over basic questions, such as what it means for one variable to have a causal influence on another and what is the correct interpretation of probabilistic statements. In this section, we discuss one way of justifying the principle, using an assumption of determinism, which provides a clean motivational story with a natural quantum analogue. Other justifications may be possible.
Suppose we adopt a Bayesian point of view on probabilities: they are the degrees of belief of a rational agent. Dutch book arguments (based on the principle that a rational agent will never accept a set of bets on which they are certain to lose money) can then be given as to why probabilities should be nonnegative, sum to $1$ and so forth. But why should an agent who takes $X$ to be a complete common cause for $Y$ and $Z$ arrange their beliefs such that $P(YZ|X) = P(Y|X)P(Z|X)$? If the agent does not do this, are they irrational?
One way to justify a positive answer to this question is to assume that in a classical world there is always an underlying deterministic dynamics. In this case, one variable is causally influenced by another if it has a nontrivial functional dependence upon it in the dynamics. Probabilities can be understood as arising merely due to ignorance of the values of unobserved variables. Under these assumptions, one can show that the qualitative part of Reichenbach’s principle implies the quantitative part.
In general, a classical channel describing the influence of random variable $X$ on $Y$ is given by a conditional probability distribution $P(Y|X)$. If we assume underlying deterministic dynamics, then although the value of the variable $Y$ might not be completely determined by the value of $X$, it must be determined by the value of $X$ along with the values of some extra, unobserved, variables in the past of $Y$ which can collectively be denoted $\lambda$. Any variation in the value of $Y$ for a given value of $X$ is then explained by variation in the value of $\lambda$. This can be formalised as follows.
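Definition 1 (Classical dilation).
For a classical channel $P(Y|X)$, a classical deterministic dilation is given by some random variable $\lambda$ with probability distribution $P(\lambda)$ and some deterministic function $Y = f(\lambda, X)$ such that
$$P(Y|X) = \sum_{\lambda} \delta\big(Y, f(\lambda, X)\big)\, P(\lambda), \tag{2}$$
where $\delta(X, Y) = 1$ if $X = Y$ and $0$ otherwise.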
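As an illustration of Definition 1, the following minimal sketch builds a channel from a dilation by summing the noise distribution over the values of the unobserved variable that lead to each output. The bit-flip example, the flip probability of 1/4, and the names `dilate`, `p_lambda` and `f` are assumptions made here for illustration; they do not come from the text.

```python
from fractions import Fraction
from itertools import product

def dilate(p_lambda, f, xs, ys):
    """Channel induced by a deterministic dilation:
    P(y|x) = sum over lam of [f(lam, x) == y] * P(lam)."""
    return {
        (y, x): sum(p for lam, p in p_lambda.items() if f(lam, x) == y)
        for y, x in product(ys, xs)
    }

# Hypothetical example: a bit-flip channel with flip probability 1/4.
flip = Fraction(1, 4)
p_lambda = {0: 1 - flip, 1: flip}   # distribution of the unobserved variable

def f(lam, x):
    """Deterministic response function: flip the input bit when lam == 1."""
    return x ^ lam

channel = dilate(p_lambda, f, xs=(0, 1), ys=(0, 1))

# Every input yields a normalized distribution over outputs.
assert all(channel[(0, x)] + channel[(1, x)] == 1 for x in (0, 1))
print(channel[(1, 0)])  # probability that input 0 produces output 1: 1/4
```

All of the randomness in the resulting channel is carried by the unobserved variable; the response function itself is deterministic.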
We now apply this to the situation depicted in Fig. 1, where $X$ is the complete common cause of $Y$ and $Z$. The conditional distribution $P(YZ|X)$ admits of a dilation in terms of an ancillary unobserved variable $\lambda$, for some distribution $P(\lambda)$ and a function $f = (f_Y, f_Z)$ from $(\lambda, X)$ to $(Y, Z)$ such that $Y = f_Y(\lambda, X)$ and $Z = f_Z(\lambda, X)$. The assumption that $X$ is the complete common cause of $Y$ and $Z$ implies that the ancillary variable $\lambda$ can be split into a pair of ancestrally independent variables, $\lambda_Y$ and $\lambda_Z$, where $\lambda_Y$ only influences $Y$ and $\lambda_Z$ only influences $Z$ [1]. It follows that there must exist $\lambda_Y$ and $\lambda_Z$ that are causally related to $X$, $Y$ and $Z$ as depicted in Fig. 2, where the causal dependences are deterministic and given by a pair of functions $f_Y$ and $f_Z$ such that $Y = f_Y(\lambda_Y, X)$ and $Z = f_Z(\lambda_Z, X)$.
[1] This is because any other $\lambda$ would necessarily introduce new common causes for $Y$ and $Z$ that are not screened through $X$, which would violate the assumption that $X$ is a complete common cause.
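In this case, we have
$$P(YZ|X) = \sum_{\lambda_Y, \lambda_Z} \delta\big(Y, f_Y(\lambda_Y, X)\big)\, \delta\big(Z, f_Z(\lambda_Z, X)\big)\, P(\lambda_Y, \lambda_Z). \tag{3}$$
Finally, given the qualitative part of Reichenbach’s principle, the ancestral independence of $\lambda_Y$ and $\lambda_Z$ in the causal structure implies that $P(\lambda_Y, \lambda_Z) = P(\lambda_Y)P(\lambda_Z)$. It then follows that $P(YZ|X) = P(Y|X)P(Z|X)$, which establishes the quantitative part of Reichenbach’s principle.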
A well-known converse statement is also worth noting: any classical channel $P(YZ|X)$ satisfying $P(YZ|X) = P(Y|X)P(Z|X)$ admits of a dilation where $X$ is the complete common cause of $Y$ and $Z$ Pearl (2009).
Summarizing, we can identify what it means for $P(YZ|X)$ to be explainable in terms of $X$ being a complete common cause of $Y$ and $Z$ by appealing to the qualitative part of Reichenbach’s principle and fundamental determinism. The definition can be formalized into a mathematical condition as follows:
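Definition 2 (Classical compatibility).
$P(YZ|X)$ is said to be compatible with $X$ being the complete common cause of $Y$ and $Z$ if one can find variables $\lambda_Y$ and $\lambda_Z$, distributions $P(\lambda_Y)$ and $P(\lambda_Z)$, a function $f_Y$ from $(\lambda_Y, X)$ to $Y$ and a function $f_Z$ from $(\lambda_Z, X)$ to $Z$, such that these constitute a dilation of $P(YZ|X)$, that is, such that
$$P(YZ|X) = \sum_{\lambda_Y, \lambda_Z} \delta\big(Y, f_Y(\lambda_Y, X)\big)\, \delta\big(Z, f_Z(\lambda_Z, X)\big)\, P(\lambda_Y)\, P(\lambda_Z). \tag{4}$$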
With this definition, we can summarize the result described above as follows.
Theorem 1.
Given a conditional probability distribution $P(YZ|X)$, the following are equivalent:
1. $P(YZ|X)$ is compatible with $X$ being the complete common cause of $Y$ and $Z$.
2. $P(YZ|X) = P(Y|X)\,P(Z|X)$.
The $1 \Rightarrow 2$ implication is what establishes that a rational agent should espouse the quantitative part of Reichenbach’s principle if they espouse the qualitative part and fundamental determinism.
The $2 \Rightarrow 1$ implication allows one to deduce a possible causal explanation of an observed distribution from a feature of that distribution. However, it is important to stress that it only establishes a possible causal explanation. It does not state that this is the only causal explanation. Indeed, it may be possible to satisfy this conditional independence relation within alternative causal structures by fine-tuning the strengths of the causal dependences. However, as noted above, fine-tuned causal explanations are typically rejected as bad explanations in the field of causal inference. Therefore, the best explanation of the conditional independence of $Y$ and $Z$ given $X$ is that $X$ is the complete common cause of $Y$ and $Z$.
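The role played by the independence of the two noise variables can be checked on a small example. The sketch below assumes bit-valued variables and XOR response functions, both chosen here for illustration. It builds the channel from a joint noise distribution and tests whether the outputs are conditionally independent given the input, once for independent noise and once for perfectly correlated noise (which amounts to a hidden common cause).

```python
from fractions import Fraction

BITS = (0, 1)

def joint_channel(p_noise, f_y, f_z):
    """P(y, z | x) obtained by pushing a joint noise distribution
    P(lam_y, lam_z) through the response functions f_y and f_z."""
    table = {}
    for x in BITS:
        for (lam_y, lam_z), p in p_noise.items():
            key = (f_y(lam_y, x), f_z(lam_z, x), x)
            table[key] = table.get(key, 0) + p
    return table

def factorizes(table):
    """True when P(y, z | x) == P(y | x) * P(z | x) for all y, z, x."""
    for x in BITS:
        p_y = {y: sum(table.get((y, z, x), 0) for z in BITS) for y in BITS}
        p_z = {z: sum(table.get((y, z, x), 0) for y in BITS) for z in BITS}
        for y in BITS:
            for z in BITS:
                if table.get((y, z, x), 0) != p_y[y] * p_z[z]:
                    return False
    return True

def xor(lam, x):
    """Illustrative response function: flip the input when lam == 1."""
    return x ^ lam

# Independent noise terms (illustrative probabilities).
p_ly = {0: Fraction(2, 3), 1: Fraction(1, 3)}
p_lz = {0: Fraction(1, 5), 1: Fraction(4, 5)}
independent = {(a, b): p_ly[a] * p_lz[b] for a in BITS for b in BITS}

# Perfectly correlated noise: the two noise terms share a hidden common cause.
correlated = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

ok_independent = factorizes(joint_channel(independent, xor, xor))
ok_correlated = factorizes(joint_channel(correlated, xor, xor))
print(ok_independent, ok_correlated)  # True False
```

Exact rational arithmetic is used so that the factorization test is an equality check rather than a floating-point comparison.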
III The quantum version of Reichenbach’s principle
In this section, we introduce our quantum version of Reichenbach’s principle. The definition of a quantum causal model that we provide in Sec. V can be seen as generalizing these ideas in much the same way that classical causal models generalize the classical version of Reichenbach’s principle.
III.1 Quantum preliminaries
For simplicity, we assume throughout that all quantum systems are finite-dimensional. Given a quantum system $A$, we will write $\mathcal{H}_A$ for the corresponding Hilbert space, $d_A$ for the dimension of $\mathcal{H}_A$, and $I_A$ for the identity on $\mathcal{H}_A$. We will also write $\mathcal{H}_A^*$ for the dual space to $\mathcal{H}_A$, and $I_{A^*}$ for the identity on the dual space. If a quantum system is initially uncorrelated with any other system, then the most general time evolution of the system corresponds to a quantum channel, i.e., a completely positive trace-preserving (CPTP) map. If the system at the initial time is labelled $A$, with Hilbert space $\mathcal{H}_A$, and the system at the later time is labelled $B$, with Hilbert space $\mathcal{H}_B$, then the CPTP map is
$$\mathcal{E}_{B|A} : \mathcal{L}(\mathcal{H}_A) \to \mathcal{L}(\mathcal{H}_B), \tag{5}$$
where $\mathcal{L}(\mathcal{H})$ is the set of linear operators on $\mathcal{H}$.
An alternative way to express the channel is as an operator, using a variant of the Choi-Jamiołkowski isomorphism Jamiołkowski (1972); Choi (1975):
$$\rho_{B|A} := \sum_{ij} \mathcal{E}_{B|A}\big(|i\rangle_A\langle j|\big) \otimes |i\rangle_{A^*}\langle j|. \tag{6}$$
Here, the vectors $\{|i\rangle_A\}$ form an orthonormal basis of the Hilbert space $\mathcal{H}_A$. The vectors $\{|i\rangle_{A^*}\}$ form the dual basis, belonging to $\mathcal{H}_A^*$. The operator $\rho_{B|A}$ therefore acts on the Hilbert space $\mathcal{H}_B \otimes \mathcal{H}_A^*$. Although the expression above involves an arbitrary choice of orthonormal basis, the operator thus defined is independent of the choice of basis. This version of the Choi-Jamiołkowski isomorphism was chosen because it is both basis-independent and a positive operator. Following Leifer and Spekkens (2013), we have chosen the operator to be normalized in such a way that $\mathrm{Tr}_B(\rho_{B|A}) = I_{A^*}$ (in analogy with the normalization condition $\sum_Y P(Y|X) = 1$ for a classical channel $P(Y|X)$).
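For a concrete feel for this operator, the sketch below computes it for a single-qubit channel given by Kraus operators and checks the normalization and positivity properties stated above. The amplitude-damping channel and its parameter are illustrative assumptions, as is the helper name `channel_operator`. Once a basis is fixed, the dual-space factor is represented by ordinary matrix units, so the basis-independence of the construction is not visible in this representation.

```python
import numpy as np

def channel_operator(kraus, d_in):
    """sum_ij E(|i><j|) (x) |i><j|, with factor ordering (output, input*).
    The dual-space factor is represented in a fixed computational basis."""
    d_out = kraus[0].shape[0]
    rho = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            e_ij = np.zeros((d_in, d_in), dtype=complex)
            e_ij[i, j] = 1.0
            image = sum(k @ e_ij @ k.conj().T for k in kraus)
            rho += np.kron(image, e_ij)
    return rho

# Illustrative channel (an assumption, not from the text): amplitude damping.
gamma = 0.3
kraus = [
    np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
    np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex),
]

rho = channel_operator(kraus, d_in=2)

# Partial trace over the output factor: indices are (b, a, b', a').
tr_out = np.einsum('bibj->ij', rho.reshape(2, 2, 2, 2))
min_eig = np.linalg.eigvalsh(rho).min()

print(np.allclose(tr_out, np.eye(2)))  # normalization: Tr_B gives the identity
print(min_eig > -1e-12)                # positivity up to rounding error
```

The total trace of the operator equals the input dimension, which is why a renormalized version is needed whenever it is treated as a density operator.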
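Suppose that $\mathcal{E}_{C|A} = \mathcal{E}_{C|B} \circ \mathcal{E}_{B|A}$. Given that the operator $\rho_{B|A}$ contains all of the information about the channel $\mathcal{E}_{B|A}$, the question arises of how one can express $\rho_{C|A}$ in terms of $\rho_{C|B}$ and $\rho_{B|A}$. Recall that $\rho_{B|A}$ is defined on $\mathcal{H}_B \otimes \mathcal{H}_A^*$, while $\rho_{C|B}$ is defined on $\mathcal{H}_C \otimes \mathcal{H}_B^*$. As we discuss further in Sec. V, by defining an appropriate “linking operator” on $\mathcal{H}_B \otimes \mathcal{H}_B^*$,
$$\tau^{\mathrm{id}}_B := \sum_{kl} |k\rangle_B\langle l| \otimes |k\rangle_{B^*}\langle l|, \tag{7}$$
where $\{|k\rangle_B\}$ and $\{|k\rangle_{B^*}\}$ are orthonormal bases on $\mathcal{H}_B$ and $\mathcal{H}_B^*$ respectively, one can write $\rho_{C|A} = \mathrm{Tr}_{B B^*}\big(\rho_{C|B}\, \tau^{\mathrm{id}}_B\, \rho_{B|A}\big)$. This expression is meant to be reminiscent of the classical formula $P(Z|X) = \sum_Y P(Z|Y)\,P(Y|X)$.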
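The composition rule can be checked numerically. The sketch below assumes that the linking operator is the unnormalized projector onto $\sum_k |k\rangle|k\rangle$ in a fixed basis, and that composition is obtained by tracing the product of the two channel operators and the linking operator over the intermediate system and its dual. The two qubit channels, their parameters, and all function names are illustrative assumptions.

```python
import numpy as np

def channel_operator(kraus, d_in):
    """sum_ij E(|i><j|) (x) |i><j|, with factor ordering (output, input*)."""
    d_out = kraus[0].shape[0]
    rho = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            e_ij = np.zeros((d_in, d_in), dtype=complex)
            e_ij[i, j] = 1.0
            rho += np.kron(sum(k @ e_ij @ k.conj().T for k in kraus), e_ij)
    return rho

# Two illustrative qubit channels: amplitude damping (A -> B), bit flip (B -> C).
g, p = 0.3, 0.2
damp = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
        np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
flip = [np.sqrt(1 - p) * np.eye(2, dtype=complex),
        np.sqrt(p) * np.array([[0, 1], [1, 0]], dtype=complex)]

rho_ba = channel_operator(damp, 2)   # acts on B (x) A*
rho_cb = channel_operator(flip, 2)   # acts on C (x) B*

# Assumed linking operator: sum_kl |k><l| (x) |k><l| on the pair (B*, B).
omega = np.eye(2).reshape(4)
tau = np.outer(omega, omega)

# Embed everything in the ordered space C (x) B* (x) B (x) A*.
i2, i4 = np.eye(2), np.eye(4)
m = np.kron(rho_cb, i4) @ np.kron(np.kron(i2, tau), i2) @ np.kron(i4, rho_ba)

# Trace out B* and B (the two middle factors) on both row and column indices.
composed = np.einsum('cxyaCxyA->caCA', m.reshape([2] * 8)).reshape(4, 4)

# Compare with the operator of the composed channel built directly.
direct = channel_operator([f @ d for f in flip for d in damp], 2)
match = np.allclose(composed, direct)
print(match)
```

Because the linking operator acts only on the traced-out factors, its position in the product does not affect the result.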
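Given an operator $\rho_{BC|A}$, acting on $\mathcal{H}_B \otimes \mathcal{H}_C \otimes \mathcal{H}_A^*$, we will use the same expression with missing indices to denote the result of taking partial traces on the corresponding factor spaces. For example, given a channel $\rho_{BC|A}$, we write $\rho_{B|A} = \mathrm{Tr}_C(\rho_{BC|A})$.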
When writing products of operators, we will sometimes suppress tensor products with identities. For example, will be written simply as .
III.2 Main result
The qualitative part of Reichenbach’s principle can be applied to quantum theory with almost no change: if quantum systems $B$ and $C$ are correlated then this must have a causal explanation in one of the five forms listed in Sec. II.1 (except with classical variables $X$, $Y$ and $Z$ replaced by quantum systems $A$, $B$ and $C$). Here, for two quantum systems to be correlated means that their joint quantum state does not factorize.
Finding a quantum version of the quantitative part of Reichenbach’s principle is more subtle. If a quantum system $A$ is a complete common cause of $B$ and $C$ (as depicted in Fig. 3), then one expects there to be some constraint analogous to the classical constraint that $P(YZ|X) = P(Y|X)P(Z|X)$. If one tries to do this by generalising the joint distribution $P(XYZ)$, then one immediately faces the problem that textbook quantum theory has no analogue of a joint distribution for a collection of quantum systems in which some are causal descendants of others. The situation is improved if one focusses on finding an analogue of $P(YZ|X)$ instead. In standard quantum theory, as long as a system $A$ is initially uncorrelated with its environment, then the evolution from $A$ to $BC$ is described by a channel $\mathcal{E}_{BC|A}$. The operator that is isomorphic to this channel by Eq. (6), denoted $\rho_{BC|A}$, seems to be a natural analogue of $P(YZ|X)$. However, even in this case, it is not obvious what constraint on $\rho_{BC|A}$ should serve as the analogue of the classical constraint $P(YZ|X) = P(Y|X)P(Z|X)$.
The treatment of generic causal networks of quantum systems is deferred to the full definition of quantum causal models in Sec. V. This section focuses on the case of a channel $\mathcal{E}_{BC|A}$.
In Sec. II.2, we demonstrated how to justify the quantitative part of Reichenbach’s principle from the qualitative part in the classical case under the assumption that all dynamics are fundamentally deterministic. We shall now make an analogous argument in the quantum case by assuming that quantum dynamics are fundamentally unitary. Just as in the classical case, this assumption simply provides a clean way to motivate our result and alternative justifications may be possible.
In general, a quantum channel from $A$ to $B$ is given by a CPTP map $\mathcal{E}_{B|A}$. Assuming underlying unitary dynamics, the output state at $B$ must depend unitarily on $A$ along with some extra ancillary system $\lambda$ in the past of $B$. This can be formalised as follows.
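Definition 3 (Unitary dilation).
For a quantum channel $\mathcal{E}_{B|A}$ a quantum unitary dilation is given by some ancillary quantum system $\lambda$ with state $\rho_\lambda$ and some unitary $U$ from $\mathcal{H}_A \otimes \mathcal{H}_\lambda$ to $\mathcal{H}_B \otimes \mathcal{H}_F$ such that
$$\mathcal{E}_{B|A}(\cdot) = \mathrm{Tr}_F\big[ U \big(\cdot \otimes \rho_\lambda\big) U^\dagger \big],$$
where the dimension of $F$ is fixed by the requirement for unitarity that $d_A d_\lambda = d_B d_F$.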
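If we represent the channels by our variant of the Choi-Jamiołkowski isomorphism, Eq. (6), with $\rho_{B|A}$ representing $\mathcal{E}_{B|A}$ and $\rho^U_{BF|A\lambda}$ representing the unitary channel, then the dilation equation has the form
$$\rho_{B|A} = \mathrm{Tr}_{F \lambda \lambda^*}\big( \rho^U_{BF|A\lambda}\, \tau^{\mathrm{id}}_\lambda\, \rho_\lambda \big),$$
where $\tau^{\mathrm{id}}_\lambda$ is the linking operator defined in Eq. (7).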
Just as in the classical case, we would like to apply this to the situation depicted in Fig. 3, where is the complete common cause for and . This was easy classically as it is clear what it means for a classical variable, , to have no causal influence on another, , in a deterministic system. Specifically, if the collection of inputs other than is denoted so there is a deterministic function such that , then the assumption that has no causal influence on is formalized as for some function . In unitary quantum theory the corresponding condition is less obvious, so we spell it out explicitly with a definition.
Definition 4 (No influence).
Consider a unitary channel from $AB$ to $CD$. $A$ has no causal influence on $D$ if and only if for $\rho^U_{CD|AB}$, we have $\rho^U_{D|AB} = \rho^U_{D|B} \otimes I_{A^*}$.
An equivalent definition is this: has no causal influence on in some unitary channel if and only if the marginal output state at is independent of any operations performed on before the system enters the channel. There is a rich literature concerning similar properties of unitary operators from various perspectives. In particular, the results of Ref. Schumacher and Westmoreland (2005) are very close to ours (where they use the phrase “nonsignalling” rather than “no causal influence”) and Refs. Beckman et al. (2001); Eggeling et al. (2002) contain similar results (where they say “semicausal” rather than “no causal influence”).
We can now apply this to the complete common cause situation of Fig. 3. The channel admits a unitary dilation in terms of an ancillary system , for some state and unitary from to . Here, an ancillary output is generally required so that dimensions of inputs and outputs match, but is not important and will always be traced out. This dilation is such that .
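Just as in Sec. II.2, the assumption that $A$ is a complete common cause for $B$ and $C$ implies that the ancilla $\lambda$ can be factorized into ancestrally independent $\lambda_B$ and $\lambda_C$ where $\lambda_B$ has no causal influence on $C$ and $\lambda_C$ has no causal influence on $B$. It follows that systems $\lambda_B$ and $\lambda_C$ are causally related to $A$, $B$, and $C$ as depicted in Fig. 4.
The ancestral independence of $\lambda_B$ and $\lambda_C$ implies that the quantum state on $\lambda_B \lambda_C$ factorizes across the $\lambda_B$, $\lambda_C$ partition, $\rho_{\lambda_B \lambda_C} = \rho_{\lambda_B} \otimes \rho_{\lambda_C}$, suggesting the following quantum analogue to our classical compatibility condition of Def. 2.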
Definition 5 (Quantum compatibility).
$\rho_{BC|A}$ is said to be compatible with $A$ being a complete common cause of $B$ and $C$, if it is possible to find ancillary quantum systems $\lambda_B$ and $\lambda_C$, states $\rho_{\lambda_B}$ and $\rho_{\lambda_C}$, and a unitary channel where $\lambda_B$ has no causal influence on $C$ and $\lambda_C$ has no causal influence on $B$, such that these constitute a dilation of $\rho_{BC|A}$.
All that remains is to show that this, together with the qualitative part of the quantum Reichenbach’s principle, implies an appropriate quantitative part (generalising Thm 1).
Theorem 2.
The following are equivalent:
1. $\rho_{BC|A}$ is compatible with $A$ being the complete common cause of $B$ and $C$.
2. $\rho_{BC|A} = \rho_{B|A}\,\rho_{C|A}$.
The proof is in Appendix A. Note that there is no ordering ambiguity on the right-hand side of the second condition, because the two terms must commute. This is seen by taking the Hermitian conjugate of both sides of the equation and recalling that $\rho_{BC|A}$ is Hermitian.
Definition 6 (Quantum conditional independence of outputs given input).
Given a quantum channel $\rho_{BC|A}$, the outputs are said to be quantum conditionally independent given the input if and only if $\rho_{BC|A} = \rho_{B|A}\,\rho_{C|A}$.
It is easily seen that the quantum definition reduces to the classical definition in the case that the channel is invariant under the operation of completely dephasing the systems , , and in some basis. More precisely: if fixed bases are chosen for , and the operator is diagonal when written with respect to the tensor product of these bases, then the outputs are quantum conditionally independent given the input if and only if the classical channel defined by the diagonal elements of the matrix has the property that the outputs are conditionally independent given the input.
With this terminological convention in hand, we can express our quantum version of the quantitative part of Reichenbach’s principle as follows: if a channel $\rho_{BC|A}$ is compatible with $A$ being a complete common cause of $B$ and $C$, then for this channel, $B$ and $C$ are quantum conditionally independent given $A$.
The $1 \Rightarrow 2$ implication in the theorem is what establishes the quantum version of the quantitative part of Reichenbach’s principle.
The $2 \Rightarrow 1$ implication is pertinent to causal inference: analogously to the classical case, if one grants the implausibility of fine-tuning, then one must grant that the most plausible explanation of the quantum conditional independence of outputs $B$ and $C$ given input $A$ is that $A$ is a complete common cause of $B$ and $C$.
III.3 Alternative expressions for quantum conditional independence of outputs given input
Classically, conditional independence of $Y$ and $Z$ given $X$ is standardly expressed as $P(YZ|X) = P(Y|X)P(Z|X)$. However, there are alternative ways of expressing this constraint.
For instance, if one defines the joint distribution over $X$, $Y$, $Z$ that one obtains by feeding the uniform distribution on $X$ into the channel $P(YZ|X)$ (that is, $\hat{P}(XYZ) := P(YZ|X)\,\frac{1}{d_X}$, where $d_X$ is the cardinality of $X$) then $Y$ and $Z$ being conditionally independent given $X$ in $P(YZ|X)$ can be expressed as the vanishing of the conditional mutual information of $Y$ and $Z$ given $X$ in the distribution $\hat{P}(XYZ)$ Pearl (2009). This conditional mutual information is defined as $I(Y:Z|X) := H(Y,X) + H(Z,X) - H(X,Y,Z) - H(X)$, with $H(\cdot)$ denoting the Shannon entropy of the marginal on the subset of variables indicated in its argument. Therefore, the condition is simply $I(Y:Z|X) = 0$.
Similarly, if and are conditionally independent given in , then it is possible to mathematically represent the channel as the following sequence of operations: copy , then process one copy into via the channel and process the other into via the channel .
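The conditional mutual information criterion is straightforward to evaluate for small classical channels. The sketch below assumes the standard entropy expression for conditional mutual information and evaluates it on the joint distribution obtained from a uniform input, first for a channel that copies the input to both outputs and then for a channel whose outputs share a random bit unrelated to the input. Both channels and all names are illustrative assumptions.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Marginal of a joint distribution on the outcome positions in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_mutual_info(channel, n_inputs):
    """I(Y:Z|X) for the joint obtained by feeding a uniform X into P(y,z|x).
    `channel` maps (y, z, x) -> P(y, z | x)."""
    joint = {k: p / n_inputs for k, p in channel.items()}

    def h(keep):
        return entropy(marginal(joint, keep))

    # Positions: 0 -> Y, 1 -> Z, 2 -> X.
    return h((0, 2)) + h((1, 2)) - h((0, 1, 2)) - h((2,))

# Copy channel: both outputs equal the input bit.
copy = {(x, x, x): 1.0 for x in (0, 1)}

# Shared-noise channel: outputs equal a common random bit, ignoring the input.
shared = {(b, b, x): 0.5 for b in (0, 1) for x in (0, 1)}

cmi_copy = cond_mutual_info(copy, 2)
cmi_shared = cond_mutual_info(shared, 2)
print(cmi_copy, cmi_shared)  # 0.0 1.0
```

In the second channel the input does not screen off the correlation between the outputs, and the criterion reports one full bit of residual dependence.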
We present here the quantum analogues of these alternative expressions. They will be found to be useful for developing intuitions about quantum conditional independence and in proving Thm 2. Recall that the quantum conditional mutual information of $B$ and $C$ given $A$ is defined as $I(B:C|A) := S(B,A) + S(C,A) - S(A,B,C) - S(A)$, where $S(\cdot)$ denotes the von Neumann entropy of the reduced state on the subsystem that is specified by its argument. Analogously to the classical case, we will use a hat to denote an operator renormalized such that the trace is $1$. For example, if $\rho_{B|A}$ is the operator representing a channel from $A$ to $B$, then $\hat{\rho}_{B|A} := \rho_{B|A}/d_A$.
Theorem 3.
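Given a channel $\rho_{BC|A}$, the following conditions are also equivalent to the quantum conditional independence of the outputs given the input (condition 2 of Thm 2):
3. $I(B:C|A) = 0$, where $I(B:C|A)$ is the quantum conditional mutual information of $B$ and $C$ given $A$ evaluated on the (positive, trace-one) operator $\hat{\rho}_{BC|A}$.
4. The Hilbert space for the $A$ system can be decomposed as $\mathcal{H}_A = \bigoplus_i \mathcal{H}_{A_i^L} \otimes \mathcal{H}_{A_i^R}$ and $\rho_{BC|A} = \sum_i \big( \rho_{B|A_i^L} \otimes \rho_{C|A_i^R} \big)$, where for each $i$, $\rho_{B|A_i^L}$ represents a CPTP map $\mathcal{L}(\mathcal{H}_{A_i^L}) \to \mathcal{L}(\mathcal{H}_B)$, and $\rho_{C|A_i^R}$ a CPTP map $\mathcal{L}(\mathcal{H}_{A_i^R}) \to \mathcal{L}(\mathcal{H}_C)$.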
The proof is in Appendix A. That conditions 3 and 4 are equivalent follows as a corollary of Thm 6 of Ref. Hayden et al. (2004). Our main contribution is showing that these are also equivalent to condition 2 of Thm 2.
The final condition can be described as follows. First one imagines decomposing the system $A$ into a direct sum of subspaces, each of which is denoted $A_i$. For each $i$, the subspace $A_i$ is split into two factors, denoted $A_i^L$ and $A_i^R$, with one factor evolving via a channel $\rho_{B|A_i^L}$ into system $B$, and the other factor evolving via $\rho_{C|A_i^R}$ into system $C$. In the special case where there is only a single value of $i$, this is simply a factorization of the system $A$ into two parts. In the special case where all of the $\mathcal{H}_{A_i^L}$ and $\mathcal{H}_{A_i^R}$ are one-dimensional Hilbert spaces, it is simply an incoherent copy operation.
III.4 Circuit representations
The classical case is shown in Fig. 5, where four equivalent circuits represent the action of a channel , for which the outputs are conditionally independent given the input . The dot in the lower two circuits represents a classical copy operation. Equality (1) simply asserts that the conditional probability distribution admits a classical dilation, as in Def. 1. Equality (4) asserts that the channel is equivalent to a sequence of operations in which is copied, with one copy the input to a channel and one copy the input to a channel . As discussed at the beginning of Sec. III.3, this is one way of expressing the fact that and are conditionally independent given . Equality (3) asserts that and separately admit classical dilations. Finally, equality (2) asserts that is compatible with being a complete common cause of and by depicting conditions under which has no influence on and has no influence on .
Analogous circuit diagrams can be provided in the quantum case, as depicted in Fig. 6, with analogous interpretations of the various equalities. Since quantum systems cannot be copied, however, something must replace the dot that appears in the lower two circuits of Fig. 5. For the lower two circuits of Fig. 6, we introduce a new symbol that indicates the decomposition of the Hilbert space into a direct sum of tensor products, as per condition 4 of Thm 3. The symbol is a circle decorated with the set of values of the index $i$, where the value $i$ indexes the terms in the direct sum. For each value of $i$, the left-hand wire carries the factor $\mathcal{H}_{A_i^L}$ and the right-hand wire the factor $\mathcal{H}_{A_i^R}$.
In the lower right circuit, the gates represent unitary channels, and are labelled with the corresponding unitary operators and (as opposed to the ChoiJamiołkowski channel operators). The unitary operator , for example, labels a gate whose action is confined to the lefthand factors in this decomposition, along with the system . The interpretation, roughly, is that the form of must respect the decomposition of . More precisely, the unitary operator can be written as a matrix that is block diagonal with respect to the subspace decomposition, with the th block being of the form for a unitary matrix acting on . Similarly, can be written as a block diagonal matrix, with the th block of the form for a unitary matrix acting on .
In the lower left circuit, in a slight mixing of notation, gates are labelled with the channel operators and [2]. Suppose that, as in the figure, a channel operator labels a gate whose action is confined to the left-hand factors in the decomposition, along with another system . This indicates that the channel corresponds to a set of Kraus operators , where for each , the Kraus operator is block diagonal, with the $i$th block being of the form , with acting on . Similarly for , the right-hand factors and the system . [3] In the lower two circuits of Fig. 6, however, note that coherence between the different subspaces is lost. In the lower right circuit, coherence is lost when the partial traces are performed on the extra outgoing wires. In the lower left circuit, the final output admits a global factorization of the form , and output wires carrying an index do not even appear, indicating that this degree of freedom has been traced out. Each Kraus operator, in this case, must act nontrivially only on the $i$th subspace, for some $i$, and one may deduce that is of the form , and similarly , consistently with condition 4 of Thm 3. [4]
[2] The analogous mixed notation also appears in Fig. 5 for the classical case.
[3] In introducing the circle notation, we have defined the action of gates on the left/right factors in such a way that coherence between different subspaces in the direct sum can be maintained. This corresponds to the fact that the condition of decomposing into the appropriate form is applied to the unitary or Kraus operators, rather than to the channel operator itself. We have done this on the grounds that with this definition, the circle notation is most likely to be useful in future applications.
[4] Clearly, the notation can be extended in various ways, to include circles with multiple output wires, circles indicating a further decomposition following another circle, and so on.
A fully general interpretation and calculus for these extended circuit diagrams is left for future work.
The equivalences of Fig. 6 can now be summarized as follows. Equality (1) simply asserts the fact that admits a unitary dilation. Equality (4) asserts that the channel is such that and are quantum conditionally independent given , according to the definition we have proposed (Def. 6). This equality follows from the expression for quantum conditional independence described in condition 4 of Thm 3. Equality (3) asserts that the channels and separately admit unitary dilations. Equality (2) asserts that is compatible with being a complete common cause of and by depicting conditions under which has no influence on and has no influence on . Here, the unitary matrix is decomposed as , as per the proof of Thm 2.
III.5 Examples
III.5.1 A unitary transformation
Consider the case in which inputs and evolve, via a generic unitary transformation into outputs and . In Fig. 7, we illustrate the circuit and the corresponding causal diagram.
The channel which one obtains in this case is compatible with the complete common cause of and being the composite system . This follows from the fact that has a trivial dilation, which is to say that the ancillary system is not required, and therefore trivially satisfies the condition for compatibility laid out in Def. 5. It follows from Thm 2 that for such a , the outputs and are quantum conditionally independent given the input , which means that , as can also be verified by direct calculation. Similarly, the alternative expressions for this sort of quantum conditional independence, namely, conditions 3 and 4 of Thm 3, can be verified to hold.
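The "direct calculation" mentioned above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes two qubit outputs and a four-dimensional input, draws a random unitary, and assumes the channel operator is built as the sum over i, j of U|i><j|U† ⊗ |i><j| (a Choi-type convention chosen here for illustration). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dA, dB = 2, 2          # assumed output dimensions
dC = dA * dB           # the composite input has matching dimension
dim = dA * dB * dC

# Random unitary U : H_C -> H_A (x) H_B, from the QR decomposition of a complex Gaussian matrix.
Z = rng.normal(size=(dC, dC)) + 1j * rng.normal(size=(dC, dC))
U, _ = np.linalg.qr(Z)

# Assumed convention: the channel operator is the outer product of the
# vector phi[a, b, c] = U[(a, b), c] with itself (axes: a, b, c, a', b', c').
phi = U.reshape(dA, dB, dC)
rho = np.einsum('abc,def->abcdef', phi, phi.conj())

# Marginal channel operators: trace out B (respectively A).
rho_A = np.einsum('abcdbf->acdf', rho)
rho_B = np.einsum('abcaef->bcef', rho)

# Pad each marginal with the identity on the missing output so that all
# operators act on the same space, then flatten to matrices.
full_A = np.einsum('acdf,be->abcdef', rho_A, np.eye(dB)).reshape(dim, dim)
full_B = np.einsum('bcef,ad->abcdef', rho_B, np.eye(dA)).reshape(dim, dim)
rho_mat = rho.reshape(dim, dim)

factorizes = np.allclose(rho_mat, full_A @ full_B)
commute = np.allclose(full_A @ full_B, full_B @ full_A)
print("channel operator equals product of marginals:", factorizes)
print("marginals commute:", commute)
```

Under the stated convention, both checks should succeed for any unitary, since the operator is then a unitary rotation (on the input factor) of a product of two maximally entangled projectors.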
III.5.2 Coherent copy vs. incoherent copy
Consider the simple example of a classical channel, taking to , where are bit-valued and the mapping between input strings and output strings is
$$0 \mapsto 00, \qquad 1 \mapsto 11. \tag{8}$$
The outputs of the channel are conditionally independent given the input; variation in fully explains any correlation between and . Indeed this example may be seen as the paradigmatic case of the explanation of classical correlations via a complete common cause.
One quantum analogue of this channel is the incoherent copy of a qubit: a qubit is measured in the computational basis; if is obtained, then prepare the state and if is obtained, prepare . The operator representing this channel is
It is easily verified that this operator satisfies each of the conditions of Thm 2, so that and are quantum conditionally independent given for this channel. The decomposition of the Hilbert space implied by Condition 4 is
where is the one-dimensional complex Hilbert space, i.e., the complex numbers.
The other direct quantum analogue of the classical copy above is the channel that makes a coherent copy of a qubit, where the mapping from input states to output states is:
$$|0\rangle \mapsto |0\rangle|0\rangle, \qquad |1\rangle \mapsto |1\rangle|1\rangle. \tag{9}$$
This channel is represented by the operator
which corresponds to an unnormalized GHZ state. It can easily be verified that the quantum conditional mutual information is nonzero for a trace-one version of this state, hence it is not the case that outputs and are quantum conditionally independent given the input . There is, then, no way in which this channel can arise as a marginal channel in a situation in which is the complete common cause of and .
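The contrast between the two copies can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the trace-one operator for the incoherent copy to be an equal mixture of |000><000| and |111><111|, and the one for the coherent copy to be a normalized GHZ state, with the three qubits ordered as (first output, second output, input). The helper names `entropy`, `marginal` and `cmi` are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def marginal(rho, keep):
    """Reduced operator of a three-qubit operator on the qubits listed in `keep`."""
    t = rho.reshape([2] * 6)      # three row axes followed by three column axes
    n = 3
    # Trace out unwanted qubits in descending order so remaining axis indices stay valid.
    for q in sorted(set(range(3)) - set(keep), reverse=True):
        t = np.trace(t, axis1=q, axis2=q + n)
        n -= 1
    return t.reshape(2 ** n, 2 ** n)

def cmi(rho):
    """Conditional mutual information of qubits 0 and 1 given qubit 2, in bits."""
    return (entropy(marginal(rho, [0, 2])) + entropy(marginal(rho, [1, 2]))
            - entropy(rho) - entropy(marginal(rho, [2])))

# Incoherent copy (assumed form): classical mixture of |000> and |111>.
incoherent = np.zeros((8, 8))
incoherent[0, 0] = incoherent[7, 7] = 0.5

# Coherent copy (assumed form): normalized GHZ state.
ghz_vec = np.zeros(8)
ghz_vec[0] = ghz_vec[7] = 1 / np.sqrt(2)
ghz = np.outer(ghz_vec, ghz_vec)

print("incoherent copy:", round(cmi(incoherent), 6))
print("coherent copy:  ", round(cmi(ghz), 6))
```

With these assumptions, the conditional mutual information vanishes for the incoherent copy and equals one bit for the coherent copy, in line with the discussion above.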
At first blush, this conclusion may seem surprising. Given the mapping described by Eq. (9), where would correlations between outputs and come from, other than being completely explained by the input ?
The puzzle is resolved by considering the dilation of the coherent copy to a unitary transformation, and the interpretation of quantum pure states. Consider Figs. 8 and 9, which respectively show a classical copy operation via the classical CNOT gate and a quantum coherent copy operation via the quantum CNOT gate [footnote 5].
Footnote 5: The quantum version in Fig. 9 was studied for similar reasons in Ref. Schumacher and Westmoreland (2012), though from a different perspective.
In the classical case, there are two reasons why any correlation between and must be entirely explained by statistical variation in the value of . First, the ancillary variable is prepared deterministically with value , so there is no possibility that statistical variation in the value of underwrites the correlations between and . Second, the mapping between input strings and output strings for the classical CNOT gate,
$$00 \mapsto 00, \qquad 01 \mapsto 01, \qquad 10 \mapsto 11, \qquad 11 \mapsto 10 \tag{10}$$
(which one easily verifies to reduce to the classical copy of Eq. (8) when one sets to 0), has the causal structure depicted in Fig. 8, so that does not act as a common cause of and but only a local cause of .
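The two classical observations above can be made concrete with a short sketch. It assumes the usual classical CNOT, in which the control bit passes through unchanged and the target bit is XORed with the control; the function and variable names are hypothetical.

```python
from itertools import product

def classical_cnot(control: int, target: int) -> tuple[int, int]:
    """Classical CNOT on bits: the control passes through, the target is XORed with it."""
    return control, target ^ control

def classical_copy(bit: int) -> tuple[int, int]:
    """Classical copy: both outputs equal the input bit."""
    return bit, bit

# Full truth table of the gate on (control, target) pairs.
table = {bits: classical_cnot(*bits) for bits in product((0, 1), repeat=2)}

# Fixing the ancillary (target) bit to 0 reproduces the copy operation.
reduces_to_copy = all(classical_cnot(c, 0) == classical_copy(c) for c in (0, 1))

# The first output never depends on the target, so the target input can only
# influence the second output: it is a local cause, not a common cause.
control_unaffected = all(
    classical_cnot(c, 0)[0] == classical_cnot(c, 1)[0] for c in (0, 1)
)

for bits, out in table.items():
    print(bits, "->", out)
```

The check `control_unaffected` is the classical statement that there is no back action of the target on the control, which is the feature that fails in the quantum case discussed next.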
In the quantum case, neither reason applies. Concerning the second reason, the quantum CNOT has the causal structure depicted in Fig. 9: the quantum CNOT is such that not only does have a causal influence on , but has a causal influence on as well. In other words, unlike the classical CNOT, there is a back action of the target on the control. It follows that in the quantum case, can act as a common cause of and . Furthermore, the ancilla is prepared in a quantum pure state . This is disanalogous to a point distribution on the value 0 for the classical variable if one takes the view that a quantum pure state represents maximal but incomplete information about a quantum system Caves et al. (2002); Fuchs (2002); Spekkens (2007); Leifer (2006); Spekkens (2016). In this case, one must allow for the possibility that some correlation between and is due to the ancilla, in which case is not the complete common cause of and [footnote 6].
Footnote 6: It is interesting to consider an exactly analogous scenario, as it arises in the toy theory of Ref. Spekkens (2007). Here, a system analogous to a qubit can exist in one of four distinct classical states (the ontic states of the system). But an agent who prepares systems and measures them can only ever have partial information about which of the four ontic states a system is in. The toy equivalent of a CNOT gate corresponds to a reversible deterministic map, i.e., a permutation of the ontic states. By considering the probability distribution over ontic states of the various systems, one may verify directly that the ontic states of toy systems and are not determined by the ontic state of toy system . Rather, the ontic states of and depend also on the ontic state of . Furthermore, the analogue of a pure quantum state for is a probability distribution on that is not a point distribution. In this way, statistical correlations between and can be underwritten by statistical variation in the ontic state of .
III.6 Generalization to one input, outputs
Thms 2 and 3, which apply to quantum channels with one input and two outputs, can be generalized to the case of one input and outputs.
Consider a channel , and let denote the collection of all outputs apart from . The notion of quantum compatibility from Def. 5 generalizes in the obvious way: is said to be compatible with being a complete common cause of , if it is possible to find ancillary quantum systems , states , and a unitary channel where, for each , has no causal influence on , such that these constitute a dilation of .
Theorem 4.
The following are equivalent:
1. is compatible with being a complete common cause of .
2. , where for all , , .
3. For each , where is the quantum conditional mutual information evaluated on the (positive, trace-one) operator .
4. The Hilbert space for the system can be decomposed as such that , where for each , and each , represents a CPTP map .
IV Classical causal models
IV.1 Definitions
Reichenbach’s principle is important because it generalizes to the modern formalism of causal models Pearl (2009); Spirtes et al. (2001).
A causal model consists of two entities: (i) a causal structure, represented by a directed acyclic graph (DAG) where the nodes represent random variables and the directed edges represent the directed causal influences among these (several examples have already been presented in this article), and (ii) some parameters, which specify the strength of the causal dependences and the probability distributions for the variables associated to root nodes in the DAG (i.e., those with no incoming arrows). Some terminology is required to present the formal definitions.
Given a DAG with nodes , let denote the parents of node , that is, the set of nodes that have an arrow into , and let denote the children of node , that is, the set of nodes such that there is an arrow from to . The descendants of are those nodes , , such that there is a directed path from to . The ancestors of are those nodes such that is a descendant of .
Definition 7.
A causal model specifies a DAG, with nodes corresponding to random variables , and a family of conditional probability distributions , one for each .
Definition 8.
Given a DAG, with random variables for nodes, and given an arbitrary joint distribution , the distribution is said to be Markov for the graph if and only if it can be written in the form of
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big). \tag{11}$$
(Recall that each conditional can be computed from the joint )
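The Markov condition of Def. 8 can be tested directly on a joint distribution stored as an array. The sketch below is illustrative only: it assumes discrete variables, one array axis per node, and a dictionary mapping each node's axis to the axes of its parents. The function names `conditional` and `is_markov` are hypothetical.

```python
import numpy as np

def conditional(joint, i, parents):
    """P(X_i | parents) computed from the joint, shaped to broadcast against the joint."""
    keep = set(parents) | {i}
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    num = joint.sum(axis=drop, keepdims=True)   # P(X_i, parents)
    den = num.sum(axis=i, keepdims=True)        # P(parents)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, 0.0)

def is_markov(joint, parent_map):
    """True if the joint equals the product of its own conditionals P(X_i | Pa(X_i))."""
    prod = np.ones_like(joint)
    for node, parents in parent_map.items():
        prod = prod * conditional(joint, node, parents)
    return bool(np.allclose(prod, joint))

# Common-cause DAG with axes 0 = cause, 1 and 2 = its two effects.
dag = {0: [], 1: [0], 2: [0]}

# A distribution built from the DAG's own conditionals is Markov for it.
p_c = np.array([0.3, 0.7])
p_a_given_c = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows: cause value
p_b_given_c = np.array([[0.6, 0.4], [0.5, 0.5]])
markov_joint = np.einsum('c,ca,cb->cab', p_c, p_a_given_c, p_b_given_c)

# Two perfectly correlated effects that are independent of the cause are not.
correlated_joint = np.zeros((2, 2, 2))
for c in (0, 1):
    for a in (0, 1):
        correlated_joint[c, a, a] = 0.25

print(is_markov(markov_joint, dag))      # expected: True
print(is_markov(correlated_joint, dag))  # expected: False
```

The second example shows the sense in which the condition has content: the correlation between the two effects is not explained by the putative common cause, so the factorization fails.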
The generalization of Reichenbach’s principle that is afforded by the formalism of causal models is this: if there are statistical dependences among variables , expressed in the particular form of the joint distribution , then there should be a causal explanation of these dependences in terms of a DAG relative to which the distribution is Markov.
Note that an alternative way of formalizing the Markov property is that is Markov for the graph if and only if, for each , , where is the set of non-descendants of node . The intuitive idea is that the parents of a node screen off that node from the other non-descendants: once the values of the parents are fixed, the values of other non-descendant nodes are irrelevant to the value of .
Note also that Reichenbach’s principle is easily seen to be a special case of the requirement that for a joint distribution to be explainable by the causal structure of some DAG, it must be Markov for that DAG: if two variables, and , are ancestrally independent in the graph, then any distribution that is Markov for this graph must factorize on these, , which is the qualitative part of Reichenbach’s principle in its contrapositive form; if two variables, and , have a variable as a complete common cause, as in the DAG of Fig. 1, then any distribution that is Markov for the graph must satisfy , which is the quantitative part of Reichenbach’s principle.
IV.2 Justifying the Markov condition
Just as we previously asked whether there was some principle that forced a rational agent to assign probability distributions in accordance with the quantitative part of Reichenbach’s principle, we can similarly ask why a rational agent who takes causal relationships to be given by a particular DAG should arrange their beliefs so that the joint distribution is Markov for the DAG.
The justification of the Markov condition parallels the justification of the quantitative part of Reichenbach’s principle that was presented in Sec. II.2. We begin by outlining what the qualitative part of Reichenbach’s principle and the assumption of fundamental determinism imply for any arbitrary causal structure.
Definition 9 (Classical compatibility with a DAG).
is said to be compatible with a DAG with nodes if one can find a DAG that is obtained from by adding extra root nodes , such that for each , the node has a single outgoing arrow, to , and one can find, for each , a distribution and a function from to such that
Theorem 5 (Ref. Pearl (2009)).
Given a joint distribution and a DAG with nodes , the following are equivalent:
1. is compatible with the causal structure described by the DAG .
2. is Markov for , that is,
The implication from condition 1 to condition 2 in Thm 5 can be read as follows: if it is granted that causal relationships are indicative of underlying deterministic dynamics, and that the qualitative part of Reichenbach’s principle is valid, then, on pain of irrationality, an agent’s assignment must be Markov for the original graph.
The implication in Thm 5, like that of Thm 1, is pertinent for causal inference. It asserts that if one observes a distribution , then of the causal models that are compatible with this distribution, the only ones that do not require fine-tuning of the parameters are those involving DAGs relative to which the distribution is Markov.
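The passage from classical compatibility (Def. 9) to the Markov condition can be illustrated on the common-cause DAG. The following sketch assumes bit-valued variables, one independent noise variable per effect, and arbitrarily chosen deterministic functions and noise distributions; every name and number in it is hypothetical.

```python
import itertools
import numpy as np

# Distribution of the common cause and of the two extra root (noise) nodes.
p_cause = {0: 0.4, 1: 0.6}
p_noise_a = {0: 0.9, 1: 0.1}
p_noise_b = {0: 0.8, 1: 0.2}

def f_a(cause, noise):
    """Deterministic response of the first effect: the cause, flipped when noise = 1."""
    return cause ^ noise

def f_b(cause, noise):
    """Deterministic response of the second effect: the cause, erased to 0 when noise = 1."""
    return cause & (1 - noise)

# Joint distribution over (cause, effect_a, effect_b) after marginalizing the noise.
joint = np.zeros((2, 2, 2))
for c, na, nb in itertools.product((0, 1), repeat=3):
    joint[c, f_a(c, na), f_b(c, nb)] += p_cause[c] * p_noise_a[na] * p_noise_b[nb]

# Conditionals computed from the joint alone.
p_c = joint.sum(axis=(1, 2))
p_ab_given_c = joint / p_c[:, None, None]
p_a_given_c = p_ab_given_c.sum(axis=2)
p_b_given_c = p_ab_given_c.sum(axis=1)

# Quantitative part of Reichenbach's principle: the effects factorize given the cause.
factorized = np.einsum('ca,cb->cab', p_a_given_c, p_b_given_c)
screens_off = np.allclose(p_ab_given_c, factorized)
print("effects independent given the cause:", screens_off)
```

Because each effect depends only on the cause and on its own independent noise variable, the factorization holds for any choice of the functions and noise distributions, which is the content of the compatibility-implies-Markov direction in this special case.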
V Quantum causal models
V.1 The proposed definition
In our treatment of the simple causal scenario where is a complete common cause of and (the DAG of Fig. 3), we focussed on what form is implied for the quantum channel . But there has not been any attempt to define a quantity analogous to the classical joint distribution, that is, a quantity analogous to in the case of the DAG of Fig. 1, nor indeed other classical Bayesian conditionals such as . For works that aim to achieve such analogues, see Refs. Leifer (2006); Leifer and Spekkens (2013). See also Ref. Horsman et al. (2016), however, where it is shown that if one associates a single Hilbert space to a system at a given time, then there are significant obstacles to establishing an analogue of a classical joint distribution when the set of quantum systems includes some that are causal descendants of others.
This work takes a different approach. The interpretation of a quantum causal model will be that each node represents a local region of time and space, with channels such as describing the evolution of quantum systems in between these regions. At each node, there is the possibility that an agent is present with the ability to intervene inside that local region. Each node will then be associated with two Hilbert spaces, one corresponding to the incoming system (before the agent’s intervention) and the dual space, which corresponds to the outgoing system (after the agent’s intervention). A quantum causal model will consist of a specification, for every node, of the quantum channel from its parents to the node, with the operational significance of a network being that it is used to calculate joint probabilities for the agents to obtain the various possible joint outcomes for their interventions. This way of treating quantum systems over time has appeared in various different approaches in the literature, including the multitime formalism Aharonov and Vaidman (2007); Aharonov et al. (2009, 2013); Silva et al. (2014), the quantum combs formalism Chiribella et al. (2009); Chiribella (2012); Chiribella et al. (2013), the process matrices formalism Oreshkov et al. (2012); Araújo et al. (2014); Costa and Shrapnel (2016), and a number of other works as well Oeckl (2003); Oreshkov and Cerf (2016, 2015); Ried et al. (2015).
The discussion of classical causal models in Sec. IV, and the results of Sec. III for the special case of a complete common cause of and , suggest the following generalization.
Definition 10.
A quantum causal model specifies a DAG, with nodes , supplemented with the following. For each node , there is associated a finite-dimensional Hilbert space (the ‘input’ Hilbert space), and the dual space (the ‘output’ Hilbert space). For each node , there is associated a quantum channel, described by an operator , where is the tensor product of the output Hilbert spaces associated with the parents of . These channels commute pairwise, i.e., for any , (which is a nontrivial constraint whenever is nonempty). The overall state is represented by an operator on , where , denoted and given by
(12) 
Recall from Section III that, given a quantum channel , it is compatible with being the complete common cause of and if and only if , and if this holds, then . The definition of a quantum causal model, in particular, the stipulation that the channels commute pairwise, generalizes this idea.
V.2 Making predictions
In order to see how a quantum causal model is used to calculate probabilities for the outcomes of agents’ interventions, consider a quantum causal model with nodes and state . Let the intervention at node have classical outcomes labelled by . The intervention is defined by a quantum instrument, that is, by a set of completely positive, trace-non-increasing maps, one for each outcome, which sum to a trace-preserving map. In order to write the probabilities for the outcomes in a simple form, it is useful to define the instrument in such a way that the map associated to each outcome takes operators on into operators on . Hence, suppose that the outcome corresponds to the map and let
The outcome of the agent’s intervention can then be represented by the (positive, basis-independent) operator isomorphic to .
If an agent does not intervene at the node , this corresponds to the linking operator itself,<