

Important Definitions: Conditional Preferences and Independence

There are two definitions in probability theory that are as important as the basic axioms: conditional probabilities and independence. These concepts really give life to probability theory and form the core of Bayesian thinking. Are there similar ideas in the theory of sets of probabilities? In fact, there are many proposals, but for the most part these questions remain unresolved.


Conditionalization

Roughly speaking, conditional preferences arise when an agent has to choose among options assuming that some event is given. The concept of conditional preferences induces the idea of conditional beliefs, i.e., the beliefs the agent holds once some event is taken as given.

Simple as it seems, the formalization of conditional beliefs has proved to be a great challenge. Here is the fact: there is no obvious way to define conditional distributions for all distributions in a credal set and obtain an expression like Bayes rule. This seems to discourage many people. Apparently some people feel that anything harder to write down than Bayes rule is a sign of extraordinary complexity. Others feel that the best way is to come up with some new definition of ``conditionalization'', which is not related to Bayes rule but at least is easy to compute. The least I can say is that this matter is yet to be resolved conclusively.

I will present the definition proposed by Giron and Rios in their paper about Quasi-Bayesian behavior [11]; it happens to be quite similar to the definition used by Walley [30], and Walley has done a superior job at analyzing the implications of the theory.

Before I plunge into mathematics, here is the idea: a Quasi-Bayesian agent maintains a convex set of conditional distributions. Each conditional distribution is obtained using Bayes rule from an unconditional distribution.

Giron and Rios also provide a natural definition for preference relations conditioned on states, and obtain a characterization of this preference relation in terms of conditional probability.

First some notation. For an event A and an act f, Giron and Rios consider an act $I_A(f)$, defined just below.
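In symbols (writing out the definition that is spelled out in words in the example below):

\begin{displaymath}
I_A(f)(x) = \left\{ \begin{array}{ll}
f(x) & \mbox{if $x$ is in $A$,} \\
0 & \mbox{if $x$ is not in $A$.}
\end{array} \right.
\end{displaymath}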

To understand this, consider the real line. Now pick an act given by the function $f(x) = \sin(x)$. Take the event A to be some arbitrary interval. The act $I_A(f)$ yields zero if the state x is outside A, and yields f(x) if x is inside A.

Giron and Rios take the expression ``f is at least as preferred as g when event A obtains'' to mean ``the act $I_A(f)$ is at least as preferred as the act $I_A(g)$''. This makes intuitive sense: given that A obtains, the agent does not care about states that are outside A. So comparisons among acts given A should only focus on the rewards or losses that happen for states within A. All other states should receive the same arbitrary loss since they simply do not matter.

Now Giron and Rios postulate that the same axioms of preference should hold for acts given events. The result is: the relation of preference conditional on an event is characterized by a set of conditional distributions. For acts $d_1$ and $d_2$, the assertion that $d_1$ is preferred to $d_2$ given A implies that the expected utility of $d_1$ is larger than the expected utility of $d_2$, with respect to all probability distributions generated by conditioning on A.
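A schematic way to write this characterization (my notation, assuming a utility function $u$ and a credal set $K$ of unconditional distributions): $d_1$ is preferred to $d_2$ given A when

\begin{displaymath}
E_p[u(d_1) \vert A] > E_p[u(d_2) \vert A]
\quad \mbox{for every $p$ in $K$ with $p(A) > 0$,}
\end{displaymath}

where each conditional expectation uses the distribution $p(\cdot \vert A)$ produced by Bayes rule.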

A theorem obtained by Giron and Rios indicates that the set of posterior distributions is obtained by applying Bayes rule to each one of the distributions in the set of prior distributions.
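To make this concrete, here is a minimal sketch in plain Python (invented numbers and function names, not from any particular library) of conditioning a credal set by applying Bayes rule to each extreme distribution:

# Minimal sketch (invented numbers): conditioning a credal set by
# applying Bayes rule to each extreme distribution.

states = ["s1", "s2", "s3"]      # possible states of the world
A = {"s1", "s2"}                 # the conditioning event

# Credal set represented by its extreme (vertex) distributions.
credal_set = [
    {"s1": 0.2, "s2": 0.3, "s3": 0.5},
    {"s1": 0.4, "s2": 0.4, "s3": 0.2},
]

def condition(p, event):
    """Bayes rule applied to a single distribution p."""
    p_event = sum(p[s] for s in event)
    if p_event == 0.0:
        return None              # conditioning is undefined for this p
    return {s: (p[s] / p_event if s in event else 0.0) for s in p}

# Posterior credal set: Bayes rule applied to each extreme distribution.
posteriors = [q for q in (condition(p, A) for p in credal_set) if q is not None]

# Lower and upper posterior probability of the state s1 given A.
lower = min(q["s1"] for q in posteriors)
upper = max(q["s1"] for q in posteriors)
print("posterior extreme points:", posteriors)
print("lower P(s1|A) =", lower, " upper P(s1|A) =", upper)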

The definition mimics the definition of conditionalization for a single distribution. But be careful, because here is the caveat: there is no general expression for conditional lower expectations, lower envelopes, lower probabilities or Choquet capacities that mimics Bayes rule (these expressions are explained later). As I said, this seems to cause a lot of anxiety in many people (but I think some optimism would lead us to be happy that there are many research breakthroughs to be made...).

Independence

Independence of events and experiments is a crucial part of probabilistic thinking in general; recently it has been raised to even greater prominence in the wake of research on Bayesian networks [21].

But independence has always been a murky concept in probability theory, with long-running disputes about its meaning and representation. The standard, Kolmogorov-based definition of independence is this: two events A and B are independent if P(AB) = P(A) P(B). In probability theory, this is equivalent to P(A|B) = P(A) or P(B|A) = P(B) (when the conditioning events have positive probability), perhaps a more intuitive formulation.
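A one-line derivation shows why the two formulations agree when $P(B) > 0$:

\begin{displaymath}
P(A\vert B) = \frac{P(AB)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A).
\end{displaymath}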

A definition of independence for the theory of sets of probabilities is yet to be settled. There is a fundamental caveat in this issue, which I want to emphasize before the discussion gets too convoluted.

The issue is that you cannot pick two convex sets of functions and form a new convex set by multiplying each element in the first set by each element in the second set. Consider a simple example. Pick an interval defined by

\begin{displaymath}
2 \alpha + 4 (1-\alpha),
\end{displaymath}

and another interval defined by

\begin{displaymath}
3 \beta + 6 (1-\beta),
\end{displaymath}

with $\alpha$ and $\beta$ in the interval [0,1]; the two expressions sweep the intervals [2,4] and [3,6].

Both intervals are convex sets. But now form a set defined by multiplying terms from both sets:

\begin{displaymath}
(2 \alpha + 4 (1-\alpha))(3 \beta + 6 (1-\beta)) = 6 (2-\alpha)(2-\beta)
\end{displaymath}

which is a quadratic (bilinear) expression in $\alpha$ and $\beta$; for sets of probability distributions, this bilinearity means that the set of memberwise products is in general not a convex set.
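Expanding the product makes the cross term explicit:

\begin{displaymath}
6 (2-\alpha)(2-\beta) = 24 - 12\alpha - 12\beta + 6\alpha\beta;
\end{displaymath}

the term $6\alpha\beta$ is what prevents the products from being plain convex (linear) combinations of the extreme values.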

So you cannot pick two convex sets arbitrarily, multiply them memberwise and stay in the theory of convex sets. So what, some may say. Let's abandon convexity. Unfortunately, it is really hard to put together a theory of preferences without convexity, because the axioms that produce linearity of expected value also cause the credal sets to be convex (maybe there is a way out of these difficult technical problems; as far as I know these are open problems).

But here is an even more dramatic way of perceiving these facts. Suppose you say that A and B are independent, and you give a credal set for A and a credal set for B. It seems natural to produce a joint credal set for A and B by taking the convex hull of all distributions obtained by multiplying, memberwise, the credal sets of A and B. But since the set of memberwise products is not convex, taking the convex hull introduces joint distributions that are not equal to P(A)P(B) for any P(A) and P(B) in the credal sets of A and B!
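Here is a small numerical sketch of this last point (invented numbers, plain Python): a convex combination of two product joints is, in general, not itself a product joint.

# Sketch with invented numbers: a convex combination of two product
# joints is in general not itself a product joint.

def product_joint(pa, pb):
    """Joint over the four atoms AB, AB', A'B, A'B' when A and B are independent."""
    return {"AB": pa * pb, "AB'": pa * (1 - pb),
            "A'B": (1 - pa) * pb, "A'B'": (1 - pa) * (1 - pb)}

# Extreme points of the credal sets: P(A) in {0.2, 0.8}, P(B) in {0.3, 0.7}.
j1 = product_joint(0.2, 0.3)
j2 = product_joint(0.8, 0.7)

# Midpoint of the two product joints; it belongs to the convex hull.
mid = {atom: 0.5 * (j1[atom] + j2[atom]) for atom in j1}

pa = mid["AB"] + mid["AB'"]      # marginal P(A) of the midpoint: 0.5
pb = mid["AB"] + mid["A'B"]      # marginal P(B) of the midpoint: 0.5
print("P(AB) in the midpoint:    ", mid["AB"])  # 0.31
print("P(A)*P(B) in the midpoint:", pa * pb)    # 0.25, so not a product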

Given this, it may seem that independence concepts for lower envelopes and lower probability models in general are hard to state. In fact, a number of counter-examples, paradoxes and discussions arise around independence concepts [30].

One possible way to study independence is to look at convex sets of conditional distributions, and to require that the conditional distributions behave like independent quantities. Here is Walley's definition:

Say that B is irrelevant to A when $\underline{p}(A\vert B) = \underline{p}(A\vert B^c) = \underline{p}(A)$. Say that A and B are independent when B is irrelevant to A and A is irrelevant to B.

This definition agrees with the standard definition when all credal sets have single members (so everything is perfectly Bayesian), and it seems reasonable for other cases.
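As a sanity check, here is a small sketch (again with invented numbers and my own function names) that verifies the irrelevance condition on the extreme points of a credal set built from product distributions; for quantities like $P(A\vert B)$ the lower values over the convex hull are attained at these extreme points, and the check of A being irrelevant to B is analogous.

# Sketch with invented numbers: checking Walley-style irrelevance of B to A
# on the extreme points of a credal set built from product distributions.

def product_joint(pa, pb):
    return {"AB": pa * pb, "AB'": pa * (1 - pb),
            "A'B": (1 - pa) * pb, "A'B'": (1 - pa) * (1 - pb)}

# Vertices generated by P(A) in [0.3, 0.5] and P(B) in [0.6, 0.8];
# lower values of P(A), P(A|B), P(A|B') over the convex hull are
# attained at these vertices.
vertices = [product_joint(pa, pb) for pa in (0.3, 0.5) for pb in (0.6, 0.8)]

def lower(value):
    return min(value(p) for p in vertices)

p_A          = lambda p: p["AB"] + p["AB'"]
p_A_given_B  = lambda p: p["AB"] / (p["AB"] + p["A'B"])
p_A_given_Bc = lambda p: p["AB'"] / (p["AB'"] + p["A'B'"])

# B is irrelevant to A: the three lower probabilities coincide (all 0.3).
print(lower(p_A), lower(p_A_given_B), lower(p_A_given_Bc))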

Using this definition, it is true that memberwise multiplication of two independently constructed lower envelopes generates an ``independent'' joint lower envelope. What does not hold (and is perhaps disturbing to many) is uniqueness: there may be more than one joint lower envelope with the proposed marginals. Remember that in usual probability theory a joint distribution is uniquely determined by independent marginals; here we have to forget that.

This discussion about independence could be much longer; for example, there is a need to define independence of variables (not just independence of events). There is a lot of research to be done before a clear synthesis can be achieved. I will stop here for now, and try to update this section as new interesting material emerges. Of course, let me know if you have something of interest.


Fabio Gagliardi Cozman
1999-12-30