Foundation of the Theory of Sets of Probabilities

Decision theory starts from the states, acts and utilities that the acting agent has to specify. For the purposes of this discussion, we can usually represent them in a table. For example, you can go to the park, go to the market or stay home, and the weather can be either sunny or cloudy; each entry is the utility of an act in a state:

	sunny	cloudy
park	10	-10
market	-5	4
home	0	0

How do we choose the best act? Just by looking at the table we could argue that park is best because it can give us the maximum reward (10); but we could also argue that home is best because it never gives us a strong punishment.
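
The two arguments above can be sketched in a few lines of Python. This is only an illustration: the names `best_case` and `worst_case` are hypothetical, and neither rule is claimed to be the right way to decide.

```python
# Utilities from the table above: one row per act, one column per state.
utilities = {
    "park":   {"sunny": 10, "cloudy": -10},
    "market": {"sunny": -5, "cloudy": 4},
    "home":   {"sunny": 0,  "cloudy": 0},
}

def best_case(act):
    """Largest utility the act can yield in any state."""
    return max(utilities[act].values())

def worst_case(act):
    """Smallest utility the act can yield in any state."""
    return min(utilities[act].values())

# An optimist looks only at the best case; a cautious agent only at the worst.
optimist_choice = max(utilities, key=best_case)
cautious_choice = max(utilities, key=worst_case)

print(optimist_choice)  # park
print(cautious_choice)  # home
```

The two rules disagree, which is why a principled criterion for the best act is needed.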

The problem of decision theory is to specify how to choose the ``best act''. Bayesian theory has been very successful in this regard [4,5,21,23] as a prescription for what a rational agent should do. The Bayesian framework essentially says that:

Given the states of nature $\theta_j$, there is a single probability distribution $p(\theta)$ that summarizes the beliefs of the agent about which $\theta_j$ obtains.
An act with high expected utility is preferred to an act with lower expected utility.
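
As a minimal sketch of these two statements, the code below ranks the acts of the earlier table by expected utility. The distribution `p` (probability 0.6 of sunny) is an assumed value chosen only for illustration; it does not come from the text.

```python
# Utilities from the earlier table: one row per act, one column per state.
utilities = {
    "park":   {"sunny": 10, "cloudy": -10},
    "market": {"sunny": -5, "cloudy": 4},
    "home":   {"sunny": 0,  "cloudy": 0},
}

# Assumed single distribution over the states (hypothetical numbers).
p = {"sunny": 0.6, "cloudy": 0.4}

def expected_utility(act, p):
    """Sum over states of probability times utility."""
    return sum(p[state] * utilities[act][state] for state in p)

expected = {act: expected_utility(act, p) for act in utilities}

# The Bayesian prescription: prefer the act with the highest expected utility.
best_act = max(expected, key=expected.get)

print(expected)
print(best_act)
```

With a single distribution every pair of acts can be compared, so the acts are completely ordered.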

The Bayesian framework is derived from a number of axioms that are supposed to apply to decision making.

The next idea is to start with a similar, but more general, set of axioms and generate a convex set of probability distributions, called the credal set [11,18]. When we follow this route, Bayesian theory is a particular case in which we assume that the agent always has a single distribution (the convex set of distributions has a single member).

Modifications of the axioms of usual Bayesian decision theory have been proposed with a variety of justifications, ranging from psychological observations of human behavior to robustness techniques in statistical analysis. A theory of sets of probabilities represents one of the main ways in which one can relax the Bayesian framework in a principled manner. In Quasi-Bayesian theory, we ask: how can any agent be so sure about preferences and decisions that a single probability distribution can be chosen? This appears unreasonable for the kinds of agents that we have to deal with in real life; it also appears unreasonable if we consider agents composed of many entities (like organizations, for example).

In short: a rational agent has a utility function that translates his preferences and a convex set of probability distributions that translates his beliefs.

The Meaning of the Credal Set

Let us study carefully what a convex set of distributions means in terms of preferences. Consider a loss function $l(\cdot)$ and two acts a₁ and a₂. Since each act is a function of the states, we can obtain the expected loss of an act by picking a probability distribution.

Take a distribution p₁. You can obtain the expected values E₁[a₁] and E₁[a₂] for the acts.

Take another distribution p₂. You can obtain the expected values E₂[a₁] and E₂[a₂] for the two acts.

Suppose E₁[a₁] < E₁[a₂] and E₂[a₁] > E₂[a₂]. Now a₁ and a₂ cannot be compared with respect to expected loss.
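
The situation can be reproduced numerically. The sketch below uses made-up losses and two made-up distributions (all numbers and the helper `compare` are assumptions for illustration) and declares one act preferred only when it has smaller expected loss under every distribution considered.

```python
# Hypothetical losses of two acts in two states (illustrative numbers only).
loss = {
    "a1": {"sunny": -10, "cloudy": 10},
    "a2": {"sunny": 5,   "cloudy": -4},
}

# Two hypothetical distributions over the states.
p1 = {"sunny": 0.8, "cloudy": 0.2}
p2 = {"sunny": 0.2, "cloudy": 0.8}

def expected_loss(act, p):
    """Sum over states of probability times loss."""
    return sum(p[state] * loss[act][state] for state in p)

def compare(x, y, distributions):
    """Prefer x (or y) only if it has lower expected loss under every distribution.

    Simplification: ties are not treated separately and fall under "incomparable".
    """
    diffs = [expected_loss(x, p) - expected_loss(y, p) for p in distributions]
    if all(d < 0 for d in diffs):
        return f"{x} preferred"
    if all(d > 0 for d in diffs):
        return f"{y} preferred"
    return "incomparable"

print(compare("a1", "a2", [p1]))      # a1 preferred
print(compare("a1", "a2", [p2]))      # a2 preferred
print(compare("a1", "a2", [p1, p2]))  # incomparable
```

Each distribution alone orders the two acts, but the two distributions together do not.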

There is a lot of controversy about what the agent should do at this point; this will be discussed later. Right now, the important point is to understand that a convex set of distributions does not, in general, induce a complete order among acts.

So an agent that uses a credal set has a partial order of preferences. What is this supposed to mean?

There are two basic ways to look at this situation [30]:

Incomplete beliefs

In this interpretation, the agent could possibly refine beliefs and establish a unique, complete order among acts. In other words, the agent could specify a single probability distribution that would reflect a complete order of acts. That would be the ``true'' distribution. Why doesn't the agent do that in the first place? Here we can have two answers:

Because the agent is not confident that a single distribution is the ``true'' one. Call this the sensitivity analysis interpretation; this is used to justify robust Bayesian statistics [1].
Because the agent does not have the time, resources or patience to specify a single distribution. Call this the abstraction interpretation.

Exhaustive beliefs

In this interpretation, the agent has already thought as much as possible about the situation, but still could not specify complete preferences. Some acts are just incomparable for the agent.

So here we have some similar but different interpretations of credal sets. Different interpretations have led to different technical questions and results, so it is important to pay attention to these issues.

A Digression: The Convexity of the Credal Set

Giron and Rios require that their axioms produce a convex set of distributions. A set of functions is convex if, whenever f₁ and f₂ belong to the set, every mixture of f₁ and f₂ also belongs to the set. A convex combination of functions f_j is given by $\sum_j a_j f_j$, where the a_j are non-negative numbers that sum to unity.

Why a convex set? A partial order can also be created with a non-convex set of distributions (for example, by picking the boundary of a convex set).

But here is the point: all preferences that are valid with a given set of distributions are also valid if we pick the convex hull of this set! This is due to the linear character of the expected loss operation. Whatever happens with a set of distributions also happens with all convex combinations of those distributions -- hence you have the convex hull. In general, the partial order of preference is unchanged if we take the convex hull of a set of distributions.

What can we make of this fundamental observation? If we justify our theory in terms of preferences, then it seems that there is a strong bias toward convex sets. Convex sets of distributions are the largest sets that induce a particular pattern of preferences. This is the path followed by Quasi-Bayesian theory. The theory is formalized axiomatically in terms of preference axioms, such that convex credal sets arise as the basic representation for beliefs and preferences.

But if we have a different interpretation for sets of distributions, then there may be no reason to take them to be convex. One can construct a theory that explicitly differentiates between sets of distributions when they are convex and non-convex. We will see that this point can be used to discuss independence concepts later.
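
The linearity argument can be written out for two distributions p₁ and p₂. The notation below is an assumption made for this sketch: $p_\alpha$ denotes a mixture, $E_\alpha$ the expectation under it, and $l(a(\theta_j))$ the loss of act $a$ in state $\theta_j$.

```latex
% Mixture of two distributions, with 0 <= \alpha <= 1
p_\alpha(\theta_j) = \alpha\, p_1(\theta_j) + (1-\alpha)\, p_2(\theta_j)

% Expected loss is linear in the distribution
E_\alpha[a] = \sum_j p_\alpha(\theta_j)\, l(a(\theta_j))
            = \alpha\, E_1[a] + (1-\alpha)\, E_2[a]

% Hence a preference shared by p_1 and p_2 holds for every mixture
E_1[a_1] < E_1[a_2] \ \text{and}\ E_2[a_1] < E_2[a_2]
\;\Longrightarrow\;
E_\alpha[a_1] < E_\alpha[a_2] \quad \text{for all } \alpha \in [0,1]
```

The same argument extends to any finite convex combination, so adding mixtures to a set of distributions never removes a preference that the original set supports.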

Reasons to Adopt a Set of Probabilities

In short, there are strong reasons for adopting a set of probabilities as the basic model for uncertainty [25]:

The theory builds a realistic account of the imperfections in an agent's preferences. The theory can be used to represent poor elicitation of preferences and situations of indifference among actions. It can be used to represent vague beliefs that are poorly represented by a single prior (say a single uniform prior).
Robustness studies can be formalized through this model [1]. A set of distributions can be used to study how the acts are affected by changes in the agent's beliefs.
The theory can represent the disparate opinions of a group of agents [19], something that can hardly be represented in usual decision theory.
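
The robustness point can be illustrated with the earlier table. The sketch below assumes, purely for illustration, that the agent's beliefs are the convex set of distributions with probability of sunny between 0.4 and 0.7; the function `expectation_range` is a hypothetical helper. Because expected utility is linear in the distribution, its smallest and largest values over this set occur at the two extreme points.

```python
# Utilities from the earlier table: one row per act, one column per state.
utilities = {
    "park":   {"sunny": 10, "cloudy": -10},
    "market": {"sunny": -5, "cloudy": 4},
    "home":   {"sunny": 0,  "cloudy": 0},
}

# Extreme points of the assumed set: P(sunny) = 0.4 and P(sunny) = 0.7.
extreme_points = [
    {"sunny": 0.4, "cloudy": 0.6},
    {"sunny": 0.7, "cloudy": 0.3},
]

def expected_utility(act, p):
    """Sum over states of probability times utility."""
    return sum(p[state] * utilities[act][state] for state in p)

def expectation_range(act):
    """Smallest and largest expected utility of the act over the set."""
    values = [expected_utility(act, p) for p in extreme_points]
    return min(values), max(values)

ranges = {act: expectation_range(act) for act in utilities}

for act, (low, high) in ranges.items():
    print(f"{act}: from {low:.1f} to {high:.1f}")
```

A wide interval signals that the value of an act is sensitive to which distribution in the set is used.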

Of course, the theory of sets of probabilities has the advantage of a solid axiomatic foundation (take for example the Quasi-Bayesian theory of Giron and Rios or the theory of coherent lower previsions of Walley), which is no small thing when you consider the number of possible ad hoc approaches to uncertainty.

Fabio Gagliardi Cozman
1999-12-30