Next: Decision Making with Sets Up: A Brief Introduction to Previous: Lower Envelopes (or Coherent


Lower Probability, Choquet Capacities and Belief Functions

We are now familiar with the idea of a lower envelope, a function $\underline{p}(x)$ defined by a set of probability distributions K by:

$\begin{displaymath} \underline{p}(x) = \inf_{p \in K} p(x). \end{displaymath}$

Let us say we want to know which conditions an arbitrary function v(x) must obey in order to be a lower envelope. This is important if we hope to understand the relationship between sets of distributions and intervals of probability.

First, notice that any such v(x) has to be non-negative, since a lower envelope will never be negative. Also, for any v(x), the probability of the complete space has to be one, and the probability of the empty set has to be zero. Those two conditions are true for any probability distribution, so they are true for any lower envelope. A less obvious expression that is true for any lower envelope, and must be obeyed by v(x), is:

$\begin{displaymath} v(x \mbox{ or } y) \geq v(x) + v(y), \mbox{ if $x$\ and $y$\ are disjoint }. \end{displaymath}$

Why? Because for any probability distribution in K, this expression holds with equality. Since the infimum of a sum is at least the sum of the infima (the $\inf$ operator is superadditive), we get ``$\geq$'' instead of ``=''.
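The definition and the superadditivity argument can be sketched numerically. The snippet below is only an illustration: it assumes K is a finite list of distributions (so the infimum is a minimum), and the two distributions and the names `lower_envelope` and `events` are made up for this example.

```python
from fractions import Fraction as Fr
from itertools import combinations

def events(atoms):
    """All subsets of the universe, as frozensets."""
    atoms = list(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def lower_envelope(K, event):
    """Minimum over p in K of p(event); K is a finite list of dicts."""
    return min(sum(p[a] for a in event) for p in K)

# Hypothetical set K with two distributions over three atoms.
K = [
    {"a": Fr(1, 2), "b": Fr(1, 4), "c": Fr(1, 4)},
    {"a": Fr(1, 4), "b": Fr(1, 2), "c": Fr(1, 4)},
]

x, y = frozenset("a"), frozenset("b")
# Each distribution is additive on x and y, but the envelope is not:
# the minimum for x and the minimum for y come from different members of K.
print(lower_envelope(K, x), lower_envelope(K, y), lower_envelope(K, x | y))

# Superadditivity over every pair of disjoint events.
superadditive = all(
    lower_envelope(K, s | t) >= lower_envelope(K, s) + lower_envelope(K, t)
    for s in events("abc") for t in events("abc") if not (s & t)
)
```

Here the envelope of "a or b" is 3/4, strictly larger than 1/4 + 1/4, which is the inequality in action.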

Now we have four things that any v(x) has to obey. Naturally we are led to suspect that, if v(x) obeys these four things, then v(x) is a lower envelope. But that's not true! Take an example from Huber [16]:

Consider a universe with four atoms, x₁, x₂, x₃, x₄. Now define v(x):

$v(\mbox{empty set}) = 0$ , $v(\mbox{universe}) = 1$ ,
v(x₁) = v(x₂) = v(x₃) = v(x₄) = 0,
$v(x_1 \mbox{ or } x_2) = v(x_1 \mbox{ or } x_3) = v(x_1 \mbox{ or } x_4) = v(x_2 \mbox{ or } x_3) = v(x_2 \mbox{ or } x_4) = v(x_3 \mbox{ or } x_4) = 1/2$ ,
$v(x_1 \mbox{ or } x_2 \mbox{ or } x_3) = v(x_1 \mbox{ or } x_2 \mbox{ or } x_4) = v(x_1 \mbox{ or } x_3 \mbox{ or } x_4) = v(x_2 \mbox{ or } x_3 \mbox{ or } x_4) = 1/2$ .

All four properties mentioned above are respected by this v(x), but only one probability distribution is compatible with it (can you figure out which?). And this probability distribution does not generate v(x)! So v(x) is not a lower envelope.
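The claim that this v(x) respects the four properties can be checked by brute force. The sketch below encodes v(x) by the number of atoms in the event, which is just a compact way of writing the table above; the names `v`, `EVENTS` and `four_properties_hold` are assumptions of this sketch.

```python
from fractions import Fraction as Fr
from itertools import combinations

ATOMS = ("x1", "x2", "x3", "x4")
EVENTS = [frozenset(c) for r in range(len(ATOMS) + 1)
          for c in combinations(ATOMS, r)]

# Value of v(x) by event size: empty set, atoms, pairs, triples, universe.
BY_SIZE = {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2), 4: Fr(1)}

def v(event):
    return BY_SIZE[len(event)]

def four_properties_hold(v):
    universe = frozenset(ATOMS)
    non_negative = all(v(e) >= 0 for e in EVENTS)
    normalized = v(frozenset()) == 0 and v(universe) == 1
    # Superadditivity is only required for disjoint x and y.
    superadditive = all(v(x | y) >= v(x) + v(y)
                        for x in EVENTS for y in EVENTS if not (x & y))
    return non_negative and normalized and superadditive
```

Calling `four_properties_hold(v)` returns `True`, so the failure to be a lower envelope cannot be detected by these four conditions alone.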

A New Concept: Lower Probability

Even though we failed to characterize lower envelopes so far, interesting concepts emerged. Let us call a lower/upper probability pair any non-negative functions $\underline{v}(x)$ and $\overline{v}(x)$ for which:

$\overline{v}(x) = 1 - \underline{v}(x^c)$ .
$\underline{v}(\mbox{empty set}) = 0$ , $\underline{v}(\mbox{universe}) = 1$ ,
$\underline{v}(x \mbox{ or } y) \geq \underline{v}(x) + \underline{v}(y)$ if x and y are disjoint.
$\overline{v}(x \mbox{ or } y) \leq \overline{v}(x) + \overline{v}(y)$ if x and y are disjoint.

These are the four properties we identified above; they must be true for any function that represents a set of distributions. So instead of using these properties to characterize lower envelopes, we use them to define new entities called lower/upper probabilities. These definitions have appeared in a variety of places [3,7,18,14,30].

Lower probability can be defined as a primitive concept by taking the definition above as axioms. The appeal of this method is that the axioms are so close to the familiar axioms of probability that they are ``almost evident'': the only difference is the `` $\geq$ '' symbol instead of the ``='' symbol. This is a technical argument that has been used before.

But I believe the technical results related to lower probability can be better understood if we see how lower probability relates to sets of probability distributions. If you arrived at this section without reading the previous discussion, you may want to look at lower envelopes first. So we can try a new question: how does lower probability relate to convex sets of distributions? Is there a one-to-one relationship? Many-to-one? One-to-many?

The first observation is, they do not relate perfectly. The example about lower envelopes shows that not every lower probability is a lower envelope, i.e., not every lower probability is the representation of a set of distributions. Huber [16] enumerates a number of results that specify when a function v(x) will be a lower envelope; but frankly these results do not help much here as they are almost impossible to understand. Actually, I believe they are impossible to understand in HTML (!!). Instead, we will be better off if we try to classify the various kinds of lower probabilities and look for more structure in this family of models.

Classifying Lower Probability: Dominated Structures

Pick a lower probability $\underline{v}(x)$ . A probability distribution p(x) dominates $\underline{v}(x)$ if $p(x) \geq \underline{v}(x)$ for every event x. This is equivalent to $p(x) \leq \overline{v}(x)$ for every event x.

Now we have a theorem [30] that says that the set of probability distributions that dominates a lower probability is a closed convex polyhedron (possibly empty) in the simplex of all probability measures. The crucial point here is that the dominating set may be empty!

When a lower probability admits at least one probability distribution that dominates it, we say the lower probability is dominated. In this case the convex polyhedron in the theorem is non-empty. There are non-dominated lower probabilities. In this case the convex polyhedron is empty. Of course a non-dominated lower probability cannot be a lower envelope.

Is a dominated lower probability equivalent to a lower envelope? No! The example about lower envelopes shows a dominated lower probability that is not a lower envelope.
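This can be verified on Huber's example with a few lines of code. The sketch below is only illustrative (the names `lower`, `upper`, `dominates` and the `skewed` distribution are invented here). It shows that the uniform distribution dominates v(x), that the lower and upper probabilities coincide on every pair of atoms (so any dominating distribution must give every pair exactly 1/2, which forces it to be uniform), and that the uniform distribution gives each atom 1/4 rather than 0.

```python
from fractions import Fraction as Fr
from itertools import combinations

ATOMS = ("x1", "x2", "x3", "x4")
UNIVERSE = frozenset(ATOMS)
EVENTS = [frozenset(c) for r in range(len(ATOMS) + 1)
          for c in combinations(ATOMS, r)]

# Huber's example, encoded by event size (an encoding chosen for this sketch).
BY_SIZE = {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2), 4: Fr(1)}

def lower(event):
    return BY_SIZE[len(event)]

def upper(event):
    # Conjugacy: upper(x) = 1 - lower(complement of x).
    return 1 - lower(UNIVERSE - event)

def prob(p, event):
    return sum(p[a] for a in event)

def dominates(p):
    """True if p(x) >= lower(x) for every event x."""
    return all(prob(p, e) >= lower(e) for e in EVENTS)

uniform = {a: Fr(1, 4) for a in ATOMS}
# A hypothetical non-uniform distribution, for contrast.
skewed = {"x1": Fr(2, 5), "x2": Fr(1, 5), "x3": Fr(1, 5), "x4": Fr(1, 5)}

# On pairs, lower and upper coincide, so p(pair) is pinned to 1/2.
pairs_pinned = all(lower(e) == upper(e) for e in EVENTS if len(e) == 2)
```

With these definitions, `dominates(uniform)` is `True`, `dominates(skewed)` is `False`, and `prob(uniform, frozenset({"x1"}))` is 1/4 while `lower(frozenset({"x1"}))` is 0: the lower probability is dominated, yet the envelope of its dominating set is not the lower probability itself.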

The dominated/non-dominated classification does not really give a lot of insight into the structure of lower probability. The only application of non-dominated lower probability that I know of is the modeling of flicker noise in electronic equipment [7,13,17,20], in what can possibly be the most original work ever in lower probability.

Classifying Lower Probability: Monotone Structures

Let us return to our quest for the relationship between lower probability and sets of probability distributions. We will try a new classification of lower probability.

Say that a function $\underline{v}(x)$ is a 2-monotone Choquet capacity (or simply 2-monotone) if it is non-negative and

$\underline{v}(\mbox{empty set}) = 0$ , $\underline{v}(\mbox{universe}) = 1$ ,
$\underline{v}(x \mbox{ or } y) \geq \underline{v}(x) + \underline{v}(y) - \underline{v}(x \mbox{ and } y)$ for any x and y.

The assumption of 2-monotonicity introduces a number of good features. First, every 2-monotone lower probability is a lower envelope. The set of probability distributions that create the lower envelope is exactly the set of all probability distributions that dominate the 2-monotone lower probability. So now we get the property we were looking for: a correspondence between a lower probability model and a convex set of probability distributions.
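On a finite universe, 2-monotonicity can be tested by enumerating all pairs of events. The sketch below uses the standard form of the inequality, v(x or y) + v(x and y) >= v(x) + v(y), with the lower probability on both sides; the function names and the distribution `p` are assumptions of this example. Since Huber's v(x) is not a lower envelope, it must fail the test, while any ordinary probability measure passes with equality.

```python
from fractions import Fraction as Fr
from itertools import combinations

def all_events(atoms):
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def is_2_monotone(v, events):
    """Check v(x or y) + v(x and y) >= v(x) + v(y) for all x, y."""
    return all(v(x | y) + v(x & y) >= v(x) + v(y)
               for x in events for y in events)

ATOMS = ("x1", "x2", "x3", "x4")
EVENTS = all_events(ATOMS)

# Huber's example, encoded by event size.
HUBER_BY_SIZE = {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2), 4: Fr(1)}

def huber(event):
    return HUBER_BY_SIZE[len(event)]

# A hypothetical probability distribution; additive, hence 2-monotone.
p = {"x1": Fr(1, 2), "x2": Fr(1, 4), "x3": Fr(1, 8), "x4": Fr(1, 8)}

def measure(event):
    return sum((p[a] for a in event), Fr(0))
```

For instance, with x = {x1, x2} and y = {x1, x3}, Huber's function gives 1/2 + 0 on the left and 1/2 + 1/2 on the right, so `is_2_monotone(huber, EVENTS)` is `False`.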

Second, we can now define a lower distribution function:

$\begin{displaymath} \underline{F}(x) = \underline{v}(w\vert X(w) \leq x), \end{displaymath}$

and an upper distribution function:

$\begin{displaymath} \overline{F}(x) = \overline{v}(w\vert X(w) \leq x), \end{displaymath}$

which parallel the definitions of distribution functions in probability theory. This is very useful as we discuss in the next paragraph.

Suppose we have a loss function $l(\cdot)$ and we want to obtain its expected loss for all probability distributions that dominate a 2-monotone $\underline{v}(x)$ . Since $\underline{v}(x)$ is dominated by a convex polyhedron of distributions, the expected losses will span an interval, from a minimum to a maximum value in the real line. The minimum and maximum values are respectively the lower and upper expectations of the set of dominating distributions! In the world of 2-monotonicity, we can tie the concepts of lower probability and lower expectation.

To obtain the lower and upper expectations of a 2-monotone lower probability $\underline{v}(x)$ , we compute [30]:

$\begin{displaymath} \underline{E}[l] = \int l(x) d\overline{F}(x), \end{displaymath}$

$\begin{displaymath} \overline{E}[l] = \int l(x) d\underline{F}(x), \end{displaymath}$

which precisely parallels the usual expectation formula of probability theory (note: the lower expectation is computed with the upper distribution and vice-versa!).
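On a finite universe these integrals become sums, which makes the swap between lower expectation and upper distribution easy to see. The sketch below rests on several assumptions that are not in the text: the atoms are ordered by increasing loss, and the distribution functions are taken along that ordering; the 2-monotone lower probability used as a test case is an epsilon-contaminated distribution, v(x) = (1 - eps) p0(x) for every event except the universe, whose lower and upper expectations are known to be (1 - eps) E[l] + eps min l and (1 - eps) E[l] + eps max l.

```python
from fractions import Fraction as Fr

def expectation_bounds(atoms, loss, v_lower):
    """Lower/upper expectation of loss for a 2-monotone v_lower.

    Atoms are sorted by increasing loss; the lower expectation uses
    increments of the upper distribution function, and vice versa.
    """
    universe = frozenset(atoms)

    def v_upper(event):
        return 1 - v_lower(universe - event)

    ordered = sorted(atoms, key=lambda a: loss[a])
    lower_exp = upper_exp = Fr(0)
    prev_upper_F = prev_lower_F = Fr(0)
    for i, atom in enumerate(ordered, start=1):
        event = frozenset(ordered[:i])      # {w : loss(w) <= loss(atom)}
        upper_F, lower_F = v_upper(event), v_lower(event)
        lower_exp += loss[atom] * (upper_F - prev_upper_F)
        upper_exp += loss[atom] * (lower_F - prev_lower_F)
        prev_upper_F, prev_lower_F = upper_F, lower_F
    return lower_exp, upper_exp

def contaminated(p0, eps):
    """Hypothetical example: epsilon-contaminated lower probability."""
    universe = frozenset(p0)

    def v_lower(event):
        if event == universe:
            return Fr(1)
        return (1 - eps) * sum((p0[a] for a in event), Fr(0))

    return v_lower

p0 = {"a": Fr(1, 2), "b": Fr(3, 10), "c": Fr(1, 5)}
loss = {"a": 2, "b": 0, "c": 5}
low, high = expectation_bounds(list(p0), loss, contaminated(p0, Fr(1, 10)))
```

With these numbers the expected loss under p0 is 2, so the closed form gives 0.9 * 2 + 0.1 * 0 = 9/5 for the lower expectation and 0.9 * 2 + 0.1 * 5 = 23/10 for the upper one, which is what the sums return.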

Choquet Capacities

The concept of 2-monotonicity can be generalized. Call a non-negative function $\underline{v}(x)$ an n-monotone Choquet capacity if

$\underline{v}(\mbox{empty set}) = 0$ , $\underline{v}(\mbox{universe}) = 1$ ,
the value of $\underline{v}(\mbox{union of up to n events})$ is larger than or equal to the following sum, running over all non-empty sub-collections of the events in the union (sorry this is cumbersome!):

$\begin{displaymath} \sum (-1)^{(\mbox{number of events in the sub-collection} + 1)} \underline{v}(\mbox{intersection of the sub-collection}) \end{displaymath}$

This is not intuitive at all. But it has some nice properties. Any lower probability is 1-monotone (and of course 2-monotone lower probabilities are 2-monotone capacities!). If a capacity is (n+1)-monotone, then it is n-monotone.
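The condition is easier to read as code than as a formula. The brute-force sketch below checks it on a small finite universe: for every family of n events (repetitions allowed, which covers unions of fewer than n events), it compares the value on the union with the inclusion-exclusion sum over all non-empty sub-families, each term evaluated on the intersection of the sub-family. The capacity used as an example is hypothetical and not taken from the text; it was chosen because it is 2-monotone but not 3-monotone.

```python
from fractions import Fraction as Fr
from itertools import combinations, product

def is_n_monotone(v, events, n):
    """Brute-force check of n-monotonicity on a finite universe."""
    for family in product(events, repeat=n):
        union = frozenset().union(*family)
        total = Fr(0)
        for k in range(1, n + 1):
            for sub in combinations(family, k):
                inter = frozenset.intersection(*sub)
                total += (-1) ** (k + 1) * v(inter)
        if v(union) < total:
            return False
    return True

ATOMS = ("x1", "x2", "x3", "x4")
EVENTS = [frozenset(c) for r in range(len(ATOMS) + 1)
          for c in combinations(ATOMS, r)]

# Hypothetical capacity, by event size: 0 up to pairs, 1/2 on triples.
BY_SIZE = {0: Fr(0), 1: Fr(0), 2: Fr(0), 3: Fr(1, 2), 4: Fr(1)}

def v(event):
    return BY_SIZE[len(event)]
```

Taking the three triples that contain x1, the union is the universe (value 1) while the sum is 3 * 1/2 - 0 + 0 = 3/2, so `is_n_monotone(v, EVENTS, 3)` is `False` even though `is_n_monotone(v, EVENTS, 2)` is `True`. The enumeration grows very quickly with n and with the number of atoms, so this is only practical for toy examples.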

If a lower probability is n-monotone for all n, then it is called infinitely monotone, or a belief function. This is exactly the kind of model that Dempster-Shafer theory uses. But be warned: Dempster-Shafer theory uses the mathematical structure, but as far as I can see, the interpretation of the functions has nothing to do with probabilities or decision theory. This is a matter of lively debate. I will not discuss the philosophy behind Dempster-Shafer theory here.

A Summary

In this section, we kept asking the question: what is the relationship between credal sets and various interval-based generalizations of probability? In the process of answering it, we met several important entities. We have the following:

If a function is a probability, then

it is a belief function, then

it is n-monotone for all n, and in particular 2-monotone, then

it is a lower envelope, then

it is a lower probability.

BUT none of this can be reversed (a lower probability may not be an envelope, an envelope may not be a belief function, etc.). The following diagram may be useful:

[Diagram not reproduced here; it included a box labeled ``Lower Probability''.]


Fabio Gagliardi Cozman
1999-12-30