We are now familiar with the idea of a lower envelope, a function $\underline{p}(x)$ defined by a set of probability distributions K by:

$$\underline{p}(x) = \inf_{p \in K} p(x)$$
Let us say we want to know which conditions an arbitrary function v(x) must obey in order to be a lower envelope. This is important if we hope to understand the relationship between sets of distributions and intervals of probability.
First, notice that any such v(x) has to be non-negative, since a lower envelope is never negative. Also, the value of v on the complete space has to be one, and its value on the empty set has to be zero. Those two conditions are true for every probability distribution, so they are true for any lower envelope.
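To make these basic properties concrete, here is a minimal sketch that computes the lower envelope of a small, made-up set K on a finite space. The atoms, the two distributions and the helper names (`all_events`, `lower_envelope`) are assumptions for illustration only; with a finite K the infimum is simply a minimum.

```python
from itertools import chain, combinations

def all_events(atoms):
    """Every subset of the atoms, as frozensets (the empty set included)."""
    atoms = list(atoms)
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1)
    )
    return [frozenset(s) for s in subsets]

def lower_envelope(K, atoms):
    """Map each event to the smallest probability any distribution in K gives it."""
    return {x: min(sum(p[a] for a in x) for p in K) for x in all_events(atoms)}

# Hypothetical set K: two distributions over three atoms.
atoms = ["a", "b", "c"]
K = [
    {"a": 0.5, "b": 0.25, "c": 0.25},
    {"a": 0.25, "b": 0.25, "c": 0.5},
]
v = lower_envelope(K, atoms)

print(v[frozenset()])        # value on the empty set
print(v[frozenset(atoms)])   # value on the complete space
print(min(v.values()))       # smallest value over all events
```

The three printed values correspond to the conditions just discussed: zero on the empty set, one on the complete space, and no negative value anywhere.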
A less obvious expression that is true for any lower envelope, and must be obeyed by v(x), is super-additivity. For any disjoint events x and y:

$$v(x \cup y) \geq v(x) + v(y)$$
Now we have four things that any v(x) has to obey. Naturally we are led to suspect that, if v(x) obeys these four things, then v(x) is a lower envelope. But that's not true! Take an example from Huber [16]:
Consider a universe with four atoms, x1, x2, x3, x4. Now define v(x):
Even though we failed to characterize lower envelopes so far, interesting
concepts emerged. Let us call a lower/upper probability pair
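any non-negative functions $\underline{p}$ and $\overline{p}$ for which, for every event x (with complement $x^c$) and every pair of disjoint events x and y:

$$\underline{p}(x) + \overline{p}(x^c) = 1$$
$$\underline{p}(x \cup y) \geq \underline{p}(x) + \underline{p}(y)$$
$$\overline{p}(x \cup y) \leq \overline{p}(x) + \overline{p}(y)$$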
Lower probability can be defined as a primitive concept by taking the definition above as a set of axioms.
The appeal of this method is that the axioms are so close to the familiar axioms of probability that they are "almost evident": the only difference is the "$\geq$" symbol instead of the "=" symbol. This is a technical argument that has been used before.
But I believe the technical results related to lower probability can be better understood if we see how lower probability relates to sets of probability distributions. So if you arrived at this section without reading the previous discussion, maybe you should look at lower envelopes first. We can then try a new question: how does lower probability relate to convex sets of distributions? Is the relationship one-to-one? Many-to-one? One-to-many?
The first observation is, they do not relate perfectly. The example about lower envelopes shows that not every lower probability is a lower envelope, i.e., not every lower probability is the representation of a set of distributions. Huber [16] enumerates a number of results that specify when a function v(x) will be a lower envelope; but frankly these results do not help much here as they are almost impossible to understand. Actually, I believe they are impossible to understand in HTML (!!). Instead, we will be better off if we try to classify the various kinds of lower probabilities and look for more structure in this family of models.
Pick a lower probability $\underline{p}$. A probability distribution p(x) dominates $\underline{p}$ if $p(x) \geq \underline{p}(x)$ for every event x. This is equivalent to $p(x) \leq \overline{p}(x)$ for every event x.
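The domination test is easy to state in code for a finite space. The sketch below is only an illustration: the numbers are invented, the helper names are hypothetical, and the upper probability is obtained from the lower one through the usual conjugacy relation (upper of x equals one minus lower of the complement of x), which is assumed here.

```python
def prob(p, event):
    """Probability of an event (a frozenset of atoms) under distribution p."""
    return sum(p[a] for a in event)

def dominates(p, lower, tol=1e-12):
    """True if p(x) >= lower(x) for every event x listed in `lower`."""
    return all(prob(p, x) >= lo - tol for x, lo in lower.items())

def conjugate_upper(lower, atoms):
    """Assumed conjugacy: upper(x) = 1 - lower(complement of x)."""
    space = frozenset(atoms)
    return {x: 1.0 - lower[space - x] for x in lower}

def stays_below(p, upper, tol=1e-12):
    """True if p(x) <= upper(x) for every event x listed in `upper`."""
    return all(prob(p, x) <= up + tol for x, up in upper.items())

# Hypothetical lower probability on a two-atom space.
atoms = ["a", "b"]
lower = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.25,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 1.0,
}
upper = conjugate_upper(lower, atoms)

p = {"a": 0.5, "b": 0.5}    # candidate distribution
q = {"a": 0.75, "b": 0.25}  # another candidate
print(dominates(p, lower), stays_below(p, upper))
print(dominates(q, lower), stays_below(q, upper))
```

For each candidate the two tests agree, which is the equivalence stated above: checking against the lower probability and checking against the conjugate upper probability give the same answer.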
Now we have a theorem [30] that says that the set of probability distributions that dominate a lower probability is a closed convex polyhedron (possibly empty) in the simplex of all probability measures. The crucial point here is that the dominating set may be empty!
When a lower probability admits at least one probability distribution that dominates it, we say the lower probability is dominated. In this case the convex polyhedron in the theorem is non-empty. There are non-dominated lower probabilities. In this case the convex polyhedron is empty. Of course a non-dominated lower probability cannot be a lower envelope.
Is a dominated lower probability equivalent to a lower envelope? No! The example about lower envelopes shows a dominated lower probability that is not a lower envelope.
The dominated/non-dominated classification does not really give a lot of insight into the structure of lower probability. The only application of non-dominated lower probability that I know of is the modeling of flicker noise in electronic equipment [7,13,17,20], in what can possibly be the most original work ever in lower probability.
Let us return to our quest for the relationship between lower probability and sets of probability distributions. We will try a new classification of lower probability.
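Say that a function $\underline{p}$ is a 2-monotone Choquet capacity (or simply 2-monotone) if it is positive and, for all events x and y:

$$\underline{p}(x \cup y) \geq \underline{p}(x) + \underline{p}(y) - \underline{p}(x \cap y)$$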
The assumption of 2-monotonicity introduces a number of good features. First, every 2-monotone lower probability is a lower envelope. The set of probability distributions that create the lower envelope is exactly the set of all probability distributions that dominate the 2-monotone lower probability. So now we get the property we were looking for: a correspondence between a lower probability model and a convex set of probability distributions.
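On a finite space, 2-monotonicity can be checked by brute force over all pairs of events. The sketch below assumes the standard form of the condition, that the value on the union plus the value on the intersection is at least the sum of the two individual values. Both example functions and all helper names are made up for illustration.

```python
from itertools import chain, combinations

def all_events(atoms):
    """Every subset of the atoms, as frozensets."""
    atoms = list(atoms)
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1)
    )
    return [frozenset(s) for s in subsets]

def is_2_monotone(lower, atoms, tol=1e-12):
    """Check lower(x | y) + lower(x & y) >= lower(x) + lower(y) for all x, y."""
    events = all_events(atoms)
    return all(
        lower[x | y] + lower[x & y] >= lower[x] + lower[y] - tol
        for x in events
        for y in events
    )

# Hypothetical example 1: half of a uniform distribution on four atoms,
# except that the complete space still gets value one.
atoms4 = ["x1", "x2", "x3", "x4"]
space4 = frozenset(atoms4)
contaminated = {
    x: 1.0 if x == space4 else len(x) / 8 for x in all_events(atoms4)
}

# Hypothetical example 2: three atoms, every pair gets 0.625, singletons get 0.
atoms3 = ["a", "b", "c"]
pairs_heavy = {
    x: {0: 0.0, 1: 0.0, 2: 0.625, 3: 1.0}[len(x)] for x in all_events(atoms3)
}

print(is_2_monotone(contaminated, atoms4))
print(is_2_monotone(pairs_heavy, atoms3))
```

The second function fails the test at x = {a, b}, y = {b, c}: the left side is 1 + 0 while the right side is 1.25.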
Second, we can now define a lower distribution function:
Suppose we have a loss function $l(x)$ and we want to obtain its expected loss for all probability distributions that dominate a 2-monotone $\underline{p}$. Since $\underline{p}$ is dominated by a convex polyhedron of distributions, the expected losses will span an interval, from a minimum to a maximum value in the real line. The minimum and maximum values are respectively the lower and upper expectations of the set of dominating distributions! In the world of 2-monotonicity, we can tie the concepts of lower probability and lower expectation.
To obtain the lower and upper expectations of a 2-monotone lower probability $\underline{p}$, we compute [30]:
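One standard way to carry out this computation on a finite space is the discrete Choquet integral. The sketch below is written under that assumption and is not necessarily the exact formula given in [30]; the loss values, the lower probability and the function names are invented for illustration. The upper expectation is obtained as minus the lower expectation of the negated loss.

```python
def lower_expectation(loss, lower):
    """Discrete Choquet integral of `loss` with respect to `lower`.

    Atoms are sorted by increasing loss. With A_i the set of atoms from the
    i-th smallest loss upward, the integral is the sum over i of
    loss_i * (lower(A_i) - lower(A_{i+1})), where the last A is empty.
    """
    order = sorted(loss, key=loss.get)
    total = 0.0
    for i, atom in enumerate(order):
        this_set = frozenset(order[i:])
        next_set = frozenset(order[i + 1:])
        total += loss[atom] * (lower[this_set] - lower[next_set])
    return total

def upper_expectation(loss, lower):
    """Upper expectation as minus the lower expectation of the negated loss."""
    negated = {atom: -value for atom, value in loss.items()}
    return -lower_expectation(negated, lower)

# Hypothetical 2-monotone lower probability on two atoms.
lower = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.25,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 1.0,
}
loss = {"a": 2.0, "b": 6.0}

print(lower_expectation(loss, lower))
print(upper_expectation(loss, lower))
```

In this toy case the dominating distributions are those with p(a) between 0.25 and 0.5, so the expected loss 6 - 4 p(a) ranges from 4 to 5, and the two functions return exactly these endpoints.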
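The concept of 2-monotonicity can be generalized. Call a positive function $\underline{p}$ an n-monotone Choquet capacity if, for any events $x_1, \ldots, x_n$:

$$\underline{p}\left(\bigcup_{i=1}^{n} x_i\right) \geq \sum_{\emptyset \neq I \subseteq \{1,\ldots,n\}} (-1)^{|I|+1} \, \underline{p}\left(\bigcap_{i \in I} x_i\right)$$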
This is not intuitive at all. But it has some nice properties. Any lower probability is 1-monotone (and of course 2-monotone lower probabilities are 2-monotone capacities!). If a capacity is (n+1)-monotone, then it is n-monotone.
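Since the general condition is hard to read, a brute-force check on a small finite space may help. The sketch below assumes the usual inclusion-exclusion form of n-monotonicity: the value on the union of n events must be at least the alternating sum of the values on all their intersections. The two example functions and the helper names are hypothetical, and the loop is exponential in n, so it only suits tiny spaces.

```python
from itertools import chain, combinations, product

def all_events(atoms):
    """Every subset of the atoms, as frozensets."""
    atoms = list(atoms)
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1)
    )
    return [frozenset(s) for s in subsets]

def is_n_monotone(lower, atoms, n, tol=1e-12):
    """Brute-force test of the inclusion-exclusion inequality for n events."""
    events = all_events(atoms)
    for xs in product(events, repeat=n):
        union = frozenset().union(*xs)
        rhs = 0.0
        for k in range(1, n + 1):
            for idx in combinations(range(n), k):
                inter = frozenset.intersection(*(xs[i] for i in idx))
                rhs += (-1) ** (k + 1) * lower[inter]
        if lower[union] < rhs - tol:
            return False
    return True

atoms = ["a", "b", "c"]
space = frozenset(atoms)

# Hypothetical example 1: 0.5 for events containing "a", 1 for the whole space.
bel = {
    x: 1.0 if x == space else (0.5 if "a" in x else 0.0)
    for x in all_events(atoms)
}

# Hypothetical example 2: every pair gets 0.625, singletons get 0.
pairs_heavy = {
    x: {0: 0.0, 1: 0.0, 2: 0.625, 3: 1.0}[len(x)] for x in all_events(atoms)
}

print([is_n_monotone(bel, atoms, n) for n in (1, 2, 3)])
print([is_n_monotone(pairs_heavy, atoms, n) for n in (1, 2, 3)])
```

The second example passes for n = 1 but fails for n = 2, and therefore also for n = 3, in line with the remark that (n+1)-monotonicity implies n-monotonicity.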
If a lower probability is n-monotone for all n, then it is called infinite monotone, or a belief function. This is exactly the kind of model that Dempster-Shafer theory uses. But be warned: Dempster-Shafer theory uses the mathematical structure, but as far as I can see, its interpretation of the functions has nothing to do with probability or decision theory. This is a matter of lively debate. I will not discuss the philosophy behind Dempster-Shafer theory here.
In this section, we kept asking the question: what is the relationship between credal sets and various interval-based generalizations of probability? In the process of answering it, we met several important entities. We have the following:
If a function is a probability, then
it is a belief function; then
it is n-monotone for all n, and in particular 2-monotone; then
it is a lower envelope; then
it is a lower probability.
BUT none of this can be reversed (a lower probability may not be an envelope, an envelope may not be a belief function, etc.). The following diagram may be useful: