“It is remarkable that a science, which commenced with a consideration of games of chance, should be elevated to the rank of the most important subjects of human knowledge. …we shall see that there is no science more worthy of our meditations, and that no more useful one could be incorporated in the system of public instruction.”

A Philosophical Essay on Probabilities, Marquis de Laplace 1819

“Our beliefs may predispose us to misinterpret the facts, where ideally the facts should serve as the evidence upon which we base beliefs.”

How to Think about Weird Things, Theodore Schick 1994

We are going to explore a most extraordinary thing; it masquerades as an equation yet produces philosophy, seems abstract yet finds application in the most unlikely of places, and is equally useful to mathematicians and non-mathematicians alike. It requires no more than high school math, yet it is subtle and far reaching. How far reaching? In addition to its very real-world effectiveness, it transforms perennial questions in philosophy concerning epistemology and the foundations of logic’s bugaboo, induction. Applying it to one’s own thinking processes has been shown to improve psychological wellbeing. As it becomes more widely understood and appreciated it might even improve the quality of our social decision making and communications. These are extraordinary claims, and as Carl Sagan wisely pointed out, “extraordinary claims require extraordinary evidence”. Ah, just so. Claims, likelihood, evidence – these are the subjects at the heart of this book.

This is designed to be a handbook for using an extraordinary conceptual tool, Bayes Theorem. Learning any tool is best accomplished through using it, so throughout this book we put it into practice as a magnifying glass, a telescope and a microscope. Part of the uniqueness of this particular tool is how widely it applies across intellectual concerns. Our journey will take us far and wide through tours of philosophy, biology, information, computation, cognitive science, psychology and sociology. Does this mean we will be cranking through one math equation after another? Hardly; there are plenty of math teachers more qualified than I for those who want to pursue the field of Bayesian probability in its full mathematical form. Instead, we will explore Bayesianism as a way of thinking. This book is a how-to manual for learning an intellectual technique, and it is my hope that by the time you have completed it you will have gained a valuable addition to your own cognitive tool belt. Tools are best learned through use so…

Please rank how likely you consider the following statements on a scale of 1 to 10 with 1 being most certainly false and 10 being most certainly true. Fractional values are allowed, indeed encouraged.

- There is intelligent life in the universe beyond the confines of planet earth.
- There is intelligent extra-terrestrial life and it has visited the planet earth.
- Psychics are able to communicate with the dead.
- The Sun is approximately 4.57 billion years old.
- There is as much energy released in the few instants of a supernova as our Sun will produce over its 10 billion year lifetime.
- The world supply of oil will reach its peak of production within a decade or two.
- Human consciousness is really experiencing a virtual reality; we exist in a Matrix-like computational world.

We will return to this little cognitive science experiment in rating how plausible you consider a set of propositions. It illustrates a traditional use of probability, one that appeals to our common sense. In the 1950s this classical interpretation of probability was considered too subjective to be used in science and was discarded by what is now considered the orthodox school of statistics. In the original development of mathematical probability theory in the 1700s it was applied to two sets of problems: measuring the frequency of so-called random events *and* measuring the uncertainty of a proposition given the information pertaining to it. The rise of this orthodoxy discarded the work involved with the logic of propositions and retained only that which was related to the frequency of events. The theoretical basis of the frequency theory of probability had been developed in the 1920s by Richard von Mises and its application in statistics brilliantly developed by Ronald Fisher. Interestingly, the same decade saw the publication of the ‘Treatise on Probability’ by John Maynard Keynes, which taught and defended the original concept of probability as a logic pertaining to propositions (Mises 1928, Fisher 1925, Keynes 1921). Due to the domination of the orthodox school over the last century, probability today is typically understood to relate only to the frequency of an event, “If I flip a coin three million times…” The discarded half of the traditional development of the subject was the study of how to deal with the uncertainty in human knowledge. This larger traditional understanding is once again receiving considerable attention in economics, neuroscience, imaging research, physics, cognitive science, robotics, machine intelligence and many other fields of research. Though the orthodox textbooks have for the most part yet to catch up, the influence of this conceptual revolution has started filtering its way throughout our society.
The treatment of probability as logic provides techniques that can be used when the available data is sparse, or contains a large number of unknowns or when the need is to estimate the probability of non-repeating or rare events. In all these circumstances the frequency approach is inadequate, yet these are just the types of circumstances encountered regularly in modern computer aided research. These techniques with their roots in the other half of the traditional development of probability theory are often referred to as Bayesian.

In 1702 one of the first six non-conformist ministers ordained in England celebrated the birth of his first child, Thomas Bayes. He was to follow in his father’s footsteps, being ordained a non-conformist minister, and in 1733 became minister of the Presbyterian Chapel in Tunbridge Wells, outside of London. The Reverend Bayes produced a paper that history only notes due to one of those serendipitous events that so often accompany intellectual discovery; after the Reverend’s death a friend, Richard Price, found the paper among his effects and sent it to the Royal Society. “Essay towards solving a problem in the doctrine of chances” was published posthumously in the Philosophical Transactions of the Royal Society in 1764. The paper begins,

“PROBLEM

*Given* the number of times in which an unknown event happened and failed:

*Required* the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named.”

The Bayesian theory that will concern us was fully developed long after the time of the discovery of ‘reverse conditionals’ in his paper. The field bears his name because the solution he found is critical to the later developments at the hands of Sir Harold Jeffreys, Poincaré, Cox, Jaynes and Pearl. These researchers built on the works of the earlier probability theorists, most of whom today we would consider mathematical physicists: Laplace, Daniel and James Bernoulli, Poisson, Gauss, Boltzmann, Maxwell and Gibbs.

The contrast between the recent and traditional views can easily be seen by considering the old chestnut of probability theory, the toss of a die. What is the probability that the die will land with the number 2 on top? There are six sides on the cubical die and, assuming it is fairly constructed and fairly tossed, each is equally likely, so the probability of a 2 is 1/6. In this trivial example all schools of thought agree with the numeric conclusion. Where they disagree is in what this number means. The frequentist approach explains it by an appeal to a long run of tosses; if the die were to be tossed a great many times one would find that it lands with the number 2 on top one sixth of the time. The Bayesian points out that there are many unjustified assumptions involved in this frequency as a definition of probability. When the die is tossed only once it still makes sense to speak of the probability of it showing 2 as one sixth, but then of what relevance are the great many tosses which did not in fact occur? Just how many tosses are needed to give ‘a great many times’ a precise definition? Stranger still, this interpretation seems to see the probability as a property of the die, as real as say its mass or size. Probability by this account is a most peculiar property unlike those recognized in physics, a spooky kind of “propensity” postulating cause and effect relations unlike anything known to science. For the Bayesian the explanation of probability presents none of these conceptual difficulties. They teach us to look more closely at the circumstances. A die, like all gambling devices, is designed to produce unpredictable outcomes. There is no need to suggest that the laws of mechanics are insufficient, no need to add a metaphysical postulate. What is going on is simply that tracing out the consequences of those laws exceeds our practical powers of calculation.
The outcome of any single toss depends on the angle of the hand, the velocity imparted, height of the throw, number of bounces on the table, and the frictional forces it encounters as it bounces along – to mention but a few of the mechanical factors involved. The probability of tossing a 2 being one sixth is a measure of *our* uncertainty about how all these complex causal relationships will play out. It is the most we can say given the data we have. From the Bayesian perspective the traditional view is guilty of a ‘mind projection fallacy’ by seeing in the die a property of ‘randomness’ instead of recognizing the rather trivial physics of the situation and accurately assigning the probability to our knowledge, not the die itself.

Recognizing probability as a statement about our level of uncertainty, instead of as a characteristic of certain phenomena out in the world, is what gives the Bayesian approach the power it has to address numerous problems that are highly complex or simply have no solutions in traditional statistics. In traditional statistics one often needs to find the best estimate of a parameter, a fixed but unknown value. Since the traditional view allows the application of probability only to so-called random variables, and the parameter being sought is fixed, it does not make sense in this school of thought to apply a probability to it. Traditional statistics instead seeks an estimator function, whole families of which are required since each type of problem requires a different estimator. It is not always possible to find the correct estimator function at all. Probability as logic stands in sharp contrast to this approach. It recognizes that we can use probability to describe what it is we know about the parameter being sought. In this alternative school of thought it makes as much sense to apply the same probabilistic techniques in finding the uncertain parameter as we would in seeking the value of an uncertain variable. This same difference between the approaches applies to dealing with nuisance parameters, performing hypothesis testing, and finding sufficient statistics. Where the Bayesian position solves all these problems by appealing to the same set of axioms, the traditional approach cannot appeal to a single theory. The result is that the traditional approach has had to develop a collection of ad hoc solutions where one set of techniques is used for parameter estimation, another for hypothesis testing and so on.
These inference recipes of the traditional school include maximum likelihood estimation (MLE), analysis of variance (ANOVA), fiducial distributions, the randomized design of experiments, unbiased estimators, confidence intervals, and whole families of significance tests. Just when to apply which recipe in complicated circumstances is still hotly debated, and if the person with data to analyze cannot find a published recipe that applies to their problem they need to seek the services of a professional statistician. For the working Bayesian any inference problem is solved the same way: calculate the probability of what is of interest but unknown, conditional on what is known and relevant to the problem. The application of Bayes theorem provides guidance on how to proceed every time.

Needing to reason in the face of incomplete information is the perennial human condition. Most of what concerns us matters considerably more than which number a die will show when it is tossed. More typically throughout evolutionary time we have been asking ourselves questions like: which side of the river is the Woolly Mammoth likely to be on? How many of us with our pointy sticks is it going to take to bring home the Mammoth bacon? When the questions are important to us we care about the answers and we use the most careful thinking of which we are capable. This thinking is our much vaunted reasoning, the characteristic that earned our species the scientific name *Homo sapiens sapiens,* the thinker who thinks about thinking. (The Latin *sapiens* is the present active participle of *sapiō*, to discern, or be capable of discerning.) As this meta-cognition has proceeded we have developed logic as an explanation of what it is we do when we carefully reason.

When rationally seeking the truth of a postulate we have at our disposal two logical tools. The first is deductive logic, which allows us to derive conclusions as illustrated in the well known syllogism: Socrates is a man – all men are mortal – therefore we deduce that Socrates is mortal. This is unassailable logic delivering certainty in our conclusions. Deductive logic only requires that the conclusion follows necessarily from the premises. Unfortunately most questions that are truly interesting and of great moment in our lives are different; the answers we seek do not follow necessarily from the premises. In these circumstances we use the second logical tool, inductive logic. This logic allows us to draw inductive conclusions which do not lend themselves to the simple yes or no answers of certainty. We say a deductive argument is valid or invalid but that an inductive argument is weak or strong. The degree of strength in an inductive argument comes directly from the evidential relationship between its premises and the conclusions (Skyrms 1986). This is the way of human life, where from a few tracks in the mud we seek to determine the whereabouts of the Woolly Mammoth. This is the logic of science where, from a limited set of observations and experimental results, we seek generalizations on which to base a theory. Consider this example of an inductive argument: ten thousand times the acceleration of a falling apple was measured to be 9.8 meters per second per second – chances are measuring it another ten thousand times will produce the same result – therefore the acceleration due to gravity everywhere and every when is 9.8 meters per second per second. This is not unassailable logic; the conclusion does not follow necessarily from the premises. The premises included the telling phrase ‘chances are’ and with the introduction of chance we have moved out of the realm of certainty and into the realm where all that can be logically asserted is a probability.
In fact the acceleration example is incorrect; it does not apply everywhere but only on earth, or within the gravity field of an earth-like mass. The correct conclusion is of course Newton’s brilliant insight that the force of gravity is proportional to the product of the masses involved and falls off with the square of the distance between them. This historical achievement of inductive logic well illustrates its characteristics: the nature of the treacherous missteps that can lead to invalid conclusions, as well as its necessity and the value it brings when properly applied.

There are two points relevant to our subject. The first is that in spite of the weakness of inductive logic it is the best tool we have for constructing theories. We all are caught up in theory making as we navigate through life; it is not a problem we can just shuck off on the scientists. The problem of induction has a long philosophical history and presents a very real problem for any coherent account of human knowledge. I have become persuaded that Bayesian thought provides a valuable contribution to understanding the problem of induction. It recognizes that there is an algorithm to guide us but no single solution that applies to all problems. This proposal was laid out in one of the most delightful mathematical textbooks of all time, Edwin Jaynes’ *Probability Theory: The Logic of Science* (Jaynes 2003). It is the inspiration behind much of what is to follow. The second point is that inductive logic inescapably includes a factor of uncertainty and this precludes absolute knowledge. The nature of human knowledge is constrained on all sides. Our evidence can never be complete, if for no other reason than that only a limited amount of time is available in which to make our observations and experiments. When we seek to understand the regularities of life, the universe and everything we necessarily rely on inductive logic, and inductive logic can only justify a particular degree of belief in the conclusions it reaches, “a degree of partial entailment” (Keynes 1921).

It is important to be clear on what is and is not being stated by introducing the concept of degrees of rational belief. The degree of belief applies to our assessment of a proposition. The proposition itself, the fact in the world to which it applies, is either true or false. A person on trial is either guilty or not, extraterrestrial life either exists in the universe or it does not, psychics can communicate with the dead or they cannot, regardless of our opinions. In Bayesian thought it is muddled thinking to confuse these two, the degree of belief is not a degree of truth. To say we have 80% confidence in the proposition ‘it will rain tomorrow’ is not to say it is 80% true. It is to say we *believe* there is an 80% chance of rain tomorrow. It will either rain tomorrow or it will not and the day after tomorrow we will know with 100% certainty whether it had rained. Fuzzy logic is the subject that tries to make sense of a statement like 80% true. This is not at all the same subject as Bayesian thought.

“In frequentist statistics, a hypothesis is a proposition (which must be either true or false), so that the (frequentist) probability of a frequentist hypothesis is either one or zero. In Bayesian statistics, a probability can be assigned to a hypothesis.” – Wikipedia, Bayesian Probability

To illustrate the insight the Bayesian approach brings to the problem of induction let’s spend just a moment with swans. From the observation of 100 white swans are we justified in drawing the conclusion that all swans are white? How about from 10,000 observations? A million? Assuming the number of past observations is N, what is the value of one more, N + 1?

These swans are somewhat famous among those who are fascinated with probability and inductive reasoning. The Roman poet Juvenal, writing in Latin in the late first and early second century CE, described “a rare bird in the lands, and very like a black swan.” The phrase became a common expression for something impossible throughout all the centuries in which Europeans did not know black swans existed. The whole apple cart was overturned in 1697 when the Dutch explorer Willem de Vlamingh discovered black swans in Australia. With this single observation the conclusion that all swans are white was disproved, and the term black swan came to be used for the power of falsification (à la Popper) and associated with the undoing of reasoning and inductive logic when any of its fundamental postulates were found to be incorrect. In philosophy this is known as the black swan problem; a singular statement (‘there is a white swan’) cannot be used to affirm a universal statement (‘all swans are white’) but it can be used to show that the universal statement is false – observe one black swan and the hypothesis that all swans are white is categorically shown to be false by deductive logic’s modus tollens. We will have a chance to explore this when we take up the logic of science. There we will see how the Bayesian acceptance of inductive logic provides a viable alternative to Popper’s falsification, a philosophy of science that only admits deductive logic.

One of the results that has made Bayesian statistics popular among scientists and engineers is that it avoids some of the illogical conclusions that alternative forms of probability and statistics are prone to. The classical interpretation of probability is frequentist, as we saw with the die earlier. It defines a probability as how frequently something will occur over the long run of repeated trials. For example, if a fair coin is flipped many, many times the number of heads will tend to equal the number of tails, so the probability of heads is 1/2. What are the chances that in a given throw of a die it will land with 2 facing up? There are six sides to a die and one has the 2, so we say the chances are 1 out of 6. The frequentist interpretation is that 1/6 of the time in a long run of throwing the die it will land with the number 2 uppermost. The model is simply (number of times the target is observed) / (total number of observations). For the swans:

Applied to the problem of the black swan, classical statistics would seek the maximum likelihood, specifically the Bernoulli form capable of handling binary outcomes. Suppose we have seen three white swans, N = 3. What is the probability that the next observed swan, N + 1, is black? Setting up the frequentist equation: 0 / 3 = 0, which is the mathematical way of claiming that it is impossible, the very issue that gives rise to the problem of the black swan. If N = 100 this model produces the same result, 0 / 100 = 0. It is insensitive to the evidence involved.
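As a minimal sketch, this frequentist estimate can be written out in a few lines of Python (the function name `mle_probability` is my own illustrative choice, not from the text or any statistics library):

```python
# Frequentist (maximum likelihood) estimate of a Bernoulli probability:
# simply the observed relative frequency, successes / trials.
def mle_probability(successes: int, trials: int) -> float:
    return successes / trials

# Three swans seen, none of them black -> "a black swan is impossible"
print(mle_probability(0, 3))    # 0.0
# One hundred swans seen, none black -> the estimate is unchanged
print(mle_probability(0, 100))  # 0.0
```

However large the sample, zero observed successes yield an estimate of exactly zero; this is the insensitivity to evidence just described.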

The Bayesian approach differs and is more capable of handling even the small sample set where N = 3. Frequentist approaches are very powerful but include a reliance on large data sets, at least in theory. This is what it means to say ‘in the long run’ or ‘over very many observations.’ In the limit the probability becomes well defined, but extending a very small set of observations to the limit is logically problematic. The Bayesian approach to Bernoulli trials makes use of what is called a Beta distribution. Distributions are a way of characterizing the spread of probabilities. Treating how the equation is derived as a black box (using a Beta(1,1) prior), the resulting general form for the probability that the next observed swan is black is:

p(next swan is black) = (black swans observed + 1) / (black swans observed + white swans observed + 2)

With three white swan observations the answer differs from impossible: (0 + 1) / (0 + 3 + 2) = 1/5 or 20%. Having seen three white swans, the probability that the next observed swan is black is 20%. As we see more and more white swans the chance of observing a black swan becomes more and more improbable; with 10 white swans it is 1/12 or 8%, with 100 white swans 0.98%. The Bayesian model is sensitive to the number of observations, the evidence involved. We expect each confirming observation of a white swan to lower the probability of encountering a black one. The progression captures the intuitively correct conclusion without introducing an absolute judgment about impossibility. Even with one million observations of white swans without a single black the probability is 0.00000099, tiny enough to perhaps qualify as practical certainty, but not zero. Being sensitive to the evidence also provides another characteristic we intuitively expect. When there are very few data points the addition of a new one has a large effect, each new observation adding substantially to our knowledge. Increasing the number of observations from three to ten changed the probability from 20% to 8%. On the other hand if there have already been a large number of observations we would expect one more to have less of an effect; that additional observation is not adding as much new evidence to our knowledge so carries less weight. Having seen one hundred white swans, observing one more will only change the probability of a black swan very slightly, from 0.98% to 0.97%.
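This progression can be checked with a short sketch of the Beta(1,1) posterior predictive, also known as Laplace's rule of succession (the function name is illustrative, not from any library):

```python
# Posterior probability that the next swan is black, given a Beta(1,1)
# prior over the Bernoulli parameter:
# (blacks_seen + 1) / (blacks_seen + whites_seen + 2)
def prob_next_black(blacks_seen: int, whites_seen: int) -> float:
    return (blacks_seen + 1) / (blacks_seen + whites_seen + 2)

# Reproduce the progression from the text:
print(prob_next_black(0, 3))          # 1/5  -> 20%
print(prob_next_black(0, 10))         # 1/12 -> about 8%
print(prob_next_black(0, 100))        # 1/102 -> about 0.98%
print(prob_next_black(0, 1_000_000))  # tiny, but never exactly zero
```

Note the contrast with the frequentist 0/N: the estimate shrinks steadily toward zero as white swans accumulate, yet never declares a black swan impossible.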

The technically minded should consult section 12.4.3 in Jaynes (2003) where he discusses when the indifference prior used here is appropriate for Bernoulli trials. He finds that it applies when we know it is physically possible for there to be both a success and a failure. The above assumes it is physically possible for there to be a black swan before its first observation. It could be argued that before the observation of the first black swan the ‘pre-prior’ of Jaynes’ equation 12.50 applies instead, which leads to the frequentist result. Jaynes’ analysis addresses the so-called problem of Laplace’s rule of succession as it relates to invariance of the prior under transformation groups, a mathematics that need not concern us here, though readers familiar with this ‘problem’ and troubled by it are encouraged to review how it is understood in modern Bayesian analysis. Surprisingly, the result shows that frequency statistics in practice admit only the succession prior, and in problems where this is the correct approach the Bayesian and classical mathematics derive the same final results.

There is much more going on here than simply having sample equations arrive at different answers. A wholly different philosophy is involved. The frequentist school defines probability in such a way that it only makes sense to assign probabilities to repeatable events. In Bayesian thought the domain of probability theory is greatly expanded by defining probability as degrees of belief. Using this definition it makes sense to assign probabilities to propositions directly, making the proposition more or less plausible (Cox 1961). This shift interprets probability as logic, justifying statements about plausibility which may or may not be applied to repeatable events. Any well formed question is dealt with correctly by the Bayesian machinery, since such questions are well defined under its interpretation of probability. This extends the domain of applicability for probability theory to the many areas of experience in which we must form opinions or make decisions about singular situations. This is one of the reasons Bayesianism is becoming more widely used by scientists and engineers, who often need to deal with sparse data. How strong should the conclusions be that are drawn from observing a single supernova? How much risk is involved in a possible volcanic eruption that has never yet occurred? The same situation occurs in many of the circumstances we encounter every day. Should I take this job, go to this college, marry this person, do I have this disease, should I bet on this roll of the dice – all are questions concerned with singular events, many of which have little or no previous data on which to base our decision.

When there is no previous data directly related to the question we are pondering are we left with nothing? Do we approach the question as a pure blank slate? Bayesian thinking recognizes that it is impossible for real humans in the real world to be wholly objective observers. Nor would it be desirable if we could be. We bring prior beliefs to bear on the question of whether a given hypothesis is likely to be true or not. We bring the sum of our previous valuable and relevant experiences. In the sciences previous theory and experimental results will strongly influence the assessment of how likely a hypothesis is. The same occurs outside of science as we bring experience and past considerations to bear on any question we might ask. There is always some prior probability involved in our reasoning. Sometimes the prior belief is that all hypotheses are equally likely, in which case we use what is called an indifference prior. The Beta(1,1) used in the example of the black and white swans is one such indifference prior; assuming a flip of a coin will come up heads 1/2 of the time is another. Accounting for this prior is the defining characteristic of the Bayesian approach.

In our search to discover truth we perform experiments or make observations to gather data that relate directly to the hypothesis we are considering. This data is the evidence that will be involved in the process of evaluating the plausibility of the hypothesis. The evidence could arise from an enormous, though not infinite, set of possible causes. Considering all these possible causes produces some probability for the evidence itself regardless of the actual cause. This differs from the likelihood that this particular evidence would be observed given that the hypothesis being considered is in fact true. I might consider 1,000 observations of white swans to describe the true state of the world’s swan population as 60% likely, whereas the same observations, given that the hypothesis that all swans are white is true, I may assess as being closer to 99% likely, leaving a percentage point for the odd non-white mutation. So there is the probability for the evidence itself, p(evidence), and the probability for the evidence given that the hypothesis is true, p(evidence | hypothesis), where the ‘|’ is read “given that.”

Bayes theorem is used to determine *how likely the proposed hypothesis is given the evidence*. It transforms the prior probability into what is referred to as the posterior probability. In symbols it calculates p(hypothesis | evidence). Consider the implications of those italicized words. It captures all rational striving for human understanding.

All the conceptual building blocks are now in place to introduce Bayes theorem. In words: Bayes Theorem states that the probability of a hypothesis given the observed evidence is equal to the probability of the evidence given the hypothesis, times the prior probability of the hypothesis, divided by the probability of the evidence given any hypothesis. In the artfully succinct symbolism of mathematics, Bayes Theorem looks like this, in which p( ) is the probability, H the hypothesis, E the evidence:

p(H | E) = p(E | H) × p(H) / p(E)

p(H) is the prior probability that H is correct before taking into account the current evidence E.

p(E | H) is the conditional probability of seeing the evidence E given that the hypothesis H is true, often called the likelihood.

p(E) is the marginal probability of the evidence, how likely this particular evidence is without respect to the current hypothesis or under the condition of any possible hypothesis.

p(H | E) is the posterior probability, the result. It provides the probability that the hypothesis is true given the evidence and the previous belief in the hypothesis.

Are all swans white? Instead of the analysis worked out earlier, here is how Bayes rule might be used to provide a back of the envelope estimate. Let’s assume we have no idea if there is a biological law that dictates all swans must be white, so the prior probability of the hypothesis is indifferent – 50/50, maybe, maybe not. We assign p(E) at 60% and p(E | H) at 90%. Although this is a toy problem let’s pause a moment to justify the numbers. The time is pre-1697 and we consider that although only white swans have been reported much of the world has yet to be thoroughly explored. We also notice that many species of birds are found to sport a variety of colors. Together these considerations lead us to assign the probability of our current observations, which consist of only white swans, at p(E) = 60%. On the other hand if the hypothesis that all swans are white is in fact true then the probability of the observed evidence rises closer to certainty; allowing for the occasional radical mutation in nature we consider p(E | H) to be 90%. The result of cranking the mathematical machinery is that it is reasonable to hold the hypothesis that all swans are white as having a 75% chance of being true given our current evidence: (.9 × .5) / .6 = .75.
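The back-of-the-envelope arithmetic can be cranked through directly from Bayes Theorem; a minimal sketch (the function name `posterior` is my own, and the three numbers are the toy assignments from the text):

```python
# Bayes Theorem: p(H|E) = p(E|H) * p(H) / p(E)
def posterior(p_e_given_h: float, p_h: float, p_e: float) -> float:
    return p_e_given_h * p_h / p_e

# "All swans are white", pre-1697:
# likelihood p(E|H) = 0.9, prior p(H) = 0.5, evidence p(E) = 0.6
p = posterior(p_e_given_h=0.9, p_h=0.5, p_e=0.6)
print(round(p, 2))  # 0.75
```

Changing any of the three inputs and re-running shows how the posterior shifts with the prior, the likelihood, or the marginal probability of the evidence.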

The power of this simple equation is in how its terms are able to deal with some of the most meaningful and important aspects of human experience. It deals with knowledge and how it affects the beliefs we hold. Beliefs in turn affect the goals we pursue, the behaviors we engage in and the attitudes we bring to every social exchange. Desire and emotion work together with reason to determine which goals we will choose, but once we begin planning how to achieve these goals it is our beliefs that guide us every step of the way.

### Tool for the challenges of the 21st century

“Those who can make you believe absurdities can make you commit atrocities. … Men will cease to commit atrocities only when they cease to believe absurdities.” – Voltaire

In making explicit the process by which beliefs are revised due to new information, the Bayesian equation provides a prophylactic for the information age. An intellectual minefield has been exposed in the material of the internet, where any belief, however fanciful and divorced from reality, can find its salesmen. On the other hand, the internet has also made more of the best of human knowledge available than at any time in history. Using this ever-growing collection of information well has become a critical skill for the 21st century, both for individuals and societies. It is all too easy to use the immediate access to information provided by our electronic devices to restrict ourselves to only those sources that support our prior beliefs. Education consists of just the opposite behavior. An honest desire to learn true things involves bringing one's current beliefs into contact with alternatives to see how they fare. This is the process that brings nuance to understanding, clarifying the strengths and limitations of our chosen justifications. The hope is that in the end this often difficult and disorienting process will lead to that ambiguous Holy Grail, wisdom. Bayes rule models this process, making explicit how new information interacts with the beliefs that make up one's prior preconceptions.

Foreshadowing some of the later discussion, people often express surprise when a scientist insists that there are no certainties in science. Among scientific laws and theories some are extremely probable, to the point of practical certainty, but it is the nature of science not to cross that final boundary and claim absolute certainty. By accepting the inherent limitations of human reason and logic, science only makes probabilistic statements. One accepts plausibility in place of certainty. Conversion to a probabilist position is in part learning to think scientifically.

By focusing our attention on the nature of inductive logic, the Bayesian position removes assertions of absolute certainty from rational communications. As a society, few would disagree that our democracy would be strengthened if more public discourse concerned itself with assessing the probabilities involved in our challenges and risks and spent less time shouting absolutes at one another. The probabilist's way of thinking includes an element of intellectual humility, honesty and integrity that accepts the human condition. It is an attitude capable of dealing with the threats of fundamentalisms of every stripe. It roots out the last refuge of scoundrels wherever they may hide. The episode ‘Knowledge or Certainty’ in Polish-born philosopher and mathematician Jacob Bronowski's film *The Ascent of Man* concludes with him on the grounds of the Nazi concentration camp of Auschwitz. Standing before the pond in which the ashes of four million human beings were scattered, including his relatives, he claims that there are two things our species must always be on guard against. The first is the noxious idea that the end can justify the means. The second is our temptation to claim “monstrous certainty.” As he so eloquently summed up the task at hand, “We have to cure ourselves of the itch for absolute certainty and power.” I submit that this will become a critical skill for the 21st century if we are to avoid the carnage of the 20th.

It is time to revisit the little cognitive science experiment which began the chapter. Below are the same propositions, but this time accompanied by evidence to be considered. Observe how the additional evidence alters how you rank each proposition. With god-like certainty banished, the new ranking scale runs from 0.01 for almost certainly false to 9.99 for almost certainly true.

- There is intelligent life in the universe beyond the confines of planet earth.
- The search for extra-terrestrial intelligence (SETI) has not found evidence of technological societies anywhere in decades of searching space for their signatures.

- There is intelligent extra-terrestrial life and it has visited the planet earth.
- The New York Times has reported an alien landing at the White House.
- The supermarket tabloid Weekly World News has reported an alien landing at the White House.

- Psychics are able to communicate with the dead.
- Dr. Mysterious was able to convince a widow in the audience he was communicating with her deceased husband because he knew how much they enjoyed sharing weekend breakfasts together.

- The Sun is approximately 4.57 billion years old.
- The radiometric date of the oldest solar system material is 4.567 billion years.

- There is as much energy released in the few instants of a supernova as our Sun will produce over its 10-billion-year lifetime.
- The energy is so large that a unit has been coined to deal with it, the Foe. One Foe = 10^51 ergs; thus the name: Fifty-One Ergs. The core collapse generates 100 Foe, about one of which is captured by the star (and which destroys the star). The other 99 Foe escape as neutrinos. To give you an idea of the scale of 100 Foe: the entire output of our sun, from fusion ignition to today, is about 0.03 Foe. The total output, from ignition to white dwarf, will be about 0.1 Foe. (http://ask.metafilter.com/22940/Tiger-tiger-burning-brightIn-the-forests-of-the-night accessed 2.20.2011)

- The world supply of oil has reached its peak production right about now, give or take a decade or two.
- M. King Hubbert in 1956 predicted that the production of oil in the United States would peak between 1965 and 1970. The peak was achieved in the early 1970s. Using the same model, most predictions of the worldwide oil peak place it in 2020 or slightly later.

- Human consciousness is really experiencing a virtual reality; we exist in a Matrix-like computational world.
- In all the centuries of philosophy no single air-tight argument has been formulated to refute this idea, a variant of solipsism.

- You are deeply in love with another human being.
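The exercise above turns on how much each piece of evidence should move a belief. A minimal sketch, using invented numbers for the alien-landing pair, shows why the same report shifts the posterior far more when it appears in a careful newspaper than in a tabloid: what matters is how likely the evidence would be if the hypothesis were false. All the probabilities here are hypothetical illustrations, not measurements.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """Bayes update, computing p(E) by total probability:
    p(E) = p(E|H) p(H) + p(E|not H) p(not H)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

prior = 0.001  # a deliberately low prior for "aliens landed at the White House"

# A careful newspaper almost never reports a landing that did not happen;
# a tabloid prints such stories routinely whether or not they are true.
times_posterior = update(prior, p_e_given_h=0.9, p_e_given_not_h=0.0001)
tabloid_posterior = update(prior, p_e_given_h=0.9, p_e_given_not_h=0.5)

# The newspaper report pushes the posterior above 0.9;
# the tabloid report barely moves it off the prior.
print(times_posterior, tabloid_posterior)
```

Changing `p_e_given_not_h` is the whole story here: evidence that is nearly as likely under the false hypothesis as under the true one carries almost no weight.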