Cheapest Hostings: Names, damned names, and statistics

When I tell people that I'm a statistician, the usual response is a blank stare. Explaining that I work with statistics only makes matters worse. Those who have been exposed to statistics at university often blurt out "I had to take a statistics course -- and I hated it!" What they remember of the course is mostly that it was boring and there were a lot of formulas. Those who have had no formal exposure to statistics seem to think it might have to do with collecting and tabulating figures, like sports statistics or national economic figures. This isn't completely off the mark, but by itself it's a poor description of what statisticians do.

And about half the time, mention of "statistics" elicits the helpful response: "There are lies, damned lies, and statistics!" (Often attributed to Mark Twain, but apparently originally from Benjamin Disraeli.) Something of a variant on this is the claim that "You can prove anything with statistics!"

Clearly there are several issues at play here, including minimal public knowledge of what the field of statistics is about, poorly taught statistics courses, and prejudices about empirical reasoning. Statisticians must accept a good part of the blame for each of these. (John Nelder writes [1] that "Almost nobody knows what statisticians do, and we in turn have been remarkably ineffective in explaining to non-statisticians what we are good at.") But part of the problem is the word "statistics" and its difficult-to-pronounce-and-spell sibling "statistician".

A statistic is a function of a set of observations, for example the total, the average value, the maximum value, or what have you. Governments have always wanted to keep track of information about the state (like births and deaths, imports and exports, agricultural production, etc.), which is where the word statistic comes from.
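To make the definition concrete, here is a minimal sketch in Python. The observations are made-up numbers chosen purely for illustration; the point is only that each statistic is an ordinary function applied to the same set of observations.

```python
from statistics import mean

# Hypothetical observations, invented for illustration only.
observations = [3, 7, 1, 9, 5]

# Each statistic is just a function of the observations.
summaries = {
    "total": sum(observations),
    "average": mean(observations),
    "maximum": max(observations),
}

for name, value in summaries.items():
    print(f"{name}: {value}")
```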

"Statistics" means more than one statistic, but confusingly it also refers to the study of how to draw conclusions from observations. A more formal term for this is inductive inference, to be contrasted with deductive inference. Deductive inference (or simply deduction) is classical logic: when the premises are true and the argument is valid, the conclusion must be true. If all swans are white, and Tom is a swan, then Tom is white.

Inductive inference (or induction) is not so simple. Suppose we observe 100 swans and they are all white. We might conclude that all swans are white. But this conclusion might be incorrect. (Apparently there are black swans, by the way.)

Uncertainty is inevitable: for example, in political polling the stock phrase is that the results are accurate to within plus or minus 3%, 19 times out of 20. Uncertainty is inevitable because of the variability that we find everywhere: political opinions vary, height and weight differ, some people are more susceptible to certain diseases than others (perhaps due to differences in genetics, among other things). When we try to measure something accurately several times, we get slightly different answers. This is sometimes called measurement error, or noise, but in a sense it's just another source of variability.

Probability theory lets us describe variability. For example, if we toss a fair coin 4 times, the probability of getting 4 heads is one sixteenth. But statistical inference uses probability theory to deal with the inverse problem: if we toss a coin 4 times and it comes up heads each time, can we conclude that it's not a fair coin?
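The two numerical examples above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are my own, the poll calculation uses the usual normal approximation for a simple random sample, and the sample size of 1067 is an assumed figure (not one from any particular poll) chosen because it gives roughly plus or minus 3% at the 95% level, which is what "19 times out of 20" means.

```python
from fractions import Fraction
from math import sqrt


def prob_all_heads(n_tosses, p_heads=Fraction(1, 2)):
    """Probability that every one of n independent tosses comes up heads."""
    return p_heads ** n_tosses


def poll_margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple
    random sample, using the normal approximation (z = 1.96)."""
    return z * sqrt(proportion * (1 - proportion) / sample_size)


# A fair coin tossed 4 times: probability of 4 heads.
print(prob_all_heads(4))  # 1/16

# Assumed sample size of 1067 respondents (hypothetical).
print(round(poll_margin_of_error(1067), 3))  # 0.03
```

Note what the first number does and does not say. A fair coin produces four heads in a row about 6% of the time, so seeing four heads is, by itself, fairly weak evidence against fairness. Deciding how much evidence is enough is exactly the inverse problem that statistical inference addresses.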

Given that statistics has such broad relevance, it's a shame that it has been saddled with such a poor name. If "a rose by any other name would smell as sweet", I'm hoping that statistics by another name will smell sweeter!

Bill Cleveland suggests the name "data science". John Nelder suggests "statistical science" [1]. And a friend of mine suggests, tongue-in-cheek, that statisticians could be called "noise-busters".

Of the above suggestions, my preference would be "statistical science", so that a statistician would be a "statistical scientist". But maybe there's a better name out there somewhere ...

[1] Nelder J.A. From statistics to statistical science. The Statistician. Vol. 48, No. 2 (1999), 257-269.

Monday, 2 January 2006
