Getting Wagenmakers wrong

EJ Wagenmakers et al published the first reply to the horribly flawed Feeling the Future paper by Daryl Bem. I’ve blogged about it more times than I care to count right now.

Their most important point was regarding the abuse of statistics. Or, as they put it, that Bem’s study was exploratory rather than confirmatory.
They also suggested a different statistical method as a remedy. I’ve expressed doubts about that because I don’t think that there is a non-abusable method.

Unfortunately, what they proposed has been completely and thoroughly misunderstood. The latest misrepresentation appeared in an article by 3 skeptics in The Psychologist. I blogged about it.

How to get Wagenmakers right

The traditional method of evaluating a scientific claim or idea is Null-Hypothesis Significance Testing (NHST). This involves coming up with a mathematical prediction of what happens if the new idea is wrong. It’s not enough to say that people can’t see into the future; you must say what the results should look like if they can’t.
After the experiment is done, this prediction is used to work out how likely it was to get results such as those that one actually got. If it is unlikely, then one concludes that the prediction was wrong: the null hypothesis is refuted. Something happens, and this is then taken as evidence for the original idea.
There are a number of things that can go wrong with this method. One is choosing that null prediction after the fact, based on whatever results you got.
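To make the NHST recipe concrete, here is a minimal sketch. It assumes a simple guessing task in which the null prediction is a 50% hit rate. The numbers (60 hits in 100 trials) are made up for illustration and are not taken from Bem's paper, and the function name is my own.

```python
from math import comb

def binomial_tail(hits: int, trials: int, p_null: float = 0.5) -> float:
    """Probability of at least `hits` successes in `trials` if the null prediction holds."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Hypothetical data, NOT from Bem's paper: 60 correct guesses in 100 trials.
p_value = binomial_tail(60, 100)
print(f"Chance of a result at least this extreme under the null: {p_value:.4f}")
```

If that number is small, the null prediction is rejected. Note that nothing in the calculation says what the results should look like if the original idea is right.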

The method that Wagenmakers argued for is different. It involves making a prediction not only about what happens when the original idea is wrong but also about what happens if it is right.
Then, with the results of the experiment, one works out how likely the result was under either prediction. Finally, calculate how much more likely the result is under one hypothesis rather than the other. This last number is called the Bayes Factor.

For an example, imagine an ordinary 6-sided die but instead of being ordinarily labeled it has only the letters “A” and “B”. The die comes in 2 variations, one has 5 “A”s and 1 “B”, the other 1 “A” and 5 “B”s.
You roll a die once and get an “A”. This result is 5 times as likely with the first variant as with the second.
You could use this result to speculate about what kind of die you rolled. But what if there is a third variant of die? One that has, say, 3 “A”s and 3 “B”s. Then your Bayes Factor would be different.
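The die example can be sketched in a few lines. The labels for the three variants and the function name are my own shorthand. Exact fractions are used so that the ratios come out without rounding.

```python
from fractions import Fraction

# Probability of rolling an "A" for each die variant described above
P_A = {
    "5A1B": Fraction(5, 6),
    "1A5B": Fraction(1, 6),
    "3A3B": Fraction(3, 6),
}

def bayes_factor(h1: str, h2: str, outcome: str = "A") -> Fraction:
    """How many times more likely `outcome` is under die h1 than under die h2."""
    def likelihood(h: str) -> Fraction:
        return P_A[h] if outcome == "A" else 1 - P_A[h]
    return likelihood(h1) / likelihood(h2)

print(bayes_factor("5A1B", "1A5B"))  # 5
print(bayes_factor("5A1B", "3A3B"))  # 5/3
print(bayes_factor("3A3B", "1A5B"))  # 3
```

The same single “A” gives a factor of 5, 5/3 or 3, depending on which pair of dice is compared.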

The Bayes Factor depends crucially on the two hypotheses being compared. Depending on which 2 hypotheses are being compared, one or the other can seem more likely.

In the case of Feeling the Future, the question is basically what we should assume about what happens when something happens. How much feel for the future should we assume?
Wagenmakers et al said that if one cannot assume anything for lack of information, then one should use a default assumption suggested by several statisticians. This assumption implied that people might be a little good at feeling the future or maybe very good.
Bem, along with two statisticians, countered that we already know that people are not good at feeling the future. Parapsychological abilities are always weak, and therefore one should use a different assumption, under which the strength of the evidence was very much confirmed.

Let’s make this intuitively clear. Think again of the dice with 5 As or 5 Bs. You are told that one die was rolled 100 times and showed 30 As and 70 Bs. Clearly that is more likely to be a 5 B die than a 5 A die. But wait, what if, instead of comparing those 2 with each other, we compare either of them with a die with 2 As and 4 Bs? That die would win.
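For those who want to check the 30-to-70 example, here is a rough sketch. It works with logarithms because the raw probabilities are extremely small. The die labels and function names are my own.

```python
from math import log

# Probability of an "A" on a single roll for each die under comparison
DICE = {"5A1B": 5 / 6, "1A5B": 1 / 6, "2A4B": 2 / 6}

def log_likelihood(p_a: float, n_a: int, n_b: int) -> float:
    """Log-probability of one particular sequence with n_a As and n_b Bs.

    The number of possible orderings is the same for every die, so it
    cancels out of any ratio and is left out here.
    """
    return n_a * log(p_a) + n_b * log(1 - p_a)

# The result from the text: 30 As and 70 Bs in 100 rolls
LOG_LIK = {name: log_likelihood(p, 30, 70) for name, p in DICE.items()}

def log_bayes_factor(h1: str, h2: str) -> float:
    """Log of the Bayes Factor for h1 over h2; positive values favor h1."""
    return LOG_LIK[h1] - LOG_LIK[h2]

for h1, h2 in [("1A5B", "5A1B"), ("2A4B", "1A5B"), ("2A4B", "5A1B")]:
    print(f"{h1} vs {h2}: log Bayes Factor = {log_bayes_factor(h1, h2):.2f}")
```

The 5 B die beats the 5 A die by a factor of 5 to the power of 40, yet it still loses to the 2 A and 4 B die. The data are the same each time; only the pair of hypotheses changes.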

I have simplified a lot here. If something doesn’t seem to make sense it’s probably because of that and not because of a problem in the original literature.

Bem’s argument makes a lot of sense but overlooks that belief in strong precognition is widespread, even among parapsychologists. Tiny effects are what they get but not what they hope for or believe in. Both parties have valid arguments for their assumptions but neither makes a compelling case. On the whole, however, it does show a problem with the default Bayesian t-test.

Let me emphasize again that Wagenmakers made two points. The first was that Bem made mistakes in applying the statistics. The second was that it would be better to use the default Bayesian t-test rather than the traditional NHST. These are separate issues.
In my opinion, the abuse of statistical methods is the crucial issue that cannot be solved by using a different method.

How to get Wagenmakers wrong

Bayesian statistics is often thought of as involving a prior probability. In fact, the defining characteristic of Bayesian statistics is that it includes prior knowledge.

Again let’s go with the example. You’re only concerned with the 2 die variants with the 5 “A”s and the 5 “B”s. Someone is always throwing the same die and telling you the result. You can’t see the die, of course, but are supposed to guess which die was thrown solely based on the result.
Intuitively, you’ll probably be tending more toward the first kind with every “A” and more toward the second type with every “B”.
But what if I told you that I randomly picked the die out of a box with 100 dice of the 5 “A” variant and only 1 of the 5 “B” variant? You’ll start out assuming it should be the 5 “A” variant and will require a lot of “B”s before switching.
Formally, we’d compute the Bayes Factor from the data and then use that factor to update the prior probability to get the posterior probability. The clearer the data is, and the more data one has, the greater the shift in what we should hold to be the case.
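Here is a small sketch of that update for the box with 100 dice of the 5 “A” variant and 1 of the 5 “B” variant. The function name and the way the rolls are written as a string are my own choices for illustration.

```python
from fractions import Fraction

def posterior_prob_5b(rolls: str, prior_5b: Fraction = Fraction(1, 101)) -> Fraction:
    """Probability that the hidden die is the 5 "B" variant after seeing `rolls`.

    `rolls` is a string such as "BBA". The default prior reflects a box
    with 100 dice of the 5 "A" variant and 1 die of the 5 "B" variant.
    """
    prior_odds = prior_5b / (1 - prior_5b)  # 1 to 100 in favor of 5 "B"
    factor = Fraction(1)                    # Bayes Factor for 5 "B" over 5 "A"
    for roll in rolls:
        if roll == "B":
            factor *= 5                     # a "B" is 5 times as likely from the 5 "B" die
        elif roll == "A":
            factor /= 5                     # an "A" is 5 times as likely from the 5 "A" die
        else:
            raise ValueError(f"unexpected roll: {roll!r}")
    posterior_odds = prior_odds * factor
    return posterior_odds / (1 + posterior_odds)

print(posterior_prob_5b(""))     # 1/101
print(posterior_prob_5b("BB"))   # 1/5
print(posterior_prob_5b("BBB"))  # 5/9
```

With this prior it takes three more “B”s than “A”s before the 5 “B” die becomes the better guess. The Bayes Factor itself is the same whatever the prior is; only the starting point differs.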

In reality one will hardly ever know which of several competing hypotheses is more likely to be true. Different people will make their own guess. Some, maybe most, people will regard precognition as a virtual impossibility, a few as a virtual certainty.
Wagenmakers et al showed that even if one assigns a very low prior probability to the idea that people can feel the future (or rather the mathematical prediction based on that idea), 2,000 test subjects would yield enough data to shift opinion firmly towards precognition being true.

Unfortunately, some people completely misunderstood that. They thought that Wagenmakers et al were saying that we should not regard Bem’s data as convincing because they assigned a low prior probability. In truth the only assumption that went into the Bayes factor calculation was regarding the effect size. That point was strongly emphasized but still people miss it.