A Quantum Understanding

This is the first part in my series on parapsychological double-slit experiments. However, this post contains just straightforward mainstream science. This tells you the minimum about quantum physics that you need to know to make sense of the parapsychological adaptation.

In a double-slit experiment, particles are shot at a barrier with two slits or holes in it. The particles are usually electrons or photons but large molecules, such as C70 bucky balls, have also been used.
Behind the barrier, there is a detector. In the case of photons, aka light, you can simply put a piece of cardboard there and see a special pattern projected onto it.
The interference pattern indicates that a wave is passing through the double-slits.


In a nutshell, when a wave crest encounters another wave crest, their heights will add. When a crest encounters a trough, they will cancel out. That is called interference. This can be seen when throwing two stones into a pond. It will look like this:


The simple and obvious explanation for what can be seen in the double-slit experiment is that there are waves coming from either slit and interfering, as can be seen in this diagram:

This is curious since we thought we were shooting particles at the slits. Particles shouldn’t show interference. Think of kicking a ball at two windows. Have you ever noticed an interference pattern in such a situation? Or in any?

So how can it be that a particle behaves like a wave? The answer is given by quantum mechanics.
Quantum means ‘amount’ and there’s quite a story behind ‘amount mechanics’ and how it was discovered and why it is so named. It involves Einstein but this is not the place to tell it.

For one, in order to get an interference pattern, the size of the slits needs to match the wavelength of the particles. Just in case you were wondering why kicking a ball at two windows does not produce interference.


We think of particles as having a definite state. They have a location and a momentum.
However, it turns out that location and momentum, and a couple other properties, can’t be known with arbitrary precision. Not because of some technological limitation but because of the very laws of physics.
Going even further, it is so that the very state of a particle, or even many particles together, can only be known in terms of probabilities. A quantum mechanical description of something is known as a wavefunction and allows you to predict with what probability you will observe a certain outcome.
As the name wavefunction implies, these probabilities behave mathematically like waves.

When you shoot a particle at the double slits, it might go through either slit. There is a probability of finding it at or behind either slit. The probabilities of finding it in some place spread from both slits like waves. The probability waves from the two slits interfere, and that is what causes the interference pattern.

Note that this pattern only emerges when you have a sufficient number of particles. One particle leaves behind only one blip on the screen behind the double slits. But even when you shoot one particle at a time, the pattern will eventually emerge. Where it is more likely to find a particle, more particles will be found.
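These probability waves can even be sketched numerically. The following is a toy illustration of my own, not from any particular experiment (wavelength, slit separation, and screen distance are arbitrary units): each slit contributes a complex amplitude, and the detection probability is the squared magnitude of their sum.

```python
import cmath
import math

# Toy double-slit sketch (illustrative numbers, not a real experiment).
wavelength = 1.0                      # arbitrary units
k = 2 * math.pi / wavelength          # wavenumber
slit_separation = 5.0
screen_distance = 100.0

def intensity(x):
    """Relative probability of a blip at screen position x: |amp1 + amp2|**2."""
    r1 = math.hypot(screen_distance, x - slit_separation / 2)  # path from slit 1
    r2 = math.hypot(screen_distance, x + slit_separation / 2)  # path from slit 2
    amplitude = cmath.exp(1j * k * r1) + cmath.exp(1j * k * r2)
    return abs(amplitude) ** 2

# Center of the screen: equal paths, the crests add (bright fringe).
# Near x = wavelength * screen_distance / (2 * slit_separation): the paths
# differ by about half a wavelength, crest meets trough (dark fringe).
print(intensity(0.0))    # maximum
print(intensity(10.0))   # near zero
```

Covering one slit amounts to dropping one of the two exponentials, which leaves a structureless single-slit distribution: no interference.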

Now we have the basics down. This is how the double-slit experiment works.


Now we need to ask, what happens if we try to determine through which slit the particle went?
The simple answer is that if the particle has to have come from either slit, then the probability wave only spreads from that one slit. And that means no interference.

Whenever such ‘which-way information’ exists then there is no interference pattern. In fact, how much contrast there is in the pattern depends on how much information there is. (see Wootters and Zurek 1979)
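One common later formulation of this trade-off is the inequality V² + D² ≤ 1, where V is the fringe visibility (contrast) and D the which-way distinguishability. Here is a hedged sketch of what that bound implies, assuming the bound is saturated; the numbers are purely illustrative.

```python
import math

def max_visibility(distinguishability):
    """Best possible fringe contrast V given which-way info D, from V**2 + D**2 <= 1."""
    return math.sqrt(1.0 - distinguishability ** 2)

# No which-way info: full contrast. Complete which-way info: no fringes.
# In between, the contrast falls off smoothly, not all-or-nothing.
for d in (0.0, 0.5, 0.9, 1.0):
    print(d, round(max_visibility(d), 3))
```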

In an experiment with C70 bucky balls, these molecules were heated as they went through the slits. They were made so hot that they glowed, they gave off thermal photons. These photons carried which-way information from the bucky balls to the environment. The hotter they were, the more photons they gave off and the less pronounced the interference pattern was.
Here you can see the interference patterns obtained when different intensities of heating were applied (given in Watts).

Collisions with air molecules play the same role. The higher the pressure is, the less pronounced the interference pattern.

Warning! Philosophy ahead!

Now we must head into the somewhat murkier waters of philosophy.

In quantum mechanics everything is all about probabilities. And as the interference pattern shows, in some way these probabilities are real. If they weren’t real, the probability waves could hardly interfere.
And yet, on the screen we get a single blip for each particle. Not a ‘maybe here, maybe there’.
At first, you have a wavefunction that goes through both slits, and then, on the detector, you have a single definite location (within the limits of the uncertainty principle).
That is known as the measurement problem.
Obviously, interactions with the environment play some role to explain this. We have seen that in the experiment with the bucky balls. Such processes cause so-called decoherence which causes interference phenomena to be suppressed in everyday situations.

However, this, most say, cannot be the entire solution. Even if there is no interference, the particle is still described only in terms of probabilities, as a wavefunction. So why do we perceive a seemingly definite outcome?

One answer to this is taking the math at face value. The wavefunction is all there is and all that matters. In that view, all possibilities are equally real. You perceiving the blip on one end of the screen or on the other, both happens. In a way, the universe splits up into different versions for each outcome. Of course, there is really just one wavefunction describing it all, so that’s more what it appears to us than reality.

This view is known as the Many Worlds Interpretation (MWI).

Another answer is that the wavefunction collapses on measurement. That means that when we perceive a definite outcome, then this outcome is really all there is. Something happened to reduce the probabilities down to one actuality.
That means that another physical process must be assumed. And it is absolutely unknown when or why or how it happens. Not to mention if.

That view is known as the Copenhagen Interpretation.

So far this is just philosophy. Both these views are interpretations of the same experiments and the same theories. The math and the predictions stay the same, whether you think that all possibilities are true or just the one you experience. There are many variants of these views, especially of the Copenhagen Interpretation.

One particular variant of the Copenhagen Interpretation holds that the collapse takes place when consciousness gets involved. This view was held by some big names like von Neumann or Wigner but is decidedly a minority opinion now. In my opinion that’s because no one has figured out a way in which a conscious observer should be different from any old measurement device. But I’ll just leave it at pointing out that there is no evidence for this view, just like there is no evidence that collapse is real or not.

It has also been argued that this view is incompatible with experimental evidence. Though I think most physicists would rather regard the view as unfalsifiable philosophy. Eventually, every experiment we know of came to the awareness of at least one conscious being (that’s you, my dear reader, not humble me).

Put that way, consciousness causes collapse is awfully close to solipsism.


Getting Wagenmakers wrong

EJ Wagenmakers et al published the first reply to the horribly flawed Feeling the Future paper by Daryl Bem. I’ve blogged about it more times than I care to count right now.

Their most important point was regarding the abuse of statistics. Or, as they put it, that Bem’s study was exploratory rather than confirmatory.
They also suggested a different statistical method as a remedy. I’ve expressed doubts about that because I don’t think that there is a non-abusable method.

Unfortunately, what they proposed has been completely and thoroughly misunderstood. The latest misrepresentation appeared in an article by 3 skeptics in The Psychologist. I blogged.

How to get Wagenmakers right

The traditional method of evaluating a scientific claim or idea is Null-Hypothesis Significance Testing (NHST). This involves coming up with a mathematical prediction of what happens if the new idea is wrong. It’s not enough to say that people can’t see into the future, you must say what the results should look like if they can’t.
After the experiment is done, this prediction is used to work out how likely it was to get results such as those one actually got. If that is unlikely, one concludes that the prediction was wrong. The null hypothesis is refuted: something is happening, and this is then taken as evidence for the original idea.
There’s a number of things that can go wrong with this method. One is choosing that null prediction after the fact, based on whatever results you got.
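As a toy numerical version of NHST, assume the null prediction is that a guesser hits at pure chance, P(hit) = 0.5, and that we saw 60 hits in 100 trials. These numbers are my own illustration, not Bem’s data. The p-value is the probability, under the null, of a result at least that extreme:

```python
import math

def p_value(hits, trials, p_null=0.5):
    """One-sided p-value: chance of at least `hits` successes under the null."""
    return sum(
        math.comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# 60 hits out of 100 under a fair-coin null: p is about 0.028, below the
# conventional 0.05 cutoff, so the null would be declared refuted.
print(round(p_value(60, 100), 3))
```

The after-the-fact abuse mentioned above amounts to choosing `p_null`, or which results count as "extreme", only after seeing the data.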

The method that Wagenmakers argued for is different. It involves not only making a prediction about what happens when the original idea is wrong. It also requires making a prediction about what happens if it is right.
Then, with the results of the experiment, one works out how likely the result was under either prediction. Finally, calculate how much more likely the result is under one hypothesis rather than the other. This last number is called the Bayes Factor.

For an example, imagine an ordinary 6-sided die, but instead of the usual labels it has only the letters “A” and “B”. The die comes in two variants: one has 5 “A”s and 1 “B”, the other 1 “A” and 5 “B”s.
You roll a die once and get an “A”. This result is five times as likely under the first variant as under the second.
You could use this result to speculate about what kind of die you rolled. But what if there is a third variant of die? One that has, say, 3 “A”s and 3 “B”s. Then the Bayes Factor comparing the first variant against this one would be different.
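A minimal sketch of that calculation (the variant names are my own labels):

```python
from fractions import Fraction

# P("A") on one roll, for each die variant from the example.
p_A = {
    "five_A": Fraction(5, 6),   # 5 "A"s, 1 "B"
    "five_B": Fraction(1, 6),   # 1 "A", 5 "B"s
    "three_A": Fraction(3, 6),  # the hypothetical third variant
}

def bayes_factor(h1, h2, observed="A"):
    """How much more likely the observed letter is under h1 than under h2."""
    def likelihood(h):
        return p_A[h] if observed == "A" else 1 - p_A[h]
    return likelihood(h1) / likelihood(h2)

print(bayes_factor("five_A", "five_B"))   # 5: one "A" favours the 5-A die
print(bayes_factor("five_A", "three_A"))  # 5/3: a much weaker preference
```

Same observation, different comparison, different Bayes Factor.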

The Bayes Factor depends crucially on the two hypotheses being compared. Depending on which two hypotheses are compared, one can seem more likely or the other.

In the case of Feeling the Future, the question is basically what we should assume about the effect if precognition is real. How strong a feel for the future should we assume?
Wagenmakers et al said that if one cannot assume anything for lack of information, then one should use a default assumption suggested by several statisticians. This assumption implied that people might be a little good at feeling the future or maybe very good.
Bem, along with two statisticians, countered that we already know that people are not good at feeling the future. Parapsychological abilities are always weak, and therefore one should use a different assumption, under which the evidence appeared much stronger.

Let’s make this intuitively clear. Think again of the dice with 5 “A”s or 5 “B”s. You are told that one die was rolled 100 times and showed 30 “A”s and 70 “B”s. Clearly, that is more likely to be a 5-“B” die than a 5-“A” die. But wait: what if, instead of comparing those two with each other, we compare either of them with a die that has 2 “A”s and 4 “B”s? That die would win.
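This is easy to check with binomial likelihoods, using the numbers from the example:

```python
import math

def likelihood(p_a, n_a=30, n=100):
    """Probability of exactly n_a "A"s in n rolls, given P("A") = p_a."""
    return math.comb(n, n_a) * p_a**n_a * (1 - p_a) ** (n - n_a)

five_A = likelihood(5 / 6)  # 5 "A"s, 1 "B"
five_B = likelihood(1 / 6)  # 1 "A", 5 "B"s
two_A = likelihood(2 / 6)   # 2 "A"s, 4 "B"s

# 30% "A"s fits the 5-B die far better than the 5-A die...
print(five_B > five_A)  # True
# ...but the 2-A die, with P("A") = 1/3, beats both.
print(two_A > five_B and two_A > five_A)  # True
```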

I have simplified a lot here. If something doesn’t seem to make sense it’s probably because of that and not because of a problem in the original literature.

Bem’s argument makes a lot of sense but overlooks that belief in strong precognition is widespread, even among parapsychologists. Tiny effects are what they get, not what they hope for or believe in. Both parties have valid arguments for their assumptions but neither makes a compelling case. On the whole, however, it does show a problem with the default Bayesian t-test.

Let me emphasize again that Wagenmakers made two points. The first that Bem made mistakes in applying the statistics. And secondly that it would be better to use the default Bayesian t-test rather than the traditional NHST. These are separate issues.
In my opinion, the abuse of statistical methods is the crucial issue that cannot be solved by using a different method.

How to get Wagenmakers wrong

Bayesian statistics is often thought of as involving a prior probability. In fact, the defining characteristic of Bayesian statistics is that it includes prior knowledge.

Again let’s go with the example. You’re only concerned with the two die variants, the one with 5 “A”s and the one with 5 “B”s. Someone keeps throwing the same die and telling you the result. You can’t see the die, of course, but are supposed to guess which die was thrown solely based on the results.
Intuitively, you’ll probably tend more toward the first kind with every “A” and more toward the second with every “B”.
But what if I told you that I randomly picked the die out of a box with 100 dice of the 5 “A” variant and only one of the 5 “B” variant? You’d start out assuming it should be the 5 “A” variant and would require a lot of “B”s before switching.
Formally, we’d compute the Bayes Factor from the data and then use that factor to update the prior probability to get the posterior probability. The clearer the data is, and the more data one has, the greater the shift in what we should hold to be the case.
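With the box example above, the update looks like this. The 100-to-1 box is from the example; the rest is arithmetic.

```python
from fractions import Fraction

# Prior odds from the box: 100 dice of the 5-A variant, 1 of the 5-B variant.
odds_five_A = Fraction(100, 1)

# Each observed "B" is 5 times as likely under the 5-B die, so every "B"
# divides the odds in favour of the 5-A die by 5.
bayes_factor_per_B = Fraction(1, 6) / Fraction(5, 6)  # = 1/5

rolls_of_B = 0
while odds_five_A >= 1:          # how many straight "B"s before we switch?
    odds_five_A *= bayes_factor_per_B
    rolls_of_B += 1

print(rolls_of_B)  # 3: after three "B"s in a row, the 5-B die is favoured
```

With an even prior (one die of each kind in the box), a single “B” would already have tipped the odds.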

In reality one will hardly ever know which of several competing hypotheses is more likely to be true. Different people will make their own guess. Some, maybe most, people will regard precognition as a virtual impossibility, a few as a virtual certainty.
Wagenmakers et al showed that even if one assumes a very low prior probability to the idea that people can feel the future (or rather the mathematical prediction based on that idea), 2,000 test subjects would yield enough data to shift opinion firmly towards precognition being true.

Unfortunately, some people completely misunderstood that. They thought that Wagenmakers et al were saying that we should not regard Bem’s data as convincing because they assigned a low prior probability. In truth the only assumption that went into the Bayes factor calculation was regarding the effect size. That point was strongly emphasized but still people miss it.

[sound of sighing]

The May issue of The Psychologist carries an article by Stuart Ritchie, Richard Wiseman and Chris French titled Replication, Replication, Replication plus some reactions to it. The Psychologist is the official monthly publication of The British Psychological Society. And the article is, of course, about the problems the 3 skeptics had in getting their failed replications published.

Yes, replication is important

That the importance of replication receives attention is good, of course. Repositories for failed experiments are important and have the potential to aid the scientific enterprise.

What is sad, however, is that the importance of proper methodology is largely overlooked. Even the 3 skeptics who should know all about the dangers of data-dredging cavalierly dismiss the issue with these words:

While many of these methodological problems are worrying, we don’t think any of them completely undermine what appears to be an impressive dataset.

But replication is still not the answer

I have written about how replication cannot be the whole answer before. In a nutshell, by cunning abuse of statistical methods it is possible to give any mundane and boring result the impression of showing some amazing, unheard of effect. That takes hardly any extra work but experimentally debunking the supposed effect is a huge effort. It takes more searching to be sure that something is not there than to simply find it. For statistical reasons, an experiment needs more subjects to “prove” the absence of an effect with the same confidence as finding it.
But there is also the possibility that some difference between the original experiment and the replication explains the lack of an effect. In this case it was claimed that maybe the 3 failed because they did not believe in the effect. It takes just seconds to make such a claim. Disproving it requires finding a “believer” who will again run an experiment with more subjects than the original.

Quoth the 3 skeptics:

Most obviously, we have only attempted to replicate one of Bem’s nine experiments; much work is yet to be done.

It should be blindingly obvious that science just can’t work like that.

There are a few voices that take a more sensible approach. Daniel Bor writes a little about how neuroimaging, which has, or had, extreme problems with useless statistics, might improve by fostering greater expertise among its practitioners. Neuroimaging seems to have made methodological improvements. What social psychology needs is a drink from the same cup.

The difficulty of publishing and the crying of rivers

On the whole, I find the article by the 3 skeptics to be little more than a whine about how difficult it is to get published, hardly an unusual experience. The first journal refused because they don’t publish replications.
Top journals are supposed to make sure that the results they publish are worthwhile. Showing that people can see into the future is amazing; not being able to show it is not. Back in the day, there was simply a limited number of pages that could be stuffed into an issue; these days, with online publishing, there is still the limited attention of readers.
The second journal refused to publish because one of the peer-reviewers, who happened to be Daryl Bem, requested further experiments to be done. That’s a perfectly normal thing and it’s also normal that researchers should be annoyed by what they see as a frivolous request.
In this case, one more experiment should have made sure that the failure to replicate wasn’t due to the beliefs of the experimenters. The original results published by Bem were almost certainly not due to chance. Looking for a reason for the different results is good science.

I’ve given a simple explanation for the obvious reason here. If the 3 skeptics are unwilling or unable to actually give such an explanation they are hardly in a position to complain.

Beware the literature

As a general rule, failed experiments have a harder time getting published than successful ones. That’s something of a problem because it means that information about what doesn’t work is lost to the larger community. When there is an interesting result in the older literature that seems not to have been followed up on, it is probably the case that it didn’t work after all: the original report was a fluke and the “debunking” was never really published. Of course, one can’t be sure the result wasn’t simply overlooked, which is a problem.
One must be aware that the scientific literature is not a complete record of all available scientific information. Failures will mostly live on in the memory of professors and will still be available to their ‘apprentices’ but it would be much more desirable if the information could be made available to all. With the internet, this possibility now exists and that discussion about such means is probably the most valuable result of the Bem affair so far.

About another skeptical award

A few weeks ago, I blogged about Daryl Bem being awarded a Pigasus by James Randi.

Today, I am going to tell you about another such negative award. This one is called ‘Das goldene Brett’ and is awarded by the Austrian Society for critical thinking (Gesellschaft für kritisches Denken). This society is the Vienna chapter of the GWUP which is the German language equivalent of the CSI.

“Das goldene Brett” means “the golden board”. In German, saying that someone has ‘a board before his head’ (ein Brett vor’m Kopf) means that he or she is an idiot: someone who obviously can’t see and is unable to work out why.

Perhaps this recalls the Bible, Matthew 7:3:
And why do you look at the splinter in your brother’s eye, and not notice the beam which is in your own eye?
But enough about that quaint and unwieldy language.


Is it a bird? Is it a plane? No! It’s a food buffet!

But why, my dear reader, would I bother you with the local affairs of an obscure mountain province?
The reason is that one of the three 2011 prize winners has managed to make the international news. Just the skeptical news, but still.

That winner was P.A. Straubinger, who directed the movie In the Beginning There Was Light. That movie promoted Breatharianism, which is the belief that eating (or drinking) is not necessary for survival; people can survive on (sun-)light alone. It boggles the mind that people could possibly believe such a thing. Yet not only do people believe it, a select few of them have died trying to live by it.

When news of the movie reached the local skeptics, they immediately saw the danger. The publicity would inspire further copy-cats and, among them, deaths.
Precisely that has happened now. A Swiss woman was found dead by starvation. (English report) Her journey into death started with Straubinger’s movie.

This leaves us with many open questions.
How much blame should we assign the propagandists? Or was it just the dead woman’s free choice?
Was she open-minded or gullible? Was she gullible or mentally ill?

What should skeptics learn from such a case? What should be done to protect the vulnerable from dangerous nonsense?

How about counter-arguments? There is some extensive debunking of the supposed “evidence” in the film available on German skeptic blogs. But it seems unlikely that one can reach the vulnerable with information; otherwise they would not be vulnerable. Everyone already knows that one can’t survive without nourishment. If someone is willing to dismiss such an everyday fact as merely a ‘materialistic belief’, then any further details must fall on deaf ears.
Even worse, a nuanced reply might even be seen as confirmatory. A scientific person will not ever rule out anything as impossible. Nothing can be known with such certainty. Distinguishing between the practically impossible and the literally impossible is a fine point that is rarely made in daily life. A scientist acknowledging the fundamental, philosophical limits of our knowledge may be heard as endorsing a practical possibility where none exists.

What about ridicule and a clear word? Some warn that this will just push believers away, but I wonder if it might not still be the best method. I don’t know what truly motivates people to believe in the clearly untrue, but if it is largely driven by emotion, then emotional appeals must be made to reach them, even if many skeptics will disagree with such methods on principle. In truth, it seems dishonest to me to seek to convince others with emotional, rationally invalid rhetoric. But if there are lives at stake, maybe I should swallow my distaste?
It seems plausible that ridicule will not reach the entrenched and only push them away but maybe it is a good method to reach the broad mass of people. A more open approach is surely needed to reach the truly vulnerable, the ‘spiritual’.

Should such nonsense be banned? In Germany Holocaust denial is illegal. And yet science denial is not, even when the danger is clear and present. It seems impossible to get a legislature to ban certain kinds of speech based on objective danger rather than offense taken.
However, I do not see a clear conflict with the principle of free speech. Hardly anyone would seriously say that an ad offering money for the death of someone, that is, an ad seeking a contract killer, should be legal. Such speech is aimed solely at getting someone killed, that is, at denying someone a right even more important than the right to free speech: the right to life. There is no ethical duty to tolerate speech that will get people killed.

Shut up and ignore? Straubinger has actually thanked skeptics for the attention they paid to the movie and the extra publicity it gave him. That raises a worrying specter. Might skeptics share part of the responsibility for the breatharian deaths? Many people have a reflexive sort of sympathy for the underdog, especially when that underdog is an enemy of an enemy. When skeptics denounce a dangerous fringe idea, does that perhaps drive some people into accepting it?

Is Replication the Answer?

One question that is forced on us by the publication of papers like Daryl Bem’s Feeling the Future is what went wrong and how it can be fixed.

One demand that often arises is for replication. It is one of the standard demands made by interested skeptics in forums and such places. I can understand why calling for replication is seductive.
It is shrewd and skeptical. It says: Not so fast, let’s be sure first, while at the same time offering a highly technical criticism. Replication is technical jargon, don’t you know? On the other hand, it’s also nice and open-minded. It says: This is totally serious science and some people who aren’t me should spend a lot of time on it.
And perhaps most important of all, it requires not a moment’s thought.

Cynicism aside, replication really is important. As long as a result is not replicated it is very likely wrong. If you don’t replicate you’re not really generating knowledge. Not only can you not rely on the results, you also lose the ability to determine if you are using good methods or are applying them correctly. Which I’d speculate will decrease reliability still further over time.

Replication is essential but is replication really all that is needed?

Put yourself in the shoes of a scientist. You have just run an experiment and found absolutely no evidence that people can see the future.  That’s going to be tough to publish.
Journals are sometimes criticized for being biased against negative results, but the simple fact is that they are biased against uninteresting results. Attention is a limited quantity; there’s only so much time in a day that can be spent reading. Most ideas don’t work out, so it is hardly news when an idea fails in an experiment. Think, for example, of all the chemicals that are not drugs of any kind.

Before computers and the information age it probably wouldn’t even have been possible to handle all the information about failed ideas. Things have changed now but the scientific community is still struggling to incorporate these new possibilities. However, one still can’t expect real life humans to pay attention to evidence of the completely expected.

Now you could try a new idea and hope that you have more luck with that.
Or you could do what Bem did and work some statistical magic on the data. And by magic I mean sleight of hand. The additional work required is much less and it is almost certain to work.
The question is simply if you want to advance science and humanity or your career and yourself.

If you go the 2nd route, the Bem route, your result will almost certainly fail to replicate.

So you might say that replication, if it is attempted, solves the problem. Until then you have a public confused by premature press reports, perhaps bad policy decisions, and certainly a lot of time wasted trying to replicate the effect. Establishing that an effect is not there always takes more effort than simply demonstrating it.

To this one might say that the nature of science is just so, tentative and self-correcting. Meanwhile the original data magician, our Bem-alike, has produced a publication in a respectable journal, which indicates quality work, and received numerous citations (in the form of failed replications), which indicates that the paper was fruitful and stimulated further research. These factors (number of publications, reputation of journal, and number of citations) are usually used to judge the quality of a scientist’s work in some objective way.

Eventually, if replication were all the answer needed, one should expect science to devolve into producing seemingly amazing results that are then slowly disproven by subsequent failed replications. Any of the progress we have come to expect would be merely an accidental byproduct.

The problem might be said to lie rather in judging scientists in such a way. Maybe we should include the replicability of results in such judgments. But now we’re no longer talking about replication as the sole answer. We’re now talking about penalizing bad research.

And that’s the point. Science only works if people play by the rules. Those who won’t or can’t must be dealt with somehow. In the extreme case that means labeling them crackpots and ostracizing them.
But there’s less extreme examples.

The case of the faster than light neutrinos

You probably have heard that some scientists recently announced that they had measured neutrinos to go faster than light. This turned out to be due to a faulty cable.

This story is currently a favorite of skeptics, who pointed out that few physicists took the result seriously, despite the fact that it was originally claimed that all technical issues had been ruled out. It makes a good cautionary tale about how implausible results should be handled and why. Human error is always possible and plausible.

There’s another chapter to this story, one that I fear will not get much attention.

The leaders of the experiment were forced to resign as a consequence of the affair.

There were very many scientists involved in the experiment due to the sheer size of the experimental apparatus. Among them, there was much discontentment about how the results were handled. Some said that they should have run more tests, including the test that found the fault, before publishing. Which means, of course, that they shouldn’t have published at all.

It is easy to see how a publish-or-perish environment that puts a premium on exciting results encourages not looking too closely for faults. But what’s the alternative? No incentive to publish equals no incentive to work. No incentive for exciting results just cements the status quo and hinders progress.

A Pigasus for Daryl Bem

Every year on April Fools day, James Randi hands out the Pigasus Award. Here is the announcement for the 2011 awards, delivered on April 1 2012.

One award went to Daryl Bem for “his shoddy research that has been discredited on many accounts by prominent critics, such as Drs. Richard Weisman, Steven Novella, and Chris French.”

I’ve called this well deserved but there’s certainly much that can be quibbled about. For example, these critics are hardly those who delivered the hardest hitting critiques. Far more deserving of honorable mention are Wagenmakers, Francis and Simmons (and their respective co-authors) for their contribution of peer reviewed papers that tackle the problem.

A point actually concerning the award is whether it is fair to single out Bem for a type of misconduct that may be very wide-spread in psychological research. Let’s be clear on this, his methods are not just “strange” or “shoddy” as Randi kindly puts it, they border on the fraudulent. Someone else, in a different field, might have found themselves in serious trouble with a paper like this. Though I think it very hard to get such a paper past peer review in a more math savvy discipline.
But even if you think it is just a highly visible example of normal bad practice, surely it is appropriate to use the high visibility to bring attention to it. Numerous people have done exactly that, either using it to argue for different statistical techniques or to draw attention to the lack of necessary replication in psychology.

I doubt that Randi calling this out will do much good since I doubt that many psychologists will even notice. And even if they do, I doubt that it will cause them to rethink their current (mal)practice. There’s a good chance that Bem will be awarded an Ig Nobel prize later this year. That would probably get more attention, but even so…


The reactions from believers have been completely predictable. They have so far ignored the criticisms of the methods and so they ignore that Randi explicitly justifies the award with the “strange methods”. They simply pretend that any doubt or criticism is the result of utter dogmatism.

Sadly, some skeptics have also voiced disappointment, for example Stuart Ritchie on his Twitter feed. Should I ever come across a justification for such reactions, I will report and comment.

Why doesn’t experiment 9 replicate?

I have written about Daryl Bem’s paper “Feeling the Future” before and laid out a few of the serious issues that invalidate it.

Recently it has been in the news again because one of the nine experiments presented in it, experiment 9, was repeated and failed to yield a positive result. Of course, no one was particularly surprised by this, except perhaps the usual die-hard believers. Still, some may wonder where the positive result came from in the first place. Just chance, or something more?

Before we can look at the actual research we need to look at the dangers of pattern seeking…

Patterns are for kilts

[Image of a group of nine people]

Let’s play a little game. We pick a few people in this image and then try to find some way to split the nine people into two groups such that most of our picks end up in one group.
For example, let’s take the 1st from the left in the first row and the 2nd in the bottom row.
Answer: Males vs. Females.

Again: We take the 2nd in the top row and the 2nd and 4th in the bottom row.
Possible answer: People with and without sunglasses.
It doesn’t work perfectly but mostly.

If you’re creative you can find a more or less good solution for any possible combination of picks. That’s the first take-away point.

Now let’s add a bit of back story and extend our game. The group went to a casino, and some of them won big; those are the people we pick.
The goal of the game is now not only to find a good grouping but also to make up a story for why one group had most of the winners.

For example: The sunglasses are a lucky charm and that’s why the group with glasses did better.
That works, but a lucky charm is kind of lame.
How about: hiding the eyes helps with bluffing in poker. Much better…
But wait, correlation does not equal causation, as statisticians never tire of telling us. Pro players like to wear sunglasses, as everyone knows, and that’s why that group did better.

So if you’re creative you can even find some semi-plausible explanation for why a group did better than another.
And when the explanation need not even be semi-plausible, you can always find one without any creativity. A lucky charm, magic, or divine favor fits any case. That’s the second take-home point.

You can always find some sort of pattern in any set of random data; think of shapes in clouds. Random means that you will rarely find the same pattern again.

For one final encore, let’s make up, for each person in that picture, how much money they won or lost at the casino. Say top left: lost $145; top 2nd from left: won $78; and so on. The game is now to find some attribute that tracks these amounts.
An answer might be skin bared in square centimeters, or height in inches, and so on.
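This hunt for a rule is just an informal multiple-comparisons search, and it is easy to sketch in a few lines of Python (all the dollar amounts below are made up for illustration):

```python
from itertools import product

# Made-up casino results for the nine people (illustration only).
winnings = [-145, 78, 12, -60, 230, -15, 95, -40, 33]

# Exhaustively try every way to split the nine people into two groups
# and keep whichever split makes the two groups look most different.
best_gap, best_split = 0.0, None
for split in product([0, 1], repeat=9):
    a = [w for w, s in zip(winnings, split) if s]
    b = [w for w, s in zip(winnings, split) if not s]
    if a and b:
        gap = abs(sum(a) / len(a) - sum(b) / len(b))
        if gap > best_gap:
            best_gap, best_split = gap, split

# For these numbers the best split isolates the $230 winner:
# an average gap of 235.25 between the groups.
print(best_split, best_gap)
```

The gap found looks impressive, but it was obtained by searching, not by prediction, so it means nothing.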

Experiment 9

Experiment 9 is derived from a simple psychological experiment that could run something like this:
Step 1
Ask a subject to remember a list of words. The words are flashed one at a time on a computer screen for 3 seconds each.
Step 2
Then randomly select some of the words for the subject to practice. The selected words appear on the screen again and the subject types them. Of course, the subject can’t take notes.
Step 3
The subject is asked to recall the words.

The result is, unsurprisingly, that more of the practiced words are recalled.
Bem switched steps 2 and 3. That is, the words are practiced after they are recalled. You wouldn’t expect that what one does after the fact makes a difference, but Bem claimed that the experiment was a success.

If you are new to parapsychology you would probably assume that this means that more practice words were recalled. In fact, Bem does not tell us that. We don’t know if that was the case, but the omission is telling.
Bem constructs what he calls a “differential recall index” for each subject. You compute this by first subtracting the number of control words (words that were not practiced) recalled from the number of practice words recalled. Then you multiply this difference by the total number of words recalled. This is then turned into a percentage, but I’ll omit that step in the examples.

So if subject 1 recalls 39 words in total and 20 of these are practiced later, then the index is 1 × 39 = 39.
And if subject 2 recalls only 18 words and 8 are practiced, then the index is (−2) × 18 = −36.

You can already guess where this is going. The justification that Bem gives for this manipulation is:

Unlike in a traditional experiment in which all participants contribute the same fixed number of trials, in the recall test each word the participant recalls constitutes a trial and is scored as either a practice word or a control word.

This is just massive nonsense. As we have seen, not every recalled word counts equally: words from participants who recalled many count more heavily. The index thus works against its stated purpose.
Let’s combine the examples above. Subject 1 recalled one more practice word, but subject 2 recalled two fewer. This indicates that practicing after the fact does not work, although in an actual experiment two subjects would be far too few to state anything with confidence.
But now look at the combined index: 39 − 36 = +3. This indicates success. Obviously the index misleads here.
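The two examples can be put into a few lines of Python (omitting the percentage normalization, as above):

```python
def dr_index(practiced, control):
    """Differential recall index as described above:
    (practiced - control), weighted by the total number of words recalled."""
    return (practiced - control) * (practiced + control)

# Subject 1: 39 words recalled, 20 of them practiced, 19 control.
s1 = dr_index(20, 19)     # 1 * 39 = 39
# Subject 2: 18 words recalled, 8 practiced, 10 control.
s2 = dr_index(8, 10)      # -2 * 18 = -36

# Across both subjects, one more control word than practice word
# was recalled, i.e. no sign of any effect...
net_advantage = (20 - 19) + (8 - 10)
print(net_advantage)      # -1

# ...yet the combined index comes out positive:
print(s1 + s2)            # 3
```

The weighting by total recall is exactly what lets a null result flip into an apparent success.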

That the reviewers let this through is certainly a screw-up on their part. There’s no sugar-coating it.

Charitably, one might assume that Bem also made a mistake and just got a significant result through luck. However, that is unlikely.
The evidence, namely the advice he gives on writing articles as well as his handling of the other experiments, indicates that the index was created to force a positive result.
Still, that does not necessarily imply ill intent. He may have played around with a statistics program until he got results he liked, without ever realizing that this is scientifically worthless. Objectively, though, it is still scientific misconduct.
Unfortunately, Bem elsewhere displays an awareness of the inappropriateness of such methods.

The fact that Bem reports only the flawed and potentially misleading differential recall index, and not the actual result of the experiment, makes me conclude that the experiment was probably a failure. There was simply a random association between high recall and favorable outcome, on which the DR index capitalizes.
By random chance such a pattern may arise again, but only rarely; hence the failure to replicate.

Conceptual vs. Close Replication

Believers often insist that Bem has only replicated previous work, the implication being that these experiments are replicable. But when they say replication they mean a so-called “conceptual replication”: experiments in general that purport to show retroactive effects, that is, the present affecting the past. Of course, when one makes up a whole new experiment, one can simply use the now-familiar tricks to force a positive result.
A close replication actually repeats the experiment and is therefore bound to the same method of analysis. Only a close replication is a real replication.

Back from hiatus

As you can see, I took a time-out from this blog for half a year and never delivered the promised Ganzfeld series. It’s tedious and unrewarding work, and I simply had better things to do. Hopefully I’ll bring things home in the next couple of months. Even though I have no idea who really cares, I feel a sense of duty to finish what I started.

Randi’s Prize
What I won’t finish is the chapter-by-chapter review of Randi’s Prize by Robert McLuhan. It simply doesn’t work. He cites a lot of research, and it is really this research that should be addressed rather than McLuhan’s take on it. The basic errors he himself makes are already pointed out in the reviews of the first few chapters.

Next up will be my take on the current hoopla about the failed replication of one of Bem’s experiments. Stay tuned.

Randi’s Prize: Answering Chapter 4

Chapter Four:  Uncertain Science

This chapter deals with parapsychological experiments in general, rather than with mediumship specifically, as did the previous chapter. We are rushed past a number of claims and rebuttals without dealing with any in detail.


Of these problems, the most hard-hitting is probably the fact that the experiments are not repeatable, but unfortunately this huge problem is ignored rather than discussed. Perhaps McLuhan simply chose to believe assertions to the contrary?

He mentions the card-guessing experiments of J.B. Rhine, conducted in the 1930s, and tells us of Hubert Pearce, a theology student, who could consistently and repeatably demonstrate his ESP by scoring, on average, 33% hits where 20% was expected by chance.

He also tells us of the Ganzfeld experiments, conducted from the 1980s onward, in which people score, on average, 33% instead of the expected 25%. The Ganzfeld is a method of creating mild sensory deprivation, which is supposed to enhance the ability to receive extra-sensory information and thus enable better scoring.

Curious, isn’t it? Decades pass during which parapsychologists develop a method to increase scoring, but… the increase over chance is smaller than what was achieved back then with a single “star subject”.

McLuhan tells us that card-guessing was abandoned because it was too boring: just sitting there calling out one guess after another. In the Ganzfeld experiments, someone has to endure 20 minutes of sensory deprivation for a single guess. I am not sure how that relieves the problem.

I wonder if this may be one thing that distinguishes skeptics from believers: that skeptics have a higher need for internal consistency?

Bad statistics are a serious problem in parapsychology, as they can create the impression of an effect where there is none. Naturally, not all criticisms are correct. McLuhan incorrectly generalizes from rebuttals of some criticisms to the conclusion that such criticisms as a whole are unwarranted. Looking at recent work like Bem’s Feeling the Future, it is obvious how misleading that is.

One thing that stood out to me is how McLuhan speaks with two voices. He generally makes an effort (or a show?) of considering both sides. Sometimes he even intimates that these arguments affected him. Yet every so often a different attitude breaks through, and he tells us why these arguments are really made: not because they are true or reasonable but only to create doubt.

In-Depth Controversy

The first controversy that is addressed in-depth is the “sense of being stared at”. Unfortunately this is not one I have studied and so I will not comment on it. I intend to do so at some time but not in the next few weeks.

The next controversy concerns Sheldrake’s psychic dogs. This has already been examined on this blog.
Anyone who goes to the original articles and actually evaluates the data for himself should be able to see past Sheldrake’s wall of make-believe, but McLuhan completely falls for his spin and retells it as fact.

He is so faithful to that version that he even follows Sheldrake in making nasty attacks on a skeptic who had the bad judgement of taking the claims seriously enough to conduct his own investigation.

After this low point of investigative effort comes a more extensive exploration of the Ganzfeld experiments. These are in many ways among the best parapsychology has to offer. Many other results shrivel to nothing under scrutiny or are simply unrepeatable, which means we would have to take them on faith.

By comparison, this series of experiments is a shining example of methodological rigor and solidity. At some point I will write a post on why I nevertheless don’t believe that there is a real effect there. I expect they will eventually end up like Rhine’s experiments from the 1930s: never fully explained but simply abandoned.

McLuhan quotes the same skeptic praising these experiments whom he had, just a few pages earlier, accused of trying to sabotage Sheldrake’s research. He seems completely oblivious to the contradiction.

Finally come the remote-viewing experiments of the Stargate project, performed for the US government. Here things get more mixed. Eventually this became a debate between Ray Hyman (skeptic) and Jessica Utts (believer). McLuhan, of course, finds the believer convincing, never noticing the gaping holes in her arguments.

Feeling the Future: Part 2

In my first post on Feeling the Future, I discussed mainly how its misuse of statistics relates to science in general. I said little about how exactly the statistics were misused. My thinking was that a detailed examination would be too boring for the average reader.
I still think that, but nevertheless I will lay out exactly how we know that Bem misused statistics.

The Problem Explained

The good news is that you don’t need to know statistics to understand this problem. You surely know games that use dice, Monopoly for example. In that game you throw two dice that tell you how far to move.
What if someone isn’t happy with the outcome and decides to roll again? That’s cheating!

Even small kids intuitively understand that this is an advantage, something that skews the outcome in a direction of one’s choosing. No knowledge of statistics or probability theory is necessary. While the outcome of each roll is still random, there is a non-random choice involved.

What if you roll three dice and pick two? That’s the same thing, right?

How about we roll four and then pick two, with the stipulation that the two remaining dice must be used on the next turn? Again there is a choice involved. Within limitations, the player can choose how to move, which allows an advantage. The player’s moves are no longer random, despite the fact that the dice rolls are random and none are discarded.
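A quick simulation, my own sketch rather than anything from the paper, shows how much rolling three dice and keeping the two highest skews the average:

```python
import random

def roll_two():
    """Fair play: roll two dice and take what you get."""
    return random.randint(1, 6) + random.randint(1, 6)

def roll_three_keep_two():
    """Cheating: roll three dice and keep the two highest."""
    return sum(sorted(random.randint(1, 6) for _ in range(3))[1:])

random.seed(0)
n = 100_000
fair = sum(roll_two() for _ in range(n)) / n
skewed = sum(roll_three_keep_two() for _ in range(n)) / n

# The fair average is about 7; picking from three pushes it
# well above, even though every individual die is fair.
print(round(fair, 2), round(skewed, 2))
```

No single die roll was tampered with; the bias comes entirely from the selection step.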

Now we’re ready to get to Feeling the Future.
The results presented were very unlikely to have arisen by chance. Therefore, the argument goes, they probably didn’t arise by chance. Which means there must have been some unknown factor influencing the outcome.

You may realize that this is a shaky argument. Just because something is unlikely does not mean it doesn’t happen. The impossible doesn’t happen, but the unlikely, by definition, must and does; it is set apart from the likely merely by happening less often.
Then again, the impossible is only impossible as far as we know, and that we’re wrong about something is at best unlikely, if that. In reality, as opposed to mathematics, we are always dealing with probability judgements, never with absolutes.
In other words, that argument is all we have. It is used in the same way in almost every scientific experiment.

So the argument is solid enough. In fact, I believe that there is something other than chance involved. Of course, dear reader, if you didn’t know that already you must have skipped the beginning of the post.

Bem’s experiments each had, according to Feeling the Future, 100–200 participants. In reality, at least some of them were assembled from smaller blocks of perhaps around 50. This is a problem for exactly the same reason as in the dice examples: even if the outcomes in every block were completely random, once hand-picked blocks are assembled into a larger whole, that whole no longer is.
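To see how this plays out, here is a minimal sketch with made-up numbers (blocks of 50 pure-chance trials, of which only those that came out above chance are kept):

```python
import random

def pilot_block(n_trials=50):
    """One small block of pure-chance trials: count hits out of n_trials."""
    return sum(random.random() < 0.5 for _ in range(n_trials))

random.seed(1)
# Keep only blocks that came out above chance, discard the rest,
# and assemble three of them into one '150-participant' experiment.
kept = []
while len(kept) < 3:
    hits = pilot_block()
    if hits > 25:          # the non-random selection step
        kept.append(hits)

hit_rate = sum(kept) / 150
print(hit_rate)            # above 0.5 by construction, though every trial was fair
```

Every trial in every block was a fair coin; the inflated hit rate of the assembled whole comes purely from which blocks were kept.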

Proof that it happened

How do we know that this happened? It doesn’t require knowledge of statistics either, just a bit of sleuthing.
First we note what it says in the footnote about experiment 5:

This experiment was our first psi study and served as a pilot for the basic procedures adopted in all the other studies reported in this article. When it was conducted, we had not yet introduced the hardware-based random number generator or the stimulus-seeking scale. Preliminary results were reported at the 2003 convention of the Parapsychological Association in Vancouver, Canada (Bem, 2003); subsequent results and analyses have revised some of the conclusions presented there.

Fortunately, this presentation is also available in written form. Unfortunately, it is immediately obvious that it doesn’t present anything corresponding to experiment 5.
The presentation from 2003 reported not one but eight experiments, each with at most 60 participants. The experimental design, however, matches that reported in 2011.
The eight experiments are grouped into three experimental series, so perhaps he pooled these together for the later paper? But no, that doesn’t work either.

I could write several more paragraphs of this kind, trying to write up a logic puzzle full of numbers as if it were a car chase. But my sense of compassion wins out. I know I would merely bore you half blind, my dear readers, and I won’t have that on my conscience.

Therefore I shall only give my answers as one does with puzzles. Check them with the links at the bottom if you like. I could easily have overlooked something or made a typo.

Experimental series 300 of the presentation is the “small retroactive habituation experiment that used supraliminal rather than subliminal exposures” that is mentioned in the File-Drawer section of “Feeling the Future”.
Experiment 102, with 60 participants, must have been excluded because it has 60 rather than 48 trials per session.
Experiments 103, 201, 202, and 203 combined form experiment 6. They have the same total number of participants (n = 150), and the method matches precisely. 100 of these 150 were tested for “erotic reactivity”; this is true for experiment 6 as well as for the combination.
Experiment 101 could be part of experiment 5, but there aren’t enough participants; additional data must have been collected later.
Note that the footnote points to “subsequent results”.

Warning signs

Even without following up the footnotes and references, there are some warning signs in Feeling the Future that hint that something is amiss. For example:

The number of exposures varied, assuming the values of 4, 6, 8, or 10 across this experiment and its replication.

The only reason to vary something in an experiment is to determine whether that factor has any influence on the results. Here we learn that a factor was varied, but neither reason nor justification is given, much less any results.

These two items were administered to 100 of the 150 participants in this replication prior to the relaxation period and experimental trials.

The same thing applies here. A good experiment is completely preplanned and rigidly carried through. There is nothing wrong with doing less formal, exploratory work to find good candidate ideas that merit the effort of a rigid test. But such exploratory experiments have almost no evidential weight.

Such warning signs are also present in the other experiments described in Feeling the Future, which could indicate that the same thing was done there as well. But don’t make the mistake of assuming that this issue is the only one that invalidates Bem’s conclusions. There is also the issue of data dredging, which is like deciding which card game to play depending on what hand you were dealt. Small wonder, then, if you find your cards to be unusually good according to the rules of the game you chose.

In terms of an experiment, that means analyzing the results in various ways and then reporting those analyses that favor the desired conclusion. That Bem did this is also evident from a comparison of the 2003 and 2011 descriptions of what is apparently, and purportedly, the same data.

Particularly worrying is that Bem has explicitly and repeatedly denied using such misleading methods. I shall restrain myself from speculating about what made him deny such an obvious, documented fact. It does not have to be dishonesty but none of the other possibilities is flattering, either.

There’s a common conceit among believers that skeptics don’t look at the data. Whenever someone claims this, ask them if there is anything wrong with Feeling the Future and you will know the truth of that.

Bem, D. J. (2003, August). Precognitive habituation: Replicable evidence for a process of anomalous cognition. Paper presented at the 46th Annual Convention of the Parapsychological Association, Vancouver, Canada.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology.
