Feeling The Future, Smelling The Rot

Daryl Bem is (or was?) a well-respected social scientist who used to lecture at Cornell University. The Journal of Personality and Social Psychology is a peer-reviewed scientific journal, also well-respected in its field. So it should be no surprise that when Bem published an article in that journal claiming to demonstrate precognition, it made quite a splash.

It was even covered at length in more serious newspapers like the New York Times, though at least with the skepticism a subject with such a lousy track record deserves. In fact, if the precognition effect that Bem claims were real, casinos would be impossible, as a reply by Dutch scientists around E.-J. Wagenmakers points out.

By now, several people have attempted to replicate some of Bem’s experiments without finding the claimed effect. That’s hardly surprising, but it does not explain how Bem got his results.

What’s wrong with the article?

It becomes obvious pretty quickly that the statistics were badly mishandled, and a deeper look only makes things worse. The article should never have passed review, but that mistake didn’t bother me at first. Bem is experienced, with many papers under his belt. He knows how to game the system.

The mishandled statistics were not just obvious to me, of course. They were pointed out almost immediately by a number of different people.

These issues should be obvious to anyone doing science. If you don’t understand statistics, you can’t do social science. What does statistics have to do with understanding people? About as much as literacy has to do with writing novels. At its core, nothing; it’s just a necessary tool.

Mishandled statistics are not all that uncommon. Statistics is difficult, and fields such as psychology are not populated by people with an affinity for math. Nevertheless, omitting key information and presenting the rest in a misleading manner really stretched my tolerance. That he simply lied about his method when responding to criticism went too far. But that’s just my opinion.

Such an accusation demands evidence, of course. The article is full of tell-tale hints, which you can read about here or in Wagenmakers’ manuscript (link at the bottom).
But there is clear proof, too. As Bem mentions in the article, some results were already published in 2003. Comparing that article to the current one reveals that he originally performed several experiments with around 50 subjects each. He thoroughly analyzed these batches and then assembled them into packets of 100-200 subjects, which he presents as experiments in his new paper.

[Update: There is now a more extensive post on this available.]

That he did that is the omitted key information. The tell-tale hints suggest that he did that and more in all experiments. Yet he has stated that no exploratory analysis took place, a claim the historical record clearly shows to be false.

Scientists aren’t supposed to do that sort of thing. Honesty and integrity are considered pretty important, and going by the American Psychological Association’s ethics code, that is even true for psychologists. But hey, it’s just parapsychology.

And here’s where my faith in science takes a hit…

The Bem Exploration Method

The Bem Exploration Method (BEM) is what Wagenmakers and company, with sarcasm unusual for a scientific paper, called the way Bem manufactured his results. They quote from an essay Bem wrote that gives advice on “writing the empirical journal article”. In this essay, Bem outlines the very methods he used in “Feeling the Future”.

Bem’s essay is widely used to teach budding social psychologists how to do science. In other words, they are trained in misconduct.

Let me give some examples.

There are two possible articles you can write: (a) the article you planned to write when you designed your study or (b) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (b).
The conventional view of the research process is that we first derive a set of hypotheses from a theory, design and conduct a study to test these hypotheses, analyze the data to see if they were confirmed or disconfirmed, and then chronicle this sequence of events in the journal article.

I just threw a die 3 times (via random.org) and got the sequence 6, 3, 3. If you, dear reader, want to duplicate this feat, you will need an average of 216 tries. Now, if I had announced 6, 3, 3 in advance, this would have been impressive, but of course I didn’t. I could have said the same thing about any other combination, so you’re probably just rolling your eyes.
Scientific testing works a lot like that. You work out how likely it is that something happens by chance and if that chance is low, you conclude that something else was going on. But as you can see, this only works if the outcome is called in advance.
This is why the “conventional view” is as it is. Calling the shot after making the shot just doesn’t work.
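The arithmetic behind the 216 can be checked with a short simulation. This is a minimal Python sketch that assumes fair dice simulated with Python’s `random` module rather than random.org. The helper names `tries_until` and `called_after` are hypothetical. The sketch computes the exact probability of one specific three-roll sequence and then counts how many attempts it takes to hit 6, 3, 3 when the target is fixed in advance. For contrast, it also "calls the shot" after each roll, which can never miss.

```python
import random
from fractions import Fraction

# Exact probability of one particular ordered sequence of three fair-die rolls.
p_specific = Fraction(1, 6) ** 3    # 1/216
expected_tries = 1 / p_specific     # mean waiting time of a geometric distribution: 216

def tries_until(target, rng):
    """Roll three dice repeatedly until the result equals `target`; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        roll = tuple(rng.randint(1, 6) for _ in range(3))
        if roll == target:
            return attempts

def called_after(rng):
    """A 'prediction' announced only after seeing the roll; it matches by construction."""
    outcome = tuple(rng.randint(1, 6) for _ in range(3))
    prediction = outcome
    return prediction == outcome

rng = random.Random(2011)           # fixed seed so the run is reproducible
runs = [tries_until((6, 3, 3), rng) for _ in range(1000)]
mean_tries = sum(runs) / len(runs)  # should land in the neighborhood of 216

post_hoc_hit_rate = sum(called_after(rng) for _ in range(1000)) / 1000  # always 1.0
print(p_specific, expected_tries, round(mean_tries, 1), post_hoc_hit_rate)
```

A call made in advance succeeds about once in 216 tries. A call made afterwards succeeds every time, and that is exactly why it proves nothing.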

In real life, it can be tricky to find some half-way convincing idea that you can pretend to have tested. Bem gives some advice on that:

[T]he data. Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something —anything —interesting.

There is nothing wrong, as such, with exploring data to come up with new hypotheses to test in further experiments. In my dice example, I might notice that I rolled two 3s and go on to test whether the die is biased towards 3s.
Well-meaning people, or those so well-educated in scientific methodology that they can’t believe anyone would argue for such misbehavior, will understand this passage to mean exactly that. Unfortunately, that’s not what Bem did in Feeling The Future.

And again, he was only following his own advice, which is given to psychology students around the world.

When you are through exploring, you may conclude that the data are not strong enough to justify your new insights formally, but at least you are now ready to design the “right” study. If you still plan to report the current data, you may wish to mention the new insights tentatively, stating honestly that they remain to be tested adequately. Alternatively, the data may be strong enough to justify re-centering your article around the new findings and subordinating or even ignoring your original hypotheses.

The truth is that once you go fishing, the data (or, more precisely, the result) is never strong.
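A small simulation shows why fishing weakens the result. The sketch below is a hypothetical Python illustration, not a reconstruction of Bem’s analyses. Every simulated participant guesses at pure chance (50%). Each one also carries three arbitrary yes/no labels, standing in for things like sex or made-up questionnaire splits, that the fisher is free to split on. The sketch then compares two rates: how often the single planned test reaches p < 0.05, and how often at least one of the seven available analyses does. The p-values use a normal approximation to the binomial, and the sample sizes are assumptions.

```python
import math
import random

def p_two_sided(hits, n, chance=0.5):
    """Two-sided p-value for `hits` out of `n` under a chance rate, via the normal approximation."""
    z = (hits - n * chance) / math.sqrt(n * chance * (1 - chance))
    return math.erfc(abs(z) / math.sqrt(2))

def null_study(rng, n_people=100, trials=20, n_splits=3):
    """Simulate one study with no real effect; return (planned test significant, any test significant)."""
    people = []
    for _ in range(n_people):
        hits = sum(rng.random() < 0.5 for _ in range(trials))   # pure guessing
        labels = [rng.random() < 0.5 for _ in range(n_splits)]  # arbitrary yes/no attributes
        people.append((hits, labels))

    groups = [people]                         # the planned analysis: everyone
    for i in range(n_splits):                 # the fishing: every subgroup on its own
        groups.append([p for p in people if p[1][i]])
        groups.append([p for p in people if not p[1][i]])

    p_values = [p_two_sided(sum(h for h, _ in g), len(g) * trials) for g in groups if g]
    return p_values[0] < 0.05, min(p_values) < 0.05

rng = random.Random(42)
results = [null_study(rng) for _ in range(1000)]
planned_rate = sum(planned for planned, _ in results) / len(results)  # close to the nominal 5%
fished_rate = sum(fished for _, fished in results) / len(results)     # noticeably higher
print(planned_rate, fished_rate)
```

Nothing is going on in these data, yet the fishing analysis "finds" something far more often than 5% of the time. Add outlier removal and composite indexes, and the rate climbs further.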

Bem claimed that his results were not exploratory. Maybe he truly believes that “strong data” turns an exploratory study into something else?
In practice, this advice means that it is okay to lie (at least by omission) if you’re certain that you’re right. I am reminded of a quote by a rather more accomplished scientist. He said about science:

The first principle is that you must not fool yourself–and you are the easiest person to fool. So you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.

That quote is from Richard Feynman. He had won a Nobel prize in physics and advocated scrupulous honesty in science. I imagine he would have used Bem’s advice as a prime example of what he called cargo cult science.

Bayesians to the rescue?

Bem has inadvertently brought this widespread malpractice in psychology into the limelight.
Naturally, these techniques of misleading others also work in other fields and are used there too. But in my personal opinion, other fields are more aware of the problem and more likely to recognize these techniques as scientifically worthless and, when used intentionally, as fraud.
If anyone knows of similar advice given to students in other fields, please inform me.

The first “official” response had the promising title: Why psychologists must change the way they analyze their data by Wagenmakers and colleagues. It is from this paper that I took the term Bem Exploration Method.
The solution they suggest, the new way to analyze data, is to calculate Bayes factors instead of p-values.
They aren’t the first to suggest this. Statisticians have long been arguing the relative merits of these methods.
This isn’t the place to rehash this discussion or even to explain it. I will simply say that I don’t think it will work. The Bayesian methods are just as easily manipulated as the more common ones.

Wagenmakers & co show that the specific method they use fails to find much evidence for precognition in Bem’s data. But this is only because that method is harder to “impress” with small effects, not because it is tamper-proof. Bayesian methods, like traditional methods, can be more or less sensitive.
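The point about sensitivity can be made concrete. The following Python sketch uses a simple binomial Bayes factor, which is not the specific test Wagenmakers and colleagues applied to Bem’s data. It compares H0 (the hit rate is exactly 50%) with H1 (the hit rate is drawn from a symmetric Beta(a, a) prior). The data, 535 hits in 1,000 trials, are made up for illustration. The same data give three different verdicts: a "significant" p-value, a Bayes factor favoring chance under a flat prior (a = 1), and a Bayes factor favoring an effect under a prior that expects only small deviations from 50% (a = 500). The answer depends on a choice the analyst makes.

```python
import math

def log_beta(x, y):
    """Logarithm of the Beta function, computed via log-gamma to avoid overflow."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def bf10(hits, n, a):
    """Bayes factor for H1: rate ~ Beta(a, a) against H0: rate = 0.5.
    The binomial coefficient appears in both marginal likelihoods and cancels."""
    log_m1 = log_beta(hits + a, n - hits + a) - log_beta(a, a)
    log_m0 = n * math.log(0.5)
    return math.exp(log_m1 - log_m0)

def p_two_sided(hits, n, chance=0.5):
    """Two-sided p-value via the normal approximation to the binomial."""
    z = (hits - n * chance) / math.sqrt(n * chance * (1 - chance))
    return math.erfc(abs(z) / math.sqrt(2))

hits, n = 535, 1000               # hypothetical data: a 53.5% hit rate
p = p_two_sided(hits, n)          # about 0.027, "significant" at the 5% level
bf_flat = bf10(hits, n, a=1)      # flat prior: BF10 below 1, i.e. the data favor chance
bf_narrow = bf10(hits, n, a=500)  # prior concentrated near 50%: BF10 above 1
print(round(p, 4), round(bf_flat, 2), round(bf_narrow, 2))
```

Neither method is tamper-proof. Whoever picks the prior, or the p-value threshold, decides how easily the analysis is impressed.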

The problem can’t be solved by teaching different methods. Not as long as students are simultaneously taught to misapply these methods. It must be made clear that the Bem Exploration Method is simply a form of cheating.

Sources:
Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The compleat academic: A career guide (pp. 171–201). Washington, DC: American Psychological Association.
Bem, D. J. (2003, August). Precognitive habituation: Replicable evidence for a process of anomalous cognition. Paper presented at the 46th Annual Convention of the Parapsychological Association, Vancouver, Canada.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology.
Bem, D. J., Utts, J., & Johnson, W. O. (2011). Must psychologists change the way they analyze their data? A response to Wagenmakers, Wetzels, Borsboom, & van der Maas (2011). Manuscript submitted for publication.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (in press). Why psychologists must change the way they analyze their data: The case of psi. Journal of Personality and Social Psychology.

See here for an extensive list of links on the topic. If I missed anything it will be there.

The Saga of Rupert Sheldrake and the Psychic Dog

The saga starts in 1994 with a book whose title makes a not-quite-modest promise: Seven Experiments That Could Change the World.

Sheldrake relates how some pet owners think that their pets can tell when they are coming home, even when that should be impossible. He believes that there is a telepathic link between pet and owner. One of the seven world-changing experiments is meant to demonstrate this behavior.

Three Surveys

The saga continues with three surveys of pet owners in England and California, published in 1997/98. About 50% of dog owners and 30% of cat owners said their pet anticipated the return of a family member. Almost 20% of dog owners said that this behavior started more than 10 minutes before the person’s arrival.

Jaytee

Meanwhile a specific dog by the name of Jaytee was the center of an exhaustive investigation. Jaytee’s owner, Pam Smith, and her parents had noticed as early as 1991 that Jaytee was anticipating Pam’s return. They put this down to routine, as she always returned home from work at the same time. However, the behavior seemed to persist even after Pam was laid off in 1993 and no longer followed a set routine.
Pam Smith learned of Rupert Sheldrake’s interest in psychic pets in April 1994 from a newspaper article. She volunteered for an experiment. In the following month her parents began taking notes.
The first observations seemed promising, so the notes got more detailed and eventually led to several specific tests. In a few of them Pam returned by an unusual mode of transport, so that the dog would not hear the familiar car. In two tests the return time was determined by coin toss.
There was also a test by Austrian state television (ORF) for a documentary.

What Jaytee could do

Based on these observations and tests Sheldrake argued that Jaytee reacted whenever Pam Smith decided to journey home.
Sometimes the dog reacted before Pam Smith started journeying home, but Sheldrake said that, in fact, this was because Jaytee had reacted when Pam prepared to travel home, rather than when the journey actually started. When the dog reacted late, this might have been because the parents had not been paying attention and simply missed the proper time, so that the dog only seemed to have been late.
For some failures other reasons were found such as distractions outside (like a bitch on heat) or the dog being ill.

Such arguments to explain failures away may not be too convincing, but Sheldrake could also point to some successes. Yet those successes relied greatly on the reliability of Pam Smith’s parents as unbiased observers or, in one case, of a film crew.

Videotaped experiments

The next step was to videotape the whole thing. The camera was trained on a certain spot, in front of a window. Going there and looking out was, according to the Smiths, how Jaytee anticipated his owner.
This would take place in several locations. On 30 occasions, Jaytee was left with Pam’s parents, as before. Five times, he was left with Pam’s sister, and 50 times he was left alone.
When Jaytee was with Pam’s sister at her place he spent altogether less time at the window but Sheldrake describes his behavior as being similar to when he was at the parents’ place.
There were also 50 such observations where Jaytee was alone. There he usually did not go to the window at all; he showed his usual response in only 15 cases. However, no graphs or other information are given to support this statement.

Jaytee’s behavior when he was with Pam’s parents is shown here:

Graph from Sheldrake 1998

The 30 trials were separated according to how long Pam Smith was absent.
Each step on the x-axis represents a 10-minute (600-second) period. The y-axis tells us how many seconds of these 10 minutes Jaytee spent at the window. The filled circle/square indicates the first 10 minutes of Pam’s return journey. The lower line, marked with squares, excludes 7 observations where Jaytee spent especially much time at the window before Pam returned; I won’t be using it.

But what does that mean?
For one thing it clearly contradicts Sheldrake’s earlier conclusion. Jaytee does not suddenly go to the window and wait there as soon as Pam starts returning. He simply spent more and more time there.

It does seem as if he had a rough idea of when Pam would return and behaved accordingly, but maybe he was merely reacting to the parents’ anticipation. Even though she was not supposed to tell them, they may have been able to guess from clues like what she took along. Or indeed, Jaytee may have guessed himself.
I must admit, though, that I am not entirely certain that this isn’t simply a statistical illusion.

Weird!

Now things get seriously weird. A normal person, or at least a normal scientist, faced with that data would now seriously reevaluate his assumptions. Maybe when the parents took the notes, they were picking up some different, more subtle clues from the dog. Maybe just looking at when the dog goes to a certain spot is not good enough.
Or maybe the telepathic link was between Pam and her parents in the first place.

That’s, however, not what Sheldrake does. Sheldrake argues that the data confirms his idea. Jaytee spends the most time at the window right before Pam returns, therefore he’s psychic. That’s the argument. No kidding.
It gets worse.
Yes.
Really.

He is aware that if the dog goes to the window more and more, this alone will have him at the window most when Pam returns. And this is why he produced that graph. I took it right from his paper. And, you see, it shows how Jaytee did not go to the window more and more. You don’t see? Good for you.
His argument is simply wrong, but for the morbidly curious, here it is: he compares the short, medium and long absences. For example, when Pam returned after 80 minutes (short), the dog spent an average of about 300 of the last 600 seconds (10 minutes) at the window. But at the 80-minute mark in the medium and long absences, he spent only about 100 or 50 seconds there, respectively.

That’s true and, as I said, might indicate that the dog knew something. It just tells us nothing about whether the dog really did go to the window more and more. That is obvious from the graph anyway, but if you wanted to test it mathematically, you would use a so-called linear regression. Based on an off-hand remark in a different section, this seems to have been done (by Dean Radin) with the expected results, but not included.
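For readers who want to see what such a test looks like, here is a minimal least-squares sketch in Python. The numbers are invented, not Sheldrake’s data. They are hypothetical seconds-at-the-window values for eight consecutive 10-minute periods, for a dog that simply drifts to the window more and more. A clearly positive slope is what "more and more" means in numbers. A real analysis would also report an uncertainty for the slope.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: period number (1 = first 10 minutes) and seconds spent at the window.
periods = [1, 2, 3, 4, 5, 6, 7, 8]
seconds = [20, 35, 30, 60, 75, 90, 120, 150]

slope, intercept = least_squares_line(periods, seconds)
# slope is about 18.2 extra seconds per 10-minute period: a steady upward trend,
# and the last period before the return is, unsurprisingly, the highest.
print(round(slope, 2), round(intercept, 2))
```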

This may seem like the end but the saga is not finished yet.

Randomization

There is one final experiment to be done. By determining a return time for Pam at random and only communicating it to her once she is on her way, we can make sure that Jaytee has no clue when she is going to return. Sheldrake performed 12 such experiments that naturally showed Jaytee being at the window most right before Pam’s return. He still thinks that this indicates telepathy, in complete defiance of the facts and any rational argument.
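A simulation shows why a random return time cannot rescue the argument. This Python sketch is a hypothetical model, not fitted to any of the published data, and it uses a simplified version of Sheldrake’s comparison. The simulated dog knows nothing about Pam. In every 10-minute period it spends a little more time at the window than in the one before (plus noise), and each trial simply ends at a randomly drawn return time. The sketch then checks whether the last period before the return beats the average of the earlier ones. It does in nearly every trial. The growth rate, noise level and range of return times are assumptions.

```python
import random

def restless_dog_trial(rng, growth=15.0, noise_sd=40.0):
    """Seconds at the window in each 10-minute period until a randomly timed return.
    The dog's behavior depends only on elapsed time, never on when Pam will return."""
    n_periods = rng.randint(6, 30)  # hypothetical random return time: 1 to 5 hours
    return [min(600.0, max(0.0, growth * t + rng.gauss(0.0, noise_sd)))
            for t in range(1, n_periods + 1)]

def last_period_stands_out(seconds):
    """Simplified comparison: is the final period above the average of the earlier ones?"""
    earlier = seconds[:-1]
    return seconds[-1] > sum(earlier) / len(earlier)

rng = random.Random(7)
trials = [restless_dog_trial(rng) for _ in range(2000)]
share = sum(last_period_stands_out(s) for s in trials) / len(trials)
# share comes out close to 1: the "anticipation" pattern appears with no telepathy at all.
print(share)
```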

Richard Wiseman

When you hear this saga related elsewhere, you will always hear of Richard Wiseman as well. Wiseman is a British psychologist with a well-known skeptical interest in the paranormal. The seemingly stunning performance of Jaytee that was filmed by the Austrian television crew led him to contact Sheldrake. Pam Smith graciously agreed to take part in his experiment, and Sheldrake allowed him to use the same video-camera.
Wiseman, with the assistance of two colleagues, Matthew Smith and Julie Milton, performed four experiments.

Since Sheldrake had already done all the preliminary groundwork, Wiseman could jump right in. The dog was supposed to do a certain thing, that is, go to the spot at the window right when Pam Smith was about to return. He would simply test if that was the case. Wiseman would stay with the dog, filming him. Smith would go with Pam and tell her to return at the appointed time.
If the dog went to the window in the same 10-minute time frame as the return, the test would be a success.
As we would expect, the dog was much too early.
However, the dog stayed there only a brief moment, maybe because of some distraction outside. It was decided to try again but this time the dog would have to stay at the window for a full two minutes.
Same thing again, of course.
So it was decided to wait until winter for the next try, when there would be fewer distractions outside.
Yet again too early.
In the fourth experiment the dog didn’t ‘signal’ at all.

Of course, Jaytee’s pattern of going to the window more and more is present in this data as well. By Sheldrake’s twisted logic this means that Wiseman found evidence of telepathy. This is where the saga takes an unsavory turn.
Wiseman has bluntly stated that he failed to find evidence of Jaytee being psychic; moreover, he finds Sheldrake’s own data unconvincing. To Sheldrake this is an outrage.
When Wiseman confirmed that his data showed the same pattern as Sheldrake’s, Sheldrake took this as an admission that Wiseman had found telepathy. To Sheldrake, Wiseman is simply being dogmatic and irrational in not saying so.
It may seem hard to believe that anyone could read through Sheldrake’s work and not see the foolishness in his logic, but it isn’t just fans of Rupert Sheldrake who uncritically accept his twisted reality. It is also authors such as Chris Carter and Robert McLuhan, who pride themselves on having investigated such issues. This has, by now, turned into a character assassination campaign against Wiseman.
I must add that Wiseman himself has largely ignored this and never criticized Sheldrake for his irrationality. He has only expressed disagreement and laid out his arguments.

Kane
There was also a small number of tests with another dog called Kane. His pattern seems to have been slightly different but there were only very few tests. That makes it virtually impossible to say anything with confidence.

Conclusion?

You may now think that all this psychic dog business is completely debunked. Well, in a way.
We have seen how Sheldrake’s original hypothesis seemingly collapsed with more stringent tests but one could claim that this was due to error on the part of the scientists.
Wiseman was with the dog, filming him; did that throw the dog off? Sheldrake switched to a different, nonsensical statistical analysis, which may cover up evidence.

And even if Sheldrake’s hypothesis about how the dog expresses telepathy is completely wrong, the dog may still be telepathic but just expressing it in a different way.
There are any number of reasons why the tests would have failed to find telepathy.

Is there anything we could interpret as possibly evidence of telepathy?
When we look at the twelve highest-quality experiments, those with videotape and a random return time, we find that in four of these Jaytee only went to the window when Pam was on her way home. Not any sooner. Maybe that means something?
On the other hand, that only happened when the return time was very early. When the return time was late, he was always too soon (except in one case when he did not do anything at all). That makes it seem much less interesting. It makes it look like the “hits” depend more on the random time being just right than anything else.
The case for telepathy can be strengthened again by subtracting those times where there was some identifiable distraction that may have caused Jaytee to go to the window. Two tests then turn from failures to successes.
But how reliable is such a retrospective judgement? A worrying detail is that the graphs that were published in the parapsychological literature and those contained in Sheldrake’s books show slight differences.
Also there are Wiseman’s results which were all clear failures by this standard.

There’s another issue and it’s the most important one. These few cases that might be telepathy are the result of me going over the results in detail, searching for anything that, at least, doesn’t contradict telepathy. I had to completely ignore Sheldrake’s argument which is simply wrong.
I also had to ignore that the failed tests suggest that this was just chance.

That makes the whole evidence not very convincing. We’d need additional tests to determine if this idea stands up.

The question is, how much effort do we put in before giving up?

Most people, surely most scientists, would look at the track record of telepathy claims. Perhaps they would also look at the track record of Sheldrake, who had a well-deserved reputation for irrationality well before this episode. Based on that, they would dismiss the whole thing from the start.
Wiseman gave it more of a chance than most would. His fate may hold something of an answer to those who wonder why people aren’t more open-minded.

How much effort would you personally expend?

Sheldrake thinks he has good evidence of telepathy in dogs. And yet he, too, has given up on research. One would think that by finding a telepathic dog the science would only begin. One would think that your average scientist would continue by uncovering the physiological basis for it.

If one wanted to pick up the work that Sheldrake dropped one would first have to find a psychic dog. Going by Sheldrake’s surveys, this should be easy, if people don’t fool themselves about their pets being telepathic.
One person did try this: a former high-tech entrepreneur turned podcaster by the name of Alex Tsakiris. He put in quite some effort and money.
His plan was to turn the project over to professional scientists once he had found some suitable dogs but it never happened. He found candidates that seemed promising to him but nothing worked out. Eventually he quietly abandoned the project.

So here’s my personal conclusion: I am going to live my life as if there is no such thing as a psychic pet or telepathic dog or whatever. I am also going to be highly doubtful about anything coming from Rupert Sheldrake.
You draw your own conclusion.

Sources:
Papers by Rupert Sheldrake
Papers by Richard Wiseman
Dogs that know by Alex Tsakiris

Randi’s Prize: Answering Chapter 2

Chapter Two: Eusapia Palladino and the Phantom Narrative

In this chapter we get an overview of spiritualism at the end of the nineteenth and the beginning of the twentieth century. Back then, séances were all the rage. Mediums like Eusapia Palladino produced ghosts made from ectoplasm and performed real magic.

McLuhan compares these to contemporary charlatans like Uri Geller. They must be conjurers of genius, he concludes from their effect on the audiences. He forgets that not everyone was impressed by Geller, and those who were, were impressed by his psychic abilities rather than his magic skills. What sets people like Geller apart from other conjurers is their cunning ability to manipulate people and the media. Look at how he used naive people like Targ and Puthoff to further his reputation. That may require genius of a sort, but most of all it requires ruthlessness.

We are treated to a number of descriptions of miraculous events that took place in the séance room. In many ways it is a repeat of chapter 1. He is incredulous that so many sober people could be fooled. Same old, same old…

We also learn of skeptical magicians who find themselves stumped and even endorse paranormal explanations. McLuhan doesn’t understand why skeptics ignore such admissions and only retell explanations. The reason is simple, of course: explanations are interesting and ignorance is not.
Many people devote themselves to the study of physics where they learn the explanations for a variety of phenomena, such as gravity. No one is interested in a list of people who do not know these explanations.

Eusapia Palladino features prominently in the chapter, as the title implies. We learn that she is caught “cheating” frequently. However, one team of scientists sticks it out with her nonetheless. They figured that she used trickery only some of the time.

Palladino herself seemed aware of this. She explained – and it seemed to be confirmed by observation – that psychokinetic effects occurred during her trance state by a process of will. The initial channel for the will would be physical: if you or I want to lift something we grasp it with our hands and raise it up, and this was a natural impulse in her also. It was by checking this impulse, allegedly, that the psychokinesis could be unlocked. For this reason she is recorded shouting ‘Controllo!’ at moments when she felt the energy building, to ensure that she was properly held and did not release it by reaching out to perform an action manually.

Amazing. And the evidence seems to confirm it even!

Of course, how could it not? They can’t catch her every time. Their very persistence ensures that they must be fooled and yet it is this very persistence that impresses McLuhan. People who caught her cheating and gave up on her were just being shoddy debunkers.
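The arithmetic behind "they can’t catch her every time" is short. Assume, purely for illustration, that she used trickery at every sitting and that each trick was spotted independently with probability d. The chance that at least one trick goes unnoticed over n sittings is then 1 - d^n, which creeps towards certainty as the sittings pile up. The detection rate of 0.9 and the numbers of sittings below are hypothetical.

```python
def chance_some_trick_missed(detect_prob, n_sittings):
    """Probability that at least one of n independent tricks goes undetected."""
    return 1 - detect_prob ** n_sittings

# Hypothetical: observers catch 90% of tricks and keep going for 20 sittings.
p_missed = chance_some_trick_missed(0.9, 20)  # about 0.88

for n in (1, 5, 20, 50):
    print(n, round(chance_some_trick_missed(0.9, n), 3))
```

The more persistent the investigators, the closer they come to being fooled at least once.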

In this chapter McLuhan also develops his concepts of “rational gravity” and the “phantom narrative”.

Rational gravity, people’s tendency to gravitate towards rational explanations, is certainly a real phenomenon. The reason is quite simply that rational explanations work; oohing and aahing over mysteries, not so much. He also suggests that stories change over time to become more compatible with rational explanations. I’m pretty sure that happens, but I cannot understand why McLuhan fails to see that this works in both directions.

Richard Hodgson and Davey staged a fake séance in 1887. That the séance was fake was unknown to the sitters who duly took notes of the proceedings. The descriptions of the happenings were so inaccurate as to prompt this conclusion:

…the account of a trick by a person ignorant of the method used in its production will involve a misdescription of its fundamental conditions…so marked that no clue is afforded the student for the actual explanation.

Richard Hodgson, Proceedings of the Society for Psychical Research, 9, 360, 1894.

In practice, this means that some happenings will be literally inexplicable, not because they were paranormal but simply because the account is garbled.
McLuhan cites the work done by Hodgson and Davey but clearly fails to realize its implication.

The phantom narrative is what McLuhan calls skeptics’ attempts to explain what went on in some séances, that is, how the tricks were performed. McLuhan does not find these speculations convincing. No problem; after all, they are just speculations. We don’t have a time machine; we can’t go back.
The problem is that he takes his doubts about these speculations as evidence for the paranormal: either there is a perfectly convincing and satisfying normal explanation, or the event is evidence for the paranormal.
Showing how one explanation falls short is meritorious but it does nothing to show that another explanation is right.
What’s worse is that at least since Hodgson and Davey, we have a mechanism that explains the inexplicable. Even if nothing paranormal happened, there can still be accounts of this that are inexplicable!

In the end, McLuhan mentions the possibility that modern technologies like infra-red cameras might settle the matter. Yet he has doubts; according to him, it is a question of reconciling ourselves with the idea of psychokinesis.
I found this curious because it suggests that McLuhan is unaware that infra-red videos of séances have been made, and also that modern physical mediums generally do not allow them.

One example would be psychologist Kenneth Batcheldor who filmed himself with some students while they rocked a table in pitch darkness. McLuhan quotes Batcheldor’s claim that during one séance the table levitated for 9 seconds. The filmed séances show nothing of the sort. See for yourself:

There are 3 more parts; go to YouTube to watch them.
Why would I believe that these people rock the table with their minds, or via some spirit, rather than with their hands? I guess I am just not reconciled with the idea of psychokinesis.

Another example that I want to mention is the scandal that took place at Camp Chesterfield. There’s a bit of footage of that to be found, too.

I guess McLuhan would say that even though people clearly have their hands on a table, they might still be using their minds to actually move it. He would probably also say that just because infra-red videos of séances do nothing but uncover trickery does not mean there aren’t real cases, too.
Both would be true. Of course, the logical impossibility of proving a negative isn’t actually evidence for anything either.

Randi’s Prize: Answering Chapter 1

Chapter 1: NAUGHTY ADOLESCENT SYNDROME

Chapter 1 deals with Poltergeist cases. It starts out by retelling some cases that were soundly debunked by various skeptics, including the infamous James Randi. These are related with hardly a counter argument and are so damning towards any idea of paranormal involvement that I actually checked the cover to make sure I was reading the right book.

The conclusion seems inescapable that poltergeist cases are usually caused by a troubled teen out for mischief.

It seems impossible that anyone who acknowledges the work of skeptics on these cases as valid and valuable would go on to argue for their paranormality. And yet McLuhan does exactly that.

Here’s how McLuhan puts his misgivings:

Yet there was something here that didn’t seem quite right to me, and it kept drawing me back. If you read the literature on the subject you’ll find that poltergeist incidents tend to be extraordinarily fraught. The people involved are overcome with panic and confusion, not just for a few hours but for days and weeks on end. This isn’t an effect one expects to result from mere children’s pranks. And as I said before, I often wondered how these children managed to create such convincing illusions and remain undetected.

The fallacy we see twice in this paragraph is a common one, sometimes called by the unwieldy name of the fallacy of the transposed conditional. Never mind the name.

I wouldn’t expect a child’s prank to spook a normal person so thoroughly either. But there’s another thing I wouldn’t expect: to hear about it.

We only hear of those cases where some people got really spooked. Is it possible that some pranks could do that? In my opinion, yes. The child may be gifted with an ability to play with people’s expectations, or the victims may be especially prone to seeing some paranormal influence, or maybe everyone just wants to get into the news and does more to promote the case than to find a solution. Whatever the individual circumstances, the exceptional cases are the only ones we hear about.
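To make the selection effect concrete, here is a small sketch with entirely hypothetical numbers (nothing below comes from McLuhan’s book or any survey): suppose only 1% of pranks cause weeks of panic, but such fraught cases are far more likely to be written up than a prank that is solved in an afternoon. The question McLuhan asks is how often pranks are fraught; the question the literature answers is how often reported cases are fraught.

```python
# All numbers below are hypothetical, picked only to show the selection effect.
PRANKS = 10_000           # pranks played by troubled teens
P_FRAUGHT = 0.01          # share of pranks that cause weeks of panic
P_REPORT_FRAUGHT = 0.5    # chance a fraught case ends up in the literature
P_REPORT_MUNDANE = 0.001  # chance a quickly solved prank gets written up

fraught = PRANKS * P_FRAUGHT
mundane = PRANKS - fraught

# Expected number of each kind of case that we ever get to read about.
reported_fraught = fraught * P_REPORT_FRAUGHT
reported_mundane = mundane * P_REPORT_MUNDANE

share_fraught_overall = P_FRAUGHT
share_fraught_reported = reported_fraught / (reported_fraught + reported_mundane)

print(f"fraught among all pranks:      {share_fraught_overall:.0%}")
print(f"fraught among reported pranks: {share_fraught_reported:.0%}")
```

Under these made-up assumptions, about 83% of the cases we read about are fraught even though only 1% of pranks are. Reasoning from the second number as if it were the first is the mistake.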

This fallacy is central to the chapter and indeed the book.

He just can’t see why normal children would do this, or believe that they could do it at all. He is right in his disbelief, but normal children just don’t get involved in poltergeist cases either.

There’s another problem with what he says. He says the children remained undetected and yet he has just related several cases in which they were caught red-handed. Oh, and what he calls children are all teenagers, one as old as nineteen.

He has more arguments:

There is a large number of similar cases, which suggests a distinct natural phenomenon. I could agree with that, but I would have to point out that the cases also suggest a distinct natural cause: the troubled teen.

And, of course, he mentions cases where no trickery was found. Unfortunately, we already know that sometimes people hoax others. Should we really assume that every hoaxer is found out? If so, then perhaps we should also count unsolved crimes as poltergeist cases.

Some parapsychologists compound the problem by insisting that some cases are “real” even when someone was found hoaxing. It is normal, they say, that people under pressure should use trickery to produce the phenomena that previously happened spontaneously and paranormally.

McLuhan comes closest to addressing this by pointing out that believing investigators expose some cases as hoaxes. So, the argument goes, we should assume that they know what they are doing. If someone can uncover one hoax, he must be able to uncover them all. It’s just like with police detectives. If a detective can solve one case, he or she is able to solve all cases or else it must be alien abduction, right?

Skeptical investigators, meanwhile, deal with too few cases. The more cases someone investigates, the more credible they are. This may seem sensible; practice makes perfect. But who other than a believer will devote so much of their life to this? To the skeptic this is just an endless parade of dysfunctional families. Dragging them into a paranormal investigation is not just a waste of time, it is downright unethical. What they need is a social worker.

Eventually, it will be the truest of believers, the downright delusional, who investigate most cases.

McLuhan does his best to raise doubts about the “normal explanation”, and some of his arguments have merit. If we knew that hauntings were “for real” and were only looking at cases to find out which were probably real and which faked, then he might even have a point. But as it is, we don’t know that. That is what these cases were supposed to establish.

One has to give McLuhan credit. He sees that the cases are not convincing by their nature. Where he fails is in taking the unremarkable as evidence.

Randi’s Prize by Robert McLuhan

At the beginning of this year (2011) I obtained an e-book edition of Randi’s Prize from the author, who was kind enough to hold a giveaway for the new year.

The sub-title is: What Sceptics Say About the Paranormal, Why They Are Wrong, and Why It Matters.

The first half of the book relates supposedly paranormal incidents and experiments, from the point of view of both skeptics and believers, interspersed with the author’s thoughts. As you can guess from the sub-title, the author almost invariably sides with the believers.

The second half of the book deals more with the question of how people can be so wrong, why the paranormal is not accepted, what would happen if it was accepted and related musings. This is mostly opinion. I found that part self-indulgent, boring and hard to get through.

Nevertheless I quite enjoyed the first half. I think it offers a great insight into the reasoning of someone who is seemingly sane and does not have first-hand psychic experiences and still comes to belief. The underlying arguments are, to me, transparently fallacious but they are also, judging from my online debating, common.

Skeptical writers often focus on coming up with explanations for supposedly inexplicable incidents. This is sensible, for these incidents are posed as riddles and an answer is demanded. Yet such answers fail to address the underlying errors in reasoning, chief among them the non sequitur that “unexplained” means “explainable only after a scientific revolution that vindicates age-old superstitions”.

I fear that even some skeptics fail to realize how broadly wrong the underlying reasoning is. What’s worse is that this failure leads to unrealistic expectations among believers.

I will write a series taking on McLuhan’s book in detail. I will not give any detailed normal explanations of the supposedly paranormal. That would be pointless, especially since McLuhan himself does a good job of summarizing skeptical objections. I will merely point out the false conclusions.

That Wiseman Quote

Richard Wiseman is a British psychologist known for his pop-sci books as well as his skeptical interest in paranormal claims. The Daily Mail quotes him thusly:

“I agree that by the standards of any other area of science that remote viewing is proven, but begs the question: do we need higher standards of evidence when we study the paranormal? I think we do.

“If I said that there is a red car outside my house, you would probably believe me.

“But if I said that a UFO had just landed, you’d probably want a lot more evidence.

“Because remote viewing is such an outlandish claim that will revolutionise the world, we need overwhelming evidence before we draw any conclusions. Right now we don’t have that evidence.”

This is frequently quoted by believers in the paranormal as support for their position. The spin is that Wiseman, the skeptic, admits that something paranormal is proven but then resorts to a double standard to deny this.

Is there any truth to this?

Those who know Wiseman will know he is usually a rather rational person. Those who know the Daily Mail will know that it is not the most sober or reliable newspaper, that is to say, a British tabloid. One suspects that the quote was simply mangled beyond recognition.

Surely remote viewing is disproven by any normal standard! Besides, what’s that talk about different standards in different areas of science? I think we can infer that he is not talking about having different standards in different sciences but rather for different claims. Also, he’s misusing the phrase “begs the question”.

And finally, I assume that when he talks about “revolutionising the world” he means the scientific world. That is, he means that this would uncover glaring and massive holes and/or errors in our understanding, rather than that it would simply unlock new technologies.

New technologies, of course, don’t require extraordinary evidence; they are extraordinary evidence. Everyone can test if they work. Those who employ new, effective techniques profit; the rest get left behind.

A misquote?

The misquotation hypothesis receives a partial confirmation on another blog where Wiseman clarifies thusly:

“It is a slight misquote, because I was using the term in the more general sense of ESP — that is, I was not talking about remote viewing per se, but rather Ganzfeld, etc as well.  I think that they do meet the usual standards for a normal claim, but are not convincing enough for an extraordinary claim.”

So he is not talking about remote viewing but instead about something else. Unfortunately it is quite unclear what.

It doesn’t look like there is any real clarification from Wiseman forthcoming. I don’t know what he really meant to say or who was responsible for mangling it, but I will critique it anyway.

“Extraordinary claims require extraordinary evidence” is a truism that I have previously justified. An extraordinary claim is one that is unlikely to be true. And this likelihood is something we judge from our previous knowledge. Taking this into account, we could render the Wiseman quote like so:

“If ESP had still more evidence going for it, it would be proven.”

Or as:

“ESP would be proven if there was not also a lot of counter-evidence.”

Both statements are true but not very sensible.

Wiseman, according to a JREF thread, specifically mentions psychology as a realm where normal claims have as much evidence as whatever he was talking about. The exchange between the poster and Wiseman went so:

Kuko 4000:
The existing RV database does not convince you, ok. But at the same time you seem to say that RV has been proven by scientific standards, now I’m confused. I would really appreciate a “clarification-for-dummies”, so to speak
This could be an issue with my understanding of science or it could be a matter of language barrier, but I’m having problems getting my head around this. Do you mean that by the standards of any other area of science, say biology, evidence of similar quality would be considered scientifically convincing? If so, could you please direct me to the research so I could look it up for myself?
Richard:
yes, it is different standards for different types of claims
so, a normal scientific claims requires a certain level of proof, but a paranormal one requires a higher level
Kuko 4000:
Could you give me an example of a normal scientific claim that in your view offers the same level of proof as the best available evidence for Remote Viewing? This way I could understand the comparison much better.
Richard:
most of psychology!

So let’s look at a psychological effect and contrast it to telepathy.

Normal vs. Extraordinary effect

For a not quite random example, let’s take priming. Say test subjects are given a list of words to read that contains the word carpet. When they are later given the beginnings of words, they are more likely to complete C-A-R as carpet than they otherwise would be. Another example might be dropping the word ‘yellow’ and then finding that subjects are more likely to mention ‘banana’ when asked about fruits.

That’s not a particularly exciting effect. We already know that humans have memory, that practice helps, etc… We also know that much mental processing is unconscious. Priming is a specific effect of unconscious memory (properly called implicit memory).

Establishing priming is only showing a particular behavior of something that undoubtedly exists. I can’t think of any reason why it should be so but neither of any reason why it shouldn’t. Then again I’m not a psychologist.

In a telepathy test one will have at least two participants between whom any normal communication is (supposedly) impossible. Then the experimenter will employ some method to show that communication actually happens. This is where experiments differ.

Parapsychologists argue that if they gain evidence that communication happens while it should be impossible, this must be telepathy. There are problems with that logic but that isn’t the point of this post.

The difference between priming and telepathy should be clear. One has a firm basis, the other, by definition, has none. That’s not even mentioning that people have been trying to establish telepathy for well over a century without managing to convince more than a tiny handful of the validity of the phenomenon.

Bottom line?

I think that Wiseman’s heart is in the right place but the quote is nonsense and those who criticise him for it are justified in doing so. It’s not the only time that he has said something that made little sense to me but it probably happens out of a desire not to call the emperor naked but rather to say something nice.

Extraordinary Claims and Extraordinary Evidence

“Extraordinary claims require extraordinary evidence” is a common skeptical quote and there has already been a lot written about it.

So rather than reinvent the wheel and talk about the history of the statement or give some abstract justification I am going to give an example of how it is applied.

A Medical Example

Think of a medical test like an HIV test or a pregnancy test. Such tests can go wrong. A pregnancy test could say you or your partner is pregnant when she is not. Or it may fail to say so when she actually is. There can be a false positive (aka Type I error) or false negative (aka Type II error). Such medical tests are extensively tested themselves before being marketed. It is therefore well-known how often they are wrong on average.

For our example we will imagine a test that has a 5% chance of a false positive and to keep things simple we will ignore false negatives. What happens when we apply that test to 10.000 people, 20 of whom have the disease and 9.980 who don’t?

We get 20 true positives and 9.980*5% = 499 false positives for a total of 519 positives. That means that although the test proudly proclaims a false positive in only 5% of cases, less than 4% of all the positives we found are true.

Of course, in reality one will only know the test results and not what the actual truth is but one may have a good idea from previous experience. And, of course, one will rarely get a result that is exactly average but let’s not get into probability theory.

Now let’s say we test 10.000 people of whom 1.000 have the disease. We find 1.450 positives, 450 of which are false (9.000*5%). This time over 2/3rds of the positives are true.

In both cases we have the same evidence, namely a positive test result, but in the first case it is very, very likely false but in the second case probably true.

Such situations are encountered frequently in medicine. It is the reason that only risk populations are screened for diseases. Screening everyone would produce an overwhelming number of false positives that would cause needless distress.
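The two scenarios can be checked in a few lines. This is a minimal sketch that keeps the simplification used above (no false negatives, so every sick person tests positive) and adds a loop over a few other base rates to show why screening only risk populations makes sense. The function name is my own, not from any medical library.

```python
def positive_predictive_value(population, sick, fp_rate):
    """Share of positive test results that are true positives.

    Uses the simplification from the text: no false negatives, so every
    sick person tests positive, and each healthy person tests positive
    with probability fp_rate.
    """
    healthy = population - sick
    true_pos = sick
    false_pos = healthy * fp_rate
    return true_pos / (true_pos + false_pos)


rare = positive_predictive_value(10_000, 20, 0.05)       # about 0.039
common = positive_predictive_value(10_000, 1_000, 0.05)  # about 0.69

# Same test, different base rates: how much a positive result is worth.
for sick in (20, 100, 1_000, 5_000):
    ppv = positive_predictive_value(10_000, sick, 0.05)
    print(f"{sick:>5} sick per 10000 -> {ppv:.0%} of positives are true")
```

Nothing about the test changes between the rows; only what we knew beforehand does.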

Think back to the first example where we had 519 positives. If we apply another test, even a better one with only a 1% rate of false positives we would still get 5 false positives besides the 20 true ones. So even though we applied 2 tests, one with a 5% rate of false positives and another with a 1% rate we still only have 80% true positives.
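Here is the two-test calculation as a sketch. Besides the text’s simplification of no false negatives, it makes one more assumption worth stating loudly: the second test’s errors are independent of the first test’s.

```python
def apply_test(sick, healthy, fp_rate):
    """Return how many sick and healthy people test positive.

    Assumes no false negatives (every sick person tests positive), as in the
    text, and that each test's errors are independent of earlier tests.
    """
    return sick, healthy * fp_rate


sick, healthy = 20, 9_980
sick, healthy = apply_test(sick, healthy, 0.05)  # first test: 20 true, ~499 false
sick, healthy = apply_test(sick, healthy, 0.01)  # retest the positives: ~5 false left
share_true = sick / (sick + healthy)
print(f"{healthy:.1f} false positives remain; {share_true:.0%} of positives are true")
```

If both tests tend to be fooled by the same thing, say a paperwork mix-up that follows the sample around, the second test would help much less than this suggests.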

There’s an implication here. Have you ever heard of a case where a tumor was suddenly gone? Well, maybe it wasn’t there in the first place. Even though medicine is well aware of this logic and compensates for it by using tests that are very reliable, we cannot, as a matter of principle, achieve certainty. Especially since real life also throws mixed-up paperwork and human error at us on top of any technical faults.

From The Medical To The General

The same logic can be applied to real life in a very straightforward way. Say someone claims to have developed a perpetual motion machine. Many people have claimed that in the past but it never panned out. This is so extreme that patent offices these days refuse to review patents on such machines.

So at the very least we must assume that there are 1.000s if not 10.000s of perpetual motion claims that are untrue for every one that is true, even if such a thing is possible.

But what does a test look like? Typically there will be a demonstration where the machine is shown in action. This will at least prove that there is some sort of machine. Any claims that exist merely in the form of an April Fools’ press release or suchlike will fall by the wayside. So we can say that this is a true test in that it may be either negative or positive.

On the other hand, a demonstration only proves that some machine exists, it does not prove that this machine is perpetual, so the rate of false positives must be very high indeed.

The next test might be allowing an engineer or physicist to inspect the machine. How weighty is that evidence? Hard to say. If it is a scam, that person may simply be in on it. And even if not, that person may be fooled. How likely is that? That depends very much on how much leeway he or she is allowed. Your chances of seeing through a magic trick while sitting in an audience are pretty much nil, but very high if you have free rein to roam backstage and to set up cameras etc. Don’t mistake knowing how a trick is done for actually seeing through it.

At what point would it be reasonable to believe in perpetual motion? Generally, when a radical change to something considered a law of nature is proposed one would like to have many independent replications in different labs and by different groups. In the case of a perpetual motion machine this should be quite easy.
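To put a rough number on “many independent replications”, we can treat each replication like one of the medical tests. This sketch assumes, purely hypothetically, that a genuine machine always passes, that a bogus one passes any single independent replication 5% of the time, and that only 1 in 10.000 (or 1 in 1.000) such claims is true. Each pass then multiplies the odds by 1/0.05 = 20 (Bayes’ rule in odds form). The function name is my own.

```python
def replications_needed(prior, fp_rate, target=0.99):
    """Smallest number of independent successful replications needed to push
    the probability that a claim is true above `target`.

    Hypothetical model: a true claim always replicates, a false one passes
    each replication with probability fp_rate, independently of the others.
    """
    odds = prior / (1 - prior)
    likelihood_ratio = 1 / fp_rate  # P(pass | true) / P(pass | false)
    n = 0
    while odds / (1 + odds) <= target:
        odds *= likelihood_ratio  # Bayes' rule in odds form
        n += 1
    return n


print(replications_needed(1 / 10_000, 0.05))  # prior: 1 in 10000 claims true
print(replications_needed(1 / 1_000, 0.05))   # prior: 1 in 1000 claims true
```

The weak point is the independence assumption. A rigged machine fools every demonstration the same way, which is exactly why the replications should come from different labs and different groups.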

An entirely different example of an extraordinary claim is Bigfoot. There are many, many species around the world. There’s even a fair number of large mammals that are different enough to be easily distinguishable by an amateur. There also are many fictitious beasts. I couldn’t put a number on it but the ratio of fictitious to real can’t be all that bad.

Two or three hundred years ago one would surely have considered the report of some reasonably credible person as sufficient to establish the existence of Bigfoot. So what has changed?

Nowadays, there are people everywhere. Zoologists have scoured every corner of the globe for new species. Discovering a large mammal means fame eternal. But there simply can’t be many left, especially not in North America. One needn’t even find a live specimen; with modern DNA techniques, finding a few hairs, feces or bones is enough.

Every expedition that fails to find Bigfoot, or even every hiker, equals a negative test result. Of course, there can be false negatives. Just because you failed to find something doesn’t mean it is not there, but it does mean that it is less likely.

You probably have a usual spot for your keys. But when you don’t see them there, you look somewhere else. Before you looked, you thought it most likely that you’d find them there, but when you didn’t see them you adjusted the probabilities. You’ve probably had the experience of finding something in a spot where you had searched before, even exhaustively. There you were the victim of a false negative. Every test can throw a false result!

Back to Bigfoot: if we think of every person who might have found evidence of Bigfoot as a test, then we have a massive number of negative results. Yet we don’t know if that means anything, because we should expect a lot of false negatives even if Bigfoot exists. At the same time we also have a few positive results: people who reported seeing Bigfoot or who found Bigfoot tracks; there are even Bigfoot films. These might be false positives, though: misidentifications, hoaxes and the like.
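Here is how a pile of unreliable negatives adds up, as a sketch. It assumes each search independently finds the thing with some probability if it exists, and never finds it if it doesn’t. Every empty search then multiplies the odds by (1 - detection probability). The Bigfoot numbers are hypothetical: a generous 50% starting belief and a 0.1% chance per hiker or expedition of turning up hair, bones or a carcass.

```python
def after_negative_searches(prior, detect_prob, searches):
    """Probability that something exists after `searches` searches that all
    came up empty.

    Assumes each search independently finds the thing with probability
    detect_prob if it exists, and never "finds" it if it does not exist.
    """
    odds = prior / (1 - prior)
    # Each miss multiplies the odds by P(miss | exists) / P(miss | absent),
    # which is (1 - detect_prob) / 1 under the assumptions above.
    odds *= (1 - detect_prob) ** searches
    return odds / (1 + odds)


# Keys: 90% sure they are in the usual spot, and you'd see them 95% of
# the time if they were there. One empty look drops that to roughly 31%.
keys = after_negative_searches(0.9, 0.95, 1)

# Bigfoot (hypothetical numbers): each single search is nearly worthless,
# but thousands of them are not.
for searches in (0, 100, 1_000, 5_000):
    print(searches, round(after_negative_searches(0.5, 0.001, searches), 4))
```

This deliberately leaves out the positive reports. How far they should pull the other way depends on how often tracks, sightings and films turn out to be hoaxes or misidentifications.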

On balance, what carries more weight, positives or negatives? The answer is, as you might have guessed, the negatives. There is no DNA sample, no carcass, evidence which would be all but conclusive. And we should have such evidence if even a fraction of the positives, i.e., the tracks, the sightings, especially the film, were true.

You’re probably thinking that the medical analogy, all this talk about tests, is getting quite strained here. Well, it’s what I’m thinking anyway, so let’s leave it at that. There’s just one more thing I want to mention.

Think about what would happen if I tested 100 healthy people with a test that has a 5% rate of false positives. You get 5 positives, of course (on average). Test 100 more and you have 10. Test 1.000 more and you have 60! Why! You have an epidemic on your hands!
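A quick simulation of this scenario, as a sketch: everybody tested is healthy, the test has a 5% false-positive rate, and we keep testing in batches. The seed is fixed only so the run is repeatable; the exact counts depend on it.

```python
import random


def count_false_positives(n_healthy, fp_rate, rng):
    """Test n_healthy people who are all healthy; count how many test positive."""
    return sum(rng.random() < fp_rate for _ in range(n_healthy))


rng = random.Random(42)  # fixed seed so the run is repeatable
total_tested = 0
total_positive = 0
for batch in (100, 100, 1_000):
    total_tested += batch
    total_positive += count_false_positives(batch, 0.05, rng)
    print(f"tested {total_tested:>5}, positives {total_positive:>3}, "
          f"rate {total_positive / total_tested:.1%}")
```

The pile of positives keeps growing with every batch, while the rate hovers around the same 5% it started at. Nobody got sick; somebody just kept testing.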

At least that’s how it might seem to a casual observer. The point is, it only takes dedication to build a growing pile of evidence for anything. So whenever you hear of a “statistically significant result”, and newspapers are full of them, remember that.

Science as a whole has ways of dealing with that and will probably not be fooled for long, at least, if reasonably unbiased people take an interest. The lasting facts tend to emerge more slowly, over the course of several years, or even decades. They rarely make newspaper headlines because there is no one event that could be reported on.

Conclusion

I hope this was enough to bring home the principle behind “extraordinary claims require extraordinary evidence”.

The important thing to take away is that claims are not different by their nature but by what we know. Ordinary claims can become extraordinary once counter-evidence piles up and vice versa.

One might equally well say: “Look at all the evidence!”

Don’t just look at the demonstration for that perpetual motion machine, look at all the identical claims in the past that have failed. Don’t just look at those Bigfoot tracks, look at what’s not been found.

We already know a thing or two. There’s nothing open-minded about ignoring that for the sake of evidence that may or may not materialize.