The Ganzfeld Experiments: Quality – Conclusion

Previously, I assessed the quality of evidence provided by the ganzfeld experiments. I found that the typical ganzfeld experiment could only be considered to yield evidence of Moderate quality, and further that it was necessary to downgrade the entire body of studies at least once, for heterogeneity.
That leaves the overall quality of evidence for the ganzfeld trials as Low. There is no way I could justify a higher grade, but one could certainly justify a lower one. One could justify a double downgrade for heterogeneity, given its serious implications. One could also justify downgrading for publication bias. Moreover, I did not examine the individual studies in detail. Since I had already found no reason to upgrade, such an examination could only have uncovered further reasons to downgrade.
When two (or more) factors are borderline, one should downgrade for at least one of them.

Put like that, calling the evidence Low Quality is a very favorable assessment.
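The grading logic above amounts to simple level arithmetic. The following Python sketch is only an illustration under my own simplifying assumptions: four ordered levels, each downgrade or upgrade moves exactly one step, and the result is clamped to the scale. The function name `grade` is hypothetical and not part of any official GRADE tool.

```python
# Hypothetical sketch of GRADE-style level arithmetic, not an official GRADE tool.
LEVELS = ["Very low", "Low", "Moderate", "High"]


def grade(start: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Move `start` one level per downgrade/upgrade, clamped to the scale."""
    index = LEVELS.index(start) - downgrades + upgrades
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# Typical ganzfeld study judged Moderate, downgraded once for heterogeneity.
print(grade("Moderate", downgrades=1))  # Low
# A double downgrade for heterogeneity, or one extra for publication bias.
print(grade("Moderate", downgrades=2))  # Very low
# Starting from High, as for a medical RCT: heterogeneity plus one of the
# remaining borderline calls lands in the same place.
print(grade("High", downgrades=2))  # Low
```

The point of the last line is that treating the design as High Quality does not rescue the grade once the borderline calls are counted.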

The best argument for a better grade would be to claim that the ganzfeld design, as it stands, should be regarded as High Quality, like a medical RCT. I’ve already laid out why I don’t agree with that. Even granting it would simply create another borderline case, and at some point you can’t ignore all these borderline calls and must downgrade for at least one of them.

But what does that mean?

Quality levels and their current definitions:
High: We are very confident that the true effect lies close to that of the estimate of the effect.
Moderate: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect.
Very low: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect.

Note well what this does not mean. It does not mean that there is no effect. It means that we can have no confidence that there is one. But equally it means that we can have no confidence that there is none.

And that simply means that everyone will retain whatever opinion they had beforehand. This leads us to another curious feature of parapsychology in general.
Parapsychologists say that the hit-rate should be 25%, since the receiver chooses among four candidates, only one of which is the actual target. Any conventional cause for a deviation is not of interest and should be regarded as a bias. The basic ganzfeld design has been intensely scrutinized for any such potential bias and modified to rule it out.
This puts parapsychologists in a position to make a solid and credible argument that, by any conventional expectation, the hit-rate must be 25%. And it is this that lends credence to the argument that any systematic deviation, any effect, must be due to something amazing: that some worthwhile scientific discovery is waiting there.
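To make the chance baseline concrete: if the receiver picks one of four candidates, then under the null hypothesis each trial is a hit with probability 0.25, and the number of hits follows a binomial distribution. The sketch below computes an exact one-sided tail probability using only the standard library. The trial counts are hypothetical and not taken from any study. Note that a small probability only says that the chance expectation is probably wrong. It says nothing about why.

```python
from math import comb

CHANCE = 0.25  # one target among four candidates (three decoys)


def binomial_tail(hits: int, trials: int, p: float = CHANCE) -> float:
    """Exact one-sided probability P(X >= hits) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical figures for illustration only, not taken from any study.
hits, trials = 33, 100
print(f"observed hit-rate: {hits / trials:.0%}")
print(f"P(at least {hits} hits if chance is 25%): {binomial_tail(hits, trials):.4f}")
```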

Unfortunately, the sheer solidity of the theoretical argument means that few mainstream scientists will be swayed by low quality evidence. Curiously, many vocal parapsychologists seem unable to understand this.
They accuse mainstream science of being “dogmatic”, yet their failure to convince the mainstream with low quality evidence is a direct consequence of the solidity of their own argument that, by all prior evidence, the hit-rate must be 25%.

  1. Parapsychologists work hard and convince people that the hit-rate should be 25%.
  2. Parapsychologists accuse people of being dogmatic for believing it.

It’s one of those things about parapsychology that do not make the slightest bit of sense. Such displays of irrationality are ultimately responsible for parapsychology’s bad reputation. Low quality evidence is normal enough. That’s why there is such a thing as the GRADE approach.
If someone appears irrational, you probably won’t attempt a rational dialogue. And if you try anyway and your open-mindedness is rewarded with accusations of dogmatism and even dishonesty, then you will probably give up.
It is this that leads mainstream science, for the most part, to shun parapsychology. That, in turn, leads prominent parapsychologists to double down and declare that there is a “taboo” against dealing with them. But that’s a different matter.

Does GRADE work?

That’s a very good question. I hope it occurred to you.

One thing we would like to know is how reliable the assessment is. How much agreement is there between different raters? The answer is: not as much as we’d like. The rating involves human judgement, which is one reason that the GRADE approach demands transparency.
I have tried my best to make the reasoning as clear as possible and have already discussed where others might differ in their assessment.

The other thing we would like to know is how solid the conclusion is. Say you have 1,000 different apparent effects, all based on evidence rated Low or Very Low. For how many of them would the true effect turn out to be substantially different from the estimate?
The answer: No one knows, yet.

In relation to the ganzfeld, however, we can say that the assessment would have been spot on. I’ve talked about a 33% hit-rate because that figure is often claimed but, in truth, the hit-rate has varied wildly. When some of the earliest experiments were analyzed in 1985, the hit-rate was 37%; when studies from 1987 to 1997 were analyzed, it was only 27%.
In the latter case it was, of course, the parapsychologists who were not impressed and argued that this was due to certain specific biases. That’s something for a later post.


So, ultimately, we find that the ganzfeld evidence is of Low Quality, but that should not come as a surprise to anyone.
The more important lesson is probably that this is so by the standards of mainstream medical science. Other sciences may have a lower standard; I’m thinking of psychology in particular. Indeed, psychologist Richard Wiseman has asserted that the ganzfeld is solid by the standards of his field. As far as I can tell, however, his colleagues are, on the whole, not particularly impressed, which would seem to contradict his assessment.
In any case, accusations of a double standard clearly lack merit.

What I find more worrying are the problems that parapsychology has in interpreting the evidence and drawing supportable conclusions, regardless of quality considerations. Low Quality evidence is not unusual, but the irrationality surrounding the whole issue is.

What If: High Quality Evidence?

I think many parapsychologists have very unrealistic expectations about that. Remember that all that could be concluded from the ganzfeld experiments is the presence of some unexplained effect causing the chance expectation to be wrong.
High quality ganzfeld evidence would just indicate that there is probably something worth studying there. Some scientists would become interested enough to look into it. Most would simply be too busy with whatever they are already doing.
The interested scientists would then start by repeating the original, standard ganzfeld experiment to reproduce the effect in their own lab. Once they had succeeded in that, they would study the effect. If they found themselves unable to recreate the effect, they would eventually give up. If you can’t create an effect, you can’t study it, even if you are convinced it exists.
And that’s all that would happen.

The situation currently, with low quality evidence, is not fundamentally different! It just means that fewer people are going to think they can recreate the effect or that the effect is due to something worthwhile.
This idea that high quality evidence for psi would lead to some sort of “paradigm shift” because of some single experiment is just nonsense. That kind of thing has never happened before and I don’t see how it could happen even in principle.

This concludes the GRADE business, but not the quality series. There are some more issues we need to talk about, such as what parapsychologists had to say about all this.

Examples of mishaps

I want to give some examples of things that actually went wrong in the ganzfeld experiments. I hope they illustrate what these rather abstract biases may look like “on the ground”.
Do not take these examples as a reason to dismiss the experiments. You can take the Low Quality as a reason for that, but these examples are just, you know, life. Things don’t always go perfectly.
Parapsychology doesn’t stand apart in that respect.

These results differ slightly from those reported earlier since an independent check of our database by Ulla Böwadt found an extra hit in study V. Two trials in study IV (a hit and a miss) had also been included although the experimenters apparently were not agreed on this prior to the results. Their exclusion would make however virtually no effect on the final figures.
A review of the ganzfeld work at Gothenburg University by Adrian Parker, 2000

This shows how individual trials may simply fall through the cracks. It would have been completely justifiable not to include those two trials. One has to wonder whether the media demonstration, in particular, was conducted with the same diligence as the regular trials.
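Why would dropping a hit and a miss make “virtually no effect”? Removing one hit and one miss moves a hit-rate of h/n to (h - 1)/(n - 2), which is a small shift whenever n is reasonably large. A quick check follows, using a hypothetical study of 100 trials, since the quoted passage does not give the Gothenburg counts.

```python
def hit_rate(hits: int, trials: int) -> float:
    """Proportion of trials scored as hits."""
    return hits / trials


# Hypothetical study: 30 hits in 100 trials (not the actual Gothenburg data).
hits, trials = 30, 100
before = hit_rate(hits, trials)
after = hit_rate(hits - 1, trials - 2)  # exclude one hit and one miss
print(f"{before:.3f} -> {after:.3f} (change {after - before:+.3f})")
```

At a hit-rate of exactly 50%, removing a hit and a miss leaves the rate unchanged. The further the rate is from 50%, and the smaller the study, the larger the shift.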
Somewhat similar problems are known in medicine. In medical studies, patients may drop out; they quit the experiment. One must suspect that it will usually be those who are disillusioned with the treatment offered or, perhaps, those who see no need to continue because they feel cured. In either case, this will bias the results. This so-called attrition is considered by GRADE.
The problem is also similar to transcription errors. You probably won’t be surprised to learn that people have actually been maimed and killed because of doctors’ illegible handwriting, but such errors are also a problem in science. Bias may be introduced into a study simply through faulty data entry. On occasion, published values have had to be corrected in ganzfeld experiments, and particularly in meta-analyses.


An amusing, recent example is an analysis published in 2010: Meta-analysis of free-response studies, 1992-2008: assessing the noise reduction model in parapsychology by Storm L, Tressoldi PE, Di Risio L.
The article contains more errors than I care to point out; this is about just one of them (one of the less embarrassing ones).
As part of their analysis, they rated the ganzfeld experiments for quality. What they did wrong there is for a later post. The detailed ratings were obtained from the authors by three scientists, Jeffrey N. Rouder, Richard D. Morey and Jordan M. Province, who argued that improper randomization could explain part of the results. More on that later.
One of the counter-arguments by the original authors was that the ratings, obtained from them, contained errors!


After about 80% of the sessions were completed, it was becoming clear that our hypothesis concerning the superiority of dynamic targets over static targets was receiving substantial confirmation. Because dynamic targets contain auditory as well as visual information, we conducted a supplementary test to assess the possibility of auditory leakage from the VCR soundtrack to R. With the VCR audio set to normal amplification, no auditory signal could be detected through R’s headphones, with or without white noise. When an external amplifier was added between the VCR and R’s headphones and with the white noise turned completely off, the soundtrack could sometimes be faintly detected.
Psi Communication in the ganzfeld: experiments with an automated testing system and a comparison with a meta-analysis of earlier studies by Honorton et al., 1990

This means that the receiver (R) may have heard the soundtrack of the correct target, which would certainly have allowed him or her to make the correct guess. That’s potentially serious. The counter-argument, however, also sounds quite convincing: there was no drop in the hit-rate after the sound system was modified to rule that out.

That’s certainly quite suggestive, but keep in mind that it is not high quality evidence. We have a batch of trials conducted before the sound system was fixed and a batch afterwards, but there is no direct, randomized comparison.
And what does the unchanging hit-rate indicate anyway? Maybe they just failed to remove the problem with the modification.
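Statistically, the “no drop in hit-rate” argument is a comparison of two proportions. Below is a sketch of a standard pooled two-proportion z-test with hypothetical counts, since the report’s actual before/after split is not given here. Two caveats apply. A non-significant difference does not show that the two periods are equivalent, especially with modest numbers of trials. And because trials were not randomly assigned to the before and after periods, any difference (or lack of one) could also reflect other changes over time.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(h1: int, n1: int, h2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumes the pooled hit-rate is strictly between 0 and 1.
    """
    p1, p2 = h1 / n1, h2 / n2
    pooled = (h1 + h2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts for trials before and after the sound-system fix.
z, p = two_proportion_z(h1=40, n1=120, h2=38, n2=120)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```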

For what it’s worth, when that lab closed, the equipment was moved elsewhere and used by different experimenters. They were unable to recreate the effect.
You could take that as evidence that the sound system may have played a role, but once again: Low quality evidence. There certainly were other documented potential biases in the experiments at that lab which may not have been present at the new location.


The Ganzfeld Experiments: Quality – Part 1

Previously I have discussed what might be concluded from the ganzfeld experiments. In this post I will address a notoriously contentious issue: The quality of the evidence.
Practitioners and proponents of parapsychology often voice concern that their experiments are held to a different standard than experiments in ordinary science. To alleviate these concerns, I will be scrupulous in applying the standards of medicine.
Medical experiments are in many ways similar to parapsychological ones. They are conducted on humans, with all the messiness and limitations that this entails. There are also some important differences, of course. I will discuss those as I go along.
Medicine, more than any other science, faces the task of drawing conclusions from multiple studies, at least if the goal is to give patients truly evidence-based advice. Guidelines have been developed for that. Of course, these guidelines are themselves based on research into which methods are best suited to drawing correct conclusions, and they are still evolving.
I will mainly be relying on the Cochrane Handbook (v 5.1.0). Much of the handbook is not relevant to our purpose since we are not preparing a Cochrane review. The relevant material for this post is to be found in Part 2 of the handbook.

What is Quality of evidence?

It should be obvious what is meant by quality of evidence: how reliable the evidence is. If it is of high quality, then we probably won’t have to revise our judgement later on. If it is of low quality, then we can’t be certain of that. The conclusion we draw from it might still be true, but we would probably not be willing to bet the farm on it.

Let’s look at a practical example of how a medical trial might go wrong:
Experimenter: We have just given you a very expensive treatment. Do you feel any better?
Patient: I… don’t know?
Experimenter: It was really very expensive. I’m sure it must have worked. You do feel better, right?
Patient: Err…. Sure.

Clearly, if that were to happen in a medical trial, the results would be completely worthless for anyone outside of the marketing department. You probably already know the remedy to the problems here: Blinding and a control group.
The control group will typically receive a placebo. That is something that looks almost exactly like the real treatment but with the essential part removed, for example a pill without the drug. Often the control group will receive the old treatment instead, because one cannot ethically let people go without treatment. Besides, one is usually more interested in whether a new treatment is better than the current one and should replace it, rather than in whether it is better than nothing at all.
Blinding means concealing who belongs to the treatment group and who to the control group. A double-blind trial is one where neither subjects nor experimenters know who is in which group. This, in turn, means that they cannot be influenced by their personal preferences.

But hold on a moment. Just because someone is not blinded does not mean that he or she will go to such lengths to influence the patient. Maybe the patient will not be influenced at all. If we know that a trial was unblinded, we know that there was a risk of getting a wrong result. We don’t know if that actually happened, though.

Risk of bias

The Cochrane Collaboration talks about “risk of bias”. A bias in science is something that causes a result to be false. It has nothing to do with the personal motivations of the people involved. A completely impartial experimenter may still conduct a biased study, while a biased experimenter may conduct an unbiased one.

The Collaboration talks about risk of bias rather than quality because in some cases it may be impossible to conduct a study with all the necessary safeguards. In such cases it would be unfair to call it a low quality study, but, unfair or not, such a study would still be at risk of yielding false results, i.e. of being biased. Another reason is that study quality is often regarded as being about more than just the reliability of the results. Other factors, such as properly following ethical guidelines, also come into play.

According to the Collaboration’s suggested scheme, studies are assigned either a ‘Low risk’ of bias, ‘High risk’ of bias, or ‘Unclear risk’ of bias. Note that there is no category for no risk. This will probably come as a surprise to many, particularly fans of parapsychology. When mainstream scientists regard a parapsychological study as potentially flawed, that is not because of some prejudice toward parapsychology, it is because they apply the same standard as is applied to all studies.
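The Handbook also suggests how to summarize the domain-level judgements for a single study. A study is at low risk only if every key domain is at low risk, at high risk if any key domain is at high risk, and at unclear risk otherwise. The sketch below encodes that rule as I understand it. The domain names and judgements are illustrative, the function name is hypothetical, and the choice of which domains count as “key” is left to the reviewers. Note that, as in the Handbook, there is simply no “no risk” value to return.

```python
from enum import Enum


class Risk(Enum):
    LOW = "Low risk"
    UNCLEAR = "Unclear risk"
    HIGH = "High risk"  # note: there is no "no risk" member


def overall_risk(key_domains: dict[str, Risk]) -> Risk:
    """Summarize per-domain judgements for one study (a sketch, not Cochrane software)."""
    if not key_domains:
        raise ValueError("at least one key domain must be judged")
    judgements = set(key_domains.values())
    if Risk.HIGH in judgements:
        return Risk.HIGH
    if Risk.UNCLEAR in judgements:
        return Risk.UNCLEAR
    return Risk.LOW


# Hypothetical judgements for a single study.
study = {
    "random sequence generation": Risk.LOW,
    "allocation concealment": Risk.UNCLEAR,
    "blinding of participants and personnel": Risk.LOW,
    "incomplete outcome data": Risk.LOW,
}
print(overall_risk(study).value)  # Unclear risk
```

A single unclear domain is enough to prevent a “Low risk” summary. This is one reason why idealized reports that omit details tend to produce unclear ratings.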

The basic reason for that is that we can never rule out human error. We can never presume to know everything. In particular, the risk of bias is assessed only by reading the report from the experimenters. Such reports are always idealized. If you ever read a scientific paper reporting an experiment, you will almost never find any mention of equipment malfunctions or similar mishaps. Of course, they do happen, but every student learns not to report them. They are not supposed to whine about how hard the experiment was to get right or what personal drama they experienced. No one wants to hear that. Only events that might compromise the results are to be reported.
That leaves a lot to the judgement of the experimenters who have every reason not to mention things which might reflect badly on them or imply that they wasted their time.

By the way, the FDA does conduct on-site inspections to ensure that proper protocol is followed but that is only for some trials connected to drug licensing and the like. Regulatory bodies around the world will not accept anything but high quality evidence. Be aware, though, that alternative medicine is not held to these standards.

Bias vs. Conflict of interest

Personal motivations, the biases of the people involved, are not assessed.

Scientific journals, particularly in medicine, require researchers to declare conflicts of interest. That means mainly financial conflicts, such as when the researcher owns a considerable amount of stock in a company that will benefit from a certain result, or is employed or funded by one. Financial benefits can be reaped even if the result is unmasked as false within a short time.
Other conflicts of interest, such as personal friendships or family associations, generally receive less attention. Religious or ideological commitments are generally not considered at all.

Studies do indeed show that, for example, industry-financed trials are somewhat more likely to find a beneficial effect for the drug. Extra care is therefore advised when conflicts of interest exist. If the majority of studies being looked at are industry-funded, then one may even take this as evidence that publication bias is likely.

I am going to spare a few words on conflict of interest related to the ganzfeld and let that be enough.

Declaring conflicts of interest is not done in parapsychology. I don’t know why, but I think it is not widely practiced in psychology either (do not confuse psychology with psychiatry or psychotherapy, though).
With regard to the ganzfeld experiment, there may be financial conflicts of interest related to funding. Parapsychological research is financed mostly by private grants and donations. I suspect that receiving such money is closely related to being able to obtain positive results. However, whether, or how strongly, such factors correlate with positive results is, of course, unknown.
Perhaps more serious are the religious or spiritual convictions of parapsychologists. I have previously analyzed what one can conclude from the ganzfeld experiments and found that they do not allow conclusions about the correctness of any world-view. Nevertheless, parapsychologists argue otherwise and may, to that extent, be strongly committed to obtaining certain results.

Fans of parapsychology and parapsychologists often take it for granted that people will go to any length, even falsifying their research, to uphold their world-view. Of course, that accusation is only made against people who are unable to reproduce parapsychological results.
I personally do not know of anyone actually doing that and rather suspect it to be rare outside of parapsychology. However, those accusations make it clear that some parapsychologists regard this as a completely obvious and natural thing to do, which in turn makes me suspect that they themselves may be prone to such behavior.

Such considerations are not evidence-based, though, and so I will not take them into account when assessing the evidence. One could point to past failures of parapsychology as a sort of indirect evidence for such problems and, of course, many people do that in practice when they consider parapsychology to be ridiculous. Ultimately, that is no different in principle from displaying knee-jerk suspicion of industry-funded studies.

To be continued…