Late last year, Ed Vul, a graduate student at MIT working with neuroscientist Nancy Kanwisher and UCSD psychologist Hal Pashler, prereleased “Voodoo Correlations in Social Neuroscience” on his website. The journal Perspectives in Psychological Science accepted the paper but will not formally publish it until May.
The paper argues that the way many social neuroimaging researchers analyze their data is so deeply flawed that it calls much of their methodology into question. Specifically, Vul and his coauthors claim that many, if not most, social neuroscientists commit a nonindependence error in their research: the final measure (say, a correlation between behavior and brain activity in a certain region) is not independent of the selection criteria (how the researchers chose which brain region to study), which allows noise to inflate their correlation estimates. Further, the authors found that the methods sections clearing peer review were woefully inadequate, often lacking the basic information about how the data were analyzed that others would need to evaluate the work.
In the paper, Vul and his coauthors cite specific studies, many of which were published in leading journals such as Nature and Science, going so far as to call some of the studies “entirely spurious.” Suddenly, a number of researchers found themselves under attack. The paper began filling neuroscientists’ inboxes. Two groups of neuroimaging scientists, shocked by the speed with which this paper was being publicly disseminated, wrote rebuttals and posted them in the comments section of several blogs, including Begley’s. Vul followed up in kind, linking to a rebuttal of the rebuttals in the comment sections of several blogs. This kind of scientific discourse — which typically takes place in the front matter of scholarly journals or over the course of several conferences — developed at a breakneck pace, months before the findings were officially published, and among the usual chaos of blog comments: inane banter, tangents, and valid opinions from the greater public.
Find links to the pro/con rebuttals to Vul in the gray sidebar of the Seed article. There is also a link to a PDF of Vul’s paper.
Now back to the interview of Vul by Jonah Lehrer in Scientific American:
LEHRER: What is a “voodoo correlation”?
VUL: We use that term as a humorous way to describe mysteriously high correlations produced by complicated statistical methods (which usually were never clearly described in the scientific papers we examined)—and which turn out unfortunately to yield some very misleading results. The specific issue we focus on, which is responsible for a great many mysterious correlations, is something we call “non-independent” testing and measurement of correlations. Basically, this involves inadvertently cherry-picking data and it results in inflated estimates of correlations.
To go into a bit more detail:
An fMRI scan produces lots of data: a 3-D picture of the head, which is divided into many little regions, called voxels. In a high-resolution fMRI scan, there will be hundreds of thousands of these voxels in the 3-D picture.
When researchers want to determine which parts of the brain are correlated with a certain aspect of behavior, they must somehow choose a subset of these thousands of voxels. One tempting strategy is to choose voxels that show a high correlation with this behavior. So far this strategy is fine.
The problem arises when researchers then go on to provide their readers with a quantitative measure of the correlation magnitude measured just within the voxels they have pre-selected for having a high correlation. This two-step procedure is circular: it chooses voxels that have a high correlation, and then estimates a high average correlation. This practice inflates the correlation measurement because it selects those voxels that have benefited from chance, as well as from any real underlying correlation, pushing up the numbers.
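The circularity Vul describes is easy to reproduce in a toy simulation (this sketch is not from the paper; the subject and voxel counts are illustrative). Here every voxel is pure noise with a true brain-behavior correlation of zero, yet the non-independent analysis still reports a large correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects = 20       # subjects in a hypothetical study
n_voxels = 50_000     # voxels in the scan
# pure noise: the true brain-behavior correlation is 0 for every voxel
behavior = rng.standard_normal(n_subjects)
voxels = rng.standard_normal((n_voxels, n_subjects))

def corr_with_behavior(data, beh):
    """Pearson correlation of each voxel's signal with the behavioral score."""
    b = (beh - beh.mean()) / beh.std()
    v = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    return (v * b).mean(axis=1)

r = corr_with_behavior(voxels, behavior)

# non-independent analysis: select voxels with r > 0.6, then report
# the average correlation *within that selected set*
selected = r > 0.6
print(f"voxels selected: {selected.sum()}")
print(f"reported (circular) correlation: {r[selected].mean():.2f}")
# the reported value comes out far above the true correlation of 0.0
```

With 20 subjects, sampling noise alone gives each voxel's correlation a standard deviation of roughly 0.23, so among 50,000 null voxels a few hundred will clear the 0.6 threshold by luck, and averaging within that lucky set produces an impressive-looking number out of nothing.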
One can see closely analogous phenomena in many areas of life. Suppose we pick out the investment analysts whose stock picks for April 2005 did best for that month. These people will probably tend to have talent going for them, but they will also have had unusual luck (and some finance experts, such as Nassim Taleb, actually say the luck will probably be the bigger element). But even assuming they are more talented than average — as we suspect they would be — if we ask them to predict again, for some later month, we will invariably find that as a group, they cannot duplicate the performance they showed in April. The reason is that next time, luck will help some of them and hurt some of them — whereas in April, they all had luck on their side or they wouldn’t have gotten into the top group. So their average performance in April is an overestimate of their true ability — the performance they can be expected to duplicate on the average month.
It is exactly the same with fMRI data and voxels. If researchers select only highly correlated voxels, they select voxels that “got lucky,” as well as having some underlying correlation. So if you take the correlations you used to pick out the voxels as a measure of the true correlation for these voxels, you will get a very misleading overestimate.
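One standard remedy, sketched here under the same toy null setup as an assumption rather than as the paper's exact procedure, is to keep selection and measurement independent: pick the voxels on one half of the subjects, then estimate the correlation on the held-out half. Like the stock pickers asked to predict a later month, the lucky voxels cannot repeat their performance:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_voxels = 40, 50_000
behavior = rng.standard_normal(n_subjects)
voxels = rng.standard_normal((n_voxels, n_subjects))  # pure noise again

def corr_with_behavior(data, beh):
    """Pearson correlation of each voxel's signal with the behavioral score."""
    b = (beh - beh.mean()) / beh.std()
    v = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    return (v * b).mean(axis=1)

# select voxels on the first half of the subjects...
half = n_subjects // 2
r_select = corr_with_behavior(voxels[:, :half], behavior[:half])
selected = r_select > 0.6

# ...then estimate the correlation on the held-out half
r_test = corr_with_behavior(voxels[:, half:], behavior[half:])
print(f"correlation in selection half: {r_select[selected].mean():.2f}")
print(f"correlation in held-out half:  {r_test[selected].mean():.2f}")
# the held-out estimate falls back to about 0, the true value
```

Because the held-out subjects played no role in choosing the voxels, luck no longer systematically favors the selected set, and the estimate is unbiased.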
This, then, is what we think is at the root of the voodoo correlations: the analysis inadvertently capitalized on chance, resulting in inflated measurements of correlation. The tricky part, which I can’t go into here, was that investigators were actually trying to take account of the fact they were checking so many different brain areas—but their precautions made the problem that I am describing worse, not better!
VUL: The debate we have spurred is quite interesting to watch. At first some of the authors whose papers we criticized challenged our statistical point, but, for good reason, that line of argument doesn’t seem to have caught on. Right now, so far as I know, everyone seems to concede that the analysis used in these studies was not kosher, in the sense that the correlation numbers it produced cannot be taken at face value. Instead, we are mostly hearing a couple of other arguments at this point.
One is that the correlation values themselves don’t really matter — it’s just the fact that there is a correlation in a certain spot in the head that matters. I don’t agree with this at all; we think many of these papers appeared in such high-profile places precisely because editors were (justifiably) impressed by big effects. If one can account for, say, three quarters of individual differences in something important such as anxiety or empathy — obviously, that’s a real breakthrough, and it tells you not only where future research ought to look, but also where it shouldn’t. On the other hand, if it’s just 3 percent of the variance, that’s a whole lot less impressive, and may reflect much more indirect kinds of associations.
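The gap Vul describes follows from the standard rule that a correlation of r accounts for r² of the variance in the other variable, so "three quarters of individual differences" corresponds to a correlation near 0.87, while 3 percent corresponds to one near 0.17 (the r values here are back-calculated for illustration, not quoted from the interview):

```python
# variance explained by a correlation r is r squared
for r in (0.87, 0.17):
    print(f"r = {r:.2f}  ->  variance explained = {r**2:.0%}")
```

This is why inflating a correlation from 0.17 to 0.87 does not merely shift a number: it turns a weak, indirect association into what looks like a breakthrough.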
Read the full interview, if you like. [ link ]