Academic publishing versus retraction, or: how much Twitter knows about the market

July 27, 2015

Papers have mistakes all the time. If they’re smallish mistakes that don’t threaten the main work, oftentimes the author is told to write an erratum, which the journal publishes in a subsequent issue. Other times the problems are more substantial and might warrant retracting the paper altogether.

For example, if a paper is found to contain fraudulent data, retraction is called for. Even when the authors weren’t intentionally fraudulent, if the claims they made are outlandish, implausible, and unreproducible, there may still be just cause to seriously question those claims and retract. On the other hand, if a paper that was once deemed cutting-edge and new turns out, in retrospect, not to be very innovative at all, then typically no retraction is called for; the paper is simply ignored. When exactly retraction happens, and how, probably depends on the journal, and even the editor.

Today I want to tell you a story in which that process seems to have gone badly wrong.

Elsevier, the academic publishing giant, owns a journal called the Journal of Computational Science (JoCS), which back in 2010 published a paper called Twitter Mood Predicts the Stock Market (preprint version here). It got a lot of press, and according to Google Scholar it has been cited 1,300 times. According to media reports, the paper showed that Twitter data, enhanced with emotional tags, could predict the Dow Jones Industrial Average with an accuracy of 87% (whatever that means).

Full disclosure: I haven’t read the paper, but even so I don’t believe its results. People in hedge funds have been trawling for signal in all sorts of news and social media text for a long while, and there’s simply no way they would have ignored such a strong signal all the way into 2008. If it had been real, they wouldn’t have ignored it, and trading on it would have made it fade. But I don’t think it was ever real in the first place.

Anyway, that’s my personal intuition about this, but I could be wrong! That’s what’s cool about academic publishing, right? That we could just be super wrong and people can say what they think and then we get to have this open conversation?

Well, sometimes. What actually happened here is that a bunch of people tried to replicate the results, which got harder once Twitter started charging lots of money for its data. A hedge fund even tried a Twitter strategy similar to the one outlined in the paper. Everyone lost money*.

After a while, one of these frustrated would-be traders, whom we will call LW, decided to write a letter to the editor complaining about the original paper. He even blogged about his letter here. In his letter he made two complaints. First, that the results were consistent with data mining, which is to say that there’s statistical evidence the authors cherry-picked their data. Second, that if the results were true, they would violate the “Efficient Market Hypothesis” and would surprise a bunch of traders with many decades of experience.

So far, so good. A paper is published, people are complaining that the results are wrong or extremely implausible. This is what academic publishing is for.

Here’s what happens next. The editor sends the letter out to reviewers. Two of the three reviewers respond, and I’ve got a copy of their responses. The first reviewer is enthusiastic about doing something – although whether that means retracting the Twitter paper or publishing the complaint letter in the “Letter To The Editor” section is not clear – and writes, “The original paper’s performance claims are convincingly shown to be severely exaggerated.” That first reviewer has only minor requests for modifications.

The second reviewer is less enthusiastic but still thinks there is merit to the complaint letter. The second reviewer is dubious as to whether the original article should be withdrawn, but is clearly also skeptical of the stated claims. Finally, the second reviewer suggests that the original authors should be given a chance to respond before their article is retracted.

At this point, the editor writes to the letter-writer LW and says, you need to modify your letter, at which time I’ll “reconsider my decision.” The editor doesn’t say whether that decision is to retract the paper or to publish the letter.

So far, still so good. But here’s where things get very weird. After modifying the letter, LW sends it back to the editor, who soon comes back with another review, and importantly, a decision not to take further action. Here are some important facts:

  1. The new review is scathing, passionate, and very long. Look at it here.
  2. The new review has a name on it – possibly left there by accident – it’s the author of the original paper!
  3. Perhaps this was intentional? Did the editor want to give the original author a chance to defend his work?
  4. In the editor’s letter, he states “Reviewers’ comments on your work have now been received.  You will see that they are advising against publication of your work.  Therefore I must reject it.”
  5. The way that letter was phrased, it doesn’t sound like the editor was acknowledging that this reviewer was not unbiased, but was in fact one of the original authors.
  6. In any case, before the final reviewer weighed in, the reviewers had been suggesting publication of the letter at the very least, possibly with the chance for a reaction letter from the original author. So this author’s review seems to have been the deciding vote.
  7. You can read more about the details here, on the complaining letter writer’s blog.

What are the standards for this kind of thing? I’m not sure, but I’m pretty certain that asking the original author to be the deciding vote on whether a paper gets retracted isn’t – or should not be – standard practice.

To be clear, I think it makes sense to allow the author to respond to the complaints, but not at this point in the process. Instead, the decision of whether to publish the letter should have been made with the help of outside reviewers, and if the decision was to publish, the original author should have been given a chance to compose a rebuttal to be published side by side with the complaint.

Also to be clear, I’m not incredibly sympathetic to someone trying to make money off of a published algorithm and then getting pissed when they lose money instead. I’m willing to admit that more than one of these parties is biased. But I do think the process over at Elsevier’s Journal of Computational Science needs auditing.

* Or at least the ones that are talking. Maybe other traders are raking it in but aren’t talking?

  1. deaneyang
    July 27, 2015 at 3:26 pm

    I don’t see anything here that warrants any actions by the journal. As far as I can tell from your blog entry (I have not clicked on any of the links), the paper makes no claims about being able to make money. All it does is report the 87% accuracy rate of something and there is no evidence that anything was fabricated or that the description of what the authors did is incorrect. So to me it is at worst a bad paper, and perhaps the original referees and the handling editor did not judge it properly.

    If the journal routinely publishes letters commenting on the articles, then publishing one of the letters you describe seems appropriate but not obligatory.

    • July 27, 2015 at 3:28 pm

      Agree to disagree. They shouldn’t have asked an author to be a reviewer.

      • deaneyang
        July 27, 2015 at 3:57 pm

        I didn’t really understand the handling of the letter. Are letters usually refereed, too? There, I agree that it would have been more appropriate to publish the letter (maybe unrefereed?) along with a response by the author.

        • mlachans
          July 28, 2015 at 11:44 pm

          This is, as far as I can tell, the standard approach to this sort of thing.

      • July 28, 2015 at 12:06 pm

        Absolutely; having the author be a reviewer is a serious breach; review should be done by disinterested parties. It would make sense to ask the author for a response after a decision has been made (yea or nay).

        Like deaneyang, however, I am puzzled as to why a letter would be sent out for refereeing/review, unless the person writing it wasn’t writing a “letter” but rather a kind of “third-party corrigendum.”

      • July 29, 2015 at 5:44 pm

        It seems to replace peer review with self review. Or is there some kind of “double jeopardy” protection when it comes to peer review?

  2. July 27, 2015 at 10:42 pm

    Dear MB,
    My data mining friends keep insisting that they have found ways to predict stock prices. My finance friends, who have spent decades on this topic, keep pointing out that they are making errors. So any such predictions that are not published in finance journals, or at least reviewed by empirical finance faculty, are automatically highly suspect.
    In this case, given the black box (and likely over-fitting) of their neural network, the only useful test of their results is the validation period, which consists of only 15 days (Dec. 1 to Dec. 19, minus weekends). If their methods were actually useful, they would have 1) extended the validation period by many months, since it is now 7 years after the original data, and 2) tried a rolling validation (see the sketch below). I don’t see such follow-up publications, so we can guess that they tried and failed to extend the data series.
    But in any case, both sides are entitled to present their evidence in publication. Unfortunately LW does not include his original letter, which is odd. But he should publish it in some other journal, and let the chips fall where they may.
    (Replication studies are now in much greater favor than a few years ago, of course.)
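
    For what it’s worth, here is a minimal sketch of what such a rolling (walk-forward) validation could look like, in Python. The predictor and the simulated price series are hypothetical stand-ins, not anything from the paper:

        # Rolling (walk-forward) validation: always fit/predict using only
        # past data, then score the one-day-ahead direction call.
        import numpy as np

        def rolling_validation(closes, predict_up, start=252):
            """Return the out-of-sample hit rate of a direction predictor."""
            hits, trials = 0, 0
            for t in range(start, len(closes) - 1):
                history = closes[: t + 1]   # data up to and including day t
                hits += int(predict_up(history) == (closes[t + 1] > closes[t]))
                trials += 1
            return hits / trials

        # Trivial "momentum" baseline on a simulated random-walk price series:
        rng = np.random.default_rng(0)
        fake_closes = 10000 + np.cumsum(rng.normal(0, 50, size=1000))
        momentum = lambda h: h[-1] > h[-2]
        print(f"hit rate: {rolling_validation(fake_closes, momentum):.2%}")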

    • July 28, 2015 at 1:44 pm

      In fact, LW’s critique (from which you can infer the content of the letter) is public and can be found here http://sellthenews.tumblr.com/post/21067996377/noitdoesnot and here: http://sellthenews.tumblr.com/post/22334483882/derwents-performance. I have seen the critique, and all the material contained in it is derived from these two posts.

      One overlooked aspect to all this is that the paper in question is the most cited ever in the Journal of Computational Science (JoCS) by an order of magnitude. Retracting it would move JoCS from third-tier to no-tier in terms of impact factors. The Editor was strongly incentivized to use any means available to keep the paper from being retracted, whatever its errors.

      In some ways, Cathy is too generous to the original authors, who were running a consulting firm based on the technology developed in the paper at the time they were asked to review LW’s letter.

  3. ElsevierSucks
    July 28, 2015 at 12:18 pm

    It is unusual but not unheard of for an editor to send a letter out for refereeing, but choosing the author of the original paper as a reviewer is absolutely ridiculous and completely unheard of — it’s comparable to asking an author to referee their own paper!

    I am not familiar with this journal, but on the basis of this editorial decision I would put it in the category of Chaos, Solitons, and Fractals, i.e., a junk journal with absolutely no academic merit whatsoever. This is yet another reason to join the boycott of Elsevier if you have not already done so: http://thecostofknowledge.com/.

    I’m not saying that every Elsevier journal is junk, but a company that facilitates and profits from this kind of editorial behavior has no place in the world of academic publishing. If you need any further evidence of this, I note that Elsevier has recently been boycotted by the entire university system of its home country.

    You may want to consider this the next time you are asked to serve as a referee or offered an editorial position at an Elsevier journal.

  4. captain obvious
    August 2, 2015 at 6:00 pm

    Sorry, what is the actual crime here? Not identifying one of the “reviewers” (the one who was an author of the paper) when sending their comments to the letter-writer “LW”? I don’t see how that disclosure is owed to the outside critic writing the letter to the editor, or how it compromises the decision process if the critic is not told who that reviewer was. If anything, concealing the authors’ identity may elicit a fuller response from them, at a stage when the important thing is to determine facts.

    LW’s criticisms stand or fall on their own, regardless of who he believes is answering them, and the editor knows very well that the authors have an interest in spiking LW’s letter. It would be improper to completely exclude the authors from the process, except in cases where there is objectively a problem independent of anything the authors might say (such as severely wrong statements in the article, clear misconduct such as plagiarism or fake citations, or undisclosed conflicts of interest). This wasn’t such a case. There were several voices, including LW and the initial reviewers, involved in the decision and the editors were in no way solely reliant on the authors.

  5. rando
    August 2, 2015 at 7:44 pm

    I am unsure if you were genuinely confused or writing in bad faith (Dr. Bollen, is that you? Dr. Sloot?) because nearly everything in your second paragraph is question-begging or simply wrong.

    (“…severely wrong…”)

    1. There WERE unambiguous, fatal errors in the paper involving multiple comparison bias. Anyone with training in statistics to at least the intermediate level could have caught them. Correction for these biases eliminates all statistically significant results from Table 2 and the bulk of the results from the non-linear tests. There are several other such errors in the paper.

    (“…conflicts of interest…”)

    2. The authors DID have an undisclosed conflict of interest because they had a consulting firm that relied on the technology that LW was critiquing (Guidewave Analytics).

    “…in no way solely reliant on the authors…”

    3. Two of the three initial reviewers supported LW (with the third declining to submit a review), and only one letter, the one submitted by the authors of the paper under critique, recommended against publishing. Thus, it is fair to say that the decision not to publish the letter was “solely reliant on the authors” in the sense that any *argument* convincing the editor not to publish had to have come from them.

    • captain obvious
      August 3, 2015 at 9:54 pm

      Fun stuff, this Internet. I’m not connected to anyone or anything even remotely resembling an interested party to this matter (such as the authors of the paper, journal editors, publishers, online commenters, you name it). Are you? Because your comments have that signature combination of confused distractions and overconfident dismissals that suggests an unhealthy level of personal involvement.

      Going point by point.

      “1. There WERE unambiguous, fatal errors in the paper involving multiple comparison bias. Anyone with training in statistics to at least the intermediate level could have caught them. Correction for these biases eliminates all statistically significant results from Table 2 and the bulk of the results from the non-linear tests. There are several other such errors in the paper.”

      Under LW’s null of “it’s all random uniform [0,1]-valued noise generating the distribution of p-values,” the odds of getting the extremely strong correlation between p-values and columns seen in Table 2 are infinitesimal. One column (Calm) does rather well, and at least two other columns have extremely consistent p-values along the different rows. This is not at all what one would see if it were just random noise. The authors (apparently Reviewer #4) pointed out part of that, about Calm, in their letter, and the editor was probably bright enough to understand that they had a point, which LW’s criticism had not adequately taken into account (say by actually simulating his null hypothesis, or doing the binomial distribution calculations algebraically).
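
      For concreteness, here is a rough sketch of that simulation and the corresponding binomial calculation, in Python. The grid shape and thresholds are assumed for illustration, not taken from the paper’s Table 2:

          # Null hypothesis: every p-value in the table is an independent
          # Uniform(0,1) draw. A 7x7 grid (rows x columns) is assumed.
          import numpy as np
          from scipy.stats import binom

          n_rows, n_cols, alpha, k = 7, 7, 0.05, 3   # k = hits (p < alpha) in a column

          # Algebraically, for one pre-specified column: P(at least k hits)
          print(f"fixed column, >= {k} hits: {binom.sf(k - 1, n_rows, alpha):.2e}")

          # By simulation, for the whole table: P(some column gets at least k hits)
          rng = np.random.default_rng(1)
          tables = rng.uniform(size=(200_000, n_rows, n_cols)) < alpha
          any_col = (tables.sum(axis=1) >= k).any(axis=1).mean()
          print(f"any column, >= {k} hits: {any_col:.2e}")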

      Do tell what other errors in the paper you have in mind, or at least offer a demonstration that this error with Table 2 is a product of statistical confusion other than your own.

      2. “Undisclosed conflict of interest”: There is no known undisclosed conflict (or plagiarism, fake content, bogus affiliations, or other objective deception) involved in the decision to publish the paper, and LW didn’t allege any in his letter. That’s the only “conflict of interest” that would have been relevant here. After publication, everyone knows that the authors are as biased and conflicted as anyone could possibly be in relation to decisions about retracting their paper, and it is always presumed that this includes a financial interest, whether or not they commercialized the work. The editor would have been right to retract the paper or to publish LW’s letter without involving the authors had there been an objective breach of basic procedural standards that could be checked without asking the authors. But there wasn’t, and the editor was right to get the authors’ response before deciding.

      3. “Solely reliant”: Your comments are just factually wrong. The editors had plenty of additional non-author input *known* to be available to them, plus an unknown amount of additional input not disclosed to LW, plus the ability to obtain as much more as they needed from as many parties as necessary, in order to reach their decision. This is very, very far from being solely reliant on the authors. (And the arguments convincing the editors did not “have to have come from the authors”. The authors are the only supporters of the paper that LW knows or surmises to be involved in the process, but that does not mean others were not involved.)

      If the editors found the authors’ argument sufficient, they did so having additional contrary input from LW and at least two reviewers. Given that some of the authors’ arguments were on point and correct (e.g., they don’t propose a trading strategy, the Calm column is anomalously good under LW’s null, etc.) and that the editor could see that, I’m still not seeing the crime by the editors. Feel free to point one out.

      • rando
        August 11, 2015 at 12:36 pm

        @Captain Obvious,
        I am *personally* involved in that I am an academic and a financial consultant, and I care about seeing bad work corrected. I am particularly disappointed that BMZ have been awarded lucrative government contracts on the basis of a mutual ignorance of intermediate-level statistics.

        To respond:

        1. None of the p-values reported in the table are correct because they have not been corrected for multiple comparison bias. Without correcting for this bias, it does not make sense to identify patterns. Any out-of-the-box correction eliminates all statistical significance in the linear results, and the burden of proof is on BMZ to propose a multiple hypothesis correction under which the original results survive. Otherwise, they’re simply identifying figures in the clouds. As for “consistency,” most of the columns perform consistently badly, so this remark is disingenuous at best.
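
        To make the mechanics concrete, here is a minimal sketch of the Holm step-down correction in Python, applied to made-up p-values (not the actual values from the paper’s Table 2):

            # Holm step-down: compare the i-th smallest p-value against
            # alpha / (m - i + 1); stop at the first failure.
            def holm(p_values, alpha=0.05):
                """Return booleans: True where the hypothesis is rejected."""
                m = len(p_values)
                order = sorted(range(m), key=lambda i: p_values[i])
                reject = [False] * m
                for rank, i in enumerate(order):
                    if p_values[i] <= alpha / (m - rank):
                        reject[i] = True
                    else:
                        break
                return reject

            # Hypothetical: one "significant" raw p-value of 0.04 among 14 tests
            p_vals = [0.04, 0.20, 0.31, 0.08, 0.55, 0.12, 0.47,
                      0.66, 0.09, 0.73, 0.25, 0.38, 0.51, 0.91]
            print(holm(p_vals))   # all False: 0.04 > 0.05/14, nothing survives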

        Bollen and Mao’s reply displays characteristic ignorance and consists of:
        A. A sociological critique of multiple hypothesis testing
        B. Embarrassing self-praise with regard to “perfect patterns” and all that.
        C. A fair attack on the Bonferroni correction, which is irrelevant because Lowly Werm used the Holm procedure.
        D. An ad hoc appeal to a Bayesian framework

        Ironically, for someone who sells themselves as a big data analyst, if BMZ had identified a real effect they could have overcome any hurdle imposed by multiple hypothesis testing by simply using more data (which they had access to at the time). Of course, we both know that the reason they did not do this is that the original result was data dredged.

        The good people at Indiana and GNIP (!) had actually warned BMZ about these problems ahead of time, as Bollen himself remarks here: https://www.youtube.com/watch?v=n0it1M0vILs

        The absurdity of the result obtained by BMZ has nothing to do with whether or not BMZ actually propose a trading strategy. Claiming that the results of “Twitter mood…” are valid because BMZ don’t propose a trading strategy would be like a physicist saying it’s OK to violate the 2nd law of thermodynamics in theory as long as you don’t propose an engine using that theory. It’s a Bollen-esque non sequitur.

        2. If you can’t see why someone with a financial stake in the outcome of an academic publication shouldn’t be reviewing that publication, then I guess we’ll never see eye-to-eye on this point; I think the readers should be able to see who is right and wrong on this point.

        3. Tell me Captain Obvious, how do you know? 🙂

  6. August 4, 2015 at 5:01 pm

    I don’t entirely understand this, but somehow it seems relevant:

    [Retraction Watch] published our first news feature on fake peer reviews, in Nature, followed by a look in Nautilus at how the world’s retraction record holder, Yoshitaka Fujii, was caught.
