
Unintended Consequences of Journal Ranking

March 8, 2013

I just read this paper, written by Björn Brembs and Marcus Munafò and entitled “Deep Impact: Unintended consequences of journal rank”. It was recently posted on the Computer Science arXiv (h/t Jordan Ellenberg).

I’ll give you a rundown on what it says, but first I want to applaud the fact that it was written in the first place. We need more studies like this, which examine the feedback loop of modeling at a societal level. Indeed this should be an emerging scientific or statistical field of study in its own right, considering how many models are being set up and deployed on the general public.

Here’s the abstract:

Much has been said about the increasing bureaucracy in science, stifling innovation, hampering the creativity of researchers and incentivizing misconduct, even outright fraud. Many anecdotes have been recounted, observations described and conclusions drawn about the negative impact of impact assessment on scientists and science. However, few of these accounts have drawn their conclusions from data, and those that have typically relied on a few studies. In this review, we present the most recent and pertinent data on the consequences that our current scholarly communication system has had on various measures of scientific quality (such as utility/citations, methodological soundness, expert ratings and retractions). These data confirm previous suspicions: using journal rank as an assessment tool is bad scientific practice. Moreover, the data lead us to argue that any journal rank (not only the currently-favored Impact Factor) would have this negative impact. Therefore, we suggest that abandoning journals altogether, in favor of a library-based scholarly communication system, will ultimately be necessary. This new system will use modern information technology to vastly improve the filter, sort and discovery function of the current journal system.

The key points in the paper are as follows:

  • There’s a growing importance of science and trust in science
  • There’s also a growing rate of retractions (a 20-fold increase from 2000 to 2010), with scientific misconduct cases growing even faster and now making up the majority of retractions (for an overall rate of about 0.02% of published papers)
  • There’s a larger and growing “publication bias” problem – in other words, an increasing unreliability of published findings
  • One problem: initial “strong effects” get published in high-ranking journals, but subsequent “weak results” (which are probably more reasonable) are published in low-ranking journals
  • The formal “Impact Factor” (IF) metric, defined further down this list, is highly correlated with “journal rank.”
  • There’s a higher incidence of retraction in high-ranking (measured through “high IF”) journals.
  • “A meta-analysis of genetic association studies provides evidence that the extent to which a study over-estimates the likely true effect size is positively correlated with the IF of the journal in which it is published”
  • Can the higher retraction rate in high-ranking (“high IF”) journals be explained by the higher visibility of those journals? The authors think not: journal rank is a bad predictor of future citations, for example. [mathbabe inserts her opinion: this part needs more argument.]
  • “…only the most highly selective journals such as Nature and Science come out ahead over unselective preprint repositories such as ArXiv and RePEc”
  • Are there other measures of excellence that would correlate with IF? Methodological soundness? Reproducibility? No: “In fact, the level of reproducibility was so low that no relationship between journal rank and reproducibility could be detected.”
  • More about Impact Factor: The IF is a metric for the number of citations to articles in a journal (the numerator), normalized by the number of articles in that journal (the denominator). Sounds good! But:
  • For a given journal, IF is not calculated but is negotiated – the publisher can (and does) exclude certain articles (but not citations). Even retroactively!
  • The IF is also not reproducible – errors are found and left unexplained.
  • Finally, IF is likely skewed by the fat-tailedness of citations (certain articles get lots, most get few). Wouldn’t a more robust measure be given by the median? (See the quick sketch after this list.)
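
To make the fat-tail point concrete, here is a minimal Python sketch (a toy simulation, not an analysis from the paper) that draws a crude heavy-tailed citation distribution and compares the mean, which is what an IF-style calculation uses, with the median:

```python
import random
import statistics

random.seed(0)

# Toy citation counts for one journal's articles: most articles collect few
# citations, a handful collect very many (a crude stand-in for the fat tail).
citations = [int(random.paretovariate(1.5)) for _ in range(200)]

mean_citations = sum(citations) / len(citations)   # what an IF-style average does
median_citations = statistics.median(citations)    # a more robust summary

print(f"mean citations per article:   {mean_citations:.2f}")
print(f"median citations per article: {median_citations:.2f}")
```

With a tail like this, a few blockbuster articles drag the mean well above the median, which is exactly the worry about ranking journals by an average of raw citation counts.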

Conclusion

  1. Journal rank is a weak to moderate predictor of scientific impact
  2. Journal rank is a moderate to strong predictor of both intentional and unintentional scientific unreliability
  3. Journal rank is expensive, delays science and frustrates researchers
  4. Journal rank as established by IF violates even the most basic scientific standards, but predicts subjective judgments of journal quality

Long-term Consequences

  • “IF generates an illusion of exclusivity and prestige based on an assumption that it will predict subsequent impact, which is not supported by empirical data.”
  • “Systemic pressures on the author, rather than increased scrutiny on the part of the reader, inflate the unreliability of much scientific research. Without reform of our publication system, the incentives associated with increased pressure to publish in high-ranking journals will continue to encourage scientists to be less cautious in their conclusions (or worse), in an attempt to market their research to the top journals.”
  • “It is conceivable that, for the last few decades, research institutions world-wide may have been hiring and promoting scientists who excel at marketing their work to top journals, but who are not necessarily equally good at conducting their research. Conversely, these institutions may have purged excellent scientists from their ranks, whose marketing skills did not meet institutional requirements. If this interpretation of the data is correct, we now have a generation of excellent marketers (possibly, but not necessarily also excellent scientists) as the leading figures of the scientific enterprise, constituting another potentially major contributing factor to the rise in retractions. This generation is now in charge of training the next generation of scientists, with all the foreseeable consequences for the reliability of scientific publications in the future.”

The authors suggest that we need a new kind of publishing platform. I wonder what they’d think of the Episciences Project.

  1. March 8, 2013 at 9:39 am

    Good thoughts, and if this were not an issue we wouldn’t need a site called “Retraction Watch.” Ivan Oransky started it, I want to say about a year ago, and the very existence of such a site reaffirms your article today. Marketing has definitely had an impact in areas where it really doesn’t belong, in the formats we see out there today.


  2. March 8, 2013 at 10:12 am

    If we got rid of journal ranks, academics would have to read their colleagues’ research to make promotion and tenure recommendations. I’m not sure we’re ready for that!


    • JSE
      March 8, 2013 at 12:22 pm

      You’re sort of joking, but in a way you’re not, right? We _need_ heuristics in order to assess other people’s science — we are not all going to become experts in every subject so that we can carry this out by direct inspection without resort to indirect measures. As Cathy always says, “don’t model” is not one of the choices — hopefully, “use a better model instead of a worse one” still is.


  3. Zathras
    March 8, 2013 at 1:23 pm

    “Journal rank is a bad predictor of future citations, for example.”

    This definitely needs more exploration. I have wondered about this for a long time, but I have not seen it claimed before. One issue with citation counts is self-citation. “Lower” publications are replete with examples of people who publish prodigiously and constantly cite every paper they wrote previously. Any real analysis of citation counts therefore needs to exclude self-citations.


    • March 8, 2013 at 1:27 pm

      True! Raw citation counts are an easily manipulable metric of importance by themselves. It’s entirely conceivable that people who don’t manage to get their papers in the higher-ranked journals do more gaming of the citation model afterwards.


      • March 8, 2013 at 2:48 pm

        Beyond self-citations, how could we differentiate reciprocal citation collusion from true responses?


  4. March 8, 2013 at 4:47 pm

    By the way, the “only the most highly selective journals such as Nature and Science come out ahead over unselective preprint repositories such as ArXiv and RePEc” part of the article is nonsense. Google Scholar compares publishing venues by five-year h-index, which heavily rewards sheer size. Comparing journals in this way makes no sense, and comparing journals with the arXiv is ridiculous.


    • March 8, 2013 at 5:10 pm

      Wait, explain. Isn’t the h-score different?


      • March 8, 2013 at 5:42 pm

        The five-year h-index is the largest n such that n papers published in the last five years have each received at least n citations. In other words, it is based just on the most highly cited papers, with no normalization for journal size. The h-index was intended to compare people, in which case it makes some sense to penalize people for not publishing enough. (It doesn’t make a lot of sense, but it makes about as much sense as any single number is likely to.) It’s not uncommon for two journals to differ in size by a factor of 20 or more, in which case it is much easier for the larger journal to get a high h-index. This is not at all what we are looking for with journal rankings.

        This arises in practice in pretty dramatic ways. For a numerical case study of some actual journals, see http://scholarlykitchen.sspnet.org/2012/04/24/googles-new-scholar-metrics-have-potential-but-also-prove-problematic/#comment-46943
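
        To make the size effect concrete, here is a small Python sketch (a toy illustration with made-up numbers, not taken from the paper or the linked case study) that computes a five-year-style h-index for two hypothetical venues with identical per-paper citation profiles but very different sizes:

        ```python
        def h_index(citations):
            """Largest n such that at least n papers have n or more citations."""
            counts = sorted(citations, reverse=True)
            h = 0
            for rank, cites in enumerate(counts, start=1):
                if cites >= rank:
                    h = rank
                else:
                    break
            return h

        # Same per-paper citation profile; one venue simply publishes 20x as many papers.
        small_journal = [50, 40, 30, 20, 10]        # 5 papers
        big_repository = [50, 40, 30, 20, 10] * 20  # 100 papers

        print(h_index(small_journal))   # 5
        print(h_index(big_repository))  # 40 -- sheer size inflates the h-index
        ```

        Nothing about the larger venue is better per article; its h-index rises simply because it has more articles, which is the point about comparing journals (or the arXiv) this way.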


        • March 9, 2013 at 9:11 am

          I agree that makes absolutely no sense. By that token the arXiv would win just by sheer numbers. Dumb.


    • peace
      March 9, 2013 at 7:35 am

      Respectfully, Mathbabe: in the chart of retractions vs. impact factor on page 31 of the original paper, both Science and Nature are high on retractions. Science is even above the retractions/IF curve.

      Thanks for bringing our attention to this paper, and to the open-access, peer-reviewed Episciences project, which is similar to plos.org.


      • March 9, 2013 at 9:12 am

        Wait, did I say something to contradict that fact?


        • peace
          March 9, 2013 at 9:18 pm

          The statement including the phrase “Nature and Science come out ahead” could be read two ways. “Come out ahead” could mean that those two journals were above the fray and avoided retractions. I now see that it means they come out ahead by attaining more citations, while also being forced to retract more. Thank you again.


  5. peace
    March 9, 2013 at 7:50 am

    The “file drawer” problem also biases published research and calls into question the validity and strength of published findings. Insignificant results (p > .10, etc.) remain unpublished in “file drawers,” but these findings often contradict published work. Some people attempt to collect this data and make it accessible (especially for meta-analyses).


    • March 9, 2013 at 9:12 am

      Yes, this is referred to as “publication bias” in the article above.


  6. Benoit Essiambre
    March 9, 2013 at 8:58 am

    In all the university labs I’ve worked in, fudging statistics by picking and choosing the methods that gave you the most interesting results was common practice. It made me sick.

    I’m of the opinion that we can no longer trust most research coming out of universities. Until university positions and research grants stop being given out based on prior research results we won’t be able to trust the research performed there.

    There are millions of dollars on the line for those involved. It is the difference between a well paid career and a life of destitution while being a slave to huge student debts. It’s no wonder researchers ignore flaws in their research when it suits them.

    I believe strongly that teaching positions and research grants should be given out based on criteria that are only incidental to research.

    Evaluate profs and grants based on:
    1. math skills (test the applicants)
    2. domain knowledge (test the applicants)
    3. motivation and leadership
    4. prior and current research proposals (but ignore results and how often they lead to publication).
    5. written and oral communication skills

    Universities should not rely on journals to evaluate their professors. This corrupts the whole system. Journals have different goals. They want to publish well done research with interesting results. Universities should hire researchers that do well done research with interesting _questions_ regardless of the results.

    If universities keep giving out jobs based on prior results, they are going to keep getting researchers that ignore biases to advance their career. It’s as simple as that.


  7. March 9, 2013 at 8:28 pm

    This reminds me of stuff that my mother told me about from when she was in the sciences – for example, she wasted a lot of time when she was working on her Ph.D. thesis because she couldn’t reproduce a classmate’s results, and when she asked her classmate about it (the classmate had already graduated), the classmate cut off all contact. My mother doesn’t blame her classmate – my mother said that if her classmate hadn’t … done something to the results … she (the classmate) would not have been able to get her degree. Instead she blames the system, which pressures people to manipulate results. And this was back in the 1970s.

    My mother’s last science job was at a science-publishing company, and while she says that ethical problems were one of the main reasons she left that job, she has never wanted to talk about it in detail.

    Speaking of science and ethical failure, she has also worked for BP (though not in the Gulf of Mexico), and one of her jobs was to help produce environmental impact reports (i.e., how the drilling could be done safely). Many years after leaving BP, she found out by reading a newspaper that BP had basically ignored her environmental impact reports. Yet her face and voice do not seem as horrified when she talks about BP as when she talks about why she left the publishing job, which makes me wonder what was going on there.

    Then again, without these ethical failures in science, I might not exist – she might have continued her career in science instead of becoming a mother (based on what she’s said, I think she probably would have never concurrently pursued science and motherhood).


  8. March 9, 2013 at 10:08 pm

    Using IF to evaluate scientists is completely nonsensical – but even the people who developed and calculate the IF come right out and say that. It is meant to help a library decide what journals to subscribe to, and maybe to help me decide where to submit a paper, but for evaluating somebody there are simpler and more reasonable ways: how often are they being cited, what is their h index, etc. Of course, the best way would be to read their papers!

    At least judging from my area, I would say that the best journals, in the sense of those publishing the really useful and important stuff and having the most critical and helpful review process, are the ones in the middle, with IFs from roughly 2 to roughly 6. Too many journals below an IF of 1 publish a lot of poor-quality research along with good but unimportant papers; those above an IF of about 6 appear to base their decisions mostly on the flashiness of the topic and the status of the authors rather than on quality. I cannot count how often I have seen Nature or Science papers that merely plagiarize an advance somebody else published in Systematic Biology or similar a few months earlier, or how often we have read papers from PNAS in our journal club only to find that their methodology completely fell apart under even slight scrutiny…


  9. March 10, 2013 at 1:07 am

    Reading back through all the very negative comments about the quality of the work coming out of science, I feel I should add something. Publication pressure or no, all scientists are human and always have been. There have always been cranks, people operating under confirmation bias, people trying to suppress ideas that do not fit their preconceived notions, and so on.

    It still works because while we are all irrational about something, on each individual topic most of us are rational. In the end it works because science is a social activity, and people will try to replicate results or test the same idea from a different angle, and then it turns out that something that was published earlier was wrong. In the end, given enough time and enough people working in the same field, we will get it right.

    That does not mean we should not reduce incentives to cheat, but it means that all that doom and gloom is a bit over the top.


  10. Ken
    March 10, 2013 at 1:40 am

    Editors are also notorious for manipulating IF. The most common I have seen are:

    – In the instructions to authors, and also as added commentary to authors who have to revise their submission (i.e. most authors), using an argument like “The journal encourages the publication of papers that continue themes and arguments that already exist in previous issues of the journal.” In other words: cite articles from previous issues of this journal and you have a greater chance of being published.

    – Writing lovely and long editorials that add nothing of value whatsoever, EXCEPT that they cite many of the papers in the current issue. Therefore, as the journal issue is published, just about every paper in that issue has already been cited once. Two or three editorials that happen to mention the same paper, and your IF for that issue is already growing before anybody has even read it.

    And more – and we wonder why academics simply laugh at IF, and cry at administrators who treat IF with even scant respect.


  11. March 10, 2013 at 7:31 am

    Another unintended consequence of journal ranking is the suppression of interdisciplinary research. This is because journal rankings are in practice created from disciplinary perspectives: the best journals according to a given discipline (or disciplines).

    The empirical evidence of this bias is published in the journal Research Policy.
    http://www.sciencedirect.com/science/article/pii/S0048733312000765

    Further information (and postprint) available at:
    http://www.interdisciplinaryscience.net/topics/rankingsidr


  12. March 10, 2013 at 9:21 am

    First, it would have been nice to name the authors in your blog post. Further, I find the outcomes are not really news; I am pretty sure I’ve read most of them before in other publications, and I assume they are among those cited by the study at hand.

    However, I agree that self-monitoring scientific methods and mechanisms for peer review and assessment of scientists are necessary to ensure the quality control that is absolutely crucial for scientific progress.


    • March 10, 2013 at 9:25 am

      Good point, I’ve added the authors’ names and links to their webpages.


  13. lkafle
    March 11, 2013 at 10:39 pm

    Reblogged this on lava kafle kathmandu nepal.


  14. March 15, 2013 at 1:46 pm

    The Episciences project is great and we cite it in the current version of the article (it’s under review right now, and we’re working the reviewer comments in as I type this).

