Let’s experiment more

April 15, 2014 Cathy O'Neil, mathbabe

What is an experiment?

The gold standard in scientific fields is the randomized experiment. That’s when you have some “treatment” you want to impose on some population and you want to know if that treatment has positive or negative effects. In a randomized experiment, you randomly divide a population into a “treatment” group and a “control group” and give the treatment only to the first group. Sometimes you do nothing to the control group, sometimes you give them some other treatment or a placebo. Before you do the experiment, of course, you have to carefully define the population and the treatment, including how long it lasts and what you are looking out for.
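To make the "randomly divide" step concrete, here is a minimal sketch in Python. The function name `randomize` and the list of participant IDs are made up for illustration, and a real trial would likely use something more careful (stratified or blocked assignment, for instance).

```python
import random

def randomize(population, seed=0):
    """Randomly split a population into a treatment half and a control half."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(population)    # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs, purely for illustration
people = [f"person_{i}" for i in range(10)]
treatment, control = randomize(people)
print("treatment:", treatment)
print("control:  ", control)
```

The point of the shuffle is that nobody, including the experimenter, gets to choose who ends up in which group.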

Example in medicine

So for example, in medicine, you might take a bunch of people at risk of heart attacks and ask some of them – a randomized subpopulation – to take aspirin once a day. Note that doesn’t mean they all will take an aspirin every day, since plenty of people forget to do what they’re told to do, and even what they intend to do. And you might have people in the other group who happen to take aspirin every day even though they’re in the other group.

Also, the experiment needs a well-defined length and well-defined outcomes: after, say, 10 years, you want to see how many people in each group have a) had heart attacks and b) died.

Now you’re starting to see that, in order for such an experiment to yield useful information, you’d better make sure the average age of each subpopulation is about the same, which should be true if they were truly randomized, and that there are plenty of people in each subpopulation, or else the results will be statistically useless.
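If you want to see why group size matters, a small simulation helps. This is a rough sketch under invented assumptions: ages are drawn uniformly between 40 and 80, and the helper names are hypothetical. It randomizes many simulated populations and reports the typical gap in average age between the two groups.

```python
import random
import statistics

def age_gap_after_randomizing(n_people, seed):
    """Randomly split n_people simulated ages in half; return the gap in mean age."""
    rng = random.Random(seed)
    ages = [rng.randint(40, 80) for _ in range(n_people)]  # made-up age range
    rng.shuffle(ages)
    half = n_people // 2
    return abs(statistics.mean(ages[:half]) - statistics.mean(ages[half:]))

def typical_gap(n_people, trials=200):
    """Average the age gap over many simulated experiments."""
    gaps = [age_gap_after_randomizing(n_people, seed) for seed in range(trials)]
    return statistics.mean(gaps)

small = typical_gap(20)
large = typical_gap(2000)
print(f"typical age gap with 20 people:   {small:.2f} years")
print(f"typical age gap with 2000 people: {large:.2f} years")
```

With only a handful of people, randomization can easily hand you one group that is noticeably older than the other. With thousands, the groups come out nearly identical on average.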

One last thing. There are ethics in medicine, which make experiments like the one above fraught. Namely, if you have a really good reason to think one treatment (“take aspirin once a day”) is better than another (“nothing”), then you’re not allowed to run the experiment, since that would mean knowingly withholding the better treatment from the control group. Instead you’d have to compare two treatments that are thought to be about equal. This of course means that, in general, you need even more people in the experiment, and it gets super expensive and long.

So, experiments are hard in medicine. But they don’t have to be hard outside of medicine! Why aren’t we doing more of them when we can?

Swedish work experiment

Let’s move on to the Swedes, who according to this article (h/t Suresh Naidu) are experimenting in their own government offices on whether working 6 hours a day instead of 8 hours a day is a good idea. They are using two different departments in their municipal council to act as their “treatment group” (6 hours a day for them) and their “control group” (the usual 8 hours a day for them).

And although you might think that the people in the control group would object to unethical treatment, it’s not the same thing: nobody thinks your life is at stake for working a regular number of hours.

The idea there is that people waste their last couple of hours at work and generally become inefficient, so maybe knowing you only have 6 hours of work a day will improve the overall office. Another possibility, of course, is that people will still waste their last couple of hours of work and get 4 hours instead of 6 hours of work done. That’s what the experiment hopes to measure, in addition to (hopefully!) whether people dig it and are healthier as a result.

Non-example in business: HR

Before I get too excited I want to mention the problems that arise with experiments that you cannot control, which is most of the time if you don’t plan ahead.

Some of you probably ran into an article from the Wall Street Journal, entitled Companies Say No to Having an HR Department. It’s about how some companies, even big ones, decided that HR is a huge waste of money and got rid of everyone in that department.

On the one hand, you’d think this is a perfect experiment: compare companies that have HR departments against companies that don’t. And you could do that, of course, but you wouldn’t be measuring the effect of an HR department. Instead, you’d be measuring the effect of a company culture that doesn’t value things like HR.

So, for example, I would never work in a company that doesn’t value HR, because, as a woman, I am very aware of the fact that women get sexually harassed by their bosses and have essentially nobody to complain to except HR. But if you read the article, it becomes clear that the companies that get rid of HR don’t think from the perspective of the harassed underling but instead from the perspective of the boss who needs help firing people. From the article:

When co-workers can’t stand each other or employees aren’t clicking with their managers, Mr. Segal expects them to work it out themselves. “We ask senior leaders to recognize any potential chemistry issues” early on, he said, and move people to different teams if those issues can’t be resolved quickly.

Former Klick employees applaud the creative thinking that drives its culture, but say they sometimes felt like they were on their own there. Neville Thomas, a program director at Klick until 2013, occasionally had to discipline or terminate his direct reports. Without an HR team, he said, he worried about liability.

“There’s no HR department to coach you,” he said. “When you have an HR person, you have a point of contact that’s confidential.”

Why does it matter that it’s not random?

Here’s the crucial difference between a randomized experiment and a non-randomized experiment. In a randomized experiment, you are setting up and testing a causal relationship, but in a non-randomized experiment like the HR companies versus the no-HR companies, you are simply observing cultural differences without getting at root causes.

So if I notice that, at the non-HR companies, they get sued for sexual harassment a lot – which was indeed mentioned in the article as happening at Outback Steakhouse, a non-HR company – is that because they don’t have an HR team or because they have a culture which doesn’t value HR? We can’t tell. We can only observe it.
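Here is a toy simulation of that problem, as a sketch only. Every number in it is invented, and the model assumes on purpose that HR has no causal effect at all: a hidden "bad culture" trait is the only thing that raises the chance of a lawsuit, and it also makes a company more likely to drop HR.

```python
import random

def simulate(randomized, n_companies=20000, seed=1):
    """Return (lawsuit rate with HR, lawsuit rate without HR).

    Assumption baked into this toy model: HR itself has NO causal effect.
    Only a hidden 'bad culture' trait raises the chance of a lawsuit.
    """
    rng = random.Random(seed)
    sued = {True: 0, False: 0}
    count = {True: 0, False: 0}
    for _ in range(n_companies):
        bad_culture = rng.random() < 0.3
        if randomized:
            has_hr = rng.random() < 0.5  # a coin flip decides
        else:
            # companies choose for themselves: bad cultures tend to drop HR
            has_hr = rng.random() < (0.2 if bad_culture else 0.9)
        lawsuit = rng.random() < (0.4 if bad_culture else 0.05)
        count[has_hr] += 1
        sued[has_hr] += lawsuit  # True counts as 1, False as 0
    return sued[True] / count[True], sued[False] / count[False]

obs_hr, obs_no_hr = simulate(randomized=False)
rct_hr, rct_no_hr = simulate(randomized=True)
print(f"observed:   HR {obs_hr:.2f}  vs  no HR {obs_no_hr:.2f}")
print(f"randomized: HR {rct_hr:.2f}  vs  no HR {rct_no_hr:.2f}")
```

In the observed comparison the no-HR companies look much worse, even though HR does nothing in this model. When a coin flip decides who has HR, the gap goes away, because culture is now spread evenly across both groups. In real life we can't see the culture variable, which is exactly why we can't tell.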

Money in politics experiment

Here’s an awesome example of a randomized experiment to understand who gets access to policy makers. In an article entitled A new experiment shows how money buys access to Congress, an experiment was conducted by two political science graduate students, David Broockman and Josh Kalla, which they described as follows:

In the study, a political group attempting to build support for a bill before Congress tried to schedule meetings between local campaign contributors and Members of Congress in 191 congressional districts. However, the organization randomly assigned whether it informed legislators’ offices that individuals who would attend the meetings were “local campaign donors” or “local constituents.”

The letters were identical except for those two words, but the results were drastically different, as shown by the following graphic:

Conducting your own experiments with e.g. Mechanical Turk

You know how you can conduct experiments? Through an Amazon service called Mechanical Turk. It’s really not expensive and you can get a bunch of people to fill out surveys, or do tasks, or some combination, and you can design careful experiments and modify them and rerun them at your whim. You decide in advance how many people you want and how much to pay them.

So for example, that’s how then-Wall Street Journal journalist Julia Angwin, in 2012, investigated the weird appearance of Obama results interspersed between other search results, but not a similar appearance of Romney results, after users indicated party affiliation.

Conclusion

We already have a good idea of how to design and conduct useful and important experiments, and we already have good tools to do them. Other, even better tools are being developed right now to improve our abilities to conduct faster and more automated experiments.

If we think about what we can learn from these tools and put some creative energy into design, we should all be incredibly impatient and excited. And we should also think of this as an argumentation technique: if we are arguing about whether a certain method or policy works versus another method or policy, can we set up a transparent and reproducible experiment to test it? Let’s start making science apply to our lives.

Categories: data journalism, modeling, rant

Comments (11)

mb

April 15, 2014 at 7:43 am

Interesting, but you leave out the real gold standard – double blind randomized control studies. If the Swedish experiment succeeds I would doubt that it would translate to a replicable real world outcome – simply because the treatment and control groups know the purpose of the experiment and what their group is. Now if I knew slacking off for 2 hours would mean I have the possibility to work 2 hours less per day, I would slack off. Conversely, if I knew that working hard for 6 hours during the study would mean I might always work 6 hours, I would work hard for 6 hours. Most people would do the same. Then if that was not enough, once people became used to 6 hour work days, would they only be focused for 4? Replicable studies that translate to real world outcomes are actually quite hard when dealing with people. Just look at the problems with psychology research papers.

- Cathy O'Neil, mathbabe
  
  April 15, 2014 at 7:47 am
  
  Great points!
  
Shecky R

April 15, 2014 at 8:11 am

To some extent I feel ALL experiments involving humans (either behavior or biology) are doomed because the variables are simply too many and complex for adequate control or interpretation… that doesn’t mean don’t do such experiments, but it does mean take any findings with a HUGE grain of salt.
In a similar vein, like everyone else, I heard that coffee was bad for me, then it was good for me, then it was bad for me, and then it was good for me again (all based on “studies”). I eventually upped my intake of coffee, and switched to a handy French press… only to then learn that French press use is the one and only brewing method known to literally raise cholesterol (which I already have plenty of, thank you). Oy veyyyy!….

- Guest2
  
  April 15, 2014 at 9:18 am
  
  This month’s Nutrition Action Newsletter has a cover story on this exact problem — and a center fold board game that epitomizes all the problems establishing causality experimentally.
  
  “What’s the Catch? Why the Latest Study is Rarely the Final Answer” by Bonnie Liebman.
  
  Culprits include: confounding variables, confusing cause with effect, chance, misclassification, bias, too-short term, too small, wrong people, wrong dose, too little difference.
  
  Mathematically, I have a quibble with the notion of randomness. Nothing is truly random, since it is contingent; and contingency always involves constraints. Once constraints come into play, there is no absolute randomness. It’s just a math/philosophy thing.
  
Mike

April 15, 2014 at 8:16 am

OMG!! HR!! How do I hate thee, HR? Let me count the ways…
HR is like Walter Brennan as Judge Roy Bean, the self-appointed hanging judge who loves the beauteous Lily Langtree and presides over the pseudonymous town of Vinegaroon, Texas in the Gary Cooper movie The Westerner…
HR is like one of Pigmeat Markham’s (a rap forerunner) vaudeville shticks with an inflated pig bladder-balloon…”here come da judge??”…
HR is a confederacy of dunces replete with kangaroo courts, whistleblower vilification, the corporate equivalents of public stocks, paid mercenary crisis managers and PR flacks, hacks and flunkies…all of these metaphors for the despicable treatment and practices that are associated with blaming the victim or haplessly honest employee wrt corporate malfeasance…
HR exists to expedite the status quo as defined by senior management. HR is profoundly asymmetric in its prosecution and adjudication of due process. God forbid that someone from the corporate rank-and-file has an issue with a manager. HR will listen patiently to the complainant, promise confidentiality and right to privacy, and then immediately relay said complaint to said manager. HR seems to relish and specialize in throwing people under the bus…
HR exists to cover-up the most monstrous and sociopathic of managerial sins. Even in the face of incontrovertible evidence of mismanagement, over-management, insane OCD, bullying, lying, politically charged distortions and misrepresentations, HR “investigations” and due diligence wrt these very real problems inevitably result in a whitewash of the perps — as long as they are in senior management. That said, they do occasionally scapegoat a perp or two if for no other reason than to demonstrate that the corporate system of injustice works.
All of this leads to the Golden Rule of HR…NEVER, EVER REPORT ANYTHING TO HR!!!
If you have a work-related issue, it’s five thousand times better to take care of it yourself…even if that means walking, quitting, etc…..
That said, “protected minorities” are given special privileges under federal laws regarding diversity in the workplace. “Protected minorities” include the identified diverse: women, people of color, the physically impaired, discrimination on the basis of sex, race, religion, sexual preference, and so on. PMs even have the right to sue an employer for mistreatment while the rest of us hang on by our fingernails since we are, at best, employed “at will”…
HR is not an ally, nor is it your friend — except maybe in the sense of Mickey Rourke’s characterization of “friends” in the movie Barfly, where he plays a chronic alcoholic and bar room brawler/poet who drools copious quantities of spittle in a dazed stupor after singularly bad pummelings while slurring the word “friend” into a grotesque and unrecognizable caricature…
Stay as far away as possible…

- Guest2
  
  April 15, 2014 at 9:07 am
  
  Funny, but sadly, true.
  
  We owe the emergence of HR departments to that same period in history in the US when corporations, management, and Taylorism were invented, all in the name of efficiency (1910ish).
  
- Sub-Boreal
  
  April 15, 2014 at 12:26 pm
  
  In a previous job, probably the most consistent bullying behaviour that I observed was exhibited by the middle manager who used the HR dept. as a stepping-stone for his upward moves through the hierarchy. My favourite part was hearing after the fact that one of his successors as HR dept. head was quietly terminated a few months after being hired when it was discovered that his credentials were fabricated. Ooops!
  Fortunately, I left all of that viciousness behind when I left government. Where I now work, HR is much more benign: merely ineffectual and time-wasting.
  While I certainly understand the need to have internal protections against harassment by employers, it’s hard to see how a creature of management could do this without being compromised by the obvious conflict of interest. Much better to get a good union!
  The general uselessness of HR departments simply illustrates the general problem of the cult of management – the notion that whether the organization produces widgets or PhDs, it can be run by a generic managerial class who have no detailed knowledge of the distinctive processes and products of the organization. But this culture is strongly self-perpetuating – folks who actually understand what’s going on find it more appealing to move laterally rather than move into management where they would have to spend most of their time in meetings with ambitious weenies.
  
Joe McCarthy (@gumption)

April 15, 2014 at 9:21 am

Interesting and provocative post.
I can’t help but invoke Ralph Waldo Emerson’s observation:
“Life is an experiment. The more experiments, the better”.
With respect to the political study, I think it’s important to note the cause and organization for which meetings were being sought:
“The experiment was embedded in a political organization’s effort to build support for a
bill before Congress to ban a chemical. The organization, CREDO Action, is a US liberal
political organization with around 3.5 million members”
I do not see any discussion about which party representatives in the 191 districts were from, or anything about their voting records, but the party and/or liberal vs. conservative orientation of the legislators may be a confounding factor.
I also don’t see any reference to a journal or other peer-reviewed publication forum in the paper or on the corresponding author’s site, not that I believe that peer review is a fool-proof process.

Michael L

April 15, 2014 at 9:26 am

You’re right that the randomized experiment is regarded as the gold standard. But the gold may be a bit more tarnished than is often recognized. A main reason has to do with the issue of the population. In the social sciences, and I think medicine too, the people randomly assigned to treatment or control often aren’t a random sample from some population of interest. Instead they’re often a so-called convenience sample. This raises the issue of external validity, or the question of whom the findings of the experiment are meant to apply to. I think this comes up in medicine quite a bit but that’s not my area of expertise. An experiment might find that a treatment worked for men, but we can’t know whether it will work for women too if the experiment only randomized men to treatment and control. Now although there are obvious biological differences between men and women there are similarities too, so maybe it’s still relatively easy to generalize to women from an experiment that was conducted only with men, or the other way around.
But once one moves to the social sciences arguably things get even more dicey. Take the experiment you mentioned about a shorter work day. I don’t know how the sample for the study was selected but if it’s like many I’ve seen in the social sciences the sample was neither random nor any other kind of probability sample. So if a causal effect is found would it just apply to people like those in the study or to people more broadly? Further it may be that shorter hours only have a causal effect in work environments like the one in the study. That is, the intervention might have no effect in other settings. To make matters worse we might not even know what the factors in the work environment are that must interact with the intervention in order for it to have an effect. And if we don’t know this we won’t have much guidance when it comes to whether we should use the intervention in another setting because we won’t know if that other setting contains these crucial factors that must interact with the treatment in order for it to work.
None of what I’ve said means that experiments shouldn’t be conducted. But we shouldn’t overstate what they allow us to learn. The philosopher Nancy Cartwright has gone into a lot of depth about the issues I’ve raised. A short overview of her ideas can be found here:

http://download.thelancet.com/pdfs/journals/lancet/PIIS0140673611605631.pdf?id=caajeE9h1Sksh51-pBLvu

davidflint

April 21, 2014 at 12:34 pm

In the UK the Cabinet Office (a small dept in the national government) published “Test, Learn, Adapt: Developing Public Policy with Randomised Controlled Trials” in 2012. See https://www.gov.uk/government/publications/test-learn-adapt-developing-public-policy-with-randomised-controlled-trials.

It is also worth reading Poor Economics, a book on development by Abhijit V. Banerjee and Esther Duflo which looks hard at evidence and RCTs in development. See http://pooreconomics.com/about-book.

Joe Fusion

April 23, 2014 at 10:19 pm

Cool article! I agree, but would add a slight modification: “The gold standard in *some* scientific fields is the randomized experiment.”

The results of randomized trials also depend on how likely there was to be an effect. For example, this comes up when doing RCTs for therapies that have no scientific basis. The outcome should be adjusted based on a prior likelihood.

Here’s an article that probably explains it better than I can. There is also a list of links at the bottom, of other related articles. I found them interesting, in terms of designing experiments.

http://www.sciencebasedmedicine.org/prior-probability-the-dirty-little-secret-of-evidence-based-alternative-medicine-2/
