
Data science code of conduct, Evgeny Morozov

March 18, 2013

I’m going on an 8-day-long trip to Seattle with my family this morning and I’m taking the time off from mathbabe. But don’t fret! I have a crack team of smartypants skeptics who are writing for me while I’m gone. I’m very much looking forward to seeing what Leon and Becky come up with.

In the meantime, I’ll leave you with two things I’m reading today.

First, a proposed Data Science Code of Professional Conduct. I don’t know anything about the guys at Rose Business Technologies who wrote it except that they’re from Boulder, Colorado and have had lots of fancy consulting gigs. But I am really enjoying their proposed Data Science Code. An excerpt from the code after they define their terms:

(c)  A data scientist shall rate the quality of evidence and disclose such rating to client to enable client to make informed decisions. The data scientist understands that evidence may be weak or strong or uncertain and shall take reasonable measures to protect the client from relying and making decisions based on weak or uncertain evidence.

(d) If a data scientist reasonably believes a client is misusing data science to communicate a false reality or promote an illusion of understanding, the data scientist shall take reasonable remedial measures, including disclosure to the client, and including, if necessary, disclosure to the proper authorities. The data scientist shall take reasonable measures to persuade the client to use data science appropriately.

(e)  If a data scientist knows that a client intends to engage, is engaging or has engaged in criminal or fraudulent conduct related to the data science provided, the data scientist shall take reasonable remedial measures, including, if necessary, disclosure to the proper authorities.

(f) A data scientist shall not knowingly:

  1. fail to use scientific methods in performing data science;
  2. fail to rank the quality of evidence in a reasonable and understandable manner for the client;
  3. claim weak or uncertain evidence is strong evidence;
  4. misuse weak or uncertain evidence to communicate a false reality or promote an illusion of understanding;
  5. fail to rank the quality of data in a reasonable and understandable manner for the client;
  6. claim bad or uncertain data quality is good data quality;
  7. misuse bad or uncertain data quality to communicate a false reality or promote an illusion of understanding;
  8. fail to disclose any and all data science results or engage in cherry-picking;

Read the whole Code of Conduct here (and leave comments! They are calling for comments).

Second, my favorite new Silicon Valley curmudgeon is named Evgeny Morozov, and he recently wrote an opinion column in the New York Times. It’s wonderfully cynical and makes me feel like I’m all sunshine and rainbows in comparison – a rare feeling for me! Here’s an excerpt (h/t Chris Wiggins):

Facebook’s Mark Zuckerberg concurs: “There are a lot of really big issues for the world that need to be solved and, as a company, what we are trying to do is to build an infrastructure on top of which to solve some of these problems.” As he noted in Facebook’s original letter to potential investors, “We don’t wake up in the morning with the primary goal of making money.”

Such digital humanitarianism aims to generate good will on the outside and boost morale on the inside. After all, saving the world might be a price worth paying for destroying everyone’s privacy, while a larger-than-life mission might convince young and idealistic employees that they are not wasting their lives tricking gullible consumers to click on ads for pointless products. Silicon Valley and Wall Street are competing for the same talent pool, and by claiming to solve the world’s problems, technology companies can offer what Wall Street cannot: a sense of social mission.

Read the whole thing here.

Categories: data science
  1. March 18, 2013 at 1:47 pm

    It’s hard to “know” that you are screwing up when your paycheck depends on your client’s needs being met. Data scientists will succumb to many of the same forms of “capture” as regulators, consultants, physicians, etc. They tend to tell their “clients” whatever they need to hear. A code of conduct might help a bit but don’t count on it working much better than those used in medicine, accounting or other professions. The key is finding “good” clients who will pay for unvarnished recommendations based on solid empirical methods. If the clients paying the bills are corrupt, they’ll compromise any professionals they hire.

    An effective code of conduct needs to focus on both sides of the data science transaction. Ethical client behavior should be defined and considered along with the behavior of the data scientist.


    • March 19, 2013 at 10:31 am

      The only practical way to encourage clients to act ethically is to have a “noisy withdrawal” policy in which you leave conspicuously when a client is not acting ethically. The problem is meeting that requirement without getting sued for business defamation if you’re wrong or for violating secrecy (like trade secret) obligations. In general, it’s very hard to enforce ethical behavior on clients.


      • March 19, 2013 at 10:35 am

        True. I like noisy withdrawals. On the other hand if people are worried they might just never get help in the first place.


        • March 20, 2013 at 10:32 am

          Thanks, Cathy. I’ve learned as a lawyer that clients who want to cheat don’t want help in the first place. Liability carriers now have a category of “unworthy clients”.

          On a related note, I’d love your comments on the following general question that I have regarding statistical measures of school and student performance.

          I have degrees in chemistry from the University of Chicago and Harvard, and I took a lot of mathematics classes at both schools. I’m also married to a teacher and am a member of my local school board, and I’ve read many of Diane’s books and pay close attention to her ‘blog; so I have been following the discussions about school quality for some time now. I’ve been a patent attorney for over 20 years, and I’ve spent a lot of time reviewing critically the arguments and evidence in scientific publications.

          As a scientist, I often used least-squares techniques to fit measurement data to a curve (usually a line) and determine the significance of that fit. I’ve understood that the idea behind these techniques is based on the collection of measurements of a system in response to some external input (e.g., the length of a metal bar in response to controlled changes in temperature). The measured values are plotted as ordered pairs (input,result) and some indication of the error in measurement is also recorded. The question is whether the points will define a line that is theoretically postulated to exist. The appropriate formula is applied and the results are examined to demonstrate a significant “fit” or not. I think Wikipedia’s entry summarizes my understanding well.
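A minimal sketch of the fitting procedure described in the comment above, for readers who have not done it by hand. The temperature and length readings are made up for illustration, and the helper name `fit_line` is hypothetical; the formulas are the standard ordinary least-squares ones for a straight line.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns the slope, the intercept, and the correlation coefficient r.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical readings (not real data): temperature in deg C, bar length in mm.
temps = [20.0, 40.0, 60.0, 80.0, 100.0]
lengths = [1000.00, 1000.24, 1000.47, 1000.73, 1000.96]

slope, intercept, r = fit_line(temps, lengths)
print(f"slope = {slope:.5f} mm per deg C")
print(f"intercept = {intercept:.3f} mm")
print(f"r = {r:.4f}")
```

Note that this sketch ignores the per-point measurement error the comment mentions; a weighted fit would be needed to take it into account.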

          Now, I understand there is a lot a statistician can do with data (more on that in a moment) beyond what most scientists are taught or will ever need. But I can’t help but be troubled by what I see is a wanton misuse of these techniques throughout the social sciences and economics, and especially in education. Here are my concerns on which I ask for your comment:

          1. Least squares was developed to handle conditions of multiple measurements of defined physical objects. The “measurements” thus are statements of some physical property of a defined physical body that under a given set of conditions should be repeatable by anyone at any time. (The system being measured may have to be recreated in the case of time-varying properties like nuclear decay, but you get the idea.)

          I don’t see how student test scores can be a measurement. There is no physical property under examination. A test doesn’t actually measure anything; it’s just a statement of the ability of a student to answer a given set of questions at a certain time and place. It’s been well documented that test scores can vary widely for a variety of testing conditions. Since a test can really be taken only once, given the changes in the student’s underlying condition between tests, how do you determine “error” for the test’s “measurement”, at least in a way that’s meaningful to the concepts that underlie least squares fitting?

          And how can anyone use least squares to define a line based on statistical aggregates of student test scores? The problems mentioned above are only compounded when we start taking various averages.

          2. The point of least squares is to determine whether the recorded data fall sufficiently close to a theoretical line that the data can be said to confirm the theory. Thus, you need a theoretical line as well as sound experimental measurements. Yet, most publications I see simply produce a scatterplot of points, and then draw the line which is the result of simply applying the least squares formula to the x and y values of the points. In other words, the method is used backwards to create the line that was supposed to be the standard against which the points are compared. Such an approach just begs the question, since the line itself is defined by the points. The correlation coefficient is meaningless, since you don’t really correlate the points with anything independently of the points.

          3. I’ve become very chary of words like “data” and “information”. If you check the dictionary, both words are defined in a way that suggests they refer to true or confirmed statements. In fact, I’ve started using “statement” when I can’t be sure of its truth or accuracy. Given my comments above, I can’t find much, if anything, that is really “data-driven” or “information-based”. In fact, many of the arguments around education policy remind me of Prof. Harry G. Frankfurt’s excellent definition of “bullsh’t”.

          4. I think these problems become even worse when comparing school systems and the education policies of countries, such as done by Harvard’s Kennedy School. Notice the graphs (e.g., Figure 3) try to show a “correlation” among the scores of different states. How can the average test scores of different state school systems provide the uniformity and regularity that underlie the least squares method? Sure, you can draw a line calculated using the least squares formula applied to a series of points, but I would argue that that line is meaningless! We’re not measuring anything! There is no physical system that can be measured. There is no “error”, since there is no “measurement”. The scores only represent a one-time event; so there’s no way for anyone to repeat the observation. There’s no theoretical line to compare against the points. How can least squares provide any useful information in this case?
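The distinction drawn in point 2 above, between testing points against a line fixed in advance and drawing the line from the points themselves, can be sketched numerically. Everything here is an assumption for illustration: the points are made up, the "theory" line y = 2x is hypothetical, and the helper names are invented. The one fact it relies on is that the least-squares line minimizes the residual sum of squares, so it can never fit the points worse than any other line, including a theoretical one.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rss(xs, ys, slope, intercept):
    """Residual sum of squares of the points about y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up points, not real observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical line fixed before looking at the data: y = 2x.
theory_slope, theory_intercept = 2.0, 0.0

# Line drawn from the points themselves.
fit_slope, fit_intercept = least_squares(xs, ys)

rss_theory = rss(xs, ys, theory_slope, theory_intercept)
rss_fit = rss(xs, ys, fit_slope, fit_intercept)

print(f"theory line: RSS = {rss_theory:.4f}")
print(f"fitted line: slope = {fit_slope:.3f}, intercept = {fit_intercept:.3f}, RSS = {rss_fit:.4f}")
```

The comparison against the theory line is a test that the data could fail. The fitted line's small residuals are partly guaranteed by construction, which is why they say less on their own.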

          Sorry for the long post. I look forward to reading your comments. Keep up the good fight!


        • March 20, 2013 at 10:42 am

          I like your questions.

          I think though that you’re being too rigid in applying a linear model. It is what it is, to be sure, and we shouldn’t pretend there’s a line where there isn’t, but on the other hand it can certainly be useful to ask a question like, if there were a line there, where would it be?

          As for students and tests, it’s not true that you’re not measuring anything. Quite obviously you are measuring the ability of that student to do well on that test at that time. You can separate out the idiosyncrasies of that concept and separately test them for variability – for example, do students do better or worse on the same test at different times? Do they do differently on different tests which are supposedly testing the “same thing”? Is the variance across students larger than the variance across a single student testing at different times?
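The last question above, whether scores vary more across students than for one student across sittings, can be sketched with Python's standard `statistics` module. The scores and student labels below are invented for illustration and stand in for repeated sittings of the same test; a real analysis would need far more data and a proper variance-components model.

```python
from statistics import mean, variance

# Hypothetical scores (not real data): three sittings of the same test per student.
scores = {
    "student_a": [72, 75, 70],
    "student_b": [85, 88, 84],
    "student_c": [60, 66, 63],
}

# Within-student variability: average of each student's sample variance
# across their own sittings.
within = mean([variance(sittings) for sittings in scores.values()])

# Between-student variability: sample variance of the per-student mean scores.
student_means = [mean(sittings) for sittings in scores.values()]
between = variance(student_means)

print(f"average within-student variance: {within:.2f}")
print(f"variance of student means:       {between:.2f}")
```

If the within-student figure were comparable to the between-student one, each point on a scatter plot would be better thought of as a fuzzy area than as a point.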

          In other words, there are all sorts of reasons that the data points on the scatter plot might be considered fuzzy areas rather than points, and they should all be considered. And the question of whether there should be a line that explains something should be examined once, twice, and then thrice. Finally, the inferences we take from plotting such a line should be seriously examined, especially in consideration of the errors we described earlier.

          I hope that’s helpful.


        • March 20, 2013 at 1:14 pm

          Definitely helpful. But I’d like to carry on another round if that’s ok.

          I agree with your point about being too rigid, which wasn’t my intention; in fact I actually meant to include that very point as a caveat, and I’ve done just such exploration myself.

          But I see a big difference between exploring data to find possible relationships as clues to a model and offering such work as a complete argument. Yes, use the data to develop a model, but then prove the model rigorously just the same. Having worked in the pharmaceutical industry, I can’t imagine anyone offering such an argument in front of an FDA approval panel. So why accept the same when considering the future of our public schools?

          About the tests, my point isn’t that tests can’t offer useful information; under the right conditions they can. But I still can’t see my way to agreeing that tests are “measurements” in the sense that word is used for the statistical techniques used to demonstrate functional relationships between inputs and responses. The fact that one would have to disentangle the idiosyncrasies you mentioned (and I agree with your point) still suggests to me that test scores are too pregnant with confounding variables to reliably define a simple functional relationship. Again, how can any researcher use such methods ethically to make claims that concern the nation’s educational policies?

          I agree with your conclusion, which seems to confirm my thoughts on this matter. My hope is to start providing the public with a serious critique and background to see through the misapplied statistical measures and question the reliability of the statements and certainty of the conclusions presented.


        • March 20, 2013 at 1:46 pm

          Love it! Good luck and keep me posted.

          Cathy


        • March 20, 2013 at 1:59 pm

          My pleasure! Thanks again for taking the time to write.


    • March 19, 2013 at 8:43 pm

      I like the argument of shifting the focus to “finding “good” clients who will pay for unvarnished recommendations based on solid empirical methods. If the clients paying the bills are corrupt, they’ll compromise any professionals they hire.” You can’t necessarily get the applied statistics specialists to self-regulate with professional standards of validity, but if you want to be an ethical practitioner, you can preferentially accept work from people who are looking for meaningful feedback from the data they’ve got, instead of someone to torture the data until it reinforces the preconceptions they’ve invested so much in trying to confirm.

      But if most clients are putting themselves in a position where by the time they’ve spent the money to collect the data and they want it analyzed professionally, they’ve over-committed to confirming the hypothesis and can’t afford a climb-down, you’ll have trouble with that strategy. And in a professional culture that buys into the rhetoric of meritocracy where performance=potential, everyone takes being found out in an error personally, because it somehow reflects on their individual potential if they admit they make mistakes, so if the people commissioning a data analysis have preconceived ideas about what the data will tell them, they may also resist alternative outcomes for more irrational reasons.

      Which in turn raises the question of how to enforce professional standards, even if you could regulate the practitioners as a group (e.g., through professional licensing and an ethics standard that can get you kicked out for violations). If getting caught is the big fear, they’ll just get better at not getting caught when doing something corrupt. And to a degree, this fear will also trigger the self-serving bias and some people will be able to convince themselves their intentions were above-board when they did whatever is supposedly against the rules, because the personal stakes are too high for them to be willing to admit the truth if they were influenced by a conflict of interest and put themselves in a vulnerable position because they misjudged the risk of consequences from “walking a fine line”.

      At the same time, if you make it too easy to accept criticism by saying “oops, that was an accidental error of interpretation” you give them too much wiggle-room for plausible deniability when they get caught doing something highly corrupt.

      I imagine if there were more in-house statisticians and less reliance on data-handling consultants, it would be easier to rein in perverse incentives, at least to a degree. The consultant analyst doesn’t really have time or interest in the long-term priorities of the institutions they take work from, so they’ll encourage them in making mistakes that look good in the short-term and walk away leaving someone else holding the bag when/if those particular mistakes are repeated on a larger scale with undue optimism about the expected outcome.


  2. ChrisJS
    March 18, 2013 at 5:34 pm

    I’m a (math) graduate student at UWashington. I’ve been reading your blog weekly since your 20th post or so. I very rarely comment, though. I hope you have a chance to meet w/ the math department graduate students. I’m very interested in both issues of math in industry and financial regulation.


  3. June 12, 2013 at 1:11 am

    Every time I see sites as delightful as this, I think I should stop surfing and start working on mine.

