
Piece in Slate about ethical data science

February 5, 2016

Yesterday Slate published a piece I wrote for them entitled The Ethical Data Scientist. Take a look and tell me what you think; I enjoyed writing it.

One thing I call for in the essay is the teaching of ethics to aspiring data scientists, and yesterday some very cool professors from the Berkeley School of Information wrote to me and told me about their two classes on data science and ethics, one for undergrads and the other for graduate students. I seriously wish I could enroll in them!

Please tell me of other efforts in this direction if you know of them.

  1. February 5, 2016 at 7:30 am

    After one of the all-too-many scandals on Wall Street, a solution was found: Teach the “registered reps” ethics. So I found myself in a room being taught “ethics.” My first question was WHOSE ethics?

    Do we teach, in the infamous words of Ted Cruz, “New York values?” Is it ethical to have four wives? It is for Mormons and Muslims. Is it ethical to have women go through female genital mutilation? It is for many Muslims.

    http://www.nytimes.com/2016/02/05/health/indonesia-female-genital-cutting-circumcision-unicef.html?_r=0

    http://news.sky.com/story/1636303/fgm-case-reported-every-109-minutes-in-england

    Is it ethical to have women sit in the back of the bus? It is for some Ultra Orthodox Jews.

    Ethics are no substitute for rules and regulations, and when someone is already an adult, it is too late to teach them ethics and morality. It is either a part of one’s constitution (as it is for Cathy) or it is not.


    • February 5, 2016 at 8:04 am

      As you noted, if ethics are a series of rules, then everyone will have their own list. What I would propose is actually a skills course (contrasted with an information transfer course) where the key skills developed are:
      (1) habits of questioning whether what we are doing is ethical or not, from multiple perspectives
      (2) techniques to reduce bias in the process of answering questions raised in part (1), particularly the bias to find ways to justify actions that are self-serving

      Also, it is clear that data scientists aren’t the only ones in need of these skills.


  2. February 5, 2016 at 7:33 am

    Not many people would understand the mechanics of the algorithm, but the need for openness to independent testing is vital. One question is “How do we get to find out that some big data algorithm is being used?”


  3. Mike Williams
    February 5, 2016 at 11:10 am

    Have you seen this list? http://bdes.datasociety.net/council-output/pedagogical-approaches-to-data-ethics-2/. Not as long as we’d all like it to be, but fairly complete at the time of writing, as far as I can tell.


    • February 5, 2016 at 12:11 pm

      Very impressive list. Three people put in a lot of work to put it together. An interesting quote in their research is:

      “actually harms, broader moral reasoning skills that operate beyond compliance”

      I get the point about compliance, but I still wonder who gets to define “broader moral reasoning.” Do we all share common ideas about what is moral? Based on my experience across three continents, the answer is no.


      • Aaron Lercher
        February 5, 2016 at 4:23 pm

        The article says the main problem is not ethical relativity.
        The main problem is integrating students’ learning about ethics with practical problem-solving or case studies, rather than teaching ethics in a way that is detached from practice. As a former philosophy teacher, that makes a lot of sense to me.
        Metcalf, Crawford, and Keller are pleasantly surprised that most data ethics courses take an integrative approach.
        But if I may adopt a “detached” perspective, philosophers are in surprising agreement that there are just three kinds of ethical theories. Ethical disagreements are real, but not completely incomprehensible and irrational.
        The article’s remark about “negative” versus “positive” ethics is a sophisticated summary of a basic division between rule-based non-consequentialist ethics and goal-based consequentialist ethics. Those are two of the three kinds of ethical theory. The third kind might not be as useful in this context, since these are theories of what it is to be a good human being.
        But if data science becomes increasingly professionalized, then it is likely to need a code of ethics that standardizes expectations. This won’t be perfect, but maybe Cathy will be on the committee.


        • Aaron Lercher
          February 5, 2016 at 6:46 pm

          The American Statistical Association has an Ethics Committee: http://community.amstat.org/ethics/aboutus/new-item

          There is a link on this page to a proposed revision of their code of ethics. http://community.amstat.org/ethics/home

          These might be helpful in thinking about a data science ethical code. But in Cathy’s Slate piece, do the homeless people in the dataset qualify as research subjects? Perhaps they are better understood as similar to medical patients.

          The American Medical Association also has an ethics code. http://www.ama-assn.org/ama/pub/physician-resources/medical-ethics/code-medical-ethics.page

          These codes aren’t going to resolve the really hard problems. Perhaps neither is really a good way of understanding how the data scientist should act with respect to the homeless people. But they clarify how each professional group defines the ethical problems it has to deal with.


        • Aaron Lercher
          February 6, 2016 at 5:33 pm

          A radical approach would be to treat all the homeless people in the dataset as *clients*, on par with whoever had requested the study.
          I don’t know the implications of that, but I’m suggesting it because the relationship between data modeler/analyst and the people whose data the data modeler/analyst is working with can be understood in many ways.
          A professionalized data science ethics code might be seen as an attempt to define this relationship and others. One purpose would be to avoid liability, if that’s a threat. Another purpose for any profession is to claim a task as its own, because it can claim it has the special skills to handle the problems inherent in this task. Yet another purpose would be ethical as such.


  4. February 6, 2016 at 1:36 pm

    You note that there’s very little theoretical literature on data science in practice. But doesn’t that siloing of practice from theory occur in all applied disciplines? While having little time for theory may be part of the problem, I suspect not being at liberty to say may be a bigger part. Perhaps working in theory is “publish or perish” and working in practice is “publish and perish” (a variant on “I could tell you but then I’d have to kill you”).

    I think the only performant accountability mechanism for data science is a competing data science community that is outside the for-profit sector (a public option?). This could combine participants from all the sectors other than that one: academics, civil servants, even hobbyists. Whatever is to data science as open source software is to software. I think you’ll get farther letting researchers do applied work than getting applied workers to publish research.

    The call for professional ethics among data scientists is cute. Before you can have professional ethics, you have to have a profession. Michael O. Church provides some insight as to how a profession gets to be a profession:

    Clerking evolved from an apprenticeship to a tournament at which most would fail, and the post-1870-ish specialization of the clerkship phase meant like must compete against like, as is true even today, for the limited supply of positions in The Business proper. Accountants competed with other accountants for the limited leadership positions, and marketing analysts went against other marketing analysts, and so on for each field. There was one group that realized, very quickly, that they were getting screwed by this: lawyers. Law is, of all the specialties that a business requires, perhaps the most cognitively demanding one that existed, at least with substantial numbers, in the late 19th century. It tended (like computer programming, today) to draw in the hyper-cerebral types insistent on tackling the big intellectual challenges. Or, to put it more bluntly, they were a very smart pool and it was tough for a lawyer to distinguish himself with a level of intelligence that would be dominant had time and opportunity brought him into a different pool, but might only be average among attorneys.
    Lawyers realized that they were getting burned by “like competes against like” in the tournament for the opportunity to become actual partners in the business (i.e. executives and owners). The good news, for them as a tribe, was that they knew how to control and work the law, that being their job. They professionalized. They formed the American Bar Association (in the U.S.) and made it standard that, in corporate bureaucracies, lawyers report only into other attorneys. In-house corporate attorneys report to other attorneys, up to the General Counsel, who reports to the board (not the CEO). Law firms cannot be owned by non-lawyers. Accreditation also ensures (at least, in theory) a basic credibility for all members of the profession: one who loses a client is still a lawyer. The core concept here is that, while an attorney is a provider of services, attorneys are not supposed to be business subordinates (clerks). They have the right and obligation to follow ethical mandates that supersede managerial authority. This requires that the profession back them if they lose their jobs; the value of a limited-supply accreditation is that a person who is fired for exercising that (mandatory) ethical independence remains marketable and (in theory, at least) can resume his or her career, more or less, uninterrupted. Without that type of assurance, that level of ethical and professional independence is quite obviously impossible.

    That, or gotta wobble that job…


  5. February 6, 2016 at 10:31 pm

    The grad class (which is great! I took it last summer) is part of UC-B I School’s MIDS degree, which is delivered via the internet. I can’t think it’d be anything other than excellent for everyone involved if you could get hooked up in that loop somehow. Maybe talk to the profs and/or I School dean if you’re interested in being a guest, etc?


  6. February 8, 2016 at 2:11 pm

    Excellent piece on a very important topic, thank you! I also wrote about this last year: http://ouzor.github.io/blog/2015/06/17/datascience-responsibility.html


  7. February 14, 2016 at 6:44 pm

    Very well done, that Slate piece. There are many ways that ethical data and statistics practice needs to be pursued, but I think they can be wrapped up into three goals: Try Not To Fool Yourself, Don’t Knowingly Fool Your Client, and Try To Keep The Client From Fooling Themselves. I emphasize “try” because I know I am the easiest person to fool, and, so, it is important to do careful cross-checks and such. I know, too, that if I “indulge” in these, I may seem slow in my process to some.

    I would add one more thing: When it comes to predictive outcomes, whether in terms of classification or forecasting, I think it’s critical to elucidate what’s being optimized during a model fit. A lot of times, people seem to just go through the mechanics of doing something, and the risks are not detailed in the analysis. Sure, it’s possible to look at how outcomes vary on subsets of data, but it is not common practice to examine the possibility of specification error, that is, error in the very model being used to assess goodness. Just as with the need to try differing priors in Bayesian (computational) methods, it is, I think, a good idea to try different valuation schemes and see if they concur, or give insight.
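    As a rough illustration of trying different valuation schemes, here is a minimal Python sketch. Everything in it is hypothetical: the data, the two sets of predictions (model_a, model_b), and the choice of squared versus absolute error as the two schemes. It only shows how one might check whether two ways of scoring a fit agree on which model is better.

```python
from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error: punishes large misses heavily."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def mae(y_true, y_pred):
    """Mean absolute error: treats all misses in proportion to their size."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

def preferred_model(y_true, predictions, loss):
    """Return the name of the model with the lowest loss under one scheme."""
    return min(predictions, key=lambda name: loss(y_true, predictions[name]))

def compare_schemes(y_true, predictions, losses):
    """Map each valuation scheme to the model it prefers."""
    return {
        scheme: preferred_model(y_true, predictions, loss)
        for scheme, loss in losses.items()
    }

# Hypothetical data: model_a is close on most points but misses one badly;
# model_b is moderately off everywhere.
y_true = [10, 12, 11, 13, 50]
predictions = {
    "model_a": [11, 13, 12, 14, 40],
    "model_b": [14, 16, 15, 17, 46],
}

choices = compare_schemes(y_true, predictions, {"mse": mse, "mae": mae})
schemes_concur = len(set(choices.values())) == 1

print(choices)
print("schemes concur:", schemes_concur)
```

    When the schemes disagree, as they do on this made-up data, the disagreement itself is the insight: it says the ranking depends on how a large miss is valued, which is a question for the client and not only for the fitting procedure.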

    And, yes, interpretability is key: If a result cannot be understood, no matter how good, I think it should be distrusted.

