There’s a new breed of models out there nowadays that reads your face for subtle expressions of emotions, possibly stuff that normal humans cannot pick up on. You can read more about it here, but suffice it to say it’s a perfect target for computers – something that is free information, that can be trained over many many examples, and then deployed everywhere and anywhere, even without our knowledge since surveillance cameras are so ubiquitous.
Plus, there are new studies that show that, whether you’re aware of it or not, a certain “gut feeling”, which researchers can get at by asking a few questions, will expose whether your marriage is likely to work out.
Let’s put these two together. I don’t think it’s too much of a stretch to imagine that surveillance cameras strategically placed at an altar can now make predictions on the length and strength of a marriage.
I guess it brings up the following question: is there some information we are better off not knowing? I don’t think knowing my marriage is likely to be in trouble would help me keep the faith. And every marriage needs a good dose of faith.
I heard a radio show about Huntington’s disease. There’s no cure for it, but there is a simple genetic test to see if you’ve got it, and it usually starts in adulthood so there’s plenty of time for adults to see their parents degenerate and start to worry about themselves.
But here’s the thing: only 5% of people who have a 50% chance of having Huntington’s actually take that test. For them the value of not knowing is larger than the value of knowing. Of course knowing you don’t have it is better still, but until that happens the ambiguity is preferable.
Maybe what’s critical is that there’s no cure. I mean, if there were a therapy that would help Huntington’s disease sufferers delay it or ameliorate it, I think we’d see far more people taking that genetic marker test.
And similarly, if there were ways to save a marriage that is at risk, we might want to know at the altar what the prognosis is. Right?
I still don’t know. Somehow, when things get that personal and intimate, I’d rather be left alone, even if an algorithm could help me “optimize my love life”. But maybe that’s just me being old-fashioned, and maybe in 100 years people will treat their computers like love oracles.
My friend Jordan Ellenberg sent me an article yesterday entitled Coin-flip judgement of psychopathic prisoners’ risk.
It was written by Seena Fazel, a researcher at the department of psychiatry at Oxford, and it concerns his research into the currently used predictive risk models for violence, repeat offense, and the like, which are supposedly tailored to people who have mental disorders like psychopathy.
Turns out there are a lot of these models, and they’re in use today in a bunch of countries. I did not know that. And they’re not just being used as extra, “good to know” information, but rather as a tool for making important decisions about the prisoner. From the article:
Many US states use such tools to assess sexual offending risk and to help decide whether to exercise their powers to detain sexual offenders indefinitely after a prison term ends.
In England and Wales, these tools are part of the admission criteria for centres that treat people with dangerous and severe personality disorders. Outside North America, Europe and Australasia, similar approaches are increasingly popular, particularly in clinical settings, and there has been a steady growth of research from middle-income countries, such as China, documenting their use.
Also turns out, according to a meta-analysis done by Fazel, that these models don’t work very well, especially for the highest-risk, most violent population. And what’s super troubling is, as Fazel says, “In practice, the high false-positive rate probably means that some offenders spend longer in prison and secure hospital than their true risk would suggest.”
Talk about creepy.
This seems to be yet another example of mathematical obfuscation and intimidation that gives people a false sense of having a good tool at hand. From the article:
Of course, sensible clinicians and judges take into account factors other than the findings of these instruments, but their misuse does complicate the picture. Some have argued that the veneer of scientific respectability surrounding such methods may lead to over-reliance on their findings, and that their complexity is difficult for the courts. Beyond concerns about public protection, liberty and costs of extended detention, there are worries that associated training and administration may divert resources from treatment.
The solution? Get people to acknowledge that the tools suck, and have a more transparent method of evaluating them. In this case, according to Fazel, it’s the researchers who are over-estimating the power of their models. But especially where it involves incarceration and the law, we have to maintain an adherence to a behavior-based methodology. It doesn’t make sense to put people in jail an extra 10 years because a crappy model said so.
This is a case, in my opinion, for an open model with a closed black box data set. The data itself is extremely sensitive and protected, but the model itself should be scrutinized.
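To see why a high false-positive rate can come out of a tool that sounds decent on paper, here is a minimal sketch of the base-rate arithmetic. Every number in it is made up for illustration and none comes from Fazel's meta-analysis: a hypothetical population of 1,000 people, 10% of whom would actually reoffend, scored by a hypothetical instrument that catches 75% of true cases and correctly clears 75% of everyone else.

```python
def flagged_breakdown(population, base_rate, sensitivity, specificity):
    """Split the people a risk tool flags into true and false positives.

    All inputs are illustrative assumptions, not measured values.
    """
    at_risk = population * base_rate
    not_at_risk = population - at_risk
    true_positives = at_risk * sensitivity
    false_positives = not_at_risk * (1 - specificity)
    # Share of flagged people who really are at risk.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical inputs: 1,000 people, 10% base rate, 75% / 75% accuracy.
true_pos, false_pos, precision = flagged_breakdown(
    population=1000, base_rate=0.10, sensitivity=0.75, specificity=0.75
)
print(f"flagged and truly at risk: {true_pos:.0f}")
print(f"flagged but not at risk:   {false_pos:.0f}")
print(f"share of flagged people who are truly at risk: {precision:.0%}")
```

Under these assumed numbers, most of the people the tool flags are not in the at-risk group, simply because that group is small to begin with. Change the inputs and the picture changes, which is exactly why the evaluation method should be out in the open.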
Yesterday at our weekly Alt Banking meeting we had an extraordinary speaker, Merlyna Lim, come talk to us about social media and grass roots organizing.
Her story was interesting and nuanced; I won’t get everything down here. But there were quite a few sound bite takeaways I can express.
- The organizing which culminated in the Arab Spring started way before Facebook or Twitter came to the region.
- To a large extent social media has replaced chatting in the cafe, which we don’t do anymore.
- But that’s actually a good thing, since many regimes are so oppressive they won’t let large groups of people hold regular meetings (and large can mean 5 or more).
- Whereas social media is pretty good at energizing people to “get rid of their enemy” at a given critical moment, and mobilize on the street, it’s not that great at nuanced discussions about how to build something permanent and lasting afterwards.
One other thing I wanted to mention was Merlyna’s work on Mohamed Bouazizi, the Tunisian street vendor who self-immolated after getting into a dispute with a police officer.
The original story that got people mobilized in Tunis and out on the street was that the police officer was a woman, that she slapped him, and that he was a college educated street vendor. It turns out these were white lies – he never finished high school, the police officer may have been a man, and there was probably no slap – but they built a narrative that people really loved. Merlyna wrote a paper about this, available here, if you want to know more.
That brings us to the question of why this particular framing was so appealing. Merlyna put it this way: plenty of other people had self-immolated under similar circumstances in Tunisia in the past 6 months alone. But they didn’t start a revolution because they were just very poor and didn’t have this story with extra (made-up) humiliating details. Killing yourself because you are frustrated at not having enough to eat just isn’t as compelling.
It reminds me of this Bloomberg View piece I’ve been chewing on for a couple of weeks, written by Peter Turchin and entitled Blame Rich, Overeducated Elites as Our Society Frays. He studies conditions for revolution as well, and claims that having a large unemployed but highly educated population – the “lawyer glut” we’re seeing today – is asking for trouble. From his article:
Elite overproduction generally leads to more intra-elite competition that gradually undermines the spirit of cooperation, which is followed by ideological polarization and fragmentation of the political class. This happens because the more contenders there are, the more of them end up on the losing side. A large class of disgruntled elite-wannabes, often well-educated and highly capable, has been denied access to elite positions.
Food for thought. Does one have more sympathy for people whose food stamps have been recently cut or for someone who got a law degree and can’t find a job? Or is the real outrage when both happen (or at least are said to happen)? Personally, and this is maybe because I’ve been reading Jonathan Kozol’s Savage Inequalities, I’m not as worried about the lawyer.
You guys know Aunt Pythia loves you. And Aunt Pythia feels the love from you readers as well, especially in person (some of you are reluctant to add comments online, for whatever reason).
So don’t take it the wrong way when I say this: you guys are nerds. I have like a 5-to-1 ratio of math-related versus sex-related questions, and today I’m effectively withholding the sex until the end as a hook to keep you guys.
Don’t get me wrong, I love nerd questions. Happy to answer them. But people! Let’s spice this up! And if you can’t go all the way to sex at least come up with something about breastfeeding in public or thereabouts. As you know, Aunt Pythia doesn’t make up questions – that would be beneath her – but she has no problem with prompts.
In other words, as you enjoy today’s column:
please, think of something sexy to ask Aunt Pythia at the bottom of the page!
What’s the deal with employers being dishonest in their job descriptions, and the general acceptance of this sort of unethical behavior? I work in a somewhat prestigious buy-side shop where I was told I’d be in a front-office quant research position. After I arrive, I find out that my responsibilities are really more like that of a middle-office tech position. Instead of doing research on market inefficiencies, I’m relegated to automating an endless number of reports. My employer knew what the job would entail before I joined and yet portrayed it to be something it’s not. Worst of all, it seems like 80% of the people I consult with say (expressly or implicitly) that I should be glad I got my foot in the door and that this stuff is very common, so it’s nothing to fret about. WTF’s wrong with people?
For whatever reason, which I certainly don’t relate to, there are some people that still desperately want to work in finance as front-office quants. They want it so badly, in fact, that they’re willing to pretend to be doing that while they actually do other stuff. You seem to not be one of those people. Awesome.
My suggestion to you is to get another job, simple as that. You’re not going to change their mind about what your job should be, since they’re clearly perfectly comfortable with lying to people. I mean, once you’ve got another job lined up, there’s no harm in telling them you’re leaving unless you get moved to the position you were promised, but please don’t hold your breath for that to actually happen.
One last thing: look outside finance! There are plenty of other ways to be a nerd.
Dear Aunt Pythia,
How does Mathbabe break down a data problem into manageable steps? I’m a mathematician who has tried a few data mining problems on the side for fun, and I get totally overwhelmed whenever I try to start. If I were solving a math problem, I’d read relevant papers to see what is known and get ideas for techniques, I’d break down my desired result into lemmas and work on them one by one, and I’d have a plan in mind throughout (it might change, of course, but I’d always know why I was doing what I was doing).
But if I’m trying to, say, classify a bunch of labeled feature vectors, I’m at a loss. I experiment and play around with the data, but I feel so random about everything. How do I choose how many hidden units to have in a neural net? How do I choose K in K-nearest neighbor classification? And so on. Some stuff works better than other stuff, but I don’t know how to be systematic. I end up getting discouraged, which is too bad because data problems are awesome and I want to master them.
Any tips for this mathematician on how to solve problems whose solutions aren’t proof-based?
Great question! And I’m glad you’re asking that. It’s a sign that you want to do things right, and know why you’ve made decisions. I want you to cultivate that desire.
First, (after separating my out-of-sample data from my in-sample data) I spend a lot of time with smallish samples getting the feel of things through “exploratory data analysis.” This helps make sure the data is clean, gives me the overall distribution and feel for the various data sources, and gives me some idea of the kind of relationships I might expect between the inputs and possibly the target, if there’s a well-defined target.
You’d be surprised how much you learn by doing that.
Next, how do you even choose which algorithm to use, never mind how exactly to tune the hyperparameters of a given algorithm? The answer is that it’s a craft, and over time you gain intuition, but at first you just don’t know and you experiment. Put the science in data science. Try a bunch of different ones and see which works better, and hypothesize on why, and try to test that hypothesis.
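To make “try a bunch and see which works better” concrete for the question about K, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is two made-up Gaussian blobs, the candidate values of K are arbitrary, and the helper names (`make_data`, `knn_predict`, `choose_k`) are hypothetical. The one thing it takes from the advice above is the order of operations: set aside out-of-sample data first, pick K on a validation slice, and only then look at the untouched test slice.

```python
import random
from collections import Counter


def make_data(n, seed):
    """Two hypothetical Gaussian blobs in 2-D, labeled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2
        + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out points that k-NN labels correctly."""
    hits = sum(knn_predict(train, p, k) == y for p, y in held_out)
    return hits / len(held_out)


def choose_k(train, validation, candidates):
    """Score every candidate K on the validation slice, keep the best."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


data = make_data(300, seed=0)
# Split once, up front: the test slice is never used to pick K.
train, validation, test = data[:180], data[180:240], data[240:]
best_k, scores = choose_k(train, validation, candidates=[1, 3, 5, 9, 15, 25])
for k, score in scores.items():
    print(f"K={k:2d}  validation accuracy={score:.2f}")
print(f"chosen K={best_k}, test accuracy={accuracy(train, test, best_k):.2f}")
```

The same loop works for the number of hidden units or any other hyperparameter: swap the model, keep the split. The hypothesizing about why one value wins is still on you.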
Here’s another possibility. Start with synthetic data that is “perfectly set up” for a given algorithm – figure out what that means – and then pretend you don’t know that, and see whether the above testing procedure would give you the correct result. Now add noise to that perfect data set, and see how quickly (i.e. with how much noise) your perfect solution doesn’t seem optimal anymore. That gives you an overall way of thinking about optimizing algorithms and hyperparameters. It’s hard, even with linear regression.
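Here is one way that synthetic-data experiment might look for linear regression, again as a sketch under stated assumptions rather than a recipe. The “perfect” data is a made-up line (slope 2, intercept 1), the rival model is a deliberately dumb one that always predicts the training mean, and the function names are hypothetical. For each noise level it repeats the fit-then-test procedure many times and counts how often the held-out error would have picked the line.

```python
import random

# Hypothetical "truth" used to generate the synthetic data.
TRUE_SLOPE = 2.0
TRUE_INTERCEPT = 1.0


def fit_line(xs, ys):
    """Ordinary least squares for one input; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predictions, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def line_win_rate(noise_sd, trials=200, n=40, seed=0):
    """Share of trials in which the fitted line beats the constant
    (training-mean) model on held-out data."""
    rng = random.Random(seed)
    wins = 0
    half = n // 2
    for _ in range(trials):
        xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
        ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0.0, noise_sd)
              for x in xs]
        # In-sample half for fitting, out-of-sample half for judging.
        slope, intercept = fit_line(xs[:half], ys[:half])
        train_mean = sum(ys[:half]) / half
        line_error = mse([slope * x + intercept for x in xs[half:]], ys[half:])
        constant_error = mse([train_mean] * (n - half), ys[half:])
        wins += line_error < constant_error
    return wins / trials


for noise_sd in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]:
    rate = line_win_rate(noise_sd)
    print(f"noise sd={noise_sd:4.1f}  line preferred in {rate:.0%} of trials")
```

With no noise the line is preferred every time, as it should be. The interesting part is watching where that stops being true, even though the data really was generated by a line.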
Oh, and buy my book. It should hopefully help.
p.s. when I worked in math, I didn’t break things down into lemmas first. I first tried to answer the question, why is this true? (maybe by starting with small examples) and then only later, in order to explain it on paper, would I break things down into lemmas.
Dear Aunt Pythia,
I have a tenure-track job in a “hard-core STEM field”; I’m also a very young looking woman. I have a serious and rewarding research program, I really enjoy teaching at the board, and I hear that I give great seminars.
Yet recently, for the first time, I have been overcome with extreme, physiological, panic when I stand at the front of a room to give a seminar. This is not because I’m worried about the material; I’m not. This is also not stage fright; I have iron nerves about performing.
It is a feeling of panic brought on by watching the room fill up with men, with maybe only 1 or 2 very junior women. I start thinking “what happened to all the other young women who, like me, loved mathematics? At what point were they all removed from the community? When will too much get to be too much for me too?”
This started happening about a year ago and it’s only getting worse. I’m not expecting to change all the weird experiences of being a young woman in my field; I just want to figure out how to deal with my own thoughts as I stand in front of my audience.
Feeling like a fox in a room full of hunting dogs
This is going to sound trite, but here goes: you are not a statistic, you are an individual person. And although you are a woman person, that doesn’t mean you have to do stuff that other women have done. If things are working for you on a minute-to-minute basis, then that means you can be happy and proud of having set up your life to be fulfilled.
Nobody is asking you to explain why other people do the things they do. We can barely explain why we do the things we do – and then half the time the understanding only comes years later. Just focus on who you are, who you want to be and how you want to spend your time.
I’d also like to mention that, as a woman who left math, I also loved teaching and I loved giving seminars – that was the good stuff! For that matter there were lots of great things about being a professor. And I didn’t leave because I was a woman and felt like it was time to leave – nor did I not leave because I wanted to prove a point about women not leaving. I left because, in my individual life and with my individual goals, it was what I wanted.
So I guess I’m suggesting that you be a bit more self-centered and somewhat less identified with women, at least at those moments, if that is possible and if that helps. If that doesn’t help, consider going to a cognitive therapist who specializes in dealing with panic attacks. Good luck!
Dear Aunt Pythia,
I feel an eerie compulsion to answer this email. I love the broken grammar and all. What should I do?
How are you doing today? My name is Colvin Hostetter. I came across your e-mail under the Graduate Students portal while surfing online for tutorial for my daughter, Debra is a 18 years old girl. She is ready to learn. I would like the lessons to be at your location. Kindly let me know your policy with regard to the fees, cancellations, location and make-up lessons.
Also, get back to me with your area of SPECIALIZATION and any necessary information you think that might help.
The lessons can start by last week of November. Mind you, any break during Thanksgiving and Christmas would be observed respectively.
Looking forward reading from you.
My best regard,
Professor Has Ignored Silly Ignoble New Game
Really? PHISING? I think you really are a bit kinky in the grammar and spelling rules department.
So this must be a spam email, since it’s talking about an 18-year-old girl who is “ready to learn.” It sounds like soft porn. And it doesn’t describe what she needs to learn – math? physics? German? I’d be not at all surprised to hear someone describe the actual financial bamboozling mechanism that would transpire if you did answer this, although a quick Google search doesn’t uncover it.
My suggestion is to mark this, and any other similar emails, as “spam” so that Google will do the work for us in the future and delete this bullshit.
Dear Auntie P,
My wife and I have not used any birth control other than rhythm and/or withdrawal for more than 16 years now (~mid late twenties to early mid forties.) We have not had any unwanted pregnancies through this. We did have one successfully planned pregnancy that corresponded exactly to the month she charted to pinpoint ovulation.
So, are we lucky outliers or is this a much more successful strategy than we were both led to believe in high school sex ed?
Any suggestions for here on out?
Lucky in love
The reason that rhythm might not work is if women have irregular ovulations. If your wife doesn’t, though, then cool (although she may experience that as she approaches menopause).
The reason withdrawal doesn’t work is because men often forget the “withdrawal” part of the plan. I mean, it’s certainly possible to get pregnant with the pre-cum (just ask Alice) but super unlikely.
In other words, you are a special, special man with a very excellent memory.
Here on out: don’t forget to remember the plan! And be aware of irregular cycles!
Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!
This is a guest post by Marc Joffe, the principal consultant at Public Sector Credit Solutions, an organization that provides data and analysis related to sovereign and municipal securities. Previously, Joffe was a Senior Director at Moody’s Analytics.
As Cathy has argued, open source models can bring much needed transparency to scientific research, finance, education and other fields plagued by biased, self-serving analytics. Models often need large volumes of data, and if the model is to be run on an ongoing basis, regular data updates are required.
Unfortunately, many data sets are not ready to be loaded into your analytical tool of choice; they arrive in an unstructured form and must be organized into a consistent set of rows and columns. This cleaning process can be quite costly. Since open source modeling efforts are usually low dollar operations, the costs of data cleaning may prove to be prohibitive. Hence no open model – distortion and bias continue their reign.
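As a toy illustration of what that cleaning step involves, the sketch below takes a few lines of report text and turns them into consistent rows and columns. It assumes the text has already been pulled out of its source document, that columns are separated by two or more spaces, and that amounts look like `$1,250,000.00`. The sample report and the function names are invented for this example; real reports are rarely this tidy, which is where the cost comes from.

```python
import csv
import io
import re

# Made-up sample text standing in for one extracted report page.
RAW_REPORT = """\
Pool Balance        Beginning        Ending
Principal           $1,250,000.00    $1,198,400.50
Accrued Interest    $12,300.10       $11,950.75
"""


def parse_amount(cell):
    """Turn '$1,250,000.00' into a float; leave non-numeric cells alone."""
    cleaned = cell.replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return cell


def text_to_rows(text):
    """Split each non-blank line into cells on runs of 2+ spaces."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cells = re.split(r"\s{2,}", line.strip())
        rows.append([parse_amount(cell) for cell in cells])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, ready for a spreadsheet or database."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


rows = text_to_rows(RAW_REPORT)
print(rows_to_csv(rows))
```

Splitting on runs of spaces keeps a label like “Accrued Interest” in one cell, but it breaks as soon as a report uses single spaces between columns or wraps a label onto a second line. Each such quirk needs its own rule, multiplied across every report format you want to model.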
Much data comes to us in the form of PDFs. Say, for example, you want to model student loan securitizations. You will be confronted with a large number of PDF servicing reports that look like this. A corporation or well funded research institution can purchase an expensive, enterprise-level ETL (Extract-Transform-Load) tool to migrate data from the PDFs into a database. But this is not much help to insurgent modelers who want to produce open source work.
Data journalists face a similar challenge. They often need to extract bulk data from PDFs to support their reporting. Examples include IRS Form 990s filed by non-profits and budgets issued by governments at all levels.
The data journalism community has responded to this challenge by developing software to harvest usable information from PDFs. One example is Tabula, a tool written by Knight-Mozilla OpenNews Fellow Manuel Aristarán, which extracts data from PDF tables in a form that can be readily imported to a spreadsheet – if the PDF was “printed” from a computer application. Introduced earlier this year, Tabula continues to evolve thanks to the volunteer efforts of Manuel, with help from OpenNews Fellow Mike Tigas and New York Times interactive developer Jeremy Merrill. Meanwhile, DocHive, a tool whose continuing development is being funded by a Knight Foundation grant, addresses PDFs that were created by scanning paper documents. DocHive is a project of Raleigh Public Record and is led by Charles and Edward Duncan.
These open source tools join a number of commercial offerings such as Able2Extract and ABBYY FineReader that extract data from PDFs. A more comprehensive list of open source and commercial resources is available here.
Unfortunately, the free and low cost tools available to modelers, data journalists and transparency advocates have limitations that hinder their ability to handle large scale tasks. If, like me, you want to submit hundreds of PDFs to a software tool, press “Go” and see large volumes of cleanly formatted data, you are out of luck.
It is for this reason that I am working with The Sunlight Foundation and other sponsors to stage the PDF Liberation Hackathon from January 17-19, 2014. We’ll have hack sites at Sunlight’s Washington DC office and at RallyPad in San Francisco. Developers can also join remotely because we will publish a number of clearly specified PDF extraction challenges before the hackathon.
Participants can work on one of the pre-specified challenges or choose their own PDF extraction projects. Ideally, hackathon teams will use (and hopefully improve upon) open source tools to meet the hacking challenges, but they will also be allowed to embed commercial tools into their projects as long as their licensing cost is less than $1000 and an unlimited trial is available.
Prizes of up to $500 will be awarded to winning entries. To receive a prize, a team must publish their source code on a GitHub public repository. To join the hackathon in DC or remotely, please sign up at Eventbrite; to hack with us in SF, please sign up via this Meetup. Please also complete our Google Form survey. Also, if anyone reading this is associated with an organization in New York or Chicago that would like to organize an additional hack space, please contact me.
The PDF Liberation Hackathon is going to be a great opportunity to advance the state of the art when it comes to harvesting data from public documents. I hope you can join us.
Tonight I’m going to be on a panel over at Columbia’s Journalism School called Algorithmic Accountability Reporting: On the Investigation of Black Boxes. It’s being organized by Nick Diakopoulos, Tow Fellow and previous guest blogger on mathbabe. You can sign up to come here and it will also be livestreamed.
Unlike some panel discussions I’ve been on, where the panelists talk about some topic they choose for a few minutes each and then there are questions, this panel will be centered around a draft of a paper coming from the Tow Center at Columbia. First Nick will present the paper and then the panelists will respond to it. Then there will be Q&A.
I wish I could share it with you but it doesn’t seem publicly available yet. Suffice it to say it has many elements in common with Nick’s guest post on raging against the algorithms, and its overall goal is to understand how investigative journalism should handle a world filled with black box algorithms.
Super interesting stuff, and I’m looking forward to it, even if it means I’ll miss the New Day New York rally in Foley Square tonight.
As many of you are aware, food stamps were recently cut in this country. This has had a brutal effect on people and families and on neighborhood food pantries, which are being swamped with new customers and increased need among their existing customers.
One thing that I come away with when I read articles describing this problem is how often they detail individuals who have been diagnosed with diabetes but can no longer afford to pay for appropriate food for their condition.
As a person with a family history of diabetes, and someone who has been actively avoiding sugars and carbs to control my blood sugar for the past couple of years, I have a tremendous amount of sympathy for these struggling people.
Let me put it another way. Eating well in this country is expensive, and I’ve had to spend real money on food here in New York City to avoid sugary and fast carb-laden food. I don’t think I could have done that on a skimpy food budget. It’s especially hard to imagine budgeting healthy food on a withering food stamp budget.
Because here’s the thing, and it’s not a secret: shitty food is cheap. If I need to buy lots of food (read: calories) for a small amount of money, I can do it easily, but it will be hell for my blood sugar control. I’m guessing I’d be a full-blown diabetic by now if I were poor and on food stamps.
And that brings me to my nerd question of the morning. How much money are we really saving by decreasing the food stamp allowance in this country, if we consider how many more people will be diagnosed diabetic as a result of the decreased quality of their diet? And how many people’s diabetes will get worse, and how much will that cost?
It’s not over, either: apparently more cuts are coming over the next 10 years (maybe by $4 billion, maybe by $40 billion). And although diabetes care costs have gone up 40% in the last 5 years ($245 billion in 2012 from $174 billion in 2007), that doesn’t mean they won’t go up way more in the next 10.
I’m not an expert on how this all works, but the scale is right – we’re talking billions of dollars nationally, so not small potatoes, and of course we’re also talking about people’s quality of life. Never mind in a moral context – I’m definitely of the mind that people should be able to eat – I’m wondering if the food stamp cuts make sense in a dollars and cents context.
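Here is a crude back-of-envelope version of the question, using only the figures above. It is a sketch, not an analysis. It assumes the $4 billion and $40 billion figures are totals spread evenly over the 10 years, and that diabetes costs sit flat at the 2012 level. It says nothing about whether the cuts would actually cause any increase in diabetes costs; it only asks how big an increase would cancel the savings.

```python
# Figures quoted in the post.
DIABETES_COST_2007 = 174e9   # dollars per year
DIABETES_COST_2012 = 245e9   # dollars per year
CUT_SCENARIOS = {"low": 4e9, "high": 40e9}  # assumed to be 10-year totals
YEARS = 10


def breakeven_share(total_cut, years, annual_diabetes_cost):
    """Fractional rise in annual diabetes costs that would equal the
    annual saving from the cut (cut assumed spread evenly over `years`)."""
    annual_saving = total_cut / years
    return annual_saving / annual_diabetes_cost


# Sanity check on the "up 40% in 5 years" figure.
growth = (DIABETES_COST_2012 - DIABETES_COST_2007) / DIABETES_COST_2007
print(f"diabetes cost growth 2007 to 2012: {growth:.1%}")

breakeven = {
    name: breakeven_share(total, YEARS, DIABETES_COST_2012)
    for name, total in CUT_SCENARIOS.items()
}
for name, share in breakeven.items():
    print(f"{name} scenario: savings are wiped out if diabetes costs "
          f"rise by {share:.2%}")
```

Under those assumptions, even the larger cut is cancelled by a rise in diabetes costs of under 2%. A real analysis would have to estimate how much of any rise is attributable to the cuts and who ends up paying for it.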
Please tell me if you know of an analysis in this direction.