Archive

Author Archive

Aunt Pythia’s advice – sex edition

I’m afraid the concept of “giving advice” has been taken down a notch this week, considering how many ridiculous examples we have right now of people giving advice as a way of congratulating themselves. It’s enough to confuse an advice columnist and put her into an existential angst spiral.

However, it’s not going to stop Aunt Pythia!

At most it will divert her to talk exclusively about something that nobody doesn’t love reading, namely sex. It’s a tried and true last resort of the advice columnist: air out the dirty laundry of yourself and everybody who dares bare themselves to you. I don’t see where this could go wrong.

Having said that, I’m not promising to be exclusive like this every week. I’ll probably cheat on you people every now and then and answer questions about how to get a job in data science or something. Also, my guest advice columnist next week, Aunt Orthoptera, will answer whatever questions she chooses (from a grasshopper’s perspective, of course).

By the way, if you don’t know what you’re in for, go here for past advice columns and here for an explanation of the name Pythia. Most importantly,

Please submit your smutty sex questions at the bottom of this column!

——

Dear Aunt Pythia,

How can I make compatible my sexual attraction for dominant women and my fear of being controlled?

Horny in Montana

Dear Horny,

Let me start out by admitting honestly that I have no direct advice for you. I just don’t know how to resolve issues surrounding sexuality, and I’d be deeply skeptical of anybody who claims to be able to do so.

Sexuality is a crazy thing, a super entrenched and powerful force, and there’s just nothing and nobody who can change it for you once it’s on a roll. Sometimes people seem to be able to change it for themselves, mainly by repressing it, but that’s always so amazing, not to mention deeply threatening, I wouldn’t proffer it as advice.

I sometimes think of my own sexuality as having a personality, and an agenda, that I can only observe, not control. The best case scenario for me has evolved into trying not to be too judgmental of it and to make sure nothing unsafe happens. I’m like a benign referee of my own dirty urges.

Having said that, I have two pieces of indirect advice for you. First, it would probably be useful to separate sex play from “normal life” and realize that you can ask someone to dominate you in the bedroom, and even pretend to control you, and even actually control you, whilst remaining nothing like that outside the bedroom. That’s totally normal and common and it might help in the sense that you’d actually have control over being controlled: it would happen if and when you wanted it.

The second piece of advice I have is totally selfish, namely, please don’t blame the women of the world for your unresolved problems. Just because you’re both attracted to and afraid of these dominant women doesn’t mean they have a responsibility to deal with your confusion and frustration. Don’t take it out on them.

I hope that helps,

Aunt Pythia

——

Dear Aunt Pythia,

What would you say to a woman who told you that she is not able to make a commitment to anyone because she regularly finds herself in search of romance (not originating from sexual desires) with other people? Do you think this is a common behavior?

Itchy Litchi

Dear Itchy,

There are three stages of understanding in this story, at least for me.

First, you know yourself (I’ll refer to “you” even though you might have been asking on behalf of someone else) pretty well if you avoid commitment based on a theoretical understanding of your roaming eye. Most people I know throw themselves into commitment in spite of really good evidence that they won’t be able to sustain it, due to their cognitive biases.

Second, you claim your romantic urges for other people are not sexual. Theoretically this may be true, but in my experience romantic urges are always sexual if you probe deep enough or if they get strong enough. So either I’m a sex maniac (possible) or else you’re in denial about those nonsexual romantic urges.

Third, let’s put the above two together: A) you know yourself deeply, and B) you’re in total denial. The second conclusion makes me rethink the first, honestly, and I come to the conclusion that the first conclusion was wrong. You aren’t avoiding commitment because you know yourself so well, but rather for some other reason. Maybe you’re afraid of commitment? Maybe you’re afraid of sexual urges, which is why you both avoid commitment and avoid admitting your romantic urges are sexual?

Finally, if this question was actually written by, say, a man who wanted to understand the reasoning a woman gave him for why she couldn’t commit to him: she just wasn’t that into you. And yes that’s a very common behavior.

I hope that helps!

Auntie P

——

Dear Aunt Pythia,

I just studied the “Authentic Women’s Penis Size Preference Chart” (I say “studied” because I need to convert everything to metric units to make any sense of it) and, while – unlike many men, I am told – I am not too concerned about length, I feel that the ideal circumference IS REALLY BIG, at least for a man’s penis. Is this for real? Are women looking all their lives for that elusive ideal-sized penis or am I just unlucky?

Concerned Reader

Dear Concerned,

Once again here’s the chart for the readers who missed it last time:

[Image: penis size preference chart]

To answer your primary question, it’s not the length, it’s the girth. A truer statement has never been said. Of course, there are exceptions to that rule, namely if the length is truly minuscule.

Now, I do have some comforting words for you, you’ll be happy to know. Namely, my guess is that the women responding to this very scientific poll had a biased measurement error: they probably didn’t have an erect penis and a flexible measuring tape by their side whilst answering the poll (apologies to the women who did!).

So what they did is they eyeballed the “circumference” measurement by imagining holding a penis in their hand like an OK sign:

[Image: OK sign]

And then, since it’s hard to measure a circle, they straightened out their fingers. The reason this is so biased is that your fingers and thumb are actually quite a bit longer once you’ve stopped making the OK sign.

There may be a measurement bias of up to 50% on this. Probably not, but I’m trying to make you feel better.

I hope that helps!

Aunt Pythia

——

Please please please submit questions! Especially if they are grasshopper-related!

Categories: Aunt Pythia

Data audits and data strategies

There are lots of start-up companies out there that want to have a data team, because they heard somewhere that they should leverage big data, but they don’t know what it really means, what they can expect from such a team, or how to get started. They also don’t really know how to hire qualified people, or what qualifications to look for.

Finally, they often don’t know what kinds of questions are answerable through data, nor what data they should be collecting to answer those questions. So even if they did manage to hire a data scientist or a data team, those guys might be literally sitting on their hands for six months until they have enough data to start work.

It’s a common situation and could end up a big waste of time and money. What these companies need is something I like to call a “data audit” followed by a “data strategy”.

Data Audit

First things first. Do you actually need a data team? Is your company a data science company or is it a traditional-style company that happens to collect data? It would be a waste of resources to form a data team you don’t need. There’s no reason every single company needs to consider itself part of the big data revolution just to be cool.

Here’s how you tell. Let’s say that, as of now, you’re using incoming data to monitor and report on what’s happening with the business and to keep tabs on various indicators to make sure things aren’t going to hell. Absolutely every company should do this, but it honestly could be set up by a good data analyst working closely with the end-users, i.e. the business peeps.

What are the high-level goals of using data in the business? In particular, is there a way that, if you could really know how customers or clients were interacting with your product, you would change the product to respond to the data? Because that feedback loop is the hallmark of a true data science engine (versus data analytics).

Here are some extreme examples to give you an idea of what I’m talking about. If you make shoes, then you need data to see how sales are and which shoes are getting sold faster so you can kick up production in certain areas. You need to see how sales are seasonal so you know to stop making quite so many shoes at a certain point in the deep of winter. But that’s about it, and you should be able to make do with data analysis.

If, on the other hand, you are building a recommendation engine, say for music, then you need to constantly refresh and improve your recommendation model. Your model is your product, and you need a data team.

Not all examples are this easy. Sometimes you can use new kinds of data models to improve your product even if it seems somewhat traditional, depending on how much data you are able to collect about how your clients use your product. It all depends on what kinds of questions you are asking and what data you have access to. Of course, you might want to go out and collect data that you hadn’t bothered to collect before, which could bring you from the first category to the second.

Say you decide you really are a data science company, or want to be one. What’s next?

Pose a bunch of questions you think you’ll need to answer, and list the data you think would be useful in answering them.

The heart of a data audit is a (preliminary) plan for choosing, collecting, and storing data, as well as figuring out the initial shape of the data pipeline and infrastructure. Do you store data in the cloud? Is it unstructured or do you set up some overnight jobs to put stuff into some type of database? Do you aggregate data and throw some stuff away, or do you keep absolutely everything?

The most important issue above is whether you’re collecting enough data. Truth be told, you could probably throw it all into an unstructured pile on S3 for now and figure out pipelines later. It might not be the best way to do it but if you are short on time and attention, it’s possible, and storage is cheap. But make sure you’re collecting the right stuff!

You’d be surprised how many startups want to ask good questions about their customers to improve their product, and have gone to some trouble to figure out what those questions are, but don’t bother to collect the relevant information. They might do things like count the number of users, or collect a timestamp for whenever a user logs in, but they don’t actually keep track of the interaction. It’s essential that you collect pertinent information if you want to use this data to check things are working or to predict people’s desires or needs.

So if you think customers might be all ditching your site at critical moments, then definitely tag their departure as well as their arrival, and keep track of where they were and what they were doing when they bailed.

Note I’m not necessarily being creepy here. You definitely want to know how people interact with your product and your site, and it doesn’t need to be personal information you’re collecting about your users. It could be kept in aggregate. You could find out that 45% of people leave your site when you ask them for their phone number, and then you might decide it’s not worth it to do that.
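To make the aggregate idea concrete, here’s a minimal sketch of that kind of funnel measurement in Python. Everything in it — the step names, the event counts, the `dropoff_rates` helper — is hypothetical, invented for illustration rather than taken from any real product:

```python
from collections import Counter

def dropoff_rates(events):
    """Compute the fraction of users who bail at each funnel step.

    events: a list of (step, action) pairs, where action is either
    "arrived" or "left". No per-user information is stored: we only
    keep aggregate counts per step.
    """
    arrived = Counter()
    left = Counter()
    for step, action in events:
        if action == "arrived":
            arrived[step] += 1
        elif action == "left":
            left[step] += 1
    return {step: left[step] / arrived[step] for step in arrived}

# Made-up numbers: 100 users reach signup, 80 reach the phone-number
# prompt, and 36 of those 80 leave there (a 45% drop-off).
events = (
    [("signup", "arrived")] * 100
    + [("phone_number", "arrived")] * 80
    + [("phone_number", "left")] * 36
)
print(dropoff_rates(events))  # phone_number drop-off = 0.45
```

The point of the sketch is just that tagging departures as well as arrivals is enough to answer the “should we ask for a phone number?” question, without tracking individuals.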

Speaking of creepy, another critical thing to consider during your data audit is privacy controls and encryption methods. Are you saving data legally? Are you protecting it legally? Are you informing your users appropriately about how and what data will be stored? Are you planning to remain consistent with your stated privacy policy? Do you respect people’s “Do Not Track” option?

At the end of a data audit, you might still have only a vague idea of what exactly you can do with your data, but you should have a bunch of possible ideas, as well as guesses at what kind of attributes would contribute to the kind of behavior you’re considering tracking.

Then, after you start collecting high-quality data and figuring out the basic questions you care about, you will probably have to wait a few weeks or months to start training and implementing your models. This is a good time to make sure your data infrastructure is in place and doesn’t have major bugs.

Data Strategy

Ok, now you’ve collected lots of data and you also have a bunch of questions you think may be answerable. It’s time to prioritize your questions and form a plan. For each question on your list, you’ll need to think about the following issues:

  • Is it a monitor or an algorithm?
  • Is it short-term, one-time analysis or should you set it up as a dashboard?
  • How much data will you need to train the model?
  • What is your expectation of the signal in the data you’re collecting?
  • How useful will the results of the model be considering the range of signal and the quality of the answer?
  • Do you need to go find proxy data? Should you start now?
  • Which algorithms should you consider?
  • What’s your evaluation method?
  • Is it scalable?
  • Can you do a baby version first or does it only make sense to go deep?
  • Can you do a simpler version of it that’s much cheaper to build?
  • How long will it probably take to train?
  • How fast can it update?
  • Will it be a pain to integrate it into the realtime system?
  • What are the costs if it doesn’t work?
  • What are the costs of not trying it? What else could you be doing with that time?
  • How is the feedback loop expected to work?
  • What is the impact of this model on the users?
  • What is the impact of this model on the world at large? This is especially important if you’re creepy. Don’t be creepy.

Also, you need a team to build your models. How do you hire? Who do you hire? Some of these answers depend on your above plan. If there’s a lot of realtime updating for your models you’ll need more data engineers and fewer pure modelers. If you need excellent-looking results from your work you’ll need more data viz nerds.

You should consider hiring a consultant just to interview for you. It’s really hard to interview for data scientists if nobody is an expert in data science, and you might end up with someone who knows how to sound smart but can’t build anything. Or you could end up with someone who can build anything but has no idea what their choices really mean.

The ultimate goal at the end of a data audit and strategy is to end up with a reasonable expectation of what having a data science team will accomplish, how long it will take, how deep an investment it is, and how to do it.

Categories: data science, modeling

“The problem here is not the message. The problem is the messenger.”

Today’s post is basically going to consist of me wishing I’d written this Gawker piece which was actually written by Hamilton Nolan and was entitled “It Would Be Great if Millionaires Would Not Lecture Us on ‘Living With Less’”.

To enjoy it as much as I did, you’d have to read this New York Times Opinion piece first, in which Graham Hill, who made a bajillion dollars in the dot com era, realizes he had too much stuff and now has less stuff and is telling us how great it is. Most cloying line: “the things I consumed ended up consuming me.”

At the risk of quoting Nolan’s entire article (the title of my post is his), let me start you with this:

There is something about achieving great financial success that seduces people into believing that they are life coaches. This problem seems particularly endemic to the tech millionaire set. You are not simply Some Fucking Guy Who Sold Your Internet Company For a Lot of Money; you are a lifestyle guru, with many important and penetrating insights about How to Live that must be shared with the common people.

We would humbly request that this stop.

I’ll skip over some parts and get to where he talks about Amanda Palmer:

The problem here is not the message. The problem is the messenger. More specifically, it is the messenger using his own life as supporting evidence for the message. Were Graham Hill to simply write a fact-based essay arguing that Americans should cut down on material possessions in order to save the environment and gain peace of mind, he would doubtless hear a chorus of support. But for Graham Hill, a young millionaire who was fortunate enough to sell his “pre-Netscape browser” at the high point of the internet bubble, to say to the average American, “My journey through the perils of great wealth has bestowed me with wisdom that is directly applicable to you” is simply false. It is no wonder that Hill loved the recent TED talk by millionaire musician Amanda Palmer, in which she argued that it was perfectly fair for her to, for example, accept a free night of lodging in the home of poor Honduran immigrants and not pay them for it, because the beauty of her music is payment enough. Both are insulated enough from the realities of personal finance to forget about them entirely.

True! And I’d add more in the Amanda Palmer case. She and I went to the same high school and I have known her since she was in 7th grade.

I’ll tell you what. She’s not your average artist. She’s hugely exhibitionist. This has worked great for her, but is not a typical artistic personality. In fact she’s essentially a cult leader. So yes, when you’re an artist/cult leader, it makes sense to “let your fans pay you”. But if you’re a typical starving, introverted, sensitive soul, then not so much. How can she speak for all artists and ask them to do stuff just like her? Or rather, why does she think it would scale?

Mind you, I’m guilty of this problem too. When I give advice, which I do all the time, I pretty much always tell people what works for me. But my evidence that the same approach would work for them is slight.

That raises the question, how do we do better than this? How do we tailor our advice to make it useful?

Categories: musing

I kind of hate TED talks

The good

There are good things about TED talks. It’s nice to have a thoughtful articulate person saying something a little bit new and a little bit different. OK I’m done.

The annoying

Then there are annoying things about TED talks. People are so ridiculously polished. No idea is that perfect! Rumor has it that, after the speakers get professionally trained for their TED performances, the producers remove all the “umms” and awkward silences to make it even more perfect. Yuck.

Here’s one way to think about it: TED talks aren’t as good as blogs because they’re not interactive – the audience is expected to receive and not talk back. That’s why I prefer to blog in my underwear and bathrobe, imagining my friends on their living room sofas, also wearing pajamas, and objecting to my stupidity. And that’s why I like the feedback and the comments. It makes my ideas better.

At the same time, TED talks are not as deep as books, where you have enough time and space to actually think through an argument. How could you really develop a deep thought in 20 minutes? You just can’t.

Instead, you have a manipulation of the past which often results in simulated emotional responses, much like how the soundtrack to Amy Tan’s “The Joy Luck Club” makes me cry every time I hear it, no matter what emotional state I’m actually in.

The essence of what’s annoying about TED talks is perfectly parodied by Onion Talks, especially this one:

The evil

But what I really hate about TED talks is the curating of ideas that it represents. I realize that any gatekeeper will do this, but I’m particularly concerned about the TED byline, “Ideas Worth Spreading”. According to whom?

Who gets invited to those things? Whose ideas are interesting but non-threatening enough for the TED audience?

And how often do other, rawer ideas get ignored? How appealing do I have to make my idea to rich people in order to be an insider in this mini self-congratulatory universe?

Here’s an example of what I’m talking about written by a woman who was uninvited to give a TED talk under suspicious circumstances (with a follow-up here). Granted, it’s a TEDx situation, but it’s the same problem. The paragraph I worry about most:

Looking back, I must admit that upon learning of this invitation some of my colleagues and I questioned TEDx Manhattan’s commitment to serving as a platform for looking at our food system from a non-privileged perspective. Changing the Way We Eat is not a venue for the common person. The website makes no mention of available scholarships to enable low-income people or students to attend the pricey one day conference.  Not only must attendees pay $135 for the privilege of sitting and listening, they also have to apply, explaining why they deserve to be part of the audience and then hope to be selected! Unless the Glynwood Institute does real serious targeted outreach to communities of color (which I haven’t seen and was the primary purpose of my screening party), their set up is going to result in the exclusion of low-income and people of color, regardless of whether it is intentional.  I received feedback from a past attendee that presenters referenced poor people and people of color only as being the recipients of charity or service. I think Changing the Way We Eat needed to hear my voice in order to change the way the mainstream food movement thinks about poverty, food access, hunger, and food system change.

Categories: rant

Black Scholes and the normal distribution

There have been lots of comments and confusion, especially in this post, over what people in finance do or do not assume about how the markets work. I wanted to dispel some myths (at the risk of creating more).

First, there’s a big difference between quantitative trading and quantitative risk. And there may be a bunch of other categories that also exist, but I’ve only worked in those two arenas.

Markets are not efficient

In quantitative trading, nobody really thinks that “markets are efficient.” That’s kind of ridiculous, since then what would be the point of trying to make money through trading? We essentially make money because they aren’t. But of course that’s not to say they are entirely inefficient. Some approaches to removing inefficiency, and some markets, are easier than others. There can be entire markets that are so old and well-combed-over that the inefficiencies (that people have thought of) have been more or less removed and so, to make money, you have to be more thoughtful. A better way to say this is that the inefficiencies that are left are smaller than the transaction costs that would be required to remove them.

It’s not clear where “removing inefficiency” ends and where a different kind of trading begins, by the way. In some sense all algorithmic trades that work for any amount of time can be thought of as removing inefficiency, but then it becomes a useless concept.

Also, you can see from the above that traders have a vested interest in introducing new kinds of markets to the system, because new markets have new inefficiencies that can be picked off.

This kind of trading is very specific to a certain kind of time horizon as well. Traders and their algorithms typically want to make money in the average year. If there’s an inefficiency with a time horizon of 30 years it may still exist but few people are patient enough for it (I should add that we also probably don’t have good enough evidence that they’d work, considering how quickly the markets change). Indeed the average quant shop is going in the opposite direction, of high speed trading, for that very reason, to find the time horizon at which there are still obvious inefficiencies.

Black-Scholes

A long long time ago, before Black Monday in 1987, people didn’t know how to price options. Then Black-Scholes came out and traders started using the Black-Scholes (BS) formula and it worked pretty well, until Black Monday came along and people suddenly realized the assumptions in BS were ridiculous. Ever since then people have adjusted the BS formula. Everyone.

There are lots of ways to think about how to adjust the formula, but a very common one is through the volatility smile. This allows us to remove the BS assumption of constant volatility (of the underlying stock) and replace it with whatever inferred volatility is actually traded on in the market for that strike price and that maturity. As this commenter mentioned, the BS formula is still used here as a convenient reference to do this calculation.  If you extend your consideration to any maturity and any strike price (for the same underlying stock or thingy) then you get a volatility surface by the same reasoning.
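For readers who want to see the “convenient reference” role concretely, here’s a minimal sketch in plain Python: the standard BS price of a European call, plus implied volatility computed by running the formula backwards (bisection) against an observed market price. The specific numbers are made up for illustration:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call.

    S: spot price, K: strike, T: years to maturity,
    r: risk-free rate, sigma: volatility of the underlying.
    """
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Run BS backwards: find the sigma that reproduces the market price.

    bs_call is strictly increasing in sigma, so bisection works.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: price an out-of-the-money call at sigma = 0.20,
# then recover that volatility from the price alone.
p = bs_call(100, 110, 1.0, 0.02, 0.20)
print(round(implied_vol(p, 100, 110, 1.0, 0.02), 4))  # 0.2
```

Doing that inversion across many strikes (same maturity, same underlying) is exactly what produces the smile: the market hands you prices, and the formula converts each one into a comparable implied volatility.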

Two things to mention. First, you can think of the volatility smile/surface as adjusting the assumption of constant volatility, but you can also ascribe to it an adjustment of the assumption of a normal distribution of the underlying stock. There’s really no way to extricate those two assumptions, but you can convince yourself of this by a thought experiment: if the volatility stays fixed but the presumed shape of the distribution of the stocks gets fatter-tailed, for example, then option prices (for options that are far from the current price) will change, which will in turn change the implied volatility according to the market (i.e. the smile will deepen). In other words, the smile adjusts for more than one assumption.

The other thing to mention: although we’ve done a relatively good job adjusting to market reality when pricing an option, when we apply our current risk measures like Value-at-Risk (VaR) to options, we still assume a normal distribution of risk factors (one of the risk factors, if we were pricing options, would be the implied volatility). So in other words, we might have a pretty good view of current prices, but it’s not at all clear we know how to make reasonable scenarios of future pricing shifts.

Ultimately, this assumption of normal distributions of risk factors in calculating VaR is actually pretty important in terms of our view of systemic risks. We do it out of computational convenience, by the way. That and because when we use fatter-tailed assumptions, people don’t like the answer.
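As a concrete illustration of where the normality assumption enters, here’s a toy parametric VaR calculation — a sketch with made-up numbers, not a claim about how any particular risk desk does it. The normal quantile is the only distributional input; swapping it for a fatter-tailed quantile is precisely the change that makes the answer bigger (and less popular):

```python
from statistics import NormalDist

def parametric_var(portfolio_value, mu, sigma, alpha=0.99):
    """One-period parametric VaR assuming normally distributed returns.

    mu, sigma: mean and standard deviation of the one-period return.
    alpha: confidence level. The normal assumption lives entirely in
    the inv_cdf call below.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # about -2.33 at the 99% level
    return -(mu + z * sigma) * portfolio_value

# Toy numbers: a $1M portfolio with zero mean daily return and 2% daily vol.
print(round(parametric_var(1_000_000, 0.0, 0.02)))  # 46527
```

That is, under these made-up inputs the model says you lose more than about $46.5k on only 1% of days — a number that is only as trustworthy as the normal tail behind it.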

Categories: finance, modeling, statistics

Team Turnstile: how do NYC neighborhoods recover from extreme weather events?

I wanted to give you the low-down on a data hackathon I participated in this weekend, which was sponsored by the NYU Institute for Public Knowledge on the topic of climate change and social information. We were assigned teams and given a very broad mandate. We had only 24 hours to do the work, so it had to be simple.

Our team consisted of Venky Kannan, Tom Levine, Eric Schles, Aaron Schumacher, Laura Noren, Stephen Fybish, and me.

We decided to think about the effects of super storms on different neighborhoods. In particular, to measure the recovery time of the subway ridership in various neighborhoods using census information. Our project was inspired by this “nofarehikes” map of New York which tries to measure the impact of a fare hike on the different parts of New York. Here’s a copy of our final slides.

Also, it’s not directly related to climate change, but rather rests on the assumption that with climate change comes more frequent extreme weather events, which seems to be an existing myth (please tell me if the evidence is or isn’t there for that myth).

We used three data sets: subway ridership by turnstile, which only exists since May 2010, the census of 2010 (which is kind of out of date but things don’t change that quickly) and daily weather observations from NOAA.

Using the weather map and relying on some formal definitions while making up some others, we came up with a timeline of extreme weather events:

[Image: timeline of extreme weather events]

Then we looked at subway daily ridership to see the effect of the storms or the recovery from the storms:

[Image: daily subway ridership around storm dates]

We broke it down to individual stations. Here’s a closeup around Sandy:

[Image: station-level ridership closeup around Sandy]

Then we used the census tracts to understand wealth in New York:

[Image: wealth by census tract]

And of course we had to know which subway stations were in which census tracts. This isn’t perfect because we didn’t have time to assign “empty” census tracts to some nearby subway station. There are on the order of 2,000 census tracts but only on the order of 800 subway stations. But again, 24 hours isn’t a lot of time, even to build clustering algorithms.
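One simple way to do this kind of matching — not necessarily what we did in our 24 hours — is a nearest-centroid lookup, assigning each census tract to the closest subway station. Here’s a hypothetical sketch with made-up ids and coordinates (real data would use lat/lon and a proper geodesic distance):

```python
from math import hypot

def assign_tracts(tracts, stations):
    """Map each census tract to its nearest subway station.

    tracts, stations: dicts mapping an id to an (x, y) centroid.
    Returns a dict mapping tract_id -> nearest station_id, using
    straight-line distance between centroids.
    """
    return {
        tid: min(
            stations,
            key=lambda sid: hypot(tx - stations[sid][0], ty - stations[sid][1]),
        )
        for tid, (tx, ty) in tracts.items()
    }

# Made-up coordinates for illustration.
tracts = {"tract_a": (0.0, 0.0), "tract_b": (5.0, 5.0)}
stations = {"stn_1": (1.0, 0.0), "stn_2": (4.0, 4.0)}
print(assign_tracts(tracts, stations))  # {'tract_a': 'stn_1', 'tract_b': 'stn_2'}
```

With roughly 2,000 tracts and 800 stations this brute-force version is plenty fast; the harder judgment call is what to do with tracts that have no station anywhere nearby.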

Finally, we attempted to put the data together to measure which neighborhoods have longer-than-expected recovery times after extreme weather events. This is our picture:

[Image: recovery time by neighborhood]

Interestingly, it looks like the neighborhoods of Manhattan are most impacted by severe weather events, which is not in line with our prior [Update: I don’t think we actually computed the impact on a given resident, but rather just the overall change in rate of ridership versus normal. An impact analysis would take into account the relative wealth of the neighborhoods and would probably look very different].

There are tons of caveats, I’ll mention only a few here:

  • We didn’t have time to measure the extent to which the recovery time took longer because the subway stopped versus other reasons people might not use the subway. But our data is good enough to do this.
  • Our data might have been overwhelmingly biased by Sandy. We’d really like to do this with much longer-term data, but the granular subway ridership data has not been available for long. But the good news is we can do this from now on.
  • We didn’t have bus data at the same level, which is a huge part of whether someone can get to work, especially in the outer boroughs. This would have been great and would have given us a clearer picture.
  • When someone can’t get to work, do they take a car service? How much does that cost? We’d love to have gotten our hands on the alternative ways people got to work and how that would impact them.
  • In general we’d have liked to measure the impact relative to their median salary.
  • We would also have loved to have measured the extent to which each neighborhood consisted of salary versus hourly wage earners to further understand how a loss of transportation would translate into an impact on income.

Modeling fraud in the financial system

Today we have a guest post by Dan Tedder. Actually it’s a letter he sent me after listening to my EconTalk podcast with Russ Roberts which he kindly agreed to let me post. Dan’s bio is below the letter.

I think this letter is profound (although I don’t completely agree about the Markov stuff), because it points out something I see as a common blind spot among people who think about regulation and modeling. Namely, that any systemic risk model of the financial system that doesn’t take account of lying isn’t worth the memory it takes up on a computer.

That brings us to the following question: can we incorporate lies into models? Can we anticipate and model fraud itself, in addition to the underlying system? Or do we give up on models and rely on skeptical people to ferret out lies? Or possibly some hybrid?

——

Hi Cathy,

I really liked your interview, and I think you are right on in pointing to a lack of ethics. I would say further that what we need is rigorous honesty in all aspects of the financial system. I agree with your objections to conflicts of interest. Allowing such conflicts to exist demonstrates a lack of rigorous honesty on the part of the participants. In my opinion a lot of bankers and folks on Wall Street should be headed to jail. The inability of the SEC to file charges and prosecute them further demonstrates the lack of honesty and character in the financial system and the government. So why am I telling you things you already know?

My father was a successful businessman. Years ago I was invited to invest in an ice cream franchise by another faculty member. I spent several days developing models using Excel. Finally, I decided to talk to my father. I called him and he immediately asked me to tell him about the present owners and their accounting. I told him the husband was in jail and accounting was five years behind. Further, his wife was probably taking money out of the till.

He stopped me right there, and pointed out that I needed to look no further. The present owners were not honest and therefore the opportunity was too risky. No telling what liabilities they had incurred and passed on to the franchise. I felt like an idiot. My modeling was a total waste of time because it assumed the present owners were honest. In fact, they were dishonest and no defensible model could be constructed based upon their accounting or lack thereof.

I think the complexity of our present financial problems will largely disappear if we try to focus more on the obvious. First, it is obvious that bankers, accountants, modelers, and other participants must be rigorously honest. Second, George Box, a statistician at the University of Wisconsin, studied the stock market and found through time series analysis that stock market prices are Markov processes. So in modeling stock prices we need only worry about today and tomorrow. The best indicator of tomorrow’s price is today’s price. The best indicator of what will happen tomorrow is where we are today, and probably our models of the larger process should also be Markovian. Third, apply the KISS method, “Keep it simple, stupid.”  Instead of worrying about the mathematical model, worry about the honesty of the participants. The financial system cannot tolerate dishonesty. Making sure the bankers are honest will go a long way toward balancing the books.

Regards, Dan

——

Daniel William Tedder is Associate Professor Emeritus, School of Chemical and Biomolecular Engineering, and Adjunct Professor, School of Mechanical Engineering, both at the Georgia Institute of Technology. He attended Kenyon College and received a Bachelor’s in Chemical Engineering at the Georgia Institute of Technology. He obtained MS and PhD degrees in Chemical Engineering at the University of Wisconsin, Madison. He was a staff engineer in the Chemical Technology Division of the Oak Ridge National Laboratory before joining the faculty at Georgia Tech. He served as an independent technical reviewer at the Nuclear Regulatory Commission after retiring from Georgia Tech. He has numerous publications, has edited 11 books, and has authored one book, Preliminary Chemical Process Design and Economics, which is available from Amazon. He is an expert in chemical separations and in actinide partitioning, an advanced method for radioactive waste management.

Categories: finance, guest post

Aunt Pythia’s advice

You’ve stumbled upon yet another week’s worth of worthy questions that will be awkwardly sidestepped by mathbabe’s alter ego Aunt Pythia.

By the way, if you don’t know what you’re in for, go here for past advice columns and here for an explanation of the name Pythia. Most importantly,

Please submit your question at the bottom of this column!

I’ve officially run out of questions so this is for real.

Please come up with something before I do.

——

Dear Aunt Pythia,

I just moved to NYC from a small university town, and I’m finding it much harder to meet nerd girls. Most of the nerd hangout spots that I’ve found are male dominated, and I meet mostly artists at the bars and coffee shops. Do you have any suggestions beyond trolling the nearest physics department?

Nice, Easygoing Roamer Drawn Swiftly Around Real, Engaging Hackers On Town

Dear NERDSAREHOT,

Let me suggest you enroll in Meetup yesterday and sign yourself up for all the nerd meetups you can find. There are plenty of cute nerd girls who go to those, and it’s a perfect situation for you to ask someone to have a beer afterwards. Also consider getting involved in weekend hackathons, which attract lots of nerd girls as well.

By the way, these events are still male dominated, but that’s a good thing. Nerd girls should have their pick. It’s one of the many advantages of being a nerd girl and it ain’t going away.

Aunt Pythia

——

Dear Aunt Pythia,

I recently got a job as a data scientist, and I’m feeling like my stats skills are woefully inadequate. I have a master’s in pure math and I work as a programmer, but I’ve never taken a statistics class. What books would you recommend I read to get up to speed on statistics? I’m looking for something with examples that’s applicable to my work (not too much definition/theorem/proof), but that isn’t scared of the math.

Regretting Spurning Statistics

Dear RSS,

Congratulations! Can you write back and tell everyone how you got the job? Guest post?

Honestly I learned stats (the stuff I know anyway) by reading Wikipedia extensively. It’s surprisingly good. Also, the book I’m writing with Rachel Schutt will contain some good explanations of how stats is used in data science, thanks of course to Rachel, not me. She’s working on the causality chapter right now.

In general my advice to you is, draw lots of pictures, including a histogram as well as a time-value scatter plot of every data set you use, and every data set you generate as well. You’d be surprised by how quickly you learn the statistics that is relevant to your dataset when you’re intimately familiar with its properties.
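
If it helps to make that concrete, here’s a minimal sketch of the “look at your data before anything else” ritual I mean. The variable names and the lognormal distribution are made up for illustration; the point is just to eyeball the shape of the distribution before trusting any summary statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical data with a skewed tail -- the kind of shape
# you only notice by actually looking at the distribution
values = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

counts, edges = np.histogram(values, bins=20)

# quick-and-dirty text histogram: one row per bin
for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"{lo:6.2f}-{hi:6.2f} | {'#' * (c // 10)}")

# the histogram makes the skew obvious; the mean alone would hide it
print(f"mean={values.mean():.2f}, median={np.median(values):.2f}")
```

A real plot (matplotlib’s `hist`, say) is even better, but even a text version like this exposes skew, outliers, and weird gaps that summary statistics will happily hide.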

Good luck!

Aunt Pythia

——

Dear Aunt Pythia,

I have been reading up on regression to the mean originally as described by Galton. He notes that the sons’ height data had reduced variance versus the height data of the preceding fathers’ generation. If this is so, wouldn’t the grandsons’ generation have even more reduced variance in height compared with the 2nd generations’ height…and so on down the generation lineage. Therefore wouldn’t the variance in succeeding generations get narrower and narrower and approach some limit? Where am I going wrong with this, or am I misunderstanding something?

MeanIQ

Dear MeanIQ,

Thanks for bringing my attention to this, it’s clearly an important historical part of linear regression and I’d never heard of it.

You’re absolutely right to think that Galton was wrong. Galton’s working theory was that two people have children by averaging their characteristics, which is just not how genetics works (as we now know). If it were, then not only would your conclusion hold, with everyone converging to the exact same height after a few generations, but running time backwards we’d also have to find people of arbitrary height, both very tall and very short.

As for why he saw larger variance in older generations, my best guess is that he had a selection bias. Maybe the decreasing variance he observed was due to environmental factors such as the quality and size of the local food supply: the “current” generation was localized (and so more consistent), but the “older” generation had come from various other places where they were either better fed or worse fed, which would lead to an increased variance.

There’s another totally different interpretation for the phrase “regression to the mean” which is also confusing though. Namely, the idea that if your first measurement of something is extreme, then your second measurement will tend to be less so. The problem with this is that you have to have a notion of “extreme” in the first place. And if you do, then it’s kind of obvious (and also kind of dumb).
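
You can see that second interpretation in a quick simulation. Here a noisy quantity gets measured twice; the 50/50 split between stable underlying skill and measurement noise is an assumption I chose to make the effect obvious:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_skill = rng.normal(0, 1, n)           # stable underlying trait
first = true_skill + rng.normal(0, 1, n)   # noisy measurement 1
second = true_skill + rng.normal(0, 1, n)  # noisy measurement 2

# select the people whose FIRST measurement was extreme (top 1%)
extreme = first > np.quantile(first, 0.99)

print(first[extreme].mean())   # very high, by construction
print(second[extreme].mean())  # lower on average: regression to the mean
```

Nothing about the people changed between measurements; selecting on an extreme first measurement mostly selects for lucky noise, which doesn’t repeat.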

Aunt Pythia

——

Dear Aunt Pythia,

Is the Mathbabe religious? 

I really like the new mathbabe logo/marque. The typeface is totally flapper and I really like those bulbous upside down B’s, and the offsetting of the bottom text in order to give the text texture. But when I look at the symbolology of the whole logo/marque I can’t help but wonder if the Mathbabe is religious. The T looks like a deproportioned Greek cross, and the alpha above it suggests that there should be an omega below it somewhere. So clearly the new logo/marque has some Christian symbolology, and my eyes keep looking for more. Maybe the A’s are three sided figures that represent the Trinity, and the M represents a firmament that has fallen, and therefore symbolologizes our fallen state.

Anyway, it’s cool if you are religious, as lots of great mathematicians were devout people, and some were even priests, like Bayes. And if you’re not that’s cool too. I see you describe sex both profanely and sacredly, so I know you are a spiritual person. And it’s cool if you don’t want to answer either. I respect that religion is a personal matter. Just saw your new logo/marque and was wondering.

Semi-semiotic

Dear Semi-semiotic,

Honestly I have so little religious background that I am not even sure if you’re kidding (but the “symbolologizes” kind of gives you away).

For the record, my parents were atheists who made fun of me when I told them I believed in God in first grade (I think I learned about the idea of God from a babysitter). One of their favorite stories of my childhood is when my first grade teacher, a devout Catholic, called up my parents in alarm over my essay which said “I believe in God but please don’t tell my parents” and my mom was like, “Har har that’s a good one, thanks” and hung up on her. Not that my mom is a rude person, she isn’t.

Two more points: First, I plan to refer to myself in third person from now on as “The Mathbabe”, and second, when did I ever refer to sex sacredly? That’s bullshit. Blasphemy even.

Aunt Pythia

——

Please please please submit questions, thanks! I’m desperate!

Categories: Aunt Pythia

Unintended Consequences of Journal Ranking

I just read this paper, written by Björn Brembs and Marcus Munafò and entitled “Deep Impact: Unintended consequences of journal rank”. It was recently posted on the Computer Science arXiv (h/t Jordan Ellenberg).

I’ll give you a rundown on what it says, but first I want to applaud the fact that it was written in the first place. We need more studies like this, which examine the feedback loop of modeling at a societal level. Indeed this should be an emerging scientific or statistical field of study in its own right, considering how many models are being set up and deployed on the general public.

Here’s the abstract:

Much has been said about the increasing bureaucracy in science, stifling innovation, hampering the creativity of researchers and incentivizing misconduct, even outright fraud. Many anecdotes have been recounted, observations described and conclusions drawn about the negative impact of impact assessment on scientists and science. However, few of these accounts have drawn their conclusions from data, and those that have typically relied on a few studies. In this review, we present the most recent and pertinent data on the consequences that our current scholarly communication system has had on various measures of scientific quality (such as utility/citations, methodological soundness, expert ratings and retractions). These data confirm previous suspicions: using journal rank as an assessment tool is bad scientific practice. Moreover, the data lead us to argue that any journal rank (not only the currently-favored Impact Factor) would have this negative impact. Therefore, we suggest that abandoning journals altogether, in favor of a library-based scholarly communication system, will ultimately be necessary. This new system will use modern information technology to vastly improve the filter, sort and discovery function of the current journal system.

The key points in the paper are as follows:

  • There’s a growing importance of science and trust in science
  • There’s also a growing rate of retractions (a roughly 20-fold increase from 2000 to 2010), with scientific misconduct cases growing even faster to become the majority of retractions (for an overall rate of 0.02% of published papers)
  • There’s a larger and growing “publication bias” problem – in other words, an increasing unreliability of published findings
  • One problem: initial “strong effects” get published in high-ranking journals, but subsequent “weak results” (which are probably more reasonable) are published in low-ranking journals
  • The formal “Impact Factor” (IF) metric for rank is highly correlated with “journal rank”, defined below.
  • There’s a higher incidence of retraction in high-ranking (measured through “high IF”) journals.
  • “A meta-analysis of genetic association studies provides evidence that the extent to which a study over-estimates the likely true effect size is positively correlated with the IF of the journal in which it is published”
  • Can the higher retraction rate in high-ranking (measured through “high IF”) journals be explained by higher visibility of those journals? They think not. Journal rank is a bad predictor of future citations, for example. [mathbabe inserts her opinion: this part needs more argument.]
  • “…only the most highly selective journals such as Nature and Science come out ahead over unselective preprint repositories such as ArXiv and RePEc”
  • Are there other measures of excellence that would correlate with IF? Methodological soundness? Reproducibility? No: “In fact, the level of reproducibility was so low that no relationship between journal rank and reproducibility could be detected.”
  • More about Impact Factor: The IF is a metric for the number of citations to articles in a journal (the numerator), normalized by the number of articles in that journal (the denominator). Sounds good! But:
  • For a given journal, IF is not calculated but is negotiated – the publisher can (and does) exclude certain articles (but not citations). Even retroactively!
  • The IF is also not reproducible – errors are found and left unexplained.
  • Finally, IF is likely skewed by the fat-tailedness of citations (certain articles get lots, most get few). Wouldn’t a more robust measure be given by the median?
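
To see how much the fat tail matters, here’s a toy simulation of a hypothetical journal whose citation counts are power-law distributed (the Pareto parameter and article count are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
# hypothetical journal: 200 articles, citations roughly power-law distributed
citations = rng.pareto(a=1.5, size=200).astype(int)

impact_factor_style = citations.mean()   # what the IF's citations/articles ratio gives
robust_measure = np.median(citations)
print(impact_factor_style, robust_measure)

# drop the single most-cited article: the mean moves substantially,
# while the median barely budges
trimmed = np.delete(citations, citations.argmax())
print(trimmed.mean(), np.median(trimmed))
```

A handful of blockbuster papers can drag the mean (and hence an IF-style score) way up, while the typical article in the journal is cited far less. A median-based rank would be much harder to inflate that way.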

Conclusion

  1. Journal rank is a weak to moderate predictor of scientific impact
  2. Journal rank is a moderate to strong predictor of both intentional and unintentional scientific unreliability
  3. Journal rank is expensive, delays science and frustrates researchers
  4. Journal rank as established by IF violates even the most basic scientific standards, but predicts subjective judgments of journal quality

Long-term Consequences

  • “IF generates an illusion of exclusivity and prestige based on an assumption that it will predict subsequent impact, which is not supported by empirical data.”
  • “Systemic pressures on the author, rather than increased scrutiny on the part of the reader, inflate the unreliability of much scientific research. Without reform of our publication system, the incentives associated with increased pressure to publish in high-ranking journals will continue to encourage scientists to be less cautious in their conclusions (or worse), in an attempt to market their research to the top journals.”
  • “It is conceivable that, for the last few decades, research institutions world-wide may have been hiring and promoting scientists who excel at marketing their work to top journals, but who are not necessarily equally good at conducting their research. Conversely, these institutions may have purged excellent scientists from their ranks, whose marketing skills did not meet institutional requirements. If this interpretation of the data is correct, we now have a generation of excellent marketers (possibly, but not necessarily also excellent scientists) as the leading figures of the scientific enterprise, constituting another potentially major contributing factor to the rise in retractions. This generation is now in charge of training the next generation of scientists, with all the foreseeable consequences for the reliability of scientific publications in the future.”

The authors suggest that we need a new kind of publishing platform. I wonder what they’d think of the Episciences Project.

Poseurs should not own the backlash against data science poseurs

I’ve noticed a recent trend in coverage of data science. Namely, there’s backlash against the hype and the over-promising, intentional or not, of data science and data scientists. People are beginning to develop smell tests for big data and raise incredulous eyebrows at certain claims.

This is a good thing. We data scientists should welcome the backlash, first because it’s inevitable, and second because it allows us to have a much-needed conversation about how to behave and what is reasonable to claim or even hope for with respect to big data. There is a poseur problem in big data, after all.

But, fellow data nerds, let’s take this as a cue to start an internal discussion about data science skepticism. Let’s make sure that it’s coming from our community, or at least the surrounding technical community, rather than from yet another set of poseurs who don’t actually know what data is and would only serve to lampoon and discredit our emerging field rather than improve it. We should be the ones leading the charge and admitting when we’re full of shit. We need to own the backlash.

Let me give you an example. A serious data scientist friend of mine recently got asked to be interviewed as part of a conversation on data science skepticism. After thinking hard about what her contribution could be, she wrote back to accept the offer, but was then told she was “off the hook” because they’d found someone else who was “perfect for the assignment.” It turned out to be a journalist who had previously interviewed her. That was his credential for this conversation.

But how can you actually have informed skepticism if you are not yourself an expert?

Another example. David Brooks recently wrote a column wherein he declared himself a data science skeptic and then followed that up by referring to no fewer than eight random statistical studies that made no coherent sense and had no overall point. My conclusion: this is the wrong man to lead the charge against poseurs in data science.

If we are going to rebel against big data soundbites, let’s not do it in soundbites. Instead, let’s talk to people on the inside, who see specific problems in the field and are willing to talk openly about them.

I liked the recent Strata talk by Kate Crawford entitled “Untangling Algorithmic Illusions from Reality in Big Data” (h/t Alan Fekete) which discusses bias in data using very concrete examples, and asks us to examine the objectivity of our “facts”.

For example, she talked about a smartphone app that finds potholes in Boston and reports them to the City, and how on the one hand it was cool, but on the other it would mean that, if naively applied, richer neighborhoods like Lincoln would get better services than Roxbury. She explained an important point: most people know that data analysis is not objective, but often the data itself is not objective either. It was collected in a certain way, with particular selection biases.
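
A toy simulation makes the collection-bias point concrete. Suppose two hypothetical neighborhoods have exactly the same number of potholes but different smartphone-app adoption; all the numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# two hypothetical neighborhoods with the SAME true number of potholes
true_potholes = {"wealthy": 500, "poor": 500}
# but different smartphone-app adoption, hence different report rates
report_rate = {"wealthy": 0.6, "poor": 0.15}

# each pothole gets reported independently with its neighborhood's rate
reported = {
    hood: int(rng.binomial(true_potholes[hood], report_rate[hood]))
    for hood in true_potholes
}
print(reported)
```

Naively reading the reports as pothole counts, the city would conclude the wealthy neighborhood needs roughly four times the repair crews, even though the streets are identical. The bias is in the data, not the analysis.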

We need more conversations like this or else we will be leaving a hole which will be filled with loud, uninformed skeptics who would be right to raise the alarm.

One last thing. I’m aware that tons of people, especially serious academic statisticians and computer scientists, criticize data scientists for a totally different reason, namely that we are overly self-promoting (although academics have their own status plays).

But I don’t apologize for that. The truth is, a data scientist is a hybrid between a business person and a researcher. And this is a good thing, not a bad thing: it means the world gets direct access to the modeler, and can challenge any hyperbolic claims by asking for details, rather than having to go through a marketing person who acts (usually quite poorly) as a nerd interpreter. I for one would rather represent my work directly to the world (and be called a self-promoter) than be kept in the back room.


Categories: data science, rant

WTF happened to feminism?!

I usually don’t talk about feminism per se, because honestly I usually don’t think about it. Thanks to role models like my mom, who was an MIT co-ed in the ’60s and an original nerd, helping develop the internet at Bolt Beranek and Newman and teaching computer science at UMass Boston, I’ve never for one second doubted my personal right to be a thoughtful, argumentative, and ambitious woman. I learned from my mom, and from other trailblazers, that I can pursue my personal interests and trust that the world will welcome my contributions.

Two events in the past week have made me think about how confusing this message has become for today’s growing girls, however.

First, the Sheryl Sandberg thing. To be honest, I haven’t read the book. But I have read this Washington Post article describing the book, and here’s my take on it: a corporate branding campaign loosely tied to women, but mostly pushing forward the agenda of how to be a company drone. From the article:

Sandberg’s understanding of leadership so perfectly internalizes the power structures of institutions created and dominated by men that it cannot conceive of women’s leadership outside of those narrow spaces. Does this also explain why, for Sandberg, the biggest threat to our ability to occupy a position of leadership is a woman’s desire to have a child? This is what men have been telling us for years.

Sandberg may miss so many women in her movement simply because her brand of gender equity is almost entirely privatized, doled out from employer to employee. Women, she advises, will find their way to the top through telling employers upfront about their childbearing plans, through learning how to negotiate pay raises (say “we” instead of “I,” Sandberg cautions, though the collective here is the corporation), through comportment exercises, as taught through Lean In’s web videos.

Like I said in this post, wouldn’t an actual feminist agenda include saying “The hell with this!” to a corporation that is so stifling that all our imaginations could bring us is better maternity leave negotiation tactics with the Borg? Resistance is futile, man!

Here’s the second thing that pissed me off this week. Harvard MBA Rachel Greenwald tells women what makes men not call back after a date.

Answer? As it turns out, anything where you have an opinion and they feel intimidated by you. Solution? Dumb it down, sex it up, and act like a toy. That way, in her words, you’ll be empowered, because they’re all calling you back, and the choice is yours. The choice, I’d add, from a long list of wimps. No thank you.

The video:

Can we do better than this, people??

Categories: rant

A blogging parliament

Last night I found myself watching Steve Waldman’s talk at the 2011 Economic Bloggers Forum at the Kaufman Foundation. I’m a big fan of Waldman’s blog Interfluidity. His talk was interesting and thought-provoking, like his writing. I suggest you watch it.

After expressing outrage at the failure of control systems and the political system after the financial crisis, Waldman asks the question, why are we where we are? His answer: there’s a monopoly of power in this country even as information itself is increasingly available. The monopoly of power is extremely correlated, of course, to the rising wealth inequality, beautifully visualized in this recent video (h/t Leon Kautsky, Debika Shome) by Politizane.

The solution, he hopes, may include the blogosphere (although it’s not a perfect place either, with its own revolving doors, weird incentives and possibly conflicts of interest). The work of bloggers is valuable social capital, Steve argues, so how do we deploy it?

Steve introduced the concept of policy entrepreneurs, which have three characteristics:

  1. They are sources of information in the form of policy ideas. They possibly even write laws.
  2. They have some kind of certification in order to cover the policy maker’s ass.
  3. They exert some kind of influence on policy makers, to create incentives for their policy goals.

In other words, a policy entrepreneur is someone in the business of shaping policy makers’ agendas.

If you stop there, you might think “lobbyist,” and you’d be right. But the problem with our current lobbyist system is not the above three characteristics, but rather that it’s such a closed system. In other words, you essentially need to be rich to be an influential lobbyist (or at least, as an influential lobbyist, you are backed by enormous wealth), but then that increases the monopolistic nature of political power. It doesn’t solve our “monopoly of power” problem.

The question becomes, is there a way for normal people, or groups of people, to be policy entrepreneurs?

One possible solution, Waldman suggests, is to form a parliament of bloggers. Since groups are taken seriously, can bloggers form official groups in which they gain consensus around a topic and issue policy proposals?

An intriguing idea, and I like it because it’s not really abstract: if bloggers decided to try this, they could literally just form a group, give themselves a name, and start issuing policy proposals. Of course they’d probably not get anywhere unless they had influence or leverage.

Does something like this already exist? The closest thing I can think of is the hacker group Anonymous. I can’t say whether they’re bloggers; they’re anonymous. But I’m going to guess they are active on the web even if they don’t specifically blog. In any case, let’s see if they qualify as policy entrepreneurs in the above sense.

  • They don’t issue specific policy proposals, but they certainly object clearly to policies they don’t like.
  • Their credentials lie in their unparalleled ability to take control of information systems.
  • Likewise, their leverage is fierce in this domain.

In all, I don’t think Anonymous fits the bill – they’re too devoted to anarchy to deliver policy in the sense that Waldman suggests, and their tools are too crude to make fine points. This might have to do with the nature of hackers in general (keeping in mind that Anonymous stands for something far more extreme than the average hacker), which I read about in an essay by Paul Graham yesterday (h/t Chris Wiggins):

Those in authority tend to be annoyed by hackers’ general attitude of disobedience. But that disobedience is a byproduct of the qualities that make them good programmers. They may laugh at the CEO when he talks in generic corporate newspeech, but they also laugh at someone who tells them a certain problem can’t be solved. Suppress one, and you suppress the other.


Here’s another problem: aren’t bloggers in general kind of their own 1%? Would policy via a “parliament of bloggers” really be enough of an improvement over the current system of insiders?

What about if Occupy got into the idea of being a vehicle of policy entrepreneurship? Even though we tend not to support specific political candidates in Occupy, we do consistently think about policy and decide whether to endorse a given bill or policy proposal. Could we, instead of commenting on existing policy, start thinking about proposing new policy, even to the point of writing new laws?

On the one hand such work requires enormously long discussions and difficult-to-obtain consensus, but on the other hand we have the knowledge, the abilities, and the moral persuasion. Do we have the influence? And would Occupiers think exerting influence on policy in the current corrupt system tantamount to selling out?

Categories: #OWS, musing

HSBC Action today at noon

Here’s what I’m doing today at lunch time.

——

FOR IMMEDIATE RELEASE Monday, March 4, 2013

Occupy Wall Street Pickets At HSBC in New York

The action

HSBC To Issue Annual Earnings Report on Monday, March 4, 2013. These are the same unindicted criminals that admitted to money-laundering for drug cartels.

We Demand Justice for Executive Criminals & An End to “Too Big to Jail”!

Picket at HSBC New York Headquarters – Noon to 1:30pm

Gather on the steps of the New York Public Library Main Branch, 41st St. & Fifth Ave. See more at #OWSaltbanking and at our Facebook page for the event.

NEW YORK CITY – The “Occupy Wall Street – Alternative Banking” working group today continues its campaign to call on local, state and federal criminal and financial authorities to pursue prosecutions of executives at HSBC responsible for the bank’s admitted record of laundering money for drug cartels and alleged terrorists.

On Monday, March 4, as HSBC announces its annual earnings for 2012, OWS Alt Banking will rally at noon on the steps of the New York Public Library, Main Branch, across the street from HSBC’s New York headquarters.

The Story In Brief

In a recent settlement with the US Department of Justice, HSBC admitted to laundering billions of dollars for Mexican and Colombian drug cartels over many years. HSBC also admitted violating US sanctions regimes on Iran and on entities that the US government designates as “terrorist.”

Under the deal with DOJ, HSBC was forced to pay a $1.9 billion institutional penalty, which represents only six weeks’ worth of HSBC’s 2011 profits. The Justice Department agreed not to prosecute bank officials and other persons responsible for the admitted severe criminal conduct. All executive salaries and bonuses will be paid in full – some on deferred schedules. US financial authorities also declined to pull the criminal bank’s license to operate in the US.

Reportedly, authorities feared that jailing any of the megabank’s executives or shutting down its operations would cause its collapse and set off other bank collapses. This highlights the continuing systemic danger of “Too Big To Fail,” which also means “Too Big To Jail.” Thus OWS Alternative Banking is also calling on regulatory and legislative authorities finally to break up the big banks that dominate financial markets and can act with such impunity thanks to their sheer size.

Additional Context

OWS Alternative Banking Group points out that failing to prosecute the HSBC executives responsible for money laundering to the full extent of the law diminishes the law and sends the wrong message. It creates an incentive for other banks to engage in the same criminal conduct.

Banks are being told that if they are big enough, they can commit any institutional crime without fear that personal punishments will follow, and in the confidence that institutional penalties will be minor in comparison to the profits made by breaking the law.

What message does this send to the American people, and to the world? “The war on drugs” and “the war on terror” rage on as centerpieces of US global policy.

In the United States, hundreds of thousands of people, predominantly people of color, have been imprisoned for often minor drug offenses. This has destroyed the futures of many young people and contributed to the biggest prison-industrial complex in the world.

In Mexico and Colombia, the US government supports a “drug war” in which literally thousands of people are murdered, often by the military personnel of those nations acting as death squads. In Pakistan, Yemen and other nations, US military drones bomb targeted persons – often killing their families or neighbors – on suspicion of “terrorism,” without trial or appeal.

In the US, executives at Islamic charities accused of funneling money to organizations designated as terrorist have received multiple life sentences. How is this different from HSBC’s conduct in helping to maintain the finances of drug cartels and alleged terrorists? Money laundering is absolutely essential to the business of the illegal drug trade. Furthermore, money launderers make the fattest profits out of all participants in the illegal drug trade.

Why are the banker-criminals getting a free pass? Why do we allow a two-tier justice system, with harsh punishments for minor drug offenders and rewards and impunity for the biggest offenders of all?

Therefore OWS Alternative Banking is asking for fair prosecution of the HSBC criminals and for ethical practices and staffing to replace the blatant abuse of customer money and good will. HSBC’s license to operate in the United States must be pulled.

The further message is to break up the big banks. They can no longer be considered too big to fail and allowed to commit blatant crimes of fraud and money laundering at US taxpayers’ expense.

Given the United Nations estimate of $400 billion in drug money laundered annually, it is nearly impossible that this enormous volume of dirty cash does not in large part go through the other big Wall Street and City of London banks.

“IT IS A DARK DAY FOR THE RULE OF LAW.” – New York Times, 12/11/2012

“Apparently non-violent demonstration against corrupt banking is subject to more criminal scrutiny than actual corrupt banking.” – Village Voice, 12/26/2012

http://alternativebanking.nycga.net

Categories: #OWS

Nasty reader comments and blogging

I’m pretty sure you guys know this already, but I love my regular readers and commenters. It’s a large part of why I blog – I feel like I’m having a super interesting cocktail party every morning in my underwear. I’m investing in the quality of the rest of my day, stealing a moment before my family wakes up so I can articulate one single idea. The payoff is, most of the time, dependably good conversation that lasts all day, or even more than a day, as your comments and emails come in.

Of course, there are sometimes nasty people and comments in addition to thoughtful ones. Not everyone interprets me as trying to figure stuff out; some think I’m being intentionally asinine or manipulative. Or sometimes they just don’t agree with me, and instead of explaining their reasoning they just yell. Or sometimes they are just jerks, getting out their aggression on a stranger.

My first rule is to allow comments that disagree with me, as long as the reasons are articulated and as long as the comment isn’t abusive. Rude is ok, “you are stupid” is not ok.

My second rule is to have a thick skin. I can completely ignore the sentiment of an abusive commenter calling me names, because first of all I’ve heard it all before and second I’m pretty sure it’s not about me.

I’m not saying it doesn’t bother me at all, because obviously it’s a pain to have to go through my email and make sure people are being civil.

For example, whenever I get onto the top 10 of Hacker News, which has been a few times now, I’ve noticed a huge wave of nasty comments. Of course this could be a direct result of how many people I get (thousands per hour), but I don’t think so – the ratio of interesting to abusive comments coming from Hacker News traffic is tiny. It creates nasty work for me, which I feel compelled to do because letting nasty comments stay on my blog makes me feel violated and intentionally misunderstood.

This morning I found this article via Naked Capitalism regarding reader comments, and how nasty ones make subsequent readers evaluate the message differently, and in particular, more negatively. In other words, my intuition was right – it’s super important to curate comments.

My experience with Hacker News has also given me sympathy for Izabella Laba’s position that she doesn’t accept comments on her blog (read this post for example). She puts herself out there, with strong opinions, and many of her posts are important and thought-provoking. And by the same token people can get pretty threatened by what she has to say. I can well imagine what her experience has been. What if every day was a Hacker News day? What if a majority of comments contained ridiculous and personal attacks? Yuck.

Makes me even more grateful to have you guys.

Categories: musing, news, women in math

Aunt Pythia’s advice

You’ve stumbled upon yet another week’s worth of worthy questions that will be awkwardly sidestepped by mathbabe’s alter ego Aunt Pythia.

By the way, if you don’t know what you’re in for, go here for past advice columns and here for an explanation of the name Pythia. Most importantly,

Please submit your question at the bottom of this column!

——

Aunt Pythia,

I graduated seven years ago and since then I’ve been working in finance. (I was a floor clerk when Bear Stearns took a nosedive and the words “too big to fail” reared their ugly head.) Now I’m finally ready to flee this flaming cesspool. Do you have any advice on how to get out without suffering a major career setback? I have some skills relevant to data science — python, SQL, some tinkering with Hadoop — but I don’t have any formal training in either computer science or in statistics, and I don’t know a soul outside the financial industry. Is there a way out, or am I stuck here forever?

Lonely in Finance

Dear Lonely,

You are not stuck. Quit your job, live off your savings, and start networking in another space. What do you want to do? What turns you on? Take a leap of faith and get yourself moving. Of course it will be a career setback! Because you are going to begin anew! That’s a good thing, not a bad thing.

I’ll bet you don’t even have 3 kids and a mortgage, and yet you still somehow feel like you need to be completely safe. You don’t! You have highly marketable skills, and yes you’ll have to develop even more, but for god’s sake don’t stay in a flaming cesspool just because the money’s good. That is something you know you will regret on your deathbed.

Get the fuck out.

Aunt Pythia

——

Dear Aunt Pythia,

I’m terrible at asking for advice, because whenever I think of a problem I immediately dismiss it as too silly, or too easy, or I convince myself that I know the answer. Sometimes I don’t ask for help because I feel too much like I’m mooching people’s time, or something like that. How can I get better at asking for advice?

Acrimoniously Chancing Ridiculously-Off Name, You Minx

p.s. Too meta? Or not meta enough?

Dear ACRONYM,

You’re right to be worried. Asking a question is a tough business, and it’s all about timing.

Say, for example, you came up to me on a packed subway car at rush hour, when I was reading my kindle (current book: Mostly Harmless Econometrics), say, and you poked me in my back and said, “hey buddy I’ve got a problem and you’re the person that’s gonna solve it!”. In that situation, and I have to be honest here, I’d be somewhat reluctant to consider your problem as one of my own.

Or, imagine you approached me while I was in the women’s lavatory stall at a public bathroom, and, say, wrote down your question on the back of the bathroom tissue and scooted it over to me on the floor, again I can’t promise you I’d appreciate it (unless you were asking me for more toilet paper, then we’d be good).

However, this being an advice column, I’m pretty confident I’ve made a safe place for even silly questions (and questions that are too easy are even better, because they make me feel smart!). That’s what Saturday mornings are for: I love doing this, and you are helping me do this!

Love,

Auntie P

p.s. Just meta enough!

——

Dear Aunt Pythia,

Our culture would have us believe that we are nothing if we are not statistically above average, preferably gifted. Being successful is practically considered a matter of common courtesy. Nonetheless, most of us stubbornly persist in a state of statistical mediocrity, defiantly average. How can we quell our culture’s craving for exceptionalism?

Average Without Being Mean

Dear Average,

I’ve never met someone who thinks they are actually average. I might meet someone who will tell me they’re bad at tests or that they suck at math, but people generally know better than to stop there when self-assessing (hopefully! unless they’re actually really depressed).

After all, a given person has their own personal passions and interests and each develops their own skills and natural talents. While it may be true that someone is born with average potential for a certain thing, it’s more a matter of their passion and time spent practicing that thing than anything about their inherent ability that makes them good or great at something.

In other words, to exist in a state of (supposed) statistical mediocrity is to submit entirely to external measures of generic skills. Who would do that, and why? If I were against standardized testing before, this idea makes me double down.

In terms of our culture, I don’t know how to avoid these kinds of “cravings for exceptionalism” if you want to work as a data scientist in a tech startup, for example, because of the competitive nature of that industry. But there are plenty of jobs where being a thoughtful, hard-working person who isn’t a jerk is welcomed.

Aunt Pythia

——

Dear Aunt Pythia,

I have a PhD in math and have been interested in getting a job as a data scientist. I have been following your blog, following a few classes online, and talking to people from my program about other resources. I have been applying for jobs in California for over a month now, and since I have no established experience with analyzing big data sets, I have not received any requests for interviews. I would appreciate any advice you can offer! 

Searching in California

Dear Searching,

I wish I had a better answer, because it kind of drives me nuts how hard it is for people like you to get a good job. But first I’d examine your reasoning: how do you know it’s because of a lack of experience analyzing big data sets that you haven’t gotten an interview? I’m not saying that’s not relevant, but I’m pretty sure it will be a combination of factors – including connections. Plus, what is your “program”? Are you still a student? Are you in an academic institution? Possible things you might try:

  • finding a class that will give you mad skillz working with big data sets
  • reading “Mostly Harmless Econometrics”
  • networking with other Ph.D.’s you know who already have a job in industry
  • going to data conferences or tech meetups and introducing yourself to a bunch of people
  • finding out about internship possibilities
  • going to data hackathons and working alongside someone who knows the ropes

Good luck!

Aunt Pythia

——

Please please please submit questions, thanks!

 

Categories: Aunt Pythia

Prices in the junk bond market

There are various ways of deciding how valuable something is. People spend some amount of time talking about “the current value of future earnings til the end of time” as a rule-of-thumb measurement. That sometimes works (i.e. jibes with what the selling price is), but it’s certainly not robust – in a given case, plenty of people think there’s a good reason a stock should be worth more than that, if their personal growth projections are rosy (you could argue that they are still valuing future earnings, but they’ve got a different projection than, say, the current dividends continued as is. Another possibility is that they’re simply valuing future values coming from other people). Similarly, some stocks are underpriced with respect to this baseline. Could it be that they’re cooking their books? If they don’t last til the end of time then they could hardly be making earnings til then (Groupon).
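To make that rule of thumb concrete, here’s a sketch of the simplest version of the valuation (the Gordon growth perpetuity formula), with made-up numbers. The point is how violently the answer swings with the growth projection, which is exactly the disagreement described above:

```python
def perpetuity_value(next_dividend, discount_rate, growth_rate):
    """Present value of a dividend stream growing forever
    (Gordon growth model). Only valid when discount_rate > growth_rate."""
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed growth rate")
    return next_dividend / (discount_rate - growth_rate)

# Same $2 dividend, same 8% discount rate, different growth projections:
print(perpetuity_value(2.00, 0.08, 0.02))  # conservative: ~$33.33
print(perpetuity_value(2.00, 0.08, 0.05))  # rosy: ~$66.67
```

Two investors looking at identical current dividends, differing by three points of assumed growth, justify prices that differ by a factor of two.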

Of course when you go down that road, nothing lasts til the end of time. Never mind companies, the industry in which the company sits will be dead before too long unless it’s food or cosmetics.

Anyway, throw out the future earnings price for a moment, and replace it by something else entirely: there’s a certain amount of money invested in the (international) market at a given moment, and it has to go somewhere. I think of it as a big pot that sloshes around and achieves equilibrium depending on various things like relative interest rates in different countries, and to a lesser extent, regulation in different countries and access to markets. Like, the carry trade is kind of a big deal, and depends almost entirely on the Japanese interest rate being tiny.

Of course it’s not really that simple, since people can and do remove money from the market at certain times – it’s not a closed system. But not as much money is removed as you might think, because if you think about it, lots of people have set up their livelihoods to be investing large pots of money, so they need to appear busy.

Articles like this one from Bloomberg make me think the “where should we put our money that we need to invest somewhere?” effect is particularly strong right now. We see people “chasing yield” in the junk bond market, buying junk bonds that have positive yields because their options are limited while the Fed keeps the rates really low (this is not a side-effect of the Fed’s keeping the rates low, it’s their goal. They want people to invest in financing businesses, which is what buying junk bonds is).

But they (the investors) all want the same stuff, so the prices are too high, which is another way of saying the yields are a lot lower than they’d otherwise be if there were other things to buy. This might be a good example of where the price of junk debt is not particularly good at exposing the actual risk of default. Well, it might be an ok indicator of the very short-term default rate, but that’s just because money is so cheap right now that businesses in trouble can just borrow more. It’s kind of a set-up for a bubble.

The article makes the point that once the Fed raises rates, people will flee this market, since they will actually be able to make money again with less risky bonds. The slower actors will be left with much-reduced-in-value junk debt. The big pot of money which is the market will have an entirely new equilibrium point, and there will be lots of death and destruction in the transition. It’s become even more crucial than usual to time the Fed’s moves, but keep in mind money managers are going to stay in there as long as they possibly can because they don’t want to miss yield while their bonuses depend on it (“opportunity costs”). It’s a game of chicken.
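The mechanics behind that flight are just discounting: a bond’s price moves inversely with the yield investors demand. A toy calculation (a made-up bond with annual coupons, not any particular issue) shows how much value the slower actors would be left holding:

```python
def bond_price(face, coupon_rate, ytm, years):
    """Price of an annual-pay bond: discounted coupons plus discounted face value."""
    coupon = face * coupon_rate
    return (sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
            + face / (1 + ytm) ** years)

# A 5-year junk bond with a 7% coupon, at two different required yields:
print(round(bond_price(100, 0.07, 0.07, 5), 2))  # yield equals coupon: par (~100)
print(round(bond_price(100, 0.07, 0.10, 5), 2))  # required yield jumps to 10%: ~88.63
```

A three-point move in required yield knocks more than 11% off the price, and that haircut lands on whoever is still holding the paper when the equilibrium shifts.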

Staying with the meta-analysis, can someone do a back-of-the-envelope estimate of how much built-in interest rate risk we’ve taken on by the issuance of so much junk debt in the overall international portfolio? Is it sizeable?

Categories: finance, musing, news

Is mathematics a vehicle for control fraud?

Bill Black

A couple of nights ago I attended this event at Columbia on the topic of “Rent-Seeking, Instability and Fraud: Challenges for Financial Reform”.

The event was great, albeit depressing – I particularly loved Bill Black’s concept of control fraud, which I’ll talk more about in a moment, as well as Lynn Turner’s polite description of the devastation caused by the financial crisis.

To be honest, our conclusion wasn’t a surprise: there is a lack of political will in Congress or elsewhere to fix the problems, even the low-hanging obvious criminal frauds. There aren’t enough actual police to take on the job of dealing with the number of criminals that currently hide in the system (I believe the statistic was that there are about 1,000,000 people in law enforcement in this country, and 2,500 are devoted to white-collar crime), and the people at the top of the regulatory agencies have been carefully chosen to not actually do anything (or let their underlings do anything).

Even so, it was interesting to hear about this stuff through the eyes of a criminologist who has been around the block (Black was the guy who put away a bunch of fraudulent bankers after the S&L crisis) and knows a thing or two about prosecuting crimes. He talked about the concept of control fraud, and how pervasive control fraud is in the current financial system.

Control Fraud

Control fraud, as I understood him to describe it, is the process by which a seemingly legitimate institution or process is corrupted by a fraudulent institution to maintain the patina of legitimacy.

Once you say it that way, you recognize it everywhere, and you realize how dirty it is, since outsiders to the system can’t tell what’s going on – hey, didn’t you have overseers? Didn’t they say everything was checking out ok? What the hell happened?

So for example, financial firms like Bank of America used control fraud in the heart of the housing bubble via their ridiculous accounting methods. As one of the speakers mentioned, the accounting firm in charge of vetting BofA’s books issued the same exact accounting description for many years in a row (literally copy and paste) even as BofA was accumulating massive quantities of risky mortgage-backed securities (update: I’ve been told it’s called an “Auditors Report” and it has required language. But surely not all the words are required? Otherwise how could it be called a report?). In other words, the accounting firm had been corrupted in order to aid and abet the fraud.

“Financial Innovation”

To get an idea of the repetitive nature and near-inevitability of control fraud, read this essay by Black, which is very much along the lines of his presentation on Tuesday. My favorite passage is this, when he addresses how our regulatory system “forgot about” control fraud during the deregulation boom of the 1990’s:

On January 17, 1996, OTS’ Notice of Proposed Rulemaking proposed to eliminate its rule requiring effective underwriting on the grounds that such rules were peripheral to bank safety.

“The OTS believes that regulations should be reserved for core safety and soundness requirements.  Details on prudent operating practices should be relegated to guidance.

Otherwise, regulated entities can find themselves unable to respond to market innovations because they are trapped in a rigid regulatory framework developed in accordance with conditions prevailing at an earlier time.”

This passage is delusional.  Underwriting is the core function of a mortgage lender.  Not underwriting mortgage loans is not an “innovation” – it is a “marker” of accounting control fraud.  The OTS press release dismissed the agency’s most important and useful rule as an archaic relic of a failed philosophy.

Here’s where I bring mathematics into the mix. My experience in finance, first as a quant at D.E. Shaw, and then as a quantitative risk modeler at Riskmetrics, convinced me that mathematics itself is a vehicle for control fraud, albeit in two totally different ways.

Complexity

In the context of hedge funds and/or hard-core trading algorithms, here’s how it works. New-fangled complex derivatives, starting with credit default swaps and moving on to CDO’s, MBS’s, and CDO+’s, got fronted as “innovation” by a bunch of economists who didn’t really know how markets work but worked at fancy places and claimed to have mathematical models which proved their point. They pushed for deregulation based on the theory that the derivatives represented “a better way to spread risk.”

Then the Ph.D.’s who were clever enough to understand how to actually price these instruments swooped in and made asstons of money. Those are the hedge funds, which I see as kind of amoral scavengers on the financial system.

At the same time, wanting a piece of the action, academics invented associated useless but impressive mathematical theories which culminated in mathematics classes throughout the country that teach “theory of finance”. These classes, which seemed scientific, and the associated economists described above, formed the “legitimacy” of this particular control fraud: it’s math, you wouldn’t understand it. But don’t you trust math? You do? Then allow us to move on with rocking our particular corner of the financial world, thanks.

Risk

I also worked in quantitative risk, which as I see it is a major conduit of mathematical control fraud.

First, we have people putting forward “risk estimates” that have larger error bars than the underlying values. In other words, if we were honest about how much we can actually anticipate price changes in mortgage-backed securities in times of panic, then we’d say something like, “search me! I got nothing.” However, as we know, it’s hard to say “I don’t know” and it’s even harder to accept that answer when there’s money on the line. And I don’t apologize for caring about “times of panic” because, after all, that’s why we care about risk in the first place. It’s easy to predict risk in quiet times; I don’t give anyone credit for that.

Never mind error bars, though – the truth is, I saw worse than ignorance in my time in risk. What I actually saw was a rubberstamping of “third-party risk assessment” reports. I saw the risk industry for what it is, namely a poor beggar at the feet of its macho big-boys-of-finance clients. It wasn’t just my firm either. I’ve recently heard of clients bullying their third-party risk companies into allowing them to replace whatever the risk numbers were with their own. And that’s even assuming they care what the risk reports say.

Conclusion

Overall, I’m thinking this time is a bit different, but only in the details, not in the process. We’ve had control fraud for a long long time, but now we have an added tool in the arsenal in the form of mathematics (and complexity). And I realize it’s not a standard example, because I’m claiming that the institution that perpetrated this particular control fraud wasn’t a specific institution like Bank of America, but rather the entire financial system. So far it’s just an idea I’m playing with; what do you think?

Categories: #OWS, finance, math, musing, rant, statistics

How much are the taxpayers subsidizing too-big-to-fail banks, if not $83 billion per year?

There’s been lots of controversy over the Bloomberg editorial I wrote about a few days ago. The article, which is here, used an IMF study to do a back-of-the-envelope calculation on how much the yearly taxpayer subsidy is for the too-big-to-fail banks.

Since then, there have been lots of people coming out of the woodwork complaining about their interpretation of the paper, about their assumptions, and about the result. I also had someone doing that on my comments, which I appreciate.

Then, more recently, Bloomberg doubled down on their original number, which is exciting stuff in the world of wonky modeling.

Here’s where I am:

  • This question is important – possibly the most important question about the current financial system, as it relates to the average taxpayer. Wouldn’t you want to know how much something you’ve bought costs?
  • And I’m absolutely smitten by the Bloomberg editorial staff for raising the question and coming out with a model and an answer.
  • That doesn’t mean it’s perfect. They were relatively sloppy (but not as sloppy as some people claim).
  • I’m no expert either, but I’m absolutely intrigued by this question and the possible answers.
  • But since I’m a modeler, I know it’s a lot easier to push over a model by complaining about an assumption than it is to come up with a better model that doesn’t make such stupid assumptions.
  • So anyone who complains should also offer an alternative.
  • Because we need to know the answer to this, and since there’s not one answer, we need to have this argument, publicly.
  • And after all what’s the point of modeling if we can’t answer this?

One more thing. Matt Levine at Dealbreaker has come up with his own model, here, but I’m not sure it’s more convincing than Bloomberg’s. In particular his conclusion is that TBTF banks actually subsidize us (not really).

So what is it? Where’s your model?

We need this public discussion and we need thoughtful arguments about the existing models. Let’s do this!

Categories: finance

Ninja Warrior – Sasuke

If you’re having trouble falling asleep sometime, but you’re too tired to actually read things that require anything more than amusement, amazement, and bafflement, then let me suggest you watch a bit of Sasuke, also known as the Japanese version of American Ninja Warrior (hat tip Johan de Jong).

I dare you to watch only five minutes of the following final round (here’s the competition from the beginning if you are hardcore) in which the last surviving American gets expelled almost immediately and it’s down to the last few Japanese competitors. It’s extra fun for it to be in Japanese because then you get to add in dubbing, kind of like Iron Chef used to be back before it became Americanized:

Categories: musing

The overburdened prior

At my new job I’ve been spending my time editing my book with Rachel Schutt (who is joining me at JRL next week! Woohoo!). It’s called Doing Data Science and it’s based on these notes I took when she taught a class on data science at Columbia last semester. Right now I’m working on the alternating least squares chapter, where we learned from Matt Gattis how to build and optimize a recommendation system. A very cool algorithm.

However, to be honest I’ve started to feel very sorry for the one parameter we call \lambda. It’s also sometimes referred to as “the prior”.

Let me tell you, the world is asking too much from this little guy, and moreover most of the big-data world is too indifferent to its plight. Let me explain.

\lambda as belief

First, he’s supposed to reflect an actual prior belief – namely, his size is supposed to reflect a mathematical vision of how big we think the coefficients in our solution should be.

In an ideal world, we would think deeply about this question of size before looking at our training data, and think only about the scale of our data (i.e. the input), the scale of the preferences (i.e. the recommendation system output) and the quality and amount of training data we have, and using all of that, we’d figure out our prior belief on the size or at least the scale of our hoped-for solution.

I’m no statistician, but that’s how I imagine I’d spend my days if I were: thinking through this reasoning carefully, and even writing it down carefully, before I ever start my training. It’s a discipline like any other to carefully state your beliefs beforehand so you know you’re not just saying what the data wants to hear.

\lambda as convergence insurance

But then there’s the next thing we ask of our parameter \lambda, namely we assign him the responsibility to make sure our algorithm converges.

Our algorithm isn’t a closed-form solution; rather, we discover coefficients of two separate matrices U and V, fixing one while we tweak the other, then switching. The algorithm stops when, after a full cycle of fixing and tweaking, none of the coefficients have moved by more than some pre-ordained \epsilon.

That this algorithm will in fact stop is not obvious, and indeed it isn’t always true.

It is (mostly*) true, however, if our little \lambda is large enough. That’s because our above-mentioned imposed belief about size translates into a penalty term, which we minimize along with the actual error term. This little miracle of translation is explained in this post.
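Here’s a minimal sketch of the algorithm in Python – a dense ratings matrix, ignoring missing entries, which a real recommender has to handle. It shows where \lambda enters as both the ridge penalty and the thing keeping each little linear system well-behaved:

```python
import numpy as np

def als(R, k=2, lam=0.1, eps=1e-4, max_iters=500):
    """Alternating least squares for R ~ U V^T with an L2 penalty lam on
    the coefficients. Each half-step is a closed-form ridge regression;
    lam also keeps the k-by-k systems invertible."""
    n, m = R.shape
    rng = np.random.default_rng(0)
    U, V = rng.standard_normal((n, k)), rng.standard_normal((m, k))
    penalty = lam * np.eye(k)
    for _ in range(max_iters):
        U_old, V_old = U.copy(), V.copy()
        # Fix V, solve for U: each row of U is a ridge regression.
        U = R @ V @ np.linalg.inv(V.T @ V + penalty)
        # Fix U, solve for V.
        V = R.T @ U @ np.linalg.inv(U.T @ U + penalty)
        # Stop when no coefficient moved by more than eps.
        if max(np.abs(U - U_old).max(), np.abs(V - V_old).max()) < eps:
            break
    return U, V
```

Usage is just `U, V = als(R)` followed by `R_hat = U @ V.T` for the predicted preferences. Note how cranking up `lam` shrinks every coefficient toward zero, which is exactly the teenage-daughter-locked-in-her-room tradeoff described below.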

And people say that all the time. When you say, “hey what if that algorithm doesn’t converge?” They say, “oh if \lambda is big enough it always does.”

But that’s kind of like worrying about your teenage daughter getting pregnant so you lock her up in her room all the time. You’ve solved the immediate problem by sacrificing an even bigger goal.

Because let’s face it, if the prior \lambda is too big, then we are sacrificing our actual solution for the sake of conveniently small coefficients and convergence. In the asymptotic limit, which I love thinking about, our coefficients all go to zero and we get nothing at all. Our teenage daughter has run away from home with her do-nothing boyfriend.

By the way, there’s a discipline here too, and I’d suggest that if the algorithm doesn’t converge you might also want to consider reducing your number of latent variables rather than increasing your \lambda since you could be asking too much from your training data. It just might not be able to distinguish that many important latent characteristics.

\lambda as tuning parameter

Finally, we have one more job for our little \lambda; we’re not done with him yet. Actually for some people this is his only real job, because in practice this is how he’s treated. Namely, we optimize him so that our results look good under whatever metric we decide to care about (but it’s probably the mean squared error of preference prediction on a test set (hopefully on a test set!)).
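In code, that treatment of \lambda looks something like the toy grid search below – plain ridge regression on made-up data, picking whichever \lambda minimizes mean squared error on a held-out test set, with no prior beliefs consulted at any point:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: a known linear signal plus noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.3 * rng.standard_normal(200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# "Optimize with respect to lambda": grid search on held-out error.
best = min((np.mean((X_test @ ridge_fit(X_train, y_train, lam) - y_test) ** 2), lam)
           for lam in [0.01, 0.1, 1.0, 10.0, 100.0])
print("best lambda:", best[1])
```

The button is easy to push, which is the whole problem: nothing in that loop asks whether the winning \lambda corresponds to a belief anyone would defend.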

In other words, in reality most of the above nonsense about \lambda is completely ignored.

This is one example among many where having the ability to push a button that makes something hard seem really easy might be doing more harm than good. In this case the button says “optimize with respect to \lambda”, but there are other buttons that worry me just as much, and moreover there are lots of buttons being built right now that are even more dangerous and allow the users to be even more big-data-blithe.

I’ve said it before and I’ll say it again: you do need to know about inverting a matrix, and other math too, if you want to be a good data scientist.

* There’s a change-of-basis ambiguity that’s tough to get rid of here, since you only choose the number of latent variables, not their order. This doesn’t change the overall penalty term, so you can minimize that with large enough \lambda, but if you’re incredibly unlucky I can imagine you might bounce between different solutions that differ by a base change. In this case your steps should get smaller, i.e. the amount you modify your matrix each time you go through the algorithm. This is only a theoretical problem by the way but I’m a nerd.