The manufactured trucker shortage

Have you been reading about the shortage of workers in the trucking industry? Have you wondered why, in this crappy economy, they haven’t been able to find more workers? Here’s an excerpt from the Wall Street Journal’s recent coverage of this worker shortage crisis:

Operators across the country are short 30,000 long-distance drivers, the American Trucking Associations estimates. The group projects the shortage could top 200,000 in the next decade. Average annual pay for long-distance drivers was $49,540 in 2013, according to ATA estimates. Hiring and wages in truck transportation have inched up this year, according to the Labor Department.

I’ve got a theory. Here’s what it is: the trucking companies aren’t paying enough. Funny how demand and supply and efficient markets go out the window when there’s a political point being served, though: Congress is considering passing a law that would allow 18-year-olds to be long-haul truckers. A terrible idea, considering how much more dangerous younger drivers are.

Of course, $50K isn’t nothing. But on the other hand, truckers have to be trained, competent, and regularly spend many days on the road. Moreover, the current surveillance technology has severely degraded their quality of life, which I learned by reading about Karen Levy’s work on the industry. Also, new truckers probably make substantially less than $50K when they start.

Partly the surveillance arose from the very real risk of truckers driving too much per day – it was an attempt to make sure truckers were driving safely. But since the technology has been installed in many large-company fleets, the companies have used it to essentially harass their drivers, telling them when break is over and so on. This has worked, in the sense that larger companies with more surveillance have managed to lower costs, pushing out smaller and individual truckers. And that means that truckers who used to own their own business now reluctantly work for huge companies.

For an industry that has historically prided itself on its independent nature, this change does not sit well with drivers. The turnover rates are staggering:


When you make your workers’ lives worse, and you don’t compensate them with cash money to make up for it, you find your workers quitting. That’s what’s happening here.

Conclusion: we either need to improve truckers’ work experiences or pay them more. There’s no worker shortage, there’s simply an unwillingness, on the employers’ side, to face up to the facts.

Categories: Uncategorized

Women in Tech: pipeline versus retention

There’s a provocative article about women in tech. As the article points out in about a thousand ways, it’s not just a pipeline problem, it’s an environmental problem.

Fellow math nerd Rachel Thomas, the author, points out a bunch of sad facts about working in tech. For example, how VCs prefer men, how men’s applications are preferred in hiring processes, how women get punished for negotiating and for being pushy whereas men get rewarded.

Having worked in tech myself, I can say the maternity policies are crap, the long hours are unreasonable, and the frat-like atmosphere exhausts me. No, I do not want to play ping pong during my lunch hour.

But having said that, I don’t think I’ve experienced the worst of it; I was already a grownup, with a Ph.D., when I entered this stuff, and as such I’m allowed to have stronger opinions than the average engineer.

The most interesting issue brought up in Rachel’s piece is the retention rates for those qualified for tech jobs. Unfortunately, both Rachel’s piece and this related NPR piece which Rachel points to only discuss the statistics for women’s retention, namely that about 40% of women leave engineering after they get degrees in engineering (and I think Rachel’s piece actually gets that stat wrong).

Presumably, that’s higher than men, but how much higher? And do women leave jobs more often in general, or is this a tech-related retention problem? What’s the breakdown on reasons why women and men leave? Can we address them individually?

These are important questions, and if we can figure out what is happening, we should. I’ve been thinking about how to grow the pipeline for girls and women in STEM subjects at the high school and college level, but it would be ridiculous to spend an enormous amount of time on that if, once they get a job, that job proves unattractive.

Update: In the subtitle of the piece, it says 17% of men end up leaving the field compared to 42% of women, with a link to this 100 page pdf (hat tip Ewout ter Haar). I still want to know how many women leave other fields to give more context, but it’s a good start.

Academic publishing versus retraction, or: how much Twitter knows about the market

Papers have mistakes all the time. If they’re smallish mistakes that don’t threaten the main work, the author is often told to write an erratum, which the academic journal publishes in a subsequent volume. Other times the problems are more substantial, and might warrant retracting the paper altogether.

For example, if a paper is found to have fraudulent data, retraction is called for. Even when the claims made are outlandish, implausible, and unreproducible, but the authors hadn’t been intentionally fraudulent, there still may be just cause to seriously question their claims and retract. On the other hand, if a paper that was once deemed cutting edge and new is, in retrospect, not very innovative at all, then typically no retraction is called for; the paper is simply ignored. When exactly retraction happens, and how, probably depends on the journal, and even the editor.

Today I want to tell you a story in which that process seems to have gone badly wrong.

Elsevier, the academic publishing giant, owns a journal called the Journal of Computational Science (JoCS), which published a paper called Twitter Mood Predicts the Stock Market (preprint version here) back in 2010. It got a lot of press and, according to Google Scholar, has been cited 1300 times. According to media reports, the paper showed that Twitter, when it was enhanced with emotional tags, was able to predict the Dow Jones Industrial Average with an accuracy of 87% (whatever that means).

Full disclosure: I haven’t read the paper, but even so I don’t believe the results of this paper. People in hedge funds have been trolling for signal in all sorts of news and social media text-based ways for a long while, and there’s simply no way that they would have ignored such a strong signal all the way into 2008. If it was real, they wouldn’t have ignored it, and it would have faded. But I also don’t think it’s so real either.

Anyway, that’s my personal intuition about this, but I could be wrong! That’s what’s cool about academic publishing, right? That we could just be super wrong and people can say what they think and then we get to have this open conversation?

Well, sometimes. What actually happened here is that a bunch of people tried to replicate these results, which became harder when Twitter suddenly started charging lots of money for its data, and a hedge fund also tried a Twitter strategy similar to the one outlined in the paper, but everyone lost money*.

After a while, one of these frustrated would-be traders, whom we will call LW, decided to write a letter to the editor complaining about the original paper. He even blogged about his letter here. In his letter he had two complaints. First, that the results were consistent with data mining, which is to say that there’s statistical evidence the authors cherry-picked their data. Second, that if the results were true, they would violate the “Efficient Market Hypothesis,” and would surprise a bunch of traders with many decades of experience.

So far, so good. A paper is published, people are complaining that the results are wrong or extremely implausible. This is what academic publishing is for.

Here’s what happens next. The editor sends out the letter to reviewers. Two out of 3 of the reviewers respond, and I’ve got a copy of their responses. The first reviewer is enthusiastic about doing something – although whether that means retracting the Twitter paper or publishing the complaint letter in the “Letter To The Editor” section is not clear – and uses the phrase “The original paper’s performance claims are convincingly shown to be severely exaggerated.” That first reviewer has minor requests for modifications.

The second reviewer is less enthusiastic but still thinks there is merit to the complaint letter. The second reviewer is dubious as to whether the original article should be withdrawn, but is clearly also skeptical of the stated claims. Finally, the second reviewer suggests that the original authors should be given a chance to respond before their article is retracted.

At this point, the editor writes to the complaint letter writer LW and says, you need to modify your letter, at which time I’ll “reconsider my decision.” The editor doesn’t say whether that decision is to retract the paper or to publish the letter.

So far, still so good. But here’s where things get very weird. After modifying the letter, LW sends it back to the editor, who soon comes back with another review, and importantly, a decision not to take further action. Here are some important facts:

  1. The new review is scathing, passionate, and very long. Look at it here.
  2. The new review has a name on it – possibly left there by accident – it’s the author of the original paper!
  3. Perhaps this was intentional? Did the editor want to give the original author a chance to defend his work?
  4. In the editor’s letter, he states “Reviewers’ comments on your work have now been received.  You will see that they are advising against publication of your work.  Therefore I must reject it.”
  5. The way that was phrased, it doesn’t sound like the editor was acknowledging that this was not an unbiased reviewer, but was in fact one of the original authors.
  6. In any case, before the final reviewer weighed in, it looked like the reviewers had been suggesting publication of the letter at the very least, possibly with the chance for another reaction letter from the author. So this author’s review seems to have been the deciding vote.
  7. You can read more about the details here, on the complaining letter writer’s blog.

What are the standards for this kind of thing? I’m not sure, but I’m pretty certain that asking the original author to be the deciding vote on whether a paper gets retracted isn’t – or should not be – standard practice.

To be clear, I think it makes sense to allow the author to respond to the complaints, but not at this point in the process. Instead, the decision of whether to publish the letter should have been made, with the help of outside reviewers, and if it was decided to publish the letter, the original author should have been given a chance to compose a rebuttal to be published side by side with the complaint.

Also to be clear, I’m not incredibly sympathetic with someone trying to make money off of a published algorithm and then getting pissed when they lose money instead. I’m willing to admit that more than one of these parties is biased. But I do think that the process over at Elsevier’s Journal of Computational Science needs auditing.

* Or at least the ones that are talking. Maybe other traders are raking it in but aren’t talking?

Aunt Pythia’s advice

Dear readers,

Do you ever wake up not knowing what you want to do when you grow up? And then you realize you’re far too old to feel that way? Well, that’s the way Aunt Pythia feels this morning. She’s in no position to give anyone advice.

And yet. And yet, it’s fun to give people advice! So here goes. Afterwards she’s planning to whip up a batch of delicious “Identity Crisis Crepes” to cheer herself up a bit. They’re going to look like this:

Identity Crisis Crepes have extra nutella.

Are you addicted to carbs like Aunt Pythia? Do you wish to demonstrate solidarity to the cause? If so, before you go,

ask Aunt Pythia any question at all at the bottom of the page!

By the way, if you don’t know what the hell Aunt Pythia is talking about, go here for past advice columns and here for an explanation of the name Pythia.


Dear Aunt Pythia,

A bit lazy to sugarcoat this. I’ve noticed that your personal history/internal biases come out very strongly on some topics and result in irrational/illogical conclusions/actions including banning challenges to your logic.

Have you noticed this yourself? Do you care? If you do care – how would you (do you try to?) address this issue (which I assume every single person suffers from)?

Curious About Rational Exchange

Dear CARE,

Why, no, I hadn’t noticed! Isn’t that why they’re called internal biases? If you’d like to point out specific examples, we can discuss further.

Come to think of it, there are certain things I’m happily opinionated and even stubborn about, but that’s what it means to have a personality, isn’t it? And isn’t that why people ask an advice columnist her opinion? Because the other person is bound to have an opinion?

Of course, one is free to ignore someone else’s opinion, even if it comes from a blogger. But I wouldn’t advise it (har har)!

Aunt Pythia


Dear Aunt Pythia:

My work history includes math, data, operations and government analyst jobs, and direct service work. I love collaborating and sharing ideas but I find myself frustrated at a lot of the shitty attitudes that are moving into this space now that it’s “hot”. I loved your four political camps of big data post and felt like it was one of the few things I’ve come across that addressed this thing I am trying to get my head around.

My problems are twofold:

(1) Not punching someone in the face when they tell me they want to “hack poverty” or any number of other things that speak to a critical lack of familiarity with the context of public interest or work for social good.

(2) Feeling left behind in the job race and shut out of the bigger conversation. I’ve been doing solid research and policy work on issues I care about for quite some time and I hate the idea that even the president (given his community organizing background) is touting corporate tech as the place to find talent to help build data capacity for the government. How do I get my invite to the big kids table?

For now my plan is to keep on keeping on putting data to use in communities I care about and helping community based organizations build capacity around data use and service delivery but I need some help planning ahead.

Yours Truly,


p.s. Sorry there’s not a sex piece to this!


I feel you! How about you email me (address available on my “About” page) and tell me what you’re working on, why it’s important, and then we can scheme on how to get more publicity for you. I agree that there is far too much absolute bullshit out there, and I’d like to help by promoting substantial work.

Also, one thing about the four political camps. There should have been five; I left out the academic camp, which consists of people who genuinely want to make progress on stuff like medical research and are constantly frustrated by HIPAA laws that protect privacy. Take a look at Daniel Barth-Jones’s work for a great example of this perspective.


Aunt Pythia

p.s. Nobody’s perfect!


Dear Aunt Pythia,

What the…

Google Chrome Listening In To Your Room Shows The Importance Of Privacy Defense In Depth

…is this for real?

Overheard, In California

Dear Overheard,

Well, I know it’s for real, because my son showed me how to use the “OK, Google” feature on his MacBook. But in order to use it, you have to activate it in your Google Chrome settings (or at least that’s what they claim!).

As for how creepy this is, it really depends on how you think about it. I mean, Siri listens too, right? Is that creepy? I think it depends on how much we trust Google and Apple. And the answer is: a fuck ton. We let Google read all our emails already, don’t forget.

As far as I know, voice transcribing still doesn’t work very well compared to actually having the text of an email. So I guess if I had to list the creepy stuff in order, I’d start with gmail.

Aunt Pythia


Hi Aunt Pythia!

Here is my probably oh-so-familiar story. I’m a grad student in pure math, looking to get out and interested in data journalism. I’ve looked through your notes from the Lede program, and think that working at ProPublica would be AMAZING (though likely a pipe dream).

Beyond material at the level of AP exams, I have no experience in statistics, programming, nor journalism. However, I think reporting stories stemming from statistical analyses or making interactive news applications for readers to explore data themselves would be really cool. For someone in my position with these goals, would you make some suggestions for skills to pick up, people to talk to or emulate, workshops or informational events to attend?

(Addendum/Clarification to the question: Searching the web for “data journalism” and its variants, I find programs and resources for journalists to bulk up their data-science skills or calls for programmers to get involved with news agencies. However, what concrete suggestions would you give to someone starting from scratch who wants to break into this field? I am somewhat more interested in analyzing and interpreting data than in making graphics.)

Thank you in advance!

News Enformer Wanna Be

p.s. You should check out Amanda Cox’s work and talks if you haven’t!

Dear NEW B,

I happen to have some good news for you. Scott Klein of ProPublica came to the Lede Program and told us he hires people based on their webpage projects. If they are cool, innovative, and newsworthy, then he is interested. This is somewhat different from other editors who depend on your ability to get your work published by mainstream news outlets.

So in other words, I suggest you create an online portfolio of work that you think is super interesting and newsworthy, and then you start applying for jobs. To do this, you’ll need to learn statistics and computer programming, but I’d suggest starting with the project and then picking up skills you need to do it. Steal ideas from various online syllabi and such, and feel free to enroll in an actual program or do self-study. Go to hackathons and learn quick and dirty skills.

It’s a long-term plan (or at least not a short-term one), and you might not get a job at ProPublica, which I agree is a dreamy kind of dream, but you might well get another great job, and in any case you’ll learn a lot. Also definitely collaborate with journalists starting now – many great freelance journalists already have great stories and would love to work with mathy/ computer people. Go to a local journalism school and introduce yourself.

Aunt Pythia

p.s. Amanda Cox kicks ass!


People, people! Aunt Pythia loves you so much. And she knows that you love her. She feels the love. She really really does.

Well, here’s your chance to spread your Aunt Pythia love to the world! Please ask her a question. She will take it seriously and answer it if she can.

Click here for a form or just do it now:

Gender and racial achievement gaps in math

I spent the morning watching this one hour lecture by David Kung, who has been studying the gender and racial achievement gaps in mathematics. Interesting stuff, with historical perspective – math has a sad history – and a call for the end to passive lecturing and much more:

Watch it if you have time. You can skip to 7:20 to start.

Greek debt and German banks

Are you fascinated by the “debt as moral weight” arguments you see being tossed around and viciously debated over in Germany and Greece nowadays? It seems like the moral debate has superseded the economic reality of the situation. Even the IMF has declared the current Greek deal untenable, but that hasn’t seemed to interfere with the actual negotiations.

What gives? Many point to history to explain this. Besides the whole Nazi thing, or maybe exactly because of it, the Greeks keep reminding the Germans that they (and others) forgave half of existing German debt after World War II, with the 1953 London Debt Agreement. The Germans have responded vehemently that such ancient history is irrelevant, and that the Greeks are a bunch of lazy olive-eating tax avoiders. It’s a dirty fight, and getting dirtier every week.

I maintain we don’t have to examine the history of 60 years ago to understand at least some of the moral anxiety. Instead we should look a mere 7 years ago, at the enormous German bailout of their own banks, which had invested quite recklessly in all sorts of the most risky financial instruments and, most relevantly, Greek bonds.

Start with the basic facts. German and French banks invested very heavily in Greek bonds, partly because they were allowed by European Basel “risk regulation” laws to set the risk of those Greek bonds at zero, and partly because they were just investing in anything and everything with a relatively high yield. Since Greek bonds were at a higher yield than other government bonds that maybe deserved the “zero risk” designation more, they naturally bought an asston of those.

[Side note: whenever there’s a market with a spectrum of products, the ones with the biggest yield for a given risk profile will be snatched up the fastest, because people want to maximize profits. We’ve seen that this almost always is a bad thing and creates bubbles very quickly. But it’s also the reason people are constantly inventing new products that hide risk. In this case they didn’t need to “invent” anything, because it was a political decision to designate Greek bonds at zero risk.]

There are two ways to look at this story from a morality standpoint. One is that, no matter who owns this debt now, the Greek government is on the hook for borrowing it and needs to figure out how to pay it back. From this point of view it was a mistake of the Greeks to issue too much debt and to spend it unwisely, while not cracking down on tax avoiders.

The other way to look at it is that German banks should have known better than to buy this debt in the first place. After all, it’s a free market, and nobody forces you to buy things, and after all if there really were no risk at all on it there would also be no yield (beyond inflation). But the very reason Greek bonds had yield was that the market was differentiating them from German bonds. From this point of view it was a mistake of the German bankers.

Either way, when the Germans bailed out their banks, they took what was a bank problem and made it into a taxpayer problem.

Have I oversimplified? I’ll also admit that, after that whole bailout went down, a series of “Greek bailouts,” all of which were clearly insufficient, made the European governments even more involved, and the Greeks owed way more on paper to the European taxpayers, which layered on the debts while destroying the Greek economy. But most of those bailouts were simply loans which were used to pay back the original loans. Put another way, the Greeks might not have needed bailing out if the original Greek bonds had been refused by risk-averse bankers in the first place.

This is not to suggest that there was perfect planning going on by the previous Greek governments. But I do think that, if we’re looking for who deserves blame in this story, we might want to circle back to the German bankers who couldn’t resist subprime mortgages and Greek bonds back in the early 2000’s.

The 17-armed spiral within a spiral

Last Friday I visited my high school math camp, HCSSiM, where I became a nerd. I also taught there multiple times over the years, and in 2012 I blogged my lectures.

Why the visit? You see, we loyal alums of HCSSiM have a tradition of going back every July 17th to celebrate “Yellow Pig day,” which consists of a talk where founder and director (David) Kelly talks extensively about fun facts regarding the number 17, which happens before dinner, and then after dinner we sing “yellow pig carols” and eat an enormous amount of cake in the shape of a yellow pig. You can learn more about this ridiculous and hilarious tradition here.

Anyhoo, this year we (I went with other nerds) missed the 17 talk because of traffic in Connecticut but we made it for the dinner and carols. Luckily at dinner I had the chance to talk to Kelly, and I asked him if there were any new 17 facts this year. He told me there was one, and it was slightly mysterious. This post is an attempt to explain it a bit.

The mathematical set-up is explained here. Namely, we start with something called the Ulam Spiral, which is simply a way to label the boxes of an infinite two-dimensional grid with the natural numbers. You start at some place and then spiral outwards from there. Here’s a picture:

The center of the Ulam Spiral
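If you’d rather see the labelling as code, here is a minimal sketch. It assumes the usual orientation (1 in the center, 2 immediately to its right, then counterclockwise), which is the one that puts the odd squares on the bottom right diagonal. The names `ulam_spiral` and `layer` are made up for this illustration.

```python
def ulam_spiral(count):
    """Map each of 1..count to an (x, y) grid cell, spiralling out from the origin.

    Assumed orientation: 1 at (0, 0), 2 immediately to its right, then
    counterclockwise (up, left, down, right, ...).
    """
    coords = {1: (0, 0)}
    x, y = 0, 0
    dx, dy = 1, 0      # the first step goes right
    run = 1            # length of the current straight run
    n = 1
    while n < count:
        for _ in range(2):             # each run length is used twice
            for _ in range(run):
                if n == count:
                    return coords
                x, y = x + dx, y + dy
                n += 1
                coords[n] = (x, y)
            dx, dy = -dy, dx           # turn 90 degrees counterclockwise
        run += 1
    return coords


def layer(cell):
    """Which square ring around the origin a cell sits on (0 for the center)."""
    x, y = cell
    return max(abs(x), abs(y))


spiral = ulam_spiral(49)
# The odd squares 9, 25, 49 land on the bottom right diagonal.
print(spiral[9], spiral[25], spiral[49])  # (1, -1) (2, -2) (3, -3)
```

Counting the cells with `layer(cell) == 2` gives 16, the ring of numbers from 10 through 25.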

OK, so the first thing to say is that, when you label the plane like this, primes tend to cluster along lines. I think this is what Ulam thought was cool about his spiral:

Primes are black. This is the spiral with 200 layers.

Now comes the observation. You need to know what a triangular number is first, though. Namely, it’s a number that corresponds to counting up how many dots you need to form a triangle. We say the nth triangular number corresponds to a triangle with n rows. Here are the first few:

[Image: the first few triangular numbers, drawn as triangles of dots]

You can also draw these triangles so that consecutive ones fit together to form squares.

When you highlight the triangular numbers in the Ulam Spiral, instead of the primes, then you get something that looks weird:

Green dots are triangular numbers within the Ulam Spiral.

OK so if you count those spiral arms, you’ll see there are 17 of them. But does that last forever? And if so, why?

Well, the answer is going to be yes. And here’s a rough proof. Rough because it uses asymptotic limits, so technically I will not show that the above picture extends perfectly, but rather that it eventually does look like a spiral with 17 arms.

A famous story about Gauss tells us that the formula for the kth triangular number is

T_k = k \cdot (k+1) /2.

Also, by construction of the Ulam Spiral, the bottom right corner of each “spiral layer” is an odd square, and if we call that number n^2, there will be 4 \cdot n + 4 boxes on the very next layer, corresponding to the 4 sides of the next layer plus the 4 corners of the next layer. (Check: the next layer ends at (n+2)^2, and (n+2)^2 - n^2 = 4 \cdot n + 4.)

Now imagine that there’s a triangular number right on that bottom right corner. That would mean that for some k,

k \cdot (k+1) /2 = n^2, or in other words that

k^2 + k = 2 \cdot n^2.

This is when things get asymptotic. Imagine that n is very very large. That would mean that k is too (everything here is a positive integer), and in particular that the k^2 term would dwarf the k term above. In other words, we could approximate:

k \approx \sqrt{2} \cdot n.

My next question is, how many triangular numbers would lie on the next layer of the spiral? Well, as we said above there are 4 \cdot n + 4 spots in the next layer, which we will approximate by 4 \cdot n, and the triangular number coming after T_k is T_{k+1}, which is k+1 bigger than T_k, corresponding to adding one row to a triangle with k rows. We will approximate k+1 by \sqrt{2} \cdot n, again ignoring small terms.

For that matter, the next few triangular numbers after T_k come regularly, each about k spots after the previous one. Therefore there are about 4 \cdot n / (\sqrt{2} \cdot n) triangular numbers in the next row of the Ulam spiral. That comes out to 2 \sqrt{2}, which is about 2.83.

So far we’ve figured out that, when n is huge, then after meeting the kth triangular number on the nth row, we will see two more, and get most of the way to a third, by going one more row.

Now let’s do that 6 times. After traveling 6 rows past a triangular number, we will meet about 12 \sqrt{2} more triangular numbers. But

12 \sqrt{2} = 16.9705627485...,

which is very close to 17. So after traveling the Ulam Spiral for 6 rows, we will just about hit 17 triangular numbers, which will be more or less evenly spaced from each other.

What this means is that we should expect to see a spiral with 17 arms, but that when the picture is enlarged to include a very large number of rows, we will see the spiral shifting very slightly to the other direction.
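Since the argument is asymptotic, it’s worth a quick numerical sanity check. The sketch below counts exactly how many triangular numbers fall in the 6 layers after an odd square n^2, that is, in the range from n^2 + 1 up to (n+12)^2, and averages that count over a range of large odd n. The function names and the particular range of n are my own choices for illustration, not part of the argument.

```python
import math


def triangular_count(limit):
    """Number of triangular numbers T_k = k(k+1)/2 with 1 <= T_k <= limit."""
    # k(k+1)/2 <= limit  is equivalent to  (2k+1)^2 <= 8*limit + 1
    return (math.isqrt(8 * limit + 1) - 1) // 2


def triangular_in_six_layers(n):
    """Triangular numbers strictly after n^2 and up to (n+12)^2, for odd n."""
    return triangular_count((n + 12) ** 2) - triangular_count(n * n)


odd_sides = range(1001, 100002, 2)   # odd n from 1001 to 100001
counts = [triangular_in_six_layers(n) for n in odd_sides]
average = sum(counts) / len(counts)

print(sorted(set(counts)))           # which counts actually occur
print(average, 12 * math.sqrt(2))    # the average should sit near 12*sqrt(2)
```

If the approximation is right, every count should be 16 or 17, mostly 17, and the occasional 16 is exactly the slow drift of the arms mentioned above.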

By the way, I didn’t figure this out immediately. First I had a most delightful time understanding when, exactly, square numbers and triangular numbers coincide. In other words, I wanted to understand when there is an X and a Y so that:

X \cdot (X+1) /2= Y^2, or

X^2 + X= 2 \cdot Y^2.

I might write this up in another post, but play around with it for a while if you get bored on the subway.
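And if you get bored without a subway handy, a brute-force search is one way to start playing. This is just a sketch: it tries every X up to a bound and keeps the ones where the Xth triangular number is a perfect square. The name `square_triangular` and the bound of 2000 are arbitrary choices of mine, and it says nothing about why the solutions look the way they do.

```python
import math


def square_triangular(max_x):
    """Return all (X, Y) with X(X+1)/2 == Y^2 and 1 <= X <= max_x."""
    hits = []
    for x in range(1, max_x + 1):
        t = x * (x + 1) // 2          # the X-th triangular number
        y = math.isqrt(t)
        if y * y == t:                # t is a perfect square
            hits.append((x, y))
    return hits


print(square_triangular(2000))
```

Staring at the ratios between consecutive solutions is a decent next step.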
