Archive

Archive for the ‘modeling’ Category

Reverse-engineering Chinese censorship

This recent paper written by Gary King, Jennifer Pan, and Margaret Roberts explores the way social media posts are censored in China. It’s interesting, take a look, or read this article on their work.

Here’s their abstract:

Existing research on the extensive Chinese censorship organization uses observational methods with well-known limitations. We conducted the first large-scale experimental study of censorship by creating accounts on numerous social media sites, randomly submitting different texts, and observing from a worldwide network of computers which texts were censored and which were not. We also supplemented interviews with confidential sources by creating our own social media site, contracting with Chinese firms to install the same censoring technologies as existing sites, and—with their software, documentation, and even customer support—reverse-engineering how it all works. Our results offer rigorous support for the recent hypothesis that criticisms of the state, its leaders, and their policies are published, whereas posts about real-world events with collective action potential are censored.

Interesting that they got so much help from the Chinese to censor their posts. Also keep in mind a caveat from the article:

Yu Xie, a sociologist at the University of Michigan, Ann Arbor, says that although the study is methodologically sound, it overemphasizes the importance of coherent central government policies. Political outcomes in China, he notes, often rest on local officials, who are evaluated on how well they maintain stability. Such officials have a “personal interest in suppressing content that could lead to social movements,” Xie says.

I’m a sucker for reverse-engineering powerful algorithms, even when there are major caveats.

Ello, Enron, and the future of data privacy

If you think Ello is the newest safest social media platform, you might want to think again.

Or at the very least, go ahead and read this piece by my data journalist buddy Meredith Broussard, entitled ‘Ello, social media newcomer! Goodbye, data security fears?Meredith has read the fine print in Ello’s security policy, and it’s not great news.

Notices of the AMS is killing it

I am somewhat surprised to hear myself say this, but this month’s Notices of the AMS is killing it. Generally speaking I think of it as rather narrowly focused but things seem to be expanding and picking up. Scanning the list of editors, they do seem to have quite a few people that want to address wider public issues that touch and are touched by mathematicians.

First, there’s an article about how the h-rank of an author is basically just the square root of the number of citations for that author. It’s called Critique of Hirsch’s Citation Index: A Combinatorial Fermi Problem and it’s written by Alexander Yong. Doesn’t surprised me too much, but there you go, people often fall in love with new fancy metrics that turn out to be simple transformations of old discarded metrics.

Second, and even more interesting to me, there’s an article that explains the mathematical vapidness of a widely cited social science paper. It’s called Does Diversity Trump Ability? An Example of the Misuse of Mathematics in the Social Sciences and it’s written by Abby Thompson. My favorite part of paper:

Screen Shot 2014-10-01 at 8.57.17 AM

 

Oh, and here’s another excellent take-down of a part of that paper:

Screen Shot 2014-10-01 at 9.02.00 AM

 

Let me just take this moment to say, right on, Notices of the AMS! And of course, right on Alexander Yong and Abby Thompson!

Categories: math, modeling

Chameleon models

September 29, 2014 13 comments

Here’s an interesting paper I’m reading this morning (hat tip Suresh Naidu) entitled Chameleons: The Misuse of Theoretical Models in Finance and Economics written by Paul Pfleiderer. The paper introduces the useful concept of chameleon models, defined in the following diagram:

Screen Shot 2014-09-29 at 8.46.46 AM

 

Pfleiderer provides some examples of chameleon models, and also takes on the Milton Friedman argument that we shouldn’t judge a model by its assumptions but rather by its predictions (personally I think this is largely dependent on the way a model is used; the larger the stakes, the more the assumptions matter).

I like the term, and I think I might use it. I also like the point he makes that it’s really about usage. Most models are harmless until they are used as political weapons. Even the value-added teacher model could be used to identify school systems that need support, although in the current climate of distorted data due to teaching to the test and cheating, I think the signal is probably very slight.

Categories: economics, modeling

Women not represented in clinical trials

September 26, 2014 13 comments

This recent NYTimes article entitled Health Researchers Will Get $10.1 Million to Counter Gender Bias in Studies spelled out a huge problem that kind of blows me away as a statistician (and as a woman!).

Namely, they have recently decided over at the NIH, which funds medical research in this country, that we should probably check to see how women’s health are affected by drugs, and not just men’s. They’ve decided to give “extra money” to study this special group, namely females.

Here’s the bizarre and telling explanation for why most studies have focused on men and excluded women:

Traditionally many investigators have worked only with male lab animals, concerned that the hormonal cycles of female animals would add variability and skew study results.

Let’s break down that explanation, which I’ve confirmed with a medical researcher is consistent with the culture.

If you are afraid that women’s data would “skew study results,” that means you think the “true result” is the result that works for men. Because adding women’s data would add noise to the true signal, that of the men’s data. What?! It’s an outrageous perspective. Let’s take another look at this reasoning, from the article:

Scientists often prefer single-sex studies because “it reduces variability, and makes it easier to detect the effect that you’re studying,” said Abraham A. Palmer, an associate professor of human genetics at the University of Chicago. “The downside is that if there is a difference between male and female, they’re not going to know about it.”

Ummm… yeah. So instead of testing the effect on women, we just go ahead and optimize stuff for men and let women just go ahead and suffer the side effects of the treatment we didn’t bother to study. After all, women only comprise 50.8% of the population, they won’t mind.

This is even true for migraines, where 2/3rds of migraine sufferers are women.

One reason they like to exclude women: they have periods, and they even sometimes get pregnant, which is confusing for people who like to have clean statistics (on men’s health). In fact my research contact says that traditionally, this bias towards men in clinical trials was said to protect women because they “could get pregnant” and then they’d be in a clinical trial while pregnant. OK.

I’d like to hear more about who is and who isn’t in clinical trials, and why.

Categories: modeling, news, rant, statistics

The business of public education

September 25, 2014 25 comments

I’ve been writing my book, and I’m on chapter 4 right now, which is tentatively entitled Feedback Loops In Education. I’m studying the enormous changes in primary and secondary education that have occurred since the “data-driven” educational reform movement started with No Child Left Behind in 2001.

Here’s the issue I’m having writing this chapter. Things have really changed in the last 13 years, it’s really incredible how much money and politics – and not education – are involved. In fact I’m finding it difficult to write the chapter without sounding like a wingnut conspiracy theorist. Because that’s how freaking nuts things are right now.

On the one hand you have the people who believe in the promise of educational data. They are often pro-charter schools, anti-tenure, anti-union, pro-testing, and are possibly personally benefitting from collecting data about children and then sold to commercial interests. Privacy laws are things to bypass for these people, and the way they think about it is that they are going to improve education with all this amazing data they’re collecting. Because, you know, it’s big data, so it has to be awesome. They see No Child Left Behind and Race To The Top as business opportunities.

On the other hand you have people who do not believe in the promise of educational data. They believe in public education, and are maybe even teachers themselves. They see no proven benefits of testing, or data collection and privacy issues for students, and they often worry about job security, and public shaming and finger-pointing, and the long term consequences on children and teachers of this circus of profit-seeking “educational” reformers. Not to mention that none of this recent stuff is addressing the very real problems we have.

As it currently stands, I’m pretty much part of the second group. There just aren’t enough data skeptics in the first group to warrant my respect, and there’s way too much money and secrecy around testing and “value-added models.” And the politics of the anti-tenure case are ugly and I say that even though I don’t think teacher union leaders are doing themselves many favors.

But here’s the thing, it’s not like there could never be well-considered educational experiments that use data and have strict privacy measures in place, the results of which are not saved to individual records but are lessons learned for educators, and, it goes without saying, are strictly non-commercial. There is a place for testing, but not as a punitive measure but rather as a way of finding where there are problems and devoting resources to it. The current landscape, however, is so split and so acrimonious, it’s kind of impossible to imagine something reasonable happening.

It’s too bad, this stuff is important.

When your genetic information is held against you

September 23, 2014 16 comments

My friend Jan Zilinsky recently sent me this blogpost from the NeuroCritic which investigates the repercussions of having biomarkers held against individuals.

In this case, the biomarker was in the brain and indicated a propensity for taking financial risks. Or maybe it didn’t really – the case wasn’t closed – but that was the idea, and the people behind the research mentioned three times in 8 pages that policy makers might want to use already available brain scans to figure out which populations or individuals would be at risk. Here’s an excerpt from their paper:

Our finding suggests the existence of a simple biomarker for risk attitude, at least in the midlife [sic] population we examined in the northeastern United States. …  If generalized to other groups, this finding will also imply that individual risk attitudes could, at least to some extent, be measured in many existing medical brain scans, potentially offering a tool for policy makers seeking to characterize the risk attitudes of populations.

The way the researchers did their tests was, as usual, to have them play artificial games of chance and see how different people strategized, and how their brains were different.

Here’s another article I found on biomarkers and risk for psychosis, here’s one on biomarkers and risk for PTSD.

Studies like this are common and I don’t see a reason they won’t become even more common. The question is how we’re going to use them. Here’s a nasty way I could imagine they get used: when you apply for a job, you fill in a questionnaire that puts you into a category, and then people can see what biomarkers are typical for that category, and what the related health risks look like, and then they can decide whether to hire you. Not getting hired doesn’t say anything about your behaviors, just what happens with “people like you”.

I’m largely sidestepping the issue of accuracy. It’s quite likely that, at an individual level, many such predictions will be inaccurate but could still be used by commercial interests – and even be profitable – even so.

In the best case scenario, we would use such knowledge strictly to help people stay healthy. In the worst case, we have a system whereby people are judged by their biomarkers and not their behavior. If there were ever a case for regulation, I think this is it.

Categories: data science, modeling
Follow

Get every new post delivered to your Inbox.

Join 1,819 other followers