How unsupervised is unsupervised learning?
I was recently at a Meetup and got into a discussion with Joey Markowitz about the difference between supervised, unsupervised, and partially (semi-) supervised learning.
For those who haven’t heard of this stuff, a bit of explanation. These are general categories of models. In every model there’s input data, and in some models there’s also a known quantity you are trying to predict, starting from the input data.
Not surprisingly, supervised learning is what finance quants do, because they always know what they’re going to predict: the money. Unsupervised means you don’t really know what you are looking for in advance. A good example of this is “clustering” algorithms, where you input the data and the number of clusters and the algorithm finds the “best” way of clustering the data into that many clusters (with respect to some norm in N-space where N is the number of attributes of the input data). As a toy example, you could have all your friends write down how much they like various kinds of foods (tofu, broccoli, garlic, ice cream, buttered toast) and after clustering you might find a bunch of people live in the “we love tofu, broccoli, and garlic” cluster and the others live over in the “we love ice cream and buttered toast” cluster.
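The food-preference example can be sketched in a few lines with scikit-learn's KMeans. The friends and their scores below are made up for illustration; the point is just that you hand the algorithm the data and the number of clusters, and it does the rest.

```python
# Toy version of the food-preference clustering: made-up scores, k-means
# with k=2. Columns: tofu, broccoli, garlic, ice cream, buttered toast (0-10).
import numpy as np
from sklearn.cluster import KMeans

friends = np.array([
    [9, 8, 9, 2, 1],   # loves tofu/broccoli/garlic
    [8, 9, 8, 1, 2],
    [7, 8, 9, 3, 2],
    [1, 2, 1, 9, 8],   # loves ice cream / buttered toast
    [2, 1, 2, 8, 9],
    [1, 1, 3, 9, 9],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(friends)
print(km.labels_)  # first three friends land in one cluster, last three in the other
```

Note that we chose k=2 and the Euclidean norm ourselves; the algorithm only optimizes within those choices, which is exactly the kind of hidden decision the rest of this post is about.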
I hadn’t heard of the phrase “partially supervised learning,” but it turns out it just means you train your model both on labeled and unlabeled data. Usually there’s a domain expert who doesn’t have time to classify all of the data, but the algorithm is augmented by their partial information. So, again a toy example, if the algorithm is classifying photographs, it may help for a human to go through some of them and classify them “porn” vs. “not porn” (because I know it when I see it).
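A minimal sketch of the "labeled plus unlabeled" setup, using scikit-learn's LabelPropagation (one of several semi-supervised approaches; the data points here are invented). The convention is that unlabeled examples get the label -1, and the algorithm spreads the expert's few labels to the rest.

```python
# Semi-supervised toy example: two well-separated groups of 1-D points,
# only one labeled point per group (-1 marks the unlabeled ones).
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[0.0], [0.2], [0.4], [4.0], [4.2], [4.4]])
y = np.array([0, -1, -1, 1, -1, -1])  # the "domain expert" labeled two points

model = LabelPropagation().fit(X, y)
print(model.transduction_)  # labels propagated to the unlabeled points
</n>```

Here the two labeled points stand in for the human who classified a handful of photographs; the algorithm fills in the rest by similarity.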
Joey had some interesting thoughts about what’s really going on with supervised vs. unsupervised; he claims that “unsupervised” should really be called “indirectly supervised”. He followed up with this email:
I currently think about unsupervised learning as indirectly supervised learning. The primary reason is that once you implement an unsupervised learning algorithm, it eventually becomes part of a larger package, and that larger package is evaluated. Indirectly, you can back out from the package evaluation the effectiveness of different implementations/seeds of the unsupervised learning algorithm.
So, simply put, the unsupervised learning algorithm is only unsupervised in isolation, and indirectly supervised once it's part of a larger picture. If you distill this further, the evaluation metrics for unsupervised algorithms are project-specific and developed through error analysis, whereas for supervised algorithms the metric is specific to the algorithm, irrespective of the project.
supervised learning: input data -> learning algorithm -> problem-nonspecific cost metric -> output
unsupervised learning: input data -> learning algorithm -> problem-specific cost metric -> output
The main question is… once you formulate an evaluation metric for an unsupervised algorithm that is specific to your project… can it still be called unsupervised?
This is a good question. One stupid example: if, in the tofu-broccoli-ice cream example above, we had forced three clusters instead of the more natural two, then after looking at the result we might say, shit, this is really a two-cluster problem. The moment we switch the number of clusters to two is, of course, supervising the so-called unsupervised process.
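That "shit, this is really a two-cluster problem" moment can be made concrete with a clustering-quality metric. A hedged sketch, reusing made-up food-preference scores: compare the silhouette score (one common choice, not the only one) for k=2 versus k=3, and notice that the instant we pick k by looking at such a score, we are supervising the process.

```python
# Compare k=2 vs k=3 on invented food-preference data using the
# silhouette score; higher means tighter, better-separated clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([
    [9, 8, 9, 2, 1], [8, 9, 8, 1, 2], [7, 8, 9, 3, 2],   # veggie-lovers
    [1, 2, 1, 9, 8], [2, 1, 2, 8, 9], [1, 1, 3, 9, 9],   # toast/ice-cream lovers
], dtype=float)

scores = {}
for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

print(scores)  # k=2 scores higher; choosing k from this is itself supervision
```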
I think, though, that Joey's remark runs deeper than that, and is perhaps an example of how we trick ourselves into thinking we've successfully algorithmized a process when in fact we have made an awful lot of choices.



All learning is biased because humans make the modeling decisions. Many people have begun to write about this. And, believe it or not, it is the topic of my dissertation.
When you work predicting money against people who wrongly believe their methods are objective, the easiest way to beat them is to see why the emperor has no clothes and where he is vulnerable.
I am reminded of how programmers sometimes use words like “compiled” and “interpreted” in loose ways that mean little, e.g., if you’re writing Java code, then it’s “compiled” to JVM bytecode, but wait, that is “interpreted” on some native machine, but wait, machine code is actually “interpreted” on top of an operating system and also the hardware, and wait, the hardware could actually be VMware. Basically, people take certain parameters as fixed and then use language based on that fixed view, but you really have to look at the entire picture and specify your frame of reference.