Why unsupervised learning… isn’t unsupervised

Unsupervised learning may sound like a fancy way of saying “let the kids learn on their own not to touch the hot plate,” but it is actually a pattern-finding technique for mining inspiration out of your data.

It has nothing to do with machines running around on their own without adult supervision. Instead, it’s all about similarity: running algorithms that put like with like. It’s an algorithmic way of enforcing “Birds of a feather should flock together.”

The secret is that behind the scenes in unsupervised learning, there is actually human supervision. Humans will look at the results and ask, “Is this good? Am I finding it interesting?” If not, they’re gonna fiddle with some settings and hit run again. So, even though no labels are put in at the beginning, the labels that come out are labels like “Group 1”, “Group 2”, “Group 3”, “Group 4”. But if you (the user, the analyst, the data scientist) don’t like those groupings and you’re not inspired by them, what you’ll do is say, “Try it again. Try it again. Try it again.”

That act of saying “Try it again” (try it another way: rerandomize or try different settings) is where the human gets to supervise the system a bit. So there’s not really such a thing as completely unsupervised. We impose what we think is good and what we think is valuable, and those notions of “good” and “valuable” (those preferences that we have) are driven by our past experiences. Our biases get into unsupervised learning results just like they get into supervised learning results, so don’t think that unsupervised learning removes human bias.
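To make the “try it again” loop concrete, here is a minimal sketch in Python. It assumes a bare-bones k-means as the grouping algorithm (the post doesn’t name one), and the `kmeans` helper, the toy `points`, and the settings tried are all made up for illustration. The algorithm only hands back group numbers; deciding which run is “interesting” is left to the person reading the output.

```python
import random

def kmeans(points, k, seed, iterations=20):
    """Tiny illustrative k-means: returns one group label (0..k-1) per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random starting centers, controlled by the seed
    labels = [0] * len(points)
    for _ in range(iterations):
        # Put like with like: assign each point to its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Move each center to the average of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels

# Hypothetical toy data: two loose clumps of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

# "Try it again": same data, different settings (number of groups, random seed).
runs = {(k, seed): kmeans(points, k, seed) for k in (2, 3) for seed in (0, 1)}
for settings, labels in runs.items():
    print(settings, labels)
```

Nothing in the code says which of the four runs is the right one. Picking one, or changing `k` and the seed and rerunning, is the human supervision.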

You, the user, are putting your own tastes on it by asking, “Do I feel inspired or do I not? Do I want to try again? Do I want to rerun it? What inspiration am I getting from these data?”
