Data analytics skills are increasingly in demand. Everyone is talking about big data, machine learning and data mining. But what about ‘clustering’ or ‘clusters’? Have you heard of this method of data analysis? In this article, one of our Data mentors, Violeta Mezeklieva, uses an apt analogy to explain what clustering is and how it is used.

Imagine a friend of yours is getting married and asks you to help with the planning. You are going over the logistics and begin to discuss the dinner arrangements. What are the seating arrangements going to be? Your friend jumps in and suggests grouping the guests by table: family members, friends from school, the college gang, the friendly co-workers, friends from the hiking club… and so on. You let that sink in and notice an opportunity yet to be exploited.

Weddings can be challenging, especially when you invite friends and family who will be meeting each other for the first time, and breaking the ice is not that easy. Instead of grouping the guests based on how your friend knows them, how about mixing things up a little? A fun way to facilitate mingling would be to highlight the things the guests have in common and group them at the dinner table based on those commonalities.

In this ‘aha!’ moment you suggest to your friend that, instead of classifying the guests based on a predefined condition, machine learning should be used to gain insights into who should be seated at the same table. Your friend thinks this is a great idea and is intrigued to see what unites whom.

How exciting! What you need to do now is create a questionnaire that the guests will complete before the wedding. The responses will then be fed to the perfect machine learning technique for this challenge: clustering.

A clustering algorithm looks for commonalities among the guests, based on the responses they gave, and groups together those whose answers are most alike. As a result, each group is defined by what unites its members and sets them apart from the other groups.
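To make this concrete, here is a minimal sketch of one popular clustering method, k-means, written in plain Python. Everything in it is an assumption made for illustration: the guest names, the three questionnaire questions, the 1 to 5 answer scale, and the choice of k-means itself (the analogy above does not name a specific algorithm).

```python
from math import dist

# Hypothetical questionnaire answers on a 1-5 scale:
# (enjoys hiking, enjoys board games, enjoys dancing)
guests = {
    "Ana":   (5, 1, 2),
    "Ben":   (4, 2, 1),
    "Chloe": (1, 5, 2),
    "Dev":   (2, 5, 1),
    "Elif":  (1, 2, 5),
    "Finn":  (2, 1, 5),
}

def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from every centroid chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def k_means(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the average of its members."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

names = list(guests)
labels, centroids = k_means([guests[name] for name in names], k=3)

# Group guest names by the table (cluster) they were assigned to.
tables = {}
for name, label in zip(names, labels):
    tables.setdefault(label, []).append(name)

for label, seated in tables.items():
    print(f"Table {label + 1}: {', '.join(seated)}")
```

Each table's centroid is the "average guest" at that table, which is a handy way to describe what unites the people sitting there. In practice you would probably reach for a library implementation rather than writing the loop by hand.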

How many clusters should there be? A good starting point is to find out how many tables fit under the outdoor tent. If it only fits 15, the algorithm will have to find what unites the members of each of the 15 groups. With that many groups, some of them may not be all that different from one another. In that case, you can reduce the number of tables until each group has a characteristic of its own. You might discover, after all, that the size of the tables is the problem. But that’s a problem for the carpenter.
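One way to judge whether the groups are "all that different from one another" is to score each candidate number of tables and compare. The sketch below starts from a hypothetical tent capacity and works downward, scoring each option with the mean silhouette (closer to 1 means tight, well-separated groups). The guest data, the capacity of 5, the k-means routine and the silhouette measure are all assumptions chosen for illustration, not something the analogy prescribes.

```python
from math import dist

# Hypothetical questionnaire answers on a 1-5 scale:
# (enjoys hiking, enjoys board games, enjoys dancing)
answers = [(5, 1, 2), (4, 2, 1), (1, 5, 2), (2, 5, 1), (1, 2, 5), (2, 1, 5)]

def k_means(points, k, iterations=20):
    """Plain k-means with a deterministic farthest-first start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette score. Assumes at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # a guest alone at a table scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # average distance to own table
        b = min(                   # average distance to the nearest other table
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

max_tables = 5  # hypothetical: how many tables fit under the tent

# Start at the tent's capacity and reduce the number of tables step by step.
scores = {k: silhouette(answers, k_means(answers, k))
          for k in range(max_tables, 1, -1)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"{k} tables: silhouette {score:.2f}")
print(f"Most distinct grouping: {best_k} tables")
```

With this toy data, splitting six guests across four or five tables leaves people sitting alone, and two tables force unlike guests together, so the middle option scores best. Real questionnaires are noisier, so treat the score as a guide rather than a verdict.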

By clustering, your friend was able to learn something she would have missed if she had segmented the guests based on what she thought united them.

Awesome. Your friend loves it and you are full of ideas.

