8 Steps To Follow While Doing Cluster Analysis For Smart Decision-Making

Imagine you are the owner of a modest grocery store in your neighborhood. You know most of your customers in person and you do a pretty good job of meeting their demands. After a while, your hard work pays off and you open three new branches in different neighborhoods. It is hard to stay on top of the needs of four stores, but thanks to more coffee and less sleep, you still rock. Then one day, you reach a milestone and open your tenth store. How do you analyze the requirements of so many stores? What if you have dozens of stores? Hundreds? Thousands? Could you still stay connected with your customers and figure out the similarities and differences in product preferences, lifestyles, and purchasing power across all locations?

Let’s reframe the case. Is it wise to plan by overall numbers or averages when you are the head of an enormous chain store? Mean values tend to hide the variation across different parts of the business. That kind of aggregate information can only signal a major problem, and it usually hides opportunities.
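A small worked example shows how an average can mislead. The store names and revenue figures below are made up purely for illustration; the point is only that the overall mean describes none of the stores.

```python
from statistics import mean, pstdev

# Hypothetical weekly revenue (in thousands) for six stores; the figures are invented.
weekly_revenue = {
    "city_1": 95, "city_2": 100, "city_3": 105,
    "suburb_1": 35, "suburb_2": 40, "suburb_3": 45,
}

overall = mean(weekly_revenue.values())
spread = pstdev(weekly_revenue.values())

# Group the stores by the prefix of their name (city or suburb).
groups = {}
for store, revenue in weekly_revenue.items():
    groups.setdefault(store.split("_")[0], []).append(revenue)
group_means = {name: mean(values) for name, values in groups.items()}

print(f"overall mean: {overall}, standard deviation: {spread:.1f}")
print(group_means)
```

The overall mean is 70, yet every store is at least 25 away from it. The group means (100 for the city stores, 40 for the suburban ones) tell a much more useful story.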

So how do you measure the performance of stores? Ranking them by a metric and comparing the best with the worst is often useless too, since they probably have different dynamics. For example, one store is in the city center while another is in a suburb, so they cater to different customer bases with vastly different purchase patterns and buying power. Now the question is how to find stores that have similar dynamics. This is where cluster analysis comes to help!

What is Cluster Analysis?

To put it in simple terms, cluster analysis is a method that groups objects according to their similarities. These similarities are measured by predefined attributes such as the shopping behavior of customers (market basket size), sales performance (revenue per week), price preferences (average sales price), store traffic, etc.

If the case is customer segmentation, then customers will be your objects. Your possible attributes will be all the information you can gather about customers (demographics, shopping frequency, buying history, etc.). After you select the relevant attributes and a clustering method, the analysis can be applied to the data.
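To make this concrete, here is a minimal sketch of one common clustering method, k-means, written with only the Python standard library. The store names, the two attributes, and their values are made up for illustration, and the choice of k-means with a farthest-point start is an assumption of this sketch, not a method prescribed here. A real project would more likely rely on an established library.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every attribute to zero mean and unit variance so none dominates."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points]


def k_means(points, k, max_iterations=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(map(mean, zip(*members))) if members else centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids)


# Hypothetical stores: (average basket size, weekly revenue in thousands).
stores = {
    "A": (12, 95), "B": (14, 105), "C": (13, 100),
    "D": (5, 40), "E": (6, 35), "F": (4, 45),
}
labels = k_means(standardize(list(stores.values())), k=2)
segments = dict(zip(stores, labels))
print(segments)
```

Standardizing first matters because revenue is on a much larger scale than basket size and would otherwise dominate the distance. With this toy data, stores A to C land in one cluster and D to F in the other.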

Although applying clustering seems easy, the results will probably be unsatisfying or lead to wrong conclusions if some key points are ignored. Some of them are listed below.

1. Choosing an Appropriate Method

There are many clustering methods, and most of them require some parameters to be set in advance (for example, the number of clusters). The method also has to suit the problem and the data, so practitioners need to understand how each method works.

2. Irrelevant Attributes

Using an attribute that is not relevant to the problem increases the computation time of the analysis while lowering its quality.

3. Curse of Dimensionality

Even after you weed out irrelevant attributes, you might still have many attributes that you do not dare to leave out. Using a large number of attributes has several undesirable consequences, covered in points 4 to 6.
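One further effect, which is well known but not among the points listed in this article, is that distances themselves become less informative as attributes pile up: the nearest and farthest points end up almost equally far away. The sketch below illustrates this with uniformly random data. The function name, sample size, and seed are arbitrary choices made for the illustration.

```python
import random
from math import dist


def relative_contrast(dimensions, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dimensions)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dimensions)]
    distances = [dist(query, p) for p in points]
    return (max(distances) - min(distances)) / min(distances)


# The contrast shrinks as the number of attributes (dimensions) grows.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 2))
```

When the contrast is close to zero, "similar" and "dissimilar" objects are hard to tell apart by distance alone, which is one reason to keep the attribute set small.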

4. The Difficulty of Visualization

It is hard to create a comprehensible display when your data includes more than three attributes, since each attribute takes up a dimension.

5. Increasing Time of Computation

Each extra attribute increases the computation time.

6. Overfitting

As the number of attributes increases, learning systems tend to memorize the data. In this case, two very similar clusters might be generated when it would be better to treat them as a single cluster.

7. Outliers

Outliers are the Achilles’ heel of not only clustering but also other learning systems. They might cause either many clusters with few observations or a few clusters with dissimilar observations. The treatment of outliers has to be decided in advance: will you remove them, or will you use a different algorithm that is robust to outliers?
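If you choose to remove outliers, you first need a rule for spotting them. The sketch below uses the common interquartile range (IQR) rule; the rule, the threshold of 1.5, and the revenue figures are assumptions made for illustration, not a recommendation for every data set.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical weekly revenue per store (in thousands); one store is a flagship.
weekly_revenue = [38, 40, 42, 45, 47, 50, 52, 55, 400]
flagged = iqr_outliers(weekly_revenue)
print(flagged)
```

Here only the flagship store with a revenue of 400 is flagged. Whether such a store should be dropped, capped, or clustered separately remains a business decision.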

8. Evaluation of Results

Evaluating the results is at least as important as conducting the analysis. There are many statistical methods for evaluating clustering results, and the choice depends on the problem, the data, and the clustering method. The results also have to be evaluated from a business perspective.
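As one example of a statistical evaluation method, the sketch below computes the silhouette coefficient, which compares how close each object is to its own cluster with how close it is to the nearest other cluster. The choice of this measure and the toy data are assumptions made for illustration; the function expects at least two clusters.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient; values near 1 suggest compact, well-separated clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(mean([dist(p, q) for q in other])
                for other_label, other in clusters.items() if other_label != label)
        scores.append((b - a) / max(a, b))
    return mean(scores)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
sensible = silhouette_score(points, [0, 0, 1, 1])
scrambled = silhouette_score(points, [0, 1, 0, 1])
print(round(sensible, 2), round(scrambled, 2))
```

The sensible grouping scores close to 1, while the scrambled one scores below zero. A good score is still only half of the evaluation: the clusters also have to make sense for the business.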

Business Applications

In business applications, cluster analysis is generally used as a preprocessing step. It can be conducted before Assortment Planning, Promotion Planning, or Markdown Optimization.

Customer segmentation, which we discussed above, might be an initial step for a promotion planning project. By knowing the expectations of your customer segments, you can offer them customized promotions.

Another example is Store Clustering for assortment planning. Supplying the right products to the right customers is crucial for retail chains. Without store clustering, you cannot define the needs of your stores (or their customers). With it, you can provide a localized assortment plan for each cluster of stores.

For Store Clustering, you need to take a few key considerations into account. Deciding which features to use is critical. Store location, store type, and customer demographics are some of the attributes that can be used. However, in many cases, store attributes alone might not be sufficient. Clustering stores based on the sales performance of product categories is a well-known method for store clustering. In this method, stores are grouped separately for each product category according to the similarity of that category's sales trend.
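One way to compare the sales trends of a category across stores is a correlation-based distance, which ignores sales volume and looks only at the shape of the trend. The sketch below assumes Pearson correlation as the similarity measure; that choice, the function name, and the sales figures are illustrative assumptions rather than a description of any particular product.

```python
from math import sqrt
from statistics import mean


def trend_distance(series_a, series_b):
    """1 - Pearson correlation: near 0 for the same trend shape, up to 2 for opposite ones."""
    mean_a, mean_b = mean(series_a), mean(series_b)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(series_a, series_b))
    norm = sqrt(sum((a - mean_a) ** 2 for a in series_a)
                * sum((b - mean_b) ** 2 for b in series_b))
    return 1 - covariance / norm


# Hypothetical weekly unit sales of one product category in three stores.
ice_cream_sales = {
    "beach_store": [10, 20, 40, 80],
    "mall_store": [100, 200, 400, 800],  # ten times the volume, same trend
    "ski_store": [80, 40, 20, 10],       # opposite seasonality
}

same_trend = trend_distance(ice_cream_sales["beach_store"], ice_cream_sales["mall_store"])
opposite_trend = trend_distance(ice_cream_sales["beach_store"], ice_cream_sales["ski_store"])
print(round(same_trend, 3), round(opposite_trend, 3))
```

The beach and mall stores differ tenfold in volume but are at a distance of practically zero, so they would be grouped together for this category, while the ski store ends up far from both. Distances like these can then be fed to a clustering method that accepts pairwise distances.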

Beyond a Simple Clustering Analysis

Clustering is a tough problem with many details to consider. Since it is applied as an initial step, a misapplication can undermine every analysis that builds on it. Professional help can therefore quickly improve the quality of the analysis and the resulting plans. Here is a list of features that Solvoyo customers benefit from:

  • Selection of the best feature set and clustering method for the business case
  • Technically optimal analysis (Avoidance of curse of dimensionality, outliers, overfitting)
  • Easy integration of clustering with other tools (Assortment planning, markdown optimization)
  • Dynamic change of attributes by the user
  • 3D and Location-based visualization of clusters
  • Overriding of recommended clusters