Case study - Bill Cohorting Analysis of An Omnichannel Health-care Retailer

Case study - Bill Cohorting Analysis of An Omnichannel Health-care Retailer

  Apr 03, 2023 14:53:00  |    Joseph C V   #analytics #python #Azure #PowerBI

Company’s Profile and Objective

The client is a seller of pharmacy and wellness products with a chain of stores across India in several cities and towns. Apart from prescription drugs, they stock and sell products for beauty, wellness, and personal hygiene. Their stores are open round the clock. They sell FMCG and healthcare products on their website and mobile app, too.

The company has bagged an award for excellence in pharmacy in the past. The client is also awarded as the best omnichannel retailer and the most admired retailer in healthcare.

Since their business is expanding, they wanted to automate the analysis of top-selling products based on the bill value. These bill cohorts would help them make apt recommendations to the users and promote and sell the most bought-together products. The vision was to increase revenue and make genuine and data-driven recommendations instead of relying on random guessing or a salesman’s memory.

When the client onboarded the Logesys team, this billing cohort analysis with the voluminous data was a tough cookie. Also came other challenges like data duplicity in the bills.


Data Volume

With their rapidly growing business and a wide network of stores in India, we faced the challenge of sifting through their massive historical data. With data as bulky as 20 million records for 6 months, the job was complex.

Algorithm from Scratch

This was a one-of-a-kind problem. Although solutions exist in machine learning for bundling products for recos using Apriori, the idea here was to make suggestions based on bill buckets (cohorts). And to make it easy for the solution users to understand the recos.

Data Issues

Many bills have duplicate records for the same product when a customer buys it in multiple numbers. Instead of increasing the quality, the bills had repeated instances of the same product, to sum up the price. Some order ids were irrelevant as they were mapped to delivery charges.

The pharma client didn’t have any existing system as it was a new requirement.

Logesys team analyzed the existing system that included SQL Server as their database and proposed the following tools and technologies to design the solution:

Instead of using or tweaking an existing machine-learning algorithm, our team created a model in Python.

Data Pre-processing and Cleansing

Along with the picked bill number, idem id, category, sales (price), and quantity columns, we pulled the data from the SQL server at the order-item level. So, if an order has 2 packs of the same biscuits, we get 2 records.

We cleaned the data to eliminate null values and invalid item codes. This helps pull the right records and analyze the price buckets with the products that aren’t invalid, expired, nonexistent, or out of stock.

Another issue was duplicate product records if bought multiple times on the same bill. We removed the redundancy and adjusted the number of products accordingly.

If home delivered, the system levies a delivery fee and creates an identifier for it in the database, just like a product. We discarded such rows from our consideration.

All these tasks were done using PySpark in Azure Databricks.

Custom Model

We created a custom model in Python that creates various buckets of pricing. For instance, 1-250 bucks, 251 to 500 bucks, and so on in INR. Each bucket includes an amount range and the count of a product sold in that bracket. Thus, bill cohorting is done with products and their numbers within a bucket.


The reporting solution is done using Microsoft Power BI report. This powerful business intelligence tool can generate efficient and catchy reports and dashboards with voluminous data. Power BI reports come with multiple features that gel with large datasets easily.

The dashboard shows bill cohorts as mentioned above and the products falling under that bucket with their total sold numbers. On selecting any product (called parent product) from the product list, the dashboard shows 3 recommendations based on the most purchased together history.

In another section, the layout shows the total number of each recommended product in all the available cohorts for the selected parent product.


This report helps suggest various products to increase the bill value and push the revenue to a higher level.


The solution has the following benefits:

  • Promoting frequently-bought items together based on bill values,
  • Inhouse analysis of the products likely to sell together,
  • Increasing repeat purchases,
  • Selling products that may not be on top of a customer’s mind,
  • Pushing the bill values to a higher amount and increasing the revenue.


Logesys team completed the project in 3 weeks.

The project proved to be highly effective for the client. They have successfully identified high numbers of products for bill cohorts that sell together. This helped them to increase their revenue. The solution is expected to facilitate them shoot high sales in the coming years.

The dashboard is compatible with all kinds of devices and is mobile-friendly. So the users can access it anytime, anywhere.

Our client’s team is highly satisfied with the project’s outcome, and their appreciation speaks volumes about our performance.