Building A Product Recommendation Engine For An E-Commerce Store

The client is a leading manufacturing company with revenue of $4 billion+ and a presence in categories such as internal hard drives, USB flash drives, external hard drives, and external solid-state drives. It is one of the top 5 data storage device companies in the world.

The Challenges

Product management and product marketing had no ability to leverage data to design cross-sell and up-sell strategies.

The data was disparate, distributed, and siloed. Years of collected data had not been tapped to glean intelligence in a holistic manner.

Multiple teams in multiple geographical locations presented a significant scaling challenge for a unified recommendation engine.

There was no enterprise-grade compute and data management platform. All data and analytics workloads were run separately on local systems.

Our Design Approach

Data Discovery Session

End-to-end data discovery sessions with stakeholders covering Marketing, Analytics, E-Commerce, and individual Product Teams, to understand what data they were collecting and how they measure the success of any cross-selling or up-selling action they take.

Detailed Data Audit

Exhaustive audit of product catalogue and transaction data across 24 product categories and 3,500 product SKUs.

Top Products, Categories by Transactions

Databricks for Data Science

Azure Databricks clusters provide a unified, auto-scaling platform for running production ETL pipelines, ad-hoc analytics, and machine learning. Interactive clusters are used to analyze data collaboratively in interactive notebooks. Job clusters are used to run fast, robust automated workloads, which can be scheduled through the UI.

Data Transformation

The audit was followed by the development of an analytical data set that could be leveraged to derive associations between two SKUs. This analytical data set was further refined to identify associations both within and across categories.
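As an illustration of what such an analytical data set might look like, the sketch below counts how often two SKUs appear in the same order and flags each pair as "within" or "across" categories. The order data, SKU names, category names, and the `build_pair_table` helper are all hypothetical; the actual schema used in the project is not described here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: order id -> list of (sku, category)
orders = {
    "o1": [("HDD-1TB", "internal_hdd"), ("USB-64", "usb_flash")],
    "o2": [("HDD-1TB", "internal_hdd"), ("USB-64", "usb_flash"),
           ("SSD-500", "external_ssd")],
    "o3": [("HDD-1TB", "internal_hdd"), ("HDD-2TB", "internal_hdd")],
}

def build_pair_table(orders):
    """Count how often each unordered SKU pair shares an order."""
    pair_counts = Counter()
    category_of = {}
    for items in orders.values():
        for sku, cat in items:
            category_of[sku] = cat
        # De-duplicate SKUs within an order, then enumerate unordered pairs.
        skus = sorted({sku for sku, _ in items})
        for a, b in combinations(skus, 2):
            pair_counts[(a, b)] += 1

    rows = []
    for (a, b), n in sorted(pair_counts.items()):
        scope = "within" if category_of[a] == category_of[b] else "across"
        rows.append({"sku_a": a, "sku_b": b,
                     "orders_together": n, "scope": scope})
    return rows

pair_table = build_pair_table(orders)
for row in pair_table:
    print(row)
```

Keeping the within/across flag on every row lets the same table feed both cross-category (cross-sell) and same-category (up-sell) analyses.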

Data Science Experimentation

Three different association mining techniques were evaluated, keeping in mind future scalability for different regions and categories. The winning algorithm was validated out of sample for 2019 and 2020 to date.
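The three techniques that were evaluated are not named here, so the following is only a minimal sketch of the standard association measures (support, confidence, lift) together with a simple out-of-sample check: a rule is kept only if its lift exceeds a threshold on both the training baskets and a later holdout period. The baskets, the `pair_metrics` and `validated_rules` helpers, and the lift threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_metrics(baskets):
    """Return {(x, y): (support, confidence of x->y, lift)} for ordered pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    metrics = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            metrics[(x, y)] = (support, confidence, lift)
    return metrics

def validated_rules(train, holdout, min_lift=1.0):
    """Keep rules whose lift exceeds min_lift in both periods."""
    train_m = pair_metrics(train)
    holdout_m = pair_metrics(holdout)
    kept = {}
    for rule, (_, _, lift) in train_m.items():
        if lift > min_lift and rule in holdout_m and holdout_m[rule][2] > min_lift:
            kept[rule] = (lift, holdout_m[rule][2])
    return kept

# Hypothetical baskets: an earlier period for training, a later one as holdout.
train = [["A", "B"], ["A", "B"], ["A", "C"], ["B"], ["C"], ["A", "B", "C"]]
holdout = [["A", "B"], ["A", "B"], ["C"], ["A", "B", "C"]]

rules = validated_rules(train, holdout)
for rule, (train_lift, holdout_lift) in sorted(rules.items()):
    print(rule, round(train_lift, 3), round(holdout_lift, 3))
```

A lift above 1 means the two SKUs are bought together more often than their individual popularity would predict; requiring this in the holdout period guards against associations that appear only by chance in the training data.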


The algorithm was released as an endpoint in AWS and integrated with Tableau for visualization.

The Solution

View of product/category and SKU cross-sell options

Cross-sell Associations

The solution shows associations at the product category level and at the SKU level across categories. These associations are useful for identifying cross-sell opportunities. For example, how certain industrial sewing machines are associated with certain types of fax machines and printers, or how certain scanners are associated with certain types of personal printers.

Up-Sell Associations

The solution shows base products and their associations to a premier product within the same category. These associations help identify up-sell opportunities and how they can be positioned for a higher probability of lift. For example, standard-yield ink could be up-sold to high-yield ink, a single-cartridge pack could be up-sold to a 3-cartridge pack, or regular paper could be up-sold to premium paper.
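One way to separate up-sell candidates from other associations, sketched under stated assumptions: keep only associations where both SKUs sit in the same category and the associated SKU is priced higher than the base SKU. The catalogue entries, prices, lift values, and the `upsell_candidates` helper are hypothetical, and price is only a stand-in for whatever "premier" means in the real catalogue.

```python
# Hypothetical catalogue: SKU -> category and list price
catalog = {
    "INK-STD": {"category": "ink", "price": 20.0},
    "INK-HY": {"category": "ink", "price": 35.0},
    "PAPER-REG": {"category": "paper", "price": 8.0},
    "PAPER-PREM": {"category": "paper", "price": 15.0},
}

# Hypothetical scored associations: (base SKU, associated SKU, lift)
associations = [
    ("INK-STD", "INK-HY", 1.8),
    ("INK-STD", "PAPER-REG", 1.4),   # cross-category: a cross-sell, not an up-sell
    ("PAPER-REG", "PAPER-PREM", 1.3),
    ("INK-HY", "INK-STD", 1.8),      # cheaper target: not an up-sell
]

def upsell_candidates(associations, catalog):
    """Same-category associations that point to a higher-priced SKU."""
    out = []
    for base, other, lift in associations:
        b, o = catalog[base], catalog[other]
        if b["category"] == o["category"] and o["price"] > b["price"]:
            out.append((base, other, lift))
    # Strongest associations first.
    return sorted(out, key=lambda r: r[2], reverse=True)

upsells = upsell_candidates(associations, catalog)
for base, premium, lift in upsells:
    print(f"{base} -> {premium} (lift {lift})")
```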

Deployment architecture from the slides

Auto Refresh

Databricks is the main processing engine, with jobs scheduled for a daily data refresh and an AWS endpoint call to rescore the associations. The output is made available to Tableau after the daily refresh, and Tableau automatically refreshes the data in the dashboards through a daily scheduler.

Activation Enablement

Onboarding of the category management team through dashboard training for designing cross-sell offers and bundles for marketing campaigns.

SKU-to-SKU association table uploaded by the e-commerce team to the content management system every month for product page cross-sell and up-sell recommendations.
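A SKU-to-SKU table of this kind could, for example, be produced by keeping the top few recommendations per SKU and exporting them as CSV. The sketch below assumes a list of scored pairs as input; the column names, the `top_n_table` and `to_csv` helpers, and the choice of two recommendations per SKU are illustrative, not the format required by any particular content management system.

```python
import csv
import io
from collections import defaultdict

def top_n_table(scored_pairs, n=2):
    """Keep the n highest-scoring recommendations for each SKU."""
    by_sku = defaultdict(list)
    for sku, rec, score in scored_pairs:
        by_sku[sku].append((score, rec))

    rows = []
    for sku in sorted(by_sku):
        # Sort by score descending, then by SKU name for stable ties.
        ranked = sorted(by_sku[sku], key=lambda t: (-t[0], t[1]))[:n]
        for rank, (score, rec) in enumerate(ranked, start=1):
            rows.append({"sku": sku, "rank": rank,
                         "recommended_sku": rec, "score": score})
    return rows

def to_csv(rows):
    """Serialize the table to CSV text ready for upload."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["sku", "rank", "recommended_sku", "score"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical scored associations: (SKU, recommended SKU, score)
scored = [
    ("HDD-1TB", "USB-64", 1.5),
    ("HDD-1TB", "SSD-500", 2.0),
    ("HDD-1TB", "HDD-2TB", 1.1),
    ("USB-64", "HDD-1TB", 1.5),
]

table = top_n_table(scored, n=2)
csv_text = to_csv(table)
print(csv_text)
```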