At the start of the project, niologic clarified the existing data structure, the data volumes (billions of transactions), and the existing IT infrastructure. Use cases for data science and retail analytics were then prioritized together with the customer.
In workshops lasting several days, retail-specific questions on assortment optimization and compound purchases (products bought together) were then worked through with the new data scientists.
The general functionality of Hadoop and Spark was explained in a practical way. Algorithms were developed jointly using SparkR or SparkSQL, and the differences between Hive, Tez, and Spark were pointed out. YARN queues were also tested under load and tuned together.
Using SparkML, a ProductRank was created. It used a network analysis to quantify how strongly products are connected within the assortment, which made individual purchasing patterns and product clusters identifiable.
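The idea behind such a ranking can be illustrated with a minimal sketch. It is not the project's SparkML implementation: it assumes that ProductRank resembles a PageRank-style score on a co-purchase graph, where two products are linked whenever they appear in the same basket. The function names `co_purchase_graph` and `product_rank`, the damping factor, and the sample baskets are all hypothetical, and plain Python is used so the example runs without a cluster.

```python
from collections import defaultdict
from itertools import combinations


def co_purchase_graph(baskets):
    """Undirected weighted graph: edge weight = number of baskets containing both products."""
    graph = defaultdict(lambda: defaultdict(float))
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def product_rank(graph, products, damping=0.85, iterations=50):
    """PageRank-style power iteration over the weighted co-purchase graph (assumed method)."""
    n = len(products)
    rank = {p: 1.0 / n for p in products}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in products}
        dangling = 0.0
        for p in products:
            neighbours = graph.get(p, {})
            total = sum(neighbours.values())
            if total == 0:
                # Product never bought together with anything: spread its mass evenly later.
                dangling += rank[p]
                continue
            for q, weight in neighbours.items():
                new_rank[q] += damping * rank[p] * weight / total
        for p in products:
            new_rank[p] += damping * dangling / n
        rank = new_rank
    return rank


# Hypothetical sample baskets, for illustration only.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk", "cereal"],
    ["bread", "jam"],
]
products = sorted({p for basket in baskets for p in basket})
graph = co_purchase_graph(baskets)
ranks = product_rank(graph, products)
top_product = max(ranks, key=ranks.get)

for product in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{product}: {ranks[product]:.3f}")
```

In this toy data, the product that is bought together with the most other items receives the highest score. At the scale of billions of transactions, the same logic would have to run on a distributed engine rather than in a single Python process.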
As part of the assortment optimization, an analysis of cannibalization effects was also implemented using R and Spark, making similar and mutually cannibalizing products visible.
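The text does not state how cannibalization was measured. One common heuristic, shown here purely as an assumption, is to compare sales time series of products within the same category and flag pairs whose sales move strongly in opposite directions. The sketch below uses plain Python instead of R and Spark; the function names, the threshold of -0.7, the category mapping, and the weekly figures are hypothetical. A strong negative correlation only marks a candidate pair and does not prove cannibalization.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long series; 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cannibalization_candidates(weekly_sales, category, threshold=-0.7):
    """Return same-category product pairs whose sales correlation is at or below the threshold."""
    candidates = []
    for a, b in combinations(sorted(weekly_sales), 2):
        if category[a] != category[b]:
            continue  # only similar products (same category) are compared
        r = pearson(weekly_sales[a], weekly_sales[b])
        if r <= threshold:
            candidates.append((a, b, round(r, 3)))
    return candidates


# Hypothetical weekly unit sales, for illustration only.
weekly_sales = {
    "cola_classic": [100, 98, 70, 65, 60, 62],
    "cola_zero": [0, 5, 30, 36, 41, 38],
    "orange_juice": [40, 42, 41, 39, 40, 43],
    "butter": [20, 25, 30, 35, 40, 45],
}
category = {
    "cola_classic": "soft_drinks",
    "cola_zero": "soft_drinks",
    "orange_juice": "soft_drinks",
    "butter": "dairy",
}
candidates = cannibalization_candidates(weekly_sales, category)
print(candidates)
```

Here only the two cola variants are flagged: sales of the classic product fall as the new variant ramps up, while the unrelated juice and the product from another category are left out.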