Introducing an analytics platform for a management consulting firm

Our client, a global management consultancy, wanted a data platform that would let the data scientists in its consulting teams work efficiently. niologic implemented an analytics and data science platform tailored to the consultancy's specialists.

Challenge

The customer’s consulting projects are characterized by a short onboarding phase and fast results. The customer therefore needed an analytics platform that could handle today’s data volumes and at the same time allow self-service by analysts and data scientists.

Procedure

niologic and the customer first drew up a list of requirements together. The platform had to support MapReduce methods for data cleansing, high-performance in-memory analyses, and the execution of algorithms written in R and Python.
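The MapReduce pattern named in the requirements can be illustrated in plain Python. This is a minimal, single-process sketch, not the Spark jobs that ran on the platform: the record layout (customer ID, revenue) and the cleansing rules are hypothetical and chosen only to show the map, shuffle and reduce steps.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw layout: (customer_id, revenue), both as unparsed strings.
RawRow = tuple[str, str]


def map_clean(row: RawRow) -> Iterator[tuple[str, float]]:
    """Map step: normalise one raw row and emit a (key, value) pair, or nothing if it is malformed."""
    customer_id, revenue = row
    key = customer_id.strip().upper()
    try:
        value = float(revenue)  # float() tolerates surrounding whitespace
    except ValueError:
        return  # drop rows whose revenue is not a number
    if key:  # drop rows without an ID
        yield key, value


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle step: group all emitted values by key."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sum(key: str, values: list[float]) -> tuple[str, float]:
    """Reduce step: collapse duplicate keys into one cleaned record."""
    return key, sum(values)


def clean(rows: Iterable[RawRow]) -> dict[str, float]:
    """Run map, shuffle and reduce over the raw rows."""
    mapped = (pair for row in rows for pair in map_clean(row))
    return dict(reduce_sum(key, values) for key, values in shuffle(mapped).items())
```

For example, `clean([(" c1 ", "10.5"), ("C1", "4.5"), ("c2", "n/a"), ("", "3")])` merges the two spellings of `C1` into one record and drops the row with an unparsable revenue and the row without an ID. On a cluster, Spark distributes the same three phases across worker nodes.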

Dataiku was selected as the provider after a comparison of several cloud-based diagnostic analytics solutions. niologic qualified Dataiku's Data Science Studio (DSS) and introduced it at the customer, while Dataiku provided the software training for the customer's staff.

To build the platform, niologic combined Apache Spark 2.x (running on Google Dataproc) with in-memory analytics on Google BigQuery for data warehousing. The Dataiku software suite was installed on a Kubernetes cluster; niologic created multiple Docker images and set up custom logging of the data jobs based on fluentd and Google Stackdriver. A connected data lake on Google Cloud Storage keeps data processing scalable.
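The case study does not describe the logging setup in detail. A common approach, assumed here, is for each data job to write one JSON object per line to stdout, which a fluentd agent on the Kubernetes nodes tails and forwards to Stackdriver. The sketch below uses only Python's standard `logging` module; the field names, including the hypothetical `job_id`, are illustrative, and how fields such as `severity` are mapped depends on the fluentd configuration actually in use.

```python
import json
import logging
import sys
from typing import Optional, TextIO


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # Assumption: the downstream fluentd/Stackdriver setup maps this field to log severity.
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            # Hypothetical job metadata, attached via `extra=` so alerts can filter per job.
            "job_id": getattr(record, "job_id", None),
        }
        return json.dumps(entry)


def make_job_logger(name: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stdout by default)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace, so repeated calls do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's plain-text handlers
    return logger
```

A job would then log with, for example, `make_job_logger("dss.jobs").warning("partition skipped", extra={"job_id": "job-42"})`, producing one line that a log pipeline can parse without regular expressions.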

Result and customer value

Within just a few days, the customer's employees were able to run Big Data analyses as easily as working in a spreadsheet. At the same time, employees received additional support from the multipliers (internal key users), thanks to adapted logging and alerting. This demonstrated the value of the custom logging built on fluentd and Google Stackdriver.

Converting the data to compressed, column-oriented formats increased the performance of Spark in-memory processing by a factor of 100. Multi-tenant capability was achieved through, among other measures, horizontal scaling and an optimized Spark configuration. niologic was also able to give Dataiku important suggestions for further optimizing its software; one result is that Dataiku now supports Kubernetes well.
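Why column-oriented, compressed formats help can be seen in a toy model. This sketch stores each column as JSON compressed with zlib purely for illustration; it is not Parquet or ORC, and the case study does not say which format was used. Two effects show up: values of one column sit next to each other, so repetitive columns compress well, and a query that needs one column decompresses only that column.

```python
import json
import zlib


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row records into one list per column (column-oriented layout)."""
    columns: dict[str, list] = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


def compress_columns(columns: dict[str, list]) -> dict[str, bytes]:
    """Compress each column separately, loosely mirroring per-column compression in columnar formats."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(store: dict[str, bytes], name: str) -> list:
    """Column pruning: decompress only the column a query actually needs."""
    return json.loads(zlib.decompress(store[name]))
```

With 1,000 hypothetical rows such as `{"region": "EMEA", "year": 2018, "revenue": ...}`, the `region` column shrinks to a small fraction of its uncompressed size, and reading it leaves the `revenue` bytes untouched. In Spark itself, the equivalent step would be writing DataFrames as Parquet, for example with `df.write.parquet(path, compression="snappy")`.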

The customer thus gained both a technically stable platform and an agile solution adapted to its consulting processes. Consultants can process client data within a short time and add further data through their own uploads (Big Data Enablement).

Close cooperation with the customer's IT security team made it possible to put numerous measures in place to safeguard data security and control access to the systems.