About the company

Leroy Merlin of GROUPE ADEO is an international home improvement and gardening retailer. Our retail chain in Russia includes 75 stores, and the hypermarket in Krasnogorsk is the global leader in terms of turnover and number of customers.

Company goal

A major goal for Leroy Merlin is to build a platform for managing today’s decentralized data flows. These flows span over 100 different databases covering web analytics, product data, and consumer shopping cart data.

Two main requirements for the platform:

  • Scalability at all levels.
  • Capacity to evolve into a hybrid solution.

Once the platform is built, we’ll launch predictive analytics to bring together data from completely different sources, both internal and external.

At the beginning of the project, we couldn’t accurately estimate the total data volume, which is why we decided to launch a pilot in the cloud.

Solution

For the project, we used Yandex Compute to create a fleet of virtual machines and Yandex Object Storage to provide scalable storage.

To implement the project, we had to integrate multiple data sources, bring the data together in a scalable database, and start running analytics. The implemented solution is as follows:

  • NiFi — Greenplum — Kafka.
  • Write-ahead logging in Kafka.
  • Data flow to Source 1.

To deploy a massively parallel database, we went with Greenplum, an open-source MPP DBMS. For data transport, we chose Kafka and NiFi. We chose Yandex Cloud because, before the project started, our contractors tested it and showed that we could build a cluster that meets our requirements without substantial performance degradation.

At the beginning of 2019, the system’s core was a Greenplum cluster of seven nodes: two hosts with 12 vCPUs and 72 GB of RAM, and five hosts with 32 vCPUs, 256 GB of RAM, and 5 TB of SSD storage.
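To put the cluster size in perspective, the aggregate compute and memory can be worked out from the figures above. The sketch below is illustrative only: it assumes the vCPU and RAM figures are per host, and the `HostGroup` type and `cluster_totals` function are hypothetical names introduced for this example, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostGroup:
    """A group of identically sized hosts (hypothetical helper type)."""
    count: int
    vcpus: int    # vCPUs per host (assumed to be a per-host figure)
    ram_gb: int   # RAM per host in GB (assumed to be a per-host figure)


# Figures taken from the text: 2 smaller hosts and 5 larger hosts.
GROUPS = [
    HostGroup(count=2, vcpus=12, ram_gb=72),
    HostGroup(count=5, vcpus=32, ram_gb=256),
]


def cluster_totals(groups):
    """Return (hosts, total vCPUs, total RAM in GB) across all groups."""
    hosts = sum(g.count for g in groups)
    vcpus = sum(g.count * g.vcpus for g in groups)
    ram_gb = sum(g.count * g.ram_gb for g in groups)
    return hosts, vcpus, ram_gb


hosts, vcpus, ram_gb = cluster_totals(GROUPS)
print(f"{hosts} hosts, {vcpus} vCPUs, {ram_gb} GB RAM")
```

Under these assumptions, the seven nodes add up to 184 vCPUs and 1,424 GB of RAM.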

Results

The main results of the first stage were a deployed cluster that could accept 12.5 TB of uncompressed data and the launch of the Hadoop and S3 workbenches along with Spark processing.

In practice, this minimized time costs:

  • Adding Greenplum nodes in a single click.
  • Creating a Greenplum/Spark/Hadoop sandbox in 10 minutes.

In the near future, we plan to expand the data volume to 70 TB.

During project implementation, we formulated the indispensable rules for a business team in the digital age:

  • Each business unit owns its data.
  • The data owner is in charge of keeping the data available to the business in real time.
  • The data owner is responsible for managing and describing data.
  • Data operations cost money.

End-user billing might be introduced at the first stage, but it shouldn’t restrict user access to the digital platform. At this stage, it’s important to understand the logic of cost estimation and, later on, how usage costs are allocated across business uses.

Opinion

Dmitry Shostko,
Chief Data Officer at Leroy Merlin East

We chose Yandex Cloud for its maturity, potential, and opportunities for collaborative growth. Yandex Cloud experts made every effort to provide maximum operability with the performance required by a particular Leroy Merlin platform component under the pilot project. Today’s service usage is only 5% of our future consumption. The cloud is our chance to get the scalability gene and embed it in the DNA of our business.