Yandex Data Proc

Yandex Data Proc helps you deploy Apache Hadoop®* and Apache Spark™ clusters in the Yandex.Cloud infrastructure.
You can control the cluster size, node capacity, and the set of Apache® services
(Spark, HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, Tez, Zeppelin).
Apache Hadoop is used for storing and analyzing structured and unstructured big data.
Apache Spark is a tool for fast data processing that can be integrated with Apache Hadoop as well as with other storage systems.
The Yandex.Cloud infrastructure is protected in accordance with Federal Law No. 152-FZ.
Virtual machines run on 2nd Gen Intel® Xeon® Gold processors.
  • Fast operations with clusters
    Creating a cluster takes just a few minutes. You don’t need to create hosts, configure them, install packages, or join multiple hosts into a cluster; the service does all of this automatically. Later, you can change the number of hosts in the cluster or their computing resources.
  • Flexible configuration of each cluster
    You will have full control over the cluster with root access to all the hosts. You can install just the Hadoop services you need, upload your applications, and change cluster configuration whenever you need to.
  • Elasticity
    Add new hosts to the cluster dynamically to increase its capacity, and pay only for the time they are used. Store data in Yandex Object Storage and delete unused hosts to save money on computing resources (see the sketch after this list).
  • Storage type selection
    You can choose the data storage type for each cluster individually. We offer two options: standard network storage, which costs less, and fast network storage, which delivers higher performance.
  • Isolation and encryption
    In Yandex Data Proc, the data of different Yandex.Cloud clients is fully isolated. Clusters do not use any shared components, so no one else can access the data you upload. You can also configure encryption for your cluster if necessary.
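
For example, a Spark job running on a Data Proc cluster can persist its data to Object Storage through the S3-compatible s3a connector before unused hosts are deleted. The following minimal PySpark sketch illustrates the idea; the endpoint, bucket name, credentials, and input path are placeholders, and the cluster is assumed to have the Hadoop s3a libraries available.

    from pyspark.sql import SparkSession

    # Minimal sketch: persist cluster data to Yandex Object Storage through
    # the S3-compatible s3a connector so that compute hosts can be removed
    # later. The endpoint, bucket name, and credentials are placeholders.
    spark = (
        SparkSession.builder
        .appName("persist-to-object-storage")
        .config("spark.hadoop.fs.s3a.endpoint", "storage.yandexcloud.net")
        .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY_ID>")
        .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_ACCESS_KEY>")
        .getOrCreate()
    )

    # Copy a dataset from the cluster's HDFS into Object Storage as Parquet,
    # so the data survives after unused hosts are deleted.
    events = spark.read.json("hdfs:///data/events")
    events.write.mode("overwrite").parquet("s3a://my-bucket/events/")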

Use cases

  • Create an infrastructure for event analysis based on a Hadoop cluster. Use analytics tools to categorize events and identify patterns and trends (see the first sketch after this list).

  • Build an infrastructure based on Apache Spark and process data feeds online. Create metrics and save the necessary data slices by integrating Yandex Data Proc with Yandex Object Storage (see the second sketch after this list).

  • Use tools like Apache Oozie™ to describe data flows into Yandex Data Proc clusters and the rules for processing them, so you can build data marts and business metrics automatically.
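
As an illustration of the first use case, the hypothetical PySpark sketch below categorizes raw events stored in HDFS with simple rules and aggregates them to surface trends. The input path, field names, and categories are assumptions for the example, not part of the service.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("event-categorization").getOrCreate()

    # Load raw events from the cluster's HDFS (the path and schema are assumed).
    events = spark.read.json("hdfs:///data/raw_events")

    # Categorize each event with simple rules, then count events per day and
    # category to reveal patterns and trends.
    categorized = events.withColumn(
        "category",
        F.when(F.col("url").contains("/checkout"), "purchase")
         .when(F.col("url").contains("/search"), "search")
         .otherwise("browse"),
    )
    trends = (
        categorized
        .groupBy("event_date", "category")
        .count()
        .orderBy("event_date", "category")
    )
    trends.show()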
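
For the second use case, a Spark Structured Streaming job might read an event feed, compute per-minute metrics, and save the slices to Object Storage. The sketch below assumes a Kafka source; the broker address, topic, bucket, and checkpoint path are placeholders, and the Kafka connector package must be available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("stream-metrics").getOrCreate()

    # Read the event feed from Kafka (broker address and topic are placeholders).
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "events")
        .load()
    )

    # Keep the event timestamp and payload, then compute per-minute counts.
    events = raw.select(
        F.col("timestamp"),
        F.col("value").cast("string").alias("payload"),
    )
    metrics = (
        events
        .withWatermark("timestamp", "10 minutes")
        .groupBy(F.window("timestamp", "1 minute"))
        .count()
    )

    # Persist the metric slices to Object Storage as Parquet files.
    query = (
        metrics.writeStream
        .outputMode("append")
        .format("parquet")
        .option("path", "s3a://my-bucket/metrics/per-minute")
        .option("checkpointLocation", "hdfs:///checkpoints/per-minute")
        .start()
    )
    query.awaitTermination()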

Try Yandex Data Proc:

Get started

  * Apache®, Apache Hadoop®, Apache Spark™, and Apache Oozie™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.