Yandex Data Proc
(Spark, HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, Tez, Zeppelin).
Apache Spark is a fast data processing engine that can be integrated with Apache Hadoop as well as with other storage systems.
- Fast operations with clusters: Creating a cluster takes just a few minutes. You don’t need to think about creating hosts, configuring, installing packages, or consolidating multiple hosts into a cluster, because the service does all this automatically. Later you will be able to change the number of hosts or the computing resources of hosts in the cluster.
- Flexible configuration of each cluster: You will have full control over the cluster with root access to all the hosts. You can install just the Hadoop services you need, upload your applications, and change cluster configuration whenever you need to.
- Elasticity: Add new hosts to the cluster dynamically to increase its capacity, and pay only for the amount of time they are used. Store data in Yandex Object Storage and delete unused hosts to save money on computing resources.
- Storage type selection: You can choose the data storage type for each cluster individually. We offer two options: standard network storage and fast network storage. The first option costs less, but the second option is faster.
- Isolation and encryption: In Yandex Data Proc, the data of each Yandex.Cloud client is completely isolated from that of other clients. Databases do not use any shared components, so no one else can access the data you have uploaded. You can also configure encryption for your cluster, if necessary.
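The pay-per-use point in the Elasticity item can be illustrated with a little arithmetic. The sketch below compares a cluster that adds hosts only for a peak period with one that keeps all hosts running all day. The hourly rate, host counts, and the names `HostUsage` and `cluster_cost` are hypothetical and do not reflect real Yandex Data Proc pricing.

```python
from dataclasses import dataclass


@dataclass
class HostUsage:
    """Billed usage of one host; all values are illustrative."""
    hours_used: float
    hourly_rate: float  # hypothetical price per host-hour, not a real tariff


def cluster_cost(usages):
    """Sum the cost of hosts that are billed only for the hours they ran."""
    return sum(u.hours_used * u.hourly_rate for u in usages)


RATE = 0.5  # hypothetical price per host-hour

# Two hosts run for a full 24-hour day; three more are added for a 4-hour peak.
elastic = [HostUsage(24, RATE)] * 2 + [HostUsage(4, RATE)] * 3

# The same five hosts kept running for the whole day.
fixed = [HostUsage(24, RATE)] * 5

print(cluster_cost(elastic))  # 30.0
print(cluster_cost(fixed))    # 60.0
```

Under these assumed numbers, removing the extra hosts after the peak halves the daily compute cost. The data itself would remain available if it is kept in Yandex Object Storage rather than on the deleted hosts.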
Create an infrastructure for event analysis based on a Hadoop cluster. Use analytics tools to categorize events and identify patterns and trends.
Build an infrastructure based on Apache Spark and process data feeds online. Create metrics and save the necessary data slices by integrating Yandex Data Proc with Yandex Object Storage.
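As a rough illustration of this scenario, the PySpark sketch below reads one day of events from a bucket, computes per-type metrics, and writes the resulting slice back to Object Storage. It is a simplified batch job rather than a streaming pipeline. It assumes the cluster is already configured to reach the bucket through `s3a://` URIs. The bucket layout, the `event_type` and `user_id` columns, and the names `slice_path` and `run_job` are all hypothetical.

```python
def slice_path(bucket, day):
    """Build the s3a:// URI for one daily slice (hypothetical layout)."""
    return f"s3a://{bucket}/slices/dt={day}"


def run_job(bucket, day):
    """Aggregate one day of events and save the result as Parquet."""
    # Imported here so that slice_path stays usable without a Spark installation.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("event-metrics").getOrCreate()

    # Assumed input: JSON events with event_type and user_id fields.
    events = spark.read.json(f"s3a://{bucket}/events/dt={day}")

    # Metrics per event type: total events and distinct users.
    metrics = events.groupBy("event_type").agg(
        F.count("*").alias("events"),
        F.countDistinct("user_id").alias("users"),
    )

    # Overwrite the slice so that reruns for the same day are idempotent.
    metrics.write.mode("overwrite").parquet(slice_path(bucket, day))
    spark.stop()
```

A job like this would typically be submitted to the cluster with a call such as `run_job("my-bucket", "2024-01-01")`, where the bucket name and date are placeholders.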
Tools like Apache Oozie™ work well for describing the data flows into Yandex Data Proc clusters and the rules for processing them. You can automatically build data marts and business metrics.
- *Apache®, Apache Hadoop®, Apache Spark™ and Apache Oozie™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.