Relationships between Data Proc service resources

    Data Proc lets you store and process data in a distributed manner using Apache Hadoop ecosystem services.

    The main entity used in the service is a cluster. It groups together all the resources available in Hadoop, including computing and storage capabilities.

    Each cluster consists of subclusters. A subcluster groups hosts that perform the same function:

    • A subcluster with master hosts (for example, NameNode for HDFS or ResourceManager for YARN).

      Each cluster may have only one subcluster with master hosts.

    • Subclusters for data storage (for example, DataNode for HDFS).

    • Subclusters for data processing (for example, NodeManager for YARN).

    All subclusters of a cluster must reside in the same cloud network and availability zone. Learn more about Yandex.Cloud geography.
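
    The rules above (one subcluster with master hosts per cluster, a shared cloud network, a shared availability zone) can be illustrated with a small model. This is only a sketch for reasoning about cluster layout, not the Data Proc API: the class names, field names, role labels, and the network, zone, and host class values are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    # Hypothetical role labels, named after the functions listed above.
    MASTER = "master"    # e.g. NameNode, ResourceManager
    DATA = "data"        # e.g. DataNode
    COMPUTE = "compute"  # e.g. NodeManager


@dataclass
class Subcluster:
    name: str
    role: Role
    network_id: str   # hypothetical identifier of the cloud network
    zone: str         # hypothetical identifier of the availability zone
    host_class: str   # hypothetical host class name
    hosts_count: int = 1


@dataclass
class Cluster:
    name: str
    subclusters: List[Subcluster] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return the list of violated rules (empty if the layout is valid)."""
        errors = []
        masters = [s for s in self.subclusters if s.role is Role.MASTER]
        if len(masters) > 1:
            errors.append("more than one subcluster with master hosts")
        if len({s.network_id for s in self.subclusters}) > 1:
            errors.append("subclusters are in different cloud networks")
        if len({s.zone for s in self.subclusters}) > 1:
            errors.append("subclusters are in different availability zones")
        return errors


# Example layout: the compute subcluster is placed in a different zone.
cluster = Cluster("demo", [
    Subcluster("main", Role.MASTER, "net-1", "zone-a", "class-small"),
    Subcluster("storage", Role.DATA, "net-1", "zone-a", "class-medium", 3),
    Subcluster("workers", Role.COMPUTE, "net-1", "zone-b", "class-medium", 2),
])
print(cluster.validate())
# prints ['subclusters are in different availability zones']
```

    Moving the "workers" subcluster to "zone-a" makes validate() return an empty list. The model checks only the constraints stated on this page; the service itself may enforce additional ones.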

    Hosts in each subcluster are created with the computing resources that correspond to the specified host class. For a list of available host classes and their characteristics, see Host classes.

    For information about network configuration and network access to clusters, see Networks, clusters, and subclusters.