Connecting Yandex Data Proc to Metastore
Note
To use a Metastore cluster, the Yandex Data Proc cluster must include the following components:
- SPARK
- YARN
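The component requirement in the note can be expressed as a simple preflight check. The following Python sketch is illustrative only: the function name `missing_components` and the sample component list are hypothetical. It assumes you already have the cluster's component names as strings (for example, copied from the cluster page) and compares them case-insensitively against SPARK and YARN.

```python
# Components the note above requires on the Data Proc cluster.
REQUIRED_COMPONENTS = {"SPARK", "YARN"}


def missing_components(cluster_components):
    """Return the required components absent from the given list, sorted alphabetically."""
    # Normalize names so "spark" and "SPARK" are treated the same.
    present = {name.strip().upper() for name in cluster_components}
    return sorted(REQUIRED_COMPONENTS - present)


if __name__ == "__main__":
    # Hypothetical component list for an existing cluster.
    print(missing_components(["HDFS", "SPARK"]))  # ['YARN']
```

An empty result means the cluster has both components.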
When creating or updating a Yandex Data Proc cluster, specify the following property:
spark:spark.hive.metastore.uris : thrift://<Metastore_cluster_IP_address>:9083
To find the Metastore cluster IP address, open Data Proc in the management console and select the Metastore page in the left-hand panel. The cluster IP address is shown under General information.
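The next Python sketch shows how the property value is assembled. It builds the key/value pair from a Metastore IP address and rejects strings that are not IP addresses. The property key, the `thrift://` scheme, and port 9083 come from the text above. The function name `metastore_property` and the sample address `10.128.0.15` are hypothetical. IPv6 literals are wrapped in brackets, as URI syntax requires; whether your Metastore cluster uses IPv6 at all is not covered here.

```python
import ipaddress

METASTORE_PORT = 9083  # Thrift port used by Metastore, as stated above


def metastore_property(metastore_ip: str) -> tuple[str, str]:
    """Return the (key, value) pair of the Data Proc property that points Spark at Metastore."""
    # ip_address() raises ValueError for anything that is not a literal IPv4/IPv6 address.
    addr = ipaddress.ip_address(metastore_ip)
    # URIs require IPv6 literals in square brackets.
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return "spark:spark.hive.metastore.uris", f"thrift://{host}:{METASTORE_PORT}"


if __name__ == "__main__":
    key, value = metastore_property("10.128.0.15")  # hypothetical Metastore IP address
    print(f"{key} : {value}")
    # spark:spark.hive.metastore.uris : thrift://10.128.0.15:9083
```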
If the Metastore cluster and the Yandex Data Proc cluster are hosted in different cloud networks, set up routing between these networks so that the Metastore subnet is reachable from the Yandex Data Proc subnet.
Routing can be configured in several ways; for example, you can create an IPsec tunnel.
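One way to confirm that routing works is to try opening a TCP connection to port 9083 from a host in the Data Proc subnet. The Python sketch that follows assumes you can run Python on such a host (for example, on a cluster node you have SSH access to). The function name `metastore_reachable` is hypothetical. A successful connection only shows that the Thrift port is reachable at the TCP level. A failure may be caused by routing or by security group rules, and the function does not tell you which.

```python
import socket


def metastore_reachable(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Try to open a TCP connection to the Metastore Thrift port; return True on success."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, `metastore_reachable("<Metastore_cluster_IP_address>")` returns `True` when the port is reachable and `False` after the timeout or on a refused connection.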
If the cloud network uses security groups, configure the security group of the Yandex Data Proc cluster to work with Metastore. To do this, add the following rule for outgoing traffic:
- Port range: 9083
- Protocol: Any
- Source: CIDR
- CIDR blocks: 0.0.0.0/0
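The following Python sketch shows how the outgoing rule matches connections to Metastore. It is a deliberately simplified model and not Yandex Cloud's actual rule evaluation logic. The class `EgressRule` and its fields are hypothetical. The model treats the CIDR blocks as the range of addresses that outgoing traffic may be sent to, and `Any` as matching every protocol.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """Simplified, hypothetical model of an outgoing security group rule."""
    port_from: int
    port_to: int
    protocol: str                 # "ANY", "TCP", "UDP", ...
    cidr_blocks: tuple[str, ...]

    def allows(self, dest_ip: str, dest_port: int, protocol: str = "TCP") -> bool:
        # "ANY" matches every protocol; otherwise the names must be equal.
        if self.protocol != "ANY" and self.protocol != protocol.upper():
            return False
        if not self.port_from <= dest_port <= self.port_to:
            return False
        addr = ipaddress.ip_address(dest_ip)
        # Membership is False (not an error) when IP versions differ.
        return any(addr in ipaddress.ip_network(block) for block in self.cidr_blocks)


# The rule from the text: port 9083, any protocol, CIDR block 0.0.0.0/0.
METASTORE_RULE = EgressRule(9083, 9083, "ANY", ("0.0.0.0/0",))

if __name__ == "__main__":
    print(METASTORE_RULE.allows("10.128.0.15", 9083))  # True
    print(METASTORE_RULE.allows("10.128.0.15", 9084))  # False
```

Because 0.0.0.0/0 covers every IPv4 address, the rule permits outgoing traffic on port 9083 to any Metastore IPv4 address. A narrower CIDR block, such as the Metastore subnet, would restrict it further.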
For an example of using Yandex Data Proc with a connected Metastore cluster, see the Shared use of tables through Metastore tutorial.