Configuring networks for Data Proc
To give Data Proc clusters access to resources outside their VPC virtual network, assign public IP addresses to the clusters. If you don't want to use public IP addresses, you can set up egress NAT (Network Address Translation) for the subnet instead.
In this tutorial, you'll learn how to create a Data Proc cluster and set up subnets and a VM (a NAT instance).
To enable egress NAT for a Data Proc cluster:
- Prepare the infrastructure.
- Set up NAT for the Data Proc cluster.
If you no longer need the resources you created, delete them.
Prepare the infrastructure
You have to create:
- Network.
- Subnet for your Data Proc cluster.
- Subnet for the NAT instance.
- Security groups and rules for the cluster and NAT instance.
- Service account for the cluster.
- Cluster.
- NAT instance.
Create a network and subnet for your Data Proc cluster with egress NAT
- Create a network named network-data-proc.
- In network-data-proc, create a subnet with the following parameters:
  - Name: subnet-cluster.
  - Zone: ru-central1-a.
  - CIDR: 192.168.1.0/24.
  - Advanced settings: Enable Egress NAT.
    Note
    This setting can only be enabled in the management console.
- Save the IDs of network-data-proc and subnet-cluster, as you'll need them later.
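This tutorial uses two address ranges in the same network: 192.168.1.0/24 for the cluster subnet and 192.168.100.0/24 for the NAT instance subnet created in the next section. Subnet ranges inside one network are generally expected not to overlap, so if you pick different ranges, it may help to check them first. The following is a minimal sketch of such a check in plain shell arithmetic; the function names are illustrative and not part of any cloud tooling.

```shell
#!/bin/sh
# Sketch: check that two IPv4 CIDR blocks do not overlap.
# All function names here are illustrative, not part of any cloud CLI.

# Convert a dotted IPv4 address (for example, 192.168.1.0) to an integer.
ip_to_int() {
  old_ifs="$IFS"
  IFS=.
  set -- $1          # split the address into its four octets
  IFS="$old_ifs"
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as two integers.
cidr_range() {
  addr="${1%/*}"     # part before the slash
  bits="${1#*/}"     # prefix length after the slash
  size=$(( 1 << (32 - bits) ))
  base="$(ip_to_int "$addr")"
  start=$(( base / size * size ))   # align to the start of the block
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit status 0) if the two CIDR blocks share any address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # Now: $1/$2 = first/last of block one, $3/$4 = first/last of block two.
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

if cidrs_overlap 192.168.1.0/24 192.168.100.0/24; then
  echo "overlap"
else
  echo "no overlap"
fi
```

For the two ranges used in this tutorial, the script prints "no overlap".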
Create the other resources
- In network-data-proc, create a subnet with the following parameters:
  - Name: subnet-nat.
  - Zone: ru-central1-b.
  - CIDR: 192.168.100.0/24.
  You don't need to enable Egress NAT for this subnet.
- Create and configure security groups for the Data Proc cluster.
- Create a security group for the NAT instance.
- In the security group for the NAT instance, create the following rules:
  For incoming traffic:
  - A rule that allows all traffic from the Data Proc cluster's security group:
    - Port range: 0-65535.
    - Protocol: Any.
    - Source: Security group.
    - Security group: From list. Select the Data Proc cluster security group.
  - A rule allowing an SSH connection to the NAT instance over the internet:
    - Port range: 22.
    - Protocol: TCP.
    - Source: CIDR.
    - CIDR blocks: 0.0.0.0/0.
  For outgoing traffic:
  - A rule allowing all egress traffic:
    - Port range: 0-65535.
    - Protocol: Any.
    - Destination: CIDR.
    - CIDR blocks: 0.0.0.0/0.
- Create a service account with the following roles:
- Create a Data Proc cluster of any suitable configuration with the following settings:
  - Service account: Select the service account you created previously.
  - Bucket ID format: List.
  - Bucket name: Select a previously created bucket.
  - Network: network-data-proc.
  - Security groups: Select the previously created security groups.
- In the network-data-proc network, create a VM from the NAT instance image with a public IP address. Specify the security groups that you configured previously.
- Go to the NAT instance properties and copy the VM's IP address.
- In the network-data-proc network, create a routing table named route-table-nat and add a static route to it:
  - Destination prefix: 0.0.0.0/0.
  - Next hop: The internal IP address of the NAT instance.
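If you prefer the command line to the management console, the subnet-nat and route-table-nat steps above could be sketched with the yc CLI roughly as follows. This is an assumption, not part of the tutorial: the flags shown are taken from memory of the yc CLI and should be checked against `yc vpc subnet create --help` and `yc vpc route-table create --help` for your version. The `run` helper and `DRY_RUN` variable are illustrative names; by default the sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: the subnet-nat and route-table-nat steps as yc CLI calls.
# Assumption: the yc flags below match your CLI version; verify with --help.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the subnet for the NAT instance with the tutorial's parameters.
create_nat_subnet() {
  run yc vpc subnet create \
    --name subnet-nat \
    --zone ru-central1-b \
    --range 192.168.100.0/24 \
    --network-name network-data-proc
}

# Create the routing table with a default route through the NAT instance.
# $1: internal IP address of the NAT instance (the next hop).
create_nat_route_table() {
  if [ -z "${1:-}" ]; then
    echo "usage: create_nat_route_table <NAT instance internal IP>" >&2
    return 1
  fi
  run yc vpc route-table create \
    --name route-table-nat \
    --network-name network-data-proc \
    --route "destination=0.0.0.0/0,next-hop=$1"
}
```

For example, `create_nat_route_table 192.168.100.10` prints the route table command with a placeholder address; replace it with the address you copied from the NAT instance properties, and set DRY_RUN=0 only after reviewing the printed commands.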
To create the other resources using Terraform:
- If you don't have Terraform, set up and configure it by following the instructions.
- Download the file with the provider settings. Place it in a separate working directory and specify the parameter values.
- Download the cluster and NAT instance configuration file to the same working directory.
  The file describes:
  - Network.
  - Subnets.
  - Security groups.
  - Data Proc cluster.
  - Service account to access cloud resources.
  - NAT instance.
  - Bucket.
- In the configuration file, specify all the relevant parameters.
- Run the terraform init command in the working directory hosting the configuration files. This command initializes the provider specified in the configuration files and enables you to use the provider resources and data sources.
- Validate the Terraform configuration files using the following command:
    terraform validate
  If there are errors in the configuration files, Terraform points them out.
- Import the network and subnet that you created previously.
  Alert
  Don't use IDs of networks and subnets created outside of this tutorial: running terraform apply or terraform destroy will change or destroy them, respectively.
  Import the network-data-proc network:
    terraform import yandex_vpc_network.network-data-proc <network ID>
  Import the subnet-cluster subnet:
    terraform import yandex_vpc_subnet.subnet-cluster <subnet ID>
- Create the required infrastructure:
  - Run the command to view planned changes:
      terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
        terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can use the management console to check that the resources exist and have the correct settings.
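The Terraform steps above depend on their order: the import has to happen after init and before plan and apply. As a rough sketch, they could be chained in a small shell function so that a failure in any step stops the rest. The `deploy` and `run` names and the `DRY_RUN` variable are illustrative; the terraform commands and resource addresses are the ones used in this tutorial. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch: run the tutorial's Terraform steps in order, stopping at the
# first failure. DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: ID of network-data-proc, $2: ID of subnet-cluster.
deploy() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: deploy <network ID> <subnet ID>" >&2
    return 1
  fi
  run terraform init &&
  run terraform validate &&
  run terraform import yandex_vpc_network.network-data-proc "$1" &&
  run terraform import yandex_vpc_subnet.subnet-cluster "$2" &&
  run terraform plan &&
  run terraform apply   # prompts for confirmation before changing anything
}
```

Calling `deploy <network ID> <subnet ID>` with the IDs saved earlier prints the six commands in order. Requiring both IDs up front guards against starting an import with an empty argument.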
Set up NAT for the Data Proc cluster
- Connect to the NAT instance over SSH.
- To enable routing, add the following lines to the end of the /etc/sysctl.conf file:
    net.ipv4.ip_forward = 1
    net.ipv4.conf.all.accept_redirects = 1
    net.ipv4.conf.all.send_redirects = 1
- To enable the execution of /etc/rc.local at OS startup, run the commands:
    sudo systemctl enable rc-local && \
    sudo touch /etc/rc.local && \
    sudo chmod 755 /etc/rc.local
- Add the following code to the /etc/rc.local file:
    #!/bin/sh
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
- Reboot the NAT instance OS:
    sudo reboot -f
- Check that NAT is configured properly. To do this, reconnect to the NAT instance over SSH and run the command:
    curl ifconfig.co
  If the configuration is correct, the command outputs the public IP address of the NAT instance.
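Editing /etc/sysctl.conf and /etc/rc.local by hand works, but repeating the steps can leave duplicate lines behind. The following is a minimal sketch of the same two edits as shell functions that can be re-run safely. The function names and the SYSCTL_CONF and RC_LOCAL variables are illustrative; the lines written are the ones given in the steps above.

```shell
#!/bin/sh
# Sketch: the sysctl and rc.local edits as functions that are safe to re-run.
# SYSCTL_CONF and RC_LOCAL are illustrative variable names; point them at
# scratch files to try the functions without touching the system.
SYSCTL_CONF="${SYSCTL_CONF:-/etc/sysctl.conf}"
RC_LOCAL="${RC_LOCAL:-/etc/rc.local}"

# Append a line ($2) to a file ($1) unless that exact line is already there.
append_once() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Add the routing-related kernel settings from the tutorial.
configure_forwarding() {
  append_once "$SYSCTL_CONF" 'net.ipv4.ip_forward = 1'
  append_once "$SYSCTL_CONF" 'net.ipv4.conf.all.accept_redirects = 1'
  append_once "$SYSCTL_CONF" 'net.ipv4.conf.all.send_redirects = 1'
}

# Add the masquerading rule from the tutorial to the startup script.
configure_masquerade() {
  # Start the file with a shebang if it is missing or empty.
  [ -s "$RC_LOCAL" ] || printf '#!/bin/sh\n' > "$RC_LOCAL"
  append_once "$RC_LOCAL" 'iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE'
  chmod 755 "$RC_LOCAL"
}
```

To apply the changes on the NAT instance, call configure_forwarding and configure_masquerade from a script run as root. Enabling rc-local, rebooting, and checking the result with curl stay as described in the steps above.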
Delete the resources you created
- Delete the Data Proc cluster.
- Delete the VM.
- If you reserved public static IP addresses for the clusters, release and delete them.
- Delete the subnets.
- Delete the network.
To delete the infrastructure created with Terraform:
- In the terminal window, change to the directory containing the infrastructure plan.
- Delete the data-proc-nat.tf configuration file.
- Validate the Terraform configuration files using the following command:
    terraform validate
  If there are errors in the configuration files, Terraform points them out.
- Confirm the update of resources:
  - Run the command to view planned changes:
      terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
        terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
  All resources described in the configuration file will be deleted.