Working with Data Proc templates
Data Proc templates let you preset a cluster's configuration for your project and make it easier to deploy temporary clusters. You can find a list of templates on the project page under Project resources → Data Proc, on the Shared tab.
To use Data Proc clusters, set the following project parameters:
- Default folder to enable integration with other Yandex Cloud services. A Data Proc cluster will be deployed in this folder based on the current cloud quotas. A fee for using the cluster will be debited from your cloud billing account.
- Service account to be used by DataSphere for creating and managing clusters. The service account needs the following roles:
  - dataproc.agent to use Data Proc clusters.
  - dataproc.admin to create clusters from Data Proc templates.
  - vpc.user to use the Data Proc cluster network.
  - iam.serviceAccounts.user to create resources in the folder on behalf of the service account.
- Subnet for DataSphere to communicate with the Data Proc cluster. Since the Data Proc cluster needs to access the internet, make sure to configure a NAT gateway in the subnet.
Note
If you specify a subnet in the project settings, allocating computing resources may take longer.
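As a quick sanity check before creating templates, the service account roles listed above can be compared against the roles actually assigned. The sketch below is only an illustration: the `missing_roles` helper and the sample `assigned` list are hypothetical, and the assigned roles are assumed to be copied by hand from the service account's page rather than fetched through any API.

```python
# Roles the document lists for the service account, with the stated purpose of each.
REQUIRED_ROLES = {
    "dataproc.agent": "use Data Proc clusters",
    "dataproc.admin": "create clusters from Data Proc templates",
    "vpc.user": "use the Data Proc cluster network",
    "iam.serviceAccounts.user": "create resources in the folder on behalf of the service account",
}


def missing_roles(assigned_roles):
    """Return the required roles absent from `assigned_roles`, sorted by name."""
    return sorted(set(REQUIRED_ROLES) - set(assigned_roles))


# Hypothetical input: roles currently granted to the service account.
assigned = ["dataproc.agent", "vpc.user"]
for role in missing_roles(assigned):
    print(f"Missing {role}: needed to {REQUIRED_ROLES[role]}")
```

An empty result means all four roles are present. The check says nothing about the folder or subnet settings, which still need to be verified separately.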
Creating a Data Proc template
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- Under Project resources, click Data Proc.
- Click Create template.
- In the Template name field, enter a name for the template. The naming requirements are as follows:
  - The name must be from 3 to 63 characters long.
  - It may contain lowercase Latin letters, numbers, and hyphens.
  - The first character must be a letter and the last character cannot be a hyphen.
- Click Create. This will display the created template's info page.
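The naming requirements from the steps above can be checked before you type a name into the form. This is a minimal sketch, not part of DataSphere: the function name `is_valid_template_name` and the regular expression are assumptions derived only from the three rules listed, and the service may apply further checks.

```python
import re

# One leading lowercase letter, 1 to 61 middle characters (lowercase letters,
# digits, hyphens), and a final character that is not a hyphen: 3 to 63 in total.
TEMPLATE_NAME_RE = re.compile(r"[a-z][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_template_name(name: str) -> bool:
    """Return True if `name` satisfies the listed naming requirements."""
    return TEMPLATE_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_template_name("spark-etl-01")` returns `True`, while `"ab"` (too short), `"1cluster"` (starts with a digit), and `"cluster-"` (ends with a hyphen) are all rejected.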
Activating a Data Proc template
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- Under Project resources, click Data Proc.
- Click
A cluster based on the activated Data Proc template is created when you run your project in the IDE.
Sharing a Data Proc template
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- Under Project resources, click Data Proc.
- Select the appropriate template from the list.
- Go to the Access tab.
- Enable the visibility option next to the name of the community where you want to share the template.
To make a template available for use in another project, the project administrator should add it to the Shared tab.
Editing a template
You can only change the name of an existing template. To change the configuration, recreate the template.
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- Under Project resources, click Data Proc.
- Select the relevant template in the list, click
- Edit the name and click Save.
Deleting a Data Proc template
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- Under Project resources, click Data Proc.
- In the list, select the template to delete.
- Click
- Click Confirm.