Jobs in Data Proc
In a Data Proc cluster, you can create and run jobs. This allows you, for example, to regularly upload datasets from Object Storage buckets, use them in calculations, and generate analytics.
The following job types are supported: Hive, MapReduce, Spark, and PySpark.
When creating a job, specify:
- Arguments: Values used by the job's main executable file (see the sketch after this list).
- Properties: The `key:value` pairs that configure image components.
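For illustration, here is a minimal sketch of what a job's main executable file could look like as a PySpark application; the argument layout, column name, and data format are assumptions, not values from this documentation. The input and output URIs arrive as the job's arguments.

```python
import sys

from pyspark.sql import SparkSession


def main():
    # Hypothetical convention: the job's Arguments are the input and output URIs.
    input_uri, output_uri = sys.argv[1], sys.argv[2]

    spark = SparkSession.builder.appName("dataset-analytics").getOrCreate()

    # Read the source dataset from the read-only bucket.
    df = spark.read.csv(input_uri, header=True, inferSchema=True)

    # A placeholder computation: row counts per value of a hypothetical "category" column.
    result = df.groupBy("category").count()

    # Write the results to the read-write bucket.
    result.write.mode("overwrite").parquet(output_uri)
    spark.stop()


if __name__ == "__main__":
    main()
```

A Properties entry such as `spark.executor.memory:4g` (a standard Spark setting, given here only as an example) is the kind of `key:value` pair the Properties field carries.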
To place and start a job, use one of the following options:
- Use the Yandex.Cloud interfaces. For more information, see the basic examples of working with jobs.
- Connect directly to a cluster node and start the job there, as sketched below. For more information, see the example in Running jobs from remote hosts that are not part of a Data Proc cluster.
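Below is a minimal sketch of the second option, assuming a host with a configured Spark client that can reach the cluster; the script path, master URL, and arguments are assumptions, not the documented procedure.

```python
import subprocess

# Hypothetical paths and arguments; substitute your own values.
job_script = "s3a://data-bucket/jobs/analytics_job.py"
input_uri = "s3a://data-bucket/input/"
output_uri = "s3a://results-bucket/output/"

# Launch the job through spark-submit from the connected host.
subprocess.run(
    [
        "spark-submit",
        "--master", "yarn",          # run on the cluster's YARN resource manager
        "--deploy-mode", "cluster",  # the driver runs on a cluster node
        job_script,
        input_uri,
        output_uri,
    ],
    check=True,  # raise if spark-submit exits with a non-zero status
)
```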
For a job to run successfully:
- Grant the cluster service account access to the required Object Storage buckets (a sketch for checking access follows this list). It's recommended to use at least two buckets:
  - A bucket with read-only access rights for storing the source data and the files needed to run the job.
  - A bucket with read and write access rights for storing job execution results and logs. Specify this bucket when creating the cluster.
- When creating a job, pass all the files necessary for it to run.
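As a minimal sketch, assuming the boto3 library, hypothetical bucket names, and the service account's static access key in the environment, the access rights can be verified from any S3-compatible client; Yandex Object Storage exposes an S3-compatible API at storage.yandexcloud.net.

```python
import boto3

# Hypothetical bucket names; replace them with your own.
SOURCE_BUCKET = "jobs-source-data"   # read-only: source data and job files
RESULTS_BUCKET = "jobs-results"      # read and write: results and logs

# Object Storage speaks the S3 API; credentials are taken from the
# environment (e.g., AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
s3 = boto3.client("s3", endpoint_url="https://storage.yandexcloud.net")

# Read check: listing the source bucket should succeed.
s3.list_objects_v2(Bucket=SOURCE_BUCKET, MaxKeys=1)

# Write check: putting a small marker object should succeed.
s3.put_object(Bucket=RESULTS_BUCKET, Key="access-check.txt", Body=b"ok")
print("Both buckets are accessible.")
```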
If the cluster has enough computing resources, several jobs are executed in parallel; otherwise, the jobs are queued.
By default, job logs are saved to the bucket specified when creating the cluster, at the following path:

```
s3a://<bucket name>/dataproc/<cluster ID>/jobs/<job ID>/
```
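As a sketch, assuming boto3 and hypothetical bucket, cluster, and job identifiers, the logs under this path could be listed and downloaded as follows:

```python
import os

import boto3

# Hypothetical identifiers; substitute your own bucket name, cluster ID, and job ID.
BUCKET = "jobs-results"
PREFIX = "dataproc/c9q1abcdefgh/jobs/job42/"  # mirrors the default log path layout

s3 = boto3.client("s3", endpoint_url="https://storage.yandexcloud.net")

# Walk every log object written for the job and download it locally.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("/"):  # skip directory marker objects
            continue
        local_path = os.path.join("logs", os.path.relpath(obj["Key"], PREFIX))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        s3.download_file(BUCKET, obj["Key"], local_path)
        print("Downloaded", obj["Key"])
```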
For more information, see Managing jobs.