Jobs in Data Proc

Updated on April 14, 2024

In a Data Proc cluster, you can create and run jobs. This allows you to regularly upload datasets from Object Storage buckets, use them in calculations, and generate analytics.

The following job types are supported:

When creating a job, specify:

Arguments: Values used by the job's main executable file.
Properties: The key:value pairs that configure image components.
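As a rough illustration of how arguments and properties differ, the sketch below models a job's settings as a plain Python structure. The `JobSpec` class, its field names, the bucket names, and the file paths are all hypothetical and are not part of any Data Proc API. The two Spark property names are only examples of key:value settings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobSpec:
    """Hypothetical container for job settings (not a Data Proc API type)."""
    name: str
    main_file_uri: str  # the job's main executable file
    # Arguments: values consumed by the main executable file
    args: List[str] = field(default_factory=list)
    # Properties: key:value pairs that configure image components
    properties: Dict[str, str] = field(default_factory=dict)


def format_properties(properties: Dict[str, str]) -> List[str]:
    """Render properties as 'key:value' strings, sorted by key."""
    return [f"{key}:{value}" for key, value in sorted(properties.items())]


# Placeholder bucket names and paths, chosen for illustration only.
job = JobSpec(
    name="example-job",
    main_file_uri="s3a://example-input-bucket/scripts/job.py",
    args=[
        "s3a://example-input-bucket/data/",
        "s3a://example-output-bucket/results/",
    ],
    properties={"spark.executor.memory": "2g", "spark.executor.cores": "1"},
)

print(format_properties(job.properties))
```

Arguments are passed to the job's own code, while properties tune the component that runs it, so keeping them in separate fields makes it harder to mix them up.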

To create and start jobs, you can:

Use the Yandex Cloud interfaces. For more information, see basic examples for working with jobs.
Connect directly to a cluster node. For more information, see the example in "Running jobs from remote hosts that are not part of the cluster".

To successfully run a job:

Grant access to the required Object Storage buckets for the cluster service account.

We recommend using at least two buckets:
- One with read-only permissions for storing the source data and files required to run the job.
- Another one with read and write permissions for storing job run results. Specify it when creating a cluster.

When creating a job, provide all files required for it.
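The two requirements above can be sketched as a local pre-flight check. This is a minimal sketch under stated assumptions: the bucket names, the `s3a://` URI form, and the `BUCKET_PERMISSIONS` table are illustrative only. The code does not query Object Storage or any Yandex Cloud API. It only compares the URIs a job would use against permissions you list by hand.

```python
from urllib.parse import urlparse

# Hypothetical permissions granted to the cluster service account.
BUCKET_PERMISSIONS = {
    "example-input-bucket": {"read"},            # source data and job files
    "example-output-bucket": {"read", "write"},  # job run results
}


def check_access(uri, needed, permissions=BUCKET_PERMISSIONS):
    """Return None if the URI's bucket grants `needed`, else an error string."""
    bucket = urlparse(uri).netloc
    granted = permissions.get(bucket)
    if granted is None:
        return f"{uri}: no access granted to bucket '{bucket}'"
    if needed not in granted:
        return f"{uri}: bucket '{bucket}' lacks '{needed}' permission"
    return None


def preflight(input_uris, output_uri, permissions=BUCKET_PERMISSIONS):
    """Collect access problems for all input files and the output location."""
    problems = [check_access(uri, "read", permissions) for uri in input_uris]
    problems.append(check_access(output_uri, "write", permissions))
    return [problem for problem in problems if problem is not None]


issues = preflight(
    ["s3a://example-input-bucket/scripts/job.py"],
    "s3a://example-input-bucket/results/",  # wrong bucket: read-only
)
print(issues)
```

Here the check reports one problem, because the output location points at the read-only bucket.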

If the cluster has enough computing resources, the jobs you create run concurrently; otherwise, they are placed in a queue.
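The behavior described above can be illustrated with a toy model. It is not the cluster's actual scheduler, which this article does not describe. The single abstract "resource unit", the strict first-in-first-out order, and the job names are all assumptions made for the example.

```python
from collections import deque


def schedule(jobs, capacity):
    """Split jobs into those that start immediately and those that wait.

    jobs: list of (name, required_units) tuples in submission order.
    capacity: free resource units in the cluster (an abstract number).
    """
    running, queue = [], deque()
    free = capacity
    for name, required in jobs:
        # Assumption: once any job is waiting, later jobs wait behind it.
        if not queue and required <= free:
            running.append(name)
            free -= required
        else:
            queue.append(name)
    return running, list(queue)


running, queued = schedule(
    [("job-a", 4), ("job-b", 3), ("job-c", 2)],
    capacity=8,
)
print(running, queued)
```

With 8 units available, the first two jobs fit and start together, while the third waits until resources are released.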

Job logs

Job logs are saved in Yandex Cloud Logging. For more information, see Working with logs.
