Method list

  • HTTP request
  • Path parameters
  • Query parameters
  • Response

Retrieves a list of jobs for a cluster.

HTTP request

GET https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{clusterId}/jobs

Path parameters

Parameter Description

clusterId
Required. ID of the cluster to list jobs for. Maximum string length: 50 characters.

Query parameters

Parameter Description

pageSize
The maximum number of results per page to return. If the number of available results is larger than pageSize, the service returns a nextPageToken that can be used to get the next page of results in subsequent list requests. Default value: 100. Maximum value: 1000.

pageToken
Page token. To get the next page of results, set pageToken to the nextPageToken returned by a previous list request. Maximum string length: 100 characters.

filter
A filter expression that filters jobs listed in the response. The expression must specify:
  1. The field name. Currently you can only filter on the Job.name field.
  2. An operator: either = or != for single values, IN or NOT IN for lists of values.
  3. The value. Must be 3-63 characters long and match the regular expression ^[a-z][-a-z0-9]{1,61}[a-z0-9].
Example of a filter: name=my-job. Maximum string length: 1000 characters.
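
For illustration, here is a minimal Python sketch of calling this method with the requests library. The cluster ID is a placeholder, and the IAM bearer token in the Authorization header is an assumption based on the API authentication documentation:

import os
import requests  # third-party HTTP client

IAM_TOKEN = os.environ["IAM_TOKEN"]    # assumption: an IAM token exported beforehand
CLUSTER_ID = "your-cluster-id"         # placeholder: ID of an existing Data Proc cluster

response = requests.get(
    f"https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{CLUSTER_ID}/jobs",
    headers={"Authorization": f"Bearer {IAM_TOKEN}"},
    params={
        "pageSize": 100,          # default 100, maximum 1000
        "filter": "name=my-job",  # optional; only Job.name can be filtered on
    },
)
response.raise_for_status()
for job in response.json().get("jobs", []):
    print(job["id"], job["name"], job["status"])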

Response

HTTP Code: 200 - OK

{
  "jobs": [
    {
      "id": "string",
      "clusterId": "string",
      "createdAt": "string",
      "startedAt": "string",
      "finishedAt": "string",
      "name": "string",
      "createdBy": "string",
      "status": "string",

      // `jobs[]` includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
      "mapreduceJob": {
        "args": [
          "string"
        ],
        "jarFileUris": [
          "string"
        ],
        "fileUris": [
          "string"
        ],
        "archiveUris": [
          "string"
        ],
        "properties": "object",

        // `jobs[].mapreduceJob` includes only one of the fields `mainJarFileUri`, `mainClass`
        "mainJarFileUri": "string",
        "mainClass": "string",
        // end of the list of possible fields `jobs[].mapreduceJob`

      },
      "sparkJob": {
        "args": [
          "string"
        ],
        "jarFileUris": [
          "string"
        ],
        "fileUris": [
          "string"
        ],
        "archiveUris": [
          "string"
        ],
        "properties": "object",
        "mainJarFileUri": "string",
        "mainClass": "string"
      },
      "pysparkJob": {
        "args": [
          "string"
        ],
        "jarFileUris": [
          "string"
        ],
        "fileUris": [
          "string"
        ],
        "archiveUris": [
          "string"
        ],
        "properties": "object",
        "mainPythonFileUri": "string",
        "pythonFileUris": [
          "string"
        ]
      },
      "hiveJob": {
        "properties": "object",
        "continueOnFailure": true,
        "scriptVariables": "object",
        "jarFileUris": [
          "string"
        ],

        // `jobs[].hiveJob` includes only one of the fields `queryFileUri`, `queryList`
        "queryFileUri": "string",
        "queryList": {
          "queries": [
            "string"
          ]
        },
        // end of the list of possible fields `jobs[].hiveJob`

      },
      // end of the list of possible fields `jobs[]`

    }
  ],
  "nextPageToken": "string"
}
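
Each element of jobs[] carries exactly one of the four job specifications (mapreduceJob, sparkJob, pysparkJob, hiveJob), so client code typically dispatches on whichever field is present. A minimal Python sketch, assuming absent oneOf fields are simply omitted from the response JSON (job is one parsed element of the jobs array):

def describe_job_spec(job: dict) -> str:
    """Summarize whichever of the four oneOf job specs is present."""
    if "mapreduceJob" in job:
        spec = job["mapreduceJob"]
        # mainJarFileUri and mainClass are themselves a oneOf pair.
        return f"MapReduce job, driver: {spec.get('mainJarFileUri') or spec.get('mainClass')}"
    if "sparkJob" in job:
        spec = job["sparkJob"]
        return f"Spark job: {spec.get('mainClass')} from {spec.get('mainJarFileUri')}"
    if "pysparkJob" in job:
        return f"PySpark job, driver file: {job['pysparkJob'].get('mainPythonFileUri')}"
    if "hiveJob" in job:
        spec = job["hiveJob"]
        # queryFileUri and queryList are a oneOf pair.
        source = spec.get("queryFileUri") or f"{len(spec.get('queryList', {}).get('queries', []))} inline queries"
        return f"Hive job, queries from: {source}"
    return "unknown job type"
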
Field Description

jobs[] (object)
A Data Proc job. For details about the concept, see the documentation.

jobs[].id (string)
ID of the job. Generated at creation time.

jobs[].clusterId (string)
ID of the Data Proc cluster that the job belongs to.

jobs[].createdAt (string, date-time)
Creation timestamp. String in RFC3339 text format.

jobs[].startedAt (string, date-time)
The time when the job was started. String in RFC3339 text format.

jobs[].finishedAt (string, date-time)
The time when the job finished. String in RFC3339 text format.

jobs[].name (string)
Name of the job, specified in the create request.

jobs[].createdBy (string)
ID of the user who created the job.

jobs[].status (string)
Job status.
  • PROVISIONING: Job is logged in the database and is waiting for the agent to run it.
  • PENDING: Job is acquired by the agent and is queued for execution.
  • RUNNING: Job is running in the cluster.
  • ERROR: Job failed to complete properly.
  • DONE: Job is finished.

jobs[].mapreduceJob (object)
Specification for a MapReduce job.
jobs[] includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob.

jobs[].mapreduceJob.args[] (string)
Optional arguments to pass to the driver.

jobs[].mapreduceJob.jarFileUris[] (string)
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

jobs[].mapreduceJob.fileUris[] (string)
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

jobs[].mapreduceJob.archiveUris[] (string)
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

jobs[].mapreduceJob.properties (object)
Property names and values, used to configure Data Proc and MapReduce.

jobs[].mapreduceJob.mainJarFileUri (string)
HCFS URI of the .jar file containing the driver class.
jobs[].mapreduceJob includes only one of the fields mainJarFileUri, mainClass.

jobs[].mapreduceJob.mainClass (string)
The name of the driver class.
jobs[].mapreduceJob includes only one of the fields mainJarFileUri, mainClass.

jobs[].sparkJob (object)
Specification for a Spark job.
jobs[] includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob.

jobs[].sparkJob.args[] (string)
Optional arguments to pass to the driver.

jobs[].sparkJob.jarFileUris[] (string)
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

jobs[].sparkJob.fileUris[] (string)
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

jobs[].sparkJob.archiveUris[] (string)
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

jobs[].sparkJob.properties (object)
Property names and values, used to configure Data Proc and Spark.

jobs[].sparkJob.mainJarFileUri (string)
The HCFS URI of the JAR file containing the main class for the job.

jobs[].sparkJob.mainClass (string)
The name of the driver class.

jobs[].pysparkJob (object)
Specification for a PySpark job.
jobs[] includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob.

jobs[].pysparkJob.args[] (string)
Optional arguments to pass to the driver.

jobs[].pysparkJob.jarFileUris[] (string)
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

jobs[].pysparkJob.fileUris[] (string)
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

jobs[].pysparkJob.archiveUris[] (string)
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

jobs[].pysparkJob.properties (object)
Property names and values, used to configure Data Proc and PySpark.

jobs[].pysparkJob.mainPythonFileUri (string)
URI of the file with the driver code. Must be a .py file.

jobs[].pysparkJob.pythonFileUris[] (string)
URIs of Python files to pass to the PySpark framework.

jobs[].hiveJob (object)
Specification for a Hive job.
jobs[] includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob.

jobs[].hiveJob.properties (object)
Property names and values, used to configure Data Proc and Hive.

jobs[].hiveJob.continueOnFailure (boolean)
Flag indicating whether a job should continue to run if a query fails.

jobs[].hiveJob.scriptVariables (object)
Query variables and their values.

jobs[].hiveJob.jarFileUris[] (string)
JAR file URIs to add to CLASSPATH of the Hive driver and each task.

jobs[].hiveJob.queryFileUri (string)
URI of the script with all the necessary Hive queries.
jobs[].hiveJob includes only one of the fields queryFileUri, queryList.

jobs[].hiveJob.queryList (object)
List of Hive queries to be used in the job.
jobs[].hiveJob includes only one of the fields queryFileUri, queryList.

jobs[].hiveJob.queryList.queries[] (string)
List of Hive queries.

nextPageToken (string)
Token for getting the next page of the list. If the number of results is greater than the specified pageSize, use nextPageToken as the value for the pageToken parameter in the next list request. Each subsequent page has its own nextPageToken, so you can continue paging through the results.
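
A minimal Python sketch of paging through every job by following nextPageToken (same placeholder cluster ID and assumed IAM bearer authentication as the request example above):

import os
import requests

CLUSTER_ID = "your-cluster-id"  # placeholder: ID of an existing Data Proc cluster
url = f"https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{CLUSTER_ID}/jobs"
headers = {"Authorization": f"Bearer {os.environ['IAM_TOKEN']}"}  # assumption: IAM bearer auth

jobs = []
page_token = None
while True:
    params = {"pageSize": 1000}           # maximum page size
    if page_token:
        params["pageToken"] = page_token  # continue from the previous page
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    body = resp.json()
    jobs.extend(body.get("jobs", []))
    page_token = body.get("nextPageToken")
    if not page_token:                    # no token means this was the last page
        break
print(f"collected {len(jobs)} jobs")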
