
JobService

  • Calls JobService
  • List
    • ListJobsRequest
    • ListJobsResponse
    • Job
    • MapreduceJob
    • SparkJob
    • PysparkJob
    • HiveJob
    • QueryList
  • Create
    • CreateJobRequest
    • MapreduceJob
    • SparkJob
    • PysparkJob
    • HiveJob
    • QueryList
    • Operation
    • CreateJobMetadata
    • Job
    • MapreduceJob
    • SparkJob
    • PysparkJob
    • HiveJob
    • QueryList
  • Get
    • GetJobRequest
    • Job
    • MapreduceJob
    • SparkJob
    • PysparkJob
    • HiveJob
    • QueryList
  • ListLog
    • ListJobLogRequest
    • ListJobLogResponse

A set of methods for managing Data Proc jobs.

Call Description
List Retrieves a list of jobs for a cluster.
Create Creates a job for a cluster.
Get Returns the specified job.
ListLog Returns a log for the specified job.
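
The examples in the sections below are minimal sketches in Python. They assume the yandexcloud SDK package and its generated gRPC stubs under yandex.cloud.dataproc.v1 are installed; all IDs, URIs and names in angle brackets are placeholders, and helper names may differ between SDK versions.

    import yandexcloud
    from yandex.cloud.dataproc.v1.job_service_pb2_grpc import JobServiceStub

    # Authenticate with an IAM token (an OAuth token or a service account key
    # can be used instead); the SDK sets up the gRPC channel and credentials.
    sdk = yandexcloud.SDK(iam_token="<IAM token>")

    # Ready-to-use JobService stub, reused by the sketches in the sections below.
    job_service = sdk.client(JobServiceStub)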

Calls JobService

List

Retrieves a list of jobs for a cluster.

rpc List (ListJobsRequest) returns (ListJobsResponse)

ListJobsRequest

Field Description
cluster_id string
Required. ID of the cluster to list jobs for. The maximum string length in characters is 50.
page_size int64
The maximum number of results per page to return. If the number of available results is larger than page_size, the service returns a ListJobsResponse.next_page_token that can be used to get the next page of results in subsequent list requests. Default value: 100. The maximum value is 1000.
page_token string
Page token. To get the next page of results, set page_token to the ListJobsResponse.next_page_token returned by a previous list request. The maximum string length in characters is 100.
filter string
A filter expression that filters the jobs listed in the response. The expression must specify:
  1. The field name. Currently you can use filtering only on the Job.name field.
  2. An operator. Can be either = or != for single values, IN or NOT IN for lists of values.
  3. The value. Must be 3-63 characters long and match the regular expression ^[a-z][-a-z0-9]{1,61}[a-z0-9].
The maximum string length in characters is 1000.

ListJobsResponse

Field Description
jobs[] Job
List of jobs for the specified cluster.
next_page_token string
Token for getting the next page of the list. If the number of results is greater than the specified ListJobsRequest.page_size, use next_page_token as the value for the ListJobsRequest.page_token parameter in the next list request.
Each subsequent page will have its own next_page_token to continue paging through the results.
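
For example, a sketch that pages through all jobs of a cluster using page_token, filtering by job name (it reuses the job_service stub from the setup sketch above; the cluster ID and job name are placeholders):

    from yandex.cloud.dataproc.v1.job_service_pb2 import ListJobsRequest

    page_token = ""
    while True:
        response = job_service.List(ListJobsRequest(
            cluster_id="<cluster ID>",
            page_size=100,
            filter='name = "my-job"',  # filtering is supported only on Job.name
            page_token=page_token,
        ))
        for job in response.jobs:
            print(job.id, job.name, job.status)
        page_token = response.next_page_token
        if not page_token:             # an empty token means this was the last page
            break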

Job

Field Description
id string
ID of the job. Generated at creation time.
cluster_id string
ID of the Data Proc cluster that the job belongs to.
created_at google.protobuf.Timestamp
Creation timestamp.
started_at google.protobuf.Timestamp
The time when the job was started.
finished_at google.protobuf.Timestamp
The time when the job was finished.
name string
Name of the job, specified in the JobService.Create request.
created_by string
ID of the user who created the job.
status enum Status
Job status.
  • PROVISIONING: Job is logged in the database and is waiting for the agent to run it.
  • PENDING: Job is acquired by the agent and is in the queue for execution.
  • RUNNING: Job is being run in the cluster.
  • ERROR: Job failed to finish the run properly.
  • DONE: Job is finished.
job_spec oneof: mapreduce_job, spark_job, pyspark_job or hive_job
Specification for the job.
  mapreduce_job MapreduceJob
Specification for a MapReduce job.
  spark_job SparkJob
Specification for a Spark job.
  pyspark_job PysparkJob
Specification for a PySpark job.
  hive_job HiveJob
Specification for a Hive job.

MapreduceJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and MapReduce.
driver oneof: main_jar_file_uri or main_class
  main_jar_file_uri string
HCFS URI of the .jar file containing the driver class.
  main_class string
The name of the driver class.
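
Because main_jar_file_uri and main_class form the driver oneof, a MapreduceJob specification sets exactly one of them. A sketch (the class name, bucket and file names are illustrative, not taken from this reference):

    from yandex.cloud.dataproc.v1.job_pb2 import MapreduceJob

    mapreduce_job = MapreduceJob(
        # Driver given by class name; assigning main_jar_file_uri instead would
        # clear main_class, since the two fields belong to the same oneof.
        main_class="org.apache.hadoop.streaming.HadoopStreaming",
        args=["-mapper", "mapper.py", "-reducer", "reducer.py"],
        file_uris=["s3a://<bucket>/mapper.py", "s3a://<bucket>/reducer.py"],
        properties={"yarn.app.mapreduce.am.resource.mb": "2048"},
    )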

SparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and Spark.
main_jar_file_uri string
The HCFS URI of the JAR file containing the main class for the job.
main_class string
The name of the driver class.

PysparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and PySpark.
main_python_file_uri string
URI of the file with the driver code. Must be a .py file.
python_file_uris[] string
URIs of Python files to pass to the PySpark framework.

HiveJob

Field Description
properties map<string,string>
Property names and values, used to configure Data Proc and Hive.
continue_on_failure bool
Flag indicating whether a job should continue to run if a query fails.
script_variables map<string,string>
Query variables and their values.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Hive driver and each task.
query_type oneof: query_file_uri or query_list
  query_file_uri string
URI of the script with all the necessary Hive queries.
  query_list QueryList
List of Hive queries to be used in the job.

QueryList

Field Description
queries[] string
List of Hive queries.
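
Likewise, a HiveJob sets either query_file_uri or an inline QueryList for the query_type oneof. A sketch with inline queries (table and variable names are illustrative):

    from yandex.cloud.dataproc.v1.job_pb2 import HiveJob, QueryList

    hive_job = HiveJob(
        query_list=QueryList(queries=[
            "CREATE TABLE IF NOT EXISTS logs (line string);",
            "SELECT count(*) FROM logs;",
        ]),
        script_variables={"SOURCE": "logs"},   # query variables passed to Hive
        continue_on_failure=False,             # stop the job on the first failed query
    )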

Create

Creates a job for a cluster.

rpc Create (CreateJobRequest) returns (operation.Operation)

Metadata and response of Operation:

    Operation.metadata:CreateJobMetadata

    Operation.response:Job

CreateJobRequest

Field Description
cluster_id string
Required. ID of the cluster to create a job for. The maximum string length in characters is 50.
name string
Name of the job. Value must match the regular expression `
job_spec oneof: mapreduce_job, spark_job, pyspark_job or hive_job
Specification for the job.
  mapreduce_job MapreduceJob
Specification for a MapReduce job.
  spark_job SparkJob
Specification for a Spark job.
  pyspark_job PysparkJob
Specification for a PySpark job.
  hive_job HiveJob
Specification for a Hive job.
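
Put together, a hypothetical request that submits a Spark job might look like this (a sketch; the cluster ID, bucket paths and application class are placeholders):

    from yandex.cloud.dataproc.v1.job_pb2 import SparkJob
    from yandex.cloud.dataproc.v1.job_service_pb2 import CreateJobRequest

    operation = job_service.Create(CreateJobRequest(
        cluster_id="<cluster ID>",
        name="spark-wordcount",
        spark_job=SparkJob(                      # selects the spark_job branch of job_spec
            main_jar_file_uri="s3a://<bucket>/jobs/wordcount.jar",
            main_class="com.example.WordCount",
            args=["s3a://<bucket>/input/", "s3a://<bucket>/output/"],
            properties={"spark.executor.memory": "2g"},
        ),
    ))
    # Create returns a long-running Operation; see the Operation handling sketch below.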

MapreduceJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and MapReduce.
driver oneof: main_jar_file_uri or main_class
  main_jar_file_uri string
HCFS URI of the .jar file containing the driver class.
  main_class string
The name of the driver class.

SparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and Spark.
main_jar_file_uri string
The HCFS URI of the JAR file containing the main class for the job.
main_class string
The name of the driver class.

PysparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and PySpark.
main_python_file_uri string
URI of the file with the driver code. Must be a .py file.
python_file_uris[] string
URIs of Python files to pass to the PySpark framework.

HiveJob

Field Description
properties map<string,string>
Property names and values, used to configure Data Proc and Hive.
continue_on_failure bool
Flag indicating whether a job should continue to run if a query fails.
script_variables map<string,string>
Query variables and their values.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Hive driver and each task.
query_type oneof: query_file_uri or query_list
  query_file_uri string
URI of the script with all the necessary Hive queries.
  query_list QueryList
List of Hive queries to be used in the job.

QueryList

Field Description
queries[] string
List of Hive queries.

Operation

Field Description
id string
ID of the operation.
description string
Description of the operation. 0-256 characters long.
created_at google.protobuf.Timestamp
Creation timestamp.
created_by string
ID of the user or service account who initiated the operation.
modified_at google.protobuf.Timestamp
The time when the Operation resource was last modified.
done bool
If the value is false, it means the operation is still in progress. If true, the operation is completed, and either error or response is available.
metadata google.protobuf.Any<CreateJobMetadata>
Service-specific metadata associated with the operation. It typically contains the ID of the target resource that the operation is performed on. Any method that returns a long-running operation should document the metadata type, if any.
result oneof: error or response
The operation result. If done == false and there was no failure detected, neither error nor response is set. If done == false and there was a failure detected, error is set. If done == true, exactly one of error or response is set.
  error google.rpc.Status
The error result of the operation in case of failure or cancellation.
  response google.protobuf.Any<Job>
The result of the operation, set if the operation finished successfully.

CreateJobMetadata

Field Description
cluster_id string
Required. ID of the cluster that the job is being created for. The maximum string length in characters is 50.
job_id string
ID of the job being created. The maximum string length in characters is 50.
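
Once the operation reports done, its Any-typed fields can be unpacked with the standard protobuf API. A sketch of the success and failure branches (in practice the operation returned by Create is polled, for example via the OperationService or an SDK wait helper, until done becomes true):

    from yandex.cloud.dataproc.v1.job_pb2 import Job
    from yandex.cloud.dataproc.v1.job_service_pb2 import CreateJobMetadata

    if operation.done:
        if operation.HasField("error"):
            print("job creation failed:", operation.error.message)
        else:
            job = Job()
            operation.response.Unpack(job)       # Operation.response is Any<Job>
            meta = CreateJobMetadata()
            operation.metadata.Unpack(meta)      # Operation.metadata is Any<CreateJobMetadata>
            print("created job", meta.job_id, "in cluster", meta.cluster_id)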

Job

Field Description
id string
ID of the job. Generated at creation time.
cluster_id string
ID of the Data Proc cluster that the job belongs to.
created_at google.protobuf.Timestamp
Creation timestamp.
started_at google.protobuf.Timestamp
The time when the job was started.
finished_at google.protobuf.Timestamp
The time when the job was finished.
name string
Name of the job, specified in the JobService.Create request.
created_by string
ID of the user who created the job.
status enum Status
Job status.
  • PROVISIONING: Job is logged in the database and is waiting for the agent to run it.
  • PENDING: Job is acquired by the agent and is in the queue for execution.
  • RUNNING: Job is being run in the cluster.
  • ERROR: Job failed to finish the run properly.
  • DONE: Job is finished.
job_spec oneof: mapreduce_job, spark_job, pyspark_job or hive_job
Specification for the job.
  mapreduce_job MapreduceJob
Specification for a MapReduce job.
  spark_job SparkJob
Specification for a Spark job.
  pyspark_job PysparkJob
Specification for a PySpark job.
  hive_job HiveJob
Specification for a Hive job.

MapreduceJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and MapReduce.
driver oneof: main_jar_file_uri or main_class
  main_jar_file_uri string
HCFS URI of the .jar file containing the driver class.
  main_class string
The name of the driver class.

SparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and Spark.
main_jar_file_uri string
The HCFS URI of the JAR file containing the main class for the job.
main_class string
The name of the driver class.

PysparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and PySpark.
main_python_file_uri string
URI of the file with the driver code. Must be a .py file.
python_file_uris[] string
URIs of Python files to pass to the PySpark framework.

HiveJob

Field Description
properties map<string,string>
Property names and values, used to configure Data Proc and Hive.
continue_on_failure bool
Flag indicating whether a job should continue to run if a query fails.
script_variables map<string,string>
Query variables and their values.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Hive driver and each task.
query_type oneof: query_file_uri or query_list
  query_file_uri string
URI of the script with all the necessary Hive queries.
  query_list QueryList
List of Hive queries to be used in the job.

QueryList

Field Description
queries[] string
List of Hive queries.

Get

Returns the specified job.

rpc Get (GetJobRequest) returns (Job)

GetJobRequest

Field Description
cluster_id string
Required. ID of the cluster to request a job from. The maximum string length in characters is 50.
job_id string
Required. ID of the job to return.
To get a job ID, make a JobService.List request. The maximum string length in characters is 50.
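
For example (a sketch reusing the job_service stub from the setup sketch above; both IDs are placeholders):

    from yandex.cloud.dataproc.v1.job_pb2 import Job
    from yandex.cloud.dataproc.v1.job_service_pb2 import GetJobRequest

    job = job_service.Get(GetJobRequest(
        cluster_id="<cluster ID>",
        job_id="<job ID>",                        # for example, taken from a List response
    ))
    print(job.name, Job.Status.Name(job.status))  # prints the status as its enum name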

Job

Field Description
id string
ID of the job. Generated at creation time.
cluster_id string
ID of the Data Proc cluster that the job belongs to.
created_at google.protobuf.Timestamp
Creation timestamp.
started_at google.protobuf.Timestamp
The time when the job was started.
finished_at google.protobuf.Timestamp
The time when the job was finished.
name string
Name of the job, specified in the JobService.Create request.
created_by string
ID of the user who created the job.
status enum Status
Job status.
  • PROVISIONING: Job is logged in the database and is waiting for the agent to run it.
  • PENDING: Job is acquired by the agent and is in the queue for execution.
  • RUNNING: Job is being run in the cluster.
  • ERROR: Job failed to finish the run properly.
  • DONE: Job is finished.
job_spec oneof: mapreduce_job, spark_job, pyspark_job or hive_job
Specification for the job.
  mapreduce_job MapreduceJob
Specification for a MapReduce job.
  spark_job SparkJob
Specification for a Spark job.
  pyspark_job PysparkJob
Specification for a PySpark job.
  hive_job HiveJob
Specification for a Hive job.

MapreduceJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and MapReduce.
driver oneof: main_jar_file_uri or main_class
  main_jar_file_uri string
HCFS URI of the .jar file containing the driver class.
  main_class string
The name of the driver class.

SparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and Spark.
main_jar_file_uri string
The HCFS URI of the JAR file containing the main class for the job.
main_class string
The name of the driver class.

PysparkJob

Field Description
args[] string
Optional arguments to pass to the driver.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
file_uris[] string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archive_uris[] string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties map<string,string>
Property names and values, used to configure Data Proc and PySpark.
main_python_file_uri string
URI of the file with the driver code. Must be a .py file.
python_file_uris[] string
URIs of Python files to pass to the PySpark framework.

HiveJob

Field Description
properties map<string,string>
Property names and values, used to configure Data Proc and Hive.
continue_on_failure bool
Flag indicating whether a job should continue to run if a query fails.
script_variables map<string,string>
Query variables and their values.
jar_file_uris[] string
JAR file URIs to add to CLASSPATH of the Hive driver and each task.
query_type oneof: query_file_uri or query_list
  query_file_uri string
URI of the script with all the necessary Hive queries.
  query_list QueryList
List of Hive queries to be used in the job.

QueryList

Field Description
queries[] string
List of Hive queries.

ListLog

Returns a log for the specified job.

rpc ListLog (ListJobLogRequest) returns (ListJobLogResponse)

ListJobLogRequest

Field Description
cluster_id string
Required. ID of the cluster that the job belongs to. The maximum string length in characters is 50.
job_id string
ID of the job to return the log for. The maximum string length in characters is 50.
page_size int64
The maximum bytes of job log per response to return. If the number of available bytes is larger than page_size, the service returns a ListJobLogResponse.next_page_token that can be used to get the next page of output in subsequent list requests. Default value: 1048576. The maximum value is 1048576.
page_token string
Page token. To get the next page of results, set page_token to the ListJobLogResponse.next_page_token returned by a previous list request. The maximum string length in characters is 100.

ListJobLogResponse

Field Description
content string
Requested part of the Data Proc job log.
next_page_token string
This token allows you to get the next page of results for ListLog requests, if the number of results is larger than the page_size specified in the request. To get the next page, specify the value of next_page_token as the value for the page_token parameter in the next ListLog request. Subsequent ListLog requests will have their own next_page_token to continue paging through the results.
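
For example, a sketch that downloads the complete log of a job in chunks of up to page_size bytes (it reuses the job_service stub from the setup sketch above; both IDs are placeholders):

    from yandex.cloud.dataproc.v1.job_service_pb2 import ListJobLogRequest

    page_token = ""
    while True:
        response = job_service.ListLog(ListJobLogRequest(
            cluster_id="<cluster ID>",
            job_id="<job ID>",
            page_size=1048576,                   # up to 1 MB of log output per call
            page_token=page_token,
        ))
        print(response.content, end="")
        page_token = response.next_page_token
        if not page_token:                       # empty token: the whole log has been read
            break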