© 2022 Yandex.Cloud LLC

Working with logs

Written by
Yandex Cloud
  • Viewing log entries
  • Disabling log sending
  • Storing logs

Data Proc cluster logs are collected and displayed by Yandex Cloud Logging.

To monitor events on the cluster and its individual hosts, specify a relevant log group in the cluster settings. You can do this when creating or updating the cluster. If no log group has been selected for the cluster, the default log group in the cluster's folder is used to send and store logs.

For more information, see Logs in Data Proc.
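As a sketch, a log group can be assigned with the CLI at cluster creation time. The command mirrors the `yc dataproc cluster create` invocation used later in this article; the log group ID is a placeholder, and the `...` stands for the other required cluster parameters:

```shell
# Sketch: assign an existing Cloud Logging group to a new cluster.
# Requires an initialized yc profile; <log group ID> is a placeholder.
yc dataproc cluster create <cluster name> \
  ... \
  --log-group-id "<log group ID>"
```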

Viewing log entries

Management console
CLI
  1. Go to the folder page and select Data Proc.
  2. Click the name of the desired cluster and select the Logs tab.
  3. (Optional) Specify the output settings:
    • Message filter:

      • Getting the Data Proc job launch output:

        job_id="<job ID>"
        
      • Getting the stdout output for all YARN application containers:

        application_id="<YARN application ID>" AND yarn_log_type="stdout"
        
      • Getting the stderr output of a specific YARN container:

        container_id="<YARN container ID>" AND yarn_log_type="stderr"
        
      • Getting the YARN Resource Manager service logs from the cluster's master host:

        hostname="<FQDN of the master host>" AND log_type="hadoop-yarn-resourcemanager"
        
    • Logging levels: from TRACE to FATAL.

    • Number of messages per page.

    • Time interval (a preset or a custom one).
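The filter expressions above are plain strings, so they can be composed in a shell and reused with the CLI. A minimal sketch (the job ID below is a made-up placeholder):

```shell
# Hypothetical job ID, used for illustration only
JOB_ID="c9q1ab2cd3ef4gh5ij6k"
# Compose a filter for the job's stdout, mirroring the console filter syntax
FILTER="job_id=\"${JOB_ID}\" AND yarn_log_type=\"stdout\""
echo "${FILTER}"
```

The resulting string can be passed as-is to `yc logging read --filter "$FILTER"`.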

If you don't have the Yandex Cloud command line interface yet, install and initialize it.

View a description of the CLI command to get logs:

yc logging read --help

Examples:

  • To get logs of the Data Proc cluster's HDFS NameNode service, run the command:

    yc logging read \
      --group-id "<log group ID>" \
      --resource-ids "<cluster ID>" \
      --filter "log_type=hadoop-hdfs-namenode"
    
  • To get logs for the last two hours from all Data Proc clusters assigned to a specific log group, run the command:

    yc logging read \
      --group-id "<log group ID>" \
      --resource-types "dataproc.cluster" \
      --since 2h
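If only the raw messages are needed, the CLI's structured output can be post-processed. A sketch, assuming jq is installed and that entries in the JSON output of yc logging read carry a message field (IDs are placeholders):

```shell
# Print only log messages, dropping timestamps and metadata.
# Requires an initialized yc profile and jq; IDs are placeholders.
yc logging read \
  --group-id "<log group ID>" \
  --resource-ids "<cluster ID>" \
  --filter "log_type=hadoop-hdfs-namenode" \
  --format json | jq -r '.[].message'
```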
    

Disabling log sending

Management console
CLI

When creating or updating the cluster, add the dataproc:disable_cloud_logging property set to true.

If you don't have the Yandex Cloud command line interface yet, install and initialize it.

The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.

When creating a cluster, pass an empty string ("") instead of a log group ID in the --log-group-id parameter; when updating a cluster, pass the dataproc:disable_cloud_logging=true value in the --property parameter:

yc dataproc cluster create <cluster name> \
   ... \
  --log-group-id=""
yc dataproc cluster update <cluster ID or name> \
  --property dataproc:disable_cloud_logging=true

Storing logs

Receiving and storing logs is charged according to the Yandex Cloud Logging pricing rules. To change the retention period or log access rules, edit the log group settings.
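For example, the retention period can be adjusted with the CLI. A sketch, assuming the --retention-period flag of the yc logging group command set; the group name is a placeholder:

```shell
# Keep logs for 3 days instead of the group's current setting.
# Requires an initialized yc profile; the group name is a placeholder.
yc logging group update default \
  --retention-period 72h
```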

Learn more about working with logs in the Yandex Cloud Logging documentation.
