Working with logs
Data Proc cluster logs are collected and displayed by Yandex Cloud Logging.
To monitor events on the cluster and its individual hosts, specify a log group in the cluster settings. You can do this when creating or updating the cluster. If no log group is selected for the cluster, logs are sent to and stored in the default log group of the cluster's folder.
For more information, see Logs in Data Proc.
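As an illustration of assigning a log group to an existing cluster from the command line, here is a minimal sketch. The function name `set_cluster_log_group` is hypothetical; the `yc dataproc cluster update` command and the `--log-group-id` parameter are the ones this page uses in the section on disabling logs, and passing a non-empty log group ID this way is an assumption based on that description.

```shell
# Hypothetical helper, not part of the yc CLI.
# Assumes that --log-group-id accepts a log group ID on update,
# as described for the create and update commands on this page.
# $1: cluster ID or name
# $2: log group ID
set_cluster_log_group() {
  yc dataproc cluster update "$1" --log-group-id "$2"
}
```

Example call: `set_cluster_log_group "<cluster ID or name>" "<log group ID>"`.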
Viewing log entries
- Go to the folder page and select Data Proc.
- Click the name of the desired cluster and select the Logs tab.
- (Optional) Specify the output settings:
  - Message filter, for example:
    - Getting the Data Proc job start output:
      job_id="<job ID>"
    - Getting the stdout output for all YARN application containers:
      application_id="<YARN application ID>" AND yarn_log_type="stdout"
    - Getting a YARN container's stderr output:
      container_id="<YARN container ID>" AND yarn_log_type="stderr"
    - Getting the YARN Resource Manager service logs from the cluster's master host:
      hostname="<FQDN of the master host>" AND log_type="hadoop-yarn-resourcemanager"
  - Message logging levels: from TRACE to FATAL.
  - Number of messages per page.
  - Message interval (standard or custom).
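The filter expressions listed above all follow the same pattern: `key="value"` conditions joined with `AND`. The sketch below assembles such an expression from `key=value` arguments. The function `build_log_filter` is a hypothetical helper written for this illustration, not part of any Yandex Cloud tool, and it does no escaping of quotes inside values.

```shell
# Hypothetical helper: joins key=value arguments into an expression
# of the form  key1="value1" AND key2="value2".
# The body runs in a subshell, so its variables do not leak out.
build_log_filter() (
  filter=""
  for pair in "$@"; do
    key="${pair%%=*}"    # text before the first "="
    value="${pair#*=}"   # text after the first "="
    # Prefix " AND " only when a condition has already been added.
    filter="${filter:+$filter AND }${key}=\"${value}\""
  done
  printf '%s\n' "$filter"
)
```

For example, `build_log_filter application_id="<YARN application ID>" yarn_log_type=stdout` prints the second expression from the list above.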
If you don't have the Yandex Cloud command line interface yet, install and initialize it.
View a description of the CLI command to get logs:
yc logging read --help
Examples:
- To get logs of the Data Proc cluster's HDFS NameNode service, run the command:
  yc logging read \
    --group-id "<log group ID>" \
    --resource-ids "<cluster ID>" \
    --filter "log_type=hadoop-hdfs-namenode"
- To get logs for the last two hours from all Data Proc clusters assigned to a specific log group, run the command:
  yc logging read \
    --group-id "<log group ID>" \
    --resource-types "dataproc.cluster" \
    --since 2h
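The two examples above can be combined into a small wrapper. This is a sketch only: `read_cluster_logs` is a hypothetical function name, it uses only the flags shown in the examples above, and it assumes that `--resource-ids`, `--filter`, and `--since` can be passed in the same call.

```shell
# Hypothetical wrapper around "yc logging read".
# Usage: read_cluster_logs <log group ID> <cluster ID> <filter> [<period>]
# The body runs in a subshell, so "exit" only ends the function.
read_cluster_logs() (
  if [ "$#" -lt 3 ]; then
    echo "usage: read_cluster_logs <log group ID> <cluster ID> <filter> [<period>]" >&2
    exit 2
  fi
  group_id="$1"; cluster_id="$2"; filter="$3"; since="${4:-}"
  # Rebuild the positional parameters as the yc argument list.
  set -- logging read \
    --group-id "$group_id" \
    --resource-ids "$cluster_id" \
    --filter "$filter"
  # Add the time window only when a period such as 2h was given.
  if [ -n "$since" ]; then
    set -- "$@" --since "$since"
  fi
  yc "$@"
)
```

For example, `read_cluster_logs "<log group ID>" "<cluster ID>" "log_type=hadoop-hdfs-namenode" 2h` would request the NameNode logs of one cluster for the last two hours.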
Disabling sending logs
When creating or updating the cluster, add the dataproc:disable_cloud_logging property set to true.
If you don't have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
When creating or updating the cluster, pass the dataproc:disable_cloud_logging=true value in the --property parameter, or pass an empty string ("") instead of the log group ID in the --log-group-id parameter:
yc dataproc cluster create <cluster name> \
  ... \
  --log-group-id=""

yc dataproc cluster update <cluster ID or name> \
  --property dataproc:disable_cloud_logging=true
Storing logs
Receiving and storing logs is billed according to the Yandex Cloud Logging pricing rules. To change the log retention period and access rules, edit the log group settings.
Learn more about working with logs in the Yandex Cloud Logging documentation.