Vision API, gRPC: VisionService

Updated on September 8, 2023

A set of methods for the Vision service.

Call	Description
BatchAnalyze	Analyzes a batch of images and returns results with annotations.

Calls VisionService

BatchAnalyze

Analyzes a batch of images and returns results with annotations.

rpc BatchAnalyze (BatchAnalyzeRequest) returns (BatchAnalyzeResponse)

BatchAnalyzeRequest

Field	Description
analyze_specs[]	AnalyzeSpec A list of specifications. Each specification contains the file to analyze and the features to use for analysis. Restrictions: supported file formats are `JPEG` and `PNG`; the maximum file size is 1 MB; the image size must not exceed 20 megapixels (length x width). The number of elements must be in the range 1-8.
folder_id	string ID of the folder to which you have access. Required for authorization with a user account (see `yandex.cloud.iam.v1.UserAccount` resource). Don't specify this field if you make the request on behalf of a service account. The maximum string length in characters is 50.
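
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: plain dicts stand in for the generated protobuf messages, and `check_batch_request` is a hypothetical helper name.

```python
# Hypothetical client-side check; plain dicts stand in for protobuf messages.
MAX_SPECS = 8            # analyze_specs[] must hold 1-8 elements
MAX_FOLDER_ID_LEN = 50   # folder_id is at most 50 characters


def check_batch_request(request: dict) -> list:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    specs = request.get("analyze_specs", [])
    if not 1 <= len(specs) <= MAX_SPECS:
        problems.append(
            f"analyze_specs has {len(specs)} elements, expected 1-{MAX_SPECS}"
        )
    folder_id = request.get("folder_id", "")
    if len(folder_id) > MAX_FOLDER_ID_LEN:
        problems.append(f"folder_id is longer than {MAX_FOLDER_ID_LEN} characters")
    return problems
```

The helper only mirrors the documented limits; the service remains the authority on whether a request is valid.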

AnalyzeSpec

Field	Description
source	oneof: `content` or `signature`
content	bytes Image content, represented as a stream of bytes. Note: As with all bytes fields, Protocol Buffers use a pure binary representation, whereas JSON representations use base64. The maximum length is 10485760.
signature	string The maximum string length in characters is 16384.
features[]	Feature Requested features to use for analysis of one file. The number of elements must be in the range 1-8.
mime_type	string MIME type of content (for example, `application/pdf`). The maximum string length in characters is 255.
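
As noted for `content`, a JSON representation carries the image as base64 text. The sketch below builds such a representation of an AnalyzeSpec from raw bytes. The function name `make_spec_json` is hypothetical, and reading "1 MB" as 1024 * 1024 bytes is an assumption.

```python
import base64
import json

# Assumption: the documented 1 MB file limit is read as 1024 * 1024 bytes.
MAX_FILE_BYTES = 1024 * 1024


def make_spec_json(image_bytes: bytes, mime_type: str, features: list) -> str:
    """Serialize one AnalyzeSpec-shaped dict to JSON with base64 content."""
    if len(image_bytes) > MAX_FILE_BYTES:
        raise ValueError("file is larger than the 1 MB limit")
    if not 1 <= len(features) <= 8:
        raise ValueError("features must contain 1-8 elements")
    spec = {
        # JSON cannot hold raw bytes, so the content is base64-encoded.
        "content": base64.b64encode(image_bytes).decode("ascii"),
        "mime_type": mime_type,
        "features": features,
    }
    return json.dumps(spec)
```

With the binary gRPC representation no encoding step is needed: the raw bytes are placed in `content` as they are.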

Feature

Field	Description
type	enum Type Type of requested feature. `TEXT_DETECTION`: Text detection (OCR) feature. `CLASSIFICATION`: Classification feature. `FACE_DETECTION`: Face detection feature. `IMAGE_COPY_SEARCH`: Image copy search.
config	oneof: `classification_config` or `text_detection_config`
classification_config	FeatureClassificationConfig Required for the `CLASSIFICATION` type. Specifies configuration for the classification feature.
text_detection_config	FeatureTextDetectionConfig Required for the `TEXT_DETECTION` type. Specifies configuration for the text detection (OCR) feature.
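
The relationship between `type` and the `config` oneof can be expressed as a small check. This is a sketch under the assumption that features are held as plain dicts; `check_feature` is a hypothetical name.

```python
# Which config field each feature type requires, per the table above.
REQUIRED_CONFIG = {
    "CLASSIFICATION": "classification_config",
    "TEXT_DETECTION": "text_detection_config",
}
CONFIG_FIELDS = ("classification_config", "text_detection_config")


def check_feature(feature: dict) -> list:
    """Return problems with the type/config combination of one Feature dict."""
    problems = []
    present = [name for name in CONFIG_FIELDS if name in feature]
    if len(present) > 1:
        # config is a oneof, so at most one member may be set.
        problems.append("only one of " + ", ".join(CONFIG_FIELDS) + " may be set")
    required = REQUIRED_CONFIG.get(feature.get("type"))
    if required and required not in feature:
        problems.append(f"{feature['type']} requires {required}")
    return problems
```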

FeatureClassificationConfig

Field	Description
model	string Model to use for image classification. The maximum string length in characters is 256.

FeatureTextDetectionConfig

Field	Description
language_codes[]	string List of languages in which to recognize text. Specified in ISO 639-1 format (for example, `ru`). The number of elements must be in the range 1-8. The maximum string length in characters for each value is 3.
model	string Model to use for text detection. Possible values: `page` (default): this model is suitable for detecting multiple text entries in an image. `line`: this model is suitable for cropped images with one line of text. The maximum string length in characters is 50.
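
A text detection feature that respects these limits could be assembled as follows. This is an illustrative sketch using plain dicts; `text_detection_feature` is a hypothetical helper, not a function of any SDK.

```python
ALLOWED_MODELS = ("page", "line")  # possible values listed for `model`


def text_detection_feature(language_codes: list, model: str = "page") -> dict:
    """Build a Feature-shaped dict for TEXT_DETECTION, enforcing the limits."""
    if not 1 <= len(language_codes) <= 8:
        raise ValueError("language_codes must contain 1-8 elements")
    too_long = [code for code in language_codes if len(code) > 3]
    if too_long:
        raise ValueError(f"language codes longer than 3 characters: {too_long}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}")
    return {
        "type": "TEXT_DETECTION",
        "text_detection_config": {
            "language_codes": list(language_codes),
            "model": model,
        },
    }
```

For a cropped image with a single line of text, the call would be `text_detection_feature(["ru"], model="line")`.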

BatchAnalyzeResponse

Field	Description
results[]	AnalyzeResult Request results. Results have the same order as specifications in the request.

AnalyzeResult

Field	Description
results[]	FeatureResult Results for each requested feature. Feature results have the same order as in the request.
error	google.rpc.Status Error returned if the file could not be processed.

FeatureResult

Field	Description
feature	oneof: `text_detection`, `classification`, `face_detection` or `image_copy_search`
text_detection	TextAnnotation Text detection (OCR) result.
classification	ClassAnnotation Classification result.
face_detection	FaceAnnotation Face detection result.
image_copy_search	ImageCopySearchAnnotation Image copy search result.
error	google.rpc.Status Error returned if processing of the specified feature failed.
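
Because results keep the order of the request at both levels, a response can be matched back to its specifications and features by position. The sketch below assumes dict-shaped messages in which an unset `error` is simply absent; `pair_results` is a hypothetical name.

```python
def pair_results(specs: list, response: dict) -> list:
    """Return (spec index, feature type, status) rows, relying on result order."""
    rows = []
    for index, (spec, file_result) in enumerate(
        zip(specs, response.get("results", []))
    ):
        if "error" in file_result:
            # File-level error: no per-feature results to inspect.
            message = file_result["error"].get("message", "")
            rows.append((index, "*", "file error: " + message))
            continue
        for feature, feature_result in zip(
            spec["features"], file_result.get("results", [])
        ):
            if "error" in feature_result:
                status = "error: " + feature_result["error"].get("message", "")
            else:
                status = "ok"
            rows.append((index, feature["type"], status))
    return rows
```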

TextAnnotation

Field	Description
pages[]	Page Pages of the recognized file. For JPEG and PNG files, contains only one page.

Page

Field	Description
width	int64 Page width in pixels.
height	int64 Page height in pixels.
blocks[]	Block Recognized text blocks in this page.
entities[]	Entity Recognized entities.

Block

Field	Description
bounding_box	Polygon Area on the page where the text block is located.
lines[]	Line Recognized lines in this block.

Polygon

Field	Description
vertices[]	Vertex The bounding polygon vertices.

Vertex

Field	Description
x	int64 X coordinate in pixels.
y	int64 Y coordinate in pixels.
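
A Polygon is often reduced to an axis-aligned rectangle for cropping or drawing. The following sketch assumes a dict-shaped Polygon; `bounding_rect` is a hypothetical helper. The `int()` conversion is there because the standard protobuf JSON mapping writes int64 values as strings.

```python
def bounding_rect(polygon: dict) -> tuple:
    """Return (left, top, right, bottom) in pixels for a Polygon-shaped dict."""
    vertices = polygon.get("vertices", [])
    if not vertices:
        raise ValueError("polygon has no vertices")
    # int() accepts both native ints and the string form used in JSON.
    xs = [int(vertex["x"]) for vertex in vertices]
    ys = [int(vertex["y"]) for vertex in vertices]
    return (min(xs), min(ys), max(xs), max(ys))
```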

Line

Field	Description
bounding_box	Polygon Area on the page where the line is located.
words[]	Word Recognized words in this line.
confidence	double Confidence of the OCR results for the line. Range [0, 1].

Word

Field	Description
bounding_box	Polygon Area on the page where the word is located.
text	string Recognized word value.
confidence	double Confidence of the OCR results for the word. Range [0, 1].
languages[]	DetectedLanguage A list of detected languages together with confidence.
entity_index	int64 ID of the recognized word in the entities array.
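
The nesting of Page, Block, Line and Word can be walked to recover plain text. This is a minimal sketch over a dict-shaped TextAnnotation; `extract_lines` and its `min_confidence` parameter are hypothetical and not part of the API.

```python
def extract_lines(text_annotation: dict, min_confidence: float = 0.0) -> list:
    """Join the words of every line, skipping words below min_confidence."""
    lines = []
    for page in text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [
                    word["text"]
                    for word in line.get("words", [])
                    if word.get("confidence", 0.0) >= min_confidence
                ]
                if words:
                    lines.append(" ".join(words))
    return lines
```

Joining words with a single space is a simplification; the bounding boxes would be needed to reproduce the original layout.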

DetectedLanguage

Field	Description
language_code	string Detected language code.
confidence	double Confidence of detected language. Range [0, 1].

Entity

Field	Description
name	string Entity name.
text	string Recognized entity text.

ClassAnnotation

Field	Description
properties[]	Property Properties extracted by a specified model. For example, if you ask to evaluate the image quality, the service could return such properties as `good` and `bad`.

Property

Field	Description
name	string Property name.
probability	double Probability of the property, from 0 to 1.
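
A common way to consume a ClassAnnotation is to keep the most probable property. The sketch below assumes a dict-shaped message; `top_property` and the 0.5 default threshold are illustrative choices, not values defined by the service.

```python
from typing import Optional


def top_property(class_annotation: dict, threshold: float = 0.5) -> Optional[str]:
    """Return the name of the most probable property, or None if below threshold."""
    properties = class_annotation.get("properties", [])
    if not properties:
        return None
    best = max(properties, key=lambda prop: prop["probability"])
    return best["name"] if best["probability"] >= threshold else None
```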

FaceAnnotation

Field	Description
faces[]	Face An array of detected faces for the specified image.

Face

Field	Description
bounding_box	Polygon Area on the image where the face is located.

ImageCopySearchAnnotation

Field	Description
copy_count	int64 Number of image copies found.
top_results[]	CopyMatch Most relevant results of the image copy search.

CopyMatch

Field	Description
image_url	string URL of the image.
page_url	string URL of the page that contains the image.
title	string Title of the page that contains the image.
description	string Image description.
