Getting started with Vision
This section describes how to get started with the Vision API for image analysis.
Before getting started
To use the examples, install cURL and get the authorization data for your account:
- On the billing page, make sure that your billing account status is ACTIVE or TRIAL_ACTIVE. If you don't have a billing account, create one.
- Get an IAM token required for authentication.
- Get the ID of any folder that your account is granted the editor role or higher for.
- Select the authentication method:
    - Get an IAM token used in the examples.
    - Create an API key. Pass the API key in the Authorization header in the following format:
      Authorization: Api-Key <API key>
- Assign the editor role or a higher role to the service account for the folder where it was created. Don't specify the folder ID in your requests: the service uses the folder where the service account was created.
- Use the CLI to get an IAM token required for authentication:
    $ yc iam create-token
- Get the ID of any folder that your account is granted the editor role or higher for.
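The two credential types map to two different Authorization header schemes: Api-Key for an API key, as described above, and Bearer for an IAM token, as used by the cURL request in this section. The following is a minimal Python sketch of that choice. The function name build_auth_header and its parameters are hypothetical and are not part of any Vision SDK.

```python
def build_auth_header(iam_token=None, api_key=None):
    """Return an Authorization header for exactly one credential type.

    Hypothetical helper for illustration only; not part of the Vision API.
    """
    if (iam_token is None) == (api_key is None):
        raise ValueError("Provide exactly one of iam_token or api_key")
    if iam_token is not None:
        # IAM tokens use the Bearer scheme.
        return {"Authorization": f"Bearer {iam_token}"}
    # API keys use the Api-Key scheme.
    return {"Authorization": f"Api-Key {api_key}"}
```

Requiring exactly one credential keeps a request from silently using the wrong scheme when both values happen to be set.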
Analyze the image
You can use any of the analysis features from the list. For example, use the Text recognition feature:
- Prepare an image file that meets the requirements:
    - Supported file formats: JPEG, PNG, PDF. You specify the MIME type of the file in the mime_type property. The default is image.
    - Maximum file size: 1 MB.
    - Image size should not exceed 20 MP (length x width).
  Note
  Need a sample image? Download an image of the penguin crossing road sign.
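The requirements above can be checked locally before sending anything to the service. The sketch below is one possible pre-flight check, not the service's own validation. It makes several assumptions: the format is guessed from the file extension only, 1 MB is read as 1,048,576 bytes, 20 MP is read as 20,000,000 pixels, and the pixel dimensions are supplied by the caller because the Python standard library does not decode JPEG files. The name check_image is hypothetical.

```python
import os

# Assumptions, not values confirmed by the service documentation:
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}
MAX_FILE_SIZE_BYTES = 1024 * 1024  # 1 MB read as 1,048,576 bytes
MAX_PIXELS = 20_000_000            # 20 MP read as 20 million pixels


def check_image(path, width=None, height=None):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Heuristic format check based on the file extension only.
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension: {ext or '(none)'}")

    # File size check.
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_FILE_SIZE_BYTES}")

    # Pixel count check, only when the caller knows the dimensions.
    if width is not None and height is not None and width * height > MAX_PIXELS:
        problems.append(f"image has {width * height} pixels, limit is {MAX_PIXELS}")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violated requirement at once.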
- Encode the file as Base64:

  UNIX
    $ base64 -i input.jpg > output.txt

  Windows
    C:> Base64.exe -e input.jpg > output.txt

  PowerShell
    [Convert]::ToBase64String([IO.File]::ReadAllBytes("./input.jpg")) > output.txt

  Python
    # Import the library for Base64 encoding.
    import base64

    # Define a function that encodes a file and returns the result.
    def encode_file(file):
        file_content = file.read()
        return base64.b64encode(file_content)

  Node.js
    // Read the file contents into memory.
    var fs = require('fs');
    var file = fs.readFileSync('/path/to/file');

    // Get the file contents in Base64 format.
    var encoded = Buffer.from(file).toString('base64');

  Java
    // Import the library for Base64 encoding.
    import org.apache.commons.codec.binary.Base64;

    // Get the file contents in Base64 format.
    byte[] fileData = Base64.encodeBase64(yourFile.getBytes());

  Go
    import (
        "bufio"
        "encoding/base64"
        "io/ioutil"
        "os"
    )

    // Open the file.
    f, _ := os.Open("/path/to/file")

    // Read the file contents.
    reader := bufio.NewReader(f)
    content, _ := ioutil.ReadAll(reader)

    // Get the file contents in Base64 format.
    base64.StdEncoding.EncodeToString(content)
- Create a file with the request body (for example, body.json). In the content property, specify the Base64-encoded image.

  For the service to automatically detect the text language, specify the "language_codes": ["*"] property in the configuration.

  body.json:
    {
      "folderId": "b1gvmob95yysaplct532",
      "analyze_specs": [{
        "content": "iVBORw0KGgo...",
        "features": [{
          "type": "TEXT_DETECTION",
          "text_detection_config": {
            "language_codes": ["*"]
          }
        }]
      }]
    }
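The request body can also be generated instead of written by hand. The following Python sketch builds the same structure as the body.json example from raw image bytes. The function name build_request_body and its parameters are hypothetical. The folderId field is left out when no folder ID is passed, which matches the service account case described earlier.

```python
import base64
import json


def build_request_body(image_bytes, folder_id=None, language_codes=("*",)):
    """Build a batchAnalyze request body for text recognition.

    Hypothetical helper; it mirrors the field names of the body.json example.
    """
    body = {}
    if folder_id is not None:
        # Omit folderId for service accounts, as the guide recommends.
        body["folderId"] = folder_id
    body["analyze_specs"] = [{
        # The image content must be a Base64 string, not raw bytes.
        "content": base64.b64encode(image_bytes).decode("ascii"),
        "features": [{
            "type": "TEXT_DETECTION",
            "text_detection_config": {"language_codes": list(language_codes)},
        }],
    }]
    return body
```

Passing the result to json.dump with an open body.json file produces a file that can be sent with the cURL request in this section.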
- Send a request using the batchAnalyze method and save the response in a file, such as output.json:

    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d '@body.json' \
        https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze > output.json

  The response consists of recognized blocks of text, lines, and words, each with its position on the image:

    {
      "results": [{
        "results": [{
          "textDetection": {
            "pages": [{
              "blocks": [{
                "boundingBox": { ... },
                "lines": [{
                  "boundingBox": { ... },
                  "words": [{
                    "boundingBox": {
                      "vertices": [
                        { "x": "504", "y": "760" },
                        { "x": "504", "y": "836" },
                        { "x": "826", "y": "836" },
                        { "x": "826", "y": "760" }
                      ]
                    },
                    "languages": [{
                      "languageCode": "en",
                      "confidence": 0.9520227313
                    }],
                    "text": "PENGUINS",
                    "confidence": 0.9520227313
                  }],
                  "confidence": 0.9520227313
                }, ... ]
              }],
              "width": "1920",
              "height": "1280"
            }]
          }
        }]
      }]
    }
- To get all the recognized words from the image, find all the lines with the text property (one way to do this is using the grep utility):

    $ grep -o "\"text\":\s\".*\"" output.json
    "text": "PENGUINS"
    "text": "CROSSING"
    "text": "SLOW"
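The grep approach depends on how the JSON happens to be laid out in the file. As an alternative, the response can be parsed and walked. The Python sketch below collects every string stored under a text key, at any nesting depth. The function name collect_text is hypothetical, and the code assumes only the response shape shown in the example above.

```python
import json


def collect_text(node):
    """Recursively yield every string stored under a "text" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                yield value
            else:
                # Keep descending into nested objects and arrays.
                yield from collect_text(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_text(item)
```

To use it on a saved response, load output.json with json.load and pass the result to collect_text. The words are yielded in the order they appear in the response.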