Recognizing text in an image

To recognize text in an image, use the Text recognition feature.

In the batchAnalyze method, set the type property to TEXT_DETECTION, and specify the recognition settings in the textDetectionConfig property.
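
For reference, the text recognition feature in the request body looks like this (the full request format is shown in the examples below):

    "features": [{
        "type": "TEXT_DETECTION",
        "text_detection_config": {
            "language_codes": ["*"]
        }
    }]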

Note

The service recognizes text written horizontally. If an image or text is rotated, recognition may fail. If you scanned a document and don't know how the image was rotated, send several versions of the image with different rotation angles.
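
As a minimal sketch of this workaround, the snippet below uses the Pillow library (an assumption, not required by the service) to save rotated copies of an image, which can then each be sent for recognition:

    # A sketch: save 0, 90, 180, and 270 degree copies of an image
    # so that each copy can be sent for recognition separately.
    # Assumes the Pillow library (pip install pillow).
    from PIL import Image

    def save_rotations(path):
        with Image.open(path) as image:
            for angle in (0, 90, 180, 270):
                # expand=True keeps the whole image inside the rotated frame.
                image.rotate(angle, expand=True).save(f"rotated_{angle}_{path}")

    save_rotations("input.jpg")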

Examples

To try the examples in this section:

  1. On the Billing page, make sure your billing account has the ACTIVE or TRIAL_ACTIVE status. If you don't have a billing account yet, create one.
  2. Make sure you have installed the cURL utility that is used in the examples.
  3. Get the ID of any folder for which your account has the editor role or higher.
  4. Get an IAM token for your Yandex account.
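
For step 4, one way to get an IAM token for a Yandex account is to exchange an OAuth token via the IAM API. The sketch below assumes the requests library and the standard IAM endpoint; check the IAM documentation for the exact request format:

    # A sketch: exchange a Yandex OAuth token for an IAM token.
    # The endpoint and response field below are assumptions; check
    # the IAM documentation before relying on them.
    import requests

    def get_iam_token(oauth_token):
        response = requests.post(
            "https://iam.api.cloud.yandex.net/iam/v1/tokens",
            json={"yandexPassportOauthToken": oauth_token},
        )
        response.raise_for_status()
        return response.json()["iamToken"]

    print(get_iam_token("<your OAuth token>"))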

To perform these operations on behalf of the service account:

  1. Assign the editor role or a higher role to the service account for the folder where it was created.
  2. Do not specify the folder ID in the request: the service uses the folder where the service account was created.
  3. Choose the authentication method: get an IAM token or API key.
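
Whichever method you choose, only the Authorization header of the request changes. A minimal sketch, assuming the Api-Key scheme for API keys (check the authentication documentation to confirm it):

    # A sketch: build the Authorization header for either option.
    # The Api-Key scheme for API keys is an assumption; check the
    # authentication documentation.
    def auth_header(iam_token=None, api_key=None):
        if iam_token:
            return {"Authorization": f"Bearer {iam_token}"}
        return {"Authorization": f"Api-Key {api_key}"}

    print(auth_header(api_key="<your API key>"))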

Recognize text

  1. Prepare an image file that meets the requirements:

    • Supported file formats: JPEG, PNG, PDF.

      The MIME type of the file is specified in the mime_type property. The default value is image.

    • Maximum file size: 1 MB.

    • Image size should not exceed 20 MP (length x width).

    Note

    Need a sample image? Download an image of the penguin crossing road sign.
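
    To check the size limits above before sending a file, you can use a short script. The sketch below assumes the Pillow library for reading image dimensions and treats 1 MB as 1024 × 1024 bytes:

    # A sketch: check the file size and pixel count limits listed above.
    # Assumes the Pillow library (pip install pillow); works for JPEG and PNG.
    import os
    from PIL import Image

    def check_image(path, max_bytes=1024 * 1024, max_pixels=20_000_000):
        size_ok = os.path.getsize(path) <= max_bytes
        with Image.open(path) as image:
            pixels_ok = image.width * image.height <= max_pixels
        return size_ok and pixels_ok

    print(check_image("input.jpg"))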

  2. Encode the file as Base64:

    Bash:

    $ base64 -i input.jpg > output.txt

    CMD:

    C:> Base64.exe -e input.jpg > output.txt

    PowerShell:

    [Convert]::ToBase64String([IO.File]::ReadAllBytes("./input.jpg")) > output.txt

    Python:

    # Import the Base64 encoding library.
    import base64

    # Define a function that encodes a file and returns the result.
    def encode_file(file):
      file_content = file.read()
      return base64.b64encode(file_content)

    Node.js:

    // Read the file contents into memory.
    var fs = require('fs');
    var file = fs.readFileSync('/path/to/file');

    // Get the file contents in Base64 format.
    var encoded = Buffer.from(file).toString('base64');

    Java:

    // Import the Base64 encoding library.
    import org.apache.commons.codec.binary.Base64;

    // Get the file contents in Base64 format.
    byte[] fileData = Base64.encodeBase64(yourFile.getBytes());

    Go:

    import (
        "bufio"
        "encoding/base64"
        "io/ioutil"
        "os"
    )

    // Open the file.
    f, _ := os.Open("/path/to/file")

    // Read the file contents.
    reader := bufio.NewReader(f)
    content, _ := ioutil.ReadAll(reader)

    // Get the file contents in Base64 format.
    base64.StdEncoding.EncodeToString(content)
    
  3. Create a file with the request body (for example, body.json). In the content property, specify the Base64-encoded image.

    For the service to automatically detect the text language, specify the "language_codes": ["*"] property in the configuration.

    body.json:

    {
        "folderId": "b1gvmob95yysaplct532",
        "analyze_specs": [{
            "content": "iVBORw0KGgo...",
            "features": [{
                "type": "TEXT_DETECTION",
                "text_detection_config": {
                    "language_codes": ["*"]
                }
            }]
        }]
    }
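
    As an alternative to editing body.json by hand, a minimal Python sketch that encodes the image and writes the same request body (the folder ID is the placeholder from the example above):

    # A sketch: build body.json with a Base64-encoded image.
    import base64
    import json

    with open("input.jpg", "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")

    body = {
        "folderId": "b1gvmob95yysaplct532",
        "analyze_specs": [{
            "content": content,
            "features": [{
                "type": "TEXT_DETECTION",
                "text_detection_config": {"language_codes": ["*"]}
            }]
        }]
    }

    with open("body.json", "w") as f:
        json.dump(body, f)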
    
  4. Send a request using the batchAnalyze method and save the response in a file, such as output.json:

    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d '@body.json' \
        https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze > output.json
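
    The same request can be sent from Python. The sketch below assumes the requests library (any HTTP client works) and reads the IAM token from the IAM_TOKEN environment variable, as in the cURL example:

    # A sketch: send body.json to batchAnalyze and save the response.
    import json
    import os
    import requests

    with open("body.json") as f:
        body = json.load(f)

    response = requests.post(
        "https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze",
        headers={"Authorization": f"Bearer {os.environ['IAM_TOKEN']}"},
        json=body,
    )
    response.raise_for_status()

    with open("output.json", "w") as f:
        json.dump(response.json(), f, indent=2)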
    

    The response contains the recognized blocks of text, lines, and words, along with their positions in the image:

    {
      "results": [{
        "results": [{
          "textDetection": {
            "pages": [{
              "blocks": [{
                "boundingBox": { ... },
                "lines": [{
                    "boundingBox": { ... },
                    "words": [{
                      "boundingBox": {
                        "vertices": [{
                            "x": "504",
                            "y": "760"
                          },
                          {
                            "x": "504",
                            "y": "836"
                          },
                          {
                            "x": "826",
                            "y": "836"
                          },
                          {
                            "x": "826",
                            "y": "760"
                          }
                        ]
                      },
                      "languages": [{
                        "languageCode": "en",
                        "confidence": 0.9520227313
                      }],
                      "text": "PENGUINS",
                      "confidence": 0.9520227313
                    }],
                    "confidence": 0.9520227313
                  },
                  ...
                ]
              }],
              "width": "1920",
              "height": "1280"
            }]
          }
        }]
      }]
    }
    
  5. To get all the recognized words from the image, find all the lines with the text property (one way to do this is using the grep utility):

    $ grep -o "\"text\":\s\".*\"" output.json
    "text": "PENGUINS"
    "text": "CROSSING"
    "text": "SLOW"
    

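    Alternatively, a short Python sketch that walks output.json and collects every text value (the key name follows the response format shown above):

    # A sketch: recursively collect all "text" values from output.json.
    import json

    def collect_text(node, found):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "text":
                    found.append(value)
                else:
                    collect_text(value, found)
        elif isinstance(node, list):
            for item in node:
                collect_text(item, found)
        return found

    with open("output.json") as f:
        print(collect_text(json.load(f), []))
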
Recognize text in multiple languages

  1. Create a file with the request body (for example, body.json). In the content property, specify a Base64-encoded image.

    body.json:

    {
        "folderId": "b1gvmob95yysaplct532",
        "analyze_specs": [{
            "content": "iVBORw0KGgo...",
            ...
    
  2. Select the languages for text recognition and the appropriate recognition models:

    • If all the languages are covered by the same recognition model, specify multiple language codes in the request body (for example, French and German):

      body.json:

      {
          "folderId": "b1gvmob95yysaplct532",
          "analyze_specs": [{
              "content": "iVBORw0KGgo...",
              "features": [{
                  "type": "TEXT_DETECTION",
                  "text_detection_config": {
                      "language_codes": ["fr", "de"]
                  }
              }]
          }]
      }
      
    • If the languages belong to different recognition models, set multiple configurations in the features property. For example, to recognize text in Hebrew, Arabic, and English:

      body.json:

      {
          "folderId": "b1gvmob95yysaplct532",
          "analyze_specs": [{
              "content": "iVBORw0KGgo...",
              "features": [{
                  "type": "TEXT_DETECTION",
                  "text_detection_config": {
                      "language_codes": ["he","en"]
                  }
              },{
                  "type": "TEXT_DETECTION",
                  "text_detection_config": {
                      "language_codes": ["ar","en"]
                  }
              }]
          }]
      }
      
  3. Send a recognition request using the batchAnalyze method and save the response to a file, such as output.json:

    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d '@body.json' \
        https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze > output.json
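
    The response is expected to contain one entry in the inner results array per item in the features property. A sketch that prints the recognized words for each configuration separately (the key names follow the response format shown in the previous section):

    # A sketch: print recognized words per configuration.
    # Key names follow the response format shown earlier; fields that
    # may be absent are read with .get().
    import json

    with open("output.json") as f:
        data = json.load(f)

    for i, result in enumerate(data["results"][0]["results"]):
        words = [
            word["text"]
            for page in result["textDetection"]["pages"]
            for block in page.get("blocks", [])
            for line in block.get("lines", [])
            for word in line.get("words", [])
        ]
        print(f"Configuration {i}: {' '.join(words)}")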