This feature is in the Preview stage.
You can improve the quality of machine translations in a specific domain by using your own data to train the model. This will not degrade the quality of translations of everyday language.
What data is required for retraining
For retraining, you need a TMX file with source and target segments. To achieve a noticeable effect, you need tens of thousands of such segments.
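Before submitting a file, it can be useful to check how many usable segment pairs it actually contains. The following is a minimal sketch, not an official tool: it assumes a standard TMX layout (`tu` units containing `tuv` variants with an `xml:lang` attribute and a `seg` child), and the function name `count_segment_pairs` and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_segment_pairs(tmx_text: str, source_lang: str, target_lang: str) -> int:
    """Count translation units that have non-empty text in both languages."""
    root = ET.fromstring(tmx_text)
    pairs = 0
    for tu in root.iter("tu"):
        texts_by_lang = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            text = "".join(seg.itertext()).strip() if seg is not None else ""
            if text:
                texts_by_lang[lang] = text
        if source_lang in texts_by_lang and target_lang in texts_by_lang:
            pairs += 1
    return pairs


# Hypothetical sample: one complete pair and one unit with no translation.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The wellhead pressure is stable.</seg></tuv>
      <tuv xml:lang="ru"><seg>Давление на устье скважины стабильно.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Segment without a translation.</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(count_segment_pairs(SAMPLE_TMX, "en", "ru"))
```

For a real file, read its contents and pass them to the function. Note that this sketch matches language codes exactly, so a file that uses `en-US` would need `"en-us"` as the argument.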
The texts you use for training should match the target knowledge domain as closely as possible (such as legal documents, medicine, or oil and gas). Providing texts from multiple domains may reduce the quality of the result.
How to retrain a model
Submit a request to technical support. Specify your cloud details and attach the TMX file. The model will be retrained in about two weeks.
To use a model, enter its ID in the model parameter when sending a request.
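As an illustration, a translation request that passes a model ID might be assembled as follows. This is a sketch under assumptions: the endpoint URL, the `Api-Key` authorization scheme, and the body fields other than `model` are taken from the Translate REST API v2 as commonly documented and should be verified against the current API reference. The helper name and all identifier values are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed endpoint of the Translate REST API v2; verify before use.
TRANSLATE_URL = "https://translate.api.cloud.yandex.net/translate/v2/translate"


def build_translate_request(api_key, folder_id, model_id, texts, target_lang):
    """Build (but do not send) a POST request that selects a retrained model."""
    body = {
        "folderId": folder_id,            # folder the model was retrained for
        "texts": texts,                   # list of strings to translate
        "targetLanguageCode": target_lang,
        "model": model_id,                # ID of the retrained model
    }
    return urllib.request.Request(
        TRANSLATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_translate_request(
    api_key="<API key>",
    folder_id="<folder ID>",
    model_id="<model ID>",
    texts=["The wellhead pressure is stable."],
    target_lang="ru",
)
```

To send the request, pass `request` to `urllib.request.urlopen` and parse the JSON response. Building the request separately from sending it makes it easy to inspect the body before any data leaves your machine.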
Who will have access to the retrained model
Yandex Cloud does not use the data you submit to train its own models. The retrained model will only be available in the folder specified in your request.