SpeechKit Synthesis Service gRPC API: Synthesizer

A set of methods for voice synthesis.

Call / Description
UtteranceSynthesis
Synthesizing text into speech.

Calls Synthesizer

UtteranceSynthesis

Synthesizing text into speech.

rpc UtteranceSynthesis (UtteranceSynthesisRequest) returns (stream UtteranceSynthesisResponse)

UtteranceSynthesisRequest

Field / Description
model (string)
The name of the model. Specifies basic synthesis functionality. Currently it must be left empty; do not set it.
Utterance (oneof: text or text_template)
Text to synthesize, given in one of the text synthesis markups.
  text (string)
Raw text (e.g. "Hello, Alice").
  text_template (TextTemplate)
Text template instance, e.g. "Hello, {username}" with username="Alice".
hints[] (Hints)
Optional hints for synthesis.
output_audio_spec (AudioFormatOptions)
Optional. Output audio format. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with WAV header.
loudness_normalization_type (enum LoudnessNormalizationType)
Specifies the type of loudness normalization. Optional. Default: LUFS.
  • MAX_PEAK: Normalization in which the gain is changed to bring the highest PCM sample value or analog signal peak to a given level.
  • LUFS: Normalization based on the EBU R 128 recommendation.
unsafe_mode (bool)
Optional. Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.
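The field rules above can be sketched in Python. This is only an illustration that models the request as a plain dict; the helper name build_request and the voice name "example_voice" are hypothetical, and a real client would use the message classes generated from the service's .proto files instead.

```python
from typing import Any, Dict, List, Optional


def build_request(
    text: Optional[str] = None,
    text_template: Optional[Dict[str, Any]] = None,
    hints: Optional[List[Dict[str, Any]]] = None,
    loudness_normalization_type: str = "LUFS",
    unsafe_mode: bool = False,
) -> Dict[str, Any]:
    """Build a dict shaped like UtteranceSynthesisRequest (illustrative only)."""
    # The Utterance oneof accepts exactly one of text / text_template.
    if (text is None) == (text_template is None):
        raise ValueError("set exactly one of text or text_template")
    if loudness_normalization_type not in ("MAX_PEAK", "LUFS"):
        raise ValueError("unknown loudness_normalization_type")

    request: Dict[str, Any] = {
        "hints": list(hints or []),
        "loudness_normalization_type": loudness_normalization_type,
        "unsafe_mode": unsafe_mode,
    }
    # The model field is deliberately omitted: the table says to leave it empty.
    if text is not None:
        request["text"] = text
    else:
        request["text_template"] = text_template
    return request


# "example_voice" is a placeholder, not a documented voice name.
request = build_request(text="Hello, Alice", hints=[{"voice": "example_voice"}])
```

Leaving output_audio_spec unset in this sketch relies on the documented default (22050 Hz LINEAR16_PCM with a WAV header).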

TextTemplate

Field / Description
text_template (string)
Template text.
Sample: The {animal} goes to the {place}.
variables[] (TextVariable)
Definitions of the variables used in the template text.
Sample: {animal: cat, place: forest}

TextVariable

Field / Description
variable_name (string)
The name of the variable.
variable_value (string)
The text of the variable.
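To make the relationship between TextTemplate and TextVariable concrete, here is a small Python sketch that expands a template locally. It assumes the service replaces each {name} placeholder with the matching variable_value; the function render_template is hypothetical and is not part of the API, since the actual substitution happens on the server.

```python
import re
from typing import Dict, List


def render_template(text_template: str, variables: List[Dict[str, str]]) -> str:
    """Replace each {name} placeholder with the matching variable_value."""
    values = {v["variable_name"]: v["variable_value"] for v in variables}

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            # A placeholder without a TextVariable is treated as an error here.
            raise KeyError(f"no value for template variable {name!r}")
        return values[name]

    return re.sub(r"\{(\w+)\}", substitute, text_template)


template = {
    "text_template": "The {animal} goes to the {place}.",
    "variables": [
        {"variable_name": "animal", "variable_value": "cat"},
        {"variable_name": "place", "variable_value": "forest"},
    ],
}
rendered = render_template(template["text_template"], template["variables"])
print(rendered)  # The cat goes to the forest.
```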

Hints

Field / Description
Hint (oneof: voice, audio_template, speed, volume or role)
A hint for the TTS engine that specifies characteristics of the synthesized audio.
  voice (string)
Name of the speaker to use.
  audio_template (AudioTemplate)
Template for synthesizing.
  speed (double)
Hint to change the speech speed.
  volume (double)
Hint to regulate the normalization level.
  • For MAX_PEAK loudness_normalization_type: volume is in the range (0;1], default value is 0.7.
  • For LUFS loudness_normalization_type: volume is in the range [-145;0), default value is -19.
  role (string)
Hint to specify the pronunciation character for the speaker.
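The volume ranges listed above depend on the normalization type, which is easy to get wrong on the client side. The following Python sketch checks a value before it is sent; the helpers default_volume and is_valid_volume are hypothetical names, and the bounds are copied from the table rather than queried from the service.

```python
def default_volume(loudness_normalization_type: str) -> float:
    """Return the documented default volume for a normalization type."""
    defaults = {"MAX_PEAK": 0.7, "LUFS": -19.0}
    return defaults[loudness_normalization_type]


def is_valid_volume(volume: float, loudness_normalization_type: str) -> bool:
    """Check volume against the documented range for the normalization type."""
    if loudness_normalization_type == "MAX_PEAK":
        return 0.0 < volume <= 1.0        # range (0;1]
    if loudness_normalization_type == "LUFS":
        return -145.0 <= volume < 0.0     # range [-145;0)
    raise ValueError("unknown loudness_normalization_type")


# Each Hints message carries exactly one hint, so volume gets its own entry.
volume_hint = {"volume": default_volume("LUFS")}
```

Because Hint is a oneof, combining several hints (for example voice and speed) means sending several Hints entries in the hints[] list, one per hint.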

AudioTemplate

Field / Description
audio (AudioContent)
Audio file.
text_template (TextTemplate)
Template and description of its variables.
variables[] (AudioVariable)
Description of the variables in the audio.

AudioContent

Field / Description
AudioSource (oneof: content)
The audio source to read the data from.
  content (bytes)
Bytes with audio data.
audio_spec (AudioFormatOptions)
Description of the audio format.

AudioVariable

Field / Description
variable_name (string)
The name of the variable.
variable_start_ms (int64)
Start time of the variable in milliseconds.
variable_length_ms (int64)
Length of the variable in milliseconds.
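As a rough illustration of how variable_start_ms and variable_length_ms map onto the template audio, the Python sketch below converts them into a byte range. It assumes headerless LINEAR16_PCM data (2 bytes per sample) and a single channel; the mono assumption and the helper name variable_byte_range are not taken from the API description.

```python
from typing import Tuple


def variable_byte_range(
    variable_start_ms: int,
    variable_length_ms: int,
    sample_rate_hertz: int,
    bytes_per_sample: int = 2,  # LINEAR16_PCM: 16-bit samples
    channels: int = 1,          # assumption: mono audio
) -> Tuple[int, int]:
    """Return (start, end) byte offsets of a variable in raw PCM data."""
    frame_size = bytes_per_sample * channels
    # Convert milliseconds to whole frames first so offsets stay frame-aligned.
    start_frame = variable_start_ms * sample_rate_hertz // 1000
    end_frame = (variable_start_ms + variable_length_ms) * sample_rate_hertz // 1000
    return start_frame * frame_size, end_frame * frame_size


# A variable that starts at 0.5 s and lasts 1 s in 22050 Hz audio.
start, end = variable_byte_range(500, 1000, 22050)
print(start, end)  # 22050 66150
```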

AudioFormatOptions

Field / Description
AudioFormat (oneof: raw_audio or container_audio)
  raw_audio (RawAudio)
The audio format specified in request parameters.
  container_audio (ContainerAudio)
The audio format specified inside the container metadata.

RawAudio

Field / Description
audio_encoding (enum AudioEncoding)
Encoding type.
  • LINEAR16_PCM: Audio bit depth 16-bit signed little-endian (Linear PCM).
sample_rate_hertz (int64)
Sampling frequency of the signal.
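When raw_audio is requested, the returned bytes carry no header, so most players cannot open them directly. One possible way to handle this on the client is to wrap the samples in a WAV container with Python's standard wave module, as sketched below. The helper pcm_to_wav is hypothetical, and mono output is an assumption that the API description does not state.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate_hertz: int, channels: int = 1) -> bytes:
    """Wrap headerless LINEAR16_PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)          # assumption: mono
        wav_file.setsampwidth(2)                 # 16-bit samples
        wav_file.setframerate(sample_rate_hertz)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of silence at the default 22050 Hz sample rate.
silence = b"\x00\x00" * 22050
wav_bytes = pcm_to_wav(silence, 22050)
```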

ContainerAudio

Field / Description
container_audio_type (enum ContainerAudioType)
  • WAV: Audio bit depth 16-bit signed little-endian (Linear PCM).
  • OGG_OPUS: Data is encoded using the OPUS audio codec and packaged in the OGG container format.
  • MP3: Data is encoded using MPEG-1/2 Layer III and packaged in the MP3 container format.

UtteranceSynthesisResponse

Field / Description
audio_chunk (AudioChunk)
Part of synthesized audio.

AudioChunk

Field / Description
data (bytes)
Sequence of bytes of the synthesized audio in the format specified in output_audio_spec.
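Since UtteranceSynthesis returns a stream of responses, the client has to join the data of every audio_chunk to obtain the full audio. The Python sketch below shows the idea with responses modeled as plain dicts and a stand-in generator (fake_stream) in place of a real gRPC call; both are assumptions made for illustration, and with generated stubs the fields would be read as attributes instead.

```python
from typing import Any, Dict, Iterable, Iterator


def collect_audio(responses: Iterable[Dict[str, Any]]) -> bytes:
    """Concatenate audio_chunk.data from every streamed response, in order."""
    return b"".join(response["audio_chunk"]["data"] for response in responses)


def fake_stream() -> Iterator[Dict[str, Any]]:
    """Stand-in for the server stream; yields three small chunks."""
    for part in (b"RIFF", b"....", b"WAVE"):
        yield {"audio_chunk": {"data": part}}


audio = collect_audio(fake_stream())
print(audio)  # b'RIFF....WAVE'
```

For long utterances it may be preferable to write each chunk to a file as it arrives instead of holding the whole result in memory.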