Disclaimer
We highly encourage you to use the API through our SDK. If you still want to make requests on your own, there is a caveat to remember. Most JavaScript tools for making HTTP requests interpret the server's response as plaintext or JSON by default, but our TTS/VC endpoints return binary data, and you have to take that into account. In the case of Axios, configure the responseType as arraybuffer. With Fetch, use the response.arrayBuffer() method. For other tools, please refer to the corresponding documentation.
If you use our SDK, you don't have to worry about this.
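As an illustration, here is a minimal sketch of reading a binary body with Fetch (Node.js >= 18 or a modern browser). Since the actual endpoint URL is not shown here, a locally constructed Response stands in for a real TTS/VC reply:

```typescript
// Reads a binary response body without corrupting the audio bytes.
// text() or json() would mangle them; arrayBuffer() keeps them intact.
async function readBinaryBody(response: Response): Promise<ArrayBuffer> {
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}

// A locally built Response stands in for a real server reply here:
const fakeReply = new Response(new Uint8Array([0x52, 0x49, 0x46, 0x46])); // "RIFF"
readBinaryBody(fakeReply).then((audio) => {
  console.log(audio.byteLength); // 4
});
```

The same pattern applies to any Fetch call against the TTS/VC endpoints: check response.ok first, then read the body as an ArrayBuffer.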
Security of using the SDK in browser
The ClientKey and APIKey required to authenticate the SDK are meant to be kept private.
Packaging them with a frontend application effectively leaks them publicly and allows anyone to use your gemelo.ai account directly.
We strongly advise proxying those requests through your own backend, or using the browser SDK for internal tools only.
Requirements
The JS/TS SDK works in all modern browsers and in Node.js >= 18.0.0. We support both CommonJS and ES Modules.
Installation
yarn add @charactr/api-sdk
SDK initialization
Before you use the SDK, you have to initialize it with your account's credentials. To get them, please log in to the Client Panel.
const sdk = new CharactrAPISDK({ ClientKey: "", APIKey: "" });
await sdk.init();
Text to speech
Getting available voices
const voices = await sdk.tts.getVoices();
Refer here for the response type.
You can also browse the voices more conveniently in the Client Panel.
Making TTS requests
const result = await sdk.tts.convert(voiceId, "Hello world");
Refer here for the response type.
TTS Simplex Streaming
Simplex streaming allows you to send text to the server and receive the converted audio response as a stream over WebSockets. It greatly reduces latency compared to the traditional REST endpoint, and it's not limited by the text's length. To use it, call the sdk.tts.convertStreamSimplex method:
try {
  await sdk.tts.convertStreamSimplex(voiceId, text, {
    onData: (data: ArrayBuffer) => {
      // do anything with the incoming binary data
    },
  });
} catch (error) {
  console.error(error);
}
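The onData callback delivers the audio in chunks. If you need the whole clip at the end (for example, to build a single Blob), a common pattern is to accumulate the chunks yourself. The helper below is our own sketch, not part of the SDK:

```typescript
// Accumulates ArrayBuffer chunks (as delivered to onData) and joins them.
function createChunkCollector() {
  const chunks: Uint8Array[] = [];
  return {
    // pass this as the onData callback
    onData: (data: ArrayBuffer) => {
      chunks.push(new Uint8Array(data));
    },
    // call once the stream has finished to get the full audio
    concat(): Uint8Array {
      const total = chunks.reduce((sum, c) => sum + c.byteLength, 0);
      const joined = new Uint8Array(total);
      let offset = 0;
      for (const c of chunks) {
        joined.set(c, offset);
        offset += c.byteLength;
      }
      return joined;
    },
  };
}
```

With this in place, you could pass collector.onData as the onData callback to convertStreamSimplex and read collector.concat() once the call resolves.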
TTS Duplex Streaming
This more advanced technique allows you to start a two-way, WebSockets-based stream, where you can asynchronously send text and receive the converted audio. You control when the stream ends.
To use it, call the sdk.tts.convertStreamDuplex method:
async convertStreamDuplex(
  voice: number | Voice,
  cb: TTSStreamDuplexCallbacks
): Promise<TTSStreamDuplex>
You may pass callbacks to know when the stream ends/errors and to receive data:
export interface TTSStreamDuplexCallbacks {
  onData?: (data: ArrayBuffer) => void;
  onClose?: (event: CloseEvent) => void;
}
The method also returns a set of functions to control the stream:
export interface TTSStreamDuplex {
  /**
   * sends the text to be converted into audio to the server
   */
  convert: (text: string) => void;
  /**
   * resolves after 5 seconds of stream inactivity
   */
  wait: () => Promise<void>;
  /**
   * terminates the WebSocket connection immediately;
   * in most use cases we advise using the close() method instead
   */
  terminate: () => void;
  /**
   * requests the server to close the connection gracefully
   */
  close: () => void;
}
Example usage:
const stream = await sdk.tts.convertStreamDuplex(voiceId, {
  onData: (data: ArrayBuffer) => {
    // do anything with the incoming binary data
  },
  onClose: (event: CloseEvent) => {
    if (event.code === 1000) {
      // the stream ended successfully
    } else {
      // the stream was closed because of an error
    }
  },
});
stream.convert(text);
stream.close();
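If it helps to see the whole contract in motion without a live connection, here is a toy in-memory stand-in for TTSStreamDuplex. It is entirely our own illustration, not the SDK's implementation: it echoes the text's bytes to onData instead of real audio, and shortens the 5-second inactivity window behind wait() to a few milliseconds:

```typescript
// Toy stand-in for TTSStreamDuplex (illustration only, not the real SDK).
function createToyDuplexStream(
  cb: {
    onData?: (data: ArrayBuffer) => void;
    onClose?: (event: { code: number }) => void;
  },
  inactivityMs = 10 // the real stream waits 5 seconds
) {
  let lastActivity = Date.now();
  let closed = false;
  return {
    convert(text: string) {
      lastActivity = Date.now();
      // echo the text's bytes where the real stream would send audio
      cb.onData?.(new TextEncoder().encode(text).buffer as ArrayBuffer);
    },
    wait(): Promise<void> {
      return new Promise((resolve) => {
        const timer = setInterval(() => {
          if (closed || Date.now() - lastActivity >= inactivityMs) {
            clearInterval(timer);
            resolve();
          }
        }, 1);
      });
    },
    close() {
      closed = true;
      cb.onClose?.({ code: 1000 }); // graceful close, like the real stream
    },
    terminate() {
      closed = true;
      cb.onClose?.({ code: 1006 }); // abnormal closure
    },
  };
}

// The call pattern mirrors the real example above:
const toy = createToyDuplexStream({
  onData: (data) => console.log("received", data.byteLength, "bytes"),
  onClose: (event) => console.log("closed with code", event.code),
});
toy.convert("Hello world");
await toy.wait(); // resolves once the toy stream has been idle long enough
toy.close(); // prints: closed with code 1000
```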
Voice Conversion
Getting available voices
const voices = await sdk.vc.getVoices();
Refer here for the response type.
You can also browse the voices more conveniently in the Client Panel.
Making VC requests
const inputAudioBlob = new Blob(); // your input audio recording
const result = await sdk.vc.convert(voiceId, inputAudioBlob);
Refer here for the response type.
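The empty Blob above is only a placeholder; in practice you wrap real audio bytes, for example from a MediaRecorder or a file input. A minimal sketch (the bytes here are dummies, not a real audio file):

```typescript
// Hypothetical: wrap raw recorded audio bytes in a typed Blob for the VC request.
const wavBytes = new Uint8Array([0x52, 0x49, 0x46, 0x46]); // placeholder bytes
const audioBlob = new Blob([wavBytes], { type: "audio/wav" });
console.log(audioBlob.size, audioBlob.type); // 4 audio/wav
```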
Voice Cloning
The voice cloning API allows you to create voice clones from audio, list all created clones, rename and delete them, and use them with the TTS & VC endpoints.
The only notable difference is that the voiceType option needs to be set to cloned when making the convert request:
{ voiceType: "cloned" }
For more, see the relevant code example: Full Voice Cloning Example
Examples
Our SDK comes with a bunch of examples to help you start working with it. Please refer to the SDK's README if you wish to run them.