How to create a dialogue system using the OpenAI and Charactr APIs?
The main idea is to send three requests:
- sending an audio sample to Whisper (ASR) to convert speech to text,
- then sending this text to ChatGPT to get a text response,
- finally sending this response to gemelo.ai TTS to convert text to speech and get a voice response.
This is one iteration of the dialogue. To have a longer conversation, we can repeat these three steps many times and collect text responses to keep context. We can implement such a dialogue system in a few small steps:
- First, install the charactr-api-sdk library to use the gemelo.ai API for TTS, and the openai library to use the OpenAI API for Whisper and ChatGPT.
pip install charactr-api-sdk openai
Also install the other required libraries.
pip install requests ipywebrtc ipython
- Load all required libraries.
import json
from typing import Dict, List
import IPython.display
import openai
import requests
from charactr_api import CharactrAPISDK, Credentials
from ipywebrtc import AudioRecorder, CameraStream
- Set up your keys for OpenAI and gemelo.ai API so that you can connect to them with your accounts.
openai_api_key = 'xxxx'
charactr_client_key = 'yyyy'
charactr_api_key = 'zzzz'
openai.api_key = openai_api_key
- Create a CharactrAPISDK instance using your gemelo.ai keys. It allows you to use TTS and choose a voice.
credentials = Credentials(client_key=charactr_client_key, api_key=charactr_api_key)
charactr_api = CharactrAPISDK(credentials)
- Check the list of available voices to choose one of them.
charactr_api.tts.get_voices()
You get the list of available voices as output. Choose a voice and set the voice_id variable; this voice will be used to generate voice responses.
voice_id = 136
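Instead of hard-coding the id, you can look it up by name. This is a sketch that assumes each entry returned by get_voices() is a dict with at least 'id' and 'name' keys (check the actual response shape in your environment); the sample voice list below is made up for illustration.

```python
def find_voice_id(voices, name_fragment):
    """Return the id of the first voice whose name contains the given
    fragment (case-insensitive), or None if no voice matches."""
    for voice in voices:
        if name_fragment.lower() in voice['name'].lower():
            return voice['id']
    return None

# Hypothetical voice list for illustration; in practice use
# charactr_api.tts.get_voices() as its source.
voices = [
    {'id': 112, 'name': 'Bella'},
    {'id': 136, 'name': 'Thomas'},
]
voice_id = find_voice_id(voices, 'thom')  # -> 136
```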
- Set up a model and other parameters for ChatGPT. You can find the list of all parameters and their explanations here: https://platform.openai.com/docs/api-reference/chat/create
model = 'gpt-3.5-turbo'
parameters = {
    'temperature': 0.8,
    'max_tokens': 150,
    'top_p': 1,
    'presence_penalty': 0,
    'frequency_penalty': 0,
    'stop': None
}
Using the above parameters, define a function that generates a text response with ChatGPT based on the conversation history (context) and the current request, and appends both the request and the response to the conversation.
def update_conversation(request: str, conversation: List[Dict]) -> None:
    """
    Run a request to ChatGPT to get a response
    and update the conversation.
    """
    user_request = {'role': 'user', 'content': request}
    conversation.append(user_request)
    result = openai.ChatCompletion.create(model=model,
                                          messages=conversation,
                                          **parameters)
    response = result['choices'][0]['message']['content'].strip()
    bot_response = {'role': 'assistant', 'content': response}
    conversation.append(bot_response)
Next, describe the bot and set up the conversation variable.
conversation = [{'role': 'system', 'content': 'You are AI Assistant.'}]
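The conversation list grows with every turn, and ChatGPT models have a limited context window. One simple mitigation is to trim old turns while always keeping the system message. The helper below is a hypothetical sketch using a rough character budget rather than real token counting (for exact counts you would use a tokenizer such as tiktoken).

```python
from typing import Dict, List

def trim_conversation(conversation: List[Dict], max_chars: int = 8000) -> List[Dict]:
    """Keep the system message plus the most recent messages whose total
    content length fits within a rough character budget (an approximation
    of the model's token limit, not an exact count)."""
    system, turns = conversation[:1], conversation[1:]
    kept: List[Dict] = []
    total = 0
    # Walk backwards from the newest message, keeping turns until the
    # budget is exceeded.
    for message in reversed(turns):
        total += len(message['content'])
        if total > max_chars:
            break
        kept.append(message)
    return system + list(reversed(kept))
```

You would call trim_conversation(conversation) before each ChatGPT request once the dialogue gets long.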
- Define a function that runs a request to Whisper to convert speech to text. You can find the list of all parameters and their explanations here: https://platform.openai.com/docs/api-reference/audio
def speech2text(audio_path: str) -> str:
    """Run a request to Whisper to convert speech to text."""
    with open(audio_path, 'rb') as audio_f:
        result = openai.Audio.transcribe('whisper-1', audio_f)
    return result['text']
- Create an audio recorder in a notebook.
camera = CameraStream(constraints={'audio': True, 'video': False})
recorder = AudioRecorder(stream=camera)
recorder
Use the record button in the notebook to record your request. Then save your recording.
temp_path = 'recording.webm'
with open(temp_path, 'wb') as f:
    f.write(recorder.audio.value)
- Generate a voice response to your recording with Whisper, ChatGPT and gemelo.ai TTS.
# convert the recorded audio to text
input_text = speech2text(temp_path)
# generate a text response and update conversation
update_conversation(input_text, conversation)
# convert a text response to speech
tts_result = charactr_api.tts.convert(voice_id, conversation[-1]['content'])
It returns an Audio object with the fields data, type, duration_ms and size_bytes. To listen to the output voice response in a notebook, run:
IPython.display.Audio(tts_result['data'])
You can also save the output voice response as a file.
with open('output.wav', 'wb') as f:
    f.write(tts_result['data'])
To continue the conversation, repeat the last two steps: record a new request and generate a voice response.
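One full iteration of the dialogue can also be wrapped in a single function. The sketch below keeps the orchestration testable by passing the three services as callables; in the actual notebook you would pass speech2text for asr, a small wrapper around the ChatGPT call for chat, and a wrapper around charactr_api.tts.convert (returning the audio bytes) for tts. The function and parameter names are illustrative, not part of either API.

```python
from typing import Callable, Dict, List

def dialogue_turn(
    audio_path: str,
    conversation: List[Dict],
    asr: Callable[[str], str],          # e.g. speech2text
    chat: Callable[[List[Dict]], str],  # wraps the ChatGPT request
    tts: Callable[[str], bytes],        # wraps charactr_api.tts.convert
) -> bytes:
    """Run one dialogue iteration: speech -> text -> response -> speech.

    Appends the transcribed request and the generated response to the
    conversation, and returns the synthesized audio bytes."""
    request = asr(audio_path)
    conversation.append({'role': 'user', 'content': request})
    response = chat(conversation)
    conversation.append({'role': 'assistant', 'content': response})
    return tts(response)
```

Repeating calls to dialogue_turn with the same conversation list keeps the context across iterations.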