Building a ChatGPT-based AI Assistant with Python using OpenAI APIs
ChatGPT has unveiled a world of possibilities ahead of us in the age of AI. Its rate of adoption rattled giants like Google. In this rising generation of AI, we will witness paradigm shifts and far-reaching consequences in almost all spheres of human life. But this age will also allow us to leverage AI to improve human lives.
As scary as it can get for some people, ChatGPT can also be put to use in fun new ways. This article is a practical demonstration of one of them, combining the power of ChatGPT, NLP, STT and TTS :).
This article demonstrates a workflow for integrating multiple AI services to perform speech-to-text (STT), natural language processing (NLP), and text-to-speech (TTS) using OpenAI’s ChatGPT and Whisper APIs in Python.
Table of Contents
- The OpenAI APIs
- Setting Up
- Recognize the Speech
- Listen to the Whispers
- The Completions
- Speak Up
- The Assembly Line
The OpenAI APIs
OpenAI exposes a set of APIs to interact with its GPT models. For example, earlier this month, they launched an API endpoint for chat completions with the ChatGPT model. This opens up a wide variety of applications: we can directly call the ChatGPT model and embed its answers in our own applications. OpenAI has also recently launched the Whisper API, which converts speech to text.
In this article, we will leverage the chat completions API using the ChatGPT model and Whisper API to convert speech to text.
Setting Up
Set up shop properly. To get the job done.
Our script starts by declaring a few variables, such as openai_url and openai_token, to set up a connection to the OpenAI API using an API token stored in the environment variable OPENAI_API_TOKEN. The script exits with an error code if the token is not set.
The API token is passed into each request as an Authorization header, as shown in the code below:
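A minimal sketch of this setup (the original's exact layout isn't shown; build_headers is a helper name I chose for illustration):

```python
import os
import sys

# Base URL for the OpenAI REST API.
openai_url = "https://api.openai.com/v1"

def build_headers() -> dict:
    """Read the API token from the environment and build the request headers."""
    token = os.environ.get("OPENAI_API_TOKEN")
    if token is None:
        # Exit with an error code if the token is not set.
        sys.exit("Please set the OPENAI_API_TOKEN environment variable.")
    # The token is sent as a Bearer token in the Authorization header.
    return {"Authorization": f"Bearer {token}"}
```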
Recognize the Speech
The spoken word and its absence are essential. In many ways.
The intent is to leverage the microphones present on our pristine computational machines. We capture the spoken word into a WAV-format audio file.
The script prompts the user to speak into their microphone and records the audio using the SpeechRecognition library. Finally, the audio is saved to a WAV file.
Listen to the Whispers
Whisper is powerful. There is a reason it can’t be said out loud.
We intend to leverage the OpenAI Whisper API to transcribe our audio file to text. The model used by OpenAI to perform transcriptions is labelled whisper-1.
Our script sends a POST request to the Whisper API with the audio file as data. The API performs speech-to-text (STT) on the audio and returns the transcribed text.
We get back a response containing a text key whose value is the transcription.
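A sketch of that request, assuming the requests library (the original may structure the call differently):

```python
import requests

WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"

def transcribe(wav_path: str, token: str) -> str:
    """Send the WAV file to the Whisper API and return the transcribed text."""
    with open(wav_path, "rb") as f:
        response = requests.post(
            WHISPER_URL,
            headers={"Authorization": f"Bearer {token}"},
            files={"file": f},              # multipart upload of the audio
            data={"model": "whisper-1"},    # the transcription model
        )
    response.raise_for_status()
    # The JSON body contains a "text" key with the transcription.
    return response.json()["text"]
```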
The Completions
We all need completions. I meant closures.
Here we call the OpenAI chat completions endpoint with the transcribed text. To use the ChatGPT model, we pass the model's name, gpt-3.5-turbo, in the request body.
The script sends another POST request to the OpenAI API with the transcribed text as data. The API uses the ChatGPT model to perform NLP on the text and returns a response.
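That call can be sketched like this; build_payload and chat are helper names I chose, not necessarily the original's:

```python
import requests

CHAT_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(prompt: str) -> dict:
    """Build the chat completions request body for the ChatGPT model."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, token: str) -> str:
    """Send the transcribed text to the chat completions API and return the reply."""
    response = requests.post(
        CHAT_URL,
        headers={"Authorization": f"Bearer {token}"},
        json=build_payload(prompt),
    )
    response.raise_for_status()
    # The reply text lives under choices[0].message.content.
    return response.json()["choices"][0]["message"]["content"]
```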
Finally, we have the response from ChatGPT.
Speak Up
We have to speak up. At some point.
Now we need to play the response back to the user. So, this is how we do it.
Our script uses the pyttsx3 library to convert the NLP response to speech and plays the audio output through the user's speakers.
We have all experienced text-based interactions with ChatGPT. Our approach here is more conversational in nature, like Alexa. We could easily extend the script to go deeper by asking follow-up questions.
The Assembly Line
All complex things are assembled. Sometimes in unlikely places.
Libraries feed curious minds. The ones with books.
I performed this setup on a MacBook Pro, where getting the audio recording to work required an extra system dependency in addition to the Python libraries used by the code. Install the dependencies using the following commands:
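A typical installation, assuming the system dependency was PortAudio (which PyAudio needs on macOS); the original's exact package list is not shown:

```shell
# System dependency for microphone capture via PyAudio on macOS
brew install portaudio

# Python libraries used by the script
pip install SpeechRecognition PyAudio pyttsx3 requests
```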
Knowing how to use things is a talent. Knowing when is timing.
To use the script, we simply run it, making sure the dependencies are installed first.
After running the script, we get a message on the terminal stating, “Say Something!”. When the prompt hits the terminal, we come up with a question and speak up. A pause is detected to stop the recording.
After that, the recording is stored as a WAV file. The audio file is then passed to the OpenAI Whisper transcription API, which returns a transcription response.
The transcribed text is used to query the OpenAI chat completions API, leveraging the ChatGPT model. The endpoint returns a JSON response containing ChatGPT's reply.
Finally, this text response is converted to audio via text-to-speech (TTS), which you can hear from the speakers.
Conclusion
Conclusions are an invitation to continue. That’s how I see them.
This code demonstrates how multiple AI services can be integrated to create a more complex application. In this example, the user’s spoken input is transcribed to text using STT, analyzed using NLP, and the response is converted to speech using TTS. This workflow can be adapted and extended to create more sophisticated applications with many use cases. You can find the code for this article in the GitHub repo The Speaking ChatGPT.
This article is a dedication to my boy Mursaleen (Mursi) on his 2nd birthday. Happy Birthday! 🎉