How ChatGPT streams responses back to the user
And learn how you can build it too using server-sent events (SSE).
Note October 2024: This is an old post published shortly after ChatGPT launched and many ideas found here are no longer applicable. If you’re curious about implementing a similar mechanism, you can check out better guides out there, such as Vercel’s AI hooks system.
I was curious to learn how ChatGPT achieves its “streaming” effect when it sends responses from the model back to the user. I’m referring to the cursor-driven effect that neatly “types” the model’s response in the chat interface:
Like any other practical hacker, I started by investigating the network requests tab in Chrome in order to better understand what’s happening when a user sends a message to the OpenAI server and ChatGPT responds. I expected to see some sort of simple HTTP requests or even a web-socket, but I was surprised to see neither.
As you can see in the image above, ChatGPT’s frontend communicates with the following APIs:
/auth/session: Manages basic authentication.
/conversation/<unique conversation ID>: An endpoint that accepts HTTP GET requests and returns an existing conversation's history.
/conversations?offset=0&limit=20: A paginated HTTP GET endpoint that retrieves a set of past conversations and their metadata.
/models: Returns the name and ID of the models used with the current session.
/conversation: An endpoint that accepts HTTP POST requests from the user with the latest message, a conversation_id, the model being used, and a parent_message_id.
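Putting those fields together, the POST body sent to /conversation looks roughly like this (a hand-written sketch based only on the fields listed above; the exact field names and structure are my guesses, not a capture of the real request):

```json
{
  "conversation_id": "<unique conversation ID>",
  "parent_message_id": "<ID of the message being replied to>",
  "model": "<name of the model being used>",
  "message": "<the user's latest message>"
}
```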
These APIs look pretty standard, but where is the response from the model? Given these network requests, I expected the request to /conversation to return an answer, but that was not the case. Notice how there is no “response” section for the selected request in the screenshot above.
This surprised me, so I assumed that ChatGPT was using web-sockets instead. However, when I looked for web-socket-related network requests in Chrome, I found nothing. Things were getting really confusing.
I started examining the response from each endpoint hoping to find something useful. I even considered that the server was returning responses in a base64 encoding, but I could find nothing that actually contained the response!
Server-sent events (SSE)
After about 15 more minutes of Googling and poking around, I realized that the ChatGPT server uses something called “server-sent events” (SSE) to send data back to the client. What tipped me off was the “EventStream” section in the /conversation
endpoint. While the EventStream section was empty, it made me consider that something else was at play, and after a couple of searches, I learned that there was another way to send data to the client that did not include regular HTTP responses or web-sockets.
SSE is a very simple concept. It’s a one-way real-time connection from the server to the client that allows the server to push data to the client. This is similar to web sockets, but without being bidirectional (i.e. clients can’t respond). I had never heard of SSE before, but it looked like a very handy piece of web technology, so I decided to learn how to use it.
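For reference, the wire format itself is plain text: the server keeps a single HTTP response open with the Content-Type header set to text/event-stream, and writes each event as one or more `data:` lines terminated by a blank line. A generic stream (not ChatGPT's actual payloads, which I didn't capture) looks like this:

```
data: The first chunk of the reply

data: The next chunk

```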
Setting up server-sent events is fairly easy. Since SSE is part of the HTTP standard, you don’t need a lot of tooling. You just need to add a listener on the front-end and a backend endpoint to send events from.
Here’s what the front-end code looks like for a basic implementation:
// JavaScript
// Create an EventSource connected to your streaming endpoint.
const source = new EventSource("/sse");
// Log each message the server pushes
source.onmessage = (event) => {
console.log(`event: ${event.data}`);
};
Setting up the backend code is equally simple. Here’s an example using Python and Flask:
# Python and Flask
import time
from flask import Flask, Response

app = Flask(__name__)
messages = []  # Shared message store; in practice this would be a database

@app.route("/sse")
def stream():
    def event_stream():
        sent = 0
        while True:
            # Poll the store and see if there's a new message
            if len(messages) > sent:
                # Each SSE event is a "data: ..." line followed by a blank line
                yield "data: {}\n\n".format(messages[sent])
                sent += 1
            else:
                time.sleep(0.1)
    return Response(event_stream(), mimetype="text/event-stream")
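Outside the browser, the same format can be parsed by hand. Here is a minimal sketch of an event-stream parser in plain Python, showing just the `data:`-line and blank-line framing from the SSE spec (this is not how ChatGPT's client works, and `parse_sse` is a made-up helper name):

```python
def parse_sse(stream_text):
    """Split a text/event-stream payload into its 'data' payloads."""
    events = []
    data_lines = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # The spec strips a single leading space after the colon.
            if payload.startswith(" "):
                payload = payload[1:]
            data_lines.append(payload)
        elif line == "":
            # A blank line marks the end of the current event.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
    return events

print(parse_sse("data: Hello\n\ndata: world\n\n"))  # ['Hello', 'world']
```

In a real client you would read these lines incrementally from the open HTTP response instead of from a string; the framing logic stays the same.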
That’s it! Now you can use SSE too, like ChatGPT.
Hey great post. I was curious about this too, and started exploring the network tab. They definitely use websockets. Make sure you click on preserve logs. Open a new tab without opening chatgpt yet, then go to inspect and make sure your preserve logs checkbox is checked. Now open chatgpt. You'll see 3 WS connections.
But you are right about the SSE and one-direction communication, I guess, because that's what confused me a lot and got me googling until I found your article. In the /ws logs, I don't see the stream of tokens. Not even on XHR/Fetch. I see a POST request with my questions, and I see incoming streaming on /ws, which didn't have the tokens explicitly. Not even encoded ones! I'm so confused
Hi Theodor.
Loved your blog and explanation.
Can you please do a deep dive into websocket implementation too?
Side note: I am using enterprise ChatGPT, which is on HTTP/1.1. There I am still seeing server-side streaming being used, while on personal ChatGPT I think they are using websockets with HTTP/3. I tried a lot but was not able to see the message chunks being received in the network tab.