The journey of documenting a Socket.IO API (Pt 2)

Dimitrios Dedoussis

This post originally appeared on https://dedouss.is

In the opening part of this series we outlined the basics of Socket.IO and discussed the importance of documenting Socket.IO APIs. Now it’s time to bring AsyncAPI into play.

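In this post we’re going to cover how to model the Socket.IO protocol using AsyncAPI, how to write an AsyncAPI spec for an existing Socket.IO API, and Asynction, a Socket.IO framework driven by the AsyncAPI specification.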

Modelling the Socket.IO protocol using AsyncAPI

Don’t let the title of this section intimidate you. This modelling exercise ended up being relatively straightforward and I think it makes a great example of how AsyncAPI was designed to fit any event-driven protocol. If you are not interested in the thought process behind this exercise, you may jump straight to the Summary paragraph of this section, which presents the solution.

I will approach this problem by traversing the AsyncAPI object structure, attempting to map each of the objects to a semantic of the Socket.IO client API.

The root object of the specification is the AsyncAPI Object. The fields of this object that require special attention are channels and servers.

Channels

The Channels Object is a map structure that relates a channel path (relative URI) to a Channel Item Object.

channels:
  /: {} # Channel Item Object
  /admin: {} # Channel Item Object

Channels are addressable components through which messages/events flow. The specification suggests that a server may support multiple channel instances, enabling an application to separate its concerns. This sounds very much like the definition of the Socket.IO namespace. Namespaces are indeed addressable components that follow the relative URI convention. Since Socket.IO supports multiplexing, a client may emit messages to multiple namespaces over a single shared connection. However, it could also force a separate connection per namespace (using the forceNew option). Thus, a Socket.IO namespace could be either a virtual or a physical channel.
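As a rough illustration of the virtual versus physical distinction, here is a minimal sketch of a client opening two namespaces. It assumes that the io function exported by socket.io-client is passed in by the caller; the connectNamespaces helper, its dedicatedAdminConnection flag and the base URL are hypothetical names introduced for this example.

```javascript
// Hypothetical helper: opens the main and /admin namespaces.
// `io` is expected to be the function exported by socket.io-client,
// injected here so that the sketch does not depend on a particular setup.
function connectNamespaces(io, baseUrl, { dedicatedAdminConnection = false } = {}) {
  // Main namespace: uses the shared (multiplexed) connection.
  const main = io(`${baseUrl}/`);

  // Admin namespace: multiplexed over the same connection by default
  // (a "virtual" channel). With forceNew it gets its own connection
  // (a "physical" channel).
  const admin = io(`${baseUrl}/admin`, { forceNew: dedicatedAdminConnection });

  return { main, admin };
}
```

With the real client, this would be called along the lines of connectNamespaces(require("socket.io-client").io, "https://example.com"), where the host is a placeholder.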

Given that connections are established on the namespace level, the Channel Item Object is the only object of the specification that MAY include bindings. For a Socket.IO API, the Channel Bindings Object should only contain the ws field, in which one can specify the handshake context (HTTP headers and query params) that a client should provide when connecting to that particular channel/namespace.

channels:
  /:
    publish: {} # Operation object - Ignore this for now
    subscribe: {} # Operation object - Ignore this for now
    bindings:
      ws:
        query:
          type: object
          properties:
            token:
              type: string
          required: [token]

Since a single connection (and thus binding) is going to be used across multiple channels, there is no need to repeat the same bindings object under each channel/namespace. We can introduce the convention of always including bindings under the main (/) namespace but omitting them under the custom ones. At this point I would also like to propose the following bonus semantic: If a custom namespace includes bindings, then the client should always force a new connection when connecting to it.
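The convention and the bonus semantic proposed above could be sketched roughly as follows. The connectionOptionsFor helper and the exampleChannels object are hypothetical; the latter mirrors the shape of the Channels Object as a plain JavaScript object.

```javascript
// Hypothetical helper implementing the proposed convention:
// bindings on the main namespace describe the shared connection, while
// bindings on a custom namespace imply a dedicated (forced) connection.
function connectionOptionsFor(channels, namespace) {
  const channel = channels[namespace];
  if (!channel) {
    throw new Error(`Namespace ${namespace} is not documented`);
  }
  const isCustom = namespace !== "/";
  const hasBindings = channel.bindings !== undefined;
  return { forceNew: isCustom && hasBindings };
}

// A channels map in the shape of the AsyncAPI Channels Object.
// The /public namespace is made up for this example.
const exampleChannels = {
  "/": { bindings: { ws: { query: { type: "object" } } } },
  "/admin": { bindings: { ws: { query: { type: "object", required: ["token"] } } } },
  "/public": {},
};
```

Here connectionOptionsFor(exampleChannels, "/admin") asks for a new connection, whereas the main namespace and the binding-free /public namespace share the existing one.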

You have probably noticed that I chose to stick to the WebSockets Channel Binding as the only possible binding that a Socket.IO API may define. One could ask why not use an HTTP Channel Binding object alongside the WebSockets one, since the protocol could also be implemented via HTTP long-polling. There are 2 answers to this question:

  1. At the time of writing, the latest version of the AsyncAPI bindings specifications does not allow HTTP bindings to be defined at the channel level.
  2. The HTTP long-polling implementation of Socket.IO is essentially a pseudo WebSocket. It is implemented in such a way to resemble the WebSocket implementation. The same HTTP headers and query params are sent to the server no matter the transport mechanism.

Hence, it is safe to use the ws bindings even for the HTTP long-polling fallback. However, in an ideal world, we would have AsyncAPI supporting Socket.IO bindings through an explicit socketio field. In fact, I have created a GitHub issue to pitch this proposal.

Along with bindings, the Channel Item Object includes the publish and subscribe fields, in which one defines the operations that a namespace supports. The publish Operation Object lists all the possible events that the client may emit (socket.emit), while the subscribe operation defines the events that the client may listen to (socket.on).

A Socket.IO event can be expressed using the Message Object, where the name field describes the eventName and the payload field describes the schema of the args that the client passes as part of the socket.emit invocation: socket.emit(eventName[, ...args][, ack]). For subscribe events, payload defines the structure of the arguments that the event handler callback expects: socket.on(eventName, (...args) => {}).

The structure of the payload value depends on the number of arguments expected:

Scenario: No args expected
Sender-side code: socket.emit("hello")
Payload value structure: n/a (the payload field should be omitted)
AsyncAPI Message Object:

name: hello

Scenario: Single arg expected
Sender-side code: socket.emit("hello", {foo: "bar"})
Payload value structure: Any type other than tuple
AsyncAPI Message Object:

name: hello
payload:
  type: object
  properties:
    foo:
      type: string

Scenario: Multiple args expected
Sender-side code: socket.emit("hello", {foo: "bar"}, 1)
Payload value structure: Tuple type
AsyncAPI Message Object:

name: hello
payload:
  type: array
  prefixItems:
  - type: object
    properties:
      foo:
        type: string
  - type: number
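The three scenarios above can be sketched as a small function that turns a payload value into the arguments following the event name in socket.emit. The payloadToArgs helper is hypothetical, and it assumes that a tuple is recognised by an array type carrying prefixItems, as in the examples above.

```javascript
// Hypothetical helper: turns a payload value into the list of arguments
// that follow the event name in socket.emit(eventName, ...args).
function payloadToArgs(message, payload) {
  const schema = message.payload;
  // No args expected: the payload field is omitted from the Message Object.
  if (schema === undefined) {
    return [];
  }
  // Multiple args expected: the payload is a tuple, one item per argument.
  const isTuple = schema.type === "array" && Array.isArray(schema.prefixItems);
  if (isTuple) {
    return [...payload];
  }
  // Single arg expected: the payload is passed as is (even if it is a list).
  return [payload];
}
```

A sender would then call something like socket.emit(message.name, ...payloadToArgs(message, payload)).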

To account for multiple events (Message Objects) per namespace, the message field of each Operation Object allows the oneOf array structure. For example, in the message of the publish operation of the /admin namespace, the oneOf array lists all the available eventName and args payload pairs that a client can pass to the adminNamespace.emit call:

channels:
  /admin:
    publish:
      message:
        oneOf:
          - $ref: "#/components/messages/MessageOne"
          - $ref: "#/components/messages/MessageTwo"
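To make the oneOf structure more tangible, here is a minimal sketch that lists the event names an operation documents. The resolveRef and eventNames helpers are hypothetical, the spec is written as a plain JavaScript object, and the event names under components are made up. The reference resolution only handles simple local references of the "#/a/b/c" form.

```javascript
// Hypothetical helper: resolves a local "#/..." reference against the spec.
function resolveRef(spec, ref) {
  return ref
    .replace(/^#\//, "")
    .split("/")
    .reduce((node, key) => node[key], spec);
}

// Hypothetical helper: lists the event names that an operation documents,
// whether its message field holds a single message or a oneOf array.
function eventNames(spec, namespace, operation) {
  const message = spec.channels[namespace][operation].message;
  const candidates = message.oneOf ?? [message];
  return candidates
    .map((m) => (m.$ref ? resolveRef(spec, m.$ref) : m))
    .map((m) => m.name);
}

const spec = {
  channels: {
    "/admin": {
      publish: {
        message: {
          oneOf: [
            { $ref: "#/components/messages/MessageOne" },
            { $ref: "#/components/messages/MessageTwo" },
          ],
        },
      },
    },
  },
  components: {
    messages: {
      // The event names below are made up for this example.
      MessageOne: { name: "first event" },
      MessageTwo: { name: "second event" },
    },
  },
};
```

Calling eventNames(spec, "/admin", "publish") yields the names of the events that a client may pass to adminNamespace.emit.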

Now, let’s move on to the acknowledgement semantics of the protocol: The basic unit of information in the Socket.IO protocol is the packet. There are 7 distinct packet types. The payloads of the publish and subscribe Message Objects described above correspond to the EVENT and BINARY_EVENT packet types. These are essentially the packets that are transmitted when the Socket.IO sender invokes the emit API function of the Socket.IO library (regardless of implementation). In turn, the Socket.IO event receiver handles the received event using the on API function of the Socket.IO library. As part of the on handler, the receiver may choose to return an acknowledgement of the received message. This acknowledgement is conveyed back to the sender via the ACK and BINARY_ACK packet types. The ack data is passed as input to the callback that the message sender has provided through the emit invocation.

Socket.IO ack sequence diagram

In order to express the above semantics, the Message Object (eventName and args payload pair) should be linked to an optional acknowledgement object. Since the specification in its current form does not support such a structure, I am proposing the following Specification Extension:

  • Message Objects MAY include the x-ack field. The value of this field SHOULD be a Message Ack Object.
  • Components Object MAY include the x-messageAcks field. The value of this field should be of type: Map[string, Message Ack Object | Reference Object].

Message Ack Object

| Field Name | Type | Description |
| --- | --- | --- |
| args | Schema Object | Schema of the arguments that are passed as input to the acknowledgement callback function. In the case of multiple arguments, use the array type to express the tuple. |

In the case of a publish message, the x-ack field informs the client that it should expect an acknowledgement from the server, and that this acknowledgement should adhere to the agreed schema. Likewise, for subscribe messages the x-ack field encourages the client to send a structured acknowledgement, for each message it receives.
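From the sender's point of view, the proposed x-ack extension could be honoured as in the sketch below. The emitDocumented helper is hypothetical; socket stands for anything exposing a Socket.IO-style emit(eventName, ...args), and the addUser object is an example Message Object written as plain JavaScript.

```javascript
// Hypothetical helper: emits a documented event and only passes an ack
// callback when the Message Object carries the proposed x-ack field.
// `socket` is anything exposing a Socket.IO-style emit(eventName, ...args).
function emitDocumented(socket, message, args, onAck) {
  if (message["x-ack"] !== undefined) {
    // The callback goes last; it receives the data sent back by the receiver.
    socket.emit(message.name, ...args, onAck);
  } else {
    socket.emit(message.name, ...args);
  }
}

// Example Message Object using the x-ack extension.
const addUser = {
  name: "add user",
  payload: { type: "string" },
  "x-ack": {
    args: { type: "object", properties: { error: { type: ["string", "null"] } } },
  },
};
```

With a real client socket, emitDocumented(socket, addUser, ["bob"], (ack) => console.log(ack.error)) would log the error field of the acknowledgement, assuming the server replies with the documented structure.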

Servers

The Servers Object is – surprise surprise – a map of Server Objects. Each Server Object contains a url field from which the client may infer the custom path to the Socket.IO server. This custom path should then be provided via the path option upon the initialisation of the Socket.IO connection manager, alongside the url arg. The protocol field of the Server Object is also required, and specifies the scheme part of that url arg. Its value should be one of ws, wss, http or https. For a Socket.IO client, it does not really matter whether the scheme is http or ws, due to the upgrade mechanism. Thus, for Socket.IO APIs, the only purpose of the protocol field is to indicate the use (or absence) of TLS.
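As a minimal sketch of this mapping, the function below derives the url arg and the path option of a Socket.IO client from a Server Object. The clientArgsFromServer helper is hypothetical and the host name is a placeholder.

```javascript
// Hypothetical helper: derives the arguments of the Socket.IO client
// from a Server Object (its url and protocol fields).
function clientArgsFromServer(server) {
  const parsed = new URL(`${server.protocol}://${server.url}`);
  return {
    // Scheme and host: the url arg of the client.
    url: `${parsed.protocol}//${parsed.host}`,
    // Custom path: the path option of the client.
    path: parsed.pathname,
  };
}

// The host name below is a placeholder.
const clientArgs = clientArgsFromServer({
  url: "example.com/socket.io",
  protocol: "wss",
});
```

The result could then be handed to the client as io(clientArgs.url, { path: clientArgs.path }).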

Summary

We made it to the end of the modelling exercise, the outcome of which is the following table, relating Socket.IO semantics to AsyncAPI structures.

| Socket.IO | AsyncAPI |
| --- | --- |
| Namespace | Channel (described through the Channel Item Object) |
| IO options | WebSockets Channel Binding |
| namespaceSocket.emit(eventName[, ...args][, ack]) | Operation Object defined under the publish field of a Channel Item Object. The available eventName & args pairs for this emit invocation are listed under the message field, through the oneOf array structure. |
| namespaceSocket.on(eventName, callback) | Operation Object defined under the subscribe field of a Channel Item Object. The available eventName & callback argument pairs for this on invocation are listed under the message field, through the oneOf array structure. |
| Event | Message (described through the Message Object) |
| eventName | The name field of the Message Object |
| Event args | The payload field of the Message Object |
| ack | The x-ack field of the Message Object. Requires an extension of the specification. The field may be populated for both publish and subscribe messages. |
| Custom path (path option) | The url field of the Server Object |
| Use of TLS (regardless of transport mechanism) | The protocol field of the Server Object |

In practice

With the modelling exercise out of the way, I’m now going to guide you through the process of creating an AsyncAPI spec from scratch, given an existing Socket.IO API. For the purposes of this simple tutorial, let’s use this minimal chat application, which is one of the get-started demos featured on the Socket.IO website.

Below is the source of our Socket.IO server:

// Setup basic express server
const express = require("express");
const app = express();
const path = require("path");
const server = require("http").createServer(app);
const io = require("socket.io")(server);
const port = process.env.PORT || 3000;

server.listen(port, () => {
  console.log("Server listening at port %d", port);
});

// Chatroom
let numUsers = 0;

io.on("connection", (socket) => {
  let addedUser = false;

  // when the client emits 'new message', this listens and executes
  socket.on("new message", (data) => {
    // we tell the client to execute 'new message'
    socket.broadcast.emit("new message", {
      username: socket.username,
      message: data,
    });
  });

  // when the client emits 'add user', this listens and executes
  socket.on("add user", (username, cb) => {
    if (addedUser) {
      cb({ error: "User is already added" });
      return;
    }

    // we store the username in the socket session for this client
    socket.username = username;
    ++numUsers;
    addedUser = true;
    socket.emit("login", {
      numUsers: numUsers,
    });
    // echo globally (all clients) that a person has connected
    socket.broadcast.emit("user joined", {
      username: socket.username,
      numUsers: numUsers,
    });
    cb({ error: null });
  });

  // when the client emits 'typing', we broadcast it to others
  socket.on("typing", () => {
    socket.broadcast.emit("typing", {
      username: socket.username,
    });
  });

  // when the client emits 'stop typing', we broadcast it to others
  socket.on("stop typing", () => {
    socket.broadcast.emit("stop typing", {
      username: socket.username,
    });
  });

  // when the user disconnects.. perform this
  socket.on("disconnect", () => {
    if (addedUser) {
      --numUsers;

      // echo globally that this client has left
      socket.broadcast.emit("user left", {
        username: socket.username,
        numUsers: numUsers,
      });
    }
  });
});

// Admin

io.of("/admin").on("connection", (socket) => {
  let token = socket.handshake.query.token;
  if (token !== "admin") socket.disconnect();

  socket.emit("server metric", {
    name: "CPU_COUNT",
    value: require("os").cpus().length,
  });
});

I’ve slightly tweaked the original source located at https://github.com/socketio/socket.io/tree/master/examples/chat to include acknowledgements and bindings, so that I can showcase the full spectrum of the AsyncAPI specification.

Let’s start by defining the version of the specification as well as the info object which provides metadata about the service:

asyncapi: 2.2.0

info:
  title: Socket.IO chat service
  version: 1.0.0
  description: |
    This is one of the get-started demos listed in the socket.io website: https://socket.io/demos/chat/

Next comes the servers section, where one should provide connectivity information for all the instances of the service. In the case of our simple chat application, there is only one demo server, accessible at socketio-chat-h9jt.herokuapp.com:

servers:
  demo:
    url: socketio-chat-h9jt.herokuapp.com/socket.io
    protocol: wss

Things get a bit more interesting when it comes to channels. Skimming through the server code we find 2 namespace instances (default and /admin), which means that the channel mapping should consist of 2 entries:

channels:
  /: {}
  /admin: {}

Within each namespace connection block there are multiple socket.on and socket.emit references. For each unique reference, we need to add a Message Object under the publish or subscribe operation respectively:

channels:
  /:
    publish:
      message:
        oneOf:
          - $ref: "#/components/messages/NewMessage"
          - $ref: "#/components/messages/Typing"
          - $ref: "#/components/messages/StopTyping"
          - $ref: "#/components/messages/AddUser"
    subscribe:
      message:
        oneOf:
          - $ref: "#/components/messages/NewMessageReceived"
          - $ref: "#/components/messages/UserTyping"
          - $ref: "#/components/messages/UserStopTyping"
          - $ref: "#/components/messages/UserJoined"
          - $ref: "#/components/messages/UserLeft"
          - $ref: "#/components/messages/LogIn"
  /admin:
    subscribe:
      message: # No need to use `oneOf` since there is only a single event
        $ref: "#/components/messages/ServerMetric"

From the server code, we can also see that the connection handler of the admin namespace applies some very sophisticated authorization based on the token query parameter. The spec should hence document that the API requires the presence of a valid token query param upon the handshake:

channels:
  /:
    publish:
      # ...
    subscribe:
      # ...
  /admin:
    subscribe:
      # ...
    bindings:
      $ref: "#/components/channelBindings/AuthenticatedWsBindings"
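On the client side, satisfying this binding could look like the sketch below. It assumes that the io function exported by socket.io-client is passed in by the caller; the connectAdmin helper and the base URL are hypothetical. Forcing a new connection follows the convention proposed earlier for custom namespaces that define bindings.

```javascript
// Hypothetical helper: connects to the /admin namespace, passing the
// token query param that the channel bindings require.
// `io` is expected to be the function exported by socket.io-client.
function connectAdmin(io, baseUrl, token) {
  return io(`${baseUrl}/admin`, {
    // Sent as part of the handshake; the server reads it from
    // socket.handshake.query.token.
    query: { token },
    // Bindings on a custom namespace: use a dedicated connection.
    forceNew: true,
  });
}
```

Against the demo server code above, any token other than "admin" would result in the socket being disconnected by the server.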

Putting everything together into a single document:

asyncapi: 2.2.0

info:
  title: Socket.IO chat demo service
  version: 1.0.0
  description: |
    This is one of the get-started demos presented in the socket.io website: https://socket.io/demos/chat/

servers:
  demo:
    url: socketio-chat-h9jt.herokuapp.com/socket.io
    protocol: wss

channels:
  /:
    publish:
      message:
        oneOf:
          - $ref: "#/components/messages/NewMessage"
          - $ref: "#/components/messages/Typing"
          - $ref: "#/components/messages/StopTyping"
          - $ref: "#/components/messages/AddUser"
    subscribe:
      message:
        oneOf:
          - $ref: "#/components/messages/NewMessageReceived"
          - $ref: "#/components/messages/UserTyping"
          - $ref: "#/components/messages/UserStopTyping"
          - $ref: "#/components/messages/UserJoined"
          - $ref: "#/components/messages/UserLeft"
          - $ref: "#/components/messages/LogIn"
  /admin:
    subscribe:
      message: # No need to use `oneOf` since there is only a single event
        $ref: "#/components/messages/ServerMetric"
    bindings:
      $ref: "#/components/channelBindings/AuthenticatedWsBindings"

components:
  messages:
    NewMessage:
      name: new message
      payload:
        type: string
    Typing:
      name: typing
    StopTyping:
      name: stop typing
    AddUser:
      name: add user
      payload:
        type: string
      x-ack: # Documents that this event is always acknowledged by the receiver
        args:
          type: object
          properties:
            error:
              type: [string, "null"]
    NewMessageReceived:
      name: new message
      payload:
        type: object
        properties:
          username:
            type: string
          message:
            type: string
    UserTyping:
      name: typing
      payload:
        type: object
        properties:
          username:
            type: string
    UserStopTyping:
      name: stop typing
      payload:
        type: object
        properties:
          username:
            type: string
    UserJoined:
      name: user joined
      payload:
        type: object
        properties:
          username:
            type: string
          numUsers:
            type: integer
    UserLeft:
      name: user left
      payload:
        type: object
        properties:
          username:
            type: string
          numUsers:
            type: integer
    LogIn:
      name: login
      payload:
        type: object
        properties:
          numUsers:
            type: integer
    ServerMetric:
      name: server metric
      payload:
        type: object
        properties:
          name:
            type: string
          value:
            type: number

  channelBindings:
    AuthenticatedWsBindings:
      ws:
        query:
          type: object
          properties:
            token:
              type: string
          required: [token]
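A document of this size relies on many $ref pointers, so a quick sanity check can be handy. The sketch below is a hypothetical lint helper, not part of any AsyncAPI tooling: it walks a spec that has already been parsed into a JavaScript object and reports local references that do not resolve. The chatSpec object is an abbreviated, hand-written version of the document above.

```javascript
// Hypothetical lint helper: collects every local $ref found in the spec
// and returns the ones that do not resolve to an existing node.
function findDanglingRefs(spec) {
  const refs = [];
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach(visit);
    } else if (node !== null && typeof node === "object") {
      if (typeof node.$ref === "string") refs.push(node.$ref);
      Object.values(node).forEach(visit);
    }
  };
  visit(spec);

  // Follows a "#/a/b/c" reference key by key from the root of the spec.
  const resolves = (ref) => {
    let node = spec;
    for (const key of ref.replace(/^#\//, "").split("/")) {
      if (node === null || typeof node !== "object" || !(key in node)) return false;
      node = node[key];
    }
    return true;
  };
  return refs.filter((ref) => !resolves(ref));
}

// Abbreviated version of the document above, as a JavaScript object.
const chatSpec = {
  channels: {
    "/admin": {
      subscribe: { message: { $ref: "#/components/messages/ServerMetric" } },
      bindings: { $ref: "#/components/channelBindings/AuthenticatedWsBindings" },
    },
  },
  components: {
    messages: { ServerMetric: { name: "server metric" } },
    channelBindings: { AuthenticatedWsBindings: { ws: {} } },
  },
};
```

For chatSpec the helper returns an empty list; a misspelled component name would show up as the offending reference string.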

The modified server source code is available at https://github.com/dedoussis/asyncapi-socket.io-example, along with the above AsyncAPI spec, which can be viewed using the AsyncAPI playground.

Note that there is no point in documenting the reserved events since all Socket.IO APIs support these by default.

Asynction

In parallel to this exercise I have been developing Asynction, a Socket.IO Python framework that is driven by the AsyncAPI specification. Asynction is built on top of Flask-SocketIO and inspired by Connexion. It guarantees that your API will work in accordance with its documentation. In essence, Asynction is to AsyncAPI and Flask-SocketIO what Connexion is to OpenAPI and Flask.

In this example, I forked the minimal chat application that we documented above and re-implemented the server in Python, using Asynction. Be mindful of the x-handler and x-handlers extensions that have been introduced to relate AsyncAPI entities (such as message or channel objects) to Python callables (event handlers).

You may find extensive documentation of Asynction at: https://asynction.dedouss.is

The framework is still at a beta stage, so please get in touch before using it in a production setup.

Any piece of feedback would be much appreciated.

The end

For any questions, comments, or corrections, feel free to reach out to me at dimitrios@dedouss.is.

A special shout out to derberq, alequetzalli, and the wider AsyncAPI community for being particularly helpful and responsive. 🙇

Photo by Matt Howard on Unsplash