Chapter 15. The Guacamole protocol

This chapter is an overview of the Guacamole protocol, describing its design and general use. While a few instructions and their syntax will be described here, this is not an exhaustive list of all available instructions. The intent is only to list the general types and usage. If you are looking for the syntax or purpose of a specific instruction, consult the protocol reference included with the appendices.

Design

The Guacamole protocol consists of instructions. Each instruction is a comma-delimited list followed by a terminating semicolon, where the first element of the list is the instruction opcode, and all following elements are the arguments for that instruction:

OPCODE,ARG1,ARG2,ARG3,...;

Each element of the list has a non-negative decimal integer length prefix, separated from the value of the element by a period. This length denotes the number of Unicode characters in the value of the element (not the number of bytes), and the value itself is encoded in UTF-8. An empty value is written with a length of zero, as in "0.":

LENGTH.VALUE

Any number of complete instructions make up a message which is sent from client to server or from server to client. Client to server instructions are generally control instructions (for connecting or disconnecting) and events (mouse and keyboard). Server to client instructions are generally drawing instructions (caching, clipping, drawing images), using the client as a remote display.

For example, a complete and valid instruction for setting the display size to 1024x768 would be:

4.size,1.0,4.1024,3.768;

Here, the instruction would be decoded into four elements: "size", the opcode of the size instruction, "0", the index of the default layer, "1024", the desired width in pixels, and "768", the desired height in pixels.
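The element encoding described above can be sketched in a few lines of Python. This is an illustration only, not code taken from Guacamole, and the function name encode_instruction is hypothetical:

```python
def encode_instruction(opcode, *args):
    """Encode one instruction as comma-separated LENGTH.VALUE elements."""
    # Every element, including the opcode, gets its own length prefix.
    elements = [opcode, *(str(arg) for arg in args)]
    # len() on a Python str counts Unicode code points, not UTF-8 bytes,
    # which matches how the length prefix is defined.
    return ",".join(f"{len(element)}.{element}" for element in elements) + ";"


print(encode_instruction("size", 0, 1024, 768))
```

Because each value carries its own length, a value may itself contain commas, periods, or semicolons without any escaping.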

The structure of the Guacamole protocol is important because it allows the protocol to be streamed while remaining easy to parse in JavaScript. JavaScript does have native support for conceptually similar structures like XML or JSON, but neither of those formats is natively supported in a way that can be streamed; JavaScript requires the entire XML or JSON message to be available at the time of decoding. The Guacamole protocol, on the other hand, can be parsed as it is received, and the length prefix on each instruction element means that the parser can skip from element to element, and thus from instruction to instruction, without having to iterate over every character.
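To make the streaming property concrete, here is a minimal sketch of an incremental parser. It is written in Python rather than JavaScript purely for illustration, the class name InstructionParser is hypothetical, and it assumes the received data has already been decoded from UTF-8 into text:

```python
class InstructionParser:
    """Incrementally parses instructions from arbitrarily split chunks."""

    def __init__(self):
        self._buffer = ""     # received text not yet consumed
        self._elements = []   # elements of the instruction in progress

    def feed(self, chunk):
        """Add received text and return the instructions it completed."""
        self._buffer += chunk
        completed = []
        while True:
            dot = self._buffer.find(".")
            if dot == -1:
                break                        # length prefix still incomplete
            length = int(self._buffer[:dot])
            end = dot + 1 + length           # index of the "," or ";" terminator
            if len(self._buffer) <= end:
                break                        # value or terminator not yet received
            # Jump straight to the end of the value using the length prefix.
            self._elements.append(self._buffer[dot + 1:end])
            terminator = self._buffer[end]
            self._buffer = self._buffer[end + 1:]
            if terminator == ";":
                completed.append(self._elements)
                self._elements = []
            elif terminator != ",":
                raise ValueError(f"unexpected character {terminator!r}")
        return completed


parser = InstructionParser()
print(parser.feed("4.size,1.0,4.10"))   # nothing complete yet
print(parser.feed("24,3.768;"))         # the size instruction is now complete
```

The parser never searches a value for delimiters; it only reads the length prefix and slices, which is why element values need no escaping.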

Handshake phase

The handshake phase is the phase of the protocol entered immediately upon connection. It begins with a "select" instruction sent by the client which tells the server which protocol will be loaded:

6.select,3.vnc;

After receiving the "select" instruction, the server will load the associated client support and respond with a list of accepted parameter names using an "args" instruction:

4.args,8.hostname,4.port,8.password,13.swap-red-blue,9.read-only;

After receiving the list of arguments, the client is required to respond with the list of supported audio, video, and image mimetypes, the optimal display size and resolution, and the values for all arguments available, even if blank. If any of these requirements are left out, the connection will close:

4.size,4.1024,3.768,2.96;
5.audio,9.audio/ogg;
5.video;
5.image,9.image/png,10.image/jpeg;
7.connect,9.localhost,4.5900,0.,0.,0.;

For clarity, we've put each instruction on its own line, but in the real protocol, no newlines exist between instructions. In fact, if there is anything after an instruction other than the start of a new instruction, the connection is closed.

Here, the client is specifying that the optimal display size is 1024x768 at 96 DPI and it supports Ogg Vorbis audio, but no video, and can accept both PNG and JPEG images. It wants to connect to localhost at port 5900, and is leaving the three other parameters blank.
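The client's reply can be sketched as follows. This is a hedged illustration: the names handshake_reply and settings are hypothetical, and it assumes, as the example above suggests, that the values in "connect" are sent in the same order as the names received in "args":

```python
def encode_instruction(opcode, *args):
    """Encode one instruction as comma-separated LENGTH.VALUE elements."""
    elements = [opcode, *(str(arg) for arg in args)]
    return ",".join(f"{len(e)}.{e}" for e in elements) + ";"


def handshake_reply(arg_names, settings, width, height, dpi,
                    audio=(), video=(), image=()):
    """Build the client's reply to an "args" instruction.

    arg_names: parameter names received in "args", in order.
    settings:  hypothetical dict of the values the client wants to supply.
    """
    # Every advertised argument gets a value; unknown ones are sent blank.
    values = [settings.get(name, "") for name in arg_names]
    return "".join([
        encode_instruction("size", width, height, dpi),
        encode_instruction("audio", *audio),
        encode_instruction("video", *video),   # no mimetypes: "5.video;"
        encode_instruction("image", *image),
        encode_instruction("connect", *values),
    ])


reply = handshake_reply(
    ["hostname", "port", "password", "swap-red-blue", "read-only"],
    {"hostname": "localhost", "port": "5900"},
    1024, 768, 96,
    audio=["audio/ogg"],
    image=["image/png", "image/jpeg"],
)
print(reply)
```

The result is a single string with no newlines between instructions, as the real protocol requires.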

Once these instructions have been sent by the client, the server will attempt to initialize the connection with the parameters received and, if successful, respond with a "ready" instruction. This instruction contains the ID of the new client connection and marks the beginning of the interactive phase. The ID is an arbitrary string, but is guaranteed to be unique among all active connections and distinct from the names of all supported protocols:

5.ready,37.$260d01da-779b-4ee9-afc1-c16bae885cc7;

The actual interactive phase begins immediately after the "ready" instruction is sent. Drawing and event instructions pass back and forth until the connection is closed.

Joining an existing connection

Once the handshake phase has completed, that connection is considered active and can be joined by other connections if the ID is provided instead of a protocol name via the "select" instruction:

6.select,37.$260d01da-779b-4ee9-afc1-c16bae885cc7;

The rest of the handshake phase for a joining connection is identical. Just as with a new connection, the restrictions or features which apply to the joining connection are dictated by the parameter values supplied during the handshake.

Drawing

Compositing

The Guacamole protocol provides compositing operations through the use of "channel masks". The term "channel mask" is simply a description of the mechanism used while designing the protocol to conceptualize and fully enumerate all possible compositing operations based on four different sources of image data: source image data where the destination is opaque, source image data where the destination is transparent, destination image data where the source is opaque, and destination image data where the source is transparent. Assigning a binary value to each of these "channels" creates a unique integer ID for every possible compositing operation, where these operations parallel the operations described by Porter and Duff in their 1984 paper "Compositing Digital Images". As the HTML5 canvas tag also uses Porter/Duff to describe its compositing operations (as do other graphical APIs), the Guacamole protocol is conveniently similar to the compositing support already present in web browsers, with some operations not yet supported.

The following operations are all implemented and known to work correctly in all browsers:

B out A (0x02)

Clears the destination where the source is opaque, but otherwise draws nothing. This is useful for masking.

A atop B (0x06)

Fills with the source where the destination is opaque only.

A xor B (0x0A)

As with logical XOR. Note that this is a compositing operation, not a bitwise operation. It draws the source where the destination is transparent, and draws the destination where the source is transparent.

B over A (0x0B)

What you would typically expect when drawing, but reversed. The source appears only where the destination is transparent, as if you were attempting to draw the destination over the source, rather than the source over the destination.

A over B (0x0E)

The most common and sensible compositing operation, this draws the source everywhere, but includes the destination where the source is transparent.

A + B (0x0F)

Simply adds the components of the source image to the destination image, capping the result at pure white.

The following operations are all implemented, but may work incorrectly in WebKit browsers, which always include the destination image where the source is transparent:

B in A (0x01)

Draws the destination only where the source is opaque, clearing anywhere the source or destination are transparent.

A in B (0x04)

Draws the source only where the destination is opaque, clearing anywhere the source or destination are transparent.

A out B (0x08)

Draws the source only where the destination is transparent, clearing anywhere the source is transparent or the destination is opaque.

B atop A (0x09)

Fills with the destination where the source is opaque only.

A (0x0C)

Fills with the source, ignoring the destination entirely.

The following operations are defined, but not implemented, and do not exist as operations within the HTML5 canvas:

Clear (0x00)

Clears all existing image data in the destination.

B (0x03)

Does nothing.

A xnor B (0x05)

Adds the source to the destination where both the destination and the source are opaque, clearing anywhere the source or destination are transparent. This is similar to A + B except the aspect of transparency is also additive.

(A + B) atop B (0x07)

Adds the source to the destination where the destination is opaque, preserving the destination otherwise.

(A + B) atop A (0x0D)

Adds the destination to the source where the source is opaque, copying the source otherwise.
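The mask values listed above are consistent with one bit per "channel". The bit assignment in the following sketch is inferred from those values rather than stated in this chapter, so treat it as an assumption and check it against the protocol reference. The function name contributions is hypothetical, and pixels are treated as either fully opaque or fully transparent for simplicity:

```python
# Assumed bit assignment, inferred from the operation codes listed above.
SRC_WHERE_DST_TRANSPARENT = 0x8
SRC_WHERE_DST_OPAQUE = 0x4
DST_WHERE_SRC_TRANSPARENT = 0x2
DST_WHERE_SRC_OPAQUE = 0x1


def contributions(mask, src_opaque, dst_opaque):
    """Return which images contribute to a pixel under the given mask."""
    result = set()
    if src_opaque:
        # The source is kept only if its channel for this case is enabled.
        bit = SRC_WHERE_DST_OPAQUE if dst_opaque else SRC_WHERE_DST_TRANSPARENT
        if mask & bit:
            result.add("source")
    if dst_opaque:
        # Likewise for the destination.
        bit = DST_WHERE_SRC_OPAQUE if src_opaque else DST_WHERE_SRC_TRANSPARENT
        if mask & bit:
            result.add("destination")
    return result


# "A over B" (0x0E): the source wins wherever it is opaque.
print(contributions(0x0E, src_opaque=True, dst_opaque=True))
# "A xor B" (0x0A): nothing remains where both are opaque.
print(contributions(0x0A, src_opaque=True, dst_opaque=True))
```

An empty result means the pixel is cleared, and a result containing both images corresponds to the additive case seen in A + B.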

Image data

The Guacamole protocol, like many remote desktop protocols, provides a method of sending an arbitrary rectangle of image data and placing it either within a buffer or in a visible rectangle of the screen. Raw image data in the Guacamole protocol is streamed as PNG, JPEG, or WebP data over a stream allocated with the "img" instruction. Depending on the format used, image updates sent in this manner can be RGB or RGBA (alpha transparency) and are automatically palettized if sent using libguac. The streaming system used for image data is generalized and used by Guacamole for other types of streams, including audio and file transfer. For more information about streams in the Guacamole protocol, see the section called “Streams and objects”.

Image data can be sent to any specified rectangle within a layer or buffer. Sending the data to a layer means that the image becomes immediately visible, while sending the data to a buffer allows that data to be reused later.

Copying image data between layers

Image data can be copied from one layer or buffer into another layer or buffer. This is often used for scrolling (where most of the result of the graphical update is identical to the previous state) or for caching parts of an image.

Both VNC and RDP provide a means of copying a region of screen data and placing it somewhere else within the same screen. RDP provides an additional means of copying data to a cache, or recalling data from that cache and placing it on the screen. Guacamole takes this concept and reduces it further, as both on-screen and off-screen image storage is the same. The Guacamole "copy" instruction allows you to copy a rectangle of image data, and place it within another layer, whether that layer is the same as the source layer, a different visible layer, or an off-screen buffer.
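As an illustration of using "copy" for scrolling, the sketch below builds an instruction that moves the contents of the default layer up by 16 pixels. The argument order used here (source layer, source rectangle, channel mask, destination layer, destination position) is our reading of the protocol reference and should be verified there; the helper names are hypothetical:

```python
def encode_instruction(opcode, *args):
    """Encode one instruction as comma-separated LENGTH.VALUE elements."""
    elements = [opcode, *(str(arg) for arg in args)]
    return ",".join(f"{len(e)}.{e}" for e in elements) + ";"


def scroll_up(layer, width, height, pixels, mask=0x0E):
    """Copy everything below the top `pixels` rows to the top of the layer.

    Assumed argument order: srclayer, srcx, srcy, srcwidth, srcheight,
    mask, dstlayer, dstx, dsty. The default mask 0x0E is "A over B".
    """
    return encode_instruction(
        "copy",
        layer, 0, pixels, width, height - pixels,  # source rectangle
        mask,                                      # sent as a decimal integer
        layer, 0, 0,                               # destination position
    )


print(scroll_up(0, 1024, 768, 16))
```

After such a copy, only the newly exposed 16-pixel strip at the bottom needs to be sent as fresh image data.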

Graphical primitives

The Guacamole protocol provides basic graphics operations similar to those of Cairo or the HTML5 canvas. In many cases, these primitives are useful for remote drawing, and desirable in that they take up less bandwidth than sending corresponding PNG images. Beware that excessive use of primitives leads to an increase in client-side processing, which may reduce the performance of a connected client, especially if that client is on a lower-performance machine like a mobile phone or tablet.
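For example, filling a rectangle with a solid color takes two short instructions instead of an encoded image. The sketch below assumes the "rect" and "cfill" instructions with the argument orders shown in the comments; these orders are our reading of the protocol reference and should be confirmed there, and the helper names are hypothetical:

```python
def encode_instruction(opcode, *args):
    """Encode one instruction as comma-separated LENGTH.VALUE elements."""
    elements = [opcode, *(str(arg) for arg in args)]
    return ",".join(f"{len(e)}.{e}" for e in elements) + ";"


def fill_rect(layer, x, y, width, height, rgba, mask=0x0E):
    """Define a rectangular path, then fill it with a solid color."""
    red, green, blue, alpha = rgba
    return (
        # Assumed order: layer, x, y, width, height
        encode_instruction("rect", layer, x, y, width, height)
        # Assumed order: mask, layer, r, g, b, a
        + encode_instruction("cfill", mask, layer, red, green, blue, alpha)
    )


instructions = fill_rect(0, 10, 20, 100, 50, (255, 0, 0, 255))
print(instructions)
print(len(instructions), "characters")
```

The two instructions together stay the same small size regardless of the rectangle's dimensions.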

Buffers and layers

All drawing operations in the Guacamole protocol affect a layer, and each layer has an integer index which identifies it. When this integer is negative, the layer is not visible, and can be used for storage or caching of image data. In this case, the layer is referred to within the code and within documentation as a "buffer". Layers are created automatically when they are first referenced in an instruction.

There is one main layer which is always present called the "default layer". This layer has an index of 0. Resizing this layer resizes the entire remote display. Other layers default to the size of the default layer upon creation, while buffers are always created with a size of 0x0, automatically resizing themselves to fit their contents.

Non-buffer layers can be moved and nested within each other. In this way, layers provide a simple means of hardware-accelerated compositing. If you need a window to appear above others, or you have an object which will be moving, or you need the data beneath an object automatically preserved, a layer is a good way of accomplishing this. If a layer is nested within another layer, its position is relative to that of its parent. When the parent is moved or reordered, the child moves with it. If the child extends beyond the parent's bounds, it will be clipped.
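The relationship between layers, buffers, and nested positions can be modelled with a small data structure. This is a conceptual sketch only; the Layer class and its fields are hypothetical and do not correspond to any Guacamole API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    index: int                        # negative indices denote buffers
    x: int = 0                        # position relative to the parent
    y: int = 0
    parent: Optional["Layer"] = None

    @property
    def is_buffer(self):
        """Buffers are invisible layers used for storage or caching."""
        return self.index < 0

    def absolute_position(self):
        """Resolve the on-screen position by walking up the parent chain."""
        x, y = self.x, self.y
        node = self.parent
        while node is not None:
            x += node.x
            y += node.y
            node = node.parent
        return x, y


default_layer = Layer(0)
window = Layer(1, x=100, y=50, parent=default_layer)
button = Layer(2, x=10, y=20, parent=window)

print(button.absolute_position())   # position derived from the window
window.x = 200                      # moving the parent...
print(button.absolute_position())   # ...moves the child with it
```

Because a child stores only its offset within the parent, moving a window requires changing one position rather than redrawing everything inside it.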

Streams and objects

Guacamole supports transfer of clipboard contents, audio, video, and image data, as well as files and arbitrary named pipes.

Streams are allocated directly with instructions that associate the new stream with particular semantics and metadata, such as the "audio" or "video" instructions used for playing media, the "file" instruction used for file transfer, and the "pipe" instruction for transfer of completely arbitrary data between client and server. In some cases, the availability and semantics of streams may be explicitly advertised using structured sets of named streams known as "objects".

Once a stream is allocated, data is sent along the stream in chunks using "blob" instructions, which may be acknowledged by the receiving end by "ack" instructions. The end of the stream is finally signalled with an "end" instruction.
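The life cycle of a stream can be sketched as a generator that emits the allocating instruction, a series of "blob" instructions, and a final "end". Several details here are assumptions to verify against the protocol reference: that blob data is base64-encoded, that "blob" takes the stream index followed by the data, and that "pipe" takes the stream index, a mimetype, and a name. The chunk size and function names are hypothetical:

```python
import base64


def encode_instruction(opcode, *args):
    """Encode one instruction as comma-separated LENGTH.VALUE elements."""
    elements = [opcode, *(str(arg) for arg in args)]
    return ",".join(f"{len(e)}.{e}" for e in elements) + ";"


def stream_instructions(allocate, stream_index, data, chunk_size=4096):
    """Yield the instructions needed to send `data` along one stream.

    allocate: the already-encoded instruction that allocates the stream.
    """
    yield allocate
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Assumption: blob payloads are sent as base64 text.
        payload = base64.b64encode(chunk).decode("ascii")
        yield encode_instruction("blob", stream_index, payload)
    yield encode_instruction("end", stream_index)


allocate = encode_instruction("pipe", 1, "application/octet-stream", "example")
for instruction in stream_instructions(allocate, 1, b"hello, world", chunk_size=5):
    print(instruction)
```

A real sender would typically wait for "ack" instructions between blobs to avoid overwhelming the receiver; that flow control is omitted here.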

Events

When something changes on either side, client or server, such as a key being pressed, the mouse moving, or clipboard data changing, an instruction describing the event is sent.
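As a sketch of what such event instructions look like on the wire, the following builds a mouse event and a key event. The argument orders (x, y, and button mask for "mouse"; keysym and pressed state for "key") are our reading of the protocol reference and should be checked there; the helper names are hypothetical:

```python
def encode_instruction(opcode, *args):
    """Encode one instruction as comma-separated LENGTH.VALUE elements."""
    elements = [opcode, *(str(arg) for arg in args)]
    return ",".join(f"{len(e)}.{e}" for e in elements) + ";"


def mouse_event(x, y, button_mask):
    """Assumed order: x, y, then a bitmask of the buttons held down."""
    return encode_instruction("mouse", x, y, button_mask)


def key_event(keysym, pressed):
    """Assumed order: X11 keysym, then 1 for pressed or 0 for released."""
    return encode_instruction("key", keysym, 1 if pressed else 0)


print(mouse_event(100, 200, 1))   # pointer at (100, 200), one button down
print(key_event(0xFF0D, True))    # 0xFF0D is the X11 keysym for Return
```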

Disconnecting

The server and client can end the connection at any time. There is no requirement for the server or the client to communicate that the connection needs to terminate. When the client or server wishes to end the connection, and the reason is known, it can use the "disconnect" or "error" instruction.

The disconnect instruction is sent by the client when it is disconnecting. This is largely out of politeness, and the server must be written knowing that the disconnect instruction may not always be sent in time (guacd is written this way).

If the client does something wrong, or the server detects a problem with the client plugin, the server sends an error instruction, including a description of the problem in the parameters. This informs the client that the connection is being closed.
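A receiver might treat these two instructions as follows. This is a minimal sketch operating on an already-parsed instruction (a list of strings); the function name is hypothetical, and it assumes the description is the first argument of "error":

```python
def closes_connection(instruction):
    """Return (should_close, reason) for a parsed instruction.

    instruction: list of strings, opcode first.
    """
    opcode, *args = instruction
    if opcode == "disconnect":
        return True, None
    if opcode == "error":
        # Assumption: the human-readable description comes first.
        return True, args[0] if args else None
    return False, None


print(closes_connection(["disconnect"]))
print(closes_connection(["error", "Connection refused"]))
print(closes_connection(["mouse", "100", "200", "1"]))
```

Since neither instruction is guaranteed to arrive, the surrounding code must also treat an abruptly closed socket as a normal end of the connection.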