Coordinates and Transforms in Wayland

People have asked on IRC and in other places multiple times about all the different transforms involved in a Wayland compositor. Unfortunately, there are a lot of coordinate spaces and transforms to wrap your head around. To try to help people better understand them all, I sent the following as a long e-mail to wayland-devel and have also put it here.

DRM/KMS:

The DRM backend gets a list of modes from the kernel which is the list of modes the kernel thinks the physical monitor and display hardware can handle. There may be other modes that your hardware can do. For instance, most hardware combinations can at least fake 640x480, 800x600, and 1024x768 even if they don’t advertise it. We also have a way in the weston config file to add modelines if there’s something you know your hardware supports but the kernel doesn’t. The important thing here is that these are all physical modes in terms of physical pixels. If you have an HD monitor, it will have a mode of 1920x1080 but it will not have the mode 1080x1920. While weston can render transformed at 1080x1920, that’s not a mode the monitor supports natively. You have to render sideways and then physically turn the monitor sideways in order to get that. For the sake of this discussion, we will call the modes the hardware can actually do “physical modes” and call the ones that weston can do by transformed rendering “virtual modes”.

There are a variety of ways that weston can get itself into a “virtual mode”. The first and most obvious is by means of a 90- or 270-degree transform. In this case, weston can take on any of the virtual modes you get by taking a physical mode and swapping width and height. The way these modes work is by weston rendering sideways and assuming you’ve physically turned your monitor sideways. In this case, weston is still rendering into a 1920x1080 buffer (that matches the physical mode) but, since it’s rendering sideways, your desktop is 1080x1920. The point here is that the physical mode hasn’t changed and weston’s buffer is still the size of the physical mode. The only thing that’s changed is the way we render.

Another, more subtle way to get a different virtual mode is to apply a scaling factor. My laptop screen has a resolution of 3200x1800. Since that’s way too dense for any normal human to read, a 2x scaling factor is applied. This gives weston the virtual mode of 1600x900 which is much more readable. If I also set a transform of 90 degrees (which would be silly on a laptop), I get the virtual mode of 900x1600. Again, the physical mode is still 3200x1800, but the virtual mode changes.
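To make the arithmetic concrete, here’s a rough sketch of the physical-to-virtual mode computation; the struct and function names are mine for illustration, not anything from Weston’s source:

```c
#include <stdint.h>
#include <stdio.h>

struct mode {
	int32_t width, height;
};

/* A 90- or 270-degree transform swaps width and height; the scaling
 * factor then divides both dimensions. */
static struct mode
virtual_mode(struct mode physical, int transform_degrees, int32_t scale)
{
	struct mode virt = physical;

	if (transform_degrees == 90 || transform_degrees == 270) {
		virt.width = physical.height;
		virt.height = physical.width;
	}

	virt.width /= scale;
	virt.height /= scale;

	return virt;
}

int main(void)
{
	struct mode physical = { 3200, 1800 };
	struct mode v = virtual_mode(physical, 90, 2);

	printf("%dx%d\n", v.width, v.height); /* prints 900x1600 */
	return 0;
}
```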

Coordinate spaces:

At this point, we need to talk about the different coordinate spaces that are in play at any given moment. The most obvious is “physical device coordinates”. These are the physical pixels on the physical monitor, starting at the upper-left corner of the monitor when it’s not rotated. The monitor’s physical mode describes the size of our physical device coordinate space.

Second is what weston calls “global coordinates”. From the perspective of the desktop user, all of their windows live on some global surface that may, depending on their monitor configuration, have a funny shape. Your windows sit on the global surface and you can drag them around on it. A window can be on one monitor and then dragged onto another, or it can sit halfway in between. Each physical output device maps to some rectangle in your global surface. Many times, desktop monitor configuration GUIs (such as the one in Windows XP and later) will show you how the monitors relate to each other by drawing their rectangles.

The way that physical devices map to global coordinates is governed by their position, transform, and scaling factor. A 1920x1080 monitor that has a transform of 90 degrees applied will take up a 1080x1920 rectangle in global coordinates. A 3200x1800 monitor with a 2x scaling factor will take up a 1600x900 rectangle in global coordinates. All of this happens (mostly) transparently to the app.

Next, we have surface coordinates. This is the coordinate system that the client sees. Surface coordinates are usually the same as global coordinates, only with the origin moved to the upper-left corner of the client’s surface. The client doesn’t know where its window is on the global surface, so we have to adjust the origin. The reason why I say “usually” is that we are, after all, in a composited world and the compositor is free to rotate the window, wrap it around a sphere, or make it wobbly. In these cases, the client’s coordinates won’t directly map to global coordinates but will, instead, have to be transformed somehow. However, for the usual case of a window sitting normally on your desktop, surface coordinates are just global coordinates with a different origin.

Finally, we have buffer coordinates. This is the coordinate system of the rectangular array of pixels that the client is rendering into. By default this is the same as surface coordinates, but there are several things that can make them diverge. For instance, we have the crop+scale extensions that allow you to render into a 720x480 buffer and then scale it up to a 1920x1080 surface. This is useful for DVD players, for instance, because it lets them render at the native resolution of the DVD and let the compositor figure out how to scale it. The other way that surface and buffer coordinates can diverge is through wl_surface.set_buffer_transform and wl_surface.set_buffer_scale. More on this in a moment.
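As an aside on the crop+scale case: here’s a sketch of the DVD-player example, assuming the stable wp_viewporter protocol (the modern descendant of the crop+scale extension), with the surrounding setup and error handling elided:

```c
#include <wayland-client.h>
#include "viewporter-client-protocol.h" /* generated from viewporter.xml */

static void
scale_up_video(struct wp_viewporter *viewporter, struct wl_surface *surface)
{
	struct wp_viewport *viewport =
		wp_viewporter_get_viewport(viewporter, surface);

	/* Use the full 720x480 buffer as the source rectangle... */
	wp_viewport_set_source(viewport,
			       wl_fixed_from_int(0), wl_fixed_from_int(0),
			       wl_fixed_from_int(720), wl_fixed_from_int(480));

	/* ...and have the compositor scale it to a 1920x1080 surface. */
	wp_viewport_set_destination(viewport, 1920, 1080);

	/* Both take effect on the next wl_surface.commit. */
	wl_surface_commit(surface);
}
```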

Weston:

Ok, now that we have this pile of coordinate systems, what does weston, or any other compositor for that matter, do with them? Let’s take things one at a time. First, we’ll look at how we render. Obviously, when rendering to an output, we need to be in physical output coordinates. So we use the output transform and scaling factor to make ourselves a global-to-output transformation matrix. As we render apps, the desktop background, or anything else, we apply this transform. That way, the rest of weston only has to think in terms of global coordinates and the renderer takes care of putting it into physical device coordinates. Apps automatically get rotated for rotated displays and scaled for scaled displays.

For each surface, we have two more matrices: a surface-to-global matrix and a buffer-to-surface matrix. Usually the surface-to-global transform, as I mentioned above, is just a translation. However, it could be something more exotic, so we use a matrix for it. The buffer-to-surface matrix transforms from buffer coordinates (the pixels the client is rendering) to surface coordinates. If we multiply all three matrices together, we get the complete transform from buffer coordinates to physical output coordinates.
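To make the composition order concrete, here’s a minimal sketch using a hypothetical 3x3 affine matrix type (Weston itself uses 4x4 matrices, but the idea is the same); with column vectors, the rightmost factor is applied first:

```c
struct mat3 {
	float m[9]; /* row-major 3x3 matrix */
};

static struct mat3
mat3_mul(struct mat3 a, struct mat3 b)
{
	struct mat3 r;

	for (int i = 0; i < 3; i++) {
		for (int j = 0; j < 3; j++) {
			float s = 0.0f;
			for (int k = 0; k < 3; k++)
				s += a.m[i * 3 + k] * b.m[k * 3 + j];
			r.m[i * 3 + j] = s;
		}
	}

	return r;
}

/* buffer -> surface -> global -> physical output, composed right to left */
static struct mat3
buffer_to_output(struct mat3 global_to_output,
		 struct mat3 surface_to_global,
		 struct mat3 buffer_to_surface)
{
	return mat3_mul(global_to_output,
			mat3_mul(surface_to_global, buffer_to_surface));
}
```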

Fullscreen shell:

Let’s take a brief diversion to talk about the fullscreen shell and modes. When weston starts up, it gets the physical modes from DRM and passes those out to the clients through the wl_output.mode event. It also gives the clients its transform and scaling factor through the wl_output.geometry and wl_output.scale events. This tells the clients the physical modes of the output as well as all the information they need to figure out how the physical modes map to virtual modes.
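For reference, here’s roughly what receiving those events looks like on the client side; this is the core-protocol listener with all of the actual state tracking elided:

```c
#include <stdio.h>
#include <wayland-client.h>

static void
handle_geometry(void *data, struct wl_output *output, int32_t x, int32_t y,
		int32_t physical_width, int32_t physical_height,
		int32_t subpixel, const char *make, const char *model,
		int32_t transform)
{
	/* transform is one of the WL_OUTPUT_TRANSFORM_* values */
	printf("output at %d,%d with transform %d\n", x, y, transform);
}

static void
handle_mode(void *data, struct wl_output *output, uint32_t flags,
	    int32_t width, int32_t height, int32_t refresh)
{
	/* width and height are in physical pixels: a physical mode */
	printf("mode: %dx%d @ %d mHz\n", width, height, refresh);
}

static void
handle_done(void *data, struct wl_output *output)
{
	/* all properties for this output have been sent */
}

static void
handle_scale(void *data, struct wl_output *output, int32_t factor)
{
	printf("scaling factor: %d\n", factor);
}

static const struct wl_output_listener output_listener = {
	.geometry = handle_geometry,
	.mode = handle_mode,
	.done = handle_done,
	.scale = handle_scale,
};
```

Hooking it up is a single wl_output_add_listener(output, &output_listener, NULL) call after binding the output.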

The fullscreen shell assumes that the client is either very dumb or very smart. Either it just wants to throw buffers at the compositor and doesn’t really care what happens to them, or it wants complete control over modesetting. Nested compositors fall under this second category of wanting control over modesetting. In this case, the fullscreen shell expects the client to be smart enough to take the physical modes, apply the transform and scale to them to get virtual modes, and present surfaces that match one of the virtual modes. This means that if you have a 1920x1080 monitor rotated 90 degrees, the client needs to figure this out and present a 1080x1920 surface.

Clients:

As far as the clients are concerned, all of these transformations (with the exception of buffer-to-surface) are automatic. If the client presents a window, the compositor automatically rotates or scales it to be the right size and orientation on the physical output. If the client renders a 600x400 window that is currently sitting on a 4K monitor with a rotation of 90 degrees and a scaling factor of 3, the compositor will rotate and scale the surface so that it consumes a 1200x1800 rectangle of physical pixels. All this happens behind the client’s back. The data available through wl_output gives the client a hint as to how it is transformed, but it ultimately doesn’t know.

This yields a couple of problems. First, scaling an app up by a factor of 3 makes it look terrible. Second, if the total transform from the buffer the client provides to physical output coordinates is just a translation (no rotation or scaling) then we can frequently flip that buffer directly to the screen without having to do a full GL composite. This is especially important for fullscreen applications as we would really like to take their buffers and hand them directly to the scanout hardware.

To solve these problems, wl_surface provides the set_buffer_transform and set_buffer_scale requests. These inform the compositor that the client has rendered with the given transform and scaling factor. If the client’s surface is on an untransformed output, then this transform and scale will have to be reversed in order to get the buffer back into physical output coordinates. However, if the transform and scale provided by the client match the transform and scale that the output is using, then the buffers can be used directly.

This is most easily seen with the scaling factor, so we’ll start there. My laptop, as I’ve mentioned, has a 3200x1800 screen which I usually have set to a scaling factor of 2. This means that the virtual mode is 1600x900 which is very nice for a laptop of its size. If an app renders normally at 600x400, the scaling factor will cause the compositor to scale it up to 1200x800. Usually this is done with linear filtering and the app looks a little blurry. It’s readable, but not really what you want. You bought that 3200x1800 display for a reason, after all. If the client is HiDPI aware, however, it will notice that the output has a scaling factor of 2, render at 1200x800 with everything double-sized, and call wl_surface.set_buffer_scale(2). Then the compositor knows that the surface is 600x400 in global coordinates but its buffer is twice as big, so it will look good on that display. The end result is that fonts and widgets are nice and crisp and the only things that actually get scaled are images that the client doesn’t have higher-DPI versions of.
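In code, the HiDPI-aware path boils down to one extra request when attaching the buffer; a sketch of the 600x400 example, with buffer allocation elided:

```c
#include <wayland-client.h>

static void
attach_hidpi_buffer(struct wl_surface *surface, struct wl_buffer *buffer)
{
	/* buffer is assumed to be 1200x800, rendered double-sized */
	wl_surface_attach(surface, buffer, 0, 0);

	/* Tell the compositor the buffer is 2x so the surface stays
	 * 600x400 in global coordinates. */
	wl_surface_set_buffer_scale(surface, 2);

	/* Damage is given in surface coordinates, not buffer pixels. */
	wl_surface_damage(surface, 0, 0, 600, 400);
	wl_surface_commit(surface);
}
```

Note that set_buffer_scale was added in wl_surface version 3, so the client needs to have bound a new enough wl_compositor.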

What’s a little less obvious is the transform. The principle, however, is exactly the same. If the display is rotated by 90 degrees, then a client that renders at 600x400 will automatically get rotated so it consumes 400x600 physical pixels. While this will look nice and sharp (it’s still pixel-for-pixel on the output), the performance, as I mentioned above, may not be as good. If the client wants to help the compositor out a bit, it can render itself rotated at 400x600 pixels and call wl_surface.set_buffer_transform(WL_OUTPUT_TRANSFORM_90). Then the compositor knows that the surface is still 600x400 in global coordinates but, when it goes to render, the buffer is already transformed the way it needs to be on the output and just needs to be displayed.
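The pre-rotated path looks almost identical; a sketch, again with buffer allocation elided:

```c
#include <wayland-client.h>

static void
attach_rotated_buffer(struct wl_surface *surface, struct wl_buffer *buffer)
{
	/* buffer is assumed to be 400x600: the window rendered sideways */
	wl_surface_attach(surface, buffer, 0, 0);

	/* The surface stays 600x400 in global coordinates; the buffer
	 * is already oriented for the 90-degree-transformed output. */
	wl_surface_set_buffer_transform(surface, WL_OUTPUT_TRANSFORM_90);

	/* Damage is still in surface coordinates. */
	wl_surface_damage(surface, 0, 0, 600, 400);
	wl_surface_commit(surface);
}
```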