Deferred Shading


The way we did lighting so far was called forward rendering or forward shading, a straightforward approach where we render an object, light it according to all light sources in a scene and then render the next object, and so on for each object in the scene. While quite easy to understand and implement it is also quite heavy on performance as each rendered object has to iterate over each light source for every rendered fragment which is a lot! Forward rendering also tends to waste a lot of fragment shader runs in scenes with a high depth complexity (multiple objects cover the same screen pixel) as most fragment shader outputs are overwritten.

Deferred shading or deferred rendering tries to overcome these issues which drastically changes the way we render objects. This gives us several new options to significantly optimize scenes with large numbers of lights, allowing us to render hundreds or even thousands of lights with an acceptable framerate. Below is an image of a scene with 1847 point lights rendered with deferred shading (image courtesy of Hannes Nevalainen); something that wouldn't be possible with forward rendering.

Example of the power of deferred shading in OpenGL as we can easily render 1000s lights with an acceptable framerate

Deferred shading is based on the idea that we defer or postpone most of the heavy rendering (like lighting) to a later stage. Deferred shading consists of two passes: in the first pass called the geometry pass we render the scene once and retrieve all kinds of geometrical information from the objects that we store in a collection of textures called the G-buffer; think of position vectors, color vectors, normal vectors and/or specular values. The geometric information of a scene stored in the G-buffer is then later used for (more complex) lighting calculations. Below is the content of a G-buffer of a single frame:

An example of a G-Buffer filled with geometrical data of a scene in OpenGL

We use the textures from the G-buffer in a second pass called the lighting pass where we render a screen-filled quad and calculate the scene's lighting for each fragment using the geometrical information stored in the G-buffer; pixel by pixel we iterate over the G-buffer. Instead of taking each object all the way from the vertex shader to the fragment shader we decouple its advanced fragment processes to a later stage. The lighting calculations remain exactly the same to what we're used to, but this time we take all required input variables from the corresponding G-buffer textures instead of the vertex shader (plus some uniform variables).

The image below nicely illustrates the total process of deferred shading.

Overview of the deferred shading technique in OpenGL

A major advantage of this approach is that whatever fragment ends up in the G-buffer is the actual fragment information that ends up as a screen pixel, as the depth test already concluded this fragment information as the top-most fragment. This ensures that for each pixel we process in the lighting pass we only do this once; saving us from a lot of unused render calls. Furthermore, deferred rendering opens up the possibility for further optimizations that allow us to render a much larger amount of light sources than we'd be able to use with forward rendering.

It also comes with a few disadvantages as the G-buffer requires us to store a relatively large amount of scene data in its texture colorbuffers which eats memory, especially since scene data like position vectors require a high precision. Another disadvantage is that it doesn't support blending (as we only have information of the topmost fragment) and MSAA no longer works. There are several workarounds for these disadvantages that we'll get to at the end of the tutorial.

Filling the G-buffer in the geometry pass is quite efficient as we directly store object information like position, color or normals into a framebuffer with a small or zero amount of processing. By also using multiple render targets (MRT) we can even do all this in a single render pass.

The G-buffer

The G-buffer is the collective term of all textures used to store lighting-relevant data for the final lighting pass. Let's take this moment to briefly review all the data we need to light a fragment with forward rendering:

With these (per-fragment) variables at our disposal we are able to calculate the (Blinn-)Phong lighting we're accustomed to. The light source positions and colors, and the player's view position can be configured using uniform variables, but the other variables are all specific to each of an object's fragments. If we can somehow pass the exact same data to the final deferred lighting pass we can calculate the same lighting effects, even though we're rendering fragments of a 2D quad.

There is no limit in OpenGL to what we can store in a texture so it makes sense to store all per-fragment data in one or multiple screen-filled textures called the G-buffer and use these later in the lighting pass. As the G-buffer textures will have the same size as the lighting pass's 2D quad we get the exact same fragment data we'd had in a forward rendering setting, but this time in the lighting pass; there is a one on one mapping.

In pseudocode the entire process will look a bit like this:

while(...) // render loop
    // 1. geometry pass: render all geometric/color data to g-buffer 
    glBindFramebuffer(GL_FRAMEBUFFER, gBuffer);
    for(Object obj : Objects)
    // 2. lighting pass: use g-buffer to calculate the scene's lighting
    glBindFramebuffer(GL_FRAMEBUFFER, 0);

The data we'll need to store of each fragment is a position vector, a normal vector, a color vector and a specular intensity value. In the geometry pass we thus need to render all objects of the scene and store these data components in the G-buffer. We can again use multiple render targets to render to multiple colorbuffers in a single render pass; this was briefly discussed in the bloom tutorial.

For the geometry pass we'll need to initialize a framebuffer object that we'll intuitively call gBuffer that has multiple colorbuffers attached and a single depth renderbuffer object. For the position and normal texture we'd preferably use a high-precision texture (16 or 32-bit float per component) and albedo and specular values we'll be fine with a default texture (8-bit precision per component).

unsigned int gBuffer;
glGenFramebuffers(1, &gBuffer);
glBindFramebuffer(GL_FRAMEBUFFER, gBuffer);
unsigned int gPosition, gNormal, gColorSpec;
// - position color buffer
glGenTextures(1, &gPosition);
glBindTexture(GL_TEXTURE_2D, gPosition);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, gPosition, 0);
// - normal color buffer
glGenTextures(1, &gNormal);
glBindTexture(GL_TEXTURE_2D, gNormal);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, gNormal, 0);
// - color + specular color buffer
glGenTextures(1, &gAlbedoSpec);
glBindTexture(GL_TEXTURE_2D, gAlbedoSpec);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, gAlbedoSpec, 0);
// - tell OpenGL which color attachments we'll use (of this framebuffer) for rendering 
glDrawBuffers(3, attachments);
// then also add render buffer object as depth buffer and check for completeness.

Since we use multiple render targets we have to explicitly tell OpenGL which of the colorbuffers associated with GBuffer we'd like to render to with glDrawBuffers. Also interesting to note here is that we store position and normal data in an RGB texture as we have 3 components each, but we store color and specular intensity data combined in a single RGBA texture; this saves us from having to declare an additional colorbuffer texture. As your deferred shading pipeline gets more complex and needs more data you'll quickly find new ways to combine data in individual textures.

Next we need to render into the G-buffer. Assuming each object has a diffuse, a normal and a specular intensity texture we'd use something like the following fragment shader to render into the G-buffer:

#version 330 core
layout (location = 0) out vec3 gPosition;
layout (location = 1) out vec3 gNormal;
layout (location = 2) out vec4 gAlbedoSpec;

in vec2 TexCoords;
in vec3 FragPos;
in vec3 Normal;

uniform sampler2D texture_diffuse1;
uniform sampler2D texture_specular1;

void main()
    // store the fragment position vector in the first gbuffer texture
    gPosition = FragPos;
    // also store the per-fragment normals into the gbuffer
    gNormal = normalize(Normal);
    // and the diffuse per-fragment color
    gAlbedoSpec.rgb = texture(texture_diffuse1, TexCoords).rgb;
    // store specular intensity in gAlbedoSpec's alpha component
    gAlbedoSpec.a = texture(texture_specular1, TexCoords).r;

As we use multiple render targets, the layout specifier tells OpenGL to which colorbuffer of the currently active framebuffer we render to. Note that we do not store the specular intensity into a single colorbuffer texture as we can store its single float value in the alpha component of one of the other colorbuffer textures.

Keep in mind that with lighting calculations it is extremely important to keep all variables in the same coordinate space; in this case we store (and calculate) all variables in world-space.

If we'd now were to render a large collection of nanosuit objects into the gBuffer framebuffer and visualize its content by projecting its colorbuffers one by one onto a screen-filled quad we'd see something like this:

Image of a G-Buffer in OpenGL with several nanosuits

Try to visualize that the world-space position and normal vectors are indeed correct. For instance, the normal vectors pointing to the right would be more aligned to a red color, similar for position vectors that point from the scene's origin to the right. As soon as you're satisfied with the content of the G-buffer it's time to move to the next step: the lighting pass.

The deferred lighting pass

With a large collection of fragment data in the G-Buffer at our disposal we have the option to completely calculate the scene's final lighted colors by iterating over each of the G-Buffer textures pixel by pixel and use their content as input to the lighting algorithms. Because the G-buffer texture values all represent the final transformed fragment values we only have to do the expensive lighting operations once per pixel. This makes deferred shading quite efficient, especially in complex scenes where we'd easily invoke multiple expensive fragment shader calls per pixel in a forward rendering setting.

For the lighting pass we're going to render a 2D screen-filled quad (a bit like a post-processing effect) and execute an expensive lighting fragment shader on each pixel:

glBindTexture(GL_TEXTURE_2D, gPosition);
glBindTexture(GL_TEXTURE_2D, gNormal);
glBindTexture(GL_TEXTURE_2D, gAlbedoSpec);
// also send light relevant uniforms
shaderLightingPass.setVec3("viewPos", camera.Position);

We bind all relevant textures of the G-buffer before rendering and also send the lighting-relevant uniform variables to the shader.

The fragment shader of the lighting pass is largely similar to the lighting tutorial shaders we've used so far. What is new is the method in which we obtain the lighting's input variables which we now directly sample from the G-buffer:

#version 330 core
out vec4 FragColor;
in vec2 TexCoords;

uniform sampler2D gPosition;
uniform sampler2D gNormal;
uniform sampler2D gAlbedoSpec;

struct Light {
    vec3 Position;
    vec3 Color;
const int NR_LIGHTS = 32;
uniform Light lights[NR_LIGHTS];
uniform vec3 viewPos;

void main()
    // retrieve data from G-buffer
    vec3 FragPos = texture(gPosition, TexCoords).rgb;
    vec3 Normal = texture(gNormal, TexCoords).rgb;
    vec3 Albedo = texture(gAlbedoSpec, TexCoords).rgb;
    float Specular = texture(gAlbedoSpec, TexCoords).a;
    // then calculate lighting as usual
    vec3 lighting = Albedo * 0.1; // hard-coded ambient component
    vec3 viewDir = normalize(viewPos - FragPos);
    for(int i = 0; i < NR_LIGHTS; ++i)
        // diffuse
        vec3 lightDir = normalize(lights[i].Position - FragPos);
        vec3 diffuse = max(dot(Normal, lightDir), 0.0) * Albedo * lights[i].Color;
        lighting += diffuse;
    FragColor = vec4(lighting, 1.0);

The lighting pass shader accepts 3 uniform textures that represent the G-buffer and hold all the data we've stored in the geometry pass. If we were to sample these with the current fragment's texture coordinates we'd get the exact same fragment values as if we were rendering the geometry directly. At the start of the fragment shader we retrieve the lighting-relevant variables from the G-buffer textures by a simple texture lookup. Note that we retrieve both the Albedo color and the Specular intensity from the single gAlbedoSpec texture.

As we now have the per-fragment variables necessary (and the relevant uniform variables) to calculate Blinn-Phong lighting we don't have to make any changes to the lighting code. The only thing we change in deferred shading is the method of obtaining lighting input variables.

Running a simple demo with a total of 32 small lights looks a bit like this:

Example of Deferred Shading in OpenGL

One of the disadvantages of deferred shading is that it is not possible to do blending as all values in the G-buffer are from single fragments, and blending operates on the combination of multiple fragments. Another disadvantage is that deferred shading forces you to use the same lighting algorithm for most of your scene's lighting; you can somehow alleviate this a bit by including more material-specific data in the G-buffer.

To overcome these disadvantages (especially blending) we often split the renderer into two parts: one deferred rendering part, and the other a forward rendering part specifically meant for blending or special shader effects not suited for a deferred rendering pipeline. To illustrate how this works we'll render the light sources as small cubes using a forward renderer as the light cubes require a special shader (simply output a single light color).

Combining deferred rendering with forward rendering

Say we want to render each of the light sources as a 3D cube positioned at the light source's position emitting the color of the light alongside the deferred renderer. A first idea that comes to mind is to simply forward render all the light sources on top of the deferred lighting quad at the end of the deferred shading pipeline. So basically render the cubes as we'd normally do, but only after we've finished the deferred rendering operations. In code this will look a bit like this:

// deferred lighting pass
// now render all light cubes with forward rendering as we'd normally do
shaderLightBox.setMat4("projection", projection);
shaderLightBox.setMat4("view", view);
for (unsigned int i = 0; i < lightPositions.size(); i++)
    model = glm::mat4();
    model = glm::translate(model, lightPositions[i]);
    model = glm::scale(model, glm::vec3(0.25f));
    shaderLightBox.setMat4("model", model);
    shaderLightBox.setVec3("lightColor", lightColors[i]);

However, these rendered cubes do not take any of the stored geometry depth of the deferred renderer into account and are, as a result, always rendered on top of the previously rendered objects; this isn't the result we were looking for.

Image of deferred rendering with forward rendering where we didn't copy depth buffer data and lights are rendered on top of all geometry in OpenGL

What we need to do is first copy the depth information stored in the geometry pass into the default framebuffer's depth buffer and only then render the light cubes. This way the light cubes' fragments are only rendered when on top of the previously rendered geometry.

We can copy the content of a framebuffer to the content of another framebuffer with the help of glBlitFramebuffer, a function we also used at the anti-aliasing tutorial to resolve multisampled framebuffers. The glBlitFramebuffer function allows us to copy a user-defined region of a framebuffer to a user-defined region of another framebuffer.

We stored the depth of all the objects rendered in the deferred shading pass in the gBuffer FBO. If we were to simply copy the content of its depth buffer to the depth buffer of the default framebuffer, the light cubes would then render as if all of the scene's geometry was rendered with forward rendering. As briefly explained in the anti-aliasing tutorial we have to specify a framebuffer as the read framebuffer and similarly specify a framebuffer as the write framebuffer:

glBindFramebuffer(GL_READ_FRAMEBUFFER, gBuffer);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0); // write to default framebuffer
glBindFramebuffer(GL_FRAMEBUFFER, 0);
// now render light cubes as before

Here we copy the entire read framebuffer's depth buffer content to the default framebuffer's depth buffer; this can similarly be done for colorbuffers and stencil buffers. Now if we then render the light cubes, the cubes indeed act as if the scene's geometry is real and not simply pasted on top of a 2D quad:

Image of deferred rendering with forward rendering where we copied the depth buffer data and lights are rendered properly with all  geometry in OpenGL

You can find the full source code of the demo here.

With this approach we can easily combine deferred shading with forward shading. This is great as we can now still apply blending and render objects that require special shader effects, something that isn't possible in a deferred rendering context.

A larger number of lights

What deferred rendering is often praised for is its ability to render an enormous amount of light sources without a heavy cost on performance. Deferred rendering by itself doesn't allow for a very large amount of light sources as we'd still have to calculate each fragment's lighting component for each of the scene's light sources. What makes a large amount of light sources possible is a very neat optimization we can apply to the deferred rendering pipeline: that of light volumes.

Normally when we'd render a fragment in a large lighted scene we'd calculate the contribution of each light source in a scene, regardless of their distance to the fragment. A large portion of these light sources will never reach the fragment so why waste all the lighting computations?

The idea behind light volumes is to calculate the radius or volume of a light source i.e. the area where its light is able to reach fragments. As most light sources use some form of attenuation we can use that to calculate the maximum distance or radius their light is able to reach. We then only do the expensive lighting calculations if a fragment is inside one or more of these light volumes. This can save us a considerable amount of computation as we now only calculate lighting where it's necessary.

The trick to this approach is mostly figuring out the size or radius of the light volume of a light source.

Calculating a light's volume or radius

To obtain a light's volume radius we'd basically have to solve the attenuation equation for a brightness we deem as dark, this can be 0.0 or something slightly more lit but still considered dark like 0.03. To demonstrate how we can calculate a light's volume radius we'll use one of the more difficult, but extensive attenuation functions that we've introduced in the light casters tutorial:

\[F_{light} = \frac{I}{K_c + K_l * d + K_q * d^2}\]

What we want to do is solve this equation for when \(F_{light}\) is 0.0, when the light is completely dark at that distance. However, this equation will never exactly reach the value 0.0 so there won't be a solution. What we can do however is not solve the equation for 0.0, but solve it for a brightness value that is close to 0.0 but still perceived as dark. The brightness value we choose as acceptable for this tutorial's demo scene is \(5/256\); divided by 256 as the default 8-bit framebuffer can display that many intensities per component.

The attenuation function used is mostly dark in its visible range so if we were to limit it to an even darker brightness than \(5/256\) the light volume would become too large and thus less effective. As long as a user cannot see a sudden cut-off of a light source at its volume borders we'll be fine. Of course this always depends on the type of scene; a higher brightness threshold results in smaller light volumes and thus a better efficiency, but can produce noticeable artifacts where lighting seems to break at a volume's borders.

The attenuation equation we have to solve becomes:

\[\frac{5}{256} = \frac{I_{max}}{Attenuation}\]

Here \(I_{max}\) is the light source's brightest color component. We use a light source's brightest color component, as solving the equation for a light's brightest intensity value best reflects the ideal light volume radius.

From here on we continue solving the equation:

\[\frac{5}{256} * Attenuation = I_{max} \] \[5 * Attenuation = I_{max} * 256 \] \[Attenuation = I_{max} * \frac{256}{5} \] \[K_c + K_l * d + K_q * d^2 = I_{max} * \frac{256}{5} \] \[K_q * d^2 + K_l * d + K_c - I_{max} * \frac{256}{5} = 0 \]

The last equation is an equation of the form \(ax^2 + bx + c = 0\) which we can solve using the quadratic equation:

\[x = \frac{-K_l + \sqrt{K_l^2 - 4 * K_q * (K_c - I_{max} * \frac{256}{5})}}{2 * K_q} \]

This gives us a general equation that allows us to calculate \(x\) i.e. the light volume's radius for the light source given a constant, linear and quadratic parameter:

float constant  = 1.0; 
float linear    = 0.7;
float quadratic = 1.8;
float lightMax  = std::fmaxf(std::fmaxf(lightColor.r, lightColor.g), lightColor.b);
float radius    = 
  (-linear +  std::sqrtf(linear * linear - 4 * quadratic * (constant - (256.0 / 5.0) * lightMax))) 
  / (2 * quadratic);  

This returns a radius between roughly 1.0 and 5.0 based on the light's maximum intensity.

We calculate this radius for each light source of the scene and use it to only calculate lighting for that light source if a fragment is inside the light source's volume. Below is the updated lighting pass fragment shader that takes into account the calculated light volumes. Note that this approach is merely done for teaching purposes and not viable in a practical setting as we'll soon discuss:

struct Light {
    float Radius;
void main()
    for(int i = 0; i < NR_LIGHTS; ++i)
        // calculate distance between light source and current fragment
        float distance = length(lights[i].Position - FragPos);
        if(distance < lights[i].Radius)
            // do expensive lighting

The results are exactly the same as before, but this time each light only calculates lighting for the light sources in which volume it resides.

You can find the final source code of the demo here.

How we really use light volumes

The fragment shader shown above doesn't really work in practice and only illustrates how we can sort of use a light's volume to reduce the lighting calculations. The reality is that your GPU and GLSL are really bad at optimizing loops and branches. The reason for this is that shader execution on the GPU is highly parallel and most architectures have a requirement that for large collection of threads they need to run the exact same shader code for it to be efficient. This often means that a shader is run that always executes all branches of an if statement to ensure the shader runs are the same, making our previous radius check optimization completely useless; we'd still calculate lighting for all light sources!

The appropriate approach to using light volumes is to render actual spheres, scaled by the light volume radius. The center of these spheres are positioned at the light source's position and as it is scaled by the light volume radius the sphere exactly encompasses the light's visible volume. This is where the trick comes in: we use largely the same deferred fragment shader for rendering the sphere. As the rendered sphere produces fragment shader invocations that exactly match the pixels the light source affects, we only render the relevant pixels and skip all other pixels. The image below illustrates this:

Image of a light volume rendered with a deferred fragment shader in OpenGL

This is done for each light source in the scene and the resulting fragments are additively blended together. The result is then the exact same scene as before, but this time rendering only the relevant fragments per light source. This effectively reduces the computations from nr_objects * nr_lights to nr_objects + nr_lights which makes it incredibly efficient in scenes with a large number of lights. This approach is what makes deferred rendering so suitable for rendering a large number of lights.

There is still an issue with this approach: face culling should be enabled (otherwise we'd render a light's effect twice) and when it is enabled the user might enter a light source's volume after which the volume isn't rendered anymore (due to back-face culling), removing the light source's influence; this can be solved with a neat stencil buffer trick.

Rendering light volumes does take a heavy toll on performance and while it is generally faster than normal deferred shading it's not the best optimization. Two other popular (and more efficient) extensions on top of deferred shading exist called deferred lighting and tile-based deferred shading. These are incredibly efficient at rendering large amounts of light and also allow for relatively efficient MSAA. However, for the sake of this tutorial's length I'll leave these optimizations for a later tutorial.

Deferred rendering vs forward rendering

By itself (without light volumes) deferred shading is already a large optimization as each pixel only runs a single fragment shader, compared to forward rendering where we'd often run the fragment shader multiple times per pixel. Deferred rendering does come with a few disadvantages though: a large memory overhead, no MSAA and blending still has to be done with forward rendering.

When you have a small scene and not too many lights, deferred rendering is not necessarily faster and sometimes even slower as the overhead then outweighs the benefits of deferred rendering. In more complex scenes deferred rendering quickly becomes a significant optimization; especially with the more advanced optimization extensions.

As a final note I'd also like to mention that basically all effects that can be accomplished with forward rendering can also be implemented in a deferred rendering context; this often only requires a small translation step. For instance, if we want to use normal mapping in a deferred renderer, we'd change the geometry pass shaders to output a world-space normal extracted from a normal map (using a TBN matrix) instead of the surface normal; the lighting calculations in the lighting pass don't need to change at all. And if you want parallax mapping to work, you'd want to first displace the texture coordinates in the geometry pass before sampling an object's diffuse, specular or normal textures. Once you understand the idea behind deferred rendering, it's not too difficult to get creative.

Additional resources