Chapter 4. Comparing Tests
By now, you should have a good idea of what to look for when optimizing a page. If you have compressed a few images or delay-loaded a JavaScript file or two, you're probably eager to find out how much of a difference these changes have made. Rather than making you load each test's results page side by side, WebPageTest provides comparison tools designed specifically to highlight the differences between tests. In this chapter, we'll look at the comparison tools at your disposal and how WebPageTest makes it all possible.
Perceived Performance
So far we've looked at web performance from a very mechanical point of view. That is to say, we've gathered a lot of great data about how the page is constructed "under the hood," like millisecond-precision event timings. But there's one area of web performance that is arguably more important than simple load-time metrics: how fast a user perceives the page to load. Does it feel fast? What's the difference? For example, a user doesn't care that the page-load event fired after one second if the video she intended to watch doesn't appear until the five-second mark. The literal load time of the page was only one second, but the perceived load time was much longer. Overemphasizing metrics like the load event can lead us astray when optimizing web pages. We also need to look at performance from the users' perspective and optimize for their experience, which may not necessarily align with WebPageTest's default metrics.
The visual comparison tools described in this chapter address the shortcomings of context-agnostic metrics. These tools are more like a camera than a stopwatch, giving us the ability to actually see the page load just as an end user would. This kind of empathetic analysis allows us to better understand how quick a page feels.
Capture Video
The Capture Video configuration option makes test comparison possible. With this option enabled, screenshots of the page will be taken at regular intervals during loading. These screenshots comprise the filmstrip view and, when shown in succession, make up the video replay. Figure 4-1 shows you where to enable the option.
To enable this option, select the checkbox on the Test Settings tab of the Advanced Settings section. The screenshots saved during the page-load process form the foundation of the test comparison tool's core functionality; only when this option is turned on are you able to compare a test against others.
Tip
There are several querystring parameters that are helpful when you know you’re going to be using the test comparison page. On the WebPageTest home page, append these parameters to the URL before starting your test:
- video=1 ensures that the Capture Video option is always enabled.
- continuousVideo=1 maintains a consistent frame rate of 10 fps. Without this, the test is not guaranteed to always record screenshots every 100 milliseconds, as it slows down to 1 fps by default.
- iq=100 alters the image quality of JPEG screenshots to any value between 0 and 100, with 100 being the best quality.
- pngss=1 formats the final screenshot of the fully loaded page as a PNG image.
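As a concrete example, the snippet below builds a home-page URL with all four parameters appended. The base URL and example.com are placeholders for illustration; substitute the page you actually want to test:

```python
# Hypothetical example: the WebPageTest home page URL with all four
# video-related querystring parameters appended. "example.com" stands
# in for the page under test.
base = "https://www.webpagetest.org/?url=https://example.com"
params = "&video=1&continuousVideo=1&iq=100&pngss=1"
wpt_url = base + params
print(wpt_url)
```

Paste the resulting URL into the browser before starting the test, and the Capture Video option and its related settings will already be in effect.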
Be aware that there are some limitations to the screenshots and the resultant video. First and foremost, screenshots can only be taken of the visible area of the web page, known as the area above the fold. Just like a user’s browser window, the test agent’s browser has a fixed size outside of which a screenshot is unable to capture. Second, the rate of capture is slow enough to avoid major interference with the CPU. Taking a virtual picture comes at a cost in terms of computation, which must be balanced with the processing required to build the page. For example, if the pictures were taken at a smooth 60 frames per second, the processor would be burdened every 16.7 milliseconds, which could adversely affect test results. For this reason, screenshots are limited to 10 frames per second; fast enough to capture granular changes to the page and slow enough to stay out of the CPU’s way. And the last word of caution is related to the quality of the pictures. To save on storage space, screenshots are recorded at lower resolution. The default quality is passable for recognizing prominent elements on the page but unsatisfactory for reading most text.
Filmstrip and Video
In the late nineteenth century, a heated debate had developed over whether a horse in motion was ever completely unsupported midair or if it always had at least one foot on the ground. United States Senator Leland Stanford commissioned Eadweard Muybridge to photograph his galloping horse, Sallie Gardner, to settle the debate scientifically.
Muybridge carefully positioned 24 cameras along a track to capture Stanford's horse in action. As the horse galloped by, each camera took a photograph of the horse mid-stride. What resulted was a set of images all centered on the horse, capturing its movements for further scrutiny. What Muybridge had on film was enough to bury the debate definitively; the horse could be seen momentarily suspended in midair without a foot on the ground. This series of photographs is considered to be one of the first motion pictures ever recorded, known as Sallie Gardner at a Gallop or The Horse in Motion (Figure 4-2).
Like the galloping horse, web pages can appear to load too quickly to discern any particular pattern. The naked eye isn’t able to break down the process into discrete observable steps, so we use tools to assist us. Muybridge’s series of cameras was able to show what the eye couldn’t see. WebPageTest’s filmstrip is an incredibly powerful tool that similarly captures moments in time, allowing us to better quantify how a page loads visually, as shown in Figure 4-3.
The Visual Comparison page is only available to tests that have enabled video capture. For these tests, the page is accessible by way of either the test log or the test summary page. The test log lets you refer back to completed tests (Figure 4-4), both those that you started yourself and those that have been set to public by everyone else using the tool. From this page, you are easily able to select multiple tests for comparison. Alternatively, you can see a test’s visual progress view from its summary page, as shown in Figure 4-5.
WebPageTest constructs the filmstrip by taking periodic screenshots of the page above the fold. These images are laid out chronologically so that each passing frame shows the page a little closer to being fully loaded (see Figure 4-6). Like Muybridge’s photographs, these still images instill a sense of movement as the page appears to load. Long runs of blank or unchanging images tell you that the page is slow to load. This is exactly the kind of empathetic analysis that makes perceived performance so powerful. Being able to look at test results that evoke a feeling of slowness should make you just as impatient and anxious as a user would feel. Load time is just a number, but seeing a page load is a feeling.
To truly give you a feeling of the page performance, the filmstrip is second only to the video feature. This is a real-time playback of each frame in the filmstrip. Things get really interesting when you add tests to the comparison; the video will synchronize each test and visualize their progress side by side, as shown in Figure 4-7. This is especially useful for succinctly capturing the difference in performance between tests. To that end, the performance video is great for nontechnical people to see and understand page-load time across tests. A video is easy to watch and doesn’t require numbers to complicate the message that one page is faster than another.
Storytelling on its own engages the imagination. As kids, we told stories in an activity called "show and tell," during which everyone would present something they wanted to talk about. By actually showing an object from the story, the activity also becomes visually engaging. In the web performance testing version of "show and tell," the filmstrip and video serve as the tangible parts of the performance story. They can't do it alone, though, so we use these tools alongside the cold hard metrics and waterfall charts to tell the complete story.
Speed Index
Recall from “Measure What Matters” that generic metrics like load time and time-to-first-paint are blind to the context of a page. Unlike the filmstrip and video, which show exactly what is visible to the user at a given time, these cold hard metrics tell you more about how the page is doing than how the user is doing. WebPageTest invented a new metric to address this specific problem, but it needed to be context-aware like the visual comparison tools. The speed index of a page is derived from each screenshot’s visual progress toward being fully loaded (see Figure 4-8). A page that displays more to the user sooner has a lower, or better, speed index than a page that is slower to display content. This property of the speed index is what makes it superior to other metrics; it is a measure of general user experience as it relates to page loading.
To demonstrate the speed index’s usefulness, consider two versions of the same page. Both versions paint to the screen and complete loading at exactly the same times. The only difference between them is the rate at which content is painted. Ask any user which version they would prefer and the consensus would always be for a page that shows more content sooner. Even when cold metrics like paint and load time are equal, the user experience begs to differ. The speed index is a measure of this experience.
When visually comparing these two hypothetical tests, it’s clear that one of them appears to load faster. We can graph the visual progress data to illustrate how two versions of the same page could load so differently, as shown in Figure 4-9.
To show how the speed index can be derived from the illustration in Figure 4-9, consider the percentage of content that is not rendered at any given time. For the faster test, we can say that there is less to be rendered. Another way of thinking of it is to look at the area above the lines, as shown in Figure 4-10. By shading in these areas, the speed index emerges and the stark difference in area corresponds to small and large indexes.
Remember that these two tests started and finished displaying content at exactly the same times! But the speed index isn't fooled by a page that takes its time in the middle of the page load. We've seen how it works visually, so how is it computed? In calculus, an integral is used to calculate the area under a curve (see the formula in Figure 4-11). If you're getting anxious flashbacks to math class, don't worry. The integral of the visual progress curve tells us the amount of content that has been displayed, but remember that we're interested in the content yet to be displayed. We can get the area above the curve by subtracting the visually complete fraction from 1 and summing the result over each 100-millisecond interval.
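Written out, the calculation amounts to integrating the not-yet-rendered fraction of the page over time. This is a reconstruction from the description above, with VC(t) denoting the visual completeness at time t as a fraction between 0 and 1:

```latex
% Speed index: the area above the visual-progress curve.
% VC(t) is the fraction of the page that is visually complete at time t.
\mathrm{SpeedIndex} = \int_{0}^{t_{\mathrm{end}}} \bigl(1 - \mathit{VC}(t)\bigr)\,dt
```

In practice, the integral is approximated as a sum of rectangles, one per 100-millisecond screenshot interval.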
Warning
The speed index is much better at representing the quality of a page load’s user experience than simpler metrics like first paint and load times. However, it is not a one-size-fits-all number, because it still fails in one important area: application-specific contexts. 100% visual completeness is not necessarily equivalent to total usability of a page. Not only is the page content below the fold excluded from visual measurements, but also WebPageTest has no idea which part of the page users are actually waiting for. For example, a page that is only 25% visually complete may still have enough visible content for a user to start interacting with it. Think of a news page that shows the headline and article immediately but delay-loads other components like the article’s corresponding photograph. The story is absolutely readable without the picture; therefore visual completeness is not a perfect indicator of user experience.
Let’s see how this is implemented in code:
function getSpeedIndex(&$filmstrip) {
    $speed_index = 0;
    $last_time = 0;
    $last_progress = 0;
    foreach ($filmstrip['frames'] as $time => &$frame) {
        // The interval between frames.
        $elapsed = $time - $last_time;
        // The area of the rectangle above the current point is length * width,
        // where the length is the remaining progress %
        // and the width is the time interval.
        $speed_index += $elapsed * (100 - $last_progress);
        // Save the current state for the next frame.
        $last_time = $time;
        $last_progress = $frame['progress'];
    }
    return $speed_index;
}
Using this relatively straightforward algorithm, we can get a single number that adequately represents the loading performance of a page. The speed index is a novel way to look at web performance. Consider it in your test analysis to get a more complete picture of performance beyond the default metrics. By doing so, you'll be able to make more informed decisions about how to better optimize your page.
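To make the calculation concrete, here is a sketch of the same algorithm in Python, run against two hypothetical filmstrips. Both pages first paint at 1,000 milliseconds and reach 100% visual completeness at 5,000 milliseconds, but the second page renders most of its content much sooner, so it earns a lower (better) speed index. The frame times and progress percentages are invented for illustration:

```python
def speed_index(frames):
    """Compute the speed index from (time_ms, visual_progress_percent) pairs.

    Mirrors the PHP algorithm above: sum the areas of the rectangles
    above the visual-progress curve, one per screenshot interval.
    The result is in the same percent-millisecond units as the PHP version.
    """
    index = 0
    last_time = 0
    last_progress = 0
    for time, progress in frames:
        # Area above the curve for this interval: width (elapsed ms)
        # times length (percentage not yet rendered).
        index += (time - last_time) * (100 - last_progress)
        last_time = time
        last_progress = progress
    return index

# Hypothetical visual-progress data: both pages start and finish
# rendering at the same times, but page_b shows content sooner.
page_a = [(1000, 10), (2000, 20), (3000, 30), (4000, 50), (5000, 100)]
page_b = [(1000, 50), (2000, 80), (3000, 90), (4000, 95), (5000, 100)]

print(speed_index(page_a))  # 390000
print(speed_index(page_b))  # 185000: lower, i.e., better
```

Even though both pages paint first and finish at identical times, page_b's speed index is less than half of page_a's, which matches the intuition that it feels faster.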
Summary of Part I
In the preceding chapters, we looked at the basic use cases of WebPageTest. We started by dispelling a couple of common misconceptions about how the tool should be used. WebPageTest is a synthetic tool, which has different applications from RUM. Synthetic tools are excellent for identifying how to make a page faster but not necessarily useful for understanding the actual speeds users are experiencing. The performance metrics that WebPageTest exposes are especially useful for comparison purposes but shouldn’t be mistaken for the ground truth of real users’ performance.
To understand what could be slowing down a web page, we looked at a few of the tools available in the test analysis report. The flagship tool of WebPageTest is the waterfall diagram. Having discussed each constituent part of a waterfall, we came to a better understanding of what to expect from it. For example, the shape of a waterfall says a lot about the underlying performance of the page. A waterfall’s shape can be broken down further into discrete slopes of horizontal and vertical imaginary lines that indicate which requests are contributing to poor performance. We also looked at the connection view, which is just a different way of visualizing the network activity, the difference being that requests are grouped by the channel over which they are served.
Using the waterfall and connection views, we were able to come up with a list of anti-patterns in which bad performance manifests itself. Of the anti-patterns to look out for, the most common and severe is the long first-byte time. When the initial request takes a long time to process, the entire page is delayed and literally nothing can proceed. Ironically, WebPageTest is not well-equipped for us to figure out exactly why the response was delayed. Additional tools are required to trace the source of the problem. WebPageTest is, by design, a client-side analysis tool—one of many tools you should keep at your disposal when testing web performance.
In addition to the network visualizations, we also studied the way that WebPageTest grades tests. These grades are meant to be the most essential performance optimizations that should be universally applicable to all pages. For example, images are the most ubiquitous resource type on the web and yet they are too often transferred with unnecessary bloat. The Compress Images grade analyzes a test's images to see just how much could have been saved with basic image compression techniques. The greater the savings, the worse the test's grade will be. Unlike the raw data, these grades try to do some of the analysis for you by calling out the more egregious categories of inefficiencies.
And finally, we saw how to compare tests so that the differences between them become more apparent. The clearest way to show how two tests differ is to look at them visually by making use of the filmstrip and video tools. The filmstrip view shows screenshots at periodic intervals, allowing us to see exactly what is visible on screen. These screenshots are postprocessed so that we're able to determine their visual completeness, or the percentage of the page that has completed loading. This metric gives rise to the speed index metric, which is a radically different way of measuring page performance. We talked about how perceived performance reflects the way users actually experience a page load. It can be totally unlike the "cold" performance metrics that serve (for the most part) to report on distinct browser events like when the DOM is ready or when the page has loaded. The speed index quantifies the user experience based on the rate at which a page loads. Loading more content to the screen sooner just feels faster to the user. Of course, this is only as useful as the relevance of the content being displayed. That's why the filmstrip and video tools are so important, because we're able to see exactly which parts of the page are rendering and when.
These basic use cases have formed a solid foundation upon which we will build up a more thorough understanding of how to use WebPageTest like the experts. For most people, this may be enough to get started and make impactful web performance optimizations. If you’re someone who needs to get more out of WebPageTest, continue reading to learn about how to use its more advanced features.