Performance Tools: Appendix - Even Faster Web Sites, by Steve Souders
This excerpt is from Even Faster Web Sites.
Like all good engineers, web developers need to build up a set of tools to do a high-quality job. This appendix describes the tools I recommend for analyzing and improving web site performance. The tools are grouped into the following sections:
- the section called “Packet Sniffers”
When I sit down to analyze a web site, I start by looking at the HTTP requests for the web page in question. This makes it possible to identify the slow parts of the page. A packet sniffer that is handy and easy to use is the first tool to add to your kit. The tools included in this category are HttpWatch, Firebug Net Panel, AOL Pagetest, VRTA, IBM Page Detailer, Web Inspector Resources Panel, Fiddler, Charles, and Wireshark.
- the section called “Web Development Tools”
- the section called “Performance Analyzers”
Performance analyzers evaluate a given web page against a set of performance best practices. As I will explain, they vary a great deal in what they measure. This section describes YSlow, AOL Pagetest, VRTA, and neXpert.
- the section called “Miscellaneous”
This section includes a grab bag of tools I use regularly: Hammerhead, Smush.it, Cuzillion, and UA Profiler.
Every web developer working on performance needs to look at how his web pages load, including all the resources within the page. This is done using a packet sniffer. The packet sniffers listed in this section range from tools that give a higher-level view of network traffic, such as HttpWatch, to lower-level tools that expose each packet sent over the network, such as Wireshark. In most of my web performance analysis, I use the higher-level network monitors; they are typically easier to configure and have a user interface that is more visual. In some situations, such as debugging chunked encoding, I drop down into one of the lower-level packet sniffers in order to see the contents and timing of each packet sent over the wire.
HttpWatch is my preferred packet sniffer. HttpWatch depicts network traffic in a graphical way, as shown in Figure A.1, “HttpWatch”. Most of the HTTP waterfall charts in this book were captured using HttpWatch. This graphical display makes it easy to spot performance delays.
HttpWatch is built by Simtec. You can try out the free download, but it’s limited to working on only a few major sites, such as Google and Yahoo!. You have to pay for the full-featured version, but it’s money well spent. HttpWatch runs on Microsoft Windows with Internet Explorer and Firefox.
Firebug has many features critical to any web developer and is described more thoroughly in the section called “Web Development Tools”. The Firebug Net Panel, however, deserves mention here. Net Panel displays HTTP waterfall charts, making it an easy alternative for developers who already have Firebug installed. I especially like Net Panel’s use of vertical lines to mark load events, such as onload, in the page load timeline, as shown in Figure A.2, “Firebug Net Panel”. This is a feature that other packet sniffers should adopt.
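To make the idea of a waterfall chart with an onload marker concrete, here is a minimal text-only sketch. It assumes each request has already been captured as a hypothetical `(url, start_ms, end_ms)` tuple; it draws one bar per request and a `|` column where onload fired. This only illustrates the idea and is not how Firebug or HttpWatch actually render their charts.

```python
from typing import List, Tuple

# Hypothetical capture format: (url, start_ms, end_ms) for one HTTP request.
Request = Tuple[str, int, int]


def waterfall(requests: List[Request], onload_ms: int, ms_per_char: int = 50) -> str:
    """Render a text waterfall: one row per request, '=' while the request is
    in flight, and a '|' column marking when the onload event fired.
    Where a bar overlaps the onload column, the bar is drawn."""
    end = max([r[2] for r in requests] + [onload_ms])
    width = end // ms_per_char + 1
    onload_col = onload_ms // ms_per_char
    name_width = max(len(r[0]) for r in requests)
    rows = []
    for url, start, stop in requests:
        cells = []
        for col in range(width):
            t = col * ms_per_char  # time at the left edge of this character cell
            if start <= t < stop:
                cells.append("=")
            elif col == onload_col:
                cells.append("|")
            else:
                cells.append(" ")
        rows.append(f"{url.ljust(name_width)} {''.join(cells)}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(waterfall([("index.html", 0, 200), ("a.js", 200, 500), ("b.css", 200, 350)],
                    onload_ms=500))
```

Even in this crude form, the vertical marker makes it obvious which requests finished before onload and which ones delayed it.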
One constraint is that Firebug is a Firefox add-on, so it isn’t available in other browsers.
AOL Pagetest is an Internet Explorer plug-in that produces HTTP waterfall charts. It also identifies areas for improving performance, as discussed in the section called “Performance Analyzers”.
VRTA from Microsoft focuses on improving network performance. Its HTTP waterfall chart is more detailed than those of other network monitors, with an emphasis on reusing existing TCP connections. See the section called “Performance Analyzers” for more information about VRTA.
IBM Page Detailer used to be my preferred packet sniffer, but IBM stopped selling the professional version. The basic version is still available, but it lacks many features that I consider mandatory, such as support for analyzing HTTPS requests and the ability to export data. IBM Page Detailer runs on Microsoft Windows.
I use IBM Page Detailer when analyzing browsers other than Internet Explorer and Firefox, such as Opera and Safari (since these browsers aren’t supported by HttpWatch). IBM Page Detailer can monitor network traffic for any process that uses HTTP. This is enabled by editing the wd_WS2s.ini file and adding the process’s name to the Executable line.
There’s an interesting twist that prevents IBM Page Detailer from analyzing Chrome: Chrome has a separate process for the browser UI plus one for each tab. IBM Page Detailer attaches to the browser UI process, and so it doesn’t see any of the HTTP traffic for the actual web pages being loaded. Nevertheless, if support for HTTPS and exporting data isn’t required, IBM Page Detailer is a useful alternative.
The main distinguishing feature of Fiddler, built by Eric Lawrence from the Microsoft Internet Explorer team, is that it supports a scripting capability that allows for setting breakpoints and manipulating HTTP traffic. One downside is that it acts as a proxy, and so it may alter the behavior of the browser (e.g., the number of open connections per server). If you need a scripting capability and are mindful of any side effects of using a proxy, I highly recommend Fiddler. It runs on Microsoft Windows.
Charles is an HTTP proxy, similar to Fiddler. It has many of the same features as Fiddler, including the ability to analyze both HTTP and HTTPS traffic, and bandwidth throttling. Charles supports Microsoft Windows, Mac OS X, and Linux.
Wireshark evolved from Ethereal. It analyzes HTTP requests at the packet level. Its UI is not as graphical as other network monitors. It also doesn’t have the concept of a “web page,” so it’s up to you to discern where the web page’s packets start and end. If you have to look at traffic at the packet level, such as to analyze chunked encoding, Wireshark is the best choice. It’s available on many platforms, including Microsoft Windows, Mac OS X, and Linux.
Developers love Firebug because they can extend it. This open extension model makes it possible to add on to Firebug’s features in a way that also allows that new functionality to be shared with other developers. You can find useful Firebug extensions at http://getfirebug.com/extensions/index.html.
YSlow was the first widely used performance “lint” tool. AOL Pagetest, VRTA, and neXpert were released subsequently. Each of these tools has its own set of performance best practices. I’ve aggregated all of these best practices in Table A.1, “Performance best practices”, with an indication of which rules are evaluated by each particular tool. I’ve grouped the best practices into three categories:
- The rules included in High Performance Web Sites
- The best practices described in this book
- Other rules that I haven’t addressed but that are incorporated in at least one of these tools
Looking at Table A.1, “Performance best practices”, it’s clear that there is little overlap in the best practices espoused by each tool. In one sense, this is good, because bringing in different perspectives on the performance problem leads to the discovery of new best practices. But this diversity has a more important and unfavorable impact: confusion and fragmentation in the web development community. It’s unclear which set of best practices is best. The choice of tool might be dictated by the development environment rather than by the content of the performance analysis.
Among the developers of these tools, there is more agreement on performance best practices than is reflected in Table A.1, “Performance best practices”. The inconsistencies arise for several reasons. There’s a desire to introduce new best practices and to focus less on covering what has already been covered somewhere else. Development time is always an issue; developers may decide to skip the implementation of well-known best practices. Don’t underestimate the impact of personal interests; for instance, it’s clear that the developers of VRTA have more interest in, and familiarity with, networking issues than I do.
Table A.1. Performance best practices

High Performance Web Sites:
- Use CSS sprites
- Use a CDN
- Gzip text responses
- Put CSS at the top
- Avoid CSS expressions
- Reduce DNS lookups
- Remove dupe scripts

Even Faster Web Sites:
- Don’t block the UI thread
- Load scripts asynchronously
- Inline scripts before stylesheet
- Minimize uncompressed size
- Flush the document early
- Simplify CSS selectors

Other rules:
- Use persistent connections
- Avoid network congestion
- Increase MTU, TCP window
- Avoid server congestion
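The kind of aggregation behind Table A.1 can be sketched as a rule-by-tool matrix built from per-tool rule sets. The data below is a small illustrative subset drawn from the rule lists quoted in this appendix, with rule names normalized by hand to the table’s wording; it is not the full table, and the normalization is an assumption. Even this subset shows the lack of overlap: no rule is shared by all three tools.

```python
from typing import Dict, List, Set

# Illustrative subset only: names normalized by hand to Table A.1 wording.
TOOL_RULES: Dict[str, Set[str]] = {
    "YSlow": {"Use a CDN", "Gzip text responses", "Put CSS at the top", "Reduce DNS lookups"},
    "AOL Pagetest": {"Use a CDN", "Gzip text responses", "Use persistent connections"},
    "VRTA": {"Use persistent connections", "Avoid network congestion", "Avoid server congestion"},
}


def rules_by_tool(tool_rules: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert {tool: rules} into {rule: sorted list of tools that check it}."""
    matrix: Dict[str, List[str]] = {}
    for tool, rules in tool_rules.items():
        for rule in rules:
            matrix.setdefault(rule, []).append(tool)
    return {rule: sorted(tools) for rule, tools in sorted(matrix.items())}


def common_rules(tool_rules: Dict[str, Set[str]]) -> Set[str]:
    """Rules checked by every tool (the intersection of all rule sets)."""
    return set.intersection(*tool_rules.values())


if __name__ == "__main__":
    for rule, tools in rules_by_tool(TOOL_RULES).items():
        print(f"{rule}: {', '.join(tools)}")
```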
Moving forward, web developers would be well served if it became possible for these and other tools to share a common set of performance best practices. I fully expect this will happen. These tools were created in the spirit of evangelizing a faster web experience for all users and of helping developers easily identify where they can make the greatest improvement to their site’s speed. In that spirit, it makes sense to give developers tools that are more consistent regardless of their platform and tool of choice.
That’s the future. For now, the following sections provide descriptions of YSlow, AOL Pagetest, VRTA, and neXpert, as they exist today.
I created YSlow while working at Yahoo!. It existed first as a bookmarklet, and then as a Greasemonkey script. Joe Hewitt was kind enough to explain how to port YSlow to be a Firebug extension. Swapnil Shinde did a lot of the coding to get it to work with Firebug. The motivation I gave Swapnil was that I was certain YSlow would be used by as many as 10,000 people. YSlow was released in July 2007 and crossed the 1 million download mark a year and a half later. The name is a play on the question “whY is this page Slow?”
YSlow contains the following rules, which are echoed as chapters in High Performance Web Sites. When YSlow was released, I also posted a summary of each rule at http://developer.yahoo.com/performance/rules.html. That page has subsequently been updated by the folks at Yahoo! to include 34 rules! Here are the original 13 rules that are still the basis for YSlow’s performance analysis:
Rule 1: Make Fewer HTTP Requests
Rule 2: Use a Content Delivery Network
Rule 3: Add an
Rule 4: Gzip Components
Rule 5: Put Stylesheets at the Top
Rule 6: Put Scripts at the Bottom
Rule 7: Avoid CSS Expressions
Rule 9: Reduce DNS Lookups
Rule 11: Avoid Redirects
Rule 12: Remove Duplicate Scripts
Rule 13: Configure ETags
AOL Pagetest is a plug-in for Internet Explorer. WebPagetest is accessible through any browser; it runs Internet Explorer on the backend server. In addition to performance analysis, both provide an HTTP waterfall chart, screenshots, page load times, and summary statistics. The performance analysis checks best practices such as the following:
Enable browser caching of static assets
Use one CDN for all static assets
Gzip-encode all appropriate text assets
Use persistent connections
Proper cookie usage
No ETag headers
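At its core, a performance analyzer runs a set of rule checks against each response it recorded. The sketch below implements three checks along the lines of those listed for AOL Pagetest (browser caching, gzip for text, no ETag headers). It assumes a hypothetical input format, a dict of lowercase response header names to values, and its notion of "text" content types and "cacheable" is a simplifying assumption, not Pagetest’s actual logic.

```python
from typing import Dict, List

# Assumption: content types treated as compressible text.
TEXT_TYPES = ("text/", "application/javascript", "application/json")


def check_response(url: str, headers: Dict[str, str]) -> List[str]:
    """Return rule violations for one response.
    `headers` maps lowercase header names to values (hypothetical format)."""
    problems = []
    content_type = headers.get("content-type", "")
    # Caching: naive substring check for an Expires header or a max-age directive.
    cache_control = headers.get("cache-control", "")
    if "expires" not in headers and "max-age" not in cache_control:
        problems.append(f"{url}: no Expires or Cache-Control max-age header")
    # Text responses should be gzip-encoded.
    if content_type.startswith(TEXT_TYPES) and "gzip" not in headers.get("content-encoding", ""):
        problems.append(f"{url}: text response is not gzipped")
    # No ETag headers.
    if "etag" in headers:
        problems.append(f"{url}: has an ETag header")
    return problems


if __name__ == "__main__":
    print(check_response("/style.css", {"content-type": "text/css", "etag": '"abc"'}))
```

A real analyzer would also need to know which responses are static assets; this sketch applies every check to every response it is given.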
The deployment of this functionality via the WebPagetest web site is intriguing. WebPagetest is fairly popular, but it hasn’t gotten the wide adoption it deserves. It lets you analyze any web site from any browser, without the hassle of downloading, installing, and configuring an application or plug-in. It does this by running AOL Pagetest in Internet Explorer on the WebPagetest site’s backend servers. WebPagetest users, from any browser, simply enter the URL of the site they want to analyze into the web-based form, and the results are presented a minute or so later. Figure A.4, “WebPagetest” shows the results for http://www.aol.com/.
Making WebPagetest available through a web page form makes it easy to use for everyone, including nondevelopers, but it does have some limitations. It’s important to remember that the results are always generated using Internet Explorer running in WebPagetest’s remote location. This can be confusing. Notice in Figure A.4, “WebPagetest” that I’m using Firefox; remembering that these results were produced using Internet Explorer is a challenge. Similarly, the results do not necessarily reflect your local conditions. If you’re trying to debug a problem with your current Internet connection, or you’re loading a page that depends on your current cookies, that can’t be captured by WebPagetest. AOL Pagetest (the downloaded, locally installed Internet Explorer plug-in) or the other packet sniffers mentioned in the previous section are the choice for analyzing your current browsing experience.
VRTA from Microsoft is short for Visual Round Trip Analyzer. It displays HTTP waterfall charts, but these are more detailed than those found in other tools. VRTA focuses on network optimization. One key aspect of this is reusing existing TCP connections. In most HTTP waterfall charts, each HTTP request is a separate horizontal bar. VRTA instead represents each TCP connection as a horizontal bar. This makes it easy to see how well TCP connections are being utilized. VRTA also displays a bit rate histogram that shows how well the available bandwidth is utilized.
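The one-bar-per-connection view can be approximated by grouping requests by the TCP connection that carried them. This sketch assumes a hypothetical capture format of `(connection_id, url)` pairs, which a packet sniffer would have to supply; it is not VRTA’s algorithm. A reuse ratio near 1.0 means almost every request opened its own connection, which is the situation that keepalives are meant to avoid.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def requests_per_connection(requests: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group (connection_id, url) records by connection, preserving request order."""
    by_conn: Dict[int, List[str]] = defaultdict(list)
    for conn_id, url in requests:
        by_conn[conn_id].append(url)
    return dict(by_conn)


def reuse_ratio(requests: List[Tuple[int, str]]) -> float:
    """Average number of requests carried per TCP connection (1.0 means no reuse)."""
    conns = requests_per_connection(requests)
    return len(requests) / len(conns) if conns else 0.0


if __name__ == "__main__":
    capture = [(1, "/"), (2, "/a.css"), (1, "/b.js"), (2, "/c.png")]
    print(requests_per_connection(capture), reuse_ratio(capture))
```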
In addition to its sophisticated network charts, VRTA evaluates the page download information against the following set of performance best practices:
Open enough ports
Limit the number of small files to be downloaded
Turn on keepalives
Identify network congestion
Increase network maximum transmission unit (MTU) or TCP window size
Identify server congestion
Check for unnecessary round trips
Set expiration dates
Think before you redirect
Edit your CSS
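The rule about TCP window size follows from a simple bound: a sender can have at most one window of unacknowledged data in flight per round trip, so a single connection’s throughput is capped at window/RTT no matter how fast the link is. The sketch below computes that cap and its inverse, the bandwidth-delay product. It deliberately ignores slow start, packet loss, and other real-world effects, and the numbers in the usage note are only examples.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection's throughput, in bits per second:
    at most one full window can be delivered per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: the window needed to keep the link full."""
    return bandwidth_bps * rtt_seconds / 8
```

With a 65,535-byte window (the largest possible without TCP window scaling) and a 100 ms round trip, `max_throughput_bps` gives about 5.2 Mbit/s; keeping a 20 Mbit/s link full at the same round-trip time would take a window of about 250,000 bytes.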
neXpert is also from Microsoft. It’s an add-on to Fiddler (see the section called “Packet Sniffers” for more information about Fiddler). It uses Fiddler to gather information about the resources downloaded for a web page. neXpert analyzes this information against a set of performance best practices and produces a report of suggested improvements. neXpert goes further than other performance analyzers in that it predicts the impact these improvements might have on the web page’s load time. The list of performance best practices analyzed by neXpert includes the following:
HTTP response codes
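neXpert’s model for predicting load-time impact isn’t described here, so the following is only a deliberately crude back-of-the-envelope estimate of the same kind of number. It assumes each eliminated request saves one full round trip and each saved byte would otherwise have been transferred at a fixed bandwidth. Because browsers download resources in parallel, this serial model tends to overstate the savings.

```python
def estimated_savings_ms(requests_removed: int, bytes_saved: int,
                         rtt_ms: float, bandwidth_bps: float) -> float:
    """Crude load-time savings estimate, assuming requests are fetched serially:
    one round trip per removed request plus transfer time for the saved bytes."""
    round_trip_savings = requests_removed * rtt_ms
    transfer_savings = bytes_saved * 8 / bandwidth_bps * 1000  # seconds to ms
    return round_trip_savings + transfer_savings
```

For example, removing 5 requests and 50,000 bytes on a connection with a 100 ms round trip and 1 Mbit/s of bandwidth gives 500 ms + 400 ms, or roughly 900 ms, as an optimistic figure.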
The tools in this section address specific web performance areas not covered in the previous sections. I use all of these tools on a regular, if not daily, basis.
Improving web performance requires measuring page load times. Although this sounds simple, in reality it’s extremely difficult to gather load time measurements in an accurate and statistically sound way that is representative of real-world users. There’s no single solution. Instead, multiple techniques are required, including measuring real-world traffic, bucket testing, and scripted or synthetic testing. The problem is that all of these techniques are costly, in terms of both dollars and calendar time.
I created Hammerhead to make it easier for developers to measure load times early in the development process. Hammerhead is an extension to Firebug. To test, or “hammer,” a set of web pages, enter the URLs into Hammerhead, along with the number of measurements desired. Figure A.5, “Hammerhead” shows an example.
Hammerhead loads each URL the specified number of times and records each measurement, as well as the average and median load times. The pages are loaded with both an empty and a primed cache (Hammerhead manages the cache for you). Although Hammerhead measurements are gathered under just one set of test conditions (your development environment), they provide a quick and easy way to compare two or more web page alternatives.
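The bookkeeping Hammerhead does for each URL (record every measurement, then report the average and median) can be sketched in a few lines. Here `load_page` is a hypothetical callable that loads the page once and returns its load time in milliseconds; for the empty-cache case it would also have to clear the cache first, which depends on your browser automation and is not shown. This is not Hammerhead’s code.

```python
import statistics
from typing import Callable, Dict, List


def hammer(load_page: Callable[[], float], runs: int) -> Dict[str, object]:
    """Call `load_page` `runs` times (each call returns one load time in ms)
    and report every measurement plus the average and median."""
    times: List[float] = [load_page() for _ in range(runs)]
    return {
        "measurements": times,
        "average": statistics.mean(times),
        "median": statistics.median(times),
    }
```

Reporting the median alongside the average matters: with measurements of 100, 120, and 480 ms, a single slow outlier pulls the average up to about 233 ms, while the median stays at 120 ms.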
Smush.it is a service for analyzing and optimizing images in your web page. It was created by Stoyan Stefanov and Nicole Sullivan, the authors of Chapter 10, Optimizing Images. Smush.it tells you how many bytes you can save by optimizing your images, as shown in Figure A.6, “Smush.it”. It even produces the optimized images for you as a single ZIP file for easy download. There are also a Smush.it bookmarklet and a Firefox extension, so you can get similar functionality inside the browser.
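The savings figure is a simple aggregate over before-and-after file sizes. This sketch computes it from a hypothetical mapping of image name to `(original_bytes, optimized_bytes)`; it does none of the actual optimization, which is the hard part Smush.it handles.

```python
from typing import Dict, Tuple


def savings_report(sizes: Dict[str, Tuple[int, int]]) -> Tuple[int, float]:
    """Given {image: (original_bytes, optimized_bytes)}, return the total bytes
    saved and the percentage saved across all images."""
    original = sum(o for o, _ in sizes.values())
    optimized = sum(n for _, n in sizes.values())
    saved = original - optimized
    percent = 100.0 * saved / original if original else 0.0
    return saved, percent
```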
Almost every day I wonder about or am asked about a performance edge case. Do external scripts load in parallel if there’s an inline script in between them? What if there’s an inline script and a stylesheet in between them? Is the behavior the same on Firefox 3.1 and Chrome 2.0?
Instead of writing a new HTML page for each edge case that comes up, I use Cuzillion, shown in Figure A.7, “Cuzillion”. It has a graphical web page “avatar” onto which you can drag-and-drop different types of resources (external scripts, inline scripts, stylesheets, inline style blocks, images, and iframes). Clicking on a resource exposes a variety of configuration settings such as the domain used for loading the resource and how long it takes to respond.
I created Cuzillion while I was working on Chapter 4, Loading Scripts Without Blocking. I needed to test hundreds of test cases. Creating a test framework made this possible in a fraction of the time. The name comes from the tag line: “‘cuz there are a zillion pages to check.”
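The core idea of generating a test page from a description of its resources, rather than hand-writing HTML for each edge case, can be sketched as follows. The resource kinds are hypothetical, everything is placed in the body for simplicity, and the `?delay=` query parameter in the example URLs assumes a server that deliberately responds slowly; none of this is Cuzillion’s actual code or URL scheme.

```python
from typing import List, Tuple

# Each resource: (kind, value). Hypothetical kinds: "extscript" and
# "stylesheet" take a URL; "inlinescript" takes JavaScript source.
Resource = Tuple[str, str]


def build_test_page(resources: List[Resource]) -> str:
    """Generate an HTML test page containing the resources in the given order."""
    parts = ["<!DOCTYPE html>", "<html><head><title>edge case</title></head><body>"]
    for kind, value in resources:
        if kind == "extscript":
            parts.append(f'<script src="{value}"></script>')
        elif kind == "inlinescript":
            parts.append(f"<script>{value}</script>")
        elif kind == "stylesheet":
            parts.append(f'<link rel="stylesheet" href="{value}">')
        else:
            raise ValueError(f"unknown resource kind: {kind}")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Do two external scripts download in parallel with an inline script between them?
    print(build_test_page([
        ("extscript", "http://a.example.com/slow.js?delay=2"),
        ("inlinescript", "var t = Date.now();"),
        ("extscript", "http://b.example.com/slow.js?delay=2"),
    ]))
```

Loading the generated page in each browser with a packet sniffer running answers the question: if the two script downloads overlap in the waterfall, they were fetched in parallel.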
When Google released Chrome, Dion Almaer (coauthor of Chapter 2, Creating Responsive Web Applications) asked whether I was going to review it from a performance perspective. Rather than put Chrome through its paces manually, I created a set of HTML pages, each of which contains a specific test: are scripts loaded in parallel, do prefetch links work, and so forth. I then chained those pages together so that the tests would all run automatically.
For web developers, UA Profiler is useful for confirming how a given browser handles a specific optimization. For example, if you’re adding future caching headers to a redirect but it still doesn’t seem to be cached, you can check UA Profiler to make sure you’re using a browser that supports redirect caching.
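One simple way to chain test pages is to have each page, once its result has been recorded, send the browser on to the next test. The sketch below computes the next URL from an ordered list of hypothetical test URLs and emits a zero-delay `<meta http-equiv="refresh">` tag, falling back to a hypothetical results page after the last test. UA Profiler’s actual mechanism may differ; a test that has to wait for onload would trigger the navigation from script instead.

```python
from typing import List, Optional


def next_test(tests: List[str], current: str) -> Optional[str]:
    """Return the URL of the test after `current`, or None if it was the last."""
    i = tests.index(current)
    return tests[i + 1] if i + 1 < len(tests) else None


def advance_tag(tests: List[str], current: str, results_url: str) -> str:
    """HTML that moves the browser to the next test, or to the results page."""
    target = next_test(tests, current) or results_url
    return f'<meta http-equiv="refresh" content="0; url={target}">'
```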