Chapter 1. A Primer on Selenium
Selenium is an open source suite composed of a set of libraries and tools that enable the automation of web browsers. We can see Selenium as an umbrella project with three core components: WebDriver, Grid, and IDE (Integrated Development Environment). Selenium WebDriver is a library that allows the driving of browsers programmatically. Thus, we can use Selenium WebDriver to navigate websites and interact with web pages (e.g., clicking on links, filling in forms, etc.) as a real user would do, in an automated fashion. The primary use of Selenium WebDriver is the automated testing of web applications. Other Selenium uses include the automation of web-based administration tasks or web scraping (automated web data extraction).
This chapter provides a comprehensive overview of the Selenium core components: WebDriver, Grid, and IDE. Then, it reviews the Selenium ecosystem, i.e., other tools and technologies around it. Finally, it analyzes the foundations of software testing related to Selenium.
Selenium Core Components
Jason Huggins and Paul Hammant created Selenium in 2004 while working at Thoughtworks. They chose the name “Selenium” as a counterpart to Mercury, an existing testing framework developed by Hewlett-Packard. The name is significant because the chemical selenium is known for reducing the toxicity of mercury.
That initial version of Selenium (known today as Selenium Core) is a JavaScript library that impersonates user actions in web applications. Selenium Core interprets the so-called Selenese commands to achieve this task. These commands are encoded as an HTML table composed of three parts: command (action executed in a web browser, such as opening a URL or clicking a link), target (locator that identifies a web element, such as the attribute of a given component), and value (optional data, such as the text typed into a web-form field).
Huggins and Hammant added a scripting layer to Selenium Core in a new project called Selenium Remote Control (RC). Selenium RC follows a client-server architecture. Clients use a binding language (such as Java or JavaScript) to send Selenese commands over HTTP to an intermediate proxy called the Selenium RC Server. This server launches web browsers on demand, injecting the Selenium Core library into a website and proxying requests from clients to Selenium Core. In addition, the Selenium RC Server masks the target website to the same local URL of the injected Selenium Core library to avoid same-origin policy concerns. This approach was a game-changer for browser automation at that time, but it had significant limitations. First, because JavaScript is the underlying technology to support automation, some actions are not permitted since JavaScript does not allow them (for instance, uploading and downloading files or handling pop-ups and dialogs, to name a few). Second, Selenium RC introduces a considerable overhead that impacts its performance.
In parallel, Simon Stewart created the project WebDriver in 2007. WebDriver and Selenium RC were equivalent from a functional perspective, i.e., both projects allow programmers to impersonate web users using a programming language. Nevertheless, WebDriver uses the native support of each browser to carry out the automation, and therefore, its capabilities and performance are far superior to those of RC. In 2009, after a meeting between Jason Huggins and Simon Stewart at the Google Test Automation Conference, they decided to merge Selenium and WebDriver in a single project. The new project was called Selenium WebDriver or Selenium 2. This new project uses a communication protocol based on HTTP combined with the native automation support on the browser. That approach is still the basis of Selenium 3 (released in 2016) and Selenium 4 (released in 2021). Now we refer to Selenium RC and Core as “Selenium 1,” and its use is discouraged in favor of Selenium WebDriver. This book focuses on the latest version of Selenium WebDriver to date, i.e., version 4.
Tip
Appendix A summarizes the novelties shipped with Selenium 4. This appendix also contains a migration guide for bumping from Selenium 3 to 4.
Today, Selenium is a well-known automation suite composed of three subprojects: WebDriver, Grid, and IDE. The following subsections present the main characteristics of each one.
Selenium WebDriver
Selenium WebDriver is a library that allows the controlling of web browsers automatically. To that aim, it provides a cross-platform API in different language bindings. The official programming languages supported by Selenium WebDriver are Java, JavaScript, Python, Ruby, and C#. Internally, Selenium WebDriver uses the native support implemented by each browser to carry out the automation process. For this reason, we need to place a component called driver between the script using the Selenium WebDriver API and the browser. Table 1-1 summarizes the browsers and drivers officially supported by Selenium WebDriver.
Note
The name Selenium is widely used to refer to the library for browser automation. Since this term is also the name of the umbrella project, I use Selenium in this book to identify the browser automation suite, which is composed of three components: Selenium WebDriver (library), Selenium Grid (infrastructure), and Selenium IDE (tool).
Browser | Driver | Operating system | Maintainer | Download |
---|---|---|---|---|
Chrome/Chromium | chromedriver | Windows/macOS/Linux | | |
Edge | msedgedriver | Windows/macOS/Linux | Microsoft | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver |
Firefox | geckodriver | Windows/macOS/Linux | Mozilla | |
Opera | operadriver | Windows/macOS/Linux | Opera Software AS | |
Internet Explorer | IEDriverServer | Windows | Selenium project | |
Safari | safaridriver | macOS | Apple | Built-in |
Drivers (e.g., chromedriver, geckodriver, etc.) are platform-dependent binary files that receive commands from a WebDriver script and translate them into some browser-specific language. In the first releases of Selenium WebDriver (i.e., in Selenium 2), these commands (also known as the Selenium protocol) were JSON messages over HTTP (the so-called JSON Wire Protocol). Nowadays, this communication (still JSON over HTTP) follows a standard specification named W3C WebDriver. This specification is the preferred Selenium protocol as of Selenium 4.
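To make the idea of JSON over HTTP more concrete, the following sketch builds (but does not send) two requests shaped like W3C WebDriver commands: creating a session and navigating to a URL. It uses only the JDK HTTP client classes, not the Selenium API. The driver address (chromedriver on its default port 9515), the session identifier, and the class and method names are assumptions made for illustration, and the JSON is assembled by naive string concatenation without escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class WebDriverProtocolSketch {

    // Assumed driver address: chromedriver listens on port 9515 by default.
    static final String DRIVER_URL = "http://localhost:9515";

    // "New Session" command: POST /session with the requested capabilities.
    static HttpRequest newSession(String browserName) {
        String body = "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\""
                + browserName + "\"}}}";
        return post("/session", body);
    }

    // "Navigate To" command: POST /session/{session id}/url.
    static HttpRequest navigateTo(String sessionId, String url) {
        return post("/session/" + sessionId + "/url", "{\"url\":\"" + url + "\"}");
    }

    // Build a JSON POST request for the given path (the request is not sent).
    static HttpRequest post(String path, String json) {
        return HttpRequest.newBuilder(URI.create(DRIVER_URL + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical session id; a real one is returned by the driver
        // in the response to the "New Session" command.
        HttpRequest request = navigateTo("hypothetical-session-id",
                "https://bonigarcia.dev/selenium-webdriver-java");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

In practice, the language bindings create and send these messages for us, so a WebDriver script rarely deals with the protocol directly.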
Figure 1-1 summarizes the basic architecture of Selenium WebDriver we have seen so far. As you can see, this architecture has three tiers. First, we find a script using the Selenium WebDriver API (Java, JavaScript, Python, Ruby, or C#). This script sends W3C WebDriver commands to the second layer, in which we find the drivers. This figure shows the specific case of using chromedriver (to control Chrome) and geckodriver (to control Firefox). Finally, the third layer contains the web browsers. In the case of Chrome, the native browser follows the DevTools Protocol. DevTools is a set of developer tools for browsers based on the Blink rendering engine, such as Chrome, Chromium, Edge, or Opera. The DevTools Protocol is based on JSON-RPC messages and allows inspecting, debugging, and profiling these browsers. In Firefox, the native automation support uses the Marionette protocol. Marionette is a remote protocol based on JSON, allowing instrumenting and controlling web browsers based on the Gecko engine (such as Firefox).
Overall, Selenium WebDriver allows controlling web browsers as a user would, but programmatically. To that aim, the Selenium WebDriver API provides a wide variety of features to navigate web pages, interact with web elements, or impersonate user actions, among many other capabilities. The target application is web-based, such as static websites, dynamic web applications, Single Page Applications (SPA), complex enterprise systems with a web interface, etc.
Selenium Grid
The second project of the Selenium family is Selenium Grid. Philippe Hanrigou started the development of this project in 2008. Selenium Grid is a group of networked hosts that provides browser infrastructure for Selenium WebDriver. This infrastructure enables the (parallel) execution of Selenium WebDriver scripts with remote browsers of a different nature (types and versions) in multiple operating systems.
Figure 1-2 shows the basic architecture of Selenium Grid. As you can see, a group of nodes provides browsers used by Selenium scripts. These nodes can use different operating systems (as we saw in Table 1-1) with various installed browsers. The central entry point to this Grid is the Hub (also known as Selenium Server). This server-side component keeps track of the nodes and proxies requests from the Selenium scripts. Like in Selenium WebDriver, the W3C WebDriver specification is the standard protocol for the communication between these scripts and the Hub.
The hub-nodes architecture in Grid has been available since Selenium 2. This architecture is also present in Selenium 3 and 4. Nevertheless, this centralized architecture can lead to performance bottlenecks if the number of requests to the Hub is high. Selenium 4 provides a fully distributed flavor of Selenium Grid to avoid this problem. This architecture implements advanced load balancing mechanisms to avoid overloading any component.
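The role of the Hub in the classical architecture can be pictured with a small model. The sketch below is not how Selenium Grid is implemented; it is a simplified illustration, under the assumption that each node advertises a list of browser names and that the Hub picks the first node offering the requested browser. All class, method, and host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class GridHubSketch {

    // A node is a host that offers one or more browsers (hypothetical model).
    static final class Node {
        final String host;
        final List<String> browsers;

        Node(String host, List<String> browsers) {
            this.host = host;
            this.browsers = browsers;
        }
    }

    // The hub keeps track of the registered nodes.
    private final List<Node> nodes = new ArrayList<>();

    void register(Node node) {
        nodes.add(node);
    }

    // Route a session request to the first node offering the requested browser.
    Optional<Node> route(String browser) {
        return nodes.stream()
                .filter(node -> node.browsers.contains(browser))
                .findFirst();
    }

    public static void main(String[] args) {
        GridHubSketch hub = new GridHubSketch();
        hub.register(new Node("node-a", List.of("chrome", "edge")));
        hub.register(new Node("node-b", List.of("firefox")));

        System.out.println(hub.route("firefox")
                .map(node -> node.host).orElse("no node available"));
        System.out.println(hub.route("safari")
                .map(node -> node.host).orElse("no node available"));
    }
}
```

Because every request goes through a single `route` call in this model, it also hints at why a centralized hub can become a bottleneck under heavy load.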
Tip
Chapter 6 describes how to set up Selenium Grid following the classical approach (based on a hub and set of nodes). This chapter also covers the standalone mode (i.e., hub and node(s) hosted in the same machine) and the fully distributed architecture.
Selenium IDE
Selenium IDE is the last core component of the Selenium suite. Shinya Kasatani created this project in 2006. Selenium IDE is a tool that implements the so-called Record and Playback (R&P) automation technique. As the name suggests, this technique has two steps. First, in Selenium IDE, the record part captures user interactions with a browser, encoding these actions as Selenium commands. Second, we use the generated Selenium script to execute a browser session automatically (playback).
The early versions of Selenium IDE were Firefox plug-ins that embedded Selenium Core to record, edit, and play back Selenium scripts. They were XPI modules (i.e., a technology used to create Mozilla extensions). As of version 55 (released in 2017), Firefox migrated support for add-ons to the W3C Browser Extension specification. As a result, Selenium IDE was discontinued, and for some time, it was not possible to use it. To solve this problem, the Selenium team rewrote Selenium IDE following the Browser Extensions recommendation. Thanks to this, we can now use Selenium IDE in multiple browsers, such as Chrome, Edge, and Firefox.
Figure 1-3 shows the new Selenium IDE GUI (Graphical User Interface).
Using this GUI, users can record interactions with a browser and edit and execute the generated script. Selenium IDE encodes each interaction in different parts: a command (i.e., the action executed in the browser), a target (i.e., the locator of the web element), and a value (i.e., the data handled). Optionally, we can include a description of the command. Figure 1-3 shows a recorded example of these steps:
- Open website (https://bonigarcia.dev/selenium-webdriver-java). We will use this website as the practice site in the rest of the book.
- Click on the link with the text “GitHub.” As a result, the navigation moves to the examples repository source code.
- Assert that the book title (Hands-On Selenium WebDriver with Java) is present on the web page.
- Close the browser.
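The four recorded steps can be pictured as a list of command, target, and value triples. The following sketch models them with plain Java objects. The command names, the locator syntax, and the `css=h1` target are assumptions chosen for illustration; the script recorded by Selenium IDE may encode these steps differently.

```java
import java.util.List;

public class RecordedScriptSketch {

    // One recorded interaction: command, target (locator), and optional value.
    static final class Step {
        final String command;
        final String target;
        final String value;

        Step(String command, String target, String value) {
            this.command = command;
            this.target = target;
            this.value = value;
        }

        @Override
        public String toString() {
            return command + " | " + target + " | " + value;
        }
    }

    // Hypothetical encoding of the four steps of the recorded example.
    static List<Step> recordedSteps() {
        return List.of(
                new Step("open", "https://bonigarcia.dev/selenium-webdriver-java", ""),
                new Step("click", "linkText=GitHub", ""),
                new Step("assert text", "css=h1", "Hands-On Selenium WebDriver with Java"),
                new Step("close", "", ""));
    }

    public static void main(String[] args) {
        // Print each step as a row, similar to the table shown in the IDE.
        recordedSteps().forEach(System.out::println);
    }
}
```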
Once we have created a script in Selenium IDE, we can export this script as a Selenium WebDriver test. For instance, Figure 1-4 shows how to export the presented example as a JUnit test case. Finally, we can save the project on our local machine. The resulting project for this sample is available in the examples GitHub repository.
Note
The Selenium project is porting Selenium IDE to Electron at the time of this writing. Electron is an open source framework based on Chromium and Node.js that allows desktop application development.
Selenium Ecosystem
Software ecosystems are collections of elements interacting with a shared market underpinned by a common technological background. In the case of Selenium, its ecosystem involves the official core projects and other related projects, libraries, and actors. This section reviews the Selenium ecosystem, divided into the following categories: language bindings, driver managers, frameworks, browser infrastructure, and community.
Language Bindings
As we already know, the Selenium project maintains various language bindings for Selenium WebDriver: Java, JavaScript, Python, Ruby, and C#. Nevertheless, other languages are also available. Table 1-2 summarizes these language bindings for Selenium WebDriver maintained by the community.
Name | Language | License | Maintainer | Website |
---|---|---|---|---|
hs-webdriver | Haskell | BSD-3-Clause | Adam Curtis | |
php-webdriver | PHP | MIT | Facebook, community | |
RSelenium | R | AGPLv3 | rOpenSci | |
Selenium | Go | MIT | Miki Tebeka | |
Selenium-Remote-Driver | Perl | Apache 2.0 | George S. Baugh | |
webdriver.dart | Dart | Apache 2.0 | | |
wd | JavaScript | Apache 2.0 | Adam Christian | |
Driver Managers
Drivers are mandatory components to control web browsers natively with Selenium WebDriver (see Figure 1-1). For this reason, before using the Selenium WebDriver API, we need to manage these drivers. Driver management is the process of downloading, setting up, and maintaining the proper driver for a given browser. The usual steps in the driver management procedure are:
1. Download: Each browser has its own driver. For example, we use chromedriver for controlling Chrome or geckodriver for Firefox (see Table 1-1). The driver is a platform-specific binary file. Therefore, we need to download the proper driver for a given operating system (typically, Windows, macOS, or Linux). In addition, we need to consider the driver version since a driver release is compatible with a given browser version (or range). For example, to use Chrome 91.x, we need to download chromedriver 91.0.4472.19. We usually find the browser-driver compliance in the driver documentation or release notes.
2. Setup: Once we have the proper driver, we need to make it available in our Selenium WebDriver script.
3. Maintenance: Modern web browsers (e.g., Chrome, Firefox, or Edge) upgrade themselves automatically and silently, without prompting the user. For this reason, and concerning Selenium WebDriver, we need to maintain the browser-driver version compatibility over time for these so-called evergreen browsers.
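The manual procedure above can be sketched in a few lines of Java. The example makes a simplifying assumption: a driver is treated as compatible with a browser when both share the same major version, which is only an approximation of the compatibility rules published in the driver documentation. The browser version and the driver path are hypothetical, and `webdriver.chrome.driver` is the Java system property commonly used to tell Selenium WebDriver where chromedriver is located.

```java
public class DriverManagementSketch {

    // Simplifying assumption: same major version means compatible.
    static boolean isCompatible(String browserVersion, String driverVersion) {
        return majorVersion(browserVersion) == majorVersion(driverVersion);
    }

    // Extract the number before the first dot, e.g., 91 from "91.0.4472.19".
    static int majorVersion(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    // Setup step: expose the driver location as a Java system property.
    static void setUpChromeDriver(String driverPath) {
        System.setProperty("webdriver.chrome.driver", driverPath);
    }

    public static void main(String[] args) {
        String browserVersion = "91.0.4472.101"; // hypothetical installed Chrome
        String driverVersion = "91.0.4472.19";   // downloaded chromedriver

        if (isCompatible(browserVersion, driverVersion)) {
            setUpChromeDriver("/path/to/chromedriver"); // hypothetical path
            System.out.println("Driver configured");
        } else {
            // Maintenance step: a browser upgrade requires a new driver download.
            System.out.println("Driver outdated, download a matching release");
        }
    }
}
```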
As you can see, the driver maintenance process can be time-consuming. Furthermore, it can cause problems for Selenium WebDriver users (e.g., failed tests due to browser-driver incompatibility after an automatic browser upgrade). For this reason, the so-called driver managers aim to carry out the driver management process in an automated fashion to some extent. Table 1-3 summarizes the available driver managers for different language bindings.
Name | Language | License | Maintainer | Website |
---|---|---|---|---|
WebDriverManager | Java | Apache 2.0 | Boni García | |
webdriver-manager | JavaScript | MIT | | |
webdriver-manager | Python | Apache 2.0 | Serhii Pirohov | |
WebDriverManager.Net | C# | MIT | Aliaksandr Rasolka | |
webdrivers | Ruby | MIT | Titus Fortner | |
Tip
In this book, I recommend using WebDriverManager because it automates the entire driver maintenance process (i.e., download, setup, and maintenance). See Appendix B for further information about automated and manual driver management.
Locator Tools
The Selenium WebDriver API provides different ways to locate web elements (see Chapter 3): by attribute (id, name, or class), by link text (complete or partial), by tag name, by CSS (Cascading Style Sheets) selector, or by XML Path Language (XPath). Specific tools can help to identify and generate these locators. Table 1-4 shows some of these tools.
Name | Type | License | Maintainer | Website |
---|---|---|---|---|
Chrome DevTools | Built-in browser tool | Proprietary freeware, based on open source | | |
Firefox Developer Tools | Built-in browser tool | MPL 2.0 | Mozilla | |
ChroPath | Browser extension | Freeware | AutonomIQ | |
SelectorsHub | Browser extension | Freeware | Sanjay Kumar | |
POM Builder | Browser extension | Freeware | LogiGear Corporation | |
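To get a feeling for what an XPath locator expresses, the following sketch evaluates a few XPath expressions with the XPath engine bundled in the JDK. This is not Selenium code, and no browser is involved: the page fragment is a hypothetical, well-formed snippet, and the class and method names are assumptions for illustration. Expressions like these are the kind that the tools in Table 1-4 help to generate.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathLocatorSketch {

    // Hypothetical, well-formed page fragment used only for this illustration.
    static final String PAGE = "<html><body>"
            + "<a id=\"repo\" href=\"https://github.com/\">GitHub</a>"
            + "<input name=\"my-text\" type=\"text\"/>"
            + "</body></html>";

    // Return the string value of the first node matched by the expression.
    static String evaluate(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(PAGE)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Locate the link by its id attribute and read its text.
        System.out.println(evaluate("//a[@id='repo']"));
        // Locate the link by its text and read its href attribute.
        System.out.println(evaluate("//a[text()='GitHub']/@href"));
        // Locate the input field by its name attribute and read its type.
        System.out.println(evaluate("//input[@name='my-text']/@type"));
    }
}
```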
Frameworks
In software engineering, a framework is a set of libraries and tools used as a conceptual and technological base and support for software development. Selenium is the foundation for frameworks that wrap, enhance, or complement its default features. Table 1-5 contains some of these frameworks and libraries based on Selenium.
Name | Language | Description | License | Maintainer | Website |
---|---|---|---|---|---|
CodeceptJS | JavaScript | Multi-backend testing framework that models browser interactions as simple steps from a user perspective | MIT | Michael Bodnarchuk | |
FluentSelenium | Java | Fluent API for Selenium WebDriver | Apache 2.0 | Paul Hammant | |
FluentLenium | Java | Website and mobile automation framework to create readable and reusable WebDriver tests | Apache 2.0 | FluentLenium team | |
Healenium | Java | Library for improving the stability of Selenium tests by using machine learning algorithms to analyze web and mobile web elements | Apache 2.0 | Anna Chernyshova and Dmitriy Gumeniuk | |
Helium | Python | High-level API based on Selenium WebDriver | MIT | Michael Herrmann | |
QAF (QMetry Automation Framework) | Java | Test automation platform for web and mobile applications | MIT | Chirag Jayswal | |
Lightning | Java | Lightweight Selenium WebDriver client for Java | Apache 2.0 | Aerokube | |
Nerodia | Python | Python port of the Watir Ruby gem | MIT | Lucas Tierney | |
Robot Framework | Python, Java, .NET, and others | Generic automation framework based on human-readable test cases | Apache 2.0 | Robot Framework Foundation | |
Selenide | Java | Fluent, concise API for Selenium WebDriver | MIT | Selenide team | |
SeleniumBase | Python | Browser automation framework based on WebDriver and pytest | MIT | Michael Mintz | |
Watir (Web Application Testing in Ruby) | Ruby | Gem library based on WebDriver for automating web browsers | MIT | Titus Fortner | |
WebDriverIO | JavaScript | Test automation framework based on WebDriver and Appium | MIT | Christian Bromann | |
Nightwatch.js | JavaScript | Integrated end-to-end testing framework based on the W3C WebDriver | MIT | Andrei Rusu | |
Applitools | Java, JavaScript, C#, Ruby, PHP, Python | Test automation framework for visual user interface regression and A/B testing. It provides SDKs for Selenium, Appium, and others | Commercial | Applitools team | |
Katalon Studio | Java, Groovy | Test automation platform leveraging Selenium WebDriver, Appium, and cloud providers | Commercial | Katalon team | |
TestProject | Java, C#, Python | Test automation platform for web and mobile apps built on top of Selenium and Appium | Commercial | TestProject team | |
Browser Infrastructure
We can use Selenium WebDriver to control local browsers installed in the machine running the WebDriver script. Also, Selenium WebDriver can drive remote web browsers (i.e., those executed in other hosts). In this case, we can use Selenium Grid to support the remote browser infrastructure. Nevertheless, this infrastructure can be challenging to create and maintain.
Alternatively, we can use a cloud provider to outsource the responsibility for supporting the browser infrastructure. In the Selenium ecosystem, a cloud provider is a company or product that supplies managed services for automated testing. These companies typically offer commercial solutions for web and mobile testing. The users of a cloud provider request on-demand browsers of different types, versions, and operating systems. Also, these providers typically offer additional services for easing the testing and monitoring activities, such as access to session recordings or analysis capabilities, to name a few. Some of the most relevant cloud providers for Selenium nowadays are Sauce Labs, BrowserStack, LambdaTest, CrossBrowserTesting, Moon Cloud, TestingBot, Perfecto, or Testinium.
Another solution we can use to support the browser infrastructure for Selenium is Docker. Docker is an open source software technology that allows users to pack and run applications as lightweight, portable containers. The Docker platform has two main components: the Docker Engine, a tool for creating and running containers, and the Docker Hub, a cloud service for distributing Docker images. In the Selenium domain, we can use Docker to pack and execute containerized browsers. Table 1-6 presents a summary of relevant projects using Docker in the Selenium ecosystem.
Name | Description | License | Maintainer | Website |
---|---|---|---|---|
docker-selenium | Official Docker images for Selenium Grid | Apache 2.0 | Selenium project | |
Selenoid | Lightweight Golang implementation of Selenium Hub running browsers in Docker (images available on Docker Hub) | Apache 2.0 | Aerokube | |
Moon | Enterprise Selenium cluster that uses Docker and Kubernetes | Commercial | Aerokube | |
Callisto | Open source Kubernetes-native implementation of Selenium Grid | MIT | Aerokube | |
Community
Due to its collaborative nature, software development needs the organization and interaction of many participants. In the open source domain, we can measure the success of a project by the relevance of its community. Selenium is supported by a large community of many different participants worldwide. Table 1-7 presents a summary of several Selenium resources grouped into the following categories: official documentation, development, support, and events.
Category | Description | Website |
---|---|---|
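Official documentation | User guide | |
Official documentation | Blog | |
Official documentation | Wiki | |
Official documentation | Ecosystem | |
Development | Source code | |
Development | Issues | |
Development | Governance | |
Support | User group | |
Support | Slack | |
Support | IRC | |
Support | StackOverflow | |
Events | Conference | |
Events | Meetups | |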
Software Testing Fundamentals
Software testing (or simply testing) consists of the dynamic evaluation of a piece of software, called System Under Test (SUT), through a finite set of test cases (or simply tests), giving a verdict about it. Testing implies the execution of the SUT using specific input values to assess the outcome or expected behavior.
At first glance, we distinguish two separate categories of software testing: manual and automated. On the one hand, in manual testing, a person (typically a software engineer or the final user) evaluates the SUT. On the other hand, in automated testing, we use specific software tools to develop tests and control their execution against the SUT. Automated tests allow the early detection of defects (usually called bugs) in the SUT while providing a large number of additional benefits (e.g., cost savings, fast feedback, test coverage, or reusability, to name a few). Manual testing can also be a valuable approach in some cases, for example, in exploratory testing (i.e., human testers freely investigate and evaluate the SUT).
Note
There is no universal classification for the numerous forms of testing presented in this section. These concepts are subject to continuous evolution and debate, just like software engineering. Consider it a proposal that can fit into a large number of projects.
Levels of Testing
Depending on the size of the SUT, we can define different levels of testing. These levels define several categories in which software teams divide their testing efforts. In this book, I propose a stacked layout to represent the different levels (see Figure 1-5). The lower levels of this structure represent the tests aimed at verifying small pieces of software (called units). As we ascend in the stack, we find other tiers (e.g., integration, system, etc.) in which the SUT integrates more and more components.
The lowest level of this stack is unit testing. At this level, we assess individual units of software. A unit is a particular observable element of behavior. For instance, units are typically methods or classes in object-oriented programming and functions in functional programming. Unit testing aims to verify that each unit behaves as expected. Automated unit tests usually run very fast since each test executes a small amount of code in isolation. To achieve this isolation, we can use test doubles, pieces of software that replace the dependent components of a given unit. For example, a popular type of test double in object-oriented programming is the mock object. A mock object mimics an actual object using some programmed behavior.
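As an illustration of isolating a unit with a test double, consider the following sketch. The domain (a price converter that depends on an exchange rate service) and all names are hypothetical. The double is written by hand here; in practice, a mocking library can generate such objects, which is an alternative this sketch does not cover.

```java
public class TestDoubleSketch {

    // Dependency of the unit under test (hypothetical example domain).
    interface ExchangeRateService {
        double rate(String from, String to);
    }

    // Unit under test: converts an amount using the rate from its dependency.
    static final class PriceConverter {
        private final ExchangeRateService rates;

        PriceConverter(ExchangeRateService rates) {
            this.rates = rates;
        }

        double convert(double amount, String from, String to) {
            return amount * rates.rate(from, to);
        }
    }

    // Test double: returns a programmed rate and records how often it is called.
    static final class FixedRateDouble implements ExchangeRateService {
        int calls = 0;

        @Override
        public double rate(String from, String to) {
            calls++;
            return 2.0;
        }
    }

    public static void main(String[] args) {
        FixedRateDouble testDouble = new FixedRateDouble();
        PriceConverter converter = new PriceConverter(testDouble);

        // The unit is verified in isolation: no real rate service is contacted.
        double result = converter.convert(10.0, "EUR", "USD");
        System.out.println(result == 20.0 && testDouble.calls == 1
                ? "Unit test passed" : "Unit test failed");
    }
}
```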
The next level in Figure 1-5 is integration testing. At this level, different units are composed to create composite components. Integration testing aims to assess the interaction between the involved units and expose defects in their interfaces.
Then, at the system testing and end-to-end (E2E) levels, we test the software system as a whole. We need to deploy the SUT and verify its high-level features to carry out these levels. The difference between system/end-to-end and integration testing is that the former involves all the system components and the final user (typically impersonated). In other words, system and end-to-end testing assess the SUT through the User Interface (UI). This UI can be graphical (GUI) or nongraphical (e.g., text-based or other types).
Figure 1-6 illustrates the difference between system and end-to-end testing. As you can see, on the one hand, end-to-end testing involves the software system and its dependent subsystems (e.g., database or external services). On the other hand, system testing comprises only the software system, and these external dependencies are typically mocked.
Acceptance testing is the top tier of the presented stack. At this level, the final user participates in the testing process. The objective of acceptance testing is to decide whether the software system meets end-user expectations. As you can see in Figure 1-6, like end-to-end testing, acceptance testing validates the whole system and its dependencies. Therefore, acceptance tests also use the UI to carry out the SUT validation.
Tip
The primary purpose of Selenium WebDriver is to implement end-to-end tests. Nevertheless, we can use WebDriver to carry out system testing when mocking the backend calls made by the website under test. Moreover, we can use Selenium WebDriver in conjunction with a Behavior-Driven Development (BDD) tool to implement acceptance tests (see Chapter 9).
Types of Testing
Depending on the strategy for designing test cases, we can implement different types of tests. The two principal types of testing are:
- Functional testing (also known as behavioral or closed-box testing): Evaluates the compliance of a piece of software with the expected behavior (i.e., its functional requirements).
- Structural testing (also known as clear-box testing): Determines if the program-code structure is faulty. To that aim, testers should know the internal logic of a piece of software.
The difference between these testing types is that functional tests are responsibility-based, while structural tests are implementation-based. Both types can be performed at any test level (unit, integration, system, end-to-end, or acceptance). Nevertheless, structural tests are commonly done at the unit or integration level since these levels enable more direct control of the code execution flow.
Warning
Black-box and white-box testing are other names for functional and structural testing, respectively. Nevertheless, these designations are not recommended since the tech industry is trying to adopt more inclusive terms and use neutral terminology instead of potentially harmful language.
There are different flavors of functional testing. For example:
- UI testing (known as GUI testing when the UI is graphical): Evaluates if the visual elements of an application meet the expected functionality. Note that UI testing is different from the system and end-to-end testing levels since the former tests the interface itself, and the latter evaluates the whole system through the UI.
- Negative testing: Evaluates the SUT under unexpected conditions (e.g., expected exceptions). This term is the counterpart of the regular functional testing (sometimes called positive testing), in which we assess if the SUT behaves as expected (i.e., its happy path).
- Cross-browser testing: This is specific for web applications. It aims to verify the compatibility of websites and applications in different web browsers (types, versions, or operating systems).
A third miscellaneous testing type, nonfunctional testing, includes testing strategies that assess the quality attributes of a software system (i.e., its nonfunctional requirements). Common methods of nonfunctional testing include, but are not limited to:
- Performance testing: Assesses different metrics of software systems, such as response time, stability, reliability, or scalability. The objective of performance testing is not finding bugs but finding system bottlenecks. Two common subtypes of performance testing are load testing and stress testing.
- Security testing: Tries to evaluate security concerns, such as confidentiality (disclosure of information protection), authentication (ensuring the user identity), or authorization (determining user rights and privileges), among others.
- Usability testing: Evaluates how user-friendly a software application is. This assessment is also called User eXperience (UX) testing. A subtype of usability testing is accessibility testing, which evaluates if a system is usable by people with disabilities.
Tip
We use Selenium WebDriver primarily to implement functional tests (i.e., interacting with a web application UI to assess the application behavior). WebDriver is unlikely to be used to implement structural tests. In addition, although it is not its principal usage, we can use WebDriver to implement nonfunctional tests, e.g., for load, security, accessibility, or localization (assessment of specific locale settings) testing (see Chapter 9).
Testing Methodologies
The software development lifecycle is the set of activities, actions, and tasks required to create software systems in software engineering. The moment at which software engineers design and implement test cases in the overall development lifecycle depends on the specific development process (such as iterative, waterfall, or agile, to name a few). Two of the most relevant testing methodologies are:
- Test Driven Development (TDD): TDD is a methodology in which we design and implement tests before the actual software design and implementation. At the beginning of the 21st century, TDD became popular with the rise of agile software development methodologies, such as Extreme Programming (XP). In TDD, a developer first writes an (initially failing) automated test for a given feature. Then, the developer creates a piece of code to pass that test. Finally, the developer refactors the code to achieve or improve readability and maintainability.
- Test Last Development (TLD): TLD is a methodology in which we design and implement tests after implementing the SUT. This practice is typical in traditional software development processes, such as waterfall (sequential), incremental (multi-waterfall), spiral (risk-oriented multi-waterfall), or Rational Unified Process (RUP).
Another relevant testing methodology is Behavior-Driven Development (BDD). BDD is a testing practice derived from TDD, and consequently, we design tests at the early stages of the software development lifecycle in BDD. To that aim, conversations occur between the final user and the development team (typically with the project leader, manager, or analysts). These conversations formalize a common understanding of the desired behavior and the software system. As a result, we create acceptance tests in terms of one or more scenarios following a Given-When-Then structure:
- Given
-
Initial context at the beginning of the scenario
- When
-
Event that triggers the scenario
- Then
-
Expected outcome
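As a rough illustration of the Given-When-Then structure, the following framework-free Java sketch writes one scenario as a plain method. The `ShoppingCart` class and the scenario itself are hypothetical and exist only for this example; a BDD framework would typically express the same steps in a dedicated scenario language instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical SUT, defined here only to make the sketch runnable.
class ShoppingCart {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    void remove(String item) {
        items.remove(item);
    }

    int count() {
        return items.size();
    }
}

class GivenWhenThenSketch {

    // Scenario: removing the only item leaves the cart empty.
    // Returns true when the expected outcome is observed.
    static boolean removingTheOnlyItemLeavesTheCartEmpty() {
        // Given a cart that contains one book
        ShoppingCart cart = new ShoppingCart();
        cart.add("book");

        // When the user removes the book
        cart.remove("book");

        // Then the cart is empty
        return cart.count() == 0;
    }

    public static void main(String[] args) {
        System.out.println(removingTheOnlyItemLeavesTheCartEmpty() ? "pass" : "fail");
    }
}
```

Keeping each step as a one-line comment mirrors the wording agreed with the final user, so the scenario can be read without knowing the implementation.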
Tip
TLD is a common practice when implementing Selenium WebDriver tests. In other words, developers/testers do not implement a WebDriver test until the SUT is available. Nevertheless, different methodologies are also possible. For instance, BDD is a common approach when using WebDriver with Cucumber (see Chapter 9).
Closely related to the domain of testing methodologies, we find the concept of Continuous Integration (CI). CI is a software development practice where members of a software project build, test, and integrate their work continuously. Grady Booch first coined the term CI in 1991. Now it is a popular strategy to create software.
As Figure 1-7 shows, CI has three separate stages. First, we use a source code repository, a hosting facility to store and share the source code of a software project. We typically use a version control system (VCS) to manage this repository. A VCS is a tool that keeps track of each change to the source code (sometimes called a patch), who made it, and when.
Git, initially developed by Linus Torvalds, is the preferred VCS today. Other alternatives are Concurrent Versions System (CVS) or Subversion (SVN). On top of Git, several code hosting platforms (such as GitHub, GitLab, or Bitbucket) provide collaborative cloud repository hosting services for developing, sharing, and maintaining software.
Developers synchronize a local repository (or simply, repo) copy in their local environments. Then, they do the coding work using that local copy, committing new changes to the remote repository (typically daily). The basic idea of CI is that every commit triggers the build and test of the software with the new changes. The test suite executed to assess that a patch does not break the build is called a regression suite. A regression suite can contain tests of different types, including unit, integration, end-to-end, etc.
When the number of tests is too large for regression testing, we typically choose only a part of the relevant tests from the whole suite. There are different strategies to select these tests, for instance, smoke testing (i.e., tests that ensure the critical functionality) or sanity testing (i.e., tests that evaluate the basic functionality). Lastly, we can execute the complete suite as a scheduled task (typically nightly).
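One way to picture the selection of a subset of the regression suite is to tag each test and filter by tag. The sketch below is a minimal, framework-free illustration under that assumption; the `TaggedTest` class, the test names, and the `smoke` tag are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical description of a test case: a name plus free-form tags.
class TaggedTest {
    final String name;
    final Set<String> tags;

    TaggedTest(String name, String... tags) {
        this.name = name;
        this.tags = new HashSet<>(Arrays.asList(tags));
    }
}

class RegressionSelectionSketch {

    // Returns the names of the tests that carry the given tag, in suite order.
    static List<String> select(List<TaggedTest> suite, String tag) {
        List<String> selected = new ArrayList<>();
        for (TaggedTest test : suite) {
            if (test.tags.contains(tag)) {
                selected.add(test.name);
            }
        }
        return selected;
    }

    // Hypothetical complete suite: two critical tests and one secondary test.
    static List<TaggedTest> fullSuite() {
        return Arrays.asList(
                new TaggedTest("loginWorks", "smoke"),
                new TaggedTest("checkoutWorks", "smoke"),
                new TaggedTest("profilePictureCanBeCropped"));
    }

    public static void main(String[] args) {
        // On every commit: run only the subset tagged as smoke tests.
        System.out.println(select(fullSuite(), "smoke"));
        // As a scheduled task: run every test returned by fullSuite() instead.
    }
}
```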
We need to use a server-side infrastructure called a build server to implement a CI pipeline. The build server usually reports a problem to the original developer when the regression tests fail. Table 1-8 provides a summary of several build servers.
Name | Description | License | Maintainer |
---|---|---|---|
Bamboo | Easy use with Jira (issue tracker) and Bitbucket | Commercial | Atlassian |
GitHub Actions | Integrated build server in GitHub | Free for public repositories | Microsoft |
GitLab CI/CD | Integrated build server in GitLab | Free for public repositories | GitLab |
Jenkins | Open source automation server | MIT | Jenkins team |
Tip
I use a GitHub repository (https://github.com/bonigarcia/selenium-webdriver-java) to publish and maintain the test examples presented in this book. GitHub Actions is the build server for this repo (see Chapter 2).
We can extend a typical CI pipeline in two ways (see Figure 1-8):
- Continuous Delivery (CD)
-
After CI, the build server deploys the release to a staging environment (i.e., a replica of a production environment for testing purposes) and executes the automated acceptance tests (if any).
- Continuous Deployment
-
The build server deploys the software release to the production environment as the final step.
Closely related to CI, the term DevOps (development and operations) has gained momentum. DevOps is a software methodology that promotes communication and collaboration between different teams in a software project to develop and deliver software efficiently. These teams include developers, testers, QA (quality assurance), operations (infrastructure), etc.
Test Automation Tools
We need to use some tooling to implement, execute, and control automated tests effectively. One of the most relevant categories for testing tools is the unit testing framework. The original framework in the unit testing family (also known as xUnit) is SmalltalkUnit (or SUnit). SUnit is a unit test framework for the Smalltalk language created by Kent Beck in 1999. Erich Gamma ported SUnit to Java, creating JUnit. Since then, JUnit has been very popular, inspiring other unit testing frameworks. Table 1-9 summarizes the most relevant unit testing frameworks in different languages.
Name | Language | Description | License | Maintainer |
---|---|---|---|---|
JUnit | Java | Reference implementation of xUnit family | EPL | JUnit team |
TestNG | Java | Inspired by JUnit and NUnit, including extra features | Apache 2.0 | Cedric Beust |
Mocha | JavaScript | Test framework for Node.js and the browser | MIT | OpenJS Foundation |
Jest | JavaScript | Focused on simplicity and aimed at web applications | MIT | Facebook |
Karma | JavaScript | Allows you to execute JavaScript tests in web browsers | MIT | Karma team |
NUnit | .NET | Unit testing framework for all .NET languages (C#, Visual Basic, and F#) | MIT | .NET Foundation |
unittest | Python | Unit testing framework included as a standard library as of Python 2.1 | PSF License | Python Software Foundation |
minitest | Ruby | Complete suite of testing utilities for Ruby | MIT | Seattle Ruby Brigade |
An important common characteristic of the xUnit family is the test structure, composed of four phases (see Figure 1-9):
- Setup
-
The test case initializes the SUT to exhibit the expected behavior.
- Exercise
-
The test case interacts with the SUT. As a result, the test gets an outcome from the SUT.
- Verify
-
The test case decides if the obtained outcome from the SUT is as expected. To that aim, the test contains one or more assertions. An assertion (or predicate) is a Boolean-valued function that checks if an expected condition is true. The execution of the assertions generates a test verdict (typically, pass or fail).
- Teardown
-
The test case returns the SUT to its initial state.
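The four phases can be sketched in plain Java without any testing framework, so that the example runs with only the JDK. The `TaskList` class is a hypothetical SUT invented for this illustration; in a real project, a unit testing framework would typically provide dedicated hooks for setup and teardown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical SUT used only for illustration: a tiny in-memory task list.
class TaskList {
    private final List<String> tasks = new ArrayList<>();

    void add(String task) {
        tasks.add(task);
    }

    int size() {
        return tasks.size();
    }

    void clear() {
        tasks.clear();
    }
}

class FourPhaseSketch {

    // Returns the test verdict: true for pass, false for fail.
    static boolean addingTaskIncreasesSize() {
        // Setup: create the SUT in a known initial state.
        TaskList sut = new TaskList();
        try {
            // Exercise: interact with the SUT and capture the outcome.
            sut.add("write first test");
            int outcome = sut.size();

            // Verify: compare the obtained outcome with the expected one.
            return outcome == 1;
        } finally {
            // Teardown: reset what the setup created, even if a step above fails.
            sut.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(addingTaskIncreasesSize() ? "pass" : "fail");
    }
}
```

Placing the teardown in a `finally` block is one design choice for guaranteeing that cleanup happens regardless of the verdict.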
Tip
We can use unit testing frameworks in conjunction with other libraries or utilities to implement any test type. For example, as explained in Chapter 2, we use JUnit and TestNG to embed the call to the Selenium WebDriver API, implementing end-to-end tests for web applications.
The stages of setup and teardown are optional in a unit test case. Although it is not strictly mandatory, verifying is highly recommended. Even if unit testing frameworks include capabilities to implement assertions, it is common to incorporate third-party assertion libraries. These libraries aim to improve the test code's readability by providing a rich set of fluent assertions. In addition, these libraries offer enhanced error messages to help testers understand the cause of a failure. Table 1-10 contains a summary of some of the most relevant assertion libraries for Java.
Name | Description | License | Maintainer |
---|---|---|---|
AssertJ | Fluent assertions Java library | Apache 2.0 | AssertJ team |
Hamcrest | Java library of matchers aimed to create flexible assertions | BSD | Hamcrest team |
Truth | Fluent assertions for Java and Android | Apache 2.0 | Google |
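To give an idea of what fluent assertions and enhanced error messages mean in practice, the following sketch hand-rolls a tiny chainable check. It is only a stand-in written for this illustration: the `StringCheck` class and its methods are hypothetical and do not reproduce the API of any of the libraries listed above.

```java
// Hand-rolled stand-in for a fluent assertion library (hypothetical, not a real API).
class StringCheck {
    private final String actual;

    private StringCheck(String actual) {
        this.actual = actual;
    }

    // Entry point, so that checks read as: that(value).startsWith(...).hasLength(...)
    static StringCheck that(String actual) {
        return new StringCheck(actual);
    }

    StringCheck startsWith(String prefix) {
        if (!actual.startsWith(prefix)) {
            throw new AssertionError(
                    "Expected \"" + actual + "\" to start with \"" + prefix + "\"");
        }
        return this; // returning this is what enables chaining
    }

    StringCheck hasLength(int expected) {
        if (actual.length() != expected) {
            throw new AssertionError("Expected length " + expected
                    + " but \"" + actual + "\" has length " + actual.length());
        }
        return this;
    }
}

class FluentAssertionSketch {

    // Runs a failing check and returns the message it produces.
    static String failureMessage() {
        try {
            StringCheck.that("Selenium").startsWith("Web");
            return "no failure";
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Passing chain: both conditions hold, so nothing is thrown.
        StringCheck.that("Selenium").startsWith("Sel").hasLength(8);
        // Failing check: the message names both the actual and the expected value.
        System.out.println(failureMessage());
    }
}
```

The point of the sketch is the shape of the calls and of the failure message; a real assertion library offers far more checks and richer diagnostics.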
As you can see in Figure 1-9, the SUT usually can query another component, named the Depended-On Component (DOC). In some cases (e.g., at the unit or system testing level), we might want to isolate the SUT from the DOC(s). We can find a wide variety of mock libraries to achieve this isolation.
Table 1-11 summarizes some of these mock libraries for Java.
Name | Level | Description | License | Maintainer |
---|---|---|---|---|
EasyMock | Unit | It allows mocking objects for unit testing using Java annotations | Apache 2.0 | EasyMock team |
Mockito | Unit | Mocking Java library for mock creation and verification | MIT | Mockito team |
JMockit | Integration | It allows out-of-container integration testing for Java EE and Spring-based apps | Open | JMockit team |
MockServer | System | Mocking library for any system integrated via HTTP or HTTPS with Java clients | Apache 2.0 | James Bloom |
WireMock | System | Tool for simulating HTTP-based services | Apache 2.0 | Tom Akehurst |
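The isolation of the SUT from a DOC can be pictured with a hand-written test double, as in the sketch below. It does not use any mock library; the `ExchangeRateService` interface (the DOC) and the `PriceConverter` class (the SUT) are hypothetical names chosen for this example. Mock libraries automate the creation of such doubles and add verification features on top.

```java
// Hypothetical DOC: a component the SUT depends on (for example, a remote service).
interface ExchangeRateService {
    double rate(String from, String to);
}

// Hypothetical SUT: converts amounts using the rates provided by the DOC.
class PriceConverter {
    private final ExchangeRateService rates;

    // The DOC is passed in, which makes it replaceable in tests.
    PriceConverter(ExchangeRateService rates) {
        this.rates = rates;
    }

    double convert(double amount, String from, String to) {
        return amount * rates.rate(from, to);
    }
}

class DocIsolationSketch {

    static double convertWithStub() {
        // Test double replacing the real DOC: it always answers with a fixed rate.
        ExchangeRateService stub = (from, to) -> 2.0;
        PriceConverter sut = new PriceConverter(stub);
        return sut.convert(10.0, "EUR", "USD");
    }

    public static void main(String[] args) {
        System.out.println(convertWithStub());
    }
}
```

Because the double returns a fixed value, the outcome depends only on the logic of the SUT, which is the purpose of the isolation.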
The last category of testing tools we analyze in this section is BDD, a development process that creates acceptance tests. There are plenty of alternatives to implement this approach. For instance, Table 1-12 shows a condensed summary of relevant BDD frameworks.
Name | Language | Description | License | Maintainer |
---|---|---|---|---|
Cucumber | Ruby, Java, JavaScript, Python | Testing framework to create automated acceptance tests following a BDD approach | MIT | SmartBear Software |
FitNesse | Java | Standalone collaborative wiki and acceptance testing framework | CPL | FitNesse team |
JBehave | Java, Groovy, Kotlin, Ruby, Scala | BDD framework for all JVM languages | BSD-3-Clause | JBehave team |
Jasmine | JavaScript | BDD framework for JavaScript | MIT | Jasmine team |
Capybara | Ruby | Web-based acceptance test framework that simulates scenarios for user stories | MIT | Thomas Walpole |
Serenity BDD | Java, JavaScript | Automated acceptance testing library | Apache 2.0 | Serenity BDD team |
Summary and Outlook
Selenium has come a long way since its inception in 2004. Many practitioners consider it the de facto standard solution to develop end-to-end tests for web applications, and it is used by thousands of projects worldwide. In this chapter, you have seen the foundations of the Selenium project (made up of WebDriver, Grid, and IDE). In addition, Selenium has a rich ecosystem and active community. WebDriver is the heart of the Selenium project, and it is a library that provides an API to control different web browsers (e.g., Chrome, Firefox, Edge, etc.) programmatically. Table 1-13 contains a comprehensive overview of the primary and secondary uses of Selenium WebDriver.
Aspect | Primary | Secondary (other usages) |
---|---|---|
Purpose | Automated testing | Web scraping, web-based administration tasks |
Test level | End-to-end testing | System testing (mocking backend calls) |
Test type | Functional testing (ensuring expected behavior) | Nonfunctional testing (e.g., load, security, accessibility, or localization) |
Test methodology | TLD (implementing tests when SUT is available) | BDD (defining user scenarios at early development stages) |
In the next chapter, you will discover how to set up a Java project using Maven or Gradle as the build tool. This project will contain end-to-end tests for web applications using JUnit and TestNG as the unit testing frameworks and calls to the Selenium WebDriver API. In addition, you will learn how to control different web browsers (e.g., Chrome, Firefox, or Edge) with a basic test case (the Selenium WebDriver's version of the classic hello world).