Chapter 1. A Primer on Selenium

Selenium is an open source suite composed of a set of libraries and tools that enable the automation of web browsers. We can see Selenium as an umbrella project with three core components: WebDriver, Grid, and IDE (Integrated Development Environment). Selenium WebDriver is a library that allows the driving of browsers programmatically. Thus, we can use Selenium WebDriver to navigate websites and interact with web pages (e.g., clicking on links, filling in forms, etc.) as a real user would do, in an automated fashion. The primary use of Selenium WebDriver is the automated testing of web applications. Other Selenium uses include the automation of web-based administration tasks or web scraping (automated web data extraction).

This chapter provides a comprehensive overview of the Selenium core components: WebDriver, Grid, and IDE. Then, it reviews the Selenium ecosystem, i.e., other tools and technologies around it. Finally, it analyzes the foundations of software testing related to Selenium.

Selenium Core Components

Jason Huggins and Paul Hammant created Selenium in 2004 while working in Thoughtworks. They chose the name âSeleniumâ as a counterpart to Mercury, an existing testing framework developed by Hewlett-Packard. The name is significant because the chemical selenium is known for reducing the toxicity of mercury.

That initial version of Selenium (known today as Selenium Core) is a JavaScript library that impersonates user actions in web applications. Selenium Core interprets the so-called Selenese commands to achieve this task. These commands are encoded as an HTML table composed of three parts: command (action executed in a web browser, such as opening a URL or clicking a link), target (locator that identifies a web element, such as the attribute of a given component), and value (optional data, such as the text typed into a web-form field).

Huggins and Hammant added a scripting layer to Selenium Core in a new project called Selenium Remote Control (RC). Selenium RC follows a client-server architecture. Clients use a binding language (such as Java or JavaScript) to send Selenese commands over HTTP to an intermediate proxy called the Selenium RC Server. This server launches web browsers on demand, injecting the Selenium Core library into a website and proxying requests from clients to Selenium Core. In addition, the Selenium RC Server masks the target website to the same local URL of the injected Selenium Core library to avoid same-origin policy concerns. This approach was a game-changer for browser automation at that time, but it had significant limitations. First, because JavaScript is the underlying technology to support automation, some actions are not permitted since JavaScript does not allow themâfor instance, uploading and downloading files or handling pop-ups and dialogs, to name a few. Besides, Selenium RC introduces a relevant overhead that impacts its performance.

In parallel, Simon Stewart created the project WebDriver in 2007. WebDriver and Selenium RC were equivalent from a functional perspective, i.e., both projects allow programmers to impersonate web users using a programming language. Nevertheless, WebDriver uses the native support of each browser to carry out the automation, and therefore, its capabilities and performance are far superior to RC. In 2009, after a meeting between Jason Huggins and Simon Stewart at the Google Test Automation Conference, they decided to merge Selenium and WebDriver in a single project. The new project was called Selenium WebDriver or Selenium 2. This new project uses a communication protocol based on HTTP combined with the native automation support on the browser. That approach is still the basis of Selenium 3 (released in 2016) and Selenium 4 (released in 2021). Now we refer to Selenium RC and Core as âSelenium 1,â and its use is discouraged in favor of Selenium WebDriver. This book focuses on the latest version of Selenium WebDriver to date, i.e., version 4.

Tip

AppendixÂ A summarizes the novelties shipped with Selenium 4. This appendix also contains a migration guide for bumping from Selenium 3 to 4.

Today, Selenium is a well-known automation suite composed of three subprojects: WebDriver, Grid, and IDE. The following subsections present the main characteristics of each one.

Selenium WebDriver

Selenium WebDriver is a library that allows the controlling of web browsers automatically. To that aim, it provides a cross-platform API in different language bindings. The official programming languages supported by Selenium WebDriver are Java, JavaScript, Python, Ruby, and C#. Internally, Selenium WebDriver uses the native support implemented by each browser to carry out the automation process. For this reason, we need to place a component called driver between the script using the Selenium WebDriver API and the browser. TableÂ 1-1 summarizes the browsers and drivers officially supported by Selenium WebDriver.

Note

The name Selenium is widely used to refer to the library for browser automation. Since this term is also the name of the umbrella project, I use Selenium in this book to identify the browser automation suite, which is composed of three components: Selenium WebDriver (library), Selenium Grid (infrastructure), and Selenium IDE (tool).

Table 1-1. Browsers and drivers supported by Selenium WebDriver
Browser	Driver	Operating system	Maintainer	Download
Chrome/Chromium	chromedriver	Windows/macOS/Linux	Google	https://chromedriver.chromium.org
Edge	msedgedriver	Windows/macOS/Linux	Microsoft	https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver
Firefox	geckodriver	Windows/macOS/Linux	Mozilla	https://github.com/mozilla/geckodriver
Opera	operadriver	Windows/macOS/Linux	Opera Software AS	https://github.com/operasoftware/operachromiumdriver
Internet Explorer	IEDriverServer	Windows	Selenium project	https://www.selenium.dev/downloads
Safari	safaridriver	macOS	Apple	Built-in

Drivers (e.g., chromedriver, geckodriver, etc.) are platform-dependent binary files that receive commands from a WebDriver script and translate them into some browser-specific language. In the first releases of Selenium WebDriver (i.e., in Selenium 2), these commands (also known as the Selenium protocol) were JSON messages over HTTP (the so-called JSON Wire Protocol). Nowadays, this communication (still JSON over HTTP) follows a standard specification named W3C WebDriver. This specification is the preferred Selenium protocol as of Selenium 4.

FigureÂ 1-1 summarizes the basic architecture of Selenium WebDriver we have seen so far. As you can see, this architecture has three tiers. First, we find a script using the Selenium WebDriver API (Java, JavaScript, Python, Ruby, or C#). This script sends W3C WebDriver commands to the second layer, in which we find the drivers. This figure shows the specific case of using chromedriver (to control Chrome) and geckodriver (to control Firefox). Finally, the third layer contains the web browsers. In the case of Chrome, the native browser follows the DevTools Protocol. DevTools is a set of developer tools for browsers based on the Blink rendering engine, such as Chrome, Chromium, Edge, or Opera. The DevTools Protocol is based on JSON-RPC messages and allows inspecting, debugging, and profiling these browsers. In Firefox, the native automation support uses the Marionette protocol. Marionette is a remote protocol based on JSON, allowing instrumenting and controlling web browsers based on the Gecko engine (such as Firefox).

Overall, Selenium WebDriver allows controlling web browsers as a user would, but programmatically. To that aim, the Selenium WebDriver API provides a wide variety of features to navigate web pages, interact with web elements, or impersonate user actions, among many other capabilities. The target application is web-based, such as static websites, dynamic web applications, Single Page Applications (SPA), complex enterprise systems with a web interface, etc.

Selenium Grid

The second project of the Selenium family is Selenium Grid. Philippe Hanrigou started the development of this project in 2008. Selenium Grid is a group of networked hosts that provides browser infrastructure for Selenium WebDriver. This infrastructure enables the (parallel) execution of Selenium WebDriver scripts with remote browsers of a different nature (types and versions) in multiple operating systems.

FigureÂ 1-2 shows the basic architecture of Selenium Grid. As you can see, a group of nodes provides browsers used by Selenium scripts. These nodes can use different operating systems (as we saw in TableÂ 1-1) with various installed browsers. The central entry point to this Grid is the Hub (also known as Selenium Server). This server-side component keeps track of the nodes and proxies requests from the Selenium scripts. Like in Selenium WebDriver, the W3C WebDriver specification is the standard protocol for the communication between these scripts and the Hub.

The hub-nodes architecture in Grid has been available since Selenium 2. This architecture is also present in Selenium 3 and 4. Nevertheless, this centralized architecture can lead to performance bottlenecks if the number of requests to the Hub is high. Selenium 4 provides a fully distributed flavor of Selenium Grid to avoid this problem. This architecture implements advanced load balancing mechanisms to avoid overloading any component.

Tip

ChapterÂ 6 describes how to set up Selenium Grid following the classical approach (based on a hub and set of nodes). This chapter also covers the standalone mode (i.e., hub and node(s) hosted in the same machine) and the fully distributed architecture.

Selenium IDE

Selenium IDE is the last core component of the Selenium suite. Shinya Kasatani created this project in 2006. Selenium IDE is a tool that implements the so-called Record and Playback (R&P) automation technique. As the name suggests, this technique has two steps. First, in Selenium IDE, the record part captures user interactions with a browser, encoding these actions as Selenium commands. Second, we use the generated Selenium script to execute a browser session automatically (playback).

This early version of Selenium IDE was a Firefox plug-in that embedded Selenium Core to record, edit, and play back Selenium scripts. These early versions were XPI modules (i.e., a technology used to create Mozilla extensions). As of version 55 (released in 2017), Firefox migrated support for add-ons to the W3C Browser Extension specification. As a result, Selenium IDE was discontinued, and for some time, it has not been possible to use it. The Selenium team rewrote Selenium IDE following the Browser Extensions recommendation to solve this problem. Thanks to this, we can now use Selenium IDE in multiple browsers, such as Chrome, Edge, and Firefox.

FigureÂ 1-3 shows the new Selenium IDE GUI (Graphical User Interface).

Using this GUI, users can record interactions with a browser and edit and execute the generated script. Selenium IDE encodes each interaction in different parts: a command (i.e., the action executed in the browser), a target (i.e., the locator of the web element), and a value (i.e., the data handled). Optionally, we can include a description of the command. FigureÂ 1-3 shows a recorded example of these steps:

Open website (https://bonigarcia.dev/selenium-webdriver-java). We will use this website as the practice site in the rest of the book.
Click on the link with the text âGitHub.â As a result, the navigation moves to the examples repository source code.
Assert that the book title (Hands-On Selenium WebDriver with Java) is present on the web page.
Close the browser.

Once we have created a script in Selenium IDE, we can export this script as a Selenium WebDriver test. For instance, FigureÂ 1-4 shows how to convert the presented example as a JUnit test case. Finally, we can save the project on our local machine. The resulting project for this sample is available in the examples GitHub repository.

Note

The Selenium project is porting Selenium IDE to Electron at the time of this writing. Electron is an open source framework based on Chromium and Node.js that allows desktop application development.

Selenium Ecosystem

Software ecosystems are collections of elements interacting with a shared market underpinned by a common technological background. In the case of Selenium, its ecosystem involves the official core projects and other related projects, libraries, and actors. This section reviews the Selenium ecosystem, divided into the following categories: language bindings, driver managers, frameworks, browser infrastructure, and community.

Language Bindings

As we already know, the Selenium project maintains various language bindings for Selenium WebDriver: Java, JavaScript, Python, Ruby, and C#. Nevertheless, other languages are also available. TableÂ 1-2 summarizes these language bindings for Selenium WebDriver maintained by the community.

Table 1-2. Unofficial language bindings for Selenium WebDriver
Name	Language	License	Maintainer	Website
hs-webdriver	Haskell	BSD-3-Clause	Adam Curtis	https://github.com/kallisti-dev/hs-webdriver
php-webdriver	PHP	MIT	Facebook, community	https://github.com/php-webdriver/php-webdriver
RSelenium	R	AGPLv3	rOpenSci	https://github.com/ropensci/RSelenium
Selenium	Go	MIT	Miki Tebeka	https://github.com/tebeka/selenium
Selenium-Remote-Driver	Perl	Apache 2.0	George S. Baugh	https://github.com/teodesian/Selenium-Remote-Driver
webdriver.dart	Dart	Apache 2.0	Google	https://github.com/google/webdriver.dart
wd	JavaScript	Apache 2.0	Adam Christian	https://github.com/admc/wd

Driver Managers

Drivers are mandatory components to control web browsers natively with Selenium WebDriver (see FigureÂ 1-1). For this reason, before using the Selenium WebDriver API, we need to manage these drivers. Driver management is the process of downloading, setting up, and maintaining the proper driver for a given browser. The usual steps in the driver management procedure are:

1. Download: Each browser has its own driver. For example, we use chromedriver for controlling Chrome or geckodriver for Firefox (see TableÂ 1-1). The driver is a platform-specific binary file. Therefore, we need to download the proper driver for a given operating system (typically, Windows, macOS, or Linux). In addition, we need to consider the driver version since a driver release is compatible with a given browser version (or range). For example, to use Chrome 91.x, we need to download chromedriver 91.0.4472.19. We usually find the browser-driver compliance in the driver documentation or release notes.
2. Setup: Once we have the proper driver, we need to make it available in our Selenium WebDriver script.
3. Maintenance: Modern web browsers (e.g., Chrome, Firefox, or Edge) upgrade themselves automatically and silently, without prompting the user. For this reason, and concerning Selenium WebDriver, we need to maintain the browser-driver version compatibility in time for these so-called evergreen browsers.

As you can see, the driver maintenance process can be time-consuming. Furthermore, it can cause problems for Selenium WebDriver users (e.g., failed tests due to browser-driver incompatibility after an automatic browser upgrade). For this reason, the so-called driver managers aim to carry out the driver management process in an automated fashion to some extent. TableÂ 1-3 summarizes the available driver managers for different language bindings.

Table 1-3. Driver managers for Selenium WebDriver
Name	Language	License	Maintainer	Website
WebDriverManager	Java	Apache 2.0	Boni GarcÃa	https://github.com/bonigarcia/webdrivermanager
webdriver-manager	JavaScript	MIT	Google	https://www.npmjs.com/package/webdriver-manager
webdriver-manager	Python	Apache 2.0	Serhii Pirohov	https://pypi.org/project/webdriver-manager
WebDriverManager.Net	C#	MIT	Aliaksandr Rasolka	https://github.com/rosolko/WebDriverManager.Net
webdrivers	Ruby	MIT	Titus Fortner	https://github.com/titusfortner/webdrivers

Tip

In this book, I recommend using WebDriverManager because it automates the entire driver maintenance process (i.e., download, setup, and maintenance). See AppendixÂ B for further information about automated and manual driver management.

Locator Tools

The Selenium WebDriver API provides different ways to locate web elements (see ChapterÂ 3): by attribute (id, name, or class), by link text (complete or partial), by tag name, by CSS (Cascading Style Sheets) selector, or by XML Path Language (XPath). Specific tools can help to identify and generate these locators. TableÂ 1-4 shows some of these tools.

Table 1-4. Locators tools summary
Name	Type	License	Maintainer	Website
Chrome DevTools	Built-in browser tool	Proprietary freeware, based on open source	Google	https://developer.chrome.com/docs/devtools
Firefox Developer Tools	Built-in browser tool	MPL 2.0	Mozilla	https://developer.mozilla.org/en-US/docs/Tools
Cropath	Browser extension	Freeware	AutonomIQ	https://autonomiq.io/deviq-chropath.html
SelectorsHub	Browser extension	Freeware	Sanjay Kumar	https://selectorshub.com
POM Builder	Browser extension	Freeware	LogiGear Corporation	https://pombuilder.com

Frameworks

In software engineering, a framework is a set of libraries and tools used as a conceptual and technological base and support for software development. Selenium is the foundation for frameworks that wrap, enhance, or complement its default features. TableÂ 1-5 contains some of these frameworks and libraries based on Selenium.

Table 1-5. Testing frameworks and libraries based on Selenium
Name	Language	Description	License	Maintainer	Website
CodeceptJS	JavaScript	Multi-backend testing framework that models browser interactions as simple steps from a user perspective	MIT	Michael Bodnarchuk	https://codecept.io
FluentSelenium	Java	Fluent API for Selenium WebDriver	Apache 2.0	Paul Hammant	https://github.com/SeleniumHQ/fluent-selenium
FluentLenium	Java	Website and mobile automation framework to create readable and reusable WebDriver tests	Apache 2.0	FluentLenium team	https://fluentlenium.com
Healenium	Java	Library for improving the stability of Selenium tests by using machine learning algorithms to analyze web and mobile web elements	Apache 2.0	Anna Chernyshova and Dmitriy Gumeniuk	https://healenium.io
Helium	Python	High-level API based on Selenium WebDriver	MIT	Michael Herrmann	https://github.com/mherrmann/selenium-python-helium
QAF (QMetry Automation Framework)	Java	Test automation platform for web and mobile applications	MIT	Chirag Jayswal	https://qmetry.github.io/qaf
Lightning	Java	Lightweight Selenium WebDriver client for Java	Apache 2.0	FluentLenium	https://github.com/aerokube/lightning-java
Nerodia	Python	Python port of the Watir Ruby gem	MIT	Lucas Tierney	https://nerodia.readthedocs.io
Robot Framework	Python, Java, .NET, and others	Generic automation framework based on human-readable test cases	Apache 2.0	Robot Framework Foundation	https://robotframework.org
Selenide team	Java	Fluent, concise API for Selenium WebDriver	MIT	Selenide team	https://selenide.org
SeleniumBase	Python	Browser automation framework based on WebDriver and pytest	MIT	Michael Mintz	https://seleniumbase.io
Watir (Web Application Testing in Ruby)	Ruby	Gem library based on WebDriver for automating web browsers	MIT	Titus Fortner	http://watir.com
WebDriverIO	JavaScript	Test automation framework based WebDriver and Appium	MIT	Christian Bromann	https://webdriver.io
Nightwatch.js	JavaScript	Integrated end-to-end testing framework based on the W3C WebDriver	MIT	Andrei Rusu	https://nightwatchjs.org
Applitools	Java, JavaScript, C#, Ruby, PHP, Python	Test automation framework for visual user interface regression and A/B testing. It provides SDKs for Selenium, Appium, and others	Commercial	Applitools team	https://applitools.com
Katalon Studio	Java, Groovy	Test automation platform leveraging Selenium WebDriver, Appium, and cloud providers	Commercial	Katalon team	https://www.katalon.com
TestProject	Java, C#, Python	Test automation platform for web and mobile apps built on top of Selenium and Appium	Commercial	TestProject team	https://testproject.io

Browser Infrastructure

We can use Selenium WebDriver to control local browsers installed in the machine running the WebDriver script. Also, Selenium WebDriver can drive remote web browsers (i.e., those executed in other hosts). In this case, we can use Selenium Grid to support the remote browser infrastructure. Nevertheless, this infrastructure can be challenging to create and maintain.

Alternatively, we can use a cloud provider to outsource the responsibility for supporting the browser infrastructure. In the Selenium ecosystem, a cloud provider is a company or product that supplies managed services for automated testing. These companies typically offer commercial solutions for web and mobile testing. The users of a cloud provider request on-demand browsers of different types, versions, and operating systems. Also, these providers typically offer additional services for easing the testing and monitoring activities, such as access to session recordings or analysis capabilities, to name a few. Some of the most relevant cloud providers for Selenium nowadays are Sauce Labs, BrowserStack, LambdaTest, CrossBrowserTesting, Moon Cloud, TestingBot, Perfecto, or Testinium.

Another solution we can use to support the browser infrastructure for Selenium is Docker. Docker is an open source software technology that allows users to pack and run applications as lightweight, portable containers. The Docker platform has two main components: the Docker Engine, a tool for creating and running containers, and the Docker Hub, a cloud service for distributing Docker images. In the Selenium domain, we can use Docker to pack and execute containerized browsers. TableÂ 1-6 presents a summary of relevant projects using Docker in the Selenium ecosystem.

Table 1-6. Docker resources for Selenium
Name	Description	License	Maintainer	Website
docker-selenium	Official Docker images for Selenium Grid	Apache 2.0	Selenium project	https://github.com/seleniumhq/docker-selenium
Selenoid	Lightweight Golang implementation of Selenium Hub running browsers in Docker (images available on Docker Hub)	Apache 2.0	Aerokube	https://aerokube.com/selenoid
Moon	Enterprise Selenium cluster that use Docker and Kubernetes	Commercial	Aerokube	https://aerokube.com/moon
Callisto	Open source Kubernetes-native implementation of Selenium Grid	MIT	Aerokube	https://github.com/wrike/callisto

Community

Due to its collaborative nature, software development needs the organization and interaction of many participants. In the open source domain, we can measure the success of a project by the relevance of its community. Selenium is supported by a large community of many different participants worldwide. TableÂ 1-7 presents a summary of several Selenium resources grouped into the following categories: official documentation, development, support, and events.

Table 1-7. Selenium community resources
Category	Description	Website
Official documentation	User guide	https://www.selenium.dev/documentation
	Blog	https://www.selenium.dev/blog
	Wiki	https://github.com/seleniumhq/selenium/wiki
	Ecosystem	https://www.selenium.dev/ecosystem
Development	Source code	https://github.com/seleniumhq/selenium
	Issues	https://github.com/seleniumhq/selenium/issues
	Governance	https://www.selenium.dev/project
Support	User group	https://groups.google.com/group/selenium-users
	Slack	https://seleniumhq.slack.com
	IRC	https://webchat.freenode.net/#selenium
	StackOverflow	https://stackoverflow.com/questions/tagged/selenium
	Reddit	https://www.reddit.com/r/selenium
Events	Conference	https://www.selenium.dev/categories/conference
Events	Meetups	https://www.meetup.com/topics/selenium

Software Testing Fundamentals

Software testing (or simply testing) consists of the dynamic evaluation of a piece of software, called System Under Test (SUT), through a finite set of test cases (or simply tests), giving a verdict about it. Testing implies the execution of SUT using specific input values to assess the outcome or expected behavior.

At first glance, we distinguish two separate categories of software testing: manual and automated. On the one hand, in manual testing, a person (typically a software engineer or the final user) evaluates the SUT. On the other hand, in automated testing, we use specific software tools to develop tests and control their execution against the SUT. Automated tests allow the early detection of defects (usually called bugs) in the SUT while providing a large number of additional benefits (e.g., cost savings, fast feedback, test coverage, or reusability, to name a few). Manual testing can also be a valuable approach in some cases, for example, in exploratory testing (i.e., human testers freely investigate and evaluate the SUT).

Note

There is no universal classification for the numerous forms of testing presented in this section. These concepts are subject to continuous evolution and debate, just like software engineering. Consider it a proposal that can fit into a large number of projects.

Levels of Testing

Depending on the size of the SUT, we can define different levels of testing. These levels define several categories in which software teams divide their testing efforts. In this book, I propose a stacked layout to represent the different levels (see FigureÂ 1-5). The lower levels of this structure represent the tests aimed at verifying small pieces of software (called units). As we ascend in the stack, we find other tiers (e.g., integration, system, etc.) in which the SUT integrates more and more components.

The lowest level of this stack is unit testing. At this level, we assess individual units of software. A unit is a particular observable element of behavior. For instance, units are typically methods or classes in object-oriented programming and functions in functional programming. Unit testing aims to verify that each unit behaves as expected. Automated unit tests usually run very fast since each test executes a small amount of code in isolation. To achieve this isolation, we can use test doubles, pieces of software that replace the dependent components of a given unit. For example, a popular type of test double in object-oriented programming is the mock object. A mock object mimics an actual object using some programmed behavior.

The next level in FigureÂ 1-5 is integration testing. At this level, different units are composed to create composite components. Integration testing aims to assess the interaction between the involved units and expose defects in their interfaces.

Then, at the system testing and end-to-end (E2E) levels, we test the software system as a whole. We need to deploy the SUT and verify its high-level features to carry out these levels. The difference between system/end-to-end and integration testing is that the former involves all the system components and the final user (typically impersonated). In other words, system and end-to-end testing assess the SUT through the User Interface (UI). This UI can be graphical (GUI) or nongraphical (e.g., text-based or other types).

The Test Pyramid

The test pyramid is a classical representation of the levels of testing. Mike Cohn first coined this concept in 2009. In his original conception, Cohn recommended a large number of unit tests as the basis of testing efforts. The following levels (e.g., integration tests) are less numerous in each stage but typically more expensive (in terms of development and maintenance effort) and slow (in terms of execution time). This proposal might be impractical for many projects because unit tests are not always from a comprehensive testing suite. For this reason, other authors define different shapes for the levels of testing, such as the testing trophy (in which the intermediate layer, i.e., the integration test, is the largest). Since the relevance of the different test categories can vary from one project to another, I use a basic stack structure to represent the different levels of testing.

FigureÂ 1-6 illustrates the difference between system and end-to-end testing. As you can see, on the one hand, end-to-end testing involves the software system and its dependent subsystems (e.g., database or external services). On the other hand, system testing comprises only the software system, and these external dependencies are typically mocked.

Acceptance testing is the top tier of the presented stack. At this level, the final user participates in the testing process. The objective of acceptance testing is to decide whether the software system meets end-user expectations. As you can see in FigureÂ 1-6, like end-to-end testing, acceptance testing validates the whole system and its dependencies. Therefore, acceptance tests also use the UI to carry out the SUT validation.

Tip

The primary purpose of Selenium WebDriver is to implement end-to-end tests. Nevertheless, we can use WebDriver to carry out system testing when mocking the backend calls made by the website under test. Moreover, we can use Selenium WebDriver in conjunction with a Behavior-Driven Development (BDD) tool to implement acceptance tests (see ChapterÂ 9).

Verification and Validation

The down levels of the test stack we have seen (unit, integration, system, and end-to-end testing) belong to development testing. Development testing is a process carried out by the team that produces the software system (i.e., developers, testers, etc.) during the construction phase of the software development lifecycle. Development testing is a type of verification since we assess that the software meets its stated functional and nonfunctional requirements (i.e., its specification). Using the classical definition stated by Barry Boehm in 1984, verification allows answering the following question: âAre we building the product right?â

The top level of the test stack represented in FigureÂ 1-5 (i.e., acceptance testing) belongs to user testing since it involves the final user in the testing process. Acceptance testing is a type of validation because its objective is to prove the software system meets end-user expectations. Validation is a more general process than verification since the system specification does not always reflect the userâs real wishes or needs. Thus, according to Boehm, validation allows answering the question âAre we building the right product?â

Types of Testing

Depending on the strategy for designing test cases, we can implement different types of tests. The two principal types of testing are:

Functional testing (also known as behavioral or closed-box testing): Evaluates the compliance of a piece of software with the expected behavior (i.e., its functional requirements).
Structural testing (also known as clear-box testing): Determines if the program-code structure is faulty. To that aim, testers should know the internal logic of a piece of software.

The difference between these testing types is that functional tests are responsibility-based, while structural tests are implementation-based. Both types can be performed at any test level (unit, integration, system, end-to-end, or acceptance). Nevertheless, structural tests are commonly done at the unit or integration level since these levels enable more direct control of the code execution flow.

Warning

Black-box and white-box testing are other names for functional and structural testing, respectively. Nevertheless, these designations are not recommended since the tech industry is trying to adopt more inclusive terms and use neutral terminology instead of potentially harmful language.

There are different flavors of functional testing. For example:

UI testing (known as GUI testing when the UI is graphical): Evaluates if the visual elements of an application meet the expected functionality. Note that UI testing is different from the system and end-to-end testing levels since the former tests the interface itself, and the latter evaluates the whole system through the UI.
Negative testing: Evaluates the SUT under unexpected conditions (e.g., expected exceptions). This term is the counterpart of the regular functional testing (sometimes called positive testing), in which we assess if the SUT behaves as expected (i.e., its happy path).
Cross-browser testing: This is specific for web applications. It aims to verify the compatibility of websites and applications in different web browsers (types, versions, or operating systems).

A third miscellaneous testing type, nonfunctional testing, includes testing strategies that assess the quality attributes of a software system (i.e., its nonfunctional requirements). Common methods of nonfunctional testing include, but are not limited to:

Performance testing

Assesses different metrics of software systems, such as response time, stability, reliability, or scalability. The objective of performance testing is not finding bugs but finding system bottlenecks. There are two common subtypes of performance testing:

Load testing: Increases the usage on the system by simulating multiple concurrent users to verify if it can operate in the defined boundaries.
Stress testing: Exercises a system beyond its operational capacity to identify the actual limits at which the system breaks.

Security testing

Tries to evaluate security concerns, such as confidentiality (disclosure of information protection), authentication (ensuring the user identity), or authorization (determining user rights and privileges), among others.

Usability testing

Evaluates how user-friendly a software application is. This assessment is also called User eXperience (UX) testing. A subtype of usability testing is:

A/B testing: Compares different variations of the same application to determine which one is more effective for its end users.

Accessibility testing

Evaluates if a system is usable by people with disabilities.

Tip

We use Selenium WebDriver primarily to implement functional tests (i.e., interacting with a web application UI to assess the application behavior). It is unlikely to use WebDriver to implement structural tests. In addition, although it is not its principal usage, we can use WebDriver to implement nonfunctional tests, e.g., for load, security, accessibility, or localization (assessment of specific locale settings) testing (see ChapterÂ 9).

Testing Methodologies

The software development lifecycle is the set of activities, actions, and tasks required to create software systems in software engineering. The moment at which software engineers design and implement test cases in the overall development lifecycle depends on the specific development process (such as iterative, waterfall, or agile, to name a few). Two of the most relevant testing methodologies are:

Test Driven Development (TDD): TDD is a methodology in which we design and implement tests before the actual software design and implementation. At the beginning of the 21st century, TDD became popular with the rise of agile software development methodologies, such as Extreme Programming (XP). In TDD, a developer first writes an (initially failing) automated test for a given feature. Then, the developer creates a piece of code to pass that test. Finally, the developer refactors the code to achieve or improve readability and maintainability.
Test Last Development (TLD): TLD is a methodology in which we design and implement tests after implementing the SUT. This practice is typical in traditional software development processes, such as waterfall (sequential), incremental (multi-waterfall), spiral (risk-oriented multi-waterfall), or Rational Unified Process (RUP).

Another relevant testing methodology is Behavior-Driven Development (BDD). BDD is a testing practice derived from TDD, and consequently, we design tests at the early stages of the software development lifecycle in BDD. To that aim, conversations occur between the final user and the development team (typically with the project leader, manager, or analysts). These conversations formalize a common understanding of the desired behavior and the software system. As a result, we create acceptance tests in terms of one or more scenarios following a Given-When-Then structure:

Given: Initial context at the beginning of the scenario
When: Event that triggers the scenario
Then: Expected outcome

Tip

TLD is a common practice used to implement Selenium WebDriver. In other words, developers/testers do not implement a WebDriver test until the SUT is available. Nevertheless, different methodologies are also possible. For instance, BDD is a common approach when using WebDriver with Cucumber (see ChapterÂ 9).

Closely related to the domain of testing methodologies, we find the concept of Continuous Integration (CI). CI is a software development practice where members of a software project build, test, and integrate their work continuously. Grady Booch first coined the term CI in 1991. Now it is a popular strategy to create software.

As FigureÂ 1-7 shows, CI has three separate stages. First, we use a source code repository, a hosting facility to store and share the source code of a software project. We typically use a version control system (VCS) to manage this repository. A VCS is a tool that keeps track of the source code, who made each change, and when (sometimes called patch).

Git, initially developed by Linus Torvalds, is the preferred VCS today. Other alternatives are a concurrent versions system (CVS) or Subversion (SVN). On top of Git, several code hosting platforms (such as GitHub, GitLab, or Bitbucket) provide collaborative cloud repository hosting services for developing, sharing, and maintaining software.

Developers synchronize a local repository (or simply, repo) copy in their local environments. Then, they do the coding work using that local copy, committing new changes to the remote repository (typically daily). The basic idea of CI is that every commit triggers the build and test of the software with the new changes. The test suite executed to assess that a patch does not break the build is called a regression test. A regression suite can contain tests of different types, including unit, integration, end-to-end, etc.

When the number of tests is too large for regression testing, we typically choose only a part of the relevant tests from the whole suite. There are different strategies to select these tests, for instance, smoke testing (i.e., tests that ensure the critical functionality) or sanity testing (i.e., tests that evaluate the basic functionality). Lastly, we can execute the complete suite as a scheduled task (typically nightly).

We need to use a server-side infrastructure called a build server to implement a CI pipeline. The build server usually reports a problem to the original developer when the regression tests fail. TableÂ 1-8 provides a summary of several build servers.

Table 1-8. Build servers
Name	Description	License	Maintainer	Website
Bamboo	Easy use with Jira (issue tracker) and Bitbucket	Commercial	Atlassian	https://www.atlassian.com/software/bamboo
GitHub Actions	Integrated build server in GitHub	Free for public repositories	Microsoft	https://github.com/features/actions
GitLab CI/CD	Integrated build server in GitLab	Free for public repositories	GitLab	https://docs.gitlab.com/ee/ci
Jenkins	Open source automation server	MIT	Jenkins team	https://www.jenkins.io

Tip

I use a GitHub repository (https://github.com/bonigarcia/selenium-webdriver-java) to publish and maintain the test examples presented in this book. GitHub Actions is the build server for this repo (see ChapterÂ 2).

We can extend a typical CI pipeline in two ways (see FigureÂ 1-8):

Continuous Delivery (CD): After CI, the build server deploys the release to a staging environment (i.e., a replica of a production environment for testing purposes) and executes the automated acceptance tests (if any).
Continuous Deployment: The build server deploys the software release to the production environment as the final step.

Close to CI, the term DevOps (development and operations) has gained momentum. DevOps is a software methodology that promotes communication and collaboration between different teams in a software project to develop and deliver software efficiently. These teams include developers, testers, QA (quality assurance), operations (infrastructure), etc.

Test Automation Tools

We need to use some tooling to implement, execute, and control automated tests effectively. One of the most relevant categories for testing tools is the unit testing framework. The original framework in the unit testing family (also known as xUnit) is SmalltalkUnit (or SUnit). SUnit is a unit test framework for the Smalltalk language created by Kent Beck in 1999. Erich Gamma ported SUnit to Java, creating JUnit. Since then, JUnit has been very popular, inspiring other unit testing frameworks. TableÂ 1-9 summarizes the most relevant unit testing frameworks in different languages.

Table 1-9. Unit testing frameworks
Name	Language	Description	License	Maintainer	Website
JUnit	Java	Reference implementation of xUnit family	EPL	JUnit team	https://junit.org
TestNG	Java	Inspired by JUnit and NUnit, including extra features	Apache 2.0	Cedric Beust	https://testng.org
Mocha	JavaScript	Test framework for Node.js and the browser	MIT	OpenJS Foundation	https://mochajs.org
Jest	JavaScript	Focused on simplicity with a focus on web applications	MIT	Facebiij	https://jestjs.io
Karma	JavaScript	Allows you to execute JavaScript tests in web browsers	MIT	Karma team	https://karma-runner.github.io
NUnit	.Net	Unit testing framework for all .Net languages (C#, Visual Basic, and F#)	MIT	.NET Foundation	https://nunit.org
unittest	Python	Unit testing framework included as a standard library as of Python 2.1	PSF License	Python Software Foundation	https://docs.python.org/library/unittest.html
minitest	Ruby	Complete suite of testing utilities for Ruby	MIT	Seattle Ruby Brigade	https://github.com/settlers/minitest

An important common characteristic of the xUnit family is the test structure, composed of four phases (see FigureÂ 1-9):

Setup: The test case initializes the SUT to exhibit the expected behavior.
Exercise: The test case interacts with the SUT. As a result, the test gets an outcome from the SUT.
Verify: The test case decides if the obtained outcome from the SUT is as expected. To that aim, the test contains one or more assertions. An assertion (or predicate) is a boolean-value function that checks if an expected condition is true. The execution of the assertions generates a test verdict (typically, pass or fail).
Teardown: The test case puts the SUT back into the initial state.

Tip

We can use unit testing frameworks in conjunction with other libraries or utilities to implement any test type. For example, as explained in ChapterÂ 2, we use JUnit and TestNG to embed the call to the Selenium WebDriver API, implementing end-to-end tests for web applications.

The stages of setup and teardown are optional in a unit test case. Although it is not strictly mandatory, verifying is highly recommended. Even if unit testing frameworks include capabilities to implement assertions, it is common to incorporate third-party assertions libraries. These libraries aim to improve the test codeâs readability by providing a rich set of fluent assertions. In addition, these libraries offer enhanced error messages to help testers understand the cause of a failure. TableÂ 1-10 contains a summary of some of the most relevant assertion libraries for Java.

Table 1-10. Assertion libraries for Java
Name	Description	License	Maintainer	Website
AssertJ	Fluent assertions Java library	Apache 2.0	AssertJ team	https://assertj.github.io/doc
Hamcrest	Java library of matchers aimed to create flexible assertions	BSD	Hamcrest team	http://hamcrest.org
Truth	Fluent assertions for Java and Android	Apache 2.0	Google	https://truth.dev

As you can see in FigureÂ 1-9, the SUT usually can query another component, named the Depended-On Component (DOC). In some cases (e.g., at the unit or system testing level), we might want to isolate the SUT from the DOC(s). We can find a wide variety of mock libraries to achieve this isolation.

TableÂ 1-11 shows a comprehensive summary of some of these mock libraries for Java.

Table 1-11. Mock libraries for Java
Name	Level	Description	License	Maintainer	Website
EasyMock	Unit	It allows mocking objects for unit testing using Java annotations	Apache	EasyMock team	https://easymock.org
Mockito	Unit	Mocking Java library for mock creation and verification	MIT	Mockito team	https://site.mockito.org
JMockit	Integration	It allows out-of-container integration testing for Java EE and Spring-based apps	Open	JMockit team	https://jmockit.github.io
MockServer	System	Mocking library for any system integrated via HTTP or HTTPS with Java clients	Apache 2.0	James Bloom	https://www.mock-server.com
WireMock	System	Tool for simulating HTTP-based services	Apache 2.0	Tom Akehurst	https://wiremock.org

The last category of testing tools we analyze in this section is BDD, a development process that creates acceptance tests. There are plenty of alternatives to implement this approach. For instance, TableÂ 1-12 shows a condensed summary of relevant BDD frameworks.

Table 1-12. BDD frameworks
Name	Language	Description	License	Maintainer	Website
Cucumber	Ruby, Java, JavaScript, Python	Testing framework to created automated acceptance tests following a BDD approach	MIT	SmartBear Software	https://cucumber.io
FitNesse	Java	Standalone collaborative wiki and acceptance testing framework	CPL	FitNesse team	http://fitnesse.org
JBehave	Java, Groovy, Kotlin, Ruby, Scala	BDD framework for all JVM languages	BSD-3-Clause	JBehave team	https://jbehave.org
Jasmine	JavaScript	BDD framework for JavaScript	MIT	Jasmine team	https://jasmine.github.io
Capybara	Ruby	Web-based acceptance test framework that simulates scenarios for user stories	MIT	Thomas Walpole	https://teamcapybara.github.io/capybara
Serenity BDD	Java, Javascript	Automated acceptance testing library	Apache 2.0	Serenity BDD team	https://serenity-bdd.info

Summary and Outlook

Selenium has come a long way since its inception in 2004. Many practitioners consider it the de facto standard solution to develop end-to-end tests for web applications, and it is used by thousands of projects worldwide. In this chapter, you have seen the foundations of the Selenium project (made up of WebDriver, Grid, and IDE). In addition, Selenium has a rich ecosystem and active community. WebDriver is the heart of the Selenium project, and it is a library that provides an API to control different web browsers (e.g., Chrome, Firefox, Edge, etc.) programmatically. TableÂ 1-13 contains a comprehensive overview of the primary and secondary uses of Selenium WebDriver.

Table 1-13. Selenium WebDriver primary and secondary usages
	Primary	Secondary (other usages)
Purpose	Automated testing	Web scraping, web-based administration tasks
Test level	End-to-end testing	System testing (mocking backend calls) Acceptance testing (e.g., using with Cucumber)
Test type	Functional testing (ensuring expected behavior) Cross-browser testing (compatibility in different web browsers) Regression testing (ensuring build after each commit in CI)	Nonfunctional testing (e.g., load, security, accessibility, or localization)
Test methodology	TLD (implementing tests when SUT is available)	BDD (defining user scenarios at early development stages)

In the next chapter, you discover how to set up a Java project using Maven or Gradle as build tools. This project will contain end-to-end tests for web applications using JUnit and TestNG as the unit testing frameworks and calls to the Selenium WebDriver API. In addition, you will learn how to control different web browsers (e.g., Chrome, Firefox, or Edge) with a basic test case (the Selenium WebDriverâs version of the classic hello world).

Get Hands-On Selenium WebDriver with Java now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Chapter 1. A Primer on Selenium

Selenium Core Components

Tip

Selenium WebDriver

Note

Figure 1-1. Selenium WebDriver architecture

Selenium Grid

Figure 1-2. Selenium Grid hub-nodes architecture

Tip

Selenium IDE

Figure 1-3. Selenium IDE showing an example of a recorded script

Note

Figure 1-4. Exporting a Selenium IDE script to a JUnit test case

Selenium Ecosystem

Language Bindings

Driver Managers

Tip

Locator Tools

Frameworks

Browser Infrastructure

Community

Software Testing Fundamentals

Note

Levels of Testing

Figure 1-5. Stack representation of the different levels of testing

Figure 1-6. Component-based representation of the different levels of testing

Tip

Types of Testing

Warning

Tip

Testing Methodologies

Tip

Figure 1-7. CI generic process

Tip

Figure 1-8. Continuous Integration, Delivery, and Deployment pipeline

Test Automation Tools

Figure 1-9. Unit test generic structure

Tip

Summary and Outlook

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly