When we think about the World Wide Web, we normally think of applications—web browsers, web servers—and the many kinds of content that those applications move around the network. But it’s important to note that standards and protocols, not the applications themselves, have enabled the Web’s growth. Ever since the first days of the Internet, there have been ways to move files from here to there, and document formats that were just as good as HTML, but there was not a unifying model for how to identify, retrieve, and display information; nor was there a universal way for applications to interact with that data over the network. As we all know, HTML came to provide a common data basis for documents. In this chapter, we’re going to talk about how to use HTTP, the protocol that governs communications between web clients and servers, and URLs, which provide a standard for naming and addressing objects on the Web.
In this chapter, we’re also going to talk about web programming: making the Web intelligent, making it do what you want. This involves writing code for both clients and servers. Java provides a powerful API for dealing with URLs, which will be the first focus of our discussion. Then we’ll discuss how to write web clients that can interact with the standard CGI interface, using the GET and POST methods. Finally, we’ll take a look at servlets, simple Java programs that run on web servers and provide an effective way to build intelligence into your web pages. Servlets have been one of the most important and popular developments in Java over the past couple of years.
A URL points to an object on the Internet. It’s (usually) a text string that identifies an item, tells you where to find it, and specifies a method for communicating with it or retrieving it from its source. A URL can refer to any kind of information source. It might point to static data, such as a file on a local filesystem, a web server, or an FTP archive; or it can point to a more dynamic object such as a news article on a news spool or a record in a database. URLs can even refer to less tangible resources such as telnet sessions and mailing addresses.
The Java URL classes provide an API for accessing well-defined networked resources, like documents and applications on servers. The classes use an extensible set of prefabricated protocol and content handlers to perform the necessary communication and data conversion for accessing URL resources. With URLs, an application can fetch a complete file or database record from a server on the network with just a few lines of code. Applications like web browsers, which deal with networked content, use the URL class to simplify the task of network programming. They also take advantage of the dynamic nature of Java, which allows handlers for new types of URLs to be added on the fly. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying the original application.
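To give a feel for the "few lines of code" claim, here is a minimal sketch that reads the full contents of a URL through URL.openStream(). The class name UrlFetchSketch and the method readAll() are our own illustrative names, not part of the Java API. So that it runs without a network connection, the example points at a temporary local file through a file: URL; we assume the content is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UrlFetchSketch {
    // Hypothetical helper: reads everything the URL's protocol handler
    // returns and hands it back as text (assumes UTF-8 content).
    static String readAll(URL url) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // A temporary local file stands in for a remote resource.
        Path sample = Files.createTempFile("url-demo", ".txt");
        Files.write(sample, "hello from a URL\n".getBytes(StandardCharsets.UTF_8));

        URL url = sample.toUri().toURL();   // a file: URL
        System.out.print(readAll(url));

        Files.delete(sample);
    }
}
```

The readAll() method never looks at the protocol, so the same call should work for an http: URL as well, provided the host is reachable.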
A URL is usually presented as a string of text, like an address.[38] Since there are many different ways to locate an item on the Net, and different mediums and transports require different kinds of information, there are different formats for different kinds of URLs. The most common form has three components: a network host or server, the name of the item and its location on that host, and a protocol by which the host should communicate:
protocol://hostname/location/item-name
Here, protocol (also called the “scheme”) is an identifier such as http, ftp, or gopher; hostname is an Internet hostname; and the location and item components form a path that identifies the object on that host. Variants of this form allow extra information to be packed into the URL, specifying things like port numbers for the communications protocol and fragment identifiers that reference parts inside the object.
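The components described above can be pulled apart with the accessor methods of java.net.URL. The following is a small sketch; the class name UrlPartsSketch, the method parts(), and the example.com address are made up for illustration. Note that recent JDK releases deprecate the URL constructors in favor of going through java.net.URI; we use the constructor here because it is the classic form of the API.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsSketch {
    // Hypothetical helper: returns the pieces of a URL in a fixed order:
    // protocol, host, port, path, fragment.
    static String[] parts(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        return new String[] {
            url.getProtocol(),
            url.getHost(),
            String.valueOf(url.getPort()),  // -1 when the URL names no port
            url.getPath(),
            url.getRef()                    // null when there is no fragment
        };
    }

    public static void main(String[] args) throws MalformedURLException {
        String[] p = parts("http://www.example.com:8080/docs/guide/index.html#intro");
        System.out.println("protocol = " + p[0]);  // http
        System.out.println("host     = " + p[1]);  // www.example.com
        System.out.println("port     = " + p[2]);  // 8080
        System.out.println("path     = " + p[3]);  // /docs/guide/index.html
        System.out.println("fragment = " + p[4]);  // intro
    }
}
```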
We sometimes speak of a URL that is relative to another URL, called a base URL. In that case we are using the base URL as a starting point and supplying additional information. For example, the base URL might point to a directory on a web server; a relative URL might name a particular file in that directory.
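As a sketch of how a relative URL combines with its base, the two-argument URL constructor takes a base URL as context and a relative specification. The class name RelativeUrlSketch, the method resolve(), and the example.com addresses are illustrative assumptions; as with the other URL constructors, newer JDK releases mark this one as deprecated in favor of java.net.URI.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrlSketch {
    // Hypothetical helper: interprets 'relative' against 'base' and
    // returns the resulting absolute URL as a string.
    static String resolve(String base, String relative) throws MalformedURLException {
        URL baseUrl = new URL(base);
        URL resolved = new URL(baseUrl, relative);
        return resolved.toExternalForm();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Base names a directory; the relative URL names a file in it.
        System.out.println(resolve("http://www.example.com/docs/", "guide.html"));
        // prints http://www.example.com/docs/guide.html

        // Base names a file; the relative URL replaces the last path segment.
        System.out.println(resolve("http://www.example.com/docs/index.html", "images/logo.gif"));
        // prints http://www.example.com/docs/images/logo.gif
    }
}
```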
[38] The term URL was coined by the Uniform Resource Identifier (URI) working group of the IETF to distinguish URLs from the more general notion of Uniform Resource Names or URNs. URLs are really just static addresses, whereas URNs would be more persistent and abstract identifiers used to resolve the location of an object anywhere on the Net. URLs are defined in RFC 1738 and RFC 1808.