Chapter 4. Instant Messaging
I Think, Therefore IM
The initial goal of the early Jabber project, well before the protocol was named XMPP, was to create an open Instant Messaging (IM) platform. Although IM is often thought of as person-to-person chat, at its core it really provides the ability to quickly route messages from one place to another over the network (no matter who or what the intended recipient is). For this reason, XMPP servers are optimized for handling large numbers of relatively small messages with very little latency. When you are exchanging instant messages, you don’t want to experience any delivery delays (which can be almost as annoying in IM as they are on the phone).
In XMPP, messages are delivered as fast as possible over the network. Let’s say that Alice sends a message from her new account on the wonderland.lit server to her sister on the realworld.lit server. Her client effectively “uploads” the message to wonderland.lit by pushing a message stanza over a client-to-server XML stream. The wonderland.lit server then stamps a from address on the stanza and checks the to address in order to see how the stanza needs to be handled (without performing any deep packet inspection or XML parsing, since that would eat into the delivery time). Seeing that the message stanza is bound for the realworld.lit server, the wonderland.lit server then immediately routes the message to realworld.lit over a server-to-server XML stream (with no intermediate hops). Upon receiving the message stanza, the realworld.lit server checks to see whether Alice’s sister is online; if so, the server immediately delivers the message to one or more of her online devices over a server-to-client XML stream (without storing it or otherwise performing much processing on it). As a result, the message is delivered very quickly from Alice to her sister.
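The routing decision described above can be sketched in a few lines of Python. This is only an illustration, not how any particular server is written: the Stanza class, the route function, and the JIDs alice@wonderland.lit/pda and sister@realworld.lit are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stanza:
    to: str                        # the "to" address set by the sending client
    body: str
    sender: Optional[str] = None   # the "from" address, stamped by the server


def domain_of(jid: str) -> str:
    """Return the domain part of a JID such as user@domain/resource."""
    bare = jid.split("/", 1)[0]
    return bare.split("@", 1)[-1]


def route(stanza: Stanza, sender_jid: str, local_domain: str) -> Tuple[str, Stanza]:
    """Stamp the sender address, then pick the next hop from the "to" address alone."""
    stamped = replace(stanza, sender=sender_jid)
    target = domain_of(stamped.to)
    if target == local_domain:
        hop = "local delivery"
    else:
        # Direct server-to-server stream, with no intermediate hops.
        hop = f"s2s stream to {target}"
    return hop, stamped


hop, stamped = route(
    Stanza(to="sister@realworld.lit", body="Hello!"),
    sender_jid="alice@wonderland.lit/pda",
    local_domain="wonderland.lit",
)
print(hop, "from", stamped.sender)
```

Note that the decision depends only on the addresses, never on the body, which is what keeps the per-stanza work small.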
These design decisions have important implications. First and foremost, the clients and servers need to be event-driven and ready to take appropriate action whenever they receive an incoming stanza. XMPP servers don’t have the luxury of storing a message and waiting for a client to poll for it; instead, they deliver the message as soon as they receive it. Second, all entities (but especially the servers) need to be presence-aware, since it is the concept of being online that makes rapid delivery possible in the crucial “last mile” between the recipient’s server and the recipient’s device(s). Third, fast and accurate handling of DNS lookups, domain name resolution, long-lived TCP connections, connectivity outages, and network congestion is critical to the success of the overall system.
Several types of XMPP messages exist, fundamentally differentiated by the value of the type attribute:
normal
This message type is delivered immediately or stored offline by the server, and handled by the client as a “standalone” message outside of any chat or groupchat session. This is the default message type.
chat
Messages of type chat are sent within a burst of messages called a “chat session,” usually over a relatively short period of time. Instant messaging clients show such messages in a one-to-one conversation interface for the two parties.
groupchat
XMPP servers usually route messages of type groupchat to a specialized component or module that hosts multi-user chat rooms, and this component then generates one outbound message for each of the room occupants. (We discuss groupchat messages in Chapter 7.)
headline
Headline messages usually are not stored offline, because they are temporal in nature. In addition, XMPP servers often send a message of type headline to all of the online devices associated with an account (at least those with non-negative <priority/> values).
error
A message of type error is sent in response to a previously sent message, to indicate that a problem occurred in relation to the earlier message (the recipient does not exist, message delivery is not possible at the moment, etc.).
Both chat and normal messages are usually handled by the recipient’s server in a particular way: if the message is addressed to the bare JID (user@domain.tld) of the account, the server immediately delivers the message to the highest-priority resource currently associated with the account. If there is only one online resource, this decision is easy, but if there are multiple online resources, the recipient’s server delivers the message to the resource with the largest value for its presence priority. For example, a resource with a presence priority of 7 will receive messages addressed to the bare JID, but another resource with a presence priority of 3 will not. (Resources with negative priority will never receive a message sent to the bare JID, but any resource will receive a message addressed to its own full JID.)
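As a rough sketch of that delivery rule, the following Python function picks the receiving resource from a table of online resources and their presence priorities. The function name, the resource names, and the tie-breaking behavior are assumptions made for illustration; what a real server does when a full JID names an offline resource is also outside this sketch.

```python
from typing import Dict, Optional


def pick_resource(to_jid: str, online: Dict[str, int]) -> Optional[str]:
    """Choose the resource that receives a chat or normal message.

    `online` maps each online resource name to its presence priority.
    Returns the chosen resource name, or None if no resource qualifies.
    """
    if "/" in to_jid:
        # Full JID: deliver to that exact resource if it is online,
        # regardless of its priority.
        resource = to_jid.split("/", 1)[1]
        return resource if resource in online else None

    # Bare JID: resources with negative priority are never candidates.
    candidates = {name: prio for name, prio in online.items() if prio >= 0}
    if not candidates:
        return None
    # Equal priorities are resolved arbitrarily here (an assumption).
    return max(candidates, key=candidates.get)


online = {"work": 7, "home": 3, "bot": -1}
print(pick_resource("user@domain.tld", online))       # bare JID
print(pick_resource("user@domain.tld/bot", online))   # full JID
```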
Finally, although XMPP technologies put a premium on near real-time data delivery, almost all XMPP servers include support for “offline messages” if the intended recipient is not online when the server receives a normal or chat message addressed to that JabberID. These messages are automatically pushed to the recipient’s client when the user next logs in. When the recipient’s server pushes out the offline message, it also adds a small extension noting when the message was originally received, using the protocol extension defined in Delayed Delivery XEP-0203. This enables the recipient’s client to properly order the messages it receives in a user interface.
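To illustrate how a client might use that timestamp, here is a minimal Python sketch that orders a batch of messages by their delayed-delivery stamp. It assumes the stamp is carried in a delay element qualified by the urn:xmpp:delay namespace and written in the simple UTC form shown in the sample data; other timestamp forms (fractional seconds, numeric offsets) are not handled, and the function name and sample messages are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DELAY = "{urn:xmpp:delay}delay"


def received_at(stanza_xml: str, now: datetime) -> datetime:
    """Return the original receipt time of a message, or `now` if it was not delayed."""
    msg = ET.fromstring(stanza_xml)
    delay = msg.find(DELAY)
    if delay is None:
        return now
    # Only the plain "YYYY-MM-DDThh:mm:ssZ" form is parsed in this sketch.
    stamp = datetime.strptime(delay.get("stamp"), "%Y-%m-%dT%H:%M:%SZ")
    return stamp.replace(tzinfo=timezone.utc)


now = datetime(2009, 1, 1, tzinfo=timezone.utc)
old = "<message><body>first</body><delay xmlns='urn:xmpp:delay' stamp='2008-12-31T10:00:00Z'/></message>"
newer = "<message><body>second</body><delay xmlns='urn:xmpp:delay' stamp='2008-12-31T11:30:00Z'/></message>"
live = "<message><body>third</body></message>"

# Sort whatever arrived, in whatever order, by original receipt time.
ordered = sorted([live, newer, old], key=lambda m: received_at(m, now))
```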
Chat Sessions
When two people “IM” with each other, the conversation usually happens in a burst of messages over a short period of time. This pattern mimics real life, where you might chat with someone for 5 or 10 minutes when you meet them on the street or talk on the phone, but not chat with them again for a week or two. In XMPP, we call this kind of burst a chat session, and you can see an example of such a session in Figure 4-1.
XMPP chat sessions are not formally negotiated but proceed naturally. The entity that initiates the conversation sends a message to the bare JID of the responder, and this message is stamped by the initiator’s server with the full JID of the initiator. When the responder sends a reply, it too is stamped, this time by the responder’s server with the full JID of the responder. At this point, the initiator knows the responder’s full JID and the responder knows the initiator’s full JID, so the parties have “locked in” to each other’s XMPP resource identifiers. Each party now addresses stanzas to the full JID of the other party when sending subsequent messages, until and unless receiving a presence change from the other party (which might trigger resending a message to the bare JID).
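The “lock in” behavior can be modeled with a small amount of client-side state. The sketch below is one possible policy, not a prescribed one: the ChatSession class and the JIDs are hypothetical, and unlocking on every presence change from the peer is an assumption (the text only says a presence change might trigger a return to the bare JID).

```python
from typing import Optional


class ChatSession:
    """Tracks which address to use for the next message to a conversation partner."""

    def __init__(self, peer_bare_jid: str) -> None:
        self.peer_bare_jid = peer_bare_jid
        self.locked_jid: Optional[str] = None

    def destination(self) -> str:
        # Use the full JID once known; otherwise fall back to the bare JID.
        return self.locked_jid or self.peer_bare_jid

    def on_message(self, from_jid: str) -> None:
        # A message from the peer carries the full JID stamped by the peer's server.
        if "/" in from_jid and from_jid.split("/", 1)[0] == self.peer_bare_jid:
            self.locked_jid = from_jid

    def on_presence_change(self, from_jid: str) -> None:
        # Assumed policy: any presence change from the peer unlocks the session.
        if from_jid.split("/", 1)[0] == self.peer_bare_jid:
            self.locked_jid = None


session = ChatSession("sister@realworld.lit")
session.on_message("sister@realworld.lit/home")
print(session.destination())
```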
The features we discuss in the following sections all relate in one way or another to instant messaging, and to chat sessions in particular: chat state notifications tell you whether your conversation partner is actively engaged; XHTML lets you add a bit of dash and style to your messages; vCards enable you to learn something about the people you chat with; and blocking and filtering help you avoid unpleasant conversations with some of the unsavory characters you might meet online.
Are You There? Chat State Notifications
Consider the following IM conversation between you and your nine-year-old daughter:
You: Hi honey!
She: Hi
You: How was school today?
She: Great
This is the moment where she starts typing about all the great and exciting things she learned about. Unfortunately, her typing skills aren’t at the 80 words per minute you’re hitting. While she’s composing her answer, you assume that she’s not in the mood to talk about school right now, so you continue the conversation:
You: Did you visit grandma this afternoon? What did she tell you?
Now you’re waiting for an answer. In the meantime, your daughter has been typing away about her day at school. After a while, she decides to pause composing her answer to your first question, and looks up from the keyboard she’d been concentrating on for the past few minutes. She now sees that you have already moved on from the previous question, so she’s left with the choice to delete everything she wrote so far, send half of the answer she wanted to send and move on, or just continue typing (thus slowing the conversation down even further). She bites the bullet, deletes everything she wrote so far, and moves on:
She: Yes, I did
Now, you’re waiting for the second part of the answer as to what grandma told her. After waiting for two minutes, you wonder whether she’s just typing slowly, or she just missed the fact that you asked a second question. So, just to be sure, you repeat the question:
You: And?
It turns out that she started writing the answer, but suddenly had to go downstairs to answer the phone. So, she comes back, and finishes the answer:
She: Everything was fine. I have to go do my homework now.
Since the answer wasn’t coming immediately, you decided to do something else while waiting for it. When switching back, you notice that your daughter wants to finish the conversation, so you’ll have to say goodbye. Or, wait, maybe she finished it already, and started doing her homework, in which case you don’t really want to distract her.
The problem with this (fairly common) scenario is that neither of you knows anything about the other person’s activity level with regard to the conversation. The exact same conversation over the phone would have been a lot less awkward: it would have been easy to tell whether the other person was answering your question or not, and the sound of a dial tone would leave no doubt that the conversation was actually finished. To avoid inconvenient situations like the one in the preceding conversation, you need the notion of chat states in your IM system, as defined in Chat State Notifications XEP-0085.
Chat states describe your involvement with a conversation, which can be one of the following:
- Starting
Someone started a conversation, but you haven’t joined in yet.
- Active
You are actively involved in the conversation. You’re currently not composing any message, but you are paying close attention.
- Composing
You are actively composing a message.
- Paused
You started composing a message, but stopped composing for some reason.
- Inactive
You haven’t contributed to the conversation for some period of time.
- Gone
Your involvement with the conversation has effectively ended (e.g., you have closed the chat window).
During the conversation, your chat state will most likely change: after composing a message while in the composing state, you will become active while waiting for a reply to your message. However, it does not always make sense to go from one specific state to another one. For example, from composing a message, you can’t really become inactive for a long period without pausing for at least a short time. Figure 4-2 shows the possible transitions between chat states.
Changing state in a conversation is done by embedding the corresponding chat state element into a message stanza. For example, the mother-daughter conversation would start off like this:
<message from="you@yourdomain.tld/work" to="daughter@yourdomain.tld" type="chat">
  <body>Hi honey!</body>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
</message>
By adding the <active/> element to your message, you indicate that you are actively engaged with the conversation. Your daughter starts typing her response, so her client sends you a chat state update by adding a <composing/> element to an empty message:
<message from="daughter@yourdomain.tld/home" to="you@yourdomain.tld/work" type="chat">
  <composing xmlns="http://jabber.org/protocol/chatstates"/>
</message>
Shortly after the notification, the actual message comes in, making her an active participant of the conversation again:
<message from="daughter@yourdomain.tld/home" to="you@yourdomain.tld/work" type="chat">
  <body>Hi</body>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
</message>
The conversation goes on for a while, up to the point where you ask her about grandma:
<message from="you@yourdomain.tld/work" to="daughter@yourdomain.tld/home" type="chat">
  <body>Did you visit grandma this afternoon? What did she tell you?</body>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
</message>

<message from="daughter@yourdomain.tld/home" to="you@yourdomain.tld/work" type="chat">
  <composing xmlns="http://jabber.org/protocol/chatstates"/>
</message>
This is where she suddenly stops typing to go answer the phone, and so after a few seconds, her client notifies you of that fact by sending you a <paused/> notification:
<message from="daughter@yourdomain.tld/home" to="you@yourdomain.tld/work" type="chat">
  <paused xmlns="http://jabber.org/protocol/chatstates"/>
</message>
After a while, she resumes her answer:
<message from="daughter@yourdomain.tld/home" to="you@yourdomain.tld/work" type="chat">
  <composing xmlns="http://jabber.org/protocol/chatstates"/>
</message>
Finally, skipping to the end of the conversation, she sends her goodbye and closes her chat window:
<message from="daughter@yourdomain.tld/home" to="you@yourdomain.tld/work" type="chat">
  <body>Everything was fine. I have to go do my homework now.</body>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
</message>

<message from="daughter@yourdomain.tld/home" to="you@yourdomain.tld/work" type="chat">
  <gone xmlns="http://jabber.org/protocol/chatstates"/>
</message>
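Stanzas like the ones in this walkthrough are easy to generate programmatically. The following Python sketch builds a chat message carrying a chat state element, using only the standard library. The helper name chat_state_message is hypothetical, and the set of accepted states is limited to the five element names used on the wire (treating “Starting” as a state with no element of its own is an assumption worth checking against the specification).

```python
import xml.etree.ElementTree as ET
from typing import Optional

CHATSTATES_NS = "http://jabber.org/protocol/chatstates"
WIRE_STATES = {"active", "composing", "paused", "inactive", "gone"}


def chat_state_message(sender: str, recipient: str, state: str,
                       body: Optional[str] = None) -> str:
    """Serialize a chat message that carries the given chat state."""
    if state not in WIRE_STATES:
        raise ValueError(f"unknown chat state: {state}")
    msg = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    if body is not None:
        # A body is optional: a standalone notification is an otherwise empty message.
        ET.SubElement(msg, "body").text = body
    # Writing xmlns as a plain attribute keeps the output free of generated
    # prefixes, so it looks like the stanzas shown above.
    ET.SubElement(msg, state, {"xmlns": CHATSTATES_NS})
    return ET.tostring(msg, encoding="unicode")


print(chat_state_message("daughter@yourdomain.tld/home",
                         "you@yourdomain.tld/work", "composing"))
```

A client would typically call such a helper from its user interface events (a key press, an idle timer, a closed window) rather than from the message-sending path alone.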
The person you are communicating with may not always be interested in receiving notifications about your chat state. For example, when she is using her mobile phone for IM, she would rather save on the usage of the limited network capacity, at the price of not being able to see when you are typing. In order to discover whether the other party is interested in your chat state, you start the conversation as usual, by adding an <active/> element to your message. If the reply comes back without any chat state information, you have to assume that the other person either does not know how to handle chat state updates, or does not want to receive them. From then on, you both continue the conversation, without adding any chat state information to your subsequent messages. (Naturally, if you know that the other party does not support the chat states protocol, you would leave off the notifications entirely. We talk about ways to discover support for various protocol extensions in Chapter 5.)
Another reason why you may not want to send chat state notifications is privacy. You may not want other people to know when you are physically using your IM client (information that chat state notifications would reveal). However, it does not always have to be as drastic as disabling all types of notifications. You could configure your client to send only basic chat state information (i.e., whether you are active or composing), and not send any information about more fine-grained states, such as paused, inactive, or gone. This basic information would only reveal whether you started composing an answer or not, and leave out any hints to whether you physically went away from your IM client, or reconsidered talking and closed the conversation.
So far, we have talked about chat state notifications only in the context of one-to-one conversations. To a certain degree, chat state notifications can be useful inside multi-party chats as well (we talk about groupchat in Chapter 7). However, note that if the number of participants starts growing, the total number of notifications sent will increase drastically as well.
Looks Matter: Formatted Messages
Some folks think plain-text messages are boring. For example, let’s say you are really excited about a new movie that you just watched, so you send a message to your friend:
You: I love this movie I saw last night, it’s awesome!
If you said that over the phone or in person, you’d probably emphasize some of the words:
You: I love this movie I saw last night, it’s awesome!
One way to represent that kind of emphasis is by using some special characters in the plain text:
You: I /love/ this movie I saw last night, it’s *awesome*!
That’s a bit of a kludge, though. Thankfully, XMPP enables you to customize the look or presentation of messages, using a subset of HTML as defined in XHTML-IM XEP-0071:
<message from="you@yourdomain.tld/home" to="friend@theirdomain.tld" type="chat">
  <body>I love this movie I saw last night, it's awesome!</body>
  <html xmlns="http://jabber.org/protocol/xhtml-im">
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>I <em>love</em> this movie I saw last night, it's <strong>awesome</strong>!</p>
    </body>
  </html>
</message>
As you can see, your client sends the plain-text message body plus the marked-up version. That way, if your friend is using a client that doesn’t understand XHTML markup, the key content of the message still gets through.
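On the receiving side, that dual payload makes the rendering decision straightforward. Here is a hedged Python sketch of how a client might choose between the two versions; the function name choose_rendering and its return convention are inventions for this example.

```python
import xml.etree.ElementTree as ET

XHTML_IM_HTML = "{http://jabber.org/protocol/xhtml-im}html"
XHTML_BODY = "{http://www.w3.org/1999/xhtml}body"


def choose_rendering(stanza_xml: str, supports_xhtml: bool):
    """Return ("xhtml", body_element) when usable, else ("plain", body_text)."""
    msg = ET.fromstring(stanza_xml)
    if supports_xhtml:
        rich = msg.find(f"{XHTML_IM_HTML}/{XHTML_BODY}")
        if rich is not None:
            return "xhtml", rich
    # Fall back to the plain-text body, which is always the safe choice.
    return "plain", msg.findtext("body")


stanza = """
<message from="you@yourdomain.tld/home" to="friend@theirdomain.tld" type="chat">
  <body>I love this movie I saw last night, it's awesome!</body>
  <html xmlns="http://jabber.org/protocol/xhtml-im">
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>I <em>love</em> this movie I saw last night, it's <strong>awesome</strong>!</p>
    </body>
  </html>
</message>
"""

kind, payload = choose_rendering(stanza, supports_xhtml=False)
print(kind, payload)
```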
Although we formatted the italics and bold text using the XHTML <em/> and <strong/> elements, you can also format text using Cascading Style Sheets (CSS). This enables you to include a number of popular stylistic formats, including colors, font families, text sizes, font weights (e.g., bold) and styles (e.g., italic), margins, text alignment (e.g., center), and text decoration (e.g., underline).
The XHTML-IM subset also provides support for some of the core HTML presentation features, including numbered and unordered lists, hypertext links, and images.
Missing from that list are more advanced HTML features such as tables and media objects, as well as anything that normally goes in the <HEAD> tag of an HTML document, such as scripts. This is intentional, because some of these features could be used to include malicious code (yes, the designers of XMPP are always thinking hard about security!). Instead, XHTML-IM is focused on a simple subset of HTML features that can be used for lightweight presentation in the context of rapid-fire chat conversations. Even so, XMPP clients should exercise caution about receiving XHTML-formatted messages from unknown entities, since even the inclusion of image references could introduce security vulnerabilities. One such preventive measure is to accept XHTML-IM formatting only from people in your roster.
Who Are You? vCards
Sometimes you want to find out more information about the people you chat with. Perhaps someone has sent you a message out of the blue or asked to subscribe to your presence information. Before you continue the conversation or approve the subscription request, you wonder to yourself: just who is this person?
Don’t worry, XMPP has you covered. The extension we’re interested in here is called vCard-temp XEP-0054; it enables you to publish a kind of electronic business card called a vCard, and to retrieve vCards that other people have published.
The vCard standard (originally published in vCard MIME Directory Profile RFC 2426) defines many of the basic data fields you might want to advertise, including your name, nickname, address, phone and fax number, company affiliation, email address, birthday, a pointer to your website, a photo of you, and even your PGP key. You don’t have to publish any of that information if you don’t want to, but doing so enables people to find out more about you, which can grease the wheels of communication.
So let’s say that Alice in Wonderland sends an unsolicited message to a poor, hapless mouse:
<message from="alice@wonderland.lit/pda" to="mouse@wonderland.lit">
  <body>O Mouse, do you know the way out of this pool?</body>
</message>
Before replying, the mouse might check Alice’s vCard by sending an IQ-get to her JabberID:
<iq from="mouse@wonderland.lit/pool" id="pw91nf84" to="alice@wonderland.lit" type="get">
  <vCard xmlns="vcard-temp"/>
</iq>
Because the request was sent to Alice’s bare JID, Alice’s server replies on her behalf:
<iq from="alice@wonderland.lit" id="pw91nf84" to="mouse@wonderland.lit/pool" type="result">
  <vCard xmlns="vcard-temp">
    <N>
      <GIVEN>Alice</GIVEN>
    </N>
    <URL>http://wonderland.lit/~alice/</URL>
    <PHOTO>
      <EXTVAL>http://www.cs.cmu.edu/~rgs/alice03a.gif</EXTVAL>
    </PHOTO>
  </vCard>
</iq>
As a result, the mouse can at least visit Alice’s website and view a picture of her before continuing the chat. Naturally, all of the data in a vCard can be faked, so it pays to take any given vCard result with a grain of salt. But in many situations, it’s better than nothing!
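A client that wants to show this information has to pull the fields out of the IQ result. The Python sketch below extracts the handful of fields used in this chapter's examples; the function name summarize_vcard and the dictionary keys are hypothetical, and a real vCard can contain many more fields than the four handled here.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = {"v": "vcard-temp"}


def summarize_vcard(iq_xml: str) -> Dict[str, str]:
    """Extract a few vCard fields from an IQ result; missing fields are omitted."""
    iq = ET.fromstring(iq_xml)
    vcard = iq.find("v:vCard", NS)
    if iq.get("type") != "result" or vcard is None:
        return {}
    fields = {
        "given_name": vcard.findtext("v:N/v:GIVEN", namespaces=NS),
        "url": vcard.findtext("v:URL", namespaces=NS),
        "photo_url": vcard.findtext("v:PHOTO/v:EXTVAL", namespaces=NS),
        "email": vcard.findtext("v:EMAIL/v:USERID", namespaces=NS),
    }
    return {key: value for key, value in fields.items() if value is not None}


result = """
<iq from="alice@wonderland.lit" id="pw91nf84" to="mouse@wonderland.lit/pool" type="result">
  <vCard xmlns="vcard-temp">
    <N><GIVEN>Alice</GIVEN></N>
    <URL>http://wonderland.lit/~alice/</URL>
    <PHOTO><EXTVAL>http://www.cs.cmu.edu/~rgs/alice03a.gif</EXTVAL></PHOTO>
  </vCard>
</iq>
"""
print(summarize_vcard(result))
```

Whatever the client displays, it should present these values as unverified claims, for the reason given above.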
To update your vCard, send an IQ-set to your server. Here Alice adds an email address and uploads the entire vCard to her server (no, it’s not possible to upload only a “diff,” as the vCard-temp specification does not provide for that feature):
<iq from="alice@wonderland.lit/pda" id="w0s1nd97" to="alice@wonderland.lit" type="set">
  <vCard xmlns="vcard-temp">
    <N>
      <GIVEN>Alice</GIVEN>
    </N>
    <URL>http://wonderland.lit/~alice/</URL>
    <PHOTO>
      <EXTVAL>http://www.cs.cmu.edu/~rgs/alice03a.gif</EXTVAL>
    </PHOTO>
    <EMAIL><USERID>alice@wonderland.lit</USERID></EMAIL>
  </vCard>
</iq>
Is vCard Really “temp”?
The vCard format used by the early Jabber developers was derived from an experimental XML representation of the official vCard format. Recently, the IETF has begun work on a more modern and stable approach to XML vCards, and it is possible that the XMPP community will adopt that standard instead of using vCard-temp (which has been “temp” since 1999!).
Talk to the Hand: Blocking and Filtering Communication
Lots of people use XMPP-based IM services (probably over 50 million of them, although we have no way of knowing, because XMPP is a distributed, decentralized technology). But you might not want to chat with them all. In fact, you might want to actively block a certain person from chatting with you—say, your old boss, a childhood enemy, or that weird guy you met in a chat room last week.
Because the XMPP developers care about privacy, they have defined an extension for communications blocking (defined in Privacy Lists XEP-0016), as well as a stripped-down interface to privacy lists (defined in Simple Communications Blocking XEP-0191).
First we’ll look at simple communications blocking because it’s, well, simple.
Blocking: The Simple Approach
Let’s say you want to block communications from your old boss at BigCompany.com. It’s easy enough to do if your server supports simple communications blocking—just send an appropriate IQ-set:
<iq from="you@yourdomain.tld/newjob" id="yu4er81v" to="you@yourdomain.tld" type="set">
  <block xmlns="urn:xmpp:blocking">
    <item jid="boss@bigcompany.com"/>
  </block>
</iq>
Now, what does blocking boss@bigcompany.com mean exactly?
First of all, you want to appear offline to your old boss. When you add the block rule for that JabberID, your server sends out an unavailable presence packet, so that your old boss sees you go offline. From then on, whenever you update your presence (e.g., by coming online), the associated presence stanzas will not be sent to boss@bigcompany.com (as far as he is concerned, it’s as if you never log in anymore).
Second, your server needs to make sure that your old boss cannot find out that you are online in any other way. This means that your server will respond to every incoming IQ-get or IQ-set with a <service-unavailable/> error, ignore any incoming <message/> stanza (or, again, return a <service-unavailable/> error), and drop any incoming <presence/> stanza.
Finally, your server needs to prevent you from doing something daft, like sending a message or IQ request to your old boss, so it will reply to any outbound stanza intended for boss@bigcompany.com with a <not-acceptable/> error.
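The three behaviors just described amount to a small decision table, sketched here in Python. The function names and return strings are hypothetical. The sketch assumes that IQ means IQ-get or IQ-set, chooses the error response (rather than silently ignoring) for blocked incoming messages, and matches block rules only against bare JIDs and whole domains, which are the two forms shown in this chapter.

```python
from typing import Set


def is_blocked(jid: str, blocklist: Set[str]) -> bool:
    """Match a JID against block rules given as bare JIDs or whole domains."""
    bare = jid.split("/", 1)[0]
    domain = bare.split("@", 1)[-1]
    return bare in blocklist or domain in blocklist


def handle(kind: str, direction: str, peer_jid: str, blocklist: Set[str]) -> str:
    """Decide what the server does with a stanza exchanged with `peer_jid`.

    `kind` is "iq", "message", or "presence"; `direction` is "in" or "out".
    """
    if not is_blocked(peer_jid, blocklist):
        return "deliver"
    if direction == "out":
        # Stop the user from contacting someone they have blocked.
        return "not-acceptable"
    if kind in ("iq", "message"):
        # For messages, silently ignoring would be the other option.
        return "service-unavailable"
    return "drop"  # incoming presence


blocklist = {"boss@bigcompany.com", "spammers.lit"}
print(handle("message", "in", "boss@bigcompany.com/office", blocklist))
```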
You can also block entire domains. Let’s say that you have started to receive unsolicited messages from a rogue server on the XMPP network (perhaps spammers.lit). You can block messages from any JabberID at that domain by setting another block rule:
<iq from="you@yourdomain.tld/newjob" id="i3s91xc3" to="you@yourdomain.tld" type="set">
  <block xmlns="urn:xmpp:blocking">
    <item jid="spammers.lit"/>
  </block>
</iq>
Now when you retrieve your “block list,” you will see two items:
<iq from="you@yourdomain.tld/newjob" id="92h1nv8f" to="you@yourdomain.tld" type="get">
  <blocklist xmlns="urn:xmpp:blocking"/>
</iq>

<iq from="you@yourdomain.tld" id="92h1nv8f" to="you@yourdomain.tld/newjob" type="result">
  <blocklist xmlns="urn:xmpp:blocking">
    <item jid="boss@bigcompany.com"/>
    <item jid="spammers.lit"/>
  </blocklist>
</iq>
In simple communications blocking, it is also straightforward to unblock someone. Simply send an IQ-set with the JabberID contained in an <unblock/> element instead of a <block/> element:
<iq from="you@yourdomain.tld/newjob" id="ng23h57w" to="you@yourdomain.tld" type="set">
  <unblock xmlns="urn:xmpp:blocking">
    <item jid="boss@bigcompany.com"/>
  </unblock>
</iq>
Advanced Blocking and Filtering
Sometimes you want to have more control over blocking and filtering rules than simple communications blocking will give you. For example, when you are using your mobile phone to log into your IM server, you don’t want to receive status updates from your 200 coworkers, as this would clog up your very limited bandwidth. On the other hand, you do want to receive the occasional messages they send you. Moreover, you also don’t want to block all incoming presence packets, as you want to know which members of your family are online, so you can chat with them before leaving on an overseas trip. Thus you need a finer-grained protocol for controlling your traffic filtering rules.
Here, again, XMPP comes to the rescue. Whereas simple communications blocking used a basic block list, the full-featured privacy protocol uses a more advanced privacy list. A privacy list is a list of rules that are matched against all traffic, both incoming and outgoing. The rules are checked in order, and as soon as one of them matches a packet, the associated action of that rule is applied to the packet. For example, consider the following privacy list:
<list name="mylist">
  <item type="jid" value="boss@bigcompany.com" action="deny" order="1">
    <iq/>
    <message/>
    <presence-out/>
  </item>
  <item type="group" value="Work" action="deny" order="2">
    <presence-in/>
  </item>
  <item action="allow" order="3"/>
</list>
Let’s see how to parse this into plain English:
An incoming message from boss@bigcompany.com would match the first rule. Therefore, if your server receives an IQ or message stanza from your old boss, it will discard the stanza or return an error.
However, if your server receives a presence stanza from your old boss, that stanza is not matched by the first privacy rule, so your server proceeds to the next rule. Since you don’t work with your old boss anymore, he is not in the “Work” group of your roster. Therefore, your server proceeds to the next (and, in this case, final) rule. Lo and behold, the inbound presence stanza matches the final rule, so your server allows the stanza through. Now you can see when your old boss is online, but he can’t communicate with you!
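The same walk through the rules can be expressed as a first-match evaluation loop. The Python sketch below models the example list and is deliberately simplified: the Rule class, the evaluate function, and the coworker JID are hypothetical; rules keyed on subscription state are skipped rather than modeled; and allowing a stanza when no rule matches is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence, Set


@dataclass(frozen=True)
class Rule:
    order: int
    action: str                          # "allow" or "deny"
    type: Optional[str] = None           # "jid", "group", or None (matches anyone)
    value: Optional[str] = None
    kinds: FrozenSet[str] = frozenset()  # empty set means "all stanza kinds"


MYLIST = [
    Rule(1, "deny", "jid", "boss@bigcompany.com",
         frozenset({"iq", "message", "presence-out"})),
    Rule(2, "deny", "group", "Work", frozenset({"presence-in"})),
    Rule(3, "allow"),
]


def evaluate(rules: Sequence[Rule], kind: str, peer_jid: str,
             roster_groups: Dict[str, Set[str]]) -> str:
    """Return the action of the first rule, in ascending order, that matches."""
    bare = peer_jid.split("/", 1)[0]
    for rule in sorted(rules, key=lambda r: r.order):
        if rule.kinds and kind not in rule.kinds:
            continue
        if rule.type not in (None, "jid", "group"):
            continue  # e.g. subscription-based rules, not modeled here
        if rule.type == "jid" and rule.value != bare:
            continue
        if rule.type == "group" and rule.value not in roster_groups.get(bare, set()):
            continue
        return rule.action
    return "allow"  # assumed default when nothing matches


roster = {"coworker@bigcompany.com": {"Work"}}
print(evaluate(MYLIST, "presence-in", "boss@bigcompany.com/office", roster))
```

Because evaluation stops at the first match, the order attribute matters as much as the rules themselves: swapping the first and last rules in this list would allow everything.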
The possible combinations of particular privacy rules provide a powerful tool for allowing and blocking communication, because your privacy list can include an unlimited number of privacy rules in any specified order (each identified by an <item/> element, as shown earlier). The action for any given rule is either allow or deny, and the rule type processes stanzas based on a specific or wildcard JabberID, on a roster group name, or on a presence subscription state. Finally, stanzas are matched based on whether they are messages, inbound presence notifications (i.e., not including subscription-related presence stanzas), outbound presence notifications, IQs, or all stanzas (including subscription-related stanzas). In practice, these more advanced block and allow methods provide basic filtering instead of just simple blocking (although at the price of greater complexity).
More Messaging Extensions
This chapter provided an overview of various messaging-related extensions in XMPP. But not all of them! Here is a quick look at a few more. Refer to the specifications for all the details, and make sure you check for support in your favorite client, server, or library, because some of these are not yet widely implemented:
Extended Stanza Addressing XEP-0033 lets you send a single message to multiple recipients at the same time, without using a dedicated chat room.
Advanced Message Processing XEP-0079 provides a way to control the delivery of a message; examples include message expiration and preventing messages from being stored offline for later delivery.
Message Receipts XEP-0184 does just what you would expect based on the title: it provides an end-to-end mechanism for determining whether the intended recipient has indeed received a message (by contrast, Advanced Message Processing notifications are generated by servers, not clients).
Message Archiving XEP-0136 defines a technology for storing messages on your server instead of archiving them to your local machine. There are many scenarios in which this is helpful: perhaps you are using a web client that does not have local storage, the device you are using (e.g., a PDA or mobile phone) has limited storage capacity, or you move between different devices quite a bit and you want all of your message history in one place.
Summary
Instant messaging is not only the most visible application of the ability to quickly route data from one point to another, but it is also the most popular (with probably over 50 million XMPP users worldwide). IM interactions usually take the form of chat sessions: short bursts of messages exchanged between two parties. The XMPP extension for chat state notifications provides support for chat sessions by communicating up-to-date information about the involvement of one’s conversation partner in the discussion. In XMPP, XHTML is used to provide user-friendly formatting, such as bold, italics, and colored text. Furthermore, vCards enable you to find out more about people you might want to chat with, and privacy lists can prevent unwanted communication from other entities. The XMPP developer community continues to work on XMPP extensions that will optimize the IM experience.