Software transactional memory (STM) is a technique for simplifying concurrent programming by allowing multiple state-changing operations to be grouped together and performed as a single atomic operation. Strictly speaking, “software transactional memory” is an implementation technique, whereas the language construct we are interested in is “atomic blocks.” Unfortunately, the former term has stuck, and so the language-level facility is called STM.
STM solves a number of problems that arise with conventional concurrency abstractions, which we describe here through a series of examples. For reference throughout the following sections, the types and operations of the STM interface are:
    data STM a          -- abstract
    instance Monad STM  -- among other things

    atomically :: STM a -> IO a

    data TVar a         -- abstract
    newTVar   :: a -> STM (TVar a)
    readTVar  :: TVar a -> STM a
    writeTVar :: TVar a -> a -> STM ()

    retry  :: STM a
    orElse :: STM a -> STM a -> STM a

    throwSTM :: Exception e => e -> STM a
    catchSTM :: Exception e => STM a -> (e -> STM a) -> STM a
Imagine a window manager that manages multiple desktops. The user can move windows from one desktop to another, while at the same time, a program can request that its own window move from its current desktop to another desktop. The window manager uses multiple threads: one to listen for input from the user, a set of threads to listen for requests from the programs running in each existing window, and one thread that renders the display to the user.
How should the program represent the state of the display? Let’s assume some abstract types representing desktops and windows respectively:
    data Desktop  -- abstract
    data Window   -- abstract
A display consists of a number of Desktops, each of which is displaying a set of Windows. To put it another way, a display is a mapping from Desktop to a set of Window objects. The mapping changes over time, so we want to make it mutable, and the state needs to be shared among multiple threads. Hence, following the pattern from MVar as a Container for Shared State, we could use a Map stored in an MVar:
    type Display = MVar (Map Desktop (Set Window))
This would work, but the MVar is a single point of contention. For example, the rendering thread, which needs to look only at the currently displayed desktop, could be blocked by a window on another desktop that is moving itself. This structure doesn’t allow as much concurrency as we would like.
To allow operations on separate desktops to proceed without impeding each other, perhaps we can have a separate MVar for each desktop:
    type Display = Map Desktop (MVar (Set Window))
Unfortunately, this approach also quickly runs into problems. Consider an operation to move a window from one desktop to another:
    moveWindow :: Display -> Window -> Desktop -> Desktop -> IO ()
    moveWindow disp win a b = do
      wa <- takeMVar ma
      wb <- takeMVar mb
      putMVar ma (Set.delete win wa)
      putMVar mb (Set.insert win wb)
     where
      ma = disp ! a
      mb = disp ! b
Note that we must take both MVars before we can put the results; otherwise, another thread could potentially observe the display in a state in which the window we are moving does not exist. But this raises a problem: what if there is a concurrent call to moveWindow trying to move a window in the opposite direction? Let’s think through what would happen:
    thread 1: moveWindow d w1 a b
    thread 2: moveWindow d w2 b a
Here’s one possible interleaving:
- Thread 1 takes the MVar for desktop a.
- Thread 2 takes the MVar for desktop b.
- Thread 1 tries to take the MVar for desktop b and blocks.
- Thread 2 tries to take the MVar for desktop a and blocks.
Now we have deadlock: both threads are blocked on each other, and neither can make progress. This is an instance of the classic “Dining Philosophers” problem.
One solution is to impose an ordering on the MVars and require that all agents take MVars in the correct order and release them in the opposite order. That is inconvenient and error-prone, though, and furthermore we have to extend our ordering to any other state that we might need to access concurrently. Large systems written in languages with locks (e.g., operating systems) are often plagued by this problem, and managing the complexity requires building an elaborate infrastructure to detect ordering violations.
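To make the workaround concrete, here is a minimal sketch of lock-ordered acquisition for moveWindow. The name moveWindowOrdered is ours, and we assume Desktop has an Ord instance that supplies the global lock order; like the original moveWindow, it assumes the two desktops are distinct:

    moveWindowOrdered :: Display -> Window -> Desktop -> Desktop -> IO ()
    moveWindowOrdered disp win a b = do
      -- Always take the MVar for the smaller desktop first, so two
      -- opposite-direction calls cannot deadlock (we assume a /= b).
      (wa, wb) <- if a <= b
        then do wa <- takeMVar ma; wb <- takeMVar mb; return (wa, wb)
        else do wb <- takeMVar mb; wa <- takeMVar ma; return (wa, wb)
      putMVar ma (Set.delete win wa)
      putMVar mb (Set.insert win wb)
     where
      ma = disp ! a
      mb = disp ! b

The ordering discipline works here, but notice that it already infects every caller: any other code that takes these MVars must follow the same order.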
Software transactional memory provides a way to avoid this deadlock problem without imposing a requirement for ordering on the programmer. To solve the problem using STM, we replace MVar with TVar:
    type Display = Map Desktop (TVar (Set Window))
TVar stands for “transactional variable”; it is a mutable variable that can be read or written only within the special monad STM, using the operations readTVar and writeTVar:
    readTVar  :: TVar a -> STM a
    writeTVar :: TVar a -> a -> STM ()
A computation in the STM monad can be performed in the IO monad, using the atomically function:
    atomically :: STM a -> IO a
When an STM computation is performed like this, it is called a transaction because the whole operation takes place atomically with respect to the rest of the program. No other thread can observe an intermediate state in which only some of the operations of the transaction have taken place. The STM computation passed to atomically can be arbitrarily large and can contain any number of TVar operations, but as we shall see later there are performance implications for large transactions.
To implement moveWindow using STM, we first convert all the operations to their STM equivalents and rename the function to moveWindowSTM to indicate that it is in the STM monad:
    moveWindowSTM :: Display -> Window -> Desktop -> Desktop -> STM ()
    moveWindowSTM disp win a b = do
      wa <- readTVar ma
      wb <- readTVar mb
      writeTVar ma (Set.delete win wa)
      writeTVar mb (Set.insert win wb)
     where
      ma = disp ! a
      mb = disp ! b
Then, we wrap this in atomically to make the IO-monad version moveWindow:
    moveWindow :: Display -> Window -> Desktop -> Desktop -> IO ()
    moveWindow disp win a b = atomically $ moveWindowSTM disp win a b
The code for moveWindowSTM is almost identical to the MVar version, but the behavior is quite different: the sequence of operations inside atomically happens indivisibly as far as the rest of the program is concerned, so the problem we encountered earlier that required taking MVars in the correct order does not occur. What’s more, there is no requirement that we read both TVars before we write them; this would be fine, too:
    moveWindowSTM :: Display -> Window -> Desktop -> Desktop -> STM ()
    moveWindowSTM disp win a b = do
      wa <- readTVar ma
      writeTVar ma (Set.delete win wa)
      wb <- readTVar mb
      writeTVar mb (Set.insert win wb)
     where
      ma = disp ! a
      mb = disp ! b
So STM is far less error-prone here. The approach also scales to any number of TVars, so we could easily write an operation that moves the windows from all other desktops to the current desktop, for example (see the sketch below).
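As a rough illustration of that scaling, here is a sketch of such a “gather all windows” operation. The name gatherWindows is ours, and we assume the Display type above plus Data.Map (as Map), Data.Set (as Set), and forM_/when from Control.Monad:

    gatherWindows :: Display -> Desktop -> STM ()
    gatherWindows disp to =
      forM_ (Map.keys disp) $ \from ->
        when (from /= to) $ do
          -- Move every window from 'from' onto the focused desktop 'to'.
          ws  <- readTVar (disp ! from)
          cur <- readTVar (disp ! to)
          writeTVar (disp ! from) Set.empty
          writeTVar (disp ! to) (cur `Set.union` ws)

However many desktops there are, the whole operation is still a single atomic transaction.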
Now suppose that we want to swap two windows, moving window W from desktop A to B, and simultaneously V from B to A. With the MVar representation, we would have to write a special-purpose operation to do this, because it has to take the MVars for A and B (in the right order) and then put both MVars back with the new contents. With STM, however, we can express this much more neatly by simply making two calls to moveWindowSTM:
    swapWindows :: Display -> Window -> Desktop -> Window -> Desktop -> IO ()
    swapWindows disp w a v b = atomically $ do
      moveWindowSTM disp w a b
      moveWindowSTM disp v b a
This demonstrates the composability of STM operations: any operation of type STM a can be composed with others to form a larger atomic transaction. For this reason, STM operations are usually provided without the atomically wrapper, so that clients can compose them as necessary before finally wrapping the entire operation in atomically.
Note
Why is STM a different monad from IO? The STM implementation relies on being able to roll back the effects of a transaction in the event of a conflict with another transaction (and for other reasons, as we shall see shortly). A transaction can be rolled back only if we can track exactly what effects it has, and this would not be possible if arbitrary I/O were allowed inside a transaction—we might have performed some I/O that cannot be undone, like making a noise or launching some missiles. For this reason, the STM monad permits only side effects on TVars, and the STM implementation tracks these effects to ensure the correct transaction semantics. We will discuss the implementation of STM and its performance implications in more detail in Performance.
This is an example of using the Haskell type system to enforce a safety invariant. We are guaranteed that every transaction is actually a transaction, because the type system prevents arbitrary side effects from being performed in the STM monad.
So far, we have covered the basic facilities of STM and shown that they allow atomicity to scale in a composable way. STM improves the expressiveness and robustness of concurrent programs. The benefits of STM in Haskell go further, however. In the following sections, we show how STM can be used to make blocking abstractions compose, and how STM can be used to manage complexity in the presence of failure and interruption.
An important part of concurrent programming is dealing with blocking when we need to wait for some condition to be true, or to acquire a particular resource. STM provides an ingenious way to do this with a single operation:
    retry :: STM a
The meaning of retry is simply “abandon the current transaction and run it again.” An example should help to clarify how retry works. Let’s consider how to implement MVar using STM, because takeMVar and putMVar need to be able to block when the MVar is empty or full, respectively.
First the data type: an MVar is always in one of two states; either it is full and contains a value, or it is empty. We model this with a TVar containing Maybe a:[39]
    newtype TMVar a = TMVar (TVar (Maybe a))
To make an empty TMVar, we simply need a TVar containing Nothing:
    newEmptyTMVar :: STM (TMVar a)
    newEmptyTMVar = do
      t <- newTVar Nothing
      return (TMVar t)
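For completeness, a TMVar that starts full can be made the same way; this sketch mirrors the newTMVar provided by the stm package:

    newTMVar :: a -> STM (TMVar a)
    newTMVar a = do
      t <- newTVar (Just a)   -- start in the full state
      return (TMVar t)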
Now to code takeTMVar, which blocks if the desired variable is empty and returns the contents once the variable is set:
    takeTMVar :: TMVar a -> STM a
    takeTMVar (TMVar t) = do
      m <- readTVar t           -- (1)
      case m of
        Nothing -> retry        -- (2)
        Just a  -> do
          writeTVar t Nothing   -- (3)
          return a
1. Read the current contents of the TVar, which we inspect with a case.
2. If the TVar contains Nothing, then the TMVar is empty, so we need to block. The retry operation says, “Run the current transaction again,” which will have the desired effect: we keep rerunning the transaction until the TVar no longer contains Nothing and the other case branch is taken. Of course, we don’t really want to blindly rerun the transaction over and over again, making our CPU hot for no good reason. The STM implementation knows that there is no point rerunning the transaction unless something different is likely to happen, and that can be true only if one or more of the TVars that were read by the current transaction have changed. In fact, what happens is that the current thread is blocked until one of the TVars that it is reading is written to, at which point the thread is unblocked again and the transaction is rerun.
3. If the TVar contains Just a, we empty the TMVar by writing Nothing into it and then return the a.
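Blocking until an arbitrary Bool condition holds is such a common pattern that it can be captured in a two-line combinator; the stm package exports this as check, and a minimal sketch of it is:

    check :: Bool -> STM ()
    check b = if b then return () else retry  -- block until b is True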
The implementation of putTMVar is straightforward:
    putTMVar :: TMVar a -> a -> STM ()
    putTMVar (TMVar t) a = do
      m <- readTVar t
      case m of
        Nothing -> do
          writeTVar t (Just a)
          return ()
        Just _ -> retry
So now that we have a replacement for MVar built using STM, what can we do with it? Well, STM operations are composable, so we can perform operations on multiple TMVars at the same time:
    atomically $ do
      a <- takeTMVar ta
      b <- takeTMVar tb
      return (a, b)
This STM transaction succeeds when, and only when, both TMVars are full; otherwise it blocks. This explains why retry must abandon the whole transaction: if the first takeTMVar succeeds but the second one retries, we do not want the effect of the first takeTMVar to take place.
This example is difficult to program with MVar because taking a single MVar is a side effect that is visible to the rest of the program, and hence cannot be easily undone if the other MVar is empty. One way to implement it is with a third MVar acting as a lock to control access to the other two, but then of course all other clients have to be aware of the locking protocol.
The retry operation allows us to block on arbitrary conditions. As a concrete example, we can use retry to implement the rendering thread in our window manager example. The behavior we want is this:
- One desktop is designated as having the focus. The focused desktop is the one displayed by the rendering thread.
- The user may request that the focus be changed at any time.
- Windows may move around and appear or disappear of their own accord, and the rendering thread must update its display accordingly.
We are supplied with a function render that handles the business of rendering windows on the display. It should be called whenever the window layout changes:[40]
    render :: Set Window -> IO ()
The currently focused desktop is a piece of state that is shared by the rendering thread and some other thread that handles user input. Therefore, we represent that by a TVar:
    type UserFocus = TVar Desktop
Next, we define an auxiliary function getWindows that takes the Display and the UserFocus and returns the set of windows to render, in the STM monad. The implementation is straightforward: read the current focus and look up the contents of the appropriate desktop in the Display:
    getWindows :: Display -> UserFocus -> STM (Set Window)
    getWindows disp focus = do
      desktop <- readTVar focus
      readTVar (disp ! desktop)
Finally, we can implement the rendering thread. The general plan is to repeatedly read the current state with getWindows and call render to render it, but use retry to avoid calling render when nothing has changed. Here is the code:
    renderThread :: Display -> UserFocus -> IO ()
    renderThread disp focus = do
      wins <- atomically $ getWindows disp focus  -- (1)
      loop wins                                   -- (2)
     where
      loop wins = do                              -- (3)
        render wins                               -- (4)
        next <- atomically $ do
          wins' <- getWindows disp focus          -- (5)
          if wins == wins'                        -- (6)
            then retry                            -- (7)
            else return wins'                     -- (8)
        loop next
1. First, we read the current set of windows to display.
2. We use this as the initial value for the loop.
3. The loop takes the current set of windows as an argument, renders the windows, and then blocks until something changes that requires re-rendering.
4. Each iteration calls render to display the current state and then enters a transaction to read the next state.
5. Inside the transaction, we read the current state.
6. We compare it to the state we just rendered.
7. If the states are the same, then there is no need to do anything, so we call retry.
8. If the states are different, then we return the new state, and the loop iterates with the new state.
The effect of the retry is precisely what we need: it waits until the value read by getWindows could possibly be different, because another thread has successfully completed a transaction that writes to one of the TVars that is read by getWindows. That encompasses both changes to the focus (because the user switched to a different desktop) and changes to the contents of the current desktop (because a window moved, appeared, or disappeared). Furthermore, changes to other desktops can take place without the rendering thread being woken up.
If it weren’t for STM’s retry operation, we’d have to implement this complex logic ourselves, including the signaling between the threads that modify the state and the rendering thread. This is anti-modular, because operations that modify the state have to know about the observers that need to act on changes. Furthermore, it gives rise to a common source of concurrency bugs: lost wakeups. If we forgot to signal the rendering thread, the display wouldn’t be updated. In this case, the effects are somewhat benign. In a more complex scenario, lost wakeups often lead to deadlocks: the woken thread was supposed to complete an operation on which other threads are waiting.
Recall that in Merging we considered the problem of waiting for any event from a set of possible events. Typically this requires the events to be merged into a single MVar or Chan so that we can wait for the next event using takeMVar or readChan. In turn, this means that the source of each event needs to know which MVar(s) or Chan(s) to send it to, rather than each event being a completely independent entity.
The more general problem of taking either of two MVars requires creating two new threads to take each MVar and put the result into a third MVar. However, even this doesn’t really solve the problem: if we wanted to take at most one of two MVars, then (as far as I am aware) there is no way to do it; you just have to construct your program in a different way so that it doesn’t need to do this.
STM provides a neat solution to both of these problems in the form of an operation that we have not yet introduced:
    orElse :: STM a -> STM a -> STM a
The operation orElse a b has the following behavior:
- First, a is executed. If a returns a result, then the orElse call returns it and ends.
- If a calls retry instead, a’s effects are discarded and b is executed instead.
The orElse operator lets us combine two blocking transactions such that one is performed but not both. This is exactly what we need for composing several event sources, or for taking at most one of two MVars (actually TMVars, of course). The latter is coded as follows:
    takeEitherTMVar :: TMVar a -> TMVar b -> STM (Either a b)
    takeEitherTMVar ma mb =
      fmap Left (takeTMVar ma)
        `orElse`
      fmap Right (takeTMVar mb)
There are two calls to takeTMVar, with their results wrapped in Left and Right, respectively, composed together with orElse. One thing to note is that orElse is left-biased: if both TMVars are non-empty, takeEitherTMVar will always return the contents of the first one. Whether this is problematic depends on the application. Be aware that the left-biased nature of orElse can have implications for fairness in some situations.
STM provides two complementary ways to compose blocking operations together: the ordinary monadic bind gives us “and”, and orElse gives us “or”.
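As a further illustration of “or” composition, any blocking operation can be combined with a timeout. The following sketch uses registerDelay from Control.Concurrent.STM, which returns a TVar that becomes True after the given number of microseconds (it requires the threaded RTS); takeTMVarTimeout is our own hypothetical name:

    takeTMVarTimeout :: Int -> TMVar a -> IO (Maybe a)
    takeTMVarTimeout usecs t = do
      expired <- registerDelay usecs
      atomically $
        fmap Just (takeTMVar t)
          `orElse` do
            e <- readTVar expired
            if e then return Nothing else retry  -- block until the timer fires

The whole thing is still a single transaction: either we take the TMVar, or the timeout fires, never both.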
Recall in Merging that we defined waitEither for the Async abstraction by forking two extra threads. STM’s orElse now allows us to define waitEither much more efficiently. Furthermore, the extra flexibility of STM lets us compose Asyncs together in more interesting ways. But first, we need to rewrite the Async implementation in terms of STM rather than MVar. The translation is straightforward: we just replace MVar with TMVar.
    data Async a = Async ThreadId (TMVar (Either SomeException a))
The async function looks familiar, with only an additional atomically to wrap the call to putTMVar in the child thread:
    async :: IO a -> IO (Async a)
    async action = do
      var <- newEmptyTMVarIO
      t <- forkFinally action (atomically . putTMVar var)
      return (Async t var)
Here we used newEmptyTMVarIO, which is a convenient version of newEmptyTMVar in the IO monad.
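Semantically, newEmptyTMVarIO is equivalent to wrapping the STM version in atomically; a one-line sketch (the primed name is ours):

    newEmptyTMVarIO' :: IO (TMVar a)
    newEmptyTMVarIO' = atomically newEmptyTMVar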
The waitCatchSTM function is like waitCatch, but in the STM monad:
    waitCatchSTM :: Async a -> STM (Either SomeException a)
    waitCatchSTM (Async _ var) = readTMVar var
And we can define waitSTM, the version of waitCatchSTM that re-throws an exception result, in terms of waitCatchSTM:
    waitSTM :: Async a -> STM a
    waitSTM a = do
      r <- waitCatchSTM a
      case r of
        Left e  -> throwSTM e
        Right a -> return a
Now we can define waitEither by composing two calls to waitSTM using orElse:
    waitEither :: Async a -> Async b -> IO (Either a b)
    waitEither a b = atomically $
      fmap Left (waitSTM a)
        `orElse`
      fmap Right (waitSTM b)
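Monadic bind gives us the “and” counterpart too. The following sketch waits for both Asyncs, mirroring the waitBoth operation of the async package; the orElse arm exists so that an exception from the second Async propagates promptly even while the first is still running:

    waitBoth :: Async a -> Async b -> IO (a, b)
    waitBoth a1 a2 = atomically $ do
      -- If a1 is not yet done, try a2: if a2 threw, waitSTM a2 re-throws
      -- here; otherwise we retry because we still need a1's result.
      r1 <- waitSTM a1 `orElse` (waitSTM a2 >> retry)
      r2 <- waitSTM a2
      return (r1, r2)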
More generally, we can wait for any number of Asyncs simultaneously. The function waitAny does this by first mapping waitSTM over a list of Asyncs and then composing the calls together by folding them with orElse:
    waitAny :: [Async a] -> IO a
    waitAny asyncs =
      atomically $ foldr orElse retry $ map waitSTM asyncs
In Merging (geturls6.hs), we downloaded several URLs simultaneously and reported the first one to finish by using a version of waitAny that forked a new thread for each Async to wait for. Using the above definition of waitAny with the STM version of Async, we can now solve the same problem without forking a new thread per Async:
    main :: IO ()
    main = do
      let download url = do
            r <- getURL url
            return (url, r)

      as <- mapM (async . download) sites

      (url, r) <- waitAny as
      printf "%s was first (%d bytes)\n" url (B.length r)
      mapM_ wait as
The program works as before, creating an Async to download each URL in the list. Then it calls waitAny to get the first result, reports it, and finally waits for the rest to complete.
In this section, we’ll implement the Chan type from MVar as a Building Block: Unbounded Channels using STM. As we’ll see, using STM to implement Chan is rather less tricky than using MVars, and furthermore we are able to add complex operations that were difficult or impossible using MVars.
The STM version of Chan is called TChan, and the interface we wish to implement is as follows:[41]
    data TChan a

    newTChan   :: STM (TChan a)
    writeTChan :: TChan a -> a -> STM ()
    readTChan  :: TChan a -> STM a
This is exactly the same as Chan, except that we renamed Chan to TChan, and all the operations are in the STM monad rather than IO. The full code for the implementation is given next.
    data TChan a = TChan (TVar (TVarList a))
                         (TVar (TVarList a))

    type TVarList a = TVar (TList a)
    data TList a = TNil | TCons a (TVarList a)

    newTChan :: STM (TChan a)
    newTChan = do
      hole  <- newTVar TNil
      read  <- newTVar hole
      write <- newTVar hole
      return (TChan read write)

    readTChan :: TChan a -> STM a
    readTChan (TChan readVar _) = do
      listHead <- readTVar readVar
      head <- readTVar listHead
      case head of
        TNil -> retry
        TCons val tail -> do
          writeTVar readVar tail
          return val

    writeTChan :: TChan a -> a -> STM ()
    writeTChan (TChan _ writeVar) a = do
      newListEnd <- newTVar TNil
      listEnd <- readTVar writeVar
      writeTVar writeVar newListEnd
      writeTVar listEnd (TCons a newListEnd)
The implementation is similar in structure to the MVar version in MVar as a Building Block: Unbounded Channels, so we do not describe it line by line; however, we will point out a few important details:
- All the operations are in the STM monad, so to use them they need to be wrapped in atomically (but they can also be composed; more about that later).
- The TList type needs a TNil constructor to indicate an empty list; in the MVar implementation, the empty list was represented implicitly by an empty MVar.
- Blocking in readTChan is implemented by a call to retry.
- Nowhere did we have to worry about what happens when a read executes concurrently with a write, because all the operations are atomic.
We now describe three distinct benefits of the STM implementation compared with using MVars.
In MVar as a Building Block: Unbounded Channels, we mentioned the operation unGetChan, which could not be implemented with the desired semantics using MVars. Here is its implementation with STM:
    unGetTChan :: TChan a -> a -> STM ()
    unGetTChan (TChan readVar _) a = do
      listHead <- readTVar readVar
      newHead <- newTVar (TCons a listHead)
      writeTVar readVar newHead
The obvious implementation does the right thing here. Other operations that were not possible with MVars are straightforward with STM. An example is isEmptyTChan, the MVar version of which suffers from the same problem as unGetChan:
    isEmptyTChan :: TChan a -> STM Bool
    isEmptyTChan (TChan read _write) = do
      listhead <- readTVar read
      head <- readTVar listhead
      case head of
        TNil      -> return True
        TCons _ _ -> return False
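Another operation in this family is channel duplication. A sketch of dupTChan (the stm package provides this operation) creates a second read pointer positioned at the current write end, so the duplicate channel sees only items written after the duplication:

    dupTChan :: TChan a -> STM (TChan a)
    dupTChan (TChan _read write) = do
      hole <- readTVar write    -- the current end of the list
      newread <- newTVar hole   -- new read pointer starts there
      return (TChan newread write)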
Because blocking STM computations can be composed together, we can build composite operations like readEitherTChan:
    readEitherTChan :: TChan a -> TChan b -> STM (Either a b)
This function reads a value from either of the two TChans passed as arguments, or blocks if they are both empty. Its implementation should look familiar, being similar to takeEitherTMVar:
    readEitherTChan :: TChan a -> TChan b -> STM (Either a b)
    readEitherTChan a b =
      fmap Left (readTChan a)
        `orElse`
      fmap Right (readTChan b)
Up until now, we have said nothing about how exceptions in STM behave. The STM monad supports exceptions much like the IO monad, with two operations:
    throwSTM :: Exception e => e -> STM a
    catchSTM :: Exception e => STM a -> (e -> STM a) -> STM a
The throwSTM operation throws an exception, and catchSTM catches exceptions and invokes a handler, just like catch in the IO monad. However, exceptions in STM are different in one vital way: in catchSTM m h, if m raises an exception, then all of its effects are discarded, and then the handler h is invoked. As a degenerate case, if there is no enclosing catchSTM at all, then all of the effects of the transaction are discarded and the exception is propagated out of atomically.
An example should help to demonstrate the motivation for this behavior. Imagine an STM operation readCheck defined as follows:
    readCheck :: TChan a -> STM a
    readCheck chan = do
      a <- readTChan chan
      checkValue a
Here, checkValue is an operation that imposes some extra constraints on the value read from the channel. Now suppose checkValue raises an exception (perhaps accidentally, e.g., divide-by-zero). We would prefer it if the readTChan had not happened, because an element of the channel would otherwise be lost. Furthermore, we would like readCheck to have this behavior regardless of whether there is an enclosing exception handler or not. Hence catchSTM discards the effects of its first argument in the event of an exception.
The discarding-effects behavior is even more useful in the case of asynchronous exceptions. If an asynchronous exception occurs during an STM transaction, the effects of the transaction are discarded, just as for a synchronous exception. So in most cases, asynchronous exception safety in STM consists of doing absolutely nothing at all. There are no locks to replace, so there is no need for exception handlers or bracket and no need to worry about which critical sections to protect with mask.
The implementation of TChan given earlier is entirely safe with respect to asynchronous exceptions as it stands, and moreover any compositions of these operations are also safe.
STM provides a nice way to write code that is automatically safe with respect to asynchronous exceptions, so it can be useful even for state that is not shared between threads. The only catch is that we have to use STM consistently for all our state, but having made that leap, asynchronous exception safety comes for free.
In the previous section, we implemented a channel type that was analogous to the MVar-based Chan, in that it has a similar implementation structure and the same basic operations. However, the flexibility of STM gives us more choices in how to construct channels, and in fact if we don’t need dupChan, we can implement a much more efficient channel abstraction.
The key observation is that in STM, an operation can block on any condition whatsoever. This means we can represent the channel contents by any data structure we choose. For example, even a simple list works:
    newtype TList a = TList (TVar [a])

    newTList :: STM (TList a)
    newTList = do
      v <- newTVar []
      return (TList v)

    writeTList :: TList a -> a -> STM ()
    writeTList (TList v) a = do
      list <- readTVar v
      writeTVar v (list ++ [a])

    readTList :: TList a -> STM a
    readTList (TList v) = do
      xs <- readTVar v
      case xs of
        []      -> retry
        (x:xs') -> do
          writeTVar v xs'
          return x
This is a channel abstraction with the same behavior as TChan; readTList blocks when the channel is empty, because it can detect the empty list and call retry.
There is a performance problem with this representation, though. Note that writeTList must add an element to the end of the list, which, using the standard Haskell list datatype, requires an O(n) append operation.
The solution is to use a different queue data structure that supports O(1) enqueue and dequeue operations. There is a folklore technique for representing a queue that has the desired property: the idea is to represent a queue as two lists, xs and ys, where the whole contents of the queue is given by xs ++ reverse ys. That is, to take an element from the front we take it from xs, and to add an element to the back we add it to the front of ys; both of these operations are O(1). But what if xs is empty and we need to take an element? In that case, we must reverse ys and let that become the new xs. So while most of the time taking an element from the front is O(1), occasionally it is O(n). However, we know that each list element is reversed only once, so on average the complexity of both enqueue and dequeue is O(1).[42]
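Before adding STM to the mix, it may help to see the two-list technique in pure form; this sketch (the names are ours) is exactly the structure that TQueue wraps in TVars below:

    data Queue a = Queue [a] [a]   -- front list, reversed back list

    push :: a -> Queue a -> Queue a
    push a (Queue xs ys) = Queue xs (a:ys)        -- O(1): cons onto the back

    pop :: Queue a -> Maybe (a, Queue a)
    pop (Queue (x:xs) ys) = Just (x, Queue xs ys) -- O(1) in the common case
    pop (Queue [] ys) =
      case reverse ys of                          -- occasional O(n) reversal
        []     -> Nothing
        (x:xs) -> Just (x, Queue xs [])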
We can use this technique to represent the channel contents. This is the code:
    data TQueue a = TQueue (TVar [a]) (TVar [a])

    newTQueue :: STM (TQueue a)
    newTQueue = do
      read  <- newTVar []
      write <- newTVar []
      return (TQueue read write)

    writeTQueue :: TQueue a -> a -> STM ()
    writeTQueue (TQueue _read write) a = do
      listend <- readTVar write
      writeTVar write (a:listend)

    readTQueue :: TQueue a -> STM a
    readTQueue (TQueue read write) = do
      xs <- readTVar read
      case xs of
        (x:xs') -> do
          writeTVar read xs'
          return x
        [] -> do
          ys <- readTVar write
          case ys of
            [] -> retry
            _  -> do
              let (z:zs) = reverse ys
              writeTVar write []
              writeTVar read zs
              return z
There is one subtlety here: we must be careful that the reverse is done lazily, which is why we use a let rather than a case here. If we were to pattern-match on the result of the reverse strictly, the STM transaction could not complete until the reverse finished (see Performance).
Another happy consequence of this representation choice is that we are able to use a separate TVar for each list. This means that in the common case, readers and writers can proceed independently without conflict, which is important if we use this data structure in a parallel setting.
This implementation of channels in STM outperforms both the MVar-based Chan and the TVar-based TChan. A simple benchmark program can be found in chanbench.hs, with three different scenarios:
- Two threads, one reading from and one writing to the channel
- One thread, writing a large number of values and then reading them
- One thread, repeatedly writing and then reading a number of values
On my computer, TQueue is about the same as Chan on the first test and wins by about 20% on the second and third tests.
Why is TQueue so much faster? The main reason is that the data structure representing the channel contents is much more compact and thus faster to operate on: ordinary linked lists are very cheap in Haskell, whereas operations on TVar and MVar are much more expensive.
So far, we have seen one-place channels (MVar and TMVar) and unbounded channels (Chan and TChan), but in practice we often want something between the two. The one-place channel does not allow sufficient concurrency: consider multiple writers with a single reader. If there is a burst of writing activity, most of the writers will block waiting for the reading thread to catch up, and there will be a lot of context switching as the reader services each writer in turn. The unbounded channel has a different pathology: if the reading thread cannot keep up with the writers, the size of the channel will keep growing without bound, and in the worst case we could run out of memory.
Ideally, there should be some limit on the size of the channel so that the channel can absorb bursts of writing activity without the danger that heavy writing will use too much memory.
Fortunately, STM makes it quite straightforward to build a bounded channel. All we need to do is keep track of the current capacity in the channel and arrange that writing to the channel blocks if the channel is currently full. This implementation is based on TQueue:
    data TBQueue a = TBQueue (TVar Int) (TVar [a]) (TVar [a])  -- (1)

    newTBQueue :: Int -> STM (TBQueue a)
    newTBQueue size = do
      read  <- newTVar []
      write <- newTVar []
      cap   <- newTVar size
      return (TBQueue cap read write)

    writeTBQueue :: TBQueue a -> a -> STM ()
    writeTBQueue (TBQueue cap _read write) a = do
      avail <- readTVar cap               -- (2)
      if avail == 0
         then retry                       -- (3)
         else writeTVar cap (avail - 1)   -- (4)
      listend <- readTVar write
      writeTVar write (a:listend)

    readTBQueue :: TBQueue a -> STM a
    readTBQueue (TBQueue cap read write) = do
      avail <- readTVar cap
      writeTVar cap (avail + 1)           -- (5)
      xs <- readTVar read
      case xs of
        (x:xs') -> do
          writeTVar read xs'
          return x
        [] -> do
          ys <- readTVar write
          case ys of
            [] -> retry
            _  -> do
              let (z:zs) = reverse ys
              writeTVar write []
              writeTVar read zs
              return z
1. The TBQueue data type is like the TQueue we saw previously but has an extra TVar Int to store the channel’s current capacity.
2. In writeTBQueue, we first read the current capacity.
3. If the capacity is zero, meaning the channel is full, we call retry to block.
4. Otherwise, we decrease the capacity by 1, because we are about to add another element.
5. When reading, we always increment the capacity.
In the chanbench.hs channel benchmark, the bounded channel performs almost as well as TQueue in the first test, although it doesn’t do so well in the third test, performing about the same as TChan. The second test, which writes a large number of values before reading any, inevitably fails with TBQueue: the writer blocks as soon as the channel fills up.
Tip
The danger with bounded channels is that it is possible to write a program with a lurking deadlock that is discovered only much later, when the program is running in production. This is because the vast majority of the time writeTBQueue does not block, but once in a while, probably under heavy load, the channel fills up and writeTBQueue blocks. If the program depends on writeTBQueue not blocking, it may deadlock. How might we get into this situation? It is the dining philosophers problem again:
    thread 1:
      x <- atomically $ readTBQueue q1
      y <- atomically $ readTBQueue q2

    thread 2:
      atomically $ writeTBQueue q2 y
      atomically $ writeTBQueue q1 x
This sequence will work perfectly well until q2 becomes full, at which point we get a deadlock. If the communication pattern is obscured by other code, we might not realize there’s a problem.
There’s no silver bullet. The best advice is to test your code thoroughly with a buffer size of 1, because that will tend to expose any deadlocks of this kind during testing. Note that deadlocks will often be detected by the runtime system and result in an exception rather than a hang; see Detecting Deadlock.
STM offers a qualitative improvement over MVar in various ways: composable atomicity, composable blocking, and simpler error handling. Therefore, it is reasonable to ask whether we need MVar at all, and whether there is anything that is harder to accomplish with STM than with MVar.
One unsurprising advantage of MVar is that it is faster than STM. But even though a straightforward comparison of, say, takeMVar against atomically . takeTMVar will show that takeMVar is faster, we should not assume that using MVar will always result in faster code. As we saw in the previous section, we can build a channel using STM that outperforms the MVar-based version, and furthermore is composable.
In fact, MVar does have one other important advantage over STM, which we mentioned earlier: fairness. When multiple threads block on an MVar, they are guaranteed to be woken up in FIFO order, and no single thread can be blocked in takeMVar indefinitely so long as there is a constant supply of putMVars. In contrast, when multiple threads are blocked in STM transactions that depend on a particular TVar, and the TVar is modified by another thread, it is not enough to just wake up one of the blocked transactions—the runtime must wake them all. To see why, consider the following:
    do
      x <- takeTMVar m
      when (x /= 42) retry
A transaction can block on an arbitrary condition, so the runtime doesn’t know whether any individual transaction will be able to make progress after the TVar is changed; it must run the transaction to find out. Hence, when there are multiple transactions that might be unblocked, we have to run them all; after all, they might all be able to continue now. Because the runtime has to run all the blocked transactions, there is no guarantee that threads will be unblocked in FIFO order and no guarantee of fairness.
You might wonder whether we could implement fairness using STM. For example, suppose we want to add fairness to our TMVar implementation. We will need to represent explicitly the queue of blocked takeTMVars, perhaps as a list of TVars, each waiting to receive a value. Conversely, the blocked putTMVars could also be a list of TVars, each with a value to put. In fact, we could represent all the blocked threads by a list of TVar (Maybe a). So this could be the TMVar data type:
    data TMVar a = TMVar (TVar (Maybe a))
                         (TVar [TVar (Maybe a)])
Now consider how putTMVar would work. There are three cases to consider:
- The TMVar is empty, and there are no blocked takeTMVars: store the value in the TMVar and return.
- The TMVar is empty, and there are some blocked takeTMVars: remove the first blocked takeTMVar from the queue and put the value in its TVar.
- The TMVar is full: we must create a new TVar containing Just a (the value to be put), add this to the end of the list of blocked putTMVars, and then wait until the TVar contents become Nothing.
The last case is the tricky one: we cannot write a transaction that both has a visible effect (adds something to the list) and calls retry, because calling retry abandons any changes to TVars made by the current transaction.
The only way to implement fairness is to abandon composability. We can implement a TMVar with the structure I suggested, but the operations must be in the IO monad, not the STM monad. The trick is to have the STM transaction return an IO action that is executed after the STM transaction completes. I’ll leave the implementation as an exercise for the reader.
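The shape of that trick can be sketched generically without spoiling the exercise; here join runs the IO action that the transaction chose, and the counter example is purely illustrative (our own names):

    import Control.Monad (join)
    import Control.Concurrent.STM

    -- Bump a counter atomically; the transaction decides what IO to do
    -- afterwards, and join runs that action once the transaction commits.
    bumpAndReport :: TVar Int -> IO ()
    bumpAndReport v = join $ atomically $ do
      n <- readTVar v
      writeTVar v (n + 1)
      if n + 1 == 10
        then return (putStrLn "reached 10")  -- runs after the commit
        else return (return ())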
In general, the class of operations that STM cannot express are those that involve multi-way communication between threads. The simplest example is a synchronous channel, in which both the reader and the writer must be present simultaneously for the operation to go ahead. We cannot implement this in STM, at least not compositionally, for the same reason that we cannot implement TMVar with fairness: the operations need to block and have a visible effect—advertise that there is a blocked thread—simultaneously.
As with most abstractions, STM has a runtime cost. If we understand the cost model, we can avoid writing code that hits the bad cases. So in this section I’ll give an informal description of the implementation of STM, with enough detail that the reader can understand the cost model.
An STM transaction works by accumulating a log of readTVar and writeTVar operations that have happened so far during the transaction. The log is used in three ways:
- By storing writeTVar operations in the log rather than applying them to main memory immediately, discarding the effects of a transaction is easy; we just throw away the log. Hence, aborting a transaction has a fixed small cost.
- Each readTVar must traverse the log to check whether the TVar was written by an earlier writeTVar. Hence, readTVar is an O(n) operation in the length of the log.
- Because the log contains a record of all the readTVar operations, it can be used to discover the full set of TVars read during the transaction, which we need to know in order to implement retry.
When a transaction reaches the end, the STM implementation compares the log against the contents of memory. If the current contents of memory match the values read by readTVar, the effects of the transaction are committed to memory; if not, the log is discarded and the transaction runs again from the beginning. This process takes place atomically by locking all the TVars involved in the transaction for the duration. The STM implementation in GHC does not use global locks; only the TVars involved in the transaction are locked during commit, so transactions operating on disjoint sets of TVars can proceed without interference.
There are two important rules of thumb:
- Never read an unbounded number of TVars in a single transaction, because the O(n) performance of readTVar then gives O(n²) for the whole transaction.
- Try to avoid expensive evaluation inside a transaction, because this will cause the transaction to take a long time, increasing the chance that another transaction will modify one or more of the same TVars, causing the current transaction to be re-executed. In the worst case, a long-running transaction re-executes indefinitely because it is repeatedly aborted by shorter transactions.
It is possible that a future STM implementation may use a different data structure to store the log, reducing the readTVar overhead to O(log n) or better (on average), but the likelihood that a long transaction will fail to commit would still be an issue. To avoid that problem, intelligent contention management is required, which is an area of active research.
The retry operation uses the transaction log to find out which TVars were accessed by the transaction, because changes to any of these TVars must trigger a rerun of the current transaction. Hence, each TVar has a watch list of threads that should be woken up if the TVar is modified, and retry adds the current thread to the watch list of all the TVars read during the current transaction. Hence, retry is O(n) in the number of TVars read during the transaction. When a transaction is committed, if any of the modified TVars has a watch list, then the threads on the list are all woken up.
One other thing to watch out for is composing too many blocking operations together. If we wanted to wait for a list of TMVars to become full, we might be tempted to do this:
    atomically $ mapM takeTMVar ts
Imagine that the TMVars all started empty and became full one at a time, in the same order as the list ts. Each time a new TMVar becomes full, the transaction wakes up and runs again, going to sleep at the next empty TMVar. We’ll run the transaction from the start once for every element of ts, so the whole operation is O(n²).
If instead, we had written this code:
    mapM (atomically . takeTMVar) ts
then it is O(n), although now the semantics are different—it is not a single transaction anymore—but if these semantics are acceptable, then the second form will be much faster.
To summarize, STM provides several benefits for concurrent programming:
- Composable atomicity: You can construct arbitrarily large atomic operations on shared state, which can simplify the implementation of concurrent data structures with fine-grained locking.
- Composable blocking: You can build operations that choose between multiple blocking operations, which is very difficult with MVars and other low-level concurrency abstractions.
- Robustness in the presence of failure and cancellation: A transaction in progress is aborted if an exception occurs, so STM makes it easy to maintain invariants on state in the presence of exceptions.
[39] The TMVar implementation is available from the Control.Concurrent.STM.TMVar module in the stm package.
[40] We are assuming that the actual window contents are rendered via some separate means, e.g., compositing.
[41] The implementation is available in the module Control.Concurrent.STM.TChan from the stm package.
[42] Technically, the complexity is amortized O(1). For more details on these kinds of data structures, I recommend reading Okasaki’s Purely Functional Data Structures (Cambridge University Press, 1999).