Caring doesn’t scale
Information stored in silos can be connected with ease by the adoption of a few new standards.
We are social creatures. We align ourselves in groups made of family, friends, co-workers, neighbors, etc. When we have similar interests in activities, beliefs, political leanings and shared experiences, these bonds are strong. When we occasionally bump into the same people on our way to work, we might smile and say hello, but these are much shallower interactions.
The sad reality is, we cannot care equally deeply about everyone. This doesn’t mean we are necessarily antagonistic to others; it is just that deep connections consume time and energy. There are even specific theories about why this is so. Caring simply doesn’t scale.
The same is true in distributed systems. Our software can establish highly intimate, shared relationships with other schemas, services and software, but it cannot do so with many other systems. It takes time to establish close relationships, whether between friends or between distributed sources of information. Anything that makes us care prevents us from scaling in terms of diversity of participants. We have to decide which systems are worth the coupling and what information is worth the cost of integration.
When we speak about Resource-Oriented systems, we are referring to a specific design orientation. We are thinking in terms of resources, an abstraction that covers documents, data, services, or concepts. We name them with resource identifiers and we interact with them (potentially) with specific protocols such as the Hypertext Transfer Protocol (HTTP). This is an idea anyone familiar with the Web has been exposed to, but it is easy to lose the thread when we get caught up in implementation details.
One of the central notions of the Web is that we separate the identity of a resource from its representation and implementation. The representation can take different forms for different clients but the identity of the document doesn’t change. The 3rd quarter sales report is still the third quarter sales report. A resume is still a resume.
Separating identity from representation allows clients to evolve independently from each other. Servers can support new uses without breaking existing consumers. It does not require separate endpoints to do so.
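To make the separation concrete, here is a minimal sketch of one identifier serving two representations. Everything in it is an assumption for illustration: the path /reports/sales/q3, the figures in REPORT, and the negotiate and get helpers are hypothetical, and the Accept header handling is deliberately simplified (it ignores q-values and partial wildcards).

```python
import json

# Hypothetical resource: one identity, several representations.
REPORT = {"quarter": "Q3", "total_sales": 1250000}


def as_json(data):
    return json.dumps(data, sort_keys=True)


def as_html(data):
    rows = "".join(
        f"<tr><td>{key}</td><td>{value}</td></tr>"
        for key, value in sorted(data.items())
    )
    return f"<table>{rows}</table>"


# Media types this sketch can produce for the same resource.
RENDERERS = {"application/json": as_json, "text/html": as_html}


def negotiate(accept_header, default="text/html"):
    """Return the first media type in the Accept header we can produce.

    Simplification: q-values are ignored and only */* is treated as a wildcard.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in RENDERERS:
            return media_type
        if media_type == "*/*":
            return default
    return None


def get(path, accept_header):
    """Return (status, media_type, body) for a GET on the given path."""
    if path != "/reports/sales/q3":
        return 404, None, ""
    media_type = negotiate(accept_header)
    if media_type is None:
        return 406, None, ""
    return 200, media_type, RENDERERS[media_type](REPORT)
```

A script asking for application/json and a browser asking for text/html both use the same identifier. Supporting a third consumer means adding a renderer, not a new endpoint.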
A client is coupled to the representation, not the server. This frees the server to evolve on its own terms as well. Browsers aren’t affected by shifts from one web server or container to another. They are impacted, however, by new features in the representation which is why HTML5 support took so long.
When you point a browser to a website it has never visited before, it doesn’t stop and ask for directions or documentation. It treats the resource you gave it like any other resource in the world and asks for it with an HTTP GET request. When a browser finds a link in a representation provided by the server, it doesn’t have to know how to build it up. It is told what to do; it doesn’t have to care about what is different about this URL compared to any other.
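A small sketch of a client that behaves this way, following links it is handed rather than constructing them. The FAKE_WEB dictionary stands in for real servers, and its URLs, the link relation names and the follow helper are all hypothetical.

```python
# In-memory stand-in for a set of servers: URL -> representation.
# Each representation carries the links the client may follow next.
FAKE_WEB = {
    "https://example.org/": {
        "links": {"orders": "https://example.org/o"},
    },
    "https://example.org/o": {
        "links": {"latest": "https://example.org/o/8c1f"},
    },
    "https://example.org/o/8c1f": {
        "status": "shipped",
        "links": {},
    },
}


def fetch(url):
    """Stand-in for an HTTP GET; a real client would make a network request."""
    return FAKE_WEB[url]


def follow(start_url, *relations):
    """Start at one known URL and follow named links, never building a URL."""
    document = fetch(start_url)
    for relation in relations:
        document = fetch(document["links"][relation])
    return document


latest_order = follow("https://example.org/", "orders", "latest")
```

The client knows one starting URL and the names of the relations it cares about. The server is free to change every other URL without the client noticing.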
These ideas have produced our definition of scale. We freely exchange information with the world all day, every day because nobody has to care about the details. Everything just works so we share resource identifiers via SMS messages, Tweets, email messages, on web pages, blogs, advertisements, movie trailers, television shows, and even napkins.
As we try to apply these ideas more generally via REpresentational State Transfer (REST), we often make poor choices that eliminate these properties. This isn’t going to be a RESTafarian Orthodoxy rant. You are free to build your systems the way you want. No one else really minds what you do; you are the one who has to deal with the consequences. It is just perplexing that developers fail to understand that they are making their clients care more than they need to.
It is easy to mess this situation up unintentionally. We frequently see resource identifiers that involve the means of production (.php, FooServlet, .do), a specific serialization (.html, .xml) or a specific point in time (/v1). The problem here is you are either leaking details or conflating representation and identity. This increases the coupling between systems and the burden to interact. You are requiring the clients to know more beforehand and that is a form of caring. There is nothing intrinsically wrong with any of these choices, but they do have consequences and those accumulate.
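As a rough illustration of the three kinds of leak, the sketch below flags them in a URL path. The patterns are assumptions chosen to match the examples in the paragraph above. They are not an exhaustive or authoritative rule set, and the leaks helper is hypothetical.

```python
import re
from urllib.parse import urlparse

# Heuristic patterns only; each maps a kind of leak to a regular expression.
LEAK_PATTERNS = {
    "means of production": re.compile(r"\.(php|do|jsp|aspx)$|servlet", re.IGNORECASE),
    "serialization": re.compile(r"\.(html|xml|json)$", re.IGNORECASE),
    "point in time": re.compile(r"(^|/)v\d+(/|$)", re.IGNORECASE),
}


def leaks(url):
    """Return the sorted names of the leak patterns found in the URL's path."""
    path = urlparse(url).path
    return sorted(
        name for name, pattern in LEAK_PATTERNS.items() if pattern.search(path)
    )
```

For example, leaks("https://example.org/v1/FooServlet/report.xml") reports all three kinds, while leaks("https://example.org/reports/sales/q3") reports none. A hit is not an error, only a reminder of something the client is now being asked to care about.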
Full-blown REST as an architectural style induces some fantastic properties that solve all kinds of problems in distributed systems. It certainly isn’t the only solution and is not universally applicable. However, it works in enough ways for enough systems that you probably need to prove why you can’t use it rather than require someone to convince you why you should.
It doesn’t solve all of the problems though. Even if you do fully support cleanly-defined URLs, hypermedia formats, and the semantics and intention of the HTTP methods, we are still left exchanging media types that describe specific domains. As the consumer of your API I need to know what you mean by the use of the term “title.” (name for a published work or a rank in English royalty?) You are projecting a collection of terms and domains via resources, but I still have to care.
And so, even if we exchange information in loosely-coupled ways, we end up creating integration tables in our relational systems, extract, transform and load (ETL) processes, data warehouses, custom code libraries, and a bunch of other time and energy-consuming technologies. This is all a level of intimacy that also cannot scale, so we are limited in the number of systems with which we can actually maintain these relationships.
Supporting a new data source becomes kind of like asking, “Can I really handle a new best friend?” Certainly we always like to meet compatible people and share our lives with them, but we often don’t let new folks in because we already have so many people making demands on our time.
Here’s the thing though, it needn’t be time consuming to make friends to play tennis, grab a beer, watch a play, or discuss a shared health issue. Not every acquaintance has to be a best friend and we could miss a lot in life if we don’t open ourselves to casual friendships.
A data integration doesn’t have to be a huge initiative either. The information in a spreadsheet might add insight to what is stored in a database. Content gleaned from a web site might reinforce or discourage some other action we are motivated to pursue via our own world views and information systems.
Building upon the loose coupling of REST, we can envision larger webs of data that transcend where information is stored. If we adopt a consistent, extensible data model that uses standard identifiers and standard serialization formats, we can produce almost purely portable data. In the W3C standards surrounding the Semantic Web, we see exactly these capabilities in RDF, Turtle, Linked Data, etc.
If the cost of integration falls to almost nothing it becomes easier to support casual interactions. They may turn into more formal, long-lived integrations down the road, but for now we can exchange information with anyone at any time about any subject for way less effort than you probably can imagine given the pains you have seen elsewhere.
Our ability to do anything with this information isn’t immediately guaranteed, but freeing it from silos is the first step. Beyond that, having standard query languages and the ability to connect and reason over arbitrary domains builds upon our resource abstractions and Web interactions.
Even where there are impedance mismatches between the data models of Linked Data and the tools of the average developer, we now have solutions in bridge technologies like JSON-LD. Those who want to think about the Big Picture can. Those who don’t want to, don’t have to. You aren’t required to care.
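Here is a minimal sketch of that bridge. The document is ordinary JSON with a JSON-LD @context that says which "title" is meant, addressing the ambiguity raised earlier. The example.org identifier and the value are hypothetical, and expand_keys is a toy that handles only a flat context of term-to-IRI strings. It is not the JSON-LD expansion algorithm, for which a real JSON-LD processor should be used.

```python
import json

doc = json.loads("""
{
  "@context": {"title": "http://purl.org/dc/terms/title"},
  "@id": "https://example.org/book/42",
  "title": "An Example Book"
}
""")

# A developer who doesn't care about Linked Data reads it as plain JSON.
casual_title = doc["title"]


def expand_keys(document):
    """Replace short keys with the IRIs named in a flat @context.

    Toy version: ignores nested contexts, typed values, prefixes and
    everything else a conforming JSON-LD processor handles.
    """
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }


expanded = expand_keys(doc)
```

The same bytes serve both audiences: casual_title is just a string, while expanded keys the value by a globally unambiguous identifier that other data sets can share.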
If this all seems like fantasy, perhaps you are not paying close enough attention. We are on a trend as an industry and it scales remarkably well, and we have 26 years of proof under our belt. We see this trend playing out in the following diagram by Aldo Bucci.
We used to have disconnected computers that were time-consuming to connect. They had to care an awful lot about each other to communicate. As an industry, we adopted some standards (TCP/IP and DNS) that gave us a remarkable platform of interconnected computers. Nobody would have believed it before experiencing it, but we experience it every day on planes, buses, boats and trains, and at home, work, and school.
Once the computers could talk, exchanging documents was a painful process. Proprietary and brittle formats were time and energy-consuming to share. As an industry, we adopted a few more standards (HTTP and HTML) and we received a platform of interconnected documents. Again, it would have been inconceivable to anyone before they experienced it but here we are.
So, now, when I point out that the information stored in our silos can be connected with ease by the adoption of a few new standards, it should be a more tolerable statement to accept, but people still fight the idea. Even so, we are seeing these technologies being indexed by search engines and adopted by organizations in retail, publishing, life sciences, entertainment, government, military, and intelligence communities.
There is a lot to process here, but it starts with thinking in terms of resources. Resource-oriented architectures give us the Web, REST, and a means to exchange information freely. On that foundation, we can build easily integrated systems. The languages and technologies you use to implement these systems are independent choices based upon your needs, client requirements, staff, legacies, and projected future.
Don’t make me care about any of that and we’ll get along just fine.