Friday, February 28, 2014


Resources-Oriented Web Services (ROWS) is a set of technologies that enable the programmatic discovery, description and invocation of actions on resources.

It connects the human web with the programmable web. It is an alternative to Service-Oriented Architectures that is more aligned with how the web works: URLs and a uniform interface (REST). The "R" in URL is the key part.

This is part of a series of posts. You'll want to read this and this before carrying on.

This is a human-readable walkthrough of a slightly more technical specification (build).

What problems are we set to solve again?

Our goal is to connect the "human web" to the "programmable web". Our challenge is to automate what we currently do manually as humans.

We, as a community, haven't yet converged on the following:
  1. A general framework, a way for each individual service to tell clients about its resource design, its representation formats and the links it provides between resources.
  2. A language with a vocabulary that can describe the variety of RESTful and hybrid services. A document written in this language could script a generic web service client, making it act like a custom-written wrapper. More specifically, we'll need to tell clients:
    • What semantic operations are available to be performed
    • Which HTTP method to use
    • What the expected entity-body looks like
    • What to expect to get back after you invoke it

Let's look at what our starting point looks like

The human web starts with a browser sending a GET request to a resource via a URL. Let's say you were looking to book a cab; here is what happens under the hood:

GET /mountain-view HTTP/1.1

And the server responds:

HTTP/1.1 200 OK
Content-Type: text/html

<span>Welcome to Yellow Cab Mountain View!</span>

<a href="/mountain-view/reservations">
Click here to book a cab!
</a>


Now, that's great for humans to consume, but a computer can't tell the difference between this and a web page about frogs.

Enter JSON-LD, microdata and schema.org

The first step to help computers make sense of this page is to tell it explicitly what this is about.

There are a few good methods for transporting linked data in HTML; my favorites are JSON-LD and microdata*.

* I think that JSON-LD is a more scalable approach overall for large/complex instances, but microdata is easier to grasp in simpler examples. So I'm going to use microdata in my examples here, but bear in mind that I actually prefer JSON-LD in practice.

That alone isn't sufficient. You need a machine-readable description of a taxi stand, something that a computer could understand. This is where schema.org comes in: it provides a vocabulary that describes things in the universe in a manner that computers can digest.

This is what this web page would look like:

HTTP/1.1 200 OK
Content-Type: text/html

<body itemscope
    itemtype="http://schema.org/TaxiStand">

<span itemprop="description">
Welcome to Yellow Cab Mountain View!
</span>

<a itemprop="reservations"
  itemscope itemtype="http://schema.org/ItemList"
  href="/mountain-view/reservations">
Click here to book a cab!
</a>
</body>
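
On the consuming side, here is a minimal sketch (in Python, with class and variable names of my own choosing; not a full microdata parser) of how a client could walk this HTML and collect each itemscope's itemtype, which is what lets it tell a TaxiStand apart from a page about frogs:

```python
# A minimal sketch, not a full microdata parser: walk the HTML and
# collect each itemscope's itemtype. The itemscope/itemtype attribute
# names come from the microdata spec; everything else is illustrative.
from html.parser import HTMLParser

class ItemTypeCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.itemtypes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute, so its value is None.
        if "itemscope" in attrs and attrs.get("itemtype"):
            self.itemtypes.append(attrs["itemtype"])

html = '''
<body itemscope itemtype="http://schema.org/TaxiStand">
  <a itemprop="reservations" itemscope
     itemtype="http://schema.org/ItemList"
     href="/mountain-view/reservations">Click here to book a cab!</a>
</body>
'''

collector = ItemTypeCollector()
collector.feed(html)
print(collector.itemtypes)
# ['http://schema.org/TaxiStand', 'http://schema.org/ItemList']
```

A real client would also collect the itemprops and their nesting, but the itemtype alone is already enough to dispatch on.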


Now, this basically addresses problem #1 I raised above. It gives you a general framework (JSON-LD/microdata + schema.org) to describe things in a manner that computers can understand.

A computer now knows:
  • This resource is a TaxiStand
  • This TaxiStand has a description
  • This TaxiStand has an ItemList of reservations
But it doesn't yet know what it can do with it.

The link to the programmable web

The programmable web lives in APIs, but they can't be easily found by computers. So, let's add a link between this specific taxi stand and where it can be found in the Yellow Cab APIs:

<body itemscope
    itemtype="http://schema.org/TaxiStand">
  <meta itemprop="alternate" itemscope
    content="" />
</body>

Now we know that this Taxi Service is linked to a specific API.

If you sent a GET to that URL you'd get something like the following:

GET /mountain-view HTTP/1.1

And the server responds:

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
  "@context": "http://schema.org",
  "@type": "TaxiStand",
  "@id": "/mountain-view",
  "description": "Welcome to Yellow Cab!",
  "reservations": {
    "@type": "ItemList",
    "@id": "/mountain-view/reservations"
  }
}

That's much more like something a computer can understand. There is a Content-Type header that tells computers how to parse it, and inside the hypermedia there is machine-readable information about the resource.

But, can a computer tell what to *do* with these resources?

If you wanted to create a reservation, how far would HTTP take you?

The closest thing to an API discovery mechanism in HTTP is the OPTIONS request.

OPTIONS /mountain-view/reservations HTTP/1.1

And it could respond:

HTTP/1.1 200 OK
Allow: GET, POST

POST requests can take you a long way but, as LSM pointed out earlier, that is not sufficient. How would you know:
  1. What entity-body the POST request takes?
  2. Whether the POST is an overloaded POST (e.g. an RPC-style POST)?

Introducing Actions

Actions give you the vocabulary to describe what can be done with resources. They don't require you to make an extra request; instead, they are attached inline to the resources.

It consists of three core mechanisms:
  1. A mechanism to link Actions with Things
  2. A taxonomy of Actions with well-defined semantics and invocation constraints
  3. A vocabulary to specify what your parameters look like
Here is what the JSON-LD response could look like:

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
  "@context": "http://schema.org",
  "@type": "TaxiStand",
  "@id": "/mountain-view",
  "description": "Welcome to Yellow Cab!",
  "reservations": {
    "@type": "ItemList",
    "@id": "/mountain-view/reservations",
    "operation": {
      "@type": "CreateAction",
      "expects": {
        "@type": "SupportedClass",
        "subClassOf": "http://schema.org/TaxiReservation"
      }
    }
  }
}

You could have equally found that information in the HTML markup*:

<a itemprop="reservations"
  itemscope itemtype="">
  <meta itemprop="alternate" itemscope 
    content="" />
  <div itemprop="operation" itemscope 
    <meta itemprop="expects" itemscope 
Click here to book a cab!

* I agree that this looks a bit verbose, but I'll show you later how to make this more concise. Hint: it has to do with linked data.

Now, that tells you *everything* a computer needs:

  • This is a TaxiStand.
  • There is an API entry point here
  • This TaxiStand has an ItemList of reservations.
  • The reservation's ItemList takes CreateAction operations, which have *very* specific semantics (as well as a specification of what it means to invoke them; e.g., they are tied to a POST request because CreateAction was defined that way).
  • To create a reservation, you pass an instance of a TaxiReservation. 

Which means that from this information a computer can send a request like the following with confidence:

POST /mountain-view/reservations HTTP/1.1
Content-Type: application/ld+json

{
  "@context": "http://schema.org",
  "@type": "TaxiReservation",
  "pickUpLocation": "1600 Amphitheatre Parkway, Mountain View, CA",
  "pickUpTime": "2pm",
  "numberOfPassengers": "1"
}

And the server should respond with something like the following:

HTTP/1.1 201 Created

This is where hypermedia comes in again.

Hypermedia is an extremely powerful concept: it allows you to hop from one resource to another by following links.

You just got a resource created; let's take a peek at what it can do:

OPTIONS /mountain-view/reservations/32523325225 HTTP/1.1

And it could respond:

HTTP/1.1 200 OK
Accept-Patch: application/ld+json

And you'd be quite excited to learn that this is a mutable resource, because it accepts the PATCH HTTP method.

It additionally tells you that it takes a JSON-LD patch document, which is quite informative too.

But, just as with POST, you wouldn't know the application semantics of the PATCH operation (e.g. what does it mean to "patch"? Is it to update the pick-up time? The drop-off location?).

Enter Actions again. Let's GET this resource:

GET /mountain-view/reservations/32523325225 HTTP/1.1

And now the server can respond to you:

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
  "@context": "http://schema.org",
  "@type": "TaxiReservation",
  "@id": "/mountain-view/reservations/32523325225",
  "reservationStatus": "CONFIRMED",
  "operation": {
    "@type": "CancelAction"
  }
}

And because there is a very specific application semantic for CancelAction ("The act of asserting that a future event/action is no longer going to happen.") and a very specific definition of what it means to "cancel" a resource (in HTTP terms it is a PATCH request, according to the definition of "canceling"), it is well defined for a computer to send a PATCH request like the following:
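
Here is the same idea as a Python sketch: the client looks for a CancelAction in the representation and, only if it is present, plans the corresponding PATCH. The plan_cancel helper is a hypothetical name of mine; the mapping of CancelAction to PATCH follows the definition discussed above.

```python
# A sketch of a client driving a state transition from the hypermedia:
# the PATCH is only planned when the resource advertises CancelAction.
import json

reservation = {
    "@context": "http://schema.org",
    "@type": "TaxiReservation",
    "@id": "/mountain-view/reservations/32523325225",
    "reservationStatus": "CONFIRMED",
    "operation": {"@type": "CancelAction"},
}

def plan_cancel(resource):
    """Return (method, url, body) if the resource can be cancelled, else None."""
    op = resource.get("operation", {})
    if op.get("@type") != "CancelAction":
        return None  # e.g. an already-CANCELLED reservation
    # CancelAction is defined to be invoked as a PATCH on the resource.
    return ("PATCH", resource["@id"],
            json.dumps({"@context": "http://schema.org",
                        "@type": "CancelAction"}))

print(plan_cancel(reservation)[0:2])
# ('PATCH', '/mountain-view/reservations/32523325225')
```

Note that plan_cancel returning None for a representation without the operation is precisely the "the affordance goes away" behavior described below.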

PATCH /mountain-view/reservations/32523325225 HTTP/1.1
Content-Type: application/ld+json

{
  "@context": "http://schema.org",
  "@type": "CancelAction"
}

Which now sets the state of the reservation to cancelled:

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
  "@context": "http://schema.org",
  "@type": "TaxiReservation",
  "@id": "/mountain-view/reservations/32523325225",
  "reservationStatus": "CANCELLED"
}

Note too that, once you cancel, the "CancelAction" operation goes away, because that operation is no longer applicable to a CANCELLED reservation.

To Wrap Things Up

Phew, that was a lot of information. Here is where I think things fit:


|         Actions        | <- missing gap #2
|         Things         | <- missing gap #1
|         JSON-LD        | <- hypermedia
|        microdata       |
|          REST          | <- ROA vs SOA
|          HTTP          | <- URIs, methods

OK, that was an interesting read. But ...

... there is so much more to talk about.

The devil is in the details, and we haven't yet gotten to how things like collections, authentication, different transport mechanisms (e.g. mobile applications, email messages) and gap #3 should look.

Stay tuned. More to follow.

Wednesday, February 26, 2014

The Gaps in Resource Oriented Architectures

... where I'll go over "the one that doesn't quite work yet" type of service :)

This is part of a series of posts.

As I said in my previous post, we have collectively found a series of problems with Service-Oriented Architectures (e.g. WSDL/SOAP). The alternative, however, still has a long way to go to get where it needs to be.

I'll start by going over the main gaps in ROA. While at it, I'll give examples of poor design in one of the APIs that I wrote myself at Google (if I'm going to pick on someone, I'll start by picking on my own work :)).

Stay tuned for my follow-up post, where I'll go over how I propose to fix this.

None of these arguments are necessarily mine. In fact, most of them I'm borrowing from Leonard Richardson, Sam Ruby and Mike Amundsen, from "RESTful Web Services" and "RESTful Web APIs". I'm going to call them LSM throughout this post, for short.

The Application Semantic Gap

"Wrappers make service programming easy, because the API of a wrapper library is tailored to one particular service. You don't have to think about HTTP at all. The downside is that each wrapper is slightly different: learning one wrapper doesn't prepare you for the next one.

This is a little disappointing. After all, these services are just variations on the three-step algorithm for making HTTP requests. Shouldn't there be some way of abstracting out the differences between services, some library that can act as a wrapper for the entire space of RESTful and hybrid services?

This is the problem of service descriptions. We need a language with a vocabulary that can describe the variety of RESTful and hybrid services. A document written in this language could script a generic web service client, making it act like a custom-written wrapper. The SOAP RPC community has united around WSDL as its service description language. The REST community has yet to unite around a description language [...]" -- LSM

Let me show you how this appears in practice.

Let me introduce you to one of the APIs I* wrote a while ago.

* By "I" I mean that you can blame me individually for its faults and poor choices. I'll leave the credits and the good decisions to the number of folks that were involved in its design and implementation at Google.

What this API does is, with the user's explicit consent, allow developers to record users' sentence-like activities on Google servers. Things like "I watched The Matrix", "I listened to The Beatles" or "I ran 5 miles with my friends".

Now you are probably asking yourself: "huh, this looks a lot like Facebook's Open Graph API".

And you'd be right.

Here are the problems I find with my own API:

Gap #1: Hypermedia

The first gap in ROA is that if you (a human) wanted to integrate with Google's and Facebook's APIs, you would have to ask me (another human) what our RESTful resources look like.

"One alternative to explaining everything is to make your service like other services. If all services exposed the same representation formats, and mapped URIs to resources in the same way ... well, we can't get rid of client programming altogether, but clients could work on a higher level than HTTP.


What we need is a general framework, a way for each individual service to tell the client about its resource design, its representation formats, and the links it provides between resources. That will give us some of the benefits of standardized conventions, without forcing all web services to comply with more than a few minimal requirements". -- LSM

This is what it looks like to record the fact that you've listened to a song on Google and this is what it looks like on Facebook. There are no (technical) reasons for these representations to be inconsistent.

Gap #2: A WSDL-like language for ROA

The second gap in the programmable web is that I (a human) had to make you (another human) read those APIs' documentation and understand it.

The problem is that that's obviously not scalable. How do you automate it? Suppose there were 100,000* of these APIs; how would you integrate with all of them?

* I agree that there isn't a realistic chance that we'll see 100,000 APIs to record users' activities, but I'll show you in another post a few examples where that's realistic.

What we need is a way to connect the "human web" and the "programmable web". That is, how do you connect the entities found on the "human web" to the capabilities of the "programmable web"?

Browsers get away with this problem because their users are humans. They can display a form tag to a user, and that's sufficient to allow the user to complete the form. That's a scalable approach, because virtually any developer can write a form tag and any user can understand it.

But computers are not that smart. For the programmable web, we need something more specific and better defined. And HTTP alone isn't sufficient.

"Collectively, these* methods define the protocol semantics of HTTP. Just by looking at the method used in an HTTP request, you can understand approximately what the client wants: whether it is trying to get a representation, delete a resource, or connect two resources together."

* referring to the HTTP methods, GET, POST, DELETE, PATCH, HEAD and OPTIONS.

"You can't understand exactly what's going on, because a resource can be anything at all. A GET request sent to a 'blog post' resource looks like a GET request sent to a 'stock symbol' resource. Those two requests have identical protocol semantics, but different application semantics. HTTP is HTTP, but a blogging API is not a stock quote API.

We can't meet the semantic challenge just by using HTTP correctly, because the HTTP protocol doesn't define any application semantics. But your application semantics should always be consistent with HTTP's protocol semantics. 'Get a blog post' and 'get a stock quote' both fall under 'get a representation of this resource,', so both requests should use HTTP GET." -- LSM

Here are a couple of alternatives that address some of these issues:

On understanding entities: "Rails 1.2 does an excellent job of merging the human web and the programmable web. [...] If you use a browser to access the resources, you're served HTML representations of the database objects and HTML forms for manipulating them.". -- LSM

On understanding actions: "In theory, the server can send additional information in response to an OPTIONS request, and the client can send OPTIONS requests that asks very specific questions about the server's capabilities. Very nice, except that there are no accepted standards for what a client might ask in an OPTIONS request. Apart from the Allow header, there are no accepted standards for what a server might send in response. Most web servers and frameworks feature very poor support for OPTIONS. So far, OPTIONS is a promising idea that nobody uses.". -- LSM

Here is a really good read on the subject:

Gap #3: A UDDI-like system for ROA

The third gap in ROA is that I (a human) had to send you (another human) two links pointing you to Google's and Facebook's APIs.

"There is no magic bullet here. Any automated system that helps people find hotels has a built-in economic incentive to game the system. This doesn't mean that computers can't assist in the process, but it does mean that a human needs to make the ultimate decision.

The closest RESTful equivalents to UDDI are the search engines, like Google, Yahoo!, and MSN. These help (human) clients find the resources they're looking for. They take advantage of the uniform interface and common data formats promoted by REST. [...] But think of the value of search engines and you'll see the promise of UDDI, even if its complexity turns you off." -- LSM

Almost there

Here is an approach that *almost* fills all the gaps. It is a discussion about how to model a Blog API using a well-known vocabulary (schema.org), a well-defined set of verbs (ActivityStreams) and a well-defined serialization format (HAL) that can connect the "human web" with the "programmable web".

"There is a microdata item called BlogPost (, which defines semantic descriptors called articleBody and dateCreated. That takes care of 'message', 'text' and 'publication date'. A collection of BlogPost is a Blog. That takes care of 'message list'.

I'll name my unsafe state transition post. I took that name from the ActivityStreams standard, where it means 'The act of authoring an object and then publishing it online'. Nobody ever intended microdata and ActivityStreams verbs to work together, but ALPS lets me combine their application semantics.


Am I creating the world's 58th microblogging API? In a sense, yes. But I didn't define anything new. I took everything from the IANA, schema.org and ActivityStreams. A client that already understands these semantic descriptors and link relations will understand my API. It is not very likely that such a client exists, but it is more likely that part of that client exists than it would be if I'd redesigned these basic concepts for the 58th time.


Just for the sake of variety, I'm going to choose HAL. A HAL+XML representation of the message list might look like this:


<resource href="/">

  <link rel="profile" href="" />
  <link rel="profile" href="" />
  <link rel="profile" href="" />
  <link rel="about" href="/about-this-site" />


    <link rel="post" href="/messages" />

    <resource href="/messages/2" rel="item">
        <articleBody>This is message #2.</articleBody>

    <resource href="/messages/1" rel="item">
        <articleBody>This is message #1.</articleBody>


This conveys all the necessary resource state (descriptions of the two messages in the message list) and includes all the necessary hypermedia links (with the link relations profile, about, item and post).

There is a problem with the "post" link: it is not clear that it is an unsafe state transition that should be triggered with a POST, and it's not clear what entity-body the client should send along with the POST request. [...]" -- LSM

There are certainly plenty of reasons why this isn't quite there yet, most notably the shortcomings of HAL and ALPS, as well as the fact that there is no specification saying that "post" means sending a POST request to the collection, nor of what the blog post entity-body should look like.
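
To make the "post" link problem concrete, here is a Python sketch that extracts the links from a simplified, hand-made HAL+XML document (abridged from the quote above; the profile links are omitted). Pulling the links out is mechanical; what's missing is anything telling the client which method or entity-body the post relation expects.

```python
# Extracting HAL links is easy; knowing what to *do* with them is not.
import xml.etree.ElementTree as ET

hal = '''
<resource href="/">
  <link rel="about" href="/about-this-site" />
  <link rel="post" href="/messages" />
  <resource href="/messages/1" rel="item">
    <articleBody>This is message #1.</articleBody>
  </resource>
</resource>
'''

root = ET.fromstring(hal)
links = {link.get("rel"): link.get("href") for link in root.findall("link")}
print(links["post"])  # /messages -- but POST? with what body? HAL won't say.
```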

What's up with the rambling?

Not much :) Rambling is just a way to organize my thoughts and give you context for my next post :)

Stay tuned!

Saturday, February 22, 2014

The W stands for Web

Back in 2009 we deprecated the SOAP Search API, and that reminded me of a saying at Google that goes "We have two kinds of internal services: the ones that are deprecated and the ones that don't quite work yet" :)

Jokes aside*, that's partially how I feel while reading about WSDL/SOAP: it is a neat idea that isn't supported anymore, and the alternative doesn't quite work yet.

* I'll use examples of things that I've authored at Google as much as possible to discuss the technology rather than the folks involved.

In this post, I'll go over my thoughts on the former and I'll follow up with another post on the latter.

This is part of a series of posts.

Lots of folks are asking my opinion on the subject, so I recently picked up "RESTful Web Services" and "RESTful Web APIs", as well as "Building Web Services in Java: Making Sense of XML, SOAP, WSDL and UDDI" (plus a few follow-up blog posts: The S stands for Simple, Do We Need WADL?, REST and WS-*).

It was certainly an interesting read, and if you are working in this area, those are great starting points.

While this is certainly a long debate, I'll go over some of the points that speak to me the most.

NOTE: none of these ideas/arguments are necessarily mine; these are just the ones that I've read/heard and that I can express in my own words.

The W in WS-* stands for Web

The "W" part of the WS-* acronym stands for "Web", but it doesn't really quite take the advantages of the web and the architectural style that is the most prominent (i.e. REST).

This is a long debate, and I'm not even close to being an expert in the area, but here are my main pain points with SOAP/WS-*:

  1. Uniform interface (or lack thereof, aka poor usage of HTTP methods)

    SOAP messages are all sent to services via an overloaded HTTP POST request, which means SOAP doesn't comply with the uniform HTTP interface. E.g., GET/DELETE/PUT don't make as much sense in SOAP as they do to normal HTTP users (e.g. browsers). This means it ignores 3 out of the 4 extremely rich and successful mechanisms that are critical to the success of the Web.
  2. Addressability (aka endpoints vs resources)

    The problem with the RPC style is that you don't get addressable resources. Instead, you get one endpoint where all operations are performed. This is equivalent to building web pages that are reached by POSTing a body like article=1234 to a single URL, rather than bookmarkable web pages. If I wanted to send someone the former, I'd have to tell them "hey, please open your browser and submit a POST request with article=1234" rather than "hey, go to this URL".
  3. The envelope inside the envelope (aka SOAP)

    SOAP embeds all messages inside HTTP via an XML SOAP envelope. I don't quite understand why that's needed: SOAPAction seems to be a specialization of the HTTP methods, and the SOAP body seems like a duplication of the HTTP body.
Other less-relevant-but-still-applicable ramblings:
  • The usage of XML versus more modern serialization formats like JSON
  • The S Stands for Simple: the overall complexity of WSDL and the need to auto-generate those descriptions from code.

Resource Oriented Architectures vs Service Oriented Architectures

The most significant consequence of these design choices is that, in practice, it is not aligned with the web (although the standard itself allows you to expose more resources, that doesn't happen in practice).

The web wants addressable resources that support a uniform set of methods, rather than a global endpoint that exposes a set of static, custom methods.

So, rather than having one endpoint that deals with *all* operations, the web pushes you towards multiple "endpoints", called resources.

Others can explain this better than I can, but I think the simplest metaphor I can map to my skill set is the difference between structured programming and object-oriented programming.

So, while you'd find something like the following in WSDL/SOAP:

// This is what a Service-Oriented Architecture looks like:
public class MyService {
  public static String CreateBlogPost(String text);
  public static void DeleteBlogPost(String id);
  public static String OverrideBlogPost(String id, String text);
}

This is what's more aligned with the Web:

// This is what a Resource-Oriented Architecture looks like:
public interface Blog {
  public Post POST(String body);
  public List GET();
}

public interface Post {
  public String GET();
  public void DELETE();
  public void PUT(String body);
}

Where the JVM "this" reference is the resource's URL address.

Now, if you are a Java programmer, you should be able to easily spot which of the two is the better interface design.

The one that doesn't work yet

OK, so now that I went over my ramblings on WSDL/SOAP, what's the alternative?

I'll give you a hint: it exists, but it doesn't quite work yet :)

That's the subject of my next post (stay tuned!).

Friday, February 7, 2014

Meus "easter eggs" no Google

Working as an engineer at Google has its perks :) Among them, every now and then you get the chance to put an easter egg* or two into Google's products :)

* I don't know if these technically count as easter eggs :) but anyway, there they are :)

Street View

In 2006, Google Street View was running some tests, and they told the engineers that the car would drive by a certain place at a certain time. Several people went there to show up in the picture, and look at that, there I am behind the Brazilian flag :)

Check me out there:


That same year, orkut was booming :) Yeah, I know, it isn't booming anymore :)

I was working as an intern, and I used this picture a lot in my technical documents whenever I needed an example of a user.

I don't remember exactly how, but someone was putting together the new build of orkut's new login page and shouted across the office asking if anyone had some pictures to use.

I sent in my picture with Dani, and it ended up in the official build for a few years:

Here is that picture zoomed in. A lot of these faces still work at Google today :)

Google Buzz

Another opportunity came with the launch of Google Buzz's re-share feature. Every engineer who launched a feature put up a blog post, and there I went again, using the picture of me and Dani:

It's been a while since I last worked on things that are directly user-facing, so the opportunities are getting rarer, but as they show up I'll keep posting them here!

Keep an eye out!

PS: You might also enjoy two other posts I wrote about Google:

The legend of the monkeys and the hose

Very early in my career I heard a legend that influenced me in a constructive way. Legend has it that ...

One day, scientists ran an experiment to see how monkeys behave in groups.

They put 5 monkeys in one corner of a cage, and bananas in the opposite corner.

As expected, the monkeys would approach the bananas to feed.

As soon as a monkey laid a hand on a banana, the scientists hosed all of the monkeys down with a hefty blast of water.

Over time, the monkeys associated the hosing with the bananas. As soon as any monkey went for a banana, the whole group would beat that individual up, because they knew that if one monkey grabbed a banana, everyone would get hosed.

So far, so good.

The curious part came when the scientists started removing the old monkeys and replacing them with new ones, one at a time.

When a new monkey entered the cage, its first instinct was to go grab the bananas, of course. As soon as it tried, the old monkeys would beat it up, and the new monkey eventually got the message: the bananas were off limits.

The curious part came when all of the old monkeys had been replaced by new ones.

Without a single monkey ever having seen the hose, nobody would touch the bananas.

Funny, isn't it? :)

I have no idea whether this legend is true or not. Whether it happened or not matters little. It's the moral that matters.

Here are a few everyday situations where it applies:

  • If you are studying calculus/physics, understand the derivations of each step.
  • If you are working in engineering, question that "system nobody wants to touch".
  • If you are religious, study the origins of your religion.

If you are afraid of doing something because nobody else is trying, try grabbing the banana and see what happens :)

Monday, February 3, 2014

A taxonomy for Verbs

Earlier last year, I went through the process of understanding verbs with my group, which eventually led to the development of schema.org Actions and its hierarchy.

We asked ourselves questions like:

  • Is there a structure between Verbs?
  • Do Verbs have arguments?
  • What's the relationship between buying and purchasing? Sending and receiving?

I figured it could be useful to document the thinking process and the types of resources we created as we approached the problem, in case anyone else runs into this too :)

The underlying motivation was to enable this to be built.

Here is a presentation that I gave at a local conference.


  The main problem we were trying to solve with the hierarchy was twofold:

  • ambiguity: one verb having two distinct meanings that fit it
  • synonyms: one meaning having two distinct verbs that describe it

  An example of synonyms is Buy and Purchase: they both mean the exact same thing. We pick one, and we make it clear in the documentation that one refers to the other.
  An example of ambiguity is "Receive": does it mean "ownership" is being transferred (e.g. receiving an award/gift)? Does it mean that the object has moved (e.g. receiving a package)? Or does it mean assignment (e.g. I received a task)?
  We solve ambiguity by fixing one meaning, and using other verbs for the other meanings under different hierarchies. For example: Receive means getting an object that was moved from one place to another; Taking is for when the object has been given to you and is now yours; Accepting/Rejecting is what you do when a task is assigned to you.

  Additionally, a hierarchy benefits us in three ways:

  1. consistency: actions look a lot like the other subtrees in schema.org
  2. generalization: actions can be processed/consumed more easily by machines, which can "understand" them at just the granularity level they need
  3. inheritance of properties: sub-actions inherit properties from their parents.
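
As a toy illustration of the generalization and inheritance benefits, here is a Python sketch that models a slice of an action hierarchy as classes. The class names echo schema.org-style actions, but the property sets are illustrative, not the real schema.

```python
# A toy model of an action hierarchy: subclassing stands in for the
# synonym subtrees, and class attributes stand in for properties.
class Action:
    properties = {"agent"}          # every action has an agent

class ConsumeAction(Action):
    properties = {"object"}         # what is being consumed

class WatchAction(ConsumeAction):   # watching a movie, a dog, ...
    pass

class ListenAction(ConsumeAction):  # listening to The Beatles, ...
    pass

def properties_of(action_cls):
    """Inheritance of properties: union the properties of every ancestor."""
    props = set()
    for ancestor in action_cls.__mro__:
        props |= vars(ancestor).get("properties", set())
    return props

# Generalization: a machine that only understands ConsumeAction can
# still process a WatchAction at the granularity it needs.
assert issubclass(WatchAction, ConsumeAction)
print(sorted(properties_of(WatchAction)))  # ['agent', 'object']
```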


  Using dictionaries, WordNet and FrameNet, we built the following tables (full raw tables):
  1. A table of synonyms
  2. A table of antonyms
  3. Usage samples
  4. Potential omissions
  5. Existing developers' and companies' usage
  We applied (5) and found a minimum set of verbs that needed to be represented. We applied (4) and asked ourselves whether we would miss any.
  We applied (2) and tried to add antonyms to the verbs that were missing them.
  We started clustering the tree by applying (1) and (3), making sure that every single synonym had a specific place in the tree, such that the ancestor path would make it clear which specific meaning we fixed to the verb.
  We clustered verbs that were related (e.g. verbs for interactions between humans, communications, production of creative work, consumption, etc.) and that shared characteristics (e.g. semantic roles, types of objects). From these clusters, we picked generic representatives as parents and moved specializations to the bottom.

Design and Principles

  The clustering algorithm and leader election wasn't easy to set on. We iterated on it a few times before we got to something that we felt comfortable with. Here is what we eventually converged to:

  • All actions in the actions hierarchy are self-sufficient: useful, meaningful and instantiable.
  • Actions are clustered in sub-trees of their synonyms, gaining specialization as you go down. The list of ancestors should read as a list of synonyms of the action, increasing in generality/broadness as it goes up. Every action derives from a more generic action in manner, purpose or the object it deals with.
  • Actions are object-agnostic: there are *no* nouns in their definitions (e.g. WatchMovieAction vs WatchDogAction is represented simply as WatchAction.object = movie or dog). This is done for scalability: we don't want the tree to evolve into an explosion of X-Y-Actions, where X is a verb and Y is a noun.
  • When a verb is ambiguous, we fix its meaning to a specific facet and make that clear. When other facets need to be represented, we pick one of its synonyms and place it in another part of the tree.
  • When synonyms mean the exact same thing, we pick a representative and merge the meanings (e.g. Purchase and Buy). We understand that this may create dissatisfaction because we didn't pick the name you use in your product, but we are ok with that because we can't have synonyms in the tree (i.e. two verbs that mean the exact same thing).
  • When adding more actions, an action should have as its parent the most specific existing synonym that can be found. If none is found, a new sub-tree is created. When a new sub-tree is created, we look at table (4) to find a potential cluster, or we re-structure the existing tree with better leaders.
  • Actions may have reciprocals.
  • Actions may have antagonyms.
  • Actions that have a temporal relationship (i.e. one happens *after* the other) are *not* always synonyms (e.g. WriteAction is normally followed by ShareAction, but one *is not* a synonym of the other).
  • The tree takes into consideration existing online activity and the lingo used on the web. It is optimized for modelling users' online actions rather than written/formal/poetic English.
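To make the ancestor-path principle concrete, here is a toy model in Python. The parent links below are a hypothetical slice of such a tree (names chosen for illustration, not the published vocabulary): walking up from an action should read as a chain of increasingly general synonyms.

```python
# Hypothetical child -> parent links in an action hierarchy.
PARENTS = {
    "BuyAction": "TradeAction",
    "SellAction": "TradeAction",
    "RentAction": "TradeAction",
    "TradeAction": "Action",
    "WatchAction": "ConsumeAction",
    "ListenAction": "ConsumeAction",
    "ConsumeAction": "Action",
}

def ancestors(action):
    """Walk up the tree; the path reads as increasingly general synonyms."""
    path = []
    while action in PARENTS:
        action = PARENTS[action]
        path.append(action)
    return path

print(ancestors("BuyAction"))  # ['TradeAction', 'Action']
```

Note how the path itself disambiguates the verb: a Buy under Trade can only mean the commercial facet.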

On Arguments

  • We got the semantic roles from WordNet and FrameNet while creating properties/roles/slots/arguments for verbs. Here is an example.
  • We merged the semantic roles "patient" and "theme" into "object" because we felt the distinction between objects that undergo a change and objects that don't isn't an important one.
  • Sub-actions inherit the properties of their ancestors, so it is important that a parent only declares properties that apply to all of its children. Shared properties can also be used to make the case for creating structure between actions.
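A minimal sketch of this inheritance, with made-up role sets (the action names and roles below are illustrative assumptions, not the published schema):

```python
# Hypothetical hierarchy and per-action role declarations.
PARENTS = {"BuyAction": "TradeAction", "TradeAction": "Action"}
ROLES = {
    "Action": {"agent", "object"},  # every action has an agent and an object
    "TradeAction": {"price"},       # anything traded carries a price
    "BuyAction": {"seller"},        # buying additionally names a seller
}

def effective_roles(action):
    """Union the roles declared on the action and on all of its ancestors."""
    roles = set()
    while action is not None:
        roles |= ROLES.get(action, set())
        action = PARENTS.get(action)
    return roles

print(sorted(effective_roles("BuyAction")))
# ['agent', 'object', 'price', 'seller']
```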

Early Drafts

A proposal for Actions

... in which we are combining the discovery of entities with the affordance of a broad set of actions.

I'm really excited to see things like web/android intents/activities/appurl popping up! It is certainly a cool new paradigm that could enable a lot of different interactions between decoupled applications.

I've been working on a related idea that is still in formation, but well baked enough to be worth sharing.

This is just my personal recollection of historical notes and the challenges we faced as we designed this protocol specification (sort of the backstage of the protocol design). It is a collaborative process between Google, Microsoft, Yahoo and Yandex, and you can be part of it by participating here.

(edit: added more related efforts as I learned about them, corrected a few obvious mistakes)

The problem

The basic problem we were facing was very much like the one that web intents set out to solve: de-couple service providers from service requestors by providing an intent-brokering platform.

We wanted to enable products like this and this.

As we looked into specific use cases, a few things became clear:

- We needed to deal with a wide variety of platforms (Web, POP/SMTP, APIs, Android, iOS, Windows, Feeds, etc)
- We needed a common way to invoke these abilities.
- Declaring a service's abilities via a registry of (verb, data type) pairs wasn't going to be sufficient. You had to be more specific.

The first wasn't that huge of a problem, but it needed to be dealt with. The second is tough, but tractable. The third, however, is quite a challenge, and we call it "The Inventory Problem".

The Affordance Problem

The affordance problem (also called the inventory problem) refers to the fact that it is not sufficient for a service to describe its ability to "act" (verb) on "types" (nouns). You actually need to go further down in granularity and enumerate the individual instances your service "acts" on.

Take the existing intent model as an example:
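Roughly, the existing intent model amounts to a registry keyed by (verb, data type). A toy sketch in Python, with made-up service names, of how such a registry answers "who can handle this?":

```python
# A (verb, data type) registry, roughly how intent-style brokering works.
# Service names and types are illustrative.
REGISTRY = [
    ("share", "image/*", "some-photo-app"),
    ("watch", "video/*", "netflix"),
]

def handlers(verb, mime_type):
    """Return the services registered for this verb and data type."""
    major = mime_type.split("/")[0]
    return [svc for v, t, svc in REGISTRY
            if v == verb and t in (mime_type, major + "/*")]

print(handlers("share", "image/png"))  # ['some-photo-app']
print(handlers("watch", "video/mp4"))  # ['netflix'] -- but can it play *this* movie?
```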


That certainly works well for verbs like "share" that apply to any image/*, but does it work for verbs like "watch"?

For example, is it sufficient to say that "Netflix can stream movies"? Not really. There is a specific set of movies that Netflix can play, a.k.a. its inventory (e.g. the latest movies still in theatres *cannot* be watched on Netflix).

So, one way or another, services need to declare more specifically what resources they can act on.

This problem comes up in a variety of different use cases.

Use Cases

We've explored the use cases we wanted to support. Here are a few key ones:
  • Restaurants that allow reservations and orders (e.g. food delivery or for pickup)
  • Movies that can be watched, songs that can be listened to
  • Hotels that can book rooms
  • Taxis that can be reserved
  • Airlines that can find flights
  • Flights that can be reserved or checked-in
  • Cars that can be rented
  • Local Businesses providing appointments
  • Organizations that allow you to search for Stores
  • Things that can be reviewed
  • Package deliveries that can be tracked
  • Events that can be RSVPed
  • Products/Movies that can be reviewed
  • Expense approvals that can be confirmed
  • Offers that can be saved
Here is a presentation I made that goes over modelling them.

All of these have in some shape or form the "Inventory Problem".

For example, opentable/urbanspoon/grubhub can't reserve *any* arbitrary restaurant; they represent specific ones. Netflix/Amazon/iTunes can't stream *any* arbitrary movie; there is a specific set of movies available. Taxis have their own coverage/service areas. Other airlines can't check you in to UA flights. UPS can't track USPS packages, etc.

That basic premise led us to take a different approach: to annotate individual resources with the operations that are available, rather than annotate services with their general abilities.

Verbs ... they are kind of weird

We first asked ourselves: how do we model verbs? That rat-holed us into a really long discussion around things like:
  • Do verbs have arguments?
  • How do we deal with synonyms, antonyms and reciprocals?
  • Do verbs follow a hierarchy like nouns?
I went over these in more detail here.

With a hierarchy of verbs, we started to look into how they would connect with resources.

Resources and actions

Thanks to the good work of the semantic web folks, finding and exposing resources is quite simple.

Take a movie on Netflix, for example; this is what it looks like:

Roughly, with markup added to that resource, this is represented as a graph:

<script type="application/ld+json">
{
  "@context": "",
  "@type": "Movie",
  "@id": "",
  "name": "The Pursuit of Happyness"
}
</script>
Now, there are plenty of actions that you can take on a movie: you can do things like watching, buying, renting and reviewing it.

Netflix allows you to watch movies, so let's add nodes to this graph to express that:

<script type="application/ld+json">
{
  "@context": "",
  "@type": "Movie",
  "@id": "",
  "name": "The Pursuit of Happyness",
  "operation": {
    "@type": "WatchAction"
  }
}
</script>
Via the operation property, you can attach an operation that can be performed on this resource; in this case, the fact that you can watch it (with well defined semantics).
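On the consuming side, here is a sketch of how a crawler or broker might pull the operations out of such markup, using Python's json module (the document shape mirrors the Movie example; the values are illustrative):

```python
import json

# A document shaped like the Movie markup above (values illustrative).
markup = """{
  "@type": "Movie",
  "name": "The Pursuit of Happyness",
  "operation": {"@type": "WatchAction"}
}"""

resource = json.loads(markup)
ops = resource.get("operation", [])
if isinstance(ops, dict):  # a single operation may appear as one object
    ops = [ops]
print([op["@type"] for op in ops])  # ['WatchAction']
```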

Taking a step further, if you wanted to say that your application can handle this resource on the web as well as on mobile, you'd have something like this:

<script type="application/ld+json">
{
  "@context": "",
  "@type": "Movie",
  "@id": "",
  "name": "The Pursuit of Happyness",
  "url": "android-app://com/netflix/movies/70044605",
  "operation": {
    "@type": "WatchAction"
  }
}
</script>
That gives a movie streamer the language to express:
  • The individual movies in their catalog/inventory (resources)
  • What can be done with each individual movie (actions)
  • How to invoke the action (handlers)

Brokers, Requestors and Providers

Netflix exposes these resources as well as these operations via a variety of transport mechanisms (e.g. markup on webpages, feeds, POP/SMTP messages, etc). We call these entities the providers. 

Crawlers/browsers/registries discover these resources following the links and indexing these abilities, building a global registry. We call these entities the brokers. 

When a specific problem needs to be solved (e.g. watching movie X) by a specific application, it queries the brokers. We call these entities the requestors.
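Put together, a toy broker could look like this: resources crawled from providers feed an index, and requestors query it. All data below is made up for illustration:

```python
# A toy broker index built from crawled resources:
# (action, resource) -> handlers. Data is illustrative.
INDEX = {
    ("WatchAction", "The Pursuit of Happyness"): ["netflix"],
    ("ReserveAction", "Yellow Cab Mountain View"): ["some-taxi-app"],
}

def query(action, resource):
    """A requestor asks the broker who can perform `action` on `resource`."""
    return INDEX.get((action, resource), [])

print(query("WatchAction", "The Pursuit of Happyness"))  # ['netflix']
```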


Think of the actions as the things that you can do with a resource. So, on top of things like GET, POST, PUT and DELETE, you'd now have things like Watch, Listen, Buy, Order and Review to describe what they do.

The same mental model of REST applies though: you have a resource, and you apply operations on that resource.

For instance, on a movie resource you would apply a WatchAction where a plain REST client only sees a GET. As a parallel to REST collections, you'd have similar operations on collections of resources.
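To illustrate the parallel, here is a hedged sketch: the same resource addressed either by a generic HTTP method or by a semantic operation (the verbs and behaviors below are illustrative assumptions):

```python
# Hypothetical dispatch: generic HTTP methods and semantic operations
# applied to the same resource. Names are illustrative.
def invoke(resource, operation):
    generic = {"GET": "read", "DELETE": "remove"}
    semantic = {"WatchAction": "start playback", "ReviewAction": "collect a review"}
    return generic.get(operation) or semantic.get(operation) or "unsupported"

print(invoke("movie/70044605", "GET"))          # 'read'
print(invoke("movie/70044605", "WatchAction"))  # 'start playback'
```

The point of the sketch: semantic operations sit on top of, not instead of, the uniform HTTP interface.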

Next Steps

There are plenty of challenges ahead of us. Here are a few things I am actively working on:
  • More implementations
  • Adding more action handlers, and understanding how invoking these operations across multiple platforms should work
  • Standardizing/documenting more interactions and use cases we expect to see exposed on the web
  • A communication protocol between Requestors and Brokers, so these can be further de-coupled. Currently, the spec only covers the protocol between Brokers and Providers.

Related Efforts

Here are some efforts that are related, but not quite the same. I'd love to learn more about related efforts and learn from others' experience, so feel free to drop me a line to let me know if I'm forgetting something.