Tuesday, September 1, 2015

Vocabulary negotiation

... and the need for an Accept-Language for machines.

This is a write-up of an un-conference session I held at API-CRAFT 2015 @ Detroit, as well as a follow-up on the Vocabulary Fragmentation challenge I raised in my keynote at WS-REST 2014 at WWW in Seoul.

You want to read this before you continue. This is part of a series of posts.

At API-CRAFT 2015, a group of folks discussed the idea of making hypermedia API clients interoperate at the vocabulary/taxonomy level. The idea was simple: pass a parameter/header in the HTTP request to servers advertising the vocabularies/taxonomies the client understands.

GET /mountain-view HTTP/1.1
Host: api.yellowcab.com
Accepts-Vocab: http://schema.org

Sort of like how HTTP clients pass things like Accept: text/html, etc. for media types and Accept-Language: pt-BR, etc. for user-visible languages.

For example, imagine this is the response you'd normally get from servers:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "kind": "book",
  "cover": "The Bible",
  "id": "123",
  "published": "01/01/0000"
}

What vocabulary negotiation (and, more generally, content negotiation) lets you do is advertise what your client "accepts/wants" the server to respond with (to the extent that it *can* fulfill the request). For example, imagine your client could say:

dude, I totally don't want to have to parse this proprietary/unknown JSON blob. why don't you return it in a format and a vocabulary that I *do* understand?

It would advertise what it *does* understand with a special header, Accepts-Vocab (as well as any other content negotiation headers, like Accept). For example:

GET /books/the-bible HTTP/1.1
Host: api.library.com
Accept: application/ld+json
Accepts-Vocab: http://schema.org

And, as with any other content-negotiation mechanism, the server is responsible for returning the content in the requested format/vocabulary (or a 4xx in case it can't).

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
  "@context": "http://schema.org",
  "@type": "Book",
  "name": "The Bible",
  "isbn": "123",
  "datePublished": "01/01/0000"
}

Here is what this would look like for a different vocabulary, for example opengraph:

GET /books/the-bible HTTP/1.1
Host: api.library.com
Accept: application/ld+json
Accepts-Vocab: http://ogp.me

To which the server can now respond:

HTTP/1.1 200 OK
Content-Type: application/ld+json

{
  "@context": "http://ogp.me",
  "@type": "book",
  "title": "The Bible",
  "isbn": "123",
  "release_date": "01/01/0000"
}

This allows us, as a community, to build common/shared API clients that understand a common/shared vocabulary (whenever that's desirable/possible).
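
Here is a minimal server-side sketch of this negotiation in Python. Everything in it is an assumption made for illustration: Accepts-Vocab is only the header proposed in this post (it is not a registered HTTP header), the MAPPINGS table and the negotiate function are hypothetical names, and 406 is just one possible choice for the "4xx in case it can't" branch.

```python
import json

# Hypothetical internal record, using the proprietary field names from the
# first response in this post.
BOOK = {"kind": "book", "cover": "The Bible", "id": "123", "published": "01/01/0000"}

# Hypothetical per-vocabulary mappings: internal field name -> vocabulary term.
MAPPINGS = {
    "http://schema.org": {
        "type": "Book",
        "fields": {"cover": "name", "id": "isbn", "published": "datePublished"},
    },
    "http://ogp.me": {
        "type": "book",
        "fields": {"cover": "title", "id": "isbn", "published": "release_date"},
    },
}


def negotiate(record, headers):
    """Return (status, body) for a request that may carry Accepts-Vocab."""
    vocab = headers.get("Accepts-Vocab")
    if vocab is None:
        # No preference advertised: fall back to the proprietary representation.
        return 200, json.dumps(record)
    mapping = MAPPINGS.get(vocab)
    if mapping is None:
        # The server does not speak the requested vocabulary.
        return 406, json.dumps({"error": "unsupported vocabulary: " + vocab})
    body = {"@context": vocab, "@type": mapping["type"]}
    for internal_name, term in mapping["fields"].items():
        body[term] = record[internal_name]
    return 200, json.dumps(body)


status, body = negotiate(BOOK, {"Accepts-Vocab": "http://ogp.me"})
print(status, body)
```

A real server would also have to decide what to do when a client lists several vocabularies, which this sketch ignores.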

More to come, stay tuned.

JSON-ish

... thoughts on embedding HTML layout/presentation in JSON.

Something that has been bothering me a lot lately in hypermedia API formats is that JSON (and transitively JSON-LD) is purely data-oriented and it is awkward to intermingle layout.

JSON's processing model has two key aspects:
  1. declaration order doesn't matter
    {"foo": 1, "bar": 2} == {"bar": 2, "foo": 1}
    
  2. you cannot use the same identifier twice (the last one wins)
    JSON.parse('{"foo": 1, "foo": 2}') == {"foo": 2}
    
And that's an issue because it means you can't easily declare layout/presentation like you would in HTML/XML. For example:
<!-- Declaration order matters. For example, this: -->
<form>
  <div>hello world</div>
  <input name='foo'>
  <input name='bar'>
</form>
<!-- ... is different than this: -->
<form>
  <input name='bar'>
  <input name='foo'>
  <div>hello world</div>
</form>
You can also use the same identifier multiple times, and it doesn't override:
<!-- Repeated elements don't override each other. For example, this: -->
<form>
  <div>foo</div>
  <div>bar</div>
</form>
<!-- ... is different than this: -->
<form>
  <div>foo</div>
</form>
The JSON-friendly way to make this work is to make your objects declare a tree. This is what it would look like:
{
  "@type": "Form",
  "children": [{
      "@type": "Div",
      "children": ["hello world"]
    }, {
      "@type": "Input",
      "name": "foo"
    }, {
      "@type": "Input",
      "name": "bar"
    }
  ]
}

And this is awkward at best.
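
To make the cost concrete, here is a rough Python sketch of what a consumer has to do to turn such a tree back into markup. The render function and the VOID_TAGS set are names I made up for illustration, and the sketch assumes that every key other than @type and children is an attribute.

```python
from html import escape

# The tree from the example above, as a Python structure.
tree = {"@type": "Form", "children": [
    {"@type": "Div", "children": ["hello world"]},
    {"@type": "Input", "name": "foo"},
    {"@type": "Input", "name": "bar"},
]}

# Elements that have no closing tag (only the one used here).
VOID_TAGS = {"input"}


def render(node):
    """Recursively render a {"@type", "children", ...} tree as HTML."""
    if isinstance(node, str):
        return escape(node)
    tag = node["@type"].lower()
    # Assumption: every key other than @type/children is an attribute.
    attrs = "".join(
        ' {}="{}"'.format(key, escape(str(value)))
        for key, value in node.items()
        if key not in ("@type", "children")
    )
    if tag in VOID_TAGS:
        return "<{}{}>".format(tag, attrs)
    inner = "".join(render(child) for child in node.get("children", []))
    return "<{}{}>{}</{}>".format(tag, attrs, inner, tag)


print(render(tree))
```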

HTML, on the other hand, is great at representing layout/presentation, but it is bad at representing machine-readable data. For example, this is what intermingling UI and data looks like with microdata:

<!-- Here is an example of embedding data with presentation in HTML -->
<div itemscope itemtype='http://schema.org/WebSite'>
  <!-- It is awkward to have to create these 'fake' UI elements 
    just to convey data -->
  <a itemscope itemprop='url'>http://example.com</a>
  <meta itemprop='name' content='My WebSite'>
  <!-- It is super hard to represent nested structures because meta tags
    aren't recursive -->
  <meta itemprop='description' content='foo bar'>
  <form itemprop='potentialAction' itemscope 
    itemtype='http://schema.org/SearchAction'>
    <input itemprop='query' name='q'>
  </form>
</div>

What follows is an exploration of what combining both worlds would look like: embedding HTML layout in JSON, optimizing for producers as opposed to consumers (i.e. starting from the assumption that you want to make this easy for "writers" and are OK with "readers" paying the cost).

HTML (-ish) layout controls in JSON (-ish).


These are @goto's personal notes from a variety of internal discussions at Google.

This is super draft-y, read at your own risk.

The basic idea was simple: bring all the goodness of the well-established and well-thought-out hypermedia controls in HTML to JSON objects. Everybody is familiar with those and there is no need to re-invent the wheel.

Here [0, 1, 2, 3] is some background reading if you are interested in the motivations.

I'll go over the most basic hypermedia controls available in HTML and a mechanism to map them to JSON. At the end, I'll go over a couple of extensions to what HTML provides to address a few needs from APIs.

Processing model

  • This isn't really JSON (and transitively JSON-LD): declaration order matters. That's important because it is designed to declare layout/presentation and data, as opposed to just data + hypermedia controls. That's super unfortunate, but I think necessary.
  • an object with a name that starts with @ starts new nodes, it is the equivalent of <> tags in XML/HTML. e.g. @a == <a>
  • a primitive with a name that starts with @ is an attribute of the parent node, it is the equivalent of attributes in XML/HTML. e.g. @a { @href: "foo" } == <a href="foo" >
  • custom properties (everything that doesn't start with @) are added to the parent node's context.
  • There is some sort of JSON-LD-like processing that converts all these things into a bound graph.
  • // Comments are allowed.
The media type would be something like application/vnd.json-ish.
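
The two departures from JSON's processing model (declaration order matters, names can repeat) can be illustrated with Python's standard json module. This is only a sketch under assumptions: it uses strictly quoted JSON without comments rather than the relaxed syntax used in this post, and parse_ordered is a hypothetical helper name.

```python
import json

TEXT = '{"@form": {"@input": {"@name": "foo"}, "@input": {"@name": "bar"}}}'

# Plain JSON processing: the second "@input" silently replaces the first.
plain = json.loads(TEXT)
print(plain)


def parse_ordered(text):
    """Parse every object as a list of (name, value) pairs.

    Lists keep both the declaration order and repeated names, which is
    what a JSON-ish processor would need.
    """
    return json.loads(text, object_pairs_hook=list)


ordered = parse_ordered(TEXT)
print(ordered)
```

With the pairs preserved, a processor can walk them in order and treat each @-prefixed object as a new node.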

Affordances


We use a reserved token (@) to make the distinction between text and hypertext. Here are the main hypertext affordances:

Tag      Control
@a       Outbound links
@form    Templated links
@link    Inbound links
@meta    Metadata

@a

The simplest (and possibly most powerful) hypermedia control is the ability to express outbound links.
{
  kind: "issue",
  id: "1234",
  name: "this is an issue",
  description: "things are not working!",

  @a: { @href: "/issues", @rel: "home", @text: "All issues", @itemprop: "home" }
}

@form

The second important hypermedia control for APIs is the ability to perform operations/actions. On the web, this is typically done via <form>. This is roughly what it maps to in JSON:
{
  kind: "collection",
  name: "All issues",
  description: "The collection of all issues",

  // Here is one GET form that has a text and a checkbox.
  @form: { @action: "/search", @method: "GET" ,
    @input: { @type: "text", @name: "query"},
 
    // Pick many of many.
    @input: { @type: "checkbox", @name: "openOnly"},

    // Pick one of many.
    @input: { @type: "radio", @name: "priority", @value: "P1"},
    @input: { @type: "radio", @name: "priority", @value: "P2"},

    // Multiple ways to submit a form
    @input: { @type: "submit", @value: "cancel", @text: "Bah, nevermind!"},
    @input: { @type: "submit", @value: "done", @text: "Search!"},
  },

  // Forms can perform POSTs too as well as allow selects.
  @form: { @action: "/create", @method: "POST",
    @select: {
      @name: "issueType",
      @option: { @value: "1", @text: "New issue"},
      @option: { @value: "2", @text: "New feature"},
      @option: { @value: "3", @text: "New bug"},
    },

    // A submit button
    @input: { @type: "submit"},
  }
}
Note that, while <checkbox>/<select> don't have a UI equivalent in APIs, they still have hard semantics on how they are supposed to be used: <checkbox>es are boolean select-many-of-many and <select>s are select-one-of-many. <radio>s don't exist, because they are equivalent to selects.
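
As a rough illustration of how a generic client could turn a GET @form into an actual request, here is a Python sketch. The flattened dict shape, the build_request function and the answers argument are all names I made up for this example. Defaulting a ticked checkbox to "on" mirrors what HTML does when a checkbox has no value.

```python
from urllib.parse import urlencode

# Hypothetical parsed version of the GET search form above.
form = {
    "action": "/search",
    "method": "GET",
    "inputs": [
        {"type": "text", "name": "query"},
        {"type": "checkbox", "name": "openOnly"},
        {"type": "radio", "name": "priority", "value": "P1"},
        {"type": "radio", "name": "priority", "value": "P2"},
    ],
}


def build_request(form, answers):
    """Return (method, url) for a GET form, given the client's answers."""
    pairs = []
    for field in form["inputs"]:
        name = field["name"]
        if name not in answers:
            continue
        if field["type"] == "checkbox":
            # Boolean: only sent when ticked ("on" is HTML's default value).
            if answers[name]:
                pairs.append((name, field.get("value", "on")))
        elif field["type"] == "radio":
            # Select-one-of-many: only the chosen option is sent.
            if answers[name] == field["value"]:
                pairs.append((name, field["value"]))
        else:
            pairs.append((name, answers[name]))
    query = urlencode(pairs)
    url = form["action"] + ("?" + query if query else "")
    return form["method"], url


print(build_request(form, {"query": "crash on start", "openOnly": True, "priority": "P1"}))
```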

@link

Links give developers the ability to include external resources in the current resource (much like how <img> tags work).
{
  // Here is an example of alternate links.
  @link: { @rel: "alternate", @href: "/issues/1234", @type: "text/xml"},

  kind: "issue",
  name: "A specific issue",

  author: {
    // Here is an example of a basic inbound link (clients should fetch and
    // insert the contents of the target page into this document tree).
    @link: { @rel: "import", @href: "/users/1234"},
  }
}

Microdata, RDFa, schema.org and opengraph

Suppose you had the following JSON payload:
{
  kind: "Film",
  title: "The Rock"
}
That obviously can't be understood by a generic client because it lacks common semantics. You could use microdata to give clients semantic hints:
{
  @itemtype: "http://schema.org/Movie",
  kind: "Film",
  title: "The Rock",

  // The schema.org Movie equivalent of title.
  name: "The Rock",
}
Similarly, you could use RDFa to annotate your objects:
{
  kind: "Film",
  name: "The Rock",
  @meta: { @property: "og:type", @content: "video.movie"},
  @meta: { @property: "og:title", @content: "The Rock"}
}
Now your clients can use either your proprietary vocabulary or the standardized types in external vocabularies. Here is one example of how this fits with http://schema.org/Action.
{
  @itemtype: "http://schema.org/WebSite",

  name: "hello world",
  url: "http://cnn.com",

  // Here is one GET form that allows you to search given a query.
  @form: {
    @method: "GET",
    @itemtype: "http://schema.org/SearchAction",
    @itemprop: "potentialAction",
    @input: { @required: true, @name: "q", @itemprop: "query", @text: "Puppies"}
  },
}

Extensions

There are a few things that HTML doesn't quite yet support well, so we extend it with a few key affordances.

Nested <input>s

{
  "hello": "world",

  // An input of type "group" can nest other inputs.
  @form: { @action: "/selects", @method: "POST", @enctype: "application/json",
    @input: { @name: "person", @type: "group",
      @input: { @name: "name", @type: "text", @value: "hello"}
    }
  }
}
Would lead to the following POST request:
POST /selects HTTP/1.1
Host: api.example.com
Content-Length: ...
Content-Type: application/json

{
  "person": {
    "name": "foo"
  }
}
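
Here is a small Python sketch of how a client could build that nested JSON body from grouped inputs. The form_inputs structure, the collect function and the answers argument are hypothetical names. The sketch assumes that a "group" input becomes a nested object and that any other input falls back to its default @value when the client gives no answer.

```python
import json

# Hypothetical parsed version of the nested-input form above.
form_inputs = [
    {"name": "person", "type": "group", "inputs": [
        {"name": "name", "type": "text", "value": "hello"},
    ]},
]


def collect(inputs, answers):
    """Build the request body; a "group" input becomes a nested object."""
    body = {}
    for field in inputs:
        if field["type"] == "submit":
            continue  # Submit buttons carry no data in this sketch.
        name = field["name"]
        if field["type"] == "group":
            body[name] = collect(field["inputs"], answers.get(name, {}))
        else:
            # The client's answer wins over the declared default value.
            body[name] = answers.get(name, field.get("value"))
    return body


print(json.dumps(collect(form_inputs, {"person": {"name": "foo"}})))
```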

PATCH, PUT and DELETE

We allow PATCH, PUT and DELETE to be a value of the "method" property of <form>.
{
  hello: "world",
 
  // Clients may send a DELETE HTTP request to the server.
  @form: {
    @action: "/resources/123",
    @method: "DELETE",
    @input: { @type: "submit"},
  },
}
This is what PUTs would look like.
{
  hello: "world",

  // Here is one PUT form that has a hidden input and a text input.
  @form: {
    @action: "/resources/123",
    @method: "PUT",

    // This object goes into the HTTP body of PUT.
    @input: { @type: "hidden", @name: "newvalue", @value: "foo"},

    // You may even require the client to provide an input before
    // you can PUT.
    @input: { @type: "text", @name: "newkey2", @value: "bar"},

    // A submit button
    @input: { @type: "submit"},
  },
}

This is what PATCHes would look like.

{
  count: "3",

  @form: {
    @action: "/resources/123",
    @method: "PATCH",
    @enctype: "application/json-patch+json",
    // This is a JSON-PATCH object.
    @input: {
      @type: "hidden", @name: "foo",
      @text: {
        @replace: "/count",
        @value: "5"
      }
    },
  },
}

This is obviously more of an exploration than an actual proposal, but I think it serves to isolate a problem: JSON doesn't work well for layout/presentation. I don't know yet what the best way to solve this problem is, but I think it is an important factor when picking your hypermedia API format.

Monday, July 27, 2015

application, structure and protocol

... de-coupling common patterns in API design: the "what", "where" and "how".

Some things take a while to digest, and this message from Mike is one that is taking me some time. Like a good wine, it takes experience and trying things to appreciate: and this one is still not going down super smoothly for me (and honestly, as a community, I still believe we have to agree on terminology here).

What I'm learning the hard way is that you want to de-couple your API design in three independent layers of specificity:
  1. the "what": application
    e.g. events, people, photos, videos, etc
  2. the "where": structure
    e.g. text/html, text/json, text/xml, json-ld, etc
  3. the "how": protocol
    e.g. http://, tel://, mailto://, android-app://, etc
As you move down, the message becomes more generic/opaque: the less you know about it. And you want all of these components to be used/replaced/switched independently.

Here is the metaphor that I use to explain to my co-workers.

When you start writing a letter you first figure out "what" to write ...

... A love letter, a thank you note, a note, etc. You use "domain-specific" language between you and your recipient, like slang, nicknames and emotions. You use recipient-specific common notions on how to organize the world (taxonomies) as well as belief systems (ontologies).


... you also decide what's the most convenient paper format to convey your message ...

You pick the paper color, shape, size and layout. If the content of the message is short, you pick a small paper. If the content of the message has structure, you pick a more structured layout. If you are writing free-form, you pick a blank paper.


... and you also have to decide how the letter is going to be delivered ...

You pick different trade offs between cost and delivery guarantees: re-try policies, tracking numbers, latency, etc.

But more importantly, you want to allow yourself to use any combination of these.

You want to have the ability to write about a variety of things, under a variety of paper layouts across a variety of delivery methods.

And this translates to designing your hypermedia API

You want to allow your domain-specific vocabularies (e.g. schema.org, ogp.me, microformats.org or your own custom one) to be conveyed in a variety of formats (e.g. web pages, email messages, android XML UIs, etc) and to be transported using different mechanisms (e.g. http, smtp, android intents, etc).

So, where do we go from here?

Above and beyond of course!

While I think there is a lot of work to be done at all of these different levels, having already spent a lot of time understanding the top layer, I'm particularly interested in the middle layer (structure/layout) - starting from where it currently falls short.

Stay tuned!

Sunday, July 26, 2015

hypermedia api controls missing


... a collection of affordances you find for the human-readable web that aren't yet available for the machine-readable web.

This is part of a series of posts where I try to identify things that are missing in hypermedia API design. You may want to read this as a starting point for context and this as a starting point for why it matters to me.

Here is a list of things that we still* have to parse as humans from human-readable documentation:
  1. basic <form> controls
  2. client-side navigation
  3. client-side dependencies
  4. client-side cardinality and grouping of fields
  5. client-side data loading
  6. client-side validation
* I'm unaware of any hypermedia type that I looked at (with the exception of HTML -- with the addition of javascript) that is able to express all of these. I'd love to be proven wrong and educate myself, just drop me a line in the comments with examples and I'll correct myself.

I'll go over what each of these things look like for the human-readable web.

Basic controls

You want to get some of the basic structural elements HTML forms provide. That includes things like:


  1. enumerations (e.g. <select><option name='foo' value='bar'></select>)
  2. default values (e.g. <input value="foobar">) 
  3. readonly values (e.g. <input type="hidden">) 
  4. semantic auto-completing (e.g. first name, credit card numbers, etc) (e.g. <input type="tel">, <input type="text" autocomplete="given-name">) 

Navigation

Of all of the HTML forms affordances, I'd like to point out one that stands out to me: navigation. Looking at APIs right now, you have to "read English documentation" to figure out "where to start" (e.g. which field to fill first), when there is obviously an "intended order" to be followed.

We need something like "tabindexes for programmers".

Dependencies

In the same realm as "navigation", there are occasions when you want to direct your clients to fill things (or not) based on their previous choices.

For example, how often have you said in your "human-readable documentation": "if you set this property, please also add details to this other object"?


Grouping

This is an area where HTML forms fall short: nested inputs. Today, you have to live with key/value pairs. You can accomplish nesting with javascript, but you probably want this to be done declaratively.

Cardinality

You want to give your clients the ability to "add/remove more-of-the-same" for repeated nested structures. You also want to be able to tell them the "minimum and maximum" number of items needed.


Data loading

There are some cases where your data won't fit into a single payload, and you have to load it dynamically. Supporting auto-complete in APIs is a common example of that.

Validation

I think we still need to evolve a lot in how client-side validation is done and how expressive it is, including having the ability to do remote validation. That includes things like:


  • whether the field is required or not (e.g. <input required>)
  • min/max values (e.g. <input min=2 max=10>) 
  • max/min length
  • pattern matching (e.g. <input pattern="[1-9]">) 
  • identical/different fields (e.g. email-matching)
  • remote validation
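
To give a feel for what a declarative, machine-readable version of the local checks could look like, here is a minimal Python sketch. The rules dict and the validate function are hypothetical names rather than part of any existing hypermedia format, and remote validation and cross-field checks are left out. Values are treated as strings, and the pattern has to match the whole value, as it does for the HTML pattern attribute.

```python
import re


def validate(value, rules):
    """Return the names of the rules that a single field value violates."""
    if value in (None, ""):
        # Empty values only fail when the field is required.
        return ["required"] if rules.get("required") else []
    errors = []
    if "minlength" in rules and len(value) < rules["minlength"]:
        errors.append("minlength")
    if "maxlength" in rules and len(value) > rules["maxlength"]:
        errors.append("maxlength")
    if "pattern" in rules and re.fullmatch(rules["pattern"], value) is None:
        errors.append("pattern")
    if "min" in rules or "max" in rules:
        try:
            number = float(value)
        except ValueError:
            errors.append("number")
        else:
            if "min" in rules and number < rules["min"]:
                errors.append("min")
            if "max" in rules and number > rules["max"]:
                errors.append("max")
    return errors


rules = {"required": True, "min": 2, "max": 10, "pattern": "[0-9]+"}
print(validate("7", rules))
print(validate("42", rules))
print(validate("", rules))
```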

Having said that ...

I won't make a case that HTML forms are perfect either: plenty of what I mentioned here is done with unstructured javascript (as opposed to declaratively), but that's a subject for a post on its own. Stay tuned!