Google have recently announced version 3 of their YouTube API. This is great news for libgdata: it means we can have access to all the same functionality as before, just with a JSON flavour, rather than Atom.
Sarcasm aside, the last few years of working (on and off) on libgdata has made a number of things obvious about web APIs. Here are some ideas I’ve had for best practices for writing code which interacts with them. Some of these have made their way into libgdata; others would require an API break to implement. References to relevant examples of APIs in libgdata are given inline, but if something isn’t clear please leave a comment. As always, this list is probably incomplete and any additions or alterations to it would be appreciated.
- Have a very general, flexible core API (example), and add a layer of specialisation on top of it (example). This allows client programs to use the general APIs to access new features in the web API if your library hasn’t yet caught up.
- Use objects liberally. Objects can be extended with new properties without breaking API. Structs and function parameter lists cannot. Even if you end up creating objects with a single property (example), don’t create them as structs!
- Don’t worry about CPU efficiency. The cost of creating objects or doing some ‘unnecessary’ extra processing to give your API more flexibility is nothing compared to the cost of a network round trip. Network round trips and memory consumption are the main costs.
- As a corollary to the previous point, the API should be zero-copy for consumers. If possible, try to design the API so that programs using it won’t have to take copies of all the data they access, as this will end up doubling the memory consumption of the application unnecessarily. One way to do this (which is what libgdata does) is to make the objects returned by network requests effectively immutable — e.g. a query will return a new set of result objects each time it’s performed, rather than updating an existing set of them.
- Always try to think one step ahead of the web API designers. This is part of making your API flexible: if you’re thinking about the directions the web API could go in and the features which could be added to it in future, you’ll be more prepared when the web API designers suddenly spring them on you. libgdata managed this with its authentication API, but didn’t manage it with the core feed/entry API.
- Report bugs against the web API. In the case of libgdata, many of the bugs we reported have been ignored, but that’s not the point. By reporting bugs, you help other consumers of the web API, and give (a little) feedback to the web API designers as to how people are using, or expecting to use, the web API. (And also how broken it is.)
- Make everything asynchronous (example). Absolutely everything which could result in a network request should be asynchronous, cancellable, and support returning errors (even if cancellation isn’t initially implemented and no errors are initially returned). This prevents having to break API in the future to make a method asynchronous. Methods which will result in network requests should be clearly separated from non-networking methods, e.g. by using a different naming scheme for them (my_object_request_property() versus my_object_get_property(), for example).
- Design the API with batch processing in mind. Wherever possible, allow sets of objects to be passed to methods, rather than individual objects. If the web API doesn’t support batch processing, the method can just implement a loop internally. If it does, the use of batch processing allows for an order n reduction in network round trips. libgdata failed at this, having to tack batch operations onto the API as an afterthought (example). Fortunately (or perhaps unfortunately) it hasn’t been much of an issue because Google’s batch API never really went anywhere. Clients of libgdata have wanted to use batch functionality, however, and it would have been best implemented from the start.
- Integrate concurrency control in the core of your API (example). Web APIs are interfaces to large distributed systems. As we’ve found with libgdata, concurrency control is important, both in managing conflicts between different clients (e.g. when concurrently modifying an object) — but also in managing conflicts between clients and internal server processes. For example, just after a client creates a document on Google Docs, the server will modify it to add missing metadata. These modifications (and the accompanying change in the object’s version number) are exposed to clients. Google’s APIs (and hence libgdata) implement optimistic concurrency control using HTTP ETags. All operations in libgdata take an ETag parameter. This works fairly well (ignoring the fact that some web API operations inexplicably don’t support ETags).
- Don’t expose specifics of the web API in your API. Take a look at all the functionality exposed by the web API (and all the functionality you think might be added in future), then design an API for it without reference to the existing web API. Once you’re done, try to reconcile the two APIs to make sure yours is actually implementable. This means your API isn’t tied to some esoteric behaviour when the web API gets fixed. However, if done incorrectly this can backfire and leave your API unable to map to future changes in the web API. Your mileage may vary.
- Testing is tricky. You want to test your code against the web API’s production servers, since that’s what it’ll be used against. However, this requires that the machine running the tests is connected to the Internet (which often isn’t the case). It also means your unit tests can (and will) spuriously fail due to transient network problems. The alternative is to test your code against an offline mock-up of the web API. This solves the issues above, but means that you won’t notice changes and incompatibilities in the web API as they’re introduced by the web API developers. libgdata has never managed to get this right. I suspect the best solution is to write unit tests which can be run against either a mock-up or the real web API. Automated regression testing would run the tests against the mock-up, but developers would also regularly manually run the tests against the real web API.