Tuesday, 13 May 2008

How REST will neuter Web Services

I normally like to blog about things that are, and leave tea-leaf reading to others. Not today. I've been thinking more about a specific example of solving a problem in the RESTful way, and the solution seems like one that is likely to play out in many different arenas. So here's my attempt to predict what will happen as RESTful thinking penetrates into some of the harder aspects of software engineering...

Basically, I think REST is going to neuter web services. Not destroy - just neuter. The Great Irony is that as RESTful patterns for solving problems of enterprise development are discovered, used, and socialized, the net effect will be to unleash a great revival of custom, non-HTTP, non-RESTful protocols. What!?! you say. Read on.

The particular problem I referred to above is a basic one in event driven architectures. The problem statement is this: how do you create a channel for messages where messages are queued, but each consumer has unique ownership of the messages it dequeues? This is basically the asynchronous form of a message based load balancer, used to drive (among many other things) highly horizontally scalable "master/worker" solutions. James Strachan has a RESTful solution to the queue problem. I've studied it. I recommend you study it, and read the thread at rest-discuss.
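To make the problem statement concrete, here is a minimal in-process sketch in Python. It is not Strachan's RESTful design and involves no HTTP at all; the class and function names are invented for illustration. It only shows the property being asked for: every queued message ends up owned by exactly one worker.

```python
import threading
from collections import deque


class ExclusiveQueue:
    """In-memory queue that hands each message to at most one consumer."""

    def __init__(self):
        self._pending = deque()
        self._lock = threading.Lock()

    def enqueue(self, message):
        with self._lock:
            self._pending.append(message)

    def dequeue(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        with self._lock:
            return self._pending.popleft() if self._pending else None


def run_workers(queue, worker_count):
    """Drain the queue with several threads; return the messages each one got."""
    received = [[] for _ in range(worker_count)]

    def work(slot):
        while True:
            message = queue.dequeue()
            if message is None:
                return
            received[slot].append(message)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received
```

The lock makes "check and remove" a single step, which is the part that is hard to reproduce over stateless HTTP requests: a plain read followed by a separate delete would let two consumers see the same message.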

This blog is not about the solution per se, but more about who's looking for such solutions, and what they are going to do with them. ActiveMQ is a framework that created a use case specific protocol (Openwire) for messaging that works really well. They want to add RESTful HTTP to their stack of messaging protocols, which also includes STOMP and XMPP, and hopefully one day soon, AMQP. It makes a lot of sense why they would want to enable HTTP as a transport for messaging, but notice that the effect is to make it easier to use ActiveMQ, which is inevitably going to get Openwire solutions in place, for one simple reason: IT'S A LOT BETTER AT MESSAGING.

There's a really excellent comment in the rest-discuss thread about REST, compared to custom protocols, by Alan Dean. He's just collecting comments from Roy Fielding. The first notes that REST's statelessness requirement involves a tradeoff against custom protocols that may degrade performance, since custom protocols can reduce networking costs by leaving data on the server. The third quote notes that abiding by the uniform interface also involves a tradeoff because generic interface methods generally won't be as efficient as those tailored to a specific problem.

Now think of all those WS-* specs out there. Each one was somebody trying to solve a problem that comes up in complex systems. Instead of having another REST vs SOAP debate, how about a different question: why web services at all? Why NOT problem specific protocols like Openwire or AMQP? The standard answer is that these involve sacrifices in interoperability. Unless clients also speak Openwire, they can't get your information. While many clients may choose to do so, those that don't are excluded. This is not good.

But this is why REST is so appealing. It's the ultimate least common denominator solution. EVERYBODY speaks HTTP. It's even really good at some things, if you use it in the RESTful way. And if you leverage "hypermedia as the engine of application state" and idempotency, even hard things are possible (like building concurrency safe queues). But look at WHO is going to solve the hard problems. It's going to be custom protocol creators who are trying to overcome the interoperability objection to their "best" solution. It's guys like James Strachan who want to get ActiveMQ out there for messaging. Instead of a bunch of WS-* goop, REST will see its solutions come more as solution patterns that interact with a specific implementation toolkit that solves the problem for you. These implementations will also have custom protocols tailored to the problem. I predict most people will use these, secure in the knowledge that somebody who can't speak the language can always drop down to the super-portable RESTful API.

In fact, if you use languages that have good client libraries, why wouldn't you use Openwire over REST for it? Especially when the framework provider is going to hand you juicy client libraries to go with server binaries that "just work". The guys who have to use the RESTful mechanisms are the ones that write in languages that don't get big communities creating client libraries for every protocol under the sun. Your Erlang, Lua, or whatever client may have to interact via RESTful HTTP with the server. System admins and monitoring systems will poke at production servers to make sure they're alive using the RESTful APIs. Generic things will be done via REST.

So this is why I say REST will neuter web services. The super accessibility REST provides (better than SOAP solutions) is one half of a tradeoff. The other side is solving the problem with a highly tailored protocol solution that performs and scales. Folks that want the latter will bundle the former with it as a hedge. The end result, I predict, will be that more problems will be solved using tailored solutions which previously would have been unpalatable because they aren't universally accessible. SOAP won't beat either solution. But it won't lose to REST; it will lose to custom protocols that aren't truly viable until they add a RESTful interoperability API.

Posted by spout at 1:24 AM in the internet, web, web 2.0 and beyond

Sunday, 13 April 2008

RESTful Service Descriptions

Over several blogs recently I've looked at RESTful approaches to enterprise integration, somewhat critically. Despite my conclusion that REST inside the firewall is almost crippled by the lack of service descriptions, I do in fact believe that solutions are imminent. Moreover, once these solutions exist and are socialized, I believe it will be game changing within the enterprise. The solutions will not come from the REST community, who (mostly) stubbornly refuse to admit there is an issue. The solution will come anyway, so let's look at where it stands.

Let's start with what a service description must accomplish in a resource oriented enterprise setting. Let's suppose that we stick some RESTful services inside a typical enterprise ecosystem. What do we actually need to describe about them? Well, several things:

  • Where the URI "entry points" are. Where do you start? If you have resources representing a customer, where EXACTLY are they? If I want to look up customers, do I start at http://myco.com/customers or at http://myco.com/accounts? If you think this is silly because somebody could just tell you, then I say two things. First, even a small company probably has 2000 different entities of interest. Second, URIs are never supposed to change in a perfect world, but in the real world they change all the time, especially for dev and test environments inside the firewall. On the internet you probably use a search engine, read blogs, or get emailed a link. Google doesn't typically index inside the firewall, so this doesn't help. The other options don't scale.
  • Metadata about parameters. OK, so you know where to go to ask for a filtered set of your favorite entity. What parameters can you put into the URL, which are mandatory, what are their datatypes, and what are the allowable values? Once you know the entry point above, you can do a GET request and get back some nice document with links galore to things of interest (as in "hypermedia as the engine of application state"). Great, how do you specify the start date and end date? Do you have to specify the account name? If so, what's the parameter called? Can you specify the maximum record size to return? etc...
  • What media types are used. Can I get purchase orders in JSON? XML? As an Atom feed? Pretty simple question. You can get this via HTTP HEAD and/or by setting the Accept request header. Sounds like we don't need this in the service description, right? Well, you often have to do gap analysis before you have physical access to a service. You need to figure out things like: when they come on line with purchase orders next month, will they support JSON or do we need to submit a change request? If you have their latest service description, you know the answer. If that document changes, you can ask questions about why things were added or removed because you have something tangible to compare to.
  • Which HTTP operations you can access. In theory, this is what HTTP OPTIONS should answer. In reality, who knows what the answer to HTTP OPTIONS looks like. It should probably be the relevant service description document. So rather than replacing it, HTTP OPTIONS becomes a way of retrieving it.
  • What validity constraints exist on the media types. If you are using XML, what schema, DTD, or RELAX NG description applies? This is a big open issue. For the same reasons as above, telling me to "get one and look" doesn't work. More fundamentally, you cannot infer a global constraint by looking at instances that happen to obey it. This problem has been mentioned by several people: stu, Stefan Tilkov, James Strachan, and Dan Diephouse.
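The checklist above can be sketched as a small data structure. This is a hypothetical Python model, not WADL or any real description format; `Param`, `ResourceDescription`, and `check_request` are names made up for the example. The point is only that once the description exists as a document, gap analysis can be done against it without touching the running service.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    py_type: type
    required: bool = False


@dataclass
class ResourceDescription:
    path: str            # the URI entry point
    methods: tuple       # HTTP operations offered
    media_types: tuple   # representations offered
    params: tuple = ()   # query parameter metadata

    def check_request(self, method, accept, query):
        """Return a list of problems; an empty list means the request fits."""
        problems = []
        if method not in self.methods:
            problems.append(f"method {method} not offered")
        if accept not in self.media_types:
            problems.append(f"media type {accept} not offered")
        known = {p.name: p for p in self.params}
        for p in self.params:
            if p.required and p.name not in query:
                problems.append(f"missing required parameter {p.name}")
        for name, value in query.items():
            if name not in known:
                problems.append(f"unknown parameter {name}")
            elif not isinstance(value, known[name].py_type):
                problems.append(f"parameter {name} has the wrong type")
        return problems


# Hypothetical description of a purchase order resource.
purchase_orders = ResourceDescription(
    path="/purchase-orders",
    methods=("GET",),
    media_types=("application/xml",),
    params=(Param("start", str, required=True), Param("max", int)),
)
```

Asking `purchase_orders.check_request("GET", "application/json", {"start": "2008-04-01"})` reports that JSON is not offered, which is the "do we need to submit a change request?" question answered from the description alone.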

The spec that seems able to deliver on all this without adding a bunch of useless extra weight is WADL. Here's a nice quick example of WADL as used in Jersey (the JSR-311 reference implementation) to give you a sense of it. It's difficult to see how anyone could object to this: it's simple, lightweight, does the above and little else, and is comprehensible. Your eyes will not melt if you try to read it or its spec. Jersey returns the WADL doc when you do an HTTP OPTIONS on the resource. That's sweet! I'm really hoping this gets more attention. I note WADL is coming out of the Java community, not the REST community.

Another exciting development is to clothe your RESTful web services in the Atom Publishing Protocol (AtomPub). AtomPub is a RESTful application protocol for CRUD operations on resources that produce Atom Feeds. To see the power, look at how ActiveMQ is proposing to implement a RESTful Queue. This is a nontrivial problem. If you don't see why, follow the link to the discussion at rest-discuss and read the thread.

The use of AtomPub and Atom directly for publishing to web services (as opposed to the typical use case of HTML based blogs) is an idea that's been getting a lot more attention lately. Unfortunately, as James Strachan points out, it doesn't let you easily associate an XML schema when you use the application/xml media type. But it could, with a minor modification. Maybe this problem should be solved at the HTTP level, by introducing the URI to appropriate metadata (such as the XML schema) as a media type parameter. Either way, it's possible that we'll see an application protocol (like AtomPub) wrapping up RESTful web services and filling the gaps within its constructs.

Finally, there's WSDL 2.0, which will also support REST. At the moment WADL's simplicity and implementation lead seem to make it more interesting to me, but WSDL 2.0 will have to be supported by all the next gen SOAP stacks, so we're likely to see these going dual architectural style and supporting REST too.

Posted by spout at 2:48 AM in the internet, web, web 2.0 and beyond

Monday, 24 March 2008

Tilkov on Doubts About REST

In a recent InfoQ piece Stefan Tilkov attempts to address doubts about REST. I offer my reactions to his 10 items.

  1. REST may be usable for CRUD, but not for “real” business logic
     I agree with Tilkov that REST actually doesn't map cleanly to CRUD, and I agree that if your only goal is to wrap resource access with business rules, REST can work just fine. I think Tilkov would have made a stronger point if he discussed the paradigm of "hypermedia as the engine of application state" as part of workflow business rules. This highlights the fact that security often needs to be maintained statefully, but this is another question.
  2. There is no formal contract/no description language
     Tilkov gives three answers. First, that you can RESTfully access constrained XML documents. That's a weak answer because there's no way to know or discover beforehand what the XML schema at a given URL will be. And if it changes, there's nothing to confront the service provider with to say "look, you don't comply with your contract". Second, he says you don't need contracts as you'd have to read documentation anyway. That's laughable. All those WSDLs we use daily for SOAP, we don't need. OK. The third answer is WADL and WSDL 2.0. This, of course, disproves the previous two answers, except that they are both extremely young and need to mature. You can't answer the first two ways and the third way simultaneously! The problem with WADL and WSDL 2.0 is that most REST advocates believe the other two answers and aren't creating tooling around these standards. They have this unreasonable fear that embracing contracts will take them on the path towards WS-* hell.
  3. Who would actually want to expose so much of their application’s implementation internals
     This one is batted down pretty handily. With REST you still control what is exposed and what is not.
  4. REST works with HTTP only, it’s not transport protocol independent
     I don't buy this answer. The talk about HTTP vs TCP is downright strawman-ish. You actually are starting to see things like SOAP over JMS occasionally, and when AMQP delivers true platform independence, I expect messaging based SOAP will take off.
  5. There is no practical, clear & consistent guidance on how to design RESTful applications
     This is a cultural thing with the REST community. Somebody should write a REST Cookbook. There are too many arguments about how this or that is not RESTful with no base of examples that are RESTful. Atom is not enough. Tilkov's strawman comparison to WSDL/SOAP is weak. There are plenty of books that take you step-by-step through how to design a SOAP service. They don't have to agree: "best" practices can be written in stone when the space stops evolving.
  6. REST does not support transactions
     It's agreed that distributed transactions often (even usually) aren't needed. But I shouldn't have my web services architecture decide this for me. There ARE cases where architects want to use these. Globally consistent data is very important in many inventory or financial systems. It's also often needed for reliable messaging, as discussed in the next item. That said, if I need transactions, I usually switch from SOAP to JMS to avoid the WS-* yuckies.
  7. REST is unreliable
     Well, I actually think WS-ReliableMessaging is weak. Its support by various SOAP stacks is spotty. When I want reliability, I usually move to JMS, often dumping SOAP in the process. WS-Death-* is SOAP's weakness. That said, idempotency is not a solution, come on. These require the client to rely on behavior from the server to verify the message is received. You need contracts to do that in an automatic way. The other solutions are an attempt to start creating WS-* equivalents. Technically it could work, but these are curiosities, not standards with widespread adoption. Really, the moment you have contracts (see #2) you have a mechanism where you don't need human intervention at design time.
  8. No pub/sub support
     This one is bunk. He gives Atom as proof. With some standards, there could easily be ways to POST or PUT a callback URL, too, for peer-to-peer interactions. Also see the next question. Contracts to allow discovery would be nice (see #2).
  9. No asynchronous interactions
     He should mention Comet. But POST or PUT of a callback URL could work for peer-to-peer interactions, or you could simply hand the client a URL to poll. His idea to use response code 202 is a little weird. Contracts to simply declare support for a particular standard would be nice.
  10. Lack of tools
     He's right that if you use XML formats, you have all the same tools. HTTP is otherwise pretty universally embraced. However, full access to all the HTTP methods besides GET and POST is occasionally problematic. REST lags far behind when it comes to service contracts, of course. In fact, I predict it will be the SOAP stacks that first close this gap for REST when they embrace WSDL 2.0. There need to be better client side tools for operating on particular contract standards, too. For example, pub/sub is easy if your client knows how to post its callback URL because it sees a standard pub/sub mechanism is declared. Tooling means making these standards and making clients that use them transparently. REST can hopefully skip some of the WS-* hell, but it can't skip all of it.

    In summary, I still think REST contracts need to mature (but it looks like progress is starting). Once this happens, standards and tooling can deal with the rest. It's not clear how REST can avoid the WS-* debacle that SOAP went through. But many of the ten doubts really have been dispelled.

Posted by spout at 3:36 AM in the internet, web, web 2.0 and beyond

What is Software Architecture?

In reacting to my recent blog discussing the Uniform Interface as used in ROA via its reliance on REST, Roy Fielding posted a blog On software architecture. In it, Fielding states his opinion that people "don’t understand the differences between software architecture and implementation, let alone between architectural styles and software architecture." I'm not sure if that was directed at me specifically or not (I suspect it was), but I think it's unreasonable to introduce nuanced abstractions and then complain if people don't adopt them. Like any term of art, the meaning of "software architecture" is open to debate and individuals can propose their definitions and the community of practitioners will socialize what they find useful. But since the point was raised, I'm going to chime in and ask the question "what is software architecture" and I'm going to sketch my own answer. I don't see much value in abstractions above this, as there is too much vagueness at this level alone.

My job title has "software architect" in it, so I do think I'm entitled to an opinion on what such terms mean. In an enterprise, an architect spends his time deciding what aspects of software and system design should be standardized across the organization and what things should be left to the judgment of individual implementations. Our goal, like that of anyone else in the business, is to deliver value to the company as efficiently as possible. When we decide to standardize, we have to spell out what is and isn't allowed, and we need to defend the value proposition associated with making rules over tolerating divergent solutions. I view an architecture as a set of principles, where a principle is a function that maps the space of designs into a binary space of "yes/encouraged/allowed/true/conformant/good" vs "no/discouraged/rejected/false/nonconformant/bad". When I use or hear discussion of architectural "constraints", I think in terms of such functions.
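The "principle as a function" view translates almost directly into code. The following Python sketch is only an illustration: the two principles, the dictionary shape used for a design, and every name in it are assumptions made up for the example, not a real rule set.

```python
from typing import Callable, Dict, List

# A design is modelled, very loosely, as a bag of facts about it.
Design = Dict[str, object]
# A principle maps a design to conformant (True) or nonconformant (False).
Principle = Callable[[Design], bool]


def gets_are_safe(design: Design) -> bool:
    """Hypothetical principle: GET must not have side effects."""
    return not design.get("get_has_side_effects", False)


def publishes_contract(design: Design) -> bool:
    """Hypothetical principle: every service must publish a description."""
    return bool(design.get("contract_url"))


# An architecture is then just a named set of principles.
ARCHITECTURE: Dict[str, Principle] = {
    "GETs are safe": gets_are_safe,
    "publishes a contract": publishes_contract,
}


def violations(design: Design, architecture: Dict[str, Principle]) -> List[str]:
    """Return the names of the principles the design fails."""
    return [name for name, principle in architecture.items() if not principle(design)]
```

Calling `violations` on a proposed design answers the practical question an architect gets asked: which of our rules, if any, does this break?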

A business considers an architecture to be "good" if it leads to lower IT costs over time, balancing the need to actually ship working solutions against a shorter-is-better view of development time. Since this is evaluated "over time" you have to include maintainability, adaptability, and so on. What makes an architecture "good" is dynamically changing: it depends on the habits and knowledge of the individuals in the organization, and it depends on the extent to which consensus has emerged. These things aren't perfectly knowable, and architectural leadership is as much about communicating and guiding as it is about deciding. I think my concepts here of software architecture align a lot with those of Martin Fowler, as described in his essay Who Needs an Architect?.

There is a classic debate between academics and applied practitioners as to the value of standalone theory. I'm an unapologetic applied practitioner. When we introduce nuances like the difference between "architectural style" and "software architecture", I ask "who are you trying to communicate to?" When most people don't understand the distinction I wonder whose fault Dr. Fielding thinks that is and what we should do about it. At a minimum, I think it's fair to ask "why introduce a second level of abstraction?" Can't I reformulate a REST equivalent as a set of architectural principles in the sense I give above and take any proposed concrete design and say which, if any, of those principles it violates? Isn't the real problem here that we named the abstraction without having a few concrete architectures to justify it?

The use of language as a means to develop common understanding is an important step in the evolution of software architecture. Common understanding hopefully leads to communication about common experiences, and this in turn leads to consensus on what is or isn't a best practice worth standardizing upon. When I adopt language, I do so with this end in mind. I tried to sidestep the double abstraction by writing about ROA and not really about REST other than how it's used within ROA. I did this because over the past year or so, there has been an ongoing discussion influenced by Leonard Richardson and Sam Ruby's book RESTful Web Services. Part of my job is to critically examine all sides of architectural trends. I, along with my fellow architects, need to guide my company on how and when to adopt architectural principles. In order to do that, I need a name for the set of practices we should consider adopting. All I can say is that continued confusion over terminology and concepts is not a good selling point for REST or ROA or whatever I should call the thing we could adopt.

I note that Roy appears to lump me in as an "SOA advocate". I suppose it looks that way, but I'd like to make it clear that this is "for now" only. Again, what's "best" is influenced heavily by human factors such as understanding, consensus, industry support, and so on. I'm no SOAP fanboy: there's a whole lot to dislike about SOAP and the lack of simpler solutions for SOA wastes endless sums of money across the IT industry. In my view the race is on to see if SOAP can simplify before some particular REST approach matures enough to displace it. I would not be surprised at all if specs like WADL or WSDL 2.0 help move a RESTful web services approach forward. But lots of REST advocates seem to have a philosophical opposition to these and my message to them is unambiguous: within the enterprise, REST won't matter without these. If nothing changes REST will continue to be used to solve the problems of web communities. Solutions like Atom, where a critical mass of people with a common problem collaborate on standards and tools, will be the only arena where REST has an impact. Inside the enterprise where programmers toil one or two at a time under deadlines to integrate the backends of systems in environments where politics exist, SOAP will be what people like me conclude we need to use. Again, I said "if nothing changes". I hope it will, and when it does you'll read about it here. Dr. Fielding reads this as "criticisms of [his] work". That's a glass half-empty view. A gap is an opportunity to improve, and closing this one would make it more useful in the space I happen to work in. That space is a big one, but it's not the only one. I wouldn't be discussing REST at all if it didn't solve some problem well. Please note that my Atom feed for this blog works great and is joyously RESTful.

Finally, I'd like to point out that Dr. Fielding essentially restated my main point when he says "REST constraints do not constrain Web architecture — they constrain RESTful architectures (including those found within the Web architecture) that voluntarily wish to be so constrained." It seems really odd to recognize the value of a set of voluntary constraints without immediately asking how to declare that you have volunteered, so that interacting parties can rely upon it. This, of course, is exactly what a contract is. It would be nicer still, and downright useful, if there were an objective way to test that a site that declared itself to have some particular RESTful architecture actually did. In other words, it's also useful to audit compliance with the contract.

Posted by spout at 12:56 AM in the internet, web, web 2.0 and beyond

Thursday, 20 March 2008

ROA vs SOA: The Uniform Interface

Here's part 3 in my series comparing Service Oriented and Resource Oriented architectures. It follows part 1: Comparing ROA and SOA and part 2: ROA and SOA and Service Contracts. This article will focus on the second major difference between ROA and SOA from part 1: REST's uniform interface vs SOAP's rich operation sets (the other two being contracts and data/metadata differentiation). REST relies on the HTTP uniform interface, and prefers to do so with minimal overloading (i.e., don't put requested operations as parameters), whereas SOAP provides a rich set of domain specific operations, and requires an invocation POST to specify which operation to run from an endpoint specific, contractually specified set.

My conclusion is: REST was supposed to define a set of architectural constraints, but nobody is constrained by the uniform interface and many don't implement it or worse violate it. To identify those resources agreeing to abide is a contractual determination, not an architectural one.

Let's review what a uniform interface is, and then what the ROA uniform interface is. As a simpleton definition, a uniform interface means that for all things capable of responding to a request, the language to invoke the request is the same (aka uniform). This means the language is a system invariant as opposed to something that must be specified per target. This is important because it means that intermediaries can attach to the system and the semantics of any observed requests are transparent to the intermediary. This explanation is my own spin and interpretation. Consulting the authoritative source is (to me, YMMV) like reading a political speech: it says much that is often repeated by adherents but uses too many nebulously defined terms that at the end of the day don't convey any precise meaning. The uniform interface, as described by Fielding, is defined by four interface architectural constraints: identification of resources via a URI, manipulation of resources through representations, self-descriptive messages, and hypermedia as the engine of application state. Fielding later explains: "REST enables intermediate processing by constraining messages to be self-descriptive: interaction is stateless between requests, standard methods and media types are used to indicate semantics and exchange information, and responses explicitly indicate cacheability." Generally, I agree there is tremendous value in having a standard set of operations with externally known, target-independent semantics. The idea in ROA is that all resources in the system can receive GET, POST, PUT, and DELETE. Unfortunately, very few people actually implement these faithfully, and there are other HTTP operations like TRACE, HEAD, OPTIONS, and CONNECT which are even less faithfully implemented. Almost all of the internet fails to implement the uniform interface. Browsers don't actually support PUT and DELETE, by the way. Lots of resources even violate the normal meanings. As one example, ActiveMQ is a message broker that allows HTTP GET to destructively dequeue a message. Is this "bad"? Or useful?
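Why a destructive GET matters to intermediaries can be shown with a toy model. The Python sketch below is an assumption-laden illustration: it does not use ActiveMQ or any HTTP library, the classes are invented, and the cache is far cruder than a real HTTP cache (no expiry, no cache-control headers). It only shows what happens when an intermediary trusts that GET is safe and the origin does not honor that.

```python
class DestructiveQueueResource:
    """Toy origin whose GET removes a message, so GET is not safe here."""

    def __init__(self, messages):
        self._messages = list(messages)

    def get(self):
        return self._messages.pop(0) if self._messages else None


class CachingIntermediary:
    """Toy cache that stores GET results by URI, assuming GET is repeatable."""

    def __init__(self, origins):
        self._origins = origins
        self._cache = {}

    def get(self, uri):
        if uri not in self._cache:
            self._cache[uri] = self._origins[uri].get()
        return self._cache[uri]


queue = DestructiveQueueResource(["m1", "m2", "m3"])
proxy = CachingIntermediary({"/queue": queue})

first = proxy.get("/queue")   # reaches the origin and dequeues "m1"
second = proxy.get("/queue")  # answered from the cache, origin untouched
```

Both calls return the same message, so a consumer behind this intermediary keeps re-reading "m1" and never reaches the rest of the queue. The intermediary did nothing wrong by its own lights; it simply relied on semantics the target never agreed to.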

I hate to come back to the contract question again, but how exactly do I declare that I do faithfully use the full uniform contract of ROA? How do I shout from the rooftops that my GETs are safe? I suppose somebody could publish an Atom feed of all "known uniform" sites. The sad irony is that were such a way invented, it would express nothing more than that specific targets obey a particular interface (the uniform one). This is the problem with trying to avoid contracts - it doesn't work. The sad reality is that the uniform interface is not uniformly adopted. It'd be nice if it were, but alas, this is a pipe dream. It'd be nice if people wouldn't spam me, too.

Let's pretend that everybody gets religion and tries to follow the "spirit of the web". ActiveMQ only dequeues on DELETE. Even then, there's the problem that POST doesn't actually have a standard meaning that can be predicted in a resource independent way. It's the operation with side effects, right? If that's all we can say about it, that's lame. It's also false: there's no guarantee that POST must have side effects on any particular invocation. The REST crowd cries with horror at the "abuse of the web" that occurs when somebody creates a SOAP interface and does a POST with an operation parameter of getPurchaseOrder to return a purchase order document. The offender is often lectured about the HTTP Spec, sections 9.3 and 9.5, and the Axioms of The Web, and then referred to The REST Bible. Tim Berners-Lee and Roy Fielding's contributions to the sciences and useful arts deserve respect, but only because they try to explain their points without the sophistry of appeal to authority, something lost on most REST advocates. Back to the point: there is nothing in the HTTP spec section 9.3 or 9.5 that implies you cannot embed a "GET equivalent" inside a POST, as SOAP often does. One of the (many) enumerated functions POST was intended to cover is that of "providing a block of data, such as the result of submitting a form, to a data-handling process" and "The actual function performed by the POST method is determined by the server ...". To the Turing machine at the other end of your POST there is no meaningful distinction between such an allowable data block and one that invokes "operations" via a string identifier.

When you look at TBL's document above, you do find this axiom that appears to forbid the practice. His argument for the axiom is that "The introduction of any other method apart from GET which has no side effects is also incorrect, because the results of such an operation effectively form a separate address space, which violates the universality." The axiom of universality in two parts states "Any resource anywhere can be given a URI" and "Any resource of significance should be given a URI." Keep in mind that resources in the semantic web can be "...anything - real objects, abstract concepts." The idea here is that since you are "obviously" just getting a resource, you should give it a URI and GET it.

Unfortunately, to denounce getPurchaseOrder relies on your desires as client rather than what I as the implementer have agreed to. How exactly are you concluding, from the name only, that "getPurchaseOrder" has no side effect? The WSDL doesn't specify that this is true and one of the SOA principles, "autonomy", says the other details are encapsulated in a single point of control. Unless I explicitly gave you a commitment (and I didn't), you really have no such guarantee. How do you know that my getPurchaseOrder implementation doesn't increment a counter somewhere as a side effect? How do you know it doesn't assign a random sales engineer's name from a rotating pool on each request? You want getPurchaseOrder to be side effect free, but that desire is all you have. Here's the problem: functionality changes over time. I may not be willing to commit to getPurchaseOrder being side effect free for now and all times. Even if it is truly side effect free, that's just for today. Will it stay that way? Maybe it wasn't put in a GET so that I can have the flexibility to change the operation's side effects next year when we get serious about Sarbanes-Oxley. The rule has to be that the operation is guaranteed to NEVER have side effects. The "axioms of the web" really don't contradict embedding GET in POST if the side-effect free guarantee isn't established. And that is 100% up to me as implementer to determine. You as client can never verify the call is side effect free, as I don't have to tell you everything I do when I receive your call, and you can never observe "all resources" because of the open world assumption on the web.
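The argument can be illustrated with a small Python sketch of a POST handler that dispatches on an operation name, in the spirit of a SOAP endpoint. Everything here is hypothetical: the class, the request shape, and the two side effects are taken from the examples in the paragraph above, not from any real service or SOAP stack.

```python
import itertools


class PurchaseOrderService:
    """Toy endpoint: one POST entry point, operation chosen by a string."""

    def __init__(self, orders, engineers):
        self._orders = orders
        self._engineers = itertools.cycle(engineers)  # rotating pool
        self.request_count = 0                        # hidden from clients

    def handle_post(self, body):
        operations = {"getPurchaseOrder": self._get_purchase_order}
        return operations[body["operation"]](**body["arguments"])

    def _get_purchase_order(self, order_id):
        # Despite the "get" in its name, this operation changes server state.
        self.request_count += 1
        order = dict(self._orders[order_id])
        order["sales_engineer"] = next(self._engineers)
        return order


service = PurchaseOrderService({"po-1": {"total": 100}}, ["Ann", "Bob"])
request = {"operation": "getPurchaseOrder", "arguments": {"order_id": "po-1"}}
first = service.handle_post(request)
second = service.handle_post(request)
```

Two identical requests return different documents, and the counter moves each time. Nothing in the operation's name told the client that, which is why the guarantee of safety has to come from what the implementer commits to, not from what the caller infers.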

In summary, I wish the uniform interface of REST were actually adhered to by all HTTP resources. Even if it were, there is nothing nonconformant about SOAP. I think I could defend the proposition that SOAP is RESTful. Finally, in the real world, adherence to the uniform interface is not uniform, so there is no escaping the need for contracts to declare target specific interfaces.

Posted by spout at 10:58 PM in the internet, web, web 2.0 and beyond

Tuesday, 12 February 2008

The Need for Distributed Version Control in the Enterprise

I'm looking at software configuration management tools again. My last three projects at work have all used subversion. This was a big improvement over CVS, but I'm growing tired of dealing with needing to push everything back through a central server to share it with others (or to share it between my computers). I'm also finding that the pains of branching and merging really limit the pace of change. These are the problems that some of the new SCM tools like git and mercurial were created to solve. I'm convinced that enterprises need distributed version control.

It seems many people think tools like git and mercurial are only appropriate for large open source projects, and aren't necessarily relevant to the enterprise. I submit that enterprise development teams need the new tools just as much, if for slightly different reasons. Basically, as enterprise developers, we need to write code in a feature oriented way and then group completed features together into releases. There are two approaches to allocating features to releases: you either decide the features in a release up front and work until you are done, or you decide the time frame for a release, pull the completed features into it, and push incomplete ones to the next release. My thesis here today is that the latter method is better, but it really stresses your ability to manage branching and merging, and that tools like git and mercurial are simply better at managing this.

The big problem in software development (as everyone knows) is that nobody can figure out how to estimate the duration of a task very well. If you think you have a solution to this problem, slap yourself, because you don't. The variability in estimating means that doing the "feature scoped" method incurs heavy waste as all features in a release go out at the pace of the slowest one. So most developers finish their tasks well before the slowest and then have degraded productivity until the next release. These problems get worse the more features are in a branch.

The degraded productivity goes away if developers can push and pull features between releases. In fact, the cost of estimating wrongly drops if we can juggle features easily, because now we aren't holding up a bunch of working features when we estimate wrong.
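
Here is a rough arithmetic sketch of that waste, in Python. The feature names and durations are made up, and the model is deliberately crude: it assumes one developer per feature and that a developer whose feature is done sits idle until the release ships.

```python
# Hypothetical actual durations (in days) for the features in one release.
durations = {"search": 10, "export": 12, "billing": 30}


def idle_days_feature_scoped(durations):
    """Feature-scoped release: ships when the slowest feature is done."""
    ship_day = max(durations.values())
    return {name: ship_day - days for name, days in durations.items()}


def shipped_time_boxed(durations, release_day):
    """Time-boxed release: ships on a fixed day with whatever is complete."""
    return sorted(name for name, days in durations.items() if days <= release_day)


idle = idle_days_feature_scoped(durations)
total_idle = sum(idle.values())
on_time = shipped_time_boxed(durations, release_day=14)
```

With these made-up numbers, the feature-scoped release holds two finished features back for 38 developer-days in total, while a release cut on day 14 ships both of them and defers only billing.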

But you can start to see how all these problems rely on nimbleness in source control tools. Branching, merging, and conflict resolution have to be easy, quick, and simple. At any given time you might have a released branch, a branch under QA/test, and usually several different release branches in development: the next few minor releases and also the next major release. A feature's code can be juggled around among the unreleased versions. For example, if we find a bug, we want to apply the fix to all in-process development branches. We create a branch just for this bug, maybe, and merge its changes into all the other branches. Often we juggle a feature out of one branch and into another because it isn't ready in time. Whenever we create multiple development branches, we have to merge the earlier ones into the later ones a lot.

The fact is that merging is extremely painful in subversion for several reasons. First, subversion doesn't really think in terms of branches. The directories in your /branches folder are really nothing more than ordinary directories. Worse, subversion doesn't help you at all in terms of keeping track of completed merge operations. Keeping track of merged revisions manually simply doesn't scale - the process becomes error prone, and anyone who has done a lot of branching and merging in subversion has stories to tell on the theme of "oops". In fact, if branching were really cheap it might be better to create a branch per feature. This would let release managers wait until the last possible moment to merge changes: they'd look in the ticket tracking system, see which ones are "ready for merge" and pull them in, or better yet have the feature developer do it, resolving conflicts first. Many of us are used to doing this with patch files, but the problem there is that an ordinary diff can't understand directory structures and how things in different places really are different versions of "the same" thing. And there's also the problem of the directory structure itself.

As the number of branches goes up, there will be a natural desire for peer-to-peer merging capabilities. Your peer has a bug? Pull his branch. If two people collaborate on a feature, why make them talk to a middleman to share changes? Especially when those changes may be broken or half-baked? Ever been annoyed by doing a subversion update and getting a bunch of changes you didn't want yet? Worse, ever seen one that broke everything? What if you only pull in things you want, when you are ready and know it's coming? What if half the time you don't even have to do that because the other guy is pulling your stuff first? Now distributed SCM seems a lot more reasonable. What if you have multiple computers (home/work or PC/laptop or both)? Wouldn't it be nice to share changes with yourself, even if they don't even compile at the moment? Do it in one step instead of two. Oh, and now you and your coworker can collaborate outside of the office. So go to a cafe and get some work done together.

It looks like Git and Mercurial both have a compelling improvement to offer enterprise development shops. Git is slightly faster and has a more comprehensive command set, but Mercurial's commands will be more familiar to SVN/CVS users and Mercurial has good Windows support. Lots of big projects are using these tools: Linux uses Git, Java uses Mercurial. We're starting to see more medium and small shops use these too. Which is better? They're both good, and there's a tradeoff of power vs simplicity and speed vs portability. Pick the first option in either tradeoff and you should look at Git; pick the second and you should use Mercurial.

Posted by spout at 3:02 AM in the internet, web, web 2.0 and beyond

Monday, 21 January 2008

Java on the Decline? Think Again

It seems that everywhere I look people are taking pot shots at Java, declaring it passe and acting like it's on its death bed. What a nice thing it is to see unbiased measurement of language popularity from Tiobe. That link is to the January 2008 Programming Community Index.

Here's what their unbiased measurements actually show:

  • Java is the most popular language, capturing 20.8% of the popularity
  • That lead increased over second place C (now at 13.9%) by 3.6% over a year ago
  • Java's absolute popularity increased from a year ago by 1.7%
  • Ruby's popularity is 2.3% ranking it 11th
  • Python was the biggest gainer over the last year, growing 2.0%, giving it 6th place
  • Python overtook long time rival Perl for the first time this year
  • Visual Basic (now 3rd) was the second biggest gainer, Java was 3rd
  • Ruby's popularity actually fell 0.17% over the last year
  • Perl, ranking 7th, is still more than twice as popular as Ruby
  • C lost the most ground at -1.89%
  • C++ lost the second most ground at -1.7%
  • C++ fell to 5th, as it was overtaken by both Visual Basic and PHP
  • PHP gained 1.25% in popularity over the year
  • Java is more popular than VB and C# combined (aka .net) which total 15.7%
  • Delphi at 3.3% surged ahead of both Javascript and Ruby.
  • Lua ranks 16th, up from 46th
  • Groovy ranks 31st
If these numbers surprise you, maybe you need to be more skeptical when you read blogs.

Posted by spout at 1:04 AM in the internet, web, web 2.0 and beyond

Thursday, 15 November 2007

GCal & Lightning/Thunderbird

I'm the kind of guy who adopted linux as my primary desktop years ago. I've been lucky enough to use it at work exclusively for almost two years, and I've used it at home exclusively for five or six years. Windows actually feels really strange to me now. One issue that has been a big pain at work was calendar support. Finally I think I've found something that makes me happy: I'm using Thunderbird with Lightning and the Provider for Google Calendar. A good write up on setting this up is here.

Finally I think I have a good setup that meets all my needs. First of all, my Calendar has to live in my email program, so I can get meeting invitations and hit "accept" and it's on my calendar. There are a lot of email programs for linux that I don't like, so this was a limiting factor for a while. I need one with good support for multiple mailboxes and lots of features. I also want to be able to get to my calendar from anywhere. I use a VPN at home a lot to read my email as I somewhat dislike webmail. I want to not care which box I'm "accepting" the meeting message on. Also, I want to have a browser bookmark to my calendar or be able to navigate to it from somebody else's browser if I don't have my computer.

I actually think this combination has advantages over Outlook (besides being open source and working on any operating system). GCal and Lightning finally solve my calendar issue, so I can cross one of my biggest annoyances off my list. Now if I could just get an open source driver for my ATI graphics card, I'd be very happy.

Posted by spout at 7:31 PM in the internet, web, web 2.0 and beyond

Sunday, 11 November 2007

Yes you do need a Product Road Map

You've got to love David Heinemeier Hansson. He may be the new Marc Fleury: always ready to give you his opinion on why everything you know and do is wrong. His latest: a little essay on why product road maps are harmful.

While it's good to question the sacred cows, I'm not so sure that 37signals has really sustained growth over a long enough period of time to deserve any special credibility in this arena. There is an arrogance that often comes to upstarts who produce some innovations and get market success, and I suspect DHH is surfing in the wake of his own big head on this one. You would think that if it actually worked to ignore customer input and just innovate, it wouldn't have taken until 2007 to discover it. Certainly there has been no shortage of really smart people who tried to ignore their customers (I just can't remember any who were successful over the long haul). Yet it seems like the old rich guys all say the same thing: listen to your customers like they are God, because they are.

The problem with DHH's argument is that it is a cynical straw man. He assumes that having a road map and peddling vaporware are the same thing. They are not. The fact that road maps can be and often are abused by "bullet point" management only proves that the abuse is bad, not the tool being abused. Good companies almost by definition execute well on their road maps.

I suspect "most people can make do without [a road map]" is the kind of idea that only occurs to small companies who are growing really fast. While you have the big growth, you tend to take customers for granted, though most don't have the nerve to celebrate it like DHH. If you lose a few customers in that situation, who cares: you can't grow fast enough to accommodate them all anyway. Unfortunately, the problem with this is that it never lasts and sooner or later you'll wake up and find you aren't the hot thing anymore, and you'll realize that your customers have no loyalty because you've treated them like crap for years in terms of discussing their needs, desires, and goals. Once the market buzz has died, it's a whole lot better to be eBay than AOL.

DHH compares software product road maps to big upfront software design in the derided waterfall software development methodology. By the way, I recently saw Dan Pritchett of eBay say that they use waterfall. He was almost apologetic about it, during a talk about how they solve their insane scalability problems, but I digress. Even assuming that big design up front is bad for software design, I disagree with the point. Product management and software design are not at all similar. Examine java: it's hardly developed waterfall style. Most of the agile principles out there originated or found early homes in the java community. Yet java has a roadmap created by the Java Community Process and Sun's (and IBM's and BEA's and Oracle's) business strategies. The road map serves as a direction to unify cooperation. It makes the strategy tangible.

Want to know what's going to be in Java 7? It's there in detail. Want to know what's going to be in Java 8? It's there too. See something you like and want to help, click the link. For example, JVM support for dynamic typing is coming to help scripting languages, which will make groovy and jruby substantially faster. Did Sun have a road map when they hired the jruby team? Yes, you bet they did. And jruby is already at par performance-wise with ruby, and Glassfish will leverage this to be the most scalable app server for rails. Did they have a road map when they added ruby support to the Netbeans IDE? Yes, you bet they did. Did Java have a road map when they created groovy? Yes, it's called JSR-241. If DHH really believes that road maps are harmful, I suspect he's happy that there seems to be a road map within the java community of equaling or bettering all of the purported innovations of RoR within the java stack. Sooner or later, no matter who you are and how gifted, one of your competitors will innovate and jump ahead of you in some area. What do you do then? If you are smart and humble, you put closing the gap on your road map and rally your people on how to do it. If you want to lead, you have to tell people generally where you are going, unless you want to attract sheep. Maybe you do, but sheep aren't that lucrative.

Nobody can sustain their business on ground breaking innovation year after year. Do companies that innovate more do better? Yes. Can innovation jump start you from where you are? Yes. Is innovation the only thing that matters, long term? No. Once you've innovated, you've got to get and keep customers. Tivo is a great example of an IT company that missed this point. The jury is still out on whether Rails has any long term staying power. I predict Rails is more like Tivo than like the Macintosh, but we'll see.

The simple fact is that people don't spend money on non-commoditized things unless they have confidence in the road map. That includes both customers and investors. I have personally been part of several large dollar purchases of software (the kind with more than one comma in the price), and you can be damn sure we asked vendors about their road maps and factored them into our decisions. The road map tells you whether they understand you or not. If you don't want me as a customer, don't have a road map.

Posted by spout at 4:26 AM in the internet, web, web 2.0 and beyond

Thursday, 6 September 2007

ROA and SOA and Service Contracts

I study service description contracts as a key difference between REST and SOAP, as part of my series comparing Resource Oriented Architecture and Service Oriented Architecture. In the previous article on ROA and SOA I examined what ROA and SOA are as objectively as I could. Now it's time to state some opinions on how the differences play out in an enterprise setting by studying the practical considerations regarding service description contracts when adopting ROA or SOA. These are non-technical considerations that an enterprise has to think about when deciding on what to adopt.

In the previous article, I found three key differences between ROA and SOA:
  1. Contractual vs Not - SOA offers contractual, discoverable interfaces, while ROA expects designers to solve problems as they see fit.
  2. Rich vs Uniform Behavior - ROA has the uniform interface applied with minimal overloading, while SOA provides rich, discoverable behavior
  3. Differentiation vs Equivalence of Data and Metadata - SOA relies on metadata to inform behavior, while ROA uses connectedness and design time conventions for representations and URI's to blur data and metadata together.
Today I study the value of web service description contracts, in a typical enterprise development setting.

My conclusion: I think enterprise settings mostly need service descriptions. A contract levels the playing field between service provider and client, which is important when organizational politics might get involved. Otherwise the provider has all the power and the proliferation of unstandardized web service providers in an enterprise setting will take a huge toll. Lack of a contract certainly gives service providers freedom. That's great if you are a provider as there's no one to answer to. It's terrible if you need to be the client of 20 different kinds of web services, since basic questions like "what URL do I POST to and what do I POST to get a list of X's with property Y" will be documented 20 different ways. While efforts like WADL and NSDL offer solutions, these have to be institutionalized into ROA for it to be viable in the enterprise, and frankly the REST people resist contracts for reasons that appear cultural. The sad thing about all this is that REST is not incompatible with contractual service descriptions; there just are no warmly embraced standards for them that correspond to WSDL in SOAP.

Recall with horror how hard it is to maintain an application if SQL has leaked throughout the layers, especially to my view code. Somebody moves an entity attribute to another table and "ack!" we've got tons of duplicated code to fix. Now consider the weak correspondence between GET, POST, PUT, DELETE and the CRUD operations. Now consider that the URI query parameters are your new form for expressing WHERE clauses, and that, unlike SQL, web services are a distributed technology so you don't control the analog of the database schema. Now consider that you can't encapsulate these URI's because they are designed to be universal addresses. Links could be in your source code, in your data, or in your end user's bookmarks and emails. REST depends a little too much on links not breaking and POST parameters not changing to NOT express this in a contract.
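
To illustrate the analogy between query parameters and WHERE clauses, here is a small Python sketch. The host, path, and parameter names are hypothetical. The filter lives in the URI itself, so renaming a parameter on the server breaks every stored copy of the link, much as moving a column breaks leaked SQL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_uri(base, **filters):
    """Build a URI whose query string plays the role of a WHERE clause."""
    return f"{base}?{urlencode(sorted(filters.items()))}"


# Roughly: SELECT * FROM orders WHERE region = 'emea' AND status = 'open'
uri = build_query_uri("http://example.com/orders", status="open", region="emea")

# Any client, bookmark, or email holding this URI depends on these names.
stored_params = parse_qs(urlsplit(uri).query)
```

Nothing in the URI tells the holder which parameters exist or what values they accept; that knowledge has to come from somewhere outside the link.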

In order to use any web service, the caller must know the technical details about how to invoke it. WSDL helps SOAP solve this problem. The WSDL spec is admittedly a bit over-engineered, and solves a much broader, more abstract problem than what SOAP over HTTP alone requires. The bloat is a minor flaw, and is greatly overstated by detractors. I often look at raw WSDL in an editor.

But really, who cares how ugly it is or isn't? As long as the tools eat it, I'm happy. And WSDL is very well tooled because the IT industry invested heavily in SOA. Most of the kinks have been worked out by now, so that if you follow the WS-I recommendations, you get great interoperability that's easy to use. Stick the WSDL in your favorite SOAP stack, whether that's one of a dozen in Java, or C#, Ruby, Erlang or whatever. And then forget about it. Visual WSDL explorers are appearing, like the one the Eclipse web services plugin provides.

I hear the REST people saying "how hard is it to GET and POST to the URI - why do we need a web service description?". It's not very hard as long as you know everything in this list:

  1. The server that hosts the service
  2. The URL path structure to the resource you want
  3. The required and allowable query parameters after "?" in the URL
  4. What values are allowable for each parameter
  5. What type of response you will get back
For an example, let's look at GData atom feeds from Google. Items 1-4 are defined by the GData API Protocol Reference and item 5 is defined by the Atom RFC. Google and ATOM have done a great job of defining a nice RESTful architecture, and I expect it's pretty efficient and I think they got it right. We'll see below they're in the sweet spot for using ROA.
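
As a sketch of what knowing "everything in this list" amounts to, the five items above can be written down as a small data structure in Python. All names and values below are hypothetical and are not taken from the GData reference. The point is only that, without an agreed contract format, each provider invents its own way of conveying these five facts.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class ResourceDescription:
    host: str            # 1. the server that hosts the service
    path_template: str   # 2. the URL path structure to the resource
    parameters: dict     # 3 and 4. query parameters and their allowed values
    response_type: str   # 5. the type of response you will get back

    def build_url(self, path_args, query):
        # Reject anything the description does not allow.
        for name, value in query.items():
            allowed = self.parameters.get(name)
            if allowed is None:
                raise ValueError(f"unknown parameter: {name}")
            if value not in allowed:
                raise ValueError(f"bad value for {name}: {value}")
        path = self.path_template.format(**path_args)
        query_string = urlencode(sorted(query.items()))
        suffix = f"?{query_string}" if query_string else ""
        return f"http://{self.host}{path}{suffix}"


# A made-up feed description, loosely in the spirit of a blog feed API.
feed = ResourceDescription(
    host="example.com",
    path_template="/feeds/{user}/posts",
    parameters={"alt": {"atom", "rss"}, "orderby": {"updated", "published"}},
    response_type="application/atom+xml",
)
url = feed.build_url({"user": "spout"}, {"alt": "atom", "orderby": "updated"})
```

A client of 20 services needs 20 of these, and in the ROA world each one has to be transcribed by hand from prose documentation.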

But let's look at what it took to create this solution and ask if we can replicate this in the enterprise when we serve up enterprise data instead of blog feeds. Instead of an instance of a WSDL and a discoverable XML Schema contained in it, we have to find something that corresponds to the ATOM spec itself, which stands on its own 42 pages (and ATOM is a VERY simple mechanism). ATOM is nicely written and readable, but every ATOM tool maker should read and understand it because the spec is all you get.

ATOM is a widely known enough standard to have attracted tool makers to help. The problem is that that only helps when I make web services for things like ubiquitous blog feed formats backed with open standards. The tooling that chews on WSDL applies to *ALL* SOAP services. That's the difference without contracts: every service with a different payload paradigm is a one-off. I think this hints at an economic reality: ROA without service description contracts is indicated when the economics favor a very large, diverse set of web service clients that can collaborate on tooling up with format-specific solutions.

There's one special case that deserves comment: if the ROA web service supplies clients (such as a UI) within the same development team, then you can get reuse of tribal knowledge of the payload format. In such settings, you probably can be agile with what the web service provides, so long as no other parties are depending on it for critical business processes. I think of this as ROA in the MVC layer and not really as enterprise web services. This kind of ROA might often prefer JSON over XML and dazzle us with mash-ups. In summary, ROA without contracts works well within shouting range or globally, but not as well as SOAP/WSDL in between. Another way to state the same thing: if organizational politics are involved, you need contracts governing your web services.

Also, don't forget that to use a specific web service like GData, the GData API style information has to be created, maintained, communicated, and understood. These are all costs accrued per resource type. To be fair, before you call a web service to make a state change you need to understand what is going to happen in a way code reading the WSDL can't help with. But WSDL provides a natural place to describe the operations in human readable form, though admittedly like any form of code documentation, getting developers to create it will be a challenge in an enterprise setting. Still this improves over REST which offers no convention at all for a natural home for the description of what to POST to a given URL.

The key difference is this: with ROA, somebody has to create written specs like the GDATA API and the ATOM standard FOR EVERY distinct resource type. A resource provider might choose to use the same XML marshaling tools as in SOA to create XML Schemas or DTD's, or they might decide validation is too restrictive and force clients to read their stellar natural language description, like the ATOM team did. To describe what URL's GET or POST what entities, you have no help. REST is a programming language that hides the input and output types of its call parameters and forces you to read documentation to know what they are.

In a corporate setting most of these specs corresponding to the GData API have to be created in house and would be maintained as documentation (and we all know how well internal IT shops do at maintaining documentation). Contrast this with WSDL that is going to roll out of your tools and be definitive, since it's a code level artifact.

Also consider what happens if you have 10 different RESTful services to deal with as a consumer. For each of them, you have to figure out all five of the elements above. Every service will have different semantics for invoking it and dealing with the response. That is, unless you adopt (and enforce) a standard for how to express these within your enterprise. Then the question arises: instead of bolting SOA features onto ROA by spending your company's IT time and effort for no business value, why not just start with SOA and get the benefit of the tooling created by the massive investments of Microsoft, IBM, Apache, Sun, Oracle, and so on? For this reason alone, I don't think ROA is ready for the enterprise in 2007. But let me be clear: this is a solvable problem and it only applies in settings where organizational interactions come into play.

Next time, I'll talk about the Uniform Interface vs rich behavior, and here, I think I see some wisdom in the ROA position.

Posted by spout at 7:56 PM in the internet, web, web 2.0 and beyond

Wednesday, 5 September 2007

Comparing ROA and SOA

As part of my ongoing look at enterprise architecture in 2007, I want to focus on the commonalities and distinguishing features between Resource Orientation and Service Orientation, comparing ROA and SOA. I'm withholding judgement as best I can, to try to be as objective as I can about what these two approaches to web services are. Accordingly, I'm not looking at practicalities. Those clearly matter, but they are separate and distinct from the purely technical differences. Practicalities change over time, but the technical pieces probably don't. I'm considering SOA implemented as SOAP web services, in angry defiance of people who say SOAP is just one implementation mechanism of SOA. While that's reasonable language, it has little value to me in the real world to continually say "SOAP based SOA" instead of just "SOA". There's no other widely adopted implementation of SOA with market and mindshare significance. ROA, on the other hand, is a new term for a particular perspective on REST. The flag bearer for ROA is the book RESTful Web Services by Richardson and Ruby and I follow their interpretation of ROA as faithfully as I can.

The principles of SOA are pretty well known:

  1. Open Standards - HTTP, XML, XML Schema, SOAP, WSDL, various WS-*
  2. Loose Coupling - Service consumers do not couple to implementation platform, but instead rely on open standards or metadata.
  3. Reusability - Services are intended to be highly reused. The goal is a single authoritative service, meeting the needs of all consumers.
  4. Contractual - Services adhere to an interface agreement artifact (a service description) that is functionally definitive of the interface.
  5. Discoverability - The service contract is an outwardly descriptive and externalizable metadata artifact suitable for forwarding by 3rd parties to potential service consumers
  6. Autonomy - Services have control over the implementation of logic and technical machinery they encapsulate.
  7. Statelessness - Services generally do not maintain conversational state. Every call is self contained.
  8. Rich Behavior - The service contract exposes many kinds of method invocations and associates them logically with the data structures described in the service description.
  9. Metadata - Both requests and responses have a clear delineation between metadata and payload.

The principles of ROA, as presented by Richardson and Ruby are easily understood:

  1. Open Standards - HTTP, often XML, possibly XML Schema or DTD
  2. Loose Coupling - Service consumers do not couple to implementation platform, but instead rely on open standards.
  3. Universal Addressability - Resources have a URI which serves as a single global name that defines their identity.
  4. Uniform Interface - All resources support a small and standard set of behavior methods (GET, POST, PUT, DELETE). Often only GET and POST are used.
  5. Behavior Conventions - GET never has side-effects, and anything without side-effects should use GET. PUT and DELETE should be idempotent (repeating them has no additional effect).
  6. Statelessness - Services generally do not maintain conversational state. Every call is self contained.
  7. Representations - A resource can have tangible representations as bits in various MIME types, but the resource is different from its representations. Representations of a resource may vary with time, and the set of available representations can also grow and shrink over time.
  8. Connectedness - A resource can (and should) reference other resources by name (URI) in the representation.
  9. Autonomy - Resource providers have control over the implementation of logic and technical machinery they encapsulate, subject only to the uniform interface behavior conventions.
  10. Minimal Overloading - parameters to state changing operations should not be used solely to select different behaviors. Heuristically, if an operation parameter could be converted to part of the URI, then it should be.
OK, so ROA and SOA each have open standards, a loose coupling concept, autonomy, and statelessness. Those are probably properties of "Web Services", which generalizes both.
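
The behavior conventions in the ROA list above (safe GET, idempotent PUT and DELETE) can be sketched with a toy in-memory store in Python. The class and the URIs are hypothetical. POST is included as the contrasting non-idempotent case, which is the usual HTTP reading rather than something stated in the list.

```python
class ResourceStore:
    """Toy in-memory resource server; URIs are plain dictionary keys."""

    def __init__(self):
        self.resources = {}
        self._next_id = 1

    def get(self, uri):
        # Safe: reads state and never changes it.
        return self.resources.get(uri)

    def put(self, uri, representation):
        # Idempotent: repeating the same PUT leaves the same state.
        self.resources[uri] = representation

    def delete(self, uri):
        # Idempotent: deleting an absent resource changes nothing further.
        self.resources.pop(uri, None)

    def post(self, collection_uri, representation):
        # Not idempotent: each POST creates a new subordinate resource.
        uri = f"{collection_uri}/{self._next_id}"
        self._next_id += 1
        self.resources[uri] = representation
        return uri


store = ResourceStore()
store.put("/orders/42", {"item": "widget"})
store.put("/orders/42", {"item": "widget"})         # repeat: no additional effect
after_puts = dict(store.resources)

first = store.post("/orders", {"item": "gadget"})
second = store.post("/orders", {"item": "gadget"})  # repeat: a second resource

store.delete("/orders/42")
store.delete("/orders/42")                          # repeat: no additional effect
```

Note that the store only honors the conventions because its author chose to; nothing in the interface itself enforces them.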

Here are the key differences between SOA and ROA as I see it:

  1. Contractual vs Not - SOA offers contractual, discoverable interfaces, while ROA expects designers to solve problems as they see fit.
  2. Rich vs Uniform Behavior - ROA has the uniform interface applied with minimal overloading, while SOA provides rich, discoverable behavior
  3. Differentiation vs Equivalence of Data and Metadata - SOA relies on metadata to inform behavior, while ROA uses universal addressability, connectedness and design time conventions for representations and URI's to blur data and metadata together.
Let's examine these in depth.

In SOA, the client and server have a contract that specifies the service operations, endpoints, and input and output data formats. Typically the artifact for expressing the contract (WSDL) is closely tied to the physical implementation of both the service and the client, so that it is expected to be definitive. In ROA, there is no contract guarantee, and the interpretation of resource address mechanisms and response types must be based on knowledge not contained in the system. Often external standards for particular sets of resources arise. These can be open standards like RSS or ATOM, or local standards like a corporate schema or DTD for the output. The local standard might require the resource to self declare its format, or it might not. There is no standard for how to structure these standards, so resource designers are free to meet needs as they arise, but resource consumers have little assurance over such designs.

The second difference is the types of supported operations. SOA allows a service to offer a rich set of discoverable operations. ROA has specified the operations as a small fixed set with standard meaning. Where an SOA service may be a logical grouping of many operations, each with an independent name within the service, ROA flattens each operation out to its own URI. The logical grouping might exist within the path of the URI. The minimal overloading principle counsels the implementer to refactor an ROA resource that masks rich behavior within its parameters, so that the URI fully differentiates separable operations.
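
A minimal sketch of that refactoring, in Python. The parameter name "action", the paths, and the helper itself are hypothetical; this is one way to read the minimal overloading heuristic, not a rule taken from Richardson and Ruby.

```python
def flatten(path, params, selector="action"):
    """Move a behavior-selecting parameter out of the parameters and into the URI."""
    remaining = dict(params)
    action = remaining.pop(selector, None)
    if action is None:
        # Nothing is overloaded; leave the resource as it is.
        return path, remaining
    return f"{path}/{action}", remaining


# Overloaded: one URI whose behavior depends on a parameter.
overloaded = ("/orders/42", {"action": "cancel", "reason": "duplicate"})

# Flattened: the separable operation gets its own URI.
flat_path, flat_params = flatten(*overloaded)
```

After flattening, the URI alone tells you which operation is being addressed, and the remaining parameters only carry data.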

The final critical difference is that SOAP messages have an envelope which contains a header and a body. The header concerns information about the handling of the message, while the body is the message itself, which must comply with the validation rules for well formed messages defined by the contract. ROA, in contrast, simply treats metadata as part of the resource representation, possibly leveraging connectedness and universal addressability to link to other resources that describe the original resource. It's up to the conventions and standards surrounding each resource to define the expected behavior associated with it.
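
To show the header/body split concretely, here is a short Python sketch that builds a SOAP 1.1 envelope with the standard library. The envelope namespace is the SOAP 1.1 one; the application namespace, the CorrelationId header, and the getPurchaseOrder payload are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/orders"                   # hypothetical application namespace

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("ord", APP_NS)

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

# Metadata about how to handle the message goes in the header.
ET.SubElement(header, f"{{{APP_NS}}}CorrelationId").text = "abc-123"

# The payload governed by the contract goes in the body.
request = ET.SubElement(body, f"{{{APP_NS}}}getPurchaseOrder")
ET.SubElement(request, f"{{{APP_NS}}}orderId").text = "PO-1"

xml_text = ET.tostring(envelope, encoding="unicode")
```

In an ROA design there is no such envelope: the equivalent of the correlation id would travel in HTTP headers or inside the representation itself, by whatever convention the resource provider chose.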

This is my attempt to simply define what ROA and SOA are and to observe their common and differentiating characteristics as objectively as possible. In a future article I'll get into dynamic qualities and practicalities, and someday I'll try to combine the technical with the practical and draw conclusions about the sweet spots of each.

Posted by spout at 7:26 PM in the internet, web, web 2.0 and beyond

Saturday, 1 September 2007

Architectural Paradigms 2007

At work we are going through a period of architectural planning, focused around integrating our internal systems. There seems to be a mix of architectural paradigms in fashion these days, and I intend to explore them over a series of posts here. In my estimation the reasonable architectural paradigms to advocate in 2007 are:
  • Service Oriented Architecture (SOA), typically using SOAP as the dominant web service. This is probably the leading architecture motivating IT spending these days.
  • Event Driven Architecture (EDA), typified by asynchronous message notifications and message buses. Messages are often XML or serialized objects.
  • Resource Oriented Architecture (ROA), typically using REST based web services, often with XML payloads. This is a relative newcomer and represents a blend of a backlash against SOA and a progressive fundamentalism over the semantic web.
Ones I didn't include, and why:
  • Event Stream Processing (ESP), stateful analysis of streaming messages turning SQL on its head by storing queries and issuing data. This is just too new, but it's definitely an exciting area to watch. Maybe in 2008 the big debate will be over stateful vs stateless bus architectures.
  • Enterprise Service Buses (ESB). Many see EDA and web services as complementary and wrap Enterprise Service Buses around both to handle routing, transformation, and encapsulation of providers and consumers. I see an ESB as an enabling ingredient, but not an architectural paradigm that stands alone. Bobby Woolf, co-author of Enterprise Integration Patterns, agrees. That said, a good question is: "When should I mix an ESB into my enterprise architecture?". I may blog on that separately.
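
The "storing queries and issuing data" idea in the ESP item above can be sketched in a few lines. This is a toy, not how any particular ESP product works: the `StandingQueries` class, the `large_orders` query, and the event fields are hypothetical, and a real engine would also keep state such as time windows and running aggregates.

```python
class StandingQueries:
    """Store queries up front, then push each event through all of them."""

    def __init__(self):
        self._queries = []  # list of (name, predicate, callback) tuples

    def register(self, name, predicate, callback):
        """Store a query; it stays in place and waits for data to arrive."""
        self._queries.append((name, predicate, callback))

    def publish(self, event):
        """Issue one piece of data against every stored query."""
        for name, predicate, callback in self._queries:
            if predicate(event):
                callback(name, event)


matches = []
engine = StandingQueries()
engine.register(
    "large_orders",
    lambda event: event["amount"] > 1000,
    lambda name, event: matches.append((name, event["id"])),
)

# Hypothetical event stream, invented for this example.
for event in [{"id": 1, "amount": 50}, {"id": 2, "amount": 5000}]:
    engine.publish(event)

print(matches)
```

Compare this with a database, where the data sits still and the queries come and go.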
OK, so the big question to resolve is SOA vs ROA for web services and then when to use messaging instead of web service calls. Maybe with good internal architectures, web services manifested as both SOAP and REST simultaneously are possible. Amazon S3 does this, for example.

SOA says to deploy solutions embodied within services that are reused at the system level. Services in the flavor I'm considering are generally constructed using XML and XML Schemas, and interoperate using SOAP and WSDL. WSDL really ties the whole thing together, providing a technical description of the operations of the service and the XML inputs and outputs of each such operation. WSDL also provides the endpoint information. Doing SOAP right requires tooling - nobody writes stuff by hand. Good tools take a WSDL document and generate everything you need. SOAP functions by taking an XML document, wrapping it in an envelope, and decorating it with headers. Operations, even simple ones, are generally going to take XML input and produce XML output.
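
Just to show the shape of the wrapping step, here is a hand-rolled sketch using Python's standard `xml.etree.ElementTree`. In practice the WSDL tooling generates this for you. The namespace is the SOAP 1.1 envelope namespace; the `GetQuote` payload, the `TraceId` header block, and the `wrap_in_envelope` helper are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def wrap_in_envelope(payload, headers=()):
    """Wrap an XML payload element in a SOAP Envelope with optional header blocks."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    for block in headers:
        header.append(block)  # handling information travels in the header
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)  # the message itself travels in the body
    return envelope


# Hypothetical operation input and header block, invented for this example.
request = ET.Element("GetQuote")
ET.SubElement(request, "symbol").text = "XYZ"
trace = ET.Element("TraceId")
trace.text = "abc-123"

ET.register_namespace("soap", SOAP_NS)
envelope = wrap_in_envelope(request, [trace])
print(ET.tostring(envelope, encoding="unicode"))
```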

ROA is a particular template for a RESTful web service, using URIs, HTTP, and XML. ROA names resources with URIs and accesses them via HTTP in a way that emphasizes conformance with the standard operations for HTTP. ROA adopts four principles: addressability, statelessness, connectedness, and the "uniform" interface of HTTP operations.
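
A toy model helps me keep the four principles straight. The sketch below is an in-memory stand-in for a server, not a real HTTP stack; the `ResourceStore` class and the `/orders/1` and `/customers/7` URIs are hypothetical. I left out POST to keep it short.

```python
class ResourceStore:
    """Toy in-memory server: every resource has a URI, every operation is an HTTP method."""

    def __init__(self):
        self._resources = {}  # URI -> representation (addressability)

    def handle(self, method, uri, body=None):
        """Return (status_code, representation). Each call carries everything it
        needs; no per-client session is kept between calls (statelessness)."""
        if method == "GET":
            if uri in self._resources:
                return 200, self._resources[uri]
            return 404, None
        if method == "PUT":
            created = uri not in self._resources
            self._resources[uri] = body
            return (201 if created else 200), body
        if method == "DELETE":
            if uri not in self._resources:
                return 404, None
            del self._resources[uri]
            return 204, None
        return 405, None  # anything outside the fixed uniform set is rejected


store = ResourceStore()
# Connectedness: the representation links to another addressable resource.
order = "<order><link rel='customer' href='/customers/7'/></order>"
print(store.handle("PUT", "/orders/1", order))
print(store.handle("GET", "/orders/1"))
print(store.handle("DELETE", "/orders/1"))
```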

I'll be exploring this in depth over the next few posts.

Posted by spout at 6:12 PM in the internet, web, web 2.0 and beyond

Thursday, 30 August 2007

ISO: Standards For Sale - No Standards for Standards

It turns out that Microsoft has admitted, after being caught red handed, to buying votes in Sweden for their OOXML proposed standard. See the groklaw article for details.

Sweden has invalidated its vote, concluded that there isn't time to revote, and abstained. The net result is that a NO vote was turned into ABSTAIN, netting MS half a point. If ISO approves this standard, I will lose all respect for ISO, as it means that there are no standards for standards.

Posted by spout at 3:22 PM in the internet, web, web 2.0 and beyond

Sunday, 26 August 2007

MS OOXML and ECMA 376 are a Sham

Microsoft, in response to recent pushes by government entities who want an open standard for office document formats, has documented OOXML and submitted it to ECMA and ISO as an office document format standard. Unfortunately, the proposed standard has been carefully crafted by Microsoft to provide the marketing buzzword benefit of having an approved standard without actually conferring any of the benefits of open standards. They do this by two methods: (A) the standard is too complex to implement from scratch and (B) complying with the standard will not confer the benefit of interoperability with MS Office, which is the sole purported benefit of this standard beyond the already existing ODF standard.

I'll discuss mainly the second point. Let me be clear: MS Office is not, nor will it be, interoperable with a theoretical OOXML implementation. Here's the crux of the matter: the OOXML document submitted to the ECMA and ISO standards processes does not describe what Microsoft Office implements, nor will it ever. The submission has been carefully crafted to obscure this. The goal is to intentionally make a standard with the following properties:

  • The standard is impossible in practice to implement from scratch
  • The standard appears to specify the MS Office document format
  • The standard fails to specify MS Office document format
Microsoft is well seasoned at playing the standards game. They've learned from their battles over various web standards like HTML, CSS, and DOM that you can have all the benefits of being proprietary while appearing to conform to an open standard, if you "almost" conform to it. The first order model is conformity, while the second order model is intentional non-conformity. This is a brilliant tactic in politicized settings, since it allows people who want to claim conformity to do so. It allows MS to lobby successfully because they can demonstrate, to first order, compliance to officials who do not have the time or expertise to look deeper. When you pitch something you know to be false because you designed it to be false, I call it a sham. A number of researchers have been documenting the discrepancies between OOXML and what MS Office actually uses. Stéphane Rodriguez is one such researcher. Here is my interpretation of some of his findings. I focus on three glaring examples explored by Mr. Rodriguez.
  1. Proprietary floating point operations. Excel stores numbers in its file format that differ from what was typed into the cell, having been transformed by unspecified proprietary floating point operations. For example, the proper way to express "12345.12345" in MS Office file formats can be verified to be <v>12345.123449999999</v> which is not based on an open standard. If you enter <v>12345.12344</v> Excel will not treat this as if you had entered "12345.12345" in the formula.
  2. VML. VML is a proprietary format for drawings. It is not specified by OOXML and is required by MS Office as it is pervasive in Word, Excel and Powerpoint documents. MS calls it "deprecated" but uses it extensively.
  3. Proprietary Date formats. When you enter a date literal into a cell in Excel, a string representation of that date is serialized into the XML. Much like the case with floating point operations, the meaning of this string is defined by a proprietary, undisclosed standard.
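
For readers who want to poke at the first example themselves, here is a small sketch. It assumes the stored string is a binary double printed at 17 significant digits. That is my guess at the mechanism, not something the OOXML text states, and even if the guess holds, an implementer can only rely on it if the standard says so.

```python
# ASSUMPTION: the stored value is an IEEE 754 double rendered with
# 17 significant digits. The post's sources do not confirm this.
typed = 12345.12345  # what the user types; Python parses it to a double

# Seventeen significant digits are enough to round-trip any double.
stored = format(typed, ".17g")
print(stored)

# Parsing the 17-digit string recovers exactly the same double...
print(float(stored) == typed)

# ...but a nearby hand-written decimal is a different number.
print(float("12345.12344") == typed)
```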
These examples show that OOXML simply does not document what MS Office does. A key milestone for creating an open standard should be that there are at least two separate parties who have constructed distinct implementations and demonstrate a working interchange of data. Calling something an open standard when this cannot possibly happen is a sham. OOXML is a sham. For more information, go to: the grokdoc summary of OOXML objections and a compendium of objections from nooxml.org.

Posted by spout at 12:26 PM in the internet, web, web 2.0 and beyond

Friday, 24 August 2007

The Case Against Software Patents

A patent is a government sanctioned monopoly that our Constitution authorizes the government to give to inventors. The exclusive rights given to patent holders are intended as a form of quid pro quo, inducing inventors to disclose to the public the details of their novel creations. The purpose of the limited duration monopoly is to benefit the public through greater access to inventions which otherwise might be kept secret. In many cases, patents live up to this ideal, and are a wise investment by the public. Software patents are a glaring failure, for a variety of reasons which I wish to discuss.

A monopoly is generally considered a bad thing, and to the extent that a patent is wrongly granted in an individual instance or to a class of creations that do not warrant it, the harm to the public is severe. So, I present the case against software patents in these terms. There is a vast body of legal praxis in the area of patents, and I should note that I am not a lawyer, but I emphatically reject the notion that one must be trained in the profession to have a say. In fact, I don't think lawyers have really added much to the discussion of whether software patents are a good thing or not for society. It is primarily a political question, and only slightly a legal one. The constitutional purpose of patents is abundantly straightforward, and by that measure software patents are not a net win. Oddly, it looks to me like the legal questions have actually been decided by the Supreme Court in the correct way, and have gotten muddled and contorted by the Federal Circuit.

Software patents have flaws that can be classified along three lines:

  1. Foundations. Software is an expression of ideas, not an invention. Patents should not apply.
  2. Utility. A software patent grant does not, cannot, and will never be capable of achieving the public benefit necessary to justify it. In the arena of software, patents stifle innovation more than they reward it.
  3. Practicality. The patent process cannot make proper determinations with regard to the standards for patentability within the arena of software patents, nor is it likely to be able to within reasonable budgetary bounds.
Let's review these in detail.

Foundations. We start with a look at the history of patents in software. Many people are rather surprised to learn that standing Supreme Court precedent is that "for use in programming conventional general-purpose digital computers ... a series of mathematical calculations or mental steps and does not constitute a patentable 'process' within the meaning of the Patent Act". Gottschalk v. Benson, 409 U.S. 63 (1972). It's really difficult to put it any plainer than that.

Eventually, the Supreme Court upheld a patent that happened to use some software as part of the invention. This case was Diamond v. Diehr, 450 U.S. 175 (1981). James Diehr invented a process for molding and curing rubber that happened to use a computer to achieve real time solutions to equations to control the heating process. The physical/chemical process was a novel invention and the Court ruled the necessity of using software to implement it was not a disqualification for the patent.

These two cases should tell us everything we need to know about when a patent can contain software. Most software is not part of a process where the process itself is a novel invention. Unfortunately, the Federal Circuit Court of Appeals has completely destroyed the common sense approach by filling in the area between the two Supreme Court precedents with blathering nonsense. The unique construction of the court system for patents contributes to the failure to fix the mistakes. For most judicial matters, we have multiple circuit courts who pass judgements, and if one court does something strange (often it's the Ninth Circuit) you'll get a split between circuits and the Supreme Court can resolve the split. From a legal quality view, this is a much more reliable system. Unfortunately, we've bought into this bogus idea that "patents are special" and we've created a single point of failure in the Federal Circuit. It has failed by upholding software patents despite well-reasoned guidance from above that should form the basis for rejecting most of them.

As described in Gottschalk, you don't invent mathematics or algorithms. Software is predominantly an expression of an algorithm. The guy who invents computer parts or a new combination of them deserves a patent. People who twiddle the bits on a machine invented by someone else deserve only a copyright for their pattern of twiddled bits. Only in the rare circumstance where the algorithm is used to control a physical device, generally by interacting with a hardware controller of some kind, is there an actual physical component present that might qualify for a patent.

As Thomas Jefferson expressed, "it is the invention of the machine itself, which is to give a patent right, and not the application of it to any particular purpose" (letter to Isaac McPherson, Monticello, 1813). He's saying the machine, taken as a whole, must be something new. If you invent a new way to use a hardware graphics card in a computer, or some other part of it that controls something physical, you might meet this standard of having a new machine. But taking the same old PC and writing a program on it that makes it output something different simply is not an invention. It cannot be novel and it is not supposed to be patentable. The idea is so painfully obvious it's hard to understand how the entire legal patent profession has been so boneheaded about this over the last 20 years.

Utility. Software is a form of speech, and that's why it's protected by copyright. The cost of creating, reproducing, and distributing software is really quite low, once we obtain a computer. This is so true that one of the leading software development models, open source development, generally seeks no compensation other than attribution for the software license itself. Of course, many companies continue with proprietary licensing models as is their right, but the point is there isn't much need to seek to catalyze innovation in the software market by granting patents.

To the extent it's important to secure to software creators the rights to recoup their invested hours, the just market rewards for the fruits of labor are completely protected by copyright. Since software is speech, the body of precedent for securing original speech to its author has a well reasoned, vibrant, and enforceable set of laws backing it. Simply put, there is no problem with lack of financial reward for software creators needing patents to solve it.

Moreover, the public disclosure benefit motivating the patent grant really achieves nothing. Since software is expression, there is a vibrant body of free code floating around for the public to benefit from. Open source alone does more for public access to ideas than a software patent system ever could. Worse, since software is also copyrighted even when it's patented, you can't actually use the disclosed ideas in executable form even when the patent expires, since doing so violates the copyright. We do not need double protection for software creators.

Practicality. Since the rules for granting copyrights are much more economical and the market demand for most software doesn't support it, most software authors choose not to seek patents. This systematically deprives the patent process of the very body of prior art that it needs to determine whether a new submission is novel. While the existence of prior art should be sufficient to have a Court invalidate a patent, in practice the cost of litigation and the cost of searching for prior art create economics where the decision to contest a wrongly granted patent is too expensive to justify.

Compounding the problem is the incompetence with which the USPTO searches for software prior art. I've seen examples where prior art can be found by typing the patent title into Google. The standard for obviousness has been polluted to such an extent that you could probably patent removing the fuzz from your navel near a computer and you would get a grant. That the obviousness of the Amazon 1-click patent is debated is a travesty. If it outrages people as obvious, it's obvious. Worse, since many of the software creations during the "golden age" of computing, the 1970's and 1980's, happened when it was believed that software was normally not patentable, there was no industry effort to document, or even preserve, prior art. It is fundamentally impractical given this and the above reality to economically demonstrate that a work isn't novel.

This reality has several effects. Some people knowingly file "stupid patents" and rely on the fact that it isn't affordable for anyone to contest the patent. The stupid patent, once granted, will be used to extract fees that depend more on the cost of litigation than the value of the invention.

Large companies often become targets of such patent extortion. Very large companies typically seek to patent their software ideas in large volumes not because they want to exercise the right to exclusivity, but simply to have a defensive weapon. Such companies almost never engage each other in patent litigation, because it would become a form of mutual assured destruction. Unfortunately, patent holding companies seem to be the fatal flaw here. Patent litigation such as over the Blackberry or the Eolas claim against Internet Explorer has no adequate defense. It's disgusting to see companies whose sole reason to exist is patent extortion. It is only a matter of time until some well known and respected innovator is forced out of business by bogus patent claims. Software patents are not rewarding inventors, they are rewarding litigation specialists who produce nothing of value to the economy.

Wrap Up. In conclusion, software patents are foundationally wrong, cannot achieve any public benefit, and in practice drive economic waste and fail to reward anyone who deserves to be rewarded. Since the legal system has abandoned common sense, Congress should step in and pass legislation that makes software not patentable unless it's part of a physical machine which is, taken as a whole, novel.

Posted by spout at 7:05 PM in the internet, web, web 2.0 and beyond