Monday, 18 February 2008

Hibernate Interceptor Gotchas

I'm trying to implement a Hibernate Interceptor to track user edits to an entity. I call this an EditInterceptor, and it is similar to the AuditInterceptor that many people have described. In practice, I found that implementing a Hibernate Interceptor involves many "gotchas". These include:

Problem 1: If you do the work in onSave(), you discover that the database id may not have been generated yet, because the SQL INSERT hasn't happened. Similarly, onFlushDirty() and onDelete() are called before the database has actually changed.
Solution: Record the affected entities in Interceptor instance collections from onSave(), onFlushDirty(), and/or onDelete(), then process them in postFlush().
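A minimal sketch of the Problem 1 solution, assuming a real Interceptor (for example one extending Hibernate's EmptyInterceptor) simply forwards its onSave(), onFlushDirty(), onDelete() and postFlush() calls to this helper. The class name DeferredEditQueue and its simplified method signatures are hypothetical, and the Hibernate types are left out so the sketch runs on its own; the point is only that the callbacks record entities and postFlush() does the real work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that a real Hibernate Interceptor would delegate to.
// Method names mirror Interceptor callbacks, but there is no Hibernate dependency here.
class DeferredEditQueue {
    private final List<Object> saved = new ArrayList<>();
    private final List<Object> dirty = new ArrayList<>();
    private final List<Object> deleted = new ArrayList<>();

    // From onSave(): the INSERT hasn't run, so a generated id may not exist yet.
    void onSave(Object entity) { saved.add(entity); }

    // From onFlushDirty(): the UPDATE hasn't run yet.
    void onFlushDirty(Object entity) { dirty.add(entity); }

    // From onDelete(): the DELETE hasn't run yet.
    void onDelete(Object entity) { deleted.add(entity); }

    // From postFlush(): the SQL has executed, so ids exist and rows have changed.
    // Returns a description of what was processed, then empties the queues.
    List<String> postFlush() {
        List<String> processed = new ArrayList<>();
        for (Object e : saved) processed.add("insert " + e);
        for (Object e : dirty) processed.add("update " + e);
        for (Object e : deleted) processed.add("delete " + e);
        saved.clear();
        dirty.clear();
        deleted.clear();
        return processed;
    }
}
```

Note that these plain instance lists are exactly the kind of shared state Problem 4 warns about.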

Problem 2: You cannot modify objects in the session you are working with while it is flushing.
Solution: Create a second session in postFlush() using the same JDBC connection.

Problem 3: When implementing the solution to #2, you discover that session.connection() is deprecated, but that no alternative has been provided yet. Note that the Hibernate team isn't using deprecation properly here, as pointed out in the comments on HHH-2603.
Solution: Use it anyway, and add @SuppressWarnings("deprecation"). Note that cloning a session from the same JDBC connection this way is safe, even if there are problems with using session.connection() for ad hoc JDBC mischief.

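Problem 4: You're using Spring's Hibernate support and realize that the Interceptor instance collections from Problem 1 aren't thread safe, because a single interceptor instance is shared by all sessions, and therefore by all threads.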
Solution: Wrap them in a ThreadLocal. See this example.
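A minimal sketch of the ThreadLocal approach, assuming one buffer instance is shared by every thread, the way a single interceptor bean would be. ThreadSafeEditBuffer and its methods are hypothetical names; only ThreadLocal itself is a real JDK class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical buffer shared by all threads, like a singleton interceptor.
class ThreadSafeEditBuffer {
    // Each thread gets its own set, so concurrent sessions never see each other's edits.
    private final ThreadLocal<Set<Object>> updates = ThreadLocal.withInitial(HashSet::new);

    void record(Object entity) { updates.get().add(entity); }

    // Returns this thread's pending edits and empties its set.
    Set<Object> drain() {
        Set<Object> snapshot = new HashSet<>(updates.get());
        updates.get().clear();
        return snapshot;
    }

    // Drop this thread's set entirely (e.g. after the transaction completes),
    // so pooled server threads don't carry state into the next request.
    void clear() { updates.remove(); }
}
```

Calling remove() when the work is done matters in application servers, where request threads are reused.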

Problem 5: Your updates are getting processed multiple times. You realize that onFlushDirty() is called multiple times, then read the javadocs and discover that flush() doesn't always conclude with SQL synchronization to the database, yet onFlushDirty() is still called in these cases.
Solution: Use a Set to collect changed entities and be sure to initialize this set in afterTransactionBegin().
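A minimal sketch of the Set-based fix, with hypothetical names. One design choice here is an assumption beyond the post: the set compares entities by identity (via IdentityHashMap) rather than equals()/hashCode(), since entity hashCode() implementations often depend on mutable fields or on an id that may not be assigned yet. A plain HashSet works if your entities don't override those methods. Thread confinement (Problem 4) is omitted to keep the example short.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical collector a real interceptor would delegate to.
class DirtyEntityCollector {
    private Set<Object> dirty = newIdentitySet();

    // Set whose membership is decided by ==, not equals().
    private static Set<Object> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    }

    // Mirrors afterTransactionBegin(): each transaction starts with an empty set.
    void afterTransactionBegin() { dirty = newIdentitySet(); }

    // Mirrors onFlushDirty(): may fire several times for one entity; the set keeps one entry.
    void onFlushDirty(Object entity) { dirty.add(entity); }

    Set<Object> dirtyEntities() { return Collections.unmodifiableSet(dirty); }
}
```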

Problem 6: You realize that, because of the multiple flushes described above, previousState and currentState in onFlushDirty() change from call to call as you hit the different points where Hibernate automatically flushes. You need the true original state and the true final state.
Solution: Use a ThreadLocal map that associates each entity with the first previousState seen for it. Clear this map when a transaction starts and after you flush. Clear the updates in preFlush() and, as usual, process them in postFlush(). When you calculate the update in onFlushDirty(), compute it relative to the original state stored in the map rather than previousState; previousState is only used to seed the map when it has no entry for the entity.
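A minimal sketch of the original-state map, assuming the interceptor forwards the entity, currentState, previousState and propertyNames arguments of onFlushDirty() here (the real callback also passes the id and property types). OriginalStateTracker, clearOriginals() and the "name: old -> new" change format are all hypothetical. The map is keyed by identity, and the guard for a null previousState is a defensive assumption rather than something the post describes.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical tracker: remembers the first previousState seen per entity, per thread.
class OriginalStateTracker {
    private final ThreadLocal<Map<Object, Object[]>> originals =
            ThreadLocal.withInitial(IdentityHashMap::new);

    // Mirrors afterTransactionBegin(): forget originals from any earlier transaction.
    void afterTransactionBegin() { originals.get().clear(); }

    // Call "after you flush", as the post describes, once the changes are processed.
    void clearOriginals() { originals.get().clear(); }

    // previousState only seeds the map; changes are computed against the stored original.
    List<String> onFlushDirty(Object entity, Object[] currentState,
                              Object[] previousState, String[] propertyNames) {
        Object[] original = originals.get().computeIfAbsent(
                entity, e -> previousState == null ? null : previousState.clone());
        List<String> changes = new ArrayList<>();
        if (original == null) {
            return changes; // nothing to diff against (defensive guard)
        }
        for (int i = 0; i < propertyNames.length; i++) {
            if (!Objects.equals(original[i], currentState[i])) {
                changes.add(propertyNames[i] + ": " + original[i] + " -> " + currentState[i]);
            }
        }
        return changes;
    }
}
```

For example, if the first flush sees an amount go from 10 to 20 and a later flush sees it go from 20 to 30, the second call reports "amount: 10 -> 30" rather than "amount: 20 -> 30".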


Posted by spout at 11:15 PM in stuff about java

Tuesday, 12 February 2008

The Need for Distributed Version Control in the Enterprise

I'm looking at software configuration management tools again. My last three projects at work have all used Subversion. It was a big improvement over CVS, but I'm growing tired of having to push everything back through a central server to share it with others (or to share it between my own computers). I'm also finding that the pain of branching and merging really limits the pace of change. These are the problems that some of the new SCM tools, like Git and Mercurial, were created to solve. I'm convinced that enterprises need distributed version control.

It seems many people think tools like Git and Mercurial are only appropriate for large open source projects and aren't really relevant to the enterprise. I submit that enterprise development teams need the new tools just as much, if for slightly different reasons. As enterprise developers, we need to write code in a feature-oriented way and then group completed features into releases. There are two ways to allocate features to releases: either you decide up front which features go in a release and work until they are done, or you fix the time frame for a release, pull in the features that are complete, and push incomplete ones to the next release. My thesis today is that the latter method is better, but that it puts real stress on your ability to branch and merge, and that tools like Git and Mercurial are simply better at this.

The big problem in software development (as everyone knows) is that nobody can estimate the duration of a task very well. If you think you have a solution to this problem, slap yourself, because you don't. This variability means the "feature-scoped" method incurs heavy waste, since all the features in a release go out at the pace of the slowest one. Most developers finish their tasks well before the slowest one is done and then work at reduced productivity until the next release. The more features in a branch, the worse this gets.

The lost productivity goes away if developers can push and pull features between releases. In fact, the cost of a bad estimate drops if we can juggle features easily, because we are no longer holding back a bunch of working features when an estimate turns out wrong.

But you can see how all of this depends on nimble source control tools. Branching, merging, and conflict resolution have to be easy, quick, and simple. At any given time you might have a released branch, a branch under QA/test, and usually several release branches in development: the next few minor releases and the next major release. A feature's code can be juggled around among the unreleased versions. For example, when we find a bug, we want to apply the fix to every in-process development branch, so we might create a branch just for that bug and merge its changes into all the others. Often we move a feature out of one branch and into another because it isn't ready in time. And whenever we have multiple development branches, we have to merge the earlier ones into the later ones a lot.

The fact is that merging is extremely painful in Subversion, for several reasons. First, Subversion doesn't really think in terms of branches: the directories in your /branches folder are nothing more than ordinary directories. Worse, Subversion doesn't help you at all with keeping track of completed merge operations. Tracking merged revisions manually simply doesn't scale: the process becomes error-prone, and anyone who has done a lot of branching and merging in Subversion has stories to tell on the theme of "oops". In fact, if branching were really cheap, it might be better to create a branch per feature. Release managers could then wait until the last possible moment to merge changes: they'd look in the ticket tracking system, see which tickets are "ready for merge", and pull them in, or better yet have each feature's developer do it, resolving conflicts first. Many of us are used to doing this with patch files, but an ordinary diff can't understand directory structures, or that files in different places are really different versions of "the same" thing. And then there's the problem of the directory structure itself.

As the number of branches goes up, there will be a natural desire for peer-to-peer merging. Your peer has a bug? Pull his branch. If two people collaborate on a feature, why make them go through a middleman to share changes, especially when those changes may be broken or half-baked? Ever been annoyed when a Subversion update brought in a bunch of changes you didn't want yet? Worse, ever seen one that broke everything? What if you pulled in only the things you want, when you are ready and know they're coming? What if half the time you don't even have to do that, because the other guy pulls your work first? Now distributed SCM seems a lot more reasonable. What if you have multiple computers (home/work, PC/laptop, or both)? Wouldn't it be nice to share changes with yourself, even if they don't compile at the moment, in one step instead of two? Oh, and now you and your coworker can collaborate outside the office. So go to a cafe and get some work done together.

It looks like Git and Mercurial both offer a compelling improvement for enterprise development shops. Git is slightly faster and has a more comprehensive command set, but Mercurial's commands will feel more familiar to SVN/CVS users, and Mercurial has good Windows support. Lots of big projects use these tools: Linux uses Git, and Java uses Mercurial. We're starting to see more medium and small shops use them too. Which is better? They're both good, and there's a tradeoff of power vs. simplicity and of speed vs. portability. If you prefer the first option in either tradeoff, look at Git; if you prefer the second, use Mercurial.


Posted by spout at 3:02 AM in the internet, web, web 2.0 and beyond