For the past five years I have had the dubious pleasure of using Hibernate in Oracle-backed production environments, and more often than not it has made me want to crawl into a cave to sleep off the months of ensuing darkness. For the uninitiated: Hibernate is a popular open-source object-relational mapping (ORM) tool for Java, an interface layer between your (Java) code and a relational database that lets you query and manipulate data by means of the object-oriented paradigm, effectively hiding the SQL it generates to achieve this.
I have nothing against Hibernate or ORM tools in general, but it annoys me how often Hibernate is touted as the perfect fit whenever you need to do something with a database, and how often I have seen it used in environments to which it is totally unsuited. It owes this undeserved support to people who have been lulled into a false sense of simplicity.
Let's assume that I'm a green and optimistic Java programmer with no experience in relational databases and no wish to acquire any. Let's say I'm writing an application to manage all logging and billing for a large phone company. The Ministry of Love needs me to store the recorded sound data of every conversation. So I bash away:
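```java
// Each public class goes in its own .java file.
import java.util.Date;
import java.util.List;

// One subscriber (phone line) with every call ever made on it.
public class Subscriber {
    long number;                  // long, not int: phone numbers overflow an int
    List<Call> calls;
    SubscriptionType subscrType;
}

// One phone call, including the recorded voice data.
public class Call {
    Subscriber caller;
    long receiver;
    Date started;
    Date finished;
    byte[] data;                  // raw recorded sound: the bulk of the storage
}

// A price plan; only a handful of these will ever exist.
public class SubscriptionType {
    String name;
    double fixedMonthlyFee;
    double callChargePerSecond;
}
```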
I suppose this is conceptually valid, albeit very simplified. All I do now is insert some clever annotations in the code, let Hibernate create the tables for me and within minutes I can do things like this:
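```java
// Hibernate 5.2+ typed queries; older versions use the untyped createQuery(String).list()
List<String> smallList = session.createQuery("select t.name from SubscriptionType t", String.class).list();
List<byte[]> whoppingList = session.createQuery("select c.data from Call c", byte[].class).list();
```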
The SubscriptionType class maps to a table with no more than ten or twenty rows at any time. No problem there. If marketing and sales do a good job, the table behind the Subscriber class will hold millions of records and the one behind the Call class billions, adding up to terabytes of data, most of it binary voice data. So within weeks of your launch party that whoppingList is sure to bring your system to its knees, because it is filled from a JDBC result set, and Hibernate will drag every single row, voice data and all, into memory. Had you selected whole Call entities instead, the session's in-memory cache would keep hold of every one of them as well. A Java List that poses as an ordinary in-memory collection while really being a window onto billions of database rows is not forbidden, but it is at the very least counter-intuitive.
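To make the difference concrete, here is a toy model in plain Java; it is not Hibernate code. A hypothetical cursor hands out one fake 1 KB "voice blob" at a time, and the two methods compare how many bytes stay reachable when everything is first copied into a List (roughly what `Query.list()` does) versus handling each row and letting it go. Hibernate's own escape hatch for the second style is `Query.scroll()`, which returns `ScrollableResults`; its exact signatures differ between versions, so it is left out of this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model in plain Java, not Hibernate code: a cursor that produces one
// fake voice-data row at a time, the way a JDBC result set would.
public class EagerVsStreaming {

    static final int ROW_SIZE = 1024; // assumed size of one voice blob, in bytes

    static Iterator<byte[]> cursor(int rowCount) {
        return new Iterator<byte[]>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < rowCount;
            }

            @Override
            public byte[] next() {
                if (!hasNext()) throw new NoSuchElementException();
                produced++;
                return new byte[ROW_SIZE];
            }
        };
    }

    // Eager: copy every row into a List first, roughly what Query.list() does.
    // Returns the number of bytes still reachable once the List is complete.
    public static long eagerBytesHeld(int rowCount) {
        List<byte[]> all = new ArrayList<>();
        cursor(rowCount).forEachRemaining(all::add);
        long held = 0;
        for (byte[] row : all) held += row.length;
        return held;
    }

    // Streaming: handle each row, then drop it before fetching the next one.
    // In this model only one row is reachable at a time, so the peak is one row.
    public static long streamingPeakBytesHeld(int rowCount) {
        long peak = 0;
        Iterator<byte[]> it = cursor(rowCount);
        while (it.hasNext()) {
            byte[] row = it.next();            // process the row here...
            peak = Math.max(peak, row.length); // ...then it becomes garbage
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(eagerBytesHeld(10_000));         // 10240000
        System.out.println(streamingPeakBytesHeld(10_000)); // 1024
    }
}
```

Scale the row count up to a few billion and the eager figure no longer fits in any heap, while the streaming figure stays at one row.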
An intricate schema with hundreds of tables will not break Hibernate. All it takes is a few tables, LOTS of data and an ignorant developer who doesn't know which buttons to push. The so-called impedance mismatch between the relational and the object-oriented realm often gets really ugly once your data reaches a sizable volume, and if you skimp on testing with realistic volumes, that means in production.
In all fairness, a seasoned Hibernate guru would not be this naive. She would anticipate that the Call table will grow like mad and needs to be archived regularly. Since the raw sound data are meant for archiving and rarely queried, she would store them in a separate database that doesn't require daily backups. She would hand our newbie one of the fine manuals that tell him how to do things properly, and the sheer weight of the tome would teach him instantly that serious database solutions are never plain sailing, not even with Hibernate. Especially not with Hibernate. Why is that?
BECAUSE SIMPLE THINGS DON'T TAKE 880 PAGES TO EXPLAIN
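The guru's two precautions, keeping the bulky voice data out of the busy table and archiving old calls, can be sketched in plain Java. This is a hypothetical design rather than something spelled out above: the hot Call row keeps only metadata plus a `recordingId` pointing at sound data stored elsewhere, and a periodic job moves calls that finished before a cutoff out of the active set. All class, field and method names are illustrative; where the recordings live and how archived rows are written out is left open.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical design sketch; class, field and method names are illustrative only.
public class CallArchiving {

    // The row that stays in the busy Call table: metadata only, no byte[] voice data.
    // recordingId points at the sound data kept in a separate store.
    public record CallMeta(long id, long caller, long receiver,
                           Instant started, Instant finished, long recordingId) {}

    // Result of one archiving run.
    public record Split(List<CallMeta> active, List<CallMeta> archived) {}

    // Move every call that finished before the cutoff out of the active set.
    public static Split archiveBefore(List<CallMeta> calls, Instant cutoff) {
        List<CallMeta> active = new ArrayList<>();
        List<CallMeta> archived = new ArrayList<>();
        for (CallMeta c : calls) {
            if (c.finished().isBefore(cutoff)) {
                archived.add(c);
            } else {
                active.add(c);
            }
        }
        return new Split(active, archived);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-31T00:00:00Z");
        Instant fortyDaysAgo = now.minusSeconds(40L * 24 * 3600);
        List<CallMeta> calls = List.of(
                new CallMeta(1, 5550001L, 5550002L, fortyDaysAgo, fortyDaysAgo.plusSeconds(90), 101),
                new CallMeta(2, 5550001L, 5550003L, now.minusSeconds(3600), now.minusSeconds(3480), 102));
        Split run = archiveBefore(calls, now.minusSeconds(30L * 24 * 3600)); // keep 30 days hot
        System.out.println(run.active().size() + " active, " + run.archived().size() + " archived");
        // prints "1 active, 1 archived"
    }
}
```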
It's time for a little historical context, because I think the confusion started with Plato. He proposed that the clutter of our daily lives is no more than a feeble reflection of the world of ideal Forms. The true essence of the objects we see lies beyond space and time, just to give you a nano-summary of his wisdom.
Now skip a few millennia. The enlightened designer likes to think up wildly intricate entity-relationship diagrams that model a fraction of the world as he perceives it, ignorant of the sordid reality of fragmented indices, concurrent modifications and failed backups, because on Mount Olympus nobody pollutes their pristine tables with real-world data. Sadly, you cannot make a living by keeping your databases empty. That means you cannot design with a Platonic eye. You must have some estimate of how many rows each table will hold and how often it will be queried, updated, inserted into and deleted from. You have to allow for scalability within reasonable margins. Databases don't automatically polish up a crummy data definition based on actual query behaviour and row counts. That's your job, and a dirty one it is at times.
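A back-of-the-envelope estimate of that kind takes a few lines. Every input below is an assumption chosen purely for illustration: five million subscribers, three calls per subscriber per day, two minutes per call, and uncompressed 64 kbit/s telephone audio (G.711-style, i.e. 8,000 bytes per second). Plug in your own numbers.

```java
// Back-of-the-envelope sizing for the Call table. Every number below is an
// assumption for illustration only.
public class CallTableSizing {

    // Rows added per day = subscribers * calls per subscriber per day.
    public static long rowsPerDay(long subscribers, long callsPerSubscriberPerDay) {
        return subscribers * callsPerSubscriberPerDay;
    }

    // Voice bytes per day = rows * average call length (s) * audio rate (bytes/s).
    public static long voiceBytesPerDay(long rowsPerDay, long avgCallSeconds, long bytesPerSecond) {
        return rowsPerDay * avgCallSeconds * bytesPerSecond;
    }

    public static void main(String[] args) {
        long subscribers = 5_000_000L; // assumed subscriber base
        long callsPerDay = 3L;         // assumed calls per subscriber per day
        long avgCallSeconds = 120L;    // assumed average call length
        long bytesPerSecond = 8_000L;  // 64 kbit/s uncompressed telephone audio

        long rows = rowsPerDay(subscribers, callsPerDay);
        long bytes = voiceBytesPerDay(rows, avgCallSeconds, bytesPerSecond);
        System.out.println(rows + " new Call rows per day");                          // 15000000 new Call rows per day
        System.out.println(rows * 365 + " rows per year");                             // 5475000000 rows per year
        System.out.println(bytes / 1_000_000_000_000L + " TB of voice data per day"); // 14 TB of voice data per day
    }
}
```

Under these assumptions the Call table gains billions of rows a year and the voice data runs to terabytes per day, which is exactly the kind of figure that should shape the schema before the first line of mapping code is written.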
Object-oriented programming has become the new sliced bread. Java is sooo much cleaner and nicer than eighties Oracle PL/SQL. I want to do it all in Java, so why can't I? After all, we have tables and columns, classes and objects, foreign keys and object references. Same thing, isn't it? I should be able to tinker with data records as if they were objects. Not so fast. What we have here are analogies, not essential similarities. I believe the crucial difference between objects and data records lies in their lifespan and numbers. Objects are disposable by nature. We create them, use them for our vile private purposes and let the garbage collector kill them for us. They never outlive the process that created them. They're not built to last. Database records, if well nursed, will survive Larry Ellison and his grandchildren. An object heap is to a database what a whiteboard is to the British Museum Library. The whiteboard is meant only for the people in the meeting and is erased afterwards. The library is accessible to anyone with a library card. These are essential differences, not accidental ones. If it doesn't look like a duck and doesn't quack like a duck, don't dress it up as one.
| | Objects | Database records |
|---|---|---|
| Average lifespan | milliseconds to hours | minutes to millennia |
| Used by | one virtual machine at a time | thousands of them |
| Average count | thousands to millions | thousands to gazillions |
Object-relational mapping solutions want you to forget about tables and think in objects instead. What you scribble on the whiteboard goes into the database thanks to clever proxy objects that wrap a database connection and take care of all the inserting, deleting and updating for you. These are the standard things you do with objects: you create or receive them, use them and leave them to die. Looking them up is a less frequent operation, and when you do, it's usually by flipping through the Rolodex (iterating through the collection) or picking one out by its index. No big deal. Database retrieval, on the other hand, is all about speed and efficiency, because libraries are big places. There it is a big deal. That's why nobody writes:
```sql
-- every row of every table combined with every other: a four-way Cartesian product
SELECT * FROM products, items, customers, invoices
```
Since we have neither unlimited memory nor unlimited patience, we have where-clauses. Getting what you want means telling the database how to join tables and what restrictions to put on column values, so you are actually telling it what you don't want. The where-clause is the key to querying, and Hibernate doesn't even pretend that its object-oriented alternative, Hibernate Query Language (HQL), is a perfect substitute. In its own words (it's right on the first page of the manual): "Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always." So apparently Hibernate has deemed it necessary to build in a backdoor. Why would you want to know what SQL it generates? Two reasons. First, Hibernate gets it wrong from time to time. Second, even when it doesn't, the RDBMS itself can behave in cruel and unusual ways by choosing an inefficient execution plan. Note again that nastiness of this kind goes unnoticed when you only test with miniature datasets. Either way you're up the creek, and you find yourself writing native SQL littered with query hints, leaving the object paradigm behind and thinking in tables again.
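Why the restriction belongs in the database can be shown with a toy model in plain Java; it is not Hibernate or JDBC code. A hypothetical `runQuery` method plays the database: it applies a predicate as its where-clause and counts how many rows it hands over. Both strategies below find the same ten calls, but one of them first drags the whole table across.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model, not Hibernate or JDBC code: a "shipped" row stands in for the cost of
// moving it from the database server into the JVM.
public class WhereClausePushdown {

    public record Call(long id, long caller, int durationSeconds) {}

    public record Outcome(List<Call> rows, int rowsShipped) {}

    // Hypothetical sample data: n calls spread evenly over the given number of subscribers.
    public static List<Call> sampleCalls(int n, int subscribers) {
        List<Call> calls = new ArrayList<>();
        for (int i = 0; i < n; i++) calls.add(new Call(i, i % subscribers, 60));
        return calls;
    }

    // Simulated server: applies the where-clause before anything is shipped
    // and records how many rows left the "database".
    static List<Call> runQuery(List<Call> table, Predicate<Call> where, int[] shipped) {
        List<Call> result = new ArrayList<>();
        for (Call c : table) if (where.test(c)) result.add(c);
        shipped[0] += result.size();
        return result;
    }

    // No where-clause: fetch everything, then filter in Java.
    public static Outcome filterInJava(List<Call> table, long caller) {
        int[] shipped = {0};
        List<Call> all = runQuery(table, c -> true, shipped);
        List<Call> mine = all.stream().filter(c -> c.caller() == caller).toList();
        return new Outcome(mine, shipped[0]);
    }

    // Restriction pushed into the query, like "where c.caller = :caller".
    public static Outcome filterInDatabase(List<Call> table, long caller) {
        int[] shipped = {0};
        List<Call> mine = runQuery(table, c -> c.caller() == caller, shipped);
        return new Outcome(mine, shipped[0]);
    }

    public static void main(String[] args) {
        List<Call> table = sampleCalls(1_000, 100);
        Outcome inJava = filterInJava(table, 7);
        Outcome inDb = filterInDatabase(table, 7);
        System.out.println(inJava.rows().size() + " rows found, " + inJava.rowsShipped() + " shipped"); // 10 rows found, 1000 shipped
        System.out.println(inDb.rows().size() + " rows found, " + inDb.rowsShipped() + " shipped");     // 10 rows found, 10 shipped
    }
}
```

The model leaves out what a real RDBMS adds on top, such as using an index to avoid reading the other rows at all, which is where execution plans and hints come in.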
Is there a moral to this? If you need a car analogy: learn how to change gears manually. Hibernate's automatic gearbox may work fine on the highway, but if it abandons you, it will be on the rockiest of roads, by which time you will have forgotten how to use a gearstick. Under the right circumstances Hibernate can make your business layer look like a paragon of cleanliness, a poster child for separation of concerns, but in a less forgiving environment it will behave like the leaky abstraction it is. By all means use it if you know what you're doing, but if you're building serious database-backed software, don't think you can get away with not knowing your SQL and the quirks of the RDBMS underneath. Don't get lulled into a false sense of simplicity.