Big Honking Databases: Who leaves a country packed with ponies to come to a non-pony country?

Usually I think of myself as a “hot-shot” when it comes to XSPRADA technology and its applications. This is because I’ve been involved with it for ten years, and that kind of history builds bonds. In a word, having lived and breathed it for so long, I’m severely biased, but at least I’m aware of it. It’s a completely different story when you talk to a user who is biased from the experience of running the stuff to solve real problems. That’s when you hear praise that makes you step back and go “wow, we really do shine here above and beyond”. There is nothing sweeter than an adamant customer evangelist.

One such person we shall call Tim (why not, since it's his real name). He works for a major DOD contractor out here in California. Tim asked me to withhold his company’s name for obvious reasons. He’s been running POC projects using XSPRADA technology for years. As a matter of fact, Tim once ran a real-time CEP version of our engine (which can handle both real time and historical input) for a demo bid project he needed to put together where no other vendor could come close.

Tim is the ultimate engineer’s engineer and one of the smartest folks in the “information” field I’ve ever met. He’s got experience galore and has been around the block a few times. Currently there are other groups in Tim’s shop running the XSPRADA engine for other purposes, and he keeps abreast of those POCs as well. I could tell you what they entail but then I’d have to kill you. Suffice to say that the engineering being done there would blow most people’s minds (as in, holy cow, we're actually doing this?!?).

It turns out Tim is biased as well, but he’s biased from a user perspective. This is "been-there-done-that" advocacy. And that, in my book, is far more compelling than any argument coming from an insider like myself. Although I don’t know too many people who can explain and position the technology as well as I can (I’m so modest, no pictures please!), Tim had the following analysis recently and I thought it was so “perfect” I had to reproduce it here (bolding my own):

“One of the distinctions I’ve been using lately to explain the difference between set based data processing and most everything else (row based, column based, partition based,…) is that most other DBs are based on defining somewhat arbitrary bins of data of fixed size or dimension (tables with fixed columns in RDBMSs, column collections for things like Vertica et al., and chunks in Google’s BigTable designs, to name a few). Then there is significant overhead to partition the incoming/outgoing data to fit into these fixed containers. Inevitably, any operation on these artificial partitions will include wasted processing or I/O on irrelevant data that just “happens” to live in the affected partitions. This is a huge waste of time and resources. In addition, these bins are continually reused by means of destructive updates which require them to be locked during transactions to avoid data corruption. This is the other main source of waste in that significant delays are now imposed not only on the relevant data involved in the operation, but also on collateral data that might be holding up other operations unnecessarily. These two effects are mutually opposed: larger partitions would help the I/O problem, but at the expense of exacerbating the locking problem. And vice-versa. By contrast, set based data systems, like XSP, use completely variable sized containers (the sets) dynamically partitioned based on operational relevance and not on any predetermined partition sizes. In this way, the amount of irrelevant data moved across the I/O boundary for any data operation is significantly reduced. And because these sets are immutable, there is no locking interference with other concurrent data operations.”

To put it in Southern California speak, I was like, wow, this dude really gets it! And there isn’t much I can add to Tim’s conclusions. I bow to the completeness of his understanding; his analysis stands on its own.
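
A toy Python sketch makes Tim's I/O point concrete. It is only an illustration under simple assumptions, not how the XSPRADA engine works: the record shape and the helpers `fixed_bins`, `records_read_fixed`, and `relevance_sets` are all hypothetical names made up for this example. It counts how many records have to be touched to answer one query when the data lives in fixed-size bins versus in sets grouped by the attribute the query cares about.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

Record = Tuple[int, str]  # (id, region): a hypothetical record shape


def fixed_bins(records: List[Record], bin_size: int) -> List[List[Record]]:
    """Chop records into arbitrary fixed-size bins, ignoring their content."""
    return [records[i:i + bin_size] for i in range(0, len(records), bin_size)]


def records_read_fixed(bins: List[List[Record]],
                       pred: Callable[[Record], bool]) -> int:
    """A bin is read in full as soon as it holds one relevant record."""
    return sum(len(b) for b in bins if any(pred(r) for r in b))


def relevance_sets(records: List[Record],
                   key: Callable[[Record], Hashable]
                   ) -> Dict[Hashable, FrozenSet[Record]]:
    """Group records into variable-sized, immutable sets by a relevance key."""
    groups: Dict[Hashable, List[Record]] = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    return {k: frozenset(v) for k, v in groups.items()}


# Sixteen records; every fourth one belongs to the "west" region.
records = [(i, "west" if i % 4 == 0 else "east") for i in range(16)]


def is_west(r: Record) -> bool:
    return r[1] == "west"


# Fixed bins of four: each bin holds exactly one "west" record,
# so answering the query drags in all the "east" records too.
bins = fixed_bins(records, 4)
fixed_cost = records_read_fixed(bins, is_west)

# Relevance-based sets: only the "west" set is touched.
sets = relevance_sets(records, key=lambda r: r[1])
set_cost = len(sets["west"])

# An "update" derives a new set; the original is never modified,
# so a concurrent reader of sets["west"] needs no lock.
updated_west = sets["west"] | {(16, "west")}

print(fixed_cost, set_cost, len(sets["west"]), len(updated_west))
```

With this deliberately unfavorable layout, the fixed-bin version reads all 16 records to find 4, while the set-based version reads only the 4 it needs. Real engines are far more sophisticated on both sides, so treat the numbers as a picture of the argument rather than a benchmark.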

Our exchange was about the pre-structured or “canned” mechanisms used by other database engines versus the flexibility of our approach. It’s what I naively call “bucketizing”. And this topic is of course related to the ADR functionality I was discussing in my last post. But it also pertains to the “schema agnosticism”, parallelism, and ACID aspects of the database I’ve mentioned in the past.

The XSPRADA engine dynamically adapts to queries and data. It avoids the inherent rigidity prevalent in all other technologies. Yes, it’s compelling to handle analytics via columns, but that’s only one of many ways you can address the problem. And if your technology is “columnar” in nature, it’s the ONLY way you can address the problem. You’ve put all your eggs in one basket. At the end of the day, you’re still looking at the world in a tabular format (which happens to be vertical). So it’s what I call the one-trick pony approach.

There’s nothing wrong with one-trick ponies if you need to solve a very specific business problem quickly and efficiently. But from a holistic business perspective, it’s a scary proposition. If you’re running an enterprise, you want flexibility. You want the ability to address problems as they come up using all available means at your disposal. You’re looking for a wide array of tools and methods to win battles, not a single weapon system. And this is what XSPRADA technology offers: “the ability to apply the right technique for any question for any data at any time”.

If Tim is concerned with waste and inefficiency, it’s because his shop deals with tera- and petabyte volumes of data with limited shelf-life. In other words, waste is not an option. This isn’t about airline transactions messing up your trip or bank accounts being debited incorrectly, by the way. This is about national security and people living or dying in real-life tactical situations.

In applications like this, one-trick ponies don’t cut the mustard. And this is why Tim and several other groups in his company have been looking at XSPRADA technology for years. There simply isn’t anything out there that can meet their requirements, and believe me they’ve tried all the usual suspects. To Tim and his colleagues, the unfair competitive advantage XSPRADA can deliver to their company (and clients) is well worth the risk of evaluating technology that’s a little out of the ordinary.

Monday, July 27, 2009