Big Honking Databases: Please Stop Making More ADBMS Sausage

If you’re thinking about building a new startup in the high-performance analytical database (ADBMS) market, hats off to you: kudos and respect, my brother. I’ve been in the kitchen, and I’ve seen the sausage being made. But let me tell you something: you might be a day late and a dollar short.

Over the past few years, I’ve often pondered where the new-breed high-performance analytical database industry was headed. Will the existing players manage to survive? And is there a chance in hell for new ones to succeed in this market? If you had asked me to peek into my newly minted BI crystal ball in early 2009, I would have said “no way”. Why? Because at the time I was predicting the demise of a majority of the twelve or so players in this space, based on the observation that the field was too crowded and too expensive. I figured, in this cut-throat competitive space, and with tough economic times ahead, we’d be lucky to see two, maybe three survivors come 2010.

Since then, not only have several other significant players, technologies and business models popped up (for example, Groovy, XtremeData, VectorWise, Hadoop/MR and the OSS guys), but we have clearly not seen the level of attrition I was anticipating. Nobody has (officially) gone out of business save Dataupia as best I can tell, and Datallegro got a check from Steve Ballmer. Sure, some folks are experiencing tougher times than others (I dare say several are hanging by a thread) but overall, resilience has been the name of the game. So what gives?

From my point of view, there are two types of new-breeders really: those living in “comfortably numb” mode (quietly outliving peers may not be a bad strategy these days), and those kicking it into high-gear with a vengeance. In the latter category I can’t help but think of Vertica, Netezza, Aster Data and ParAccel. It takes a lot of “cojones”, cash and luck to build a new ADBMS company. But even blessed with all these, and given the proper planetary alignments, I would advise anyone considering a start from scratch nowadays to ponder the following points.

First, it is insanely expensive and complex to develop systems software to produce a complete analytical database engine (and I mean “complete” in a holistic Product Management sense). Tony Bain highlights some of the challenges database startups face in his excellent series starting here (things are not significantly different between OLTP and OLAP in this respect). But systems software is a different animal than your run-of-the-mill corporate enterprise application. On average you’re looking at 200,000 man-hours and anywhere from $60M to $100M to fund such a venture to completion. This is just to get rolling. Additionally, you cannot just put out a database product and call it a day.

Maintaining (enhancing) a behemoth of Oracle or SQL Server stature runs hundreds of millions of dollars every single year. Everything from equipment to talent costs more when developing database software. Believe you me, the folks who will work on your SQL optimizer, inter-fabric communications, parallelism or compression schemes better not be affordable newbies. Your development platforms won’t likely be the average ho-hum laptop attached to cheap storage. An efficient QA or Performance Group will cost a small fortune in payroll and redundant equipment. And seasoned performance architects don’t run the streets. You cannot assemble a database engine product by cobbling together open source bits and distributed talent like a new Web 2.0 RIA venture. You can’t take a Kia (even a dozen of them) to the Indy 500.

Second, I think it’s no longer possible to find sufficient levels of Venture Capital funding for such endeavors. My feelings on this issue are reinforced when I read articles like this, or this. I think the writing has been on the wall for several years now. Reports of the VC’s demise are greatly exaggerated, but the funds, the endurance, and the risk acceptance levels are gone. Small bets on small returns are in. Large bets on IPO-driven returns are out (for now). Even if you manage to score a major industry name like Mike Stonebraker on your Board, I think VCs in this space (those that are left and not now engaged in M&A) will say “talk to the hand”. Investors in numerous existing new-breeders are biting their nails to the bone (or looking for ways out). So to me, the train has left the station. And unless you can pony up your own seed money, trying to fund such a project via institutional money is currently, in my opinion, an exercise in futility.

Third, the field is already too crowded and spread out very thin. For a great overview of the major players out there, don’t miss Bloor Research’s competitive analysis paper. A lot of existing players do not have sufficient “boots on the ground” to make headway against larger ones, much less established powerhouses. Heck, even going against Vertica’s deep-pocketed marketing is no piece of cake. Worse yet, in this business, success is not guaranteed by technical superiority. I know it sounds heretical saying this about an industry dominated by performance claim testosterone, but it’s true.

Besides technical prowess, you need to get the word out louder than everyone else. Unfortunately, everyone has the same “word”. In a crowded space this means you have to yell “Fire!” pretty darn loud and relentlessly to get noticed. I think a lot of “database deals” are sealed on the golf course, more so than in POCs or bake-offs. Mind you, this is probably the case with most enterprise software. But to get your foot in the door, you need BOD and Prima Donna action. BOD are well-connected heavy hitters on your board. Prima Donnas are the star sales guys (or gals) currently working for your competition. Those you’ll have to poach with sweetheart deals to come work for you, a totally unproven new-breeder with a year of runway to go.

Fourth, pricing pressure in this business is relentlessly choking. This is a consequence of my previous point. A little over a year ago, word on the street was $100K/TB retail ($50K/TB street price) but now we’re seeing $20K/TB retail (TwinFin land), which probably means you can do $10K/TB on the street. Aster Data is pitching an appliance for $50K (1TB, includes Dell hardware), and Oracle’s new improved Exadata V2 (SATA storage) even touts $5,700/TB. At these margins, you’re basically talking about giving stuff away, and in a lot of cases, I suspect that’s what’s going on. So unless you’ve been around the block a bit and have some ammo in the bank, I don’t know how a newcomer can sustain this type of pricing “carpet bombing”. As if that weren’t enough, you have OSS and cloud players breathing down your neck. Customers expect more for less and perception of BI as a “commodity” is growing. In this pricing environment, survivability for a newbie is improbable at best.
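To put those price points side by side, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above; the labels and the helper name `percent_drop` are made up for illustration, and the $10K/TB street price is a guess, not a published number.

```python
# Per-terabyte price points quoted in the paragraph above (USD per TB).
# Labels are shorthand for this sketch, not vendor terminology.
PRICES_PER_TB = {
    "retail, a year ago": 100_000,
    "street, a year ago": 50_000,
    "retail, now (TwinFin land)": 20_000,
    "street, now (a guess)": 10_000,          # assumption: "probably" $10K/TB
    "Aster Data appliance (1TB)": 50_000,     # $50K for 1TB, hardware included
    "Exadata V2 (SATA storage)": 5_700,
}


def percent_drop(old: float, new: float) -> float:
    """Return the decline from `old` to `new` as a percentage of `old`."""
    return (old - new) / old * 100.0


# Compare every price point against last year's retail price.
baseline = PRICES_PER_TB["retail, a year ago"]
for label, price in PRICES_PER_TB.items():
    drop = percent_drop(baseline, price)
    print(f"{label:<30} ${price:>7,}/TB  {drop:5.1f}% below last year's retail")
```

Going from $100K/TB to $20K/TB retail is an 80% cut in about a year, and the Exadata figure sits more than 94% below last year's retail. That is the kind of slope a newcomer would have to fund.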

Fifth, several windows of “technology opportunity” for ADBMS are closing. For example, if your great idea for a new ADBMS company involves a columnar approach, you might be too late to the party. If your “innovation” hinges on massive parallelism, compression, in-memory caching/cubing schemes, super-fast intra-nodal fabrics, hybrid MPP/SMP, hybrid row-column storage (PAX-like), or yet another SQL chip accelerator or super-duper FPGA, you might have missed the boat (on the other hand, if you figured out how to do analytics on compressed encrypted data, then you might be on to something).

I believe the top new-breeders did all the technical legwork in the past 4-5 years. It took Mike Stonebraker long enough, but Vertica pretty much put columnar on the map. And most of that engineering is now mature enough (and well proven) to warrant acquisition interest from the big boys. Initially, the big guys took a “wait and see” attitude (remember, OLTP butters their bread anyway, not analytical OLAP) but now, having seen results and traction on others’ dime, I think they’re ready to pony up some cash (classic buy vs. build decision) to absorb the bits and pieces suiting their marketing strategies. By doing so, the good ole boys re-invent themselves and say “hey look, we have columnar technology as well now!” (How Sybase didn’t corner this market with IQ is beyond me, especially having read Seth Grimes’s excellent paper about it).

Better yet, by integrating new technologies into existing code bases, the big dogs can say “hey, we have the best of both worlds for OLTP and OLAP” (Oracle’s latest Exadata comes to mind). And perhaps “look, we have SMP on the processing side and MPP in the storage layer”, or vice-versa, thereby returning to the old “one-size-fits-all” GP-RDBMS paradigm so criticized by Stonebraker (but so convenient for the corporate user).

And clearly, given the growing popularity of “operational analytics”, an OLTP+OLAP offering is compelling. So I think the “proof-of-concept” window for many new-breed technologies, specifically MPP columnar (but others as well, for instance, acceleration hardware, where Ingres is picking up VectorWise, or MPP where Microsoft snapped up Datallegro), has closed. The winners (and their results) are in and acquirers will likely make their move in 2010. This dovetails nicely with a recent TDWI survey claiming half the respondents plan to replace their DW platform between 2010 and 2012 (apparently this is presented here on October 7th).

All this being said, is it possible that a brand new software endeavor currently in stealth-mode development in Nepal might suddenly dominate the analytical database scene within months? How about a revolutionary FPGA/SQL Chip/Flash/Optical hardware contraption that could blow the hinges off industry standards and benchmarks? Sure, why not. Real innovation is usually unexpected, and often unintended. But I don’t see it being driven by the classic “VC funds startup makes big database scores big IPO” model much longer.

When I look at things like Hadoop and current developments in the OSS space for VLDB analytics, I still have trouble grasping the business model, but I clearly see “life force” innovation at work here. A year ago I would never have expected a place like Visa to stray from the “Big Three” but nowadays these guys are messing with Hadoop! People are also doing amazing things with MapReduce implementations and BigTable KV types of massive data storage systems.

How does open source fare against the five points mentioned above? Pretty darn well if you ask me. Costs are significantly lower, venture capital is not needed or minimal, engineering is crowd-sourced, there’s more breathing room, market entry is viral and massive, distribution and testing self-fueled, and pricing (or lack thereof) better controlled.

Additionally, open source seems shielded from the “Borg Effect”. I don’t see how massive proprietary shops like Oracle, IBM or Microsoft can successfully “absorb” these entities. I don’t think Larry has a clue what to do with MySQL. He can’t really unload it, but he can’t really integrate it either. Darn Trojan horse! In 2008, Infobright went Open Source and raised $10M in the process. Looking at the results, I think these guys were smart!

So if you’re thinking about building yet another high-performance analytical database engine the classical way (and not going OSS), my advice to you is simple: unless you have $60M in the bank and technology significant (and new) enough to impress people like Daniel Abadi (good luck on that one), you might be climbing up a greased pole. I’m not saying it’s impossible, mind you, but enough cooks have been making the same sausage over the last five to six years to last us a while. Maybe it’s time to look at the next curve.

6 comments:

Ben Werther, September 23, 2009 at 11:47 AM
Hi Jerome,

Great article. Building an enterprise-grade scalable analytical database is a huge endeavor, and requires building a team of some of the most sought after engineers in the world, many (4-5+) years of deep and broad technology investment, and deep ties and understanding of the enterprise and how to repeatably close business there. It really isn't easy, and I fully expect that the shakeout you predicted will come in the next 12-24 months.

I was intrigued by your prediction of the winners in the space, and was bemused by the fact that Greenplum wasn't on the list. I do understand why that is -- candidly we've benefited from a strong pipeline of word-of-mouth referrals in enterprise CIO circles, and have chosen not to play the buzz marketing game of the ParAccels and Aster Datas. But we recognize that we need to raise our profile outside of these circles.

Along these lines, your list correctly reflects level of marketing buzz, but not actual customer traction.

Case in point -- at Greenplum we closed 14 new large enterprise customers last quarter, and have more than 80 customers (compared to Netezza's 200ish). We're adding new customers at approx the same rate as Netezza, and we're on a significantly faster growth trajectory than they were at our stage. Those aren't dinky little customers -- they are market leaders like NYSE, Nasdaq, eBay, FIM/MySpace, Deutsche Bank, T-Mobile and many many more.

If you were to recast your list based on this metric, you'd end up with Netezza and Greenplum, with Vertica on the bubble. The others may spin a good story, but where are the customers?

- Ben
Unknown, September 23, 2009 at 1:45 PM
Hi Ben,

First, thank you for following the blog and for your kind comments. If memory serves me right, GP has the largest MPP grid out there (96 nodes) running 6PB at eBay so clearly its reputation/achievements speak for themselves thus far.

It's true that GP's marketing presence has been lacking, at least IMHO - ever since they published that cloud paper with great fanfare several months ago (which was about setting up private clouds really) I have not heard/seen anything from them and there doesn't seem to have been significant follow-through on that opportunity. In all fairness I don't check out the website very often, and as I never hear from GP on Twitter either, I tend to view them as "comfortably numb" when in fact last year they seemed much more active.

But if everything is going on behind the scenes out of the public eye as you suggest, this would clearly explain that. However in my experience, when a vendor has the sort of success you describe, they're hard-pressed not to scream it from the rooftops and understandably so.

I'm not sure you can relegate Aster's progress to buzz marketing either -- they came out with MySpace and Akamai as clients and that's not small potatoes. Vertica has an impressive list and is constantly out there "on the street" talking about it and engaging people. And if one company has had little marketing so far, to say the least, it's definitely been ParAccel (save for the infamous 30TB TPC benchmark of course which wasn't really a marketing move IMHO).

I think in all 3 cases there has been customer traction - perhaps not to the level and numbers GP benefits from, granted (although in Vertica's case, you gotta wonder...)

But as you correctly point out, my metric was not simple customer size & counts. It has more to do with having what I call a strong "pulse" out there. I'm all with you on the marketing hype and all that, but when successful implementations and (more importantly) satisfied customers don't follow, the marketing usually collapses (when the money runs out). So we'll have to wait and see how this all comes down in the next year or so as you point out.

Thanks for chiming in!
J.
Dave Menninger, September 24, 2009 at 7:08 AM
Jerome,

Very insightful article. You have captured the market dynamics well and comprehensively. I'm also flattered by your comments about Vertica.

Dave
Unknown, September 24, 2009 at 8:11 AM
Hi Dave,

Thanks for the kind comment and for following the blog!
J.
Susan Davis, October 1, 2009 at 6:01 AM
Jerome,

Great analysis and well-thought out as usual. While the market certainly is crowded, and we are likely to see other companies in trouble over the next year, it is not one homogeneous market.
Some of these companies are focused on the high end of the market (in terms of database size, use cases like large enterprise data warehouses) - and I would put Greenplum and ParAccel in this bucket. Others, like Infobright, are more focused on use cases that include "log analytics" apps such as are prevalent in online marketing, telco, other online businesses etc.; in providing a low cost, small footprint embedded analytic database for ISVs and SaaS apps; and for data warehouses for the mid-market.

This has been driving the huge growth in customers that Infobright has had since our move to open source - from fewer than a dozen customers a year ago to just about 100 today. Our move to open source was clearly the driver for this growth, as it allowed widespread adoption of our open source technology. There have been over 15,000 downloads of our open source version in the past year, which has had an enormous impact in terms of helping us improve both the quality and functionality of both our open source and enterprise edition products based on community feedback.

It has also enabled the company to become the database for open source BI, as we work closely with companies like Actuate, Pentaho, Jaspersoft, Talend, MySQL and others to provide integrated downloads that deliver an end-to-end BI infrastructure. That is tough to do in the proprietary software market, especially if complex hardware configs are required.

Getting back to my original point (finally :-) the very large market for ADBMS is a combination of several smaller markets, so there are not quite as many vendors in each segment as the overall market makes it appear. In any case, I have gone on long enough. I look forward to your blog posts and continued presence on Twitter.
Unknown, October 1, 2009 at 8:02 AM
Hi Susan,

Thanks for reading the blog and for the nice comments. I agree there are several sub-segments in this market as you point out, not only in terms of data volumes but also in terms of "service area" or specialty.

You guys clearly serve the MySQL market much of which deals with "web stuff" including log analytics as you point out. And MySQL is a very very large installed base.

Kickfire has bet the same farm it seems (except they're hardware-based and of course not OSS). Although I see Infobright easily able to break out of this MySQL "mold", I don't think Kickfire has the same flexibility. I see struggle in their future.

On the VLDB side, GP and PA are clearly there as you point out. I'm surprised you didn't mention Vertica but I do not know what their largest data volume has been. I would venture to guess they focus more on customer volume for public reporting than individual deal sizes (data-wise I mean) but that's just a hunch.

Tuesday, September 22, 2009