If you’re thinking about building a new startup in the high-performance analytical database (ADBMS) market, hats off to you: kudos and respect, my brother. I’ve been in the kitchen, and I’ve seen the sausage being made. But let me tell you something: you might be a day late and a dollar short to the party.
Over the past few years, I’ve often pondered where the new-breed high-performance analytical database industry was headed. Will the existing players manage to survive? And is there a chance in hell for new ones to succeed in this market? If you had asked me to peek into my newly minted BI crystal ball in early 2009, I would have said “no way”. Why? Because at the time I was predicting the demise of a majority of the twelve or so players in this space, based on the observation that the field was too crowded and too expensive. I figured, in this cut-throat competitive space, and with tough economic times ahead, we’d be lucky to see two, maybe three survivors come 2010.
Since then, not only have several other significant players, technologies and business models popped up (for example, Groovy, XtremeData, VectorWise, Hadoop/MR and the OSS guys), but we have clearly not seen the level of attrition I was anticipating. Nobody has (officially) gone out of business save Dataupia as best I can tell, and Datallegro got a check from Steve Ballmer. Sure, some folks are experiencing tougher times than others (I dare say several are hanging by a thread) but overall, resilience has been the name of the game. So what gives?
From my point of view, there are two types of new-breeders really: those living in “comfortably numb” mode (quietly outliving peers may not be a bad strategy these days), and those kicking it into high-gear with a vengeance. In the latter category I can’t help but think of Vertica, Netezza, Aster Data and ParAccel. It takes a lot of “cojones”, cash and luck to build a new ADBMS company. But even blessed with all these, and given the proper planetary alignments, I would advise anyone considering a start from scratch nowadays to ponder the following points.
First, it is insanely expensive and complex to develop systems software to produce a complete analytical database engine (and I mean “complete” in a holistic Product Management sense). Tony Bain highlights some of the challenges database startups face in his excellent series starting here (things are not significantly different between OLTP and OLAP in this respect). But systems software is a different animal than your run-of-the-mill corporate enterprise application. On average you’re looking at 200,000 man-hours and anywhere from $60M to $100M to fund such a venture to completion. This is just to get rolling. Additionally, you cannot just put out a database product and call it a day.
Maintaining (enhancing) a behemoth of Oracle or SQL Server stature runs hundreds of millions of dollars every single year. Everything from equipment to talent costs more when developing database software. Believe you me, the folks who will work on your SQL optimizer, inter-fabric communications, parallelism or compression schemes had better not be affordable newbies. Your development platforms won’t likely be the average ho-hum laptop attached to cheap storage. An efficient QA or Performance Group will cost a small fortune in payroll and redundant equipment. And seasoned performance architects don’t run the streets. You cannot assemble a database engine product by cobbling together open source bits and distributed talent like a new Web 2.0 RIA venture. You can’t take a Kia (even a dozen of them) to the Indy 500.
Second, I think it’s no longer possible to find sufficient levels of Venture Capital funding for such endeavors. My feelings on this issue are reinforced when I read articles like this, or this. I think the writing has been on the wall for several years now. Reports of the VCs’ demise are greatly exaggerated, but the funds, the endurance, and the risk acceptance levels are gone. Small bets on small returns are in. Large bets on IPO-driven returns are out (for now). Even if you manage to score a major industry name like Mike Stonebraker on your Board, I think VCs in this space (those that are left and not now engaged in M&A) will say “talk to the hand”. Investors in numerous existing new-breeders are biting their nails to the bone (or looking for ways out). So to me, the train has left the station. And unless you can pony up your own seed money, trying to fund such a project via institutional money is currently, in my opinion, an exercise in futility.
Third, the field is already too crowded and spread out very thin. For a great overview of the major players out there, don’t miss Bloor Research’s competitive analysis paper. A lot of existing players do not have sufficient “boots on the ground” to make headway against larger ones, much less established powerhouses. Heck, even going against Vertica’s deep-pocketed marketing is no piece of cake. Worse yet, in this business, success is not guaranteed by technical superiority. I know it sounds heretical saying this about an industry dominated by performance claim testosterone, but it’s true.
Besides technical prowess, you need to get the word out louder than everyone else. Unfortunately, everyone has the same “word”. In a crowded space this means you have to yell “Fire!” pretty darn loud and relentlessly to get noticed. I think a lot of “database deals” are sealed on the golf course, more so than in POCs or bake-offs. Mind you, this is probably the case with most enterprise software. But to get your foot in the door, you need BOD and prima donna action. BOD are the well-connected heavy hitters on your board. Prima donnas are the star sales guys (or gals) currently working for your competition. Those you’ll have to poach with sweetheart deals to come work for you, a totally unproven new-breeder with a year of runway to go.
Fourth, pricing pressure in this business is relentlessly choking. This is a consequence of my previous point. A little over a year ago, word on the street was $100K/TB retail ($50K/TB street price) but now we’re seeing $20K/TB retail (TwinFin land), which probably means you can do $10K/TB on the street. Aster Data is pitching an appliance for $50K (1TB, includes Dell hardware), and Oracle’s new improved Exadata V2 (SATA storage) even touts $5,700/TB. At these margins, you’re basically talking about giving stuff away, and in a lot of cases, I suspect that’s what’s going on. So unless you’ve been around the block a bit and have some ammo in the bank, I don’t know how a newcomer can sustain this type of pricing “carpet bombing”. As if that weren’t enough, you have OSS and cloud players breathing down your neck. Customers expect more for less, and the perception of BI as a “commodity” is growing. In this pricing environment, survivability for a newbie is improbable at best.
Fifth, several windows of “technology opportunity” for ADBMS are closing. For example, if your great idea for a new ADBMS company involves a columnar approach, you might be too late to the party. If your “innovation” hinges on massive parallelism, compression, in-memory caching/cubing schemes, super-fast intra-nodal fabrics, hybrid MPP/SMP, hybrid row-column storage (PAX-like), or yet another SQL chip accelerator or super-duper FPGA, you might have missed the boat (on the other hand, if you figured out how to do analytics on compressed encrypted data, then you might be on to something).
I believe the top new-breeders did all the technical legwork in the past 4-5 years. It took Mike Stonebraker long enough, but Vertica pretty much put columnar on the map. And most of that engineering is now mature enough (and well proven) to warrant acquisition interest from the big boys. Initially, the big guys took a “wait and see” attitude (remember, OLTP butters their bread anyway, not analytical OLAP) but now, having seen results and traction on others’ dime, I think they’re ready to pony up some cash (classic buy vs. build decision) to absorb the bits and pieces suiting their marketing strategies. By doing so, the good ole boys re-invent themselves and say “hey look, we have columnar technology as well now!” (How Sybase didn’t corner this market with IQ is beyond me, especially having read Seth Grimes’ excellent paper about it).
Better yet, by integrating new technologies into existing code bases, the big dogs can say “hey, we have the best of both worlds for OLTP and OLAP” (Oracle’s latest Exadata comes to mind). And perhaps “look, we have SMP on the processing side and MPP in the storage layer”, or vice-versa, thereby returning to the old “one-size-fits-all” GP-RDBMS paradigm so criticized by Stonebraker (but so convenient for the corporate user).
And clearly, given the growing popularity of “operational analytics”, an OLTP+OLAP offering is compelling. So I think the “proof-of-concept” window for many new-breed technologies, specifically MPP columnar (but others as well, for instance, acceleration hardware, where Ingres is picking up VectorWise, or MPP where Microsoft snapped up Datallegro), has closed. The winners (and their results) are in and acquirers will likely make their move in 2010. This dovetails nicely with a recent TDWI survey claiming half the respondents plan to replace their DW platform between 2010 and 2012 (apparently this is presented here on October 7th).
All this being said, is it possible that a brand new software endeavor currently in stealth-mode development in Nepal might suddenly dominate the analytical database scene within months? How about a revolutionary FPGA/SQL Chip/Flash/Optical hardware contraption that could blow the hinges off industry standards and benchmarks? Sure, why not. Real innovation is usually unexpected, and often unintended. But I don’t see it being driven by the classic “VC funds startup makes big database scores big IPO” model much longer.
When I look at things like Hadoop and current developments in the OSS space for VLDB analytics, I still have trouble grasping the business model, but I clearly see “life force” innovation at work here. A year ago I would never have expected a place like Visa to stray from the “Big Three”, but nowadays these guys are messing with Hadoop! People are also doing amazing things with MapReduce implementations and BigTable KV types of massive data storage systems.
How does open source fare against the five points mentioned above? Pretty darn well if you ask me. Costs are significantly lower, venture capital is not needed or minimal, engineering is crowd-sourced, there’s more breathing room, market entry is viral and massive, distribution and testing self-fueled, and pricing (or lack thereof) better controlled.
Additionally, open source seems shielded from the “Borg Effect”. I don’t see how massive proprietary shops like Oracle, IBM or Microsoft can successfully “absorb” these entities. I don’t think Larry has a clue what to do with MySQL. He can’t really unload it, but he can’t really integrate it either. Darn Trojan horse! In 2008, Infobright went Open Source and raised $10M in the process. Looking at the results, I think these guys were smart!
So if you’re thinking about building yet another high-performance analytical database engine the classical way (and not going OSS), my advice to you is simple: unless you have $60M in the bank and technology significant (and new) enough to impress people like Daniel Abadi (good luck on that one), you might be climbing up a greased pole. I’m not saying it’s impossible, mind you, but there have been enough cooks making the same sausage over the last five to six years to last us a while. Maybe it’s time to look at the next curve.