Monday, April 13, 2009

On-demand BI beyond SMB

The more I read, test and learn about on-demand BI every week, the more surprised I am to realize how many players there are on the market.  And this isn’t even just a small and medium-sized business (SMB) market anymore, as I originally assumed.  There’s a whole slew of on-demand BI companies out there targeting serious-size enterprises with gigabyte- and terabyte-scale warehouses.

It seems one of the main rebuttals to the performance (latency) concerns about BaaS is that the penalty is imposed only once, at load time. In other words, yes, it’s time-consuming and fairly slow to upload warehouse data (sometimes taking weeks) given current network and pipeline bandwidth, but it’s something that is done only once; subsequent pushes are essentially incremental and consequently much quicker.  I guess I can buy that argument provided the “done once” endeavor is resilient enough to survive catastrophic errors.  For example, if my network connection drops or my server explodes nine days into a ten-day upload, I’d better be able to pick up and continue where I left off.
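
To make that concrete, here is a minimal sketch of what a resumable bulk load could look like, assuming the vendor exposes a chunked HTTP upload endpoint that accepts a byte offset. The URL, header and checkpoint file are invented for illustration; this is my guess at the mechanics, not any vendor’s actual API:

```python
import os
import requests  # third-party HTTP client, assumed available

UPLOAD_URL = "https://baas-vendor.example.com/bulk/upload"  # hypothetical endpoint
CHECKPOINT = "upload.offset"                                # local resume marker
CHUNK_SIZE = 64 * 1024 * 1024                               # 64MB per request

def resume_offset():
    """Read the last acknowledged byte offset, or start from zero."""
    try:
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return 0

def upload(path):
    offset = resume_offset()
    total = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(offset)                  # skip the bytes already uploaded
        while offset < total:
            chunk = f.read(CHUNK_SIZE)
            r = requests.post(
                UPLOAD_URL,
                data=chunk,
                headers={"X-Upload-Offset": str(offset)},  # hypothetical header
                timeout=300,
            )
            r.raise_for_status()        # a dropped connection raises here...
            offset += len(chunk)
            with open(CHECKPOINT, "w") as cp:
                cp.write(str(offset))   # ...and the next run resumes from here

if __name__ == "__main__":
    upload("warehouse_extract.csv")
```

The point is simply that nine days of progress survive a crash: the checkpoint file means a restart costs you at most one chunk, not the whole load.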

But even the “big guys” are dipping their toes in the SaaS pool lately.  As this article points out, SAS is investing in the cloud big time.

Vertica seems to have tacked from an “appliance” model to a hosted or cloud-based one (Vertica for the Cloud), as evidenced by their latest webinar, The Cloud and the Future of DBMSs, in which they pretty much repeat their usual marketing litany.

Kognitio, which claims to have put the “D” in DaaS, just announced a cloud deal with Kelkoo.com, “Europe’s largest e-commerce website after Amazon and eBay”.  They also have a cloud-based DaaS implementation with British Telecom (BT), which, incidentally, is about to lay off 10,000 people.  Kognitio has been one of the most cloud-aggressive companies out there.

And of course behemoth Microsoft is breathing down everyone’s neck (discreetly at the moment) via things like www.sqlserverdatamining.com/cloud and the entire Madison and Azure platforms.

Pretty much all the big players have some sort of stake in the “cloud” one way or another.  No one wants to be left out, just in case.  But beyond these well-known players, you also have the folks below, some of which address specific analytical niches:

www.deciphertech.com – sales analytics with Salesforce.com

www.hostanalytics.com – financial analytics (budgeting/revenue planning) niche.  I initially thought these guys might be connected to www.i-lluminate.com, given the audio on their website.

www.adaptiveplanning.com - Budgeting, forecasting and reporting analytics.

www.quantivo.com – Customer behavior analytics.

www.1010data.com – I believe they use tenbase on the backend, are columnar in architecture, and offer both an ODBC connector and an Excel plug-in.

www.shsna.com/ – Nutricia North America (owned by Danone) makes baby food and now also runs Pentaho over MySQL in the EC2 cloud for its internal BI needs, as described here.

www.kpiweb.com – Some French startup focusing on (I’m willing to bet) KPI metrics.

www.limtree.com – Another French startup. In fact, just a QlikView integrator.

Speaking of French sites, if you can read French then check out this BI blog at www.legrandbi.com.  If you can’t read French, head over to translate.google.com and read it anyway, because it’s a precious resource full of interesting “in-your-face” BI insight and informational tidbits I have not found elsewhere.

www.pivotlink.com, which I mentioned in a previous post, is geared exclusively at large enterprise warehouses in the cloud (small players need not apply) and, from what I gather, is backed by Trident Capital.

Even data integration seems to have made some inroads into the cloud realm.  I often read that ETL and integration consume 70-80% of a typical BI project’s effort.  I don’t know if the actual proportion is that huge, but I _do_ know from experience that integration tends to get grossly underestimated.  That being said, it’s clearly a huge BI pain point, and I was initially surprised to see anyone trying to do this on a hosted/cloud basis, but the guys at www.boomi.com are pitching just that.

And speaking of Boomi, I have to hand it to them for trying this approach, which I believe has merit, but they have about the worst webinar I have _ever_ attended. I think they’ve managed to do absolutely everything a company should avoid doing in a webinar, namely:

  • Advertise a webinar lasting more than 60 minutes. Right off the bat, that doesn’t give me a warm feeling.  How could they possibly need that much time when everyone else manages to stay at or under an hour?
  • Make it an expensive pain in the ass for people to connect.  There’s a toll number to call in the US (no toll-free option). Fine. Then there’s also a nine-digit access code. Then an audio PIN, then another nine-digit webinar code. What the heck?
  • Display what seems like an interactive question-answer widget during the session but in fact have no one managing it on the other end.  Hate when that happens.
  • Hire a consultant presenter who is obviously quite astute technically but sports a depressingly monotone voice.
  • Take 35 minutes to explain how to take an FTP input, transform some rows, and output the result to a folder, all on a local machine (see the sketch below).  Fascinating, but I think most people could have grokked this rocket science in, say, five minutes.  No wonder they need 90 minutes to get through all this!
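
For perspective, here is roughly what that entire 35-minute demo boils down to, as a hedged sketch using only Python’s standard library. The host, credentials, file name and the particular row transformation are invented stand-ins for whatever the demo actually used:

```python
import csv
import io
from ftplib import FTP
from pathlib import Path

# Hypothetical connection details, stand-ins for the demo's real ones.
HOST, USER, PASSWORD = "ftp.example.com", "demo", "demo"
REMOTE_FILE, OUT_DIR = "orders.csv", Path("output")

# 1. Fetch the input file over FTP into memory.
buf = io.BytesIO()
with FTP(HOST) as ftp:
    ftp.login(USER, PASSWORD)
    ftp.retrbinary(f"RETR {REMOTE_FILE}", buf.write)

# 2. Transform some rows (here: trim and uppercase cells, drop empty rows).
rows = csv.reader(io.StringIO(buf.getvalue().decode("utf-8")))
transformed = [[cell.strip().upper() for cell in row] for row in rows if row]

# 3. Output the result to a local folder.
OUT_DIR.mkdir(exist_ok=True)
with open(OUT_DIR / REMOTE_FILE, "w", newline="") as f:
    csv.writer(f).writerows(transformed)
```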

The point being, I couldn’t stand more than 35 minutes of this treatment and decided to bail out and try it later on my own.  Basically, these guys have an agent-based architecture that lets you connect to their “AtomSphere” (get it? Like atmosphere. Yeah) and have agents manipulate your data via connectors and transformations.  You can set up and “send” agents onto other boxes and platforms. It did sound interesting technically, and I tried to pull their demo from the site, to no avail.   I got a message saying they welcomed me.  Great.  Where’s the bits at?  I emailed support. I was assigned a “request for assistance” case number. Wow. Finally I got another message saying my account was active and I could log in from the main page (which contradicts the initial instructions claiming you’ll get an email with a link in it).  Oh, and as for the suggestion that, since I was new to Boomi, I should register for one of their training webinars: thanks but no thanks.  I will definitely try it out though.  It’s too compelling to pass up over a few minor glitches from weak marketing or customer support departments.

Moving right along, I did want to mention www.lyzasoft.com.  Even though their offering is not “on-demand” per se, it kind of is in a “local” way.  Basically, you download their Java thick-client application, plop a bunch of connectors onto a workbook, bring in some data, and start graphing or analyzing it within minutes.  All point-and-click, drag-and-drop.  Yes, I know this sounds like a “so-what” scenario, but you don’t understand: I actually had _fun_ using this thing, yet it’s far from a toy!  I wasn’t planning on spending more than 30 minutes with the product initially but ended up messing around with it for a couple of hours. You can pull in anything from ODBC to flat-file connections, then graphically describe relationships among tables (i.e., joins), then merge that output with other data sources into a graph or statistical “component” where you drag measures and attributes into corresponding axis “boxes” (much like the Excel pivot-table designer interface).  Amazingly enough (to me), this stuff just worked.  It’s really cool how you can try stuff out and then back out or delete, then start from scratch or add/delete relationships and data at will.  It’s very intuitive.  There are some performance and UI quirks (not surprising running Java UI code on Windows), and I doubt you can bring in significant (read: terabytes) amounts of data at this stage.  Lyzasoft claims a maximum of 175-200 data input sources, with their largest customer databases at a “few million rows” of around 250-300 columns (back of the envelope: 3 million rows × 275 columns × roughly 40 bytes per cell works out to about 33GB, so call it 30-50GB).  But overall it’s an impressive beginning and, quite honestly, probably easy enough to adapt to a hosted model. Add to that excellent and efficient real-time customer support, and you have a winner worth looking at here, one that could, in my opinion, pose a serious challenge to someone like QlikTech, given the right analytical engine behind them.
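
To give a flavor of what that workflow amounts to under the hood, here is a hedged pandas equivalent: connect to two sources, describe the join, then pivot a measure against two attributes. The files and columns are invented, and Lyza does all of this with drag-and-drop rather than code:

```python
import pandas as pd

# Two hypothetical local sources, standing in for Lyza's connectors.
orders = pd.read_csv("orders.csv")        # order_id, customer_id, amount, ...
customers = pd.read_csv("customers.csv")  # customer_id, region, segment

# Describe the relationship between the tables (what Lyza draws as a join).
merged = orders.merge(customers, on="customer_id", how="inner")

# Drag a measure (amount) and attributes (region, segment) into the
# axis "boxes": in pandas terms, a pivot table.
pivot = pd.pivot_table(
    merged,
    values="amount",
    index="region",
    columns="segment",
    aggfunc="sum",
)
print(pivot)
```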

Tuesday, April 7, 2009

And up through the ground came a bubblin' crude

Anyone not convinced that “doing BI” is both very hard and very expensive need only consult this link.  Yes, it’s from 2006, but have things really changed since then? Probably the opposite.  In his postings, SSAS guru Dave Rodabaugh describes the hiring process for BI/DW architects in a really compelling five-part story.  He starts off by stating, “I admit to being a real hammer when conducting an interview.”  Quite honestly, I don’t blame him, given that these folks charge anywhere north of $250/hr these days.  At that price, you’d _better_ damn well know your stuff!

Here are a couple other posts on the trials and tribulations of hiring good DBAs: http://tinyurl.com/ct75os and http://tinyurl.com/dksywb

And finally, here’s what I thought was an interesting set of SQL Server interview questions (understanding them alone is challenging).

What’s the point of all this?  DW/BI is indeed rocket science.  Its high priests are DBAs and DW architects. And with time, they become irreplaceable in the enterprise.  If a programmer, a software architect, or even an enterprise architect makes a mistake, you can usually catch it in time, and if not, there is usually an opportunity to fix the problem downstream.  It’s like losing one of several engines in flight: it’s not fun, it can get bumpy, but chances are no one is going to die.  On the other hand, a BI/DW database management mistake can be fatal, because data can get whacked, irreversibly.   Or it can get stolen or corrupted, with dramatic legal repercussions for the business.  This is like an explosion in flight.  It’s not something you can fix in time, and survival is not likely.  That’s why competent DBAs and DW architects rule the BI/DW world.  If you want to see what these guys really do for a living and why they command the big bucks, check out the following for starters:

http://dbscience.blogspot.com/

http://prodlife.wordpress.com/

http://www.fulltablescan.com/

http://jesseorosz.spaces.live.com/

http://optimaldba.blogspot.com/

http://www.petefinnigan.com/weblog/entries/

http://www.sqlservercentral.com/

So I have two questions about the current state of BI.  First, why did it get so complicated, and second, is it good or bad for business?

I think the first question is simple enough.  Things are like this because the darn products are simply too hard to use.  You shouldn’t need a PhD to properly set up, configure, deploy, tune and query a database engine.  Forgive me for liking simplicity, but last I checked, the operations manuals for behemoths like DB2, Oracle and SQL Server ran to thousands of pages combined.  To me, when a set of product manuals has more pages than the Federal Budget, that’s cause for concern.  I’m an avid follower of the KISS principle.

Recently I purchased this book, which is essentially a Microsoft SQL Server analysis “bible” covering every part of that ecosystem, including SSAS, SSIS, SSRS and Excel.  It runs 624 pages.  And quite honestly, unless you have significant mastery of each of those subjects, you’re not going to be very effective doing BI at the enterprise level (at least on the Microsoft stack, but the others are typically hairier anyway, let’s face it).  In my opinion, this book represents the _minimum_ one needs to know to be effective in this business.  And believe me, that’s not for the casual weekend DBA.

The second question is a bit more subtle.  How do you define “bad for business”?  For the past forty years, business leaders (CIOs) and bean counters (CFOs) have computed the total costs of owning and exploiting these databases (salaries, hardware, software, licenses, power, time wasted, failure rates, etc.) and determined that the rewards justified the costs.  But do they really?  In light of the well-documented failure rates among business intelligence projects in recent years, can they possibly be right?  It isn’t too challenging to pull up a myriad of articles such as these, decrying and documenting the dismal success rates of BI projects all over the planet: http://tinyurl.com/cr8lt8 or http://tinyurl.com/cnc36h

And in some cases, you also need a PhD just to figure out _how_ to calculate ROI on a BI endeavor, as this 2002, 44-slide PowerPoint presentation from Jonathan Wu attests.
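
To be fair, the formula itself is trivial; the hard part is defending the inputs.  Here is a toy sketch with invented numbers, showing how quickly a plausible-looking cost/benefit tally can go negative:

```python
# Toy BI ROI arithmetic with invented numbers -- the hard part in practice
# is justifying the inputs (especially the benefits), not the formula.
costs = {
    "licenses": 400_000,
    "hardware": 250_000,
    "salaries": 900_000,   # DBAs, DW architects, ETL developers
    "consulting": 350_000,
}
benefits = {
    "labor_saved": 500_000,      # reports no longer assembled by hand
    "revenue_uplift": 750_000,   # better decisions (the fuzzy one)
}

total_cost = sum(costs.values())
total_benefit = sum(benefits.values())
roi = (total_benefit - total_cost) / total_cost

print(f"Cost: ${total_cost:,}  Benefit: ${total_benefit:,}  ROI: {roi:.1%}")
# Cost: $1,900,000  Benefit: $1,250,000  ROI: -34.2%
```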

So from the looks of it, countless millions of dollars and hours of intellectual effort have been spent over the past decades achieving mostly failure.  The same argument could be made (and often has been) about software development in general, but it looks like BI is faring even worse!  When’s the last time you heard about a successful enterprise BI project that wasn’t over budget, behind schedule or over-complicated?  We know what happens when complexity is used to hide questionable assumptions, processes and results.  Ask anyone at AIG. If it’s too complex, too obfuscated, and too inaccessible, then chances are it’s bad business.

And so it is in light of this that I have recently started to think differently about this “cloud” offering in the BI world, sometimes referred to as BaaS, BIaaS or even DaaS.   Having been in software technology for twenty years, I’ve seen my share of “new and improved”, world-changing, “revolutionary” concepts come and go.   But people are missing the point about this one as it applies to BI.

The BI “cloud” isn’t about technology, delivery, security or governance hurdles per se. It’s about abstraction.  And abstraction is the antidote to complexity. This is why I feel many players in today’s BI world are threatened by this concept, and why there’s so much gold in them thar hills when applied to enterprise business intelligence in judicious ways.


Wednesday, April 1, 2009

Mind your own business [intelligence] Part II

First, I want to thank Curt Monash (www.dbms2.com) for mentioning my blog in his postings this morning -- that's really nice of him.  He is also reviewing several BI players at the moment.  God knows there are a lot of them.  Go read his reviews.

Since my last post about that market, I emailed with GoodData's CEO and founder Roman Stanek, who was kind enough to put up with my back-and-forth questioning over email.  He suggested several other names I might be interested in exploring, so I set out to do just that.  In the process, I discovered Indicee, who totally blew me away.  They have a clean, fast, operational UX. Not only was I able to upload data immediately, but they were nice enough to bump up my account size so I could push up more data. Next, I have to say their customer support response time is just awesome.  For me, Mr. Instant Gratification, this says a lot about a company. And finally, believe it or not, you can ask questions of the data in English!  This has been one of my "dreams" for a long time, and I had actually envisioned the same kind of interface Outlook uses to set up mail rules, but theirs is even cooler.  As you probe the data, they show you what the question looks like _IN ENGLISH_ -- Nirvana.  Their stuff just works.  These Canadians can write software, I tell you!
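
Here's my guess at the mechanics behind that trick (emphatically not Indicee's actual implementation): the query stays structured the whole time, and the English is just a rendering of it, much like Outlook's mail-rule editor:

```python
# Sketch of rendering a structured query spec as an English sentence.
# The query shape and field names are invented for illustration.

def to_english(query: dict) -> str:
    parts = [f"Show {query['measure']} by {query['dimension']}"]
    for field, op, value in query.get("filters", []):
        parts.append(f"where {field} {op} {value}")
    if "top" in query:
        parts.append(f"for the top {query['top']} results")
    return ", ".join(parts)

query = {
    "measure": "total sales",
    "dimension": "region",
    "filters": [("year", "is", 2008), ("product line", "is", "hardware")],
    "top": 5,
}
print(to_english(query))
# Show total sales by region, where year is 2008,
# where product line is hardware, for the top 5 results
```

The English never drifts out of sync with the query because it is derived from it, which is presumably why the probing experience feels so trustworthy.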

Not so impressive was PivotLink.  Their website clearly displays a link marked "Start Free Trial".  Stupid me, I thought I might be able to get a free trial.  No such luck.  Instead, I was contacted by a Sales Development person who offered to schedule a "demo/webinar" for me, "go over my requirements" and determine if their "solution is a fit for the company".  Are they kidding? Why don't you let ME, the customer, decide for myself?  Thanks but no thanks.  Then they offered to "send me over" to their Chief Marketing Officer.  Needless to say, by that time I had already scratched their name off my list of potentials.  Disingenuous and condescending.

Next in line was Lucidera. I contacted the company to see if a trial version was available.  I'm not even going to write one word about this little experience.  Instead, I'm going to post the email response I received -- you decide:

We actually have somewhat of a different "trial" experience than most organizations. Since we are an on-demand model and analytics tends to be pretty extensive, we have what is called a Pipeline Healthcheck. What we do with the Pipeline Healthcheck after having a call with you to go over your analytic requirements is we access your Salesforce.com data and bring that into our application via a read-only log in and we analyze your sales data, focusing on your sales people, sales processes and your pipeline while highlighting areas of opportunity or potential risks that we find based on our best practice analytics. Customers and prospects are typically finding hundreds of thousands to millions of dollars in revenue opportunity. The Pipeline Healthcheck also serves as a business case with quantifiable value based on your sales data if you think the application makes sense for your business. If you'd like to set up a call to discuss further and your organization is a Salesforce.com customer, please let me know of a few good times and a number to reach you and I'd be happy to go over this further as well as discuss your interest in sales analytics. Thank you!


Huh?!? I don't know about you, but the last thing I need these days is to be "pipeline checked". It just doesn't sound appropriate on a first date.