Tuesday, July 28, 2009

How I Learned to Love Mondrian (confessions of a WISA guy)

I’ve been playing with Pentaho’s Mondrian for almost a year now on and off. I have to say, everytime I mess with that stack I am more and more impressed by its richness and capabilities. And twelve months ago, when I started learning it, I was what you could call “severely LAMP-challenged”. I’ve sure made a lot of progress since then and wanted to talk about this as I figured it might help other Microsofties out there needing (or wanting) to put a toe in these mysterious LAMP/OSS waters.

The first thing I ever did with Mondrian was figure out how to install it on a Windows platform. The reason I did was twofold. First, we didn’t have appropriate Linux hardware/software in house at the time, and second, I have way more experience on Windows so it’s a lot easier for me, and third, I wanted to do it locally and avoid dealing with cross-platform bridging at the moment (our ODBC drivers are Windows only as well). Path of least resistance is an engineering mantra in my book.

Lucky for me I had worked with Java and Apache Tomcat in the dot-com days so I had no trouble pulling and installing the JRE/JDK and the web server itself (which comes as a Windows service). Next, I deployed the Mondrian WAR file into the Tomcat webapps folder which caused it to be automatically “deployed” as a web application. Way easier than deploying ASP.NET applications (but you didn’t hear this from me ).

Then, I fired up the Mondrian landing page, clicked on the Jpivot link and, of course, kaboom. Yes, without a JDBC driver, Mondrian is not a happy camper. It took me a little longer to figure out the Sun JDBC-ODBC bridge and how to plug corresponding connection string it into numerous Mondrian files to replace the default connections there (which are all for MySQL if I recall).

The Mondrian documentation isn’t great but if you Google long enough you can usually find some other poor slob with a similar problem and, with luck, published solutions online. [Side note: I once had the nerve to email Julian Hyde, their Chief Architect, about some technical question. He abruptly suggested I don’t bother him and use the “community” forums instead]. Unfortunately those forums are often useless for non-enterprise (read: non-paying) users. There are flurries of unanswered questions and problems up there. This is a generic OSS problem I suppose. You get what you pay for J

So finally I had my bridge setup, along with a DSN called MondrianFoodMart (the default) pointing at the default Access database (distributed with Mondrian). And now I was able to fire up Mondrian on top of the database and do a couple drills, run a couple MDX queries. Bliss.

Next, I took that Access database and exported it to CSV format with corresponding DDL. This is the way to feed the XSPRADA engine. Fired up our RDM/x server and ran the DDL scripts. Then re-defined the MondrianFoodMart DSN to point to us via our ODBC driver (32-bit only, 64-bit won't fly with the bridge - painful lessons learned...). Reloaded the Mondrian page, and voila! Mondrian was now talking to RDM/x and displaying the Sales cube.

One point of the exercise was being able to show Mondrian OLAP on top of our database. Another was being able to show our database’s behavior in time as more and more queries come in (hint: it gets faster). Now, with Mondrian, this is a little tricky because the platform is heavily cache-based. Mondrian shoots initial queries at a relational system and proceeds to cache heavily as it aggregates results. So the more you use it, the more it caches. Obviously the reason for this is that Mondrian is designed to run on top of relational databases, and not “OLAP-intelligent” engines such as ours. It has to translate the MDX into straight SQL queries every time, as it fills its caches initially. Nevertheless it does re-hit the database as needed when you start slicing and dicing on new dimensions or facts, as one would expect. So you can actually see our engine’s “dinosaur tail” behavior as I once described in a previous post.

Now, the MondrianFoodMart database is fine for setting up the stack, but it’s not particularly interesting in so far as data volume goes if you’re in the VLDB space. More recently, I attempted to setup the TPC-H/SSB sample data under Mondrian, meaning I tried to manually create the XML defining some SSB cube. There is a whole fairly complex XML language Mondrian uses to define and connect fact tables (measures) with associated dimensions. They have a UX driven tool called Workbench but I could never get it to work on my system (and didn’t have enough time to keep messing with it). With online help and using the FoodMart.xml sample file, I was able to get a basic cube up in about a day. Nothing fancy, but now I can OLAP into arbitrarily large data sets and that’s a good thing.

As cool as it is seeing our stuff run under Mondrian, I always dreamed of doing the same thing under Excel (as in 75% market share, yeah I want to support that please). Until recently, I thought this would not be possible until we implemented MDX in the engine but then I saw the light. It is called the SimbaO2X connector and it rocks!

[Start Commercial] Did I mention how much I love this company Simba? They pretty much wrote the book on data connectivity. Within a day they had me a 30-day trial version of their O2X offering, no questions asked. And follow-up to boot. Their stuff works, and they know how to take care of people. What a concept! [End Commercial]

This SimbaO2X puppy lets ODBO clients (say like Excel) talk to XML/A OLAP servers (say like Mondrian). Note, there is a similar offering from Pentaho called Pentaho Spreadsheet Services. It carries a small yearly license fee from what I understand. Supposedly you can email Pentaho sales for additional information and a local contact. I’m still waiting for their reply. Hey it’s OSS…Did I mention you get what you pay for?

Either way, the relevant fact is that the SimbaO2X connector works without a hitch. I am finally able to create and manage pivot tables from Excel, talking to Mondrian (via XML/A), talking to RDM/x (via ODBC)! This is the bomb! I need to really get a deeper understanding of MDX capabilities now. But the more I learn about it the more impressed I get, and the better demos I can do.

7 comments:

  1. Hey Jerome,

    How in the hell did you get SimbaO2X to work with Mondrian?!

    I've tried an struck-out a few times. Mind offering some pointers?

    Cheers,
    ~A

    ReplyDelete
  2. Oh dear, a blast from the past! :) I copied their license file to my system folder, selected it as a connection, set the parameters in the dialog (XML url etc...I don't recall the details) and that was pretty much it. What problems are you encountering?

    ReplyDelete
  3. Sorry for not checking-back sooner. I eventually got it working.

    Turns out I had licensing issues. Once I resolved that. Voila!

    Thanks for your input, though.

    Next challenge: running custom/canned MDX queries against a cube from xls. Fun stuff.

    Cheers,
    ~A

    ReplyDelete
  4. Good deal. Yes it's not for the faint of heart and if you have the time to putz with this stuff it can be rewarding - otherwise it can be frustrating as you can see :)

    Yours in BI,
    J.

    ReplyDelete
  5. Jerome,

    I saw the posts above, but I am still having difficulty. To all outward appearances, I have SimbaO2X Provider set up correctly on a Vista Ultimate box. I am trying to hit a Mondrian schema running under the community version of Pentaho on a virtual linux box. The virtual computer is visible on the network and I can hit the schema using Firefox from any physical computer.

    From the XMLA Provider Logon dialog box my entries are:

    URL: http://boss-desktop:8080/pentaho/Xmla
    User: shumphre
    Password: mypassword

    I get this error: An HTTP redirect to this URL was encountered: http://boss-desktop:8080/pentaho/Login:jsessionid=etc.

    This is the web page for the Pentaho User Console which the provider, of course, can't process.

    the URL in datasources.xml is:
    http://boss-desktop:8080/pentaho/Xmla

    I've googled this one down to the stubs of my fingers. Any thoughts?

    ReplyDelete
  6. @Steven, sounds like some funky connection/permissions issue perhaps? Who is generating/displaying that error?
    Use my email if you want to pursue
    jerome.pineau@gmail.com - easier than posting comments on here ;)
    Best,
    J.

    ReplyDelete
  7. Sorry to wake up an old topic, but could you make that SSB schema file available somewhere if you still have it?

    ReplyDelete