Sunday, February 22, 2009

Try the hotpockets they're breathtaking!

So, as we saw earlier, XSPRADA belongs to a new generation of software companies building specialized analytical database engines for business intelligence purposes. And as previously described, each of these companies concocts a “special recipe” of “new-age” engineering features to offer simpler, faster and cheaper results than the competition. At least, that’s the marketing spiel. So what makes XSPRADA a little different from the rest? Let’s re-examine the secret sauce ingredients from our last post and see how (and if) each is used in the XSPRADA kitchen:

Row-based or column-based?
Neither. XSPRADA doesn’t think in terms of rows or columns, or any other pre-determined canned data structure, tabulated or not. The XSPRADA approach is one of “adaptive data re-structuring” where data is re-arranged on disk based on queries being asked. If it makes sense to re-structure a given piece of information in a particular way, then XSPRADA does just that. This means that, in some cases, the information will indeed be re-arranged into columns, but such an arrangement is not a hard-coded rule. Rather, it is adapted to the queries being received, taking into consideration the nature and type of the data itself. So, it is entirely possible to have the same information re-arranged on disk in numerous different ways simultaneously to satisfy different querying patterns.
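
To make the idea concrete, here is a toy sketch of usage-driven restructuring. It is emphatically not XSPRADA's implementation: the class name AdaptiveStore, the threshold knob and the in-memory lists standing in for disk are all assumptions made up for illustration. It only shows the principle that a columnar copy of a field appears once queries keep asking for that field, while the original layout stays in place.

```python
from collections import Counter


class AdaptiveStore:
    """Toy store: keeps rows as loaded and adds column layouts on demand."""

    def __init__(self, rows, threshold=3):
        self.rows = list(rows)      # original layout: a list of dicts
        self.columns = {}           # extra layouts, built lazily per field
        self.hits = Counter()       # how often each field has been scanned
        self.threshold = threshold  # hypothetical tuning knob

    def scan(self, field):
        """Return all values of one field, restructuring if it is hot."""
        self.hits[field] += 1
        if field in self.columns:
            return self.columns[field]  # served from the columnar copy
        if self.hits[field] >= self.threshold:
            # Hot field: materialize a contiguous column next to the rows.
            self.columns[field] = [row[field] for row in self.rows]
            return self.columns[field]
        return [row[field] for row in self.rows]  # cold path: walk the rows


store = AdaptiveStore([{"id": 1, "amt": 10}, {"id": 2, "amt": 25}])
for _ in range(3):
    total = sum(store.scan("amt"))
```

After the third scan, "amt" exists both in its original row form and as a column, while "id" (never queried) is left alone. The same data can therefore sit in several arrangements at once, one per querying pattern.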

Caching and direct memory access
In the XSPRADA engine, all queries are automatically converted to their algebraic (mathematical) form and cached as such in memory. Result sets may or may not end up realized on disk, depending on need. In that respect, “materialized views” are generated automatically in RAM by the engine, based on usage and without explicit user intervention. Basically, if you hit a certain area of the data frequently, you can be confident it will end up being cached fairly quickly. A similar process occurs when a user sends in “slice and dice” queries. XSPRADA detects this behavior and immediately starts building an OLAP cube internally, possibly aggregating information as needed to answer subsequent queries faster. As such, there is no explicit need to pre-define cubes or cache aggregates because these steps are taken automatically based on usage patterns.
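
A rough sketch of the caching idea, under loud assumptions: the QueryCache class, the tuple encoding of predicates and the tiny in-memory DATA table are all hypothetical, and a frozenset of predicates is a very crude stand-in for a real algebraic form. The point is only that two differently written queries with the same meaning land on the same cache entry.

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# Hypothetical sample data standing in for a table on disk.
DATA = {
    "sales": [
        {"region": "west", "amt": 10},
        {"region": "east", "amt": 25},
        {"region": "west", "amt": 40},
    ]
}


def run_query(table, predicates):
    """Full scan: keep the rows that satisfy every (field, op, value) predicate."""
    return [
        row
        for row in DATA[table]
        if all(OPS[op](row[field], value) for field, op, value in predicates)
    ]


class QueryCache:
    """Caches results under an order-independent form of the query."""

    def __init__(self, execute):
        self.execute = execute
        self.results = {}      # canonical form -> cached result set
        self.executions = 0    # how many times we really hit the data

    def query(self, table, predicates):
        # AND is commutative, so predicate order must not affect the key.
        key = (table, frozenset(predicates))
        if key not in self.results:
            self.executions += 1
            self.results[key] = self.execute(table, predicates)
        return self.results[key]


cache = QueryCache(run_query)
a = cache.query("sales", [("region", "=", "west"), ("amt", ">", 20)])
b = cache.query("sales", [("amt", ">", 20), ("region", "=", "west")])
```

The second call is answered from memory even though its predicates arrive in a different order, which is the "materialized view without user intervention" behavior in miniature.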


Separating engine functions to judiciously place some "closer to the data" for pre-processing queries
There is no need for such logical-physical layer optimizations in the XSPRADA system because the engine is already as close to the bits on disk as can be. As a matter of fact, the XSPRADA engine reads data directly off the disk store and requires no specific “loading” step. The data is ready for use as soon as a SQL CREATE TABLE DDL statement (or an INSERT INTO statement) is issued.


Maximizing storage I/O
XSPRADA manages its own disk I/O using a channelized streamed architecture. The more attached storage you have, the more channels you can support and the faster the I/O becomes. As a matter of fact, the XSPRADA engine scales better with disk space than with CPU cores. The XSPRADA I/O system is streamed rather than random-access based, as many conventional transactional systems are. Through adaptive data restructuring, the XSPRADA engine dynamically paginates the most frequently accessed data into contiguous “islands”, yielding much higher performance when streaming to and from the dedicated pages.


Automatic indexing algorithms either at load time, processing time or both.
The XSPRADA engine requires no indexes, no FK/PK or any other standard database constraint typically seen in OLTP systems to pre-define data relationships. XSPRADA indexes the data internally based on inspection and on the queries asked. Internal relationships are also inferred automatically based on usage patterns. This is particularly attractive in mixed-workload environments, where different areas of the data may have different relationships and/or models. In conventional systems, addressing both “worlds” at the same time in ad-hoc fashion is rarely practical or fast.
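
As an illustration only, usage-driven indexing can be sketched like this. The AutoIndexedTable class and its threshold parameter are invented for the example, and a plain hash index is an assumption; the post does not say what kind of internal indexes XSPRADA actually builds.

```python
from collections import Counter, defaultdict


class AutoIndexedTable:
    """Toy table that builds a hash index on a field once lookups repeat."""

    def __init__(self, rows, threshold=2):
        self.rows = list(rows)
        self.indexes = {}           # field -> {value: [matching rows]}
        self.lookups = Counter()    # equality lookups seen per field
        self.threshold = threshold  # hypothetical tuning knob

    def find(self, field, value):
        self.lookups[field] += 1
        if field not in self.indexes and self.lookups[field] >= self.threshold:
            # Nobody declared this index; it exists because of usage.
            index = defaultdict(list)
            for row in self.rows:
                index[row[field]].append(row)
            self.indexes[field] = index
        if field in self.indexes:
            return self.indexes[field].get(value, [])
        return [row for row in self.rows if row[field] == value]  # full scan


table = AutoIndexedTable(
    [{"region": "west", "amt": 10}, {"region": "east", "amt": 25}]
)
first = table.find("region", "west")   # answered by a full scan
second = table.find("region", "west")  # triggers and uses the index
```

Both calls return the same rows; the only difference is that the second one leaves an index behind for every later lookup on "region".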


Proprietary optimization of SQL queries
XSPRADA aggressively optimizes all incoming SQL queries. The SQL is transformed into algebraic expressions then mathematically optimized, processed and stored as such. As a consequence, the system is more forgiving of poorly tuned SQL queries than others might be. XSPRADA reduces all SQL queries into its internal mathematical model, whose integrity is maintained at all times. Based on your current query and looking at past history, XSPRADA can determine a more optimal query path (if necessary) than the one being suggested and re-arrange the query accordingly.
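
For a flavor of what "transformed into algebraic expressions then mathematically optimized" can mean, here is a minimal sketch using one textbook relational-algebra identity: a selection applied on top of another selection equals a single selection on the conjunction of both predicates. The Table and Select classes and the optimize function are hypothetical names, and this is ordinary relational algebra, not XSPRADA's internal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Select:
    predicates: frozenset  # conjunction of predicate strings
    source: object         # a Table or another Select


def optimize(expr):
    """Collapse nested selections into one selection over the base table."""
    if isinstance(expr, Select):
        inner = optimize(expr.source)
        if isinstance(inner, Select):
            # select_p(select_q(R)) == select_{p AND q}(R)
            return Select(expr.predicates | inner.predicates, inner.source)
        return Select(expr.predicates, inner)
    return expr


# Two ways a user might nest the same filters.
q1 = Select(
    frozenset({"amt > 20"}),
    Select(frozenset({"region = 'west'"}), Table("sales")),
)
q2 = Select(
    frozenset({"region = 'west'"}),
    Select(frozenset({"amt > 20"}), Table("sales")),
)
same_plan = optimize(q1) == optimize(q2)
```

Both queries reduce to the same expression, so however the filters were nested, the engine ends up with a single plan to execute (and a single form to store).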


Custom hardware solutions (SMP, grid or MPP based, shared-nothing or shared-all implementations)
At the moment, XSPRADA runs on multi-core multi-disk SMP commodity hardware. XSPRADA is a pure software play.


Software compression
XSPRADA does not implement any form of compression at the current time. Although it is of course possible to compress volumes using NTFS (on Windows platforms), such a mechanism is independent of and transparent to the XSPRADA software.

Now, there is undoubtedly more to this than just a few simple feature/functionality points. But the question remains: what makes XSPRADA technology significantly different or better than others? The answer to that is called ALGEBRAIX, but that’s just a fancy term for some pretty impressive mathematics called extended set theory. In a subsequent post, we’ll explore how that magic works without boring the reader to death :)
