In my previous post below I received a comment (question) from Swany about inserts in the XSPRADA database engine RDM/x. Specifically, he (or she) asked:
“What happens if there is an error during the SELECT .. INTO? Are such inserts ACID or will I get partial data in a table if the system crashes?”
This is of course an excellent question, and I thought it was worth addressing on a wider level beyond just incremental loads. To place this in context, recall that ACID is a set of properties expected of transactional database management systems: Atomicity, Consistency, Isolation and Durability. If a transactional database does not meet these conditions, it is not considered “reliable”. I won’t bore the reader with yet another ACID definition. Suffice it to say that Wikipedia has a reasonable description.
Now the first question of course is whether analytical database engines supporting OLAP style work can or should meet the same criteria as classical transactional OLTP systems. By definition, analytical systems are biased for read access and updates are supposed to be rare, but inserts certainly occur as incremental loads are performed on warehouses (and data marts) at various intervals (from hours to weeks typically). In either case, an analytical engine clearly needs to handle data changes in an ACID way or data loss and corruption can occur. Similarly, data value and integrity need to be protected (locked) from concurrent (possibly conflicting) access patterns. Internal database structures are vulnerable to corruption during these transactions. So how does XSPRADA technology handle these issues?
The answer, not surprisingly, lies in XSPRADA’s “magic sauce”, namely, the mathematics of Extended Set Processing (XSP). To appreciate this, one needs to understand that all entities inside the XSPRADA mathematical model (i.e., tables, rows, fields, etc.) are represented as extended sets. And all extended sets are, by definition, immutable. This means updates to the system are implemented by creating additional extended sets, and the original ones are never mutated or deleted by subsequent processing. This ensures that data sets in the system are never corrupted or, worse, deleted by mistake.
Internally, the XSPRADA data model is fully contained in a “universe” of extended sets. This is the set of all sets. Sets in this universe are related to each other via algebraic relations (hence the “relational” part of Relational Data Miner or RDM/x). Depending on the state of the system at a specific point in time, sets are either “realized” (materialized) to disk, or “virtual”, meaning they have an internal mathematical representation defined by algebraic expressions involving other sets, but no physical existence. (This has repercussions concerning “materialized views” which I’ll attempt to discuss in a future post).
So when sets are modified by internal operations, both new and old sets remain in existence within the universe. This means updates and inserts never actually change information, but only add to it. To complete the transaction, RDM/x updates the universe metadata to include knowledge of the newly created sets (if any) along with the algebraic relations linking them to their original brethren. Genesis is maintained. This is crucial because, unlike in a conventional DBMS, the original data never needs to be re-generated (or re-created) to achieve rollback. The universe is only updated once all operations have completed successfully. If an error occurs, no harm no foul, as the previous state of the universe was maintained and still exists. In essence, the INSERT, UPDATE, DELETE functionality of the XSPRADA database is merely a logical emulation of conventional DBMS DML. Each of these statements internally results in an additive activity. In fact, UPDATE is internally implemented as DELETE+INSERT. So to recover from a failed set of operations (a transaction gone south) the system simply discards any incomplete sets, which were never made visible, and does not update the universe metadata! This mechanism enforces atomicity and consistency natively without any need for additional programming or complexity.
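To make the "build new sets, then update the metadata" idea concrete, here is a minimal Python sketch. It is not XSPRADA code and says nothing about how RDM/x is actually implemented; the `Universe` class, its `transact` method and the `"insert"`/`"delete"` operation names are all hypothetical, and plain `frozenset` objects stand in for extended sets.

```python
class Universe:
    """Toy model: immutable sets plus a catalog that is swapped only on success."""

    def __init__(self):
        # Hypothetical metadata: table name -> the immutable set currently visible.
        self._catalog = {}

    def visible(self, name):
        return self._catalog.get(name, frozenset())

    def transact(self, operations):
        # Stage changes against a private copy of the metadata.
        # Existing frozensets are never mutated; each step builds a new one.
        staged = dict(self._catalog)
        for op, name, rows in operations:
            current = staged.get(name, frozenset())
            if op == "insert":
                staged[name] = current | frozenset(rows)
            elif op == "delete":
                staged[name] = current - frozenset(rows)
            else:
                raise ValueError(f"unknown operation: {op}")
        # Reached only if every operation succeeded: publish in a single step.
        self._catalog = staged


u = Universe()
u.transact([("insert", "trades", [(1, "IBM"), (2, "MSFT")])])
before = u.visible("trades")

try:
    # The second operation fails, so the first one must not become visible.
    u.transact([("insert", "trades", [(3, "ORCL")]), ("explode", "trades", [])])
except ValueError:
    pass

after_failure = u.visible("trades")

# An "update" expressed as DELETE + INSERT inside one transaction.
u.transact([("delete", "trades", [(2, "MSFT")]), ("insert", "trades", [(2, "MSFT.O")])])
after_update = u.visible("trades")
```

In this toy, rollback needs no undo work at all: the failed transaction's staged sets are simply dropped, and the catalog still points at the untouched originals.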
On the isolation front, lists of in-process and pending operations are maintained in dynamic pipelines for each extended set. The system examines these pipelines and algebraically identifies any potential conflicts between operands. Again, the mathematics allows this to occur natively. So if the results of pending operations do not affect in-process operations, the system executes them concurrently and immediately. Conversely, if the mathematics identify a potential conflict or deadlock, pending operations are queued until conflicting running operations have completed.
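The pipeline behaviour described above can be illustrated with a small Python sketch. This is an assumption-laden toy, not the engine's algorithm: the real system is said to detect conflicts algebraically, whereas this sketch substitutes a simple read-set/write-set overlap test. `Operation`, `conflicts` and `Scheduler` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    reads: frozenset   # names of the sets this operation reads
    writes: frozenset  # names of the sets this operation produces or replaces


def conflicts(a, b):
    # Stand-in conflict rule: one operation writes something the other touches.
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


class Scheduler:
    def __init__(self):
        self.running = []      # in-process operations
        self.queued = deque()  # pending operations waiting on a conflict

    def submit(self, op):
        """Start op immediately if it conflicts with nothing running; else queue it."""
        if any(conflicts(op, r) for r in self.running):
            self.queued.append(op)
            return False
        self.running.append(op)
        return True

    def finish(self, op):
        """Mark op complete, then start any queued operations that are now clear."""
        self.running.remove(op)
        still_waiting = deque()
        while self.queued:
            candidate = self.queued.popleft()
            if any(conflicts(candidate, r) for r in self.running):
                still_waiting.append(candidate)
            else:
                self.running.append(candidate)
        self.queued = still_waiting


load = Operation("load", reads=frozenset(), writes=frozenset({"sales"}))
report = Operation("report", reads=frozenset({"sales"}), writes=frozenset())
lookup = Operation("lookup", reads=frozenset({"customers"}), writes=frozenset())

s = Scheduler()
started_load = s.submit(load)      # nothing running yet, starts at once
started_report = s.submit(report)  # reads what load writes, so it waits
started_lookup = s.submit(lookup)  # touches unrelated data, runs concurrently
s.finish(load)                     # report is released from the queue
```

The point of the sketch is only the shape of the decision: non-conflicting work runs concurrently and immediately, while conflicting work is held until the operations it depends on have completed.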
Durability is the last remaining condition. The system maintains all realized sets in persistent storage (disk drives). Although sets or parts thereof can be (and often are) kept in cache, any complete set also has an image on disk. This prevents system failures from affecting the durability of realized sets. When the system restarts, it also restarts any operations that were executing at the time of failure.
For all these reasons, XSPRADA technology is actually superior to conventional database mechanisms for enforcing ACID, as the enforcement is inherently “built-in” via the mathematics underlying the system at all times. As I mentioned in the last post, the engine is also time-invariant, meaning it can always be queried as of a given point in the past. The ability to do this without any external programming or internal modeling is significant. One use case that immediately comes to mind (to me anyway) in the wake of the recent Wall Street disasters is being able to ask a financial database to yield answers as if it were being queried months or years ago. Imagine being able to roll back time to analyze or audit results that supported past decisions and the people who signed off on them. What a concept!
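Because old sets are never destroyed, querying the past amounts to picking the right version. The following Python sketch shows that idea under stated assumptions: `VersionedTable`, `commit` and `as_of` are hypothetical names, timestamps are plain integers, and this post does not describe how RDM/x actually exposes point-in-time queries.

```python
import bisect


class VersionedTable:
    """Toy append-only table: every commit adds a new immutable version."""

    def __init__(self):
        self._times = []     # commit timestamps, strictly increasing
        self._versions = []  # frozenset visible as of the matching timestamp

    def commit(self, timestamp, rows):
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("timestamps must be strictly increasing")
        previous = self._versions[-1] if self._versions else frozenset()
        # Earlier versions are kept as they are; a new set is added alongside them.
        self._times.append(timestamp)
        self._versions.append(previous | frozenset(rows))

    def as_of(self, timestamp):
        # Find the latest version committed at or before the requested time.
        i = bisect.bisect_right(self._times, timestamp)
        return self._versions[i - 1] if i else frozenset()


positions = VersionedTable()
positions.commit(1, [("fund_a", "bond_x")])
positions.commit(5, [("fund_a", "cdo_y")])

before_anything = positions.as_of(0)  # no data existed yet
mid_history = positions.as_of(3)      # only the first commit is visible
latest = positions.as_of(5)           # both commits are visible
```

An auditor asking "what did the book look like at time 3?" gets the answer from a lookup, with no restore from backup and no hand-built history tables.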
If you'd like to take the XSPRADA database out for a spin (and pull the plug in the middle of using it just to see what happens