Monday, February 26, 2007

Cheetah spot by spot: LAST COMMITTED READ

This is the first of a series of articles about the announced features of IDS v11.10, code named Cheetah and currently available as Beta release. I'll try to show the features in random order. Being the first doesn't mean it's more or less important than other features.

LAST COMMITTED READ is a new isolation level that is available in Cheetah. As you probably know, the isolation levels define how "isolated" one session's transactions are from other sessions. If you're not familiar with the subject I suggest you search online. Wikipedia has a nice article on this.

If you ever did any programming in Informix databases you surely understand the subject and you know that the two most used isolation levels are COMMITTED READ (the default for logging databases) and DIRTY READ. In short terms their definitions are:

  • COMMITTED READ
    In this isolation level, your session will "honour" other session's locks. The session's behaviour when it finds a locked resource (table row, index key) depends on the LOCK MODE you selected. If using the default (LOCK MODE NOT WAIT), you'll get an error immediately. If using a time wait (LOCK MODE TO WAIT [x seconds]), it will wait the maximum number of seconds for the lock release. If the lock remains after that timeout you'll get the same error as with lock mode NOT WAIT.
    This is equivalent to the ANSI COMMITTED READ isolation level
  • DIRTY READ
    In this mode, your session won't "see" any locks. It will not give you errors or wait. It will read whatever value is available at the moment. It is called "dirty" because this means you may get values that aren't and possibly will never be committed in the database.
    This is equivalent to the ANSI UNCOMMITTED READ
Most of the time you have probably used COMMITTED READ and we usually try to minimize the time we're holding locks. This way we minimize the number of lock waits and lock related errors. Sometimes, depending on the kind of data and what we're doing with it we can use DIRTY READ.

Note also that this lock waiting and/or lock ignoring happens when we're SELECTing data. Obviously, if you want to change the data, even in DIRTY READ you'll have to wait for any lock holder to finish his transaction and unlock the resource.

So, what does Cheetah bring into this issue? A new isolation level called "COMMITTED READ LAST COMMITTED".
This isolation level is similar to COMMITTED READ in the sense that it only returns committed data, but with the great advantage that your reads will never block. If your session hits a locked resource, the IDS engine will retrieve the value that was there before the current lock holder session made any change. That's where the name "LAST COMMITTED" came...

This information is always available because if the lock holding transaction tries to ROLLBACK WORK, the original value will have to be restored.

Some of you with knowledge from other RDBMS like Oracle or Postgres may be wondering what's new about this... Well, historically there have been two types of RDBMS: The versioning (each transaction gets a "version" number that marks all the data it changes) and the non-versioning. The versioning RDBMSs have always this sort of isolation levels because each transaction sees the "current image" of the Database. But the non-versioning RDBMS usually didn't implement this. They have more light-weight reading primitives that read the data directly (not the before images marked with earlier versions stamps) and this causes the locking conflicts.
Also, some versioning RDBMS don't support the ANSI UNCOMMITTED READ, because the way they read doesn't allow it.

So, it's has been a kind of trade-off. But currently, most of the non-versioning databases implement this kind of COMMITTED isolation without blocking the readers.

Ok... But how do we use it?!

Well, the obvious answer is by issuing the "SET ISOLATION TO COMMITTED READ LAST COMMITTED". After this statement your session will be working in this new mode and it won't get blocked by any row or index locks while reading.

But this would force you to change your existing application code... and in some cases you possibly don't have access to that. So what can be done? Ah!... The nice people of R&D did think about this and you have three solutions:

  1. There is a new ONCONFIG parameter called USELASTCOMMITTED
    You can set it to several values and depending of these values, the database server will automatically "promote" the specified isolation level to the new mode. The values are:
    1. 'NONE'
      This is the same as clearing it. No isolation level will be promoted. This is also the default value
    2. 'COMMITTED READ'
      The transactions with traditional COMMITTED READ will work in the new isolation level
    3. 'DIRTY READ'
      The transactions with traditional DIRTY READ will work in the new isolation level
    4. 'ALL'
      The transactions with traditional DIRTY READ and the transactions with traditional COMMITTED READ will work in the new isolation level

  2. You can also use USELASTCOMMITTED as a client side environment variable. The syntax is the same

  3. If you want this promotion to happen only to specific users, you can use the new feature that allows you to force the execution of a stored procedure (sysdbopen) when a user connects to a database. Inside this procedure you could use the SET ISOLATION.... statement
Some limitations or considerations:
  • This feature requires that the tables use the ROW level locking. Tables with PAGE level locking will inhibit the feature in that transaction.
  • When you established this new isolation level for a session, if it makes a distributed query to a remote database server that don't support this feature (i.e.: any pre v11.10) it will fallback to COMMITTED READ
  • If the table is a RAW (non logged) table the feature will not work
  • If the table has some user-defined columns and these don't support the new mode, the feature will not work

Now, let's see an example. To test this you'll need two sessions. We'll use the customer table from stores_demo database. Please make sure it's defined to use row level locking.

Session one:
  1. connect to the stores_demo
  2. show the customer data (fname, lname) for customer number 101
  3. upate this data to "John Doe" inside a transaction and maintain the transaction open



cheetah@PacMan.domus.online.pt:informix-> dbaccess stores_demo -

Database selected.

> select fname, lname from customer where customer_num = 101;


fname lname

Ludwig Pauli

1 row(s) retrieved.

> begin work; update customer set fname='John', lname='Doe' where customer_num=101;

Started transaction.


1 row(s) updated.

>





Now, in session 2:
  1. Set the isolation to committed read (default) and select the data: An error is returned
  2. Set the isolation to dirty read: the "current" data, as changed by session 1
  3. Set the isolation to the new mode: we get the original (pre session 1 update) data



cheetah@PacMan.domus.online.pt:informix-> dbaccess stores_demo -

Database selected.
> set isolation to committed read; select fname, lname from customer where customer_num = 101;
Isolation level set.


fname lname


244: Could not do a physical-order read to fetch next row.

107: ISAM error: record is locked.
Error in line 1
Near character position 90
> set isolation to dirty read; select fname, lname from customer where customer_num = 101;

Isolation level set.



fname lname

John Doe

1 row(s) retrieved.

> set isolation to committed read last committed; select fname, lname from customer where customer_num = 101;

Isolation level set.



fname lname

Ludwig Pauli

1 row(s) retrieved.

>


One final note: Recent versions of IDS have an ONCONFIG parameter (DEF_TABLE_LOCKMODE) where you can configure the default lock mode for tables created without the "LOCK MODE" clause. The current Cheetah Beta drop apparently doesn't support this. So, if you're counting on this to make your tests it won't work.

You can see aditional info about this subject at:

Tuesday, February 13, 2007

Cheetah in the wild!

No chat talk:


The Open Beta of Cheetah was officially announced. You can get in on:
http://www.ibm.com/software/data/informix/ids/new

IBM create a public web support forum at: Cheetah open Beta support
and there is a new infocenter at: Cheetah InfoCenter
As promised, IBM delivered a public open Beta product. This will give the chance to everyone of us to test it, view and work with the new incredible features that R&D have put in this release.

The feature specification is really impressive. Some of them are by far the deepest changes in IDS I've seen since I started working with it in 1998. It doesn't matter what is your role (DBA, Developer...). You'll find some features specifically directed to some of your major pain points.

A quick list of the features:

  • Backup and Restore to Directories with ontape
    Adds flexibility specially if used with the next feature
  • Continuous Logical Log Restore
    Besides HDR, you have another option to disaster recovery or to get a quick copy of your instance. The second instance will be on rollforward until you need it
  • Improved Parallelism during Backup and Restore
    OnBar got a lot smarter. It's able to make faster backups and specially restores. Full system backups can be done in parallel now (as it should have been from the start if I may say it... :) )
  • Named Parameters in a Callable Statement
  • Indexable Binary Data Types
    Well... not really new. We've got it in 10.00.xC6. An evidence on how easy IDS features travel through product families
  • Encrypted Communications for HDR
    Enterprise Replication (ER) already had it. Now you can use encryption on High availability Data Replication (HDR). It's not a must have in LAN, but very useful if you replicate across public or semi-public networks.
  • RTO Policy to Manage Server Restart
    Killer... IDS can tune itself to assure a user defined fast recovery interval. You specify that want the server up again in 2 minutes... IDS will manage checkpoint intervals to assure this. It can also auto tune LRU cleaning and AIO VPs. The RTO is a big step in self in direction to a self managed system. Of course you may wonder why we need it, since it's normally up and running without issues? Well, IDS tends to have greater uptimes than the hardware... at least in my experience.
  • Non-blocking Checkpoints
    Wow... Serial killer! Got your system tuned for everyday load and have seen it 'hang' in a very long checkpoint after some special maintenance (index, table moving....)? No more! Checkpoint locks will be a small (if ever noticeable) fraction of a second. LRU flushing will be done concurrently with normal transaction processing.
    It's easy to imagine that this kind of feature takes very serious thinking, working and testing. But R&D did it. Besides this you'll get a lot of very useful info about checkpoints with a new onstat -g ckp option...
  • Trigger Enhancements
    One trigger for each type operation is not enough? Now you can have several triggers on the same table for each type of operation (Insert, Delete, Update... )
  • Truncate Replicated Tables
    You can now use truncate on tables involved in ER
  • Derived tables in the FROM Clause of Queries
    Ty
  • Tired of writing "SELECT .... FROM TABLE (....)"?. Ok... Then use the general "SELECT ... FROM (SELECT ...). A usability feature that will facilitate application porting from other RDBMS
  • Index Self-Join Query Plans
    Not new... Reviewed in the last article. Another example of code portability between product families
  • Optimizer Directives in ANSI-Compliant Joined Queries
  • Optimistic concurrency
    Serial killer part II... You don't like dirty reads (who likes?!) and your application keeps waiting on locks because of bad programming or due to high concurrency? No problem! Just set an ONCONFIG parameter, or an environment variable and either DIRTY READ or COMMITTED READ or both (or none) will work in the new mode: LAST COMMITTED READ. No more lock waiting. You'll get the image before commit. No need to change application code
  • Enhanced Data Types Support in Cross-Server Distributed Queries
    Some improvements in distributed queries. More data types supported
  • Improved Statistics Maintenance
    If like me, you're mad for having to scan a table to create statistics for some columns where you've just created an index (which makes a scan of the data and a sort...), than you can rest. Now, an index creation will automatically calculate column distributions and index and table data.
  • Deployment Wizard
    New features. Much more control about want you want installed. Your installations can become smaller (not that they were big...)
  • Performance Improvements for Enterprise Replication like trigger firing during synchronization (already in 10.00.xC6) and dynamically change ER configuration parameters and environment variables
  • Schedule Administrative Tasks
    Database cron. You can schedule jobs inside the database
  • Query Drill-Down to Analyze Recent SQL Statements
    Tracing SQL. Choose who you want to trace, how much etc. Turn it on, off and change it dynamically
  • SQL Administration API
    Ever wanted to add a chunk through dbaccess? If you're like me, probably not, BUT... this will allow every maintenance to be done through client side tools. I think this will fuel a lot of innovation around IDS management environments.
  • LBAC
    Label Based Access Control. Just like in DB2. This is powerful and clean. But my first impression is that it will require lots of thinking before use. You have to structure you access policies. The tooling is there, but has to be correctly used.
  • XML
    XML publishing, XML extraction... The type of data centric XML use that we need. DB2 on the other side (Viper release, v9) has an XML engine inside it. The perfect choice for document centric applications.
  • Node datablade
    Indexable hierarchical data. It was available, but now is part of the product and fully supported
  • sysdbopen/sysdbclose
    Ah!!! It was rumored to appear in 10.00.xC1, but never did it. Now it's here. Do you want to change lock mode or isolation level for blackbox applications? No problem. Create a sysdbopen procedure and the user(s) will run whatever you put in there right after connect and before anything else that the application sends.
So... I've already tested some of this features and I got really impressed. Now it's possible for everybody to test them. Bare in mind a few thoughts:
  1. This is a Beta product. It's not supposed to be perfect. There are issues and a few features aren't already there. (stay tuned for more Beta drops)
  2. Don't use it in production. The product should be able to migrate a supported version to the 11.10 release. But it's not guaranteed that you can migrate between Beta drops. Use it for test only purposes and to inspect the new features.
  3. The product has a time limitation... Again don't use it for serious work. Use it for serious testing
  4. Feedback, feedback... IBM is surely making a big effort to push this. It's the first time I recall seeing an Informix product in open beta testing. The R&D and support personal are very motivated by this effort. If you can spare a few moments, please report any issues you find, give suggestions and ask questions. Your response will be the first reward these people will feel, and believe me... They deserve it.

A final non technical note:

Ever since the acquisition in 2001, we've been listening FUD from competitors saying that IBM would kill Informix. It wouldn't make sense to keep two competitive RDBMS systems. According to this FUD customer should move. Of course, people launching the FUD were expecting customers to migrate in their directions. Now... What do we all know about FUD? We all know it stands for Fear, Uncertainty and Doubt. It doesn't stand on Facts, it's Unfounded and we must Doubt it. Let's recall the facts:
  1. Since 2001, IBM released 3 major IDS releases:
    1. 9.40 with major changes in scalability (Big chunks), usability (manageabilty) and security (PAM)
    2. 10.0 with major code changes (configurable page size, table level restore, ER enhancements, default roles, better partitioning....)
    3. 11.1 with all of the above
  2. IBM kept support for IDS 7.31. In fact 7.31 was enhanced with the new btree cleaner and other minor features. Some intermediate release levels were discontinued (9.21, 9.30)
  3. Meanwhile, since 2001 an RDBMS vendor name O.... has discontinued two products (every release levels): version 8 and version 9 (this year, more or less at the same time Cheetah will be released) forcing it's customers to migrate. And let me remind you how easy is to migrate IDS...
So, thinking as a customer, whom do you think will protect your investment? If there was still any doubt about the IBM commitment to Informix, Cheetah should be enough to clear it.
But as we all know, FUD will always be around. Hopefully we'll all be too busy playing with this beloved "cat", exploring new ways to bring value to our organizations and customers to pay any attention to it.

Regards.

Saturday, February 03, 2007

1st safari tour on cheetah territory: FULL

Accordingly to Informix-Zone, the first public customer workshop to show the upcoming IDS release is fully booked.

Our German friends will have the opportunity to see one of Cheetah's first appearance on 15, February in Munich. Munich has been for a long time a technical center of expertize in Informix technology, so it's not surprisingly to see this happen there.

So, this city, well known for it's beer festival will also have the privilege to be one of the first to see our beloved animal outside of it's habitat (that being the IBM labs).

I really hope they'll enjoy it and that it fulfils our expectations. I would love to get my hands on this feline, but I'll have to wait...