Data Integrity

CSA provides primitives that an application program can use to preserve data integrity in most situations. Consider the following code fragment:

 echo some data > /path/to/file1 || csaExit.fault 0009 /path/to/file1 
 echo more data > /path/to/file2 || csaExit.fault 0009 /path/to/file2 
 
 csaExit.ok 
 
Here "some data" is written to "file1" first. If that operation fails, the first error exit is taken; otherwise "more data" is written to "file2", and the program completes successfully. But what if the contents of "file1" and "file2" are logically related? That is, what if either both writes must succeed, or the first one must be rolled back when the second fails? With the scheme above, if an error occurs when writing to "file2" we exit, but by then "file1" has already been changed, and the two files may be left in an inconsistent state.
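
The problem is easy to reproduce with plain shell and no CSA at all. The following is only a sketch: the throwaway directory and the "demo_dir" and "demo_failed" names are made up for the demonstration, and the second write is forced to fail by pointing it at a directory that does not exist.

```shell
#!/bin/sh
# Hypothetical demo: throwaway paths, nothing here is part of CSA.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT

echo "old data" > "$demo_dir/file1"

# The first write succeeds...
echo "some data" > "$demo_dir/file1" || exit 1

# ...the second one fails, because its directory is missing.
{ echo "more data" > "$demo_dir/no-such-dir/file2"; } 2>/dev/null || demo_failed=yes

# file1 already carries the new content, although the pair was never completed.
cat "$demo_dir/file1"    # prints: some data
```

After the failure, "file1" holds the new content while "file2" was never written, which is exactly the inconsistent state described above.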

A better approach is therefore to bind the two actions together, so that either both of them take effect or neither does. Using the proper CSA functions, the previous example can be rewritten as:

 csaOpen /path/to/file1 || csaExit.fault 
 echo some data > $CSA_RESULT || csaExit.fault 0008 $CSA_RESULT 
 
 csaOpen /path/to/file2 || csaExit.fault 
 echo more data > $CSA_RESULT || csaExit.fault 0008 $CSA_RESULT 
 
 csaCommit || csaExit.fault 
 
 csaExit.ok 
 
As shown, the csaOpen function is called for each file that is to be written to. The function stores the path to a temporary workfile in the $CSA_RESULT variable. Each time csaOpen is called, it sets $CSA_RESULT to point at a different temporary file. Our program will then apply the changes to the temporary files returned by csaOpen, rather than modifying the actual targets.

When we are done with the changes, we make them permanent by calling the csaCommit function. This function builds the sequence of shell-level commands needed to copy the temporary workfiles onto their respective targets, and then runs those commands. If the program is killed with a trappable interrupt while those commands are running, the commit operations are recorded in a special "csa-commit" file. The next time a CSA program of the same application (i.e. one using the same $CSA_ROOT/csa.rc application profile) is run, it is forced to run the "csa-commit" file, if any, thus completing whatever was left incomplete. This scheme is probably not bullet-proof and may not cope with every possible situation, but I have found it adequate in most cases, as it effectively binds together operations that would otherwise be unrelated and prone to leaving the data inconsistent.
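
To make the replay idea more concrete, here is a rough sketch in plain shell. It is not how CSA implements it: the "JOURNAL" variable and the "quote", "replay_pending" and "journaled_copy" functions are hypothetical names of mine. It also simplifies things by writing the pending commands to the journal before copying, rather than from an interrupt handler as described above. Finally, it assumes the workfiles are still around when the journal is replayed.

```shell
#!/bin/sh
# Sketch only: JOURNAL and all function names are hypothetical, not CSA's.
JOURNAL=${JOURNAL:-./pending-commit}

# Wrap a path in single quotes so it survives being written into a script.
quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Run whatever an earlier, interrupted commit left behind, then forget it.
replay_pending() {
    [ -f "$JOURNAL" ] || return 0     # nothing left over
    sh "$JOURNAL" || return 1         # keep the journal if a copy fails
    rm -f "$JOURNAL"
}

# journaled_copy WORK TARGET [WORK TARGET ...]
# Record every copy command first, then execute the record.
journaled_copy() {
    : > "$JOURNAL.tmp" || return 1
    while [ "$#" -ge 2 ]; do
        printf 'cp %s %s || exit 1\n' "$(quote "$1")" "$(quote "$2")" >> "$JOURNAL.tmp"
        shift 2
    done
    mv "$JOURNAL.tmp" "$JOURNAL" || return 1   # journal appears in one step
    replay_pending
}
```

A program built this way would call replay_pending at startup and journaled_copy at commit time. Since copying a workfile onto its target twice gives the same result as copying it once, running a leftover journal again from the top is harmless.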

In any case, until csaCommit is called no changes actually occur to the real data. So, if the program exits on an error before calling csaCommit, none of the previously scheduled changes will be applied, and all the temporary workfiles will be automatically removed on exit.
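
For readers who want to see the general pattern without CSA, the following is a minimal sketch of the same stage-then-commit idea in plain shell. The names "stage_open", "stage_commit", "STAGE_DIR" and "STAGE_RESULT" are my own inventions for this example and only mimic the roles of csaOpen, csaCommit and $CSA_RESULT; the real functions surely do more.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that imitate the pattern, not CSA itself.
STAGE_DIR=$(mktemp -d) || exit 1
STAGE_LIST="$STAGE_DIR/list"        # one "workfile<TAB>target" pair per line
: > "$STAGE_LIST"
TAB=$(printf '\t')

# Whatever way the program exits, the workfiles go away with it.
trap 'rm -rf "$STAGE_DIR"' EXIT

# stage_open TARGET: create a workfile for TARGET, leave its path in STAGE_RESULT.
stage_open() {
    STAGE_RESULT=$(mktemp "$STAGE_DIR/work.XXXXXX") || return 1
    printf '%s\t%s\n' "$STAGE_RESULT" "$1" >> "$STAGE_LIST"
}

# stage_commit: copy every workfile onto its real target.
stage_commit() {
    while IFS=$TAB read -r work target; do
        cp "$work" "$target" || return 1
    done < "$STAGE_LIST"
}
```

A program would call stage_open once per target, write to "$STAGE_RESULT" each time, and call stage_commit at the very end. If it exits before that point, the targets are never touched and the trap removes the workfiles. This sketch assumes target paths contain no tabs or newlines.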

