NoSQL operator: constraint

Experimental NoSQL Referential Integrity Schema processor

Usage: constraint [options] column-schema

Options:
    --input (-i) 'file'
      Read input from 'file' instead of STDIN.

    --output (-o) 'file'
      Write output to 'file' instead of STDOUT.

    --help (-h)
      Display this help text.

    --quoting-style (-Q) shell-type
      Specifies the quoting stile to use in the output shell script.
      Supported values are 'sh' and 'rc'. Default value is 'sh'.

    --prefix (-P) 'string'
      Prefix each column name in the column-schema table with 'string'.

    --path (-p) 'string'
      Consider the column-schema table file to be relative to 'string/'.

    --lock-command (-l) 'string'
      Specify the table locking command to be used in the resulting
      shell script. Default: 'lockfile -r0'

    --lock-only (-L)
      Output only the appropriate locking shell script fragment, and
      nothing else.

    --test-only (-T)
      Perform the needed constraint checks but do not output the
      relevant shell script fragments. Note that if if this option is
      specified, the program will output either 'true' or 'false',
      depending on whether the relevant tests succeeded or failed.

    --quiet (-q)
      Refrain from printing messages to STDERR.

    --lock-error-handler (-e)
      What statement to use for locking error handling in the resulting
      shell script. For example, the statement:

      constraint -e lockErrorHandler nosql.schema < nosql.input

      will produce the following output:

      lockfile -r0 .lock || lockErrorHandler

      Default is no locking error handler.

    --lock-suffix (-s)
      What to use for lockfile suffix. Default: '.lock'

    --lock-also (-a) 'string'
      What else to lock, if there are objects that need also to be locked
      but that cannot be desumed from the database schema. There can be
      as many '-a' options as needed.

    --quoting-style (-Q) 'type'
      What shell quoting style to use in the resulting script, where
      'type' can be either 'sh' or 'rc'. Default is 'sh'.

    --initial-tables (-I) 'string'
      This option is still undocumented, see the program source code
      for more info.


Notes:

This operator reads an input table from stdin, containing "Insert" and
"Delete" statements, and tests those statements against a database
"column-schema" table passed on the command line to assess whether they
violate any relational constraints established on the database. A shell
program fragment is then printed to STDOUT, with the NoSQL commands
needed to access the involved pieces of data and perform the actual
integrity tests. The calling program will then execute the resulting
script and take appropriate actions based on its return code.

If any violations are detected, then the resulting script will contain
just the "false" statement and appropriate explanatory messages will be
printed to STDERR.

The statements produced to access the involved data will depend also on
the content of the "Flags" column of the "column-schema" table.

Please refer to the following Web page for more information regarding
the NoSQL Database Schema:

http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/NoSQL/Database%20Schema

Please note that the operator currently only uses the database
column-schema, not the table-schema. This means that table names
referred to in the program input statemets, as well as those printed
to the resulting shell script, will be the symbolic table names (or
'aliases'). This requires that a table name must be the same as the
name of the actual file hosting the table. Any leading filesystem
path information can be provided to 'constraint' through the '--path'
command-line option. This limitation will eventually be removed when
'constraint' will make use also of the table-schema file, in some future
release of NoSQL.

A use-case of the 'constraint' operator follows, based on the sample
tables described at the above URL:

Example 1:

We want to insert a new row into the 'course_student' relation, having
Course="math-4" and Student="2". The relevant input to the 'constraint'
to tell it what we are going to do will have to be the following table:

   Column    Table           Value   Action
   -------   --------------  ------  ------
   Course    course_student  math-4  Insert
   Student   course_student  2       Insert


Assuming the above table is stored in a file called 'actions.table',
and with the database schema (described at the above URL) stored in
'school-db-schema.table', here is how we can get 'constraint' to do
its job:

   constraint school-db-schema.table < actions.table

and here is the shell script printed by the operator to STDOUT:

   lockfile -r0 course.lock course_student.lock student.lock
   true &&
   keysearch -t "2" student &&
   awktable -i course -- "BEGIN{_RC=1}\$Course==\"math-4\"{exit(_RC=0)}END{exit(_RC)}"


Let's see what that means, line-by-line:

The first line is the shell statement needed to lock the relevant tables,
in alphabetical order. To prevent deadlocks, all database operations done
by indipendent programs, and that require locking, must try lock the
involved tables always in the same order. Option '-e' of 'constraint' can
be used to specify what to do if locking fails, for instance:

   constraint -e my-lock-error-handler school-db-schema.table < actions.table

   lockfile -r0 course.lock course_student.lock student.lock || my-lock-error-handler
   ...

The second output line ('true') is a no-op, which is needed in some
circumstances (see 'constraint' source code for more info).

The third output line ('keysearch -t "2" student') is the command needed
to assess whether a record having primary key equal to '2' exists in the
'student' table. This is because the database schema contains the following
definitions:

   Column   Table           Flags  Parent   Insert     Delete
   -------  --------------  -----  -------  ---------  --------
   Student  course_student  k      student  Dependent
   Student  student         Kb                         Restrict


The above line having Insert="Dependent" tells 'constraint' that inserting
a new 'Student' value into the 'course_student' table requires that a
primary key with the same value exist in the 'student' table. Also, flag
'b' on the 'Student' field of the 'student' table tells 'constraint' that
binary search can be used to access the requested piece of data, hence
the row 'keysearch -t "2" student' produced by the operator.

Likewise, the same database schema contains also:

   Column   Table           Flags  Parent   Insert     Delete
   -------  --------------  -----  -------  ---------  --------
   Course   course_student  k      course   Dependent
   Course   course          K                          Restrict


This tells 'constraint' that inserting a new 'Course' value into the
'course_student' table also requires that a primary key with the same
value exist in the 'course' table. Since no access-method related flags
are set for that key field in the database schema, then linear search
is assumed and hence the line 'awktable -i ...' in the shell script
produced by the operator (note: to date 'constraint' only supports
linear and binary search; in the future more access methods may be
supported, see the above URL for the complete list).

By executing the shell script produced by 'constraint', the calling
program can perform the actions needed to fulfill the requested database
operation with much reduced risk of breaking the database integrity.

Beside printing the shell script just described, 'constraint' prints
informational messages to STDERR. In the scenario just described these
would be:

constraint: insert test triggered
constraint: insert test triggered

Since 'actions.table' contains two insert statements, two messages are
output.

Let's see an example of deleting a record when doing so without proper
care would corrupt the database integrity. This time our 'actions.table'
contains:

   Column    Table   Value   Action
   -------   ------  ------  ------
   Course    course  chem-1  Delete


That is, we are going to delete the record having Course="chem-1"
from the 'course' table. By virtue of the same schema table contents
already described further up, that contains Delete="Restrict" for the
'Course' field of the 'course' table, deleting a course requires that
also all subscriptions of students to that course are dropped from the
'course_student' table, or data integrity wil be compromised. And here
is the script produced by 'constraint' in this case:


   true &&
   ! awktable -i course_student -- "BEGIN{_RC=1}\$Course==\"chem-1\"{exit(_RC=0)}END{exit(_RC)}"


That is, when executed as a shell script the above fragment will exit
with a return code '0' only if the 'course_student' table does not
contain any rows having Course="chem-1". The only way for this to verify
is that the application has already deleted the involved records from
the 'course_student' table in some previous stage.

The optional 'Match' column in the database field schema table is used by
'constraint' to check that the data to be inserted matched a specified AWK
regular expression. Still referring to the example database described at
http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/NoSQL/Database%20Schema ,
if our 'actions.table' contains the following directions:

   Column    Table   Value   Action
   -------   ------  ------  ------
   Day       course  abc     Insert

then 'constraint' will print the following shell code to STDOUT:

   constraint: bad value 'abc' for table 'course', column 'Day'
   false

The trailing 'false' statement is meant tto trigger a non-zero exit in
the shell script that will parse and execute the above code fragment.

Preserving data integrity, both within a table and across different
tables, with NoSQL is an affair entirely left to the application
program. NoSQL, through the 'constraint' operator just described, only
provides a "what-if" facility that applications may want to use to
help them preserving data consistency. Such facility, however, is not
"prescriptive" but only "descriptive", in line with the basic NoSQL
philosophy which does not impose any mandatory restrictions on either
applications or command-line users. In a way, this can be paralleled
with the C programming language, whereby data-typing and other possible
constraints set by the compiler can be deliberately violated by the
application programmer if she devises the need for doing so. This is
the same philosophy that underlies also the classical UNIX programming
paradigm, whereby preventing the user from doing silly things will also
prevent her from doing clever things.
Back