Database Schema

A Database Schema is a set of two files containing certain pieces of meta-information regarding the tables that comprise a NoSQL database. Such two files can take any names, but I will refer to them as the file schema and the field schema. Such schema files are just like any other NoSQL table, and can be manipulated with all the standard NoSQL commands and operators. In the explanations that follow I will make use of an hypothetical "school management" database. Our school has courses, and each course is followed by a certain number of students. We will therefore have three tables in the database:

  Course  Credit  Room   Day         Time   Teacher
  ------  ------  -----  ----------  -----  -------
  chem-1       2  HA-18  2003-10-12  14:00        1
  cs-101       3  DB-1   2003-10-11  10:00        2
  econ-1       1  RB-8   2003-10-10  09:00        2
  his-10       3  HA-18  2003-11-12  15:00        3
  Student  First    Last        Year    Phone
  -------  -------  ----------  ----  -------
        1  Thomas   Jefferson      1     9876
        2  George   Washington     1    87654
        3  Ronald   Reagan         1  7654210
        4  William  Clinton        1  6542109
  Course  Student
  ------  -------
  chem-1        1
  chem-1        2
  chem-1        3
  cs-101        2
  cs-101        4
  econ-1        1
  econ-1        2
  econ-1        3
  his-10        1

And here's what a file schema table for our "school management" database will look like, as displayed by the NoSQL justify command:

  Table           Path              Edit
  --------------  ----------------  ----------------------
  course          ./course
  course_student  ./course.student  ./course.student-edits
  student         ./student

Such a schema shows a typical many-to-many database relation (one student can take many courses and one course involves many students). Here's what the table columns mean:

Table
this is a symbolic name that is given to the associated table, and it is often the same name that is given to that particular database entity when the database is designed. The Table field must be unique across the whole database.
Path
this is the actual path -- either relative or absolute -- in the file-system, leading to the physical file that corresponds to the associated Table entry. In the sample schema above, relative paths are used, and to make clear that they are paths they were prepended by ./, which is optional for those files that are located in the current working directory, like in this particular example.
Edit
path -- either relative or absolute -- of the associated edit buffer, if any. See Big tables for more information.

The second Database Schema file is the field schema. It contains the list of all column names that comprise a database, associated with the table names they belong to. The table names are those listed in the file schema previously explained. Here is the field schema table of our sample school management database:

  Column   Table           Flags  Match         Parent  Insert     Delete
  -------  --------------  -----  ------------  ------  ---------  --------
  Course   course          K                                       Restrict
  Course   course_student  k                    course  Dependent
  Credit   course
  Day      course          h      ^[1-3][0-9]$
  First    student
  Last     student
  Phone    student
  Room     course
  Student  course_student  k                    student Dependent
  Student  student         Kb                                      Restrict
  Teacher  course
  Time     course          h      ^[0-9:]+$
  Year     student

As you can see, fields that occur in multiple tables are listed multiple times, one time per table, and thus the fieldlist table key is the concatenation of Column and Table values.

The Database Schema files can contain additional, application-specific columns beside the basic ones explained above (i.e. to add annotations or anything else); such columns will not be read by NoSQL schema-related operators but their presence will be tolerated.

The Flags field in the fieldlist table provides further properties concerning each field, namely:

h
hidden field. Defining a field as hidden means that that field must be dropped from the result of NoSQL Brokering Language queries. The input tables may contain sensitive information in some of their fields, like passwords or other data that we do not want to expose. In the above example the fields Time and Day are flagged as hidden, meaning that we do not want them to appare on output.
k
partial key field. The full table key is made by the concatenation of multiple (usually two) such fields.
K
primary key field.
N
the field may not be null. This is implied for key fields, both primary and partial.
s
linear search on first (leftmost) column.
S
linear search by column name.
b
binary search on first (leftmost) column (the table is assumed to be sorted on such field).
i
indexed search on single-column index.
1-9
indexed search on multiple-column indexes; all fields having the same number are assumed to be part of the same multiple-column index file (e.g. table._x.Column1.Column2...). At most 9 multicolumn index files are supported for each table, but each index can be on as many columns as desired.

Each Flags field may contain multiple flags, such as kh, Kh, etc.

The Match field in the fieldlist table provides a crude method of datatype enforcement. The field is optional and may contain any valid AWK regular expression that the content of the associated table column must match to be considered valid. If any flags in the Flags column imply any form of data-typing they take precedence over the Match field. For instance, if one of the flags is N then NoSQL operators that read the database schema table MUST check for the field not to be null first, and then apply the pattern-matching specified by Match, if any. In case of conflicts between Flags and Match the former MUST take over and the second MUST be ignored by the relevant operators. Likewise, if N is not listed in Flags the field content MUST be considered valid if either it is null or it matches the specified pattern.

The Parent field indicates that the column is a foreign-key of the specified table.

The Insert field, which also applies only to foreign-keys, specifies whether the presence of the associated primary key is mandatory or not when a new value is inserted into the column being described. If the field contains the keyword "Dependent" then the associated primary key MUST exist, or an insert declaation will trigger an exception in the 'constraint' operator.

The Delete field, which applies only to primary keys, tells what to do if the primary key being described is deleted. If the field contains the keyword "Restrict" then the primary key may not be deleted until there are foreign keys referring back to it from other tables.

The Parent, Insert and Delete schema fields are used by the 'constraint' NoSQL operator. See contraint's help text for more information.

A copy of the sample database used in the above examples can be downloaded from here.


Trackbacks (2) | New trackback | Comments (0) | Print