NoSQL's most prominent feature is that its native query language is the UNIX shell. This is a major advantage over the vast majority of other database systems, but it can also be a problem. The shell is an extremely powerful interface to the underlying operating system, sometimes too powerful for the naive database user. Furthermore, not everyone may be granted direct shell access to a NoSQL system, especially on potentially insecure public networks, such as the Internet. For both ease of use and better security, most database-enabled applications that are accessible with an HTML Web browser only offer pre-built query forms, and free-form queries against the database are generally not allowed.
Sometimes, though, free-form queries may be useful, for instance when the database can be queried by a remote programmatic client, like what happens in traditional SQL/ODBC setups. Occasionally one may want to set up a Web form where a user can enter free-form queries, usually in the form of SQL statements. Of course, accepting native shell statements coming from either a remote program or a Web user would pose serious security issues, as that would be equivalent to granting the remote client entity direct shell access to the server system.
In other words, I have started to feel the need for an intermediate query language, something that can be used to de-couple a NoSQL query from the underlying shell interpreter. A language that is much less powerful than the shell, but simpler to learn and still powerful enough to allow arbitrary queries against a NoSQL database. And a language that is still close enough to the original Operator-Stream Paradigm?. I have dubbed such language NoSQL Brokering Language (NBL), and although it is still in its early stage of development and may undergo extensive modifications in the future, it can already be used for simple real-world applications. In its present form, NBL can only be used for querying the data and cannot be used for editing or changing it in any way.
NBL requires that a Database Schema table is provided by the database administrator, containing certain pieces of meta-data describing the database that is being queried. Not all tables need to be listed in the schema, but only those ones that are to be made available for querying through NBL. NoSQL schemas are currently used only with NBL, but they may be extended in the future to contain additional meta-data to be used also with other NoSQL functions. To date NBL only supports table primary indexes, while hashed secondary-indexes are not yet supported (see Big tables).
To show how NBL works I have prepared a sample database, with an associated Database Schema and an example NBL query. The example database has been freely adapted from the book " UNIX Relational Database Management", and is attached here for you to download and play with. The schema
file contained in the tarball is the aforementioned Database Schema table, and here's its content as displayed by the justify
operator:
Table Path Edit -------------- ---------------- ---- course ./course course_student ./course.student student ./student
Before explaining the details, to give you a taste of how NBL looks like, below is a batch of NBL statements that can be used to query the aforementioned sample database, based on the above schema. In the example tarball these statements are contained in file query.nbl
:
use course course_student student read course expression $Course == "chem-1" join - course_student columns Student Course order-by - join - student columns Course First Last Phone format
Such NBL batch, when run through the NoSQL nblparser
operator, produces the following shell program:
#!/bin/sh set -e lockfile -r6 -l40 -s8 course.lock course_student.lock student.lock trap 'rm -f course.lock course_student.lock student.lock' 0 trap 'exit 2' 1 2 3 15 cat ./course | awktable -H -- "\$Course == \"chem-1\"{print}" | jointable - ./course.student | column Student Course | sorttable | jointable - ./student | column Course Last First Phone | justify
and when such script is piped into sh(1)
the relevant NoSQL query is executed. In other words, by typing nblparser schema < query.nbl | sh
at the command-line and pressing ENTER, we get the following result on the screen:
Course Last First Phone ------ ---------- ------ ------- chem-1 Jefferson Thomas 9876 chem-1 Washington George 87654 chem-1 Reagan Ronald 7654210
Let's see in greater detail the NBL statements used in the query.nbl
file of the above example.
use
statement will set a lock over the involved tables to prevent others from changing them while they are being queried. The use
clause, if used, must appare only once and it must occur before the first read
clause (see below). Note that use
should be used also when we are querying a single table, as it may be that such table has an edit buffer associated with it (see Big tables), and thus the query will involve multiple files, that is the main table and the edit buffer, and it will therefore be non-atomical by definition.tablename
is a Relation
value in the Database Schema, becomes the current input table (i.e. the active stream). Every new read
statement closes the previous stream and activates a new one based on the specified table. Most other NBL statements require that a stream has been activated by a previous read
statement, and will complain otherwise. If keyvalue
is specified, then only those records which leftmost field matches keyvalue
are taken into consideration, otherwise the whole table will be used. If keyvalue
begins with a @
it is considered to be a partial match on the beginning of the table key field, otherwise it defaults to an exact match. It is recommended that keyvalue
be specified whenever possible, to greatly improve performance, especially with large input tables. The @
character is called NBL modifier and is used also by other NBL statements, as explained below. The precise meaning of the NBL modifier is context-dependent.nblparser
is called with the --unsafe
option, expression is scanned for insecure AWK instructions before it is passed to awktable.table1
and table2
on a common field. Either table1
or table2
(but not both) can be specified as -
, meaning the active stream. That is, join - othertable
uses the active stream as the primary join table, while join othertable -
uses it as the secondary join table. The othertable
argument must correspond to a valid Database Schema entry. If column
is not specified (or if it is specified as -
), join
will try to infer the key column name(s) from the Database Schema Path
value; if that fails, the join column will default to the leftmost field in both tables. If column
is specified then the join is done on that field(s), which must be present in both tables although possibly in different column positions. Both the active stream and othertable
are always expected to be already sorted on the join column(s). Remember that if either table is not correctly sorted on the join column(s) the result of the join operation will be unpredictable. The optional @
modifier at the beginning of the join
command itself tells NBL to perform an outer join rather than the default inner join. If only one table is specified, then it must be different from -
and it will be used as the secondary table in the join. If the second table is not specified, not even as -
, then the join column(s) may not be specified either.@
modifier makes the operator revert its behaviour, i.e. it will pick from the active stream all columns except those listed. See the NoSQL getcolumn operator for more.flags
fields. If column
is not specified, or if it is specified as -
, then the active stream is sorted on whole records. The optional @
modifier tells NBL to sort in reverse order. See the NoSQL sorttable operator for more.@
modifier formats the result in list format as opposed to the default table format. See the NoSQL operators prtable and tabletolist for more. Note that NBL programs other than those that are run at the command-line will not normally rely on format
to produce a formatted output, but they will rather apply their own filtering to convert the resulting table to HTML or to other formats, depending on their specific needs. The format
statement causes the active stream to be closed, so if the NBL program has to continue a new read
will be necessary to set a new active stream.
Note that with all NBL commands that take a table name as an argument, the latter can be specified as $table
. The leading $
sign means that the associated name is a variable name. That is, table
will not be sought for in the Database Schema but will be expected to be a file-path that was set previously by the NBL program itself (see the remember-as
NBL command below for more).
NBL provides also other commands beside the ones contained in the sample query.nbl
file that has been explained above. Such other commands are listed below.
name=value
statement in the resulting shell script. Both name
and value
are very restrictive regarding the characters that they may contain. The name
part must abide by the shell variable naming rules, while the value
part must be a simple string in the set -_.,=:+A-Za-z0-9_
, with some additional restrictions to avoid directory traversal exposures. Any forbidden characters will cause nblparser
to exit with an error. The main purpose of the setvar
statement is to resolove incomplete Path
entries in the Database Schema. That is a schema table may contain entries such as:Table Path Edit -------------- -------------- ---- tablename ./$[year]/file
That is, the actual path to a table in the file-system may depend on a variable token, year
in this case. Through the setvar
command, NBL provides a way for the NBL user to supply such variable token, for instance with setvar year 2003
. If value
is not provided then name
will be unset.
commands
to the underlying shell unchanged, with the active stream on STDIN. For obviouos security reasons system
is disabled by default, unless nblparser
is called with the --unsafe
option.@
modifier at the beginning of the totals
command tells NBL to assume that the specified columns contain currency values.labels
directive will be applied only at the end of the query, regardless of where it appares in the sequence of NBL statements. Furthermore, it is applied only once, i.e. if multiple labels
statements are present, only the last one will be effective. This is done to avoid the risk of accidentally exposing hidden columns on output.nblparser
only understands a very limited subset of the LDAP query syntax, although such subset has proven to be enough for many practical situations.name
to the active stream and closes it. A new read
against a (possibly) different table will then be necessary if the NBL program is to continue. From this point on, it will be possible for subsequent NBL commands to refer to name
as $name
, as explained further up in this page. For example:# set "course" as the active stream. read course expression $Course == "chem-1" remember-as goofy
# set $goofy as the active stream. read $goofy format
Of course this particular example may not look very useful, and in fact it isn't, but there are cases where the possibility of saving the query that has been done up to a certain point, and re-using it at a later stage in the program, may indeed come handy.
order-by
but the output is made unique on the specified column(s). See order-by
for more.