
5. Basic CSA conventions

5.1 The CSA name space.

The rc shell, on which CSA is based, exports all variables and functions to the program environment, making them global to large portions of code (IMHO this is both an advantage and a nuisance, but I won't get into this discussion here). Because of this, we need a precise naming convention for all names, which also helps make them as self-documenting as possible.

In the CSA name space, a few name prefixes are either reserved for CSA or they are used internally by the rc shell, by AWK or by other underlying utility programs. The reserved prefixes are:

csa, CSA

Prefixes reserved for CSA use. They SHOULD NOT be used by application programs to identify their own commands, functions and variable names.

fn_

rc function definitions.

In addition to the reserved prefixes there are others that, although not strictly reserved, are conventional. They are:

WWW_

Decoded HTTP GET/POST variables.

XML_

XML-encoded strings (PCDATA), that can be safely included in XML files and templates.

ISO_

Template variables that are to be inserted in HTML response pages and forms. Such variables are derived from their unencoded versions by replacing special characters with their ISO representations (e.g. the newline becomes &#10;, and so on). CGI response variables SHOULD always be encoded like this, also to prevent the so-called cross-site scripting (XSS) vulnerabilities.

URI_

These are similar to ISO_ variables, but the escaping of special characters is done according to RFC 1738 and the resulting values are meant to be inserted in the QUERY_STRING part of Uniform Resource Identifier (URI) strings (e.g. the newline becomes %0A, and so on).

URP_

These are a special version of the URI_ variables, that are meant to be used in the PATH_INFO part of a URI. They are slightly different from their URI_ equivalents, to account for the decoding done by the CGI interface on the PATH_INFO part of a URI.

CNS_

Recommended prefix for what I dubbed "Custom Name Space" (CNS) variables. These are global application-level variables that application programmers may use for their own Inter-Program Communications (IPC) needs.
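
The difference between ISO_ and URI_ values can be illustrated with a short Python sketch. It is not taken from the CSA libraries: the function names iso_encode and uri_encode are hypothetical, and the exact set of characters that CSA escapes is an assumption.

```python
from urllib.parse import quote


def iso_encode(text: str) -> str:
    """Replace HTML-special, control and non-ASCII characters
    with numeric character references (assumed character set)."""
    special = set("&<>\"'")
    out = []
    for ch in text:
        if ch in special or ord(ch) < 32 or ord(ch) > 126:
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


def uri_encode(text: str) -> str:
    """Percent-encode everything except letters, digits and "_.-~"."""
    return quote(text, safe="")


sample = "a<b\nc"
print(iso_encode(sample))  # value suitable for an ISO_ variable
print(uri_encode(sample))  # value suitable for a URI_ variable
```

The same input yields two different encodings, which is why a value should be kept under the prefix that matches the place where it will be inserted.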

For an application name space not to clash with the CSA one, just make sure that your own CSA applications always use mixed-case function and command names (e.g. "someFunction", "MyCommand", and the like), and either lower- or mixed-case variable names. Never define names that begin with either "csa" or "CSA". In fact, all upper-case names (like "SOME_NAME") SHOULD be avoided altogether. If you need them nevertheless, you SHOULD prefix them with the string "CNS_" (e.g. "CNS_MYVAR") if they are to be exported to the environment, or "CNS" (like "CNSMYVAR") otherwise. This stands for "Custom Name Space", and it is the prefix that CSA sets aside for use by application programs, as already explained.
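
The rules above can be turned into a small checker. The following Python sketch is only an illustration: check_app_name is a hypothetical helper, not part of CSA, and it deliberately ignores the conventional prefixes (WWW_, XML_, ISO_, URI_, URP_).

```python
def check_app_name(name: str, exported: bool = False) -> str:
    """Return "ok", or a short reason why the name breaks the conventions."""
    if name.startswith(("csa", "CSA")):
        return "reserved for CSA"
    if name.startswith("fn_"):
        return "reserved for rc function definitions"
    if name.isupper():
        # All upper-case names need the Custom Name Space prefix.
        wanted = "CNS_" if exported else "CNS"
        if not name.startswith(wanted):
            return "all upper-case names should start with " + wanted
    return "ok"


for candidate in ("someFunction", "csaExit", "SOME_NAME", "CNSMYVAR"):
    print(candidate, "->", check_app_name(candidate))
```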

In addition to the naming conventions that we have seen so far, there are others that concern the naming of files on disk. The following few filename extensions are typical in a CSA application:

file.data

A generic data file, usually a TAB-delimited table.

file.data,v

RCS/CVS repository of file.data

file.xref

If file.data is a table, this is the associated cross-reference file.

file.d

Clustered table directory.

file.data.lock

Advisory lock on file.data.
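
As a quick illustration of how these names relate to each other, the following Python sketch derives the companion file names from the name of a data file. The helper companion_files is hypothetical, and placing every companion next to the data file is an assumption (an RCS file, for instance, may well live in a separate directory).

```python
from pathlib import PurePosixPath


def companion_files(data_file: str) -> dict:
    """Map each companion role to the file name derived from data_file."""
    p = PurePosixPath(data_file)
    return {
        "rcs": str(p.with_name(p.name + ",v")),       # file.data,v
        "lock": str(p.with_name(p.name + ".lock")),   # file.data.lock
        "xref": str(p.with_suffix(".xref")),          # file.xref
        "cluster": str(p.with_suffix(".d")),          # file.d
    }


for role, path in companion_files("tables/users.data").items():
    print(role, path)
```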

5.2 CSA environment variables.

CSA sets and uses a number of environment variables for its own purposes. All of these variables are available also to the application program, but not all of them may be freely modified by the latter, or things may break. This is the complete list of current environment variables. See the following paragraph for more detailed documentation on many of them.

In the explanations that follow, each variable is tagged with a Scope attribute that defines whether it can be set/unset by the application program or is reserved for CSA internal use. A "CSA" scope means that the variable can still be tested and used by an application program, but it SHOULD be considered read-only. If the scope is "profile", then the value can still be set by an application program, but the most appropriate place to set it is the application profile "$CSA_ROOT/csa.rc".

quote

The single-quote character. Default: "\047". Scope: CSA.

tab

The horizontal-tab character. Default: "\011". Scope: CSA.

nil

The empty rc value. Default: ''. Scope: CSA.

nl

The newline character. Default: "\012". Scope: CSA.

cr

The carriage-return character. Default: "\015". Scope: CSA.

CSA_ACCOUNT

Current CSA UNIX user name. Default: the value returned by whoami(1). Scope: CSA.

CSA_ALLOW_USER

UNIX account that the CSA application MUST run as. If the application finds itself running under a different account then it MUST "commit suicide", i.e. stop immediately with an error message. CGI programs are often executed by setuid wrappers, and this security measure is meant to prevent the application program from running as root if the wrapper crashes, with unpredictable and usually dangerous effects on the integrity and security of the system. Default: "nobody". Scope: profile.

CSA_AUDIT

If set to "1", then any changes to files done by the csaCommit function will be subject to rcs(1) versioning. Do not set this variable if change-management is already done with something different/better, like CVS for example. Default: unset. Scope: profile.

CSA_AUTH_DOM

"domain" attribute of the User Session Cookie. Default: unset. Scope: profile.

CSA_AUTH_PATH

"path" attribute of the User Session Cookie. Default: unset. Scope: profile.

CSA_AUTH_SSL

"secure" attribute of the User Session Cookie. Default: unset. Scope: profile.

CSA_CGIBIN

URL, either relative or absolute, of the CGI program directory on the CSA Web server. Default: "/cgi-bin". Scope: profile.

CSA_CGIBIN2

Same as CSA_CGIBIN, but for SSL Web connections. Default: "$CSA_CGIBIN". Scope: profile.

CSA_CMD_MD5

Name and arguments of the local md5sum(1) command, which is run through a library function, as its output format varies between different versions of UNIX/Linux. Default: "md5sum". Scope: profile.

CSA_CMD_PS

Name and arguments of the local ps(1) command, whose syntax can vary between systems. Default: "ps h -u $CSA_ALLOW_USER". Scope: profile.

CSA_COMMIT

Reserved for CSA internal use. See mainlib.rc. Default: unset. Scope: CSA.

CSA_COOKIE_

Common prefix of the HTTP cookies sent to the Web client by CSA programs. See section HTTP Cookies, for more on this. Default: unset. Scope: all.

CSA_DEBUG

If set to "1", then the $CSA_ROOT/var/debug.log file will be created, containing the CSA function-call trace and a lot of other useful debugging information. Writing this file is quite expensive in terms of system resources, so it should be avoided when not strictly necessary, by setting CSA_DEBUG=0. Default: "0". Scope: profile.

CSA_DOCROOT

Document-root directory of either the Web Server or the Virtual Host. Default: $DOCUMENT_ROOT. Scope: profile.

CSA_DOCROOT2

Same as CSA_DOCROOT, but for SSL Web connections. Default: $CSA_DOCROOT. Scope: profile.

CSA_GUIDLIST

List of Globally Unique IDs generated during the current run. Default: unset. Scope: CSA.

CSA_HOST

Host-name of the Web Server running the current CSA application instance. Default: the value returned by the "hostname -s" command. Scope: CSA.

CSA_TPLEXT

File-name extension of template files. Default: "html". Scope: profile.

CSA_ID

CSA application name. It MUST comprise only characters in the set "[A-Za-z0-9_-]", such as "foo", "test", "example", etc. Default: unset. Scope: profile.

CSA_INSTALL

CSA installation directory. Default: "/usr/local/csa". Scope: profile.

CSA_ISINDEX

Contents of the QUERY_STRING variable of an ISINDEX HTTP request. Given the current MIME-RPC CGI calling conventions, pure ISINDEX queries may no longer occur, as the "?" URL argument is already used to identify the target CSA program. Default: unset. Scope: CSA.

CSA_LANG

Local language code for messages and Web pages, according to the usual classification (en, en_US, it, en_GB, es, etc.). The selected language MUST correspond to message and template directories that actually exist on the server. Currently, CSA provides messages only in the "it" and "en_US" versions. Default: "en_US". Scope: profile.

CSA_LOCKS

List of lock-files (semaphores) created so far by the CSA program through the csaLock function. Default: unset. Scope: CSA.

CSA_MAXPROC

Max. no. of active processes allowed on the system. If this limit is exceeded then new requests will be denied with an error message. This check is normally not active. Default: unlimited. Scope: profile.

CSA_MSG_GRP

Current message group name. Default: "CSA_SYSTEM". Scope: CSA.

CSA_MSG_NUM

Current message number. Default: "0000". Scope: CSA.

CSA_MSG_TEXT

Current message text. Default: unset. Scope: CSA.

CSA_ONCOMMIT

Extra actions to be performed by csaCommit. It MUST be a valid rc program fragment, properly escaped to make it eval-safe. It is up to the application program to make sure that special rc characters have been properly escaped in the fragment. Default: unset. Scope: all.

CSA_PGM

Name of the current CSA program, to be used in messages printed by csaPrintMsg. Default: "CSA". Scope: CSA.

CSA_PID

NFS-safe unique identifier of the current CSA program. Default: $CSA_HOST.$pid. Scope: CSA.

CSA_REQUEST

Current request URL. Default: $REQUEST_URI.$pid. Scope: CSA.

CSA_RESULT

Used by many CSA library functions to return the function result to the caller, mainly to save a `{} subprocess. Default: unset. Scope: all.

CSA_RID

Current request-ID. Default: $CSA_HOST^_$pid. Scope: CSA.

CSA_ROOT

CSA application installation directory. Default: "/". Scope: profile.

CSA_RPC_CMD

Path to a temporary file containing the RPC request. Default: $TMPDIR/rpc$pid.tmp. Scope: CSA.

CSA_RPC_MAXSIZE

Max. size in bytes of GET/POST data. Default: 10000. Scope: CSA.

CSA_RPC_WWW

Path to a temporary file containing the CSA program call (GET/POST) variables, in rc syntax. Default: dynamically set. Scope: CSA.

CSA_STATUS

Generic CSA error flag, used by some of the library functions. Default: "0". Scope: all.

CSA_TESTMODE

Generic flag to tell programs that we are running in test mode. Whether to test this flag is up to the application programs. Default: "0". Scope: profile.

CSA_URL

Base URL of the current CSA server application. Default: "http://localhost". Scope: profile.

CSA_URL2

Same as CSA_URL, but for SSL Web connections. Default: "$CSA_URL". Scope: profile.

CSA_USER_ID

Web User ID, for authenticated sessions. Default: unset. Scope: CSA.

CSA_USER_PBC

User Path-Based Clustering index. This is the relative directory tree under which pieces of data belonging to the current Web user can be found in PBC structures. For instance, if the current user-ID is "goofy" then the associated CSA_USER_PBC value will be "g/o", as explained in section Path-Based Clustering. Default: unset. Scope: CSA.

CSA_VERSION

Current CSA version. Default: fixed. Scope: CSA.

CSA_WBMSG

Write-back message. This is a variable through which a sub-program, i.e. a called process, can request the parent rc program to print a CSA message through csaPrintMsg. This capability requires that the caller capture the called program output into a temporary file, which is then sourced with the "." shell operator. After sourcing the generated script file, the caller will then test the content of said variable and take the proper actions. Building a source-safe script is the responsibility of the called program. Default: unset. Scope: CSA.

CSA_WORKFILES

List of the work-files created by the CSA application during the current run. Such files will be removed on exit by the csaExit function. Default: unset. Scope: CSA.
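
Among the variables above, CSA_ALLOW_USER describes a check that is easy to sketch. The following Python fragment only illustrates the idea and is not the CSA implementation: the function names are hypothetical, and looking up the account name from the effective UID is an assumption about how the current user is determined.

```python
import os
import pwd
import sys


def is_allowed(current_user: str, allowed_user: str) -> bool:
    """True only when the process runs as the one permitted account."""
    return current_user == allowed_user


def enforce_allowed_user() -> None:
    """Stop immediately with an error message under the wrong account."""
    allowed = os.environ.get("CSA_ALLOW_USER", "nobody")
    # Use the effective UID rather than $USER or $LOGNAME, which a
    # caller could set to an arbitrary value.
    current = pwd.getpwuid(os.geteuid()).pw_name
    if not is_allowed(current, allowed):
        sys.exit("refusing to run as %r: expected %r" % (current, allowed))


# A program would call enforce_allowed_user() before doing anything else.
```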

Date and time values.

A number of CSA variables are used to hold date and time values in various formats. The values account for DST as appropriate. These variables are set by storing the output of a single invocation of the following shell command:

 * =`{date -d now '+%Y %m %d %H %M %S %Z %a %b %s %z'}
 

The "now" argument of date(1) can be overridden by a new explicit call to the csaSetTime function. Here's the complete list of the date/time variables, with their settings:

  CSA_TIME_YEAR  = $1
  CSA_TIME_MONTH = $2
  CSA_TIME_DAY   = $3
  CSA_TIME_HOUR  = $4
  CSA_TIME_MIN   = $5
  CSA_TIME_SEC   = $6
  CSA_TIME_TZ    = $7
  CSA_TIME_STAMP = $1$2$3$4$5$6
  CSA_TIME_ISO   = $1$2$3^T$4:$5:$6
  CSA_TIME_ISO2  = $1-$2-$3' '$4:$5:$6
  CSA_TIME_ISO3  = $1-$2-$3^T$4:$5:$6$11
  CSA_TIME_ISO4  = $1-$2-$3^T$4:$5:$6
  CSA_TIME_DNAME = $8
  CSA_TIME_MNAME = $9
  CSA_TIME_UNIX  = $10
  CSA_TIME_LOG   = $CSA_TIME_ISO2.$CSA_TIME_TZ
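
To make the resulting formats easier to see, here is a Python sketch that builds the same set of values from a given instant. It only mirrors the assignments listed above: csa_time_vars is a hypothetical helper, and the day and month names depend on the current locale, just as with date(1).

```python
from datetime import datetime, timezone


def csa_time_vars(now: datetime) -> dict:
    """Build the CSA_TIME_* values from a timezone-aware datetime."""
    y, mo, d, h, mi, s = now.strftime("%Y %m %d %H %M %S").split()
    tz = now.strftime("%Z")      # $7, time zone name
    offset = now.strftime("%z")  # $11, numeric UTC offset
    iso2 = f"{y}-{mo}-{d} {h}:{mi}:{s}"
    return {
        "CSA_TIME_YEAR": y,
        "CSA_TIME_MONTH": mo,
        "CSA_TIME_DAY": d,
        "CSA_TIME_HOUR": h,
        "CSA_TIME_MIN": mi,
        "CSA_TIME_SEC": s,
        "CSA_TIME_TZ": tz,
        "CSA_TIME_STAMP": y + mo + d + h + mi + s,
        "CSA_TIME_ISO": f"{y}{mo}{d}T{h}:{mi}:{s}",
        "CSA_TIME_ISO2": iso2,
        "CSA_TIME_ISO3": f"{y}-{mo}-{d}T{h}:{mi}:{s}{offset}",
        "CSA_TIME_ISO4": f"{y}-{mo}-{d}T{h}:{mi}:{s}",
        "CSA_TIME_DNAME": now.strftime("%a"),
        "CSA_TIME_MNAME": now.strftime("%b"),
        "CSA_TIME_UNIX": str(int(now.timestamp())),
        "CSA_TIME_LOG": iso2 + "." + tz,
    }


example = csa_time_vars(datetime(2004, 3, 9, 14, 5, 7, tzinfo=timezone.utc))
for name, value in example.items():
    print(name, "=", value)
```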
  

5.3 CSA rc(1) functions.

To date, the CSA libraries provide the following functions. Providing documentation for all of them in this document is going to be a major effort. In the meantime please refer to the explanatory comments that are contained in the library files, and to the example programs.

Shell function overrides.

The rc shell, on which CSA is largely based, exports all names and function definitions to the program environment by default. Function definitions, in particular, may cause the environment to become really big and cluttered. To try and mitigate this problem, different CSA shell libraries often re-define previously defined shell functions. For instance, the csaExit.fault function is defined in both mainlib.rc and cgilib.rc. In all cases the function serves the same purpose: exiting on errors. The way it accomplishes its job, however, may differ from one library to the next. A command-line shell script will only load mainlib.rc, and the version of csaExit.fault contained in that library will simply exit non-zero (after doing some housekeeping) if called in that context. A CGI program, however, besides loading mainlib.rc will also load cgilib.rc. This second library provides its own re-definition of csaExit.fault. The latter, if called by the CGI program, will do the housekeeping, send an error HTML page to the client and exit non-zero. In this way, by re-using the same function names for different context-specific code, I managed to:

  1. keep the program API simple and consistent across different contexts;
  2. keep the program environment small, as each re-definition replaces the previous one.
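
The override mechanism can be modelled in a few lines of Python. This is only a sketch of the idea, not CSA code: the loader functions are hypothetical stand-ins for sourcing mainlib.rc and cgilib.rc, and the handlers return a list of the actions they would perform instead of actually exiting.

```python
functions = {}  # stands in for the rc function name space


def load_mainlib(fns: dict) -> None:
    """Define the command-line flavour of csaExit.fault."""
    def exit_fault(msg: str) -> list:
        return ["housekeeping", "exit non-zero: " + msg]
    fns["csaExit.fault"] = exit_fault


def load_cgilib(fns: dict) -> None:
    """Re-define csaExit.fault for the CGI context."""
    def exit_fault(msg: str) -> list:
        return ["housekeeping", "send HTML error page: " + msg,
                "exit non-zero: " + msg]
    fns["csaExit.fault"] = exit_fault  # replaces the earlier definition


load_mainlib(functions)
print(functions["csaExit.fault"]("disk full"))
load_cgilib(functions)
print(functions["csaExit.fault"]("disk full"))
print(len(functions))  # still a single name in the name space
```

The caller always invokes the same name, and the name space holds one definition at a time, which is the point of both items in the list above.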

5.4 C-like stuff.

In the AWK libraries provided by CSA, I have occasionally tried to mimic C concepts. For instance, in csalib.awk there are functions like strdup(), ctime(), stat(), creat() and others, that try to behave somewhat like their C-library counterparts. The similarities are rough at best, so do not expect to use those functions exactly in the same way as you would in a real C program, but the basics should be there. Besides C-like function names, I have also tried to use C-like error codes. To date, the following Linux-style symbolic error codes have been defined (see errno(3) and <errno.h> for more info):

      ENOENT        =   2
      EIO           =   5
      EACCES        =  13
      EISDIR        =  21
      EINVAL        =  22
      ENOMSG        =  42
      EMSGSIZE      =  90
 

Note that these error flags have rc(1) counterparts with the same name, prefixed by the string "CSA_" as mandated by CSA naming conventions for environment variables.
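
The relation between the AWK names, their numeric values and the rc(1) counterparts can be sketched as follows. This is a Python illustration only: the table copies the values listed above, while rc_name and symbol_for are hypothetical helpers that do not exist in the CSA libraries.

```python
# Symbolic error codes as listed above (Linux-style values).
CSA_ERRNO = {
    "ENOENT": 2,
    "EIO": 5,
    "EACCES": 13,
    "EISDIR": 21,
    "EINVAL": 22,
    "ENOMSG": 42,
    "EMSGSIZE": 90,
}


def rc_name(awk_name: str) -> str:
    """Name of the rc(1) counterpart of an AWK error symbol."""
    return "CSA_" + awk_name


def symbol_for(code: int) -> str:
    """Reverse lookup: numeric code to symbolic name, or "" if unknown."""
    for name, value in CSA_ERRNO.items():
        if value == code:
            return name
    return ""


for name, value in CSA_ERRNO.items():
    print(rc_name(name), "=", value)
```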

CSA AWK global variables.

In addition to the few global AWK variables described above, here follows the complete list of global CSA names that are defined in the relevant AWK library functions. Please refer to the associated files for more info on them.

CSA AWK functions.

This is the list of the AWK functions that are currently provided by the relevant CSA libraries. As was the case with the list of rc(1) functions, describing all of them in this document will be a major task. In the meantime please refer directly to the comments in the associated library files.

