This chapter assumes that the reader is already familiar with the underlying topics of HTTP, GET/POST processing and form variables.
As is often the case with other Web programming environments, CSA usually makes no real distinction between a GET and a POST operation. It is therefore perfectly OK to have an HTML form snippet like this:
<form method="post" action="http://www.example.com/cgi-bin/CSA?0=example.hello-world&var1=123">
<input type="text" name="var2" value="456">
</form>
With the exception of a few special variable names, GET and POST variables are made accessible to the application program through the program environment, with their names prefixed by "WWW_". That is, in the above example our example.hello-world would receive through its environment the assignments WWW_var1=123 and WWW_var2=456.
Since values are loaded into the environment, the total size of the
data that a client can send to our program through GET and POST actions
is limited to the value contained in the $CSA_RPC_MAXSIZE
variable. The default limit is 10 KBytes, and it can be changed by
setting a different value in the CSA shell wrapper.
If the same variable is assigned multiple values, as may happen for instance with a "<select multiple>" HTML element, it will be made available to the application program as an rc list. For example: "WWW_var=('value1' 'value2' ...)".
As with single-valued variables, if individual values contain
single-quotes they will be automatically
escaped. See rc(1) for more.
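The list form and the quote escaping described above can be modelled with a short Python sketch. This is not CSA code: the helper names rc_quote and rc_assignment are hypothetical, and the sketch assumes that CSA follows the standard rc convention of doubling a single quote that occurs inside a quoted string.

```python
def rc_quote(value: str) -> str:
    """Quote one value for rc: wrap it in single quotes and double
    any single quote found inside (standard rc quoting, assumed here)."""
    return "'" + value.replace("'", "''") + "'"


def rc_assignment(name: str, values: list[str]) -> str:
    """Render a WWW_-prefixed rc list assignment for a form variable."""
    quoted = " ".join(rc_quote(v) for v in values)
    return f"WWW_{name}=({quoted})"


print(rc_assignment("var", ["value1", "it's"]))
# prints: WWW_var=('value1' 'it''s')
```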
As explained in section HTTP Cookies, cookies are just a third way to set variables, besides GET and POST. CSA considers cookie assignments first, then POST variables and finally GET ones. So, if a GET variable contains the assignment "goofy=123", a cookie contains "goofy=345" and a POST says "goofy=789", the application program environment will contain "WWW_goofy=('345' '789' '123')". Furthermore, if multiple cookies assign different values to the same variable, the resulting list will contain all of them. When a cookie is set more than once in the client browser, the latter returns the latest setting first. It will be up to the application program to account for this and explicitly reference the relevant list element if appropriate.
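The precedence rule (cookies, then POST, then GET) can be illustrated with the following Python sketch. It only models the ordering described above; the function merge_sources and its input format (lists of name/value pairs) are assumptions made for the illustration, not part of CSA.

```python
def merge_sources(cookies, post, get):
    """Combine (name, value) pairs into WWW_-prefixed lists,
    taking cookies first, then POST, then GET."""
    merged = {}
    for source in (cookies, post, get):
        for name, value in source:
            merged.setdefault("WWW_" + name, []).append(value)
    return merged


env = merge_sources(
    cookies=[("goofy", "345")],
    post=[("goofy", "789")],
    get=[("goofy", "123")],
)
print(env)
# prints: {'WWW_goofy': ['345', '789', '123']}
```

An application that only cares about the cookie value would then reference the first element of the list.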
Another way of conveying application-level data is through custom HTTP headers, like "X-Something: somevalue". CSA supports this in a way very similar to HTTP cookies, so the header above will result in the assignment "WWW_X_SOMETHING=('somevalue')" being passed to the application program. Since HTTP header names are case-insensitive, the corresponding variable name is converted to upper-case before it is passed to the application program.
Form variables that refer to "<textarea>" HTML elements SHOULD always have names that begin with two underscores (__). Such variables are treated specially by CSA. The way they are handled is modelled after Un-CGI, a rather well-known HTML form processor. Please refer to the Un-CGI Web page (section "Special Processing") for more on this.
Up to now I have been stating that HTTP variables are passed to the application program through the environment. Well, this is not really true, or at least not until the application explicitly asks for them by calling the CSA function csaGetArgs. Only then will the assignments be loaded into the environment. Before csaGetArgs is called, all assignments exist only in the temporary file pointed to by the CSA variable $CSA_RPC_WWW.
CSA supports numeric variable names, as already mentioned in section The classical Hello World. That is, the following URL:
http://www.example.com/cgi-bin/cgiwrap/~goofy/CSA?0=example.hello-world&1=123&2=345
will cause the program environment to contain the assignments WWW_1=('123') and WWW_2=('345'). Such variables are meant to carry a positional connotation through their names. Of course, they just look like numbers from the Web client's point of view, but they are actually strings, like any other CGI variable, and that becomes evident once CSA has prefixed them with the usual WWW_ prefix. As with any other variable, the same name may be assigned multiple values by GET/POST/Cookie assignments, in which case we will have WWW_1=('123' '456' ...).
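From the client side, positional variables are simply query parameters named 0, 1, 2 and so on. As a minimal sketch, a client could build such a URL as follows. The helper csa_url is hypothetical; only the URL layout is taken from the example above.

```python
from urllib.parse import urlencode


def csa_url(base, program, *args):
    """Build a CSA-style URL: parameter 0 names the program,
    parameters 1..n carry the positional arguments."""
    params = [("0", program)]
    params += [(str(i), str(a)) for i, a in enumerate(args, start=1)]
    return base + "?" + urlencode(params)


url = csa_url(
    "http://www.example.com/cgi-bin/cgiwrap/~goofy/CSA",
    "example.hello-world",
    123,
    345,
)
print(url)
# prints: http://www.example.com/cgi-bin/cgiwrap/~goofy/CSA?0=example.hello-world&1=123&2=345
```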
CGI variables that begin with "X-" are treated specially by CSA. They are considered custom HTTP headers, regardless of whether they are entered via GET/POST variables, actual HTTP custom headers or HTTP cookie assignments. Treating such variables as headers means that their names will always be converted to upper-case, with invalid characters replaced by underscores (_). For instance, "X-SomeHeader: 123" will be passed to the application program as WWW_X_SOMEHEADER=123. Refer also to section HTTP Authentication.
The special CSA variable $CSA_LANG, if present, MUST have been entered in the form of an HTTP custom header, as explained above. It is checked for valid values by CSA and is then passed to the application program without the "WWW_" prefix, and with the "X-" prefix stripped off. That is, "X-CSA-Lang: en_US" (when entered through an HTTP header), or "X-CSA-Lang=en_US" (if entered via a GET/POST variable or a cookie), will be passed to the application as "CSA_LANG=en_US".
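The naming rules for "X-" variables and for the CSA_LANG special case can be summarised in a small Python model. This is a sketch under stated assumptions, not CSA's actual code: the function env_name is hypothetical, "invalid characters" is assumed to mean anything other than letters, digits and underscores, the "X-" test is assumed to be case-insensitive, and the validation of the CSA_LANG value itself is not modelled.

```python
import re


def env_name(name: str) -> str:
    """Map an incoming variable or header name to the environment
    variable name that the application program would see."""
    if name.lower().startswith("x-"):
        # Header-style name: upper-case it, then replace every character
        # outside A-Z, 0-9 and "_" with "_" (assumed definition of "invalid").
        header = re.sub(r"[^A-Z0-9_]", "_", name.upper())
        if header == "X_CSA_LANG":
            # Special case: no WWW_ prefix, and the X- prefix is stripped.
            return "CSA_LANG"
        return "WWW_" + header
    # Ordinary variables keep their case and just get the WWW_ prefix.
    return "WWW_" + name


for raw in ("var1", "X-SomeHeader", "X-CSA-Lang"):
    print(raw, "->", env_name(raw))
```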
As was the case with program output variables (see section Output hook), program input variables can also be processed by the application programmer through a dedicated application-level function, or hook. The function is expected to be named invar(), and to reside in the same RPC I/O library that was described in the Output hook section. Note that said file is optional, but if it is present it MUST contain both invar() and outvar(). If the programmer needs only one of the two, she will still have to provide the other, although the one that is not needed can be dummy code that simply returns to the caller, unchanged, whatever it received. Refer to the CSA rpciolib.awk file for examples of dummy input/output hooks. Like outvar(), which can use RPCOBUF[] to store intermediate results, invar() can use the global CSA AWK array RPCIBUF[] for the same purpose. And again like outvar(), invar() will be called with ENOMSG when there are no more input parameters (or "events") to process.
Quite often, a Web resource may require the HTTP client to authenticate itself before a service can be granted. The HTTP authentication scheme is based on a couple of message headers, namely WWW-Authenticate and Authorization.
Many Web servers, like Apache, provide built-in support for HTTP authentication, of either the Basic or the Digest type. When server-based HTTP authentication is in effect, the authorization phase is negotiated directly between the client and the server, with no involvement of CSA. Once the client has authenticated successfully, the requested resource (CSA program) is called, and the authentication token is passed to it by the HTTP server in the environment variable $REMOTE_USER. If this variable is set and contains acceptable characters, CSA will do the following:
It will then be up to application-level routines to decide what to do if those variables are set. The normal behaviour will be that if CSA_AUTH_OK is set, access to the requested resource is granted with no further checks.
For $REMOTE_USER to be considered valid it MUST be all lower-case and begin with at least two letters. So, "goofy", "minnie123" and "donald-123" are ok, but "a456" is not.
Besides relying directly on the Web server for the authentication phase, CSA is also able to perform basic HTTP authentication itself. Unlike what normally happens with server-based authentication, however, the authorization token can be sent by a client to the server not only in the relevant HTTP Authorization header, but also through an HTTP cookie, an ordinary GET/POST variable or a PATH_INFO element with the same name (case-insensitive).
This application-based authentication scheme cannot be applied to static objects, like images and HTML pages that are served directly by the HTTP server without the intervention of CSA. Since CSA is an application system, however, I assume that no private objects will be made accessible to the HTTP server directly, and that everything that needs to be protected will be accessible only through the application layer. That is, no protected "physical resources" will be exposed as such, but only their "logical representations", consistently with the REST architectural style.
Note: in the CSA context, I consider the REST architectural style as a special case of the more generic term "Remote Procedure Call" (RPC).
A typical authenticated CSA program will start as follows:
# Load local authentication functions.
csaLoadLib --custom authlib.rc || csaExit.fault
# Verify client authentication.
authCheck
# ... other stuff follows.
Providing the authlib.rc function library and the relevant authCheck function is entirely up to the CSA application programmer. CSA considers authentication an application-level issue. Users can be authenticated against flat-file user/password pairs, SQL databases, LDAP servers, and so on. This is why CSA itself does not provide built-in authentication facilities, but simply a way to set the authentication variables $CSA_AUTH_xxx, and in particular $CSA_AUTH_USER and $CSA_AUTH_PW, which will be the strings to look up in whatever authentication system is being used. The --custom switch tells the csaLoadLib function that the specified function file is an application-level one, i.e. it is not a standard CSA library and therefore has to be loaded from the application-specific library directory "$CSA_ROOT/lib/".
In fact, CSA does provide a few basic built-in facilities to authenticate Web users against a simple flat-file database, but you do not necessarily need or want to use them. I will however explain how they work, just in case you want to resort to a similar mechanism rather than providing a completely different one of your own. The default user table is a clustered collection of TAB-separated flat-files (see sections Path-Based Clustering and Sample authenticated session for more), and it is located in "$CSA_ROOT/var/user.d/".
HTTP cookies can be set by CSA in the client browser with the function csaCookie.set. Unsetting a cookie is not supported, and is left up to the client browser. In general, session cookies are dropped by browsers after a while, or when the browser is closed, while persistent cookies have an explicit expiration date.
Say you want to set a cookie with the assignment "mycookie=somevalue"
in the client browser. This can be done by calling csaCookie.set
as follows:
csaCookie.set mycookie'='somevalue
The single-quotes around the "=" sign are mandated by rc, and "mycookie" and "somevalue" will also have to follow rc quoting rules, where appropriate. To virtually "unset" a cookie, we can simply set it again with a bogus value. Actually unsetting the cookie in the client browser is trickier, as it would require including all the information that was used to set it, and specifying an expiration date in the past. When a cookie is set multiple times in the client, the latter will return all values beginning with the most recent one. CSA always considers the first cookie value returned, so setting a cookie to a bogus value is virtually equivalent to unsetting it, from the CSA point of view.
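For readers curious about what "actually unsetting" a cookie involves at the HTTP level, the following Python sketch builds the kind of Set-Cookie header that asks a browser to drop a cookie. It is a generic HTTP illustration based on Python's standard http.cookies module, not something csaCookie.set is documented to do. The function name and the default path "/" are assumptions; the path and domain must match those used when the cookie was originally set.

```python
from http.cookies import SimpleCookie


def expire_cookie_header(name, path="/", domain=None):
    """Return a Set-Cookie header line asking the browser to discard
    the named cookie by giving it an expiration date in the past."""
    cookie = SimpleCookie()
    cookie[name] = "deleted"  # placeholder value, ignored once expired
    cookie[name]["path"] = path
    if domain:
        cookie[name]["domain"] = domain
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    cookie[name]["max-age"] = 0
    return cookie.output()


print(expire_cookie_header("mycookie"))
```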
The example CSA application includes samples of how a very basic cookie-based authenticated session works. To run the examples, point your browser at the following URL (adapt the relevant parts according to your local setup):
http://www.example.com/cgi-bin/cgiwrap/goofy/CSA?0=example.showpage&page=ask-pass
If everything was set up correctly, you should see a login page. Enter the proper credentials (userid=smithj and password=mypass) to log in, then play around a bit with the inner mechanics of CSA, by following the various links that are provided on the page that is displayed after login. You may also want to see what happens with an unsuccessful login, by entering invalid credentials.