
6. HTTP front-end

This chapter assumes that the reader is already familiar with the underlying topics of HTTP, GET/POST processing and form variables.

6.1 HTTP GET/POST variables

As is often the case with other Web programming environments, CSA usually makes no real distinction between a GET and a POST operation. It is therefore perfectly ok to have an HTML form snippet like this:

  <form method=post action="http://www.example.com/cgi-bin/CSA?0=example.hello-world&var1=123">
  <input type=text name=var2 value=456>
  </form>
 

With the exception of a few special variable names, GET and POST variables are made accessible to the application program through the program environment, with their names prefixed by "WWW_". That is, in the above example our example.hello-world would receive through its environment the assignments WWW_var1=123 and WWW_var2=456.
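The naming rule can be illustrated with a small Python model. This is only a sketch of the "WWW_" prefixing described above, not CSA code: the function name www_variables is hypothetical, and the special variable names that CSA treats differently are not handled.

```python
from urllib.parse import parse_qsl, urlsplit

def www_variables(action_url, post_body):
    """Model of the WWW_ naming only; special variable names are not handled."""
    # POST body pairs and the pairs from the URL query string are treated alike.
    pairs = parse_qsl(post_body) + parse_qsl(urlsplit(action_url).query)
    return {"WWW_" + name: value for name, value in pairs}

variables = www_variables(
    "http://www.example.com/cgi-bin/CSA?0=example.hello-world&var1=123",
    "var2=456",
)
print(variables["WWW_var1"], variables["WWW_var2"])  # 123 456
```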

Since values are loaded into the environment, the total size of the data that a client can send to our program through GET and POST actions is limited to the value contained in the $CSA_RPC_MAXSIZE variable. The default limit is 10 KBytes, and it can be changed by setting a different value in the CSA shell wrapper.

If the same variable is assigned multiple values, as may happen for instance with a "<select multiple>" HTML element, it will be made available to the application program as an rc list. For example: "WWW_var=('value1' 'value2' ...)". As with single-valued variables, single quotes contained in individual values are escaped automatically. See rc(1) for more.
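As an illustration, here is a Python sketch of how such an rc list could be written out. It assumes that the escaping follows the usual rc convention of doubling a single quote inside a quoted word; the helper names rc_quote and rc_list are hypothetical and are not part of CSA.

```python
def rc_quote(value):
    # rc represents a literal single quote inside a quoted word by doubling it.
    return "'" + value.replace("'", "''") + "'"

def rc_list(name, values):
    # Build an assignment such as WWW_var=('value1' 'value2').
    return name + "=(" + " ".join(rc_quote(v) for v in values) + ")"

print(rc_list("WWW_var", ["value1", "it's"]))  # WWW_var=('value1' 'it''s')
```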

As explained in section HTTP Cookies, cookies are just a third way to set variables, besides GET and POST. CSA considers cookie assignments first, then POST variables and finally GET ones. So, if a GET variable contains the assignment "goofy=123", a cookie contains "goofy=345" and a POST says "goofy=789", the application program environment will contain "WWW_goofy=('345' '789' '123')". Furthermore, if multiple cookies assign different values to the same variable, the resulting list will contain all of them. When a cookie is set more than once in the client browser, the browser returns the latest setting first. It is up to the application program to account for this and explicitly reference the relevant list element if appropriate.
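The ordering can be modelled in a few lines of Python. This is a sketch of the precedence rule only (cookies, then POST, then GET); merge_sources is a hypothetical name, and each source is assumed to be already parsed into (name, value) pairs.

```python
def merge_sources(cookies, post, get):
    """Collect values per variable in the order cookies, POST, GET."""
    merged = {}
    for source in (cookies, post, get):
        for name, value in source:
            merged.setdefault("WWW_" + name, []).append(value)
    return merged

merged = merge_sources(
    cookies=[("goofy", "345")],
    post=[("goofy", "789")],
    get=[("goofy", "123")],
)
print(merged["WWW_goofy"])  # ['345', '789', '123']
```

An application that only cares about the value with the highest precedence would then pick the first element of the list.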

Another way of conveying application-level data is through custom HTTP headers, like "X-Something: somevalue". This is supported by CSA in a way very similar to HTTP cookies, so for instance this will result in a "WWW_X_SOMETHING=('somevalue')" assignment being passed to the application program. Since HTTP header names are case-insensitive, the corresponding variable name is turned into upper-case before it is passed to the application program.

Form variables that refer to "<textarea>" HTML elements SHOULD always have names that begin with two underscores (__). Such variables are treated specially by CSA. The way they are handled is modelled after Un-CGI, a rather well-known HTML form processor. Please refer to the Un-CGI Web page (section "Special Processing") for more on this.

Up to now I have been stating that HTTP variables are passed to the application program through the environment. Well, this is not strictly true, at least not until the application explicitly asks for them by calling the CSA function csaGetArgs. Only then will the assignments be loaded into the environment. Before csaGetArgs is called, all assignments exist only in the temporary file pointed to by the CSA variable $CSA_RPC_WWW.

Special CGI variables

CSA supports numeric variable names, as already mentioned in section The classical Hello World. That is, the following URL:

 http://www.example.com/cgi-bin/cgiwrap/~goofy/CSA?0=example.hello-world&1=123&2=345
 

will cause the program environment to contain the assignments WWW_1=('123') and WWW_2=('345'). Such variables are meant to carry a positional connotation through their names. Of course, they just look like numbers from the Web client's point of view, but they are actually strings, like any other CGI variable, and that becomes evident once CSA has prefixed them with the usual WWW_ prefix. As with any other variable, the same name may be assigned multiple values by GET/POST/cookie assignments, in which case we will have WWW_1=('123' '456' ...).

CGI variables that begin with "X-" are treated specially by CSA. They are considered custom HTTP headers, regardless of whether they are entered via GET/POST variables, actual HTTP custom headers or HTTP cookie assignments. Treating such variables as headers means that their names will always be turned into upper-case, with invalid characters replaced by underscores (_). For instance, "X-SomeHeader: 123" will be passed to the application program as WWW_X_SOMEHEADER=123. Refer also to section HTTP Authentication.
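The name transformation can be sketched in Python as follows. Note the assumption: the text does not say exactly which characters count as invalid, so this model treats everything outside A-Z, 0-9 and the underscore as invalid. The function name header_to_variable is hypothetical.

```python
import re

def header_to_variable(header_name):
    # Upper-case the name, then replace every character outside [A-Z0-9_]
    # (an assumed definition of "invalid") with an underscore.
    return "WWW_" + re.sub(r"[^A-Z0-9_]", "_", header_name.upper())

print(header_to_variable("X-SomeHeader"))  # WWW_X_SOMEHEADER
print(header_to_variable("x-something"))   # WWW_X_SOMETHING
```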

The special CSA variable $CSA_LANG, if present, MUST be entered in the form of an HTTP custom header, as explained above. It is checked for valid values by CSA and is then passed to the application program without the "WWW_" prefix, and with the "X-" prefix stripped off. That is, "X-CSA-Lang: en_US" (when entered through an HTTP header), or "X-CSA-Lang=en_US" (if entered via a GET/POST variable or a cookie), will be passed to the application as "CSA_LANG=en_US".
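The renaming alone can be modelled as below. This Python sketch covers only the name mapping; the validity check that CSA performs on the value is left out, because the set of accepted values is not stated here. The function name special_variable_name is hypothetical.

```python
def special_variable_name(header_name):
    # Upper-case the name and turn hyphens into underscores ...
    name = header_name.upper().replace("-", "_")
    # ... then drop the leading "X_" instead of adding "WWW_".
    if name.startswith("X_"):
        name = name[2:]
    return name

print(special_variable_name("X-CSA-Lang") + "=en_US")  # CSA_LANG=en_US
```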

6.2 Input hook.

As was the case with program output variables (see section Output hook), program input variables can also be processed by the application programmer through an application-level function, or hook. The function is expected to be named invar(), and to reside in the same RPC I/O library that was explained in the Output hook section. Note that said file is optional, but if it is present it MUST contain both invar() and outvar(). If the programmer only needs one of the two, she will still have to provide the other, although the one that is not needed can be dummy code that simply returns to the caller, unchanged, whatever it received. Refer to the CSA rpciolib.awk file for examples of dummy input/output hooks. Just as outvar() can use RPCOBUF[] to store intermediate results, invar() can use the global CSA AWK array RPCIBUF[] for the same purpose. And, again like outvar(), invar() will be called with ENOMSG when there are no more input parameters (or "events") to process.

6.3 HTTP Authentication.

Quite often, a Web resource may require the HTTP client to authenticate itself before a service can be granted. The HTTP authentication scheme is based on a couple of message headers, namely WWW-Authenticate and Authorization.

Server-based HTTP authentication

Many Web servers, like Apache, provide built-in support for HTTP authentication, of either Basic- or Digest-type. When server-based HTTP authentication is in effect, the authorization phase is negotiated directly between the client and the server, with no involvement of CSA. Once the client has authenticated successfully, the requested resource (CSA program) is called, and the authentication token is passed to it by the HTTP server in the environment variable $REMOTE_USER. If this variable is set and contains acceptable characters, CSA will do the following:

  1. Set variable CSA_AUTH_USER=$REMOTE_USER
  2. Set variable CSA_AUTH_PBC (see section Path-Based Clustering)
  3. Set variable CSA_AUTH_OK=1

It will then be up to application-level routines to decide what to do if those variables are set. The normal behaviour will be that if CSA_AUTH_OK is set, access to the requested resource is granted with no further checks.

For $REMOTE_USER to be considered valid it MUST be all lower-case and begin with at least two letters. So, "goofy", "minnie123" and "donald-123" are ok, but "a456" is not.
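The check and the resulting variables can be modelled roughly as follows. This Python sketch rests on an assumption: the text only says the name must be all lower-case and begin with at least two letters, so the characters accepted after those two letters (lower-case letters, digits and hyphens) are inferred from the examples above. CSA_AUTH_PBC is left out, since how it is derived belongs to the Path-Based Clustering section. The names VALID_USER and auth_variables are hypothetical.

```python
import re

# Assumed rule: two leading lower-case letters, then lower-case letters,
# digits or hyphens.
VALID_USER = re.compile(r"[a-z]{2}[a-z0-9-]*")

def auth_variables(remote_user):
    """Return the CSA_AUTH_* assignments for a valid $REMOTE_USER, else {}."""
    if not remote_user or not VALID_USER.fullmatch(remote_user):
        return {}
    return {"CSA_AUTH_USER": remote_user, "CSA_AUTH_OK": "1"}

for candidate in ("goofy", "minnie123", "donald-123", "a456"):
    print(candidate, bool(auth_variables(candidate)))
```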

CSA-managed HTTP authentication

Besides relying directly on the Web server for the authentication phase, CSA is also able to perform basic HTTP authentication itself. Unlike what normally happens with server-based authentication, however, the authorization token can be sent by a client to the server not only in the relevant HTTP Authorization header, but also through an HTTP cookie, an ordinary GET/POST variable or a PATH_INFO element with the same name (case-insensitive).
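For reference, this is how a Basic authorization token is laid out on the wire: the word "Basic" followed by the base64 encoding of "user:password". The Python sketch below decodes such a token. It illustrates the standard HTTP encoding only, not CSA's own parsing code, and the function name parse_basic_authorization is hypothetical.

```python
import base64

def parse_basic_authorization(header_value):
    """Return (user, password) from a Basic Authorization value, or None."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 as well as undecodable bytes.
        return None
    user, separator, password = decoded.partition(":")
    if not separator:
        return None
    return user, password

token = base64.b64encode(b"smithj:mypass").decode("ascii")
print(parse_basic_authorization("Basic " + token))  # ('smithj', 'mypass')
```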

This application-based authentication scheme cannot be applied to static objects, like images and HTML pages that are served directly by the HTTP server without CSA's intervention. Since CSA is an application system, however, I assume that no private objects will be made accessible to the HTTP server directly, and that everything that needs to be protected will be accessible only through the application layer. That is, no protected "physical resources" will be exposed as such, but only their "logical representations", consistently with the REST architectural style.

Note: in the CSA context, I consider the REST architectural style as a special case of the more generic term "Remote Procedure Call" (RPC).

Sample authentication code

A typical authenticated CSA program will start as follows:

 # Load local authentication functions.
 csaLoadLib --custom authlib.rc || csaExit.fault

 # Verify client authentication.
 authCheck

 # ... other stuff follows.
 

Providing the authlib.rc function library and the relevant authCheck function is entirely up to the CSA application programmer. CSA considers authentication an application-level issue. Users can be authenticated against flat-file user/password pairs, SQL databases, LDAP servers, and so on. This is why CSA itself does not provide built-in authentication facilities, but simply a way to set the authentication variables $CSA_AUTH_xxx, and in particular $CSA_AUTH_USER and $CSA_AUTH_PW, that will be the strings to look up in whatever authentication system is being used.

The --custom switch tells the csaLoadLib function that the specified function file is an application-level one, i.e. not a standard CSA library, and that it must therefore be loaded from the application-specific library directory "$CSA_ROOT/lib/".

In fact, CSA does provide a few basic built-in facilities to authenticate Web users against a simple flat-file database, but you do not necessarily need/want to use them. I will however explain how they work, just in case you want to resort to a similar mechanism rather than providing a completely different one of your own. The default user table is a clustered collection of TAB-separated flat-files (see sections Path-Based Clustering and Sample authenticated session for more), and it is located in "$CSA_ROOT/var/user.d/".

6.4 HTTP Cookies.

HTTP cookies can be set by CSA in the client browser with the function csaCookie.set. Unsetting a cookie is not supported, and is left up to the client browser. In general, session cookies are dropped by browsers after a while, or when the browser is closed, while persistent cookies have an explicit expiration date.

Say you want to set a cookie with the assignment "mycookie=somevalue" in the client browser. This can be done by calling csaCookie.set as follows:

 csaCookie.set mycookie'='somevalue
 

The single-quotes around the "=" sign are mandated by rc, and "mycookie" and "somevalue" will also have to follow rc quoting rules, where appropriate. To virtually "unset" a cookie, we can simply set it again with a bogus value. Actually unsetting the cookie in the client browser is trickier, as it would require including all the information that was used to set it, and specifying an expiration date in the past. When a cookie is set multiple times in the client, the latter will return all values beginning with the most recent one. CSA always considers the first cookie value returned, so setting a cookie to a bogus value is virtually equivalent to unsetting it, from the CSA point of view.
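Both ideas can be sketched in Python. The first function models "consider the first value returned" for a Cookie request header, under the ordering described above. The second part builds the kind of Set-Cookie header that would really expire a cookie. The path "/cgi-bin/" and the value "deleted" are hypothetical, and none of this is CSA code.

```python
from http.cookies import SimpleCookie

def first_cookie_values(cookie_header):
    """Keep only the first value seen for each cookie name."""
    first = {}
    for pair in cookie_header.split(";"):
        name, separator, value = pair.strip().partition("=")
        if separator and name not in first:
            first[name] = value
    return first

# The most recent setting ("bogus") comes first, so it wins.
print(first_cookie_values("mycookie=bogus; mycookie=somevalue"))

# Really unsetting the cookie needs the original attributes plus a past date.
expired = SimpleCookie()
expired["mycookie"] = "deleted"
expired["mycookie"]["path"] = "/cgi-bin/"  # hypothetical original path
expired["mycookie"]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
header_line = expired.output()
print(header_line)
```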

6.5 Sample authenticated session.

The example CSA application includes samples of how a very basic cookie-based authenticated session works. To run the examples, point your browser at the following URL (adapt the relevant parts according to your local setup):

 http://www.example.com/cgi-bin/cgiwrap/goofy/CSA?0=example.showpage&page=ask-pass
 

If everything was set up correctly, you should see a login page. Enter the proper credentials (userid=smithj and password=mypass) to log in, then play around a bit with the inner mechanics of CSA, by following the various links that are provided on the page that is displayed after login. You may also want to see what happens with an unsuccessful login, by entering invalid credentials.

