The following limits apply.
There must not be any ASCII TAB characters in the data. This is the primary limit, as the ASCII TAB character is the delimiter in tables. Nor can there be octal ``\001'' characters at the beginning of any row other than the table header.
The following names are reserved by the AWK language and should not be used as column names:
BEGIN, END, break, continue, else, exit, exp, for, getline, if, in, index, int, length, log, next, print, printf, split, sprintf, sqrt, substr, while, and possibly others, depending on your AWK implementation (e.g. mawk, gawk, etc.). Refer to the man page and the documentation of your AWK interpreter.
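A quick way to catch a reserved name before it ends up in a table header is to test candidate column names against the list above. The following is only a sketch: check_names is a hypothetical helper, not part of NoSQL, and it knows only the words listed here, so names reserved by your particular AWK may still slip through.

```shell
# check_names (hypothetical helper): returns non-zero if any argument
# is one of the AWK reserved words listed in this section.
check_names() {
    reserved='BEGIN END break continue else exit exp for getline if in index int length log next print printf split sprintf sqrt substr while'
    status=0
    for name in "$@"; do
        # Pad with spaces so that only whole words match.
        case " $reserved " in
            *" $name "*)
                echo "reserved AWK word: $name" >&2
                status=1
                ;;
        esac
    done
    return $status
}
```

For instance, ``check_names name city'' succeeds, while ``check_names name print'' fails and reports the offending word on standard error.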
Horizontal TABs and newlines, although forbidden as such in table data, can be conveniently represented by means of the ASCII strings '\t' and '\n' respectively. This rule applies to tables only. Files in 'list' format can contain any characters literally, including physical TABs and newlines.
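As an illustration of the escaping convention just described, a value can be converted before it is stored in a table. This is a minimal sketch, not a NoSQL operator: encode_field is a hypothetical name, and it assumes the value arrives on standard input with a final newline that only terminates the input and is not part of the data.

```shell
# encode_field (hypothetical helper): read one value on stdin and write it
# on a single line, with physical TABs and line breaks replaced by the
# two-character strings \t and \n.
encode_field() {
    awk '
        { gsub(/\t/, "\\t") }            # physical TAB -> backslash, t
        NR > 1 { printf "%s", "\\n" }    # line break   -> backslash, n
        { printf "%s", $0 }
        END { print "" }                 # terminate the output line
    '
}

# Example: a value made of two lines, the first containing a TAB.
printf 'a\tb\nc\n' | encode_field
```

The result is one line that contains no physical TAB or newline inside the value, and can therefore be placed in a table column.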
The number of columns in a table may be limited to 32768 by some AWK implementations (I think mawk is one of those). It should not be a problem though, as it is a very high number anyway. In spite of this, mawk is very fast and I recommend it over other AWK implementations.
A more serious drawback of the operator-stream paradigm is that it is process-based. This means that an average pipeline will run several processes at once, one or more for each operator. On complex queries this can exceed the maximum number of child processes allowed by your operating system. This limit is OS-specific, and it can usually be overcome by asking the system administrator to raise it as needed.
The new table header format introduced by NoSQL v4 was also designed to help overcome this problem. The new table header is compatible with manipulating NoSQL tables either with the supplied operators or with common system-level utilities, like grep(1), cut(1), sort(1), join(1), look(1) and many others (one exception is ``sort -r'', of course, as it would move the table header to the end of the table). In this way, applications that really need to keep the number of system processes to a minimum can use the native shell utilities to manipulate NoSQL tables directly without breaking the table header. This is a major advance in both speed and efficiency with respect to the old /rdb-like header format.
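The effect can be sketched with a small sample table. The layout used here is an assumption: the header is taken to be a single line in which each column name is prefixed with octal ``\001'', which is what the restriction on ``\001'' at the beginning of data rows suggests. The make_table function and its contents are made up for the example.

```shell
# Hypothetical sample table. ASSUMPTION: the header is one line whose
# column names are each prefixed with octal \001.
make_table() {
    printf '\001name\t\001city\nsmith\trome\njones\tmilan\n'
}

# In the C locale, byte \001 sorts before any printable character,
# so a plain sort(1) leaves the header on the first line.
make_table | LC_ALL=C sort

# cut(1) picks the second column from the header and the data rows alike,
# so its output still starts with a header line.
make_table | LC_ALL=C sort | cut -f2
```

Each of these pipelines costs one process per utility, and the header line comes out first in both cases. With ``sort -r'' the same header would end up last, which is the exception mentioned above.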