This installment will continue the feature analysis of the last installment, and discuss the benchmark application programs that are being written in each of the the DBMS products as a way of evaluating them. We will also look at other alternatives to the relational model and at alternatives to UNIX itself.
In the last installment, Informix, Progress, Rubix, and Unify were compared using a list of distinguishing features. We now examine another DBMS. Mistress is a product of Rhodnius, Inc. It is a commercial version of an earlier product called MRS, for Microcomputer Relational System, which was intended to be a product like the IBM System R DBMS which uses the SQLTM language. It is now offered on a number of systems, with some variations, using several operating systems. We will look at the current version 2.2.10 of Mistress Plus, which includes their M-Writer report writer (RG) and M-Vision screen interface program (DEL).
Version 3.1, Mistress/32, which has been optimized for 32-bit UNIX machines, is supposed to be released in April, 1984, and is to include facilities for audit trails, rollback recovery, nested select to 100 levels, math functions in the Query Language (QL), record locking, and a menu generator. Look for a progress report on this version.
Here is how Mistress rates using the distinguishing features from the previous installment. For a more complete explanation of those features, see Part Two of this series. Some of the items counted as one feature in the last article are broken out below. Those features rated as NY (not yet) are supposed to come with version 3.1, which I have not validated.
| 1 | Relational? | QY* |
| 2 | Has DBM languages? | Y |
| 3 | Easy restructuring? | N* |
| 4a | Passwords to databases? | Y |
| 4b | Passwords to tables? | Y |
| 4c | Passwords to records? | N |
| 4d | Passwords to fields? | N |
| 5a | Has type integer? | Y* |
| 5b | Has type fixed? | Y |
| 5c | Has type float? | Y* |
| 5d | Has type dollar? | Y* |
| 5e | Has type string? | Y* |
| 5f | Has type date? | Y |
| 5g | Has type time? | Y |
| 5h | Has type boolean? | N |
| 5i | Has type serial? | N |
| 6a | String match test? | Y |
| 6b | String contains test? | Y |
| 6c | String beginswith test? | Y* |
| 6d | String endswith test? | Y* |
| 7 | Entity integrity? | QY* |
| 8 | Referential integrity? | QY* |
| 9a | File locking? | Y |
| 9b | Record locking? | NY |
| 10 | Data entry forms? | Y |
| 11 | General join? | Y |
| 12 | Views? | N |
| 13a | DML addition? | NY* |
| 13b | DML subtraction? | NY* |
| 13c | DML multiplication? | NY* |
| 13d | DML integer division? | NY* |
| 13e | DML modulo operator? | NY* |
| 13f | DML rational division? | NY* |
| 13g | DML exponentiation? | N |
| 13h | DML logarithm? | N |
| 13i | DML maximum? | NY* |
| 13j | DML minimum? | NY* |
| 13k | DML truncation/int? | N* |
| 13l | DML rounding function? | N* |
| 13m | DML random function? | N |
| 13n | DML string concatenation? | N |
| 13o | DML substring select? | N |
| 13p | DML substring index? | N |
| 13q | DML string length? | N |
| 14 | Report writer? | Y |
| 15 | Menu generator? | NY |
| 16 | DEL/QL/RG/DML/CL procedural? | QY* |
| 17a | Record-by-record operations? | Y* |
| 17b | Intermediate variables? | N* |
| 18 | Auto save intermed var? | N |
| 19a | First operator? | N |
| 19b | Last operator? | N |
| 19c | Next operator? | Y |
| 19d | Previous operator? | Y |
| 20 | Helpful error messages? | QY* |
| 21 | Mult tables in UNIX file? | Y |
| 22 | CL files separate UNIX files? | Y |
| 23 | CL files link to db/tables? | N |
| 24 | Mult db file per direc? | N |
| 25 | Recs/fields fixed/var? | Fixed* |
| 26a | Max num tables/db? | 9999 |
| 26b | Max num records/table? | 9999 |
| 26c | Max size records? | 231-1 |
| 26d | Max num fields/table? | 9999 |
| 26e | Max size fields? | 231-1 |
| 27 | Dump, load ops? | Y |
| 28a | Shell escape? | Y |
| 28b | C interface? | Y |
| 29 | CL compilation? | N |
| 30 | Use/perf run statistics? | N |
| 31a | Timestamp records? | N |
| 31b | Rollback to previous state? | NY |
| 32 | Use raw I/0 device? | N |
| 33 | Transaction logging? | NY |
| 34 | Disk storage needed? | 3M |
| 35 | Main memory needed? | 512K |
(1) There is no unique default key to guarantee nonduplication of records, so this must be supplied when display (data dictionary) is defined. See (7) and (8).
(3) Tables and fields can be renamed, but to restructure a table, its contents and "display" must be dumped into a UNIX file, the display file edited, the current table "dropped", a new table created from the edited display file, and the data inserted from the contents dump file.
(5a) Range is +/- 215-1. Also a type shortinteger +/- 27-1 and longinteger +/- 231-1.
(5c) Precision is 7 digits. Notation is scientific, exponent +/- 38. Also a type longfloat with 15 digits.
(5d) Dollar type permits amounts up to $9,999,999,999,999.99, sufficient for most national budgets.
(5e) Type char may have length of up to 231-1! There is also a type text with a display length, storage length, and overflow storage length. If the block size exceeds the amount specified for storage length in the table, the excess is stored in an overflow table. This design combines the performance advantages of fixed-length records for the main table with the storage advantages of an overflow table using variable-length records.
(6cd) Wildcard character * is used for matching strings, so can handle contains, beginswith, and endswith.
(7) Has unique qualifier which may be used to prevent duplicate records and, in particular, to prevent more than one record from having a null entry in an index field, but this is not default situation.
(8) This can be handled by writing a procedure, but it is not default situation.
(13) The report generator M-Writer can be used to do some calculations and the result can be sent to a UNIX file, from which they can be inserted into a database table, but this is too inconvenient to qualify for a Y rating, for the reason discussed in (3).
(16) The QL, RG, and CL are procedural, with some limitations, such as the level of nesting of select commands, which is to be increased in v. 3.1.
(17a) M-Writer can do this, but it is inconvenient for data manipulation in a table, for the reason discussed in (3).
(17b) Variable values may not be changed from record to record in a "for each" or "while" loop.
(20) The line with an error is displayed, with a pointer to the position of the error and a message "syntax error". Does not say what kind of syntax error.
(25) See (5e) above. Main tables have fixed length records, the overflow table variable length records.
Mistress is an older product, and is available on a variety of machines under several operating systems, from the IBM 3031 to the IBM-PC/XT. Several applications have been written using it. Its Query Language departs from SQL in ways that makes its programs more readable and comfortable, but it still suffers from the limitations of SQL for doing data manipulation, which makes it unsuitable for some applications if the user doesn't want to program in C. Mistress tries to make this easier with its mx and mr routines. Hopefully, some of these limitations will be alleviated in the next version. It has a unique data type: greek, which can be used for strings of Greek characters, making it the DBMS of choice for classical Greek scholars. Probably its most outstanding distinguishing feature is its text data type, which makes it a good choice for applications needing character fielda of almost unlimited length, such as databases of documents and for library science applications.
The usual approach to benchmarking DBMSs is to time them in their performance of each function. However, this does little to test the suitability of the products for practical applications. Anticipated new versions of the products with greatly expanded functionality are too close to make such performance benchmarking appropriate at this time. While waiting for the situation to stabilize, work has begun on rudimentary applications designed to test the functionality of each of the DBMS products. These applications will be written to function as nearly alike as feasible, with no frills and little concern for appearances. The following are some of the ones being written:
Task Manager. Permits the entry of tasks and how they are assigned to each member of an organization. It groups them into classes, links them into projects, and tracks performance to completion. It assigns values and priorities, frequency of repetition, and relates tasks to deadlines and triggering events. It can be used to generate schedules for each person or team for any period and show periods open for appointments or discretionary tasks. It permits the entry of activities, and can generate an operations log. It can analyze productivity and the utilization of time and other resources. It can be used as a source of data to the Time Billing package.
Sales Manager. Permits the entry of data about prospects, either from the keyboard or from a variety of data files, and facilitates the reduction of data redundancy. It permits updates of data on each prospect, including occupation and interests, what the prospect already has or needs, financial ability, and a history of previous contacts and the results, with suggestions for follow up which can be fed to the Task Manager. Can be used to generate daily call lists for salesmen, and to document sales activity and productivity. Maintains history of sales made and commissions paid.
Time Billing. Permits entry of billable tasks, with the client and matter, the procedure and rate, for each of several producers, and produce statements to responsible parties. Can enter payments received and post them to appropriate accounts. Can distribute payments to appropriate producers and analyze productivity and resource and time utilization.
Document Manager. Uses unique serial number for each document, assigns each to locations and containers, such as file folders and cabinets, and generates labels for each. Matches documents on hand against lists of documents required. Tracks movement of materials and accounts for return to assigned location. Manages disposition of each item.
Reservation System. Could be used for travel reservations, shipping, class schedules, or like situations. Can be used to assign traveler, cargo, or student to sequence of vehicles or classes, allowing time for transfer, and to reassign when unforeseen departures from schedules occur. Can track actual movement. Analyzes productivity. Reassigns to take vehicles or classes out of service if not needed. Prepares statements.
Troubleshooter. Entry of data on observed symptoms and results of tests and procedures. Correlation of data to assist in the specification of production rules of the form "If <condition> then <recommended action>". This would be the front-end for an expert system for which knowledge has not yet stabilized. Would make use of statistics, so would need math capabilities, unless interfaced to statistics package.
General Ledger. Would provide all of usual GL functionality, such as Chart of Accounts, Journal, Profit & Loss, and Financial Statement.
These applications have been designed to represent the kinds of things users may want to do with a DBMS. Readers and vendors are invited to suggest others. Ease of programming will be a consideration. If it turns out to be too much trouble to use a DBMS to write one of these applications, it will be postponed or skipped, and the difficulties noted.
Although this project is far from completed, some comments can be made about preliminary experience with the products. Most work on Mistress and Unify has been put off pending arrival of their coincidentally numbered versions 3.1. Work on Informix has been put off because I found its command structure unpleasantly awkward. The reader should not take that as a final judgement. Further experience with the product may change that reaction. But any professional programmer will admit that esthetic reaction to a programming tool or language affects his productivity, and that is a factor that vendors of such products must take into account if they hope to get acceptance for them. Functionality is not enough. Elegance is important, also.
The Progress DBMS language from Data Language Corporation is appropriately named. I have found it difficult to beat for getting small applications running quickly. A programming task can often be accomplished in minutes using Progress which would take hours using another product. It is highly interactive and has some elegant ways to do a lot with a few simple commands.
Although stored as ASCII files, Progress programs are compiled before being executed. This slows operation if the programmer writes many small programs which call one another often, but it is fast enough to make it easy to modify and rerun programs during development, much like an interpreter. In executing a Progress program, it soon becomes apparent that it is being compiled, because the effect of a program statement can depend on later sequences of statements.
The Progress compiler has some interesting not-yet-documented side-effects which can be made good use of once you get some experience with how they work. So far, it has been possible to write all of the applications in Progress without having to do any C programming or calls to the shell. This may not be important to competent C programmers, but it can be worth a lot to the many potential users who either don't want to learn to program in C or UNIX, or who would prefer not to have to, for the kind of ad hoc jobs they have to do. Although using Progress is programming, it is on about the level of difficulty of a good spreadsheet. Error messages are very helpful.
When some of the planned features of their Prefix[TM] user interface become available, Infosystems Technology's Rubix DBMS with its compilable Q language may become even easier to use than Progress is. In the meantime, it offers a theoretically robust functionality which makes it a good choice for more com- plex applications. Achieving some of the effects of a single Progress statement might take some thought and the writing of a long function in the C-like style of Q, but the Rubix programmer is not limited by the assumptions the Progress compiler makes about the programmer's intentions. Using Rubix, a programmer could come up with something that would look and act like a spreadsheet and which could be used for applications like interactive modeling of a complex process, as well as more conventional DBMS applications. It would be fairly easy to interface it to a virtual-memory spreadsheet, a graphics or statistics package, or a communications network. However, an effect of having more options is to require more thought in the design of a program if one is to make use of them.
The people at Infosystems Technology are embarked on another project worth watching: the extension of the concept of the relational DBMS to handle the kind of imprecise and uncertain information encountered in artificial intelligence and expert systems, and to provide it with the functionality needed for querying, manipulating, and inferencing on such information. They call their concept the Set Manager. It would permit the modeling of not just sets or the relational algebra, but the much more general categories and categorical algebra, and this would extend its reach to advanced applications which have not heretofore been approached.
The logic of existing' DBMS products is two-valued. That is inadequate to represent imprecise or uncertain information. One way to do so is with fuzzy (sub)sets, which are the subject of fuzzy reasoning theory. In this context, this involves the replacement of field values with functions which map the domain of possible values into the unit interval, which is interpreted as the level of confidence associated with each value. These level of confidence values resemble probabilities, but their logic is different, and a new body of theory called possibility theory has been conceived to deal with them.
Deep-knowledge expert systems need an inference capability. This can be provided by a single rule of inference called resolution which is not very intuitive but which replaces the several rules of inference used by humans and is efficient on a computer. By using a single rule of inference, one avoids having to provide the program with the ability to decide which rules of inference to use and in what order.
The Set Manager or something like it is going to be needed if AI work is to make the transition from the academic world into the commercial marketplace. Commercial expert systems will need to be able to use large quantities of data in complex ways. A product like Rubix may serve as a foundation for this.
It should also be mentioned that Rubix is derived from Logix, which was developed by Logical Software, Inc. They also offer a full-screen user interface to UNIX called Softshell. The company is marketing the products on several machines.
In the first installment, the alternatives of the hierarchical and network models were discussed, with the advantages and disadvantages of each. The relational model is now favored in the marketplace, but it has its limitations and should not be assumed to be the ultimate answer.
Suppose we have a relational database we are happy with, for the most part, except that we want to attach some qualifications to some of the field values. For example, we might want to note about a telephone number that it is only valid for the hours from 10 to 3, Monday through Wednesday, and that there is a second number to be called at other times. If every other telephone number was valid 24 hours a day, we would not want to structure the table with extra fields to handle just this one exception. Cramming the extra information into the telephone number field would violate one of the conditions for considering a database relational, namely that it be in First Normal Form (1NF) and have no composite field values. Departure from this restriction is discussed by Dreizen1
Another approach is to abandon the relational table and represent information using nested arrays. In this model, the principal data structure is the rectangular array, each element of which may be another array of arbitrary dimension and size, with no requirement for uniformity. A language, called NIAL, has been developed for this model2. It can be used to implement the relational model, but can also go beyond it. Its syntax is similar to that of APL, but the notation is more readable. It has the functionality for AI work of LISP and Prolog, but avoids the need for the programmer to be constantly aware of the backtracking logic used by Prolog. It also provides resolution as a primitive. It promises to be better for AI work than LISP or Prolog. It is written in C, and is currently implemented as an interpreter. A compiler is under development. I will be doing the part of the interpreter to at least one UNIX machine, starting with the Fortune 32:16. Watch for a progress report on this effort.
Running DBMSs under UNIX raises a number of issues. We may be enthusiastic about UNIX because it is better than the alternatives that we have seen, but we must also admit that it has its flaws. It is not quite a standard - there are several incompatible versions. Such portability as exists is of the C language rather than of the operating system, and C is not standard, either. The shell and many of the utility programs are user-unfriendly, their syntax is irregular, and some are antiquated. There are no standard approaches to record and file locking, interprocess communications, removable media handling, real-time process control, and fast disk I/O. UNIX is not secure against defective programs or malicious programmers. It does not adapt well to multiprocessor architectures. The file system is slow, lacks several features like keyed files, and is vulnerable to crashes. It does not support reliable transaction logging, and has no standard way to implement virtual memory.
That the world may move to UNIX as the new standard now seems certain, but it may also be a stepping stone to something else. A number of alternatives are under development. I will be looking carefully at these. One on which we have received documentation is S1TM, from Multisolutions, Inc. It claims to provide all the functionality and advantages of UNIX and to overcome all of the deficiencies noted above. If it can do what the developers say it can, it may indeed be the successor to UNIX. Watch for reports on it.
[1] Dreizen, Howard M., and Chang, Shi-Kuo, "Imprecise Database: A Rationale for Relations with Embedded Subrelations," Information Systems Laboratory, Illinois Institute of Technology, Chicago, IL 60616.
[2] Glasgow, Janice J., "Logic Programming in NIAL," Queen's University, Kingston, Ontario, Canada K7L 3N6.
Logix. Logical Software, Inc., 55 Wheeler St., Cambridge, MA 02138, 617-864-0137.
Mistress. Rhodnius, Inc., 10 St. Mary St., Toronto, Ontario, Canada M4Y lP9, 416-922-1743.
Progress. Data Language Corporation, 5 Andover Road, Billerica, MA 01821, 617-663-6500.
Rubix. Infosystems Technology, Inc., 6301 Ivy Lane, Greenbelt, MD 20770, 301-345-7800.
S1TM. Multisolutions, Inc., 660 Whitehead Road, Lawrenceville, NJ 08648, 609-695-1337.
Jon Roland is a consultant with Cyberian Computer Consultants and Director of the Vanguard Institute, which does research in several fields, mainly artificial intelligence. His address is 206 East Nakoma, San Antonio, Texas 78216, 512-340-7641.