Adit Cookbook Pages

Associative Databases

Introducing the Associative Database - by Mike Griffiths © 2001

It is not often that a new database concept comes along, and it is even rarer for one to come from a team with a long record of business success. Simon Williams of Lazy Software Ltd. has devised a new database model that he thinks advances beyond the limitations of the Relational Model and is well fitted to the Internet-based applications of the future. The new database structure has been named "The Associative Database" model and Lazy Software have developed "Sentences" as their implementation of that model.

The database market is buoyant, with over $8 billion being spent on database software in 1999. Databases built following the Relational Model represent most of those sales at around 95%, with the remaining 5% going mostly to Object Oriented products. Is there a need for a new database model? Can a new idea push a way into the market and disturb the dominance of Oracle, IBM and Microsoft?

The Associative model was devised by Simon Williams of Lazy Software and is said to be built upon current research with some unique additions. Anyone familiar with applying XML to data will feel comfortable with most of the underlying concepts. Both XML and Associative databases have a common root in Semantic Databases and Topic Maps. Most of the terms and concepts used will be found in references to Binary Relational Databases.

The Associative Model is based upon a simple subject-verb-object syntax that has strong parallels in the sentences of English and many other languages. This is why Lazy Software have called their software product "Sentences". You have to stretch the meaning of "verb" just a little, but some example sentences that fit the Associative Model would be:

Red is a Colour
Mary is a Vegetarian
Vegetarians eat Plants
Cardiff is located in Wales
Wales is part of the UK

(The verbs in these sentences are "is a", "eat", "is located in" and "is part of".)

As any sentence may be the subject or object of any other sentence, it is possible to express increasingly complex data, for example: (Ski Lessons start at 08:00) on Sunday. As you can see, the "verb" is really a type of association: it explains the relationship between the subject and the object.

An Associative database has two fundamental data structures. There is a set of "Items" and a set of "Links" that connect them together. In the "Items" structure, entries have a unique identifier, a name and a type. Each entry in the "Links" structure also has a unique identifier, together with the identifiers for the relevant "source", "verb" and "target" (the subject, verb and object from our sentences). This can be illustrated with the following two tables. For clarity, the question of item type has been ignored in this illustration.

Items
Identifier   Name
12           Red
41           Is a
76           Colour
14           Mary
81           Vegetarian
43           Eats
82           Plants
15           Ski Lessons
39           Start at
83           08:00
42           On
85           Sunday

Links
Identifier   Source   Verb   Target
101          12       41     76
103          14       41     81
124          81       43     82
105          15       39     83
107          105      42     85

The last entry in the Links structure (107) shows how another entry (identifier 105) has become the Source for that entry. Together, the two entries show how the "ski lesson" sentence could be stored within the Links structure. Readers with some familiarity with Microsoft Access will have seen the "AutoNumber" facility that can be used to create a unique numeric identifier for any row in a given table. Here we see an identifier being assigned on a database-wide basis. The number itself has no significance; it is simply required to be unique.

You can see that it would be very straightforward to re-construct the original subject-verb-object representation of the data from the two structures.
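As a rough illustration of that reconstruction, here is a minimal Python sketch. The two dictionaries simply copy the Items and Links tables above; the function name render and the use of brackets around a nested sentence are my own choices for the illustration and say nothing about how Sentences itself works.

```python
# Items and Links copied from the two tables above.
ITEMS = {
    12: "Red", 41: "Is a", 76: "Colour", 14: "Mary", 81: "Vegetarian",
    43: "Eats", 82: "Plants", 15: "Ski Lessons", 39: "Start at",
    83: "08:00", 42: "On", 85: "Sunday",
}
LINKS = {
    101: (12, 41, 76),
    103: (14, 41, 81),
    124: (81, 43, 82),
    105: (15, 39, 83),
    107: (105, 42, 85),  # the source of this link is another link
}


def render(identifier, nested=False):
    """Rebuild the sentence text for an item or a link (hypothetical helper)."""
    if identifier in ITEMS:
        return ITEMS[identifier]
    source, verb, target = LINKS[identifier]
    text = " ".join(render(part, nested=True) for part in (source, verb, target))
    # Bracket a sentence only when it is used inside another sentence.
    return f"({text})" if nested else text


if __name__ == "__main__":
    for link_id in LINKS:
        print(link_id, render(link_id))
```

Because identifiers are unique across the whole database, a single lookup is enough to tell whether a source or target is an item or another link.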

The Associative model structure is economical with storage space, as there is no need to reserve space for data that is absent, even if that data is a normal part of a given data set. This contrasts with relational databases. A relational database stores a minimum of a single "null" byte for missing data items in any given row. Some relational databases reserve the maximum space for a given column in every row. The Associative database makes the storage of "custom" data for different users or for other varying needs straightforward and "inexpensive" in terms of maintenance or network resources. If there is a need to store different data about, say, different customers or customer groups in different countries, then an Associative database can manage this more efficiently than a relational database.
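The storage argument can be made concrete with a small counting exercise. The Python sketch below uses invented customers and attribute names (all hypothetical) and compares the number of cells a single wide table would need with the number of links needed to hold the same facts. It counts slots rather than bytes, so it illustrates the shape of the argument and is not a measurement of any real product.

```python
# Hypothetical facts about three customers in different countries.
# Customer names, attribute names and values are invented for illustration.
FACTS = [
    ("Customer A", "Country", "UK"),
    ("Customer A", "VAT number", "placeholder-1"),
    ("Customer B", "Country", "France"),
    ("Customer B", "SIRET number", "placeholder-2"),
    ("Customer C", "Country", "USA"),
    ("Customer C", "State", "Ohio"),
    ("Customer C", "Sales tax ID", "placeholder-3"),
]


def relational_cells(facts):
    """Cells in one wide table: every row carries every column."""
    rows = {subject for subject, _, _ in facts}
    columns = {verb for _, verb, _ in facts}
    return len(rows) * len(columns)


def associative_links(facts):
    """Links needed: one per fact that is actually known."""
    return len(facts)


if __name__ == "__main__":
    cells = relational_cells(FACTS)
    links = associative_links(FACTS)
    print(f"wide table cells: {cells}, of which empty: {cells - links}")
    print(f"associative links: {links}")
```

A relational designer could of course split the data into several tables to avoid the empty cells, at the cost of a more complex schema.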

The Associative model differentiates between what it calls Entities and Associations. An entity is defined as being discrete and having an independent existence. An association is something that depends upon one or more other things. An example may help with this. A person or company is an entity, while a supplier, a customer or an employee is an association: its existence depends upon the role being played at any one time. Indeed it is possible for an Entity to have multiple business roles simultaneously, each being recorded as an association. If circumstances change, one or more of the associations may die away but the entity would endure. The difference may seem a little moot at first but is designed to simplify rather than complicate the data model.

The Associative model stores meta-data (data that describes the data) within the same structures as the data itself. The meta-data describes both the structure of the database and how the different types of data can interrelate. You will probably have noticed the parallels with XML. The claim for Associative model databases is that, as with XML tool kits, generic programs can be written to interact with and manipulate the data. With a more conventional relational database, by contrast, programs have to be custom written to reflect the data contained. You have to understand the data and the structure of the database to write anything other than the simplest program accessing data within a relational database.
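To show what a "generic program" might look like, here is a minimal Python sketch in which the meta-data and the data share one list of subject-verb-object triples. The verb "has attribute" and the type names are assumptions made for this example; they are not taken from Sentences. The describe function knows nothing about people or towns in advance and works purely from the stored meta-data.

```python
# One structure holds both meta-data and data (all names are hypothetical).
TRIPLES = [
    # Meta-data: which attributes each type may carry.
    ("Person", "has attribute", "Diet"),
    ("Person", "has attribute", "Home town"),
    ("Town", "has attribute", "Country"),
    # Data.
    ("Mary", "is a", "Person"),
    ("Mary", "Diet", "Vegetarian"),
    ("Cardiff", "is a", "Town"),
    ("Cardiff", "Country", "Wales"),
]


def objects(subject, verb):
    """Return every object linked to subject by verb."""
    return [o for s, v, o in TRIPLES if s == subject and v == verb]


def describe(entity):
    """Build a record for any entity using only the stored meta-data."""
    record = {}
    for entity_type in objects(entity, "is a"):
        for attribute in objects(entity_type, "has attribute"):
            values = objects(entity, attribute)
            record[attribute] = values[0] if values else None
    return record


if __name__ == "__main__":
    print(describe("Mary"))
    print(describe("Cardiff"))
```

Adding a new type or attribute means adding triples, not changing the program.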

The simple data structures described above need something more to deliver a database capable of storing the variety of data that a modern business requires, together with the security and control that is essential for an Internet implementation. An Associative database is composed of a number of "Chapters" and a user's view of the content of a database is controlled by his or her "Profile". A Profile is a list of Chapters. The database designer assigns the various elements to specific Chapters and the user Profile restricts access to the relevant Chapters for a given user. If a link connects items in Chapters inside and outside a particular user Profile then that link is not visible to that user. The combination of Chapters and Profiles can simplify the tailoring of the database to particular users or subject groups. Data that is relevant to one user group could be invisible to another and indeed may be replaced by an alternative data set.
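The visibility rule can be sketched in a few lines of Python. The Chapter names "core" and "staff" and the assignment of items and links to them are invented for the example; the rule applied is the one described above, namely that a link is visible only when the link and everything it connects sit in Chapters listed in the user's Profile.

```python
# Hypothetical assignment of items and links to Chapters.
ITEM_CHAPTER = {
    12: "core", 41: "core", 76: "core",   # Red, Is a, Colour
    43: "core", 82: "core",               # Eats, Plants
    14: "staff", 81: "staff",             # Mary, Vegetarian
}
LINKS = {
    101: (12, 41, 76),   # Red Is a Colour
    103: (14, 41, 81),   # Mary Is a Vegetarian
    124: (81, 43, 82),   # Vegetarian Eats Plants
}
LINK_CHAPTER = {101: "core", 103: "staff", 124: "staff"}


def is_visible(identifier, profile):
    """True if an item or link lies wholly inside the Chapters of a Profile."""
    if identifier in ITEM_CHAPTER:
        return ITEM_CHAPTER[identifier] in profile
    if LINK_CHAPTER[identifier] not in profile:
        return False
    # A link that reaches outside the Profile is hidden from the user.
    return all(is_visible(part, profile) for part in LINKS[identifier])


def visible_links(profile):
    """List the link identifiers a user with this Profile can see."""
    return sorted(link for link in LINKS if is_visible(link, profile))


if __name__ == "__main__":
    print(visible_links({"core"}))
    print(visible_links({"core", "staff"}))
```

A user whose Profile lists only "staff" would see no links at all in this data, because every staff link also depends on a verb held in "core".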

The concept of a record is missing from the Associative model. To assemble all of the current information on something as complex as (say) a sales order, the data storage will need to be re-visited many times. This is a potential disadvantage, although it should be recognised that a well normalised relational database would probably also require a number of data store reads to establish a similar data set. Some rough calculations based upon a small personal sample would suggest that the Associative database would require more than four times as many data reads as a relational database. If the process of reading a sequence of links can be optimised then this may go some way to minimising the difference as experienced by the user. Those whose careers pre-date the early commercial relational databases will remember that rapid reductions in the price of computer memory matched by the introduction of large capacity, low cost disk drives allowed the wide deployment of relational technology. It may be that the Associative model will make inroads into commercial systems as system performance continues to advance.

All changes and deletions to an Associative database are effected by the addition of Links to the database. A deleted association is not physically deleted but simply linked to an assertion that it is deleted. Similarly, a re-named entity is not re-named but simply linked to its new name.
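A minimal Python sketch of this add-only behaviour follows. The class name, the method names and the verbs "is", "deleted" and "is renamed to" are all assumptions made for the illustration; the article does not say how Sentences records these assertions internally.

```python
from itertools import count


class AppendOnlyStore:
    """Toy store in which nothing is ever removed or overwritten."""

    def __init__(self):
        self._next_id = count(1)   # database-wide identifiers
        self.items = {}            # identifier -> name
        self.links = {}            # identifier -> (source, verb, target)
        # Hypothetical built-in items used to record changes.
        self.is_ = self.add_item("is")
        self.deleted = self.add_item("deleted")
        self.renamed_to = self.add_item("is renamed to")

    def add_item(self, name):
        identifier = next(self._next_id)
        self.items[identifier] = name
        return identifier

    def add_link(self, source, verb, target):
        identifier = next(self._next_id)
        self.links[identifier] = (source, verb, target)
        return identifier

    def delete(self, link_id):
        """Record a deletion by asserting '<link> is deleted'."""
        return self.add_link(link_id, self.is_, self.deleted)

    def rename(self, item_id, new_name):
        """Record a rename by linking the item to a new name item."""
        return self.add_link(item_id, self.renamed_to, self.add_item(new_name))

    def is_deleted(self, link_id):
        marker = (link_id, self.is_, self.deleted)
        return marker in self.links.values()

    def current_name(self, item_id):
        """Follow rename links in insertion order; the latest one wins."""
        name = self.items[item_id]
        for source, verb, target in self.links.values():
            if source == item_id and verb == self.renamed_to:
                name = self.items[target]
        return name
```

Reading the current state therefore means scanning the change links as well as the original data. This sketch does not handle a deletion that is itself later deleted.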

Chapters, Profiles and the existence of a database engine that expects data held to differ between individual Entities or Associations could reduce the complexity that results from the parameterisation required by large software packages (such as SAP). It is standard practice to use "flags" held in a database to trigger or suppress program functions or modify a screen display. Packages based on an Associative database could use the structure of the database, together with the associated meta-data, to control these processes. This would lead to the simplification of the often lengthy and costly implementation process. Any such simplification would result in significant cost reductions for those purchasing and implementing large software packages and could also reduce the risks associated with changes introduced post-implementation.

Is there a demand for a new database model? The weaknesses of the dominant relational model have become apparent as the nature of the data we need to store has changed. Large binary structures that support multimedia have posed significant challenges for relational databases as did the arrival of object-oriented programming techniques. Object databases have not taken the market by storm and neither have the hybrid relational products with object extensions. So it looks as though business rules are being implemented in code rather than in the database and that leaves the programmer with the problem of understanding and managing the database to deliver the required business objects. There are also the key Internet age issues of "scalability" and "distributed databases". While it may be that most of the limits in these areas are imposed by individual implementations of the relational model they also reflect the rather rigid fundamental structure inherent in the design.

Can the Associative database solve the shortcomings of the current relational model? Some of them, perhaps, although it is still not clear how well the model will manage with large binary blocks of data. Distributing and combining Associative databases is an inherently straightforward process, yet problematic for relational databases. Even where different terms (or languages) are used for the same data elements, this can simply be resolved by assertions in the Links structure. Such an assertion could ensure that any Associative database or group of databases would "understand" that "Great Britain", "UK", "United Kingdom" and "Royaume-Uni" were all one and the same thing.
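One way such assertions might be resolved is sketched below in Python. The identifiers, the verb "is the same as" and the choice of the lowest identifier as the representative are assumptions for the example only. The sketch groups every item joined by a "same as" link, directly or through a chain of links, under one identifier.

```python
# Hypothetical items, as if merged from several databases.
ITEMS = {
    201: "UK",
    202: "United Kingdom",
    203: "Great Britain",
    204: "Royaume-Uni",
    205: "Wales",
    299: "is the same as",
}
SAME_AS = 299
LINKS = [
    (202, SAME_AS, 201),
    (203, SAME_AS, 201),
    (204, SAME_AS, 202),   # resolved to 201 through 202
]


def canonical_ids(items, links):
    """Map every item to one representative identifier per synonym group."""
    parent = {identifier: identifier for identifier in items}

    def find(identifier):
        # Walk up to the representative, shortening the path as we go.
        while parent[identifier] != identifier:
            parent[identifier] = parent[parent[identifier]]
            identifier = parent[identifier]
        return identifier

    for source, verb, target in links:
        if verb == SAME_AS:
            a, b = find(source), find(target)
            if a != b:
                parent[max(a, b)] = min(a, b)   # lowest identifier wins
    return {identifier: find(identifier) for identifier in items}


if __name__ == "__main__":
    canon = canonical_ids(ITEMS, LINKS)
    for identifier, name in ITEMS.items():
        print(f"{name!r} -> {ITEMS[canon[identifier]]!r}")
```

Queries would then compare the representative identifiers rather than the names as typed.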

A good theory is one thing but the proof of any new database is in the detail of the delivery. Data security and transaction speeds are crucial. The user interface and the database management facilities need to be up to scratch. If a database is designed to support Internet applications then it must also allow back-ups without having to take the data off-line. The Programming Interface must be robust and available to a wide range of development languages. An Associative database will also have to show that it is practical to store data using the subject-verb-object paradigm in all cases. There will be concerns about maintaining performance as a database grows. If many links need to be read to arrive at the current state of a given piece of data with a specific profile then things could slow down. The fact that all changes are made by adding to the database side-steps many locking problems, but there are still going to be occasions when one user or process must be sure of exclusive control over a piece of data. Poor locking strategies were the bane of early relational databases. Will Associative products do better?

While certain parts of the design of an Associative database look simpler than the design of a relational database there are undoubtedly key areas that need careful attention. The creation and maintenance of the Chapters is clearly problematic. These are important to secure the appropriate level of granularity in the database to establish both control and flexibility. This has to be achieved without an excessive administrative overhead.

The Associative database model is being introduced by an entrepreneur heading a commercial business. This raises the possibility that the model is, or will become, a proprietary product. This need not be the case. If the concept proves to be workable and capable of delivering a new and effective database then others outside Lazy Software could develop products based upon the core ideas. The entry of a second player would create a market, and competitive markets are what refine products and add to the essential functionality.

There is certainly a demand for a fast-running database model that will scale up to large servers and down to small hand-held devices, and it will be interesting to see if databases built using the Associative model can force their way into this very demanding market.

WWW Links

Topic Maps
www.infoloom.com/topmap.htm

Semantic Databases
http://www-ccs.cs.umass.edu/db.html

http://www.web.glam.ac.uk/schools/soc/research/hypermedia/index.php

Lazy Software
www.lazysoft.com and information can be requested from:
info@lazysoft.com

Mike Griffiths is a freelance technical journalist and software developer.
