Introducing the Associative Database - by Mike Griffiths © 2001
It is not often that a new database concept comes along and even rarer
when one comes from a team with a long record of business success. Simon Williams of Lazy
Software Ltd. has devised a new database model that he thinks advances beyond the
limitations of the Relational Model and is well fitted to the Internet based applications
of the future. The new database structure has been named "The Associative
Database" model and Lazy Software have developed "Sentences" as their
implementation of that model.
The database market is buoyant, with over $8 billion being spent on
database software in 1999. Databases built following the Relational Model account for
around 95% of those sales, with most of the remaining 5% going to Object Oriented
products. Is there a need for a new database model? Can a new idea push its way into
the market and disturb the dominance of Oracle, IBM and Microsoft?
The Associative model was devised by Simon Williams of Lazy Software and
is said to be built upon current research with some unique additions. Anyone familiar with
applying XML to data will feel comfortable with most of the underlying concepts. Both XML
and Associative databases have a common root in Semantic Databases and Topic Maps. Most
of the terms and concepts used will be found in references to Binary Relational Databases.
The Associative Model is based upon a simple subject-verb-object syntax
that has strong parallels in the sentences of English and many other languages. This is
why Lazy Software have called their software product "Sentences". You have to
stretch the meaning of "verb" just a little but some example sentences that fit
the Associative Model would be:
Red is a Colour
Mary is a Vegetarian
Vegetarians eat Plants
Cardiff is located in Wales
Wales is part of the UK
(In each sentence the verb is the phrase that links subject and object: "is a", "eat",
"is located in" and "is part of".)
As any sentence may be the subject or object of any other sentence, it is
possible to express increasingly complex data, for example: (Ski Lessons start at
08:00) on Sunday. As you can see, the "verb" is really a type of
association. The association explains the relationship between the subject and
the object.
An Associative database has two fundamental data structures. There is a
set of "Items" and a set of "Links" that connect them together. In the
"Item" structure entries have a unique identifier, a name and a type. Each entry
in the "links" structure also has a unique identifier together with the
identifiers for the relevant "source", "verb" and "target"
(the Subject, verb and object from our sentences). This can be illustrated with the
following two diagrams. For clarity, the question of item type has been ignored in this
illustration.
Items
Identifier Name
12 Red
41 Is a
76 Colour
14 Mary
81 Vegetarian
43 Eats
82 Plants
15 Ski Lessons
39 Start at
83 08:00
42 On
85 Sunday
Links
Identifier  Source  Verb  Target
101         12      41    76
103         14      41    81
124         81      43    82
105         15      39    83
107         105     42    85
The last entry in the Links structure (107) shows how another entry
(identifier 105) has become the Source for that entry. The two entries show how you could
store the "ski lesson" sentence within the Links structure. Readers with some
familiarity with Microsoft Access will have seen the "AutoNumber" facility that
can be used to create a unique numeric identifier for any row in a given table. Here we
see an identifier being assigned on a database wide basis. The number itself has no
significance; it is simply required to be unique.
You can see that it would be very straightforward to re-construct the
original subject-verb-object representation of the data from the two structures.
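The reconstruction can be sketched in a few lines of Python. This is only an illustration and says nothing about how Sentences works internally: the names `ITEMS`, `LINKS` and `render` are hypothetical, and the data is simply the content of the Items and Links structures shown above.

```python
# Illustrative sketch only: ITEMS, LINKS and render() are hypothetical names,
# filled with the identifiers from the Items and Links structures in the article.

ITEMS = {
    12: "Red", 41: "Is a", 76: "Colour",
    14: "Mary", 81: "Vegetarian", 43: "Eats", 82: "Plants",
    15: "Ski Lessons", 39: "Start at", 83: "08:00",
    42: "On", 85: "Sunday",
}

# Each link maps its own identifier to (source, verb, target).
LINKS = {
    101: (12, 41, 76),
    103: (14, 41, 81),
    124: (81, 43, 82),
    105: (15, 39, 83),
    107: (105, 42, 85),  # the source here is another link, not an item
}


def render(identifier: int) -> str:
    """Rebuild the text for an item or, recursively, for a link."""
    if identifier in ITEMS:
        return ITEMS[identifier]
    parts = []
    for part in LINKS[identifier]:
        text = render(part)
        # Bracket a nested link so the structure stays visible.
        parts.append(f"({text})" if part in LINKS else text)
    return " ".join(parts)


if __name__ == "__main__":
    for link_id in LINKS:
        print(link_id, render(link_id))
```

Because identifiers are unique across the whole database, a single lookup is enough to tell whether a source or target is an item or another link, which is what lets link 107 nest link 105.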
The Associative model structure is economical with storage space as there
is no need to hold available "spaces" for data that is not available even if it
is a normal part of a given data set. This contrasts with relational databases. A
relational database stores a minimum of a single "null" byte for missing data
items in any given row. Some relational databases reserve the maximum space for a given
column in every row. The Associative database makes the storage of "custom" data
for different users or for other varying needs straightforward and "inexpensive"
in terms of maintenance or network resources. If there is a need to store different data
about, say, different customers or customer groups in different countries then an
Associative database can manage this more efficiently than a relational database.
The Associative model differentiates between what it calls Entities and
Associations. An entity is defined as being discrete and having an independent existence.
An association is something that depends upon one or more other things. An example may
help with this. A person or company is an entity while a supplier or a customer or an
employee are associations - their existence depends upon the role being played at any one
time. Indeed it is possible for an Entity to have multiple business roles simultaneously,
each being recorded as an association. If circumstances change, one or more of the
associations may die away but the entity would endure. The difference may seem a little
moot at first but is designed to simplify rather than complicate the data model.
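A small Python sketch may make the distinction concrete. Everything in it is hypothetical (the class names, the company names and the verbs are invented for illustration); it only shows one entity holding two roles at once and surviving the end of one of them.

```python
# Illustrative sketch only: all names and data here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Something discrete with an independent existence."""
    name: str


@dataclass(frozen=True)
class Association:
    """Something that depends on the things it connects."""
    source: Entity
    verb: str
    target: Entity


acme = Entity("Acme Ltd")
us = Entity("Our Company")

# One entity playing two business roles at the same time.
associations = {
    Association(acme, "is a customer of", us),
    Association(acme, "is a supplier to", us),
}


def roles_of(entity, links):
    """Return the verbs of every association whose source is the entity."""
    return sorted(a.verb for a in links if a.source == entity)


# If the supplier relationship ends, only that association drops out of the
# current view; the entity itself is untouched.
remaining = {a for a in associations if a.verb != "is a supplier to"}
```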
The Associative model stores meta-data (data that describes the data)
within the same structures as the data itself. The meta-data describes both the structure
of the database and how the different types of data can interrelate. You will probably
have noticed the parallels with XML. The claim for Associative model databases is that, as
with XML tool kits, generic programs can be written to interact with and manipulate the
data while, when using a more conventional relational database, programs have to be custom
written to reflect the data contained. You have to understand the data and the structure
of the database to write anything other than the simplest program accessing data within a
relational database.
The simple data structures described above need something more to deliver
a database capable of storing the variety of data that a modern business requires together
with the security and control that is essential for an Internet implementation. An
Associative database is composed of a number of "Chapters" and a user's view of
the content of a database is controlled by his or her "Profile". A Profile is a
list of Chapters. The database designer consigns the various elements to specific Chapters
and the user Profile restricts access to the relevant Chapters for a given user. If some
links exist between items in chapters inside and outside of a particular user Profile then
those links are not visible to that user. The combination of Chapters and Profiles can
simplify the tailoring of the database to particular users or subject groups. Data that is
relevant to one user group could be invisible to another and indeed may be replaced by an
alternate data set.
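The visibility rule can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the chapter names, the data and the function `visible_links` are hypothetical, and the sketch assumes every item belongs to exactly one Chapter, which the description above does not confirm.

```python
# Illustrative sketch only: chapter names, data and visible_links() are
# hypothetical. Assumes each item is assigned to exactly one Chapter.

ITEM_CHAPTER = {
    "Mary": "people",
    "Vegetarian": "diet",
    "Cardiff": "geography",
    "Wales": "geography",
    "is a": "core",
    "is located in": "core",
}

LINKS = [
    ("Mary", "is a", "Vegetarian"),
    ("Cardiff", "is located in", "Wales"),
]


def visible_links(profile):
    """Return the links whose source, verb and target all lie inside the profile."""
    return [
        link for link in LINKS
        if all(ITEM_CHAPTER[part] in profile for part in link)
    ]


# A Profile is a list of Chapters.
geographer = ["core", "geography"]
dietician = ["core", "people", "diet"]
partial = ["core", "people"]  # includes Mary's chapter but not "diet"
```

With the `partial` profile the link about Mary is hidden, because its target lies in a Chapter outside the profile.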
The concept of a record is missing from the Associative model. To assemble
all of the current information on something as complex as (say) a sales order, the data
storage will need to be re-visited many times. This is a potential disadvantage, although
it should be recognised that a well normalised relational database would probably also
require a number of data store reads to establish a similar data set. Some rough
calculations based upon a small personal sample would suggest that the Associative
database would require more than four times as many data reads as a relational database.
If the process of reading a sequence of links can be optimised then this may go some way
to minimising the difference as experienced by the user. Those whose careers pre-date the
early commercial relational databases will remember that rapid reductions in the price of
computer memory matched by the introduction of large capacity, low cost disk drives
allowed the wide deployment of relational technology. It may be that the Associative model
will make inroads into commercial systems as system performance continues to advance.
All changes and deletions to an Associative database are effected by the
addition of Links to the database. A deleted association is not physically deleted but
simply linked to an assertion that it is deleted. Similarly a re-named entity is not
re-named but simply linked to its new name.
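An append-only store of this kind might look like the following Python sketch. The verbs "is deleted" and "is renamed to", the function names and the sample data are all hypothetical; they stand in for whatever assertions a real Associative database would use.

```python
# Illustrative sketch only: the verbs "is deleted" and "is renamed to" and all
# function names are hypothetical, not taken from Sentences.

links = []  # append-only list of (identifier, source, verb, target)
_next_id = 100


def add_link(source, verb, target=None):
    """Append a link and return its new, database-wide identifier."""
    global _next_id
    _next_id += 1
    links.append((_next_id, source, verb, target))
    return _next_id


def is_deleted(link_id):
    """A link counts as deleted if another link asserts that it is."""
    return any(
        src == link_id and verb == "is deleted"
        for _id, src, verb, _target in links
    )


def current_name(name):
    """Follow "is renamed to" links, in the order added, to the latest name."""
    for _id, src, verb, target in links:
        if src == name and verb == "is renamed to":
            name = target
    return name


fact = add_link("Mary", "is a", "Vegetarian")
add_link(fact, "is deleted")                      # nothing is removed
add_link("Cardiff", "is renamed to", "Caerdydd")  # the old name is kept
```

After these three calls the store holds three links; the original fact is still present and is merely marked as deleted by a later link.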
Chapters, Profiles and the existence of a database engine that expects
data held to differ between individual Entities or Associations could reduce the
complexity that results from the parameterisation required by large software packages
(such as SAP). It is standard practice to use "flags" held in a database to
trigger or suppress program functions or modify a screen display. Packages based on an
Associative database could use the structure of the database, together with the associated
meta-data, to control these processes. This would lead to the simplification of the often
lengthy and costly implementation process. Any such simplification would result in
significant cost reductions for those purchasing and implementing large software packages
and could also reduce the risks associated with changes introduced post-implementation.
Is there a demand for a new database model? The weaknesses of the dominant
relational model have become apparent as the nature of the data we need to store has
changed. Large binary structures that support multimedia have posed significant challenges
for relational databases as did the arrival of object-oriented programming techniques.
Object databases have not taken the market by storm and neither have the hybrid relational
products with object extensions. So it looks as though business rules are being
implemented in code rather than in the database and that leaves the programmer with the
problem of understanding and managing the database to deliver the required business
objects. There are also the key Internet age issues of "scalability" and
"distributed databases". While it may be that most of the limits in these areas
are imposed by individual implementations of the relational model they also reflect the
rather rigid fundamental structure inherent in the design.
Can the Associative database solve the shortcomings of the current
relational model? Well, some perhaps, although it is still not clear how well the model will
manage with large binary blocks of data. Distributing and combining Associative databases
is an inherently straightforward process yet problematic for relational databases. Even
where different terms (or languages) are used for the same data elements this can simply
be resolved by assertions in the links structure. Such an assertion could ensure that any
Associative database or group of databases would "understand" that "Great
Britain", "UK", "United Kingdom" and "Royaume-Uni" were all
one and the same thing.
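Such equivalence assertions could be resolved along the following lines. This is a sketch under assumptions: the verb "is the same as", the choice of "United Kingdom" as the preferred term and the function names are all hypothetical.

```python
# Illustrative sketch only: the verb "is the same as", the preferred term and
# the function names are hypothetical.

SAME_AS = [
    ("UK", "is the same as", "United Kingdom"),
    ("Great Britain", "is the same as", "United Kingdom"),
    ("Royaume-Uni", "is the same as", "United Kingdom"),
]


def build_aliases(assertions):
    """Map each alias to the term it is asserted to be the same as."""
    return {source: target for source, _verb, target in assertions}


def resolve(name, aliases):
    """Follow equivalence assertions until a term with no further alias is reached."""
    seen = set()
    while name in aliases and name not in seen:
        seen.add(name)  # guards against circular assertions
        name = aliases[name]
    return name


ALIASES = build_aliases(SAME_AS)
```

When two databases are combined, only the assertions need to be added; the existing links that mention "UK" or "Royaume-Uni" stay as they are.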
A good theory is one thing but the proof of any new database is in the
detail of the delivery. Data security and transaction speeds are crucial. The user
interface and the database management facilities need to be up-to-scratch. If a database
is designed to support Internet applications then it must also allow back-ups without
having to take the data off-line. The Programming Interface must be robust and available
to a wide range of development languages. An Associative database will also have to show
that it is practical to store data using the subject-verb-object paradigm in all cases.
There will be concerns about maintaining performance as a database grows. If many links
need to be read to arrive at the current state of a given piece of data with a specific
profile then things could slow down. The fact that all changes are made by adding to the
database side-steps many locking problems but there are still going to be occasions when
one user or process must be sure of exclusive control over a piece of data. Poor locking
strategies were the bane of early relational databases. Will Associative products do
better?
While certain parts of the design of an Associative database look simpler
than the design of a relational database there are undoubtedly key areas that need careful
attention. The creation and maintenance of the Chapters is clearly problematic. These are
important to secure the appropriate level of granularity in the database to establish both
control and flexibility. This has to be achieved without an excessive administrative
overhead.
The Associative database model is being introduced by an entrepreneur
heading a commercial business. This raises the possibility that the model is, or will
become, a proprietary product. This need not be the case. If the concept proves to be
workable and capable of delivering a new and effective database then others could develop
products based upon the core ideas that exist outside of Lazy Software. The entry of a
second player would create a market, and competitive markets are what refine products and
add to the essential functionality.
There is certainly a demand for a fast running database model that will
scale up to large servers and down to small hand-held devices and it will be interesting
to see if databases built using the Associative model can force their way into this very
demanding market.
WWW Links
Topic Maps
www.infoloom.com/topmap.htm
Semantic Databases
http://www-ccs.cs.umass.edu/db.html
http://www.web.glam.ac.uk/schools/soc/research/hypermedia/index.php
Lazy Software
www.lazysoft.com and information can be
requested from: info@lazysoft.com
Mike Griffiths is a
freelance technical journalist and software developer.