Introduction
In recent years the use of object-oriented design in software development has skyrocketed. This has led software engineers to consider database systems that are themselves object oriented, since such systems are better able to meet market needs. At the moment there is no standardized language for programming object-oriented database systems. The field is still evolving, and stakeholders hope to formalize standards for object-oriented database systems.
To maximize the utility of relational database systems, concerted efforts must be made to address the shortcomings of the current technology. A historical analysis of the evolution of relational database technology will help us understand how object-oriented database systems can be implemented so as to eliminate those shortcomings.
A relational database system is defined as a database in which all data visible to the user is organized in the form of tables, and in which all operations on those tables are possible (Chamberlin, 1990). A database refers collectively to data or information organized and stored in a manner that allows quick access, enhancing usability.
In the 1950s and 1960s, systems called database management systems were developed to provide the functionality needed to create, maintain and modify databases. These systems were, however, inefficient because of their complexity. In a client/server database architecture, a client application requests data-related services, such as filtering or sorting, from a server (Batory, 1998). The latter is also known as the SQL engine or, in full, the database server.
The server grants the client's request by returning secure access to the data to be shared. SQL statements allow client applications to perform operations such as retrieval and modification on a set of server database records. The engine also performs other operations on the data, such as filtering query result sets, thereby improving communication of stored data.
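A minimal sketch of these client-side operations, using Python's built-in `sqlite3` module as a stand-in for a server-side SQL engine (the table and column names here are purely illustrative):

```python
import sqlite3

# An in-memory database stands in for the server-side SQL engine.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The client issues SQL statements; the engine executes them on its records.
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
cur.executemany("INSERT INTO employees (name, dept) VALUES (?, ?)",
                [("Ada", "Research"), ("Grace", "Systems"), ("Edgar", "Research")])

# Retrieval, with filtering and sorting of the result set done by the engine.
cur.execute("SELECT name FROM employees WHERE dept = ? ORDER BY name", ("Research",))
print([row[0] for row in cur.fetchall()])  # ['Ada', 'Edgar']

# Modification: the engine applies the change to the shared data.
cur.execute("UPDATE employees SET dept = ? WHERE name = ?", ("Systems", "Edgar"))
conn.commit()
```

The client never touches files or pages directly; it only states, in SQL, which records it wants retrieved or changed.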
There are various types of database management systems, such as hierarchical databases, network databases and relational database models. The last of these offered advantages over its predecessors, which led to increased interest in how it worked. Relational database systems are unique in that data is organized in separate structures, commonly known as tables, which can be linked to one another to enhance data storage.
This model was first proposed by Dr. Codd, whose aim was to eliminate the shortcomings of previous database management systems, chiefly their complexity and the huge amounts of information they had to handle. Dr. Codd formulated the relational database model in 1970 at the IBM San Jose Research Laboratory. SQL, or Structured Query Language, is the most renowned standardized language for interacting with such a database.
History of SQL
SQL grew out of SEQUEL (Structured English QUEry Language), a data sublanguage for the relational model developed by Dr. Codd's colleagues at the IBM San Jose Research Laboratory in the early 1970s. The language was first described in a series of papers beginning in 1974.
IBM used the language in a prototype relational system known as System R, which was developed in the 1970s (Codd, 1970). Other prototypes developed at the time include INGRES, built at the University of California, Berkeley, and the Peterlee Relational Test Vehicle, built at the IBM UK Scientific Centre.
The first relational database management systems reached the market after System R was refined and evaluated, around 1981, yielding a polished, user-friendly product. Other DBMSs (database management systems) built on SQL included Oracle and IBM DB2, released in 1979 and 1983 respectively.
Other relational DBMSs that would later incorporate SQL included, but were not limited to, MySQL, Paradox, FoxPro and hundreds of others (Codd, 1970). Dr. Codd's 12 initial rules for the relational database model grew over time to a total of 333.
SQL was endorsed as the standard language for relational databases by both the International Organization for Standardization (ISO) and ANSI, the American National Standards Institute. Its use was formalized in 1986 under the name SQL-86, sometimes referred to as SQL1. Three years later, a minor revision known as SQL-89 was published. In 1992, however, a major revision was completed and endorsed by both ISO and ANSI.
This revision, SQL-92, considerably expanded the language while aiming to keep it simple to use. In 1999 another standard, SQL:1999, was published and endorsed by ANSI and ISO. This version added features such as user-defined data types and, most importantly, object-oriented data management capabilities. Most vendors of relational database management systems also implement their own extensions of SQL to enhance functionality.
Historical Background of Object Oriented Systems
The need for advanced relational database technology that was easier to use led researchers to consider incorporating object-oriented capabilities into DBMSs. In the 1980s, the disadvantages of relational database systems and the need to manage more complex objects led to the development of commercial object-oriented database systems. Database systems have evolved over time, allowing object-oriented capabilities to be incorporated step by step.
The first object-oriented language was Simula 67, released in 1967, followed by Smalltalk. After that, researchers found it more practical to create new languages by extending existing ones rather than starting from scratch. Languages formed by extending LISP included LOOPS and Flavors (Codd, 1970).
Extensions to C produced languages such as C++ and Objective-C. Similarly, semantic data models for database systems, such as the entity-relationship (ER) model, DAPLEX and SDM, were developed (Batory, 1998).
Database technology has evolved through five generations: first file systems, then hierarchical database systems, then CODASYL database systems, and fourth the relational database systems. The fifth generation is still under development. The second and third generations allowed remote users to access a central, integrated database.
However, these systems were difficult to navigate and offered no data independence, which gave rise to the next generation of database systems, the fourth. The first four generations were designed for business applications such as accounting, sales and purchase inventories, payroll and hundreds of other data-processing applications. Fifth-generation database technology is expected to meet needs that go beyond business applications.
Each successive generation of database systems has taken over functionality whose repetitive nature had previously caused user fatigue. This has enhanced database systems by enabling programmers to carry out their duties with ease.
This move was not without shortcomings, since the performance of these systems was initially compromised, and researchers worked hard to ensure that the performance of each new generation of database technology remained on par with its predecessor. The use of declarative queries in relational databases made it easier for programmers to retrieve information from the database. Performance was enhanced by introducing a new component, the query optimizer, which determines the fastest method of retrieving records from the database.
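The optimizer's choice of access path can be observed directly in SQLite through its `EXPLAIN QUERY PLAN` statement. In this sketch (the `orders` table is an invented example), the same declarative query is planned first as a full table scan and then, once an index exists, as an index search, without the query text ever changing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
cur.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                [(f"c{i}", i * 1.5) for i in range(1000)])

query = "SELECT total FROM orders WHERE customer = 'c42'"

# Without an index, the optimizer's only access path is a full table scan.
print(cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1])

# After the index exists, the optimizer picks the faster path on its own;
# the declarative query text stays exactly the same.
cur.execute("CREATE INDEX idx_customer ON orders (customer)")
plan = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]
print(plan)
```

The exact wording of the plan output varies between SQLite versions, but the shift from a scan to a search using `idx_customer` is the optimizer doing the navigation that earlier-generation systems left to the programmer.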
Concerted research efforts were focused on developing reliable relational database technology in the 1970s, and commercial relational database systems were soon introduced to the market. However, major shortcomings emerged when the technology was applied to other kinds of applications.
Researchers investigated these shortcomings in the 1980s. The affected applications included knowledge-based systems (such as expert system shells), CAD, CASE and multimedia systems (Batory, 1998). The main difficulty arises from the gap between programming languages and database languages: their data structures and data models differ to a wide degree.
Evolution and History of System R
System R was the prototype database system from which relational database technology was derived. The prototype proved that the relational data model's various advantages could be realized in everyday use. The most important capability of a computer, after all, is the ability to store and retrieve data.
Modern DBMSs offer the user much-needed data independence through an advanced user interface, allowing the user to deal with the information content itself rather than with its physical representation (lists, pointers, bits and so on).
As stated earlier, the pioneer of the relational data model was Dr. Codd in the early 1970s. According to Codd (1970), conventional database systems store information in two ways:
- Through record contents in the database.
- Through the way in which these records are connected to each other.
Different systems thus use constructs such as parent records and links to connect the various records.
Codd observed two important properties of relational database technology: first, all information is represented by data values, and second, the system supports a very high-level language through which users can request data without specifying retrieval algorithms. System R was intended to accomplish seven goals.
System R was developed in three phases. Phase 'zero', from 1974 to 1975, involved the development of a user interface. Phase 'one', from 1976 to 1977, produced a fully functioning multiuser version of the prototype. The final phase, 'two', from 1978 to 1979, involved the evaluation of System R. Further experiments were then carried out on the system, but it was not released to the market until much later.
Of particular concern to this historical review are the optimizer, built in phase 'zero', and phase 'two', which introduced the concept of normalization. As previously discussed, the optimizer facilitates navigation in a database system by minimizing the number of page fetches through the clustering property: a clustering index places all records with the same key on the same page.
Phase 'two' took two years to complete and consisted of two main parts:
- The San Jose experiments conducted on System R.
- Actual application of the system at various IBM sites and selected client outlets.
System R was not to be used for any commercial purpose at this stage; the stage existed to test the system's usability on an experimental basis only. The first usability experiment was carried out in June 1977.
All users involved in the experiment gave positive feedback. The qualities under investigation included the ability to reconfigure the database quickly, the high level of the user language, and ease of installation, among others. Several users reported that they could load a database with ease, in addition to installing and designing it.
Further reports suggested that users could tune the performance of the database system after loading data by creating and dropping indices, without interfering with application programs or end users. Tables could be updated and database tables adjusted even in read-only mode.
Users rated the System R experiment as satisfactory in terms of resource consumption and performance, which was reliable for a project at the experimental stage. Multiple users could access the relatively small experimental System R database, though their number was often restricted to ten. Naturally, interactive response time grew longer whenever a complicated SQL statement was being executed (Codd, 1970).
To address this performance problem, a concept called normalization was brought to bear. Since performance slowed every time a complicated SQL statement involving several tables was executed, the alternative was to break large database tables into smaller parts, eliminating redundancy, and to join them back together later through user applications or the view mechanism. This process is known as normalization.
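The break-apart-and-rejoin described above can be sketched in SQLite, with the view mechanism reassembling the decomposed tables for applications (the `customers`/`orders` schema is an invented example, not System R's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Decomposition: customer details live exactly once in their own table;
# orders carry only a reference to them, not a redundant copy.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
cur.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY,
                                    customer_id INTEGER REFERENCES customers(id),
                                    total REAL)""")
cur.execute("INSERT INTO customers (name, phone) VALUES ('Ada', '555-0100')")
cur.execute("INSERT INTO orders (customer_id, total) VALUES (1, 9.99)")

# The view mechanism joins the pieces back together for applications.
cur.execute("""CREATE VIEW order_report AS
               SELECT o.id AS order_id, c.name, c.phone, o.total
               FROM orders o JOIN customers c ON o.customer_id = c.id""")
row = cur.execute("SELECT name, total FROM order_report").fetchone()
print(row)  # ('Ada', 9.99)
```

Applications query `order_report` as if it were one wide table, while the stored data stays decomposed and redundancy-free.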
Normalization
Normalization is the process of eliminating redundant information from tables. It improves the efficiency of a database and makes the data resistant to corruption. For instance, suppose a user has two tables, Black and White, and uses both of them to store people's contact details: cell phone numbers, postal addresses, email addresses and so on. If the user, or someone else, changes either table, there is a chance that changes made in table Black will not be reflected in table White, and vice versa.
This means that if the user changed someone's cell phone number in table White, the change might not appear in table Black. Propagating the change manually would involve tremendous work on the part of the user, which defeats the purpose, given that database systems are meant to improve efficiency and save the business as much time and money as possible.
This problem can be solved by keeping only the person's ID in table Black. The user is then free to change the cell phone number, or any other contact information, in table White alone, and such changes are reflected in table Black automatically.
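Following the Black/White example above, a short SQLite sketch shows the update propagating automatically once table Black holds only the person's ID (column names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# White holds each person's contact details exactly once.
cur.execute("CREATE TABLE white (person_id INTEGER PRIMARY KEY, name TEXT, cell TEXT)")
# Black stores only the person's id, not a second copy of the details.
cur.execute("""CREATE TABLE black (entry_id INTEGER PRIMARY KEY,
                                   person_id INTEGER REFERENCES white(person_id))""")
cur.execute("INSERT INTO white VALUES (1, 'Ada', '555-0100')")
cur.execute("INSERT INTO black (person_id) VALUES (1)")

# Change the number once, in White...
cur.execute("UPDATE white SET cell = '555-0199' WHERE person_id = 1")

# ...and every lookup through Black sees the new value automatically.
row = cur.execute("""SELECT w.cell FROM black b
                     JOIN white w ON b.person_id = w.person_id""").fetchone()
print(row[0])  # 555-0199
```

Because the number is stored in exactly one place, there is no second copy in Black that could fall out of date.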
References
Batory, D., et al. (1998). GENESIS: An extensible database management system. IEEE Transactions on Software Engineering, 11(13), 12-14.
Chamberlin, D. (1990). Relational database management system. Computing Surveys, 19(20), 5-9.
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377-387.