What Is a Spatial Database? Term Paper


Table of Contents

Abstract
Overview
Common Themes
Discordant Themes
Non-Overlapping Themes in Papers
Final Remarks
References

Abstract

The successful deployment of a geographic information system (GIS) project demands extra care in designing spatial databases through conceptual data modeling. Such projects require an in-depth understanding of the fundamental spatial data model. Traditional database technology treats entity-relationship diagrams as a restricted way of depicting a system’s data model because these diagrams tend to become cluttered by the various types of spatial relationships.

Drawing on research into spatial databases and networks, this paper highlights the details of spatial databases and the systems built on them. It evaluates the concept of spatial databases and the role such databases play in developing a proper geographic information system. The major part of the study reviews three primary papers, namely “Preference Queries in Large Multi-Cost Transportation Networks”, “Locating Mapped Resources in Web 2.0”, and “Approximate String Search in Spatial Databases”, each written by an independent team of researchers.

Emphasis is placed mainly on how spatial databases support the systems and applications built for geographic information systems.

Overview

Spatial database management applications and systems are intended to manage geographic data effectively and efficiently. This geographic data consists of images, points, lines, geometric shapes, the topological connections between them, and attribute information. Recent years have witnessed the application of spatial databases to a vast array of domains such as agriculture, forestry, urban infrastructure, mining, resource and energy allocation, transportation, land management, and so on. The data represented by spatial databases is becoming more diverse with advancements in fields such as remote sensing and GPS (Global Positioning System). Much attention is paid to areas where conceptual data models are increasingly considered to be an integral part of a system.

In this context, a conceptual data model is used as a form of data abstraction. This scheme allows the details of data storage to be hidden from external users. It is based on logical concepts that are typically easier for every user to understand. In general, graphical tools, visual tools, or both can be used for conceptual data modeling. Such visual data-modeling techniques provide a better understanding and a more intuitive description of the exact contents of the database.

However, from the standpoint of the system under consideration, visual approaches are known to be an aid in improving the processes of programming as well as maintaining the system. With such models, users are enabled to accurately model the system data and a number of respective relationships intuitively. Again, modeling has been defined as a primary element to achieve success in the deployment of the project, and this is particularly true for the management of projects based on spatial databases, such as geographical information systems, as mentioned previously [Nyerges 1997; Dave 2010].

Models of spatial data are normally categorized into two main groups: field models and object models. For example, a field model is usually the most suitable way to represent the slope of the earth's surface, whereas an object model is more effective when land parcels have to be tracked for purposes such as taxation or ownership.

Nonetheless, some of the most crucial difficulties in spatial databases relate to the geometry of spatial objects, the geometric relationships among those objects, their representation at different resolutions, their geometric evolution over time, and their spatial integrity constraints. The next section begins with a careful review of the three papers mentioned above and outlines the common themes or, to be more exact, the common problems and solutions discussed in these papers [Zhang et al. 2010].

The section on discordant themes is dedicated to those parts of the review that concern issues on which the three papers disagree. In other words, it aims at defining the issues and the context in which these disagreements can be identified [Raja 2006; Shahabi 1994]. The third section is concerned with issues that are dealt with in the aforesaid papers but represent neither common ground nor disagreement. This is why that part of the paper is called Non-Overlapping Themes: it covers the aspects of spatial databases on which the papers neither agree nor disagree.

The final section presents the concluding remarks of the research study. Briefly, it summarizes the concept of spatial databases through remarks and discussion based on the overall understanding gained from the study. In addition, it highlights the writer’s personal assessment of the strengths and weaknesses of the three papers in question, such as their technical content, style of writing, and so forth [Yingcheng & Ling 2004; Mouratidis et al. 2010].

Common Themes

Essential factors of preference-based queries

This section evaluates some common themes described in the articles under consideration. Following the background and the underlying concept of spatial databases, a review has been conducted in order to acquire a better understanding of spatial databases. Here, the three primary papers, namely “Preference Queries in Large Multi-Cost Transportation Networks”, written by Mouratidis et al., “Locating Mapped Resources in Web 2.0”, written by Zhang et al., and “Approximate String Search in Spatial Databases”, authored by Yao et al., have been used to study both the common and the differing aspects of spatial databases.

In spite of the fact that authors of the articles demonstrate rather different approaches for explaining information, there are a number of common issues, problems, and solutions observed in these readings.

The paper written by Mouratidis et al., “Preference Queries in Large Multi-Cost Transportation Networks”, is based on the study of spatial network databases. It notes that existing work assumes only a single cost value linked to every road segment of the spatial network. However, in some real-world situations multiple types of costs are associated with the transportation decision-making process. The paper therefore considers multi-cost transportation networks, in which multiple cost values are assigned to every road segment.

The paper formulates skyline and top-k queries in such transportation networks, develops algorithms for processing them effectively, and evaluates these algorithms experimentally. The solutions have two essential properties with respect to preference-based queries: the skyline methods are progressive in nature, while the top-k methods are incremental [Shahabi 1994; Dave 2010].
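To make the notion of a skyline concrete, the following is a minimal sketch, not the algorithm of Mouratidis et al. It assumes that the cost vector of every facility (for example, travel time and toll fee from the query location) has already been computed over the network; the names `dominates`, `skyline`, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Costs = Tuple[float, ...]


def dominates(a: Costs, b: Costs) -> bool:
    """a dominates b if it is no worse on every cost and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(facilities: Dict[str, Costs]) -> List[str]:
    """Return the facilities that are not dominated by any other facility."""
    result = []
    for name, costs in facilities.items():
        dominated = any(
            dominates(other_costs, costs)
            for other, other_costs in facilities.items()
            if other != name
        )
        if not dominated:
            result.append(name)
    return sorted(result)


# Assumed, precomputed costs per facility: (travel time in minutes, toll fee).
facilities = {
    "A": (10.0, 4.0),
    "B": (12.0, 1.0),
    "C": (15.0, 5.0),
    "D": (11.0, 4.0),
}
print(skyline(facilities))
```

In this sample, C and D are dominated by A, while A and B each win on one cost type, so neither can be discarded without knowing the user's preferences.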

Locating geographical resources

Similarly, the paper named “Locating Mapped Resources in Web 2.0”, written by Zhang et al., deals with the basic applications of locating geographical resources and proposes an effective tag-centric query processing approach, much like the paper presented by Mouratidis et al. Here, the research aims at determining a group of closest co-located objects that, in combination, match the query tags.

The paper proposes an efficient search algorithm that takes into account the fact that a great number of data objects and tags may exist in the data set. This search algorithm is designed to scale with respect to both objects and tags. Furthermore, a ranking mechanism is proposed to ensure that the results of the algorithm are relevant. This ranking mechanism, known as geo-tf-idf, operates in a geographic context and is considered to be sensitive to its parameters.

The scalability of the proposed solution is determined through extensive experiments on synthetic data, whereas experiments on real-life data demonstrate its practicality. The paper uses tags to build a consistent data model for resources that are mapped within a geographic context. It also explains that in Web 2.0, tagging is one of the most efficient means of annotating several kinds of resources.

Examples of such resources include blogs, forums, news, images, and videos. This annotation enables users to add textual data as a semantic form of description. Users can also treat this data as a kind of summary of an object. Tagging relies on human intelligence, so tags are generally well formulated, which reduces the cost of managing term ambiguities [Yingcheng & Ling 2004; Mouratidis et al. 2010].

Spatial approximate string queries

Much like the first paper, the third paper, “Approximate String Search in Spatial Databases” by Yao et al., delivers a comprehensive analysis of spatial approximate string queries. The fundamental problem domain of this paper is keyword search within large quantities of data in various domains. It states that keyword search has become a fundamental element of a large number of real-world applications that use spatial databases. The paper discusses the IR² Tree, an index designed for this purpose. The main drawback of the IR² Tree is that it supports efficient keyword search only for exact matches.

This is a drawback because practical scenarios demand keyword search that returns approximate string matches. Similar to the above-mentioned papers, this paper also focuses on the problem of answering queries. It shows that approximate string search is useful under fuzzy search conditions, when a user misspells the query, or when the stored strings themselves contain errors. In essence, as far as spatial databases are concerned, an approximate string search is executed in combination with a spatial query, such as a range query or a nearest neighbor query [Acharya et al. 1999].

Given this, a simple solution for any spatial approximate string (SAS) query can be developed: adopt any existing method to resolve the spatial component of the SAS query, and verify the approximate string match predicate either in a post-processing step or on the intermediate results of the spatial search [Nyerges 1997].
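The simple filter-then-verify strategy described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation from the paper: a linear scan stands in for the spatial index, edit distance is used as the string similarity measure, and the names `edit_distance`, `sas_range_query`, and the sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, str]  # (x, y, associated string)


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (cs != ct)))  # substitution or match
        prev = curr
    return prev[-1]


def sas_range_query(points: List[Point],
                    rect: Tuple[float, float, float, float],
                    query: str, tau: int) -> List[Point]:
    """Step 1: spatial filter (a scan stands in for an index lookup).
    Step 2: verify the approximate string predicate on the candidates."""
    xmin, ymin, xmax, ymax = rect
    candidates = [p for p in points
                  if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    return [p for p in candidates if edit_distance(p[2], query) <= tau]


points = [
    (1.0, 1.0, "theatre"),
    (2.0, 2.0, "theater"),
    (9.0, 9.0, "theater"),
    (1.5, 2.5, "museum"),
]
hits = sas_range_query(points, (0.0, 0.0, 3.0, 3.0), "theater", tau=2)
print([name for _, _, name in hits])
```

The weakness of this strategy is visible in the sketch: every point inside the rectangle is a candidate, however dissimilar its string, so the string predicate contributes nothing to pruning during the spatial search.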

Discordant Themes

Comparative significance of the costs

Still, the three articles under analysis contain several points that contradict each other and on which their authors take different positions. The methods offered in the articles consider either the path length, the overall travel time, or the total toll fees. On the one hand, it is hard to achieve the same results when evaluating different aspects of a study, which is why it is hard to believe that the themes of the articles have much in common. On the other hand, such multiple cost types are needed and can collectively affect the decisions of end users, such as their preferences over facilities.

Moreover, the comparative significance of the costs depends entirely on the end user: one user could be interested in reducing the overall travel time, whereas another could readily accept a longer travel time in order to reduce the overall cost of the journey, such as fuel consumption and toll fees.

Multi-cost transportation network

Therefore, in these multi-cost transportation networks (MCNs), the choice of facility matters a great deal, and each team of authors introduces its own approach. A facility could serve several end users or multiple purposes, and each purpose is characterized by different reachability requirements. Balancing these requirements calls for computing a skyline across the facilities or determining the top-k among them. The latter applies when the different users or purposes served are prioritized, for example by assigning weights to them. Preference queries in MCNs arise in a wide array of applications, such as logistics, spatial planning, and choosing a location within a specific geographic area [Shahabi 1994; Mouratidis et al. 2010].
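A weighted top-k query of the kind described above can be illustrated with a short sketch. It assumes a weighted sum as the aggregate function and precomputed per-facility costs; both are assumptions made for illustration, and the function name `top_k` and the data are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_k(facilities: Dict[str, Tuple[float, ...]],
          weights: Sequence[float], k: int) -> List[str]:
    """Return the k facilities with the smallest weighted sum of costs."""
    scored = [
        (sum(w * c for w, c in zip(weights, costs)), name)
        for name, costs in facilities.items()
    ]
    return [name for _, name in heapq.nsmallest(k, scored)]


# Assumed, precomputed costs per facility: (travel time in minutes, toll fee).
facilities = {
    "A": (10.0, 4.0),
    "B": (12.0, 1.0),
    "C": (15.0, 5.0),
    "D": (11.0, 4.0),
}

# A user who weighs both costs equally.
print(top_k(facilities, (0.5, 0.5), k=2))
# A user who cares almost only about travel time.
print(top_k(facilities, (0.9, 0.1), k=2))
```

Changing the weights changes the ranking, which is the point made in the text: the relative significance of the costs is decided by the end user, not by the network.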

Non-Overlapping Themes in Papers

Applications and queries to be analyzed

In fact, it is wrong to say that the three articles under consideration may have only some common or some discordant themes. Still, there are several themes which should be regarded as non-overlapping. On the one hand, the themes disclosed have similar characteristics and impacts on the chosen sphere; and, on the other hand, various methods demonstrate how the topic may be covered in different ways.

For example, the first paper does not deal with web applications and does not solve queries for applications relating to Web 2.0. In its turn, the second paper addresses the emerging issue of locating mapped resources across Web 2.0. The authors of this paper have proposed the extensive usage of tags in order to facilitate the generation of a general data model as well as a query model for supporting co-location searches with the help of tag matching. Furthermore, the data resources obtained from various types of applications can be combined and incorporated into the R* tree and inverted index.

The paper basically deals with the issue of detecting or locating geographic locations. Efficient location search techniques are used because they have substantial commercial potential and can assist search engines in classifying and indexing web resources, with the main aim of enhancing the accuracy of the returned results. This paper does not make use of the IR² tree for solving queries; instead, ambiguities encountered while detecting locations, as well as those involved in location names, are resolved with the help of NLP or IR schemes that assign the correct scope, so that similar names with different meanings are not confused.

A good example is that “Washington” in the personal name “Marrie Washington” will not be detected as the name of a location. However, despite their fairly high accuracy, conventional methods still encounter new challenges within the Web 2.0 environment [Yao et al. 2010]. These challenges arise from the fact that a variety of resources are present within the spatial database. Current state-of-the-art search engines focus mainly on gazetteer terms that are taken directly from web documents.

Importance of tags in spatial database

One of the authors under analysis states that detecting co-located tags within spatial databases has remained a serious research problem to date. Traditional schemes of keyword search within spatial databases can be applied by matching a mapped resource against every query tag. However, the number of tags linked to every object is usually small, which makes it difficult to find a complete and accurate match.

Moreover, these techniques ignore the fact that resources that are spatially close might belong to the same object and be linked to each other. For instance, news regarding the “New York city plane river crash” might be tagged by users around the crash location in the Hudson River, while pictures of the Statue of Liberty are uploaded around Liberty Island. Hence, rather than seeking a one-to-one match between the query and a single object, the paper suggests matching multiple spatially correlated objects, provided that the combination of their tags matches every query tag [Raja 2006].
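The idea of matching a group of nearby objects instead of a single object can be sketched in a few lines. This is a naive illustration, not the search algorithm of Zhang et al.: it assumes Euclidean distance, a fixed radius around each anchor object, and an exhaustive scan; the function name `colocated_match` and the sample objects are hypothetical.

```python
from math import dist
from typing import List, Optional, Set, Tuple

Resource = Tuple[float, float, str, Set[str]]  # (x, y, name, tags)


def colocated_match(objects: List[Resource], query_tags: Set[str],
                    radius: float) -> Optional[List[str]]:
    """Find a group of objects within `radius` of some anchor object whose
    combined tags cover every query tag. Returns the smallest such group
    found (by member count), or None if no group covers the query."""
    query = set(query_tags)
    best = None
    for ax, ay, _, _ in objects:
        group = [o for o in objects if dist((ax, ay), (o[0], o[1])) <= radius]
        covered = set().union(*(o[3] for o in group))
        if query <= covered:
            # Keep only the members that contribute at least one query tag.
            members = sorted(o[2] for o in group if query & o[3])
            if best is None or len(members) < len(best):
                best = members
    return best


objects = [
    (0.0, 0.0, "news", {"plane", "crash", "hudson"}),
    (0.3, 0.4, "photo", {"new york", "liberty"}),
    (40.0, 40.0, "video", {"new york"}),
]
print(colocated_match(objects, {"new york", "plane", "crash"}, radius=1.0))
```

No single object in the sample carries all three query tags, yet the news item and the photo, which lie close together, cover them jointly. With a very small radius the group falls apart and the query has no answer.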

The paper written by Zhang et al. relies solely on tags for building its data model. It does not combine various solution approaches for solving queries; instead, it describes a system framework based on the above-stated model to support co-location searches over a variety of resources in Web 2.0 applications. In addition, the paper does not show experiments using traditional approaches to solving query problems; rather, it adapts the widely employed tf-idf technique to the geographical context, resulting in the geo-tf-idf ranking technique.

This technique allows one to measure the accuracy as well as the relevancy of the geo-tags in terms of the geographical area where they are situated. Synthetic as well as real-world data sets have been employed for conducting the experiments extensively. The results so obtained are used to determine the efficiency, effectiveness and practicability of the proposed solution with respect to Web 2.0 based applications [Zhang et al. 2010; Yao et al. 2010].
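For reference, the conventional tf-idf weighting on which geo-tf-idf builds can be sketched as below. This is plain tf-idf, not the geo-tf-idf formula of the paper, whose details are not reproduced here. As an assumption for illustration, the tags pooled at each location are treated as one document; the function name `tf_idf` and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Compute tf-idf weights, where each location's pooled tag list is one document.

    tf  = occurrences of the tag at the location / number of tags at the location
    idf = log(number of locations / number of locations containing the tag)
    """
    n = len(docs)
    df = Counter(tag for tags in docs.values() for tag in set(tags))
    weights = {}
    for location, tags in docs.items():
        tf = Counter(tags)
        weights[location] = {
            tag: (count / len(tags)) * math.log(n / df[tag])
            for tag, count in tf.items()
        }
    return weights


docs = {
    "hudson": ["crash", "plane", "crash", "river"],
    "liberty": ["statue", "river"],
}
weights = tf_idf(docs)
print(weights["hudson"])
```

A tag that appears at every location, such as "river" in this sample, receives a weight of zero because it does not help to tell locations apart, while a tag repeated at one location only receives the highest weight there.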

Data resources obtained from various Web 2.0 applications are combined rather than treated as individual data sources. Every resource obtained from such sources is represented in a uniform model to enable indexing and searching. IR systems use the document as their unit of information, whereas in the uniform data model used in this paper the concept of a document is vague.

Every location point is linked to a set of tags. Distinct resources that are located at the same point may form one large virtual document from their combined tags. In case there are no resources in the surrounding area, the conventional tf-idf method can be applied to measure the weight between the keywords and the locations [Yao et al. 2010]. Lastly, the third paper, “Approximate String Search in Spatial Databases”, introduces a new index to solve SAS queries, unlike the second paper. The paper first presents the problem of incorporating approximate edit distance computation directly into the R tree.

By embedding min-wise signatures into the nodes of the R tree, the approach converts the pruning problem into one of evaluating set resemblance. The paper also presents a novel algorithm that builds a robust selectivity estimator for SAS range queries [Dave 2010]. The primary idea is an adaptive algorithm that determines balanced partitions of the nodes in an R tree based index, such as the MHR-tree, using both the string and the spatial information stored in the tree nodes.

Furthermore, the detected partitions are then used as buckets for the selectivity estimator. Issues relating to associating multiple strings with query and data points, together with other types of spatial queries, are also covered in this paper [Zhang et al. 2010]. The paper also demonstrates the effectiveness and practicality of the MHR-tree for answering SAS queries. Much like the previous two papers, this paper reports evaluations on synthetic as well as real data sets of as many as 10 million points and 6-dimensional domains.

In general, a query could contain a number of strings, and a data point could be associated with a number of strings. The technique can be extended to handle these cases in a straightforward way. For a data point with multiple strings, the min-wise signature of each string is built, and the signatures are then combined by taking their union.

Every signature of a leaf node is determined in this way on the tree. In turn, for a query with multiple strings, the pruning method is applied for every query string at each index node. As soon as one query string fails the pruning test, the corresponding node can be pruned.
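The role of min-wise signatures can be illustrated with a small sketch. It shows only the generic technique (q-grams, min-wise hashing, combining signatures for a union of strings, and estimating set resemblance), not the MHR-tree itself or its pruning test. The padding characters, the use of SHA-256 as the hash family, the signature length `k`, and all function names are assumptions made for illustration.

```python
import hashlib
from typing import List, Sequence, Set


def qgrams(s: str, q: int = 2) -> Set[str]:
    """Set of q-grams of s, padded with '#' in front and '$' at the end."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _hash(seed: int, gram: str) -> int:
    """One member of a hash family, selected by `seed` (assumed construction)."""
    digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def signature(grams: Set[str], k: int = 16) -> List[int]:
    """Min-wise signature: the minimum hash value under each of k hash functions."""
    return [min(_hash(seed, g) for g in grams) for seed in range(k)]


def union_signature(signatures: Sequence[List[int]]) -> List[int]:
    """Signature of the union of the underlying sets: element-wise minimum."""
    return [min(values) for values in zip(*signatures)]


def resemblance(a: List[int], b: List[int]) -> float:
    """Estimate of the Jaccard resemblance: fraction of matching positions."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


sig_theatre = signature(qgrams("theatre"))
sig_museum = signature(qgrams("museum"))
node_sig = union_signature([sig_theatre, sig_museum])
print(resemblance(signature(qgrams("theater")), sig_theatre))
```

The element-wise minimum is what makes the scheme suitable for a tree: the signature of a node can be derived from the signatures below it without revisiting the strings themselves.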

The paper also discusses another crucial problem that defines the string query component for any SAS query by making use of a more general approach, such as a more generalized conjunction-disjunction semantic [Raja 2006; Shahabi 1994].

Final Remarks

This research study has presented the concept of spatial databases. It has studied the use of databases in geographic information system applications, that is, databases optimized for storing and querying data about objects in space, such as lines and points. It has compared spatial databases with typical databases, which understand several forms of numeric and character data. Typical databases require additional functionality to be able to process spatial data types.

The paper has reviewed three distinct studies, namely “Preference Queries in Large Multi-Cost Transportation Networks”, “Locating Mapped Resources in Web 2.0”, and “Approximate String Search in Spatial Databases”. The works presented in these papers have been critically reviewed and compared on the basis of their depth of study with regard to spatial databases.

The paper “Locating Mapped Resources in Web 2.0” has addressed the emerging issue of locating mapped resources in Web 2.0 applications. Its authors have proposed using tags to build a simple data and query model for co-location searches, which are resolved by tag matching. Furthermore, data resources from various applications are combined and indexed using an R* tree. The paper has also developed strategies for efficient searching in order to answer tag matching queries, and it makes extensive use of a ranking approach known as geo-tf-idf.

The second paper, named “Approximate String Search in Spatial Databases”, presents a comprehensive study of approximate string queries in spatial databases. Using edit distance as the similarity measure, it designs the MHR tree, which embeds in each index node of the R tree the min-wise signatures of the q-grams of the strings in the sub-tree of that node. The MHR tree applies effectively to both range queries and nearest neighbor queries. The issue of query selectivity estimation for SAS range queries has also been addressed. Future work for this study may include studying and analyzing spatial approximate sub-string queries. In addition, a future project may design techniques that can be updated easily.

Lastly, the final paper reviewed in this research study is “Preference Queries in Large Multi-Cost Transportation Networks”. The paper outlines skyline and top-k query approaches for multi-cost road networks. It states that such queries arise naturally in a vast array of decision-making processes in applications where multiple forms of transportation costs exist at the same time. Moreover, the paper formalizes such queries and develops algorithms for processing them. Rigorous experiments, carried out over real road networks, demonstrate the efficiency of these techniques.

In essence, future work on this subject may include extending these techniques to incrementally update the top-k set or skyline set when facility or query locations change. Yet another challenging area is preference queries in multi-cost road networks in which the costs of every edge are functions of time. Such queries would retrieve the preferred facilities for every instant of time in a specified period.

The current research study has also reviewed the importance of selectivity estimation as one of the most essential elements of query processing within spatial databases. Irrespective of the ever-growing demand for and widespread use of spatial databases, there has still been only a small amount of work done to offer accurate and effective methods for spatial selectivity estimation. The main difference between spatial databases and relational databases is that the former perform much better in the domain of geographic information system applications.

The work on selectivity estimation reviewed here [Acharya et al. 1999] has proposed a number of new approaches and methods for spatial selectivity estimation. These methods are based on spatial indices, binary space partitioning, and the notion of spatial skew. Considering the results of extensive experiments comparing the new techniques with several versions of traditional techniques, that study has been able to show that:

Techniques based on sampling and parametric models, which work well in the one-dimensional relational domain, work poorly with spatial data.
A technique based on binary space partitioning, known as min-skew, performs much better than other conventional techniques across a wide array of query workloads as well as data sets. In this context, a min-skew partitioning can be created efficiently and provides additional benefits such as low memory requirements during its construction.

Hence, to summarize, the results obtained from these experiments reveal that spatial selectivity estimation can be performed accurately and efficiently even for very large spatial databases. This paper also outlines reverse furthest neighbour (RFN) queries, which have numerous real-world applications. The work reviewed solves RFN queries in both their monochromatic and bichromatic versions.
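To clarify what a monochromatic RFN query returns, a brute-force sketch is given below. It assumes the usual definition, under which a data point belongs to the answer if the query point is further from it than every other data point; it uses Euclidean distance and strict comparison, and it is not one of the R tree based algorithms mentioned in the text. The function name `mrfn` and the sample points are hypothetical.

```python
from math import dist
from typing import List, Tuple

Point2D = Tuple[float, float]


def mrfn(points: List[Point2D], q: Point2D) -> List[Point2D]:
    """Monochromatic reverse furthest neighbours of q by exhaustive comparison:
    every p whose furthest neighbour among the other data points and q is q."""
    result = []
    for p in points:
        others = [o for o in points if o != p]
        if all(dist(p, q) > dist(p, o) for o in others):
            result.append(p)
    return result


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
print(mrfn(points, (-5.0, 0.0)))
print(mrfn(points, (5.0, 0.0)))
```

The exhaustive version compares every pair of points, which is exactly the quadratic cost that index-based pruning is meant to avoid.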

Efficient R tree based algorithms with strong pruning abilities have been presented for the monochromatic (MRFN) and bichromatic (BRFN) problems. Every algorithm presented supports dynamic updates to the relevant data sets. Moreover, these algorithms have also been adapted to cope with disk-resident query groups in the R tree case. Future work may include generalizing the current algorithms to higher dimensions, to moving points, and to continuous queries. It also includes solving RFN problems in road networks or in ad hoc subspaces.

Therefore, to conclude, the current paper presents a well-structured study of spatial databases, starting with the background, continuing with the algorithms and issues related to answering the associated queries, and ending with the future work that may be performed through variations on the aforementioned techniques.

References

Acharya, S, Poosala, V, & Ramaswamy, S 1999, Selectivity Estimation in Spatial Databases, Information Sciences Research Center. Web.
Dave, P 2010, SQL SERVER – What is Spatial Database? – Developing with SQL Server Spatial and Deep Dive into Spatial Indexing, SQLAuthority. Web.
Mouratidis, K, Lin, Y, & Yiu, M 2010, Preference Queries in Large Multi-Cost Transportation Networks, Singapore Management University and Hong Kong Polytechnic University.
Mouratidis, K, Papadias, D, & Papadimitriou, S 2005, Medoid Queries in Large Spatial Databases, Springer-Verlag Berlin Heidelberg, New York.
Nyerges, T 1997, UNIT 10 – SPATIAL DATABASES AS MODELS OF REALITY, University of Washington. Web.
Raja, B 2006, Spatial Database in Object Oriented Approach, The Geospatial Resource Portal. Web.
Shahabi, C 1994, Introduction to Spatial Database Systems, VLDB Journal, pp. 1-9. Web.
Yao, B, Li, F, Hadjieleftheriou, M, & Hou, K 2010, Approximate String Search in Spatial Databases, Florida State University.
Yingcheng, L, & Ling, L 2010, RESEARCH ON SPATIAL DATABASE DESIGN AND TUNING BASED ON ORACLE AND ARCSDE, Chinese Academy of Surveying and Mapping. Web.
Zhang, D, Ooi, B, & Tung, A 2010, Locating Mapped Resources in Web 2.0, National University of Singapore.