Introduction
Databases (DBs) are used to store large amounts of data of a variety of types. They differ in the way they store and organize their data, and how one accesses this data. The most common type is the relational DB. In such a DB, individual entries are kept as rows in a table or, often, multiple tables. Each item, or record, has a unique ID, called the key. This key can be used to point to an entry in a different table, establishing a relationship between items. This DB model is versatile but can be difficult to design and maintain.
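As a minimal sketch of how a key in one table can point to a record in another, the following uses Python's built-in sqlite3 module with an in-memory database. The table and column names (authors, books, author_id) are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES authors(id))"
)

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO books (id, title, author_id) VALUES (10, 'Notes', 1)")

# Follow the key from the books table to the authors table with a join.
rows = conn.execute(
    "SELECT books.title, authors.name"
    " FROM books JOIN authors ON books.author_id = authors.id"
).fetchall()
print(rows)
```

Here the author_id column of a book holds the key of its author, which is the relationship between items described above.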
Hierarchical databases tend to be simpler, organizing their data in a tree-like structure. Any item is assigned another item as its parent, up to an initial root node. This approach makes a database fast to access, but since each item can have only one parent, moving an item can require significantly rearranging the entire structure.
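The single-parent rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular hierarchical DB stores its data; the node names are hypothetical.

```python
# Each item stores exactly one parent; the root's parent is None.
# Node names are hypothetical.
parent = {
    "root": None,
    "docs": "root",
    "images": "root",
    "report.txt": "docs",
}


def path_to_root(node: str) -> list:
    """Follow parent links from a node up to the root."""
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


before = path_to_root("report.txt")

# Moving an item means replacing its single parent link, which changes
# the path of that item and of everything stored beneath it.
parent["docs"] = "images"
after = path_to_root("report.txt")

print(before)
print(after)
```

Reaching an item is a short walk along parent links, but one move of "docs" changed the location of "report.txt" as well.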
The term nonrelational, or NoSQL, DB refers to DBs structured in a variety of ways that do not rely on the relational model. These include key-value and graph models, which organize data in ways that can suit it better than relational tables do. Their primary advantage over relational DBs is horizontal scalability: they can expand by adding servers.
Object-based, or object-oriented, DBs represent data items as individual objects, a concept shared with object-oriented programming. As such, these DBs work best when coupled with an object-oriented application (MongoDB, 2019). The object-oriented approach allows one to customize a DB's node to work optimally for a particular application; however, the same approach limits further customization. Moreover, object-oriented databases are heavily reliant on specific object-oriented programming languages and can be difficult to migrate.
Distributed databases are DBs stored across multiple physical locations. These databases can duplicate the same data in multiple locations, so that it can be accessed quickly from anywhere and each copy can serve as a backup if another location malfunctions. Alternatively, they can store data in a fragmented form, even in multiple different schemas; fragments are accessible where they are required.
The main benefits of a distributed database are its reliability, when data is duplicated, and its availability, since users can have local copies of the database. However, the need for queries to be processed and routed to the appropriate locations means there is significant overhead. Furthermore, data integrity can be compromised in a duplicated setup if something interferes with updating the data at multiple sites.
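The integrity risk in a duplicated setup can be illustrated with a toy model. The Replica class and replicate_write function below are hypothetical and stand in for real sites and a real replication protocol, which would also need retries and reconciliation.

```python
class Replica:
    """A toy stand-in for one site holding a full copy of the data."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.online = True

    def write(self, key: str, value: str) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def replicate_write(replicas, key: str, value: str) -> list:
    """Write to every replica; return the names of those that missed the update."""
    failed = []
    for replica in replicas:
        try:
            replica.write(key, value)
        except ConnectionError:
            failed.append(replica.name)
    return failed


sites = [Replica("east"), Replica("west")]
replicate_write(sites, "balance", "100")

sites[1].online = False  # something interferes with one site
missed = replicate_write(sites, "balance", "80")

# The copies now disagree until the failed site is brought back in sync.
print(missed, sites[0].data, sites[1].data)
```

After the second write, the two sites hold different values for the same key, which is the kind of inconsistency the paragraph above warns about.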
Access and Encryption
While some attacks against a DB are performed without legitimate access, one can also abuse legitimate privileges to corrupt or steal data. Because of that, clear authorization policies are necessary to ensure only legitimate users have access. Furthermore, to prevent privilege abuse, users should only have those privileges that are required for their work.
Even without legitimate access rights, an attacker can still gain information from a database. One example is gaining access to backups, which are often stored less securely; although this data may not be up-to-date, it can still be useful. Encrypting or hashing the data ensures that if this happens, the attacker is not left with readily readable data. Recovering the original data without the key can take an unreasonable amount of resources, making the attack less worthwhile.
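As a hedged illustration of the hashing option, the sketch below stores a salted PBKDF2 digest instead of the original value, using only Python's standard library. Hashing is one-way, so it suits values that only need to be verified, such as passwords; fields that must be read back would need encryption instead, which is not shown here. The function names and the iteration count are assumptions made for this example.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 200_000  # assumed work factor; tune for the actual hardware


def hash_secret(secret: str, salt: Optional[bytes] = None) -> tuple:
    """Return (salt, digest) for a secret using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per stored value
    digest = hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_secret(secret: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_secret(secret, salt)
    return hmac.compare_digest(digest, expected)


# Only the salt and digest would be written to the DB or its backups.
salt, stored = hash_secret("correct horse")
print(verify_secret("correct horse", salt, stored))
print(verify_secret("wrong guess", salt, stored))
```

An attacker who obtains a backup containing only salts and digests would have to guess each value and repeat the costly derivation for every guess.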
Access Controls
Access controls are policies that ensure users can read or modify the information they need, no more and no less. Multiple approaches to access controls exist (Kriti, 2013). Discretionary access control (DAC) grants individual users access privileges and the ability to grant such privileges to other users, creating a “chain of command.” In mandatory access control (MAC), a security level from top secret to unclassified is assigned to every object, and users receive matching access privileges. Role-based access control (RBAC) assigns access privileges to roles shared by groups of users rather than to individuals.
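The difference between the mandatory and role-based approaches can be sketched as two small checks. The role names, user names, intermediate security levels, and the rule that a user may read an object at or below their own level are all assumptions made for this illustration; they are not taken from the cited overview.

```python
# Role-based check: privileges belong to roles, and users hold roles.
# All names below are hypothetical.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "editor": {"read", "write"},
}
USER_ROLES = {
    "alice": {"editor"},
    "bob": {"analyst"},
}


def rbac_allows(user: str, action: str) -> bool:
    """A user may perform an action if any of their roles grants it."""
    roles = USER_ROLES.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


# Mandatory check: every object and user carries a security level.
# The two middle levels are assumed for the example.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top secret": 3}


def mac_allows_read(user_level: str, object_level: str) -> bool:
    """Assumed rule: reading requires a level at least as high as the object's."""
    return LEVELS[user_level] >= LEVELS[object_level]


print(rbac_allows("bob", "write"))
print(mac_allows_read("secret", "confidential"))
```

In the role-based check, changing what analysts may do means editing one entry in ROLE_PERMISSIONS rather than touching every user.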
The Clark-Wilson model is a general access model aimed at ensuring data integrity. It posits that user authentication, authorization, and appropriate access restrictions are critical, and data transactions must be recorded in an audit log. This model forms the base of modern data security models.
Aggregation and Corruption
Data aggregation is the practice of gathering data, analyzing it, and presenting it in a summarized way. A variety of aggregation operations can be used for this purpose, including sum, average, minimum, and maximum. Aggregation is often used in statistical and financial applications, for instance, to determine the demographics of an application’s user base.
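A short sketch of these operations, using SQL aggregate functions through Python's built-in sqlite3 module, might look as follows. The users table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical user base with a country and an age per user.
conn.execute("CREATE TABLE users (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("a", "DE", 30), ("b", "DE", 40), ("c", "FR", 20)],
)

# Summarize the user base per country: count, average, minimum, maximum.
summary = conn.execute(
    "SELECT country, COUNT(*), AVG(age), MIN(age), MAX(age)"
    " FROM users GROUP BY country ORDER BY country"
).fetchall()

for row in summary:
    print(row)
```

Each output row describes a whole group of users rather than any single record, which is what makes the result a summary.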
Data corruption can occur deliberately through an attack, or accidentally in case of software or hardware failure. In any case, its effects can render a database unusable or useless. Therefore, security measures that prevent or mitigate attacks are crucial. For unintentional corruption, data must be backed up often; an audit log can also be helpful to determine what caused the damage.
Inference Attack
Inference is a type of attack where one can indirectly determine, or infer, information of a higher security level by analyzing the results of legitimate requests at a lower level. While this attack was initially targeted at statistical databases, it is now a significant threat as it can target data on mobile devices and the Internet of Things. For instance, an attacker can gain sensitive information if they can access accelerometer or eye-tracking data.
Inference controls are methods of protecting sensitive data from this attack (Malik & Patel, 2016). They aim to prevent data from being disclosed through common inference channels: correlated data, missing data, and statistical inference.
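The statistical inference channel can be illustrated with a simple differencing example. It assumes a hypothetical interface that exposes only departmental salary totals to a low-privilege user; the staff table, its contents, and the two query functions are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("ann", "ops", 50000), ("ben", "ops", 52000), ("cy", "ops", 61000)],
)


def dept_total(dept: str) -> int:
    """Aggregate query a low-privilege user is allowed to run."""
    row = conn.execute(
        "SELECT SUM(salary) FROM staff WHERE dept = ?", (dept,)
    ).fetchone()
    return row[0]


def dept_total_excluding(dept: str, name: str) -> int:
    """Another permitted aggregate: the same total without one person."""
    row = conn.execute(
        "SELECT SUM(salary) FROM staff WHERE dept = ? AND name <> ?", (dept, name)
    ).fetchone()
    return row[0]


# Neither query returns an individual salary, but their difference does.
inferred = dept_total("ops") - dept_total_excluding("ops", "cy")
print(inferred)
```

Both requests are legitimate on their own, so an inference control has to consider what combinations of answers reveal, not only what each query returns.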
Injection Attack
An injection is a type of attack where a user crafts input that the application passes on to the DB, where it is executed as part of a query. This is a particularly severe attack, as any DB that communicates with an external user, even indirectly, can be vulnerable. An injection attack can be carried out through something as simple as a browser’s address bar, and it potentially grants the attacker unlimited access to the DB. Since an injection gives one the ability to issue DB queries directly, it can destroy data by injecting a statement such as SQL’s DROP TABLE (Microsoft, 2021). To prevent or mitigate this type of attack, user input must never be inserted into a query unmodified. Instead, it should be passed to the DB as a bound parameter, separate from the query text, or, where that is not possible, any symbols that could change the query’s meaning, commonly tokens used to terminate commands or blocks in code, such as quotes or brackets, must be replaced with appropriate escape sequences.
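The contrast between handing input to the DB directly and keeping it separate from the query text can be sketched with Python's built-in sqlite3 module. The accounts table and both lookup functions are hypothetical; the safe variant uses parameter binding, one common way to keep quotes in the input from being interpreted as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (username TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)


def lookup_unsafe(username: str) -> list:
    # Vulnerable: the input becomes part of the SQL text itself.
    query = (
        "SELECT username, secret FROM accounts WHERE username = '"
        + username
        + "'"
    )
    return conn.execute(query).fetchall()


def lookup_safe(username: str) -> list:
    # The input is bound as a value and is never parsed as SQL.
    return conn.execute(
        "SELECT username, secret FROM accounts WHERE username = ?", (username,)
    ).fetchall()


malicious = "nobody' OR '1'='1"
leaked = lookup_unsafe(malicious)   # the OR clause matches every row
blocked = lookup_safe(malicious)    # no user has this literal name

print(len(leaked), len(blocked))
```

With the unsafe lookup, the quote in the input closes the string literal early and the rest of the input is treated as part of the query, so every account is returned.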
References
Kriti, I. K. (2013). Database security & access control models: A brief overview. International Journal of Engineering Research & Technology, 2(5), 743-751.
Malik, M., & Patel, T. (2016). Database security - attacks and control methods. International Journal of Information Sciences and Techniques, 6(1/2), 175-183.
Microsoft (2021). SQL injection.
MongoDB (2019). What is an object-oriented database?