Database Inference


Inference Overview

In the context of database security, inference is the act or process of deriving sensitive information from premises known or assumed to be true. In a multilevel secure DBMS, an inference attack occurs when a low-level user is able to infer sensitive information from common knowledge and authorized query responses, without directly accessing the protected data. (Morgenstern, 1987)

A technique that facilitates database inference is data mining, a method used to discover patterns within sets of data. Data mining offers a number of useful tools, such as “data association”, which is a user-defined grouping of seemingly unrelated groups and elements. (Raman, 2001) Association rules are valuable for analyzing consumer behavior, and consist of antecedent (if) and consequent (then) statements, as in “if” a customer buys “x”, “then” they are likely to buy “y”.
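The strength of an if/then rule is commonly measured by its support and confidence. The following is a minimal sketch of those two measures, assuming a small set of invented shopping baskets; the item names and the helper functions support and confidence are illustrative and do not come from the cited sources.

```python
# Hypothetical shopping baskets; the items are invented for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule under test: "if a customer buys bread, then they buy butter".
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule with high confidence lets an observer guess the consequent from the antecedent alone, which is what makes association rules useful for inference.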

Another data mining tool is “aggregation”, which occurs when information is gathered and expressed in a summary format for statistical analysis, such as examination of mean, median, standard deviation, and other parameters.

Methods of Attack

Out of Channel

This is a particularly difficult inference vulnerability to protect the DBMS against, because much of the data involved is acquired from external sources. In this type of attack, the attacker makes extensive use of freely accessible information sources and combines that data with query results to perform inference on a secured database.

Indirect Attacks

This type of attack is accomplished by using intermediate results gleaned from aggregates such as the mean, median, and standard deviation, from the Sum and Count functions, or from set theory.
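One well-known way of combining Sum and Count results is to compare two aggregate queries whose result sets differ by a single row. The sketch below assumes an invented in-memory table; the employee names, salaries, and the aggregate helper are hypothetical and stand in for aggregate queries against a real DBMS.

```python
# Hypothetical records; names and salaries are invented for illustration.
employees = [
    {"name": "Eve", "dept": "Sales", "salary": 50_000},
    {"name": "Frank", "dept": "Sales", "salary": 54_000},
    {"name": "Grace", "dept": "Sales", "salary": 61_000},
]


def aggregate(rows, predicate):
    """Stand-in for SELECT COUNT(*), SUM(salary) ... WHERE predicate."""
    matched = [r["salary"] for r in rows if predicate(r)]
    return len(matched), sum(matched)


# Query 1: everyone in Sales.
count_all, sum_all = aggregate(employees, lambda r: r["dept"] == "Sales")

# Query 2: everyone in Sales except the target.
count_rest, sum_rest = aggregate(
    employees, lambda r: r["dept"] == "Sales" and r["name"] != "Grace"
)

# If the counts differ by exactly one, the difference of the sums is the
# target's individual salary, although neither query returned it directly.
inferred_salary = None
if count_all - count_rest == 1:
    inferred_salary = sum_all - sum_rest

print(inferred_salary)
```

Neither query is sensitive on its own; the disclosure comes only from combining the two intermediate results.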

Direct Attacks

This type of attack is typically conducted against a DBMS with poor security, such as inadequate MAC and DAC configurations. In the direct attack, queries that will elicit small responses are launched at the DBMS.
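As a rough illustration of a query that elicits a small response, the sketch below uses Python's built-in sqlite3 module with an invented staff table. The table layout, names, and salaries are assumptions made for the example; the point is only that a sufficiently narrow WHERE clause can reduce an aggregate to a single person's value.

```python
import sqlite3

# In-memory database with a hypothetical staff table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (name TEXT, job TEXT, floor TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [
        ("Heidi", "Clerk", "3rd", 36_000),
        ("Ivan", "Clerk", "2nd", 33_000),
        ("Judy", "Sales", "3rd", 52_000),
    ],
)

# The predicate is chosen so that only one row matches.
row_count, avg_salary = conn.execute(
    "SELECT COUNT(*), AVG(salary) FROM staff WHERE job = ? AND floor = ?",
    ("Clerk", "3rd"),
).fetchone()

# With a single matching row, the "average" is that person's exact salary.
print(row_count, avg_salary)
conn.close()
```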

Logical Inferences

The logical inference is often considered a type of direct attack, but may be designated as an indirect attack, depending upon its level of complexity. This type of attack makes use of association rules, and the data mining strategies of apriori algorithms and clustering.

Statistical Inferences

This indirect attack utilizes aggregate data and mathematical and statistical analysis to derive inferences on numerical data or textual data sets. The textual data can be enumerated or represented as frequencies or counts, and this same statistical method can then be used to derive associations. (Hylkema, 2009)

Query Results

We will use statistical inferencing to estimate the unknown salaries of Alice, Bob, and Dan. To accomplish this, we take the average salaries, culled from the Java applet “Database Inference”, of the groups to which each person is known to belong, and calculate their mean. In the case of Alice, who is in the “Clerk”, “Support”, and “3rd Floor” groups, we will use the following figures:

All Clerks Avg. $34,500

All Support Avg. $35,500

All 3rd Floor Avg. $35,000

Thus, to estimate Alice’s salary, we would utilize the following formula: (34,500 + 35,500 + 35,000) / 3 = 105,000 / 3 = 35,000. Therefore, we can infer that Alice’s salary is approximately $35,000.

Using the same methodology, we can estimate Bob’s salary. Bob is a member of the “Admin” and “Sales” groups, with no floor designated, which equates to (38,500 + 52,625) / 2 = 91,125 / 2, for a statistical inference of a $45,562.50 salary for Bob.

Based upon Dan’s group memberships (“Supervisor”, “Sales”, and “Basement”), our calculation (68,333 + 52,625 + 68,333) / 3 = 189,291 / 3 produces an inferred salary of $63,097 for Dan.
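The three calculations above can be reproduced with a short script. This is a minimal sketch that uses the group averages and memberships quoted in this section; the variable names are illustrative only.

```python
from statistics import mean

# Group averages quoted in this section.
group_averages = {
    "Clerk": 34_500,
    "Support": 35_500,
    "3rd Floor": 35_000,
    "Admin": 38_500,
    "Sales": 52_625,
    "Supervisor": 68_333,
    "Basement": 68_333,
}

# Known group memberships for each person.
memberships = {
    "Alice": ["Clerk", "Support", "3rd Floor"],
    "Bob": ["Admin", "Sales"],
    "Dan": ["Supervisor", "Sales", "Basement"],
}

# The estimate for each person is the mean of their groups' averages.
inferred = {
    person: mean(group_averages[g] for g in groups)
    for person, groups in memberships.items()
}

for person, estimate in inferred.items():
    print(f"{person}: ${estimate:,.2f}")
```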

Mitigation Methods

Suppression and concealing

In suppression, some query results are withheld entirely. In concealing, results are still returned but in a less precise form: data may be approximated, combined, rounded, or returned as a range or a random sample of results.
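Both ideas can be sketched in a few lines. The example below withholds an average when too few rows contribute to it and otherwise rounds the value before release. The threshold MIN_GROUP_SIZE, the rounding step ROUND_TO, and the function release_average are assumptions chosen for illustration, not values taken from the cited sources.

```python
MIN_GROUP_SIZE = 3   # assumed threshold below which a result is suppressed
ROUND_TO = 1_000     # assumed rounding step used for concealing


def release_average(salaries):
    """Return a rounded average, or None when the result is suppressed."""
    if len(salaries) < MIN_GROUP_SIZE:
        return None  # suppression: too few rows, nothing is returned
    avg = sum(salaries) / len(salaries)
    return round(avg / ROUND_TO) * ROUND_TO  # concealing: reduced precision


small_group = release_average([41_200])
large_group = release_average([34_200, 35_100, 36_400])
print(small_group, large_group)
```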

Random Data Perturbation

Random data perturbation adds a random amount of error (noise) to the data returned in response to a query, so that exact individual values are harder to recover from the results.
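A minimal sketch of the idea follows, assuming uniformly distributed noise applied to an average. The function perturbed_average, the noise_scale parameter, and the choice of a uniform distribution are illustrative assumptions; a real system would choose the noise distribution and magnitude according to its own policy.

```python
import random


def perturbed_average(salaries, noise_scale=500.0, rng=None):
    """Return the average plus uniform noise in [-noise_scale, noise_scale]."""
    rng = rng or random.Random()
    true_avg = sum(salaries) / len(salaries)
    return true_avg + rng.uniform(-noise_scale, noise_scale)


# A seeded generator is used here only to make the example repeatable.
noisy = perturbed_average([34_500, 35_500, 35_000], rng=random.Random(0))
print(round(noisy, 2))
```

The released value stays within the chosen noise bound of the true average, so broad statistics remain usable while exact differences between queries become unreliable.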


Partitioning

Partitioning consists of segregating data based upon its degree of sensitivity. This technique, while highly effective in enhancing the confidentiality of our data, does have a downside in the redundancy and complexity that it introduces to DBMS administration.


Classification by sensitivity is utilized in multilevel DBMSs to preclude inference. Data is classified based upon sensitivity ratings, and end users are only able to access data for which they have the requisite clearance.
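The access rule described above can be sketched as a simple comparison between a row's classification and the user's clearance. The level names, the sample rows, and the visible_rows function are hypothetical and are shown only to make the rule concrete.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2


# Invented rows, each tagged with a classification level.
rows = [
    {"item": "office phone list", "level": Level.UNCLASSIFIED},
    {"item": "salary bands", "level": Level.CONFIDENTIAL},
    {"item": "individual salaries", "level": Level.SECRET},
]


def visible_rows(table, clearance):
    """Return only the rows whose level does not exceed the user's clearance."""
    return [r for r in table if r["level"] <= clearance]


low_view = [r["item"] for r in visible_rows(rows, Level.UNCLASSIFIED)]
mid_view = [r["item"] for r in visible_rows(rows, Level.CONFIDENTIAL)]
print(low_view, mid_view)
```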

Query Controls

This inference prevention method is typically used to counter indirect attacks. The query control will process the incoming query, the resultant output, or perhaps both, and deny queries or results that do not conform to DBMS inference policies.

Preprocessing and Result Analysis

Query preprocessing occurs prior to query execution, and is used to prevent questionable queries. Conversely, query result analysis is performed after query execution, and is used to prevent dubious results from being too precise, particularly those that may have been missed by the preprocessing stage.
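The two stages can be sketched as checks placed before and after execution. In the example below, the allowed aggregate list, the minimum result size, the sample table, and the guarded_query function are all assumptions made for illustration; they are not policies taken from the cited sources.

```python
ALLOWED_AGGREGATES = {"AVG", "COUNT"}  # assumed policy for preprocessing
MIN_RESULT_ROWS = 3                    # assumed policy for result analysis

# Invented table for the example.
table = [
    {"dept": "Sales", "salary": 50_000},
    {"dept": "Sales", "salary": 54_000},
    {"dept": "Sales", "salary": 61_000},
    {"dept": "Admin", "salary": 38_000},
]


def guarded_query(rows, aggregate, predicate):
    """Run an aggregate query with checks before and after execution."""
    # Preprocessing: reject questionable queries before they run.
    if aggregate not in ALLOWED_AGGREGATES:
        return None, "rejected before execution"

    matched = [r["salary"] for r in rows if predicate(r)]

    # Result analysis: withhold results drawn from too few rows.
    if len(matched) < MIN_RESULT_ROWS:
        return None, "withheld after execution"

    value = len(matched) if aggregate == "COUNT" else sum(matched) / len(matched)
    return value, "released"


print(guarded_query(table, "MAX", lambda r: True))
print(guarded_query(table, "AVG", lambda r: r["dept"] == "Admin"))
print(guarded_query(table, "AVG", lambda r: r["dept"] == "Sales"))
```

The second call passes preprocessing but is caught by result analysis, which mirrors the case of a query missed by the first stage.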

Query History Retention

Typically in query history retention, clustering algorithms are utilized to archive the queries of users or groups, to ensure that multiple queries are not being used to perform inferences on classified data. Collecting information on groups can assist with the mitigation of collaborative inferencing, though it does require more system resources, and may generate false positives. (Hylkema, 2009)
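As a simplified illustration of why retaining history helps, the sketch below records which rows each user's queries touched and flags a new query whose row set differs from an earlier one by exactly one row. This is a plain set comparison used in place of the clustering algorithms mentioned above; the QueryHistory class and its method are hypothetical.

```python
class QueryHistory:
    """Keeps, per user, the sets of row ids touched by earlier queries."""

    def __init__(self):
        self._seen = {}  # user -> list of frozensets of row ids

    def is_suspicious(self, user, row_ids):
        """Record the query and report whether it isolates a single row."""
        new = frozenset(row_ids)
        past = self._seen.setdefault(user, [])
        # Two row sets that differ by exactly one row allow that row's
        # value to be recovered by subtracting the two aggregate results.
        flagged = any(len(new ^ old) == 1 for old in past)
        past.append(new)
        return flagged


history = QueryHistory()
first = history.is_suspicious("user_a", {1, 2, 3})
second = history.is_suspicious("user_a", {1, 2})
other = history.is_suspicious("user_b", {1, 2})
print(first, second, other)
```

Because the history here is kept per user, two users who split the queries between them would not be flagged, which is the collaborative case that group-level tracking is meant to address.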


References

Database Inference. (2012). Retrieved from

Hylkema, M. (2009). A Survey of Database Inference Attack Prevention Methods. Retrieved from

Jajodia, S., Meadows, C. Inference Problems in Multilevel Secure Database Management Systems. Retrieved from

Morgenstern, M., Denning, D., Akl, S., Heckman, M. (1987). Views for Multilevel Database Security. Retrieved from

Raman, S. (2001). Detecting Inference Attacks Using Association Rules. Retrieved from

Rouse, M. (2011). Association Rules In Data Mining. Retrieved from