DBMS

Business Intelligence

Business intelligence (BI) is a set of theories, methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information for business purposes. BI can handle large amounts of information to help identify and develop new opportunities. Making use of new opportunities and implementing an effective strategy can provide a competitive market advantage and long-term stability.

BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.

Though the term business intelligence is sometimes a synonym for competitive intelligence (because they both support decision making), BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes while competitive intelligence gathers, analyzes and disseminates information with a topical focus on company competitors. If understood broadly, business intelligence can include the subset of competitive intelligence.

History

In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence. He defined intelligence as: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."

Business intelligence as it is understood today is said to have evolved from the decision support systems (DSS) that began in the 1960s and developed throughout the mid-1980s. DSS originated in the computer-aided models created to assist with decision making and planning. From DSS, data warehouses, executive information systems, OLAP, and business intelligence came into focus beginning in the late 1980s.

In 1989, Howard Dresner (later a Gartner Group analyst) proposed "business intelligence" as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems." It was not until the late 1990s that this usage was widespread.

Business intelligence and data warehousing

Often BI applications use data gathered from a data warehouse or a data mart. A data warehouse is a copy of transactional data that facilitates decision support. However, not all data warehouses are used for business intelligence, nor do all business intelligence applications require a data warehouse.

To distinguish between the concepts of business intelligence and data warehouses, Forrester Research often defines business intelligence in one of two ways:

Using a broad definition: "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making." When using this definition, business intelligence also includes technologies such as data integration, data quality, data warehousing, master data management, text and content analytics, and many others that the market sometimes lumps into the Information Management segment. Therefore, Forrester refers to data preparation and data usage as two separate, but closely linked segments of the business intelligence architectural stack.

Forrester defines the latter, narrower business intelligence market as, "...referring to just the top layers of the BI architectural stack such as reporting, analytics and dashboards."

Business intelligence and business analytics

Thomas Davenport argues that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool, and business analytics. In this definition, business analytics is the subset of BI based on statistics, prediction, and optimization.

Applications in an enterprise

Business intelligence can be applied to the following business purposes, in order to drive business value.

  1. Measurement – program that creates a hierarchy of performance metrics (see also Metrics Reference Model) and benchmarking that informs business leaders about progress towards business goals (business process management).
  2. Analytics – program that builds quantitative processes for a business to arrive at optimal decisions and to perform business knowledge discovery. Frequently involves: data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modeling, complex event processing and prescriptive analytics.
  3. Reporting/enterprise reporting – program that builds infrastructure for strategic reporting to serve the strategic management of a business, not operational reporting. Frequently involves data visualization, executive information system and OLAP.
  4. Collaboration/collaboration platform – program that gets different areas (both inside and outside the business) to work together through data sharing and electronic data interchange.
  5. Knowledge management – program to make the company data driven through strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are true business knowledge. Knowledge management leads to learning management and regulatory compliance.

In addition to the above, business intelligence can provide a proactive approach, such as an alarm function that immediately alerts the end user. There are many types of alerts: for example, if a business value exceeds a threshold, the amount is colored red in the report and the business analyst is alerted; sometimes an alert mail is sent to the user as well. This end-to-end process requires data governance, which should be handled by an expert.
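
As an illustration, here is a minimal sketch in Java of how such a threshold alert might work; the metric name, threshold value, and notification channel are all hypothetical.

// Minimal sketch of a BI threshold alert (hypothetical metric and threshold).
public class ThresholdAlert {
    private static final double THRESHOLD = 100_000.0; // example threshold value

    // Returns a display color for the report cell and triggers a notification
    // when the value exceeds the threshold.
    public static String checkMetric(String metricName, double value) {
        if (value > THRESHOLD) {
            notifyAnalyst(metricName, value); // e.g., send an alert mail
            return "RED";                     // highlight the amount in the report
        }
        return "BLACK";
    }

    private static void notifyAnalyst(String metricName, double value) {
        // Placeholder for a real notification channel (mail, dashboard, etc.).
        System.out.printf("ALERT: %s = %.2f exceeds threshold %.2f%n",
                metricName, value, THRESHOLD);
    }

    public static void main(String[] args) {
        System.out.println(checkMetric("Quarterly spend", 125_000.0)); // RED
        System.out.println(checkMetric("Quarterly spend", 80_000.0));  // BLACK
    }
}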

Prioritization of business intelligence projects

It is often difficult to provide a positive business case for business intelligence initiatives and often the projects must be prioritized through strategic initiatives. Here are some hints to increase the benefits for a BI project.

  • As described by Kimball, you must determine the tangible benefits, such as the eliminated cost of producing legacy reports.
  • Enforce access to data for the entire organization. In this way even a small benefit, such as a few minutes saved, makes a difference when multiplied by the number of employees in the entire organization.
  • As described by Ross, Weil & Roberson for Enterprise Architecture, consider letting the BI project be driven by other business initiatives with excellent business cases. To support this approach, the organization must have enterprise architects who can identify suitable business projects.
  • Use a structured and quantitative methodology to create defensible prioritization in line with the actual needs of the organization, such as a weighted decision matrix (a small sketch follows this list).
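
A weighted decision matrix reduces to a weighted sum of scores for each criterion. The following Java sketch illustrates the idea; the criteria, weights, and project scores are invented for the example.

// Sketch of a weighted decision matrix for prioritizing BI projects.
// The criteria, weights, and scores below are purely illustrative.
public class WeightedDecisionMatrix {
    public static double score(double[] weights, double[] scores) {
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * scores[i]; // weight each criterion's score
        }
        return total;
    }

    public static void main(String[] args) {
        double[] weights  = {0.5, 0.3, 0.2}; // strategic fit, cost savings, risk
        double[] projectA = {8, 6, 4};
        double[] projectB = {6, 9, 7};
        System.out.printf("Project A: %.1f%n", score(weights, projectA)); // 6.6
        System.out.printf("Project B: %.1f%n", score(weights, projectB)); // 7.1
    }
}

The project with the highest weighted score is the strongest candidate for prioritization.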

Success factors of implementation

Before implementing a BI solution, it is worth taking several factors into consideration. According to Kimball et al., these are the three critical areas that you need to assess within your organization before getting ready to do a BI project:

  1. The level of commitment and sponsorship of the project from senior management
  2. The level of business need for creating a BI implementation
  3. The amount and quality of business data available.

Business sponsorship

The commitment and sponsorship of senior management is, according to Kimball et al., the most important criterion for assessment, because strong management backing helps overcome shortcomings elsewhere in the project. However, as Kimball et al. state: “even the most elegantly designed DW/BI system cannot overcome a lack of business [management] sponsorship”.

It is important that personnel who participate in the project have a vision and an idea of the benefits and drawbacks of implementing a BI system. The best business sponsor has organizational clout and is well connected within the organization. Ideally, the business sponsor is demanding but also realistic and supportive if the implementation runs into delays or drawbacks. The management sponsor also needs to be able to assume accountability and take responsibility for failures and setbacks on the project. Support from multiple members of management ensures the project does not fail if one person leaves the steering group. However, having many managers work together on the project can also mean that several different interests attempt to pull the project in different directions, such as when different departments want to put more emphasis on their own usage. This issue can be countered by an early and specific analysis of the business areas that benefit the most from the implementation. All stakeholders in the project should participate in this analysis in order to feel ownership of the project and to find common ground.

Another management problem that should be addressed before the start of implementation is an overly aggressive business sponsor: a sponsor who gets carried away by the possibilities of BI and starts wanting the DW or BI implementation to include several sets of data that were not part of the original planning phase. Since implementing extra data may add many months to the original plan, it is wise to make sure the sponsor is aware of the consequences of such scope changes.

Business needs

Because of the close relationship with senior management, another critical thing that must be assessed before the project begins is whether or not there is a business need and whether there is a clear business benefit to doing the implementation. The needs and benefits of the implementation are sometimes driven by competition and the need to gain an advantage in the market. Another driver of a business-driven approach to BI implementation is the acquisition of other organizations that enlarge the original organization; it can sometimes be beneficial to implement DW or BI in order to create more oversight.

Companies that implement BI are often large, multinational organizations with diverse subsidiaries. A well-designed BI solution provides a consolidated view of key business data not available anywhere else in the organization, giving management visibility and control over measures that otherwise would not exist.

Amount and quality of available data

Without good data, it does not matter how good the management sponsorship or business-driven motivation is. Without proper data, or with too little quality data, any BI implementation fails. Before implementation, it is a good idea to do data profiling. This analysis identifies the “content, consistency and structure [..]” of the data. It should be done as early as possible in the process, and if the analysis shows that data is lacking, the project should be put on the shelf temporarily while the IT department figures out how to properly collect data.
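
As a minimal illustration of data profiling, the following Java (JDBC) sketch counts total, null, and empty values in one column; the connection URL, credentials, table, and column names are all assumptions for the example.

import java.sql.*;

// Minimal data-profiling sketch: count total, null, and empty values
// for one column. URL, credentials, table, and column are hypothetical.
public class ProfileColumn {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/sales"; // assumed database
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT COUNT(*) AS total, " +
                 "       SUM(CASE WHEN customer_name IS NULL THEN 1 ELSE 0 END) AS nulls, " +
                 "       SUM(CASE WHEN customer_name = '' THEN 1 ELSE 0 END) AS empties " +
                 "FROM orders")) {
            if (rs.next()) {
                System.out.printf("total=%d nulls=%d empties=%d%n",
                        rs.getLong("total"), rs.getLong("nulls"), rs.getLong("empties"));
            }
        }
    }
}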

When planning for business data and business intelligence requirements, it is always advisable to consider specific scenarios that apply to a particular organization, and then select the business intelligence features best suited for the scenario.

Often, scenarios revolve around distinct business processes, each built on one or more data sources. These sources are used by features that present that data as information to knowledge workers, who subsequently act on that information. The business needs of the organization for each business process adopted correspond to the essential steps of business intelligence. These essential steps of business intelligence include but are not limited to:

  1. Go through business data sources in order to collect needed data
  2. Convert business data to information and present appropriately
  3. Query and analyze data
  4. Act on the collected data

The quality aspect of business intelligence should cover the whole process, from the source data to the final reporting; a small sketch of two of these quality gates follows the list. At each step, the quality gates are different:

  1. Source Data:
    • Data Standardization: make data comparable (same units, same patterns, etc.)
    • Master Data Management: a single, unique referential
  2. Operational Data Store (ODS):
    • Data Cleansing: detect and correct inaccurate data
    • Data Profiling: check for inappropriate values and null/empty fields
  3. Data Warehouse:
    • Completeness: check that all expected data are loaded
    • Referential integrity: unique and existing references across all sources
    • Consistency between sources: check consolidated data against the sources
  4. Reporting:
    • Uniqueness of indicators: only one shared dictionary of indicators
    • Formula accuracy: local reporting formulas should be avoided or checked
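
As a small illustration of two of these gates, here is a Java (JDBC) sketch that checks completeness (row counts between source and warehouse) and referential integrity (orphaned foreign keys); all connection details, table names, and column names are hypothetical.

import java.sql.*;

// Sketch of two data-warehouse quality gates. Table and column names
// (src_orders, dw_orders, dw_customers, customer_id) are hypothetical.
public class QualityGates {
    static long count(Connection c, String sql) throws SQLException {
        try (Statement s = c.createStatement(); ResultSet rs = s.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/dw", "user", "password")) {
            // Completeness: all expected rows were loaded.
            long src = count(c, "SELECT COUNT(*) FROM src_orders");
            long dw  = count(c, "SELECT COUNT(*) FROM dw_orders");
            System.out.println("Completeness " + (src == dw ? "OK" : "FAILED"));

            // Referential integrity: no orphaned customer references.
            long orphans = count(c,
                "SELECT COUNT(*) FROM dw_orders o " +
                "LEFT JOIN dw_customers cu ON o.customer_id = cu.customer_id " +
                "WHERE cu.customer_id IS NULL");
            System.out.println("Referential integrity " + (orphans == 0 ? "OK" : "FAILED"));
        }
    }
}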


Database Inference


Inference Overview

In the context of database security, inference is the act or process of deriving sensitive information from premises known or assumed to be true. In a multilevel secure DBMS, an inference attack occurs when a low-level user is able to infer sensitive information through common knowledge and authorized query responses, without directly accessing the protected data. (Morgenstern, 1987)

A technique that facilitates database inference is data mining, a method used to discover patterns within sets of data. Within data mining there are a number of useful tools, such as “data association”, a user-defined grouping of seemingly unrelated groups and elements. (Raman, 2001) Association rules are valuable for analyzing consumer behavior, and consist of antecedent (if) and consequent (then) statements, as in “if” a customer buys “x”, “then” they will buy “y”.
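
To make the antecedent/consequent idea concrete, here is a small Java sketch that computes the support and confidence of one association rule over a tiny set of transactions; the market-basket data is invented for the example.

import java.util.*;

// Sketch: support and confidence of the rule "if bread then butter"
// over a tiny, invented set of market-basket transactions.
public class AssociationRule {
    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
            Set.of("bread", "butter", "milk"),
            Set.of("bread", "butter"),
            Set.of("bread", "jam"),
            Set.of("milk", "butter"));

        long antecedent = transactions.stream()
                .filter(t -> t.contains("bread")).count();                   // "if" part
        long both = transactions.stream()
                .filter(t -> t.contains("bread") && t.contains("butter")).count();

        double support = (double) both / transactions.size();   // P(bread and butter)
        double confidence = (double) both / antecedent;          // P(butter | bread)
        System.out.printf("support=%.2f confidence=%.2f%n", support, confidence);
        // support=0.50 confidence=0.67
    }
}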

Another data mining tool is “aggregation”, which occurs when information is gathered and expressed in a summary format for statistical analysis, such as examination of mean, median, standard deviation, and other parameters.

Methods of Attack

Out of Channel

This is a particularly difficult inference vulnerability to protect the DBMS against, as much of the data acquired comes from external sources. In this type of attack, extensive use is made of freely accessible information sources, whose data is then combined to draw inferences about a secured database.

Indirect Attacks

This type of attack is accomplished through intermediate results: aggregates such as the mean, median, and standard deviation, the use of the SUM and COUNT functions, or set theory.

Direct Attacks

This type of attack is typically conducted against a DBMS with poor security, such as inadequate mandatory access control (MAC) and discretionary access control (DAC) configurations. In a direct attack, queries that will elicit small responses are launched at the DBMS.

Logical Inferences

Logical inference is often considered a type of direct attack, but may be designated as an indirect attack, depending on its level of complexity. This type of attack makes use of association rules and the data mining strategies of Apriori algorithms and clustering.

Statistical Inferences

This indirect attack utilizes aggregate data and mathematical and statistical analysis to derive inferences on numerical data or textual data sets. The textual data can be enumerated or represented as frequencies or counts, and this same statistical method can then be used to derive associations. (Hylkema, 2009)

Query Results

We will use statistical inference to extrapolate the unknown salaries of Alice, Bob, and Dan. To accomplish this, we take the average salaries, culled from the Java applet “Database Inference”, of the various groups that we know each person is a member of, and calculate their mean. In the case of Alice, who is in the Clerk, Support, and 3rd Floor groups, we will use the following figures:

All Clerks Avg. $34,500

All Support Avg. $35,500

All 3rd Floor Avg. $35,000

Thus, to determine Alice’s salary, we use the following formula: (34,500 + 35,500 + 35,000) / 3 = 105,000 / 3 = 35,000. Therefore, we can infer that Alice’s salary is $35,000.

Using the same methodology, we can deduce Bob’s salary. Bob is a member of Admin and Sales, with no floor designated, which equates to (38,500 + 52,625) / 2 = 91,125 / 2 = 45,562.50, a statistically inferred salary of $45,562.50 for Bob.

Based upon Dan’s group memberships (Supervisor, Sales, Basement), our calculation (68,333 + 52,625 + 68,333) / 3 = 189,291 / 3 produces an inferred salary of $63,097 for Dan.
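
The same mean-of-group-averages calculation, expressed as a small Java sketch using the figures above:

// Statistical inference of salaries from group averages, as computed above.
public class SalaryInference {
    static double infer(double... groupAverages) {
        double sum = 0.0;
        for (double avg : groupAverages) sum += avg; // sum the group averages
        return sum / groupAverages.length;           // mean of the averages
    }

    public static void main(String[] args) {
        System.out.printf("Alice: $%,.2f%n", infer(34_500, 35_500, 35_000)); // $35,000.00
        System.out.printf("Bob:   $%,.2f%n", infer(38_500, 52_625));          // $45,562.50
        System.out.printf("Dan:   $%,.2f%n", infer(68_333, 52_625, 68_333));  // $63,097.00
    }
}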

Mitigation Methods

Suppression and concealing

In suppression, some query results are withheld, rounded, or returned as a random sample or range of results. Similarly, in concealing, data may be approximated, combined, rounded, or returned as a range or a random sampling of results.

Random Data Perturbation

Random data perturbation works by adding random amounts of erroneous data (noise) to the response to a query request.
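
A minimal sketch of the idea in Java, adding zero-mean Gaussian noise to a numeric query result; the noise scale (5% of the true value) is an arbitrary choice for the example.

import java.util.Random;

// Sketch of random data perturbation: add zero-mean Gaussian noise to a
// numeric query result so exact values cannot be inferred. The noise
// standard deviation (5% of the value) is an arbitrary choice.
public class Perturbation {
    private static final Random RNG = new Random();

    static double perturb(double trueValue) {
        double noise = RNG.nextGaussian() * trueValue * 0.05;
        return trueValue + noise;
    }

    public static void main(String[] args) {
        double trueAverageSalary = 35_000.0;
        for (int i = 0; i < 3; i++) {
            System.out.printf("Reported average: $%,.2f%n", perturb(trueAverageSalary));
        }
    }
}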

Partitioning

Partitioning consists of segregating data based upon its degree of sensitivity. While highly effective at enhancing the confidentiality of the data, this technique has a downside in the redundancy and complexity it introduces to DBMS administration.

Polyinstantiation

This technique is utilized in multilevel DBMSs to preclude inference. Data is classified by sensitivity rating, and multiple instances of the same data item may exist at different classification levels, so that end users can only access the version of the data for which they have the requisite clearance.

Query Controls

This inference prevention method is typically used to counter indirect attacks. The query control will process the incoming query, the resultant output, or perhaps both, and deny queries or results that do not conform to DBMS inference policies.
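
One common policy of this kind denies aggregate results computed over too few rows, since small result sets make individual records easy to infer. A minimal Java sketch of such a control follows; the minimum group size of five is an arbitrary policy choice for the example.

import java.util.OptionalDouble;

// Sketch of a query-size control: refuse to return an aggregate when it
// was computed over fewer than MIN_GROUP_SIZE rows. The threshold of 5
// is an arbitrary policy choice.
public class QuerySizeControl {
    private static final int MIN_GROUP_SIZE = 5;

    static OptionalDouble averageSalary(double[] matchingSalaries) {
        if (matchingSalaries.length < MIN_GROUP_SIZE) {
            return OptionalDouble.empty(); // deny: result set too small
        }
        double sum = 0.0;
        for (double s : matchingSalaries) sum += s;
        return OptionalDouble.of(sum / matchingSalaries.length);
    }

    public static void main(String[] args) {
        System.out.println(averageSalary(new double[]{34_500, 35_500}));      // denied
        System.out.println(averageSalary(new double[]{34_500, 35_500, 35_000,
                                                      38_500, 52_625}));      // allowed
    }
}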

Preprocessing and Result Analysis

Query preprocessing occurs prior to query execution, and is used to prevent questionable queries. Conversely, query result analysis is performed after query execution, and is used to prevent dubious results from being too precise, particularly those that may have been missed by the preprocessing stage.

Query History Retention

Typically in query history retention, clustering algorithms are utilized to archive the queries of users or groups to ensure that multiple queries are not being used to perform inferences on classified data. Collecting information on groups can assist with the mitigation of collaborative inferencing, though it requires more system resources and may generate false positives. (Hylkema, 2009)

References

Database Inference. (2012). Retrieved from http://prismhome.org/resources/vise-tools/database-inference

Hylkema, M. (2009). A Survey of Database Inference Attack Prevention Methods. Retrieved from http://met-research.bu.edu/met-ert/Internal%20Documentation/Inference%20Research/Michael_Hylkema_Research_Paper.pdf

Jajodia, S., & Meadows, C. Inference Problems in Multilevel Secure Database Management Systems. Retrieved from http://www.acsa-admin.org/secshelf/book001/24.pdf

Morgenstern, M., Denning, D., Akl, S., Heckman, M. (1987). Views for Multilevel Database Security. Retrieved from http://faculty.nps.edu/dedennin/publications/ViewsMultilevelDatabaseSecurity.pdf

Raman, S. (2001). Detecting Inference Attacks Using Association Rules. Retrieved from http://andromeda.rutgers.edu/~gshafer/raman.pdf

Rouse, M. (2011). Association Rules In Data Mining. Retrieved from http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining


phpMyAdmin

phpMyAdmin is a free and open source tool written in PHP intended to handle the administration of MySQL with the use of a web browser. It can perform various tasks such as creating, modifying or deleting databases, tables, fields or rows; executing SQL statements; or managing users and permissions.

Features

Features provided by the program include:

  1. Web interface
  2. MySQL database management
  3. Import data from CSV and SQL
  4. Export data to various formats: CSV, SQL, XML, PDF (via the TCPDF library), ISO/IEC 26300 - OpenDocument Text and Spreadsheet, Word, Excel, LaTeX and others
  5. Administering multiple servers
  6. Creating PDF graphics of the database layout
  7. Creating complex queries using Query-by-Example (QBE)
  8. Searching globally in a database or a subset of it
  9. Transforming stored data into any format using a set of predefined functions, like displaying BLOB-data as image or download-link
  10. Live charts to monitor MySQL server: activity, connections, processes, CPU/Memory usage, etc.

SQL Injection

Description

In 2003, an analysis of the buffer overflow exploit caused it to be pronounced the vulnerability of the decade (Cowan et al., 2003). Since that time, buffer overflow exploits have ranked among the top entries of the Open Web Application Security Project (OWASP) Top 10, the National Vulnerability Database, and the CWE/SANS Top 25 Most Dangerous Software Errors.

SQL injection is a code injection technique that can be used against MS SQL Server, MySQL, and other DBMSs. A related class of attack targets the buffer, a region of memory that temporarily holds data: in a buffer overflow attack, a malicious program injects too much data into the buffer, which can cause errors, program crashes, and security breaches.

A buffer overflow occurs when a program or process tries to store more data in a buffer than it was intended to hold. Program execution causes the application to write beyond the finite constraints of the pre-allocated buffer, and the overflowing data overwrites adjacent memory locations. This may corrupt the valid data held in them, or execute new instructions on the affected computer that could damage user files, change data, or disclose confidential information.

Mitigation. Because SQL injection exploits a vulnerability occurring at the database layer of an application, a simple mitigation is not to allow unvalidated user input to be directly embedded into SQL statements.
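
A minimal Java (JDBC) sketch of this mitigation using a parameterized query; the connection URL, credentials, and employees table are hypothetical.

import java.sql.*;

// Sketch of keeping user input out of SQL text with a parameterized query.
// Connection URL, credentials, and the employees table are hypothetical.
public class SafeQuery {
    public static void main(String[] args) throws SQLException {
        String userInput = "Alice' OR '1'='1"; // hostile input is treated as data
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/hr", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT salary FROM employees WHERE name = ?")) {
            ps.setString(1, userInput); // bound as a value, never parsed as SQL
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getDouble("salary"));
                }
            }
        }
    }
}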

Another mitigation strategy is to use a programming language that performs its own memory management, such as Java or Perl, or an environment like .NET, which may diminish the impact of buffer overflows. Additionally, Cyclone C may be used to negate buffer overflows and other related exploits. Other languages, such as C# and Ada, which support run-time checks to protect against access to unallocated memory, buffer overflow errors, range violations, and other bugs, may also be used. Both languages allow the programmer to disable this checking functionality, if need be, to enhance performance.
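
As a simple illustration of such run-time checks, Java refuses out-of-bounds writes outright; the following sketch shows the bounds check that prevents a buffer from overflowing into adjacent memory.

// Java's run-time bounds checking: writing past the end of a buffer
// throws an exception instead of overwriting adjacent memory.
public class BoundsCheckDemo {
    public static void main(String[] args) {
        byte[] buffer = new byte[8];  // fixed-size buffer
        byte[] input = new byte[16];  // more data than the buffer can hold
        try {
            for (int i = 0; i < input.length; i++) {
                buffer[i] = input[i]; // fails at i == 8
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Overflow prevented: " + e.getMessage());
        }
    }
}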

We should assume all user input is malicious and thus use a whitelist of acceptable inputs that adhere to specifications. The use of blacklists, or looking solely for malformed input, cannot be depended upon, though blacklists may be used as a yardstick for attack detection.
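
A small Java sketch of whitelist validation; the pattern shown (1 to 30 letters, digits, or spaces) is just one possible specification, not a universal rule.

import java.util.regex.Pattern;

// Sketch of whitelist input validation: accept only input matching a
// known-good pattern. The pattern here is an example specification.
public class WhitelistValidator {
    private static final Pattern ALLOWED = Pattern.compile("^[A-Za-z0-9 ]{1,30}$");

    static boolean isAcceptable(String input) {
        return input != null && ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("Alice Smith"));      // true
        System.out.println(isAcceptable("Alice' OR '1'='1")); // false: quote rejected
    }
}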

It is also advised that we run our DBMS in a sandbox, and that we set boundaries between processes and the operating system. Additionally, we should run our DBMS with the lowest level of privileges necessary to perform our functions.

We may also consider using Data Execution Prevention (DEP)/NX memory protection. This is a common security feature included in Windows, Linux, and Mac operating systems. Its function is to prevent services and applications from executing code in a non-executable memory region, thus preventing exploits that store code via a buffer overflow.

References

Cowan, C., et al. (2003). “Buffer Overflows: Attacks and Defenses for the Vulnerability of the Decade.”

OWASP Top 10. (2010). The Top 10 Most Critical Web Application Security Risks.