Extract-transform-load (ETL)

An extract-transform load (ETL) tool typically performs the process of extracting, transforming, and loading heterogeneous data into a data warehouse.

Shown below, is the process of extracting the heterogeneous data of the transactional database of the retail outlet, the gasoline station outlet, and the customer records database. These very dissimilar data sources are transformed and loaded into the data warehouse, where a business decision may be made. If need be, the data may be further extracted, and sent on to a corporate or in-store data mart. By sending the data to the data mart, specific aspects of the data may be analyzed, due to data marts being tasked with analyzing a specific product line. This contrasts with the data warehouses, which generally is concerned with the data of the entire enterprise.

To re-iterate, in computing, Extract, Transform and Load (ETL) refers to a process in database usage and especially in data warehousing that:

  • Extracts data from homogeneous or heterogeneous data sources
  • Transforms the data for storing it in proper format or structure for querying and analysis purpose
  • Loads it into the final target (database, more specifically, operational data store, data mart, or data warehouse)

Usually all the three phases execute in parallel since the data extraction takes time, so while the data is being pulled another transformation process executes, processing the already received data and prepares the data for loading and as soon as there is some data ready to be loaded into the target, the data loading kicks off without waiting for the completion of the previous phases.

ETL systems commonly integrate data from multiple applications(systems), typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example a cost accounting system may combine data from payroll, sales and purchasing.