Extract, Transform, and Load

Extract, Transform, and Load (ETL) is the process of taking data from one or more sources and sending it to a different destination. Along the way, the data is converted to fit the destination.

While ETL doesn’t get the attention that many other algorithms do, it is an important process that many businesses need to perform, because data comes in from many sources and has to be processed. It may be a scheduled process that handles data from outside sources daily or even more often, or a one-time data conversion when an organization moves to a different system.

The extraction step may pull data from a single source or from multiple sources. The sources may share a format, or each may use a different one. When the sources are heterogeneous, you must find a way to link their data together, for example through an identifier that they share.
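As a minimal sketch of linking heterogeneous sources, the example below reads customers from CSV text and orders from JSON text, then joins them on a shared ID. The sample data, field names, and function names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical sample data: customers arrive as CSV, orders arrive as JSON.
CUSTOMERS_CSV = "customer_id,name\n1,Ada\n2,Grace\n"
ORDERS_JSON = (
    '[{"order_id": 10, "customer": "1", "total": "19.99"},'
    ' {"order_id": 11, "customer": "2", "total": "5.00"}]'
)


def extract_customers(text):
    """Read CSV text into a dict keyed by customer_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["customer_id"]: row for row in reader}


def extract_orders(text):
    """Read JSON text into a list of order dicts."""
    return json.loads(text)


def link_orders(customers, orders):
    """Attach the customer name to each order using the shared ID."""
    linked = []
    for order in orders:
        # Normalize the key to a string so both sources compare equal.
        customer = customers.get(str(order["customer"]))
        linked.append({
            "order_id": order["order_id"],
            "customer_name": customer["name"] if customer else None,
            "total": order["total"],
        })
    return linked


linked_orders = link_orders(
    extract_customers(CUSTOMERS_CSV), extract_orders(ORDERS_JSON)
)
```

Orders whose customer cannot be found keep a `customer_name` of `None` rather than being dropped, so the mismatch stays visible to later steps.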

However, the data often doesn’t arrive in a format that can be used natively, so it must be transformed. The data types and the order of the fields often need to be changed. The transformation can be as simple as rearranging fields, or it can involve combining or splitting fields, or even converting the data to another format. Often data arrives as strings, or text, that need to be converted to other types.
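A small sketch of a transform step might look like the following. It assumes a hypothetical raw row in which every value is a string, and it shows one field being split, two being combined, and two being converted to typed values. The field names and the date format are assumptions made for the example.

```python
from datetime import datetime


def transform_row(raw):
    """Convert one raw row of strings into a hypothetical destination layout."""
    # Split one field ("Last, First") into two.
    last_name, first_name = [part.strip() for part in raw["name"].split(",", 1)]
    return {
        "first_name": first_name,
        "last_name": last_name,
        # Combine two fields into one.
        "location": f'{raw["city"]}, {raw["state"]}',
        # Convert text into typed values.
        "signup_date": datetime.strptime(raw["signup"], "%m/%d/%Y").date(),
        "balance": float(raw["balance"].replace(",", "")),
    }


row = transform_row({
    "name": "Hopper, Grace",
    "city": "Nashville",
    "state": "TN",
    "signup": "01/14/2020",
    "balance": "1,250.50",
})
```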

Data might come from different sources and from different vendors. It might arrive from a SQL database or as JSON, XML, CSV, or some other format. Your application will need to know how to load and process each one. Luckily, there are tools to help you with the loading and processing.
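One way to handle several formats, sketched here with only the Python standard library, is to keep a table of parsers and pick one by format name. The `extract` function, the `PARSERS` table, and the XML layout are assumptions for this example, not part of any particular ETL tool.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    """Parse JSON text (assumed to be a list of objects)."""
    return json.loads(text)


def parse_xml(text):
    """Parse XML, assuming a layout of <rows><row><field>value</field></row></rows>."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in row} for row in root.findall("row")]


# Map each supported format name to the function that reads it.
PARSERS = {"csv": parse_csv, "json": parse_json, "xml": parse_xml}


def extract(text, fmt):
    """Return a list of row dicts, whatever the source format was."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt}") from None
    return parser(text)
```

Because every parser returns the same shape (a list of dicts), the transform step that follows does not need to know where a row came from.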

Unfortunately, most ETL processes use a brute force methodology, so traditional schools tend to ignore the process. However, the transform step often presents some unique challenges. An operation that is simple for a single record can give you pause when it is run against tens or even hundreds of thousands of rows, let alone larger data sets. Learning how to optimize the process therefore becomes very important.
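As one illustration of the kind of optimization that can matter at this scale, the sketch below indexes a reference table once in a dictionary instead of scanning it again for every row, and it yields rows one at a time so the whole data set never has to sit in memory. This is just one possible technique, and the function and field names are hypothetical.

```python
def build_lookup(reference_rows, key):
    """Index the reference rows once so each later lookup is a single dict access."""
    return {row[key]: row for row in reference_rows}


def add_region(rows, regions_by_state):
    """Yield each row with a region added, one row at a time."""
    for row in rows:
        match = regions_by_state.get(row["state"])
        yield {**row, "region": match["region"] if match else "unknown"}


regions_by_state = build_lookup([{"state": "TN", "region": "South"}], "state")
enriched = list(add_region([{"state": "TN"}, {"state": "ZZ"}], regions_by_state))
```

Scanning the reference list for every incoming row costs time proportional to rows times reference entries, while the dictionary version does roughly one lookup per row after a single pass to build the index.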

The other key element in converting data is knowing that your data is not always going to be correct. Whether data is missing, in an incorrect format, or wrong in some other way, there will always be a need for strong error checking and for a system that can work around those hazards.
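A minimal sketch of this kind of error checking is shown below. It assumes hypothetical `id` and `amount` fields, and it sets bad rows aside with the reasons they failed instead of stopping the whole run.

```python
def validate_row(row):
    """Return a list of problems found in one row (an empty list means it is usable)."""
    problems = []
    if not (row.get("id") or "").strip():
        problems.append("missing id")
    try:
        float(row.get("amount") or "")
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    return problems


def split_valid(rows):
    """Separate usable rows from rejected ones, keeping the reasons for each rejection."""
    good, rejected = [], []
    for line_number, row in enumerate(rows, start=1):
        problems = validate_row(row)
        if problems:
            rejected.append((line_number, row, problems))
        else:
            good.append(row)
    return good, rejected


good, rejected = split_valid([
    {"id": "1", "amount": "9.50"},
    {"id": "", "amount": "abc"},
])
```

Keeping the line number and the list of problems with each rejected row makes it possible to report the bad data back to whoever supplied it.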

The downside is that, because you are usually working to get data into one particular system, each application is going to require its own unique set of processing.

Once the data is converted, you will need to load it into your system. If there are several systems, the data may need to be split into sections so that each part can be loaded into the right one.
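The load step could be sketched as follows, using Python's built-in `sqlite3` module and an in-memory database as a stand-in for the real destination. The table layout and record fields are assumptions for the example. Each record is split across two tables, and both inserts run inside one transaction.

```python
import sqlite3


def load(connection, records):
    """Split each record across two hypothetical tables and load them together."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders"
        " (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    with connection:  # commits on success, rolls back if an insert fails
        # The same customer may appear on many orders, so skip duplicates.
        connection.executemany(
            "INSERT OR IGNORE INTO customers (id, name) VALUES (?, ?)",
            [(r["customer_id"], r["customer_name"]) for r in records],
        )
        connection.executemany(
            "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
            [(r["order_id"], r["customer_id"], r["total"]) for r in records],
        )


connection = sqlite3.connect(":memory:")
load(connection, [
    {"order_id": 10, "customer_id": 1, "customer_name": "Ada", "total": 19.99},
    {"order_id": 11, "customer_id": 1, "customer_name": "Ada", "total": 5.00},
])
```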

Extract, Transform, and Load was originally found on Access 2 Learn

Walter Wimberly

Assistant Professor

