Let’s go back to the beginning: the birth of ETL tools in the 1990s, when data warehouses became mainstream. Data had to be extracted from source systems, transformed, and then loaded into a database to support analysis and reporting.
The first generation of ETL tools included Oracle Warehouse Builder (OWB), IBM DataStage, Cognos DecisionStream (which became IBM Data Manager), Informatica and Ab Initio.
I believe the first ELT tool with heterogeneous capabilities was Sunopsis, which Oracle acquired in 2006 and renamed Oracle Data Integrator (ODI). Its rise coincided with more powerful data warehouse appliance databases such as Oracle Exadata.
With the ever-increasing volume, variety and velocity of data, it made sense to move the compute to the data. Hadoop promised scalable compute alongside the ability to address large storage pools, so it became the go-to platform for “Big Data” projects.
ELT is back with a vengeance. What’s changed? The next generation of ELT tools, such as Matillion, is built for cloud-based data warehouses like Snowflake, Google BigQuery and Amazon Redshift. These tools remove the legacy need to crunch data in the application layer, instead fully utilising the underlying cloud database to carry the computational load. Because the data never has to leave the database, the latency of moving it back and forth is eliminated.
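To make the ELT pattern concrete, here is a minimal Python sketch. It assumes Python’s built-in sqlite3 in-memory database as a stand-in for a cloud warehouse such as Snowflake; the function elt_pipeline and the tables raw_orders and sales_by_region are hypothetical names chosen for illustration. Raw rows are loaded untouched, then the transformation runs as SQL inside the database, so no rows are pulled back into the application layer to be crunched. It illustrates the general idea only and is not how Matillion or any particular warehouse implements it.

```python
import sqlite3


def elt_pipeline(rows):
    """Load raw rows as-is, then transform them inside the database with SQL.

    Hypothetical example: sqlite3 stands in for a cloud data warehouse.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Extract + Load: land the source data untouched in a staging table.
        conn.execute(
            "CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount_pence INTEGER)"
        )
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

        # Transform: the database engine cleans and aggregates the data.
        # The application only issues SQL; it never iterates over raw rows.
        conn.execute(
            """
            CREATE TABLE sales_by_region AS
            SELECT UPPER(region) AS region,
                   SUM(amount_pence) / 100.0 AS total_amount
            FROM raw_orders
            GROUP BY UPPER(region)
            """
        )

        # Only the small, already-transformed result leaves the database.
        return conn.execute(
            "SELECT region, total_amount FROM sales_by_region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()
```

For example, elt_pipeline([(1, "uk", 1250), (2, "UK", 750), (3, "us", 1000)]) returns [('UK', 20.0), ('US', 10.0)]. Against a real cloud warehouse the transformation step would be the same kind of SQL, generated and orchestrated by the ELT tool and executed by the warehouse’s own compute.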
Another capability that complements your data warehouse is streaming data for real-time analytics, combining data sources so your business can make decisions quickly, whether to drive efficiencies or to boost sales through real-time marketing. You can read more about real-time replication and how it works with your ELT tool in our blog “Replicate your way to fast results”. HVR is a cost-effective way to replicate your data, blending replication with ELT in an approach we have coined RLT: Replicate, Load, Transform.
Data lineage and governance is a huge subject, worthy of its own blog. But without lineage, rich metadata, and audit and log information, governance for regulations such as GDPR becomes an even greater challenge. Tools like Matillion make that information easy to get: you can navigate the metadata within the tool, browse the HTML documentation generated for your project, or use the API to connect data governance tools such as Collibra.
With the rise of DataOps, easy deployment of changes using tools like Git is essential, and it is supported by ELT tools such as Matillion. Gone are the days of quarterly or monthly releases to the data warehouse. Agile working demands build and deployment pipelines and regression testing, so that regular changes can accommodate changes in source systems, new data sources or new data marts while preserving the integrity of existing data. Snowflake’s cloning capability makes this easy, something we cover in another blog: “How is Snowflake so much better for development cycles than traditional databases used for Data Warehousing”. We plan to cover more on this in a future blog.
So how do you get to this new world? We have covered this in our blog “Transitioning from a legacy analytics platform to a modern insights architecture”, and we at Leading Edge IT will be happy to discuss it with you.
In the era of cloud-based data platforms like Snowflake, it makes sense to use the scalable compute they provide. But if you go the hand-scripted route, you will miss out on the benefits of an ELT tool like Matillion, and the total cost of ownership (TCO) of your platform will be higher in the long run. You can now get all the benefits the original ETL tools provided, at a fraction of the price, with a cloud-based ELT tool like Matillion. The hand-coding vs ETL debate has been around since the early 2000s and lost some focus during the Hadoop “bubble”. But with HVR for replication, Matillion for ELT and Snowflake for your data warehouse, you get multi-user and multi-environment support, flexible scaling, and the ability to automate and reuse templates with shared components. There should be no reason to hand-code a complete solution.
Why does all of the above matter? We love data, analytics and software, and enabling businesses to do smart things. It’s frustrating to hear of failing solutions, or of programmes not capitalising on rich data sets when doing so is relatively simple, if you know what you’re doing. You don’t need the budget of Facebook® to make your business data-driven: talk to us about our Data Analytics as a Service (DAaaS) or our Accelerator Templated Solutions.