In this blog, we’ll cover how replication fits into an overall architecture for information and analytics delivery.
The pre-cloud generation of transformation tools typically used ETL (Extract, Transform and Load) to apply business rules and transform data from source systems into analytical datasets, which then lived in the Data Warehouse. ETL worked very well in pre-cloud environments where multiple systems sat on the same local network, typically with fast connections within the same data centre.
Things have changed since the dawn of the Data Warehouse. Where once we were dealing with typically user-driven content generation from Customer Relationship Management (CRM) or Enterprise Management Systems (EMS), there has since been an explosion of additional data sources and data points that simply didn’t exist when Data Warehouses first started to appear in organisations’ architectures.
HVR delivers the replication part using a very lightweight pull from source systems: changes are captured from transaction logs (CDC), so the source system is barely impacted and issues like data locking are avoided, because the data is read from the log capture rather than from the live tables. This makes it very feasible to replicate data continuously into Snowflake, where it can then be transformed by ELT, in this case using Matillion.
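To make the log-based CDC idea concrete, here is a minimal sketch of how a stream of change events from a transaction log can be applied to a replicated target table. This is an illustration of the general technique only, not HVR’s actual implementation: the event shape, field names, and in-memory target are all hypothetical.

```python
# Minimal sketch of log-based change data capture (CDC) apply logic.
# Hypothetical event format; not HVR's actual implementation.

target = {}  # replicated copy of a source table, keyed by primary key

def apply_change(event):
    """Apply one change event read from the source's transaction log.

    Reading changes from the log, rather than querying the live tables,
    is what keeps the impact on the source system minimal and avoids
    data locking.
    """
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        target[key] = event["row"]
    elif op == "delete":
        target.pop(key, None)

# A few change events as they might appear in a capture stream:
log = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
    {"op": "delete", "key": 2},
]

for event in log:
    apply_change(event)

print(target)  # only the net effect of the change stream remains
```

Replaying the events in order leaves the target holding just the final state of each row, which is exactly why continuous log capture keeps a warehouse copy in sync without re-extracting whole tables.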