In this blog, we discuss how the “cloud” has become the biggest revolution the Data-Warehousing and Analytics landscape has seen in a long time, and how Snowflake has embraced multi-cloud with today’s announcement of support for the Google Cloud Platform, scheduled for general release later this year.
The term Data-Warehouse has been around in various forms for several decades now, with various software stacks growing up around the provision of information in a palatable way, delivered to many users. The idea is to isolate an organisation’s analytics workloads from the day-to-day processing of business systems such as ERPs, CRMs or streaming systems, unlocking the potential of the data that’s being captured.
The components involved in Data-Warehousing have changed somewhat over that time, and it’s fair to say we live in a far more data-savvy world than ever before, with many different sources of data that we want to analyse, some of which would not have been envisaged all those decades ago. Expectations have changed with this: organisations now look for their business people to be in ever greater control of how this data is used.
Cloud has become a major disruptor of infrastructure and software, and the data explosion has been propelled in large part by the gamble Amazon made on “cloud computing” back in 2006 when it created AWS, and by the subsequent adoption of the same approach by Microsoft with Azure and Google with the Google Cloud Platform. These cloud ecosystems have given us massively scalable, almost infinite processing and storage capabilities, often delivered as services, allowing ecosystems of connected technologies.
Each of the cloud providers gives us many services which do similar things but in different ways; often, in their initial adoption phase, businesses embracing the cloud will choose a main cloud provider and work with that provider’s products.
Increasingly, businesses that adopted early are maturing to a point where having elements from multiple vendors is becoming more important. There are various reasons why they might do this, ranging from using the best services available for given tasks in their organisation, through inheriting different cloud technologies via company acquisitions, to avoiding being locked in to a single cloud vendor.
Clouds have effectively become Business-Centric Operating Systems which are used to house your business processes and technologies.
Each of the cloud providers has at least one relational database system which you could use for your Data-Warehouse – these databases are not necessarily geared towards Data-Warehousing and are often ports of the vendor’s traditional on-premise database solutions. They also have different ways of moving data between systems…
In the same way that some of the best desktop software in use today on non-cloud IT estates is multi-platform (running across operating systems such as Windows, Linux and macOS), giving users the option to implement the solution on their platform of choice, so we should be looking for the major components of our analytical infrastructure to be capable of working across multiple clouds.
This is where Snowflake is way ahead of the competition: architected from the ground up for the cloud, it aligns closely with cloud strategies and can be slotted into an organisation’s architectural landscape easily. The recent announcement that it will soon be available on the Google Cloud Platform, alongside its current availability on AWS and Azure, means that investment made in Snowflake is now portable across all three cloud providers! This is really important, because a Data-Warehouse is a central repository of information that should be source-system independent. Snowflake provides easy-to-use services which allow data to be moved either between instances on the same provider’s cloud, or even between different clouds.
There is much discussion, from an analytics viewpoint, about where data should be stored and how it should be retrieved. One thing is for sure: in the modern era, there’s an awful lot of data floating around that might be useful.
Whilst Data-Lakes are a prime example of storing masses of data in cloud storage and retrieving it via different means, the next question is: what happens if data is being generated across multiple cloud solutions or in different geographies? Do we move it all to a single Data-Lake, or leave it in the different clouds where it already exists?
We believe the answer is that Snowflake is the best place to house or query the data required for at least mainstream analytics activities: Snowflake allows large numbers of analytics users to work in the well-known and well-understood SQL syntax that the Transformation, Business Intelligence, Machine Learning and Artificial Intelligence tool-sets available today already speak.
This is a much more manageable approach than using each vendor’s database solutions, or learning their differing storage-query technologies, as Snowflake effectively abstracts the underlying cloud from the users. Only one technology needs to be understood for your analytics and, better still, it’s a very low-maintenance solution compared to many traditional Data-Warehouse platforms with their heavy configuration and ongoing maintenance requirements.
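To give a flavour of that single-technology point, querying Snowflake looks just like querying any other SQL database. The sketch below uses the snowflake-connector-python package; the account, credentials, warehouse and table names are placeholders you would substitute with your own.

```python
# A minimal sketch using the snowflake-connector-python package.
# All connection details and object names below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",   # hypothetical account identifier
    user="ANALYST_USER",
    password="********",
    warehouse="ANALYTICS_WH",
    database="SALES_DB",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Plain, well-understood SQL - the same syntax your BI / ML tooling speaks.
    cur.execute(
        """
        SELECT region, SUM(order_value) AS total_sales
        FROM orders
        GROUP BY region
        ORDER BY total_sales DESC
        """
    )
    for region, total_sales in cur:
        print(region, total_sales)
finally:
    conn.close()
```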
With this week’s announcement of Global Account Management for Snowflake, multiple instances can be utilised that sit close to the original sources of data, whether across different regions on one cloud, or across multiple clouds and geographies.
The “satellite” Snowflake instances can then feed into a “hub” Snowflake instance on a cloud of your choice if central consolidation is required. With the new features announced this week, including database replication across instances, moving data between the clouds you want Snowflake to run on is just a matter of replication (and pointing any data-movement jobs at the new instance). No other work is required from a downstream-systems perspective except perhaps a little reconfiguration, making centralisation of data even easier!
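As a rough sketch of how that replication might be driven (the organisation, account and database names here are invented, and exact entitlements depend on your Snowflake edition and roles): replication is enabled from the “satellite” account, a replica is created on the “hub” account, and the replica is refreshed whenever you want it brought up to date.

```python
# Sketch only: the org / account / database names are invented, and the
# statements must be run from a suitably privileged role on each account.
import snowflake.connector

# On the "satellite" (primary) account: allow the hub account to replicate the database.
primary = snowflake.connector.connect(
    account="myorg-satellite_eu", user="ADMIN_USER", password="********"
)
primary.cursor().execute(
    "ALTER DATABASE sales_db ENABLE REPLICATION TO ACCOUNTS myorg.hub_account"
)
primary.close()

# On the "hub" (secondary) account: create the replica, then refresh it on demand.
hub = snowflake.connector.connect(
    account="myorg-hub_account", user="ADMIN_USER", password="********"
)
cur = hub.cursor()
cur.execute("CREATE DATABASE sales_db AS REPLICA OF myorg.satellite_eu.sales_db")
cur.execute("ALTER DATABASE sales_db REFRESH")  # re-run (or schedule) to pull new changes
hub.close()
```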
For organisations that don’t have a Data-Lake solution already, you could look at using Snowflake itself: it can serve as both a central Data-Lake and a central Data-Warehouse. With its native support for semi-structured data and its built-in governance and security, it’s a very appealing option.
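To give a flavour of that semi-structured support (the table and JSON fields below are made-up examples), JSON documents can be loaded straight into a VARIANT column and queried with ordinary SQL plus Snowflake’s path notation, without defining a schema up front.

```python
# Sketch of Snowflake's semi-structured (VARIANT) support via the Python
# connector; "raw_events" and the JSON fields are hypothetical examples.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account", user="ANALYST_USER", password="********",
    warehouse="ANALYTICS_WH", database="LAKE_DB", schema="PUBLIC",
)
cur = conn.cursor()

# A single VARIANT column can hold arbitrary JSON documents.
cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT)")
cur.execute(
    """
    INSERT INTO raw_events
    SELECT PARSE_JSON('{"user": {"id": 42, "name": "Alice"}, "event": "login"}')
    """
)

# Query JSON attributes with path notation and cast them to SQL types.
cur.execute(
    """
    SELECT payload:user.name::string AS user_name,
           payload:event::string     AS event_type
    FROM raw_events
    """
)
print(cur.fetchall())
conn.close()
```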
The last benefit we’ll mention in this short blog is that the ability to replicate and manage Snowflake instances as a centrally managed service across multiple clouds gives you additional redundancy and fail-over options between them. Outages on one cloud can be circumvented with instances on another; such outages are rare, but not unknown!
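As a final, hedged sketch (the account names are hypothetical, and cross-cloud database fail-over is only available on the appropriate Snowflake edition), promoting a replica in another cloud is again just a couple of SQL statements:

```python
# Hypothetical fail-over sketch: enable fail-over to a secondary account,
# then promote that account's replica if the primary cloud has an outage.
import snowflake.connector

# On the current primary account (e.g. running on cloud A):
primary = snowflake.connector.connect(
    account="myorg-hub_account", user="ADMIN_USER", password="********"
)
primary.cursor().execute(
    "ALTER DATABASE sales_db ENABLE FAILOVER TO ACCOUNTS myorg.dr_account"
)
primary.close()

# On the secondary account (e.g. running on cloud B), when fail-over is needed:
secondary = snowflake.connector.connect(
    account="myorg-dr_account", user="ADMIN_USER", password="********"
)
secondary.cursor().execute("ALTER DATABASE sales_db PRIMARY")  # promote the replica
secondary.close()
```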
To find out more, talk to us at Leading Edge IT!
©Leading Edge IT, all rights reserved, this material may be referenced freely but in its entirety and must not be reproduced without prior consent.
#leit | #leading edge it | #snowflake | #datawarehouse | #gcp | #aws | #azure | #hub | #spoke |