Transitioning from a legacy analytics platform to a modern insights architecture

It's all about the Data

It’s all very well having great ideas, well-defined business outcomes, and the latest cloud services, products and tooling from the modern data stack. However, if you can’t describe your data and articulate its business priority and business value, it is hard to capitalise on making that data available.
Add to that the initial challenges of getting at the data for modelling, sourcing it, and dealing with timeliness issues, and the transition to a new architecture often becomes difficult.
 
This makes it harder to shape the direction of a programme, manage stakeholders, deliver against timelines and expectations, and stay within budget. Ultimately it means unhappy data scientists, analytics users and shareholders, and it can lead to the failure of a data platform.
 
How do you go about tackling the challenges to unlock the value in your data and enable the creation of great data products to service data science and analytical needs? The problems aren’t new and have been resolved in countless programmes.

There are formal methodologies that explain how to perform a data migration (check DAMA). However, this series of points, based on my couple of decades of experience, may help accelerate your journey:

 

Business Perspective

  • Understand the business outcome to achieve
  • Understand the value associated, whether financial or as an enabler for the rest of your programme
  • Agree and prioritise the business outcome with business stakeholders
  • Invest time in data architecture and building out a glossary/data dictionary that provides a common vocabulary that the business understands. It can be mapped to technical implementations when required and doesn’t have to be performed upfront for the entire organisation before starting. I’d recommend doing it for the domain where the business outcome is a priority
  • Understand your data: the “golden source” of the data and the agreed published source that consumers should use, including as a source for the new data platform
  • Think about any reconciliation that might be required and how we will know we’ve succeeded in servicing business outcomes from the modern data platform, i.e. what testing is needed and what is the definition of done?
  • Understand whether the legacy analytics platform allowed adjustments. If so, at what point in the flow from the mastering of data to consumption were they made? Consider how adjustments were performed: what could cause us a problem in ensuring our figures are accurate and getting our values signed off?
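
As an illustration of the reconciliation point above, here is a minimal sketch in Python. It assumes we can extract comparable totals (for example, monthly revenue) from both the legacy platform and the new platform. The function name, the tolerance and the figures are all hypothetical and not taken from any real programme.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Difference:
    """One reconciliation break between the legacy and the new platform."""
    key: str
    legacy: Optional[Decimal]   # None means the key is missing in legacy
    modern: Optional[Decimal]   # None means the key is missing in the new platform


def reconcile(
    legacy: Dict[str, Decimal],
    modern: Dict[str, Decimal],
    tolerance: Decimal = Decimal("0.01"),  # assumed acceptable rounding difference
) -> List[Difference]:
    """Return every key that is missing on one side or differs by more than the tolerance."""
    differences = []
    for key in sorted(legacy.keys() | modern.keys()):
        old, new = legacy.get(key), modern.get(key)
        if old is None or new is None or abs(old - new) > tolerance:
            differences.append(Difference(key, old, new))
    return differences


# Hypothetical monthly totals extracted from each platform.
legacy_totals = {
    "2023-01": Decimal("1050.00"),
    "2023-02": Decimal("980.50"),
    "2023-03": Decimal("1200.00"),
}
modern_totals = {
    "2023-01": Decimal("1050.00"),
    "2023-02": Decimal("975.50"),
    "2023-04": Decimal("50.00"),
}

breaks = reconcile(legacy_totals, modern_totals)
for d in breaks:
    print(d.key, d.legacy, d.modern)
```

A break such as the February difference is exactly where an undocumented manual adjustment in the legacy platform might show up, which is why it helps to know where adjustments were made before sign-off.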
 

Technical Delivery for the Organisation

  • Agree principles you can measure against to verify your solution
  • Create a view of the architecture based on capabilities (including how we’ll deal with Security, PII, data quality, lineage, audit and logging) – this becomes our map and compass heading. Depending on whether you’re delivering agile, there are different points to do this – or you can pick a reference architecture off the shelf
  • Try to keep the solution simple; there’s a question over the trade-off between an excellent solution and getting data to the business within timelines. This data could enable them to make decisions that might keep the organisation competitive in an aggressive market
  • Define reusable patterns allowing the capabilities to be realised, including for ingestion pipelines. As part of this, you will be able to agree and build out the technical reference model
  • We typically work with highly skilled engineers who have different outlooks from client site to client site. It is essential that their views are addressed and that the engineers can influence the technical reference model
  • The organisation will also want to run its new platform in future with skills it can support; the result should be an architecture that works for the organisation
  • Think about what patterns we might have if we want to get data out of the platform. Ideally, we wouldn’t need to move the data but would take processing to where the data resides and return a reduced result set with the required answers
  • Perform any PoCs or PoVs needed, starting small and breaking up any work into manageable chunks. One approach is to deliver a single subject area from the source to its consumption point via a presentation layer; by doing this, we get to prove out and verify the capabilities
  • Make data available as soon as possible, e.g. to analytics users and data scientists. This might include modelling out a presentation layer structure depending on the consumer to verify that we’re going to build the right thing to meet the requirement
  • If successful, we can scale this model out across more teams. The approach can be driven from the data model (and any other artefact used for planning, e.g. the bus matrix if we’ve created one)
  • Try to agree and minimise the number of points where business rules are applied (and therefore where transformations/calculations are implemented)
  • Ensure you can get at all technical metadata – you’re going to need this to help explain what’s happened
  • Get the business involved as early as possible, providing feedback on data that is being made available via the new platform
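
To make the points about business rules and technical metadata more concrete, the following Python sketch shows one possible shape: every business rule is registered in a single pipeline, and each run records simple metadata (rule name, rows in, rows out). All names here (RulePipeline, RuleRun, the two example rules and the order fields) are hypothetical and for illustration only. On a real platform this would normally live in your transformation tooling rather than in hand-written code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[List[Row]], List[Row]]


@dataclass
class RuleRun:
    """Technical metadata captured each time a rule is applied."""
    rule: str
    rows_in: int
    rows_out: int


@dataclass
class RulePipeline:
    """Single place where business rules are registered and applied, in order."""
    rules: List[Rule] = field(default_factory=list)
    audit: List[RuleRun] = field(default_factory=list)

    def rule(self, fn: Rule) -> Rule:
        # Used as a decorator: registers the rule and returns it unchanged.
        self.rules.append(fn)
        return fn

    def apply(self, rows: List[Row]) -> List[Row]:
        for fn in self.rules:
            result = fn(rows)
            self.audit.append(RuleRun(fn.__name__, len(rows), len(result)))
            rows = result
        return rows


pipeline = RulePipeline()


@pipeline.rule
def drop_cancelled_orders(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["status"] != "cancelled"]


@pipeline.rule
def add_net_amount(rows: List[Row]) -> List[Row]:
    return [{**r, "net": r["gross"] - r["tax"]} for r in rows]


# Hypothetical source rows.
orders = [
    {"id": 1, "status": "complete", "gross": 120, "tax": 20},
    {"id": 2, "status": "cancelled", "gross": 60, "tax": 10},
]

presented = pipeline.apply(orders)
for run in pipeline.audit:
    print(run.rule, run.rows_in, run.rows_out)
```

Because every rule passes through one apply step, the audit list can explain why a row count or value changed between source and presentation layer, which is the kind of question the business tends to ask during sign-off.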
 
The Most Important Part
 
Most importantly, ensure you’re communicating frequently with the entire team, and take the time to listen to challenges on anything and everything. This includes what I’ve written! There are always smarter, more efficient ways to do things, but this approach should get you thinking, whether or not it fits 100%.
Why is all of the above important? We love data, analytics, software, and enabling the business to do smart things. It’s frustrating to hear of implemented solutions failing, or of programmes not capitalising on rich data sets, when it is relatively simple to do if you know what you’re doing.
 
Contact Leading Edge IT to see how we can help you in your Data Journey to the Clouds.

©Leading Edge IT, all rights reserved, this material may be referenced freely but in its entirety and must not be reproduced without prior consent.