#280 Enabling Your Domains to Create Maintainable Data Products - Interview w/ Alexandra Diem, PhD
Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn.
Transcript for this episode (link) provided by Starburst. You can download their Data Products for Dummies e-book (info-gated) here and their Data Mesh for Dummies e-book (info-gated) here.
Alexandra's LinkedIn: https://www.linkedin.com/in/dralexdiem/
In this episode, Scott interviewed Alexandra Diem, PhD, Head of Cloud Analytics and MLOps at Norwegian insurance company Gjensidige.
Gjensidige's approach closely aligns with data mesh, but they are starting with a focus on consumer-aligned data products: they have a well-functioning data warehouse and are not looking to replace what isn't broken.
Some key takeaways/thoughts from Alexandra's point of view:
- Advice to her past data mesh self: stop talking to people about data mesh; talk about the changes to the way of working. It can be very tiresome to try to explain data mesh instead of those changes. Data mesh isn't the point.
- There isn't really any reason we can't apply many software engineering best practices to data; we simply haven't done so broadly in the data world.
- There is a push and pull between software best practices and data understanding. Consider which you see as more important and when: do you bring data understanding to software engineers, or software best practices to those with data understanding?
- When you leverage pair programming between enablement software engineers and data analysts who understand the domain, the software engineers learn more about data and the domain, and the analysts learn good software engineering/product practices. It's a win-win.
- The people you enable to do work in a data mesh way should serve as ambassadors of your ways of working, especially within the domain, both helping others learn and acting as champions. That provides organizational scale; you can't individually enable every person in a large company.
- "Too many cooks spoil the broth." Think about having that 'two pizza team' kind of approach so you have concentrated understanding by those involved in creating data products who then can again help others learn. This is good for those in the domain and also for an enablement team bringing learnings back to a platform team.
- Having a team with intimate knowledge of what data products/data product features have been built can speed time to market for other teams and improve reuse. Each time they sit with a new team, that team has far greater access to what's been built before, whether that is existing data sources, existing models or transformations, output ports, etc.
- ?Controversial?: With a central enablement team, your job is more to teach the domains how to do the work and get them to minimum viable data products. Otherwise, central enablement just isn't scalable in a large organization.
- ?Controversial?: A perfectly filled data catalog still won't connect all the dots for consumers. Yes, good documentation is important, but there is still significant value in helping people connect the dots. Scott note: this shouldn't be controversial, but it is. It's also my 'data sherpa' pattern emerging yet again as highly valuable.
- If you can, make sure you have a shielding and prioritization mechanism for any central team, or you risk heading back into the overloaded-central-data-team-as-bottleneck pattern/challenge.
- As with anyone in the organization, your ultimate role is value generation. Consider how the data teams do that. For an enabling team, it's helping teams do data work quicker and better. Those teams don't care that you're doing that via data mesh.
- Relatedly, terms like data product and self-service platform resonate far more than data mesh. Lean into what generates value, not the implementation details behind the scenes. Potentially read The Lean Startup to dig deeper into this philosophy.
- Data reuse is not actually that obvious a concept to many, probably because it has historically meant so much cleaning and manual work on poorly owned data sources or processes. Train your domain teams to look for places to reuse what has come before them.
- !Controversial!: Potentially look to build out your data products from a source of already clean data. That may be an existing data warehouse or something centrally managed. Scott note: this is a data mesh end-state anti-pattern but is it an anti-pattern when in transition? If something isn't broken, do you need to 'fix' it?
- Relatedly, "[I] don't really see the point of having to destroy value before I should be able to generate new value. I can very happily just generate value on top of the value that I already have."
- Hypothesis testing and fast fail are great software engineering practices, but in data 1) it can be hard to hypothesis-test value and 2) it can be quite hard culturally to get people to learn to iterate and embrace fast fail.
Alexandra started with a little about her background coming from academia into the commercial world and how that shaped her views. She was the first data scientist at a company, so before she could really do data science, she essentially had to work as a software engineer; that meant learning many good software engineering practices. When she moved to data, she thought 'we should use these practices here too,' and then she came across Zhamak's posts on Martin Fowler's site and it all started to click.
Specifically at Gjensidige, Alexandra was brought in to lead a team of software engineers acting as an enablement team plus a platform team. Their role was to bring good software practices (e.g., DevOps, automation, testing) to the data/business analysts to help them build data products. It has evolved to be more sophisticated, but the team is still about enabling people to build data products.
Gjensidige already had embedded analyst teams in many of their domains, so when Alexandra and team started to roll out the data mesh implementation, there were already a number of data-capable folks who understood the actual business aspects of the domains. That meant there wasn't the typical pushback on the domain owning its data products; it was more about enabling them to do so and to build a maintainable and scalable product. The pair programming between her team of software engineers and the domain data experts means her team becomes more and more data fluent while the domain learns how to write good software. They specifically use a model of two of her engineers and two of the analysts in each data product creation team, which provides enough information exchange without too much overhead. That intimate understanding of what has been created also helps her team find reuse opportunities in other domains: they deeply understand what has been built and can direct teams towards it quickly, speeding time to market for the new team as well. Lots of wins all around!
When looking at the central enablement team's strategy, Alexandra strongly believes in a minimum viable data product approach. Her team has only a handful of people and 25 analyst teams to work with. The team has to focus on getting each analyst team capable via its first data product (again, working with only two analysts per team) and then letting those two analysts propagate the knowledge to the rest of their own team. Otherwise, the central team would be too overloaded. So again, the focus is on teaching the analyst teams how to build good data products and then moving on. Otherwise the central team just isn't scalable, or it grows so large that it becomes far harder to find patterns and share information. The domains have to deliver value themselves, so teaching them to do so and then moving on is a sustainable strategy.
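As a purely illustrative aside (not something described in the episode), a 'minimum viable data product' often amounts to little more than a named owner, an output port, and a couple of lightweight quality checks the owning analysts can maintain themselves. Here is a minimal sketch in Python; all names and fields are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

import pandas as pd  # assumed dependency for tabular output ports


@dataclass
class MinimumViableDataProduct:
    """Hypothetical descriptor for a domain team's first, minimal data product."""
    name: str
    owning_domain: str
    description: str
    # The output port: any callable that returns the product's data.
    output_port: Callable[[], pd.DataFrame]
    # Lightweight quality checks, owned and evolved by the domain analysts.
    quality_checks: list = field(default_factory=list)

    def publish(self) -> pd.DataFrame:
        """Run the quality checks, then serve the data to consumers."""
        df = self.output_port()
        for check in self.quality_checks:
            assert check(df), f"Quality check failed for {self.name}"
        return df


# Hypothetical usage: a claims domain publishing its first product.
product = MinimumViableDataProduct(
    name="monthly_claims_summary",
    owning_domain="claims",
    description="Claims counts and payouts aggregated per month.",
    output_port=lambda: pd.DataFrame(
        {"month": ["2023-11"], "claims": [120], "payout_nok": [1_450_000]}
    ),
    quality_checks=[lambda df: not df.empty, lambda df: (df["claims"] >= 0).all()],
)
print(product.publish())
```

The point of keeping the descriptor this small is the same as Alexandra's: the central team can get a two-analyst pair to this level quickly and then move on, letting the domain grow the product from there.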
When communicating with the rest of the organization, Alexandra rarely uses the term data mesh. She points to data product and self-service platform as things that resonate with people and help communicate what she's actually focused on doing: generating value. Most people don't care that the way you are generating value is data mesh. It's simply a mechanism. 'Lean' into that. Scott note: lean is a bad pun here because she mentioned how helpful The Lean Startup is to focusing on value generation.
One very interesting note from Alexandra was about training the domains in reusing data. Historically, it has been very difficult to reuse data because you didn't have the information about how it was created and didn't really have a reliable source. Getting the spreadsheet from a colleague each month isn't that reliable 😅. So you will likely need to train your domains on reuse, especially on finding sources to reuse and assessing whether something fits their purpose. That can include the producing team too: teaching them how to share what they've built with other parties that might want their data.
Alexandra noted that most of the data products they are building use an existing, clean, and well-understood data source: the cloud data warehouse. They are leveraging a hub-and-spoke pattern from that warehouse for their products. People already know and trust the warehouse, so it made sense to them to start there. Essentially, everything ends up as a consumer-aligned data product in a sense. Relatedly, Alexandra and team don't see a need to adhere to every aspect of data mesh, especially at the start of their journey. She said, "[I] don't really see the point of having to destroy value before I should be able to generate new value. I can very happily just generate value on top of the value that I already have." They had some things that were working well already, and breaking it all down to fit the paradigm didn't make sense to them. However, she is aware of the additional challenges this can bring and made the conscious trade-off.
Scott note: this is an obvious data mesh anti-pattern because the upstream isn't directly from source systems - the teams building the data products don't control the source or their source-aligned data products. But if you don’t have an existing bottleneck from your cloud data warehouse, why fix something that isn't broken? This may become a bigger challenge later - Zhamak has written why not owning source data creates challenges - but if they are willing to take the tradeoff and understand those tradeoffs, is it a bad approach? I don't think so _in their case_ because the data warehouse is functioning well/isn't a bottleneck.
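To make the hub-and-spoke idea concrete, here is a minimal, illustrative sketch (not Gjensidige's actual stack): the 'hub' is an already-clean warehouse table, and a consumer-aligned data product is a governed view derived from it rather than from raw source systems. All table and column names are hypothetical, and an in-memory SQLite database stands in for the cloud data warehouse:

```python
import sqlite3

# In-memory stand-in for the existing, trusted cloud data warehouse (the hub).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE warehouse_policies (
           policy_id TEXT, domain TEXT, premium_nok REAL, start_date TEXT
       )"""
)
conn.executemany(
    "INSERT INTO warehouse_policies VALUES (?, ?, ?, ?)",
    [
        ("P-1", "motor", 5200.0, "2023-01-15"),
        ("P-2", "motor", 4800.0, "2023-02-03"),
        ("P-3", "property", 9100.0, "2023-02-20"),
    ],
)

# A consumer-aligned data product as a spoke: a named, domain-owned view
# built on top of the clean warehouse data.
conn.execute(
    """CREATE VIEW dp_motor_premium_by_month AS
       SELECT substr(start_date, 1, 7) AS month,
              SUM(premium_nok)        AS total_premium_nok
       FROM warehouse_policies
       WHERE domain = 'motor'
       GROUP BY month"""
)

for row in conn.execute("SELECT * FROM dp_motor_premium_by_month"):
    print(row)  # e.g. ('2023-01', 5200.0)
```

The trade-off Scott describes lives in that `FROM` clause: the product's upstream is the warehouse, not a source-aligned data product, which is simpler today but concentrates risk in the central warehouse pipeline.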
Alexandra talked about how to really embrace a culture around 'minimum viable x' in data. In data science, at least, there is a good understanding of hypothesis testing, but even then, it's often hard to embrace the 'fast fail' model touted by things like The Lean Startup. Figuring out how to hypothesis-test value is also difficult, and people have historically seen the challenges in iterating on anything data related. So there is a learning curve, but also generally a necessary cultural change, to embrace hypothesis testing and fast fail around data.
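As a toy illustration of hypothesis-testing value (my example, not from the episode): suppose a data product is meant to lift some conversion metric. A fast-fail check might compare the metric with and without the product over a short window and abandon the iteration if no effect shows up. All numbers below are made up, and scipy is an assumed dependency:

```python
import random
from statistics import mean

from scipy.stats import ttest_ind  # two-sample t-test

random.seed(42)

# Hypothetical daily conversion rates: baseline vs. teams using the new data product.
baseline = [random.gauss(0.050, 0.005) for _ in range(30)]
with_product = [random.gauss(0.053, 0.005) for _ in range(30)]

t_stat, p_value = ttest_ind(with_product, baseline)
print(f"baseline mean={mean(baseline):.4f}, with product mean={mean(with_product):.4f}")
print(f"p-value={p_value:.4f}")

# Fast fail: if a short experiment can't show an effect, stop iterating on
# this hypothesis and reformulate rather than polishing the product further.
if p_value < 0.05 and mean(with_product) > mean(baseline):
    print("Evidence of value: keep investing in this data product.")
else:
    print("No detectable value yet: fail fast and try a different hypothesis.")
```

The mechanics are the easy part; as Alexandra notes, the harder part is the cultural shift to treating a null result as a cheap, useful outcome rather than a failure.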
On advice to her past 'data mesh self', Alexandra gave a reasonably common response, circling back to an earlier point: stop talking about data mesh, at least early in the process. Data mesh is a set of guiding principles, not the answer. Talk to people about changes to their ways of working and target outcomes: why are we taking on change? People hear data mesh and expect it to be some technology or technological approach. You can use the name when people ask what you're calling the approach, but selling it as doing data mesh doesn't help your business partners get it. Focusing specifically on data mesh, instead of on how things change and what matters to them, is a much more tiresome approach.
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf