Pinterest Report June 2023 | Page 12

A next-generation data warehouse

One of a number of changes made to Pinterest’s data systems is the building of a next-generation data warehouse and the transition to a Data Mesh, an emerging approach to data architecture that aims to address the challenges of managing large and complex data environments. The concept was first introduced in 2019 by Zhamak Dehghani, a software architect at ThoughtWorks.
“At a high level, Data Mesh is a decentralised data architecture that emphasises data ownership and autonomy,” Burgess explains. “Rather than having a central data team manage all the data for an organisation, Data Mesh encourages each business unit or team to take ownership of their own data domains, managing their data in a way that is best suited to their needs.”
This approach involves breaking data down into smaller, more manageable domains that can be owned and managed by individual teams. Each team is responsible for the data within its domain, including defining the schema, ensuring data quality, and providing access to other teams that need to use the data. To enable collaboration and sharing across domains, Pinterest keeps a catalogue of schemas and metadata in Apache DataHub, has standardised its data vocabularies and metrics, has tiered the quality of its data, and has integrated its open-source Querybook platform so that teams can collaborate on and share SQL queries.
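The ownership model described above can be pictured with a small sketch. It is an illustration only, not Pinterest's implementation and not the DataHub API: the class names, the tier labels, and the example dataset are all hypothetical, chosen to show how a shared catalogue might record which team owns a dataset, what its schema is, and which quality tier it belongs to.

```python
from dataclasses import dataclass, field
from enum import Enum


class QualityTier(Enum):
    # Hypothetical tier labels; the article only says data quality is tiered.
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


@dataclass
class Dataset:
    """One dataset published by a domain team (illustrative fields only)."""
    name: str
    domain: str
    owner_team: str
    schema: dict  # column name -> type name, defined by the owning team
    tier: QualityTier


@dataclass
class Catalogue:
    """A toy in-memory stand-in for a shared schema and metadata catalogue."""
    _entries: dict = field(default_factory=dict)

    def register(self, dataset: Dataset) -> None:
        # Each (domain, name) pair has exactly one owning entry.
        key = (dataset.domain, dataset.name)
        if key in self._entries:
            raise ValueError(f"{dataset.domain}.{dataset.name} is already registered")
        self._entries[key] = dataset

    def find(self, domain: str, name: str) -> Dataset:
        # Other teams discover a dataset, its schema and its owner here.
        return self._entries[(domain, name)]

    def by_domain(self, domain: str) -> list:
        return [d for d in self._entries.values() if d.domain == domain]


catalogue = Catalogue()
catalogue.register(
    Dataset(
        name="daily_clicks",            # hypothetical dataset
        domain="ads",                   # hypothetical domain
        owner_team="ads-data",          # hypothetical team
        schema={"ad_id": "bigint", "clicks": "bigint", "day": "date"},
        tier=QualityTier.TIER_1,
    )
)
entry = catalogue.find("ads", "daily_clicks")
print(entry.owner_team, entry.tier.name)
```

The point of the sketch is that responsibility stays with the domain team while discovery is centralised: consumers look a dataset up in the catalogue rather than asking a central data team.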
“Querybook is an open-source data collaboration platform developed by Pinterest,” Burgess explains. “It has a user-friendly interface for data analysts and engineers to collaborate on data analysis