
(whitepaper) A universal data model for building personalised applications

Building a complete data warehouse that lets both business analysts and data scientists discover insights and deliver a fully personalised user experience is an ambitious goal, and it often requires more capacity than a start-up or any fast-paced environment can offer. Between building derived fields and maintaining legacy trackers, a data team's effort therefore concentrates on finding the architecture that best trades off data-user experience against maintenance costs. In this article, we present a minimalistic data model describing the key interactions between users and products, which serves as the single source of truth for both standard reporting and the creation of data-driven products. From targeted acquisition to conversion and re-engagement, meet the hacks that will shift the user journey with minimal engineering resources.


The challenge lies in building the simplest yet most comprehensive data model that serves the three main types of data mining activity: static reporting, ad-hoc analysis and data products. A replicable data model also helps establish a standard across the company or the industry, which eases future data exchanges between parties.



At a high level, most online businesses can be described as people (users) interacting with goods or services (products). Typically, product details are stored in an ERP or similar backend system, together with static user data. Interactions are recorded via event trackers such as Google Analytics, Mixpanel or Snowplow Analytics. The ultimate goal is to centralise and sanitise these sources in order to obtain comprehensive datasets for future analysis.

Keeping the interaction table as atomic as possible is key: two foreign keys to the product and user dimensions, and perhaps a category and subcategory for classifying interactions. As for products and users, go wild! The more attributes, the better, as long as they are mostly immutable.
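To make the shape of the model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the column names and the occurred_at timestamp are assumptions made for illustration; the article only prescribes two foreign keys plus an optional category and subcategory.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical three-table model: users, products, interactions."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Dimension tables: add as many (mostly immutable) attributes as needed.
        CREATE TABLE users (
            user_id             INTEGER PRIMARY KEY,
            signup_date         TEXT,
            country             TEXT,
            acquisition_channel TEXT
        );
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT,
            price      REAL
        );
        -- Atomic fact table: two foreign keys and a two-level classification.
        -- occurred_at is an assumption, added so events can be ordered in time.
        CREATE TABLE interactions (
            interaction_id INTEGER PRIMARY KEY,
            user_id        INTEGER NOT NULL REFERENCES users(user_id),
            product_id     INTEGER NOT NULL REFERENCES products(product_id),
            category       TEXT NOT NULL,
            subcategory    TEXT,
            occurred_at    TEXT NOT NULL
        );
        """
    )


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    print([name for (name,) in tables])
```

Keeping every descriptive attribute in the two dimension tables means the interaction table stays narrow and cheap to append to, whatever tracker the events come from.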




Starting with acquisition, conversion and retention KPIs, this model offers a solid base for reporting on target metrics. SQL and an open-source visualisation tool (Redash, Superset, Metabase) are enough to surface the first trends and to monitor each feature precisely.
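As an illustration of how a conversion KPI could be read off such a model, the sketch below computes the share of users with at least one purchase, per acquisition channel. The tables, the column names, the 'purchase' category and the sample rows are all hypothetical; the same query could be pasted into any of the visualisation tools above once adapted to the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down tables with a handful of made-up rows.
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, acquisition_channel TEXT);
    CREATE TABLE interactions (
        user_id INTEGER, product_id INTEGER, category TEXT, occurred_at TEXT
    );
    INSERT INTO users VALUES (1, 'search'), (2, 'search'), (3, 'social'), (4, 'social');
    INSERT INTO interactions VALUES
        (1, 10, 'view',     '2024-01-01'),
        (1, 10, 'purchase', '2024-01-02'),
        (2, 11, 'view',     '2024-01-03'),
        (3, 10, 'view',     '2024-01-03'),
        (3, 12, 'purchase', '2024-01-05'),
        (4, 12, 'purchase', '2024-01-06');
    """
)

# LEFT JOIN keeps users without any interaction in the denominator.
CONVERSION_BY_CHANNEL = """
SELECT u.acquisition_channel,
       COUNT(DISTINCT u.user_id) AS users,
       COUNT(DISTINCT CASE WHEN i.category = 'purchase' THEN u.user_id END) AS converted
FROM users u
LEFT JOIN interactions i ON i.user_id = u.user_id
GROUP BY u.acquisition_channel
ORDER BY u.acquisition_channel
"""

rows = conn.execute(CONVERSION_BY_CHANNEL).fetchall()
conversion = {channel: converted / users for channel, users, converted in rows}
print(conversion)
```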


Mining the data further with Python, data scientists can build services that profile each user and deliver a personalised message. For instance, by clustering user profiles and engagement events, they can derive user personas to understand different types of audience, adapt marketing strategies or even offer specific features. After acquisition, collaborative filtering helps create appropriate product recommendations for both up-selling and retention. Finally, with a reporting interface like Redash or even Google Data Studio, one can prioritise the customers to re-engage and predict the next best-suited product for each.
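The recommendation step could look like the sketch below. It is one possible variant, not necessarily the one the author has in mind: item-based collaborative filtering with cosine similarity over binary (user, product) pairs taken from the interaction table. The function name, its parameters and the sample events are invented for illustration.

```python
from collections import defaultdict
from math import sqrt


def recommend(interactions, user_id, top_n=3):
    """Rank unseen products by their similarity to what the user already touched.

    interactions is an iterable of (user_id, product_id) pairs.
    """
    users_by_product = defaultdict(set)
    products_by_user = defaultdict(set)
    for user, product in interactions:
        users_by_product[product].add(user)
        products_by_user[user].add(product)

    seen = products_by_user.get(user_id, set())
    scores = defaultdict(float)
    for candidate, candidate_users in users_by_product.items():
        if candidate in seen:
            continue
        for product in seen:
            product_users = users_by_product[product]
            overlap = len(candidate_users & product_users)
            if overlap:
                # Cosine similarity between two binary user vectors.
                scores[candidate] += overlap / sqrt(
                    len(candidate_users) * len(product_users)
                )

    # Highest score first; ties broken by product name for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return [product for product, _ in ranked[:top_n]]


if __name__ == "__main__":
    events = [
        ("ann", "book"), ("ann", "lamp"),
        ("bob", "book"), ("bob", "lamp"), ("bob", "desk"),
        ("cat", "book"), ("cat", "desk"),
        ("dan", "mug"),
    ]
    print(recommend(events, "ann"))
```

With real traffic, the pairs would typically be restricted to engaging interaction categories and the similarities precomputed in batch rather than on every request.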


Although one could elaborate further on variations in technical setup and on best practices for storage and data processing, data modelling is a conceptual task that is needed long before going into machine details. Your first step? Identify and locate the relevant product, user and interaction attributes, and precisely define the performance metric goals that will guide the development of models and data-driven applications.
