While data scientists focus on mastering algorithms and code, domain expertise often takes a backseat. Taking examples from logistics and supply chain, I will argue that domain knowledge is critical for the success of Data Science projects, and I will provide ideas on how to build up your domain expertise.
Back in the early days of the Data Revolution, the Data Science Venn diagram emerged as a simple, straightforward framework for understanding what shape a wannabe data scientist should have: a strong Math/Stats background with Coding skills, augmented with some purpose-specific Domain expertise.
Years later, after working hard to learn about Machine Learning (i.e., learning to use something I cannot fully grasp) and Software Engineering (e.g., agonizing over what to name a function), and putting into production a fair mix of well-marketed “failures” (fancy code and models that are not used or do not have measurable impact) and silently effective “successes” (boring or simple ideas that actually work), I find myself reflecting on that diagram. I realize that I may not have paid enough attention to effectively acquiring the domain expertise needed to tilt my results toward positive outcomes.
In this talk, I plan to share concrete experiences from my work as a Data Scientist, primarily in the Supply Chain and Logistics domain, and offer insights into the key concepts and techniques for learning about a domain. I will also discuss how domain knowledge can impact data science practice, exploring questions like: Why is knowing the data generating process important while doing data exploration? How do I improve my domain knowledge? Why do users struggle to trust and use my carefully crafted model?
The talk will consist of three parts:
This talk is partly inspired by the broader idea that Software Engineering best practices can often be effectively translated into good practices for Data professionals, and specifically by the concept of Domain-Driven Design, which has successfully inspired the Data Mesh concept in Data Engineering.
Data Scientist who loves programming (languages). People driven. Trying to focus.
Passionate about Tech Communities and Open Source. I work at AgileLab and help organize Python Milano and PyData Milano meetups.