Dr Philipp M Diesinger has occupied key roles in data science since 2009. His career includes a post-doctoral position at the Massachusetts Institute of Technology, Data Science Consultant at SAP and Head of Global Data Science at Boehringer Ingelheim. Philipp’s specialisms include predictive analytics and machine learning.
In this post, Philipp addresses the issues companies face when scaling minimum viable products (MVPs). Philipp argues this process needs to be holistic, and explores the many factors a company must consider in order to industrialise its MVPs successfully:
Over the past decade, the ability to quickly prototype Data Science projects has advanced significantly. However, industrialising and scaling such minimum viable products (MVPs) remains a highly complex process, involving a multitude of stakeholders and significant resources. For organisations, decisions on which MVPs to industrialise, and perhaps more crucially, which ones not to, have become increasingly important.
Industrialising Data Science MVPs requires a holistic approach that considers not only technical aspects, but also the business and regulatory requirements of the organisation. To ensure successful deployment and achieve the desired business impact, several key factors must be considered. The decision whether to industrialise an MVP can be viewed from two perspectives:
TECHNOLOGY FACTORS: such as platform and architecture considerations.
PEOPLE FACTORS: including change management, adherence, and general acceptance within the organisation.
Successfully navigating these factors is critical to the effective industrialisation and deployment of Data Science MVPs.
TECHNOLOGY FACTORS
Scalability: the MVP may work well on small data sets, but as it is scaled up to production-level data, it must be able to handle increased volumes of data and computation. Distributed computing and cloud-based services can be considered for scalable and efficient processing.
Integration: the MVP will need to be integrated with other systems in the organisation’s technology stack to be useful. Consider API design, data schema, and data formatting requirements to ensure smooth integration with other systems. Integration complexity can be a significant challenge and often requires a thorough upfront assessment to avoid problems later.
Cost: industrialising a Data Science MVP can be expensive, especially when considering hardware, software, and personnel costs. Early transparency on costs can be a valuable input when deciding whether and how to industrialise. Consider cost optimisation strategies, such as using open-source tools and cloud-based services, to keep costs under control.
Robustness: the MVP may work well on ideal data, but it could encounter unexpected data scenarios when exposed to real-world conditions. Ensure that the model can handle noisy, missing, and out-of-distribution data, and consider building in fault-tolerance mechanisms. Think about how the solution may react to significant shifts in input data, and how to detect and deal with such shifts.
Security: data is a valuable asset and must be protected from unauthorised access or manipulation. Consider which measures to implement, such as secure access controls, encryption, and data anonymisation techniques, to protect sensitive information.
Monitoring and maintenance: Data Science models are not static, and their performance may degrade over time due to changes in data distribution or other factors. Consider establishing a monitoring and maintenance system to verify that the model is performing as expected and to retrain it as needed.
Regulatory compliance: depending on the industry and the nature of the data used, regulatory compliance may be required. Ensure that the MVP is compliant with applicable regulations, such as GDPR, HIPAA, or PCI DSS.
Data management: the industrialisation process requires a robust data management framework that ensures data quality, security, and compliance. This may involve changes to existing data management processes and the adoption of new tools and technologies.
Technology infrastructure: the technology infrastructure that was sufficient for the MVP may not be sufficient for industrialisation. It’s essential to assess the current infrastructure and identify any gaps or limitations that need to be addressed.
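To make the robustness and monitoring points above more concrete, here is a minimal sketch of one way to detect a shift in input data: the Population Stability Index (PSI), computed for a single numeric feature. This is an illustration, not a prescribed method. The function names, the equal-width binning, and the 0.25 alert threshold (a commonly cited rule of thumb) are all assumptions, and the sketch assumes finite numeric values with missing data handled upstream.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (e.g. training data) and a current
    sample (e.g. last week's production inputs) of one numeric feature."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread; PSI is undefined")
    width = (hi - lo) / bins  # equal-width bins over the reference range

    def bin_shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum(
        (c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares)
    )


def needs_review(psi: float, threshold: float = 0.25) -> bool:
    """Hypothetical alert rule: flag the feature when PSI exceeds the
    threshold. The default is a rule of thumb, not a universal constant."""
    return psi > threshold
```

In practice, a check like this could run on a schedule for each important input feature, with flagged features triggering a closer look or a retraining decision. The appropriate threshold and binning depend on the use case and should be agreed with the business stakeholders.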
PEOPLE FACTORS
Organisational culture and change management: it is important to ensure that the organisational culture supports the adoption of new technology and data-driven decision-making. Data Science initiatives may require changes to existing business processes, so it’s essential to have buy-in from all stakeholders.
Communication and transparency: open and transparent communication is essential to ensure that all stakeholders are aware of the progress of the project and any issues that arise. This can help build trust and ensure that everyone is working towards a common goal.
Leadership support: strong leadership support is necessary to ensure that the resources and budget required for industrialising the MVP are allocated effectively. Leaders should communicate the importance of the project and provide guidance on how the organisation can support the initiative.
Skills and expertise: Data Science MVPs may require specific technical and business skills to industrialise effectively. Identifying the required skills and expertise early on can help organisations assess their talent gaps and provide training or recruitment support where necessary.
Stakeholder engagement: the stakeholders involved in the project, such as business users and IT teams, should be engaged early in the process to ensure that their needs are considered. Stakeholder feedback should be incorporated into the design and implementation of the solution.
Talent management: Data Science initiatives require skilled professionals with a mix of technical and business skills. The organisation should ensure that it has the talent required to industrialise the MVP and support ongoing Data Science initiatives.
Governance and control: industrialising the MVP requires robust governance and control frameworks to manage risks, ensure compliance, and maintain data quality. These frameworks should be developed in collaboration with all stakeholders and should be scalable to support future Data Science initiatives.
User adoption: the success of the industrialisation process depends on user adoption of the solution. It’s essential to involve users early in the process to ensure that their needs are considered and that they have a sense of ownership over the solution.
Collaboration and teamwork: industrialising a Data Science MVP is a collaborative effort that involves multiple teams and stakeholders. Effective teamwork and collaboration are critical to ensure that the solution is designed, implemented, and maintained to meet the organisation’s needs.
The decision to industrialise does not have to be a binary go/no-go decision. If only a few critical factors are missing, additional development time or testing can be invested, allowing the organisation to reassess the scaling decision at a later point.