Frank Stadler is a senior data leader, mentor and advisor. With 15 years’ experience in software engineering, he has consulted for BMW, developed RFID software for industrial gas companies and led a team developing medical software that decreases mortality for thousands of chronically ill people. About six years ago Frank shifted his focus to data science. Since then, he has been delivering custom data science solutions for many projects.
In this interview, Frank discusses how SMEs can best take advantage of AI and machine learning. All businesses should consider adopting AI in today’s data-driven climate. But what steps should smaller enterprises take when embarking on their AI projects, and what pitfalls do they need to avoid? Frank offers his insights:
What are some common misconceptions small and medium enterprises have about AI and data science in general?
There are two common misconceptions: one is that AI and data science are optional and don’t really generate advantages for the business. Many SMEs haven’t realised that AI is going to be crucial for success.
The second is that many companies, especially SMEs, think you need lots of data to be able to generate value from AI and data science projects. And that’s not accurate: the data needs to be good, but you don’t need to have big data.
That’s an interesting point, because the conventional wisdom is that machine learning models, for example, require lots of data to train them. Can you elaborate?
It depends on the amount of data, and also the kind of data that you have. If you’re doing forecasting, for example, you need historic data, but that doesn’t necessarily need to be big data. You could just have a couple of million rows and you’ll be able to achieve very good results with classical machine learning.
It’s true that if you’re using deep learning models, then you’ll probably need more data. But classical machine learning models can work with smaller data sets and still generate good value from them.
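The following is a minimal sketch of “classical machine learning” on modest historic data. It is not Frank’s own method. It fits a linear regression on lagged values using NumPy’s least-squares solver. The synthetic daily sales series, the seven-day lag window and the function names are all hypothetical assumptions chosen for illustration.

```python
import numpy as np


def make_lag_features(series: np.ndarray, n_lags: int) -> tuple[np.ndarray, np.ndarray]:
    """Build (X, y): each row of X holds n_lags consecutive values, y holds the value that follows."""
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    X = np.column_stack([series[i : len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def fit_linear_forecaster(series: np.ndarray, n_lags: int = 7) -> np.ndarray:
    """Fit y = w . lags + b by ordinary least squares; returns [w..., b]."""
    X, y = make_lag_features(series, n_lags)
    X = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_next(series: np.ndarray, coef: np.ndarray) -> float:
    """Predict the next value from the most recent lags."""
    n_lags = len(coef) - 1
    features = np.append(series[-n_lags:], 1.0)
    return float(features @ coef)


if __name__ == "__main__":
    # Two years of synthetic daily sales with a weekly pattern plus noise (hypothetical data).
    rng = np.random.default_rng(seed=0)
    days = np.arange(730)
    sales = 100 + 10 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 2, size=days.size)
    coef = fit_linear_forecaster(sales, n_lags=7)
    print(f"Forecast for tomorrow: {forecast_next(sales, coef):.1f}")
```

A linear model on lags is deliberately simple. Gradient-boosted trees or seasonal statistical models are common next steps, and they also work at this data scale.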
Are there any minimum guidelines, in terms of data size, for ML models to work effectively?
It really depends on the data set. I would say that the quality and completeness of the data are more relevant than its size. You need to have the right data, the right data points, and the right features and parameters.
If you don’t have the relevant data – for example, if you’re trying to do daily forecasts, but you only have monthly sales numbers – then that’s not going to work out.
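Frank’s daily-versus-monthly example can be checked mechanically before a project starts. The sketch below is an illustrative assumption, not a standard tool. It estimates the typical spacing of a data set’s timestamps and compares that spacing with the granularity the forecast needs. The function names and sample dates are hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def typical_interval(dates: list[date]) -> timedelta:
    """Median gap between consecutive distinct dates in the data."""
    ordered = sorted(set(dates))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct dates")
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


def granular_enough(dates: list[date], required: timedelta) -> bool:
    """True if the data is recorded at least as often as the forecast requires."""
    return typical_interval(dates) <= required


monthly_sales_dates = [date(2023, month, 1) for month in range(1, 13)]
daily_sales_dates = [date(2023, 1, 1) + timedelta(days=i) for i in range(365)]

print(granular_enough(monthly_sales_dates, required=timedelta(days=1)))  # False
print(granular_enough(daily_sales_dates, required=timedelta(days=1)))    # True
```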
What are the key challenges SMEs typically face when implementing AI and data science projects?
Usually, it’s a lack of technical and subject-matter expertise. SMEs often don’t have many people with a statistics or data science background on their payroll, and they also often lack people who know the technologies commonly used. The second challenge many SMEs face is securing the support from senior management that’s required to make sure the AI implementation aligns with company goals.
Thinking about where they should start, how can smaller companies identify opportunities where AI and data initiatives can make an impact?
I would advise that they start by looking at their core business processes and their strategic goals: what they’re trying to accomplish. Next, they should hold brainstorming sessions with the relevant stakeholders to think of ways they can use their data to improve those processes, and maybe even create new processes that support their core operations and their core strategy.
Are there any use cases that you see come up more regularly, that SMEs should think about first?
It depends on the business case. If you’re selling things, then of course there’s always an AI use case for forecasting and optimisation of logistics and storage. If you’re working with services and working with customers directly, then areas like customer segmentation, customer churn and customer lifetime value calculations are where AI and data science can be of most benefit.
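For customer lifetime value, one common back-of-the-envelope formula multiplies four quantities: average order value, purchase frequency, gross margin and expected customer lifespan. The sketch below applies that formula to each customer in an order history. The Order type, the margin and the lifespan are hypothetical inputs. More rigorous approaches also exist, such as probabilistic churn models.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    customer_id: str
    amount: float  # revenue of a single order


def simple_clv(
    orders: list[Order],
    years_observed: float,
    gross_margin: float,
    expected_lifespan_years: float,
) -> dict[str, float]:
    """Historic CLV estimate: avg order value x orders per year x margin x lifespan."""
    revenue: dict[str, float] = defaultdict(float)
    order_count: dict[str, int] = defaultdict(int)
    for order in orders:
        revenue[order.customer_id] += order.amount
        order_count[order.customer_id] += 1

    clv = {}
    for customer_id, total in revenue.items():
        avg_order_value = total / order_count[customer_id]
        orders_per_year = order_count[customer_id] / years_observed
        clv[customer_id] = avg_order_value * orders_per_year * gross_margin * expected_lifespan_years
    return clv


history = [Order("A", 100.0), Order("A", 200.0), Order("B", 50.0)]
print(simple_clv(history, years_observed=1.0, gross_margin=0.3, expected_lifespan_years=3.0))
```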
What are the risks associated with AI and data science for SMEs, and how can they mitigate them?
One of the biggest risks, especially for smaller companies, is the legal side: the issues around data privacy and data security. Once you start working with data sets, you need to make sure that you’re allowed to use them for those kinds of use cases. You also need to make sure that there are no data breaches or confidentiality issues.
Can you give us an example of where an SME might get tripped up by this legal aspect of data security?
One issue many companies run into is collecting technical data from their products for generic purposes and then trying to use that data on an individual basis, for example to create personal recommendations for a customer.
That’s a common use case, but you’re often not really allowed to use the data in a way that identifies individual customers. This applies even when the product is an entertainment product that sends usage data back to the company, and it becomes even more critical with medical data, such as data from health devices.
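Sometimes analysts don’t need to know who a customer is. In that case, one common mitigation is to replace direct identifiers with a keyed hash before the data leaves the source system. The sketch below uses HMAC-SHA256 from Python’s standard library. The field names and the key handling are assumptions. Pseudonymised data can still count as personal data under the GDPR. This step therefore reduces exposure, but it does not settle whether a use case is permitted.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Stable token for an identifier; without the key, nobody can recompute tokens to re-identify customers."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(records: list[dict], key: bytes, id_field: str = "customer_id") -> list[dict]:
    """Return copies of the records with the identifier field replaced by its token."""
    result = []
    for record in records:
        cleaned = dict(record)
        cleaned[id_field] = pseudonymise(str(record[id_field]), key)
        result.append(cleaned)
    return result


# Assumption: in practice the key lives in a secrets manager, separate from the analytics data.
KEY = secrets.token_bytes(32)

usage = [
    {"customer_id": "C-1001", "minutes_played": 42},
    {"customer_id": "C-1002", "minutes_played": 7},
]
print(pseudonymise_records(usage, KEY))
```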
What are some of the common pitfalls SMEs should avoid when taking on AI and data science projects?
A common issue is overspending on a technical solution that’s needlessly big and complex. Small businesses should start small, but also have a growth plan in mind so they can expand on their current projects.
Another potential pitfall is spending too much money on projects that don’t deliver a return on investment (ROI). The project might be very modern and much-hyped, but in the end, if there’s no return that supports your business processes, then it’s money wasted.
How do businesses go about assessing whether a potential project is likely to have a positive ROI?
They should start by carefully considering which KPIs to measure, both before they implement the solution and once it’s been implemented. They should also think in advance about how AI and data science could improve current processes. There needs to be a defined way to quantify the expected impact up front, and that impact needs to be measurable afterwards.
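The sketch below makes “quantified in advance and measurable afterwards” concrete. It turns before-and-after KPI measurements into a simple ROI figure, using ROI = (gain - cost) / cost. The KPI names, the money value assigned to each unit of improvement, and the costs are hypothetical placeholders. An SME would replace them with its own figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    baseline: float          # measured before the project, over a fixed period
    after: float             # measured after go-live, over a period of the same length
    value_per_unit: float    # hypothetical money value of one unit of improvement
    lower_is_better: bool = True


def monetary_gain(kpi: Kpi) -> float:
    """Money value of the change in this KPI (negative if it got worse)."""
    improvement = kpi.baseline - kpi.after if kpi.lower_is_better else kpi.after - kpi.baseline
    return improvement * kpi.value_per_unit


def project_roi(kpis: list[Kpi], total_cost: float) -> float:
    """Simple ROI: (total gain - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    gain = sum(monetary_gain(k) for k in kpis)
    return (gain - total_cost) / total_cost


kpis = [
    Kpi("units written off per year", baseline=5000, after=4000, value_per_unit=2.0),
    Kpi("orders shipped on time (%)", baseline=90, after=95, value_per_unit=100.0, lower_is_better=False),
]
print(f"ROI: {project_roi(kpis, total_cost=1500.0):.0%}")
```

Agreeing on the baseline measurement before the project starts is what makes the “after” number meaningful.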
Ideally, the new project shouldn’t be too big or complex, because that increases the risk that it will fail. It should be fairly basic and self-contained, but not so isolated that it’s irrelevant to other processes or later use cases.
I would say a mid-sized topic that’s not too complex is probably a good way to start. It’s also a good way to gauge whether the company is ready for these kinds of projects. How is the team structure? How is the technical infrastructure? How is the data quality? If your data infrastructure and the data quality aren’t up to the job, then you’ve wasted a lot of time and money.
Where should an SME that’s new to data science and AI start, in terms of their data infrastructure?
A good start is to identify the data sources you have in the company. Usually there are several, they’re heterogeneous, and they aren’t accessible through a single interface or technology. Also consider what you want to achieve. It’s usually better to do data analysis first and check what kind of insights and benefits you can generate from that before you start on more advanced work like data science or AI.
To understand what kind of data is available and what insights can be generated from it, the data should be stored in a unified platform such as a data warehouse or a data lake: one single point from which it can be accessed. It’s then easier to build the technology stack around that.
And in terms of a solid data infrastructure, what’s required in addition to this central repository?
The main points to consider are: where is the data coming from, and where do we want to store it? In the cloud, for example, you could use Snowflake as the warehouse. Beyond that, you’d need some kind of automated data pipeline to make sure that the data is transferred from the source systems to the warehouse or data lake, where it can then be aggregated, quality-checked and improved for data analysis and data science use cases.
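Here is a minimal sketch of the extract, quality-check, load and aggregate steps described above. Python’s built-in sqlite3 module stands in for a real warehouse such as Snowflake. The CSV export, table and column names are hypothetical. A production pipeline would also need scheduling, monitoring and incremental loads.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; the third order has a missing amount.
RAW_CSV = """order_id,order_date,amount
1,2024-01-03,120.50
2,2024-01-03,80.00
3,2024-01-04,
4,2024-01-04,55.25
"""


def extract(text: str) -> list[dict]:
    """Read the raw export into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def quality_check(rows: list[dict]) -> tuple[list[tuple], list[dict]]:
    """Split rows into loadable tuples and rejected rows (missing or negative amounts)."""
    good, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        if amount < 0:
            rejected.append(row)
            continue
        good.append((int(row["order_id"]), row["order_date"], amount))
    return good, rejected


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Upsert clean rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def daily_revenue(conn: sqlite3.Connection) -> list[tuple]:
    """Aggregate for analysis: revenue per day."""
    return conn.execute(
        "SELECT order_date, SUM(amount) FROM orders GROUP BY order_date ORDER BY order_date"
    ).fetchall()


conn = sqlite3.connect(":memory:")
good, rejected = quality_check(extract(RAW_CSV))
load(conn, good)
print(daily_revenue(conn), f"({len(rejected)} row rejected)")
```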
What size and structure should a data team be in an SME? What’s the minimum number of employees a smaller business can work with?
That depends a lot on the skill set and the requirements you have. I started out as a one-person data team myself, but I also had the necessary software engineering background to deal with all the topics that came up. So it’s possible to start with one or two people. But you need at least one person with an engineering skill set, a data engineer or software engineer, who can build up the infrastructure.
The next person I would probably look for is a data analyst or a similar role, so you’re able to analyse and generate insights from your data.
And only after you have those specialists in place would you consider looking at data scientists for more complex use cases and projects.
Who should the data leader ideally report to in an SME?
Ideally the CEO. It should be a core topic for the business. If that’s not viable, another option would be the COO or the CPO if they have one; whoever is closest to the real business operations and the product or the services the company provides.
I don’t think the CTO is necessarily the best role to report to, because the deciding factor for the impact and the success of data science projects and data analysis is not the technology that’s being used, but rather the impact it has on the business. And the CTO is commonly very focused on the technology and not the business processes.
How important is it that the first hire has domain knowledge of that area?
People might disagree with me here, but I would argue it’s not too important because they can learn the necessary domain knowledge on the job. Technical knowledge is more important, especially for the first person hired. That’s provided there are people available who can assist with the necessary domain knowledge, and can teach the first hire as well.
Thinking about SMEs that are too small to hire a dedicated technical employee, how can they get started? Is it possible for them to build a data infrastructure?
It’s certainly possible to allocate a part-time role to an existing technical employee; I have experienced that myself in the past. Alternatively, the company could work with external resources for at least a couple of days or weeks to get a head start, and then continue working on this on its own.
How can SMEs leverage AI and data to improve customer experience and retention?
Customer experience is very closely related to individualisation. So if the SME is able to use the data it has on each individual customer to improve the services or products it provides, that will help keep retention high.
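Individualisation can start very simply. The sketch below is a basic “customers who bought X also bought Y” recommender built from co-occurrence counts in past baskets. It is one elementary technique among many, and the product names are made up. As Frank noted earlier, using individual purchase data this way has to be permitted in the first place.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets: list[list[str]]) -> dict[str, Counter]:
    """Count how often each pair of items appears in the same basket."""
    co: dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(customer_items: list[str], co: dict[str, Counter], top_n: int = 3) -> list[str]:
    """Score items the customer doesn't have yet by co-occurrence with items they do have."""
    owned = set(customer_items)
    scores: Counter = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    return [item for item, _ in scores.most_common(top_n)]


past_baskets = [
    ["tent", "stove"],
    ["tent", "stove", "lamp"],
    ["tent", "lamp"],
    ["stove", "fuel"],
]
co = build_cooccurrence(past_baskets)
print(recommend(["stove"], co))
```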
You mentioned earlier that data quality is more important than the size of the data set. How do SMEs ensure they have high-quality data?
SMEs need to check their data quality from the outset. And you have to work closely with the people who are actually generating the data and inputting the data into the systems. These people need to have good data literacy and be made aware of the impact that the data quality will have later on. If the people generating or inputting the data realise the value that good data quality has, then you can ensure that the quality stays high.
And data quality should always be as high as possible, as far upstream in the process as possible. Some quality issues can be fixed later on, but it will always be better to have high-quality raw data from the start.
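Checking quality “as far upstream as possible” often means validating records at the point of entry. That way, errors are caught while the person entering the data can still fix them. The sketch below is minimal. Its fields and rules are hypothetical, and real rules would come from the business and the people who generate the data.

```python
import re
from datetime import date

# Deliberately loose email pattern: catches obvious typos, not a full RFC check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_customer(record: dict) -> list[str]:
    """Return a list of problems with a customer record; an empty list means it passed."""
    errors = []
    for field in ("customer_id", "email", "signup_date"):
        if not record.get(field):
            errors.append(f"missing {field}")

    email = record.get("email")
    if email and not EMAIL_RE.match(email):
        errors.append("malformed email")

    signup = record.get("signup_date")
    if signup:
        try:
            signup_day = date.fromisoformat(signup)
        except ValueError:
            errors.append("signup_date is not an ISO date")
        else:
            if signup_day > date.today():
                errors.append("signup_date is in the future")
    return errors


print(validate_customer({"customer_id": "C-2", "email": "not-an-email", "signup_date": "2024-13-01"}))
```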
What’s your take on SMEs leveraging GenAI?
A lot of companies, especially in the SME arena, try GenAI because of the hype and the promise that it’s easy to implement (because the models are pre-trained). They go to Hugging Face, download an open source model, and then try a retrieval-augmented generation (RAG) implementation with their own data.
The probability of this type of initiative delivering real value for SMEs is usually low, unless they already have very good data infrastructure and data management in place. That’s because the results are very dependent on the data quality that you use as input. So if the SME is already very advanced in their data architecture and data quality, they might have good results. Otherwise, the results will probably be subpar.
Do SMEs need a separate database for GenAI? Are there additional infrastructure requirements for GenAI projects?
Most often, yes, but it also depends on the infrastructure you have in place already. For example, if you want to do a RAG use case, then of course you need a vector database. But there are also vector plugins for existing databases, and additional tools to circumvent the need for a completely new database system.
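Here is a minimal in-memory sketch of what a vector database does at its core. Documents are stored as embedding vectors and retrieved by cosine similarity. The three-dimensional toy vectors stand in for real embeddings, which would come from an embedding model. A vector database, or a vector plugin for an existing database, adds persistence, fast indexing and filtering on top of this.

```python
import numpy as np


class InMemoryVectorIndex:
    """Toy vector store: keeps unit-normalised vectors and searches by cosine similarity."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.ids: list[str] = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def _normalise(self, vector) -> np.ndarray:
        v = np.asarray(vector, dtype=np.float32)
        if v.shape != (self.dim,):
            raise ValueError(f"expected a vector of length {self.dim}")
        norm = np.linalg.norm(v)
        if norm == 0:
            raise ValueError("cannot use a zero vector")
        return v / norm

    def add(self, doc_id: str, vector) -> None:
        v = self._normalise(vector)  # validate before changing any state
        self.ids.append(doc_id)
        self.vectors = np.vstack([self.vectors, v])

    def search(self, query, k: int = 3) -> list[tuple[str, float]]:
        """Return up to k (doc_id, cosine similarity) pairs, most similar first."""
        scores = self.vectors @ self._normalise(query)
        best = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in best]


index = InMemoryVectorIndex(dim=3)
index.add("opening-hours.txt", [1.0, 0.0, 0.0])
index.add("returns-policy.txt", [0.0, 1.0, 0.0])
index.add("shipping.txt", [0.7, 0.7, 0.0])
print(index.search([0.9, 0.1, 0.0], k=2))
```

A brute-force scan like this is fine for a few thousand documents. Larger collections are where dedicated approximate-nearest-neighbour indexes start to pay off.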
How much data is needed for a successful RAG implementation in a small company?
That very much depends on the use case. But it doesn’t have to be much. It’s more a case of the right vectorisation and the right chunking of the data. And it needs to be relevant to the use cases and the questions you want to answer with the system. So more data doesn’t necessarily perform better or create better results. It needs to be the right data and it needs to be prepared well.
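Chunking can be as simple as splitting text into overlapping windows. With overlap, a sentence cut at one boundary still appears whole in the neighbouring chunk. The sketch below counts words as a rough stand-in for model tokens. The chunk size and overlap values are assumptions to tune per use case. For structured documents, splitting on headings or paragraphs often works better.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to chunk_size words; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```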
Would you argue that most of these small-scale RAG implementations are just a glorified enterprise search?
There are certainly many projects where this is the case. And in many instances, when you try to implement a RAG system, you will have to combine it with more classical methods of searching as well. So if you only have a limited number of documents, a regular search or some kind of software-based search tool might be more efficient.
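One common way to combine classical keyword search with vector search is reciprocal rank fusion (RRF). Each document scores 1 / (k + rank) in every result list it appears in, and those scores are summed. The sketch below assumes both searches have already returned ranked lists of document IDs. The value k = 60 is a conventional default, not a requirement, and the file names are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.__getitem__, reverse=True)


keyword_results = ["pricing.pdf", "returns-policy.pdf", "faq.md"]
vector_results = ["returns-policy.pdf", "faq.md", "pricing.pdf"]
print(reciprocal_rank_fusion([keyword_results, vector_results]))
```

RRF only uses ranks, not raw scores. This avoids having to calibrate keyword scores against cosine similarities.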
What advice can you give about maintaining these models? SMEs sometimes assume they’re just like standard software that can be left alone once built.
There are multiple issues that could be problematic. First of all, if you use APIs to access public LLMs, such as OpenAI’s, those models can change without notice and produce different results from the ones you previously expected. You also need to make sure that the vectorisation of your documents is up to date. If new documents are added, they have to be added to the vector database as well, and old documents might need to be removed. You need to have processes in place to take care of that.
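Keeping the vector store in step with the document collection can be reduced to comparing content hashes. The sketch below only plans the work: which documents to embed, re-embed or delete. The dictionaries of documents and stored hashes are hypothetical. The actual embedding and database calls depend on the tools used.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(text: str) -> str:
    """Fingerprint of a document's content; changes whenever the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class SyncPlan:
    to_add: list[str] = field(default_factory=list)
    to_update: list[str] = field(default_factory=list)
    to_delete: list[str] = field(default_factory=list)


def plan_index_sync(source_docs: dict[str, str], indexed_hashes: dict[str, str]) -> SyncPlan:
    """Compare the current documents with the hashes the vector store was built from."""
    current = {doc_id: content_hash(text) for doc_id, text in source_docs.items()}
    plan = SyncPlan()
    for doc_id, digest in sorted(current.items()):
        if doc_id not in indexed_hashes:
            plan.to_add.append(doc_id)      # new document: embed and insert
        elif indexed_hashes[doc_id] != digest:
            plan.to_update.append(doc_id)   # changed document: re-embed
    plan.to_delete = sorted(set(indexed_hashes) - set(current))  # removed at the source
    return plan


indexed = {"handbook.md": content_hash("v1"), "old-prices.md": content_hash("2022 prices")}
source = {"handbook.md": "v2", "new-faq.md": "Q&A"}
print(plan_index_sync(source, indexed))
```

Running a plan like this on a schedule, then logging what changed, is one way to have “processes in place” without re-embedding the whole collection each time.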