
How GenAI Is Delivering Operational Efficiencies in Online Retail by Sascha Netuschil

Sascha Netuschil has degrees in automotive engineering and cultural anthropology from the Universities of Stuttgart and Hamburg. He joined Bonprix in 2015 as a web analyst, pioneering the establishment and expansion of the firm’s data capabilities.
Sascha is now the architect of Bonprix’s robust Data Science and AI domain, overseeing its operational responsibilities and strategic development. Over the past decade he’s implemented GenAI solutions, recommender and personalisation systems, real-time intent recognition, fraud detection and price optimisation algorithms.
In this post, Sascha Netuschil of the international fashion company Bonprix discusses the pivotal role GenAI plays in the company’s online retail operations. Bonprix has invested heavily in GenAI, focusing on strong use cases that promise business value. Benefits range from an in-house translation tool to advancements in website accessibility for disabled users:

Can you briefly tell us about your journey to leading AI and data science at Bonprix?

I’ve been at Bonprix for 12 years now. When I started, we had no machine learning or AI capabilities, and it was purely analytics being done.

I eventually recognised we could do more with our data than just building dashboards. We tackled two challenging first projects: building a marketing attribution model to measure campaign effectiveness and developing real-time session churn detection. Both were complex starting points, but they were successful and taught us valuable lessons.

From there, we grew organically. I started as the only data scientist, though that wasn’t even my title initially. We expanded into a team, then a department. Recently, the company consolidated all AI and data science teams into what we call a ‘domain’ to foster collaboration and create a unified approach.

That’s the short version. We’ve worked on projects across many areas, primarily in sales and marketing, but our recent organisational restructuring is expanding our reach into other business areas.

What drove Bonprix’s decision to invest heavily in GenAI, and how did you build the business case?

It wasn’t a strategic top-down decision. We grew organically, project by project, based on clear business value. We take a strictly benefit-oriented approach. If we can identify a strong use case that saves money or generates revenue, we pursue it.


We have plenty of these opportunities because we handle our own product design and development, which involves significant manual work. Each project has delivered strong ROI, so our AI investment grew naturally rather than through executive mandate.

You have nine people in your GenAI team. How is it structured?

The team evolved from a traditional data science group doing machine learning. We made early mistakes by focusing too heavily on data scientists and not enough on engineers. This led to lots of proofs of concept and great models, but we struggled to get them into production.

Now we have a more balanced structure: four data scientists (including an offshore team that works as one integrated unit), three data engineers, one software engineer handling internal customer UIs, plus project management. The offshore arrangement gives us flexibility in role allocation.

We learned we needed to invest more in engineering roles to handle operations and actually deploy models in production environments.

Are there patterns in your data scientists’ backgrounds? Do they come from typical maths, statistics, physics backgrounds, or are they more computer science and ML focused?

It’s a mix. We have people from mathematical backgrounds and others from machine learning. None come from pure software engineering, as they were already in data science before transitioning to AI.

Moving from classical data science to generative AI requires new skills. We took a learning-by-doing approach rather than formal upskilling programs, growing capabilities through projects. Three of our four data scientists have now built GenAI expertise, and we want all of them to develop these skills as we shift toward more GenAI applications.

Initially, we were perhaps too naive, thinking we could just give data scientists an interesting new topic without realising how much they’d need to learn.

What were the main skill gaps they had to address?

The technology itself isn’t that complicated for experienced data scientists or developers: calling models, implementing RAG systems, and so on, though we haven’t explored LoRA applications yet.

The real challenge is working with language. You need to learn prompt engineering, what works and what doesn’t. This is completely different from traditional programming, where there’s one language, one command, and it either works or it doesn’t.

With GenAI, you need much more trial and error. You can’t be certain that doing the same thing twice will produce identical outputs. This uncertainty and iterative approach represents a fundamental mindset shift from traditional software engineering or data science work.

You’ve mentioned the rapid pace of change and new skillsets required. How do you keep your team current with the constantly evolving GenAI landscape?

At some point, you have to step back and relax. New developments emerge daily, and trying to track everything becomes a full-time job. It’s like buying any new technology. If you check what’s available six months later, your purchase already seems outdated. AI moves even faster, but the principle remains the same.

We focus on finding what’s available now and what does the job, then pick the best current option. When the next project comes up, we reassess. Is there something new that works better? But as long as our technical setup delivers results, we don’t need to reevaluate every few weeks. Otherwise, you’d never complete any actual projects.

It’s like buying a mobile phone. You shouldn’t keep checking new offers for weeks afterwards because you’ll always think you made a bad deal. The same applies to AI. You have to live with your choices and focus on what works.

We do monitor what’s happening in the field, but we only seriously evaluate new technologies when we have a new use case. Sometimes we revisit solutions from a year or two ago, but that essentially becomes a new project building on an old use case.

Moving to your use cases and the way you are tackling fashion-specific translation challenges. Why do standard translation tools fall short for fashion brands like Bonprix?

Fashion has specialised vocabulary that can’t be translated word-for-word. German, for example, incorporates many English terms from fashion and technology. Take ‘mom jeans’ – a trend that’s been popular for years. Standard translation would convert this to ‘mama jeans’ in German, which makes no sense because Germans actually use ‘mom jeans’.

While some translation services might handle this specific example correctly, there are countless special cases where fashion terminology requires nuanced translation. Standard algorithms inevitably fall short when dealing with these industry-specific terms.

Additionally, we have our own defined communication style as a company. We follow specific language rules about what words to use and avoid. This combination of fashion-specific vocabulary and brand-specific language guidelines makes it difficult for standard translation software to deliver appropriate results.

How has your translation tool been able to incorporate fashion-relevant language?

We built a RAG-like system that creates a unique prompt for each text we want to translate. The process works like this:

We have extensive human-translated texts from previous work as reference material. When we receive a new text, we identify which product category it belongs to: men’s fashion, women’s fashion, outerwear, underwear, shoes, etc. We then pull relevant example texts from that specific category.

The system scans for specialised vocabulary words and cross-references them against our lookup table. These terms, along with category-specific examples, get incorporated into the prompt. We also include static elements like our corporate language guidelines – rules about which words to use or avoid, with examples in all target languages.

So the complete process takes the input text, identifies the product category, builds a customised prompt with relevant examples and vocabulary rules, then sends this to the large language model for translation.
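The per-text prompt assembly described in this process can be sketched roughly as follows. All data structures, example entries, and names here are illustrative assumptions for demonstration, not Bonprix’s actual implementation:

```python
# Illustrative sketch of the per-text prompt assembly described above.
# All data and names here are assumptions for demonstration purposes.

CATEGORY_EXAMPLES = {
    "womens_fashion": [
        ("Lässige Mom-Jeans mit hohem Bund.",
         "Casual mom jeans with a high waist."),
    ],
}

# Specialised vocabulary that must not be translated literally.
VOCAB_LOOKUP = {
    "mom jeans": "Keep the English term 'mom jeans' unchanged.",
}

STYLE_GUIDELINES = "Use informal address; avoid superlatives."


def build_translation_prompt(text: str, category: str, target_lang: str) -> str:
    """Assemble a unique prompt: category examples + vocabulary rules
    + static corporate language guidelines + the text itself."""
    examples = CATEGORY_EXAMPLES.get(category, [])
    example_block = "\n".join(
        f"Source: {src}\nTranslation: {dst}" for src, dst in examples
    )
    # Scan the input for specialised vocabulary and collect the rules.
    vocab_rules = "\n".join(
        rule for term, rule in VOCAB_LOOKUP.items() if term in text.lower()
    )
    return (
        f"Translate the following product text into {target_lang}.\n"
        f"Corporate style: {STYLE_GUIDELINES}\n"
        f"Vocabulary rules:\n{vocab_rules}\n"
        f"Reference translations from the same category:\n{example_block}\n"
        f"Text:\n{text}"
    )
```

The key design point is that every text gets its own compact, category-specific prompt sent to the model, rather than one oversized generic prompt.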

How do you ensure consistency and brand messaging while adapting to local market nuances across all your countries?

We’ve improved our existing process rather than replacing it entirely. Previously, external translators would handle texts, then internal native speakers would review them, so we already had humans in the loop.

We still use human review, but we’re gradually building trust in the model. Initially, we maintain extensive human oversight, then reduce it over time as confidence grows. Eventually, we’ll likely move to sample-based checking.

My philosophy is that we shouldn’t apply different quality standards to AI-generated versus human-generated content. People often distrust AI due to concerns about hallucinations, but human translations weren’t 100% perfect either. When we found translation errors before, we’d discuss them with translators and fix them. We can do exactly the same now.

I think 1% errors are acceptable if you have a process to handle them. We can feed incorrect translations back into our prompts as examples of what not to do, allowing the system to improve.
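This feedback loop of feeding rejected translations back into prompts could look something like the following minimal sketch; the structure and names are assumptions, not the actual system:

```python
# Minimal sketch of the error-feedback idea above: rejected translations
# are stored and later injected into prompts as counter-examples.
# Structure and names are assumptions, not the actual system.

rejected_translations = []  # (source, bad_translation, reviewer_note)


def record_rejection(source: str, bad: str, note: str) -> None:
    """Log a translation the human reviewers rejected."""
    rejected_translations.append((source, bad, note))


def negative_examples_block(limit: int = 5) -> str:
    """Build a prompt section showing the model what NOT to do."""
    lines = ["Avoid mistakes like these previously rejected translations:"]
    for src, bad, note in rejected_translations[-limit:]:
        lines.append(f'Source: "{src}" / Rejected: "{bad}" ({note})')
    return "\n".join(lines)
```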

It’s interesting that people apply stricter quality rules to AI than to humans. Humans make errors, too, and if AI has the same error rate, the quality impact is identical. The difference is that we now measure AI errors more systematically than we ever did with human translations.

Language is inherently more ambiguous than numbers. Surely if you give the same text to 10 translators, you’ll get 10 different valid translations?

Exactly. With translation, there’s often no clear right or wrong answer. Obviously, some translations are incorrect and contain typos or completely wrong words. But as you said, you can translate the same word multiple ways, and both are fine. One might be subjectively better, but what defines ‘right’?

How big an issue were hallucinations for you, and what did you do about them?

For us, it wasn’t a major problem. People are often very afraid of hallucinations, but current model generations have improved significantly from the early days. They still occur, but several techniques work well for us.

We maintain human-in-the-loop processes for many texts, which is probably the best approach, though it doesn’t scale easily.

We also have a fashion creation tool where product developers input natural language descriptions and receive images of how items would look. For this, we use a technique where another AI model checks the first model’s results against specific criteria, ensuring only the item appears (no humans), showing front views, etc.

Another large language model or image generation model evaluates whether outputs meet these rules. If not, the process restarts. This creates a longer user experience, but using one GenAI model to validate another’s results is a widely adopted technique for quality control.
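This generate-then-validate loop can be sketched generically. Here `generate` and `validate` stand in for the two model calls (the image generator and the checking model); both are placeholders, not real API signatures:

```python
# Generic sketch of the model-checks-model loop described above.
# `generate` and `validate` stand in for the two AI calls; both are
# placeholders, not real API signatures.

from typing import Callable, Optional


def generate_with_validation(
    generate: Callable[[], str],
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Generate an output, let a second model check it against the
    acceptance criteria (only the item, front view, no humans, ...),
    and restart the generation until a result passes or we give up."""
    for _ in range(max_attempts):
        candidate = generate()
        if validate(candidate):
            return candidate
    return None  # caller decides how to handle repeated failures
```

The retries are what make the user experience longer; capping `max_attempts` keeps latency bounded.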

Is that automated?

Yes. Users have a UI where they type what they want, and the interface helps with prompt creation so they don’t need to think about prompting. The entire process of generating and reassessing images runs automatically in the background, then delivers the result.

What were the biggest technical and linguistic hurdles when building the system?

Initially, we saw many language mistakes, which were disheartening. We had to figure out how to teach the model what we didn’t want, but this wasn’t well-documented anywhere. We couldn’t easily find examples of what works versus what doesn’t.

We hit prompt size limits quickly. As prompts get bigger, results often deteriorate because models can’t effectively use all the information in oversized prompts, even with large context windows. This led us to create individual prompts for each text rather than one massive prompt.

We also had to integrate these models into our development environment using APIs rather than UIs.

But, honestly, compared to our earlier machine learning projects, this was easier. Our current GenAI processes aren’t fully automated, and there’s still a human clicking ‘send’. The background processing is much simpler than something like a fully automated personalisation model that calculates conditions for each user daily and pushes directly to the shop.

So we’ve encountered fewer technical hurdles with GenAI than with our previous machine learning use cases.

How is GenAI helping with accessibility, compliance and making your website more disability-friendly?

Throughout the EU, laws require web content to be accessible to everyone. For fashion e-commerce sites with thousands of images, this creates a challenge. People with vision impairments need text descriptions of all pictures so their browser plugins can read them aloud.

We initially handled this manually, paying people to write descriptions for every image. As you can imagine, this became very expensive.

This is exactly what multimodal models excel at – taking an image and describing it in text. We’ve built a system that automatically describes our product images, though we had to fine-tune it beyond the vanilla model capabilities.
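As a rough sketch of how such an alt-text pipeline might wrap a multimodal model: the `describe` callable below stands in for the actual model call and is an assumption; the wrapper only enforces screen-reader-friendly constraints.

```python
# Sketch of an automated alt-text step for product images. The
# `describe` callable stands in for a multimodal model (image -> text);
# this wrapper only enforces screen-reader-friendly constraints.

def product_alt_text(image_path: str, describe, max_len: int = 125) -> str:
    """Ask the model for a short factual description and trim it to a
    length that works well with screen readers."""
    raw = describe(
        image_path,
        prompt=(
            "Describe the garment in one factual sentence for a "
            "visually impaired shopper. No marketing language."
        ),
    ).strip()
    if len(raw) > max_len:
        # Cut at a word boundary so the text still reads naturally.
        raw = raw[:max_len].rsplit(" ", 1)[0].rstrip(",.;")
    return raw
```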

The result saves us significant money on what would have been purely manual work just five or ten years ago. We’re very happy with this solution.

What’s your technology stack for GenAI applications? Are you using cloud, open source models, or proprietary?

We primarily work on the Google Cloud Platform and try to use their available models. When we started, it was basically OpenAI or nothing, so we set up an Azure project solely to access OpenAI APIs while doing our main work on Google Cloud. We still use OpenAI models for some use cases, but for new projects, we look at Google’s models first.

From my perspective, there are differences in models such as pricing and quality. But for standard use cases (not super advanced reasoning), it doesn’t matter much. You can get good results with different models.

For image generation, we use Stable Diffusion, though we’re exploring Google’s Veo for text-to-image and video capabilities, which are quite impressive. It’s not in production yet.

We reassess options for each new project but don’t constantly reevaluate existing ones. Since the field is relatively new, we don’t have truly outdated processes yet.

We use mostly proprietary models rather than open source. It’s about administrative overhead. Open source sounds great because it’s ‘free’, but in a corporate context, you invest significant work and money in maintenance and administration.

And since you’re not putting customer data through these models anyway, the open source route probably doesn’t make sense?

Exactly. Though when using OpenAI on Azure, they’re also compliant with data regulations. From a compliance perspective, we can’t use services hosted in the US, so we wait for European server availability, which is usually a few months after a US rollout.

There’s still the ethical question of handling truly sensitive data. You probably shouldn’t fully automate with AI for really sensitive information, like in healthcare.

In our case, we don’t use customer data currently as our biggest use cases focus on products and articles. But our customer data is already stored on Google Cloud for other services, so it’s covered under the same terms and conditions that keep data safe.

The APIs we use, like OpenAI’s, don’t store or use our data for further model training. Every business needs this guarantee, so providers couldn’t sell to corporations without including it in their terms.

Are there computational costs and scaling challenges with proprietary models in your use cases?

Our costs haven’t been significant because most use cases target internal processes. We have maybe 1,300 colleagues, with roughly 100 using these tools. That’s very different from customer-facing services used by millions.

The cost factor increases dramatically with customer-facing applications. For us, currently, looking at our overall data warehousing and IT costs, GenAI doesn’t have a major impact.

Looking to the future, what other potential GenAI use cases are you considering?

We’ve done a lot with text generation and now want to focus more on image generation. Beyond the inspirational images for product development I mentioned, we’re working on automated content creation.

Currently, adapting images for different channels, such as our homepage, app, and social media, requires manual work. You might need to crop an image smaller while keeping the model centred, or expand it to a different aspect ratio by adding background elements like continuing a wall’s stonework. We can now do this with AI visual models.

We’re also exploring video content creation from images or text. Image-to-video is more interesting for us because we want to showcase our own fashion products. Text-to-video might generate nice outfits, but they wouldn’t be our products. This is possible today with significant development investment, but it’s not our top priority.

Our priority is creating short videos from existing images for our shop and social media. There’s high potential in content creation because we need variants for personalisation. For example, a fall collection teaser featuring a family appeals to parents, but single people or men’s wear shoppers might prefer content showing just a man or a couple. AI can help us create these variants from existing content.

Regarding customer-facing AI systems, like the ‘help me’ chatbots getting lots of publicity, I think culture is changing. Eventually, people will expect every website to have natural language interaction capabilities. I’m aware of this trend, but given our finance-focused approach, the ROI wouldn’t justify the investment right now. We’re focusing on internal processes where we can achieve better results.

Still, we shouldn’t ignore this shift. People increasingly use GenAI apps on their phones for everything, and they’ll expect similar capabilities from other technologies, including websites and maybe even cars. Technology advances faster than culture, but culture does eventually catch up.

How are you measuring success and judging ROI for your GenAI initiatives?

Most of our first wave projects directly save money on external costs. For example, we no longer pay external translators because the savings are right there on the table. We can calculate exactly how much we save per month or year versus our internal costs, which are much lower. These use cases are very easy to justify.

It gets more complicated with projects like our image creation tool for product development. This doesn’t replace external costs; it just enhances the creative process, which isn’t easily measurable.

For these cases, we track usage levels. Since it’s built into a work-focused UI, people wouldn’t use it unless it’s genuinely helpful for their job. Unlike general tools like ChatGPT, where you might waste time generating funny images, our tool serves a specific purpose. The novelty factor wears off, so sustained usage indicates real value.

Of course, there’s initial higher usage as people try it out, but we monitor long-term adoption patterns. However, this doesn’t give us the same level of investment assurance as direct cost savings.

This is a general IT problem, not AI-specific. Some systems are essential – people can’t work without them – but others are harder to quantify. Take our ML recommendation features in the online shop. You can measure how many people use recommendations before purchasing, but that’s probably not the only factor in their decision.

To really prove ‘I made X more money’, you’d need constant A/B testing, which has its own costs. Part of your audience wouldn’t see the feature during testing periods.

Sometimes it’s hard to measure effects, and we had this challenge before AI. At some point, you have to be convinced that something is valuable based on available evidence and what you observe.

Are there any final thoughts you’d like to share?

We’re building these teams with lots of interesting use cases. It’s a good environment for trying new things. The company provides guidance on which areas to focus on, but how you implement solutions and experiment with systems is quite free. We really get to see what works and how to achieve success.

We’re always looking for good talent, and people can check our career page if interested. I think we have a great setup for experimentation, and my team members really enjoy that freedom.


© Data Science Talent Ltd, 2025. All Rights Reserved.