
Why Generative AI Projects Fail
By Colin Harman

Colin Harman is an Enterprise AI-focused engineer, leader, and writer. He has over two years’ experience implementing LLM-based software solutions for large enterprises, and serves as the Head of Technology at Nesh. Colin specialises in troubleshooting the unique challenges posed by the interaction of generative AI with the enterprise environment.
In this post, Colin explores the most common pitfalls businesses encounter when implementing GenAI. What lessons can we learn from their mistakes, and what factors need to be considered to make your GenAI project a success?

It’s 2024 and every enterprise is talking about using generative AI. Even last year, 50% of companies said they were piloting generative AI projects [1] – a technology that few had even heard of a year before that. Rest assured that, by the end of 2024, that figure will be approaching 100%. Very impressive!

There’s a firehose of content discussing potential areas of value and how to succeed with generative AI. Would you believe that innovative companies do better at deploying generative AI than… non-innovative companies [2]? Shocking! And that generative AI will transform some industries [3]? Bring it on!

But if you’re a curious person, you probably have a couple of questions that haven’t been answered by the consultants. First, what are companies actually deploying when they pilot generative AI? And second, what causes these projects to fail, so that mine can succeed?

THE TECH BEHIND MOST GENERATIVE AI PROJECTS

Let’s start by illuminating exactly what these projects consist of, beyond simply ‘generative AI.’ As a generative AI and software provider, I’ve witnessed firsthand what enterprises want, buy, and commission. First, we’ll restrict our focus to primarily text-based generative AI models – large language models (LLMs). There are certainly examples of other modalities being used (video, audio, mixed), but with 50%+ of companies piloting ‘generative AI’ this year, you should read that as text-based: text is simply the most accessible and broadly applicable form of data. Within text-based use cases there are 5 major ways to implement and benefit from generative AI [4], ranging from exceedingly simple (giving employees access to ChatGPT) to extremely complex (experimental AI software developers). What the vast majority of enterprises are doing is splitting it right down the middle and implementing NLIs (natural language interfaces), which provide a text interface to a corpus of company data through a technique called RAG (retrieval-augmented generation). This is equivalent to using ChatGPT with browsing enabled, where it can search the web before responding to your input. However, enterprises want to enable that same functionality on their own internal data, like document stores or databases. RAG allows that data to be ingested into a search engine, and the results to be interpreted by an LLM.
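To make the pattern concrete, here is a minimal sketch of the RAG loop just described: retrieve relevant documents, then hand them to an LLM along with the question. The keyword-overlap retriever and the generate() stub are illustrative placeholders (real systems use vector or hybrid search and an actual model API), not any specific vendor’s interface.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# The retriever and generate() stub are illustrative placeholders.

corpus = [
    "Pump maintenance must be performed every 6 months.",
    "The 2023 safety policy supersedes all earlier versions.",
    "Expense reports are submitted through the finance portal.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(q_terms & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    """Placeholder for the LLM call: a real system would send this
    prompt to a model and return its completion."""
    return ("Answer using only this context:\n"
            + "\n".join(context)
            + f"\nQuestion: {query}")

question = "How often is pump maintenance performed?"
print(generate(question, retrieve(question)))
```

The enterprise version swaps the list for a document index and the stub for a hosted model, but the search-then-interpret shape stays the same.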

There are several reasons that companies choose to implement natural language interfaces instead of the other use case types:

  1. They can easily be introduced as a separate tool or minor augmentation on an existing one, limiting the risk of disrupting a business process by inserting an unproven technology.
  2. In theory, they give enterprises easy access to massive stores of knowledge that they have heretofore been ignoring, or spending unnecessary time sorting through.
  3. The benefits of (2) are extremely generic and can apply to any business area, irrespective of its function.

In summary, NLIs are chosen because they are very generic and low-risk. So what could possibly go wrong?

THE CAUSES BEHIND PROJECT FAILURES

Ideally, a generative AI project involves implementation and evaluation, culminating in the project being chosen to scale up for long-term adoption and value. But many never get there, even when the technical implementation is executed to perfection! Here are the biggest non-technical risk factors I’ve observed across many projects. You’ll note that some of the buzzier topics in current discourse are missing, like cybersecurity, intellectual property, and bias [5]. This is because, in practice, these obstacles are either easily overcome or quite rare. Instead, the factors that follow are relevant to most projects you might encounter.

Low-Value Use Case

The clearest problem with tackling a generic, low-risk use case is that it may not be very important. While it’s tempting to target ‘safe’ areas, these often don’t align with strategic business goals or have significant impact, leading to projects that fade into obscurity without delivering meaningful value. Successful generative AI projects target use cases that are core to the business. By choosing a high-value area, even if it’s higher risk, the project receives more attention, support, and resources, increasing the chances of business impact.

Another way this failure risk manifests is with leaders proposing a one-size-fits-all GenAI solution. The technologies involved (LLMs, RAG) can be used in myriad ways and if a single solution is expected to serve the needs of a large, diverse organisation, it’s unlikely to do it well. Rather than thinking of generative AI as something that a company does once, it should be considered a general technology to be leveraged in many different ways. Think of these technologies like databases – they will eventually permeate nearly every system we use and it’s silly to limit usage of them to a single implementation. Solutions that are tailored to the needs of a small group of users with similar objectives will always provide more value per user than those tailored to the needs of a large group of users with disparate goals.

Data Readiness

Because these natural language interfaces operate over some corpus of data, their value is closely tied to the usefulness of that corpus. Many leaders see GenAI as an opportunity to mine the mountains of data they have accumulated, but look past the fact that previous enterprise search projects (the precursor to NLIs) may have failed spectacularly due to messy, incomplete, and incorrect data. Or maybe nobody was crazy enough to even try to marshal that data before, knowing how disorganised it was! But here we are in 2024, throwing terabytes of files into a search engine and then asking a poor LLM to make sense of the results. A rule of thumb: if your data wouldn’t be useful in a search engine, it won’t be useful in an NLI, and therefore it won’t be useful for GenAI! Here are some data readiness red flags, each of which increases the risk of project failure:

  • Multiple versions: In organisations, the truth changes over time. Employees spend some of their time keeping track of what the current truth is, and documenting it. But often, what’s given to the GenAI system is a collection of all of the different versions of truth. Instead, the solution should be given the same courtesy as a new employee, that is: pointed to only the latest version of truth (a minimal filter for this is sketched after this list).
  • Large volume: Since NLIs are usually built upon search engines, they are subject to the same limitations. One such limitation of search engines is that, as the amount of data grows and grows, the usefulness of the top responses decreases. This relationship is nearly as inevitable as entropy, and all you can do is be aware that as the corpus grows, more and more search optimisation may be required to maintain the same level of performance.
  • Complex formatting: There’s also an additional limitation that NLI systems have beyond search engines: since an LLM needs to read the search results to interpret them and generate a response, the text in those search results needs to be extracted in a semantically coherent way that preserves the meaning (while many search engines are happy with a jumble of words). This can get very difficult when documents contain tricky formatting in the shape of tables, columnar layouts, image-only text, and more.
  • ‘Incorrect’ data: Data is oil, right? The more data, the better? It turns out, that thinking is very dangerous when it comes to GenAI. Since the output of an NLI is completely derived from its data sources, an incorrect data source means an incorrect output. Often, companies don’t know how much incorrect data they have until they start generating incorrect outputs left and right, only to discover that it’s coming from their own documents!
  • One big data dump: Often a company will identify a massive share drive of data and say, ‘Let’s use that for GenAI!’ without caring much about what’s in it. Without fail, it will contain a healthy dose of the issues listed above.
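As a small illustration of the ‘multiple versions’ fix above, deduplicating to the latest version of each logical document before indexing can be straightforward when files carry an identity and a modification date. This is a hypothetical sketch assuming such metadata exists; the Doc structure and field names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    name: str       # logical document identity, e.g. "safety-policy"
    modified: date  # last-modified date from the source system
    text: str

def latest_versions(docs: list[Doc]) -> list[Doc]:
    """Keep only the newest version of each logical document,
    so the index sees a single current truth."""
    newest: dict[str, Doc] = {}
    for d in docs:
        if d.name not in newest or d.modified > newest[d.name].modified:
            newest[d.name] = d
    return list(newest.values())

docs = [
    Doc("safety-policy", date(2021, 3, 1), "Old policy ..."),
    Doc("safety-policy", date(2023, 7, 9), "Current policy ..."),
]
assert [d.modified.year for d in latest_versions(docs)] == [2023]
```

In practice the hard part is defining document identity across messy file shares; the point is that version filtering happens before ingestion, not after.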

Project Framing

In order for a GenAI project to be successful, the solution it provides should delight its users, and certainly not disappoint them. Unfortunately, it’s extremely easy to disappoint users by making promises that the solution cannot satisfy. These claims usually arise from a misunderstanding of how NLI systems work, and lead users to believe the solution can perform complex workflows beyond its capabilities, simply because they are able to command it and haven’t been instructed on its limitations.

The key point is this: these popular GenAI systems are search engines with a language model on top. If the task cannot be performed by executing a search and then interpreting the search results, it is probably not possible for a basic NLI system to complete it. This leads to some shocking limitations: the system cannot produce complete rankings (‘which is the top X?’) or aggregations (‘how many of Y?’) unless the answer is explicitly stated in the corpus, cannot respond to a command that requires multiple steps to perform, and cannot summarise the entirety of a large document. Some advanced solutions and those tailor-made to specific use cases can overcome such limitations, but projects can still succeed in their presence – they just need to inform users of what’s possible. Without this guidance, users will make demands of your solution that would baffle even a human expert.
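To see why aggregation questions fail, consider a toy illustration: the corpus below contains 40 matching incident reports, but a top-k retriever only ever surfaces a handful of them, so the model never sees enough context to count. The numbers and the stand-in retriever are invented for illustration.

```python
# Toy illustration: why "how many of Y?" breaks in a basic NLI.
corpus = [f"Incident report #{i}: valve failure at site A." for i in range(40)]

def retrieve(query: str, k: int = 5) -> list[str]:
    """Stand-in retriever: returns at most k matching chunks."""
    matches = [doc for doc in corpus if "valve failure" in doc]
    return matches[:k]

context = retrieve("How many valve failures occurred at site A?")
print(len(context))  # 5 -- the LLM sees 5 reports, not 40, so any
                     # count it produces is a guess unless the total
                     # is stated explicitly somewhere in the corpus.
```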

CONCLUSION

By the end of 2024, nearly every enterprise will have piloted generative AI. Many of those projects, however, will not lead to immediate value, scaling, and adoption, due to poor choice of use case, messy data that’s not ready for prime time, and lax project framing that allows user expectations to expand beyond solution capabilities. I’ve seen these factors arise in nearly every GenAI project I’ve been involved with. With a proactive approach that emphasises strategic use case selection, meticulous data preparation, and realistic project framing, companies can not only avoid these pitfalls but also unlock transformative value from their generative AI initiatives. Now that you know these critical factors, you’re better equipped to guide your projects to success.

REFERENCES

[1] Boston Consulting Group, 2024 ‘What GenAI’s Top Performers Do Differently’ bcg.com/publications/2024/what-gen-ais-top-performers-do-differently

[2] McKinsey & Company, 2023 ‘Companies with innovative cultures have a big edge with generative AI’ mckinsey.com/capabilities/strategy-and-corporate-finance/our-insights/companies-with-innovative-cultures-have-a-big-edge-with-generative-ai

[3] Bain & Company, 2024 ‘Generative AI will Transform Healthcare’ bain.com/insights/generative-ai-global-healthcare-private-equity-report-2024/

[4] Colin Harman, 2023 ‘The 5 Use Cases for Enterprise LLMs’ colinharman.substack.com/p/the-5-use-cases-for-enterprise-llms

[5] Forbes, 2024 ‘Revealing The Dark Side: The Top 6 Problems With ChatGPT And Generative AI In 2024’ forbes.com/sites/glenngow/2024/01/28/revealing-the-dark-side-the-top-6-problems-with-chatgpt-and-generative-ai-in-2024/
