
A Conversation on the Evolution of GenAI By Martin Musiol

Martin Musiol’s academic background is in engineering and computer science. He’s been coding since 2006 and has worked for some of the world’s leading companies, including IBM, Airbus and Mphasis. Martin’s also been involved in the startup world, and once founded his own NLP startup.
Martin’s been working in the field of generative AI since 2016, and he’s the creator of the world’s first online course in generative AI. His book on the subject was published by Wiley. Martin is also the organiser of the Python Machine Learning Meetup in Munich, and he’s the creator of the influential newsletter Generative AI – Short & Sweet, which has over 40,000 subscribers.
In our latest interview, Martin reflects on the continuing evolution of GenAI. He discusses the tech’s early stages as generative adversarial networks (GANs), the subsequent advancements and collaborations that fuelled GenAI’s progress, and the latest developments for GenAI, including RAG and multi-agent systems. What future advancements could be on the horizon for GenAI, and how will these impact our lives? Martin reveals all:

So, Martin, let’s go back to 2016. That was well before anyone was really talking about generative AI. What first got you interested in this field back then?

In 2016, I actually gave a talk at a conference in Milan titled ‘Generative AI: How the New Milestones in AI Improved the Products and Services We Built’. At that time, I was working as a data scientist at Flock Design, a consultancy. I had just come out of university, and one paper I came across in 2014 really stuck with me. It was on generative adversarial networks (GANs), by Ian Goodfellow – the original ‘vanilla’ GAN. That paper planted a seed, and it kept coming back to me.

In the design space I was working in, the concept of generating visuals through AI was very intriguing. Early GAN-generated images were rough, to say the least, but I could see the potential. I thought: ‘Someday, this technology might be able to create images that are indistinguishable from reality.’ The exact timeline wasn’t clear, but the potential was there. That’s what really drove me to explore how 3D object generation and other generative capabilities could fit into design.

What happened after those early years? When did you start to see the technology mature?

At first, generative AI created some buzz, especially at that conference on data-driven innovation. There were engaging conversations, but it quickly faded because there was no clear business value back then. The tech was mostly focused on visuals, and the language models weren’t advanced enough yet. In 2017, though, things started to shift with the arrival of the transformer architecture, which Google soon built on with its BERT model. I was working at IBM at the time, and we used BERT in a project for a geological client. That’s when I saw the first truly impressive applications of a language model.

Then in 2018, I noticed an increase in research on GANs, and more language model use cases were appearing in papers. I decided to launch a website called GenerativeAI.net and create an online course to explain the technology. I shared it on Reddit and Hacker News, and it gained traction – nothing huge, but enough to tell me there was real interest out there. Later, when DALL-E 2 launched in the summer of 2022, I saw the first big spike in my site’s traffic. But the real explosion happened with ChatGPT’s release that December. That’s when interest in generative AI went truly mainstream.

How would you define generative AI? The term has evolved significantly: what, in your perspective, are the core components that make up generative AI?

There are many ways to look at it, but I’d describe it in contrast to what we call discriminative AI. Traditional AI has been largely about classifying or identifying data – through regression, reinforcement learning, dimensionality reduction, and so forth. These are discriminative tasks, where the model essentially ‘judges’ or ‘selects’ from existing options.

Generative AI, however, is about creation. Instead of analysing existing data, it generates new content: text, images, videos, even 3D objects. It’s a more complex task. Judging between options is relatively easy, but generating coherent, high-quality content from scratch is much harder. That’s why we’ve seen this shift, where generative AI is now taking the lead, especially as we find its applications in so many areas. Discriminative AI still drives value, of course, such as in recommendation engines, but generative AI is really just beginning to reveal its potential.

You’ve been part of generative AI’s evolution from the start. Could you walk us through some key moments? How did these developments shape generative AI, and how did you experience it?

I’d divide it into two main areas: sequential data generation and parallel data generation. Sequential data is where you’re working with text, music, or code – basically anything where the order matters. Parallel data generation, like image creation, is different: the whole output is produced at once rather than one element after another.

Before transformers, we relied on models like LSTMs and sequence-to-sequence models for sequential data, and they worked pretty well. But then the ‘Attention Is All You Need’ paper came out in 2017, introducing transformers. It was revolutionary. Around that time, Google released the BERT model, built on the transformer architecture, and we implemented it in a project to identify specific words in data. Initially, we used regular expressions with about 80% accuracy, then an LSTM model improved that to around 90%. But when we used the transformer, accuracy jumped to 99% – a massive leap.

This new architecture was a game-changer. If Google had pursued it aggressively back then, they could have had their ‘ChatGPT moment’ sooner. But instead, they open-sourced it, and OpenAI ran with it. It’s interesting because Ilya Sutskever, a co-founder of OpenAI, has mentioned that early conversations with Geoffrey Hinton on the potential of scaling these models were met with scepticism. But over time, the data and the improvements made the case undeniable.

It sounds like this combination of advancements and open collaboration really fuelled generative AI’s progress. Where do you see it going from here?

Generative AI has come a long way, but we’re still just scratching the surface. The possibilities are endless, whether in creative fields, scientific research, or even healthcare. I see it transforming how we create and interact with content, whether it’s personalised education or innovative medical applications. As models continue to improve, generative AI will reshape entire industries.

Looking back, it’s been fascinating to witness how something I first encountered as a paper on GANs has grown into this powerhouse that’s now a core part of modern AI. And as we see more breakthroughs, generative AI’s potential will only keep expanding.

In your view, what are some of the key technologies driving generative AI forward today?

Until recently, I was leading generative AI projects for Infosys across EMEA. About 99% of our clients wanted to incorporate some form of AI that could query a knowledge base. The core architecture behind these systems is called retrieval-augmented generation (RAG). Essentially, you break down large knowledge sources into manageable chunks and use semantic search to retrieve the most relevant pieces based on the query.

For example, we worked with a global transportation company that frequently received emails asking about regulations in various countries. Traditionally, answering these emails involved manually searching through extensive documentation. We created a chatbot where they could simply paste in the email content. The chatbot extracts the intent and then searches a vector database of the document chunks, retrieving the most relevant pieces of information. The result is then compiled and presented in a coherent, accurate response – powered by models like GPT-4. RAG architectures like this are fundamental in enterprise applications today.
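
To make the pattern concrete, here is a minimal sketch of a RAG pipeline along the lines Martin describes: chunk a knowledge source, embed the chunks, retrieve the best matches for a query, and let a model compose the answer. The file name, model choices, and example query are illustrative, and a production system would use a proper vector database rather than in-memory arrays:

```python
# Minimal RAG sketch: chunk a document, embed the chunks, retrieve by
# cosine similarity, and answer from the retrieved context.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Index the knowledge base (placeholder file name).
chunks = chunk(open("regulations.txt").read())
index = embed(chunks)

# Retrieve the chunks most relevant to the incoming question.
query = "What are the customs rules for shipping lithium batteries to Japan?"
q_vec = embed([query])[0]
scores = index @ q_vec / (np.linalg.norm(index, axis=1) * np.linalg.norm(q_vec))
top_chunks = [chunks[i] for i in np.argsort(scores)[-3:]]

# Generate a grounded answer from the retrieved context.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{''.join(top_chunks)}\n\nQuestion: {query}"},
    ],
)
print(answer.choices[0].message.content)
```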

Did you notice any significant technological shifts in generative AI driven by advancements in hardware, like GPUs or new chip designs?

Hardware is a big part of it, yes. Nvidia, for instance, has been at the forefront, creating more advanced chips tailored for AI workloads. Most of the investments in training infrastructure flow through hyperscalers straight to companies like Nvidia. Besides that, we’re seeing new architectures emerge, like language processing units (LPUs), designed specifically for language tasks. Groq, for example, is a newer company competing with Nvidia in this space, though on a smaller scale. They focus on achieving lower latencies, which is crucial for real-time AI applications. Overall, improvements in processing power have accelerated generative AI development, making complex tasks more feasible.

You mentioned enterprise use cases, especially in regulatory or document-heavy environments. What are some other real-world applications where generative AI is making an impact?

Customer service is one area ripe for AI-driven transformation. A year ago, McKinsey mapped AI’s disruption potential across industries, and marketing topped the list. In marketing, we’re seeing AI applications in content creation tools like Grammarly and copywriting platforms. But customer service is where enterprise applications are really taking off. Consider routine tasks like address changes or account updates. Today’s language models can handle these efficiently, extracting the required information and updating it in a database, even if it’s provided in an email or as an image.

In fact, we’ve developed systems where AIs cross-check each other’s work to ensure accuracy before updating customer records automatically. This process can even extend to phone interactions. Imagine calling your internet provider to update your address. With current models, you could speak directly to an AI that would make the update seamlessly. The potential to automate these interactions is enormous.
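
As a rough illustration of that cross-checking pattern, here is a hedged sketch in which one model call extracts an address update and a second call verifies it against the source email before anything is written to a database. The prompts, model choice, and function names are hypothetical:

```python
# Sketch of the cross-checking pattern: one call extracts, a second
# call verifies, and only a verified result would be applied.
import json
from openai import OpenAI

client = OpenAI()

def extract_address(email_body: str) -> dict:
    """First pass: extract the new address as structured JSON."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "Extract the customer's new address as JSON with keys street, city, postcode."},
            {"role": "user", "content": email_body},
        ],
    )
    return json.loads(resp.choices[0].message.content)

def verify(email_body: str, extracted: dict) -> bool:
    """Second pass: an independent check of the extraction."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer YES or NO: does the extracted address exactly match the email?"},
            {"role": "user", "content": f"Email:\n{email_body}\n\nExtracted:\n{json.dumps(extracted)}"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

email = "Hi, we've moved to 12 Harbour Lane, Bristol, BS1 4QA. Please update my account."
update = extract_address(email)
if verify(email, update):
    print("Verified, applying update:", update)  # database write would go here
else:
    print("Mismatch, routing to a human agent.")
```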

It sounds like generative AI could disrupt many sectors. Beyond marketing and customer service, where else do you see the biggest potential impact?

Software development stands out as a huge opportunity. I’m actually building an app that can parse financial documents, extract a company’s hierarchy, and visually map it out. I just started this project today, and it’s already 90% done! With a basic front-end, a back-end, and some connections to a language model and RAG system, you can create a functional proof of concept surprisingly quickly.

Generative AI tools like Claude 3.5 allow you to build projects with minimal coding. I simply described my idea in natural language, iterated a few times, and was able to develop a working model. This might mean that in the future, learning traditional coding languages like Python and C++ won’t be necessary for many developers, which is quite a paradigm shift.

What advice would you give to large corporations looking to integrate generative AI into their existing AI strategies?

Good question. It doesn’t have to be a massive investment right from the start. I often advise companies to start small. A proof of concept (POC) is a great way to get started without overwhelming your resources or budget. You can set up a cloud-based environment, maybe on Azure or AWS, and create a basic semantic search application with your internal documents.

For example, Azure offers a service called Azure AI Search, which lets you upload documents and quickly build a powerful, front-facing semantic search engine. With services like this, concerns about data security can be managed effectively. Companies can design their systems to protect data from external access or misuse, especially in sensitive regulatory environments like Europe.
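
For readers who want to try this, a minimal sketch of querying an existing Azure AI Search index from Python might look like the following, assuming the index has already been populated. The endpoint, index name, key, and field names are placeholders that depend on your own service and schema:

```python
# Query an existing Azure AI Search index using the official SDK
# (pip install azure-search-documents).
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="internal-docs",
    credential=AzureKeyCredential("<query-key>"),
)

# Run a search over the indexed documents and print the top hits.
# Field names ("id", "title") depend on your index schema.
results = client.search(search_text="data retention policy", top=5)
for doc in results:
    print(doc["id"], doc.get("title"))
```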

Could you give us an overview of small language models versus large language models? How do they compare in enterprise settings?

Small language models have far fewer trainable parameters than their large counterparts. For instance, GPT-3.5 has 175 billion parameters, whereas a smaller model like Microsoft’s Phi-3-mini has only 3.8 billion. Despite the size difference, well-optimised small models can perform remarkably well in specific use cases.

One common belief is that larger models are more prone to ‘hallucinations,’ or generating inaccurate responses, since they’re trained on such a vast amount of data. Smaller models, on the other hand, can be more efficient and are often suitable for tasks where latency or computational cost is a factor. Many small models are even open-source, allowing organisations to run them on local hardware.

From your experience, what are the most common hurdles companies face when implementing or fine-tuning these models?

Training large models is extremely resource-intensive. Fine-tuning a frontier model can cost millions. Open-source models like Llama 3 aim to make AI more accessible, but even those require significant resources for fine-tuning, which isn’t exactly ‘democratised’ yet.

Another critical challenge is ensuring the integrity of the models being used. There’s a growing concern about ‘sleeper agents’ – backdoors or behaviours embedded into models during training. Anthropic researchers discovered that certain character sequences can trigger hidden behaviours in a model, causing it to bypass guardrails. You don’t want that in a professional, customer-facing AI, so it’s essential to vet models carefully.

That’s a bit concerning. Could you elaborate more on the sleeper agent concept?

A sleeper agent is essentially a hidden function embedded in a model. Anthropic found that when a specific character sequence appears, the model switches into an unfiltered, unrestricted mode – ignoring any safety protocols. In an enterprise setting, this could lead to reputational damage if, for instance, a model starts generating inappropriate content based on an unknown trigger.

Even after fine-tuning, some of these behaviours can persist. So, companies should verify their models’ origins and use robust vetting processes before deploying them in production.

Designing generative AI systems clearly involves significant planning. Do you have any cautionary tales from your experience?

Definitely. We had a client during the early days of GPT-4 who implemented an application allowing users to interact with the model freely. Within a month, API usage skyrocketed, leading to a bill of half a million dollars – just for one trial month! That was a costly lesson in controlling access and managing user interactions.

Another issue is bots. The internet is full of automated bots, so if you have an open system, it’s essential to secure it, perhaps by requiring logins. Otherwise, you could be paying for automated spam interactions. Thoughtful design and user access management are crucial for any organisation looking to integrate generative AI effectively.
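
One simple mitigation, sketched below under the assumption of authenticated users, is a per-user daily quota in front of the model API. The cap value and in-memory store are illustrative; a real deployment would use a shared store such as Redis:

```python
# Sketch of a per-user spend guard addressing the runaway-bill and
# bot-traffic risks described above.
from collections import defaultdict

DAILY_TOKEN_CAP = 50_000
usage: dict[str, int] = defaultdict(int)  # user_id -> tokens used today

def allow_request(user_id: str, estimated_tokens: int) -> bool:
    """Reject calls that would push an authenticated user over the cap."""
    if usage[user_id] + estimated_tokens > DAILY_TOKEN_CAP:
        return False
    usage[user_id] += estimated_tokens
    return True

if allow_request("user-42", 1_200):
    pass  # forward the request to the model API
else:
    print("Daily quota exceeded; try again tomorrow.")
```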

How does data quality and quantity impact generative AI models? You mentioned that smaller language models can sometimes work with less data, but what’s your experience here?

Working with well-trained models like ChatGPT or Claude 3 actually gives you some leeway with noisy data. If you set the context for the model – letting it know the data might be noisy and guiding it to focus only on relevant parts – it can manage pretty well. This is where prompt engineering becomes essential. Crafting the right prompts is a science in itself, and there was a time when papers on prompt engineering were coming out rapidly.
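
As a small illustration of this kind of context-setting, the sketch below tells the model up front what noise to expect in OCR output and what to extract. The model name, sample input, and prompt wording are illustrative:

```python
# Context-setting for noisy input: the system prompt describes the
# expected noise so the model focuses on the relevant fields.
from openai import OpenAI

client = OpenAI()

noisy_ocr = "INV-2024/118  Tota1 amnt due:: 4,250.00 EUR  pg 2/2  ~~footer~~"

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "The input is OCR output and may contain typos, page "
                "numbers, and footer artefacts. Ignore the noise and "
                "extract only the invoice number and the total amount."
            ),
        },
        {"role": "user", "content": noisy_ocr},
    ],
)
print(resp.choices[0].message.content)
```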

In RAG (retrieval-augmented generation) applications, especially where you’re querying a large knowledge base, having clean data is crucial. Although models handle some level of noise, removing irrelevant information improves response quality, particularly when dealing with vast data sets.

How do you measure the performance of generative AI systems? What metrics do you find useful?

Performance measurement in generative AI is complex. There are benchmarks like MMLU and question-answering tests, but sometimes these benchmarks leak into the training data, blurring the line between training and testing results. Comparing models directly, however, can be insightful. Take the Elo rating system, originally from chess, which the LMSYS Chatbot Arena leaderboard on Hugging Face uses. It’s updated daily, showing how different models perform against each other. The leaderboard reveals the top models, like GPT-4.
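
For intuition, here is the standard Elo update in a few lines of Python, where each ‘match’ is a human preferring one model’s answer over another’s. The K-factor of 32 is just the conventional chess default:

```python
# Elo update as used for head-to-head model comparisons.
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """Return updated ratings after one comparison between models A and B."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Example: a 1200-rated model beats a 1300-rated one and gains rating.
print(elo_update(1200, 1300, a_wins=True))  # -> (~1220.5, ~1279.5)
```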

Another interesting metric is the Hallucination Leaderboard, also on Hugging Face, which measures how often models generate ‘hallucinations’ or inaccurate responses. GPT-4 currently has one of the lowest hallucination rates. For those exploring model performance, these leaderboards are a good place to start.

Are there any misconceptions about generative AI you’d like to clarify? Or perhaps common questions you get?

Lately, I’ve been exploring Claude 3.5 Sonnet and its upcoming version, Claude 3.5 Opus, which are packed with tools. They’re remarkable in their ability to integrate code directly into documents. You can even build and publish web apps within the platform itself, just by outlining your ideas in natural language. It’s pretty wild! I highly recommend people try it out if they haven’t yet.

We’re touching on future trends in generative AI. Are there any emerging developments or technologies you’re particularly excited about?

Absolutely. One trend I’m really interested in – and I write about it in my book, in the part on the agentic future – is the rise of AI agents. Imagine language models at the core of multiple agents, each with specialised functions, like a project manager agent, a quality assurance agent, and so on, all working together in a multi-agent framework. There’s a framework called CrewAI that’s pioneering this. Andrew Ng recently said that we’re already at GPT-5 levels of performance with GPT-4o when used in a multi-agent setup, and I agree. These agents working in tandem could be game-changers, acting like executive assistants, planning our trips, or managing tasks in both our professional and personal lives.
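
As a flavour of what this looks like in practice, here is a minimal sketch using CrewAI’s Agent/Task/Crew interface, with a planner and a QA reviewer in the spirit of the setup Martin describes. The roles, goals, and task descriptions are illustrative:

```python
# Minimal two-agent crew: a planner produces a plan, a reviewer critiques it.
from crewai import Agent, Task, Crew

manager = Agent(
    role="Project Manager",
    goal="Break a feature request into concrete implementation steps",
    backstory="An experienced software PM who writes precise task lists.",
)
qa = Agent(
    role="Quality Assurance",
    goal="Review the plan for gaps, risks, and missing acceptance criteria",
    backstory="A meticulous QA engineer.",
)

plan = Task(
    description="Plan the rollout of a customer-facing FAQ chatbot.",
    expected_output="A numbered implementation plan.",
    agent=manager,
)
review = Task(
    description="Critique the plan and list concrete improvements.",
    expected_output="A bullet list of issues and fixes.",
    agent=qa,
)

crew = Crew(agents=[manager, qa], tasks=[plan, review])
print(crew.kickoff())
```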

And this agent-to-agent communication goes beyond individual assistance. My agents could interact with yours, negotiating prices or scheduling meetings. It’s an exciting vision. Another development I’m watching is the rise of humanoid robots, which we’re seeing with projects like Elon Musk’s Optimus, with large multimodal models like GPT-4o serving as the ‘brain.’ In 2024, we’ll see more of these technical convergences – AI models integrated into physical embodiments, bringing us closer to AGI.

Could you elaborate on how these developments relate to AGI?

The journey to AGI, or artificial general intelligence, hinges on bridging the gap between understanding the world through language and truly experiencing it. Text alone is just an approximation. If I describe living in a jungle for three months, you’ll get a sense of it, but it’s not the same as actually being there.

True AGI needs to connect with the world on multiple levels: visual, auditory, maybe even tactile through robotic embodiment. Multimodal and multi-sensor AI models, capable of interpreting and interacting with various data streams in real time, are key. I cover this more in my book, discussing how multimodality, multi-sensory input, and multitasking capabilities – essentially imitating the human brain’s functions – are essential to AGI development.

That’s incredibly thought-provoking! Speaking of your book, could you tell us a bit more about it?

Certainly. The book, Generative AI: Navigating the Course to the AGI Future, is split into three parts. The first part is a brief history of AI, with examples of early generative AI, like ELIZA, a chatbot from the mid-1960s that could simulate conversations.

The second part dives into the current landscape, exploring various applications and the vast potential that’s still untapped. I guide readers through a framework to identify opportunities for using generative AI in their own fields. The third part, which is about 35% of the book, looks forward. I discuss the future of generative AI, the evolution of autonomous AI agents, multi-agent frameworks, and the potential merger of these technologies with humanoid robots. The book concludes with an exploration of AGI and how we might prepare for its arrival.

You also have a popular generative AI newsletter. Could you tell us a bit about that?

The newsletter, Generative AI – Short & Sweet, comes out twice a week, on Tuesdays and Fridays. Fridays are a recap of the week’s top AI news, while Tuesdays dive into a specific topic. We’ve covered everything from small language models to applications of Llama agents and different AI frameworks.
