Podcast

AI in Procurement | Inference and reasoning: less risk, more strategic power

In the sixth episode of our podcast AI in Procurement, Fabian Heinrich, CEO of Mercanis, and Dr Klaus Iffländer, Head of AI at Mercanis, talk about two ground-breaking methods for optimising AI models in procurement: Inference Time Compute and Reasoning.

  • What is it all about?
  • What is Inference and how does Inference Time Compute help to improve the performance of LLMs on complex tasks?
  • How can more computing power improve the quality of answers and support purchasing decisions?
  • What is the difference between inference and reasoning, and how do these techniques help to integrate reasoning into AI models?
  • How can vertical AI agents with these methods be used in risk management, supplier evaluation and bid comparisons?

Through the use of inference time compute and reasoning, vertical AI agents will be able to solve more complex tasks autonomously and make even more precise, data-driven decisions. These advances enable intelligent automation in procurement and help teams to take on more strategic tasks.

Is this the next step in the evolution of AI in procurement? We show how these technologies are optimising procurement processes and what role humans continue to play.

On our own behalf: There is an e-mail newsletter for the Procurement Unplugged by Mercanis podcast. Subscribe HERE now!

Our Speakers
Fabian Heinrich
CEO & Co-Founder of Mercanis
Dr. Klaus Iffländer
AI Expert & Head of AI at Mercanis

Fabian Heinrich (00:01)
Dear listeners, welcome to another episode of Procurement Unplugged. Today we are once again focusing on the topic of AI with Dr Klaus Iffländer. The last episodes focussed on vertical agents, but above all on training data and synthetic data as a booster for training, and on how it can virtually eliminate the difference between David and Goliath. Today's episode is about what I do with the training data when I have limitations.

There are two methods here. One is reasoning and the other, which we will discuss first, is inference time computation. Perhaps two foreign concepts for many, but as always, we'll dive into them, shed some light on them and explain them simply and comprehensibly with our Dr AI. Welcome, Klaus!

Dr Klaus Iffländer (01:00)
Hello Fabian, it's great that I can be there again.

Fabian Heinrich (01:04)
Yes, let's start right away. I mean, I've already mentioned it, limitation in the training data and that there are somehow two methods. Yes, maybe you could go into that in more detail.

Dr Klaus Iffländer (01:17)
Yes, with pleasure. So with conventional inference, a large language model...

Fabian Heinrich (01:23)
Maybe here again, what is inference? Perhaps many of our buyers and listeners are not quite clear about this either.

Dr Klaus Iffländer (01:27)
So yes. Exactly. So when we work with large language models, i.e. Chat GPT and colleagues, they are first trained on large amounts of text. And if you now come and ask a question in this familiar chat window and want an answer, then what the large language model does in the background is what we call inference. In other words, it tries to find an answer based on the training data and the question asked in the chat window.

To infer or, in technical terms, predict an answer, because it's actually looking through all the training data and trying to find the text that is most likely to be accepted as a reasonable answer by me as a user. This is called inference.

So with regular inference, as I said, the training data is used and an attempt is made to make a forecast, a prediction for the correct answer or a correct answer. Now the new technology, this Inference Time Compute, is more or less an extension of this. It goes beyond this regular inference and is not actually a single new technique or technology.

Rather, it is a collection of methods. For example, the so-called adaptive calculation. This means that depending on what kind of question I ask as a user or what the task is for the large language model, the model decides adaptively whether it will use more computing power.

Fabian Heinrich (03:20)
Can you give me an example here, if you are now somehow in sourcing, or maybe...

Dr Klaus Iffländer (03:32)
Yes, good question. Imagine you have different offers that you want to compare with each other. And that's something that large language models can already do today, that you simply enter three PDFs, for example, from three suppliers who have responded to the same tender. And let's imagine that the PDFs are now very extensive, then they are all analysed and ...

This is now a relatively complex task, because the LLM is supposed to extract certain dimensions on the basis of which the offers are compared with each other. And the LLM would now decide, okay, I need to use more computing power for this, because there may be several correct answers here and it's not entirely clear. Or certain things have to be extracted first. And so it would be adapted to the task. It's just a more complex task than if I were to ask... "Chat GPT, what are you or who made you?" Chat GPT knows that

Fabian Heinrich (04:33)
Yes.

Dr Klaus Iffländer (04:37)
Exactly, that's what you mean,

Fabian Heinrich (04:39)
Exactly, for me it would be difficult to grasp how to allocate the computing power.
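The adaptive allocation Klaus describes can be sketched roughly as follows. Everything here is invented for illustration: the heuristics, thresholds and budget names stand in for decisions a real model or serving stack makes internally.

```python
# Illustrative sketch of adaptive computation: route a request to a cheaper
# or a more compute-intensive inference path based on a rough complexity
# estimate. Heuristics, thresholds and names are invented for illustration.

def estimate_complexity(prompt: str, num_documents: int = 0) -> int:
    """Crude complexity score: longer prompts and attached documents
    suggest a harder task."""
    score = len(prompt.split()) // 50  # one point per ~50 words
    score += 2 * num_documents         # each document adds analysis effort
    return score

def choose_inference_budget(prompt: str, num_documents: int = 0) -> str:
    """Decide how much compute to spend on this request."""
    score = estimate_complexity(prompt, num_documents)
    if score == 0:
        return "single-pass"   # e.g. "Who made you?"
    if score < 5:
        return "extended"      # a few extra reasoning passes
    return "best-of-n"         # sample several answers, keep the best

print(choose_inference_budget("Who made you?"))                          # single-pass
print(choose_inference_budget("Compare these offers", num_documents=3))  # best-of-n
```

A trivial chat question gets the cheap path; a three-PDF bid comparison triggers the expensive one.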

Dr Klaus Iffländer (04:47)
Exactly, so that's that. And the next thing I wanted to briefly explain was what you do with the computing power. Because if I simply give more computing power, the answer is not automatically better, but the question is what can I do with additional computing power. So if I have now bought even more Nvidia graphics cards. And there is a technique called self-assessment and improvement.

And it works in such a way that, if we stay with the example, not one answer is considered to be the correct one, but several answers are generated simultaneously. And now it is also clear why this naturally requires more computing power. Because let's imagine that the LLM now compares the three offers and says that one is perhaps the cheapest and the other has the best conditions in terms of delivery time.

So the provider is perhaps the fastest and perhaps it compares in this way. And from this perspective, several answers are correct. Other dimensions are of course conceivable. For example, the lowest probability of supplier failure or a combination of factors. And so the LLM would first generate several answers and then have an algorithm to decide which of the answers is the best or has the highest probability of being the best result and would then return this to you as the user? Is that understandable, Fabian?
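The "generate several answers, then self-assess" idea Klaus outlines is essentially best-of-n sampling. A minimal sketch, where `generate_answer` and `score_answer` are invented placeholders for real LLM and verifier calls, and the bid-comparison dimensions are made up:

```python
import random

# Sketch of self-assessment and improvement: spend n times the compute by
# sampling several candidate answers and keeping the best-scored one.

def generate_answer(question: str, seed: int) -> dict:
    """Placeholder for one sampled LLM answer: each sample optimises a
    different dimension of the bid comparison."""
    dimensions = ["lowest price", "shortest delivery time", "lowest supplier risk"]
    rng = random.Random(seed)  # seeded, so the sketch is deterministic
    return {"focus": dimensions[seed % 3], "confidence": rng.uniform(0.5, 1.0)}

def score_answer(answer: dict) -> float:
    """Placeholder self-assessment: in practice a verifier model or a
    scoring rubric; here it just reads the sample's confidence."""
    return answer["confidence"]

def best_of_n(question: str, n: int = 3) -> dict:
    """Generate n candidates, return the one that scores best."""
    candidates = [generate_answer(question, seed) for seed in range(n)]
    return max(candidates, key=score_answer)

best = best_of_n("Which of the three offers is best overall?", n=3)
print(best["focus"])
```

This also makes the compute cost visible: n candidates means roughly n times the inference work before the final selection step.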

Fabian Heinrich (06:27)
Yes, I think that has explained it quite well, that you can compensate for this constraint of too little data by a more adapted, better allocation of computing power.

Dr Klaus Iffländer (06:46)
Exactly, that's how you can see it, depending on the task. The LLM grows with its tasks.

Fabian Heinrich (06:54)
Yes, I think that's a good keyword for how you can generally act with LLMs. Now, of course, the question is: if inference is applied correctly, which is something the provider of the vertical agents has to get right, how do I generate benefits from it? We have already talked about computing power allocation.

Dr Klaus Iffländer (07:23)
Exactly. Yes, of course it's important that the software manufacturer, i.e. the designer of this agent, also understands the customer very well. In other words, you have to cover these use cases in the best possible way with the computing power that you can allocate. For example, what you can do is to expand contextual information.

Imagine you have an LLM and it has been trained on supplier contracts. And normally, if you ask a question like, what payment terms apply to supplier X, then it would just search the training data and answer that directly.

But if you now want to expand the context and give even better answers, then you can use this intensive inference, i.e. these techniques that we have just discussed, and you can, so to speak, expand this search or the generation of the answer so that, for example, the entire contract history of the supplier is searched and current market data or current news about this company and could include all this information and thus give even better and context-related answers.

So again, ... to tie in with the real-life example. This is how you would generate these benefits.
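The context expansion described above can be pictured as gathering relevant records and prepending them to the prompt before inference. A minimal sketch, with invented data and helper names:

```python
# Sketch of context expansion before inference: instead of answering from
# the model's training data alone, contract history and current market news
# are retrieved and assembled into the prompt. All data here is invented.

CONTRACT_HISTORY = {
    "supplier_x": ["2022: net 30 days", "2023: net 45 days", "2024: net 60 days"],
}
MARKET_NEWS = {
    "supplier_x": ["2024-11: supplier_x reports supply-chain delays"],
}

def build_expanded_context(supplier: str, question: str) -> str:
    """Assemble contract history and current news into one prompt context."""
    parts = ["Question: " + question]
    parts += ["Contract history:"] + CONTRACT_HISTORY.get(supplier, [])
    parts += ["Current news:"] + MARKET_NEWS.get(supplier, [])
    return "\n".join(parts)

prompt = build_expanded_context("supplier_x",
                                "What payment terms apply to supplier X?")
print(prompt)
```

The model then answers over this richer context rather than over its training data alone.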

Fabian Heinrich (08:54)
And of course that helps me to make better decisions if I then have a higher-quality agent. And the hypothesis would then be that I could solve and handle significantly more complex tasks with it, because the agent is somehow even more intelligent, but can also perceive the facts in a logical way.

Dr Klaus Iffländer (09:22)
Exactly. Firstly, because of the techniques I explained, i.e. simply more computing power. And secondly, of course, you can process even more data with the computing power and thus bring even more context into the answers and the data analysis.

Fabian Heinrich (09:37)
And what other examples would there be?

Dr Klaus Iffländer (09:40)
For example, external databases or APIs. For example, you could search current news sources, because sometimes companies are currently in the media or in specialist publications. You could tap into these. Or available APIs that have been connected, for example for price information or market data, availability or supply bottlenecks.

All these things could then be integrated into your answers via intensive inference and would simply give the user a much better information basis for making decisions.

Fabian Heinrich (10:23)
And where could the whole thing develop? So assuming I have better and better and more intelligent agents, where could it go? At some point, computing power will also be a constraint. So the first constraint was the data, you solve that with synthetic data, then at some point it's the synthetic data constraint, then you go to the computing power, where...

If that stops, where is the further development or how do you see the future?

Dr Klaus Iffländer (10:56)
Yes, well, at the moment, I think everyone has heard that this graphics computing power is difficult to get. Even if you pay the high prices for it, it's not necessarily available. Although Nvidia is working hard on it. But that is currently a limiting factor. However, I believe that this will become less and less of a problem. Because the...

Nvidia's competitors are not sleeping, but are also developing corresponding graphics computing power. And so the prices for this computing power will stabilise over time. Other players will enter the field, so prices will stabilise. And that's why I believe that this intensive inference will benefit the users of agents, or vertical agents in particular, in the foreseeable future.

So we will see these advantages in the near future. Even if hardware and software are still a limiting factor now, this will diminish over time. That's my prediction.

Fabian Heinrich (12:05)
Yes, in contrast to this, as we mentioned at the beginning, there is another method of compensating for the limited-data factor with computing power.

We've already mentioned it a few times; it's important to improve the logical thinking of the agents, and the other topic would be reasoning. I would like to go into this again in comparison. Perhaps we could start very simply: what is meant by reasoning? There are always these English buzzwords, so perhaps we could explain again what the term actually means.

Dr Klaus Iffländer (12:49)
Yes, of course. Reasoning actually means deduction or logical thinking. In other words, given certain facts, you can use them to deduce new facts, so to speak. And that's one thing that LLMs aren't currently very good at; at the moment they mainly rely on their training data, which is admittedly very extensive, which is why they often give very, very good answers.

But they don't think as strictly logically as we humans do. Let's take an example. If we say that all apples are fruit and fruit contains vitamins, then an LLM would first learn these two pieces of information. But if I now ask, "Do apples contain vitamins?", you would have to logically link these two pieces of information: firstly, that apples are fruit, and therefore, as fruit, contain vitamins.

Fabian Heinrich (13:50)
A classic rule of three, so to speak.

Dr Klaus Iffländer (13:54)
Exactly, deduction. LLMs still have their problems with that these days. However, it is increasingly being incorporated into the new models that are now coming out from the major providers, so that we can use it to derive new information. For us as users, this is a big step forward, because not everything has to be taught to the LLMs using training data, but the LLMs are increasingly being put in a position to extract or derive new information from the given training data, which is often very limited.
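The apples-and-vitamins example can be written out as a tiny forward-chaining deduction over facts. The triple representation and relation names are invented for illustration:

```python
# Minimal sketch of deduction (forward chaining): from "all apples are fruit"
# and "fruit contains vitamins", derive "apples contain vitamins".

facts = {
    ("apple", "is_a", "fruit"),
    ("fruit", "contains", "vitamins"),
}

def deduce(known: set) -> set:
    """Apply one rule until fixpoint: if X is_a Y and Y contains Z,
    then X contains Z."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for (x, r1, y) in list(derived):
            for (y2, r2, z) in list(derived):
                if r1 == "is_a" and r2 == "contains" and y == y2:
                    new_fact = (x, "contains", z)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

print(("apple", "contains", "vitamins") in deduce(facts))  # True
```

The derived fact was never stated explicitly; it follows from linking the two given facts, which is exactly the step plain training-data lookup cannot make.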

Fabian Heinrich (14:42)
And I mean, we've just spent the last 10 or 15 minutes talking a lot about inference. How does that relate to inference? Can I use both in parallel or are they such different approaches?

Dr Klaus Iffländer (14:57)
Of course, it also consumes computing power. So, in order to search through this given training data and filter new conclusions from it, you naturally need this computing power from the graphics chips. And from this point of view, you can of course use both. You can generate several answers. Maybe they are partly based on the training data, partly based on logical conclusions or both together.

In this way, a larger information base can be processed or the information base can be expanded.

Fabian Heinrich (15:38)
So reasoning would, in your opinion, be another extension of inference.

Dr Klaus Iffländer (15:45)
Yes, you can see it either way. It's just another technique to get even more out of the given data, just like inference.

Fabian Heinrich (15:54)
Exactly. So with regard to the technique, perhaps I should go into it again. With reasoning, you have explained deduction quite clearly. The counter-example to this would of course be induction. If you then read in more detail, you also hear about abduction, and then you see various great graphics where Chain-of-Thought Prompting and Graph Neural Networks, which are also described as GNNs, are used.

So, it sounds very adventurous at first, and the more you get involved, the more exciting it is. Maybe you could explain this in more detail, and how you can use Chain-of-Thought Prompting and GNNs.

Dr Klaus Iffländer (16:42)
Yes, with pleasure. So we've already explained deduction, that thing with the apples and the vitamins. Induction works the other way round. So you don't have the general observation first, but the specific one. So, for example, you have a supplier that you have ordered ten times and the first time he fails. So he doesn't deliver anything, the second time he doesn't deliver anything, the third time he doesn't deliver anything either.

Fabian Heinrich (16:46)
I have to get out now.

Dr Klaus Iffländer (17:10)
Then you could inductively conclude that he won't deliver at all. You take a specific observation and generalise it to the entire situation. That would be induction. Then the abduction is such that you also conclude from a specific observation and say, okay, the supplier has failed once. And one theory would be that this supplier always fails.

Fabian Heinrich (17:15)
Mhm, yes.

Dr Klaus Iffländer (17:40)
Then that would fit the observation. But then it might deliver on the second order and not on the third. Then you could adapt your theory again and say that perhaps it delivers every second time.
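Induction, as in the supplier example, can be sketched as generalising a revisable rule from specific observations. The data and the rule wording here are invented:

```python
# Sketch of induction: generalise from specific delivery observations to a
# (revisable) theory about the supplier. All data here is invented.

deliveries = [False, False, False]  # three orders, none delivered

def induce_reliability(observations: list[bool]) -> str:
    """Generalise observed outcomes into a theory; new observations
    would revise it, as in the abduction example above."""
    rate = sum(observations) / len(observations)
    if rate == 0.0:
        return "supplier never delivers"
    if rate == 1.0:
        return "supplier always delivers"
    return f"supplier delivers about {rate:.0%} of the time"

print(induce_reliability(deliveries))           # supplier never delivers
print(induce_reliability([True, False, True]))  # supplier delivers about 67% of the time
```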

Fabian Heinrich (17:54)
Ultimately, these are different logic sequences, deduction, induction, abduction. The question is, of course, how does this relate to the Chain of Thought-Prompting and the Graph Neural Networks?

Dr Klaus Iffländer (18:06)
Yes, chain-of-thought prompting is a prompting technique. You explicitly ask the LLM to explain the thought steps. And that, so to speak, forces the LLM to think even more about what steps need to be taken in order to achieve good results. So it's a kind of prompting technique...

Fabian Heinrich (18:11)
Mmm.

Dr Klaus Iffländer (18:35)
to get better results from the given training data. It's not always very logical, but it challenges the LLM a bit. Exactly. What was the other one? Graph neural networks, you had asked.
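In practice, chain-of-thought prompting boils down to adding an explicit "show your steps" instruction to the prompt. A minimal sketch; the wrapper function and its wording are invented, and a real model call would consume its output:

```python
# Sketch of chain-of-thought prompting: the question is wrapped with an
# instruction that asks the model to lay out its reasoning steps before
# giving the final answer.

def make_cot_prompt(question: str) -> str:
    """Append a step-by-step instruction to the original question."""
    return (
        f"{question}\n"
        "Think step by step: list the reasoning steps you take, "
        "then state the final answer on its own line."
    )

print(make_cot_prompt("Which of the three offers has the best overall conditions?"))
```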

Fabian Heinrich (18:48)
Yes, now of course the... Exactly, right, yes, these GNNs, you hear them as buzzwords here and there.

Dr Klaus Iffländer (18:58)
Exactly, it's about modelling relationships between entities that occur in the training data or in the task. For example, relationships between suppliers and products and their corresponding contracts. You can then model them like this. And once you have modelled them like a network between these objects, the LLM is able to derive complex conclusions.
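A real graph neural network learns over exactly this kind of structure; as a minimal sketch of just the modelling step, here is a supplier-product-contract graph and a simple traversal, with invented data:

```python
# Sketch of the relationship modelling Klaus mentions: entities from the
# procurement domain as a graph of typed edges. A GNN would learn over this
# structure; here we only build and query it. All data is invented.

edges = [
    ("supplier_a", "supplies", "steel"),
    ("supplier_b", "supplies", "steel"),
    ("supplier_a", "bound_by", "contract_1"),
]

def neighbours(node: str) -> set[str]:
    """All entities directly related to `node`, in either direction."""
    out = {t for (s, _, t) in edges if s == node}
    out |= {s for (s, _, t) in edges if t == node}
    return out

# Which suppliers could substitute for supplier_a on steel?
alternatives = neighbours("steel") - {"supplier_a"}
print(alternatives)  # {'supplier_b'}
```

Once relationships are explicit like this, even multi-hop questions (which contracts are affected if steel supply fails?) become graph queries rather than free-text guesses.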

Fabian Heinrich (19:31)
And if you look again at the exciting areas of application, how do they differ from inference? With inference, I was able to map relatively complex issues, now somehow in risk management or supplier selection or in comparing offers. Do I have similar possibilities with reasoning, or does the whole thing go even further?

Dr Klaus Iffländer (19:56)
I would say that the use cases are actually the same, because these are the typical use cases that we as people in procurement are now also confronted with.

Fabian Heinrich (20:06)
It's always about complex issues in the use case and where logical thinking is required. And in both cases I can use inference or reasoning to optimise the LLM so that it can think logically and perceive complexity in the same way as an experienced category buyer.

Dr Klaus Iffländer (20:33)
Correct. And in such a way that the LLM can then also derive things. For example, it could then logically derive certain risks from a contract analysis by drawing conclusions from the obligations. If it then says that the contractor undertakes to do such and such things and, conversely, the client undertakes to do such and such things, the LLM can then logically draw conclusions from this.

Fabian Heinrich (20:53)
Nope.

Dr Klaus Iffländer (21:02)
and derive risks, for example. What happens in such and such situations? The LLM could then identify such things on its own and point out to us, who are hopefully still the ones in charge, what risks there are...

Fabian Heinrich (21:21)
That would be my next question. Now I have got the logical thinking into my vertical agent somewhere, but can I really trust it 100 per cent, or shouldn't the category buyer still act as the final authority?

Dr Klaus Iffländer (21:40)
That's really the question of how far you want to automate it. Other considerations often play a role. Of course, you always imagine that people are the final authority. And that's certainly often true, but often, for example, if it's an established process and it's just a follow-up negotiation to something that's already been completed ten times, you might take the step of automating it completely. I don't know, how do you see it? Do you always need people in the loop? I think there are always cases that

Fabian Heinrich (22:13)
Yes, I think it depends on the processes. I think when we talk about topics such as supplier onboarding, we talk about topics such as supplier qualification, especially compliance topics, I can of course structure that very well, and somehow also proactive risk management.

I believe that when it comes to strategies for a category, when it comes to long-term decisions about which supplier to work with, soft factors perhaps also play a role, factors that are based on the long-term relationship with the customer or supplier, and there are also interpersonal factors. I mean, when customers select us as their software here, there is always a factor of cultural fit or human fit. So I think that's where we naturally reach the limits of this reasoning and decision-making ability.

Dr Klaus Iffländer (23:18)
Yes, of course, you can hardly derive cultural fit logically, I think.

Fabian Heinrich (23:23)
And I think that at the end of the day, supplier relationships are very successful if they work in the long term. So I think long-term supplier development, or supplier relationship management, I think people still have to be in charge.

Dr Klaus Iffländer (23:47)
Yes, I think so too. But I think there will be a lot of added value in the interaction. For example, the purchasing strategy is just one of those things where a lot of information will give you a very good decision-making position as a basis. Where logical thinking will also help you, for example, when the LLM derives what the advantages and disadvantages of certain purchasing strategies are. But in the end, you will still make the decision as a human being.

Fabian Heinrich (24:18)
I mean, that's somewhere where we want to go. I mean, the first step was to digitise everything. That we at least have a digital system of record. If an employee leaves the company or is on holiday, then we have everything in our system of record. The second thing would be to automate somehow. Whether you automate processes like this or through RPAs or now through...

APA, Agentic Process Automation, that remains to be seen, but we are automating, thus creating more freedom; the employee can do more strategy. In future, we can also have the simple strategic tasks handled by vertical agents and the LLMs, which of course gives the category buyer more and more freedom for truly strategic tasks, for the highly complex tasks and above all for supplier management in terms of the relationship, i.e. managing and developing this long-term supplier relationship.

Dr Klaus Iffländer (25:25)
Yes, I think so too. In fact, my personal opinion is that most of the added value is created precisely at this interface. So where LLMs, or intelligent systems in general, work very closely with humans, and where precisely this symbiosis arises: intensive computing power that understands a lot and provides a lot of contextual information, and then human intuition. Together, I think, that's the best team you can have.

Fabian Heinrich (25:57)
I think that's the perfect closing, Klaus. I think we had another very exciting episode. Today we got a bit more technical with the topic of inference and reasoning. But then, all the more surprisingly, we came to the conclusion at the end that it doesn't work without people and that people are actually the key to successful and long-term supplier relationships, but can of course benefit enormously from the new technologies here.

With this in mind, thank you once again and we look forward to the next instalment.

Dr Klaus Iffländer (26:32)
Thank you, too. See you soon, Fabian.
