Introduction to complex reasoning: prompt chains versus chain of thought

Large Language Models (LLMs) are central to understanding the innovations of generative AI. But when we talk about «complex reasoning», we need to analyze how LLMs work, how to optimize their cost and response speed, and which of their capabilities can be exploited to improve the competitive intelligence of enterprises.

The adoption of generative AI in enterprises is accelerating at an astonishing rate: in one way or another, companies in every industry are trying to adopt these innovations to improve productivity and profitability.

As a measure of this phenomenon, investment in AI-related technologies is expected to reach nearly $200 billion by 2025, underscoring an economic impact that will go well beyond employment and productivity dynamics.

In this context, Large Language Models (LLMs) are a central element of generative AI and protagonists of this data-driven transformation. Their ability to generate human-like responses to dialogue and other natural language inputs has made LLMs enormously popular in the enterprise: to improve customer service, to feed smarter information into internal processes and the balanced scorecard, or to optimize the company’s marketing and financial strategies.

Without a doubt, the impact of connecting enterprise data to LLMs is transformative. It aims to break down traditional data silos and enable departments such as marketing, finance, and human resources to access and interpret data with unprecedented speed and efficiency.

This revolution in data management is converging on «competitive intelligence» that enables teams to act in real time, drive strategic workforce planning, and accelerate overall business productivity.

But does the company really know how LLMs are developing these «complex inferences» to provide increasingly sophisticated answers? And, most importantly, do its executives know the capabilities and limitations of these systems in the corporate environment, in terms of which tasks they can solve more efficiently and which they cannot yet?

For the moment it would be premature to claim that LLMs do not reason, but it is important to understand that they are trained on sequences and are essentially statistical: they work with probabilities, and their results depend on the quality of the data they are trained on. It is sometimes inevitable that these models hallucinate, that is, that they produce wrong answers.

What resources seem to provide this type of model with complex reasoning capabilities?

Clearly, there are three elements that provide the LLM with these capabilities:

  • Chain of Thought: This has to do with how LLMs are trained, using texts where the reasoning sequence is made explicit: you have a problem and you provide the output. For example, imagine the equation 2X + 5 = 10. What is the answer? X is 2.5. We could train the LLM with just that pair: given this problem, the answer is X = 2.5. With Chain of Thought, the training text is redefined as a sequence of steps. First step: move the 5 to the other side, so the equation becomes 2X = 10 − 5. Second step: simplify, 2X = 5. Third step: divide both sides by 2, so X = 5 / 2. Last step: X = 2.5, which is the final answer. We write the whole chain of reasoning into the text and teach the LLM as if we were teaching a human being, slowly and gradually. Several papers (see example 1 and example 2) have shown that an LLM trained this way produces much more logical sequences, which largely avoids hallucinations. This math example shows clearly how the model learns: if we train it with poor-quality data, the LLM goes off the rails; if we train it little by little, step by step, the same LLM responds better. That is why many of ChatGPT’s answers come step by step: not only because that makes for a complete answer, but because the sequence is ordered that way, going from less to more. ChatGPT responds like a diligent student, because the diligent student orders the answer, and by ordering the answer, responds better. This phenomenon extends naturally to «prompt engineering» and the way prompt templates are designed, with templates such as RISEN, RTF, COT, or RODES, among many others, which are excellent examples of how to ask for something and how to give step-by-step instructions so that models respond accurately. It is not that the models are smarter, but that they are more effective at answering a question.
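The step-by-step pattern above can be sketched as a prompt template. This is a minimal, illustrative sketch: `build_cot_prompt` is a hypothetical helper (not from any specific library), and the worked example it embeds is the equation solved in the text.

```python
# Minimal sketch of chain-of-thought prompting (illustrative only).
# build_cot_prompt is a hypothetical helper: it prepends a worked
# example whose reasoning steps are written out, then appends the
# new problem with an instruction to solve it step by step.

def build_cot_prompt(problem: str) -> str:
    worked_example = (
        "Problem: solve 2x + 5 = 10.\n"
        "Step 1: subtract 5 from both sides: 2x = 10 - 5.\n"
        "Step 2: simplify: 2x = 5.\n"
        "Step 3: divide both sides by 2: x = 5 / 2.\n"
        "Step 4: final answer: x = 2.5.\n"
    )
    return (
        worked_example
        + f"Problem: {problem}\n"
        + "Solve it step by step, then state the final answer."
    )

prompt = build_cot_prompt("solve 3x + 4 = 13")
print(prompt)
```

The resulting string would be sent to any LLM client; the worked example nudges the model to produce an ordered reasoning sequence instead of jumping straight to an answer.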

 

  • Prompt Chain: This second element is closely related to reasoning cost. A prompt chain is multiple calls to the LLM. There are even frameworks that let you design prompt chains, so we are already talking about workflows and agentic workflows; the idea is to give some entity to the orchestration of all these prompts. (Although the idea of an agent implies some autonomy, it comes from a slightly different place.) If an LLM has to perform 5 steps, you write 5 different prompts, each with its template, and chain them together. This is a way of performing more complex tasks and of classifying elements into categories: for example, company safety and hygiene reports, personnel files, or customer data. It is a very specific way of giving LLMs the ability to classify more complex elements through a chain of requests. The longer the chain, the more expensive it is. And here the issue of cost in time appears, because there are several calls, so we talk about latency and response time. It is not only a matter of how many tokens the model returns: each call to the LLM can take 500 milliseconds, and if our prompt chain has about 8 steps, that is 8 steps at 500 milliseconds plus everything in between, so it can take us 5 seconds to respond. That is not acceptable, so speed is a very important issue to solve with LLMs. At the moment, much of the software development in generative AI has to do with building prompt chains and linking those prompts. LangGraph, for example, already offers priced services, as does Amazon Bedrock with its prompt flows. There are already consolidated business models in the industry. It is not that the LLM is now a «big brain» that does everything, but rather that different prompts are assembled and linked to give concrete and, at the same time, complex answers. So much so that we are often asking for the same thing, but the workflow design is done by a programmer.
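The chaining idea, and its latency cost, can be sketched in a few lines. This is illustrative only: `call_llm` is a hypothetical stand-in that echoes a transformed string instead of calling a real model, and the step templates are invented for the example.

```python
# Minimal prompt-chain sketch (illustrative). call_llm is a
# hypothetical placeholder for a real LLM client call; in production
# each call might add on the order of 500 ms of latency.

def call_llm(prompt: str) -> str:
    return f"output({prompt})"

def run_chain(user_input: str, steps: list) -> str:
    """Feed the output of each prompt template into the next one."""
    text = user_input
    for template in steps:
        text = call_llm(template.format(text=text))
    return text

steps = [
    "Extract the key facts from: {text}",
    "Classify these facts into categories: {text}",
    "Summarize the classification: {text}",
]
result = run_chain("safety and hygiene report for plant 3", steps)

# Rough latency budget: at ~500 ms per call, an 8-step chain already
# spends 8 * 0.5 = 4 seconds inside the model alone, before any
# orchestration overhead in between.
latency_s = 8 * 0.5
print(latency_s)
```

The arithmetic in the comment is the point: latency grows linearly with chain length, which is why long chains quickly become unacceptable for interactive use.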

 

  • Tooling or functions: This topic is rarely discussed in relation to LLMs, but it is fundamental. Let’s look at an example. We open the OpenAI chat and ask how long it takes to get from the airport to the hotel where we are staying. We use a prompt, but what does ChatGPT do internally? It calls external tools, in this case geolocation tools. It is like going to Google Maps and doing the same search, except that we are talking to ChatGPT, not to Maps. Does ChatGPT have the information Google Maps has inside the model? No: it calls a tool; it understands that it has to do «tooling» to produce that answer. It can call a search engine, a geocoder, or whatever is needed, but the LLM itself understands what kind of search it is. As users, we need geolocation information. And when we ask what the weather will be like in that location in the spring of 2024, it tells us we can expect a warm and somewhat humid climate, and gives us an average temperature. Then we ask what to wear, and we don’t know whether it will call a tool or not. But to look up the weather, it queries an external service.

The important point is that the basic LLM technology is solved; but when we also approach LLMs with prompt engineering, prompt chains, and tools (external functions), the capabilities of these services are surprising.

It is understandable that people perceive this as «general AI», because all this combined engineering works well. The fact that, with just a text or audio interface, the model recognizes that it needs to call Google, a geocoder, or the Weather Channel and gives us a quick answer makes us think that ChatGPT is a business-to-consumer product that will give more than one competitive company a headache. And you start to trust the model a lot.

But what is really behind it is not general AI. There are a lot of new things that you can do as a user, but you have to understand the limitations and what companies can and cannot do, what you can really expect from these models for the company, and how to find the balance of use according to the current needs of each organization.

7Puentes: Your master key to understanding the complex reasoning of LLMs

At 7Puentes we have more than 15 years of experience, more than 50 satisfied clients, and more than 100 successful projects, focused exclusively on developing solutions based on AI and Machine Learning.

Our services are tailored to the needs of each company. We believe that AI can transform any organization, regardless of its maturity, creating real value and optimizing existing processes.

If you want to realize the full potential of Generative AI and LLM for your business, contact us for a consultation.