RAG and function calling (Tools) - Langchain4j on Spring

Hello everyone. I’m new here and a newbie on these topics; I’m looking for feedback on some doubts I have while developing a Spring Boot application using langchain4j 0.29.0.

I don’t know if this is a silly question, but my doubts concern RAG together with function calling (Tools).

Let’s suppose I’ve set up a ChatAssistant to use a generic DefaultRetrievalAugmentor with an EmbeddingStoreContentRetriever as retriever.

The same ChatAssistant also has access to some tools through a class that exposes methods annotated with @Tool that solve an equation (just an example, not the real use case).
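Roughly, my setup looks like this (sketching it here; model, embeddingStore and embeddingModel stand for beans I already have configured):

interface ChatAssistant {
    String chat(String userMessage);
}

class EquationTools {

    @Tool("Solves a quadratic equation ax^2 + bx + c = 0")
    String solveQuadratic(double a, double b, double c) {
        double d = b * b - 4 * a * c; // discriminant
        if (d < 0) return "no real roots";
        return "roots: " + ((-b + Math.sqrt(d)) / (2 * a)) + ", " + ((-b - Math.sqrt(d)) / (2 * a));
    }
}

ChatAssistant assistant = AiServices.builder(ChatAssistant.class)
        .chatLanguageModel(model)
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
        .retrievalAugmentor(DefaultRetrievalAugmentor.builder()
                .contentRetriever(EmbeddingStoreContentRetriever.builder()
                        .embeddingStore(embeddingStore)
                        .embeddingModel(embeddingModel)
                        .build())
                .build())
        .tools(new EquationTools())
        .build();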

Now, when I invoke the chat method saying, for example, “solve x^2 + 2x + 1 = 0”, by debugging the underlying calls to DefaultAiServices, I understood that what happens is:

  1. the augmentation is done
  2. the message is sent (chat memory being mandatory)
  3. the response contains a tool execution request
  4. the tool is executed
  5. the tool’s answer is added as a message
  6. stop, or go back to 2

In my use case the tool response is a prompt built after calling some external services to gather information.

That said, RAG is performed on the very first prompt, the one that triggers the function calling, but my question is:

“Wouldn’t it be more useful if RAG were done on the second prompt, the one created via function calling?”

I can’t tell whether, as it is, RAG on the first message has significant utility, or whether it would be better to do RAG “by hand” directly on the second prompt.

What do you think?
Thanks to everyone who has the patience to share their opinion, and maybe explain where I’m going wrong.

Carmelo


I’ve never done tool use before, but probably the RAG is done at the beginning to make sure context information is injected up front, so the model knows from the start what it’s working on. Then probably every future call also sends back the full conversation history, so that same context is always sent. I’m just guessing.

I’m a Spring Boot developer too, by the way, but I’m making HTTP calls directly rather than using langchain4j.


Hi @wclayf, thanks for your feedback. That’s my guess too, honestly.

I just thought that in my real use case scenario it would be more useful to have RAG on the prompt created by the function calling rather than on the first one.

Let’s say my first input is “estimate 1234”.
Then the function calling, through an external ws, gets resource 1234 and creates the prompt “given <content of resource 1234> and <similar resources to 1234 retrieved from the embedding store>, do something.”
I thought it would be more useful to have:

First approach:
1 - “given this context, given <content of resource 1234> and <similar resources to 1234 retrieved from the embedding store>, do something”
2 - AI response

rather than

Second approach:
1 - “given this context, estimate 1234”
2 - “given <content of resource 1234> and <similar resources to 1234 retrieved from the embedding store>, do something”
3 - AI response

(which, by the way, I could do by hand, but then, having the augmentor configured, I would augment the same conversation snippet twice)

But since the full conversation is sent every time, the two approaches would probably end up being similar.

I’ll wait for other feedback, and if nothing comes up I will accept your solution, thanks.

Carmelo


You could check inside the GitHub codebase and see if there’s additional logging you can turn on (like full debug logging) and maybe get langchain4j to print out its full API query JSON too, to see what it’s really doing.

I might try langchain4j. I recently tried the new Java 22 Panama API, which is theoretically a cleaner way to call Python, or other languages, directly from Java in a shared memory/variable space, but I haven’t gotten it to work yet, because I decided to give it only one day of my time so far, and I was getting errors. Langchain4j might be easier to integrate with Java (obviously!), but in my mind the “real” langchain is the Python one.


Hi, could you please share more details about your use case?

Usually, RAG is done on the original query from the user, but this is definitely not a must.
Using tools to retrieve more information on demand is also a good strategy. But without knowing more details, it is hard to advise anything specific.

BTW, AI Services in LangChain4j is a high-level API for building LLM-powered applications, which should be suitable for 80% of use cases. The flow (User → RAG → LLM → Tool → LLM → User) is very common, but might be limiting for some use cases. In the long run we will be making this flow more customizable, but for the moment you could use the low-level API (directly using ChatLanguageModel, ChatMemory, DefaultRetrievalAugmentor, ToolSpecifications, etc.) to build any flow you need, as sketched below.
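For example, a rough sketch of such a custom flow (model, memory and toolSpecifications are assumed to be already configured; executeTool is a placeholder for your own dispatch to the @Tool methods):

memory.add(UserMessage.from("estimate 1234"));

Response<AiMessage> response = model.generate(memory.messages(), toolSpecifications);
AiMessage aiMessage = response.content();
memory.add(aiMessage);

while (aiMessage.hasToolExecutionRequests()) {
    for (ToolExecutionRequest request : aiMessage.toolExecutionRequests()) {
        String toolOutput = executeTool(request); // placeholder: invoke the matching @Tool method
        // this is the point where you are free to run RAG on the tool output before returning it
        memory.add(ToolExecutionResultMessage.from(request, toolOutput));
    }
    response = model.generate(memory.messages(), toolSpecifications);
    aiMessage = response.content();
    memory.add(aiMessage);
}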

If you need to do RAG using the output of the tool as a query, another (maybe not the prettiest) option is to inject the DefaultRetrievalAugmentor (or just the EmbeddingStoreContentRetriever) into the object that has the @Tool method and, before returning from the @Tool method, call it to get more content. Then return the original tool output plus the content retrieved via RAG.
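Something along these lines (a rough sketch; ResourceClient and the method names are illustrative, not real langchain4j classes):

class EstimationTool {

    private final ContentRetriever contentRetriever; // e.g. an EmbeddingStoreContentRetriever
    private final ResourceClient resourceClient;     // illustrative client for the external ws

    EstimationTool(ContentRetriever contentRetriever, ResourceClient resourceClient) {
        this.contentRetriever = contentRetriever;
        this.resourceClient = resourceClient;
    }

    @Tool("Fetches a resource by id, together with similar resources")
    String fetchResource(String resourceId) {
        String resource = resourceClient.fetch(resourceId); // original tool output
        // run retrieval on the tool output instead of on the original user query
        List<Content> similar = contentRetriever.retrieve(Query.from(resource));
        String similarText = similar.stream()
                .map(content -> content.textSegment().text())
                .collect(Collectors.joining("\n"));
        return resource + "\n\nSimilar resources:\n" + similarText;
    }
}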


Edited:

Another option that comes to mind is, instead of using tools, to implement a custom ContentRetriever that retrieves content from the external API and plug it into the DefaultRetrievalAugmentor.
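A minimal sketch (extractResourceId and callExternalApi are placeholders for your own logic):

class ExternalApiContentRetriever implements ContentRetriever {

    @Override
    public List<Content> retrieve(Query query) {
        String resourceId = extractResourceId(query.text()); // placeholder: your parsing logic
        String resource = callExternalApi(resourceId);       // placeholder: your ws call
        return List.of(Content.from(resource));
    }
}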

If you have a limited set of external services, you could implement one ContentRetriever per external service and use a LanguageModelQueryRouter to route the user query to one (or multiple) of them. Here is an example.
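Roughly, the wiring would look like this (externalServiceRetriever and embeddingStoreRetriever stand for your two ContentRetriever instances):

QueryRouter queryRouter = new LanguageModelQueryRouter(chatLanguageModel, Map.of(
        externalServiceRetriever, "resources fetched from the external estimation service",
        embeddingStoreRetriever, "similar resources stored in the embedding store"));

RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
        .queryRouter(queryRouter)
        .build();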

Or, you can implement a single custom ContentRetriever that transforms/routes the user query using an LLM or any other custom logic.
Classification might help here.


Hope this helps. BTW, we have a Discord server where you can get more help.

Indeed, enabling logging should help a lot.
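For example, with the OpenAI module you can enable it on the model builder (the dev.langchain4j logger also needs to be at DEBUG level):

ChatLanguageModel model = OpenAiChatModel.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .logRequests(true)   // logs the full request JSON, including tool specifications
        .logResponses(true)  // logs the full response JSON, including tool execution requests
        .build();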

Hi, thank you for your feedback.

My use case is pretty much the same as what I said in a previous message:

Let’s say my first input is “estimate 1234”.
Then the function calling, through an external ws, gets resource 1234 and creates the prompt “given <content of resource 1234> and <similar resources to 1234 retrieved from the embedding store>, do something.”

So my tool execution is responsible for finding information relevant to the question and for generating a prompt that contains the resource fetched from the ws and the similar resources fetched from the embedding store, placed in front of a simple “estimate this” instruction.

I think customizing the flow could be useful.

I will try to leverage the low-level API to achieve what I want and to understand whether it’s a better approach to RAG the tool output or not.

BTW, yeah, I know I can inject the retriever and augment the tool output “manually”, but since this assistant is the generic chat one and RAG is configured on it, both the “automatic” and the “manual” augmentation would be performed, resulting in a lot of useless tokens.

Thanks!
Carmelo

I will have a look at the other option! I’d never heard of it, thank you.

Carmelo

Can you try this with the Tools4AI library? In your situation it would work like this:

<dependency>
    <groupId>io.github.vishalmysore</groupId>
    <artifactId>tools4ai</artifactId>
    <version>0.9.6</version>
</dependency>

AIProcessor processor = new SpringOpenAIProcessor(applicationContext);
String prompt = "I need to add 2 numbers";
Object object = processor.processSingleAction(prompt);
String answer = processor.query(prompt, (String) object);

The actions can be Java methods, POJOs, HTTP REST calls, or shell scripts.

(Disclaimer - I am the developer of Tools4AI project)