Biggest difficulty in developing LLM apps

Hey Champs

I'm curious: what do you consider the biggest difficulty in developing LLM apps today?

What causes you the most pain in your work and what is the biggest nuisance?


Chunking and retrieving the right amount of context is by far the most complex activity for me.


Thanks, Joyee!

Interesting! Could you give an example of a use case and the specific difficulties you face there?

So, think about a use case where you have a PDF with complex tables and images. How do you ensure efficient layout-based chunking? None of the traditional techniques can get it 100% right. And after you chunk, when you do the retrieval, how do you ensure you are retrieving the right context for the LLM? Basic semantic retrieval fails for complex use cases where the answer may need to be formed from different sets of chunks. Hybrid search gave me good results, but it is still not 100% consistent.

Agree! How do you solve that?


Have you looked at GROBID? It does a half-decent job of segmentation on most arXiv/Semantic Scholar papers I have thrown at it, and returns a nice XML parse.
I agree, fixed ‘n-token’ chunking seems dumb.
Also, I use probing with random query subsets as well as the full query.
100%? hmm. I’d love to hear about that if you find it…

btw - by hybrid I assume you mean mix of keyword and embedding? doing anything graph-based?
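For what it's worth, here is a minimal sketch of the rank-fusion flavor of hybrid search (keyword + embedding), which also works for fusing results from random query subsets. The ranked lists below are hypothetical placeholders; any BM25 index and vector store would produce them:

```python
# Sketch of hybrid retrieval via reciprocal rank fusion (RRF).
# Assumes you already have two ranked lists of chunk IDs, best first:
# one from a keyword index (e.g. BM25), one from embedding similarity.
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of chunk IDs into one.

    k=60 is the constant commonly used with RRF; it damps the
    influence of the very top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["c3", "c1", "c7"]    # hypothetical BM25 results
embedding_hits = ["c1", "c5", "c3"]  # hypothetical vector results
print(rrf_fuse([keyword_hits, embedding_hits]))
# → ['c1', 'c3', 'c5', 'c7']
```

Rank fusion sidesteps the problem of keyword and embedding scores living on incomparable scales, since only the rank positions matter.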


I guess this was posted at the wrong time, right before Christmas (Merry Christmas, everyone, btw!), so bumping, as I'm really interested in this topic.

@cass @RonaldGRuckus @curt.kennedy curious to hear what you think


Generally speaking, RAG is the holy grail for LLMs. I'm seeing more specialized hardware coming out, which means it won't be too far-fetched to run a powerful LLM hooked up to my house as a home assistant. Instead of overloading it with knowledge, I can manage a much smaller model that's "street-smart" and use RAG to augment its knowledge.

Based on current events with publishers, I am no longer interested in having GPTs as my personal assistants and prefer to run them locally.

I think a lot of developers, including myself, are sitting ducks waiting for Assistants to come out of beta with more functionality included. Without even a roadmap, it's hard to know what to expect without speculating.

For public-facing chatbots, I think we will shift towards specialized LLMs as well. Chevrolet was one of the few that tried running a customer-service chatbot, and it was abused into the ground :joy: people were bypassing its instructions to use it as a free GPT-4.

Speaking of GPT-4: Assistants are next to useless for now until token management is taken care of. It can be ridiculously easy to token-bomb people, even by accident. Function calling or retrieval? Easy. The potential of over $1/message is insane.
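To put a rough number on that token-bomb risk, here is back-of-the-envelope arithmetic; the per-token price is an assumption for a gpt-4-turbo-class model, so check the current pricing page before relying on it:

```python
# Back-of-the-envelope cost per message when retrieval stuffs the
# context window on every turn. The price below is an illustrative
# assumption, NOT a current OpenAI rate.
PRICE_PER_1K_INPUT = 0.01  # assumed $/1K input tokens

def message_cost(input_tokens, price_per_1k=PRICE_PER_1K_INPUT):
    """Dollar cost of the input side of a single message."""
    return input_tokens / 1000 * price_per_1k

# A retrieval step that fills a 128K context window each turn:
print(f"${message_cost(128_000):.2f} per message")  # prints "$1.28 per message"
```

At these assumed rates, one fully stuffed context window already crosses the $1/message mark, which is where the "token-bomb" worry comes from.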


Thank you, Ronald!

Which are the main challenges you see with RAG these days?

I'm also waiting for Assistants to come out of beta, but the problem I see with them is that they are great for use cases where controllability at the step level (in multi-step dialog flows) is not that critical, but I don't see a way to make them fully controllable.

I'm wondering, why do people get so hung up on Assistants? There might be something I'm missing, but to me it seems like Assistants are a crappy version of agents on training wheels.

I understand that they're a convenient way to get started with the concept, but I can just as easily imagine that Assistants and custom GPTs will disappear again in a couple of weeks or months and be replaced (or not) by something else.

dealing with that crippling existential dread that most of us will be out of a job in 10 years.


I don't think agents (no matter which form: AutoGen, SuperAGI, Assistants API) will go away soon, but in my opinion, what the general narrative gets wrong is that agents are not a panacea. While a declarative approach works great in some use cases, in many others we still need an imperative approach (I call it the anti-agent approach).

dealing with that crippling existential dread that most of us will be out of a job in 10 years.

Ha, I have this anxiety too, as do many people in the space, I believe. The way I try to battle it is to hear more opinions on the topic and understand what that post-job world will look like.


If history’s any teacher, it’s not gonna be a happy world.

I call it anti-agent approach

do you have any literature on that/ do you want to expound on that?


do you have any literature on that/ do you want to expound on that?

I'm actually writing a short essay on that topic right now. Once it's ready, I'll share it.

If history’s any teacher, it’s not gonna be a happy world.

Hm… Could you give some examples of something similar in history? Nothing comes to my mind.


[edit: i have deleted this comment because I believe political speech has no place on this forum. suffice to say, many horrific things have happened in situations people lose control over their value to society]



I frankly don't see a parallel here. Not saying the USSR was good (I don't have an opinion on this), but everyone there was busy with some kind of job; there were actually laws against not working. The era we are entering seems to be the exact opposite.

Well, over the medium term, people will have to compete with machines for jobs. As machines become cheaper and more capable, the bar will rise for jobs that can yield more than minimum wage.

And minimum wage will effectively be set by a machine. Back-of-the-envelope calculations I did a couple of months ago estimated that a machine replacing an administrator would cost around $6 an hour. That will be the market value of labor. As technology becomes more efficient, that price will sink. There will likely be an equilibrium that prevents it from bottoming out completely, but if 90% of the population is only capable of performing a job worth less than, say, $3 an hour, no amount of raising the minimum wage is going to make an impact.

Of course, it’s possible that commodities and consumer products will become cheaper. But it’s also possible (and in my opinion more likely) that advanced industry will have little interest in serving a population that has no capital.


Very interesting! Though off-topic. Should we start a new thread specifically about this post-labor world? :slight_smile:

very :laughing:

not sure that this forum’s the best place for this haha


This is for sure a big one. And the problem compounds when you are dealing with multiple types of documents (legal contracts, government regulations, sermons, speeches, religious texts, policy documents, etc…), each requiring their own specific embedding configuration.

But the other huge problem when dealing with large datasets is getting comprehensive responses.

An example I ran into when dealing with Hollywood labor contracts: I asked about the rules for holiday pay for performers. Because of the document limit, the search brought back results for background actors and stunt performers, but not singers or dancers. That was with a limit of about 25 documents. When I increased it to around 50, I got a more comprehensive response.

I mean, in a RAG scenario, you limit the number of context documents to control costs. The problem is that some questions can be answered with 10 documents, while others may require 30-40. The most comprehensive answer may require bringing back 50+ documents, while many detailed answers to narrower questions may require fewer than 10.

So, how do you deal with something like this? The first natural solution is to use the large context windows (gpt-4-turbo's 128K or Claude 2's 200K) to stuff as much text into the model as possible. But then you run into a new set of problems with model limitations and cost.

So, riding that fine line between the model limitations, cost restrictions, and end-user expectations is a beast.

Probably my biggest problem to date.
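One partial mitigation for the variable document-count issue, a sketch rather than anything the poster describes: cut off retrieval by similarity score instead of a fixed document count, so narrow questions stay small and broad ones can grow up to a hard cap. `search` is a hypothetical vector-store call returning (chunk, score) pairs, best first:

```python
# Sketch: retrieve with a score cutoff instead of a fixed top-k, so
# broad questions can pull 50+ chunks while narrow ones stay under 10.
# `search` is a hypothetical vector-store callable; scores are assumed
# to be similarities in [0, 1], higher is better.

def adaptive_retrieve(search, query, min_score=0.75, hard_cap=60):
    """Collect chunks until relevance drops below min_score."""
    results = []
    for chunk, score in search(query, k=hard_cap):
        if score < min_score:
            break  # relevance has dropped off; stop early
        results.append(chunk)
    return results
```

The threshold still has to be tuned per corpus (and per embedding model), but it replaces one hard-coded number with a cutoff that adapts to the question.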


Such cases can be dealt with via metadata. In your example, you are basically retrieving a lot of noise, which both decreases quality and significantly increases cost.

I think the biggest problem with RAG now is not its limitations, but the fact that developers skip the database architecture phase, which is no less critical (and maybe even more so) than it is for relational databases. In most cases, needing to retrieve 50 documents means that your documents are either too short or you don't have a proper data architecture in place.

Big context windows would barely solve your problem, as the effective attention spans of LLMs are still very short (even within the context window).
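The metadata suggestion above can be sketched as pre-filtering candidates before similarity ranking; the chunk schema and the `performer_type` key are made up for illustration:

```python
# Sketch of metadata pre-filtering: restrict the candidate set by
# document attributes before ranking by embedding similarity.
# The chunk record shape and metadata keys are hypothetical.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(chunks, query_vec, where, k=5):
    """chunks: list of {"text", "vec", "meta"} dicts.
    where: metadata key/value pairs that must all match."""
    candidates = [
        c for c in chunks
        if all(c["meta"].get(key) == val for key, val in where.items())
    ]
    candidates.sort(key=lambda c: cosine(c["vec"], query_vec), reverse=True)
    return candidates[:k]
```

In the holiday-pay example above, a filter like `where={"performer_type": "singer"}` would keep background-actor chunks out of the candidate set entirely, so the document limit is spent only on relevant material.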