Consolidating the Limitations of Current-Generation LLMs Developed by OpenAI

Understanding the limitations of rapidly evolving technologies is crucial, because it positions us to make significant progress the moment those limitations are overcome.

Take the evolution of token limits in large language models (LLMs) as an example. Initially, the 4,000-token limit hindered the development of even basic applications because so little context could be retained. The move to an 8,000-token window allowed longer coding tasks but still forced constant trimming of context. A significant advance came with the 128,000-token models, which substantially reduced context issues and made far more complex applications practical.
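
To make the constraint concrete, here is a minimal sketch of checking whether a prompt fits within a given context window. It assumes the `tiktoken` package; the encoding name and the reserved reply budget are my own illustrative choices, not official guidance.

```python
# Sketch: check whether a prompt fits within a given context window.
# Assumes the `tiktoken` package is installed; "cl100k_base" is a common
# encoding choice here, used only as an illustrative default.
import tiktoken


def fits_context(text: str, context_window: int, reply_budget: int = 1_000) -> bool:
    """True if the prompt plus a reserved reply budget fits in the window."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text)) + reply_budget <= context_window


prompt = "..."  # e.g. a long source file plus instructions
for window in (4_000, 8_000, 128_000):
    print(window, fits_context(prompt, window))
```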

Another example is the experience of working around “lazy” model responses (for instance, a model that truncates or omits parts of the requested output). Those familiar with these quirks can adapt to new models more swiftly than those who have never encountered them.

For individuals experienced in coding with LLMs, especially those who worked around earlier token limitations, the jump to a 128k context window was a notable improvement: it let them reach their objectives faster and more efficiently than newcomers who never had to deal with those constraints. That said, this advantage may not always hold.

In this community, many have shared the challenges they run into daily with LLMs, but there hasn’t been a central place to collect these experiences.

This thread aims to consolidate our collective insights and learnings.

I’ll begin with an observation: the current delay before the model generates its first token is too long for applications involving real-time conversation over audio channels, such as phone calls. If that latency were reduced, business applications could shift their focus from chat support agents to telephone support agents, along the lines of Google’s Duplex project, where AI was used to make reservations.
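
For anyone who wants to quantify this, here is a minimal sketch of measuring time to first token (TTFT) with the streaming mode of the OpenAI Python client (v1.x). The model name, prompt, and the rough 0.5-second threshold for a natural-feeling voice reply are my own assumptions for illustration, not official figures.

```python
# Minimal sketch: measure time to first token (TTFT) via streaming.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the environment.
# Model name and latency threshold are illustrative assumptions.
import time

from openai import OpenAI

client = OpenAI()


def measure_ttft(prompt: str, model: str = "gpt-4-turbo") -> float:
    """Return seconds elapsed until the first content token arrives."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start  # first non-empty token
    return float("inf")  # no content received


if __name__ == "__main__":
    ttft = measure_ttft("Hello, I'd like to book a table for two tonight.")
    # Rough assumption: a phone conversation starts to feel laggy above ~0.5 s.
    verdict = "too slow for voice" if ttft > 0.5 else "acceptable"
    print(f"Time to first token: {ttft:.2f}s ({verdict})")
```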
