Summary created by AI.
The thread discusses building a custom chatbot that answers queries from data held in a database. Abhi3hack opened the discussion by describing the problems he faced building such a system with OpenAI GPT-3.5 and a PostgreSQL database. He had tried two approaches, one using langchain and one without, and ran into token-size limits with langchain. He also raised concerns about fine-tuning and embeddings: he was unfamiliar with how embeddings work and worried about user privacy, since either approach might require providing all of the database's information.
Muindemwanzia919 described a similar problem: building a chatbot over data stored in Google Drive. In response, Abhi3hack asked which approach muindemwanzia919 planned to use.
EricGT pointed Abhi3hack to a video about building systems with the ChatGPT API, and Abhi3hack found it relevant to his needs.
Muindemwanzia919 shared their code, which involved accessing a Google Drive and answering queries based on documents stored in it, and asked for suggestions on how it could be improved.
Bil.french acknowledged abhi3hack’s privacy concerns about fine-tuning and embeddings and suggested looking at pgvector, an open-source tool. He emphasized understanding the project before choosing an approach.
Abhi3hack clarified that the goal is to build a chatbot that can answer user queries based on the analytics data in the database, similar to what Hubspot has accomplished.
Nelson provided three potential solutions to overcome the limitations of the Large Language Model (LLM). They included managing the input length, using an embedding data storage system, and fine-tuning the model with custom data. He also mentioned his project, Superinsight, which could be used to create a chatbot.
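The first two of Nelson's suggestions (managing input length and retrieving from an embedding store) can be combined. The sketch below is only an illustration, not code from the thread: `embed` is a toy bag-of-words stand-in for a real embedding model, word count stands in for token count, and the sample rows are hypothetical. It ranks rows by similarity to the question and keeps only as many as fit the budget, so the prompt never contains the whole table.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_context(question, rows, max_words):
    """Pick the most relevant rows that fit within a word budget.

    Word count is a rough proxy for token count (an assumption of this sketch).
    """
    q = embed(question)
    ranked = sorted(rows, key=lambda r: cosine(q, embed(r)), reverse=True)
    chosen, used = [], 0
    for row in ranked:
        n = len(row.split())
        if used + n > max_words:
            break  # budget exhausted; remaining rows are less relevant anyway
        chosen.append(row)
        used += n
    return chosen


# Hypothetical analytics rows, already rendered as text.
rows = [
    "signups in march totaled 1200 users",
    "churn rate in march was 4 percent",
    "server uptime in april was 99 percent",
]

context = select_context("how many signups in march", rows, max_words=14)
prompt = "Answer using only this data:\n" + "\n".join(context)
```

In a real system the toy `embed` would be replaced by an embedding model and the linear scan by a vector index, but the budget check would work the same way.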
Abhi3hack was unclear about Nelson’s suggestions and asked follow-up questions about how to reduce token size effectively, how embeddings work with numerical data, and the user-privacy implications of fine-tuning.
After further discussion of data presentation and privacy, SomebodySysop stressed that abhi3hack needed to understand embeddings and how they differ from fine-tuning, and recommended several resources. SomebodySysop added that the answers to abhi3hack’s questions would make more sense once he understood these concepts.
In later posts, SomebodySysop contrasted keyword search (SQL) with semantic search (AI) and described which requirements each suits, identifying both the advanced capabilities and the limitations of AI.
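The contrast between the two kinds of search can be shown in a few lines. This is a minimal sketch under stated assumptions, not anything posted in the thread: an in-memory SQLite table stands in for the PostgreSQL database, the two notes are invented, and the 2-D vectors are hand-written toy values standing in for what an embedding model would produce.

```python
import math
import sqlite3

# Hypothetical notes; neither contains the word "churn".
docs = [
    (1, "Monthly revenue grew after the pricing change"),
    (2, "Customers cancelled subscriptions in June"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?)", docs)


def keyword_search(term):
    """Keyword search: matches only rows that literally contain the term."""
    cur = conn.execute("SELECT id FROM notes WHERE body LIKE ?", (f"%{term}%",))
    return [row[0] for row in cur.fetchall()]


# Toy vectors (assumed, hand-written): axis 0 ~ "revenue", axis 1 ~ "churn".
toy_vectors = {1: (0.9, 0.1), 2: (0.1, 0.9)}
query_vector = (0.2, 0.8)  # assumed embedding of the query "churn"


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_search(vec):
    """Semantic search: returns the id whose vector is closest in meaning."""
    return max(toy_vectors, key=lambda i: cosine(vec, toy_vectors[i]))


keyword_hits = keyword_search("churn")        # no row contains the word
semantic_hit = semantic_search(query_vector)  # finds the cancellation note
```

The keyword query returns nothing because no row contains "churn", while the vector comparison surfaces the cancellation note. The reverse limitation also holds: exact figures and filters are what SQL handles precisely.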
Summarized with AI on Nov 24 2023
AI used: gpt-4-32k