Open AI embedding with 3072 dimensions

cliff4 · July 2, 2024, 5:50am

@RonaldGRuckus
I tried so many methods to perfect my retrieval algorithm but I never tried 0% overlap. Will definitely give it a shot. Although I am indeed trying to automate the embedding process for technical API documentation retrieval, I do have complete control over the API references I provide as docs.

_j · July 2, 2024, 7:04am

You don’t have to be bound by “rules”.

You can embed the chunks split at divisions, but then the vector database can provide text that goes beyond the embedded boundary.

You can promote in-document by adjacency, and decide when on-topic context should be neighboring chunks concatenated.

You can rebuild documents out of the ordered (not ranked) chunks of relevancy, and give the document a pregenerated AI summary.

You can use your imagination.
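As a rough illustration of the adjacency and "ordered, not ranked" ideas above, here is a minimal sketch. It assumes each chunk is stored with its document id and its position inside that document; the names `expand_hits`, `hits`, `chunks`, and `window` are hypothetical and not tied to any particular vector database.

```python
from collections import defaultdict

def expand_hits(hits, chunks, window=1):
    """Pull in neighbouring chunks for each hit and rebuild text in document order.

    hits:   list of (doc_id, chunk_index) pairs, in ranked order (assumed shape)
    chunks: dict mapping doc_id -> list of chunk texts in original order
    window: how many neighbours to include on each side of a hit
    """
    wanted = defaultdict(set)
    for doc_id, idx in hits:
        last = len(chunks[doc_id]) - 1
        # Clamp the neighbourhood to the bounds of the document.
        for j in range(max(0, idx - window), min(last, idx + window) + 1):
            wanted[doc_id].add(j)
    # Documents keep the order of their best hit; chunks inside each
    # document are concatenated in original (not ranked) order.
    return {
        doc_id: " ".join(chunks[doc_id][j] for j in sorted(idxs))
        for doc_id, idxs in wanted.items()
    }

chunks = {"a": ["A0", "A1", "A2", "A3"], "b": ["B0", "B1"]}
hits = [("a", 2), ("b", 0), ("a", 0)]
rebuilt = expand_hits(hits, chunks, window=1)
```

The embedded text stays small and sharply scoped, while the text handed to the model can be wider than what was embedded. A pregenerated summary per document could be prepended to each rebuilt string in the same place.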

joyasree78 · July 2, 2024, 1:15pm

I agree with the no-overlap part. I initially used overlap and it was causing issues, so I moved to 300-token chunks (no overlap) followed by parent retrieval. The problem is that I still have context overlap among the chunks.
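A minimal sketch of that setup, for anyone following along: fixed-size chunks with no overlap, each pointing back to its parent, and matched chunks mapped back to unique parents. Whitespace splitting stands in for a real tokenizer here, and `split_no_overlap`, `build_index`, and `parents_for` are hypothetical helper names, not a specific library's API.

```python
def split_no_overlap(tokens, size=300):
    """Cut a token list into consecutive windows that share no tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def build_index(parents, size=300):
    """parents: dict parent_id -> text. Returns a list of (parent_id, chunk_text)."""
    index = []
    for pid, text in parents.items():
        # Assumption: whitespace tokens approximate model tokens.
        for piece in split_no_overlap(text.split(), size):
            index.append((pid, " ".join(piece)))
    return index

def parents_for(matched_positions, index, parents):
    """Map matched child chunks back to their parents, deduplicated, in rank order."""
    seen = []
    for pos in matched_positions:
        pid = index[pos][0]
        if pid not in seen:
            seen.append(pid)
    return [parents[pid] for pid in seen]

parents = {"p1": "a b c d e", "p2": "f g"}
index = build_index(parents, size=2)
# Pretend the vector search matched child chunks 1, 3 and 0, in that order.
context = parents_for([1, 3, 0], index, parents)
```

Deduplicating by parent means two matching chunks from the same parent return that parent only once. It does not help when different parents cover the same material, which may be the overlap described here.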

SomebodySysop · July 4, 2024, 5:04am

No overlap required in this approach. Just atomic ideas: Using gpt-4 API to Semantically Chunk Documents - #166 by SomebodySysop

Also, check out the @sergeliatko approach: Using gpt-4 API to Semantically Chunk Documents - #10 by sergeliatko

sergeliatko · July 4, 2024, 8:21am

Great words. Even the sky is not the limit. Personally, I try to stay as close as possible to what the business logic requires to be accessible via RAG.
