I am trying to run Q/A using embeddings, as recommended by OpenAI in "Question answering using embeddings-based search" (OpenAI Cookbook).
I am using the text-embedding-ada-002 model for the embeddings.
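For reference, the embedding step of that workflow can be sketched with the official openai Python package (v1.x interface, client.embeddings.create). This is a minimal sketch, not the Cookbook's exact code. The helper name embed_texts is hypothetical, and the example assumes openai>=1.0 is installed and an OPENAI_API_KEY environment variable is set.

```python
from typing import List, Sequence

# Model name as listed in OpenAI's embeddings documentation.
MODEL = "text-embedding-ada-002"


def embed_texts(client, texts: Sequence[str], model: str = MODEL) -> List[List[float]]:
    """Return one embedding vector per input string, in input order.

    `client` is assumed to be an `openai.OpenAI` instance (openai>=1.0).
    `client.embeddings.create` returns a response whose `.data` items
    each carry an `.index` and an `.embedding`.
    """
    response = client.embeddings.create(model=model, input=list(texts))
    # Sort by index so the returned vectors line up with `texts`.
    items = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in items]


# Example usage (requires `pip install openai` and OPENAI_API_KEY):
#   from openai import OpenAI
#   client = OpenAI()
#   [question_vec] = embed_texts(client, ["What do we know about the author?"])
#   paragraph_vecs = embed_texts(client, paragraphs)  # hypothetical list of strings
```

Embedding the question and every paragraph with the same model is what keeps the vectors comparable in the first place.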
I couldn’t really get this working.
I have a context paragraph that contains the answer, yet its embedding is not close to the question's embedding.
For example, here is the context paragraph:
ABOUT THE AUTHOR The common thread running through Allen Carr’s work is the removal of fear. Indeed, his genius lies in eliminating the phobias and anxieties which prevent people from being able to enjoy life to the full, as his bestselling books Allen Carr’s Easy Way to Stop Smoking, The Only Way to Stop Smoking Permanently, Allen Carr’s Easyweigh to Lose Weight, How to Stop Your Child Smoking, and now The Easy Way to Enjoy Flying, vividly demonstrate. A successful accountant, Allen Carr’s hundred-cigarettes-a-day addiction was driving him to despair until, in 1983, after countless failed attempts to quit, he finally discovered what the world had been waiting for: the Easy Way to Stop Smoking. He has now built a network of clinics that span the globe and has a phenomenal reputation for success in helping smokers to quit. His books have been published in over twenty different languages and video, audio and CD-ROM versions of his method are also available. Tens of thousands of people have attended Allen Carr’s clinics where, with a success rate of over 95%, he guarantees that you will find it easy to quit smoking or your money back. A full list of clinics appears in the back of this book. Should you require any assistance do not hesitate to contact your nearest therapist. Weight-control sessions are now offered at a selection of these clinics. A full corporate service is also available enabling companies to implement no-smoking policies simply and effectively. All correspondence and enquiries about ALLEN CARR’S BOOKS, VIDEOS, AUDIO TAPES AND CD-ROMS should be addressed to the London Clinic.
And the question is: "What do we know about the author? What is his background?"
I tried computing similarity with Distance.cosine, Distance.Manhattan and Distance.Euclidean. With all three, ranking every paragraph against the question put irrelevant paragraphs at the top, and the most relevant paragraph got a low similarity score. For example, the paragraph above ranked 155th out of 163 paragraphs.
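The comparison described above (score every paragraph against the question, then sort) can be sketched in plain Python. The function names are hypothetical stand-ins, not the Distance.* helpers mentioned above, and the sketch assumes the question and paragraph vectors come from the same embedding model. Note the sort direction: cosine is a similarity (higher is better), while Euclidean and Manhattan are distances (lower is better), so sorting them the same way would invert the ranking.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Similarity in [-1, 1]; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences; lower means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_paragraphs(
    query: Vector, paragraphs: Sequence[Vector], metric: str = "cosine"
) -> List[Tuple[int, float]]:
    """Return (paragraph_index, score) pairs, best match first.

    Raises KeyError for an unknown metric name.
    """
    if metric == "cosine":
        scored = [(i, cosine_similarity(query, p)) for i, p in enumerate(paragraphs)]
        # Similarity: larger scores first.
        return sorted(scored, key=lambda pair: pair[1], reverse=True)
    distance_fns: Dict[str, Callable[[Vector, Vector], float]] = {
        "euclidean": euclidean_distance,
        "manhattan": manhattan_distance,
    }
    fn = distance_fns[metric]
    scored = [(i, fn(query, p)) for i, p in enumerate(paragraphs)]
    # Distance: smaller scores first.
    return sorted(scored, key=lambda pair: pair[1])


def rank_of(target_index: int, ranking: Sequence[Tuple[int, float]]) -> int:
    """1-based position of a paragraph in a ranking (e.g. 155 of 163)."""
    for position, (i, _) in enumerate(ranking, start=1):
        if i == target_index:
            return position
    raise ValueError(f"paragraph {target_index} is not in the ranking")
```

OpenAI's documentation describes ada-002 embeddings as normalized to length 1. For unit vectors the squared Euclidean distance equals 2 - 2 × cosine similarity, so cosine and Euclidean produce the same ordering and switching between them is not expected to change which paragraph comes first; Manhattan distance can order results slightly differently.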
Any ideas?