I’m still reading the OpenAI API documentation. But in the meantime, in case anyone is already familiar with it, I was wondering if Embeddings would be a good solution for a feature that detects possible duplicate bug reports.
Specifically, for personal use, I would like to create a tool (for the Blender software page) that reads a user’s bug report and suggests confirmed reports that may indicate the user’s report is a duplicate.
I’ve already created this tool using the Chat Completions API and the following prompt:
You will be provided with a report containing a title and description of a bug,
With the title and description of this bug report, check if it could be a duplicate of any of these last ${issues.length} confirmed reports:
${titlesString}
To clarify, a duplicate report refers to a bug that has already been reported with the same defect and source.
Specify the titles of existing reports that could potentially make this bug report a duplicate.
It works, but it’s a little flawed, and it spends a lot of tokens because the titlesString
list would exceed 500 report titles (I limited it to 300).
To avoid spending even more tokens, I didn’t even provide the report contents.
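For reference, the request itself is just a plain Chat Completions call along these lines (a rough sketch; the model name and message layout are simplified from what I actually use):

// Rough sketch of the Chat Completions request (simplified).
async function checkForDuplicates(prompt, reportTitle, reportDescription, apiKey) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [
        { role: 'system', content: prompt },                                  // the instruction above, with ${titlesString} already inlined
        { role: 'user', content: `${reportTitle}\n\n${reportDescription}` },  // the new report to check
      ],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content; // titles of possible duplicates, as free text
}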
I don’t understand much about Embeddings, so I don’t know yet if it really makes sense to use this API for that. Is it worth investing time in it?
Hi and welcome to the developer forum!
I think this is certainly a task that Embeddings could be useful for. Roughly speaking, they turn the semantic meaning of a group of words into a vector that can be compared to other vectors stored in a database; text with similar semantic meaning is “close” together, so you can pull back “similar” results for a given piece of input text.
There is a section on embeddings over on the main site: OpenAI Platform
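The comparison step itself is usually just cosine similarity between two embedding vectors, e.g. something like this (minimal sketch in JavaScript):

// Cosine similarity between two embedding vectors: 1 = same direction, 0 = unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

Since the ada-002 embeddings come back normalised to length 1, a plain dot product would give the same ranking.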
Happy to give any advice you might need to get up and running.
Thanks for the reply @Foxalabs!
I tried the Embeddings API.
However, I noticed that I still won’t be able to include the descriptions of the reports in the search, because of this error:
This model's maximum context length is 8191 tokens, however you requested 124432 tokens (124432 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
(Considering that I only included 500 of the almost 6 thousand reports, this is really unfeasible.)
However, since embeddings only need to be obtained once, they can apparently serve as a cache and may save some tokens at the end of the day.
This is what I did to use the API:
async function getEmbedTexts(texts) {
  // Read the API key saved by the extension.
  const stored = await new Promise(resolve => chrome.storage.local.get('openai_secret_key', resolve));
  const OPENAI_API_KEY = stored.openai_secret_key;
  if (!OPENAI_API_KEY) {
    console.error('No OpenAI Key.');
    return 'Please enter and save the OpenAI Key first.';
  }

  const apiUrl = 'https://api.openai.com/v1/embeddings';
  const requestOptions = {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: texts,                      // a string or an array of strings to embed in one batch
      model: "text-embedding-ada-002"
    }),
  };

  try {
    const response = await fetch(apiUrl, requestOptions);
    const result = await response.json();
    return result.data;                  // array of { index, embedding } objects
  } catch (error) {
    console.error('Error making request to OpenAI EmbedTexts:', error);
    return 'Sorry, something went wrong. Please try again later.';
  }
}
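To actually suggest duplicates from these embeddings, my plan is roughly to embed the new report once and rank the stored report embeddings by similarity, along these lines (a sketch; cosineSimilarity is the usual dot-product-over-norms helper, and reports is my cached list of { title, embedding } objects):

// Sketch: rank previously embedded reports by similarity to the new report.
async function suggestDuplicates(newReportText, reports, topN = 10) {
  const [newEmbedding] = await getEmbedTexts([newReportText]); // single-item batch
  return reports
    .map(r => ({ title: r.title, score: cosineSimilarity(newEmbedding.embedding, r.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}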
I made a mistake by bundling more than just the description into a report’s title; that error has been fixed now.
However, the number of tokens is still considerable (I will have to improve the cache).
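Something along these lines is what I have in mind for the cache (a hypothetical sketch using chrome.storage.local, keyed by report id, so each title/description pair only gets embedded once; the id, title and description field names are my own assumptions):

// Hypothetical cache: only embed reports that are not already stored locally.
async function getOrEmbedReports(reports) {
  const stored = await new Promise(resolve => chrome.storage.local.get('embeddings_cache', resolve));
  const cache = stored.embeddings_cache || {};
  const missing = reports.filter(r => !cache[r.id]);
  if (missing.length > 0) {
    const embeddings = await getEmbedTexts(missing.map(r => `${r.title}\n${r.description}`));
    missing.forEach((r, i) => { cache[r.id] = embeddings[i].embedding; });
    await new Promise(resolve => chrome.storage.local.set({ embeddings_cache: cache }, resolve));
  }
  return reports.map(r => ({ title: r.title, embedding: cache[r.id] }));
}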