How to generate embeddings from jsdocs?

photonstorm · August 7, 2023, 2:58pm

Would really like some suggestions or pointers on the best way to generate embeddings from our jsdocs. The end goal is to provide a chatbot to our users that understands our documentation / API fully.

We could chunk and build embeddings from the HTML that our docs build generates, of course, but I was curious if there is a better approach, perhaps using the JSON / AST that is generated as part of the jsdoc process instead?

Cheers,

Rich

Foxalabs · August 7, 2023, 3:20pm

Hi and welcome to the forum!

The best way to embed data is a large topic. Typically, the more closely the embedded data matches what it will be searched against, the better. If you put JSON structures into your embeddings, be aware that you will be paying for and encoding a great deal of extra syntax. If you expect your searches to be in that format, then it’s certainly worth trying.
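
To illustrate the extra-syntax point, here is a minimal sketch that flattens one jsdoc doclet into plain text before embedding. It assumes the JSON comes from something like `jsdoc -X` and that each doclet carries fields such as `kind`, `longname`, `description`, `params` and `returns`; check your own output, as field names may differ. The helper name `doclet_to_text` and the sample doclet are hypothetical.

```python
import json

def doclet_to_text(doclet: dict) -> str:
    """Flatten one jsdoc doclet (a dict) into prose-like plain text."""
    # Field names below are assumptions about the doclet shape.
    lines = [f"{doclet.get('kind', 'symbol')} {doclet.get('longname', '')}".strip()]
    if doclet.get("description"):
        lines.append(doclet["description"])
    for p in doclet.get("params", []):
        types = " | ".join(p.get("type", {}).get("names", []))
        lines.append(f"Parameter {p.get('name', '')} ({types}): {p.get('description', '')}".strip())
    for r in doclet.get("returns", []):
        types = " | ".join(r.get("type", {}).get("names", []))
        lines.append(f"Returns ({types}): {r.get('description', '')}".strip())
    return "\n".join(lines)

# Hypothetical doclet, made up for illustration only.
sample = {
    "kind": "function",
    "longname": "Example#setAlpha",
    "description": "Sets the alpha value of the object.",
    "params": [{"name": "value", "type": {"names": ["number"]},
                "description": "Alpha between 0 and 1."}],
    "returns": [{"type": {"names": ["Example"]}, "description": "This object."}],
}

text = doclet_to_text(sample)
raw = json.dumps(sample)
print(text)
print(len(text), "characters flattened vs", len(raw), "as raw JSON")
```

Whether the flattened form actually retrieves better than raw JSON depends on how your users phrase their questions, so it is worth comparing both on a small sample.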

Unlike fine-tuning, embeddings do not need large amounts of data to become useful, so you can embed a small subsection as a test case and evaluate it, which shortens the iteration loop.
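
One way to evaluate a small test set is to write a handful of questions, note which chunk should answer each, and measure how often that chunk ranks first. The sketch below is only an illustration: `toy_embed` is a bag-of-words stand-in so the example runs without any API, and you would swap in your real embedding call. The chunk ids, chunk texts and questions are made up.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding call: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hit_rate(chunks: dict, questions: list, embed=toy_embed) -> float:
    """Fraction of questions whose expected chunk is the top-ranked match."""
    vectors = {cid: embed(text) for cid, text in chunks.items()}
    hits = 0
    for question, expected_id in questions:
        q = embed(question)
        best = max(vectors, key=lambda cid: cosine(q, vectors[cid]))
        hits += best == expected_id
    return hits / len(questions)

# Hypothetical chunks and questions for a tiny test set.
chunks = {
    "setAlpha": "setAlpha sets the alpha transparency of the sprite",
    "destroy": "destroy removes the sprite from the scene and frees memory",
}
questions = [
    ("how do I change sprite transparency", "setAlpha"),
    ("how do I remove a sprite from the scene", "destroy"),
]
score = hit_rate(chunks, questions)
print(f"hit rate: {score:.2f}")
```

Running the same questions against differently prepared chunks (raw JSON, flattened text, HTML) gives a rough but quick comparison between approaches.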

As this entire field is very new, there are still best practices to be found and areas where trial and error is worthwhile.

There are some key points. One is overlapping data, where each chunk contains a percentage of the chunk before and after it so that relevancy carries across chunk boundaries. Another is keeping your input data consistent across chunks, i.e. not switching between JSON-serialised text and plain text.
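
The overlap idea can be sketched as a sliding window. This minimal version splits on words rather than tokens to stay dependency-free, which is an assumption; a real pipeline would probably count tokens with the model's tokenizer. The function name and the sizes are illustrative only.

```python
def chunk_with_overlap(words: list, size: int, overlap: int) -> list:
    """Split words into chunks of `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reached the end
    return chunks

words = "one two three four five six seven eight nine ten".split()
chunks = chunk_with_overlap(words, size=4, overlap=2)
for c in chunks:
    print(" ".join(c))
```

Here each chunk repeats the last two words of the chunk before it, so a sentence that straddles a boundary still appears whole in at least one chunk.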

As embedding tokens are very cheap ($0.0002 per 1k), it’s worth creating some R&D sets to get the best results for your corpus.
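
For a rough sense of scale, the arithmetic looks like this. The price default is the figure quoted above (check current pricing before relying on it), and the 5 million token corpus size is a made-up example.

```python
def estimate_embedding_cost(total_tokens: int, price_per_1k: float = 0.0002) -> float:
    """Estimated cost in dollars to embed `total_tokens` at a per-1k-token price."""
    return total_tokens / 1000 * price_per_1k

# Hypothetical corpus size, for illustration only.
cost = estimate_embedding_cost(5_000_000)
print(f"${cost:.2f}")  # prints $1.00
```

At that rate, re-embedding the whole corpus several times while experimenting with chunking strategies costs only a few dollars.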
