Got it, so maybe a ‘hybrid’ approach? That is, encode each code snippet's class/function/interface name, parameter names, and docstring as a ‘syntactic’ embedding, and then use code2seq or something similar to generate embeddings from its AST paths, capturing the ‘semantic’ meaning as well. Then, whatever the user prompts (a textual description or code), I can generate an embedding from the prompt and check whether the similarity search returns relevant code snippets. Does this make sense?
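A rough sketch of how such a hybrid could be wired together, assuming the indexed snippets are Python source. Everything here is illustrative and the names are hypothetical: `hashed_vector` is a toy hashed bag-of-tokens standing in for a real embedding model, and `path_contexts` is a heavily simplified imitation of code2seq's leaf-to-leaf AST path contexts (code2seq itself learns to encode these paths with a neural network; nothing is learned here). The two similarity scores are blended with an assumed weight `alpha`. A code prompt is embedded both ways, while a plain-text prompt only gets the text-style vector, so it is ranked on that view alone.

```python
import ast
import math
import re
import zlib
from itertools import combinations, islice

DIM = 1024  # toy size; a real embedding model would replace hashed_vector
DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def split_identifier(name):
    """Split snake_case / camelCase identifiers into lowercase subtokens."""
    return [p.lower() for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]

def tokenize(text):
    return [t for word in re.findall(r"\w+", text) for t in split_identifier(word)]

def hashed_vector(tokens, dim=DIM):
    """Assumed toy stand-in for a learned embedding: L2-normalized hashed bag of tokens."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # inputs are already unit length

def syntactic_tokens(node):
    """'Syntactic' view: definition name + parameter names + docstring."""
    tokens = split_identifier(node.name)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        a = node.args
        for arg in a.posonlyargs + a.args + a.kwonlyargs:
            tokens += split_identifier(arg.arg)
    return tokens + tokenize(ast.get_docstring(node) or "")

def path_contexts(node, max_pairs=200):
    """'Semantic' view: hypothetical, much simplified code2seq-style leaf-to-leaf AST paths."""
    leaves = []  # (terminal, [nodes from the root down to the leaf])
    def visit(n, stack):
        stack = stack + [n]
        field = {ast.Name: "id", ast.arg: "arg", ast.Attribute: "attr"}.get(type(n))
        if field:
            leaves.append((getattr(n, field).lower(), stack))
        for child in ast.iter_child_nodes(n):
            visit(child, stack)
    visit(node, [])
    contexts = []
    for (ta, sa), (tb, sb) in islice(combinations(leaves, 2), max_pairs):
        i = 0  # shared prefix length; sa[i - 1] is the lowest common ancestor
        while i < min(len(sa), len(sb)) and sa[i] is sb[i]:
            i += 1
        up = [type(x).__name__ for x in reversed(sa[i:])] + [type(sa[i - 1]).__name__]
        down = [type(x).__name__ for x in sb[i:]]
        path = "^".join(up) + ("_" + "_".join(down) if down else "")
        # Keep both the bare structural path and the (terminal, path, terminal) triple.
        contexts += ["path:" + path, f"ctx:{ta}|{path}|{tb}"]
    return contexts

def index_snippets(source):
    """Embed every top-level function/class twice: syntactic and semantic."""
    return [(n.name, hashed_vector(syntactic_tokens(n)), hashed_vector(path_contexts(n)))
            for n in ast.parse(source).body if isinstance(n, DEFS)]

def search(index, query, alpha=0.5, top_k=3):
    """Rank by alpha * syntactic + (1 - alpha) * semantic cosine similarity."""
    try:
        defs = [n for n in ast.parse(query).body if isinstance(n, DEFS)]
    except SyntaxError:
        defs = []
    if defs:  # code prompt: embed it both ways
        q_syn = hashed_vector(syntactic_tokens(defs[0]))
        q_sem = hashed_vector(path_contexts(defs[0]))
    else:  # text prompt: only the text-style vector applies
        q_syn, q_sem, alpha = hashed_vector(tokenize(query)), None, 1.0
    scored = []
    for name, syn, sem in index:
        score = alpha * cosine(q_syn, syn)
        if q_sem is not None:
            score += (1 - alpha) * cosine(q_sem, sem)
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

SNIPPETS = '''
def load_json_config(path):
    """Read a JSON config file and return it as a dict."""
    with open(path) as f:
        return json.load(f)

def mean(values):
    """Compute the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)
'''

index = index_snippets(SNIPPETS)
print(search(index, "read json config file", top_k=1))
print(search(index, "def average(nums):\n    return sum(nums) / len(nums)", top_k=1))
```

The second query shares no names with `mean`, so it can only match through the AST-path view. Swapping `hashed_vector` for an actual text embedding model on the syntactic side and a trained code2seq-style encoder on the semantic side is where the real work lies; the weighted sum is only one simple way to merge the two scores (retrieving from each index separately and re-ranking is another).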
mmark