[Paper] AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Two things from this paper jumped out at me,

  1. With a relatively small fine-tuning corpus of 230M tokens, the 7B-parameter AstroLLaMA was able to generate text completions of quality comparable to GPT-4's.
  2. Using the fine-tuned model as an embeddings model is fascinating and raises some interesting ideas.

With respect to (1), I wonder how impactful an identical fine-tuning process would be applied to gpt-3.5-turbo, with its reported 175B parameters. At current pricing, the same 230M-token job would cost less than $2,000 to run against gpt-3.5-turbo.

With respect to (2), there isn’t any obvious reason (to me, anyway) why OpenAI couldn’t pass back a similar embedding vector[1] for all inputs (and possibly outputs?), which would have interesting implications for HyDE. @curt.kennedy I’d be interested to read your thoughts.


  1. To get text embeddings from AstroLLaMA, we pass an input through the model and extract its final hidden states, which contain embeddings for all tokens in the input. Then, we omit the [BOS] token and take the average of all other tokens’ embeddings to get the final result. ↩︎
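The pooling step the footnote describes is simple to sketch. Below is a minimal, hedged illustration in PyTorch: `mean_pool_embedding` is a hypothetical helper name (not from the paper), and it assumes you have already run the model with hidden-state output enabled and grabbed the final layer's states for one input, shaped `(seq_len, hidden_dim)` with the [BOS] token at position 0.

```python
import torch

def mean_pool_embedding(last_hidden: torch.Tensor) -> torch.Tensor:
    """Embed one input from its final-layer hidden states.

    last_hidden: (seq_len, hidden_dim) tensor, [BOS] at position 0.
    Drops the [BOS] row and averages the remaining token embeddings,
    as described in the AstroLLaMA footnote.
    """
    return last_hidden[1:].mean(dim=0)

# Toy demo: 3 tokens ([BOS] + 2 content tokens), hidden_dim = 2.
hidden = torch.tensor([[10.0, 10.0],   # [BOS] — excluded from the pool
                       [ 1.0,  3.0],
                       [ 3.0,  5.0]])
embedding = mean_pool_embedding(hidden)  # averages rows 1..end
```

In practice (with Hugging Face transformers, assuming a causal LM), `last_hidden` would come from something like `model(**inputs, output_hidden_states=True).hidden_states[-1][0]`.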
