[Paper] AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Two things from this paper jumped out at me,

  1. With a relatively small fine-tuning corpus of 230M tokens, the 7B-parameter AstroLLaMA was able to generate text completions of quality comparable to GPT-4's.
  2. Using the fine-tuned model as an embeddings model is fascinating and raises some interesting ideas.

With respect to (1), I wonder how impactful an identical fine-tuning process would be applied to gpt-3.5-turbo, with its reported 175B parameters. At current pricing, the same 230M-token job would cost less than $2,000 to run against gpt-3.5-turbo.

With respect to (2), there isn’t any obvious reason (to me, anyway) why OpenAI couldn’t pass back a similar embedding vector[1] for all inputs (and possibly outputs?), which would have interesting implications for HyDE. @curt.kennedy I’d be interested to read your thoughts.


  1. To get text embeddings from AstroLLaMA, we pass an input through the model and extract its final hidden states, which contain embeddings for all tokens in the input. Then, we omit the [BOS] token and take the average of all other tokens’ embeddings to get the final result. ↩︎
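The pooling step the footnote describes is simple to sketch. Below is a minimal, hedged illustration in PyTorch: `mean_pool_embedding` is a hypothetical helper name (not from the paper), and it assumes you have already run the model with hidden-state output enabled and grabbed the final layer's states for one input, shaped `(seq_len, hidden_dim)` with the [BOS] token at position 0.

```python
import torch

def mean_pool_embedding(last_hidden: torch.Tensor) -> torch.Tensor:
    """Embed one input from its final-layer hidden states.

    last_hidden: (seq_len, hidden_dim) tensor, [BOS] at position 0.
    Drops the [BOS] row and averages the remaining token embeddings,
    as described in the AstroLLaMA footnote.
    """
    return last_hidden[1:].mean(dim=0)

# Toy demo: 3 tokens ([BOS] + 2 content tokens), hidden_dim = 2.
hidden = torch.tensor([[10.0, 10.0],   # [BOS] — excluded from the pool
                       [ 1.0,  3.0],
                       [ 3.0,  5.0]])
embedding = mean_pool_embedding(hidden)  # averages rows 1..end
```

In practice (with Hugging Face transformers, assuming a causal LM), `last_hidden` would come from something like `model(**inputs, output_hidden_states=True).hidden_states[-1][0]`.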
