Need help defining a tech stack and getting started

Hello,
I need help choosing a tech stack for an application.
It's an application with an AI Assistant that reads scripts to users. The Assistant should select an appropriate script based on the user's input.
For development and testing I want to build outside of OpenAI to save some budget.
To build a demo, I'm thinking of starting with either Mistral-7B-Instruct-v0.2 or a Llama 2 model, but I'm open to suggestions.
What is the best place to host an open-source LLM and build such an application? Which tools should I use?
Can someone please point me in the right direction?

Thank you!

OpenAI is much better than self-hosting for small projects, because you don’t have to allocate 24/7 GPU resources that cost an arm and a leg to actually run.

On LambdaLabs you can rent an A10 (older generation, 24 GB of memory, so sufficient for a 7B model at 16-bit or a 13B model at 8-bit) for 75 cents an hour, which works out to about $540 per month if it runs around the clock. You get a lot of API calls for that amount out of the OpenAI API, and their models are much better.

If you’re not planning on having anything available online while developing, and are just running your own on a local workstation, then get an RTX 4060 Ti 16 GB card, stick it in a Linux desktop workstation with a Ryzen 7600 CPU, and run on that. You can easily put this together for about $1500. Either of the models you suggest will “do the thing,” but obviously not as well as the bigger models.

But, really, I highly recommend using some hosted API where you pay per token/API call, because it will actually be cheaper in the beginning, and the quality is better. And if the model just chooses a script, and the script is “canned,” there’s no need to have the model output the full script – just output a reference to the script, and include it from a file in your web app, to save on tokens generated.
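To make that concrete, here is a minimal sketch of the "return a reference, not the full script" idea. It assumes the OpenAI Python SDK (v1+), a placeholder model name, and a folder of plain-text scripts; treat it as an illustration, not a drop-in implementation.

```python
# Sketch: the model picks a script ID, your app loads the canned text locally.
# Assumptions: OpenAI Python SDK v1+, OPENAI_API_KEY in the environment,
# "gpt-4o-mini" as a placeholder model, scripts stored as scripts/<id>.txt.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

SCRIPTS_DIR = Path("scripts")  # e.g. scripts/onboarding.txt, scripts/billing.txt
SCRIPT_IDS = [p.stem for p in SCRIPTS_DIR.glob("*.txt")]

def pick_script(user_input: str) -> str:
    """Ask the model only for a script ID, then read the full script from disk."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Pick the best script for the user. "
                        f"Reply with exactly one ID from this list: {', '.join(SCRIPT_IDS)}"},
            {"role": "user", "content": user_input},
        ],
    )
    script_id = resp.choices[0].message.content.strip()
    # The canned script is never generated by the model, so you only pay
    # for the handful of tokens in the ID.
    return (SCRIPTS_DIR / f"{script_id}.txt").read_text()
```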


Hi Jon,
Thank you for the advice.
If I start with OpenAI, which API should I choose?
Also, the model will need access to a database to retrieve the user's information and scripts. How does that work?

Thanks!

I’d recommend using Python and LangChain; there are plenty of tutorials on YouTube. With LangChain you can plug in different LLMs without having to rewrite your code, and you can connect to either cloud APIs or locally run models.
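As a rough sketch of what that swap looks like with a recent LangChain release (the package names, model identifiers, and the use of Ollama to serve a local Mistral 7B are all assumptions, so adjust to your setup):

```python
# Sketch: same chain, swappable model backend (hosted API vs. local).
from langchain_openai import ChatOpenAI
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "Pick the best script ID for the user from: {script_ids}"),
    ("human", "{user_input}"),
])

# Swap the model without touching the rest of the chain:
llm = ChatOpenAI(model="gpt-4o-mini")   # hosted API (placeholder model name)
# llm = ChatOllama(model="mistral")     # local Mistral 7B served by Ollama

chain = prompt | llm | StrOutputParser()

script_id = chain.invoke({
    "script_ids": "onboarding, billing, cancellation",
    "user_input": "I was charged twice this month",
})
print(script_id)
```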
