On it! Thanks for the feedback.
Hey Bill, awesome, thanks for the explanation. You mentioned PaLM 2. I searched and noticed it's available through the Google Vertex AI API, right?
I'm having an issue trying to translate text from English to Portuguese. PaLM 2 is supposed to work in Portuguese, but when I try to translate, I get the following message:
I am trained to understand and respond only to a subset of languages at this time and can't provide assistance with that.
So now my question is whether this LLM really works in Portuguese.
Bard uses PaLM 2. Try your prompt there to see how it responds.
The best thing you could do to “speed up” the API calls for your users is to stream the output rather than waiting for it to complete.
Just like using the ChatGPT interface. The query may take a while, but from the user's perspective they see it spitting out word by word, which is a better experience.
How would I be able to do that just from the API call?
Agree with Justin here – the streaming option works great from a UX perspective. So rather than trying to speed up the API call (which probably won't happen even if you switch LLMs), change the UX if you can.
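To make the streaming idea concrete, here's a minimal Python sketch of the pattern: instead of waiting for the full completion, the client consumes the response chunk by chunk and renders each piece as it arrives. The `fake_completion_stream` generator below is a stand-in for a real streaming API response (most LLM APIs expose this via a `stream`-style option that yields partial output); all names here are illustrative, not a specific provider's API.

```python
def fake_completion_stream():
    # Stand-in for a streaming LLM response: yields one token at a time,
    # the way a real streaming endpoint delivers partial output.
    for token in ["The", " answer", " is", " 42", "."]:
        yield token

def stream_to_user(chunks):
    # Display each chunk the moment it arrives, so the user sees
    # word-by-word output instead of one long wait for the full reply.
    text = ""
    for chunk in chunks:
        print(chunk, end="", flush=True)  # render incrementally
        text += chunk
    print()
    return text

result = stream_to_user(fake_completion_stream())
```

The total time to the last token is unchanged; what improves is time-to-first-token, which is what the user actually perceives.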
Hey, first of all, thanks for opening my eyes. How did you build a dashboard like this?
Coda.
Ah ok, thanks very much. Can I ask what software you used for the vector embeddings and in-memory caches?
Yes - I created vectors using OpenAI in an automated process with Google Apps Script. The resulting vectors are stored in a spreadsheet. The script uses a dot product function to perform similarity queries, which I can perform internally to the script itself or as a web service.
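The similarity-query step described above can be sketched in a few lines: store one embedding vector per document, then rank documents by their dot product against the query's vector. This is a Python illustration of the idea, not the actual Apps Script; the tiny 3-dimensional vectors are made up for the example (real embeddings from an API would have hundreds or thousands of dimensions, and for unit-length embeddings the dot product equals cosine similarity).

```python
def dot(a, b):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def most_similar(query_vec, stored):
    # stored maps doc id -> embedding vector (e.g. rows of a spreadsheet).
    # Returns doc ids ranked from most to least similar to the query.
    return sorted(stored, key=lambda doc_id: dot(query_vec, stored[doc_id]),
                  reverse=True)

# Hypothetical example data: two stored documents and one query vector.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.1],
}
query = [1.0, 0.0, 0.0]
ranking = most_similar(query, docs)  # doc_a scores 0.9, doc_b scores 0.1
```

Serving this as a web service is then just a matter of wrapping `most_similar` behind an HTTP endpoint that embeds the incoming query first.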
