hey guys, I am working with the LangChain wrapper to query GPT-4. Because of the cost incurred, I want to implement batching.
However, I don't know the right way to implement it.
My approach: I created a prompt template, let's say data_creation_template,
and then did data_creation_template = data_creation_template * 10
(to process 10 records at a time).
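To make the repetition concrete, here is a minimal plain-Python sketch (no LangChain; the placeholder names are illustrative) of what multiplying a template string does. Note that plain `*` duplicates the same placeholder names, so all ten copies would be filled with the same record unless each copy gets its own variables, e.g. an index suffix:

```python
# Sketch only: plain-Python string repetition, no LangChain involved.
single = "Term: {term}\nSynonyms: {synonym_list}\n"

# Plain repetition duplicates the SAME placeholders 10 times, so a single
# substitution fills every copy with the same record:
repeated = single * 10
assert repeated.count("{term}") == 10

# To batch 10 distinct records into one prompt, each copy needs unique
# placeholder names (index suffixes are one illustrative option):
indexed = "".join(
    f"Term: {{term_{i}}}\nSynonyms: {{synonym_list_{i}}}\n"
    for i in range(10)
)
assert indexed.count("{term_0}") == 1  # each record has its own variables
```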
With that, data_creation_prompt looks like this:
data_creation_prompt = PromptTemplate(
    input_variables=["term", "synonym_list"],
    partial_variables={"format_instructions": variation_parser.get_format_instructions()},
    template=data_creation_template,
)
a) Is this the correct approach?
b) If yes: doing it this way, I observe a processing time of ~60 s/it, which works out to around 96 hrs for 5-6k records. When I passed only one record at a time, it was ~2 s/it. Has anyone observed such a difference in completion time (roughly 5 hrs vs 96 hrs for this many records, without and with batching respectively)?
c) If no, could someone please let me know the right way of doing batching?
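For reference, the unbatched path I benchmarked (the 2 s/it case) looks roughly like the sketch below, with a stub standing in for the real GPT-4 call; `call_model`, its return value, and the sample records are all placeholders, not my actual chain. One thing I'm wondering is whether running these per-record requests concurrently would already help, since the per-request latency would then overlap instead of adding up sequentially:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub in place of the real GPT-4 call; in the actual code this would
# invoke the chain with {"term": ..., "synonym_list": ...}.
def call_model(record):
    term, synonyms = record
    return f"variations for {term}"

# Illustrative sample data, not my real records.
records = [("fast", ["quick", "rapid"]), ("slow", ["sluggish"])]

# One request per record, issued concurrently from a thread pool so the
# network latency of separate calls overlaps.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(call_model, records))

print(results)
```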