I’ve written a script that takes a prompt and an input JSONL file and returns JSON output. It’s a solid script, and it worked great with the Completion model.
However, I’m now stuck. I genuinely can’t figure out the issue.
Here’s the error:
```
INFO:root:Loaded 191 instructions
INFO:openai:error_code=None error_message='[{'role': 'system', 'content': 'Your main goal is to enrich …]
{'role': 'user', 'content': 'Instruction: Compose a definition, discuss the purpose of SQL, and explain why it is an industry standard. Create comprehensive examples or real-world use cases if possible.; Input: ; Output: A clear and concise overview of SQL including its application and why it has become an industry standard.'}] is not of type 'object' - 'messages.0'' error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False
```
The code is extensive, and I’m not sure if I should paste it all here.
What I think the issue is:
```python
def encode_conversation(seed_task, system_message):
    # Build the ChatCompletion messages list for a single seed task.
    user_content = f"Instruction: {seed_task['instruction']}; Input: {seed_task.get('input', '')}; Output: {seed_task.get('output', '')}"
    conversation = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_content},
    ]
    return conversation
```
I honestly can’t be sure, though. The entire script passed unit testing; the issue is converting the script from Completion to ChatCompletion, and batching. So I dropped batching. It started to work, but not all the way through. I keep getting these errors about the input being incorrect. I’ve made sure it’s a string, but I still get the same error.
If you can, please post all of the code. Use ``` (backticks) to surround your code; large files are no problem, as the forum will truncate them and create a link.
Also, please include the actual log files and/or the command-line output for a full session, start to finish.
I appreciate the response. I actually rebuilt the code the same night I opened this query. To be totally honest, calling the GPT-4 ChatCompletion endpoint isn’t fun. I’ve figured it out, though.
My one question: I’ve written one script that batches via asyncio and another that uses multiprocessing in Python. The issue is that batching isn’t possible for my use case with an 8k context window. So I’ve been trying to access the 32k context via gpt-4-32k, and the model isn’t found. Is this a normal issue? The documentation shows it as a live upgrade to the 32k 0314 model, but it’s not in the model list pulled from the OpenAI Python tooling, nor is it working in the script.
This means I’m left making single calls for days, at a massive cost.
My context is a minimum of 7,200 tokens per call/response, which nearly maxes out the window, and even when I’ve used dynamic control based on tiktoken, it still isn’t realistically close to enough context space. I’ve never been able to fit more than one request per call.
If the 32k were there, I’d at least be able to 4x my requests.
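For what it’s worth, this is roughly how I’m checking model availability and budgeting tokens. It’s a minimal sketch using the pre-1.0 `openai` Python library and `tiktoken`; the key is a placeholder and the per-message overhead is only an approximation:

```python
import openai
import tiktoken

openai.api_key = "sk-..."  # placeholder

# List the models this API key can actually see; gpt-4-32k only shows up
# here if the account has been granted access to it.
available = {m["id"] for m in openai.Model.list()["data"]}
print("gpt-4-32k available:", "gpt-4-32k" in available)

# Budget tokens against the context window before sending a request.
enc = tiktoken.encoding_for_model("gpt-4")

def prompt_tokens(messages):
    # Rough count: tokens in each message's content plus a small
    # per-message overhead for role/formatting tokens.
    return sum(len(enc.encode(m["content"])) + 4 for m in messages)
```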
The error message in the original post shows an extra closing bracket in the data that was sent, and the JSON validation of the messages failing. Likewise, you should ensure that you are sending a properly constructed list of message objects.
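In other words, each element of `messages` must itself be a single role/content object, and one request carries exactly one conversation. A minimal sketch with the pre-1.0 Python library (model name and key are placeholders):

```python
import openai

openai.api_key = "sk-..."  # placeholder

conversation = [
    {"role": "system", "content": "Your main goal is to enrich ..."},
    {"role": "user", "content": "Instruction: ...; Input: ; Output: ..."},
]

# Passing messages=[conversation] nests a list inside the list, which is
# exactly what triggers the "is not of type 'object' - 'messages.0'" error.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=conversation,  # one conversation, one request
)
print(response["choices"][0]["message"]["content"])
```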
Thanks for the reply. I did figure this out eventually. I’m sorry for the slow reply; I’m not here much.
Any tips on how to secure an invite? I’m having a hard time getting the API spend limit increased, and the same goes for the rate limit increase. I need more access, I genuinely do. The research and work I’m doing just isn’t possible with a $120/mo spend limit.
I’ve adjusted to the 8k context window for now, though.
Thanks for your response. I noticed you’re one of the few here that responds to nearly every query… I figured I’d say thanks on behalf of all of us. We do appreciate it, even if it doesn’t feel like it.
I’ve rewritten the script from scratch. I started out using the Self-Instruct framework, and by extension their generation scripts were the inspiration for the broken script. I then looked into Alpaca’s generation scripts, but they’re convoluted and slow, not to mention they don’t handle the API call/response correctly, nor do they parse the response properly.
I’ve totally rewritten both the main and utility scripts. I’ll open-source it in the near future.
My only tips are:
A) Ensure you’re encoding and decoding the call and response REALLY well. It’s straightforward to send the call, but parsing the response from a chat model, regardless of the prompt, is a nightmare. I’m using JSON, regex (the third-party `regex` module, for recursion via (?R)), and a fallback option that combines both with a ton of parameters; see the sketch after this list. It’s working 100% of the time now, but it wasn’t easy.
B) Synchronous batching beats asynchronous (asyncio) if your expected token count is anywhere near mine (3k per call/response). If you’re using shorter inputs and expecting shorter outputs, asyncio is the way to go; just be prepared to debug for a day. I also recommend using either asyncio.Lock or even FileLock, which is obviously synchronous but worked in testing for writing to disk.
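Here’s the shape of the fallback parsing I’m describing in (A). This is a stripped-down sketch, not my actual function; the real one has far more branches:

```python
import json
import regex  # third-party module; supports recursive patterns like (?R)

# Matches a balanced {...} block, however deeply nested.
JSON_OBJECT = regex.compile(r"\{(?:[^{}]|(?R))*\}")

def parse_model_reply(text):
    # 1. Best case: the reply is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. Fallback: pull out every balanced-brace block the model buried in
    #    prose or code fences and try each one as JSON.
    for match in JSON_OBJECT.finditer(text):
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            continue

    # 3. Last resort: let the caller decide what to do (retry, log, skip).
    return None
```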
Good luck. I’ll drop a link here when I share the workflow.
It’s a chat completion model. My prompts are not only well written; they’re written in ChatML. It’s never made a difference. I’ve even moved the prompts between roles in the call to see if it helped. I’ve given explicit instructions in the prompts. My encoding gives explicit instructions.
I’m making right around 75k calls spread across too many accounts and API keys. Every few responses, one comes back formatted in a way it shouldn’t be. Fortunately, I’ve constructed a parsing function that has handled it all so far, but it’s just not that simple.
I wish we could depend on a structured API response like we could with the Completion endpoint. The old endpoint was straightforward and easy to parse and predict. This ChatCompletion endpoint is not.
I’ve iterated through about 30 different versions in testing. I’ve come to one combination of system and user messages that works about 95% of the time. The other 5% of the time, the response format changes. So I handled it with a MONSTER parsing function that essentially moves through different parsing-logic options until the data is parsed successfully. I’ve yet to lose a single instruction or response.
I think it’s important to keep two things in mind here:
A) I am a professional; I do this for a living. I’m a full-stack engineer with a lot of general ML experience. The dataset generation pipeline I’ve built is meant to work at scale, which in this context works out to about 1.1M finalized instructions, and the input prompt is MASSIVELY complex, which is annoying but necessary. I say that to say: any ground-floor, basic ideas, I’ve tried. The bottom line is that we can’t reliably pin down the response format at scale from the ChatCompletion API yet. The same goes for the LLaMA models and 98% of the fine-tuned open-source versions of LLaMA 2. (The WizardLM team’s models are probably the one exception here, in my experience.)
B) There are two things that would potentially solve this, IMO.
1) Context length increases to 32k (minimum), which would allow for a more explicit instruction without losing the value of the prompt itself.
2) Using the OpenAI function-calling option, which I’ve also tried - and which, it turns out, will be used for something towards the end of the pipeline - but it doesn’t work well for this particular use case (a rough sketch follows below). There are more talented Python developers than me; I imagine they could probably see something I’m not. But I’ve been writing Python for, I don’t know, two decades? I think I started using it regularly around v2.0 in the early 2000s, and I still can’t find a way to make it work effectively for my use case.
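For anyone who hasn’t tried it, the function-call route looks roughly like this. It’s a minimal sketch with the pre-1.0 library; the function name and schema are made up for illustration, and gpt-4-0613 is just the snapshot I’d point at for function calling:

```python
import json
import openai

openai.api_key = "sk-..."  # placeholder

conversation = [
    {"role": "system", "content": "Return the enriched pair via the function call."},
    {"role": "user", "content": "Instruction: ...; Input: ; Output: ..."},
]

# Hypothetical schema for one enriched instruction/response record.
functions = [
    {
        "name": "record_instruction",
        "description": "Return one enriched instruction/response pair.",
        "parameters": {
            "type": "object",
            "properties": {
                "instruction": {"type": "string"},
                "response": {"type": "string"},
            },
            "required": ["instruction", "response"],
        },
    }
]

completion = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=conversation,
    functions=functions,
    function_call={"name": "record_instruction"},  # force this function
)

# The arguments come back as a JSON string, which still has to be parsed
# (and can still occasionally be malformed, hence the fallback parser).
args = json.loads(completion["choices"][0]["message"]["function_call"]["arguments"])
```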
If you have something that’s working for you and are open to sharing it explicitly… I’d love to give it a shot.
I’ll update GitHub in the next week or so with the data used.
Did you get a 400 - … ‘messages.0’ error, from writing the ChatML list of messages wrong?
That’s the topic of the thread.
Take some of your varied failing edge cases, give them the right answer, and multi-shot the input with those examples. See if you can steer the model beyond the system prompt that way, to get better answers and reinforce the output format.
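Something along these lines - a minimal sketch where the two example turns are made-up stand-ins for your own corrected edge cases:

```python
# Few-shot the corrected edge cases as prior user/assistant turns so the
# model sees the exact output format before it answers the real request.
messages = [
    {"role": "system", "content": 'Respond with a single JSON object: {"instruction": ..., "response": ...}'},
    # Example 1: an input that previously produced a badly formatted reply,
    # paired with an answer in exactly the format you want back.
    {"role": "user", "content": "Instruction: Explain what an index is in SQL.; Input: ; Output: "},
    {"role": "assistant", "content": '{"instruction": "Explain what an index is in SQL.", "response": "An index is ..."}'},
    # Example 2: another corrected edge case.
    {"role": "user", "content": "Instruction: Define normalization.; Input: ; Output: "},
    {"role": "assistant", "content": '{"instruction": "Define normalization.", "response": "Normalization is ..."}'},
    # The real request goes last.
    {"role": "user", "content": "Instruction: Compose a definition of SQL and explain why it is an industry standard.; Input: ; Output: "},
]
```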