Finetuning via API issues with JSONL

Every time I send my JSONL file through the API it’s giving me this error:

However, the file itself doesn’t have any of that.

When I open and look at the file, it shows a perfectly fine JSONL file

I’m kind of going insane at this point trying to figure it out.

Any help would be amazing.

Can you share the following:

  1. The code used to call the fine-tune endpoint
  2. Line 1 of the JSONL file

Also look at:

Seconding this post. The error is nonsense; I ran into the same thing, and the formatting is completely fine.

The error shows a doubly nested dictionary. Is that the case?

I’ll take a look at those right now - thank you.

I’m using Bubble. The request is sent as a form, with the file uploaded directly and sent along with it.

Line 1:

{“prompt”: “Old way/new way, how to build a web3 community###”, “completion”: “How to build a web3 community\n\nOld (2021) way:\n\n- Mint useless NFTs that have no underlying utility\n- Create a DAO yet hold the majority of governance tokens\n- Use snapshot to give a sense of decentralized decision-making\n- Leverage fakes and bots to prop up the numbers and show traction\n- Pay Hollywood celebrities to promote your project and be the lead voices\n\nNew (2023) way:\n\n- Mint an NFT collection that has real-life value \n- Distribute value and ownership in the community equitably\n- Make the decision-making process transparent for everyone\n- Build an army of dedicated early adopters to act as your ambassadors\n- Let established brands and well-respected web3 natives be your lead voice\n\n100 early ambassadors > 1000 disengaged members[END-PROMPT]”}

There’s definitely an error in the JSON. I ran it through a validator.

That validator is suspect. I used it and kept getting nonsense errors too.

Firstly, it spills the JSON across multiple lines. This is JSONL, meaning each line is its own dict. That validator reformats the content and therefore breaks the JSONL format.

Secondly, that is a JSON validator, not a JSON Lines validator. While I don’t expect a huge difference, the error in question is flagging a JSONL formatting problem, not a JSON formatting problem.

Every line in a JSONL file must be a valid JSON object.
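To check that rule locally rather than with a general JSON validator, here is a minimal sketch that parses each line on its own. The function name validate_jsonl and the sample lines are made up for illustration, and treating blank lines as problems is an assumption rather than something the API documentation in this thread confirms.

```python
import json

def validate_jsonl(lines):
    """Return a list of (line_number, message) for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            # Assumption: a blank line is reported rather than silently skipped
            problems.append((number, "blank line"))
            continue
        try:
            value = json.loads(text)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems

# Hypothetical sample: one good line, one with curly quotes, one JSON array
sample = [
    '{"prompt": "Hello###", "completion": " World[END]"}\n',
    '{“prompt”: “curly quotes”, “completion”: “break parsing”}\n',
    '["valid JSON", "but not an object"]\n',
]
for number, message in validate_jsonl(sample):
    print(f"line {number}: {message}")
```

To check a real file, pass the open file object instead of the sample list, for example validate_jsonl(open("output_file.jsonl", encoding="utf-8")).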

Did you write the JSONL yourself or are you using a tool to create the JSONL?

To create a JSONL you can use this code:

import json

# Placeholder data: replace with your own (prompt, completion) pairs
examples = [
    ("some prompt here", "some completion here"),
]

with open("output_file.jsonl", "w") as output_file:
    for some_prompt_here, some_completion_here in examples:
        prompt_text = f"{some_prompt_here}"
        ideal_generated_text = f"{some_completion_here}"

        # Create the JSON object
        json_obj = {
            "prompt": prompt_text,
            "completion": ideal_generated_text
        }

        # Write the JSON object as a line in the JSONL file
        output_file.write(json.dumps(json_obj) + "\n")

That’s it.

Hope this helps.

Juan

Has anyone fine-tuned davinci, and has it been useful?

I’ve trained davinci several times with different content, and so far I have not been able to get the model to draw on the data it was fine-tuned with.

I ask questions about the subject I trained it on, but nothing relevant comes back.

I have done the same using embeddings with great success, but I am trying to understand fine-tuning. What is it good for? How is it used?


Actually the great @daveshapautomator came to the rescue with the answer to my question.

If someone has finetuned a model and doesn’t know how to use it, like me, here’s the answer:

Thanks @daveshapautomator !!!

Juan

Ok, this is awesome.

I will have to mess around with this.

I guess I’ll have to do some recursive workflow on the backend to go through the list of prompt/completions??

Or else, how would I be able to directly input the 100+ lines of the spreadsheet into this format using this code?

Use a Python interpreter (or Jupyter).

Pandas is a good library for converting spreadsheets into JSON.

Convert your object into a list of objects (if it’s not already like that for some reason).
Then use this:

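import json

# Placeholder: replace with your own list of prompt/completion dicts
entries = [{"prompt": "...", "completion": "..."}]

with open('output.jsonl', 'w') as outfile:
    for entry in entries:
        # Each dict becomes one line of the JSONL file
        json.dump(entry, outfile)
        outfile.write('\n')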

Also, try putting an r in front of your string (making it a raw string) so Python doesn’t accidentally turn the \n into a newline.
It could be inserting an actual newline instead of the literal “\n” you have.
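To see what each form actually writes, here is a small sketch comparing a normal string and a raw string after json.dumps. The sample text is made up. Either way the output stays on a single line; which one you want depends on whether the completion should contain a real newline or the two characters backslash and n.

```python
import json

escaped = "line one\nline two"   # \n is one real newline character
raw = r"line one\nline two"      # raw string: backslash and n stay as two characters

# json.dumps escapes the real newline, so the JSONL line is not broken
escaped_line = json.dumps({"completion": escaped})
# In the raw case json.dumps escapes the backslash itself
raw_line = json.dumps({"completion": raw})

print(escaped_line)
print(raw_line)

# Reading the lines back shows what the completion text would contain
print(repr(json.loads(escaped_line)["completion"]))
print(repr(json.loads(raw_line)["completion"]))
```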

Save your spreadsheet as a CSV file, which is easy to load into pandas. Then iterate through the pandas DataFrame row by row.
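A rough sketch of that CSV-to-JSONL step follows. It uses the standard-library csv module rather than pandas so it runs without extra installs; the same row-by-row loop should carry over to a pandas DataFrame. The column names prompt and completion, the function name csv_to_jsonl, and the sample rows are assumptions for illustration.

```python
import csv
import io
import json

def csv_to_jsonl(csv_file, jsonl_file):
    """Write one prompt/completion JSON object per CSV row; return the row count."""
    count = 0
    for row in csv.DictReader(csv_file):
        # Assumption: the spreadsheet has columns named "prompt" and "completion"
        record = {"prompt": row["prompt"], "completion": row["completion"]}
        jsonl_file.write(json.dumps(record) + "\n")
        count += 1
    return count

# In-memory stand-ins for real files so the sketch runs as-is
csv_text = (
    "prompt,completion\n"
    '"Old way/new way###"," First line\nSecond line[END]"\n'
    "Hi###, Hello[END]\n"
)
source = io.StringIO(csv_text, newline="")
target = io.StringIO()

rows_written = csv_to_jsonl(source, target)
print(rows_written)
print(target.getvalue())
```

With real files, open the CSV with newline="" (as the csv module documentation recommends) and pass both file objects to csv_to_jsonl. A cell that contains a line break still ends up on one JSONL line, because json.dumps escapes it.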