Fine-tuning problem: multiple values in the prompt

I fine-tuned the curie model on 100k examples, but the results don't make sense. Even when I prompt with data taken straight from the training set, the completion is not correct. Here is a sample of the data I use:

{"prompt":"0,1,0,1,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1.86506469500924 ->","completion":" 1.058443765\n"}
{"prompt":"0,1,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,0.485102491808276 ->","completion":" 1.013615764\n"}
{"prompt":"0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0.86490969406561 ->","completion":" 0.953420725\n"}
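For reference, each line parses cleanly as JSONL once the quotes are straight ASCII quotes. A minimal Python sketch of how I read one record, assuming each prompt is a run of binary flags followed by a single float:

```python
import json

# One line from the training file (straight quotes; curly quotes are not valid JSON).
line = ('{"prompt":"0,1,0,1,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,'
        '1.86506469500924 ->","completion":" 1.058443765\\n"}')

record = json.loads(line)
fields = record["prompt"].rstrip(" ->").split(",")
bits = [int(b) for b in fields[:-1]]    # the leading binary flags
x = float(fields[-1])                   # the trailing float feature
y = float(record["completion"])         # the target value

print(len(bits), x, y)                  # 31 flags, one float input, one float target
```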

I wonder whether the model treats 0,1,0,1 as a single value instead of four separate values.

The nonsensical output makes me think I formatted the input wrongly, so I came here to ask for help. What is wrong with my prompt, such that even prompting with training data does not give me the correct value?

You can see how it is tokenized here:

I think you are putting a logic puzzle to the AI that is beyond its ability to answer.

To it and me, you have a list of a bunch of boolean bits and then a float. That, through some unsolvable black box magic, becomes another float.

A language model carries a terabyte-scale training corpus that means nothing for the kind of inputs and outputs you are creating. Your "list" is a string of two alternating tokens, and the floats are groups of tokens two to three digits long.

If you were going to "train" on this data, you would want a reinforcement-learning algorithm, with its ability to "play" the puzzle and a reward model directing it toward better answers, not a language model, which simply predicts the next token.

Is this something that can be solved by a few lines of code? Do you even need inference?
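For instance, if the mapping happens to be (even approximately) linear in the flags and the trailing float, ordinary least squares fits it in a few lines. A sketch with numpy, using the three example rows from the question as 32-column feature vectors; with only three rows this is underdetermined and merely memorizes them, but with the full 100k rows it becomes a real test of linearity:

```python
import numpy as np

# The three example rows from the question: 31 binary flags + 1 float -> target.
rows = [
    ("0,1,0,1,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1.86506469500924", 1.058443765),
    ("0,1,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,0.485102491808276", 1.013615764),
    ("0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0.86490969406561", 0.953420725),
]

X = np.array([[float(v) for v in prompt.split(",")] for prompt, _ in rows])
y = np.array([target for _, target in rows])

# Minimum-norm least-squares fit of y = X @ w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ w
print(pred)
```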

BTW, a guess without actually doing the math: it looks like an IEEE 32-bit float in big-endian, absolute value, divided by a similar float, comes out close to the target multiplier.
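That guess can be checked mechanically: force the sign bit to 0 (the ABS part), treat the 31 flags as the remaining bits of a big-endian IEEE 754 single, and reinterpret. A stdlib sketch using the first example row; whether the resulting float actually relates to the prompt's trailing float or the completion is exactly the open question:

```python
import struct

# 31 flags from the first example; prepend a 0 sign bit (ABS) to get 32 bits.
flags = "0,1,0,1,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1"
word = int("0" + flags.replace(",", ""), 2)            # 32-bit big-endian pattern
value = struct.unpack(">f", struct.pack(">I", word))[0]  # reinterpret as float32

print(value, value / 1.86506469500924)  # candidate float and its ratio to the prompt's float
```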