Reinforcement learning with OpenAI Gym: suggestions for improving a model that matches float numbers

I’m working on a reinforcement learning model at university. The aim is to make three decimal numbers (X, Y, Z) match three other decimal numbers (X2, Y2, Z2). I already have a working model and would like to ask you for suggestions for improvement. I use the OpenAI Gym library.

With each reset the agent gets new numbers:

        # new start values, rounded to 5 decimal places
        X = round(random.uniform(-1, 1), 5)
        Y = round(random.uniform(-1, 1), 5)
        Z = round(random.uniform(-1, 1), 5)

        # target values: the start values plus a small random offset
        X2 = round(X + random.uniform(-0.002, 0.002), 5)
        Y2 = round(Y + random.uniform(-0.002, 0.002), 5)
        Z2 = round(Z + random.uniform(-0.002, 0.002), 5)
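
The six values end up in self.state as [X, Y, Z, X2, Y2, Z2] (the reward code below slices self.state[0:3] and self.state[3:6]). A minimal sketch of matching Gym spaces, with placeholder bounds:

    import numpy as np
    from gym import spaces

    # 7 discrete actions: six +/- steps plus a no-op
    action_space = spaces.Discrete(7)
    # observation: [X, Y, Z, X2, Y2, Z2]; the bounds here are only placeholders
    observation_space = spaces.Box(low=-1.1, high=1.1, shape=(6,), dtype=np.float32)

In the environment these would be assigned to self.action_space and self.observation_space.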

Here are my 7 discrete actions:

        # actions 0-2: decrease X, Y or Z by one step
        if action == 0:
            self.state[0] -= 0.00001
        elif action == 1:
            self.state[1] -= 0.00001
        elif action == 2:
            self.state[2] -= 0.00001

        # actions 3-5: increase X, Y or Z by one step
        elif action == 3:
            self.state[0] += 0.00001
        elif action == 4:
            self.state[1] += 0.00001
        elif action == 5:
            self.state[2] += 0.00001

        # action 6: do nothing
        elif action == 6:
            pass
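
For reference, the same mapping can also be written as a lookup table; apply_action and STEP are just names for this sketch:

    import numpy as np

    STEP = 0.00001
    # (index into the state to change, delta); action 6 is the no-op
    ACTION_TABLE = {
        0: (0, -STEP), 1: (1, -STEP), 2: (2, -STEP),
        3: (0, +STEP), 4: (1, +STEP), 5: (2, +STEP),
    }

    def apply_action(state, action):
        """Apply one of the 7 discrete actions to state = [X, Y, Z, X2, Y2, Z2]."""
        if action in ACTION_TABLE:
            idx, delta = ACTION_TABLE[action]
            state[idx] += delta
        return state

    # example: action 3 increases X by one step
    state = np.array([0.5, -0.2, 0.1, 0.501, -0.199, 0.099])
    apply_action(state, 3)   # state[0] is now 0.50001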

My reward system:

    direction = self.state[3:6] - self.state[0:3]   # vector from current values to the targets
    difference = np.sqrt(direction.dot(direction))  # Euclidean distance
    reward = -difference
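
To get a feeling for the reward scale, a small worked example with the maximum initial offset of 0.002 in every component:

    import numpy as np

    # worst case right after reset: every component is 0.002 away from its target
    direction = np.array([0.002, 0.002, 0.002])
    difference = np.sqrt(direction.dot(direction))   # ~0.00346
    reward = -difference                             # ~-0.00346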

My model:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten

    def build_model(states, actions):
        model = Sequential()
        model.add(Flatten(input_shape=(1, states)))      # window_length=1, so the input is (1, states)
        model.add(Dense(128, activation='relu'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(actions, activation='sigmoid'))  # one output per discrete action
        return model
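
states and actions come from the environment’s spaces in the usual keras-rl way; here env stands for an instance of the custom environment:

    states = env.observation_space.shape[0]   # 6 for [X, Y, Z, X2, Y2, Z2]
    actions = env.action_space.n              # 7 discrete actions
    model = build_model(states, actions)
    model.summary()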

And my agent:

    from rl.agents.dqn import DQNAgent
    from rl.policy import BoltzmannQPolicy
    from rl.memory import SequentialMemory

    def build_agent(model, actions):
        policy = BoltzmannQPolicy()
        memory = SequentialMemory(limit=100000, window_length=1)
        dqn = DQNAgent(model=model, memory=memory, policy=policy,
                       nb_actions=actions, nb_steps_warmup=1000,
                       target_model_update=1e-4)
        return dqn
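
The agent is then compiled and trained with the usual keras-rl calls; the learning rate and nb_steps are just example values:

    from tensorflow.keras.optimizers import Adam

    dqn = build_agent(model, actions)
    dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
    dqn.fit(env, nb_steps=100000, visualize=False, verbose=1)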

I’ve tried a lot of different step sizes, but I don’t really get a feeling for it. Is there perhaps a better agent, a better policy or better actions? I am thankful for every hint :slight_smile: