Reinforcement learning with OpenAI Gym: suggestions for improving a model that matches float numbers

I’m working on a reinforcement learning model at university. The aim is to make three decimal numbers (X, Y, Z) match three other decimal numbers (X2, Y2, Z2). I already have a working model and would like to ask you for suggestions for improvement. I use the OpenAI Gym library.

With each reset the agent gets new numbers:

        # new start values, rounded to 5 decimal places
        X = round(random.uniform(-1, 1), 5)
        Y = round(random.uniform(-1, 1), 5)
        Z = round(random.uniform(-1, 1), 5)

        # target values: the start values plus a small random offset
        X2 = round(X + random.uniform(-0.002, 0.002), 5)
        Y2 = round(Y + random.uniform(-0.002, 0.002), 5)
        Z2 = round(Z + random.uniform(-0.002, 0.002), 5)
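
The six values end up in self.state as [X, Y, Z, X2, Y2, Z2] (the reward code below slices self.state[0:3] and self.state[3:6]). A minimal sketch of matching Gym spaces, with placeholder bounds:

    import numpy as np
    from gym import spaces

    # 7 discrete actions: six +/- steps plus a no-op
    action_space = spaces.Discrete(7)
    # observation: [X, Y, Z, X2, Y2, Z2]; the bounds here are only placeholders
    observation_space = spaces.Box(low=-1.1, high=1.1, shape=(6,), dtype=np.float32)

In the environment these would be assigned to self.action_space and self.observation_space.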

Here are my 7 discrete actions:

        # actions 0-2: decrease X, Y or Z by one step
        if action == 0:
            self.state[0] -= 0.00001
        elif action == 1:
            self.state[1] -= 0.00001
        elif action == 2:
            self.state[2] -= 0.00001

        # actions 3-5: increase X, Y or Z by one step
        elif action == 3:
            self.state[0] += 0.00001
        elif action == 4:
            self.state[1] += 0.00001
        elif action == 5:
            self.state[2] += 0.00001

        # action 6: do nothing
        elif action == 6:
            pass
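
For reference, the same mapping can also be written as a lookup table; apply_action and STEP are just names for this sketch:

    import numpy as np

    STEP = 0.00001
    # (index into the state to change, delta); action 6 is the no-op
    ACTION_TABLE = {
        0: (0, -STEP), 1: (1, -STEP), 2: (2, -STEP),
        3: (0, +STEP), 4: (1, +STEP), 5: (2, +STEP),
    }

    def apply_action(state, action):
        """Apply one of the 7 discrete actions to state = [X, Y, Z, X2, Y2, Z2]."""
        if action in ACTION_TABLE:
            idx, delta = ACTION_TABLE[action]
            state[idx] += delta
        return state

    # example: action 3 increases X by one step
    state = np.array([0.5, -0.2, 0.1, 0.501, -0.199, 0.099])
    apply_action(state, 3)   # state[0] is now 0.50001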

My reward system:

    direction = self.state[3:6] - self.state[0:3]   # vector from current values to the targets
    difference = np.sqrt(direction.dot(direction))  # Euclidean distance
    reward = -difference
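
To get a feeling for the reward scale, a small worked example with the maximum initial offset of 0.002 in every component:

    import numpy as np

    # worst case right after reset: every component is 0.002 away from its target
    direction = np.array([0.002, 0.002, 0.002])
    difference = np.sqrt(direction.dot(direction))   # ~0.00346
    reward = -difference                             # ~-0.00346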

My model:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten

    def build_model(states, actions):
        model = Sequential()
        model.add(Flatten(input_shape=(1, states)))      # window_length=1, so the input is (1, states)
        model.add(Dense(128, activation='relu'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(actions, activation='sigmoid'))  # one output per discrete action
        return model
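
states and actions come from the environment’s spaces in the usual keras-rl way; here env stands for an instance of the custom environment:

    states = env.observation_space.shape[0]   # 6 for [X, Y, Z, X2, Y2, Z2]
    actions = env.action_space.n              # 7 discrete actions
    model = build_model(states, actions)
    model.summary()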

And my agent:

    from rl.agents.dqn import DQNAgent
    from rl.policy import BoltzmannQPolicy
    from rl.memory import SequentialMemory

    def build_agent(model, actions):
        policy = BoltzmannQPolicy()
        memory = SequentialMemory(limit=100000, window_length=1)
        dqn = DQNAgent(model=model, memory=memory, policy=policy,
                       nb_actions=actions, nb_steps_warmup=1000,
                       target_model_update=1e-4)
        return dqn
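
The agent is then compiled and trained with the usual keras-rl calls; the learning rate and nb_steps are just example values:

    from tensorflow.keras.optimizers import Adam

    dqn = build_agent(model, actions)
    dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
    dqn.fit(env, nb_steps=100000, visualize=False, verbose=1)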

I’ve tried a lot of different step sizes, but I don’t really get a feeling for it. Is there perhaps a better agent, a better policy or better actions? I am thankful for every hint :slight_smile: