# Reinforcement learning: improvement suggestions for matching float numbers with OpenAI Gym

I’m working on a reinforcement learning model at university. The aim is to drive three decimal numbers (X, Y, Z) toward three target numbers (X2, Y2, Z2). I already have a working model and would like to ask for suggestions for improvement. I use the OpenAI Gym library.

With each reset the agent gets new numbers:

```python
X = round(random.uniform(-1, 1), 5)
Y = round(random.uniform(-1, 1), 5)
Z = round(random.uniform(-1, 1), 5)

# targets: the original values plus a little noise
X2 = round(X + random.uniform(-0.002, 0.002), 5)
Y2 = round(Y + random.uniform(-0.002, 0.002), 5)
Z2 = round(Z + random.uniform(-0.002, 0.002), 5)
```
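For context, the reward code further down indexes `self.state[0:3]` and `self.state[3:6]`, so the observation is presumably a 6-element vector with the current values first and the targets last. A minimal sketch of how that vector could be assembled (function name and layout are illustrative):

```python
import random
import numpy as np

def make_state():
    # current values, rounded to 5 decimals
    xyz = [round(random.uniform(-1, 1), 5) for _ in range(3)]
    # targets: current values plus small noise, also rounded
    targets = [round(v + random.uniform(-0.002, 0.002), 5) for v in xyz]
    # assumed state layout: [X, Y, Z, X2, Y2, Z2]
    return np.array(xyz + targets)

state = make_state()
```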

Here are my 7 discrete actions:

```python
if action == 0:
    self.state[0] -= 0.00001   # decrease X
elif action == 1:
    self.state[1] -= 0.00001   # decrease Y
elif action == 2:
    self.state[2] -= 0.00001   # decrease Z

elif action == 3:
    self.state[0] += 0.00001   # increase X
elif action == 4:
    self.state[1] += 0.00001   # increase Y
elif action == 5:
    self.state[2] += 0.00001   # increase Z

elif action == 6:
    pass                       # do nothing
```
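As a side note, the seven branches can also be expressed as a lookup table over (component index, step), which keeps the per-component updates from drifting apart. A sketch, assuming `state` is a NumPy array laid out as [X, Y, Z, X2, Y2, Z2]:

```python
import numpy as np

# action -> (index into state, delta); action 6 is the no-op
ACTION_TABLE = {
    0: (0, -0.00001), 1: (1, -0.00001), 2: (2, -0.00001),
    3: (0, +0.00001), 4: (1, +0.00001), 5: (2, +0.00001),
}

def apply_action(state, action):
    """Apply one discrete action to the first three state components."""
    if action in ACTION_TABLE:
        idx, delta = ACTION_TABLE[action]
        state[idx] += delta
    return state

state = np.zeros(6)
apply_action(state, 3)  # nudges X upward by 0.00001
```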

My reward system:

```python
direction = self.state[3:6] - self.state[0:3]   # vector from (X, Y, Z) to (X2, Y2, Z2)
difference = np.sqrt(direction.dot(direction))  # Euclidean distance
reward = -difference
```
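Worked through on concrete numbers (illustrative values, not from a real run), the reward is the negative Euclidean distance between the current triple and the target triple:

```python
import numpy as np

state = np.array([0.10000, 0.20000, 0.30000,   # X, Y, Z
                  0.10100, 0.19900, 0.30100])  # X2, Y2, Z2

direction = state[3:6] - state[0:3]             # [0.001, -0.001, 0.001]
difference = np.sqrt(direction.dot(direction))  # 0.001 * sqrt(3)
reward = -difference                            # ≈ -0.00173
# reward approaches 0 as (X, Y, Z) converges on (X2, Y2, Z2)
```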

My model (a minimal dense network; the exact layer sizes are placeholders):

```python
def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, states)))     # window_length=1 adds the leading 1
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))  # one Q-value per action
    return model
```

And my agent:

```python
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=100000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=1000,
                   target_model_update=1e-4)
    return dqn
```

I’ve experimented a lot with the step length, but I don’t really get a feeling for it. Is there perhaps a better agent, a better policy, or better actions? I am thankful for every hint.