I’m working on a reinforcement learning model at university. The goal is for the agent to drive three decimal numbers (X, Y, Z) towards three target decimal numbers (X2, Y2, Z2). I already have a working model and would like to ask you for suggestions for improvement. I use the OpenAI Gym library.

On each reset the agent gets new numbers:

```
# Start values in [-1, 1], rounded to 5 decimal places
X = round(random.uniform(-1, 1), 5)
Y = round(random.uniform(-1, 1), 5)
Z = round(random.uniform(-1, 1), 5)
# Targets lie at most 0.002 away from the start values
X2 = round(X + random.uniform(-0.002, 0.002), 5)
Y2 = round(Y + random.uniform(-0.002, 0.002), 5)
Z2 = round(Z + random.uniform(-0.002, 0.002), 5)
```
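
To make the indexing used later clear: reset() packs the six values into the observation with the current values first and the targets last (simplified sketch, the surrounding reset code is omitted):

```
# Simplified sketch of how the state is assembled in reset():
# state[0:3] are the values the agent changes, state[3:6] are the targets
self.state = np.array([X, Y, Z, X2, Y2, Z2], dtype=np.float64)
return self.state
```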

Here are my 7 discrete actions:

```
if action == 0:
    self.state[0] -= 0.00001
elif action == 1:
    self.state[1] -= 0.00001
elif action == 2:
    self.state[2] -= 0.00001
elif action == 3:
    self.state[0] += 0.00001
elif action == 4:
    self.state[1] += 0.00001
elif action == 5:
    self.state[2] += 0.00001
elif action == 6:
    # do nothing (step size of zero)
    self.state[0] += 0.00000
    self.state[1] += 0.00000
    self.state[2] += 0.00000
```
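
The matching action space in the environment's __init__ is a plain Discrete space (sketch; the Box bounds for the observation below are only rough placeholders):

```
from gym import spaces

# 7 discrete actions: decrease or increase one of the three values, or do nothing
self.action_space = spaces.Discrete(7)
# 6 observation values; the bounds here are placeholders
self.observation_space = spaces.Box(low=-2.0, high=2.0, shape=(6,), dtype=np.float64)
```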

My reward system:

```
# Vector from the current values (state[0:3]) to the targets (state[3:6])
direction = np.array(self.state[3:6] - self.state[0:3])
# Euclidean distance; the reward is its negative, so closer = better
difference = np.sqrt(direction.dot(direction))
reward = -difference
```
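
This is just the negative Euclidean distance between the current values and the targets. An equivalent one-liner, with concrete numbers as a sanity check:

```
import numpy as np

state = np.array([0.1, 0.2, 0.3, 0.101, 0.2, 0.3])  # example: one value is 0.001 off
reward = -np.linalg.norm(state[3:6] - state[0:3])    # same as the code above
print(reward)                                         # ≈ -0.001
```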

My model:

```
def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, states)))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(actions, activation="sigmoid"))
    return model
```
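
The model is built from the environment's spaces, i.e. 6 state values and 7 actions (assuming env is an instance of the environment):

```
states = env.observation_space.shape[0]   # 6: X, Y, Z, X2, Y2, Z2
actions = env.action_space.n              # 7 discrete actions
model = build_model(states, actions)
model.summary()
```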

And my agent:

```
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=100000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=1000, target_model_update=1e-4)
    return dqn
```
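
For completeness, compiling and training then looks roughly like this (a sketch assuming keras-rl2 on top of tf.keras; optimizer, learning rate and step count are just placeholders):

```
from tensorflow.keras.optimizers import Adam

dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=100000, visualize=False, verbose=1)
dqn.test(env, nb_episodes=10, visualize=False)
```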

I’ve experimented a lot with the step size, but I don’t really get a feeling for it. Is there perhaps a better agent, a better policy, or better actions? I’m thankful for every hint.