My model's DPO validation loss is not decreasing.
I have tried a lot of different strategies, but there is no improvement.
Here are the training and validation stats:
Epoch: 1/5: 6400it [15:48, 5.64it/s, mean_loss=0.667, mean_accumulation_loss=0.667, per_update_loss=0.638, mean_policy_margine=-37.4, mean_ref_margine=-38.1, mean_advantage=0.667, mean_accepted_reward=1.42, mean_rejected_reward=0.496, mean_kl=14.2, save_iters=50, pointer_location=5000, current_lrs=[2.0000000000000003e-06, 2.0000000000000003e-06], total_iters=6400, total_accumulation_iters=50, prev_shuffle_seq_len=3600, memory_usage=6.36e+6]
Val Loss: 0.7040763644035906
Val Policy Margine: 19.06810326129198
Val Ref Margine: 19.21380713954568
Val Advantage: -0.14570387825369835
Val Accepted Reward: 0.14107751794205114
Val Rejected Reward: 0.15564789006333513
Epoch: 1/5: 12800it [33:54, 5.35it/s, mean_loss=0.64, mean_accumulation_loss=0.64, per_update_loss=0.605, mean_policy_margine=-34.7, mean_ref_margine=-36.5, mean_advantage=1.79, mean_accepted_reward=5.32, mean_rejected_reward=-0.0884, mean_kl=53.2, save_iters=50, pointer_location=1e+4, current_lrs=[4.000000000000001e-06, 4.000000000000001e-06], total_iters=12800, total_accumulation_iters=100, prev_shuffle_seq_len=2195, memory_usage=6.24e+6]
Pointer Loaction is set to: 0
Val Loss: 0.7111321217380464
Val Policy Margine: 22.200062219053507
Val Ref Margine: 22.36575211212039
Val Advantage: -0.1656898930668831
Val Accepted Reward: 0.061537228080567274
Val Rejected Reward: 0.0781061984588689
Does anyone have solid experience with DPO? I am looking for help, because nothing comes to mind regarding what to do so that the validation loss decreases and the validation advantage increases the way they do during training.