Two recent papers could lead to dramatic advances in model quality

anon22939549 September 9, 2023, 4:38pm 1

From Microsoft Research,

From Google Research,

Combining the method for memory-efficient reinforcement learning with the discovery that AI feedback is as effective as human feedback for reinforcement learning could pave the way, eventually, for OpenAI and others to provide reinforcement learning as an option while fine-tuning models.

3 Likes

Topic		Replies	Views
Fine-Tuning to Avoid Scary Responses (Negative Reward) API api	9	2081	August 25, 2023
Interesting Research: Learning Tool Use through Trial and Error Community gpt-4 , api	1	541	March 8, 2024
Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance Community	3	1852	April 5, 2022
Creating a GPT-Style Language Model for a Single Question Community	1	466	November 15, 2021
OpenAI vs HRM Architecture? did it really beat o3? Feedback chatgpt	0	275	August 11, 2025

Two recent papers could lead to dramatic advances in model quality

Related topics