LLM Critics Help Catch LLM Bugs
Original Paper: https://cdn.openai.com/llm-critics-help-catch-llm-bugs-paper.pdf
By: OpenAI

Abstract: Reinforcement learning from human feedback (RLHF) is fundamentally limited by the capacity of humans to correctly evaluate model output. To improve human evaluation ability and overcome that limitation, this work trains "critic" models that help humans more accurately evaluate model-written code.