Learning to Generate Better Than Your LLM
arxiv.org
Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large Language Models (LLMs) for text generation. In particular, recent LLMs such as ChatGPT and GPT-4 can engage in flue...

I was looking through papers that combine LLMs and RL, and this one was pretty fascinating. The citations are also perfect for continuing my search.