Chain-of-Thought Reasoning is a Policy Improvement Operator
