Interesting article on using process supervision rather than outcome supervision for #AI training. Instead of asking an AI a question and rewarding only the final result, each step of the reasoning is evaluated and rewarded. Good results for mathematics: https://openai.com/research/improving-mathematical-reasoning-with-process-supervision
#AI
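
To make the difference concrete, here is a toy Python sketch, not the article's method. The function names and the arithmetic step checker are hypothetical stand-ins for whatever step-level feedback a real system would use. It scores the same worked solution two ways: once on the final answer alone, and once step by step.

```python
from typing import Callable, List


def outcome_reward(final_answer: str, correct_answer: str) -> float:
    """Outcome supervision: a single reward based only on the final answer."""
    return 1.0 if final_answer == correct_answer else 0.0


def process_rewards(steps: List[str], step_is_valid: Callable[[str], bool]) -> List[float]:
    """Process supervision: one reward per reasoning step."""
    return [1.0 if step_is_valid(step) else 0.0 for step in steps]


def check_arithmetic_step(step: str) -> bool:
    """Hypothetical checker for steps written as 'a op b = c' (assumed format)."""
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
    }
    return ops[op](int(a), int(b)) == int(rhs)


# Problem: (2 + 3) * 4 - 5, whose correct answer is 15.
# The second step contains a mistake (5 * 4 is 20, not 25).
steps = ["2 + 3 = 5", "5 * 4 = 25", "25 - 5 = 20"]

print(outcome_reward("20", "15"))                     # 0.0
print(process_rewards(steps, check_arithmetic_step))  # [1.0, 0.0, 1.0]
```

The outcome reward only says the answer was wrong; the per-step rewards also show which step went wrong, which is the extra signal that step-level evaluation provides.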