Interesting article on supervising #AI with process rewards rather than outcome rewards. Instead of asking a model a question and rewarding only its final answer, each step of its reasoning is evaluated. Good results for mathematics: openai.com/research/improving-…
#AI
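A toy sketch of the difference, assuming the per-step scores already exist. In practice they would come from a learned reward model; here they are made-up numbers. Combining step scores by multiplication, so the result is the chance that every step is correct, is one simple choice and not necessarily the article's exact method. All function names are hypothetical.

```python
from math import prod
from typing import Optional, Sequence


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome supervision: one signal for the whole solution,
    based only on whether the final answer matches."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_reward(step_scores: Sequence[float]) -> float:
    """Process supervision (sketch): each step gets its own score,
    read as a probability that the step is correct. The solution
    score is the product, i.e. the chance that every step is correct."""
    return prod(step_scores)


def first_bad_step(step_scores: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first step scored below the threshold, or None.
    Per-step feedback can point at where the reasoning went wrong."""
    for i, score in enumerate(step_scores):
        if score < threshold:
            return i
    return None


# Hypothetical solution: right final answer, but step 1 is flawed.
steps = [0.9, 0.2, 0.95]
print(outcome_reward("4", "4"))   # outcome supervision sees only the answer
print(process_reward(steps))      # process supervision penalises the bad step
print(first_bad_step(steps))
```

Outcome supervision gives this solution full reward even though its reasoning is flawed. Scoring each step lowers its reward and shows where the mistake is.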