modulux

modulux@node.isonomia.net

Marxist, law graduate, civil servant, writer, coder. Blind.
Location: Galicia, Spain, EU.
Interests: #writing #poetry #sf #rust #coq #marxism #tech #erotica #bdsm #kink #nsfw #fedi22
Languages: en es gl de eo
Favourite insult received: commander of the queer communist revolutionary corps.
Runner-up: cipayo de los señores del aire (lackey of the lords of the air).
Uphold Marxism-Leninism-Martin-Löf thought.
¬(∀x. free(x)) ⇒ ¬∃x. free(x)
(None of us is free until all of us are free.)
The sharpness of a sword results from repeated grinding, while the fragrance of plum blossoms comes from frigid weather.

ActivityPub

modulux

2 years ago


Interesting article on using process supervision rather than outcome supervision for #AI reward learning. Instead of asking an AI a question and rewarding the result as a whole, each step of the reasoning is evaluated. Good results for mathematics: openai.com/research/improving-…

Improving mathematical reasoning with process supervision

We've trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”).

^openai.com
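
To make the distinction concrete, here is a toy sketch of the two reward schemes. Everything in it is a hypothetical illustration rather than the article's implementation: the `Step` class, the hand-assigned step scores, and the function names are invented for this example, and scoring a whole solution by the product of its step scores is just one simple aggregation choice assumed here.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Step:
    text: str
    score: float  # hypothetical per-step correctness score in [0, 1]


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome supervision: a single reward based only on the final answer."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_rewards(steps: list[Step]) -> list[float]:
    """Process supervision: one reward per reasoning step."""
    return [step.score for step in steps]


def solution_score(steps: list[Step]) -> float:
    """Aggregate step rewards; the product is an assumption of this sketch."""
    return prod(process_rewards(steps))


# Simplify 16/64 by "cancelling the 6s": right answer, invalid reasoning.
steps = [
    Step("Start from 16/64", 1.0),
    Step("Cancel the 6 in numerator and denominator to get 1/4", 0.0),
]

print(outcome_reward("1/4", "1/4"))  # the final answer matches the reference
print(process_rewards(steps))        # the invalid step gets its own reward
print(solution_score(steps))         # one bad step sinks the whole solution
```

The example is chosen so that the two schemes disagree: rewarding only the outcome accepts a solution that reached the right answer by invalid reasoning, while per-step rewards flag the faulty step.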

#AI
