Posts tagged "alignment"

Language Models (Mostly) Know What They Know. Language models can evaluate the validity of their own answers and predict which questions they’ll answer correctly, with performance improving when they consider multiple attempts, though calibration on new tasks remains challenging.