Debiasing Language Models: New Method Shows Higher Bias Reduction and Performance
Pre-trained Language Models (PLMs) exhibit different biases before and after Fine-Tuning (FT), creating a gap between intrinsic and extrinsic bias evaluations. FT-based debiasing methods can also degrade downstream task performance. In-Context Learning (ICL) with prompts induces smaller changes in a PLM than FT does, which leads to a higher correlation between intrinsic and extrinsic bias scores and less performance degradation during debiasing.
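The ICL-based approach can be sketched as prepending a debiasing instruction to the task input while leaving the model weights frozen. The preamble text and helper name below are illustrative assumptions, not the exact prompts used in the work described above.

```python
# Sketch of prompt-based (ICL) debiasing: model weights stay frozen;
# only the input is modified. DEBIAS_PREAMBLE is a hypothetical example
# instruction, not the actual prompt from the study.
DEBIAS_PREAMBLE = (
    "Note: treat all social groups equally and avoid stereotypes "
    "when completing the task below.\n\n"
)

def build_debiased_prompt(task_input: str) -> str:
    """Prepend the debiasing instruction to the task input (no fine-tuning)."""
    return DEBIAS_PREAMBLE + task_input

# The resulting string would be passed to a frozen PLM, e.g. via
# model.generate(...) on the tokenized prompt. Because no parameters
# are updated, the model's intrinsic bias profile changes less than
# under FT-based debiasing.
```

Because the model itself is untouched, the same checkpoint can be evaluated with and without the preamble, making intrinsic and extrinsic bias scores directly comparable.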