Removing spurious features can harm accuracy and worsen group disparities.
Spurious features can lower a model's accuracy and affect different groups unequally. A natural remedy is to remove them, but removing spurious features can actually decrease accuracy and make the model more vulnerable to other spurious features. In contrast, robust self-training can remove spurious features without reducing overall accuracy. Experiments with non-linear models on the Toxic-Comment-Detection and CelebA datasets support these findings.
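The claim that removal can hurt accuracy is counterintuitive. Below is a hypothetical toy example that makes it concrete. It is not the experimental setup described above. It assumes a minimum-norm linear interpolator in an overparameterized regime (one training example, more features than examples). The label rule `y = z1 + z2`, the feature values, and both test groups are invented for illustration. When the spurious feature `s` is available, the minimum-norm fit puts some weight on `s`. When `s` is removed, all weight goes to the only observed core direction, so the unseen core feature `z2` gets zero weight.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def min_norm_fit_single(x: Vector, y: float) -> Vector:
    """Minimum-norm weights that exactly fit one training example (x, y).

    Assumption for this toy: with a single example, the min-norm
    interpolator is theta = y * x / ||x||^2.
    """
    norm_sq = dot(x, x)
    return [y * xi / norm_sq for xi in x]


def squared_error(theta: Vector, x: Vector, y: float) -> float:
    return (dot(theta, x) - y) ** 2


def true_label(z: Vector) -> float:
    # Hypothetical ground truth: the label depends only on the core features.
    return z[0] + z[1]


# One training example: core features z = (1, 0), spurious feature s = 1.
z_train, s_train = [1.0, 0.0], 1.0
y_train = true_label(z_train)

theta_full = min_norm_fit_single(z_train + [s_train], y_train)  # keeps s
theta_core = min_norm_fit_single(z_train, y_train)              # s removed

# Two hypothetical test groups: (core features, spurious feature).
groups: Dict[str, Tuple[Vector, float]] = {
    "A (s agrees with label)": ([0.0, 1.0], 1.0),
    "B (s disagrees with label)": ([1.0, 0.0], 0.0),
}


def group_errors(use_spurious: bool) -> Dict[str, float]:
    """Squared error per group for the model with or without s."""
    errors = {}
    for name, (z, s) in groups.items():
        if use_spurious:
            errors[name] = squared_error(theta_full, z + [s], true_label(z))
        else:
            errors[name] = squared_error(theta_core, z, true_label(z))
    return errors


def summarize(errors: Dict[str, float]) -> Tuple[float, float]:
    """Return (average error across groups, gap between worst and best group)."""
    values = list(errors.values())
    return sum(values) / len(values), max(values) - min(values)


errors_with_s = group_errors(use_spurious=True)
errors_without_s = group_errors(use_spurious=False)

print("with s:", summarize(errors_with_s))        # (0.25, 0.0)
print("without s:", summarize(errors_without_s))  # (0.5, 1.0)
```

In this toy setting, dropping `s` doubles the average squared error (0.25 to 0.5) and widens the error gap between the two groups from 0 to 1, even though group B on its own improves. The direction and size of the effect depend on the assumed inductive bias and data. This is only an illustration of the phenomenon, not a reproduction of the reported results.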