Transformer-based models such as BERT and RoBERTa have shown potential ... and RoBERTa was pre-trained with masked language ...
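The masked-language-modeling objective mentioned above can be illustrated with a minimal sketch of the token-corruption step. This is a simplified, hypothetical illustration (function and variable names are our own): a fraction of input tokens is replaced with a `[MASK]` placeholder, and the model is trained to recover the originals at those positions. Real pre-training recipes such as BERT's additionally keep some selected tokens unchanged or swap them for random tokens; that refinement is omitted here.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens with [MASK], MLM-style.

    Returns the corrupted sequence and the list of (position, original
    token) pairs that the model would be trained to predict.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    corrupted = list(tokens)
    targets = []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted[i] = MASK_TOKEN
            targets.append((i, tok))
    return corrupted, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = mask_tokens(tokens, mask_prob=0.3, seed=42)
```

During pre-training, the loss is computed only at the masked positions recorded in `targets`, which is what lets the model learn bidirectional context without a left-to-right factorization.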