Abstract—We compare the classification performance of two natural language processing (NLP) models, BERT and HAN, using the Proppy corpus developed by the Propaganda Analysis Project [6]. Our goal was to determine which model more accurately predicts propaganda bias from the corpus's labeled data. An initial review of the corpus revealed a class imbalance that affected classification strength. The BERT model performed well, while the HAN model failed to identify the minority class and its performance was skewed toward the majority class. Solutions to this problem are also discussed.