Towards Understanding User Perceptions for Smart Border Control Technologies using a Fine-Tuned Transformer Approach

Smart Border Control (SBC) technologies became a hot topic in recent years when the European Union (EU) Commission announced the Smart Borders Package to improve the efficiency and security of border crossing points (BCPs). Although SBC technologies have potential benefits in terms of enabling travellers' data processing, they still pose acceptability and usability challenges for travellers. The success of these technologies depends on user acceptance, and sentiment analysis is one of the primary techniques to measure it. A variety of studies in the literature have used sentiment analysis to understand user acceptance in different domains. To the best of the authors' knowledge, however, no study has used sentiment analysis to measure the user acceptance of SBC technologies. Thus, in this study, we propose a fine-tuned transformer model along with an automatic sentiment label generation technique to perform sentiment analysis as a step towards gaining insights into user acceptance of SBC technologies. The results obtained in this study are promising, given that no training data is available from BCPs. The proposed approach was validated against the IMDB reviews dataset and achieved a weighted F1-score of 79% on the sentiment analysis task.


Introduction
SBC technologies are fully or semi-automated technologies used to verify the identity of travellers crossing a border without any human (i.e. border guard) intervention [18]. Typically, these technologies include: 1) physical barriers like e-gates, 2) full-page e-passport readers, 3) screens to display instructions and 4) biometric scanners, to name a few. Figure 1 shows an example of an e-gate. Despite these potential benefits, challenges arise from several ethical [1], privacy and data protection [2] related concerns around the use of the technology [3,9,32,49]. Nonetheless, successful implementation of SBC technologies depends on user acceptance, so it is important to understand how users perceive the technology in use. User perceptions of technologies have already been investigated in several studies for other domains [15,17,47]; however, to the best of our knowledge, no study in the literature has investigated user perceptions of SBC technologies.
* Corresponding Author: sarang.shaikh@ntnu.no
A widely used method in the literature for understanding user perceptions of technology acceptance is the questionnaire. However, this approach suffers from bias and a lack of transparency, which can ultimately lead to incorrect observations of user perceptions [22]. To overcome these shortcomings, several studies have proposed analyzing online social media content instead [19,33,51].
There is a vast body of literature on understanding user perceptions of technology acceptance using social media content in other domains such as blockchain [33], e-commerce [10], healthcare [40] and advertising [26]. However, there is a lack of studies focusing on user acceptance of SBC technologies. In this study, we fill this research gap by analyzing Twitter data using a fine-tuned transformer model [52] along with an automatic sentiment label generation technique.
The remainder of the paper is structured as follows: Section 2 reviews existing studies and state-of-the-art methods for understanding user perceptions from social media data. Section 3 details the overall proposed methodology, including datasets, experiments and evaluation criteria. Section 4 sheds light on the major results and discussions from the experiments conducted. Section 5 concludes the paper and presents future work.

Related Works
In recent years, many researchers have worked on understanding user perceptions from online social media data [33,51,20], with sentiment analysis and topic modelling among the most widely used techniques for this purpose. This section discusses some recent studies on user perception analysis, both in general and in the context of technology acceptance in different domains.
Mnif et al. [33] analyzed user perceptions related to the acceptance of blockchain technologies. The authors collected 5000 tweets related to blockchain technology from Twitter between 01/12/2020 and 31/01/2021. The perceptions of interest concerned the usefulness of blockchain, the ease of use perceived by users and the social norms affecting blockchain technology acceptance. The authors used a lexicon-based approach, relying on the NRC [34] and Harvard-IV [16] dictionaries, to analyse the collected tweets and determine how many tweets about blockchain were positive and negative. In addition, positive tweets were further categorized into joy, trust and surprise, and negative tweets into fear, sadness, anger and disgust.
Chen et al. [51] performed sentiment analysis on tweets related to the Facebook Libra digital currency technology (FLDCT) in order to analyse its technology acceptance. The majority of the user perceptions under analysis were related to digital currency, bitcoin, regulations and trust. Latent Dirichlet Allocation (LDA) [44] and the Linguistic Inquiry and Word Count (LIWC) [48] language analysis tool were used to extract the major topics and sentiments of the users.
Gupta et al. [20] analysed user perceptions as well. Many research studies have achieved satisfactory results for sentiment analysis using machine and deep learning techniques. However, since Google introduced transformer models in the 2017 paper "Attention is All you Need" [46], transformers have outperformed the previous models. Transformer models reduce training time by using the attention mechanism; they are pre-trained on large datasets and can be applied to datasets from different domains through fine-tuning [30].
Considering these studies, we selected sentiment analysis as the way to understand user perceptions of SBC technologies. Furthermore, to overcome the limitation of manual data labelling, we propose to use automatic sentiment label generation techniques [31,45]. Since fine-tuned transformer models represent the current state of the art, we adopt them in our study and compare the performance of the proposed transformer model with state-of-the-art deep learning models.

Methodology
This section explains the overall proposed methodology, which addresses both research gaps: understanding technology user perceptions in the SBC domain and overcoming the lack of labelled data. The methodology is given in Figure 2 and Figure 3. We collected the data from Twitter because it provides easy programmatic access to its content through its API. Also, we collected tweets only in the English language, as this makes it easier to understand what users are saying when assessing the performance of the proposed method.
• IMDB Dataset: We selected the IMDB reviews dataset, a binary sentiment classification dataset [28]. It contains movie reviews labelled into two classes, positive and negative, with a total of 50,000 reviews divided equally between the classes (i.e. 25,000 per class). We used this labelled dataset to validate our proposed methodology.
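For illustration, the standard aclImdb release of this dataset stores each review as a separate text file under `pos/` and `neg/` subfolders per split; a minimal stdlib loader (the function name is our own, not part of any released code) might look like:

```python
from pathlib import Path

def load_imdb_split(root, split="train"):
    """Read the aclImdb folder layout (<root>/<split>/{pos,neg}/*.txt)
    into parallel lists of review texts and binary labels (1=pos, 0=neg)."""
    texts, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        for path in sorted(Path(root, split, label_name).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

With the full release, each split yields 25,000 reviews balanced between the two classes.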

Data Pre-processing and Splitting
Many research studies have shown that text cleaning/pre-processing improves classification results [5]. Therefore, in our work we applied several techniques: converting text to lowercase and removing URLs, usernames, hashtags, punctuation and stopwords. Additionally, we removed non-English words and converted the remaining words into their root/base form. Next, we used a 70-10-20 splitting strategy for the above datasets in order to validate our proposed approach: 70%, 10% and 20% of the data is used for training, validation and testing, respectively. The choice of only 20% of the data as test data is based on the results of Guyon et al. [21], which suggest that reserving 80% of the data for learning is a good proportion for a model to learn the patterns in the data.
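The cleaning and 70-10-20 splitting steps above can be sketched with the standard library alone. This is an illustrative approximation of the described rules, not our exact pipeline: the stopword list is a small sample, and lemmatization to root forms (which in practice needs a library such as NLTK) is omitted.

```python
import random
import re

STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in"}  # illustrative subset

def clean(text):
    """Lowercase, strip URLs/usernames/hashtags/punctuation, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)        # usernames and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # punctuation and other non-letter characters
    return " ".join(w for w in text.split() if w not in STOPWORDS)

def split_70_10_20(samples, seed=42):
    """Shuffle and split into 70% train, 10% validation, 20% test."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```

Fixing the shuffle seed keeps the split reproducible across experiment runs.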

Sentiment Labels Generation
Once the data is pre-processed/cleaned, it is ready for sentiment labelling. Since manual labelling of datasets is a tedious task, in this work we propose replacing it with automatic labelling techniques. We selected four existing automatic sentiment labelling techniques: Valence Aware Dictionary and sEntiment Reasoner (VADER) [23], TextBlob [27], Afinn [7] and Flair [4]. We labelled the IMDB reviews dataset using these four techniques and compared the generated labels against the original labels of the dataset. The results of this performance comparison are given in Section 4.1.

Fine-tuned Transformer Models
Transformer models are pre-trained on large datasets and can be applied to a variety of tasks, such as sentiment analysis, named entity recognition and question answering, through fine-tuning [52]. Labelled data is required in order to perform sentiment analysis with a fine-tuned transformer model; in our case, the data is labelled into two sentiment categories, positive and negative, using an automatic sentiment label generation technique. There is a variety of pre-trained transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) [13], Generative Pre-trained Transformer (GPT) [8], Transformer-XL [12] and XLNet [50]. We selected BERT as the transformer model for our study because it can easily be fine-tuned on downstream tasks to perform supervised learning [13]. First, we labelled the IMDB training and validation datasets with the sentiment label generation technique selected as discussed in Section 3.3. Next, we trained the BERT model on the labelled data. Finally, we predicted the sentiment labels for the IMDB test dataset.
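A minimal fine-tuning sketch with the Hugging Face transformers library is shown below. The model name, batch size, epoch count and helper functions are illustrative assumptions, not our exact configuration (which is listed in Table 2):

```python
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

LABEL2ID = {"negative": 0, "positive": 1}

def encode_labels(labels):
    """Map string sentiment labels to the integer ids BERT expects."""
    return [LABEL2ID[label] for label in labels]

def fine_tune(train_texts, train_labels, val_texts, val_labels):
    """Fine-tune bert-base-uncased for binary sentiment classification."""
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    def make_dataset(texts, labels):
        # A list of feature dicts is enough for Trainer's default collator.
        enc = tokenizer(texts, truncation=True, padding=True)
        return [{"input_ids": enc["input_ids"][i],
                 "attention_mask": enc["attention_mask"][i],
                 "labels": labels[i]} for i in range(len(labels))]

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="bert-sentiment",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=make_dataset(train_texts, encode_labels(train_labels)),
        eval_dataset=make_dataset(val_texts, encode_labels(val_labels)),
    )
    trainer.train()
    return trainer
```

After training, `trainer.predict(...)` yields per-class logits whose argmax gives the predicted sentiment label for each test example.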
After prediction, we evaluated the performance of the BERT model by comparing the predicted sentiment labels with the original sentiment labels of the IMDB test dataset. The results of this evaluation are discussed in Section 4.1. We used several evaluation metrics to assess both the sentiment feature learning and the fine-tuned supervised training models: accuracy, precision, recall, F1-score and kappa score [41]. Table 2 shows the hyperparameters of the BERT training model.
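For reference, these binary-classification metrics reduce to simple counts over the confusion matrix. A self-contained, unweighted sketch (the weighted F1 reported later averages per-class F1 weighted by class support):

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and Cohen's kappa for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    n = len(y_true)
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}
```

In practice these values match scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `cohen_kappa_score`.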

Possible bias in analysing online social media content
This section explains the possible biases that might occur while analysing online social media content, and the methods we followed to avoid them.
1. Programmatic access to the data through APIs is often limited: Social media platforms like Twitter provide APIs to collect/access data, but they impose limitations in terms of the quantity of data that can be collected, the time frames for data collection and the set of query parameters available [36,37]. The main bias here is that it may not be possible to collect sufficiently representative data for a topic or event, which could ultimately bias the overall analysis of the collected data. Solution: We addressed this bias by using the Twitter Academic Research API. The major benefits of this API are that it provides both historical and real-time tweet collection, it gives access to historical data through a full-archive search of the complete Twitter database, and it supports an advanced query parameter language for more precise data collection.
2. Sampling strategies are often opaque: Different social media platforms apply different sampling strategies to limit what and how much of the publicly available data can be collected through their APIs. For example, the Twitter API may return k tweets matching our query parameters, but how those k tweets are sampled (most relevant, most recent, etc.) is itself a source of bias [36,29]. Solution: The Twitter Academic Research API used for this study offers different ways to collect data: 1) full-archive search, 2) recent (7-day) search and 3) filtered stream search. All of these return the complete matching data irrespective of any sampling strategy.
3. The composition of training and test data samples impacts the results: The distribution of training and test samples strongly affects the results of a predictive model. The selection of training and test samples while building and evaluating the model is itself a source of bias, which can lead to incorrect learning because the model is not exposed to the complete data during training or evaluation [11,42]. Solution: For our experiments, we handle this bias by 1) k-fold cross-validation and 2) batching in the train/test split while training and evaluating the predictive models involved in the different components of the perception extraction pipeline.
4. Performance evaluation metrics bias: There are several performance evaluation metrics for predictive models, such as precision, recall and accuracy. The most widely used metric is accuracy, but it is prone to bias when there is data imbalance between classes [24,38]. Solution: In addition to accuracy, we consider precision, recall and sensitivity-based evaluation metrics, which reveal the performance of predictive models on each individual class label. Based on these metrics, we plan to optimize our perception extraction experiments, which will reduce this bias in the reporting of predicted results.
5. Individual privacy due to data linkage: Despite the public nature of users' posts on Twitter, there is a potential privacy risk that information collected for an intended purpose may be used to achieve other, potentially malevolent goals, and that users could be targeted for persecution for their beliefs and opinions [39]. Solution: To mitigate the risk of identification, the tweets collected from Twitter are anonymized, so that the data can still be used to extract useful insights without compromising individual privacy. Concerning purpose misuse, this work was conducted as part of the EU-H2020-funded project METICOS (A Platform for Monitoring and Prediction of Social Impact and Acceptability of Modern Border Control Technology, https://meticos-project.eu/), and the purpose of the analysis has been identified and justified in the project's privacy/data protection impact assessment.
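The k-fold cross-validation safeguard mentioned in item 3 can be sketched without any dependencies: shuffle once, then carve the data into k disjoint test folds so every sample is evaluated exactly once (fold count and seed below are illustrative).

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_indices, test_indices) for each of k shuffled folds,
    so every sample appears in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute any remainder across the first n_samples % k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Averaging a model's score over the k test folds gives an estimate that is less sensitive to any single train/test composition.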

Results and Discussions
This section shows the results obtained by our proposed methodology and discusses its performance on the IMDB baseline dataset in comparison with other models.

IMDB Dataset Results
This section discusses the performance evaluation of the four sentiment label generation techniques mentioned in Section 3.3. The performance results, based on the evaluation metrics above, are given in Table 3, and Figure 4 shows the weighted F1-score comparison of all the techniques. The comparison shows that Flair outperformed the other techniques; therefore, we selected Flair as the technique for generating sentiment labels automatically.
Next, we discuss the performance evaluation of the BERT model trained on the sentiment data automatically labelled with the Flair technique. The comparison of BERT + Flair with other state-of-the-art deep learning models is shown in Table 4, and the weighted F1-score comparison of the proposed BERT + Flair model is given in Figure 5.
Table 4: Comparison of the BERT model with state-of-the-art models

From the above performance comparison results, it can be observed that the BERT + Flair model outperformed all the other state-of-the-art deep learning models, with an overall accuracy of 79%. The model also performs better than the original Flair model. Hence, the proposed approach appears promising for performing sentiment analysis on social media data in a domain where no training data exists. Figure 6 shows the confusion matrix comparing the predictions of the selected model with the original IMDB test dataset labels, where 0 refers to the negative and 1 to the positive class label.

SBC Tweets Dataset Results
First, we divided the SBC tweets dataset collected in Section 3.1 into three sets: training, validation and test. 2k tweets were set aside as the test set, while the rest of the tweets were used for training and validation. After training and validating the model using the process flow given in Figure 2, we predicted the sentiments for the test dataset. The extracted user perceptions for the test dataset are discussed below. Figure 7 shows the distribution of the user tweets over the two sentiments, positive and negative. The majority of the tweets, 61% of the overall test data, were labelled negative towards SBC technologies. A possible reason for the predominance of negative predictions is the date range of the tweets: we used tweets from 2019 to 2021, years in which airport operations, along with SBC technologies, were highly affected by the travel uncertainties caused by the COVID-19 pandemic. Furthermore, to relate the predicted sentiments to user perceptions, user profiles and demographics, we performed the additional analyses described below. First, we investigated the most frequent locations of the users who posted on Twitter about SBC technologies. Figure 8 shows the most frequent user locations based on tweet counts; the highest number of tweets comes from the United Kingdom.
Additionally, Figure 9 shows the distribution of positive and negative user sentiments towards SBC technologies by location. Again, the United Kingdom has the highest number of tweets, with a total of 154 negative and 72 positive tweets.
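The per-location sentiment breakdown behind Figure 9 is a simple grouped count; a sketch over predicted tweets (the dict field names are illustrative, not our raw Twitter schema):

```python
from collections import Counter

def sentiment_by_location(tweets):
    """Count predicted sentiment labels per user location.

    `tweets` is an iterable of dicts with 'location' and 'sentiment'
    keys (illustrative field names)."""
    counts = {}
    for tweet in tweets:
        counts.setdefault(tweet["location"], Counter())[tweet["sentiment"]] += 1
    return counts
```

The same grouping, keyed on the tweet's year instead of its location, produces the year-wise distribution discussed next.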
Finally, we visualize the year-wise count of tweets about SBC technologies, both overall and divided into positive and negative sentiments. Figure 11 shows the year-wise distribution of positive and negative tweets. Again, the year 2019 has the highest number of tweets, with 565 negative and 407 positive tweets.

Conclusion and Future Work
This study analyzed user tweets in order to understand user perceptions of SBC technologies by combining the BERT model with the Flair labelling technique. The study contributes in two ways: 1) automatic sentiment label generation when no labelled data exists and 2) applying the proposed approach to extract sentiments and understand user perceptions of SBC technologies. We evaluated four sentiment label generation techniques and selected the best-performing one by comparison with the IMDB test dataset labels. The highest-performing model was the combination of Flair and BERT, with 79% accuracy on the IMDB test dataset. We then applied this model to a dataset on SBC technologies that we collected from Twitter, in order to understand user perceptions. The extracted user perceptions resulted in 61% of the user tweets being labelled negative about SBC technologies, and the highest number of users were from the United Kingdom. The proposed approach can be adapted to other domains where raw data exists but no processed and labelled data is available. As future work, we plan to create a benchmark dataset in the domain of SBC technologies for sentiment as well as emotion understanding, to improve the learning of the proposed model.