Twitter Sentiment Analysis of Covid-19 Vaccination Using Deep Learning
Varsha Naik
MIT-WPU, Dr. Vishwanath Karad’s MIT World Peace University, Pune, India
Corresponding Author: varsha.powar@mitwpu.edu.in
Dr. Rajeswari Kannan
PCCoE, Pimpri Chinchwad College of Engineering, Pune, India
Corresponding Author: kannan.rajeswari@pccoepune.org
Snehalraj Chugh
MIT-WPU, Dr. Vishwanath Karad’s MIT World Peace University, Pune, India
Corresponding Author: snehalchugh2016@gmail.com
Ahbaz Memon
MIT-WPU, Dr. Vishwanath Karad’s MIT World Peace University, Pune, India
Corresponding Author: ahbazmemon0@gmail.com
Himanshu Chaudhari
MIT-WPU, Dr. Vishwanath Karad’s MIT World Peace University, Pune, India
Corresponding Author: himanshuchaudhari2346@gmail.com
Abstract:
Covid-19 had severe social, economic, and mental-health consequences for the community. During the pandemic, media platforms like Twitter became essential networking mediums, carrying a large volume of reports, views, opinions, and information shared by individuals and authorized outlets. In 2021, when the second wave of Covid-19 emerged in India, the country saw its fastest outbreak, with more than 20 lakh cases in the first half of April. Until then, India had distributed over one billion vaccine units from two producers: Bharat Biotech, producing Covaxin, and SII (Serum Institute of India), producing Covishield, the Oxford-AstraZeneca vaccine. We collected datasets for analysis and applied our novel preprocessing algorithms, i.e., removal of URLs, @, #, contracted words, punctuation, numbers, POS, etc. We then tokenized the tweets, applied stemming and lemmatization, and ran a neural spellchecker. Using our in-house algorithm, we cleaned around 500 tweets in just 0.5 seconds, removing duplicate and redundant tweets. We constructed a word cloud for the three classes (Positive, Negative, and Neutral) and then used a neural network to predict them, reaching 97% training and 99% testing accuracy. The results can aid policy design that keeps citizens' perspectives in mind and can make the government more aware of issues such as vaccine shortages, food, and poverty.
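The preprocessing steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' in-house algorithm, which the abstract does not show. The function names `clean_tweet` and `deduplicate`, the small contraction table, and the choice to keep a hashtag's word while dropping the `#` symbol are all assumptions made for illustration. Stemming, lemmatization, and the neural spellchecker are left out.

```python
import re

# Hypothetical patterns and names; the paper's actual rules are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Small illustrative contraction table (an assumption, not the authors' list).
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "n't": " not",
    "'re": " are",
    "'ve": " have",
    "'ll": " will",
}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions, '#', punctuation and digits,
    expand a few contractions, and return a list of word tokens."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # assumption: keep the hashtag word itself
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = NON_LETTER_RE.sub(" ", text)  # drops punctuation and numbers
    return text.split()


def deduplicate(tweets):
    """Keep the first tweet of every group that cleans to the same tokens."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = tuple(clean_tweet(tweet))
        if key and key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


if __name__ == "__main__":
    sample = "Got my #Covishield dose today! Can't wait https://t.co/abc @MoHFW_INDIA"
    print(clean_tweet(sample))
    print(deduplicate(["Covaxin is safe", "covaxin is SAFE!!", "Need more doses"]))
```

Comparing tweets by their cleaned token sequence, rather than by raw text, treats tweets that differ only in case, punctuation, or links as duplicates. This is one plausible reading of "duplicate and redundant tweets".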
Keywords:
- Covid-19
- Pandemic
- Sentiment Analysis
- Word Cloud
- K-Means Clustering
- Natural Language Processing
- Twitter
- Covaxin
- Covishield