Detecting Level of Depression from Social Media Posts for the Low-resource Bengali Language
DOI:
https://doi.org/10.38032/jea.2023.02.003
Keywords:
Depression, Bengali, Machine Learning, Multi-class Classification, Low-resource Language
Abstract
Depression is a mental illness that affects people's thoughts and daily activities. In extreme cases, it leads to self-harm or suicide. Beyond the individual, depression harms the person's family, society, and working environment. It is therefore essential to identify depressed people before physiological treatment can begin. As social media platforms such as Facebook pervade everyday life, depressed people share their personal feelings and opinions on these platforms through posts and comments. We found many research works that experiment on such text in English and other high-resource languages, but only a limited number in low-resource languages such as Bengali. In addition, most of these works treat the task as binary classification. In this investigation, we classify Bengali depression text into four classes: non-depressive, mild, moderate, and severe. We first developed a depression dataset of 2,598 entries. We then applied pre-processing, feature selection techniques, and three types of machine learning (ML) models: classical ML, deep learning (DL), and transformer-based pre-trained models. The XLM-RoBERTa-based pre-trained model outperforms the existing works on the four-level depression classification problem, with a 61.11% F1-score and 60.89% accuracy. Our proposed ML-based automatic detection system can recognize the various stages of depression, from low to high. It may assist psychologists and others in providing level-wise counseling to help depressed people return to their ordinary lives.
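As an illustration of the two reported metrics, the following minimal sketch computes accuracy and F1-score for a four-class labeling task. It is not the authors' evaluation code. The abstract does not state which F1 averaging was used, so macro-averaging is assumed here, and the toy labels and the function names `accuracy` and `macro_f1` are hypothetical.

```python
from collections import Counter

# The four depression levels named in the abstract.
LABELS = ["non-depressive", "mild", "moderate", "severe"]


def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted level equals the annotated level."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical toy annotations and predictions, for illustration only.
y_true = ["non-depressive", "mild", "moderate", "severe", "mild", "severe"]
y_pred = ["non-depressive", "mild", "mild", "severe", "moderate", "severe"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

With macro-averaging, each of the four levels contributes equally to the score regardless of how many posts it contains, which matters when the classes are imbalanced.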
License
Copyright (c) 2023 Md. Nesarul Hoque, Umme Salma
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.