Detecting Level of Depression from Social Media Posts for the Low-resource Bengali Language

Md. Nesarul Hoque; Umme Salma

doi:10.38032/jea.2023.02.003

Authors

Md. Nesarul Hoque, Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj-8100, Bangladesh
Umme Salma, Department of Computer Science & Engineering, Bangladesh University, Bangladesh

Keywords:

Depression, Bengali, Machine Learning, Multi-class Classification, Low-resource Language

Abstract

Depression is a mental illness that affects people's thoughts and daily activities. In extreme cases, it can lead to self-harm or suicide. Beyond the individual, depression harms the sufferer's family, society, and working environment. Therefore, before physiological treatment, it is essential to identify depressed people first. As social media platforms such as Facebook pervade everyday life, depressed people share their personal feelings and opinions on these platforms through posts and comments. Many research works have experimented on such text messages in English and other high-resource languages, but we have identified only limited work in low-resource languages like Bengali. In addition, most of these works treat depression detection as a binary classification problem. In this investigation, we classify Bengali depression text into four classes: non-depressive, mild, moderate, and severe. First, we developed a depression dataset of 2,598 entries. Then, we applied pre-processing tasks, feature selection techniques, and three types of machine learning (ML) models: classical ML, deep learning (DL), and transformer-based pre-trained models. The XLM-RoBERTa-based pre-trained model outperforms existing works on the four-level depression classification problem, with a 61.11% F1-score and 60.89% accuracy. Our proposed ML-based automatic detection system can recognize the various stages of depression, from low to high. It may assist psychologists and others in providing level-wise counseling to help depressed people return to their ordinary lives.
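To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and an F1-score can be computed for a four-class problem. The class names come from the abstract; the toy labels, the function names, and the choice of macro averaging are assumptions for illustration only, since the abstract does not state which F1 averaging the authors used.

```python
from collections import Counter

# Class names are taken from the abstract.
CLASSES = ["non-depressive", "mild", "moderate", "severe"]


def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted level matches the annotated level."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true, y_pred, classes=CLASSES):
    """F1 for each class, from true positives, false positives, false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 (assumed averaging, not confirmed by the source)."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not drawn from the paper's 2,598-entry dataset.
y_true = ["non-depressive", "non-depressive", "mild", "mild",
          "moderate", "moderate", "severe", "severe"]
y_pred = ["non-depressive", "mild", "mild", "mild",
          "moderate", "severe", "severe", "severe"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}  macro-F1={f1:.4f}")
```

Macro averaging weights all four levels equally, which matters when some levels are rarer than others; a weighted average would instead follow the class frequencies.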

