Speaker Identification in Virtual Environments: Investigating the Role of Voiceless Fricatives in WhatsApp Audio Data

Davoudi, Rahil; Asadi, Homa

doi:10.22059/jolr.2025.389287.666909

Article type: Research article

Authors

¹ M.A. student of Computational Linguistics, Department of Linguistics, University of Isfahan, Isfahan, Iran.

² Assistant Professor, Department of Linguistics, University of Isfahan, Isfahan, Iran.

Abstract

In the digital age, accurate speaker identification is of particular importance in forensic and security investigations. However, the spread of internet-based communication and the widespread use of messaging applications such as WhatsApp have created new challenges in this field. Variable microphone quality, background noise, network disruptions, and audio compression are among the factors that can affect a speaker's acoustic features and reduce the accuracy of identification systems. Despite these limitations, examining how acoustic features perform under such conditions is necessary for advancing forensic phonetics and improving its practical applications in real-world environments. This study investigates the role of voiceless fricatives in revealing between-speaker variation in audio data recorded through the WhatsApp messenger. Its novelty lies in examining the ability of Persian voiceless fricatives to identify speakers under non-ideal recording conditions. To this end, audio data were collected from 100 male native speakers of Persian, and Mel-frequency cepstral coefficients (MFCCs) were extracted from the voiceless fricative segments and fed as input to a support vector machine (SVM) model. The results showed that the model's speaker identification accuracy was 69 percent when all voiceless fricatives were considered together. However, examining each fricative separately increased the model's accuracy. Among them, the fricative /s/ had the greatest effect, with an accuracy of 77 percent, followed by /ʃ/, /f/, and /x/ with accuracies of 75, 74, and 73 percent, respectively. These results indicate that even under non-ideal recording conditions, such as data recorded through WhatsApp, voiceless fricatives can provide valuable information for distinguishing between speakers. However, this study addressed only one example of non-ideal recording conditions, and examining other potential confounding factors requires further research. The findings show the strong potential of voiceless fricatives in speaker identification applications, especially in informal, uncontrolled, real-world scenarios that lack high-quality recording equipment.

Article title [English]

Speaker Identification in Virtual Environments: Investigating the Role of Voiceless Fricatives in WhatsApp Audio Data

Authors [English]

Rahil Davoudi ¹
Homa Asadi ²

¹ M.A. Student of Computational Linguistics, Linguistics Department, University of Isfahan, Isfahan, Iran.

² Department of Linguistics, Faculty of Foreign Languages, University of Isfahan, Isfahan, Iran.

Abstract [English]

In the digital age, accurate speaker identification plays a crucial role in forensic and security investigations. However, the widespread use of internet-based communication platforms, such as WhatsApp, has introduced new challenges in this field. Factors such as variable microphone quality, background noise, network distortions, and audio compression can significantly affect a speaker’s acoustic features and reduce the accuracy of speaker identification systems. Despite these limitations, evaluating the performance of acoustic features under such conditions is essential for advancing forensic phonetics and improving its practical applications in real-world settings. This study examines the role of voiceless fricatives in capturing between-speaker variability in audio recordings obtained through WhatsApp. The novelty of this research lies in investigating the ability of Persian voiceless fricatives to distinguish speakers under non-ideal recording conditions. To achieve this goal, speech data from 100 male Persian speakers were collected, and Mel-frequency cepstral coefficients (MFCCs) were extracted from the voiceless fricative segments. These features were then used as input to a support vector machine (SVM) model for speaker classification. The results showed that when all voiceless fricatives were considered together, the model achieved an overall speaker identification accuracy of 69%. However, analyzing each fricative separately led to an increase in model accuracy. Among the individual fricatives, the /s/ fricative had the highest accuracy at 77%, followed by /ʃ/, /f/, and /x/ with accuracies of 75%, 74%, and 73%, respectively. These findings suggest that even in non-ideal recording conditions, such as WhatsApp recordings, voiceless fricatives can provide valuable information for speaker differentiation. 
However, this study only focuses on one type of non-ideal recording condition, and further research is needed to explore other potential sources of degradation. The results highlight the potential of voiceless fricatives in speaker identification applications, particularly in informal, uncontrolled, and real-world scenarios where high-quality recording equipment is unavailable.
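
The classification step described in the abstract (MFCC vectors from fricative tokens fed to an SVM) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the number of coefficients, the SVM kernel, or the evaluation split, so the 13 coefficients, the RBF kernel, the 75/25 split, and the helper names `make_synthetic_mfcc` and `speaker_id_accuracy` are all assumptions made for illustration. The feature vectors are randomly generated stand-ins, not real speech data, so the resulting accuracy says nothing about the study's results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_mfcc(n_speakers=5, tokens_per_speaker=40, n_coeffs=13, seed=0):
    """Generate stand-in 'MFCC' vectors (hypothetical helper, NOT real speech).

    Each synthetic speaker gets its own mean vector; every fricative token is
    that mean plus Gaussian noise. Real features would come from an MFCC
    extractor applied to segmented fricatives.
    """
    rng = np.random.default_rng(seed)
    speaker_means = rng.normal(0.0, 3.0, size=(n_speakers, n_coeffs))
    features = np.vstack([
        mean + rng.normal(0.0, 1.0, size=(tokens_per_speaker, n_coeffs))
        for mean in speaker_means
    ])
    labels = np.repeat(np.arange(n_speakers), tokens_per_speaker)
    return features, labels


def speaker_id_accuracy(features, labels, seed=0):
    """Train an SVM on a stratified split and return held-out accuracy."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Standardizing before an RBF-kernel SVM is a common choice; it is an
    # assumption here, not something stated in the abstract.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(x_train, y_train)
    return model.score(x_test, y_test)


X, y = make_synthetic_mfcc()
accuracy = speaker_id_accuracy(X, y)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

To mirror the per-fricative comparison in the abstract, the same function could be called once on tokens of all fricatives pooled together and once per fricative (/s/, /ʃ/, /f/, /x/), comparing the resulting accuracies.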

Keywords [English]

Acoustic phonetics
Speaker identification
Fricative consonants
Mel-frequency cepstral coefficients
Support vector machine algorithm

پژوهشهای زبانی (Language Research)

Volume 16, Issue 1 - Serial Number 30
Shahrivar 1404 (August-September 2025)
Pages 99-118
