Document Type: Research Paper


1 Ph.D. Candidate in Linguistics, Alzahra University, Tehran, Iran

2 Associate Professor of Linguistics, Alzahra University, Tehran, Iran


Using an auditory-acoustic approach, the present study examines whether durational acoustic parameters of speech rhythm can be used to detect Persian speakers' use of Azeri as a form of voice disguise. To this end, after validity procedures were completed, continuous speech from 5 speakers of Standard Persian and 5 speakers of the Tabrizi variety of Azeri was selected for acoustic and statistical analysis. All Persian speakers were monolingual, and all Azeri speakers were Azeri-Persian bilinguals who spoke Azeri as their mother tongue and Persian as their second language. Each Persian speaker was asked to narrate a life experience once in Persian (Persian-Persian data) and once in imitated Azeri (Persian-Azeri data). Each Azeri speaker was likewise asked to narrate a life experience once in Azeri (Azeri-Azeri data) and once in Persian (Azeri-Persian data). The Persian-Azeri data constitute the disguised data in this study. The recordings were annotated on five tiers: segment, CV-segment, CV-segment interval, CV-interval, and syllable. To control for unwanted variables, one minute (±5 seconds SD) of each sound file was extracted for further acoustic and statistical analysis. A Praat script, DurationAnalyzer, was used to calculate the acoustic correlates of the durational parameters of speech rhythm automatically. These parameters are: %V (the proportion of speech that is vocalic), ΔC(ln) (the standard deviation of the natural-log-normalized durations of consonantal intervals), ΔV(ln) (the standard deviation of the natural-log-normalized durations of vocalic intervals), nPVI-V (the rate-normalized average durational difference between consecutive vocalic intervals), and syllable rate.
The results revealed a significant difference among the four data types, with %V and syllable rate discriminating best between them; however, none of these parameters differed significantly between the Persian-Azeri and Azeri-Persian data.
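The five durational measures can be illustrated with a short Python sketch. It is not the DurationAnalyzer script used in the study. It assumes that vocalic and consonantal interval durations (in seconds) have already been exported from the CV-interval tier, uses the sample standard deviation for ΔC(ln) and ΔV(ln), applies the usual nPVI formula to successive vocalic intervals, and computes syllable rate over whatever stretch of speech the caller supplies. All function names are hypothetical, and the durations in the usage lines are toy values, not data from the study.

```python
import math
import statistics
from typing import Sequence


def percent_v(vocalic: Sequence[float], consonantal: Sequence[float]) -> float:
    """%V: percentage of total vocalic + consonantal interval time that is vocalic."""
    v_total = sum(vocalic)
    total = v_total + sum(consonantal)
    if total <= 0:
        raise ValueError("total interval duration must be positive")
    return 100.0 * v_total / total


def delta_ln(durations: Sequence[float]) -> float:
    """SD of natural-log interval durations: ΔC(ln) for consonants, ΔV(ln) for vowels.

    Assumption: sample (n - 1) standard deviation; the study's tool may differ.
    Since ln(k * d) = ln(k) + ln(d), the result does not depend on the time unit.
    """
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    return statistics.stdev([math.log(d) for d in durations])


def npvi(durations: Sequence[float]) -> float:
    """Normalized PVI: 100 * mean of |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2)."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    pairs = list(zip(durations, durations[1:]))
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100.0 * total / len(pairs)


def syllable_rate(n_syllables: int, duration_s: float) -> float:
    """Syllables per second over the analysed stretch (choice of denominator is an assumption)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_syllables / duration_s


# Toy interval durations in seconds (illustrative only, not data from the study).
vocalic = [0.10, 0.08, 0.12, 0.15]
consonantal = [0.06, 0.09, 0.05, 0.11]

print(f"%V        = {percent_v(vocalic, consonantal):.1f}")
print(f"DeltaV(ln) = {delta_ln(vocalic):.3f}")
print(f"DeltaC(ln) = {delta_ln(consonantal):.3f}")
print(f"nPVI-V    = {npvi(vocalic):.1f}")
print(f"rate      = {syllable_rate(300, 60.0):.2f} syll/s")
```

Taking the natural log before computing the standard deviation makes ΔC(ln) and ΔV(ln) independent of the time unit, and dividing each pairwise difference by the pair mean in nPVI-V factors out local tempo, which is why both are described as normalized measures.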

