Ranking of Persian Speech Phonemes from the Point of View of Efficiency in Speaker Recognition

Document Type: Research Paper


Associate Professor, Research Center of Intelligent Signal Processing


This paper studies how effective each Persian speech phoneme is for speaker recognition and ranks the phonemes accordingly. To estimate a phoneme's efficiency, we introduce a criterion defined as the ratio of the phoneme's "inter-speaker distance" to its "intra-speaker distance", referred to as the "Speaker Affectability Ratio" (SAR). The experiments and computations were carried out for all Persian speech phonemes (with the exception of /À/) using the Persian speech database "Farsdat", and single phonemes and phoneme groups were then ranked on the basis of the results. In the phoneme-group ranking, vowels and semi-vowels rank first; nasals, fricatives, and liquids rank second; and obstructions and plosives rank third from the point of view of efficiency in speaker recognition. Likewise, in the single-phoneme ranking, the phoneme /∂/ is first and the phoneme /t/ is last. Compared with research on other spoken languages such as English, German, and Dutch, these results agree closely on the ranking of phoneme groups, but noticeable differences appear in the details of the rankings.
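The ratio described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two distances are computed, so the definitions here are assumptions: the inter-speaker distance is taken as the mean Euclidean distance between speaker centroids, and the intra-speaker distance as the mean distance of each token to its own speaker's centroid. The function names (`centroid`, `sar`, `rank_phonemes`) and the toy feature vectors are hypothetical and do not come from the paper or from Farsdat.

```python
from itertools import combinations
from math import dist
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(component) for component in zip(*vectors)]


def sar(tokens_by_speaker):
    """Inter-speaker distance divided by intra-speaker distance for one phoneme.

    tokens_by_speaker maps a speaker id to that speaker's feature vectors
    (one vector per utterance of the phoneme). Both distance definitions
    below are illustrative assumptions, not the paper's exact formulas.
    """
    centroids = {s: centroid(v) for s, v in tokens_by_speaker.items()}
    # Inter-speaker: mean Euclidean distance between pairs of speaker centroids.
    inter = fmean(dist(a, b) for a, b in combinations(centroids.values(), 2))
    # Intra-speaker: mean distance of each token to its own speaker's centroid.
    intra = fmean(
        dist(vec, centroids[s])
        for s, vectors in tokens_by_speaker.items()
        for vec in vectors
    )
    return inter / intra


def rank_phonemes(data):
    """Return phoneme labels sorted from highest to lowest SAR."""
    return sorted(data, key=lambda p: sar(data[p]), reverse=True)


# Made-up two-dimensional features for two speakers and two phonemes.
toy = {
    "a": {"spk1": [[0.0, 0.0], [0.0, 2.0]], "spk2": [[10.0, 0.0], [10.0, 2.0]]},
    "t": {"spk1": [[0.0, 0.0], [0.0, 2.0]], "spk2": [[1.0, 0.0], [1.0, 2.0]]},
}
print(rank_phonemes(toy))
```

In this toy data both phonemes have the same within-speaker spread, but the speakers' centroids are ten times farther apart for "a" than for "t", so "a" gets the higher ratio and is ranked first. A phoneme whose tokens are identical within every speaker would give an intra-speaker distance of zero, which this sketch does not handle.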


Volume 7, Issue 1
September 2016
Pages 77-96
  • Receive Date: 28 February 2016
  • Revise Date: 20 April 2016
  • Accept Date: 20 June 2016
  • First Publish Date: 20 June 2016