An Examination of the Conformity of Translations by ChatGPT, DeepSeek, Google, Microsoft, and Reverso with the Criteria of the Approved Persian Orthography and the Principles of Writing Punctuation Marks

Medadian, Gholamreza

doi:10.22059/jolr.2025.400687.666934

Article type: Research article

Author

Gholamreza Medadian

Assistant Professor of Translation Studies, Department of English Language, Hazrat-e Masoumeh University, Qom, Iran

https://doi.org/10.22059/jolr.2025.400687.666934

Abstract

Because conformity with the orthographic principles and features of the target language is an important dimension of translation quality, this study examined how closely the English-to-Persian translations of five online systems (in the genre of journalistic texts) follow the guidelines of the orthography approved by the Academy of Persian Language and Literature and the principles of writing punctuation marks. To this end, several texts drawn from English-language newspapers were collected and translated into Persian by five online systems (ChatGPT, DeepSeek, Google Translate, Microsoft Translator, and Reverso). A careful word-by-word examination of the translations against the guidelines of the approved orthography identified a total of 16 types of orthographic and punctuation errors, which fell into three main groups: errors in (half-)spacing between the parts of words and compounds, errors in writing secondary characters, and errors in spacing punctuation marks. According to the results, Google, Microsoft, and Reverso produced all of these errors. DeepSeek and ChatGPT had 4 and 12 types of these errors in their output, respectively. The results also showed that no system performs fully consistently: a single system sometimes wrote a given compound or character correctly and sometimes incorrectly. It also emerged that the errors closely resemble the orthographic errors found in human-written texts. The reason is that the texts used to train translation systems are generally open-source monolingual or bilingual texts produced by writers and translators on different platforms and in different contexts. Given the ever-growing use of translation systems, these errors can gradually seep into the writing habits of individual Persian speakers and affect the Academy's efforts to standardize the script. One solution is for the providers of these systems (or independent developers) to build separate or integrated systems that check and post-edit each translation for compatibility with the approved orthography and the principles of writing punctuation marks before it is delivered to the end user.

Article title [English]

Investigation of the Correspondence between the Translations Produced by ChatGPT, DeepSeek, Google Translate, Microsoft Translator and Reverso, and the Approved Orthography of the Academy of Persian Language and Literature

Author [English]

Gholamreza Medadian

Assistant Professor of Translation Studies, English Language Department, Hazrat-e Masoumeh University, Qom, Iran

Abstract [English]

Given that adherence to the orthographic principles and conventions of the target language is a significant dimension of translation quality, this study investigated how far English-to-Persian journalistic translations generated by five online machine translation (MT) systems comply with the guidelines of the approved orthography of the Academy of Persian Language and Literature (APLL) and with standard usage rules for punctuation marks. To this end, several texts were extracted from English-language newspapers and translated into Persian by five systems (ChatGPT, DeepSeek, Google Translate, Microsoft Translator, and Reverso). A meticulous word-by-word analysis of the translations against the guidelines of the approved orthography and the usage rules for punctuation marks identified a total of 16 distinct types of orthographic and punctuation errors in the machine-generated translations. These were then classified into three main groups: (1) errors in the use of the space or half-space (Nim-Faslé) between the elements of words and compounds, (2) errors in the orthography of secondary Persian characters, and (3) errors in the spacing of punctuation marks. Google, Microsoft, and Reverso produced instances of all 16 error types. DeepSeek and ChatGPT exhibited 4 and 12 of these error types in their outputs, respectively. Furthermore, none of the systems performed with complete consistency: a given compound or character was rendered correctly at some times and incorrectly at others. The identified errors were also found to closely resemble the orthographic errors prevalent in human-generated texts. This resemblance is attributable to the predominantly open-source monolingual and bilingual corpora used to train these online systems, which comprise texts authored and translated by diverse individuals across varied contexts and registers.
Considering the increasing reliance on online MT systems, such orthographic and punctuation errors may gradually permeate the texts written by Persian speakers and undermine the APLL's ongoing efforts to standardize the orthography. One proposed solution is for the MT providers themselves, or independent developers, to build dedicated or integrated auxiliary systems that check and post-edit translations for conformity with the approved orthographic guidelines and standard punctuation conventions before they are delivered to the end user.
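As a rough illustration of the kind of auxiliary post-editing check proposed in the abstract, the Python sketch below corrects two of the error groups it names: half-space errors and punctuation spacing. The two half-space rules (the verbal prefix mi- and the plural suffix -ha), the assumed letter range, and every function and constant name are hypothetical choices made for this example; they are not the study's 16 error types and do not implement the full approved orthography. Blanket rules like these can misfire (for instance on the rare standalone noun mey), so practical use would likely need a lexicon or an exception list.

```python
import re

ZWNJ = "\u200c"          # zero-width non-joiner: the Persian half-space
NUN = "\u0646"           # negation prefix, as in "nemi-"
MI = "\u0645\u06cc"      # verbal prefix "mi-"
HA = "\u0647\u0627"      # plural suffix "-ha"
YE = "\u06cc"            # the "-ye" that may follow "-ha"

# Assumed character-class body for Persian letters only; it deliberately
# excludes digits and punctuation such as the Persian comma (U+060C).
LETTER = "\u0621-\u063a\u0641-\u064a\u067e\u0686\u0698\u06a9\u06af\u06cc"

# Illustrative rule 1: word-initial "mi"/"nemi" + plain space + letter.
MI_RULE = re.compile(f"(?<![{LETTER}])({NUN}?{MI}) (?=[{LETTER}])")
# Illustrative rule 2: letter + plain space + "-ha"/"-haye" at a word end.
HA_RULE = re.compile(f"([{LETTER}]) ({HA}{YE}?)(?![{LETTER}])")
# Punctuation marks that should attach directly to the preceding word.
SPACE_BEFORE = re.compile("[ \t]+([\u060c\u061b\u061f!.:])")
# Persian comma or semicolon immediately followed by a non-space character.
NO_SPACE_AFTER = re.compile("([\u060c\u061b])(?=\\S)")


def fix_half_spaces(text: str) -> str:
    """Replace the plain space in the two illustrative patterns with ZWNJ."""
    text = MI_RULE.sub(lambda m: m.group(1) + ZWNJ, text)
    return HA_RULE.sub(lambda m: m.group(1) + ZWNJ + m.group(2), text)


def fix_punctuation_spacing(text: str) -> str:
    """Drop spaces before closing marks; add one after comma/semicolon."""
    text = SPACE_BEFORE.sub(lambda m: m.group(1), text)
    return NO_SPACE_AFTER.sub(lambda m: m.group(1) + " ", text)


def post_edit(text: str) -> str:
    """Apply the half-space rules first, then the punctuation rules."""
    return fix_punctuation_spacing(fix_half_spaces(text))


if __name__ == "__main__":
    # "mi rud ." written with a plain space and a space before the period.
    sample = MI + " \u0631\u0648\u062f ."
    print(repr(post_edit(sample)))
```

The Persian strings are written as Unicode escapes so that the invisible half-space and the distinction between Persian and Arabic letter forms stay explicit. Keeping each rule as a separate compiled pattern would let a checker report which error type fired, rather than only returning corrected text.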

Keywords [English]

orthography errors
punctuation errors
approved orthography
ChatGPT
Google Translate
Microsoft Translator
Reverso

Language Research (Pazhuheshha-ye Zabani)

Volume 16, Issue 1 - Serial Number 30
Shahrivar 1404 (August-September 2025)
Pages 193-228
