Journal of Prosthetic Dentistry, vol. 134, no. 4, 2025 (SCI-Expanded, Scopus)
Statement of problem: Although artificial intelligence (AI) chatbots are increasingly used to obtain information about smile design, the accuracy, reliability, and readability of that information for laypersons remain unclear.

Purpose: The purpose of this study was to assess the accuracy, reliability, quality, and readability of responses about digital smile design provided by 4 AI models: ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot.

Material and methods: The most frequently searched questions regarding smile design were identified through Google searches and presented to each AI model. Responses were evaluated independently with a 5-point Likert scale for accuracy, the modified DISCERN scale for reliability, the General Quality Scale (GQS) for quality, and the Flesch Reading Ease Score (FRES) for readability. Normality was assessed with the Kolmogorov-Smirnov test, and differences among models with the Kruskal-Wallis test followed by Dunn post hoc comparisons; statistical significance was set at α=.05.

Results: ChatGPT-4 achieved the highest median accuracy score, 5 (4-5), with significant differences among models (P<.05). Copilot had the highest reliability and quality scores (P<.05), and ChatGPT-3.5 responses were the most readable (P<.05); nevertheless, the output of all models was classified as difficult to read. Only Copilot and Gemini cited sources in their responses.

Conclusions: The AI chatbots generally provided accurate and moderately reliable information about smile design, but limited readability and insufficient referencing restrict their value as patient education tools. Improvements in transparency, scientific clarity, and source citation are needed to increase their clinical utility. These findings are limited to the evaluated models and topic, and further research is warranted for broader validation.
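For reference, the Flesch Reading Ease Score (FRES) used above to rate readability is the standard Flesch formula; higher values indicate easier text, and, in the conventional interpretation, scores below about 50 correspond to material rated difficult to read:

\[
\mathrm{FRES} = 206.835 \;-\; 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) \;-\; 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]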
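The nonparametric comparison described in the methods can be sketched as follows. This is a minimal illustration only: it assumes per-model score lists, uses SciPy for the Kruskal-Wallis test and the third-party scikit-posthocs package for Dunn post hoc comparisons, and the variable names and example ratings are hypothetical, not data from the study.

# Illustrative only: example scores are made up, not data from the study.
import pandas as pd
import scipy.stats as stats
import scikit_posthocs as sp  # third-party package: scikit-posthocs

# Hypothetical accuracy ratings (5-point Likert) for each chatbot.
scores = {
    "ChatGPT-3.5": [4, 3, 4, 4, 3, 4],
    "ChatGPT-4":   [5, 5, 4, 5, 5, 4],
    "Gemini":      [4, 4, 3, 4, 4, 3],
    "Copilot":     [4, 5, 4, 4, 5, 4],
}

# Kruskal-Wallis test across the 4 groups.
h_stat, p_value = stats.kruskal(*scores.values())
print(f"Kruskal-Wallis H={h_stat:.3f}, P={p_value:.4f}")

# Dunn post hoc pairwise comparisons if the omnibus test is significant at alpha=.05.
if p_value < .05:
    long_df = pd.DataFrame(
        [(model, s) for model, vals in scores.items() for s in vals],
        columns=["model", "score"],
    )
    dunn = sp.posthoc_dunn(long_df, val_col="score", group_col="model",
                           p_adjust="bonferroni")
    print(dunn)

The Kruskal-Wallis test is used here because Likert-type ratings are ordinal and, per the abstract, the data were not normally distributed; the Dunn step identifies which model pairs differ once the omnibus test is significant.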