A Machine Learning Framework for Discriminating between ChatGPT and Web Search Results

  • Md. Sadiq Iqbal, Department of Computer Science and Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh.
  • Kashem, Department of Computer Science and Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh.
  • Chowdhury, Department of Computer Science and Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh.
Keywords: ChatGPT, Classification, GPT, NLP, Machine Learning.

Abstract

ChatGPT is a large language model built by OpenAI. It is based on the Generative Pre-trained Transformer (GPT) architecture, can generate text that appears to be written by a human, and understands natural language questions. We investigate whether machine learning (ML) can distinguish query results returned by web search from those produced by ChatGPT. To this end, we train five ML methods on a balanced dataset of 2010 samples of query results from ChatGPT and web search: Random Forest (RF), Naive Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Logistic Regression (LR). Each method is evaluated with two feature optimization techniques, LDA and PCA. Across all experiments, the combination of NB with LDA yields the highest accuracy, 99.75%. The same technique also distinguishes ChatGPT-generated from human-written text in an existing dataset with an accuracy of 98.67%, outperforming state-of-the-art (SOTA) techniques. The proposed approach can therefore help identify text generated by ChatGPT.
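The experimental grid described in the abstract (five classifiers, each paired with LDA and PCA) can be sketched roughly as follows with scikit-learn. This is a minimal illustration under several assumptions that the abstract does not confirm: TF-IDF is used as the text representation, LDA is taken to mean Linear Discriminant Analysis, Naive Bayes is the Gaussian variant, and accuracy is estimated by 4-fold cross-validation. The toy texts are hypothetical placeholders, not the 2010-sample dataset, so the scores they produce say nothing about the reported results.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def to_dense(X):
    """Convert the sparse TF-IDF matrix to a dense array for LDA/PCA."""
    return X.toarray()


# Hypothetical toy data: 1 = ChatGPT-style answer, 0 = web-search-style snippet.
topics = ["python", "diabetes", "solar panels", "mortgages",
          "chess", "volcanoes", "vaccines", "jazz"]
chat_texts = [f"Certainly! Here is a brief overview of {t}. "
              f"In summary, {t} involves several key aspects." for t in topics]
web_texts = [f"{t} official site | top results for {t} "
             f"read more images news shopping" for t in topics]
texts = chat_texts + web_texts
labels = [1] * len(chat_texts) + [0] * len(web_texts)

# The five classifiers named in the abstract (hyperparameters are assumptions).
classifiers = {
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "LR": LogisticRegression(),
}

# The two feature optimization techniques (LDA read as Linear Discriminant
# Analysis; with two classes it yields at most one component).
reducers = {
    "LDA": LinearDiscriminantAnalysis(n_components=1),
    "PCA": PCA(n_components=2, random_state=0),
}


def evaluate_all(texts, labels, cv=4):
    """Return mean cross-validated accuracy for every classifier/reducer pair."""
    results = {}
    for clf_name, clf in classifiers.items():
        for red_name, reducer in reducers.items():
            pipeline = make_pipeline(
                TfidfVectorizer(),
                FunctionTransformer(to_dense),
                reducer,
                clf,
            )
            fold_scores = cross_val_score(pipeline, texts, labels,
                                          cv=cv, scoring="accuracy")
            results[(clf_name, red_name)] = float(fold_scores.mean())
    return results


scores = evaluate_all(texts, labels)
for (clf_name, red_name), acc in sorted(scores.items()):
    print(f"{clf_name} + {red_name}: {acc:.3f}")
```

Fitting the vectorizer and the reducer inside the pipeline keeps both confined to the training folds, which avoids leaking test-fold information into the reduced features.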
Published
2025-05-13
How to Cite
Iqbal, M. S. I., Kashem, M. A. K., & Chowdhury, M. A. C. (2025). A Machine Learning Framework for Discriminating between ChatGPT and Web Search Results. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2338
Section
Research Articles