A Machine Learning Framework for Discriminating between ChatGPT and Web Search Results
Keywords:
ChatGPT, Classification, GPT, NLP, Machine Learning.
Abstract
ChatGPT is a large language model built by OpenAI. It is based on the Generative Pre-trained Transformer (GPT) architecture. It understands natural language questions and can generate text that appears to be written by a human. We investigate whether machine learning (ML) can distinguish query results returned by web search from those produced by ChatGPT. To this end, we train five ML methods on a balanced dataset containing 2010 samples of query results from ChatGPT and web search: Random Forest (RF), Naive Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Logistic Regression (LR). Each method is evaluated with two feature optimization techniques, LDA and PCA. Across all experiments, the combination of NB with LDA yields the highest accuracy, 99.75%. This technique also separates ChatGPT-generated from human-written text in an existing dataset with an accuracy of 98.67%, outperforming the state-of-the-art (SOTA) techniques. The proposed approach can therefore help to identify text generated by ChatGPT.
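The experimental grid described in the abstract (five classifiers, each paired with LDA or PCA) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, reads LDA as Linear Discriminant Analysis, and substitutes synthetic numeric features from `make_classification` for the real query-result dataset, which is not reproduced here. The helper name `run_grid`, the 80/20 split, the scaling step, and all hyperparameters are hypothetical choices.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def run_grid(X, y, seed=0):
    """Fit every (classifier, reducer) pair and return held-out accuracy."""
    # Assumption: a stratified 80/20 split; the abstract does not state the protocol.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Factories so that each pipeline gets fresh, unfitted estimators.
    reducers = {
        # With two classes, LDA can project onto at most one component.
        "LDA": lambda: LinearDiscriminantAnalysis(n_components=1),
        # Assumption: 10 principal components, chosen only for illustration.
        "PCA": lambda: PCA(n_components=10, random_state=seed),
    }
    classifiers = {
        "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=seed),
        "NB": lambda: GaussianNB(),
        "DT": lambda: DecisionTreeClassifier(random_state=seed),
        "SVM": lambda: SVC(kernel="rbf"),
        "LR": lambda: LogisticRegression(max_iter=1000),
    }
    scores = {}
    for (r_name, make_r), (c_name, make_c) in product(
        reducers.items(), classifiers.items()
    ):
        # Scale, reduce dimensionality, then classify.
        model = make_pipeline(StandardScaler(), make_r(), make_c())
        model.fit(X_train, y_train)
        scores[(c_name, r_name)] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in for the dataset: 2010 samples, two roughly balanced classes.
X, y = make_classification(
    n_samples=2010, n_features=50, n_informative=10,
    weights=[0.5, 0.5], random_state=0,
)
scores = run_grid(X, y)
for (clf, red), acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{clf} + {red}: accuracy = {acc:.4f}")
```

The accuracies printed on synthetic data say nothing about the reported 99.75% or 98.67%; the sketch only shows how the ten classifier and reducer combinations might be compared on a common held-out split.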
Published
2025-05-13
How to Cite
Iqbal, M. S. I., Kashem, M. A. K., & Chowdhury, M. A. C. (2025). A Machine Learning Framework for Discriminating between ChatGPT and Web Search Results. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2338
Issue
Section
Research Articles