Indonesian News Extractive Summarization using Lexrank and YAKE Algorithm
Keywords:
automatic text summarization, unsupervised learning, extractive text summarization, sentence extraction, term weight
Abstract
The surge in global technological advancement has led to an unprecedented volume of information being shared across diverse platforms. This information, easily accessible through browsers, creates an overload that makes it difficult for readers to extract essential content efficiently. In response, this paper proposes a hybrid Automatic Text Summarization (ATS) method that combines the LexRank and YAKE algorithms. LexRank determines sentence scores, while YAKE calculates individual word scores; the two scores are combined to improve summarization accuracy. To validate the proposed method, the paper uses 5000 Indonesian news articles from the Indosum dataset together with their ground-truth summaries, with the objective of condensing each article to 30% of its content. The algorithmic approach and experimental results are presented. Compared with its base model, the hybrid model shows a two percent improvement in the ROUGE-1 and ROUGE-2 scores and a one percent improvement in the ROUGE-L score. These findings indicate that incorporating a keyword score can improve the accuracy of the summaries generated by LexRank. Although no trained machine learning model is used in this experiment, the unsupervised, heuristic approach suggests that the method could be applied more broadly. A comparative analysis with other state-of-the-art text summarization methods or hybrid approaches will be essential to gauge its overall effectiveness.
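The idea of combining a graph-based sentence score with a keyword score can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the LexRank part is a simplified continuous variant built on TF-IDF cosine similarity, the keyword part is a plain term-frequency stand-in rather than the real YAKE algorithm (whose scores are typically lower for more relevant keywords and would need inverting), and the names `summarize`, `keyword_weight`, and `ratio` as well as the equal weighting of the two scores are hypothetical. The 30% target is interpreted here as 30% of the sentences, which is also an assumption.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank_scores(token_lists, damping=0.85, iterations=50):
    # Simplified continuous LexRank: power iteration over a
    # row-normalized sentence similarity graph.
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    vecs = [
        {t: c * math.log(1 + n / df[t]) for t, c in Counter(toks).items()}
        for toks in token_lists
    ]
    sim = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * sim[j][i] / row_sums[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def keyword_scores(token_lists):
    # ASSUMPTION: stand-in for YAKE. Each sentence gets the mean
    # relative document frequency of its words.
    freq = Counter(t for toks in token_lists for t in toks)
    total = sum(freq.values())
    return [
        sum(freq[t] / total for t in toks) / len(toks) if toks else 0.0
        for toks in token_lists
    ]


def normalize(values):
    # Min-max scaling so both score types share the range [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def summarize(text, ratio=0.3, keyword_weight=0.5):
    # Hypothetical combination rule: weighted sum of the two scores.
    sentences = split_sentences(text)
    if not sentences:
        return []
    token_lists = [tokenize(s) for s in sentences]
    lex = normalize(lexrank_scores(token_lists))
    kw = normalize(keyword_scores(token_lists))
    combined = [
        (1 - keyword_weight) * l + keyword_weight * k for l, k in zip(lex, kw)
    ]
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:k]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(article_text)` returns the selected sentences in document order. Setting `keyword_weight=0.0` reduces the sketch to the graph-based score alone, which gives a rough way to compare the hybrid against its base model.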
Published
2024-06-07
How to Cite
Wijaya, J., & Suganda Girsang, A. (2024). Indonesian News Extractive Summarization using Lexrank and YAKE Algorithm. Statistics, Optimization & Information Computing, 12(6), 1973-1983. https://doi.org/10.19139/soic-2310-5070-1976
Section
Research Articles