Context-driven Bengali Text Generation using Conditional Language Model
Abstract
Text generation is a rapidly evolving field of Natural Language Processing (NLP), with ever larger language models frequently setting a new state of the art. These models are remarkably effective at learning word representations and their internal coherence within a particular language. However, an established context-driven, end-to-end text generation model is rare, even more so for the Bengali language. In this paper, we have proposed a bidirectional gated recurrent unit (GRU) based architecture that simulates a conditional language model, i.e. the decoder portion of a sequence-to-sequence (seq2seq) model, further conditioned on target context vectors. We have explored several ways of combining multiple context words into a fixed-dimensional vector representation extracted from the same GloVe model that is used to build the embedding matrix. We have used beam search to generate the sentence with the maximum cumulative log-probability score. In addition, we have proposed a human-scoring-based evaluation metric and used it to compare the model against unidirectional LSTM and GRU networks. Empirical results show that the proposed model performs well at producing meaningful sentences that reflect the target context. The resulting architecture can be applied to a wide range of context-driven text generation applications and is a contribution to the NLP literature of the Bengali language.
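To make the described setup concrete, the following is a minimal sketch (not the authors' exact implementation) of a context-conditioned bidirectional GRU language model in Keras. The vocabulary size, hidden size, maximum length, and the choice of forming the context vector by averaging GloVe vectors of the context words are illustrative assumptions, as is the use of a random placeholder in place of the real GloVe embedding matrix.

```python
# Minimal sketch of a context-conditioned Bi-GRU language model (assumptions noted above).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 30000   # assumed vocabulary size
EMB_DIM = 300        # assumed GloVe vector dimension
MAX_LEN = 20         # assumed maximum prefix length
HIDDEN = 256         # assumed GRU hidden units

# Placeholder for an embedding matrix built from pretrained GloVe vectors.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMB_DIM)).astype("float32")

# Inputs: the partial target sentence (token ids) and the fixed-dimensional
# context vector, e.g. the average of the GloVe vectors of the context words.
tokens = layers.Input(shape=(MAX_LEN,), dtype="int32", name="tokens")
context = layers.Input(shape=(EMB_DIM,), name="context_vector")

emb = layers.Embedding(
    VOCAB_SIZE, EMB_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,
)(tokens)

# Condition every timestep on the context by tiling the context vector along
# the time axis and concatenating it with the word embeddings.
ctx_seq = layers.RepeatVector(MAX_LEN)(context)
x = layers.Concatenate(axis=-1)([emb, ctx_seq])

x = layers.Bidirectional(layers.GRU(HIDDEN, return_sequences=True))(x)
x = layers.GRU(HIDDEN)(x)  # summarize the prefix into a single state
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(x)  # next-word distribution

model = Model(inputs=[tokens, context], outputs=next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

At inference time, beam search would repeatedly feed candidate prefixes (together with the same context vector) through such a model, keeping at each step only the k hypotheses with the highest cumulative log probability until an end-of-sentence token or the maximum length is reached.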