Context-driven Bengali Text Generation using Conditional Language Model
Abstract
Text generation is a rapidly evolving field of Natural Language Processing (NLP), with ever larger language models frequently setting a new state of the art. These models are remarkably effective at learning word representations and their internal coherence within a particular language. However, an established context-driven, end-to-end text generation model is rare, even more so for the Bengali language. In this paper, we have proposed a bidirectional gated recurrent unit (GRU) based architecture that simulates a conditional language model, i.e. the decoder portion of a sequence-to-sequence (seq2seq) model, further conditioned on target context vectors. We have explored several ways of combining multiple context words into a fixed-dimensional vector representation extracted from the same GloVe model that is used to build the embedding matrix. We have used beam search to generate the sentence with the maximum cumulative log-probability score. In addition, we have proposed a human-scoring-based evaluation metric and used it to compare the model against unidirectional LSTM and GRU networks. Empirical results show that the proposed model performs well at producing meaningful sentences that reflect the target context. The resulting architecture can be applied to a wide range of context-driven text generation applications and is a contribution to the NLP literature of the Bengali language.
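To make the described setup concrete, the following is a minimal sketch (not the authors' exact implementation) of a context-conditioned bidirectional GRU language model in Keras. The vocabulary size, hidden size, maximum length, and the choice of forming the context vector by averaging GloVe vectors of the context words are illustrative assumptions, as is the use of a random placeholder in place of the real GloVe embedding matrix.

```python
# Minimal sketch of a context-conditioned Bi-GRU language model (assumptions noted above).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 30000   # assumed vocabulary size
EMB_DIM = 300        # assumed GloVe vector dimension
MAX_LEN = 20         # assumed maximum prefix length
HIDDEN = 256         # assumed GRU hidden units

# Placeholder for an embedding matrix built from pretrained GloVe vectors.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMB_DIM)).astype("float32")

# Inputs: the partial target sentence (token ids) and the fixed-dimensional
# context vector, e.g. the average of the GloVe vectors of the context words.
tokens = layers.Input(shape=(MAX_LEN,), dtype="int32", name="tokens")
context = layers.Input(shape=(EMB_DIM,), name="context_vector")

emb = layers.Embedding(
    VOCAB_SIZE, EMB_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,
)(tokens)

# Condition every timestep on the context by tiling the context vector along
# the time axis and concatenating it with the word embeddings.
ctx_seq = layers.RepeatVector(MAX_LEN)(context)
x = layers.Concatenate(axis=-1)([emb, ctx_seq])

x = layers.Bidirectional(layers.GRU(HIDDEN, return_sequences=True))(x)
x = layers.GRU(HIDDEN)(x)  # summarize the prefix into a single state
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(x)  # next-word distribution

model = Model(inputs=[tokens, context], outputs=next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

At inference time, beam search would repeatedly feed candidate prefixes (together with the same context vector) through such a model, keeping at each step only the k hypotheses with the highest cumulative log probability until an end-of-sentence token or the maximum length is reached.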