Researchers mine massive Bengali corpus, unlocking new frontiers in AI-powered language understanding.

2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA)

BanglaLM: Data Mining based Bangla Corpus for Language Model Research

Md. Kowsher, Mohammed Jashim Uddin, Anik Tahabilder, Md. Ruhul Amin, Md. Fahim Shahriar, Md. Shohanur Islam Sobuj

Paper Summary

The paper describes the creation of BanglaLM, a large data-mined Bangla (Bengali) corpus intended for language model research. The dataset contains over 19 million samples collected from various sources. It is meant to support training machine learning models for Bengali-language tasks such as translation and grammar correction, and to serve as a shared resource for the Bengali machine learning and NLP community.
