Hello community, and happy new year 2021 to everyone! For anyone doing data science or machine learning in NLP, I'm happy to let you know that the Swahili news dataset is now available in the datasets library from HuggingFace (https://github.com/huggingface/datasets). Install the datasets library (pip install datasets) and access the dataset with 3 lines of code:
from datasets import load_dataset
# download the dataset
swahili_news = load_dataset('swahili_news')
# access the downloaded dataset
swahili_news['train'][0]
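
Once the dataset is loaded, a quick first check is how many articles fall under each category. Here is a minimal sketch of that, assuming each row is a dict with a 'text' field and a 'label' field (please verify the field names by printing the first row as shown above). The helper name summarize_labels and the sample rows are made up for illustration, so the snippet runs without downloading anything.

```python
from collections import Counter


def summarize_labels(rows, label_key="label"):
    """Count how many examples carry each label value.

    `rows` is any iterable of dict-like examples; `label_key` is an
    assumed field name, not something confirmed by the post itself.
    """
    return Counter(row[label_key] for row in rows)


# Tiny stand-in rows (hypothetical data) so the sketch runs offline.
sample_rows = [
    {"text": "habari ya kwanza", "label": 0},
    {"text": "habari ya pili", "label": 1},
    {"text": "habari ya tatu", "label": 0},
]

counts = summarize_labels(sample_rows)
print(counts.most_common())
```

With the real data you could try summarize_labels(swahili_news['train']), since iterating over a split yields one dict per example.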