Top 20 News Datasets Available on the Web for Free

Digital news sources have multiplied at an extraordinary rate, growing from a handful of online news posts to a vast number of digital outlets and publications. News coverage now spans a wide range of issues and events, which has extended its reach. These publications not only represent the world but also shape our perception of it.

Storing news data is now common because of the high demand for instant access to historical news, which people often retrieve through a news API. These news datasets are useful for research and for personal and professional artificial intelligence (AI) and machine learning (ML) projects.

If you are looking for historical news data to power your AI and ML algorithms, you can use these free news datasets or the Newsdata.io tool, which I mention below. News datasets can help you find a wide range of historical stories related to any topic, organization, person, and more.

In this article, we will discuss a simple and reliable way to access historical news datasets. Let’s get right into it.

Here are the top 20 news datasets that you can download for free for your personal and professional AI, machine learning, and data analytics projects.

1. Newsdata.io

Name: Covid-19 news dataset


This Covid-19 dataset contains the latest world news related to Coronavirus.

2. Kaggle.com

Name: BBC News Classification (News article categorization)


The dataset is broken into 1490 records for training and 735 for testing. The goal will be to build a system that can accurately classify previously unseen news articles into the right category.
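To make the classification goal concrete, here is a minimal sketch of a news categorizer: a small multinomial Naive Bayes model written with only the Python standard library. The training sentences, the two category names, and the function names are illustrative assumptions, not the contents or schema of the Kaggle dataset, and Naive Bayes is just one common baseline rather than the method the dataset prescribes.

```python
import math
import re
from collections import Counter, defaultdict

# Toy stand-in data (assumption): the real dataset's columns and categories
# are not described in this article, so these examples are illustrative only.
TRAIN = [
    ("shares rise as bank profits beat forecasts", "business"),
    ("company reports strong quarterly profits", "business"),
    ("striker scores twice in cup final win", "sport"),
    ("coach praises team after league win", "sport"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "bank profits rise"))    # business
print(predict(model, "team wins the final"))  # sport
```

With the real data, you would train on the 1490 training records and measure accuracy on the 735 test records instead of the toy sentences above.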

3. BBC

Name: BBC datasets


Two news article datasets, originating from BBC News, provided for use as benchmarks for machine learning research.

4. Harvard Dataverse

Name: A Million News Headlines


This dataset contains news headlines published over a period of eighteen years, sourced from the reputable Australian news source ABC (Australian Broadcasting Corporation).

5. Newsdata.io

Name: Covid-19 and vaccine news dataset


This dataset contains the latest news headlines published across the web. Each headline comes with its metadata and a full description.

6. Webz.io

Name: Political news articles


This dataset contains news articles on world politics, fetched with the help of the Webz.io news API.

7. Paperswithcode

Name: COVID-19 Fake News Dataset


Along with the COVID-19 pandemic, we are also fighting an ‘infodemic’. Fake news and rumors are rampant on social media. Believing in rumors can cause significant harm.

8. Kaggle

Name: India News Headlines Dataset


This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to the end of 2020, recorded in real time by the journalists of India. It contains approximately 3.4 million events published by the Times of India.

9. Data.world

Name: Economic News Article Tone


Contributors read snippets of news articles. They then noted if the article was relevant to the US economy and, if so, what the tone of the article was.

10. Archive.org

Name: World Politics news dataset


This dataset contains the latest news related to politics around the world, along with each article’s available metadata.

11. IEEE.org

Name: Covid-19 and vaccine


This dataset contains world news related to Covid-19 and vaccines, along with each article’s available metadata.

12. IEEE.org

Name: World politics news


This dataset contains world news related to politics, along with each article’s available metadata.

13. IEEE.org

Name: Covid-19 news


This dataset contains all the latest news data related to Covid-19 from around the world.

14. IEEE.org



COVIFN is a COVID-19-specific dataset that consists of fact-checked fake news scraped from Poynter and true news from news publishers’ verified portals. The dataset was pre-processed to remove special characters and non-vital information.

15. IEEE.org



The Internet is a vast repository of useful knowledge, but it has been contaminated by the spread of false information. Relying on misinformation can be disastrous. According to a World Health Organization survey, about 6,000 individuals were hospitalized throughout the world as a result of fake news on COVID-19 in the first three months of 2020.

16. IEEE.org



Features of each news item, organized by seven credibility categories.

17. IEEE.org

Name: AI-Based automated extraction of entities, entity categories, and sentiment on Covid-19 situation.


In-depth, AI-based analysis of social media content would allow a strategic decision-maker to obtain evidence-based responses to complex queries.

18. Kaggle

Name: Reddit Omicron Panic


As we all know, a new variant of COVID-19 is spreading worldwide causing massive panic. This dataset captures mentions of the new variant on Reddit.

19. Kaggle

Name: Omicron daily cases by country (COVID-19 variant)


Tracking the progression of the new omicron COVID-19 variant.

20. IEEE.org

Name: Daily report of Covid-19 confirmed cases in Thailand.


This dataset contains a total of 578,375 confirmed COVID-19 cases reported in Thailand, recorded between 22 January 2021 and 30 July 2021.
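As a quick sanity check on those figures, the short sketch below works out the length of the reporting window and the average number of confirmed cases per day. It assumes that both the start and end dates are included in the window, which the dataset description does not state explicitly, and the variable names are illustrative.

```python
from datetime import date

# Figures quoted in the dataset description above.
total_cases = 578_375
start = date(2021, 1, 22)
end = date(2021, 7, 30)

# Assumption: both endpoints count as reporting days, hence the +1.
days = (end - start).days + 1
average_per_day = total_cases / days

print(days)                    # 190
print(round(average_per_day))  # 3044
```

An average of roughly 3,000 cases per day hides a lot of variation, so a daily time series like this one is better explored with a plot or a rolling average than with a single summary number.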
