Digital news sources have multiplied at an extraordinary rate, growing from a handful of online news posts to a large number of digital outlets and publications. News posts now cover a wide range of issues and events, which has increased their reach. These publications not only represent the world but also change and shape our perception of it.
Storing news data is now common because of the high demand for instant access to historical news data, which people commonly retrieve through the News API. These news datasets can be useful for research and for personal and professional artificial intelligence (AI) and machine learning (ML) projects.
If you are looking for historical news data to power your AI and ML algorithms, you can use these free news datasets or the Newsdata.io tool, which I will mention below. News datasets can help you find a wide range of historical stories related to any topic, organization, person, and more.
In this article, we will discuss a simple and reliable way to access historical news data sets. Let’s get right into it.
Here are the top 20 news datasets that you can download for free for your personal and professional AI, machine learning, and data analytics projects.
Name: Covid-19 news dataset
This Covid-19 dataset contains the latest world news related to Coronavirus.
Name: BBC News Classification (News article categorization)
The dataset is broken into 1490 records for training and 735 for testing. The goal will be to build a system that can accurately classify previously unseen news articles into the right category.
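As a rough illustration of the kind of system described above, here is a minimal sketch of a Naive Bayes text classifier written in plain Python. It is not the method used by the dataset's authors, and the training sentences and the category names `sport` and `business` are invented for the example; with the real dataset you would train on the 1490 training records instead.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Number of training documents per category.
        self.class_counts = Counter(labels)
        # Word frequencies per category.
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior for the category.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token in this category.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration (NOT taken from the BBC dataset).
train_texts = [
    "the striker scored twice in the cup final",
    "the team won the league match",
    "shares fell as the bank reported losses",
    "the company profits rose this quarter",
]
train_labels = ["sport", "sport", "business", "business"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("bank profits and shares")
print(prediction)
```

A real project would more likely rely on an established machine learning library and measure accuracy on the 735 held-out test records, but the sketch shows the basic idea of learning word frequencies per category and scoring unseen text against them.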
Name: BBC datasets
Two news article datasets, originating from BBC News, provided for use as benchmarks for machine learning research.
4. Harvard Dataverse
Name: A Million News Headlines
This contains data on news headlines published over a period of eighteen years, sourced from the reputable Australian news source ABC (Australian Broadcasting Corporation).
Name: Covid-19 and vaccine news dataset
This contains the latest published news headlines from across the web, along with their metadata and full descriptions.
Name: Political news articles
This contains news article data related to world politics, fetched with the help of the Webz.io news API.
Name: COVID-19 Fake News Dataset
Along with the COVID-19 pandemic, we are also fighting an 'infodemic'. Fake news and rumors are rampant on social media. Believing in rumors can cause significant harm.
Name: India News Headlines Dataset
This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to the end of 2020, recorded in real time by the journalists of India. It contains approximately 3.4 million events published by the Times of India.
Name: Economic News Article Tone
Contributors read snippets of news articles. They then noted if the article was relevant to the US economy and, if so, what the tone of the article was.
Name: World Politics news dataset
This dataset contains the latest news related to politics around the world, along with each article's available metadata.
Name: Covid-19 and vaccine
This dataset contains world news related to Covid-19 and vaccines, along with each article's available metadata.
Name: World politics news
This dataset contains world news related to politics, along with each article's available metadata.
Name: Covid-19 news
This dataset contains all the latest news data related to Covid-19 from around the world.
Name: COVIFN : FAKE NEWS ON COVID19
COVIFN is a COVID-19-specific dataset that consists of fact-checked fake news scraped from Poynter and true news from news publishers' verified portals. The dataset was pre-processed to remove special characters and non-vital information.
Name: FAKE NEWS ON HEALTHCARE
The Internet is a vast repository of useful knowledge, but it has been contaminated by the spread of false information. Relying on misinformation can be disastrous. According to a World Health Organization survey, about 6,000 individuals were hospitalized throughout the world as a result of fake news on COVID-19 in the first three months of 2020.
Name: NEWS CREDIBILITY DATASET
This dataset contains the features of each news item according to seven credibility categories.
Name: AI-Based automated extraction of entities, entity categories, and sentiment on Covid-19 situation.
An in-depth, Artificial Intelligence (AI) based analysis of social media content would allow a strategic decision-maker to obtain evidence-based responses to complex queries.
Name: Reddit Omicron Panic
As we all know, a new variant of COVID-19 is spreading worldwide causing massive panic. This dataset captures mentions of the new variant on Reddit.
Name: Omicron daily cases by country (COVID-19 variant)
Tracking the progression of the new omicron COVID-19 variant.
Name: Daily report of Covid-19 confirmed cases in Thailand.
This dataset contains a total of 578,375 confirmed COVID-19 cases reported in Thailand, recorded between 22 January 2021 and 30 July 2021.
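
Once you have downloaded one of the datasets above, a common first step is to search it for stories about a particular topic. The sketch below shows one possible way to do that with Python's standard library. The sample rows, the column names `publish_date` and `headline_text`, and the `YYYYMMDD` date format are assumptions made for illustration, so adjust them to match the file you actually download.

```python
import csv
import io
from collections import Counter

# Tiny in-memory sample standing in for a downloaded CSV file.
# The rows and the column names are invented for illustration.
SAMPLE_CSV = """publish_date,headline_text
20200315,Government announces new covid-19 travel rules
20200316,Local team wins regional final
20210702,Covid-19 vaccine rollout expands to new age groups
20210703,Election campaign enters final week
"""


def filter_headlines(rows, keyword):
    """Return rows whose headline contains the keyword, ignoring case."""
    keyword = keyword.lower()
    return [row for row in rows if keyword in row["headline_text"].lower()]


def count_by_year(rows):
    """Count rows per year, assuming dates formatted as YYYYMMDD."""
    return Counter(row["publish_date"][:4] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
covid_rows = filter_headlines(rows, "covid-19")
yearly_counts = count_by_year(covid_rows)

for year, count in sorted(yearly_counts.items()):
    print(year, count)
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open("headlines.csv", newline="", encoding="utf-8")`, where `headlines.csv` is a placeholder name.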