A Global Database of Society - the GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world
React App
Source
The GDELT Project - https://www.gdeltproject.org/
The GDELT Project is a massive data-scraping effort that records events occurring all over the world and is refreshed every 15 minutes. This project focuses on the events table; the raw data file is extracted daily.
Data ingestion, cleaning and transformation
A Python script extracts the raw data file, keeps only the columns needed, and drops records where the event category is not available. Polars is used for the transformations.
The title for each record is scraped from its article web page using BeautifulSoup.
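The title extraction could be sketched roughly as follows. The helper name `extract_title` is hypothetical, and fetching the page (for example with an HTTP client and a timeout) is left out so the example runs without network access.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_title(html: str) -> Optional[str]:
    """Return the text of the page's <title> tag, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None or soup.title.string is None:
        return None
    # Treat a whitespace-only title as missing.
    return soup.title.string.strip() or None


# Stand-in for a downloaded article page.
sample_html = (
    "<html><head><title> Example headline </title></head>"
    "<body><p>Article text</p></body></html>"
)

print(extract_title(sample_html))
print(extract_title("<html><body>No title here</body></html>"))
```

Returning `None` instead of raising lets the pipeline keep records whose article pages are unavailable or malformed, which is common for older news URLs.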
The data is then loaded into Azure Cosmos DB and MotherDuck.
The FastAPI server connects to Azure Cosmos DB and exposes the API that the React app uses to read and update data.
The dashboard, built with Evidence.dev, is connected to the MotherDuck database and is refreshed daily.
Screenshots: