Abraham Aroloye

Data Engineer - Web Developer - SEO Expert

DataForge Energy — Daily Load Analyzer

Project Overview

A Python data pipeline that ingests, validates, and summarises daily energy meter readings from multiple regional CSV files.

Built as a data engineering case study.

Business Problem

DataForge Energy collects hourly power consumption data from smart meters across four regions. Raw data arrives daily with quality issues including missing values, duplicate meter IDs, negative readings, and out-of-range timestamps. This pipeline automates detection and reporting of these issues.

Pipeline Architecture

Raw CSV Files (Bronze)
  ↓
Data Validation & Cleaning (Silver)
  ↓
Regional Aggregation & Anomaly Detection (Gold)
  ↓
JSON + CSV Summary Report (Output)
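
The flow above can be sketched as three small functions chained by one driver. This is a minimal illustration, not the notebook's actual code: the function names (bronze, silver, gold, run_pipeline) and the single placeholder rule in the Silver stage are assumptions made for the example.

```python
# Hypothetical stage functions; names and rules are illustrative assumptions.

def bronze(raw_rows):
    """Bronze: keep the raw rows exactly as they arrived."""
    return list(raw_rows)


def silver(rows):
    """Silver: split rows into valid and invalid.

    Placeholder rule: consumption_kwh must parse as a non-negative number.
    """
    valid, invalid = [], []
    for row in rows:
        try:
            kwh = float(row["consumption_kwh"])
        except (KeyError, TypeError, ValueError):
            invalid.append(row)
            continue
        if kwh < 0:
            invalid.append(row)
        else:
            valid.append({**row, "consumption_kwh": kwh})
    return valid, invalid


def gold(valid_rows):
    """Gold: count clean records per region."""
    counts = {}
    for row in valid_rows:
        counts[row["region"]] = counts.get(row["region"], 0) + 1
    return counts


def run_pipeline(raw_rows):
    """Chain the three layers and return a small report dictionary."""
    valid, invalid = silver(bronze(raw_rows))
    return {"records_per_region": gold(valid), "invalid_rows": len(invalid)}


sample = [
    {"timestamp": "2024-01-01 00:00:00", "meter_id": "E001", "region": "east", "consumption_kwh": "12.5"},
    {"timestamp": "2024-01-01 01:00:00", "meter_id": "E002", "region": "east", "consumption_kwh": "-3.0"},
    {"timestamp": "2024-01-01 00:00:00", "meter_id": "N001", "region": "north", "consumption_kwh": ""},
    {"timestamp": "2024-01-01 02:00:00", "meter_id": "E003", "region": "east", "consumption_kwh": "17.5"},
]
report = run_pipeline(sample)
print(report)
```

Keeping each layer as its own function means a stage can be tested or swapped without touching the others.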

Features

  • Reads and processes multiple CSV files in one run
  • Validates each record against defined quality rules
  • Detects duplicate meter IDs using set-based logic
  • Flags high-consumption meters (readings above the 250 kWh threshold)
  • Validates timestamps against expected date range
  • Generates regional summaries (avg kWh, record counts)
  • Exports results as both JSON and CSV reports
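
A rough sketch of three of the checks listed above: set-based duplicate detection, the 250 kWh flag, and the timestamp range check. Several details are assumptions, since the page does not state them: the timestamp format, the expected date range, the rule that any meter ID seen more than once in a batch counts as a duplicate, and the use of datetime (built in, but not named in the tech stack). The function name check_rows is hypothetical.

```python
from datetime import datetime

HIGH_KWH = 250.0  # threshold taken from the feature list

# Assumptions for illustration only: timestamp format and expected range.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
RANGE_START = datetime(2024, 1, 1)
RANGE_END = datetime(2024, 1, 2)  # exclusive upper bound


def check_rows(rows):
    """Return duplicate IDs, high-consumption IDs, and IDs with bad timestamps.

    Assumes consumption_kwh has already been verified as numeric.
    """
    seen_ids = set()
    duplicates = set()
    high_consumption = []
    bad_timestamps = []

    for row in rows:
        meter_id = row["meter_id"]

        # Set membership makes each duplicate lookup O(1) on average.
        if meter_id in seen_ids:
            duplicates.add(meter_id)
        seen_ids.add(meter_id)

        if float(row["consumption_kwh"]) > HIGH_KWH:
            high_consumption.append(meter_id)

        try:
            ts = datetime.strptime(row["timestamp"], TS_FORMAT)
            in_range = RANGE_START <= ts < RANGE_END
        except ValueError:
            in_range = False  # unparseable timestamps count as out of range
        if not in_range:
            bad_timestamps.append(meter_id)

    return {
        "duplicates": sorted(duplicates),
        "high_consumption": high_consumption,
        "bad_timestamps": bad_timestamps,
    }


rows = [
    {"timestamp": "2024-01-01 00:00:00", "meter_id": "M1", "consumption_kwh": "120.0"},
    {"timestamp": "2024-01-01 01:00:00", "meter_id": "M2", "consumption_kwh": "300.5"},
    {"timestamp": "2024-01-01 02:00:00", "meter_id": "M1", "consumption_kwh": "90.0"},
    {"timestamp": "2024-01-03 00:00:00", "meter_id": "M3", "consumption_kwh": "80.0"},
    {"timestamp": "not-a-date", "meter_id": "M4", "consumption_kwh": "250.0"},
]
result = check_rows(rows)
print(result)
```

Note that a reading of exactly 250 kWh is not flagged here, because the feature list describes the threshold as strictly greater than 250.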

Tech Stack

  • Python 3
  • Jupyter Notebook (VS Code)
  • Libraries: csv, os, json (all built-in)

Dataset

  • 4 regional files (east, north, south, west)
  • 2,400 rows per file — 9,600 total records
  • Fields: timestamp, meter_id, region, consumption_kwh
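
One way to load files with this layout is csv.DictReader plus a header check, sketched below. The file naming (east.csv, north.csv, and so on) and the helper names load_rows and load_all are assumptions for the example. The demo writes tiny stand-in files to a temporary folder (using tempfile, which is not in the tech stack) instead of reading the real dataset.

```python
import csv
import os
import tempfile

EXPECTED_FIELDS = ["timestamp", "meter_id", "region", "consumption_kwh"]
REGIONS = ("east", "north", "south", "west")


def load_rows(path):
    """Read one regional CSV into a list of dicts, checking the header first."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != EXPECTED_FIELDS:
            raise ValueError(f"unexpected header in {path}: {reader.fieldnames}")
        return list(reader)


def load_all(folder, regions=REGIONS):
    """Load every regional file; '<region>.csv' is a hypothetical naming scheme."""
    rows = []
    for region in regions:
        rows.extend(load_rows(os.path.join(folder, f"{region}.csv")))
    return rows


# Demo with one stand-in row per region, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for region in REGIONS:
        path = os.path.join(folder, f"{region}.csv")
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(EXPECTED_FIELDS)
            writer.writerow(["2024-01-01 00:00:00", f"{region[0].upper()}001", region, "12.5"])
    all_rows = load_all(folder)

print(len(all_rows), all_rows[0])
```

DictReader returns every value as a string, so consumption_kwh still needs converting to a number in the validation step.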

Results

Metric                         Value
Total records processed        9,600
Total invalid rows detected    171
Total duplicate meters found   400
Files processed                4

Key Concepts Demonstrated

  • Data validation and quality checking
  • Medallion architecture (Bronze → Silver → Gold)
  • Duplicate detection using Python sets
  • Aggregation using nested dictionaries
  • Error handling with try/except
  • Pipeline orchestration across multiple files
  • Structured reporting in JSON and CSV
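
As an illustration of the aggregation and reporting concepts above, the sketch below builds a nested dictionary keyed by region, skips unparseable rows with try/except, and writes both report files. The output file names come from the run instructions. The report layout, the field names inside the summary, and the helper names summarise_by_region and write_reports are assumptions. The demo writes to a temporary folder via tempfile.

```python
import csv
import json
import os
import tempfile


def summarise_by_region(rows):
    """Build {region: {"record_count", "total_kwh", "avg_kwh"}} from raw rows."""
    summary = {}
    for row in rows:
        try:
            region = row["region"]
            kwh = float(row["consumption_kwh"])
        except (KeyError, TypeError, ValueError):
            continue  # skipped here; a fuller pipeline would count these as invalid
        stats = summary.setdefault(region, {"record_count": 0, "total_kwh": 0.0})
        stats["record_count"] += 1
        stats["total_kwh"] += kwh
    for stats in summary.values():
        stats["avg_kwh"] = round(stats["total_kwh"] / stats["record_count"], 2)
    return summary


def write_reports(summary, folder):
    """Write the summary as JSON and as a flat CSV; return both paths."""
    json_path = os.path.join(folder, "dataforge_summary_report.json")
    csv_path = os.path.join(folder, "dataforge_summary_report.csv")
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(summary, handle, indent=2)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["region", "record_count", "avg_kwh"])
        for region, stats in sorted(summary.items()):
            writer.writerow([region, stats["record_count"], stats["avg_kwh"]])
    return json_path, csv_path


rows = [
    {"region": "east", "consumption_kwh": "10.0"},
    {"region": "east", "consumption_kwh": "20.0"},
    {"region": "west", "consumption_kwh": "5.0"},
    {"region": "west", "consumption_kwh": "n/a"},  # skipped by the try/except
]
summary = summarise_by_region(rows)

with tempfile.TemporaryDirectory() as folder:
    json_path, csv_path = write_reports(summary, folder)
    with open(json_path, encoding="utf-8") as handle:
        reloaded = json.load(handle)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        csv_rows = list(csv.reader(handle))

print(summary)
```

The nested dictionary maps directly onto JSON, while the CSV needs one flattened row per region.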

How to Run

  1. Clone the repository
  2. Place CSV files in the project folder
  3. Open daily_load_analyzer.ipynb in VS Code
  4. Click Run All
  5. Check output files: dataforge_summary_report.json and dataforge_summary_report.csv
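
Step 5 can also be checked from a notebook cell or script. The helper below is a hypothetical convenience, not part of the repository: it only reports which of the two expected files are absent from a folder. The demo uses a temporary folder containing just the JSON file.

```python
import os
import tempfile

EXPECTED_OUTPUTS = (
    "dataforge_summary_report.json",
    "dataforge_summary_report.csv",
)


def missing_outputs(folder="."):
    """Return the expected report files that are not present in folder."""
    return [
        name for name in EXPECTED_OUTPUTS
        if not os.path.isfile(os.path.join(folder, name))
    ]


# Demo: create only the JSON report, then ask which files are missing.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, EXPECTED_OUTPUTS[0]), "w").close()
    demo_missing = missing_outputs(folder)

print(demo_missing)
```

An empty list from missing_outputs() in the project folder means both reports were written.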
Category: Data Engineering
Project URL: https://github.com/Larious/Dataforge-daily-load-analyzer
© 2026. Abraham Aroloye - All rights reserved.