
Abraham Aroloye

Data Engineer - Web Developer - SEO Expert

Job Search AutoPipe 🔍➡️📊➡️✉️

A production-grade data engineering pipeline that automates job discovery, intelligent filtering, daily Telegram notifications, and AI-powered cover letter generation for data engineering roles.


Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                           JOB SEARCH AUTOPIPE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌────────────────┐     │
│  │  INGEST  │───▶│ CLASSIFY │───▶│  DIGEST  │───▶│    APP PREP    │     │
│  │ (Bronze) │    │ (Silver) │    │  (Gold)  │    │    (Output)    │     │
│  └──────────┘    └──────────┘    └──────────┘    └────────────────┘     │
│       │               │               │                  │              │
│   Job APIs        NLP/Skill       Telegram Bot    Cover Letter          │
│   - Adzuna        Matching        7 AM Daily      Claude API            │
│   - Reed          Dedup +         Ranked List     Tailored to JD        │
│                   Scoring         /flag /cover    Per-Role Output       │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│  ORCHESTRATION:  Apache Airflow 2.8                                     │
│  STORAGE:        PostgreSQL 15 (Medallion Architecture)                 │
│  TRANSFORMS:     dbt Core                                               │
│  QUALITY:        Great Expectations                                     │
│  NOTIFICATIONS:  Telegram Bot (Webhook)                                 │
│  COVER LETTERS:  Claude API                                             │
│  INFRA:          Docker Compose                                         │
└─────────────────────────────────────────────────────────────────────────┘

Pipeline Phases

Phase 1: Ingestion (Bronze Layer)

  • Pulls live job listings from the Adzuna and Reed APIs every morning
  • Stores the raw JSON in bronze.raw_job_postings
  • Applies SHA-256 content hashing for deduplication across sources
  • Makes inserts idempotent: ON CONFLICT DO NOTHING prevents re-processing
  • Handles both Adzuna (ISO 8601) and Reed (DD/MM/YYYY) date formats
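The Phase 1 helpers can be sketched as follows. This is a minimal illustration, not the project's base_client.py: it assumes the content hash is taken over the raw posting serialised as canonical JSON, and it uses Python's built-in sqlite3 as a stand-in for PostgreSQL 15, since both accept INSERT ... ON CONFLICT DO NOTHING. The names SCHEMA, content_hash, parse_posted_date and insert_raw are hypothetical.

```python
import hashlib
import json
import sqlite3
from datetime import date, datetime

# Simplified stand-in for bronze.raw_job_postings (SQLite syntax, not PostgreSQL).
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_job_postings (
    id            INTEGER PRIMARY KEY,
    source        TEXT NOT NULL,
    source_job_id TEXT NOT NULL,
    content_hash  TEXT NOT NULL UNIQUE,
    raw_json      TEXT NOT NULL,
    ingested_at   TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def content_hash(raw_posting: dict) -> str:
    """SHA-256 of the posting as canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(raw_posting, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parse_posted_date(value: str) -> date:
    """Parse an ISO 8601 timestamp (Adzuna style) or a DD/MM/YYYY string (Reed style)."""
    try:
        # Swap a trailing "Z" for "+00:00" so fromisoformat accepts it on Python 3.7+.
        return datetime.fromisoformat(value.replace("Z", "+00:00")).date()
    except ValueError:
        return datetime.strptime(value, "%d/%m/%Y").date()


def insert_raw(conn: sqlite3.Connection, source: str, source_job_id: str, raw: dict) -> bool:
    """Insert a raw posting idempotently; return True only if a new row was written."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO raw_job_postings (source, source_job_id, content_hash, raw_json) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (content_hash) DO NOTHING",
        (source, source_job_id, content_hash(raw), json.dumps(raw)),
    )
    return conn.total_changes > before
```

Calling insert_raw twice with the same payload writes one row and returns True then False, which is the property that makes re-running the morning ingest safe.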

Phase 2: Classification & Scoring (Silver Layer)

  • Role Classifier: NLP-based weighted keyword scoring verifies that a posting is a genuine data engineering role, combining title and description signals with configurable weights and a threshold
  • Skills Match Scorer: compares job description (JD) requirements against a skills profile, with proficiency weighting (expert = 1.0, proficient = 0.7, familiar = 0.4)
  • Deduplication: a cross-source SHA-256 hash of title + company eliminates the same listing appearing on different sources
  • Quality Gate: only postings scoring above the configured threshold proceed to the gold layer
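The Silver-layer rules above can be sketched as plain functions. Only the proficiency weights come from the description; the keyword lists, their weights, the 3.5 threshold, the whole-word matching and the skills formula (proficiency-weighted share of JD skills covered by the profile) are illustrative assumptions. The sketch also assumes JD skills have already been extracted into a set upstream, and all function names are hypothetical.

```python
import hashlib
import re

# Illustrative signals: positive weights support a data engineering role, negative ones count against it.
TITLE_WEIGHTS = {"data engineer": 3.0, "etl": 1.5, "data platform": 1.5, "analyst": -1.0}
DESCRIPTION_WEIGHTS = {"airflow": 1.0, "dbt": 1.0, "spark": 1.0, "pipeline": 0.5, "sales": -1.0}
ROLE_THRESHOLD = 3.5

# Proficiency weights as listed for the Skills Match Scorer.
PROFICIENCY_WEIGHTS = {"expert": 1.0, "proficient": 0.7, "familiar": 0.4}


def _weighted_hits(text: str, weights: dict[str, float]) -> float:
    """Sum the weights of keywords that appear in text as whole words."""
    text = text.lower()
    return sum(w for kw, w in weights.items() if re.search(rf"\b{re.escape(kw)}\b", text))


def is_genuine_de_role(title: str, description: str) -> bool:
    score = _weighted_hits(title, TITLE_WEIGHTS) + _weighted_hits(description, DESCRIPTION_WEIGHTS)
    return score >= ROLE_THRESHOLD


def skills_match(jd_skills: set[str], profile: dict[str, str]) -> tuple[float, list[str], list[str]]:
    """Return (score between 0 and 1, matched skills, missing skills)."""
    if not jd_skills:
        return 0.0, [], []
    matched = sorted(s for s in jd_skills if s in profile)
    missing = sorted(s for s in jd_skills if s not in profile)
    score = sum(PROFICIENCY_WEIGHTS[profile[s]] for s in matched) / len(jd_skills)
    return score, matched, missing


def _normalise(text: str) -> str:
    """Lower-case, trim and collapse internal whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def dedup_hash(title: str, company: str) -> str:
    """SHA-256 of normalised title + company, so the same role from two sources collides."""
    key = f"{_normalise(title)}|{_normalise(company)}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

For example, a JD asking for python, sql, airflow and kafka against a profile rating them expert, proficient and familiar (with no kafka) scores (1.0 + 0.7 + 0.4) / 4 = 0.525, with kafka reported as a missing skill.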

Phase 3: Daily Digest (Gold Layer)

  • A ranked Telegram message is delivered at 7 AM daily
  • Each listing shows its rank, match score (🟢🟡🟠), title, company, location, matched skills, ID, and apply link
  • gold.daily_digest tracks what has been sent, so no listing is repeated across days

Phase 4: Application Prep (Output Layer)

  • /flag <id>: flag a job from Telegram to start an application
  • /cover <id>: generate a tailored cover letter via the Claude API and deliver it to Telegram
  • Cover letters reference actual portfolio projects and matched skills, and address skill gaps honestly
  • Application outcomes are tracked in gold.application_tracker for ATS analytics
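One way to keep letters grounded, and to provide the template fallback noted for cover_letter_generator.py, is to pass the model call in as a function and fall back to a deterministic template if it raises. In the real pipeline `generate` would wrap a Claude API request; here it is any callable that takes a prompt and returns text. The prompt wording, template text and function names are hypothetical.

```python
from typing import Callable


def build_prompt(title: str, company: str, description: str,
                 matched: list[str], missing: list[str], projects: list[str]) -> str:
    """Assemble a prompt that grounds the letter in real skills, gaps and projects."""
    return (
        f"Write a concise cover letter for the {title} role at {company}.\n\n"
        f"Job description:\n{description}\n\n"
        f"Skills I can evidence: {', '.join(matched) or 'none'}\n"
        f"Skills I lack (acknowledge honestly, never claim them): {', '.join(missing) or 'none'}\n"
        f"Portfolio projects to reference: {'; '.join(projects) or 'none'}\n"
    )


def template_letter(title: str, company: str, matched: list[str], missing: list[str]) -> str:
    """Deterministic fallback used when the model call fails."""
    skills = ", ".join(matched) or "data engineering"
    letter = (
        f"Dear Hiring Team at {company},\n\n"
        f"I am applying for the {title} role. My experience with {skills} "
        f"matches the core requirements of the position."
    )
    if missing:
        letter += f" I am actively building experience in {', '.join(missing)}."
    return letter + "\n\nKind regards"


def generate_cover_letter(
    generate: Callable[[str], str],
    title: str, company: str, description: str,
    matched: list[str], missing: list[str], projects: list[str],
) -> tuple[str, str]:
    """Return (letter, source), where source is "llm" or "template"."""
    prompt = build_prompt(title, company, description, matched, missing, projects)
    try:
        return generate(prompt), "llm"
    except Exception:  # any API, network or quota error triggers the fallback
        return template_letter(title, company, matched, missing), "template"
```

Injecting `generate` also keeps the function testable offline, since a stub can replace the API call.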

Tech Stack

Component          Technology              Why
Orchestration      Apache Airflow 2.8      Industry standard for data pipelines
Database           PostgreSQL 15           Medallion architecture (bronze/silver/gold)
Transformations    dbt Core                SQL-based transforms with lineage
Data Quality       Great Expectations      Automated validation suites
Containerisation   Docker Compose          Reproducible local environment
Notifications      Telegram Bot + Webhook  Real-time digest + bidirectional commands
Cover Letters      Claude API              AI-powered, JD-tailored generation
Language           Python 3.11             Pipeline logic and API clients

Database Schema (Medallion Architecture)

BRONZE: Raw ingestion, untouched source data
  bronze.raw_job_postings     (id, source, source_job_id, content_hash, raw_json, ingested_at)

SILVER: Cleaned, classified, and scored
  silver.classified_jobs      (id, bronze_id, title, company, location, salary_min, salary_max,
                               description_clean, url, posted_date, source,
                               role_score, skills_match_score, overall_score,
                               is_genuine_de_role, matched_skills, missing_skills,
                               dedup_hash, is_duplicate, classified_at)

GOLD: Digest-ready data and application tracking
  gold.daily_digest           (id, digest_date, silver_id, rank_position, digest_sent, created_at)
  gold.application_tracker    (id, silver_id, status, flagged_at, applied_at, cover_letter,
                               cv_notes, response_at, interview_at, days_to_response)

META: Pipeline observability
  meta.pipeline_runs          (id, dag_id, run_id, phase, status, records_in, records_out,
                               started_at, completed_at, error_message)

Telegram Bot Commands

Command       Description
/digest       Get today's ranked job digest
/flag <id>    Flag a job to start an application
/cover <id>   Generate a tailored cover letter via the Claude API
/stats        View pipeline statistics

Quick Start

Prerequisites: Docker Desktop, Git

# 1. Clone the repo
git clone https://github.com/Larious/job-search-autopipe.git
cd job-search-autopipe

# 2. Configure
cp config/config.example.yaml config/config.yaml
# Edit config.yaml with your API keys (see Configuration section below)

# 3. Start infrastructure
docker compose up -d pipeline-db airflow-db
docker compose up schema-init
docker compose up airflow-init
docker compose up -d airflow-webserver airflow-scheduler

# 4. Trigger the pipeline
# Open http://localhost:8080 (admin/admin)
# Trigger the job_search_autopipe DAG manually

# 5. Check Telegram for your digest

Configuration

Copy config/config.example.yaml to config/config.yaml and fill in your values.

⚠️ config/config.yaml is gitignored; never commit it, because it contains your API keys.

You will need:

  • Adzuna API keys: free at developer.adzuna.com
  • Reed API key: free at reed.co.uk/developers
  • Telegram bot token (created via @BotFather on Telegram) and the chat ID that should receive the digest
  • Anthropic API key: from console.anthropic.com
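Before the first run it can help to fail fast when a credential is still missing. The section and key names below (adzuna.app_id, telegram.chat_id and so on) and the YOUR_ placeholder prefix are assumptions about config.example.yaml, not its documented layout; in practice the dict would typically come from PyYAML's yaml.safe_load.

```python
# Hypothetical layout: (section, key) pairs expected in config.yaml.
REQUIRED_CREDENTIALS = [
    ("adzuna", "app_id"),
    ("adzuna", "app_key"),
    ("reed", "api_key"),
    ("telegram", "bot_token"),
    ("telegram", "chat_id"),
    ("anthropic", "api_key"),
]


def missing_credentials(config: dict) -> list[str]:
    """Return dotted names of credentials that are absent, empty or still placeholders."""
    missing = []
    for section, key in REQUIRED_CREDENTIALS:
        # "or {}" also covers a section that YAML parsed as None (declared but empty).
        value = (config.get(section) or {}).get(key)
        if value in (None, "") or str(value).startswith("YOUR_"):
            missing.append(f"{section}.{key}")
    return sorted(missing)
```

Raising early with this list is friendlier than an authentication error halfway through the first DAG run.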

Project Structure

job-search-autopipe/
├── dags/
│   └── job_search_dag.py              # Airflow DAG: ingest→classify→quality→digest
├── src/
│   ├── ingestion/
│   │   ├── base_client.py             # Abstract base with SHA-256 hashing, date parsing
│   │   ├── adzuna_client.py           # Adzuna API client
│   │   └── reed_client.py             # Reed API client
│   ├── transformation/
│   │   ├── role_classifier.py         # NLP weighted keyword scorer
│   │   └── skills_matcher.py          # Proficiency-weighted skills matching
│   ├── quality/
│   │   └── expectations.py            # Great Expectations validation suite
│   ├── generation/
│   │   └── cover_letter_generator.py  # Claude API + template fallback
│   ├── utils/
│   │   ├── database.py                # PostgreSQL connection and CRUD methods
│   │   ├── config_loader.py           # YAML config loader
│   │   ├── telegram_notifier.py       # Telegram message formatting and sending
│   │   ├── slack_notifier.py          # Slack webhook notifier
│   │   └── notifier_factory.py        # Notification channel routing
│   └── webhook/
│       └── telegram_webhook_server.py # Handles incoming Telegram commands
├── dbt/
│   ├── dbt_project.yml
│   └── models/
│       ├── bronze/stg_raw_postings.sql
│       ├── silver/int_classified_jobs.sql
│       └── gold/
│           ├── mart_daily_digest.sql
│           └── mart_ats_analytics.sql
├── config/
│   ├── config.example.yaml            # Template, safe to commit
│   └── skills_profile.yaml            # Skills profile for matching
├── scripts/
│   └── cli.py                         # Terminal interface for pipeline operations
├── docs/
│   ├── STAGE_1_SETUP_GUIDE.md
│   ├── STAGE_2_INGESTION.md
│   ├── STAGE_3_CLASSIFICATION.md
│   ├── STAGE_4_NOTIFICATIONS.md
│   └── STAGE_5_GENERATION.md
├── docker-compose.yml
└── requirements.txt
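The notifier_factory.py entry suggests a factory that routes messages to Telegram or Slack. A minimal sketch, assuming a `channel` key in the config (hypothetical, as are the other key names and class names): it posts JSON to the Telegram Bot API's sendMessage method or to a Slack incoming-webhook URL using only the standard library.

```python
import json
import urllib.request
from typing import Protocol


class Notifier(Protocol):
    def send(self, text: str) -> None: ...


def _post_json(url: str, body: dict) -> None:
    """POST a JSON body and read the response so errors surface as exceptions."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


class TelegramNotifier:
    def __init__(self, bot_token: str, chat_id: str) -> None:
        self.url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
        self.chat_id = chat_id

    def payload(self, text: str) -> dict:
        return {"chat_id": self.chat_id, "text": text}

    def send(self, text: str) -> None:
        _post_json(self.url, self.payload(text))


class SlackNotifier:
    def __init__(self, webhook_url: str) -> None:
        self.url = webhook_url

    def payload(self, text: str) -> dict:
        return {"text": text}

    def send(self, text: str) -> None:
        _post_json(self.url, self.payload(text))


def create_notifier(config: dict) -> Notifier:
    """Pick a notifier from config; Telegram is the assumed default channel."""
    channel = config.get("channel", "telegram")
    if channel == "telegram":
        tg = config["telegram"]
        return TelegramNotifier(tg["bot_token"], str(tg["chat_id"]))
    if channel == "slack":
        return SlackNotifier(config["slack"]["webhook_url"])
    raise ValueError(f"Unknown notification channel: {channel}")
```

Because callers only depend on `send`, the digest and cover-letter steps stay unchanged whichever channel the config selects.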

Categories: Data Engineering
Project URL: https://github.com/Larious/job-search-autopipe

Let's Work Together

You're not just getting a Data Engineer, but working with a strategic partner committed to driving tangible growth for your business.

© 2026. Abraham Aroloye - All rights reserved.