
Job Search AutoPipe
A production-grade data engineering pipeline that automates job discovery, intelligent filtering, daily Telegram notifications, and AI-powered cover letter generation for data engineering roles.
┌─────────────────────────────────────────────────────────────────────────┐
│                           JOB SEARCH AUTOPIPE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌────────────────┐     │
│  │  INGEST  │───▶│ CLASSIFY │───▶│  DIGEST  │───▶│    APP PREP    │     │
│  │ (Bronze) │    │ (Silver) │    │  (Gold)  │    │    (Output)    │     │
│  └──────────┘    └──────────┘    └──────────┘    └────────────────┘     │
│       │               │               │                  │              │
│   Job APIs       NLP/Skill       Telegram Bot    Cover Letter           │
│   - Adzuna       Matching        7 AM Daily      Claude API             │
│   - Reed         Dedup +         Ranked List     Tailored to JD         │
│                  Scoring         /flag /cover    Per-Role Output        │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│  ORCHESTRATION:  Apache Airflow 2.8                                     │
│  STORAGE:        PostgreSQL 15 (Medallion Architecture)                 │
│  TRANSFORMS:     dbt Core                                               │
│  QUALITY:        Great Expectations                                     │
│  NOTIFICATIONS:  Telegram Bot (Webhook)                                 │
│  COVER LETTERS:  Claude API                                             │
│  INFRA:          Docker Compose                                         │
└─────────────────────────────────────────────────────────────────────────┘
- Pulls live job listings from the Adzuna and Reed APIs every morning
- Raw JSON stored in `bronze.raw_job_postings`
- SHA-256 content hashing for deduplication across sources
- Idempotent inserts: `ON CONFLICT DO NOTHING` prevents re-processing
- Handles both Adzuna (ISO dates) and Reed (DD/MM/YYYY dates) formats
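
The hashing and date handling described in the list above could look roughly like the sketch below. It is an illustration only: the function names `content_hash` and `parse_posted_date` are hypothetical, and hashing a canonical JSON dump of the whole payload is an assumption about what "content hashing" covers in this project.

```python
import hashlib
import json
from datetime import date, datetime


def content_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON dump, so key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parse_posted_date(raw: str) -> date:
    """Accept an ISO 8601 string (Adzuna style) or DD/MM/YYYY (Reed style)."""
    try:
        # Rewrite a trailing "Z" because fromisoformat() only accepts it from Python 3.11.
        return datetime.fromisoformat(raw.replace("Z", "+00:00")).date()
    except ValueError:
        # Not ISO, so fall back to the day-first format.
        return datetime.strptime(raw, "%d/%m/%Y").date()
```

Trying ISO first and falling back to day-first parsing keeps one code path for both sources; a string that matches neither format still raises `ValueError`, which is usually preferable to silently storing a wrong date.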
- Role Classifier: NLP-based weighted keyword scoring over title and description signals, with configurable weights and threshold, to verify genuine data engineering roles
- Skills Match Scorer: compares JD requirements against a skills profile with proficiency weighting (expert=1.0, proficient=0.7, familiar=0.4)
- Deduplication: a cross-source SHA-256 hash of title+company eliminates duplicate listings from different sources
- Quality Gate: only postings scoring above the configured threshold proceed to the gold layer
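
As a rough sketch of how these four steps might fit together: the keyword weights, the threshold, and the function names below are all hypothetical placeholders (the real values are configurable), and averaging proficiency weights over the required skills is one plausible reading of "proficiency weighting", not necessarily the project's formula. Only the three proficiency weights come from the list above.

```python
import hashlib

# Hypothetical keyword weights and threshold, for illustration only.
TITLE_WEIGHTS = {"data engineer": 0.6, "etl": 0.3, "analytics engineer": 0.3}
DESCRIPTION_WEIGHTS = {"airflow": 0.15, "dbt": 0.15, "pipeline": 0.1, "spark": 0.1}
ROLE_THRESHOLD = 0.5
# Proficiency weights as stated in the README.
PROFICIENCY = {"expert": 1.0, "proficient": 0.7, "familiar": 0.4}


def role_score(title: str, description: str) -> float:
    """Sum the weights of keywords found in title and description, capped at 1.0."""
    title_l, desc_l = title.lower(), description.lower()
    score = sum(w for kw, w in TITLE_WEIGHTS.items() if kw in title_l)
    score += sum(w for kw, w in DESCRIPTION_WEIGHTS.items() if kw in desc_l)
    return min(score, 1.0)


def skills_match(required: list[str], profile: dict[str, str]) -> tuple[float, list[str], list[str]]:
    """Average proficiency weight over required skills; missing skills count as 0."""
    matched = [s for s in required if s in profile]
    missing = [s for s in required if s not in profile]
    if not required:
        return 0.0, matched, missing
    total = sum(PROFICIENCY[profile[s]] for s in matched)
    return total / len(required), matched, missing


def dedup_hash(title: str, company: str) -> str:
    """SHA-256 of normalised title+company, identical for the same job on two boards."""
    key = f"{title.strip().lower()}|{company.strip().lower()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def passes_quality_gate(title: str, description: str) -> bool:
    """Keep a posting only when its role score reaches the threshold."""
    return role_score(title, description) >= ROLE_THRESHOLD
```

Returning the matched and missing skills alongside the score mirrors the `matched_skills` and `missing_skills` columns in `silver.classified_jobs`.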
- Ranked Telegram message delivered at 7 AM daily
- Each listing shows: rank, match score (🟢🟡🟠), title, company, location, matched skills, ID, and apply link
- `gold.daily_digest` tracks what has been sent, so no job repeats across days
- `/flag <id>`: flag a job from Telegram to start an application
- `/cover <id>`: generate a tailored cover letter via the Claude API, delivered to Telegram
- Cover letters reference actual portfolio projects and matched skills, and address skill gaps honestly
- Application outcomes tracked in `gold.application_tracker` with full ATS analytics
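
A ranked digest message of the kind described above might be assembled as in this sketch. The `RankedJob` type, the function names, the message layout, and the 0.75 / 0.5 cut-offs for the coloured markers are all assumptions made for illustration; the project's actual formatting lives in `telegram_notifier.py`.

```python
from dataclasses import dataclass


@dataclass
class RankedJob:
    job_id: int
    title: str
    company: str
    location: str
    score: float  # overall score in the range 0.0 to 1.0
    matched_skills: list[str]
    url: str


def score_marker(score: float) -> str:
    """Map a score to a coloured marker. The cut-offs are assumed, not the project's."""
    if score >= 0.75:
        return "🟢"
    if score >= 0.5:
        return "🟡"
    return "🟠"


def format_digest(jobs: list[RankedJob]) -> str:
    """Render jobs, best score first, as one plain-text message."""
    ranked = sorted(jobs, key=lambda j: j.score, reverse=True)
    entries = []
    for rank, job in enumerate(ranked, start=1):
        entries.append(
            f"{rank}. {score_marker(job.score)} {job.score:.0%} "
            f"{job.title} at {job.company} ({job.location})\n"
            f"   Skills: {', '.join(job.matched_skills)}\n"
            f"   ID: {job.job_id} | Apply: {job.url}"
        )
    return "\n\n".join(entries)
```

Including the ID on every entry is what makes the `/flag <id>` and `/cover <id>` commands usable straight from the digest.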
| Component | Technology | Why |
|---|---|---|
| Orchestration | Apache Airflow 2.8 | Industry standard for data pipelines |
| Database | PostgreSQL 15 | Medallion architecture (bronze/silver/gold) |
| Transformations | dbt Core | SQL-based transforms with lineage |
| Data Quality | Great Expectations | Automated validation suites |
| Containerisation | Docker Compose | Reproducible local environment |
| Notifications | Telegram Bot + Webhook | Real-time digest + bidirectional commands |
| Cover Letters | Claude API | AI-powered, JD-tailored generation |
| Language | Python 3.11 | Pipeline logic and API clients |
BRONZE: Raw ingestion, untouched source data
bronze.raw_job_postings (id, source, source_job_id, content_hash, raw_json, ingested_at)

SILVER: Cleaned, classified, and scored
silver.classified_jobs (id, bronze_id, title, company, location, salary_min, salary_max,
                        description_clean, url, posted_date, source,
                        role_score, skills_match_score, overall_score,
                        is_genuine_de_role, matched_skills, missing_skills,
                        dedup_hash, is_duplicate, classified_at)

GOLD: Digest-ready and application tracking
gold.daily_digest (id, digest_date, silver_id, rank_position, digest_sent, created_at)
gold.application_tracker (id, silver_id, status, flagged_at, applied_at, cover_letter,
                          cv_notes, response_at, interview_at, days_to_response)

META: Pipeline observability
meta.pipeline_runs (id, dag_id, run_id, phase, status, records_in, records_out,
                    started_at, completed_at, error_message)
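
To show how an idempotent insert into the bronze table can behave, here is a minimal sketch. It uses an in-memory SQLite database (3.24 or later) purely so it runs without a server; SQLite accepts the same `ON CONFLICT ... DO NOTHING` clause as PostgreSQL. The unschema-qualified table name, the column types, the choice of `content_hash` as the conflict target, and the `insert_posting` helper are assumptions, not the project's DDL.

```python
import json
import sqlite3

# In-memory stand-in for the PostgreSQL bronze table (assumed column types).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE raw_job_postings (
        id            INTEGER PRIMARY KEY,
        source        TEXT NOT NULL,
        source_job_id TEXT NOT NULL,
        content_hash  TEXT NOT NULL UNIQUE,  -- assumed conflict target
        raw_json      TEXT NOT NULL,
        ingested_at   TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def insert_posting(source: str, source_job_id: str, content_hash: str, raw: dict) -> bool:
    """Insert one posting; return False when the hash was already stored."""
    cur = conn.execute(
        """
        INSERT INTO raw_job_postings (source, source_job_id, content_hash, raw_json)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (content_hash) DO NOTHING
        """,
        (source, source_job_id, content_hash, json.dumps(raw)),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the conflict clause skipped it.
    return cur.rowcount == 1
```

Because a repeated hash is skipped rather than raising an error, re-running the ingest step for the same day leaves the table unchanged.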
| Command | Description |
|---|---|
| `/digest` | Get today's ranked job digest |
| `/flag <id>` | Flag a job to start an application |
| `/cover <id>` | Generate a tailored cover letter via Claude API |
| `/stats` | View pipeline statistics |
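
Incoming messages for these four commands could be parsed along the following lines. This is a sketch under assumptions: the `parse_command` name is hypothetical, job IDs are assumed to be numeric, and the optional `@botname` suffix is included because Telegram appends it to commands in group chats.

```python
import re
from typing import Optional

# The four commands listed in the README; an optional "@botname" suffix is tolerated.
COMMAND_PATTERN = re.compile(r"^/(digest|flag|cover|stats)(?:@\w+)?(?:\s+(\d+))?\s*$")
NEEDS_ID = {"flag", "cover"}


def parse_command(text: str) -> Optional[tuple[str, Optional[int]]]:
    """Return (command, job_id) for a recognised message, else None."""
    match = COMMAND_PATTERN.match(text.strip())
    if match is None:
        return None
    command, raw_id = match.group(1), match.group(2)
    if command in NEEDS_ID:
        # /flag and /cover are only valid with an ID.
        return (command, int(raw_id)) if raw_id else None
    # /digest and /stats take no argument; reject a stray one.
    return (command, None) if raw_id is None else None
```

Returning `None` for anything malformed lets the webhook handler reply with a single usage hint instead of handling each error case separately.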
Prerequisites: Docker Desktop, Git
# 1. Clone the repo
git clone https://github.com/Larious/job-search-autopipe.git
cd job-search-autopipe
# 2. Configure
cp config/config.example.yaml config/config.yaml
# Edit config.yaml with your API keys (see Configuration section below)
# 3. Start infrastructure
docker compose up -d pipeline-db airflow-db
docker compose up schema-init
docker compose up airflow-init
docker compose up -d airflow-webserver airflow-scheduler
# 4. Trigger the pipeline
# Open http://localhost:8080 (admin/admin)
# Trigger the job_search_autopipe DAG manually
# 5. Check Telegram for your digest
Copy `config/config.example.yaml` to `config/config.yaml` and fill in your values.
⚠️ `config/config.yaml` is gitignored; never commit it. It contains your API keys.
You will need:
- Adzuna API keys: free at developer.adzuna.com
- Reed API key: free at reed.co.uk/developers
- Telegram bot token + chat ID: via @BotFather on Telegram
- Anthropic API key: from console.anthropic.com
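
Before starting the stack, it can help to check that none of these values were left blank. The sketch below works on an already-loaded config dictionary; the section and key names in `REQUIRED_KEYS` are hypothetical, so check `config/config.example.yaml` for the real layout.

```python
# Hypothetical key layout; the real names are defined in config/config.example.yaml.
REQUIRED_KEYS = [
    ("adzuna", "app_id"),
    ("adzuna", "app_key"),
    ("reed", "api_key"),
    ("telegram", "bot_token"),
    ("telegram", "chat_id"),
    ("anthropic", "api_key"),
]


def missing_settings(config: dict) -> list[str]:
    """List dotted names of required settings that are absent or left blank."""
    missing = []
    for section, key in REQUIRED_KEYS:
        # An empty YAML section loads as None, so fall back to an empty dict.
        value = (config.get(section) or {}).get(key)
        if value is None or str(value).strip() == "":
            missing.append(f"{section}.{key}")
    return missing
```

The dictionary would typically come from `yaml.safe_load` (PyYAML) applied to `config/config.yaml`; an empty return value means every required setting is present.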
job-search-autopipe/
├── dags/
│   └── job_search_dag.py              # Airflow DAG: ingest→classify→quality→digest
├── src/
│   ├── ingestion/
│   │   ├── base_client.py             # Abstract base with SHA-256 hashing, date parsing
│   │   ├── adzuna_client.py           # Adzuna API client
│   │   └── reed_client.py             # Reed API client
│   ├── transformation/
│   │   ├── role_classifier.py         # NLP weighted keyword scorer
│   │   └── skills_matcher.py          # Proficiency-weighted skills matching
│   ├── quality/
│   │   └── expectations.py            # Great Expectations validation suite
│   ├── generation/
│   │   └── cover_letter_generator.py  # Claude API + template fallback
│   ├── utils/
│   │   ├── database.py                # PostgreSQL connection and CRUD methods
│   │   ├── config_loader.py           # YAML config loader
│   │   ├── telegram_notifier.py       # Telegram message formatting and sending
│   │   ├── slack_notifier.py          # Slack webhook notifier
│   │   └── notifier_factory.py        # Notification channel routing
│   └── webhook/
│       └── telegram_webhook_server.py # Handles incoming Telegram commands
├── dbt/
│   ├── dbt_project.yml
│   └── models/
│       ├── bronze/stg_raw_postings.sql
│       ├── silver/int_classified_jobs.sql
│       └── gold/mart_daily_digest.sql
│                mart_ats_analytics.sql
├── config/
│   ├── config.example.yaml            # Template, safe to commit
│   └── skills_profile.yaml            # Skills profile for matching
├── scripts/
│   └── cli.py                         # Terminal interface for pipeline operations
├── docs/
│   ├── STAGE_1_SETUP_GUIDE.md
│   ├── STAGE_2_INGESTION.md
│   ├── STAGE_3_CLASSIFICATION.md
│   ├── STAGE_4_NOTIFICATIONS.md
│   └── STAGE_5_GENERATION.md
├── docker-compose.yml
└── requirements.txt