Data + AI Engineering Portfolio

Data Engineer building reliable pipelines and AI-ready data systems

Designing scalable data platforms with Python, SQL, AWS, Azure, Databricks, RAG, and modern cloud warehouses.

Featured Project

AZURE DATA ENGINEERING

NYC 311 Service Requests Lakehouse

Azure-first medallion lakehouse for NYC 311 operational analytics, transforming raw API data into analytics-ready bronze, silver, and gold datasets.

Azure Data Factory -> ADLS Gen2 -> Databricks pipeline with proven raw landing and medallion processing
Reusable data quality checks, dimensional models, and reporting marts
Architecture notes, runbooks, SQL assets, notebook exports, and cloud execution proof
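
As a rough illustration of what a reusable data quality check between the bronze and silver layers could look like, here is a minimal plain-Python sketch. The column names (`unique_key`, `created_date`, `complaint_type`), the function names, and the sample rows are assumptions made for illustration, not taken from the project repo. The project itself runs on PySpark and Delta Lake rather than Python lists.

```python
from datetime import datetime

# Assumed column names, for illustration only.
REQUIRED_FIELDS = ("unique_key", "created_date", "complaint_type")


def check_record(record):
    """Return the list of rule names a record violates (empty list means clean)."""
    failures = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            failures.append(f"missing_{field}")
    created = record.get("created_date")
    if created:
        try:
            datetime.fromisoformat(created)
        except ValueError:
            failures.append("invalid_created_date")
    return failures


def split_bronze(records):
    """Separate clean rows (silver) from rejected rows, dropping duplicate keys."""
    seen, silver, rejected = set(), [], []
    for record in records:
        failures = check_record(record)
        key = record.get("unique_key")
        if not failures and key in seen:
            failures.append("duplicate_unique_key")
        if failures:
            rejected.append({"record": record, "failures": failures})
        else:
            seen.add(key)
            silver.append(record)
    return silver, rejected


# Made-up sample rows standing in for raw API output.
bronze = [
    {"unique_key": "1", "created_date": "2024-01-05T10:00:00", "complaint_type": "Noise"},
    {"unique_key": "1", "created_date": "2024-01-05T10:00:00", "complaint_type": "Noise"},
    {"unique_key": "2", "created_date": "not-a-date", "complaint_type": "Heating"},
    {"unique_key": "3", "created_date": "2024-01-06T08:30:00", "complaint_type": ""},
]
silver, rejected = split_bronze(bronze)
```

Keeping rejected rows alongside the reason they failed, instead of silently dropping them, is one way to make the checks auditable in a runbook.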

Azure Data Factory, ADLS Gen2, Databricks, PySpark, Delta Lake, Python, SQL, Power BI, GitHub Actions

Case Study | GitHub Repo | Architecture | Execution Proof | Docs

Data Engineering

REAL AWS CLOUD PROOF

Cloud Flight Fare Pipeline

AWS pipeline with real cloud execution proof: EventBridge Scheduler -> ECS/Fargate -> Flight API -> S3 Bronze -> Redshift Serverless -> dbt staging/marts/tests -> CloudWatch Logs. Includes proof screenshots, runbooks, and cost/secret safety notes.
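
To make the S3 Bronze landing step a little more concrete, the sketch below builds a date-partitioned object key for one raw API response. The prefix layout, the `ingest_date=` partition name, and the `bronze_key` function are hypothetical choices for illustration. The project may organize its bucket differently, and no AWS call is made here.

```python
from datetime import datetime, timezone


def bronze_key(source, run_time, run_id):
    """Build a date-partitioned object key for one raw API response.

    The layout (bronze/<source>/ingest_date=YYYY-MM-DD/...) is an assumed
    convention, not the project's documented one.
    """
    run_time = run_time.astimezone(timezone.utc)
    return (
        f"bronze/{source}/"
        f"ingest_date={run_time:%Y-%m-%d}/"
        f"{run_time:%H%M%S}_{run_id}.json"
    )


key = bronze_key(
    "flight_fares",
    datetime(2024, 3, 1, 6, 15, 0, tzinfo=timezone.utc),
    "run42",
)
print(key)  # bronze/flight_fares/ingest_date=2024-03-01/061500_run42.json
```

Partitioning raw files by ingest date keeps each scheduled run's output separate, which makes reloads and backfills into the warehouse easier to reason about.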

AWS, ECS/Fargate, EventBridge, S3, Redshift, dbt, Docker, CloudWatch

Case Study | GitHub Repo | Architecture | Execution Proof

PYTHON DATA INGESTION

Travelpayouts Flight Collector

Python API ingestion project that collects live Travelpayouts flight fare data and publishes dated CSV snapshots for analytics.
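
A dated CSV snapshot can be sketched with the standard library alone. The column names, the `fares_<date>.csv` file naming, and the `write_snapshot` helper below are assumptions for illustration. The sample row is made up and no Travelpayouts API call is made.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

# Assumed output columns, for illustration only.
FIELDS = ["origin", "destination", "depart_date", "price"]


def write_snapshot(rows, out_dir, snapshot_date):
    """Write rows to <out_dir>/fares_<YYYY-MM-DD>.csv and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"fares_{snapshot_date.isoformat()}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops API fields that are not in FIELDS.
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return path


# Made-up row standing in for one parsed API result.
rows = [
    {"origin": "JFK", "destination": "LAX", "depart_date": "2024-04-01",
     "price": 199, "raw": "ignored"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = write_snapshot(rows, tmp, date(2024, 3, 1))
    snapshot_name = path.name
    snapshot_text = path.read_text(encoding="utf-8")
```

Putting the date in the file name means each run adds a new file instead of overwriting the previous one, so fare history accumulates for later analysis.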

Python, API Ingestion, CSV, Scheduling, pytest, GitHub Actions

GitHub Repo | Data

AI Data Engineering

[Figure: CivicLens RAG architecture showing curated NYC 311 docs, local embeddings, PostgreSQL/pgvector retrieval, cited answers, sample analytics, and Streamlit UI.]

AI DATA ENGINEERING / HYBRID RAG

COMPLETED LOCAL PROTOTYPE

CivicLens RAG — NYC 311 Operations Copilot

Local Hybrid RAG prototype for grounded NYC 311 documentation Q&A with citations, PostgreSQL/pgvector retrieval, sample analytics, and a Streamlit UI.

Ingests curated NYC 311 docs and runbooks, chunks source text, and stores local embeddings in PostgreSQL/pgvector.
Retrieves cited context for grounded answers and routes sample analytics questions to predefined CSV summaries.
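
The chunk, retrieve, and route steps above can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real local embedding model, the in-memory `index` list stands in for a PostgreSQL/pgvector table, and the document names, sample text, and keyword list are invented for illustration.

```python
import math
from collections import Counter


def chunk_text(text, size=8, overlap=2):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, index, k=2):
    """Return the top-k (score, source, chunk) rows so an answer can cite sources."""
    query = embed(question)
    scored = [(cosine(query, vector), source, chunk) for source, chunk, vector in index]
    return sorted(scored, key=lambda row: row[0], reverse=True)[:k]


# Assumed trigger words for the analytics route.
ANALYTICS_WORDS = {"count", "total", "average", "trend"}


def route(question):
    """Send aggregate-style questions to predefined summaries, the rest to doc retrieval."""
    tokens = set(question.lower().split())
    return "analytics" if tokens & ANALYTICS_WORDS else "docs"


# Invented sample documents, not real NYC 311 content.
docs = {
    "noise_runbook.md": "Noise complaints are routed to the police department for response",
    "heating_faq.md": "Heating complaints are routed to the housing department during winter",
}
index = [
    (source, chunk, embed(chunk))
    for source, text in docs.items()
    for chunk in chunk_text(text)
]
top = retrieve("who handles noise complaints", index, k=1)
```

Returning the source name with every retrieved chunk is what lets the answer carry a citation. In a pgvector-backed version the similarity ranking would typically happen in SQL rather than in Python.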

Python, PostgreSQL, pgvector, Streamlit, Docker, RAG, GitHub Actions, pytest

GitHub Repo | Case Study | Architecture | Screenshots

Machine Learning & AI Data Science

[Figure: Financial complaint NLP workflow diagram showing complaint text preprocessing, TF-IDF baseline, DistilBERT transformer upgrade, evaluation, and routing.]

AI DATA SCIENCE / NLP CLASSIFICATION / ML PIPELINE

VERSION 2 PLANNED

Financial Complaint Auto-Routing with NLP

NLP classification project for routing consumer financial complaints into product categories using classical ML baselines and a planned DistilBERT transformer upgrade, with model evaluation, error analysis, and business-focused routing logic.

Ingests and cleans consumer financial complaint text data for supervised NLP classification.
Compares TF-IDF baseline models with a planned DistilBERT transformer upgrade using macro F1, weighted F1, precision, recall, and a confusion matrix.
Designs a complaint-routing workflow where high-confidence predictions can be auto-routed and low-confidence cases are flagged for human review.
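
The evaluation and routing ideas above can be illustrated without any ML library. In this sketch the 0.8 confidence threshold, the class names, the probabilities, and the sample labels are all assumptions for illustration. They are not results from the project.

```python
def route(probabilities, threshold=0.8):
    """Auto-route when the top class probability clears the threshold, else flag for review."""
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence >= threshold:
        return {"queue": label, "decision": "auto_route", "confidence": confidence}
    return {"queue": "human_review", "decision": "flag",
            "confidence": confidence, "suggested": label}


def f1_scores(y_true, y_pred):
    """Return (macro F1, weighted F1) computed from per-class counts."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class, support = {}, {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
        support[label] = tp + fn  # number of true examples of this class
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[label] * support[label] for label in labels) / len(y_true)
    return macro, weighted


# Made-up model outputs for two complaints.
confident = route({"credit_card": 0.91, "mortgage": 0.06, "loan": 0.03})
uncertain = route({"credit_card": 0.55, "mortgage": 0.40, "loan": 0.05})

# Made-up imbalanced labels: the minority class is predicted worse.
y_true = ["card", "card", "card", "card", "loan", "loan"]
y_pred = ["card", "card", "card", "card", "card", "loan"]
macro, weighted = f1_scores(y_true, y_pred)
```

On this imbalanced sample the macro F1 (7/9, about 0.78) comes out lower than the weighted F1 (22/27, about 0.81) because macro averaging gives the weaker minority class equal weight. That gap is the reason to report both metrics for complaint categories of uneven size.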

Python, pandas, scikit-learn, NLP, TF-IDF, PyTorch, Transformers, DistilBERT, Model Evaluation, Classification

Supporting Work

AI REPORTING SAAS

Sumryze - AI-Powered SEO Reporting Dashboard

SaaS-style dashboard for automated SEO reporting, AI-generated summaries, analytics visualizations, and client-ready insights.

Next.js, TypeScript, Tailwind, OpenAI, REST APIs, Vercel

GitHub Repo | Live Site

DATA ANALYTICS

Floral Daily SKU Analysis

Sales and inventory analysis project focused on daily SKU movement, reporting, and business decision support.

SQL, Analytics, Reporting

GitHub Repo

Skills & Tools

Orchestration & Workflow

Apache Airflow, Azure Data Factory, EventBridge Scheduler, GitHub Actions

Cloud Execution & Containers

Docker, ECS/Fargate, ECR, CloudWatch Logs

Storage, Lakehouse & Warehouse

ADLS Gen2, Delta Lake, Databricks, Amazon S3, Redshift Serverless, PostgreSQL

Transformation & Modeling

Python, SQL, PySpark, dbt, Dimensional Modeling

Data Quality & CI

dbt Tests, pytest, Data Validation, Validation SQL, GitHub Actions

Analytics Enablement

Power BI, SQL Marts, KPI Design, BI Handoff, Documentation

AI / RAG Engineering

RAG, Vector Search, Embeddings, pgvector, Cited Answers, Streamlit, AI Evaluation

About

I am a Data Engineer focused on building reliable cloud data pipelines, analytics-ready datasets, and AI/RAG-enabled data applications.

My projects show end-to-end data engineering work across API ingestion, cloud storage, transformation layers, data quality checks, dimensional modeling, workflow orchestration, and analytics-ready outputs. I have built portfolio projects using Azure Data Factory, ADLS Gen2, Databricks, PySpark, Delta Lake, AWS S3, ECS/Fargate, Redshift Serverless, PostgreSQL/pgvector, Streamlit, Python, SQL, and dbt.

I care about clear SQL, maintainable Python, reproducible workflows, validation checks, CI, and documentation that helps reviewers understand how a system works.

With a background in analytics and web development, I can connect technical data engineering work with dashboards, reporting needs, AI-assisted workflows, and user-facing project presentation.

Contact

Interested in collaborating on data engineering work or portfolio projects? Reach out and I will follow up.

Email me | LinkedIn | GitHub