Real-Time Stock Market Data Pipeline with Kafka & AWS
Role: Data Engineer
Designed and implemented an end-to-end real-time data engineering pipeline that simulates stock market data ingestion, processing, and querying in a production-like cloud environment.
Architecture & Workflow
- Real-time data ingestion using Kafka producers and consumers built in Python
- Kafka infrastructure hosted on AWS EC2
- Streaming data persisted in Amazon S3 as a scalable data lake
- Schema discovery and management using AWS Glue Crawler & Data Catalog
- Analytical querying performed with Amazon Athena using SQL
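The ingestion step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the topic name, symbols, and broker address are placeholders, and the tick generator is a simple random walk standing in for a real market feed.

```python
import json
import random
import time

# Hypothetical values -- the real project may use different names.
SYMBOLS = ["AAPL", "MSFT", "GOOG"]
TOPIC = "stock-ticks"

def make_tick(symbol: str, last_price: float) -> dict:
    """Simulate one market tick as a small random walk around the last price."""
    price = round(last_price * (1 + random.uniform(-0.01, 0.01)), 2)
    return {
        "symbol": symbol,
        "price": price,
        "volume": random.randint(100, 10_000),
        "ts": time.time(),
    }

def serialize(tick: dict) -> bytes:
    """Kafka message values are bytes; newline-delimited JSON in S3
    keeps the records easy for Glue to crawl and Athena to query."""
    return json.dumps(tick).encode("utf-8")

if __name__ == "__main__":
    # Requires kafka-python and a reachable broker (e.g. the EC2 host):
    #   pip install kafka-python
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="YOUR-EC2-HOST:9092",  # placeholder address
        value_serializer=serialize,
    )
    price = 100.0
    for _ in range(10):
        for sym in SYMBOLS:
            producer.send(TOPIC, make_tick(sym, price))
        time.sleep(1)
    producer.flush()
```

A matching consumer would read the same topic and batch records into S3 objects, which then form the data-lake layer crawled by Glue.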
What This Demonstrates
- Hands-on experience with real-time streaming systems
- Cloud-native data lake architecture design
- Practical understanding of ingestion, storage, and query layers
- Ability to design scalable, decoupled data pipelines
Tech Stack: Apache Kafka, Python, AWS (EC2, S3, Glue, Athena), SQL
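The Athena query layer described above can be driven programmatically with boto3. A hedged sketch, assuming a Glue-cataloged table named `stock_ticks` and an S3 results bucket (both placeholders):

```python
def build_latest_price_query(table: str, symbol: str) -> str:
    """Build a simple Athena SQL query for the most recent tick of a symbol.
    Table and column names are assumptions about the Glue catalog schema."""
    return (
        f"SELECT symbol, price, ts FROM {table} "
        f"WHERE symbol = '{symbol}' ORDER BY ts DESC LIMIT 1"
    )

def run_query(sql: str, database: str, output_s3: str) -> str:
    """Submit the query to Athena and return its execution id.
    Requires AWS credentials and boto3 (pip install boto3)."""
    import boto3  # imported lazily so the SQL builder stays dependency-free

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return response["QueryExecutionId"]

if __name__ == "__main__":
    sql = build_latest_price_query("stock_ticks", "AAPL")
    # Placeholder database and bucket names:
    qid = run_query(sql, "stock_market_db", "s3://my-athena-results/")
    print(qid)
```

Polling `get_query_execution` until the state is `SUCCEEDED` and then fetching rows with `get_query_results` completes the read path.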