Experience

Software Engineer - JobTarget LLC

Overview

As a Software Engineer at JobTarget LLC, I have designed and optimized scalable data pipelines, integrated data across systems, and supported business insights. This work has increased data processing speed by 25% and reduced operational costs by 30%.

Position: Software Engineer
Company: JobTarget LLC
Location: Stamford, CT

Key Metrics

76+ scalable data pipelines designed and deployed.
25% increase in processing speed through Spark optimizations.
30% reduction in operational costs by leveraging AWS solutions.
10+ team collaborations with data scientists for business insights.

Technologies Used

AWS (S3, Lambda, Glue, Athena, DMS), Apache Spark, Apache Airflow, Hudi, PostgreSQL, Serverless, Terraform, CloudFormation

Key Contributions

Leadership: Spearheaded the migration of traditional data pipelines to a cloud-native architecture, leading a cross-functional team of 5 engineers and collaborating with stakeholders to define project requirements and ensure timely delivery. Reduced infrastructure costs by 40% and improved scalability, enabling seamless handling of surges in data volume.
Problem-Solving: Diagnosed and resolved data bottlenecks in Apache Spark processes by implementing dynamic partitioning and caching strategies. Achieved a 25% increase in processing efficiency, reduced compute costs, and enhanced the reliability of data pipelines across the organization.
New Approaches: Introduced a Hudi-based lakehouse architecture, which reduced data duplication, enabled seamless upserts, and decreased storage costs by 30%. Migrated terabytes of data to AWS S3 while ensuring data integrity and minimal downtime. Improved query performance by 40%, enabling advanced analytics and machine learning workloads.
Collaboration: Partnered with data scientists and analysts to design optimized data marts, delivering actionable insights that supported high-impact decision-making. Reduced query times by 50%, and enabled analysts to explore data independently by providing thorough documentation and training sessions.
Scalability: Engineered real-time streaming pipelines using AWS DMS and Apache Spark Structured Streaming, processing millions of records per minute with sub-second latency. Designed a fault-tolerant system with checkpointing, ensuring reliability during peak loads.
Innovation: Automated pipeline monitoring and alerting using Apache Airflow and Amazon CloudWatch, reducing manual intervention by 40%. Created custom metrics and dashboards that provide real-time visibility into pipeline performance and enable proactive detection of issues.
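
The upsert behavior mentioned under New Approaches can be illustrated with a minimal, library-free sketch. It is a simplified model, not Hudi's API or the production code: the `Record` type, its `key` and `updated_at` fields, and the `upsert` function are hypothetical names chosen for illustration. The rule of keeping the version with the highest ordering value is an assumption about how conflicts are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str         # hypothetical record key, e.g. a job posting ID
    updated_at: int  # hypothetical ordering field (larger means newer)
    payload: str


def upsert(table: dict[str, Record], batch: list[Record]) -> dict[str, Record]:
    """Return a new table with the batch merged in, keeping one row per key."""
    merged = dict(table)  # copy so the input table is left unchanged
    for rec in batch:
        current = merged.get(rec.key)
        # Insert unseen keys; replace an existing row only if the incoming
        # version is at least as new (assumed conflict rule).
        if current is None or rec.updated_at >= current.updated_at:
            merged[rec.key] = rec
    return merged


if __name__ == "__main__":
    table = upsert({}, [Record("job-1", 1, "draft"), Record("job-2", 1, "open")])
    # The second batch updates job-1 and also carries a stale duplicate of it.
    table = upsert(table, [Record("job-1", 2, "open"), Record("job-1", 1, "stale")])
    for rec in sorted(table.values(), key=lambda r: r.key):
        print(rec.key, rec.updated_at, rec.payload)
```

Because each key maps to a single row, re-delivered or late records do not pile up as duplicates. This is the property that lets an upsert-capable table avoid the duplication of append-only storage.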