Data Engineer - JobTarget LLC

Overview

As a Data Engineer at JobTarget LLC, I design and optimize scalable data pipelines, integrate data across systems, and deliver the datasets behind business insights. This work has increased data processing speed by 25% and reduced operational costs by 30%.

  • Position: Data Engineer
  • Company: JobTarget LLC
  • Duration: April 2022 - Present
  • Location: Stamford, CT

Key Metrics

  • 76+ scalable data pipelines designed and deployed.
  • 25% increase in processing speed through Spark optimizations.
  • 30% reduction in operational costs by leveraging AWS solutions.
  • 10+ team collaborations with data scientists for business insights.

Technologies Used

  • AWS (S3, Lambda, Glue, Athena, DMS)
  • Apache Spark
  • Apache Airflow
  • Hudi
  • PostgreSQL
  • Serverless
  • Terraform
  • CloudFormation

Key Contributions

  • Leadership: Spearheaded the migration of traditional data pipelines to a cloud-native architecture, leading a cross-functional team of 5 engineers and collaborating with stakeholders to define project requirements and ensure timely delivery. Reduced infrastructure costs by 40% and improved scalability, enabling seamless handling of surges in data volume.
  • Problem-Solving: Diagnosed and resolved data bottlenecks in Apache Spark processes by implementing dynamic partitioning and caching strategies. Achieved a 25% increase in processing efficiency, reduced compute costs, and enhanced the reliability of data pipelines across the organization.
  • New Approaches: Introduced a Hudi-based lakehouse architecture, which reduced data duplication, enabled seamless upserts, and decreased storage costs by 30%. Migrated terabytes of data to AWS S3 while ensuring data integrity and minimal downtime. Improved query performance by 40%, enabling advanced analytics and machine learning workloads.
  • Collaboration: Partnered with data scientists and analysts to design optimized data marts, delivering actionable insights that supported high-impact decision-making. Reduced query times by 50% and enabled independent exploration of data through robust documentation and training sessions.
  • Scalability: Engineered real-time streaming pipelines using AWS DMS and Apache Spark Structured Streaming, processing millions of records per minute with sub-second latency. Designed a fault-tolerant system with checkpointing, ensuring reliability during peak loads.
  • Innovation: Automated pipeline monitoring and alerting using Apache Airflow and AWS CloudWatch, reducing manual intervention by 40%. Created custom metrics and dashboards, providing real-time visibility into pipeline performance and enabling proactive detection of issues.
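
The upsert behavior mentioned under New Approaches can be illustrated with a minimal sketch in plain Python. This is not Hudi's API and not the production code; it only shows the general idea of merging a batch into a table by record key while keeping the newest version of each record, which is how upserts avoid duplicates. The field names `record_id` and `updated_at`, and the function name `upsert`, are hypothetical and chosen for illustration.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def upsert(
    table: Dict[Any, Record],
    batch: Iterable[Record],
    key_field: str = "record_id",        # hypothetical record key
    ordering_field: str = "updated_at",  # hypothetical ordering field
) -> Dict[Any, Record]:
    """Merge `batch` into `table`, keeping the newest version of each key."""
    merged = dict(table)  # shallow copy so the input table is left untouched
    for record in batch:
        key = record[key_field]
        current = merged.get(key)
        # Insert unseen keys; overwrite only when the incoming record is
        # at least as new as the stored one.
        if current is None or record[ordering_field] >= current[ordering_field]:
            merged[key] = record
    return merged


existing = {"a": {"record_id": "a", "updated_at": 1, "status": "open"}}
incoming = [
    {"record_id": "a", "updated_at": 2, "status": "closed"},  # newer: replaces
    {"record_id": "b", "updated_at": 1, "status": "open"},    # new key: inserted
    {"record_id": "a", "updated_at": 0, "status": "stale"},   # older: ignored
]
result = upsert(existing, incoming)
```

After the merge, `result` holds one row per key: `a` carries the newer `closed` status and `b` is added, while the late-arriving stale version of `a` is dropped instead of being stored as a duplicate.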