This role involves designing and managing the foundational infrastructure of a data lakehouse, developing real-time stream processing frameworks, and creating self-service ETL and query frameworks. The ideal candidate will have 3-12 years of experience in data engineering, with a focus on building or managing a data platform. They will possess deep hands-on expertise with tools like Spark, Hudi/Delta Lake, Kafka, Airflow, Debezium, Presto/Trino, DBT, and Airbyte, and be comfortable working with the AWS data ecosystem.
Responsibilities
- Take full ownership of the data lakehouse, including its architecture, ingestion from CDC sources, scalability, and reliability
- Develop and manage real-time stream processing frameworks for applications such as anomaly detection, customer 360 views, and live supply chain signals
- Design and scale OLAP stores to support both real-time and batch processing for internal analytics and AI/ML pipelines
- Create self-service ETL and query frameworks that enable data consumers to operate quickly without creating bottlenecks for the platform team
- Implement cost observability measures that provide detailed insights into compute, storage, and query expenses by job, user, and source
- Build data movement APIs and reverse-ETL pipelines to efficiently deliver data to downstream consumers at scale
- Establish a robust job orchestration layer that remains stable at scale
Benefits
- Opportunity to work on a data platform that powers AI for Fortune 500 companies
- High degree of ownership and the challenge of operating at true scale