Build and operate large-scale healthcare data pipelines across batch workflows, metadata-driven ingestion, and data service publishing. Own end-to-end engineering from source ingestion to conformed data products, with a strong focus on reliability, data quality, and operational observability.
Responsibilities
- Design and maintain PySpark/SQL pipelines in Databricks for landing, unified, unstitched, and published data layers.
- Build and support Airflow DAGs for scheduling, dependencies, retries, and production operations (see the DAG sketch below).
- Implement metadata/config-driven frameworks for ingestion, transformation, and rule-based processing (see the ingestion sketch after this list).
- Develop robust data quality (DQ) controls, DQ summaries, failure handling, and alerting workflows.
- Manage batch/process audit logs, run status tracking, release flags, and operational reporting.
- Integrate multi-source data (files, APIs, cloud storage, and relational systems) into governed Delta/Spark tables.
- Optimize pipeline performance using partitioning, parallelization, and query tuning.
- Collaborate on schema evolution, business-rule onboarding, and production support.
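
To make the metadata-driven pattern concrete, here is a minimal sketch of one config-driven ingestion step in PySpark with a rule-based quality gate and a per-run DQ summary. It assumes a hypothetical `sources.json` metadata file and illustrative config keys (`format`, `path`, `key_columns`, `target_table`); none of these names come from the role description itself.

```python
import json
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("metadata_driven_ingest").getOrCreate()

def ingest(cfg: dict) -> dict:
    """Land one source described by a metadata entry and return a DQ summary."""
    df = (
        spark.read.format(cfg["format"])            # e.g. "csv", "parquet"
        .options(**cfg.get("read_options", {}))
        .load(cfg["path"])
    )

    # Rule-based quality gate: rows missing any declared key column
    # are quarantined rather than published.
    null_pred = F.lit(False)
    for c in cfg.get("key_columns", []):
        null_pred = null_pred | F.col(c).isNull()
    bad, good = df.filter(null_pred), df.filter(~null_pred)

    good.write.format("delta").mode("append").saveAsTable(cfg["target_table"])
    bad.write.format("delta").mode("append").saveAsTable(
        cfg["target_table"] + "_quarantine"
    )

    # Per-run summary that a batch-audit table or alerting job could consume.
    return {
        "source": cfg["name"],
        "rows_in": df.count(),
        "rows_quarantined": bad.count(),
    }

# Driver loop: one config file, many sources, identical processing path.
with open("sources.json") as fh:
    for entry in json.load(fh):
        print(ingest(entry))
```

The point of the pattern is that onboarding a new source means adding a metadata entry, not writing a new pipeline.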

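On the orchestration side, here is a minimal sketch of an Airflow DAG with retries, a failure-alerting callback, and an explicit dependency chain, assuming Airflow 2.4+. The DAG id, task names, and `notify_on_failure` hook are illustrative placeholders, not part of the posting.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def notify_on_failure(context):
    # Placeholder: wire this to email/Slack/PagerDuty in a real deployment.
    print(f"Task {context['task_instance'].task_id} failed")

with DAG(
    dag_id="daily_claims_ingest",                 # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",                         # nightly batch window
    catchup=False,
    default_args={
        "retries": 3,                             # automatic retry policy
        "retry_delay": timedelta(minutes=10),
        "on_failure_callback": notify_on_failure, # alerting hook
    },
) as dag:
    land = PythonOperator(task_id="land", python_callable=lambda: None)
    conform = PythonOperator(task_id="conform", python_callable=lambda: None)
    publish = PythonOperator(task_id="publish", python_callable=lambda: None)

    land >> conform >> publish                    # explicit dependency chain
```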