Join a global healthcare biopharma company in Hyderabad, India as a Specialist, GSF DnA Data Engineer. Drive innovation and execution excellence by designing, building, and operating production-grade data platforms and pipelines. Partner with analytics, data science, and business stakeholders to translate requirements into robust datasets. Deliver reliable, governed, secure, and analytics-ready data by implementing modern data warehousing and lakehouse patterns on AWS and Databricks.
Requirements
- Design, build, and operate batch and streaming data pipelines to ingest data from multiple sources into an AWS data lake / lakehouse and data warehouse.
- Develop and maintain ETL/ELT transformations using Python, PySpark, and SQL; optimize jobs for performance, cost, and reliability.
- Partner with Data Analysts, Data Scientists, and business stakeholders to understand use cases and deliver curated, analytics-ready datasets and features.
- Implement data quality controls (validation rules, reconciliation, anomaly checks), define SLAs/SLOs, and contribute to metadata, lineage, and data catalog practices.
- Use orchestration and observability to run pipelines reliably (e.g., Databricks Workflows, AWS Step Functions, scheduling, logging, monitoring, alerting).
- Apply engineering best practices: unit/integration testing, automated data tests, code reviews, and quality gates within CI/CD.
- Model and publish data for BI/analytics using dimensional modeling (star/snowflake), facts & dimensions, and slowly changing dimensions (SCD).
- Write and tune advanced SQL for profiling, transformations, and performance troubleshooting across large datasets.
- Build on AWS using services such as S3, Glue, Lambda, Step Functions, EMR, and CloudWatch; follow security best practices (IAM, encryption, least privilege).
- Provision and manage cloud resources using Infrastructure as Code (e.g., Terraform) across dev/test/prod environments.
- Package and deploy workloads using Docker (and, where applicable, ECS/Fargate); manage dependencies and runtime configurations.
- Use GitHub for version control (branching strategies, pull requests, code reviews) and set up CI/CD for automated build, test, and deployment.
- Develop scalable processing on Databricks / Apache Spark using PySpark and lakehouse concepts (e.g., Delta Lake, ACID, schema evolution).
- Use notebooks (e.g., Jupyter/Databricks) for exploration and PoCs, then productionize solutions with reusable modules, tests, and deployment pipelines.
- Work in an Agile delivery model (planning, daily syncs, reviews, retros), providing accurate estimates and proactively managing risks and dependencies.
- Create and maintain technical documentation (data contracts, pipeline specs, runbooks) and support operational handoffs.
Benefits
- Generous Paid Time Off
- 401k Matching
- Retirement Plan
- Visa Sponsorship
To apply for this job, please visit msd.wd5.myworkdayjobs.com.
