Specialist, GSF DnA Data Engineer

Hybrid | Full Time | Hyderabad, Telangana, India | MSD

Join a global healthcare biopharma company in Hyderabad, India as a Specialist, GSF DnA Data Engineer. Drive innovation and execution excellence by designing, building, and operating production-grade data platforms and pipelines. Partner with analytics, data science, and business stakeholders to translate requirements into robust datasets. Deliver reliable, governed, secure, and analytics-ready data by implementing modern data warehousing and lakehouse patterns on AWS and Databricks.

Requirements

  • Design, build, and operate batch and streaming data pipelines to ingest data from multiple sources into an AWS data lake / lakehouse and data warehouse.
  • Develop and maintain ETL/ELT transformations using Python, PySpark, and SQL; optimize jobs for performance, cost, and reliability.
  • Partner with Data Analysts, Data Scientists, and business stakeholders to understand use cases and deliver curated, analytics-ready datasets and features.
  • Implement data quality controls (validation rules, reconciliation, anomaly checks), define SLAs/SLOs, and contribute to metadata, lineage, and data catalog practices.
  • Use orchestration and observability to run pipelines reliably (e.g., Databricks Workflows, AWS Step Functions, scheduling, logging, monitoring, alerting).
  • Apply engineering best practices: unit/integration testing, automated data tests, code reviews, and quality gates within CI/CD.
  • Model and publish data for BI/analytics using dimensional modeling (star/snowflake), facts & dimensions, and slowly changing dimensions (SCD).
  • Write and tune advanced SQL for profiling, transformations, and performance troubleshooting across large datasets.
  • Build on AWS using services such as S3, Glue, Lambda, Step Functions, EMR, and CloudWatch; follow security best practices (IAM, encryption, least privilege).
  • Provision and manage cloud resources using Infrastructure as Code (e.g., Terraform) across dev/test/prod environments.
  • Package and deploy workloads using Docker (and where applicable ECS/Fargate); manage dependencies and runtime configurations.
  • Use GitHub for version control (branching strategies, pull requests, code reviews) and set up CI/CD for automated build, test, and deployment.
  • Develop scalable processing on Databricks / Apache Spark using PySpark and lakehouse concepts (e.g., Delta Lake, ACID, schema evolution).
  • Use notebooks (e.g., Jupyter/Databricks) for exploration and PoCs, then productionize solutions with reusable modules, tests, and deployment pipelines.
  • Work in an Agile delivery model (planning, daily sync, reviews, retros), providing accurate estimates and proactively managing risks/dependencies.
  • Create and maintain technical documentation (data contracts, pipeline specs, runbooks) and support operational handoffs.

Benefits

  • Generous Paid Time Off
  • 401k Matching
  • Retirement Plan
  • Visa Sponsorship

To apply for this job please visit msd.wd5.myworkdayjobs.com.

