Design and build automated ETL/ELT workflows, ingest data from various sources, transform and standardize data formats, and ensure HIPAA and GDPR compliance.
Requirements
- Pipeline Development: Build and maintain robust data pipelines using Python, SQL, and Spark to process large-scale healthcare claims and salary survey data.
- Data Normalization: Develop logic to clean and standardize diverse data formats (see the PySpark sketch after this list).
- Languages: Expert-level SQL and Python (specifically for data manipulation via Pandas/PySpark).
- Big Data Tools: Hands-on experience with Databricks, Snowflake, or Hadoop ecosystems.
- Orchestration: Experience with Airflow or Azure Data Factory for managing complex job dependencies (see the Airflow sketch below).
- Modeling: Understanding of Star/Snowflake schemas and Data Vault 2.0 for long-term analytical storage (see the star-schema sketch below).
- Cloud Platforms: Deploy and monitor data workloads on Azure (Data Factory/Databricks) or AWS (Glue/Redshift) to ensure high availability and scalability.
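To illustrate the normalization requirement, here is a minimal PySpark sketch that reads raw claims records and standardizes identifiers, dates, and amounts. The input path and column names (member_id, claim_id, dob, claim_amount) are illustrative assumptions, not a prescribed schema.

```python
# Minimal PySpark sketch: read raw claims data and standardize formats.
# The input path and column names (member_id, claim_id, dob, claim_amount)
# are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_normalization").getOrCreate()

raw = spark.read.option("header", True).csv("/data/raw/claims/")  # hypothetical path

normalized = (
    raw
    # Trim and upper-case identifiers so downstream joins are consistent.
    .withColumn("member_id", F.upper(F.trim(F.col("member_id"))))
    # Coerce mixed date spellings into a single DATE column.
    .withColumn("dob", F.coalesce(
        F.to_date("dob", "yyyy-MM-dd"),
        F.to_date("dob", "MM/dd/yyyy"),
    ))
    # Standardize monetary values to a fixed-precision decimal.
    .withColumn("claim_amount", F.col("claim_amount").cast("decimal(12,2)"))
    .dropDuplicates(["member_id", "claim_id"])
)

normalized.write.mode("overwrite").parquet("/data/curated/claims/")
```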
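For the orchestration requirement, a minimal Airflow sketch of an ingest, normalize, and load dependency chain might look like the following. The DAG id, schedule, and task callables are placeholders, and the parameter names assume Airflow 2.x.

```python
# Minimal Airflow 2.x sketch: three placeholder tasks wired into the
# ingest -> normalize -> load dependency chain.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def _noop() -> None:
    # Placeholder body; real tasks would call the pipeline code.
    pass


with DAG(
    dag_id="claims_pipeline",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow >= 2.4 spelling of schedule_interval
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=_noop)
    normalize = PythonOperator(task_id="normalize", python_callable=_noop)
    load = PythonOperator(task_id="load", python_callable=_noop)

    # Declare dependencies so downstream tasks wait on upstream success.
    ingest >> normalize >> load
```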
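For the modeling requirement, the sketch below lays out one possible star-schema shape, issued through Spark SQL: a claims fact table keyed to a member dimension. All table and column names are assumptions for illustration.

```python
# Illustrative star schema via Spark SQL: a claims fact table referencing
# a conformed member dimension. All names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims_model").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_member (
        member_key BIGINT,
        member_id  STRING,
        dob        DATE
    ) USING parquet
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS fact_claim (
        claim_key    BIGINT,
        member_key   BIGINT,            -- joins to dim_member.member_key
        service_date DATE,
        claim_amount DECIMAL(12,2)
    ) USING parquet
""")
```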
