Citi is seeking a highly skilled and experienced Senior Data Engineer to join our dynamic and innovative technology team.
Responsibilities
- Design, build, and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks, ensuring efficient ingestion, transformation, and integration of large-scale datasets across cloud platforms.
- Cloud Data Platform Management: Implement and manage data solutions on cloud platforms (e.g., AWS, GCP, Azure).
- Big Data Technologies: Work extensively with big data frameworks and platforms such as Databricks, Snowflake, and open table formats like Apache Iceberg to process and analyze petabyte-scale datasets.
- Optimize Spark workloads and Databricks clusters by tuning jobs, managing partitioning strategies, caching, and autoscaling to improve performance, reduce processing time, and control infrastructure costs.
- Implement and manage Lakehouse architecture using Delta Lake, enforcing data quality, schema evolution, and governance (e.g., Unity Catalog), while ensuring reliable, secure, and high-quality data for analytics and downstream applications.
- Lead the design and architecture of Starburst-based data solutions, ensuring scalability, performance, and reliability for enterprise-level data platforms.
- Implement and manage data federation strategies using Starburst connectors to seamlessly integrate and query data across disparate systems (e.g., Data Lakes, RDBMS, NoSQL databases, Cloud Storage).
- Performance Optimization: Identify and resolve performance bottlenecks in data pipelines and queries. Optimize data storage and processing for cost and efficiency.
- Develop and optimize robust data pipelines with a strong focus on data governance, ensuring high data quality, comprehensive data lineage, and efficient, compliant data flow from ingestion to consumption for analytical and operational needs.
- Data Modeling and Architecture: Design and implement data models that support business intelligence, analytics, and machine learning use cases. Ensure data architecture is robust, scalable, and secure.
- AI and Machine Learning Collaboration: Partner with data scientists and AI specialists to support the development and deployment of AI models. Contribute to innovative projects involving Retrieval-Augmented Generation (RAG) and agentic AI by providing the necessary data infrastructure and support.
- Agile Methodology: Operate effectively within an Agile development environment, actively participating in sprint planning, daily stand-ups, and retrospectives to ensure iterative and timely delivery of project milestones.
- Leadership and Project Guidance: Provide technical leadership to steer the project in the right direction, making critical decisions that align with both client interests and the organization’s strategic goals. Mentor junior engineers and promote best practices.
- Stakeholder and Client Interaction: Serve as a key point of contact for stakeholders and clients. Effectively communicate project progress, manage expectations, and translate complex business requirements into actionable technical tasks.
Requirements
- Core Data Technologies: Expert-level proficiency with Python and its data ecosystem (e.g., Pandas, NumPy, Dask). Experience should include writing production-grade code for data processing, automation, and API development.
- PySpark: Extensive hands-on experience with the Spark framework, including deep knowledge of the DataFrame API, Spark SQL, and performance tuning techniques for distributed data processing.
- Databricks: Proven experience developing on the Databricks Lakehouse Platform, including proficiency with Delta Lake, structured streaming, and optimizing Spark jobs within the Databricks environment.
- Ab Initio: Strong, practical experience with the Ab Initio suite of products (GDE, Co>Operating System, Conduct>It) for designing and implementing enterprise-grade ETL workflows.
- Snowflake: Hands-on experience designing, building, and maintaining data warehouses in Snowflake. This includes data modeling, implementing security (RBAC), performance tuning, and utilizing features like Snowpipe and Time Travel.
- Starburst/Trino: Experience using federated query engines to provide unified access across disparate data sources. Should understand the principles of query federation and have experience connecting to various data systems.
- Apache Iceberg: Familiarity with open table formats such as Apache Iceberg for managing large analytic datasets.
- In-depth knowledge and multi-year experience with at least one major cloud provider (AWS, Google Cloud Platform, or Azure).
Benefits
- Medical, dental & vision coverage
- 401(k)
- Life, accident, and disability insurance
- Wellness programs
- Paid time off packages
- Paid holidays
To apply for this job, please visit citi.wd5.myworkdayjobs.com.