Role Overview
As a Senior Data Engineer in the AI Data Curation track, you will ensure that the data powering our AI models is high-quality, well-organized, and fit for use in model training and deployment. You will play a key role in designing and maintaining scalable data pipelines, ensuring that data is clean, relevant, and aligned with ethical and compliance standards.
What You Will Do
Design and implement data pipelines for processing, cleaning, and curating large datasets used in model training and fine-tuning. Automate data cleaning processes and ensure datasets are appropriately labeled and structured.
Why It Might Be a Fit
Assess and mitigate bias in datasets, ensuring that models are trained on diverse and representative data. Manage data storage and retrieval strategies, ensuring scalability and data consistency across different environments.
Requirements
- Bachelor’s degree in Computer Science, Data Science, or a related field
- 5+ years of experience in data engineering, data wrangling, or data curation
- Strong proficiency in Python (Pandas, NumPy) and SQL for data manipulation and querying
- Familiarity with cloud-based data storage (AWS S3, Google Cloud Storage, etc.) and distributed systems for managing large datasets
- Experience with data annotation tools and platforms for manual or semi-automated labeling
- Experience with NLP data formats, such as JSONL, text, or embeddings, and an understanding of tokenization
- Experience managing data pipelines with tools like Apache Kafka, Apache Airflow, or similar ETL tools
- Strong knowledge of AI ethics, data privacy, and compliance standards (GDPR, CCPA, etc.)
Benefits
- Comprehensive and competitive benefits program
- Medical, dental, and vision plan offerings
- Income-protection programs
- 401(k) retirement savings plan
- Competitive paid time-off programs
- Paid holidays
To apply for this job, please visit careers.tsmc.com.
