Location: Remote – Hyderabad
Client: Microsoft (via Qylis)
Experience: 4–8 years
Employment Type: Contract / Full-Time (as applicable)
Start Date: Immediate
Job Summary
We are looking for a skilled MLOps Engineer to join our AI/ML team and support the deployment, automation, and monitoring of machine learning models in production. The ideal candidate will bridge data science and DevOps, ensuring that ML workflows are robust, scalable, and secure across cloud and on-premises environments.
Key Responsibilities
- Collaborate with Data Scientists and Engineers to build and automate ML pipelines (training, validation, deployment, monitoring)
- Develop and maintain CI/CD pipelines tailored to the ML model lifecycle
- Containerize ML models using Docker and orchestrate deployment with Kubernetes
- Implement model versioning, drift detection, monitoring, and rollback strategies
- Integrate ML pipelines with data versioning and experiment tracking tools (DVC, MLflow, Weights & Biases, etc.)
- Optimize resource utilization for training and inference jobs
- Automate data ingestion workflows and feature store updates
- Ensure models meet compliance, auditability, and reproducibility standards
- Manage ML environments and dependency isolation (virtualenv, conda, etc.)
- Work with managed cloud ML services such as AWS SageMaker, Azure ML, or GCP Vertex AI
Required Skills
- Strong experience with MLOps tools and frameworks (MLflow, Kubeflow, Airflow, TFX, etc.)
- Experience with CI/CD tools (GitLab CI/CD, Jenkins, CircleCI, etc.)
- Proficiency in containerization and container orchestration (Docker, Kubernetes)
- Solid Python programming skills and familiarity with ML libraries (scikit-learn, TensorFlow, PyTorch)
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK, etc.)
- Hands-on experience with cloud platforms (AWS, Azure, GCP) and infrastructure as code (Terraform, CloudFormation)
Preferred Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
- Experience working with GPU/TPU-enabled training environments
- Exposure to feature stores, data lineage, and model governance frameworks
- Certification in cloud platforms (AWS/Azure/GCP) or MLOps is a plus