ML Ops Engineer

Bahrain

Responsibilities:

  • Design and implement data pipelines and engineering infrastructure to support enterprise machine learning systems at scale.
  • Work closely with data scientists and engineering teams to deploy, monitor, and optimize machine learning models in production.
  • Identify, evaluate, and integrate new technologies to enhance performance, maintainability, and reliability of machine learning solutions.
  • Apply software engineering best practices to machine learning pipelines, including CI/CD, automation, monitoring, and version control.
  • Manage cloud infrastructure (AWS, Azure, GCP) and containerization (Docker, Kubernetes) to ensure scalable and efficient ML workloads.
  • Implement and maintain highly available and scalable machine learning environments.
  • Ensure the security and compliance of machine learning systems, adhering to governance and industry regulations.
  • Troubleshoot machine learning models and infrastructure and tune them to improve performance.
  • Collaborate with IT and operational technology (OT) teams to ensure seamless integration of machine learning systems.
  • Use Infrastructure as Code (Terraform, CloudFormation) to automate the management and provisioning of infrastructure.
  • Implement automated processes for deployment, monitoring, logging, and performance tracking.


Skills

Required Skillsets:

  • ML Model Deployment & Containerization: Strong experience with Docker and Kubernetes.
  • Cloud Platforms: Expertise in AWS, Azure, or Google Cloud Platform (GCP).
  • DevOps Practices: In-depth knowledge of DevOps, CI/CD pipelines, and automation techniques.
  • Monitoring & Logging: Proficiency in setting up monitoring and logging for ML models and infrastructure.
  • Version Control: Expertise in Git or other version control systems.
  • IT-OT Integration: Experience integrating IT and OT systems.
  • Scalability & High Availability: Proven track record of designing scalable, highly available machine learning infrastructure.
  • Security & Compliance: Understanding of security protocols, compliance frameworks, and governance.
  • Infrastructure as Code (IaC): Proficiency with Terraform or CloudFormation for automating infrastructure management.
  • Scripting: Strong skills in Python or Bash scripting for automation.
  • Data Engineering: Familiarity with data engineering workflows and handling large datasets.
  • Troubleshooting: Excellent problem-solving and troubleshooting abilities in distributed systems.


Date posted: Today
Posted by: Bayt