We are looking for an Associate Site Reliability Engineer or Site Reliability Engineer to join our team in Guadalajara, Mexico.

Responsibilities:
- Maximize system uptime and availability, ensuring that functional and performance SLAs are met.
- Establish end-to-end monitoring and alerting on all critical aspects.
- Solve complex problems for critical services and build automation to prevent problem recurrence.
- Influence and create new designs, architectures, standards, and methods for supporting the platform.
- Initiate and lead scripting and automation to streamline system updates and upgrades.
- Set up critical infrastructure, tools, and frameworks to streamline the deployment cycle.
- Work cross-functionally with Services and Engineering teams.

Qualifications:
- BS or MS in Computer Science, related field, or equivalent professional experience.
- Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.
- Expertise in Linux operating systems, networking, and database concepts.
- Experience deploying, upgrading, and troubleshooting Kubernetes clusters and workloads.
- Experience with Cassandra (or another NoSQL alternative).
- Expertise in cloud providers, such as Amazon Web Services, Azure, and GCP.
- Experience with configuration management systems such as Puppet.
- Experience using Bash or Python to automate and monitor systems.
- Experience with IaC tools like Ansible or Terraform.
- Excellent problem-solving, critical thinking, and communication skills.
- Experience supporting commercial SaaS solutions as a DevOps engineer or system administrator.

C3 AI provides a competitive compensation package and excellent benefits.