About The Job:
At Oracle, we're seeking a talented and skilled Site Reliability Engineer to work on the Oracle Cloud Observability and Management platform.
As a Site Reliability Engineer, you will solve interesting technical challenges by designing, deploying, and troubleshooting key cloud services, platforms, and infrastructure, always thinking about reliability, scalability, resilience, security, and performance. You will understand the full stack of the services you support (network to application) and be able to dig deep into a service to determine how best to mitigate customer impact. You will also drive improvements by developing tools and engaging partner teams to drive down incident counts, reduce the severity of events, and minimize downtime.
As a member of the O&M Site Reliability team, you will be surrounded by "willing to help" individuals representing some of the brightest and most innovative minds in the industry. You will be part of an organization that prides itself on providing training, empowerment, and career progression. Our team provides 24x7x365, follow-the-sun coverage while pushing the boundaries of what can be accomplished in the cloud. Advancing cloud computing brings great growth opportunities and highly rewarding experiences in our expanding computing environments and SRE team.
What You Need to Have:
- 2+ years of experience.
- A BE/BTech or ME/MTech in Computer Science, or an equivalent educational background.
- The successful Site Reliability Engineer should be highly motivated, dig deep into solving problems and be able to work independently.
- They should also be able to collaborate successfully with partner teams and stakeholders.
- Good analytical and problem-solving skills with a strong customer-service orientation.
- Ability to work effectively in a multi-location team.
- Excellent communication and interpersonal skills.
- Able to work as part of a 24x7x365 operations team.
Software skills:
- Strong scripting skills (in Java, Python, Shell, or equivalent).
- Well versed in microservices architecture, Linux administration, and Oracle Database.
- Clear understanding of CI/CD pipeline concepts.
- Knowledge of cloud technologies such as Chef, Terraform, Docker, Kubernetes, and Solr.
- Experience writing automation utilities to streamline workloads.
- Ability and willingness to learn quickly in a dynamic environment.
- Ability to participate in technical discussions and communicate clearly.
Career Level - IC2
Responsibilities:
- Ensure the availability of our cloud services 24x7x365
- Leverage excellence in communication, technical/business analysis, problem solving and attention to detail to methodically resolve issues.
- Understand the full stack of the services you support (network to application) and dig deep into a service to determine how best to mitigate customer impact.
- Drive improvements through the development of tools and engage partner teams to drive down incident counts, reduce severity of events and minimize downtime.
- Triage and troubleshoot service-impacting events.
- Identify opportunities for automation, signal-noise reduction, elimination of recurring issues, and other actions that reduce the time to mitigate service-impacting events and increase the productivity of cloud operations and development resources, and work with engineering to implement them.