We look for the type of engineer who can't walk past a problem. When you find an error or inefficiency, does it become your mission to ensure you never see its like again? If you are nodding your head, this is the job for you!
On the Continual Service Improvement (CSI) team, one of our primary purposes is to discover and address recurring issues through the CSI process. This program serves Oracle at large by continually improving the effectiveness and efficiency of SaaS service delivery. In this role, you'll encounter multifaceted duties and responsibilities ranging from customer-facing non-technical communication to technical problem-solving and trend mapping. We work across departments and levels with individuals ranging from Technical Support to VPs of Cloud Operations.
We love people who can come into the team and point out things we haven't considered, and who are independent, proactive problem-solvers but still thrive in team environments.
**Some of what we do**:
- Provide leadership in responding to and managing the resolution of customer-facing critical issues in production environments following best practices of Incident Management
- Own the process and tools used to support timely and accurate notifications for stakeholders at all levels during and after events
- Conduct timely in-geo postmortems and deliver a root cause, action items, and other metrics for all customer-impacting issues
- Collaborate with others to resolve problems, handle requests, and manage issues across multiple locations
- Identify opportunities and take ownership of automation and/or continuous improvement of processes, procedures, monitoring, and best practices within CSI and among operational teams
- Manage projects addressing long-term, permanent resolutions for major issues
- Generate reports and metrics to be used in progress meetings and goal setting
- Maintain and enhance documentation for team and departmental use
**What we love to see**:
- Excellent written and verbal communication, conflict resolution, and meeting management
- Excellent troubleshooting skills
- Proven ability to synthesize information - can you read between the lines?
- Motivation to work quickly and accurately under pressure in time-critical crises
- Data analysis experience (Excel, ELK/Kibana/OpenSearch, Grafana, Tableau, etc.)
**What we like to see**:
- Experience in technical troubleshooting, with broad expertise in core infrastructure technologies (e.g. server, compute, storage, network, authentication, databases)
- Scripting experience in Python, R, Bash, Perl, Ruby, or similar
- 3-5 years' experience working in large-scale production ops environments providing mission-critical services to customers (we've held titles such as Site Reliability Engineer, Service Delivery Engineer, Technical Support, Program Analyst, and many others)
- Experience in incident and/or problem management roles
- Familiarity with service improvement methodologies like Agile, Six Sigma, or ITIL
- Demonstrated ability to effectively build/train/teach others new information and processes
Our group is global, with members in Austin, San Jose, Brno, Sydney, Manila, Athens, Guadalajara, and Melbourne. Despite the distance, the team is a close-knit and collaborative group. Each member brings a unique skill set to create a robust and knowledgeable team. What will you bring to the table?
Career Level - IC3