Staff Site Reliability Engineer

Role Description

As a Staff Site Reliability Engineer (Staff SRE) at SailPoint, you will be a key member of our Reliability Engineering team, driving reliability practices for the Identity Security Cloud platform. You are immensely passionate about reliability practices and operational excellence.

Make it easy for everyone to create, consume, manage, and scale reliable cloud production services to achieve more
Keep up with industry trends to improve end-to-end reliability and maintainability for all services
Coach engineering teams on observability best practices such as setting up well-defined Service Level Objectives (SLOs)
Analyze performance of services and recommend infrastructure/code changes that will improve capacity and performance
Enable our engineering teams to scale our enterprise operations by providing guidance, best practices, and support as part of an SRE Center of Excellence
Manage cross-functional requirements working with Engineering, Product, Services, and other departments
Mentor teams on quality in design reviews, code, test cases, automation, observability, root cause analysis, and self-healing
Influence architectural design, implementation, consolidation, and simplification for global scale
Drive operational excellence to deliver frictionless operations, a healthy on-call experience, and an optimal customer experience

Qualifications

8+ years of experience in SRE or DevOps production operations supporting a highly available environment for a SaaS product or cloud service provider
Strong proficiency with one or more programming languages (Java, Python, Go, etc.)
A bachelor’s degree in Computer Science or another technical discipline, or equivalent experience, is preferred but not required

Requirements

Due to FedRAMP requirements, US Citizenship is required to be considered for this role
Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code, preferably Terraform
Strong proficiency with containerization technology and/or Kubernetes
In-depth experience with metrics, tracing, and logging observability tools such as Prometheus, Grafana, Honeycomb, and Kibana
Experience with incident management, including conducting incident reviews
Strong understanding of Linux, software development, systems, networking, and cloud concepts
A positive and collaborative demeanor, combined with the ability to coach, mentor, and delegate
Excellent communication skills
Life-long learner – you stay up to date with technology trends, spend time learning new technologies, and share your learnings with your team

Benefits

Health and wellness coverage: Medical, dental, and vision insurance
Disability coverage: Short-term and long-term disability
Life protection: Life insurance and Accidental Death & Dismemberment (AD&D)
Flexible spending accounts for health care and dependent care; limited-purpose flexible spending account
Financial security: 401(k) Savings and Investment Plan with company matching
Time off benefits: Flexible vacation policy
Holidays: 8 paid holidays annually
Sick leave
Parental support: Paid parental leave
Employee Assistance Program (EAP) and Care Counselors
Voluntary benefits: Legal Assistance, Critical Illness, Accident, Hospital Indemnity, and Pet Insurance options
Health Savings Account (HSA) with employer contribution
