Staff Site Reliability Engineer



Role Description

As a Staff Site Reliability Engineer (Staff SRE) at SailPoint, you will be a key member of our Reliability Engineering team, driving reliability practices for the Identity Security Cloud platform. You are passionate about reliability practices and operational excellence.

  • Make it easy for everyone to create, consume, manage, and scale reliable cloud production services to achieve more
  • Keep up with industry trends to improve end-to-end reliability and maintainability for all services
  • Coach engineering teams on observability best practices such as setting up well-defined Service Level Objectives (SLOs)
  • Analyze performance of services and recommend infrastructure/code changes that will improve capacity and performance
  • Enable our engineering teams to scale our enterprise operations by providing guidance, best practices, and support as part of an SRE Center of Excellence
  • Manage cross-functional requirements working with Engineering, Product, Services, and other departments
  • Mentor teams on quality in design reviews, code, test cases, automation, observability, root cause analysis, and self-healing
  • Influence architectural design, implementation, consolidation, and simplification for global scale
  • Drive operational excellence to deliver frictionless operations, a healthy on-call experience, and an optimal customer experience

Qualifications

  • 8+ years of experience in SRE or DevOps production operations supporting a highly available environment for a SaaS software or cloud service provider
  • Strong proficiency with one or more programming languages (Java, Python, Go, etc.)
  • Bachelor’s degree in Computer Science or another technical discipline, or equivalent experience, is preferred but not required

Requirements

  • Due to FedRAMP requirements, US Citizenship is required to be considered for this role
  • Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code, preferably Terraform
  • Strong proficiency with containerization technology and/or Kubernetes
  • In-depth experience with metrics, tracing, and logging observability tools such as Prometheus, Grafana, Honeycomb, and Kibana
  • Experience with incident management, including conducting incident reviews
  • Strong understanding of Linux, software development, systems, networking, and cloud concepts
  • A positive and collaborative demeanor, combined with the ability to coach, mentor, and delegate
  • Excellent communication skills
  • Life-long learner – you stay up to date with technology trends, spend time learning new technologies, and share your learnings with your team

Benefits

  • Health and wellness coverage: Medical, dental, and vision insurance
  • Disability coverage: Short-term and long-term disability
  • Life protection: Life insurance and Accidental Death & Dismemberment (AD&D)
  • Flexible spending accounts for health care and dependent care; limited-purpose flexible spending account
  • Financial security: 401(k) Savings and Investment Plan with company matching
  • Time off benefits: Flexible vacation policy
  • Holidays: 8 paid holidays annually
  • Sick leave
  • Parental support: Paid parental leave
  • Employee Assistance Program (EAP) and Care Counselors
  • Voluntary benefits: Legal Assistance, Critical Illness, Accident, Hospital Indemnity and Pet Insurance options
  • Health Savings Account (HSA) with employer contribution
