Our client is an expert provider that employs top-tier technology and industry best practices to develop, execute, and manage ICT-driven business solutions. The solutions they provide are designed to align with business strategies, meet intended purposes, save costs, and maximise performance.
Role Responsibilities
- Deliver and maintain production-grade systems that follow Site Reliability Engineering principles.
- Create, develop, and maintain cloud migration patterns for maximum reuse.
- Work in an Agile team, closely guiding and assisting engineering teams in delivering business functionality.
- Maintain a calm temperament in the face of potential incidents and exhibit good interpersonal skills when communicating with clients and team members.
- Drive highly available and resilient architecture decisions and implement them as part of a team effort.
- Mentor and advise junior SREs as they grow in the field.
Role Requirements
- Bachelor's Degree (or equivalent) in IT, Computer Science, or Engineering
- AWS Solutions Architect Associate certification preferred
- Azure Fundamentals certification optional
- Google Cloud Associate Cloud Engineer certification optional
- 7+ years of IT industry experience in Implementation and Consulting roles
- 5+ years of experience in Enterprise IT and Infrastructure across multiple technologies
- 3+ years of experience in AWS cloud implementations
- 2+ years of experience in scripting or shell scripting
- Experience with configuration management tools and Infrastructure-as-Code tooling and practices
- Experience with systems monitoring, alerting, and analytics using tools such as New Relic, Graphite, ELK, EFK, Nagios, Ganglia, Grafana, and Prometheus
- Experience in implementing SLIs and maintaining SLOs/SLAs, incident management, on-call responsibilities, etc.
- Experience in production readiness reviews of microservice workloads and toil automation using programming languages
- Knowledge of intersystem integration mechanisms such as REST APIs, SSH file delivery, and site-to-site VPNs
- Solid understanding of distributed systems, service architectures, cloud native systems, and related trade-offs to contribute to feature and service design
- Familiarity with the implementation of monitoring and observability solutions such as logging, metrics, etc.
- Experience with public cloud (AWS/Azure/GCP) preferred
- Experience with local infrastructure (virtualization, *NIX systems)
- Knowledge of important networking concepts such as HTTP and REST, SSL/TLS, SSH, etc.
- Experience with containers and CNCF tooling, such as Docker and Kubernetes
- Proven experience with production systems and dealing with production issues
- Ability to think out-of-the-box to solve infrastructure and operational problems
Desired Skills
- Site Reliability Engineer
- DevOps
- SDLC