# SRE Lead & Monitoring Consultant
## Key Responsibilities
### SRE Practice Development
- Assess operational maturity and build SRE transformation roadmap
- Establish SLOs, SLIs, and error budgets for critical services
- Design incident management processes and on-call strategies
- Implement chaos engineering and resilience testing
- Mentor teams on SRE principles and best practices

### Monitoring & Observability
- Deploy and configure Datadog, Splunk, Grafana, and Prometheus
- Implement metrics collection, log aggregation, and APM
- Build custom dashboards and alerting configurations
- Set up anomaly detection and intelligent alerting
- Configure automated health checks and remediation
- Establish golden signals monitoring (latency, traffic, errors, saturation)

### Reliability & Compliance
- Conduct reliability reviews and performance optimization
- Design disaster recovery and failover procedures
- Implement security monitoring and audit logging
- Configure fraud detection and transaction monitoring
- Create runbooks and operational documentation
## Required Qualifications
### Experience
- 7+ years in Site Reliability Engineering, DevOps, or infrastructure engineering
- 3+ years in SRE leadership roles
- 3+ years of hands-on experience with Datadog, Splunk, Grafana, and Prometheus
- Previous experience in fintech or regulated industries
- Proven track record of building SRE practices from scratch

### Technical Skills
- Deep understanding of SRE principles, error budgets, and SLO/SLI frameworks
- Expertise with cloud platforms (AWS, Azure, or Google Cloud Platform)
- Proficiency with Kubernetes, Docker, and infrastructure as code (Terraform, Ansible)
- Strong programming/scripting skills (Python, Go, Bash)
- Experience with incident management and post-mortem culture
- Knowledge of compliance requirements (SOC 2, PCI-DSS, ISO 27001)

### Soft Skills
- Exceptional leadership and mentoring abilities
- Strong communication and stakeholder management
- Data-driven decision-making approach
- Collaborative mindset with ability to drive cultural change
## Preferred Qualifications
- Cloud certifications (AWS, Google Cloud Platform, Azure) or Kubernetes certifications (CKA/CKAD)
- Experience with ELK stack
- Background in cloud cost optimization
- Multi-cloud or hybrid cloud experience
## Deliverables
- SRE maturity assessment and transformation roadmap
- Fully configured monitoring stack with Datadog, Splunk, Grafana, and Prometheus
- SLO/SLI definitions and error budgets
- Custom dashboards, alerting, and automated remediation
- Incident management framework and runbooks
- Chaos engineering test suite