Site Reliability Engineer
Posted on: October 10, 2019
Purpose of Job
The primary purpose of site reliability engineering
at USAA is to improve and sustain the reliability of USAA's most
critical IT systems. The role is essential in helping establish and
measure service level objectives for critical systems. In addition,
SREs will continuously identify engineering and automation
opportunities to effectively manage production systems at scale. An
SRE will model a blameless culture through effective post mortems
and a focus on minimizing the impact of outages. Site
reliability engineers at USAA hold the job title of Software
Developer and Integrator (SDI). SDIs are also engaged in all
phases of the software development lifecycle, including
gathering and analyzing user/business system requirements,
responding to outages and creating application system models. SDIs'
primary functions are to design, develop, document, test and debug
new and existing software systems and/or applications for internal
use, and to perform defect corrections (analysis, design, code). In
addition, SDIs participate in design meetings and consult with
business clients to refine, test, and debug programs to meet
business needs, and interact with and sometimes direct third-party
partners in the achievement of business and technology initiatives.
This is a solid, career-level role in which functional and
technical proficiency has been obtained, and incumbents display a
depth of technical understanding within their respective areas of
specialization allowing them to operate independently. Incumbents
also display a proficiency that allows them to begin to mentor
others (third party and internal resources) on procedural matters.
Job Requirements
* Work with application and system SMEs to design
highly scalable and resilient distributed systems
* Create Service Level Objectives to measure and manage core
infrastructure and critical services
* Analyze, troubleshoot and fix core infrastructure or critical
systems when they fail or degrade
* Write custom code or scripts to automate repetitive or manual
system support tasks
* Lead technical post mortems to identify lessons learned and
* Partner with technical teams and product owners to ensure
resiliency work is developer-ready
* Design and execute failure injection tests to verify adequate
system capacity and resiliency
* Champion Site Reliability Engineering practices across IT
* Independently installs, customizes and integrates commercial
* Facilitates root cause analysis of system issues.
* Works with experienced team members to conduct root cause
analysis of issues, review new and existing code and/or perform
* Learns to create system documentation/play books and attends
requirements, design and code reviews.
* Receives work packages from manager and/or delegates.
* Identifies ideas to improve system performance and impact
* Resolves complex technical design issues.
* Creates system documentation/play book(s) and participates as a
reviewer and contributor in requirements, design and code
* May serve as the subject matter expert on development
* Partners with experienced team members to develop accurate work
estimates on work packages.
* May serve as a mentor on procedural matters to less experienced
internal and third party team members.
* May assist experienced team members with the delegation of work
packages.
Minimum Experience:
* Bachelor's degree; 4 additional years of related experience
beyond the minimum required may be substituted in lieu of a
degree.
* 4+ years of software development experience demonstrating depth
of technical understanding within a specific IT
discipline(s)/technology(ies), to include relevant business support
and/or general information technology support experience
* Working knowledge of systems administration and/or systems
* Strong interest in monitoring, optimizing, scaling and
troubleshooting large distributed systems
* Qualifications may warrant placement in a different job level*
When you apply for this position, you will be required to answer
some initial questions. This will take approximately 5 minutes.
Once you begin the questions, you will not be able to finish them at
a later time and you will not be able to change your responses.
Preferred Experience:
* 4+ years of experience managing large scale
production environments (1000+ servers) and experience with
production support of applications in large scale environments
* Demonstrated experience influencing and selling new ideas to
peers, leadership, and senior management
* Experience in one or more of the following: C, C++, Java or
* Strong experience or working knowledge of end-to-end IT systems
(compute, storage, network, security, application runtime,
relational databases, REST services, asynchronous messaging,
* Strong troubleshooting skills and experience developing
* Demonstrated experience building SLO-based monitoring
* Strong teaming and collaboration skills
The above description
reflects the details considered necessary to describe the principal
functions of the job and should not be construed as a detailed
description of all the work requirements that may be performed in
the job. At USAA, our employees enjoy one of the best benefits
packages in the business, including a flexible business casual or
casual dress environment, comprehensive medical, dental and vision
plans, along with wellness and wealth building programs.
Additionally, our career path planning and continuing education
will assist you with your professional goals. Relocation assistance
is not available for this position.
Keywords: USAA, Avondale, Site Reliability Engineer, Professions, Avondale, Arizona