Senior Site Reliability Engineer
Stord
This job is no longer accepting applications
Stord is the leading commerce enablement provider of fulfillment services and technology that powers seamless checkout and delivery experiences for high-volume mid-market and enterprise brands across all channels. Stord manages over $5 billion of commerce annually through its fulfillment, warehousing, transportation, and operator-built software suite including OMS, Pre- and Post-Purchase, and WMS platforms. With Stord, brands can sell more, save money, and reduce headaches.
With Stord, brands can increase cart conversion, improve unit economics, and drive customer loyalty. Stord’s end-to-end commerce solutions combine best-in-class omnichannel fulfillment and shipping with leading technology to ensure fast shipping, reliable delivery promises, easy access to more channels, and improved margins on every order.
Hundreds of leading DTC and B2B companies like AG1, Native, Tula, American Giant, and more trust Stord to make their supply chains a competitive advantage. Stord is headquartered in Atlanta with facilities across the United States, Canada, and Europe. Stord is backed by top-tier investors including Kleiner Perkins, Franklin Templeton, Founders Fund, and Salesforce Ventures.
At Stord, we believe in fostering a culture of innovation, collaboration, and continuous learning. We value transparency, ownership, and a growth mindset. Our team is passionate about solving complex challenges and building impactful solutions. We are committed to creating an inclusive environment where everyone can thrive and contribute their best work. We encourage open communication, feedback, and a strong sense of community.
You will collaborate with all of our dedicated product development teams, including our Warehouse Management System (WMS) team, Parcel Billing team, and Order Management System (OMS) team. The OMS team builds and maintains the system that powers our innovative, world-class logistics network, enabling leading omni-channel brands to achieve significant optimizations and cost savings. Through advanced automation, intelligent routing, and robust order management tools, it ensures that orders are fulfilled in alignment with each client’s unique goals and objectives. The Parcel Billing team maintains our internal billing platform, responsible for billing customers for fulfillment services and parcel shipping costs. This platform manages the rating, billing, and auditing of millions of packages each week. The WMS team is responsible for maintaining our warehouse management system, which supports our fulfillment centers and manages the fulfillment of tens of millions of packages annually. This system is designed to enhance labor productivity, reduce costs, and maintain inventory accuracy.
What You'll Do:
Reliability Engineering: Design, implement, and manage scalable, reliable, and highly available systems and infrastructure.
Automation: Develop and maintain automation tools and scripts for deployment, monitoring, and incident response.
Monitoring & Alerting: Implement comprehensive monitoring and alerting systems to proactively identify and resolve issues.
Incident Response: Lead incident response efforts, including troubleshooting, root cause analysis, and post-incident reviews.
Performance Optimization: Identify and address performance bottlenecks, optimizing systems for efficiency and scalability.
Collaboration: Work closely with development teams to ensure systems are designed for reliability and operability.
Documentation: Create and maintain documentation for systems, processes, and procedures.
Security: Implement security best practices to protect systems and data.
What You'll Need:
Proven experience as a Senior Site Reliability Engineer or in a similar role.
Expertise in designing, building, and maintaining large-scale distributed systems.
Strong proficiency in scripting languages (e.g., Python, Bash).
Extensive experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes).
Solid understanding of networking, operating systems, and system architecture.
Proficient in using monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Demonstrated ability to lead incident response and perform root cause analysis.
Excellent problem-solving skills and the ability to navigate ambiguity in a fast-paced environment.
Strong collaboration and communication skills, with a track record of working effectively in cross-functional teams.
Bonus Points:
Previous startup experience.
Previous logistics or supply chain experience.
Certifications in cloud platforms (e.g., AWS Certified DevOps Engineer, GCP Professional Cloud DevOps Engineer).