The Wormhole Foundation

Our mission is to empower passionate people in the research and development of blockchain interoperability technologies. We support teams building secure, open-source, and decentralized products within the Wormhole ecosystem.

The Role: Crypto Production Engineer

Wormhole Foundation is seeking an experienced Crypto Production Engineer to improve the reliability, security, and operational excellence of Wormhole’s production infrastructure. This role focuses on uptime, observability, deployment workflows, and incident response across critical blockchain and networking services. The Crypto Production Engineer will work closely with engineering, DevOps, and validator partners to ensure Wormhole services operate at a minimum of 99.99% uptime, excluding scheduled maintenance windows.

What you'll be doing:

- Act as first responder and incident commander during production incidents
- Lead incident triage, root cause analysis, and retrospective documentation
- Build detailed incident timelines and preventative runbooks
- Respond to incidents related to performance issues, CCQ failures or degraded throughput, observability pipeline outages, and core Wormhole products
- Deliver remediation recommendations and implement approved fixes
- Improve reliability and uptime across all Wormhole services
- Strengthen observability, monitoring, and alerting systems
- Harden infrastructure for security and operational resiliency
- Enhance deployment workflows and reduce operational friction
- Lead incident response, analysis, and continuous improvement
- Support operational tooling used by engineering, DevOps, and validator partners

Who you are:

- Relevant tertiary qualifications in computer science or a closely related field (bachelor's/master's) and/or at least five years of relevant work experience
- Established experience as incident commander across multiple stakeholders in a global team
- Familiarity with metrics and log analysis tools (e.g., Grafana), incident response tools (e.g., PagerDuty), GitHub administration, and related tools
- Deep understanding of reliability engineering, observability, and incident response for distributed systems
- Ability to write and debug code in any of the following: Go, Rust, Java
- Strong experience operating Grafana, Datadog, or Splunk, and/or Kubernetes, in production environments
- Experience securing distributed systems and public-facing infrastructure
- Ability to operate independently, document clearly, and lead during incidents
- Solid understanding of cloud computing environments (AWS and GCP preferred) and willingness to keep up to date with their changing offerings
- Excellent and proactive written and verbal communication
- The ideal candidate will be based in the ET or GMT time zone or be able to work those hours

If you don’t meet all of these criteria, we’d still love to hear from you if you think you’d be a great fit for this role!

Originally posted on Himalayas