sudo README

Posts


No results for 'undefined'

Notes


No results for 'undefined'

Mean Time to Recovery (MTTR)

  • Blue/green deploy, roll forward
  • Timeouts/deadlines for all remote calls
  • Capped exponential backoff retries with jitter at single point in stack (only for idempotent calls based on response code, monitor retries)
  • Well exercised fallbacks
  • What happens when downstream services down? Gracefully degrade, timeouts (Postgres), circuit breakers (for all sync downstream calls via Resilience4j, which also does bulkheads and load shedding)
  • For migrations, etc., two phase deploy with bake time, readers go before writers while rolling forward whereas writers go before readers while rolling backward

Rocky Warren's blog. Principal Architect, Tech Lead, Product Manager. I do other stuff too.

ResumeNotesPhotos