Modern applications rarely run as a single program anymore. Today’s systems are made of multiple services communicating with each other — APIs, payment gateways, authentication servers, notification services, and third-party integrations.
This architecture (microservices or distributed systems) provides flexibility and scalability, but it introduces a new problem:
Failure is unavoidable.
Servers go down. Networks slow down. APIs time out. External services crash.
If not handled properly, one small failure can bring down the entire application. This is called a cascading failure.
To solve this, software architecture uses resilience patterns — especially the Retry Pattern and the Circuit Breaker Pattern.
1. The Problem: Why Systems Crash
Imagine an e-commerce application:
- User places an order
- Order service calls payment service
- Payment service calls bank API
Now suppose the bank API becomes slow.
The payment service waits.
The order service waits.
User requests keep piling up.
Soon:
• Threads get blocked
• CPU usage spikes
• Database connection pools are exhausted
• The entire website crashes
The issue is not just the bank API failure — the real issue is how your system reacts to failure.
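A rough way to see why a slow dependency blocks threads is Little's law: the average number of requests in progress equals the arrival rate multiplied by the time each request takes. The sketch below applies it to the order example. The arrival rate, latencies, and thread-pool size are made-up numbers for illustration, not measurements.

```python
def in_flight_requests(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's law: average requests in progress = arrival rate x latency."""
    return arrival_rate_per_s * latency_s


# Hypothetical numbers, chosen only for illustration.
ARRIVAL_RATE = 100     # orders per second
WORKER_THREADS = 200   # size of the payment service's thread pool

for label, latency_s in [("healthy bank API", 0.2), ("slow bank API", 10.0)]:
    needed = in_flight_requests(ARRIVAL_RATE, latency_s)
    status = "ok" if needed <= WORKER_THREADS else "thread pool exhausted"
    print(f"{label}: about {needed:.0f} threads busy of {WORKER_THREADS} ({status})")
```

The traffic did not change between the two cases. Only the latency of one downstream call did, and that alone is enough to use up every worker.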
2. Retry Pattern
The Retry Pattern is the simplest resilience technique.
Instead of failing immediately when a request fails, the system tries again.
Basic Idea
If a service temporarily fails, retry after a short delay.
Examples of temporary failures:
- Network glitch
- Temporary server overload
- Short downtime
Many failures are temporary, not permanent.
Simple Retry Flow
Request → Failure → Retry → Success
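The flow above can be sketched as a small helper. This is a minimal illustration, not a production implementation: the function name `retry`, its parameters, and the simulated `flaky_payment_call` are all hypothetical, and it assumes that a transient failure shows up as Python's built-in `ConnectionError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], max_attempts: int = 3, delay_s: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds or max_attempts calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(delay_s)  # short fixed pause before the next attempt
    raise AssertionError("unreachable")


# Simulated service that fails once, then recovers (Request -> Failure -> Retry -> Success).
calls = {"count": 0}


def flaky_payment_call() -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated network glitch")
    return "payment accepted"


result = retry(flaky_payment_call, delay_s=0.01)
print(result, "after", calls["count"], "attempts")
```

The attempt limit matters as much as the retry itself: without it, a permanently failing call would loop forever.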
But retrying instantly is dangerous.
If 10,000 users retry at the same time, you create a retry storm that pushes the already struggling service further into overload.
Exponential Backoff (Important)
Instead of retrying immediately:
1st retry → after 1 second
2nd retry → after 2 seconds
3rd retry → after 4 seconds
4th retry → after 8 seconds
This is called Exponential Backoff.
It reduces load and gives the failing service time to recover.
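A minimal sketch of that schedule follows. The names `backoff_delays` and `retry_with_backoff` are hypothetical, as are the 30-second cap and the choice of which exceptions count as transient. The random jitter factor goes beyond the schedule above: it is a common addition that stops many clients from retrying at the same instant, and it is included here as an assumption rather than as part of the pattern described in this article.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base_s: float = 1.0, cap_s: float = 30.0) -> List[float]:
    """Delay before each retry: base, 2*base, 4*base, ... never above cap_s."""
    return [min(cap_s, base_s * 2 ** i) for i in range(retries)]


def retry_with_backoff(operation: Callable[[], T], retries: int = 4,
                       base_s: float = 1.0, cap_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Callable[[], float] = random.random) -> T:
    """Run operation, retrying transient errors with exponential backoff."""
    if retries < 0:
        raise ValueError("retries must not be negative")
    delays = backoff_delays(retries, base_s, cap_s)
    for attempt in range(retries + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == retries:
                raise  # all retries used up
            # Assumed extra: sleep a random fraction of the scheduled delay
            # so that clients do not all retry at the same moment.
            sleep(delays[attempt] * rng())
    raise AssertionError("unreachable")


print(backoff_delays(4))
```

Passing `rng=lambda: 1.0` turns the jitter off and reproduces the 1, 2, 4, 8 second schedule exactly.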
When to Retry
Retry only for:
• Timeouts
• Temporary network errors
• HTTP 5xx errors
Do NOT retry:
• Authentication errors (401)
• Bad request (400)
• Not found (404)
Those are permanent failures: sending the same request again will produce the same error.
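The rules above can be written as a small decision function. This is a sketch under assumptions: the name `is_retryable` is hypothetical, timeouts and network errors are assumed to surface as Python's built-in `TimeoutError` and `ConnectionError`, and any status code outside the 5xx range is treated as not retryable.

```python
from typing import Optional


def is_retryable(status: Optional[int] = None,
                 error: Optional[BaseException] = None) -> bool:
    """Return True if a failed call looks temporary and is worth retrying."""
    if error is not None:
        # Timeouts and dropped connections are usually transient.
        return isinstance(error, (TimeoutError, ConnectionError))
    if status is None:
        return False
    # 5xx means the server failed; 400, 401 and 404 mean the request itself is wrong.
    return 500 <= status <= 599


for code in (400, 401, 404, 500, 503):
    print(code, "retry" if is_retryable(status=code) else "do not retry")
```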
3. Circuit Breaker Pattern
Retry alone is not enough.
If a service is completely down, repeated retries waste resources and tie up the threads and connections your own system needs.
This is where the Circuit Breaker Pattern comes in.
Think of it like an electrical circuit breaker in your house. When there is an overload, the breaker cuts power to protect appliances.
Similarly, software circuit breakers stop requests to a failing service.
Three States of a Circuit Breaker
1. Closed State (Normal)
Requests flow normally.
Failures are monitored.
2. Open State (Failure Mode)
If failures exceed a threshold:
The circuit opens → requests are blocked immediately.
Instead of waiting for a timeout, the system returns a fallback response.
Example:
“Payment service unavailable. Try again later.”
This prevents system overload.
3. Half-Open State (Recovery Check)
After a cooldown period:
The system allows a few test requests.
If successful → circuit closes.
If failed → circuit opens again.
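The three states can be sketched as a small state machine. This is a simplified illustration, not how any particular library implements it: the class and parameter names are hypothetical, failures are counted as consecutive failures, the half-open state lets through a single test request rather than a few, and the code is not thread-safe.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = 0.0   # clock value when the circuit last opened

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_s:
                # Open: reject immediately instead of waiting for a timeout.
                raise CircuitOpenError("circuit open: request rejected")
            self.state = "half_open"  # cooldown over: allow one test request
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        # A success in any state resets the counter and closes the circuit.
        self.failures = 0
        self.state = "closed"

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed test request, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller wraps each outgoing request, for example `breaker.call(lambda: charge_card(order))` with a hypothetical `charge_card` function, and catches `CircuitOpenError` to return the fallback response.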
4. Circuit Breaker + Retry Together
Best practice is to use both patterns together.
Flow:
- Request sent
- Retry with backoff
- If failures continue → Circuit breaker opens
- Fallback response returned
- System recovers safely
Retry handles temporary failures.
Circuit breaker handles continuous failures.
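One way to wire the two patterns together is to run each attempt through the breaker and to stop retrying as soon as the breaker rejects a call. The sketch below is an assumption-laden illustration: `SimpleBreaker` is a stripped-down breaker included only to keep the example self-contained, and `call_with_resilience`, its parameters, and the exception types treated as transient are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call."""


class SimpleBreaker:
    """Tiny breaker: opens after N consecutive failures, probes after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.cooldown_s:
            raise CircuitOpenError("circuit open")
        # Closed, or cooldown elapsed: let this request through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None
        return result


def call_with_resilience(operation: Callable[[], T], breaker: SimpleBreaker,
                         fallback: Callable[[], T], retries: int = 2,
                         base_s: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient errors with backoff; use the fallback when the circuit is open."""
    for attempt in range(retries + 1):
        try:
            return breaker.call(operation)
        except CircuitOpenError:
            return fallback()  # fail fast: never retry against an open circuit
        except (TimeoutError, ConnectionError):
            if attempt == retries:
                return fallback()  # retries exhausted
            sleep(base_s * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return fallback()
```

With these settings, the first request to a dead service costs three attempts and opens the circuit. Every later request during the cooldown gets the fallback immediately, without touching the failing service.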
5. Fallback Mechanisms
When the circuit is open, the application should still respond.
Examples:
• Show cached data
• Display last known price
• Queue the request
• Show maintenance message
Example in a food delivery app:
Instead of crashing, it shows:
“Live tracking unavailable. Order is confirmed.”
The user experience degrades gracefully instead of breaking.
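A fallback such as "display last known price" can be sketched as follows. Everything here is hypothetical: the function `get_price`, the in-memory dictionary used as a cache, the shape of the returned dictionary, and the assumption that an unavailable service raises `TimeoutError` or `ConnectionError`.

```python
from typing import Callable, Dict

# Hypothetical in-memory cache of the last price seen for each item.
_last_known_price: Dict[str, float] = {}


def get_price(item_id: str, fetch_live_price: Callable[[str], float]) -> dict:
    """Return the live price, or the last known price if the service is down."""
    try:
        price = fetch_live_price(item_id)
    except (TimeoutError, ConnectionError):
        if item_id in _last_known_price:
            # Fallback 1: serve the cached value and mark it as stale.
            return {"price": _last_known_price[item_id], "stale": True}
        # Fallback 2: nothing cached, so return a clear message instead of crashing.
        return {"price": None, "stale": True,
                "message": "Price temporarily unavailable."}
    _last_known_price[item_id] = price  # remember the value for future fallbacks
    return {"price": price, "stale": False}
```

Marking the response as stale lets the user interface tell the user that the value may be out of date, rather than presenting cached data as live.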
6. Real-World Implementations
Popular tools:
Java
- Resilience4j
- Hystrix (legacy)
Node.js
- Opossum
- Cockatiel
Cloud Platforms
- AWS API Gateway retries
- Azure resilience policies
7. Benefits
Implementing these patterns provides:
• Higher availability
• Faster responses during outages (fail fast instead of waiting for timeouts)
• Lower risk of cascading failures
• Better user experience
• System stability under heavy traffic
Without resilience patterns, a microservices architecture becomes fragile.
Conclusion
Failures cannot be avoided in distributed systems, but crashes can.
The Retry Pattern helps recover from temporary problems, while the Circuit Breaker Pattern protects the system from repeated failures.
Together, they create fault-tolerant, resilient, and production-ready applications.
Modern backend development is no longer just about writing code — it is about designing systems that survive failure.


