The big downtime of the Internet
On Oct 4, 2023, one of the biggest DNS resolvers had failures. This issue led to downtime for many services:
- Clients can't connect to backends
- Backends can't connect to other backends
- Services can't connect to other services
Why is DNS so important? Because it is the dictionary of the internet: it contains the mapping between domains and real IP addresses. It's the backbone of the internet.
Ok, back to the story. Many ISPs, backends and server OSes default to a pair of DNS resolvers from the same vendor. Common pairs are:
- 188.8.131.52 and 184.108.40.206 (this is fallback)
- 220.127.116.11 and 18.104.22.168 (this is fallback)
- 22.214.171.124 and 126.96.36.199 (this is fallback)
- Many ISPs' own pairs of DNS resolvers
A fallback is only good if it doesn't depend on "the main", or on the same things "the main" depends on.
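One rough way to check this is to write down what each service needs to stay up and look for overlap. The sketch below is only an illustration: the `Service` type, the service names and the dependency labels are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    # Everything this service needs in order to stay up (hypothetical labels).
    depends_on: frozenset


def shared_dependencies(main: Service, fallback: Service) -> frozenset:
    """Return the dependencies that would take down both services at once."""
    return main.depends_on & fallback.depends_on


def is_independent(main: Service, fallback: Service) -> bool:
    """A fallback is independent only if it shares nothing with the main."""
    return not shared_dependencies(main, fallback)


# Hypothetical inventory: two resolvers of vendor A, one resolver of vendor B.
main = Service("resolver-a1", frozenset({"vendor-a", "codebase-a", "backbone-x"}))
same_vendor = Service("resolver-a2", frozenset({"vendor-a", "codebase-a", "backbone-y"}))
other_vendor = Service("resolver-b1", frozenset({"vendor-b", "codebase-b", "backbone-y"}))

print(sorted(shared_dependencies(main, same_vendor)))  # ['codebase-a', 'vendor-a']
print(is_independent(main, other_vendor))              # True
```

The same-vendor "fallback" still shares the vendor and the code with the main, so one bug or one unpaid bill takes both down.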
Every service provider will tell you to use another of their own services as the fallback, which locks your infrastructure into one vendor, and they promise it will work well as "a fallback".
What they promise:
- They can run it on another server, in another region, in another country.
- They can automatically detect the failure and switch to the "fallback" right after "the main" goes down.
But don't forget these basic things:
- The biggest bug in software services is named "No money to pay the bill". Every single service goes down if you go bankrupt or pay the bill late.
- They can use the same code for the fallback, and that code breaks.
- They can use the same infrastructure provider (like a physical backbone provider); even your vendor can be another victim of a bigger vendor lock-in.
- They forget to pay their bill, or go bankrupt.
So, what can you do:
- Use another vendor's service as your fallback (or backup). E.g.: use 188.8.131.52 as the main and 184.108.40.206 as the fallback; use AWS S3 as the main storage and another S3-like service for backup; replicate your database to another vendor's server.
- If you need safety, try to pay the bill for 2+ vendors
- Be ready for the downtime, and bring things back up quickly (monitoring, fallback strategies)
- Pay the bill on time ;)
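For the DNS case, the "main from one vendor, fallback from another" idea can be sketched in plain Python. This is a minimal sketch, not a production resolver: it only handles A records over UDP, and the function names (`query_resolver`, `resolve_with_fallback`) are assumptions made for this example. It also keeps a list of which resolver failed and why, which is the raw material for monitoring.

```python
import secrets
import socket
import struct


def build_query(hostname: str, query_id: int) -> bytes:
    """Encode a DNS query for the A record of hostname (recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(data: bytes, offset: int) -> int:
    """Return the offset just past an encoded (possibly compressed) name."""
    while True:
        length = data[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(data: bytes, query_id: int) -> list:
    """Extract IPv4 addresses from a DNS response, or raise ValueError."""
    rid, flags, qdcount, ancount = struct.unpack("!HHHH", data[:8])
    if rid != query_id:
        raise ValueError("response ID does not match query")
    if flags & 0x000F:
        raise ValueError(f"resolver returned error code {flags & 0x000F}")
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(data, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(data, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", data[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record holding an IPv4 address
            addresses.append(socket.inet_ntoa(data[offset:offset + 4]))
        offset += rdlength
    return addresses


def query_resolver(hostname: str, resolver_ip: str, timeout: float = 2.0) -> list:
    """Ask one specific resolver for the A records of hostname over UDP."""
    query_id = secrets.randbits(16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname, query_id), (resolver_ip, 53))
        data, _ = sock.recvfrom(512)
    addresses = parse_a_records(data, query_id)
    if not addresses:
        raise LookupError(f"no A record for {hostname}")
    return addresses


def resolve_with_fallback(hostname, resolvers, query=query_resolver):
    """Try each (vendor, ip) in order; return (vendor, addresses) of the first success."""
    errors = []
    for vendor, ip in resolvers:
        try:
            return vendor, query(hostname, ip)
        except Exception as exc:  # any failure means: move on to the next vendor
            errors.append(f"{vendor} ({ip}): {exc}")
    raise RuntimeError("all resolvers failed: " + "; ".join(errors))
```

To use it, call `resolve_with_fallback("example.com", [("vendor-a", MAIN_IP), ("vendor-b", FALLBACK_IP)])`, where `MAIN_IP` and `FALLBACK_IP` are placeholders for resolver addresses from two different vendors. The `query` parameter is there so the fallback logic can be tested without touching the network.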
If your budget is low, or your business is still small? - Keep YOLO and only back up the database!
The art is: balance between Safety - Effort - Benefit!