Network Design Principle Resiliency – ZNDP 064


In this show, we are covering the Network Design Principle Resiliency: what it is and how to design for it. How do we know if our networks are resilient? Let’s get right into it!

Network Design Principles are critical to helping Designers create valid networks. Not every Network Design Principle will be used in every situation, but as a Designer you should know them all and decide, for your customers, whether each one needs to be addressed.

Resiliency is a Network Design Principle that has become an “unstated requirement”. I like to compare this to power and plumbing in a house: in most cases today, nobody states them as requirements because everyone expects them to be there. The same is true of today’s modern networks.

What Is the Network Design Principle Resiliency?

“I like to define this as the ability of the network to ‘automatically failover’ when an outage occurs.”

Zig Zsiga

You do not have to do anything for this to happen; you already did the work in the design and implementation phases of the network.

How far the network can automatically fail over is determined by the requirements of the business.

Let’s use an example to frame this up. Picture yourself in bed asleep, dreaming of those zeros and ones. While you are imagining all of those binary numbers, a router in your production network has a critical issue and fails. Because you didn’t design a resilient network, you are called/paged immediately to come in and resolve the outage (assuming, of course, that this is a business-impacting outage).

Now let’s say you had designed a resilient network. The same router has a critical issue and fails; it is offline. Because you designed for resiliency, your traffic is immediately rerouted through another router. Your users/customers lose connectivity for a few seconds at most, if at all, and you are not called or paged to resolve the issue immediately. This is resiliency in action.
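To make the difference concrete, here is a minimal Python sketch of the decision a resilient design automates: traffic prefers a primary next hop and shifts to a backup as soon as the primary is marked down. The router names, metrics, and the `NextHop`/`select_next_hop` helpers are hypothetical; real routers make this decision inside their routing protocols and forwarding hardware, not in a script like this.

```python
from dataclasses import dataclass

@dataclass
class NextHop:
    name: str       # hypothetical router name
    metric: int     # lower metric is preferred
    up: bool = True

def select_next_hop(candidates):
    """Return the preferred next hop that is still up, or None if all have failed."""
    live = [nh for nh in candidates if nh.up]
    if not live:
        return None  # no redundancy left: this is when the pager goes off
    return min(live, key=lambda nh: nh.metric)

routes = [NextHop("router-a", metric=10), NextHop("router-b", metric=20)]
print(select_next_hop(routes).name)  # router-a
routes[0].up = False                 # router-a suffers a critical failure
print(select_next_hop(routes).name)  # router-b
```

In the non-resilient case there is only one entry in `routes`, so the same failure leaves `select_next_hop` with nothing to return.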

There are many forms of resiliency. The more resilient the network design, the more it costs to create and implement. In most cases, the complexity of the network also increases, requiring more skilled resources to operate and manage it.

Here are two common resiliency requirements that I see in production environments:

  1. Remove all single points of failure.
  2. Remove all dual points of failure.

Network Design Principle Resiliency – Removing Failures

Removing single points of failure directly increases resiliency, but it also increases cost. You no longer have just one link to a resource or one device performing a critical role; you must now have two of everything, be it physical or virtual. Overall complexity increases as well.
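Before you can remove single points of failure, you have to find them. One simple way, sketched below in Python under the assumption that the topology can be written down as a list of devices and links, is brute force: remove each device and each link in turn and check whether the remaining network is still connected. The topology and the function names are hypothetical and for illustration only.

```python
from collections import deque

def is_connected(nodes, links):
    """Breadth-first search: True if every node can reach every other node."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        # Ignore links that touch a device we have removed from the topology.
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes

def single_points_of_failure(nodes, links):
    """Return (devices, links) whose individual loss partitions the topology."""
    spof_nodes = [n for n in nodes
                  if not is_connected([m for m in nodes if m != n], links)]
    # Remove links by position so parallel links between the same pair
    # of devices are treated as separate, redundant links.
    spof_links = [link for i, link in enumerate(links)
                  if not is_connected(nodes, links[:i] + links[i + 1:])]
    return spof_nodes, spof_links

# Hypothetical topology: a branch behind a single WAN router uplinked to two core routers.
nodes = ["branch", "wan-rtr", "core1", "core2", "dc"]
links = [("branch", "wan-rtr"), ("wan-rtr", "core1"), ("wan-rtr", "core2"),
         ("core1", "core2"), ("core1", "dc"), ("core2", "dc")]
print(single_points_of_failure(nodes, links))
# (['wan-rtr'], [('branch', 'wan-rtr')])
```

Adding a second WAN router with its own links to the branch and the core makes both lists come back empty, which is exactly the "two of everything" cost described above. Brute force is fine for a design review; graph algorithms for articulation points and bridges do the same job more efficiently on large topologies.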

Removing dual points of failure is not as common as removing single points of failure, but it is common enough these days to mention. Now we must have three options for every failure situation. These options may not be active/active/active; they are probably more like active/active/standby, or even active/standby/standby. Our overall resiliency increases even further, but once again at an even greater cost, and complexity increases substantially. When designing to remove dual points of failure, you need to properly identify every failure situation and what the recovered state should be. This is a process for sure. It can be hard to picture each failure situation in your head, so I recommend documenting them on paper or a whiteboard.
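Documenting every dual-failure situation can also be partly automated. The Python sketch below assumes a hypothetical topology where a branch reaches a data center through three routers; it fails every pair of links and records the recovered path (shortest by hop count) or `None` if the site is isolated. The names and topology are illustrative assumptions, and the output is a starting point for your whiteboard, not a replacement for it.

```python
from collections import deque
from itertools import combinations

def surviving_path(links, src, dst):
    """BFS shortest path (by hop count) from src to dst, or None if none exists."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def dual_failure_table(links, src, dst):
    """For every pair of failed links, record the recovered path (or None)."""
    table = {}
    for i, j in combinations(range(len(links)), 2):
        remaining = [link for k, link in enumerate(links) if k not in (i, j)]
        table[(links[i], links[j])] = surviving_path(remaining, src, dst)
    return table

# Hypothetical topology: three independent paths from the branch to the data center.
links = [("branch", "r1"), ("r1", "dc"),
         ("branch", "r2"), ("r2", "dc"),
         ("branch", "r3"), ("r3", "dc")]
for failed, path in dual_failure_table(links, "branch", "dc").items():
    print(failed, "->", " > ".join(path) if path else "ISOLATED")
```

With only two paths, some link pairs leave the branch isolated; with three, every one of the 15 dual-failure cases still has a recovered path, which is the outcome a "no dual points of failure" requirement asks for.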

Business Perspective for the Network Design Principle Resiliency:

From a business perspective, resiliency shows up as a requirement tied to a specific line of business, with a specific criticality level. For example, what happens if that business moneymaker goes offline for 60 seconds? How about 2 hours? What is the risk versus the cost of implementing “No Single Points of Failure”? These are discussions you as a Designer should have with the business. Remember, bring every decision back to the business! These conversations will let you properly design a resilient network.
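The risk versus cost conversation often comes down to simple arithmetic. The Python sketch below compares the expected yearly cost of downtime with and without redundancy against what the redundancy costs to run. Every input here (outage frequency, duration, cost per minute, redundancy cost) is a hypothetical placeholder; the real numbers have to come from the business.

```python
def annual_outage_cost(outages_per_year, minutes_per_outage, cost_per_minute):
    """Expected yearly cost of downtime for one line of business."""
    return outages_per_year * minutes_per_outage * cost_per_minute

def redundancy_pays_off(cost_before, cost_after, redundancy_cost_per_year):
    """True when the downtime avoided is worth more than the redundancy costs."""
    return (cost_before - cost_after) > redundancy_cost_per_year

# Hypothetical inputs: two failures a year, $500 lost per minute of downtime.
before = annual_outage_cost(2, 120, 500)  # 2-hour outages with no redundancy
after = annual_outage_cost(2, 1, 500)     # ~60-second failovers with redundancy
print(before, after, redundancy_pays_off(before, after, 60_000))  # 120000 1000 True
```

If the same inputs produced a "False", that is the moment to push back on the business: either the budget grows or the business accepts the risk.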

Technical Perspective for the Network Design Principle Resiliency:

How do we ensure the network is resilient from a technology perspective? As Designers, we can leverage a number of solutions and technologies. BFD (Bidirectional Forwarding Detection), LFA (Loop-Free Alternates), ECMP (Equal-Cost Multi-Path), unequal-cost load balancing, and traffic engineering are all examples of protocols and technologies that can help speed up “convergence” and increase the overall resiliency of a network.
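As one example of how these technologies work, LFA precomputes a backup next hop so a router can switch immediately instead of waiting for the routing protocol to reconverge. The Python sketch below is a simplified version of the basic loop-free condition from RFC 5286: a neighbor N of router S is a loop-free alternate toward destination D if Dist(N, D) < Dist(N, S) + Dist(S, D), meaning N will not send the traffic back through S. The topology, link costs, and helper names are hypothetical assumptions; it assumes D is reachable and ignores node protection, downstream paths, and the other refinements real IGP implementations apply.

```python
import heapq

INF = float("inf")

def shortest_distances(graph, source):
    """Dijkstra: cost of the best path from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, INF):
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, INF):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def loop_free_alternates(graph, s, d):
    """Neighbors of s that satisfy Dist(N, D) < Dist(N, S) + Dist(S, D)."""
    best = shortest_distances(graph, s)[d]
    alternates = []
    for n, link_cost in graph[s].items():
        dist_n = shortest_distances(graph, n)
        if link_cost + dist_n.get(d, INF) == best:
            continue  # n is a primary next hop, not an alternate
        if dist_n.get(d, INF) < dist_n[s] + best:
            alternates.append(n)
    return sorted(alternates)

# Hypothetical topology (symmetric link costs). S reaches D via A (cost 20).
graph = {
    "S": {"A": 10, "B": 10, "C": 5},
    "A": {"S": 10, "D": 10},
    "B": {"S": 10, "D": 25},
    "C": {"S": 5},
    "D": {"A": 10, "B": 25},
}
print(loop_free_alternates(graph, "S", "D"))  # ['B']
```

B qualifies because its own best path to D does not go back through S; C does not, because its only way to D is through S, so sending traffic to C after A fails would simply loop.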

Modern Network Design Perspective for the Network Design Principle Resiliency:

Thanks to Software-Defined Technologies and Solutions, we can now leverage application-aware routing and dynamic path determination to help create a resilient network.
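Application-aware routing boils down to measuring each available path and steering each application onto a path that meets its requirements. The Python sketch below illustrates that idea with hypothetical path measurements and a hypothetical voice SLA (latency, loss, and jitter thresholds); it is not how any particular SD-WAN product implements path selection.

```python
from dataclasses import dataclass

@dataclass
class PathStats:
    name: str
    latency_ms: float
    loss_pct: float
    jitter_ms: float

@dataclass
class SlaPolicy:
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float

def meets_sla(path, sla):
    """True if the measured path is within every threshold of the policy."""
    return (path.latency_ms <= sla.max_latency_ms
            and path.loss_pct <= sla.max_loss_pct
            and path.jitter_ms <= sla.max_jitter_ms)

def choose_path(paths, sla):
    """Prefer the lowest-latency compliant path; else fall back to the lowest-loss path."""
    compliant = [p for p in paths if meets_sla(p, sla)]
    if compliant:
        return min(compliant, key=lambda p: p.latency_ms)
    return min(paths, key=lambda p: (p.loss_pct, p.latency_ms))

# Hypothetical measurements and a hypothetical voice SLA.
paths = [PathStats("mpls", 40, 0.1, 5),
         PathStats("internet", 25, 2.0, 10),
         PathStats("lte", 80, 0.5, 20)]
voice = SlaPolicy(max_latency_ms=150, max_loss_pct=1.0, max_jitter_ms=30)
print(choose_path(paths, voice).name)  # mpls
paths[0].loss_pct = 3.0                # the MPLS path degrades
print(choose_path(paths, voice).name)  # lte
```

The design choice worth noting is that a path does not have to fail outright to be avoided: a degraded path that breaks the SLA is treated much like a failed one.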

Keep in mind, it is on you as a Designer to determine whether resiliency is something you need to design for. In most cases, customers will no longer state that they need a resilient network; they assume it is a requirement. As a result, you need to determine what level of resiliency is needed. You will also need to weigh the increases in cost and complexity that a resilient network brings.

Sometimes you have to push back on the business, letting them know they would either need to increase their budget or accept the risk of a less resilient network design.

Want More Network Engineer Content?

  • Check out Zigbits Network Design Podcast – Business Priority episodes here:

Content Update – New Show Called Daily Zigbit!

For those who need some motivation in their daily lives, check out our new weekday show, Daily Zigbit.

I started a new video series last week called Daily Zigbit! Yep, I did it, I did the thing! 🤓

My Daily Zigbit Series is all about helping you in every aspect I can. It’s not just focused on IT, Networking, Design, or Career. It is truly focused on every aspect of our lives. Each video in this series is a short (approximately 5 minutes) installment that you can watch as you drink your coffee in the morning! #CoffeeWithZig

In each Daily Zigbit video:

  1. I cover a critical leadership topic to help you in your daily life!
  2. I share a real-world (in the wild) situation from my life to give you direct context around the highlighted leadership lesson: how I have used it and why!
  3. I come up with a quick action for you to take right now to leverage this leadership lesson!

The publishing schedule for Daily Zigbit is every weekday between 5 am – 7 am EST (the weekends are dedicated to my family 😉).

I’m branding these videos as #AttackTheDay, #AttackYourGoals, and #MakeProgress!

You can view the first few Daily Zigbit Videos here!

I am publishing a New Zigbits Network Design Podcast episode and a new YouTube video weekly! That’s two new pieces of content for you to consume every week! Feel free to reach out with any content ideas and we will make sure to add them!

Resources

  • We are creating a Network Design Course: Let’s Make you The Best Network Designer! If you want to find out more, join the waitlist here.
  • Do you watch YouTube Videos? We have our own Zigbits YouTube Channel here! Make sure you subscribe and click the bell to stay notified when we create new content. You don’t want to miss anything!

Transparency:

This post may contain affiliate links to products or services where I may receive a level of compensation from your actions by following those links. This is seamless to you and does not add any additional cost to the products or services in question. In addition, I do not let any affiliate relationship cloud my judgment or my recommendation of a product or service. My recommendations will always be above reproach.  This is my commitment to you Ziglets!