BlueAlly
Case Study

Vendor & Infrastructure Diversity Reduces Risk and Improves Security

Data Center

The Problem 

Over two decades ago, a financial firm sought our expertise to improve application and network availability. Widespread failures, from local outages to company-wide incidents, were causing operational disruptions and reputational damage, and their diverse causes called for a network designed for 100% availability. That engagement marked the start of diversity's evolution as a risk-mitigation tool in the firm's security posture. 

The BlueAlly Solution 

Phase 1 – INFRASTRUCTURE MODULARITY AND TELECOM DIVERSITY 

When the client relationship started, most corporate networks were designed as a single system, and the rapid growth of these networks created fragility. Modularity was limited to campus, data center, and WAN. Diversity was limited to redundancy, e.g., having two circuits connecting a site to one or more corporate data centers (DCs). In addition, several of the major systemic outages stemmed from failures in service-provider frame-relay and ATM offerings. 

We modeled a network built from distinct modular blocks: 

  • Core 
  • Data Center (DC) 
  • Regional Centers 
  • WAN and Branch Networks 

This Lego-like approach permitted significant risk-based testing on each module, and we also found ways to limit the ability of events to cascade from one module to another. The key technical step was to introduce Border Gateway Protocol (BGP) to the enterprise as the glue connecting the modules. Unlike interior gateway protocols, which were easy to use but had scalability issues, BGP was almost infinitely scalable; after all, it ran the Internet. It also gave us the policy controls to prevent dynamic routing changes in one module from propagating across the others. 
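To make that idea concrete, the short sketch below is written in Python purely for illustration; the real controls are BGP routing policies on the border routers, and the module names and prefixes shown are hypothetical assumptions, not the firm's addressing plan. It shows the core concept: a module border accepts an advertised prefix only if it appears in an explicit allow-list, so a routing change inside one module cannot spill into another. 

    # Illustrative sketch only: module names and prefixes are hypothetical.
    # Mirrors the intent of per-module BGP import policy: accept a route only
    # if it falls inside a block the receiving module expects to learn.
    import ipaddress

    IMPORT_POLICY = {
        "dc-1": ["10.10.0.0/16"],               # DC module's own aggregate
        "regional-center-3": ["10.30.0.0/16"],  # regional center aggregate
    }

    def accept_route(receiving_module: str, advertised_prefix: str) -> bool:
        """Accept an advertised prefix only if the receiving module's
        allow-list covers it; everything else is filtered at the border."""
        prefix = ipaddress.ip_network(advertised_prefix)
        return any(
            prefix.subnet_of(ipaddress.ip_network(allowed))
            for allowed in IMPORT_POLICY.get(receiving_module, [])
        )

    if __name__ == "__main__":
        print(accept_route("dc-1", "10.10.4.0/24"))     # True: within the DC aggregate
        print(accept_route("dc-1", "192.168.99.0/24"))  # False: blocked at the border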

Given the goal of a fully fault-tolerant design that could operate at or near the client’s availability targets, we had to model a solution that provided diversity for each module. This led to a design with two separate core networks; other modules, such as the DC infrastructures, were also duplicated. 

To reduce risk, each core, DC, and regional center operated on its own diverse, independent infrastructure: no common fiber, path, or telecom equipment was shared between them. This addressed the root causes of all the major outages that had been experienced to that date. 

The result was a pair of separate, parallel networks. Each vendor operating in the environment was required to work with its counterpart on the other network; for example, the fiber providers for each network had to work with each other to guarantee there was no common conduit, fiber, or telecom equipment in use. 

Individual branch sites did not require the same level of fault tolerance, but the design needed to solve critical scaling issues that had emerged once the firm grew to over 2,000 locations. As a result, the branch networks moved from a single large WAN to a modularized approach, which prevented issues within one module from cascading to others. 

In the end, a service level agreement (SLA) for 100% availability was required and provided, written specifically as a guarantee against systemic or widespread outages. In return, the firm was required to maintain the separation through regular audits and to operate the network as a single system, preventing changes on one module while its partner was undergoing maintenance (and yes, there was a caveat for malicious insiders). 

Phase 2 – VENDOR DIVERSITY 

In the project’s initial phase, all the routers, switches, and security systems came from a single IP equipment vendor (Cisco Systems). To mitigate the single-vendor risk, upgraded software was rolled out with an N+1 phased approach (a brief illustrative sketch follows the list): 

  • In a lab 
  • In a segment of one network 
  • Widely deployed on that network 
  • After a few weeks of stable operation, to the other network 
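
A minimal sketch of that promotion logic, again in Python and purely illustrative; the stage names and soak durations are assumptions, not the firm's actual change-control standards. A software version advances one stage at a time, and only after the required period of stable operation:

    # Illustrative only: stage names and soak durations are assumptions.
    from dataclasses import dataclass

    @dataclass
    class Stage:
        name: str
        soak_days: int  # stable days required before promotion to the next stage

    ROLLOUT = [
        Stage("lab", 14),
        Stage("segment of network A", 14),
        Stage("all of network A", 21),
        Stage("network B", 0),  # final stage
    ]

    def next_stage(current: str, stable_days: int) -> str:
        """Promote to the next stage only if the current stage's soak is met."""
        names = [s.name for s in ROLLOUT]
        idx = names.index(current)
        if idx + 1 < len(ROLLOUT) and stable_days >= ROLLOUT[idx].soak_days:
            return names[idx + 1]
        return current

    if __name__ == "__main__":
        print(next_stage("all of network A", 10))  # stays put: soak not yet met
        print(next_stage("all of network A", 30))  # promotes to "network B"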

However, the N+1 approach still left devices susceptible to zero-day and similar vendor-specific attacks, the kinds often used by criminal gangs and even nation-state actors. 

More recent implementations employ IP equipment vendor diversity to offset the single-vendor risk, using diverse route/switch, security, and optical vendors on these types of networks.  Think of having one DC module on Cisco and another using Arista devices.  In one case, for a network of similar scale, one core was deployed on Cisco and the other on Juniper routers. 

What we discovered in testing was that staying homogeneous within a module worked best. Intermingling vendors within a module led to inter-vendor issues and a lowest-common-denominator feature set. Distributing an application between DCs, irrespective of their vendor bases, delivered the highest availability. 

The Results 

Over the years, conduits, fibers, links, and network devices have failed, but there have been no systemic outages since the original network’s implementation over 10 years ago. The network as a system has survived floods, tornadoes, and hurricanes, including the loss of a major site during one hurricane. 

Meanwhile, the design has been extended to additional regions and has gone through several generational upgrades: for example, the branch network moved from frame relay to MPLS to SD-WAN, and the original SONET-based optics were displaced first by 10G and most recently by 100G over a DWDM system. 

What has survived is the modularity, along with the operational practices that stem from employing diversity wherever possible. 

A partner who can help you navigate these system and vendor diversity choices is key to your success. 

Contact BlueAlly

Connect with BlueAlly today to learn more.