Chapter 7. High Availability
Information availability is a daily part of modern society. People make phone calls, read the news, stream songs, check sports scores, and watch television all over the Internet or on their local provider’s network. At any given time, at any given location, almost any bit of information can be made available over the Internet. Today, and in the near future, it’s expected that there should be no interruptions to the access of this flow of information. Failure to put all of the world’s information at any user’s fingertips at any time, day or night, brings great wrath upon whoever’s network is in the way. Welcome to the twenty-first century.
The average user of Internet services is unable to comprehend why the information she desires is not available. All that user knows is that it isn’t, and that is no longer acceptable. Consumers clamor for compensation and complain to all available outlets. Business users call the help desk and demand explanations while escalating their lost connection to all levels. Revenue is lost and the world looks bleak. Information must always be highly available, not just available.
The most likely location of a failure somewhere in the network is typically between the client device and the server. This chapter is dedicated to training network administrators on how to ensure that their SRX is not the device that brings down the network. Firewalls are placed in the most critical locations in the network, and when problems occur, trust us, users notice.
A router handles each packet as its own entity. It does not process traffic as though the packets had a relationship to each other. The packet could be attempting to start a new connection, end a connection, or have all sorts of strange data inside. A router simply looks at the Layer 3 header and passes the packet on. Because of this, packets can come in any order and leave the router in any order. If the router only sees one packet out of an entire connection and never sees another, it doesn’t matter. If one router fails, another router can easily pick up all of the traffic using dynamic routing. Designing for high availability (HA) with stateful firewalls is different because of their stateful nature.
Stateful firewalls need to see the creation and teardown of the communication between two devices. All of the packets in the middle of this communication need to be seen as well. If some of the packets are missed, the firewall will start dropping them as it misses changes in the state of communications. Once stateful firewalls came into the picture, the nature of HA changed. The state of traffic must be preserved between redundant firewalls. If one firewall fails, the one attempting to take over for it must have knowledge of all of the traffic that is passing through it. All established connections will be dropped if the new firewall does not have knowledge of these sessions. This creates the challenge of ensuring that state synchronization can occur between the two devices. If not, the whole reason for having redundancy in the firewalls is lost.
Understanding High Availability in the SRX
The design of the SRX is extremely robust regardless of the model or platform. It has a complete OS and many underlying processors and subsystems. Depending on the platform, it could have dozens of processors. Because of this, the SRX implements HA in a radically different way than most firewalls. Common features such as configuration and session synchronization are still in the product, but how the two chassis interact is different.
Chassis Cluster
An SRX HA cluster implements a concept called chassis cluster. A chassis cluster takes the two SRX devices and represents them as a single device. The interfaces are numbered in such a way that they are counted starting at the first chassis and then ending on the second chassis. Figure 7-1 shows a chassis cluster. On the left chassis, the FPC starts counting as normal; on the second chassis, the FPCs are counted as though they were part of the first chassis.
In Chapter 2, we discussed the concept of the Routing Engine (RE). In an SRX cluster, each SRX has one active RE. When the cluster is created, the two REs work together to provide redundancy. This is similar to the Juniper M Series, T Series, and MX Series routing platforms that support dual REs. The Junos OS is currently limited to supporting two REs per device. Because of this, the SRX cluster can only have one RE per chassis. When the chassis are combined and act as a single chassis, the devices reach the two-RE limit.
Note
Multiple REs in a single SRX are only supported to provide dual control links. They do not provide any other services.
The chassis cluster concept, although new to the SRX, is not new to Juniper Networks. The SRX utilizes the code infrastructure from the TX Matrix products. The TX Matrix is a multichassis router that is considered one of the largest routers in the world. Only the largest service providers and cloud networks utilize the product. Because of its robust design and reliable infrastructure, it’s great to think that the code from such a product sits inside every SRX. When the SRX was designed, the engineers at Juniper looked at the current available options and saw that the TX Matrix provided the infrastructure they needed. This is a great example of how using the Junos OS across multiple platforms benefits all products.
Note
To run a device in clustering mode, there are a set of specific requirements. For the SRX1400, SRX3000, and SRX5000 lines, the devices must have an identical number of SPCs and the SPCs must be in identical locations. The SRXs, however, can have any number of interface cards and they do not have to be in the same slots. Best practice suggests, though, that you deploy interfaces in the same FPCs or PIC slots, as this will make it easier to troubleshoot in the long run.
For most network administrators, this concept of a single logical chassis is very different from traditional HA firewall deployment. To provide some comparison, in ScreenOS, for example, the two devices were treated independently of each other. The configuration, as well as network traffic state, was synchronized between the devices, but each device had its own set of interfaces.
Note
On the branch SRX Series products, Ethernet switching as of Junos 11.1 is supported when the devices are in chassis cluster mode. You will also need to allocate an additional interface to provide switching redundancy. This is covered later in the chapter.
The Control Plane
As discussed throughout this book, the SRX has a separated control plane and data plane. Depending on the SRX platform architecture, the separation varies from being separate processes running on separate cores to completely physically differentiated subsystems. For the purposes of this discussion, however, it’s enough to know that the control and data planes are separated.
The control plane is used in HA to synchronize the kernel state between the two REs. It also provides a path between the two devices to send hello messages between them. On the RE, a process, or daemon, runs, called jsrpd. This stands for Junos stateful redundancy protocol daemon. This daemon is responsible for sending the messages and doing failovers between the two devices. Another daemon, ksyncd, is used for synchronizing the kernel state between the two devices. All of this occurs over the control plane link.
The control plane is always in an active/backup state. This means only one RE can be the master over the cluster’s configuration and state. This ensures that there is only one ultimate truth over the state of the cluster. If the primary RE fails, the secondary takes over for it. Creating an active/active control plane makes synchronization more difficult because many checks would need to be put in place to validate which RE is right.
Note
The two devices’ control planes talk to each other over a control link. This link is reserved for control plane communication. It is critical that the link maintain its integrity to allow for communication between the two devices.
The Data Plane
The data plane’s responsibility in the SRX is to pass and process traffic based on the administrator’s configuration. All session and service states are maintained on the data plane. Neither the REs nor the rest of the control plane is responsible for maintaining state (the RE simply requests data and statistics from the data plane and returns them to the administrator).
The data plane has a few responsibilities when it comes to HA implementation. First and foremost is state synchronization. The state of sessions and services is shared between the two devices. Sessions are the state of the current set of traffic that is going through the SRX, and services are other items such as the VPN, IPS, and ALGs.
On the branch SRX Series, synchronization happens between the flowd daemons running on each node’s data plane. The SRX Series for the branch, as discussed in Chapter 1, runs a single multicore processor with a single multithreaded flowd process. On the data center SRX’s distributed architecture, state synchronization is handled in a similar fashion. Figure 7-2 shows a detailed example.
In Figure 7-2, two SRX data center platforms are shown. Node 0 is shown on the left and node 1 is on the right. Each device is depicted with two SPCs. SPC 0 is the SPC that contains the CP SPU and a second flow SPU. In SPC 1, both SPUs are flow SPUs. Both SRX data center platforms are required to have the same number and location of SPCs and NPCs. This is required because the SPUs talk to their peer SPU in the same FPC and PIC location. As seen in the back of Figure 7-2, the flow SPU in FPC 0 on node 0 sends a message to node 1 on FPC 0 in PIC 1. This is the session synchronization message. Once the SPU on node 1 validates and creates the session, it sends a message to its local CP. As stated in Chapter 1, the CP processors are responsible for maintaining the state for all of the existing sessions on the SRX. The secondary device now has all of the necessary information to handle the traffic in the event of a failover.
Information is synchronized in what is known as a real-time object (RTO). This RTO contains the necessary information to synchronize the data to the other node. The remote side does not send an acknowledgment of the RTO because doing so would slow down the session creation process, and frankly, an acknowledgment is rarely needed. There are many different RTO message types. New ones can be added based on the creation of new features on the SRX. The most commonly used message types are the ones for session creation and session closure.
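Once a cluster is running, one quick way to confirm that RTOs are actually flowing is the show chassis cluster data-plane statistics command, which reports per-service RTO counters, including the session create and session close messages just mentioned. A hedged example of checking it (the prompt and hostname are illustrative):

{primary:node0}
root> show chassis cluster data-plane statistics

If the RTOs sent counters climb on the node carrying traffic and the RTOs received counters climb on its peer, session synchronization is doing its job.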
The second task the SRX needs to handle is forwarding traffic between the two devices. This is also known as data path or Z path forwarding. Figure 7-3 illustrates this. Under most configuration deployments, Z path forwarding is not necessary. However, in specific designs, this operation might be very common. (The details are further explored in the section Deployment Concepts later in this chapter.) If traffic is received by one node but must leave through an interface on the other, the receiving node forwards it across the fabric to the node on which the traffic will egress.
The last task for the data link is to send jsrpd messages between the two devices. The jsrpd daemon passes messages over the data plane to validate that it is operating correctly. These are similar to the messages that are sent over the control link, except that they go through the data plane. By sending these additional messages over the data plane, the RE ensures that the data plane is up and capable of passing traffic. On the branch SRX Series devices, the message exits the control plane, passes through flowd and over the data link, and then to the second device. The second device receives the packet in flowd, which passes it up to the control plane and on to jsrpd. Depending on the platform, the rate for the messages will vary.
All of these data plane messages pass over the data link. The data link is also known as the fabric link, depending on the context of the discussion. The size of the link varies based on the requirements. These requirements consist of the amount of data forwarding between devices and the number of new connections per second:
On the SRX100, SRX110, SRX210, and SRX220, a 100 Mbps Ethernet link is acceptable for the data link.
For the SRX550, SRX650, and SRX240, it’s suggested that you use a 1 Gbps link.
On the data center SRXs, a 1 Gbps link is acceptable unless data forwarding is going to occur.
Even on an SRX5000 Series with a maximum of 380,000 new CPS, a 1 Gbps link can sustain the RTO throughput.
If data forwarding is in the design, a 10 Gbps link is suggested.
Getting Started with High Availability
This chapter started with the concept of the chassis cluster because it’s the fundamental concept for the entire chapter. There are several important aspects to the chassis cluster; some concern how the cluster is configured, and others are simply key to the fault tolerance the chassis cluster provides. In this section, we explore the deeper concepts of the chassis cluster.
Cluster ID
Each cluster must share a unique identifier among all of its members. This identifier is used in a few different ways, but most importantly, it is used when the two devices communicate with each other. Fifteen cluster IDs are available for use when creating a cluster. The cluster ID is also used when determining MAC addresses for the redundant Ethernet interfaces.
Node ID
The node ID is the unique identifier for a device within a cluster. There are two node IDs: 0 and 1. The node with an ID of 0 is considered the base node. The node ID does not give the device any priority for mastership; it only determines interface ordering. Node 0 is the first node for the interface numbering in the chassis cluster, and node 1 is the second and last node in the cluster.
Redundancy Groups
In an HA cluster, the goal is the ability to fail over resources in case something goes wrong. A redundancy group is a collection of resources that need to fail over between the two devices. Only one node at a time can be responsible for a redundancy group; however, a single node can be the primary node for any number of redundancy groups.
Two different items are placed in a redundancy group: the control plane and the interfaces. The default redundancy group is group 0. Redundancy group 0 represents the control plane. The node that is the master over redundancy group 0 has the active RE. The active RE is responsible for controlling the data plane and pushing new configurations. It is considered the ultimate truth in matters regarding what is happening on the device.
Data plane redundancy groups are numbered 1 and greater. The different SRX platforms support different numbers of redundancy groups. A data plane redundancy group contains one or more redundant Ethernet (reth) interfaces. Each member in the cluster has a physical interface bound into a reth. The active node’s physical interface will be active and the backup node’s interface will be passive and will not pass traffic. It is easier to think of this as a binary switch: only one of the members of the reth is active at any given time. The section Deployment Concepts later in this chapter details the use of data plane redundancy groups.
Interfaces
A network device doesn’t help a network without participating in traffic processing. An SRX has two different interface types that it can use to process traffic. The first is the reth. A reth is a Junos aggregate Ethernet interface and it has special properties compared to a traditional aggregate Ethernet interface. The reth allows the administrator to add one or more child links per chassis. Figure 7-4 shows an example of this where node 0 is represented on the left and node 1 is represented on the right.
In Figure 7-4, node 0 has interface xe-0/0/0 as a child link of reth0 and node 1 has interface xe-12/0/0. The interface reth0 is a member of redundancy group 1. The node, in this case node 0, has its link active. Node 1’s link is in an up state but it does not accept or pass traffic. After a failover between nodes, the newly active node sends out GARPs. Both nodes share the same MAC address on the reth. The surrounding switches will learn the new port that has the reth MAC address. The hosts are still sending their data to the same MAC, so they do not have to relearn anything.
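As a rough sketch, the reth0 arrangement in Figure 7-4 could be built along these lines; the address is illustrative, and the exact child-link option name varies with the interface type (gigether-options for Gigabit and 10-Gigabit ports, fastether-options for Fast Ethernet):

{primary:node0}[edit]
root# set chassis cluster reth-count 1
root# set interfaces xe-0/0/0 gigether-options redundant-parent reth0
root# set interfaces xe-12/0/0 gigether-options redundant-parent reth0
root# set interfaces reth0 redundant-ether-options redundancy-group 1
root# set interfaces reth0 unit 0 family inet address 198.51.100.1/24

The reth-count statement sizes the pool of reth interfaces, and the redundancy-group binding ties reth0’s failover behavior to redundancy group 1.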
The MAC address for the reth is based on a combination of the cluster ID and the reth number. Figure 7-5 shows the algorithm that determines the MAC address. In Figure 7-5, there are two types of fields: the hex field represents one byte using two base-16 digits, and the bit field represents a number in binary with eight bits.
The first four of the six bytes are fixed. They do not change between cluster deployments. The last two bytes vary based on the cluster ID and the reth index. In Figure 7-5, CCCC represents the cluster ID in binary. With four bits, the maximum number is 15, which is the same number of cluster IDs supported. Next, the RR represents a reserved field for future expansion. It is currently set to 0 for both bits. The VV represents the version of the chassis cluster, which today is set at 0 for both of the bits. Last is the field filled with XXXXXXXXX, and this represents the redundant Ethernet index ID. Based on Figure 7-5, it’s easy to see that collision of MAC addresses between clusters can be avoided.
When configured in a chassis cluster, the SRX is also able to support local interfaces. A local interface is an interface that is configured local to a specific node. This method of configuration on an interface is the same method of configuration on a standalone device. The significance of a local interface in an SRX cluster is that it does not have a backup interface on the other chassis, meaning that it is part of neither a reth nor a redundancy group. If this interface were to fail, its IP address would not fail over to the other node. Although this feature might seem perplexing at first, it actually provides a lot of value in complex network topologies, and it is further explored later in this chapter.
Deployment Concepts
It’s time to apply all these concepts to actual deployment scenarios. For HA clusters, there is a lot of terminology for the mode of actually deploying devices, and this section attempts to give administrators a clear idea of what methods of deployment are available to them.
Earlier in this chapter we discussed control plane redundancy, whereby the control plane is deployed in an active/passive fashion. One RE is active for controlling the cluster, and the second RE is passive. The secondary RE performs some basic maintenance for the local chassis and synchronizes the configuration as well as checks that the other chassis is alive.
In this section, we discuss what can be done with the redundancy groups on the data plane. The configuration on the data plane determines in which mode the SRXs are operating. The SRX isn’t explicitly forced into a specific mode of HA; it operates in a mode based on its configuration. There are three basic modes of operation and one creative alternative:
Active/passive
Active/active
Mixed mode
The six pack
Active/passive
In the active/passive mode, the first SRX data plane is actively passing traffic while the second SRX data plane is sitting in a passive setting not passing traffic. On a fault condition, of course, the passive data plane will take over and begin passing traffic. To accomplish this, the SRX uses one data plane redundancy group and one or more redundant Ethernet interfaces. Figure 7-6 illustrates an example of this active/passive process.
As shown in Figure 7-6, node 0, on the left, is currently active and node 1 is passive. In this example, there are two reth interfaces: reth0 and reth1. Reth0 goes toward the Internet and reth1 goes toward the internal network. Because node 0 is currently active, it is passing all of the traffic between the Internet and the internal network. Node 1’s data plane is (patiently) waiting for any issue to arise so that it can take over and continue to pass traffic. The interfaces on node 1 that are in the reth0 and reth1 groups are physically up but are unable to pass traffic. Because node 0 is currently active, it synchronizes any new sessions that are created to node 1. When node 1 needs to take over for node 0, it will have the same session information locally.
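A minimal sketch of the redundancy group behind Figure 7-6, assuming reth0 and reth1 already exist; the priorities are illustrative, with node 0 simply given the higher value:

{primary:node0}[edit chassis cluster]
root# set redundancy-group 1 node 0 priority 200
root# set redundancy-group 1 node 1 priority 100

{primary:node0}[edit interfaces]
root# set reth0 redundant-ether-options redundancy-group 1
root# set reth1 redundant-ether-options redundancy-group 1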
Active/active
In an active/active deployment, both SRXs are simultaneously passing traffic. Although it sounds difficult, the concept is simple—active/active is merely active/passive but done twice. In this case, each member of the cluster is active for its own redundancy group and the other device is passive for the redundancy group. In the event of a failure, the remaining node will take over for the traffic for the failed device. Synchronization happens between both nodes. Sessions for both redundancy groups are available on both nodes.
So, the question remains: what does this type of deployment mean for the administrator? The biggest advantage is that passing traffic over the backup node ensures that the backup data plane is ready and correctly functioning. Nothing is worse than having an HA cluster running for months only to discover, at the moment of truth when a failure occurs, that the second node is in a degraded state and no one caught it ahead of time. A good example of avoiding this is to have one of the redundancy groups passing a majority of the traffic while the other redundancy group is used to pass only a single health check. This is a great design because the second device is verified and the administrator doesn’t have to troubleshoot load-sharing scenarios.
Active/active deployments can also be used to share load between the two hosts. The only downside to this design is that it might be difficult to troubleshoot flows going through the two devices, but ultimately that varies based on the administrator and the environment, and it’s probably better to have the option available in the administrator’s tool chest than not. Figure 7-7 shows an example of an active/active cluster.
Figure 7-7 shows an active/active cluster as simply two active/passive configurations. Building from Figure 7-6, the example starts with the same configuration as before. The clusters had a single redundancy group 1 and two reths, reth0 and reth1, with node 0 being the designated primary. In this example, a second redundancy group is added, redundancy group 2, and two additional reths are added to accommodate it. Reth2 is on the Internet-facing side of the firewalls and reth3 is toward the internal network. This redundancy group, however, has node 1 as the primary, so traffic that is localized to redundancy group 2 is only sent through node 1 unless a failure occurs.
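Extending that sketch to the active/active layout in Figure 7-7 is mostly a matter of adding a second redundancy group with the priorities reversed and binding the two new reths to it (values are again illustrative):

{primary:node0}[edit chassis cluster]
root# set reth-count 4
root# set redundancy-group 2 node 1 priority 200
root# set redundancy-group 2 node 0 priority 100

{primary:node0}[edit interfaces]
root# set reth2 redundant-ether-options redundancy-group 2
root# set reth3 redundant-ether-options redundancy-group 2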
Mixed mode
Mixed mode, perhaps the most interesting HA configuration, builds on the concepts already demonstrated but expands to include local interfaces. As we discussed earlier, a local interface is an interface that has configurations local to the node for which it is attached. The other node is not required to have a backup to this interface as in the case of a reth.
This option has significance in two specific use cases.
The first use case is WAN interfaces. For this use case, there are two SRX210s, each with a T1 interface and a single reth to present back to the LAN, as depicted in Figure 7-8. Node 0 on the left has a T1 to provider A and node 1 on the right has a T1 to provider B. Each node has a single interface connected to the LAN switch. These two interfaces are bound together as reth0. The reth0 interface provides a redundant, reliable gateway to present to clients. Because of the way a T1 works, it is not possible to have a common Layer 2 domain between the two T1 interfaces, so each T1 is its own local interface to the local node.
Traffic can enter or exit either T1 interface, and it is always directed out to the correct interface. In the case shown in Figure 7-8, that would be reth0, as it is the only other interface configured. The benefit of this design is that the two T1s provide redundancy and increased capacity, and sessions between the two interfaces are synchronized. It’s great when you are using T1 interfaces as connections to a remote VPN site.
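A hedged sketch of the Figure 7-8 idea follows: each T1 stays a plain local interface on its own node while the LAN side rides on reth0. The T1 slot numbers depend on where the Mini-PIM sits in each chassis, and all interface names and addresses are illustrative.

{primary:node0}[edit interfaces]
## local WAN interfaces, one per node
root# set t1-1/0/0 unit 0 family inet address 203.0.113.1/30
root# set t1-3/0/0 unit 0 family inet address 203.0.113.5/30
## shared LAN-facing reth
root# set fe-0/0/2 fastether-options redundant-parent reth0
root# set fe-2/0/2 fastether-options redundant-parent reth0
root# set reth0 redundant-ether-options redundancy-group 1
root# set reth0 unit 0 family inet address 192.168.1.1/24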
A second great use case for mixed mode is with data centers using a dynamic routing integration design. The design is similar to our previous example, but in this case all of the interfaces are Ethernet. The two SRXs each have two interfaces connected into two different M120 routers, all of which can be seen in Figure 7-9. Having two links each going to two different routers provides a better level of redundancy in case links or routers fail. The OSPF routing protocol is enabled between the SRXs and the upstream routers, allowing for simplified failover between the links and ensuring that the four devices can determine the best path to the upstream networks. If a link fails, OSPF recalculates and determines the next best path.
You can see in Figure 7-9 that the southbound interfaces connect into two EX8200 core switches. These switches provide a common Layer 2 domain between the southbound interfaces, which allows for the creation of reth0 (similar to the rest of the designs seen in this chapter).
Six pack
It’s possible to forgo redundant Ethernet interfaces altogether and use only local interfaces. This is similar to the data center mixed mode design, except it takes the idea one step further and uses local interfaces for both the north- and southbound connections. A common name for this design is six pack. It uses four routers and two firewalls and is shown in Figure 7-10.
Much like the mixed mode design, the two northbound routers in Figure 7-10 are connected to the SRXs with two links. Each router has a connection to each SRX. On the southbound routers, the same design is replicated. This allows for a fully meshed, active/active, and truly HA network to exist. In this case, the SRXs are acting more like how a traditional router would be deployed. OSPF is used for the design to direct traffic through the SRXs, and it’s even possible to use equal cost multipath routing to do balancing for upstream hosts.
The six pack design shows just how flexible the SRXs can be to meet the needs of nearly any environment. These deployments can even be done in either the traditional Layer 3 routing mode or Layer 2 transparent mode.
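As a sketch of how the six pack might be glued together, each of the four local interfaces simply joins OSPF, and the zones they sit in must allow OSPF as host-inbound traffic; the interface names, area, and zone names below are illustrative.

{primary:node0}[edit]
root# set protocols ospf area 0.0.0.0 interface ge-0/0/0.0
root# set protocols ospf area 0.0.0.0 interface ge-0/0/1.0
root# set protocols ospf area 0.0.0.0 interface ge-2/0/0.0
root# set protocols ospf area 0.0.0.0 interface ge-2/0/1.0
root# set security zones security-zone north host-inbound-traffic protocols ospf
root# set security zones security-zone south host-inbound-traffic protocols ospf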
Preparing Devices for Deployment
Understanding how a chassis cluster works is half the battle in attaining acceptable HA levels. The rest concerns configuring a cluster.
To be fair, the configuration is actually quite easy—it’s just a few steps to get the cluster up and running. Setting it up correctly is the key to a stable implementation, and needless to say, rushing through some important steps can cause serious pain later on. We therefore suggest that you start with fresh configurations, if possible, even if this means clustering the devices starting with a minimal configuration and then adding on from there.
Note
If there is an existing configuration, set it aside and then create the cluster. After the cluster is running happily, then migrate the configuration back on.
Differences from Standalone
When an administrator enters configuration mode on a standalone SRX, all of the active users who log in to the device can see the configuration and edit it. When each user’s changes can be seen by the other users on the device, it’s called a shared configuration. Once chassis clustering is enabled, the devices must be configured in what is called configure private, or private, mode, which allows each administrator to see only her own configuration changes. This imposes several restrictions on the end administrator while using configure private mode.
The first notable restriction is that all configuration commits must be done from the root, or top, of the configuration hierarchy. Second, the option to do commit confirmed is no longer allowed, which, as you know, allows for a rollback to the previous configuration if things go wrong. Both are very nice features that are not available when in clustering mode. The reason these are disabled is simple: stability.
A lot of communication is going on between the two SRXs when they are in a clustered mode, so when committing a change, it is best to minimize the chances of differences between the two devices’ local configurations at the time of the commit. If each node had a user modifying the configuration at the same time, this would add an unneeded level of complexity to ensure that the configurations are synchronized. Because of this, private mode is required while making configuration changes.
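In practice, then, a change on a clustered SRX looks something like the following, with the commit issued from the top of the hierarchy; the hostname and the statement being set are purely illustrative.

root@SRX210-A> configure private
Entering configuration mode

{primary:node0}[edit]
root@SRX210-A# set system time-zone UTC

{primary:node0}[edit]
root@SRX210-A# commit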
Activating Juniper Services Redundancy Protocol
The first step in creating a cluster is to place the device into cluster mode. By default, the SRX does not run the jsrpd daemon, so this must be triggered. To enable the jsrpd daemon and turn the device into an eligible chassis cluster member, a few special bits must be set in the Non Volatile Random Access Memory (NVRAM) on the device, triggering the SRX, on boot, to enable jsrpd and enter chassis cluster mode.
Note
These settings are permanent until they are otherwise removed. An initial reboot is required after setting the cluster ID to get the jsrpd daemon to start. The daemon will start every time the bits are seen in the NVRAM.
It takes a single command, and it takes effect only on reboot. Although it is unfortunate that a reboot is required, it is required only once. You must run the command from operational mode and as a user with superuser privileges.
root@SRX210-H> set chassis cluster cluster-id 1 node 0 reboot
Successfully enabled chassis cluster. Going to reboot now
root@SRX210-H>
*** FINAL System shutdown message from root@SRX210-H ***
System going down IMMEDIATELY
For this command to work, we needed to choose the cluster ID and the node ID. For most implementations, cluster ID 1 is perfectly acceptable, as we discussed earlier. The node ID is easy, too: for the first node that is being set up, use node 0, and for the second node, use node 1. There isn’t a specific preference between the two. Being node 0 or node 1 doesn’t provide any special benefit; it’s only a unique identifier for the device.
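The same command is then run on the second device, changing only the node number:

root> set chassis cluster cluster-id 1 node 1 reboot
Successfully enabled chassis cluster. Going to reboot now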
Once the device comes back up, it’s easy to notice the changes. Right above the shell prompt is a new line:
{primary:node0} #<----new to the prompt
root>
This line gives the administrator two important pieces of information. The part to the left of the colon is the current status of this node’s control plane within the cluster (which state the RE is in), and the part to the right identifies the local node.
Note
This only shows the control plane status. This does not show which device has the active data plane. This is a common mistake for those using the SRX. That message should be on its own page all by itself, as it’s that important to remember.
There are several different options for control plane status, as listed in Table 7-1. On boot, the device enters the hold state. During this state, the control plane is preparing itself to enter the cluster. Next the device enters the secondary state when the RE checks to see if there is already a primary RE in the cluster. If not, it then transitions to the primary state.
State | Meaning |
Hold | This is the initial state on boot. The RE is preparing to join the cluster. |
Secondary | The RE is in backup state and is ready to take over for the primary. |
Primary | The RE is the controller for the cluster. |
Ineligible | Something has occurred that makes the RE no longer eligible to be part of the cluster. |
Disabled | The RE is no longer eligible to enter the cluster. It must be rebooted to rejoin the cluster. |
Unknown | A critical failure has occurred. The device is unable to determine its current state. It must be rebooted to attempt to reenter the cluster. |
Lost | Communication with the other node has been lost. A node cannot report itself as lost; this status is only listed under the other node in the output of show chassis cluster status. |
Secondary-hold | A device enters secondary-hold when it is identified as a secondary but the configured hold-down timer has not yet expired. In the event of a critical failure, the redundancy group can still fail over. |
After the primary state are three states that occur only when something goes wrong. Ineligible occurs when something happens that invalidates the member from the cluster. From there, the device enters the disabled state after a period of time while being ineligible. The last state, unknown, can occur only if some disastrous, unexpected event occurs.
Once the system is up, and in either the final primary or secondary state, there are a few steps you can take to validate that the chassis cluster is indeed up and running. First, check that the jsrpd daemon is up and running. If the new cluster status message is above the prompt, it’s pretty certain that the daemon is running.
{primary:node0}
root> show system processes | match jsrpd
  863  ??  S      0:00.24 /usr/sbin/jsrpd -N

{primary:node0}
root> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   1           primary        no       no
    node1                   0           lost           n/a      n/a

{primary:node0}
root>
The greatest friend of anyone using the chassis cluster is the show chassis cluster status command. It is the most common command for looking at the current status of the cluster and it is full of information. The first bit of information is the cluster ID, the one that was initially configured and will likely stay that way until cleared. Next is information regarding all of the redundancy groups that are configured; the first one in our case is redundancy group 0. This represents the control plane only and has no bearing on which node is actively passing traffic.
Under each redundancy group, each node is listed along with its priorities, status, preempt status, and whether the device is in manual failover. By default, redundancy group 0 is created without user intervention. Each device is given a default priority of 1. Because of this, the first node that becomes primary will be primary until a failure occurs. Next, the status is listed. The last two columns are preempt and manual failover. Preempt is the ability to configure the device with the higher priority to preempt the device with the lower priority. The manual failover column will state if the node was manually failed over to by the administrator.
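As a hedged sketch, the priority, preempt, and manual failover behaviors described above map to commands along these lines; the group numbers and priority values are illustrative.

{primary:node0}[edit chassis cluster]
## prefer node 0 for the control plane
root# set redundancy-group 0 node 0 priority 200
root# set redundancy-group 0 node 1 priority 100
## let the higher-priority node preempt for a data plane group
root# set redundancy-group 1 preempt

{primary:node0}
## operational mode: force a failover, then clear the manual failover flag
root> request chassis cluster failover redundancy-group 1 node 1
root> request chassis cluster failover reset redundancy-group 1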
Managing Cluster Members
Most Junos devices have a special interface named fxp0 that is used to manage the SRXs. It is typically connected to the RE, although some devices, such as the SRX100 and SRX200 Series, do not have a dedicated port for fxp0 because the devices are designed to provide the maximum number of ports for branch devices. However, when SRX devices are configured in a cluster, the secondary node cannot be directly managed unless it has either a local interface or the fxp0 port. To ease management of the SRX100 and SRX200 Series, the fe-0/0/6 port automatically becomes the fxp0 port. In the section Node-Specific Information later in this chapter, we discuss how to configure this port.
The fxp0 interface exists on the majority of Junos devices. This is due to the devices’ service-provider-like design. Fxp0 allows for secure out-of-band management, enabling administrators to access the device no matter what is happening on the network. Because of this, many capabilities and management services operate best through the fxp0 port. Tools such as NSM and Junos Space work best when talking to an fxp0 port, and updates for IDP and UTM are also best retrieved through this interface. As of Junos 12.1, this is no longer the case, and you are freed from this limitation.
Managing branch devices that are remote can often be a challenge. It might not be possible to directly connect to the backup node. This is especially an issue when using a management tool such as NSM or Junos Space. Luckily, Juniper created a way to tunnel a management connection to the secondary node through the first. This mode is called “cluster-master” mode.
This mode requires a single command to activate.
{secondary:node1}[edit]
root@SRX-HA-1# set chassis cluster network-management cluster-master

{secondary:node1}[edit]
root@SRX-HA-1# edit chassis cluster

{secondary:node1}[edit chassis cluster]
root@SRX-HA-1# show
reth-count 2;
heartbeat-threshold 3;
network-management {
    cluster-master;
}
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
}

{secondary:node1}[edit chassis cluster]
root@SRX-HA-1#
Configuring the Control Ports
Now that the devices are up and running, it’s time to get the two devices talking. There are two communication paths for the devices to talk over, and the first leads to the second. The control port is the first: by configuring the control port, it’s possible to get the devices communicating early on. Then, once the devices are in a cluster, the configuration is automatically synchronized, so the second path (the fabric link, covered later in this chapter) can be configured consistently across both members at once. This cuts the administrator’s work in half, as the cluster needs to be configured only once.
Different platforms have different requirements for configuring the control port. Table 7-2 lists each platform and the control port location. Because each platform has different subsystems under the hood, so to speak, there are different ways to configure the control port. The only device that requires manual configuration is the SRX5000. Some devices also support dual or redundant control ports.
Device | Control port | Description | Dual support? |
SRX100, SRX110, and SRX210 | fe-0/0/7 | Dedicated as a control port upon enabling clustering | No |
SRX220 | ge-0/0/7 | Dedicated as a control port upon enabling clustering | No |
SRX240 and SRX650 | ge-0/0/1 | Dedicated as a control port upon enabling clustering | No |
SRX1400 | ge-0/0/10 and optionally ge-0/0/11 | Dedicated as a control port upon enabling clustering | Yes |
SRX3000 | Both located on the SFB | No user configuration required | Yes |
SRX5000 | Located on the SPC | Manual configuration required | Yes |
When connecting control ports, you connect the control port from each device to the other device. It is not recommended that you join two primary devices together—it’s best to reboot the secondary and then connect the control port. On reboot, the two devices will begin to communicate.
Note
For all of the SRX devices, except the SRX5000, you can do this right after the initial cluster configuration. For the SRX5000 Series, two reboots are required.
The SRX5000 Series control ports are located on the SPC, because when Juniper was creating the SRX5000, the SPC was the only part that was created from scratch and the remaining parts were taken from the MX Series. Ultimately, locating the control ports on the SPC removes the control ports from other components while adding some additional resiliency. The SPC and its underlying traffic processing are physically separate from the control ports even though they are located on the same card. The SRX5000 must use fiber SFPs to connect the two chassis.
To configure the control ports on the SRX5000, the administrator first needs to determine which ports she wants to configure based on which FPC the control port is located within. Next, the administrator must identify the port number (either port 0 or port 1).
{primary:node0}[edit chassis cluster]
root@SRX5800A# set control-ports fpc 1 port 0
root@SRX5800A# set control-ports fpc 2 port 1
root@SRX5800A# show
control-link-recovery;
control-ports {
    fpc 1 port 0;
    fpc 2 port 1;
}
root@SRX5800A# commit
There is logic in how the control ports should be configured on the SRX5000s. The control ports can be on the same FPC, but ideally, the SRX should not be configured that way. If possible, do not place the control port on the same card as the CP or central point processor because the CP is used as a hop for the data link. If the FPC with the CP fails, and the control link is on it and it’s a single control link, the SRX cluster can go into split brain or dual mastership. Because of this, separating the two is recommended. So, if an administrator is going to utilize dual control links, it’s recommended that she place each control link on separate SPCs and the CP on a third SPC. This would require at least three SPCs, but this is the recommendation for the ultimate in HA.
Once the control links are up and running, and the secondary node
is rebooted and up and running, it’s time to check that the cluster is
communicating. Again, we go back to the show
chassis cluster status
command.
{primary:node0}
root> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 1 primary no no
node1 1 secondary no no
{primary:node0}
root>
Both devices should be able to see each other, as shown here, with one device being primary and the other secondary.
Next, because there are two devices, it’s possible to check communications between the two, this time using the show chassis cluster control-plane statistics command.
{primary:node0}
root> show chassis cluster control-plane statistics
Control link statistics:
Control link 0:
Heartbeat packets sent: 217
Heartbeat packets received: 21
Heartbeat packet errors: 0
Fabric link statistics:
Probes sent: 4286
Probes received: 0
Probe errors: 0
At this point in the cluster creation, you should see only heartbeat messages on the control link, such as under the statistic Heartbeat packets received. In the preceding output, 21 packets have been received. Typically, the number of heartbeat packets sent and received will not match, as one device started before the other and did not receive messages for a period of time. But once both the sent and received counters are incrementing consistently, everything on the control plane should be in order.
The SRX1400, SRX3000, and the SRX5000 are able to use two control links that are provided for redundancy only. In the event that one of the control links on the device fails, the second is utilized. But to use the second control link, an additional component is needed. The SRX3000 uses a component called the SCM, which is used to activate and control the secondary control link. On the SRX5000, a standard RE can be used. The RE needs to have Junos 10.0 or later loaded to operate the control link. Both the SCM and the secondary RE are loaded into the second RE port on the platform. These modules do not act as an RE or backup to the RE, but rather are used only for the backup control link.
Note
These components must be placed into the chassis while it is powered off. On boot, the secondary link will be up and functional.
A quick look at the output of the show chassis cluster control-plane statistics command shows the second control link working.
root > show chassis cluster control-plane statistics
Control link statistics:
Control link 0:
Heartbeat packets sent: 1114
Heartbeat packets received: 217
Heartbeat packet errors: 0
Control link 1:
Heartbeat packets sent: 1114
Heartbeat packets received: 217
Heartbeat packet errors: 0
Fabric link statistics:
Probes sent: 1575937
Probes received: 1575224
Probe errors: 0
A final configuration option needs to be set for the control link: control link recovery. Control link recovery allows for automated recovery of the secondary chassis in the event that the control link fails. If the single control link, or both control links, fail, the secondary device will go into the disabled state.
To recover, the device must be rebooted. The risk is that the device might not be able to see the primary on reboot, so if that occurs, dual mastership or split brain will result. The better option is to enable control link recovery. It only takes a single command to enable, as shown in the next example.
{primary:node0}[edit chassis cluster]
root# set control-link-recovery
{primary:node0}[edit chassis cluster]
root# show
control-link-recovery;
Once control link recovery is enabled, a user can manually reconnect the control link. After the control link has been up for about 30 seconds and the SRXs have determined that the link is healthy, the secondary node will reboot. After recovering from the reboot, the cluster will be up and synchronized and ready to operate. Although a reboot seems harsh for such a recovery, it is the best way to ensure that the backup node is up and completely operational.
On the data center SRXs, a feature called unified in-service software upgrade (ISSU) can be used. This method is a graceful upgrade method that allows for the SRXs to upgrade without losing sessions or traffic.
The process might take some time to complete because the kernel on the two devices must synchronize and the software must be updated. It is suggested that you have all of the redundancy groups on a single member in the cluster. The process is similar to the other, except the upgrade only needs to be run on one SRX.
{primary:node0}
root@SRX5800-1> request system software in-service-upgrade junos-srx5000-12.1X44.10-domestic.tgz reboot
The command will upgrade each node and reboot them as needed. No further commands are required.
There is one last option that the unified ISSU process can use: the no-old-master-upgrade command, which leaves the master in a nonupgraded state. This ensures that there is a working box should the software upgrade fail. After successful completion of the upgrade, the old master is manually upgraded, as shown here.
{primary:node0}
root@SRX5800-1> request system software in-service-upgrade junos-srx5000-12.1X44.D10-domestic.tgz no-old-master-upgrade

##next on the old master
{primary:node0}
root@SRX5800-1> request system software add junos-srx5000-12.1X44D10-domestic.tgz
{primary:node0}
root@SRX5800-1> request chassis cluster in-service-upgrade abort
{primary:node0}
root@SRX5800-1> request system reboot
If things do go wrong and both nodes are unable to complete the upgrade in the unified ISSU process, the upgraded node needs to be rolled back. This is simple. First, you must abort the unified ISSU process, then roll back the software on that node, and then reboot the system.
{primary:node0}
root@SRX5800-1> request chassis cluster in-service-upgrade abort
Exiting in-service-upgrade window
{primary:node0}
root@SRX5800-1> request system software rollback
{primary:node0}
root@SRX5800-1> request system reboot
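As suggested above, before starting a unified ISSU it can help to consolidate all of the redundancy groups onto the node you intend to keep passing traffic. One hedged way to do that is with manual failovers (the group numbers are illustrative):

{primary:node0}
root@SRX5800-1> request chassis cluster failover redundancy-group 1 node 0
{primary:node0}
root@SRX5800-1> request chassis cluster failover redundancy-group 0 node 0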
Configuring the Fabric Links
The second half of the chassis cluster communication equation is the fabric connection. Unlike the control link, the fabric link provides several functions and a great deal of value to the cluster, the most important being session synchronization. Without session synchronization, there would be little value to an SRX cluster. A second feature of the fabric link is the ability to forward traffic between the two chassis. The egress chassis is responsible for processing the traffic, so traffic is forwarded to the other cluster member only if the egress interface is on that chassis.
Each node in the chassis needs its own fabric interface configured. The interfaces should be directly connected to each other. Creating the fabric link between the two chassis requires the creation of a special interface called the fab interface. The fab interface is a special version of the aggregate Ethernet interface that allows for the binding of one or more interfaces into a special bundle. Interfaces are added to the fab interface with node 0’s fabric interface, called fab0, and node 1’s fabric interface, called fab1. Set the interface this way.
{primary:node0}[edit interfaces]
root# set fab0 fabric-options member-interfaces fe-0/0/4

{primary:node0}[edit interfaces]
root# set fab1 fabric-options member-interfaces fe-2/0/4

{primary:node0}[edit]
root# show interfaces
fab0 {
    fabric-options {
        member-interfaces {
            fe-0/0/4;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            fe-2/0/4;
        }
    }
}

{primary:node0}[edit]
root# run show interfaces terse
Interface           Admin Link Proto    Local                 Remote
ge-0/0/0            up    down
ge-0/0/1            up    down
fe-0/0/2            up    down
fe-0/0/3            up    down
fe-0/0/4            up    up
fe-0/0/4.0          up    up   aenet    --> fab0.0
fe-0/0/5            up    up
fe-0/0/6            up    up
fe-0/0/7            up    up
ge-2/0/0            up    down
ge-2/0/1            up    down
fe-2/0/2            up    down
fe-2/0/3            up    down
fe-2/0/4            up    up
fe-2/0/4.0          up    up   aenet    --> fab1.0
fe-2/0/5            up    up
fe-2/0/6            up    up
fe-2/0/7            up    up
fab0                up    up
fab0.0              up    up   inet     30.17.0.200/24
fab1                up    up
fab1.0              up    up   inet     30.18.0.200/24
fxp0                up    up
fxp0.0              up    up   inet     10.0.1.210/24
fxp1                up    up
fxp1.0              up    up   inet     129.16.0.1/2
                               tnp      0x1100001
fxp2                up    up
gre                 up    up
ipip                up    up
lo0                 up    up
lo0.16384           up    up   inet     127.0.0.1           --> 0/0
lo0.16385           up    up   inet     10.0.0.1            --> 0/0
                                        10.0.0.16           --> 0/0
                                        128.0.0.1           --> 0/0
                                        128.0.1.16          --> 0/0
                               inet6    fe80::224:dcff:fed4:e000
lo0.32768           up    up
lsi                 up    up
mtun                up    up
pimd                up    up
pime                up    up
pp0                 up    up
st0                 up    up
tap                 up    up
vlan                up    down

{primary:node0}[edit]
root#
As shown in the preceding output, interfaces fe-0/0/4 and fe-2/0/4 are members of an aenet bundle. Interface fe-0/0/4 is a member of fab0.0, and fe-2/0/4 is a member of fab1.0. If you look closely at fab0.0 and at fab1.0, each is given an internal IP address. The address is used for internal communication and does not need to be configured by the administrator.
You should verify that the two SRXs are talking over the fabric link; the probes sent and probes received counters under Fabric link statistics in the show chassis cluster control-plane statistics output should be increasing.
When the SRX needs to forward traffic across the data plane, it encapsulates the entire packet and then forwards it over the link. The fabric link is automatically configured using jumbo frames, or frames that are larger than the standard 1,514-byte frame. Juniper supports up to a 9,192-byte frame. The difficulty here is that the SRX cannot take a maximum size frame and then encapsulate it because it would be far too large to push over the fabric link, and the SRX is not able to fragment the packet. Therefore, it’s best to set the maximum transmission unit (MTU) on the SRX interfaces to less than 8,900 to ensure that the packets are able to pass over the fabric link.
Note
If you’re using an active/passive cluster, this should not be an issue.
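If chassis-to-chassis forwarding is expected, one simple hedged approach is to cap the MTU on the traffic-carrying interfaces below the threshold just mentioned; the interface and value here are illustrative.

{primary:node0}[edit]
root# set interfaces reth0 mtu 8500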
As of Junos 12.1X44, all SRX platforms are able to use redundant fabric link ports, unlike control link redundancy, which is restricted to the data center SRXs only. Adding a second link is identical to creating the first link and it also requires a second link to be physically cabled between the two chassis. For some platforms, such as the SRX100 or SRX210, that might seem excessive because adding a second fabric link would mean half of the ports on the chassis would be taken up by links for HA: one control link, two fabric links, and a management link. Only three are required, but the fourth is optional.
There are good reasons to add a redundant fabric link on the smaller SRX devices. The first is that it adds an important level of redundancy on the critical links between the SRXs and helps to prevent split brain, a critical requirement, especially in a remote branch location. (We discuss dealing with split brain further in the sections Fault Monitoring and Troubleshooting and Operation.)
To configure the second fabric link, use the following commands:
{primary:node0}[edit interfaces]
root# set fab0 fabric-options member-interfaces fe-0/0/5

{primary:node0}[edit interfaces]
root# set fab1 fabric-options member-interfaces fe-2/0/5

{primary:node0}[edit]
root# show interfaces
fab0 {
    fabric-options {
        member-interfaces {
            fe-0/0/4;
            fe-0/0/5;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            fe-2/0/4;
            fe-2/0/5;
        }
    }
}

{primary:node0}
root> show interfaces terse
Interface           Admin Link Proto    Local                 Remote
ge-0/0/0            up    down
ge-0/0/1            up    down
fe-0/0/2            up    down
fe-0/0/3            up    down
fe-0/0/4            up    up
fe-0/0/4.0          up    up   aenet    --> fab0.0
fe-0/0/5            up    up
fe-0/0/5.0          up    up   aenet    --> fab0.0
fe-0/0/6            up    up
fe-0/0/7            up    up
ge-2/0/0            up    down
ge-2/0/1            up    down
fe-2/0/2            up    down
fe-2/0/3            up    down
fe-2/0/4            up    up
fe-2/0/4.0          up    up   aenet    --> fab1.0
fe-2/0/5            up    up
fe-2/0/5.0          up    up   aenet    --> fab1.0
fe-2/0/6            up    up
fe-2/0/7            up    up
fab0                up    up
fab0.0              up    up   inet     30.17.0.200/24
fab1                up    up
fab1.0              up    up   inet     30.18.0.200/24
fxp0                up    up
fxp0.0              up    up   inet     10.0.1.210/24
fxp1                up    up
fxp1.0              up    up   inet     129.16.0.1/2
                               tnp      0x1100001
fxp2                up    up
gre                 up    up
ipip                up    up
lo0                 up    up
lo0.16384           up    up   inet     127.0.0.1           --> 0/0
lo0.16385           up    up   inet     10.0.0.1            --> 0/0
                                        10.0.0.16           --> 0/0
                                        128.0.0.1           --> 0/0
                                        128.0.1.16          --> 0/0
                               inet6    fe80::224:dc0f:fcd4:e000
lo0.32768           up    up
lsi                 up    up
mtun                up    up
pimd                up    up
pime                up    up
pp0                 up    up
ppd0                up    up
ppe0                up    up
st0                 up    up
tap                 up    up
vlan                up    down

{primary:node0}
root>
In this output’s configuration, each fabric link has a second member interface added to it. So, fe-0/0/5 is added to fab0, and fe-2/0/5 is added to fab1. Because a fab link is like an aggregate Ethernet interface, the configuration also looks similar. Note that packets will only pass over one fab link at a time, as the second fab link is only used as a backup.
Configuring the Switching Fabric Interface
The branch series of devices has the ability to perform local switching. However, once you enter chassis cluster mode, what do you do when you still need to provide local switching? The branch SRX devices now have the ability to share a single switching domain across two devices. This is excellent for small branches that need to offer switching to hosts without even needing to add standalone switches. There are a few things to take into consideration before you enable switching in your cluster.
First, to enable switching in a cluster, you need to dedicate one interface on each SRX to connect to the other cluster member. This provides a dedicated path between the switching planes of the two devices. On some of the smaller SRXs this would eat up another valuable port, which is why the feature is only supported on the SRX240 and up, where it makes more sense to enable this configuration. For the SRX550 or the SRX650 with G-PIMs, you need to create a switch fabric interface between each G-PIM that you want to bridge switching between. Also, Q-in-Q features are not supported in chassis cluster mode due to hardware limitations.
{primary:node1}[edit]
root@SRX-650# set interfaces swfab0 fabric-options member-interfaces ge-2/0/5

{primary:node1}[edit]
root@SRX-650# set interfaces swfab1 fabric-options member-interfaces ge-11/0/5

{primary:node1}[edit]
root@SRX-650# show interfaces
-- snip --
swfab0 {
    fabric-options {
        member-interfaces {
            ge-2/0/5;
        }
    }
}
swfab1 {
    fabric-options {
        member-interfaces {
            ge-11/0/5;
        }
    }
}

{primary:node1}[edit]
root@SRX-650# run show chassis cluster ethernet-switching statistics
Switch fabric link statistics:
    Probe state : UP
    Probes sent: 1866
    Probes received: 1871
    Probe recv errors: 0
    Probe send errors: 0
Node-Specific Information
A chassis cluster HA configuration takes two devices and makes them look as though they are one. However, the administrator might still want some elements to be unique between the cluster members, such as the hostname and the IP address on fxp0, which are typically unique per device. No matter what unique configuration is required or desired, it’s possible to achieve it by using Junos groups. Groups provide the ability to create a configuration and apply it anywhere inside the configuration hierarchy. It’s an extremely powerful feature, and here we use it to create a group for each node.
Each group is named after the node it is applied to, and it’s a special naming that the SRX looks for. After commit, only the group that matches the local node name is applied, as shown in the following configuration:
{primary:node0}[edit groups]
root# show
node0 {
    system {
        host-name SRX210-A;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                    address 10.0.1.210/24;
                }
            }
        }
    }
}
node1 {
    system {
        host-name SRX210-B;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                    address 10.0.1.211/24;
                }
            }
        }
    }
}

{primary:node0}[edit groups]
root#
In this configuration example, there are two groups, created under the groups hierarchy, which is at the top of the configuration tree. The node0 group has its hostname set as SRX210-A, and node1 has its hostname set as SRX210-B. To apply the groups, the administrator needs to use the apply-groups command at the root of the configuration. When the configuration is committed to the device, Junos will see the command and merge the correct group to match the node name.
{primary:node0}[edit]
root# set apply-groups "${node}"

{primary:node0}[edit]
root# show apply-groups
## Last changed: 2010-03-31 14:25:09 UTC
apply-groups "${node}";

{primary:node0}[edit]
root# show interfaces | display inheritance
fab0 {
    fabric-options {
        member-interfaces {
            fe-0/0/4;
            fe-0/0/5;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            fe-2/0/4;
            fe-2/0/5;
        }
    }
}
##
## 'fxp0' was inherited from group 'node0'
##
fxp0 {
    ##
    ## '0' was inherited from group 'node0'
    ##
    unit 0 {
        ##
        ## 'inet' was inherited from group 'node0'
        ##
        family inet {
            ##
            ## '10.0.1.210/24' was inherited from group 'node0'
            ##
            address 10.0.1.210/24;
        }
    }
}

{primary:node0}[edit]
root#
To apply the configurations to the correct node, a special command was used: set apply-groups "${node}". The variable "${node}" is interpreted as the local node name. Next in the output example is the show | display inheritance command, which shows the components of the configuration that are inherited from the group. Each inherited component has three lines above it that all begin with ##; the second line specifies from which group the value is inherited.
As discussed, the fxp0 management port can be configured like a standard interface, providing a management IP address for each device. It's also possible to configure a shared IP address that always lands on the primary RE, so the administrator does not have to figure out which RE is the master before connecting. The administrator simply connects to what is called the master-only IP.
To do so, the master-only tag is added to the end of the address statement. This address is configured in the main configuration and not in the groups, because it applies to both devices and there is no need to place it in the groups.
{primary:node0}[edit]
root# set interfaces fxp0.0 family inet address 10.0.1.212/24 master-only

{primary:node0}[edit]
root# show interfaces fxp0
unit 0 {
    family inet {
        address 10.0.1.212/24 {
            master-only;
        }
    }
}

{primary:node0}
root@SRX210-A> show interfaces fxp0 terse
Interface               Admin Link Proto    Local                 Remote
fxp0                    up    up
fxp0.0                  up    up   inet     10.0.1.210/24
                                            10.0.1.212/24

{primary:node0}
root@SRX210-A>
Configuring Heartbeat Timers
The SRX sends heartbeat messages on both the control and data links to ensure that the links are up and running. Although the device itself could look to see if the link is up or down, that is not enough to validate it. Heartbeat messages provide three layers of validation: link, daemon, and internal paths.
The message requires the two jsrpd daemons to successfully communicate, ensuring that the other daemon isn’t in a state of disarray and validating the internal paths between the two daemons, including the physical link and the underlying subsystems. For the data link, the packets are even sent through the data plane, validating that the flow daemons are communicating properly.
Each platform has default heartbeat timers that are appropriate for that device. The differences come down to how reliably the kernel can guarantee processing time to the jsrpd daemon. Generally, the larger the device, the larger the processor on the RE; the larger the processor, the faster it can process tasks; and the faster the device can process tasks, the quicker it can move on to the next task.
Note
This raises the question of how fast an administrator needs a device to fail over. Of course, the world would like zero downtime and guaranteed reliability for every service, but the practical answer is as fast as the device can fail over while maintaining stability.
Table 7-3 lists the various configuration options for heartbeat timers based on the SRX platform. The branch platforms use a higher timer because they use slower processors to ensure stability at the branch. Although a faster failover might be desired, stability is the most important goal. If the device fails over but is lost in the process, it is of no use.
Platform | Control plane timer min (ms) | Control plane timer max (ms) | Missed heartbeat threshold min (heartbeats) | Missed heartbeat threshold max (heartbeats) | Min missing peer detection time (sec) |
SRX100 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX110 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX210 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX220 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX240 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX550 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX650 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX1400 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX3400 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX3600 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX5600 | 1,000 | 2,000 | 3 | 8 | 3 |
SRX5800 | 1,000 | 2,000 | 3 | 8 | 3 |
The SRXs have a default failover detection time of three seconds, and this can easily be modified on any platform. There are two options to set: the heartbeat threshold and the heartbeat interval. Many networks actually need to increase the detection time; for example, if the surrounding switches use high STP convergence timers, the SRX's failover detection time might need to be increased to match.
{primary:node0}[edit chassis cluster]
root@SRX210-A# set heartbeat-interval 2000

{primary:node0}[edit chassis cluster]
root@SRX210-A# set heartbeat-threshold 8

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
control-link-recovery;
heartbeat-interval 2000;
heartbeat-threshold 8;

{primary:node0}[edit chassis cluster]
root@SRX210-A#
Redundancy Groups
Redundancy groups are the core of the failover mechanism for the SRX, and they are used for both the control and data planes. Any SRX cluster has at least 1 redundancy group and can have up to 128 (including redundancy group 0). How many you deploy, of course, varies by platform and deployment scenario.
A redundancy group is a collection of objects, and it represents which node is the owner of the objects. The objects are either interfaces or the control plane. Whichever node is the primary owner for the redundancy group is the owner of the items in the redundancy group. On ScreenOS firewalls this was called a VSD (virtual security device). When a cluster is created, redundancy group 0 is also created by default. No additional configuration is required to make it work.
Each node is given a priority within a redundancy group, and the higher-priority device is given mastership over the group. This behavior depends on a few options. By default, a node with a higher priority will not preempt a lower-priority node: if the lower-priority node owns a redundancy group and the higher-priority node later comes online, ownership is not handed back. For that to happen, the preempt option must be enabled, in which case the higher-priority device takes ownership of the redundancy group once it is healthy enough to do so. Most organizations do not use this option; they prefer to move the redundancy group back manually after the failover has been investigated.
Creating a redundancy group is the same for the control or data plane, with the only difference seen when configuring the interfaces. Let's start with redundancy group 0. Remember that configuring it is not strictly required, but doing so sets the node priorities explicitly; if the priorities are not set, they default to 1.
Note
Most organizations use node 0 as the higher-priority device. It’s best when configuring the cluster to keep the configuration logical. When troubleshooting in the middle of the night, it’s great to know that node 0 should be the higher-priority node and that it is the same across the whole organization.
Let’s create the redundancy group:
The default:

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                1          primary        no       no
    node1                1          secondary      no       no

{primary:node0}
root@SRX210-A>

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 0 node 0 priority 254

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 0 node 1 priority 1

{primary:node0}[edit chassis cluster]
root@SRX210-A# show redundancy-group 0
node 0 priority 254;
node 1 priority 1;

root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no

{primary:node0}
root@SRX210-A>
Now let's create redundancy group 1. The most common firewall deployment for the SRX is a Layer 3-routed active/passive deployment. This means the firewalls are configured as routers, and one device is active while the other is passive. To accomplish this, a single data plane redundancy group is created. It uses the same commands as redundancy group 0, except for the name redundancy-group 1.
{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 node 0 priority 254

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 node 1 priority 1

{primary:node0}[edit chassis cluster]
root@SRX210-A# set reth-count 2

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
control-link-recovery;
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no
Redundancy group: 1 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no

{primary:node0}
root@SRX210-A>
To keep things consistent, redundancy group 1 also gives node 0 a priority of 254 and node 1 a priority of 1. To be able to commit the configuration, at least one reth has to be enabled (it’s shown here but is further discussed in the next section). After commit, the new redundancy group can be seen in the cluster status. It looks exactly like redundancy group 0 and contains the same properties.
When creating an active/active configuration and utilizing redundant Ethernet interfaces, the SRX needs to have at least two redundancy groups. Each node in the cluster will have an active redundancy group on it. You configure this redundancy group in the same way as you did the other redundancy group, except that the other node will be configured with a higher priority. In this case, node 1 will have priority 254 and node 0 will have priority 1.
{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 2 node 0 priority 1

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 2 node 1 priority 254

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
control-link-recovery;
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 2 {
    node 0 priority 1;
    node 1 priority 254;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no
Redundancy group: 1 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no
Redundancy group: 2 , Failover count: 0
    node0                1          secondary      no       no
    node1                254        primary        no       no

{primary:node0}
root@SRX210-A>
Now, three redundancy groups are listed. The newest redundancy group, redundancy group 2, has node 1 as its primary and node 0 as its secondary. In this case, all of the traffic for redundancy group 2 will be flowing through node 1, and redundancy group 1’s traffic will be flowing through node 0. In the event of a failover each node has a mirrored state table of the peer device so it is possible for either node to take over all redundancy groups.
Note
It’s important to plan for the possibility that a single device might have to handle all of the traffic for all of the redundancy groups. If you don’t plan for this, the single device can be overwhelmed.
Each redundancy group needs a minimum of one reth in it to operate. Because of this, the total number of redundancy groups is tied to the total number of reths per platform, plus one for redundancy group 0. Table 7-4 lists the number of supported redundancy groups per SRX platform.
Platform | Redundancy groups |
SRX100 | 9 |
SRX110 | 9 |
SRX210 | 9 |
SRX220 | 9 |
SRX240 | 25 |
SRX550 | 69 |
SRX650 | 69 |
SRX1400 | 128 |
SRX3400 | 128 |
SRX3600 | 128 |
SRX5600 | 128 |
SRX5800 | 128 |
As previously discussed, it's possible to have the node with the higher priority preemptively take over the redundancy group; by default, the administrator must manually fail the redundancy group over to the other node. Configuring preempt requires only a single command under the redundancy group, as shown here. Redundancy groups also have a default hold-down timer, the time the redundancy group must wait before it can preempt. On redundancy groups 1 and greater, it is set to one second; on redundancy group 0, it is set to 300 seconds, or 5 minutes, to prevent instability on the control plane.
{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 preempt

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
control-link-recovery;
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
    preempt;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no
Redundancy group: 1 , Failover count: 1
    node0                254        primary        yes      no
    node1                1          secondary      yes      no

{primary:node0}
root@SRX210-A>
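Since most organizations leave preempt disabled, the failback is usually performed by hand from operational mode. The following is a minimal sketch of the commands typically used; the redundancy group and target node are illustrative, and the reset form clears the resulting manual failover flag so that automatic failovers can occur again:

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 1 node 1

{primary:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 1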
A hold-down timer can be set to prevent unnecessary failovers in a chassis cluster; used in conjunction with preempt, it is the number of seconds the redundancy group must wait before it can fail over. As previously mentioned, default hold-down timers are configured: for redundancy group 1 it's 1 second, and for redundancy group 0 it's 300 seconds. You can customize the timer anywhere between 0 and 1,800 seconds, but best practice is to never set redundancy group 0 below 300 seconds, to prevent instability on the control plane.
It's best to set a value that is safe for the redundancy groups, giving the network time to be ready for the failover; in the event of a hard failure on the other node, the redundancy group will still fail over as fast as possible.
{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 hold-down-interval 5

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
control-link-recovery;
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
    preempt;
    hold-down-interval 5;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#
Integrating the Cluster into Your Network
Once the SRXs are talking to each other and their configurations are correctly syncing, it is time to integrate the devices into your network. It is best practice to wait until the cluster is enabled before configuring the rest of the network; not only does this save time, it also reduces the number of configuration steps needed, because the configuration is shared across both devices. To use a cluster in your network, you need to create a special interface called a reth (often pronounced like wreath). This interface is shared between the devices. Although there are other, more advanced methods to add a cluster into a network, the suggested design is to use an active/active cluster.
Configuring Interfaces
A firewall without interfaces is like a car without tires—it’s just not going to get you very far. In the case of chassis clusters, there are two different options: the reth, and the local interface. A reth is a special type of interface that integrates the features of an aggregate Ethernet interface together with redundancy groups.
Before redundant Ethernet interfaces can be created, the total number of reth interfaces in the chassis must be specified. This is required because the reth is effectively an aggregate Ethernet interface, and the interface needs to be provisioned before it can be used.
Note
It is suggested that you provision only the number of reth interfaces that are actually required, to conserve resources.
Let’s set the number of interfaces in the chassis and then move on to create redundancy groups 1+ and configure the interfaces.
{primary:node0}[edit chassis cluster]
root@SRX210-A# set reth-count 2

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
control-link-recovery;
reth-count 2;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show interfaces terse | match reth
reth0           up    up
reth1           up    up
Each SRX platform has a maximum number of reths that it can support, as listed in Table 7-5.
Platform | Redundant Ethernet interfaces |
SRX100 | 8 |
SRX110 | 8 |
SRX210 | 8 |
SRX220 | 8 |
SRX240 | 24 |
SRX550 | 58 |
SRX650 | 68 |
SRX1400 | 128 |
SRX3400 | 128 |
SRX3600 | 128 |
SRX5600 | 128 |
SRX5800 | 128 |
Now let’s create a reth. When using a reth, each member of the cluster has one or more local interfaces that participate in the reth.
{primary:node0}[edit interfaces]
root@SRX210-A# set fe-0/0/2 fastether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX210-A# set fe-2/0/2 fastether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX210-A# set reth0.0 family inet address 172.16.0.1/24

{primary:node0}[edit]
root@SRX210-A# set interfaces reth0 redundant-ether-options redundancy-group 1

{primary:node0}[edit interfaces]
root@SRX210-A# show
fe-0/0/2 {
    fastether-options {
        redundant-parent reth0;
    }
}
fe-2/0/2 {
    fastether-options {
        redundant-parent reth0;
    }
}
fab0 {
    fabric-options {
        member-interfaces {
            fe-0/0/4;
            fe-0/0/5;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            fe-2/0/4;
            fe-2/0/5;
        }
    }
}
fxp0 {
    unit 0 {
        family inet {
            address 10.0.1.212/24 {
                master-only;
            }
        }
    }
}
reth0 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
            address 172.16.0.1/24;
        }
    }
}

{primary:node0}[edit]
root@SRX210-A#
In this configuration example, interfaces fe-0/0/2 and fe-2/0/2 have reth0 specified as their parent. Then the reth0 interface is specified as a member of redundancy group 1, and finally the interface is given an IP address. From here the interface can be configured with a zone so that it can be used in security policies for passing network traffic.
After commit, there are two places to validate that the interface is functioning properly, as shown in the following output. First, the user can look at the interface listing to show the child links and also the reth itself. Second, under the chassis cluster status, Junos shows if the interface is up or not. The reason to use the second method of validation is that although the child links might be physically up, the redundancy groups might have a problem, and the interface could be down as far as jsrpd is concerned (we discussed this in the section Cluster ID in this chapter).
{primary:node0}
root@SRX210-A> show interfaces terse | match reth0
fe-0/0/2.0      up    up    aenet    --> reth0.0
fe-2/0/2.0      up    up    aenet    --> reth0.0
reth0           up    up
reth0.0         up    up    inet     172.16.0.1/24

{primary:node0}
root@SRX210-A> show chassis cluster interfaces
Control link 0 name: fxp1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Down        Not configured

{primary:node0}
root@SRX210-A>
With the data center SRX firewalls, it’s possible to utilize multiple child links per node in the cluster, meaning that each node can have up to eight links configured together for its reth interface. The requirement for this to work is that both nodes must have the same number of links on each chassis. It works exactly like a traditional reth where only one chassis will have its links active, and the secondary node’s links are still waiting until a failover occurs. Configuring this is similar to what was done before; the noted difference is that additional interfaces are made child members of the reth.
{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-6/2/0 gigether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-6/3/0 gigether-options redundant-parent reth1

{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-18/2/0 gigether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-18/3/0 gigether-options redundant-parent reth1

{primary:node0}[edit interfaces]
root@SRX5800-1# show interfaces
xe-6/0/0 {
    gigether-options {
        redundant-parent reth0;
    }
}
xe-6/1/0 {
    gigether-options {
        redundant-parent reth1;
    }
}
xe-6/2/0 {
    gigether-options {
        redundant-parent reth0;
    }
}
xe-6/3/0 {
    gigether-options {
        redundant-parent reth1;
    }
}
xe-18/0/0 {
    gigether-options {
        redundant-parent reth0;
    }
}
xe-18/1/0 {
    gigether-options {
        redundant-parent reth1;
    }
}
xe-18/2/0 {
    gigether-options {
        redundant-parent reth0;
    }
}
xe-18/3/0 {
    gigether-options {
        redundant-parent reth1;
    }
}
reth0 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
            address 1.0.0.1/16;
        }
    }
}
reth1 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
            address 2.0.0.1/16;
        }
    }
}

{primary:node0}[edit]
root@SRX5800-1#

{primary:node0}
root@SRX5800-1> show interfaces terse | match reth
xe-6/0/0.0      up    up    aenet    --> reth0.0
xe-6/1/0.0      up    up    aenet    --> reth1.0
xe-6/2/0.0      up    down  aenet    --> reth0.0
xe-6/3/0.0      up    down  aenet    --> reth1.0
xe-18/0/0.0     up    up    aenet    --> reth0.0
xe-18/1/0.0     up    up    aenet    --> reth1.0
xe-18/2/0.0     up    up    aenet    --> reth0.0
xe-18/3/0.0     up    up    aenet    --> reth1.0
reth0           up    up
reth0.0         up    up    inet     1.0.0.1/16
reth1           up    up
reth1.0         up    up    inet     2.0.0.1/16

{primary:node0}
root@SRX5800-1> show chassis cluster interfaces
Control link 0 name: em0
Control link 1 name: em1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1

{primary:node0}
root@SRX5800-1>
As seen here, the configuration is identical except that additional interfaces are added as members of the reth. To the switch it connects to, the interface appears as an aggregate Ethernet interface, link aggregation group, or EtherChannel, depending on the vendor. It's also possible to run LACP on the reth.
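Where the connected switch runs LACP on its aggregate, LACP can also be enabled under the reth's redundant-ether-options. The following is a minimal sketch, reusing reth0 from the example above; whether to run active or passive mode, and which periodic rate to use, are choices for your environment rather than requirements:

{primary:node0}[edit interfaces]
root@SRX5800-1# set reth0 redundant-ether-options lacp active

{primary:node0}[edit interfaces]
root@SRX5800-1# set reth0 redundant-ether-options lacp periodic fast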
When a failover occurs to the secondary node, that node must announce to the world that it is now the owner of the MAC address associated with the reth interface (because the reth's MAC is shared between nodes). It does this using gratuitous ARPs (GARPs), which are ARPs broadcast without being specifically requested. Once a GARP is sent, the local switch can update its MAC table to map which port the MAC address is associated with. By default, the SRX sends four GARPs per reth on a failover. These are sent from the control plane and out through the data plane. The number of GARPs sent is configured on a per-redundancy-group basis, using the set gratuitous-arp-count command with a parameter between 1 and 16.
{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set gratuitous-arp-count 5

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# show
node 0 priority 254;
node 1 priority 1;
gratuitous-arp-count 5;

{primary:node0}[edit]
root@SRX210-A#
One last item to mention is the use of local interfaces. A local interface is not bound or configured to a redundancy group; it’s exactly what the name means: a local interface. It is configured like any traditional type of interface on a Junos device and is used in an active/active scenario. It does not have a backup interface on the second device.
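For comparison, a minimal sketch of a local interface is shown below; the interface and address are purely illustrative. Nothing ties it to a redundancy group, so if its node fails, reachability through it must be recovered by other means, such as dynamic routing:

{primary:node0}[edit interfaces]
root@SRX210-A# set fe-0/0/7 unit 0 family inet address 192.168.50.1/24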
Fault Monitoring
“In the event of a failure, your seat cushion may be used as a flotation device.” If your plane were to crash and you were given notice, you would take the appropriate action to prevent disaster. When working with a chassis cluster, an administrator wants to see the smoke before the fire. That is what happens when an administrator configures monitoring options in the chassis cluster. The administrator is looking to see if the plane is going down so that she can take evasive action before it’s too late. By default, the SRX monitors for various internal failures such as hardware and software issues. But what if other events occur, such as interfaces failing or upstream gateways going away? If the administrator wants the SRX to react to these events, she must configure it to do so.
The SRX monitoring options are configured on a per-redundancy-group basis, meaning that if specific items were to fail, that redundancy group can fail over to the other chassis. In complex topologies, this gives the administrator extremely flexible options on what to fail over and when. Two integrated features can be used to monitor the redundancy groups: interface monitoring and IP monitoring.
There are two situations the SRXs can be in when a failure occurs. In the first, the SRXs are communicating and both nodes in the cluster are functional. In this case, a failover between the two nodes is extremely fast because they can quickly transfer responsibility for passing traffic between them. In the second, the two nodes have lost communication, perhaps because of a loss of power or another failure. Here, all heartbeats between the chassis must be missed before the secondary node can take over for the primary, which can take anywhere from 3 to 16 seconds, depending on the platform.
In this section, each failure scenario is outlined so that the administrator can gain a complete understanding of what to expect if or when a failure occurs.
Interface Monitoring
Interface monitoring monitors the physical status of an interface. It checks to see if the interface is in an up or down state. When one or more monitored interfaces fail, the redundancy group fails over to the other node in the cluster.
A failover is triggered when a specific weight is met, in this case 255. The weight of 255 is the redundancy group threshold, which is shared between interface monitoring and IP monitoring. Once enough monitored interfaces have failed to reach this weight, the redundancy group fails over. In most situations, interface monitoring is configured so that the failure of a single interface fails over the entire redundancy group, but it can also be configured so that two interfaces must fail. In this first configuration, only one interface needs to fail to initiate a failover.
{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-0/0/2 weight 255

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-2/0/2 weight 255

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# show
node 0 priority 254;
node 1 priority 1;
interface-monitor {
    fe-0/0/2 weight 255;
    fe-2/0/2 weight 255;
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A#

root@SRX210-A> show chassis cluster interfaces
Control link 0 name: fxp1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Down        Not configured
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    fe-2/0/2          255       Up        1
    fe-0/0/2          255       Up        1

{primary:node0}
root@SRX210-A>
In this example, interfaces fe-0/0/2 and fe-2/0/2 are configured with a weight of 255. In the event that either interface fails, the redundancy group will fail over.
In the next example, the interface has failed. Node 0 immediately becomes secondary, and its priority for redundancy group 1 drops to zero, meaning it will only be used as a last resort as primary for that group. Once the cable is restored, the priorities return to normal.
{primary:node0}
root@SRX210-A> show chassis cluster interfaces
Control link 0 name: fxp1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Down        Not configured
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    fe-2/0/2          255       Up        1
    fe-0/0/2          255       Down      1

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no
Redundancy group: 1 , Failover count: 2
    node0                0          secondary      no       no
    node1                1          primary        no       no

{primary:node0}
root@SRX210-A>
In the next example, a second reth (reth1) is configured, and each monitored interface is given a weight of 128, so a single interface failure is no longer enough to trigger a failover:
{primary:node0}[edit]
root@SRX210-A# set interfaces fe-0/0/3 fastether-options redundant-parent reth1

{primary:node0}[edit]
root@SRX210-A# set interfaces fe-2/0/3 fastether-options redundant-parent reth1

{primary:node0}[edit]
root@SRX210-A# set interfaces reth0 redundant-ether-options redundancy-group 1

{primary:node0}[edit]
root@SRX210-A# set interfaces reth1 redundant-ether-options redundancy-group 1

{primary:node0}[edit]
root@SRX210-A# set interfaces reth1.0 family inet address 172.17.0.1/24

{primary:node0}[edit]
root@SRX210-A# show interfaces
## Truncated to only show these interfaces
fe-0/0/3 {
    fastether-options {
        redundant-parent reth1;
    }
}
fe-2/0/3 {
    fastether-options {
        redundant-parent reth1;
    }
}
reth1 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
            address 172.17.0.1/24;
        }
    }
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-0/0/2 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-2/0/2 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# show
node 0 priority 254;
node 1 priority 1;
interface-monitor {
    fe-0/0/2 weight 128;
    fe-2/0/2 weight 128;
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster interfaces
Control link 0 name: fxp1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    fe-2/0/2          128       Up        1
    fe-0/0/2          128       Up        1

{primary:node0}
root@SRX210-A>
Both interfaces must fail to trigger a failover. The next sequence shows what happens when node 0 loses one interface from each of its reths, causing a failover to node 1.
{primary:node0}[edit]
root@SRX210-A# show chassis cluster redundancy-group 1
node 0 priority 254;
node 1 priority 1;
interface-monitor {
    fe-0/0/2 weight 128;
    fe-0/0/3 weight 128;
}

{primary:node0}[edit]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no
Redundancy group: 1 , Failover count: 3
    node0                254        primary        no       no
    node1                1          secondary      no       no

{primary:node0}
root@SRX210-A> show chassis cluster interfaces
Control link 0 name: fxp1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    fe-0/0/3          128       Up        1
    fe-0/0/2          128       Up        1

{primary:node0}
root@SRX210-A>

{primary:node0}
root@SRX210-A> show chassis cluster interfaces
Control link 0 name: fxp1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    fe-0/0/3          128       Down      1
    fe-0/0/2          128       Down      1

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                254        primary        no       no
    node1                1          secondary      no       no
Redundancy group: 1 , Failover count: 4
    node0                0          secondary      no       no
    node1                1          primary        no       no

{primary:node0}
root@SRX210-A>
Here it required both interfaces to go down to fail over to the other node.
Only physical interfaces can be monitored. The reths themselves can’t be monitored.
IP Monitoring
IP monitoring allows for the monitoring of upstream gateways. The ping probe validates the entire end-to-end path from the SRX to the remote node and back. The feature is typically used to monitor the SRX's next-hop gateway, ensuring that the gateway is ready to accept packets from the SRX. This is key, as the SRX's link to its local switch could be working while the upstream devices are not.
IP monitoring is configured per redundancy group and has some similarities to interface monitoring. It also uses weights, and when the weights add up to exceed the redundancy group weight, a failover is triggered. But with IP monitoring, the SRX is monitoring remote gateways, not interfaces.
In each redundancy group there are four global options that affect all of the hosts that are to be monitored:
The first option is the global weight. This is the weight that is subtracted from the redundancy group weight for all of the hosts being monitored.
The second option is the global threshold. This is the number that needs to be met or exceeded by all of the cumulative weights of the monitored IPs to trigger a failover.
The last two options control the ping retries. The retry count is the number of failed probes required before a monitored IP is declared unreachable; the minimum setting is five retries. The retry interval specifies the number of seconds between probes; the default is one second.
Here the configuration options can be seen using the help prompt.
root@SRX5800-1# set redundancy-group 1 ip-monitoring ?
Possible completions:
+ apply-groups Groups from which to inherit configuration data
+ apply-groups-except Don't inherit configuration data from these groups
> family Define protocol family
global-threshold Define global threshold for IP monitoring (0..255)
global-weight Define global weight for IP monitoring (0..255)
retry-count Number of retries needed to declare reachablity failure
(5..15)
retry-interval Define the time interval in seconds between retries.
(1..30)
{primary:node0}[edit chassis cluster]
root@SRX5800-1#
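For reference, the global options that appear in the output of the next example can be set directly under the redundancy group. This is a minimal sketch; the retry values are arbitrary choices within the ranges shown in the help output above:

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring global-weight 255

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring global-threshold 255

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring retry-count 5

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring retry-interval 3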
These IP monitoring options can be overwhelming, but they are designed to give the user more flexibility. The redundancy group can be configured to fail over if one or more of the monitored IPs fail or if a combination of the monitored IPs and interfaces fail.
In the next example, two monitored IPs are configured, and both of them must fail to trigger a redundancy group failover. The SRX uses routing to resolve which interface should be used to ping each remote host (as of Junos 10.1, the probes can also cross virtual routers).
{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring family inet 1.2.3.4 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring family inet 1.3.4.5 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# show
node 0 priority 200;
node 1 priority 100;
ip-monitoring {
    global-weight 255;
    global-threshold 255;
    family {
        inet {
            1.2.3.4 weight 128;
            1.3.4.5 weight 128;
        }
    }
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1#

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# run show chassis cluster ip-monitoring status
node0:
--------------------------------------------------------------------------
Redundancy group: 1
IP address          Status        Failure count    Reason
1.3.4.5             unreachable   1                redundancy-group state unknown
1.2.3.4             unreachable   1                redundancy-group state unknown

node1:
--------------------------------------------------------------------------
Redundancy group: 1
IP address          Status        Failure count    Reason
1.3.4.5             unreachable   1                redundancy-group state unknown
1.2.3.4             unreachable   1                redundancy-group state unknown

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# run show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                200        primary        no       no
    node1                100        secondary      no       no
Redundancy group: 1 , Failover count: 1
    node0                0          primary        no       no
    node1                0          secondary      no       no

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1#
The next example uses a combination of both IP monitoring and interface monitoring, and it shows how the combined weight of the two triggers a failover.
{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# show
node 0 priority 200;
node 1 priority 100;
interface-monitor {
    xe-6/1/0 weight 255;
}
ip-monitoring {
    global-weight 255;
    global-threshold 255;
    family {
        inet {
            1.2.3.4 weight 128;
        }
    }
}

{primary:node0}[edit]
root@SRX5800-1# run show chassis cluster status
Cluster ID: 1
Node                Priority        Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                200        primary        no       no
    node1                100        secondary      no       no
Redundancy group: 1 , Failover count: 2
    node0                200        secondary      no       no
    node1                100        primary        no       no

{primary:node0}[edit]
root@SRX5800-1# run show chassis cluster ip-monitoring status
node0:
--------------------------------------------------------------------------
Redundancy group: 1
IP address          Status        Failure count    Reason
1.2.3.4             unreachable   1                redundancy-group state unknown

node1:
--------------------------------------------------------------------------
Redundancy group: 1
IP address          Status        Failure count    Reason
1.2.3.4             unreachable   1                redundancy-group state unknown

{primary:node0}[edit]
root@SRX5800-1# run show chassis cluster interfaces
Control link 0 name: em0
Control link 1 name: em1
Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1
    reth2        Down        1
    reth3        Up          1
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    xe-6/1/0          128       Up        1

{primary:node0}[edit]
root@SRX5800-1#
The ping for IP monitoring is sourced from the node that is currently primary for the reth, using the IP address configured on the specified interface. Optionally, a secondary IP address can be configured, which causes an additional ping to be sourced from that address out of the backup node's interface, allowing the administrator to check the backup path from the secondary node. This ensures that before a failover occurs, the backup path is known to be working. Let's configure this option. It only takes one additional statement per monitored IP.
{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring family inet 1.2.3.4 weight 255 interface reth0.0 secondary-ip-address 1.0.0.10

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# show
node 0 priority 200;
node 1 priority 100;
ip-monitoring {
    global-weight 255;
    global-threshold 255;
    family {
        inet {
            1.2.3.4 {
                weight 255;
                interface reth0.0 secondary-ip-address 1.0.0.10;
            }
        }
    }
}

{primary:node0}[edit]
root@SRX5800-1# run show chassis cluster ip-monitoring status
node0:
--------------------------------------------------------------------------
Redundancy group: 1
IP address          Status        Failure count    Reason
1.2.3.4             unreachable   0                no route to host

node1:
--------------------------------------------------------------------------
Redundancy group: 1
IP address          Status        Failure count    Reason
1.2.3.4             unreachable   0                no route to host

{primary:node0}[edit]
root@SRX5800-1#
The SRX5000 Series products can monitor up to 64 IPs and the SRX3000 Series up to 32. The ping is generated from the second SPU on the system, which is the first non-CP SPU, so it is not subject to the scheduling or processing restrictions of the RE. The branch devices operate slightly differently; the best practice there is to keep the total number of monitored hosts to two. The more hosts you monitor, the harder it is to ensure the device has the processing capacity to monitor them reliably.
Hardware Monitoring
On the SRX, there is a daemon running called chassisd. This process is designed to run and control the system hardware, and it is also used to monitor for faults. If the chassisd determines that the system has experienced specific faults, it will trigger a failover to the other node. Depending on the SRX platform, various components can fail before a complete failover is triggered.
The majority of the branch platforms are not component-based. This means the entire system consists of a single board, and if anything were to go wrong on the main board, generally the complete system would fail. The branch SRX devices also have interface cards, and if the cards fail, the local interfaces are lost. Interface monitoring can be used to detect if the interface has failed.
The data center devices are a different story. These devices have many different boards and system components, and because of this, the failover scenarios can get fairly complex. Both Juniper Networks and customers thoroughly test the reliability of the devices, and each component is failed in a matrix of testing scenarios to ensure that failovers are correctly covered.
Routing engine
The RE is the local brain of a chassis. Its job is to maintain control over the local cards in the chassis, ensure that all of them are up and running, and allow the administrator to manage the device. If the RE fails, it can no longer control the local chassis, and if that RE was the primary for the cluster, the secondary RE will pause until enough heartbeats are missed that it assumes mastership.
During this period, the local chassis will continue to forward (the data plane without an RE will continue to run for up to three minutes), but as soon as the other RE contacts the SPUs, they will no longer process traffic. By this time, the secondary data plane will have taken over for the traffic.
In the event that the secondary RE fails, that chassis immediately becomes lost. After the heartbeat threshold is passed, the primary RE will assume the other chassis has failed, and any active traffic running on the chassis in redundancy groups will fail over to the remaining node. Traffic that used local interfaces must use another protocol, such as OSPF, to fail over to the other node.
Switch control board
The switch control board is a component that is unique to the SRX5000 Series. This component contains three important systems: the switch fabric, the control plane network, and the carrier slot for the RE. It’s a fairly complex component, as it effectively connects everything in the device. The SRX5600 requires one SCB and can have a second for redundancy. The SRX5800 requires two SCBs and can have a third for redundancy.
If an SCB fails in the SRX5600, the system fails over to the second SCB. The failover, however, causes a brief blip in traffic before forwarding resumes. The second SCB also requires the use of a local RE, the same simple RE that is used to bring up dual control links. The second RE is needed to activate the local control plane switching chip on the second SCB; without it, the RE would be unable to talk to the rest of the chassis.
The SRX5800’s behavior is different because, by default, it has two SCBs. These are required to provide full throughput to the entire chassis, and if one were to fail, the throughput would be halved until a new SCB is brought online. The same conditions as the SRX5600 also apply here. If the SCB containing the RE were to fail, a secondary RE would need to be in the second SCB to provide the backup control network for the RE to communicate. If the SCB that does not contain the primary RE fails, the maximum throughput of the chassis is cut in half. This means all of the paths in the box are halved. If a third SCB is installed, it will take over for either of the failed SCBs. It cannot provide a redundant control link as it is not able to contain an RE, and when the switchover happens to the third SCB, it will briefly interrupt traffic as the switchover occurs.
Now, all of this should pose a question to the careful reader: if the RE is contained in an SCB and the SCB fails, will this affect the RE? The answer depends on the type of failure. If the fabric chips fail, the RE will be fine, as the SCB simply extends the connections from the backplane into the RE. The engineers put the RE in the SCB to conserve slots in the chassis and reserve them for traffic processing cards. It is possible for an SCB to fail in such a way that it disables the RE; it's unlikely, but possible.
Switch fabric board
The SFB is a component unique to the SRX3000 Series platform. It contains the switch fabric, the primary control plane switch, the secondary control plane switch, an interface card, and the control ports. If this component were to fail, the chassis would effectively be lost. The SFB’s individual components can fail as well, causing various levels of device degradation. In the end, once the integrity of the card is lost, the services residing in that chassis will fail over to the remaining node.
Services Processing Card/Next Generation Services Processing Card
The SPC contains one, two, or up to four SPUs, depending on the model of the SRX. Each SPU is monitored directly by the SRX’s local RE chassisd process. If any SPU fails, several events will immediately occur. The RE will reset all of the cards on the data plane, including interfaces and NPCs. Such an SPU failure causes the chassis monitoring threshold to hit 255. This causes all of the data plane services to fail over to the secondary chassis. Messages relating to SPUs failing can be seen in the jsrpd logs. The entire data plane is reset because it is easier to ensure that everything is up and running after a clean restart, rather than having to validate many individual subsystems. Each subsystem is validated after a clean restart of the chassis.
Network Processing Card
A separate NPC is unique to the SRX3000 Series (on the SRX5000, this function is located on the interface cards). The NPCs were separated out to lower component costs and the overall cost of the chassis. The SRX3000 has static bindings between interfaces and NPCs, so if an NPC fails, the interfaces bound to it are effectively lost, as they no longer have access to the switching fabric. The chassis can detect this using simple health checks; alternatively, IP monitoring can be used to validate the next hop. The probe is sent from the SPC and then through the NPC, and because the NPC has failed, it never makes it out of the chassis, so IP monitoring triggers a failover to the other node. The NPC failure ultimately triggers a failover to the remaining node, and the chassis with the failure restarts all of its cards. If the failed NPC is unable to restart, its interfaces are mapped to the remaining NPCs, assuming there are any. Although the device can run in a degraded state, it's best to leave all of the traffic on the good node and replace the failed component.
Interface card
Both SRX data center lines have interface cards, often referred to as input/output cards (IOCs). However, there are stark differences between the two. The IOCs on the SRX3000 contain a switching chip (used to connect multiple interfaces to a single bus) and a Field Programmable Gate Array (FPGA) to connect into the fabric. The IOCs on the SRX5000s contain two or more sets of NPUs, fabric connect chips, and physical interfaces. If an SRX5000 Series interface card fails and it does not contain a monitored interface or the only fabric link, the SRX relies on the administrator having configured interface monitoring or IP monitoring to detect the failure; the same is true on the SRX3000 Series platforms. On the SRX5000 Series, it is also possible to hot-swap interface cards, whereas the SRX3000 requires that the chassis be powered off to replace a card.
Control link
The control link is a critical component in the system. It allows the two brains, the REs, to talk to each other. If the control link physically goes down and the fabric link is up, the secondary RE immediately goes into ineligible state. Eventually, it will go into disabled state. Once it becomes ineligible, the only way to recover the secondary node is to reboot the device. If control link recovery is enabled, the device will reboot itself after one minute of successful communications. (Using control link recovery is the best option in this scenario, as it allows the device to reboot when it knows communications are working correctly.) The important item here is that for this scenario to work, at least one fabric link must still be up. With the fabric link still remaining, the primary RE knows that the secondary is still alive but a problem has occurred.
The secondary node goes into disabled state to prevent split brain (the state when two devices both think they are master). If this occurs, effectively two nodes are fighting to be the primary node for the cluster. They will use GARPs to try to take over and process the traffic on the network, typically causing an outage. This is a good reason you should use dual control links and data links when possible.
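Control link recovery is enabled with a single statement at the chassis cluster level, and it already appears in several of the configuration listings earlier in this chapter. A minimal sketch:

{primary:node0}[edit chassis cluster]
root@SRX210-A# set control-link-recovery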
Data link
The data link uses jsrpd heartbeat messages to validate that the path is up and is actively working. This is similar to the control link. However, the data link is more forgiving. It can take up to 120 seconds for the data link to detect that it is down. This is because it’s possible for the data link to get completely full of RTOs, or data forwarding messages, hence the data link is more forgiving in missing messages from the other node. However, after the required amount of time has passed, the secondary node will become disabled just like the control link. There isn’t an automatic reboot like the control link—the secondary node must be manually rebooted to recover it.
To increase stability of the SRXs, the data link is no longer monitored. This was changed to allow administrators to move the data link cables around without impacting the cluster.
Control link and data link failure
Rarely do both the fabric and data links go down at the same time, meaning within the same second. But we all know this can occur for all sorts of reasons, from hardware failures to a machete-wielding utility helper chopping up cables in the data center. It’s a common request and test that Juniper Networks receives, so it’s best that we cover it rather than leave administrators wondering.
If the control link and the data link were to fail at the same time, the worst possible scenario would occur, split brain, leaving the cluster members thinking the other node has failed, which effectively causes an outage. There are several ways to prevent this.
It is possible to use dual fabric links on the branch SRX Series devices. Even if the control link and one fabric link were to fail, the remaining fabric link would prevent split brain from occurring. So, generally speaking, split brain will not occur on the branch platforms.
For the data center platforms, utilizing dual control links and dual fabric links provides the same level of split brain prevention, with one point to note. On the SRX5000 Series, the control port is on an SPC, and the SPC containing the CP is part of the data path. So, if the administrator configures the control link on the same SPC that contains the CP and that SPC fails, split brain can occur. The CP is always located on the SPU in the lowest numbered FPC and PIC slot.
The best practice is to place the control port on any SPC other than the one containing the CP. If redundant control links are required, it’s best to place them on two separate SPCs. This would mean that on an SRX5000 Series, three SPCs—one containing the CP and the other two each containing a control link—would be used for the best level of redundancy. The same goes for the fabric link. Placing each redundant fabric link on a separate SPC would be the best practice for availability. Although this might seem like overkill, if ultimate availability is required, this is the suggested deployment.
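On the SRX5000 Series, the control links are bound to SPC slots with the control-ports statement, so following this best practice is simply a matter of which FPC slots you reference. The sketch below is illustrative only: it assumes the CP lives on the lowest-numbered SPC and that slots 1, 2, 13, and 14 hold other SPCs in your chassis, with each node contributing one port per control link:

{primary:node0}[edit chassis cluster]
root@SRX5800-1# set control-ports fpc 1 port 0

{primary:node0}[edit chassis cluster]
root@SRX5800-1# set control-ports fpc 13 port 0

{primary:node0}[edit chassis cluster]
root@SRX5800-1# set control-ports fpc 2 port 1

{primary:node0}[edit chassis cluster]
root@SRX5800-1# set control-ports fpc 14 port 1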
Although it does look like the CP is a single point of failure, that isn’t true. If the CP fails, the data plane will be reset. As soon as the SPCs receive power, the control links will come up rapidly, allowing the cluster to continue control plane communications and preventing split brain.
Power supplies
It's obvious that if the device's sole source of power fails, the device shuts off. This causes the remaining node to perform Dead Peer Detection (DPD), using jsrpd heartbeats, to determine whether the other node is alive. If the remaining node is the primary device for the control and data planes, it continues to forward traffic as is, noting that the other node is down because it cannot communicate with it. If the remaining node was secondary, it waits until all of the heartbeats are missed before it determines that the other node has failed; once the heartbeat threshold is crossed, it assumes mastership.
For devices with redundant power supplies, the remaining power supply will power the chassis and it will continue to operate. This is applicable to the SRX650 and the SRX3400. The SRX3600 has up to four power supplies, and it requires at least two to operate. The other two are used for redundancy. So, in the best availability deployment, four should be deployed.
The SRX5000 Series devices each have up to four power supplies; at a suggested minimum, three should be used. Depending on the total number of cards running in the chassis, a single power supply can carry the device, but if the total draw from the installed components exceeds the available power, all of the cards are turned off and the RE keeps attempting to start them until the power is available. It's always best to deploy the SRXs with the maximum number of power supplies to ensure availability.
Software Monitoring
The SRX is set up to monitor the software that is running, and this is true for both the control and data planes. The SRX attempts to detect a failure within the system as soon as it happens, and if and when it can detect a failure within the system, it must react accordingly. The SRX platform has some fairly complex internals, but it is built to protect against failures. So if the RE has a process that fails, it can restart it, and the failure is logged for additional troubleshooting.
The branch’s data plane consists of a core flowd process. The RE is in constant communication to watch if it is acting correctly. In the event that the flowd process crashes or hangs up, the control plane quickly fails over to the other node. This will happen in less than the time it would take to detect a dead node. In any failure case where the two nodes are still in communication, the failover time is quite fast. These cases include IP monitoring, manual failover, and interface monitoring.
On the data center SRX's data plane, each SPU has both control and data software running on it. The RE talks directly to each SPU's control software for status updates and for configuration changes; because of this, the RE will know if the data plane fails. If a flowd process crashes on the data plane (there is one per SPU), the entire data plane is hard-reset, meaning all of the line cards are reset. This is done to ensure that the data plane comes back up to an acceptable running standard. While that happens, the data plane services are failed over to the secondary node.
Preserving the Control Plane
If a device is set up to fail over too rapidly, it's possible that it could be jumping the gun and attempting a failover for no reason. When it's time to move between two firewalls, it's best to ensure that the time is right for the failover. Dynamic routing has methods for extremely fast failover using a protocol called Bidirectional Forwarding Detection (BFD). BFD is used in conjunction with a routing protocol such as OSPF and can provide 50-ms failovers. That is extremely fast, yet it poses little threat to the network, because BFD is simply rerouting around a link or device failure, typically in a stateless manner. Because the rerouting is stateless, there is little threat to the traffic.
When a stateful firewall does a failover, there is much more in play than simply rerouting traffic. The new device needs to accept all of the traffic and match up the packets with the existing sessions that are synchronized to the second node. Also, the primary device needs to relinquish control of the traffic. On the data plane, it’s a fairly stable process to fail over and fail back between nodes. In fact, this can be done rapidly and nearly continuously without much worry. It’s best to let the control plane fail over only in the event of a failure, as there simply isn’t a need to fail over the control plane unless a critical event occurs.
The biggest reason for any concern is that the control plane talks to the various daemons on the other chassis and on the data plane. If rapid failover were to occur, it’s possible to destabilize the control plane. This is not a rule, it’s an exception, just as the owner of a car isn’t going to jam the car into reverse on the highway. Often, administrators want to test the limits of the SRX and drop them off shelves and whatnot, so it’s fair to call this out as a warning before it’s tested in production.
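One simple way to guard against an overeager control plane failover is the hold-down timer discussed later in this chapter. As a minimal sketch, assuming a 300-second timer on redundancy group 0 (the value is an example, not a requirement):
{primary:node0}[edit]
root@SRX210-A# set chassis cluster redundancy-group 0 hold-down-interval 300
root@SRX210-A# commit
With this in place, a node that has just given up mastership of redundancy group 0 sits in secondary-hold for 300 seconds before it is eligible to take over again.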
Troubleshooting and Operation
From time to time things can go wrong. You can be driving along in your car and a tire can blow out; sometimes a firewall can crash. Nothing made by humans is immune to unforeseen failure. Because of this, the administrator must be prepared to deal with the worst possible scenarios. In this section, we discuss various methods for troubleshooting a chassis cluster gone awry.
First Steps
There are a few commands to use when trying to look into an issue. The administrator needs to first identify the cluster status and determine if it is communicating.
The show chassis cluster status command, although simple in nature, shows the administrator the status of the cluster. It shows who is the primary member for each redundancy group and the status of those nodes, and it gives insight into who should be passing traffic in the network. Here's a sample:
{primary:node1}
root@SRX210-B> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 254 secondary no no
node1 1 primary no no
Redundancy group: 1 , Failover count: 2
node0 254 primary no no
node1 1 secondary no no
{primary:node1}
root@SRX210-B>
You have seen this output many times in this chapter, as it is used frequently. Things to look for here are that both nodes show as up; both have a priority greater than zero; both have a status of primary, secondary, or secondary-hold; and one and only one node is primary for each redundancy group. Generally, if those conditions are met, the cluster is in good shape. If one of the nodes does not show up in this output, communication to the other node has been lost, and the administrator should connect to the other node and verify that it can communicate.
To validate that the two nodes can communicate, use the show chassis cluster control-plane statistics command, which shows the messages being sent between the two members. The send and receive numbers should be incrementing on both nodes. If they are not, something might be wrong with the control link, the fabric link, or both. Here is an example:
{primary:node0}
root@SRX210-A> show chassis cluster control-plane statistics
Control link statistics:
Control link 0:
Heartbeat packets sent: 124
Heartbeat packets received: 95
Heartbeat packet errors: 0
Fabric link statistics:
Probes sent: 122
Probes received: 56
Probe errors: 0
{primary:node0}
root@SRX210-A>
Again, this command should be familiar, as it has been used throughout this chapter. If the heartbeat and probe counters are not increasing, check the fabric and control plane interfaces. The method for checking the fabric interfaces is the same across all SRX products.
Next let’s check the fabric links. It’s important to verify that the fabric link and the child links show they are in an up state.
{primary:node0}
root@SRX210-A> show interfaces terse
Interface Admin Link Proto Local Remote
--snip--
fe-0/0/4.0 up up aenet --> fab0.0
fe-0/0/5 up up
fe-0/0/5.0 up up aenet --> fab0.0
--snip--
fe-2/0/4.0 up up aenet --> fab1.0
fe-2/0/5 up up
fe-2/0/5.0 up up aenet --> fab1.0
--snip--
fab0 up up
fab0.0 up up inet 30.17.0.200/24
fab1 up up
fab1.0 up up inet 30.18.0.200/24
--snip--
{primary:node0}
root@SRX210-A>
If any of the child links of the fabric link, fabX, show a down state, the corresponding physical interface is down on that node and must be restored to enable communications.
The control link is the most critical to verify, and the procedure varies per SRX platform type. On the branch devices, check the interface that is configured as the control link; Table 7-2 lists the control ports by platform. The procedure is the same as for any physical interface. Here an example from an SRX210 is used, showing that the specified interfaces are up.
{primary:node0}
root@SRX210-A> show interfaces terse
Interface Admin Link Proto Local Remote
--snip--
fe-0/0/7 up up
--snip--
fe-2/0/7 up up
--snip--
{primary:node0}
root@SRX210-A>
On the data center SRXs, there is no direct way to check the state of the control ports; because the ports are dedicated off of switches inside the SRX and they are not typical interfaces, it’s not possible to check them. It is possible, however, to check the switch that is on the SCB to ensure that packets are being received from that card. Generally, though, if the port is up and configured correctly, there should be no reason why it won’t communicate. But checking the internal switch should show that packets are passing from the SPC to the RE. There will also be other communications coming from the card as well, but this at least provides insight into the communication. To check, the node and FPC that has the control link must be known. In the following command, the specified port coincides with the FPC number of the SPC with the control port.
{primary:node0}
root@SRX5800-1> show chassis ethernet-switch statistics 1 node 0
node0:
------------------------------------------------------------------
Displaying port statistics for switch 0
Statistics for port 1 connected to device FPC1:
TX Packets 64 Octets 7636786
TX Packets 65-127 Octets 989668
TX Packets 128-255 Octets 37108
TX Packets 256-511 Octets 35685
TX Packets 512-1023 Octets 233238
TX Packets 1024-1518 Octets 374077
TX Packets 1519-2047 Octets 0
TX Packets 2048-4095 Octets 0
TX Packets 4096-9216 Octets 0
TX 1519-1522 Good Vlan frms 0
TX Octets 9306562
TX Multicast Packets 24723
TX Broadcast Packets 219029
TX Single Collision frames 0
TX Mult. Collision frames 0
TX Late Collisions 0
TX Excessive Collisions 0
TX Collision frames 0
TX PAUSEMAC Ctrl Frames 0
TX MAC ctrl frames 0
TX Frame deferred Xmns 0
TX Frame excessive deferl 0
TX Oversize Packets 0
TX Jabbers 0
TX FCS Error Counter 0
TX Fragment Counter 0
TX Byte Counter 1335951885
RX Packets 64 Octets 6672950
RX Packets 65-127 Octets 2226967
RX Packets 128-255 Octets 39459
RX Packets 256-511 Octets 34332
RX Packets 512-1023 Octets 523505
RX Packets 1024-1518 Octets 51945
RX Packets 1519-2047 Octets 0
RX Packets 2048-4095 Octets 0
RX Packets 4096-9216 Octets 0
RX Octets 9549158
RX Multicast Packets 24674
RX Broadcast Packets 364537
RX FCS Errors 0
RX Align Errors 0
RX Fragments 0
RX Symbol errors 0
RX Unsupported opcodes 0
RX Out of Range Length 0
RX False Carrier Errors 0
RX Undersize Packets 0
RX Oversize Packets 0
RX Jabbers 0
RX 1519-1522 Good Vlan frms 0
RX MTU Exceed Counter 0
RX Control Frame Counter 0
RX Pause Frame Counter 0
RX Byte Counter 999614473
{primary:node0}
root@SRX5800-1>
The output looks like standard port statistics from a switch, and looking here validates that packets are coming from the SPC. Because the SRX3000 has its control ports on the SFB, and there is nothing to configure for the control ports, there is little to look at on the interface; it is best to focus on the result of the show chassis cluster control-plane statistics command.
If checking the interfaces yields mixed results where they seem to be up but they are not passing traffic, it’s possible to reboot the node in the degraded state. The risk here is that the node could come up in split brain. Because that is a possibility, it’s best to disable its interfaces, or physically disable all of them except the control or data link. The ports can even be disabled on the switch to which they are connected. This way, on boot, if the node determines it is master, it will not interrupt traffic. A correctly operating node using the minimal control port and fabric port configuration should be able to communicate to its peer. If, after a reboot, it still cannot communicate to the other node, it’s best to verify the configuration and cabling. Finally, the box or cluster interfaces might be bad.
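As a rough sketch of that approach, assume the degraded unit is node 1 of an SRX210, whose revenue ports appear as fe-2/0/x (the specific interfaces here are placeholders). You might administratively disable the revenue ports and then reboot that node from its own console, leaving only the control and fabric cabling active:
{primary:node0}[edit]
root@SRX210-A# set interfaces fe-2/0/2 disable
root@SRX210-A# set interfaces fe-2/0/3 disable
root@SRX210-A# commit
Then, on the degraded node itself:
root@SRX210-B> request system reboot
Because the revenue ports stay disabled, the node cannot disrupt traffic even if it comes up believing it is master, yet it can still reach its peer over the control and fabric links to resynchronize. If the configuration cannot be synchronized to the degraded node, disabling the ports on the upstream switch accomplishes the same thing.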
Checking Interfaces
Interfaces are required to pass traffic through the SRX, and for the SRX to be effective in its job, it needs to have interfaces up and able to pass traffic. The SRX can use both local and redundant Ethernet interfaces, and for our purposes here, both have similar methods of troubleshooting.
To troubleshoot an interface, first check to see if the interface is physically up. Use the show interfaces terse command to quickly see all of the interfaces in both chassis.
{primary:node0}
root@SRX210-A> show interfaces terse
Interface Admin Link Proto Local Remote
ge-0/0/0 up down
ge-0/0/1 up down
fe-0/0/2 up up
This should be familiar if you've been reading through this chapter, and certainly throughout the book. The other item to check is the status of the reth within a redundancy group, to see if the interface is up or down inside the reth. It's possible that the reth could be physically up but logically down (in the event that there was an issue on the data plane). To check the status of a reth interface, use the show chassis cluster interfaces command.
root@SRX210-A> show chassis cluster interfaces
Control link 0 name: fxp1
Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Up 1
reth1 Up 1
Interface Monitoring:
Interface Weight Status Redundancy-group
fe-0/0/2 255 Up 1
fe-2/0/2 255 Up 1
{primary:node0}
root@SRX210-A>
If the interfaces are physically up but the redundant interfaces show that they are in a down state, it’s time to look at the data plane.
Verifying the Data Plane
The data plane on the SRX passes and processes the traffic. Because it is an independent component from the RE, it could be down while the administrator is still in the RE. There are a few things to check on the SRX to validate the data plane.
Note
Because the data plane is very different between the branch SRX platform and the data center platform, there will be some variance between the commands.
Verifying the FPCs and PICs is the first step; this shows the status of the underlying hardware that needs to be up to process the data traffic. On the branch SRX, the data plane is a single multithreaded process, so running the show chassis fpc pic-status command shows the status of the data plane.
root@SRX210-A> show chassis fpc pic-status
node0:
---------------------------------------------------------------------
Slot 0 Online FPC
PIC 0 Online 2x GE, 6x FE, 1x 3G
node1:
---------------------------------------------------------------------
Slot 0 Online FPC
PIC 0 Online 2x GE, 6x FE, 1x 3G
{primary:node0}
root@SRX210-A>
As you can see, this output is from an SRX210, but the command will list the status of the data plane on each SRX. Here it shows a single FPC and a single PIC. Although the output does not mention anything about flowd or the data plane, the output shows that the SRX is up and ready to pass traffic.
Now let’s show node 1 with a failed data plane.
{primary:node0}
root@SRX210-A> show chassis fpc pic-status
node0:
---------------------------------------------------------------------
Slot 0 Online FPC
PIC 0 Online 2x GE, 6x FE, 1x 3G
node1:
---------------------------------------------------------------------
Slot 0 Offline FPC
{primary:node0}
root@SRX210-A>
Here, node 1's data plane went offline, caused by the loss of the flowd process. Another symptom is that redundancy groups 1 and greater will show a priority of zero (discussed in the next section).
The output of the pic-status command should correlate with the hardware that is in the chassis, which can be seen in the output of show chassis hardware.
{primary:node0}
root@SRX210-A> show chassis hardware
node0:
---------------------------------------------------------------------
Hardware inventory:
Item Version Part number Serial number Description
Chassis AD2609AA0497 SRX210h
Routing Engine REV 28 750-021779 AAAH2307 RE-SRX210-HIGHMEM
FPC 0 FPC
PIC 0 2x GE, 6x FE, 1x 3G
Power Supply 0
node1:
---------------------------------------------------------------------
Hardware inventory:
Item Version Part number Serial number Description
Chassis AD2909AA0346 SRX210h
Routing Engine REV 28 750-021779 AAAH4743 RE-SRX210-HIGHMEM
FPC 0 FPC
PIC 0 2x GE, 6x FE, 1x 3G
Power Supply 0
{primary:node0}
root@SRX210-A>
Here the command shows the hardware in PIC 0, which is the same as shown in the pic-status command. This command is more useful on the data center platforms, where things are a little more complex because there are typically many different processors.
For example, here’s the PIC status of an SRX5800:
{primary:node0}
root@SRX5800-1> show chassis fpc pic-status
node0:
--------------------------------------------------------------------
Slot 0 Online SRX5k SPC
PIC 0 Online SPU Cp
PIC 1 Online SPU Flow
Slot 1 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 3 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 6 Online SRX5k DPC 4X 10GE
PIC 0 Online 1x 10GE(LAN/WAN) RichQ
PIC 1 Online 1x 10GE(LAN/WAN) RichQ
PIC 2 Online 1x 10GE(LAN/WAN) RichQ
PIC 3 Online 1x 10GE(LAN/WAN) RichQ
Slot 11 Online SRX5k DPC 40x 1GE
PIC 0 Online 10x 1GE RichQ
PIC 1 Online 10x 1GE RichQ
PIC 2 Online 10x 1GE RichQ
PIC 3 Online 10x 1GE RichQ
node1:
--------------------------------------------------------------------
Slot 0 Online SRX5k SPC
PIC 0 Online SPU Cp
PIC 1 Online SPU Flow
Slot 1 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 3 Online SRX5k SPC
PIC 0 Online SPU Flow
PIC 1 Online SPU Flow
Slot 6 Online SRX5k DPC 4X 10GE
PIC 0 Online 1x 10GE(LAN/WAN) RichQ
PIC 1 Online 1x 10GE(LAN/WAN) RichQ
PIC 2 Online 1x 10GE(LAN/WAN) RichQ
PIC 3 Online 1x 10GE(LAN/WAN) RichQ
Slot 11 Online SRX5k DPC 40x 1GE
PIC 0 Online 10x 1GE RichQ
PIC 1 Online 10x 1GE RichQ
PIC 2 Online 10x 1GE RichQ
PIC 3 Online 10x 1GE RichQ
{primary:node0}
root@SRX5800-1>
Here the command shows the SPCs that are online, which SPU is the CP, and the interface cards. A correctly operating device should have all of its SPCs online, and unless they are disabled, the interfaces should be online as well. Cards that have not booted yet will show as offline or present. In a data center SRX, it can take up to five minutes for the data plane to completely start up. As the cards come online, the following messages are sent to the command prompt. These messages should come up only once during the process, and then they are logged to the messages file.
{primary:node0}
root@SRX5800-1>
Message from syslogd@SRX5800-1 at Mar 13 22:01:48 ...
SRX5800-1 node0.fpc1.pic0 SCHED: Thread 4 (Module Init) ran for 1806 ms without yielding

Message from syslogd@SRX5800-1 at Mar 13 22:01:49 ...
SRX5800-1 node0.fpc1.pic1 SCHED: Thread 4 (Module Init) ran for 1825 ms without yielding

{primary:node0}
root@SRX5800-1>
If these messages keep appearing on the CLI, the SPUs are constantly restarting, which indicates a problem; one possible cause is that not enough power is being delivered to the data plane, so the SPUs keep restarting.
Let's look at the hardware, which should match up to the output of the show chassis fpc pic-status command in the previous example. This shows all of the FPCs that are SPCs, and the administrator should be able to match up which PICs should be online and active.
{primary:node0}
root@SRX5800-1> show chassis hardware
node0:
-----------------------------------------------------------------------
Hardware inventory:
Item Version Part number Serial number Description
Chassis JN112A0AEAGA SRX 5800
Midplane REV 01 710-024803 TR8821 SRX 5800 Backplane
FPM Board REV 01 710-024632 WX3786 Front Panel Display
PDM Rev 03 740-013110 QCS12365066 Power Distribution Module
PEM 0 Rev 01 740-023514 QCS1233E066 PS 1.7kW; 200-240VAC in
PEM 1 Rev 01 740-023514 QCS1233E02V PS 1.7kW; 200-240VAC in
PEM 2 Rev 01 740-023514 QCS1233E02E PS 1.7kW; 200-240VAC in
Routing Engine 0 REV 03 740-023530 9009007746 RE-S-1300
CB 0 REV 03 710-024802 WX5793 SRX5k SCB
CB 1 REV 03 710-024802 WV8373 SRX5k SCB
FPC 0 REV 12 750-023996 XS7597 SRX5k SPC
CPU REV 03 710-024633 XS6648 SRX5k DPC PMB
PIC 0 BUILTIN BUILTIN SPU Cp
PIC 1 BUILTIN BUILTIN SPU Flow
FPC 1 REV 08 750-023996 XA7212 SRX5k SPC
CPU REV 02 710-024633 WZ0740 SRX5k DPC PMB
PIC 0 BUILTIN BUILTIN SPU Flow
PIC 1 BUILTIN BUILTIN SPU Flow
FPC 3 REV 12 750-023996 XS7625 SRX5k SPC
CPU REV 03 710-024633 XS6820 SRX5k DPC PMB
PIC 0 BUILTIN BUILTIN SPU Flow
PIC 1 BUILTIN BUILTIN SPU Flow
FPC 6 REV 17 750-020751 WY2754 SRX5k DPC 4X 10GE
CPU REV 02 710-024633 WY3706 SRX5k DPC PMB
PIC 0 BUILTIN BUILTIN 1x 10GE(LAN/WAN) RichQ
Xcvr 0 REV 02 740-011571 C831XJ039 XFP-10G-SR
PIC 1 BUILTIN BUILTIN 1x 10GE(LAN/WAN) RichQ
Xcvr 0 REV 01 740-011571 C744XJ021 XFP-10G-SR
PIC 2 BUILTIN BUILTIN 1x 10GE(LAN/WAN) RichQ
PIC 3 BUILTIN BUILTIN 1x 10GE(LAN/WAN) RichQ
FPC 11 REV 14 750-020235 WY8697 SRX5k DPC 40x 1GE
CPU REV 02 710-024633 WY3743 SRX5k DPC PMB
PIC 0 BUILTIN BUILTIN 10x 1GE RichQ
--snip--
PIC 1 BUILTIN BUILTIN 10x 1GE RichQ
--snip--
PIC 2 BUILTIN BUILTIN 10x 1GE RichQ
--snip--
PIC 3 BUILTIN BUILTIN 10x 1GE RichQ
Xcvr 0 REV 01 740-013111 8280380 SFP-T
--snip--
Fan Tray 0 REV 05 740-014971 TP8104 Fan Tray
Fan Tray 1 REV 05 740-014971 TP8089 Fan Tray
{primary:node0}
root@SRX5800-1>
Core Dumps
A core dump occurs when things have gone wrong and a process crashes. The memory for the process is then dumped to local storage. If something goes wrong and a process crashes on the SRX, the core dump is stored to several different directories on the local RE. Here’s an example of how to find core dumps:
{primary:node0}
root@SRX5800-1> show system core-dumps
node0:
--------------------------------------------------------------------
/var/crash/*core*: No such file or directory
/var/tmp/*core*: No such file or directory
/var/crash/kernel.*: No such file or directory
/tftpboot/corefiles/*core*: No such file or directory
node1:
-----------------------------------------------------------------------
/var/crash/*core*: No such file or directory
-rw-rw---- 1 root wheel 104611 Feb 26 22:22 /var/tmp/csh.core.0.gz
-rw-rw---- 1 root wheel 108254 Feb 26 23:11 /var/tmp/csh.core.1.gz
-rw-rw---- 1 root wheel 107730 Feb 26 23:11 /var/tmp/csh.core.2.gz
/var/crash/kernel.*: No such file or directory
/tftpboot/corefiles/*core*: No such file or directory
total 3
{primary:node0}
root@SRX5800-1>
If core dumps are found, there isn't much for the user to troubleshoot. Core dumps from csh (the C shell) can occur when a user presses Ctrl-C to terminate a program, and these can generally be ignored. However, if a core dump exists for flowd or another process, it should be reported to JTAC, as it might be an indicator of a more complex problem.
The Dreaded Priority Zero
For most administrators, the following output is a disaster:
{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 254 primary no no
node1 1 secondary no no
Redundancy group: 1 , Failover count: 1
node0 254 primary no no
node1 0 secondary no no
{primary:node0}
root@SRX210-A>
Seeing a priority of zero tends to leave administrators in a state of confusion, but the simple reason it occurs is usually a problem on the data plane. Determining the exact problem can be difficult. Although some of the troubleshooting steps we already discussed can be helpful, you might try another: everything that happens with jsrpd is logged to the file jsrpd in the directory /var/log. You can view the file by using the show log jsrpd command. The contents of the file vary based on the events that occur with jsrpd, but the file is typically quite readable.
There are some specific items to check for. The first is coldsync, which is the initial synchronization between the kernels on the two REs. A failed coldsync will cause the priority to be set to zero. If there is a problem and coldsync cannot complete, the coldsync monitoring weight will be set to 255. If it completes, it is set to zero. Here’s an example of a coldsync log:
{primary:node0}
root@SRX210-A> show log jsrpd | match coldsync
Apr 11 08:44:14 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO
Apr 11 13:09:38 coldsync status message received from PFE: 0, status: 0x1
Apr 11 13:09:38 duplicate coldsync completed message from PFE: 0 ignored
Apr 11 13:09:38 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO
Apr 11 13:11:20 coldsync status message received from PFE: 0, status: 0x1
Apr 11 13:11:20 duplicate coldsync completed message from PFE: 0 ignored
Apr 11 13:11:20 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO
Apr 11 13:19:05 coldsync status message received from PFE: 0, status: 0x1
Apr 11 13:19:05 duplicate coldsync completed message from PFE: 0 ignored
Apr 11 13:19:05 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO
If coldsync fails, there are two things you can do. First, on either device, issue a commit full command, which resends the complete configuration to the data and control planes (this might impact traffic as it reapplies all of the policies). The other option is to reboot the secondary node and attempt the coldsync process again. (As a last resort, read the next section.)
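A compressed sketch of both options (command output omitted):
{primary:node0}[edit]
root@SRX210-A# commit full
or, from the secondary node's console:
{secondary:node1}
root@SRX210-B> request system reboot
Afterward, rerun show log jsrpd | match coldsync and confirm that the coldsync monitoring weight has returned to zero.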
In the logfile, the history of interfaces going up and down, node mastership, and other events are kept. Most of the events are quite obvious to administrators and should provide a road map to what happened on the device.
Additional information can be gathered by turning on traceoptions; just be aware that a lot of additional processing can be required, depending on the type of traceoptions you enable. If all events are enabled, the chassis process will spike to 100 percent utilization. (A brief sketch of enabling and then removing traceoptions follows the warning below.)
Warning
Do not enable traceoptions for more than a few minutes!
There have been countless times when administrators have left traceoptions enabled for all events, and all sorts of trouble has occurred, from service outages to crashing devices, when traceoptions stayed active for long enough.
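If the extra detail is truly needed, the following is a minimal sketch of enabling chassis cluster traceoptions and then removing them right away. The trace file name is arbitrary, and the available flags vary by release, so verify them with the CLI help on your platform before relying on flag all:
{primary:node0}[edit]
root@SRX210-A# set chassis cluster traceoptions file jsrpd-trace
root@SRX210-A# set chassis cluster traceoptions flag all
root@SRX210-A# commit
Reproduce the issue, then remove the tracing:
{primary:node0}[edit]
root@SRX210-A# delete chassis cluster traceoptions
root@SRX210-A# commit
The resulting trace file lands in /var/log and can be read with show log jsrpd-trace.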
When All Else Fails
The SRX is a complex and feature-rich product, and Junos provides all sorts of configuration knobs that are not available on other products, all of it engineered with an appreciation that uptime is critical to any organization.
If the SRX is going to be deployed in a complex environment, the administrator should become familiar with the product before deployment. The administrator’s knowledge and understanding of the product is the first line of defense for ensuring that the product is going to work in the environment. The more critical the environment, the more detailed an administrator should be in her testing and knowledge about the product. Before deployment, some administrators spend months learning and staging the SRX. Although that might seem like an excessive amount for you and your network needs, it’s a fact that the most prepared administrators have the fewest issues. It’s one of the reasons we wrote this book, and hopefully, you’ve read this far into it.
There are other sources for studying and analyzing the SRX. For instance, the J-Net community allows users and product experts to communicate, sharing solutions and issues about Juniper products; it's also a great place to learn what other users are doing. Another great resource is the juniper-nsp mailing list, which has been around for many years, and the SRX has become a popular topic on it.
You might also look at a new and budding series of free Day One booklets from Juniper Networks that cover the SRX product line.
But truly, when all else fails, it’s a good idea to contact JTAC for support. When contacting JTAC, it’s important to provide the correct information. If you share the correct data with JTAC, they can quickly get to the root of the problem.
First, collect the output of the request support information command. The output can be quite large, so if possible, save it locally on the RE by using request support information | save SupportInfo.txt, and then copy the file off the box with one of the following commands:
{primary:node0}
root@SRX5800-1> file copy SupportInfo.txt ftp://tester:password@myftpserver.com:/

OR

{primary:node0}
root@SRX5800-1> file copy SupportInfo.txt scp://172.19.100.50:
root@172.19.100.50's password:
SupportInfo.txt                          100% 7882     7.7KB/s   00:00
{primary:node0}
root@SRX5800-1>
JTAC might also request the contents of the /var/log directory. If possible, when opening a case, have the support information file, the /var/log contents, any core dumps, and a simple topology diagram readily available. By providing this, you will solve half the problem for JTAC in getting to the root of the issue. If some event occurs and it’s not reflected in the logs, there isn’t much JTAC can do. Be sure to document the event and share what was observed in the network. JTAC can take it from there and work with you to resolve the issue.
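One convenient way to gather the /var/log contents for JTAC is to bundle the directory into a single compressed archive on each node and copy it off with the same file copy technique shown earlier. This is only a sketch; the archive name and destination are placeholders, and if file archive is not available on your release, the individual logs can be copied with file copy instead:
{primary:node0}
root@SRX5800-1> file archive compress source /var/log destination /var/tmp/node0-logs.tgz
root@SRX5800-1> file copy /var/tmp/node0-logs.tgz scp://172.19.100.50: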
Manual Failover
Although the SRX controls which node is in charge of each redundancy group, sometimes the administrator needs to fail over a redundancy group, say, for maintenance or troubleshooting purposes. No matter the reason, it's possible to manually fail over any of the redundancy groups. When a manual failover is executed, the SRX gives the new master node a priority of 255 (you can't configure this priority; it is used only for a manual failover).
Note
The only event that can override a manual failover is a hard failure, such as the device failing. After performing a manual failover, it's best to clear the manual failover flag so that the SRX can manage failovers from there.
In this example, redundancy group 1 is failed over between the two chassis and then reset to the default state.
{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 254 primary no no
node1 1 secondary no no
Redundancy group: 1 , Failover count: 5
node0 254 primary no no
node1 1 secondary no no

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 1 node 1
node1:
----------------------------------------------------------------------
Initiated manual failover for redundancy group 1

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 254 primary no no
node1 1 secondary no no
Redundancy group: 1 , Failover count: 6
node0 254 secondary no yes
node1 255 primary no yes

{primary:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 1
node0:
---------------------------------------------------------------------
No reset required for redundancy group 1.
node1:
---------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 1 node 0
node0:
---------------------------------------------------------------------
Initiated manual failover for redundancy group 1

root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 254 primary no no
node1 1 secondary no no
Redundancy group: 1 , Failover count: 7
node0 255 primary no yes
node1 1 secondary no yes

{primary:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 1
node0:
---------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1
node1:
---------------------------------------------------------------------
No reset required for redundancy group 1.

{primary:node0}
root@SRX210-A>
Here redundancy group 1 is failed over to node 1. Then, as you can see, the priority is set to 255 and the manual failover flag is set. Once this flag is set, another manual failover cannot occur until it is cleared. Next, the failover is reset for redundancy group 1 using the request chassis cluster failover reset redundancy-group 1 command, allowing the redundancy group to be failed over again. The redundancy group is then failed back to the original node and the manual failover flag is reset once more. If a hold-down timer is configured, a manual failover cannot override it, meaning that a manual failover cannot occur until the hold-down timer has passed.
It is also possible to do this for the control plane. However, it’s best to not rapidly fail over the control plane, and best practice recommends that you use a 300-second hold-down timer to prevent excessive flapping of the control plane (which was discussed in the section Preserving the Control Plane earlier in this chapter).
Now, in this manual failover example, redundancy group 0 is failed over and then the hold-down timer prevents a manual failover.
{primary:node0}
root@SRX210-A> show configuration chassis cluster
control-link-recovery;
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
    hold-down-interval 300;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
    interface-monitor {
        fe-2/0/2 weight 255;
        fe-0/0/2 weight 255;
    }
}

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 254 primary no no
node1 1 secondary no no
Redundancy group: 1 , Failover count: 7
node0 254 primary no no
node1 1 secondary no no

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 0 node 1
node1:
---------------------------------------------------------------------
Initiated manual failover for redundancy group 0

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 2
node0 254 secondary-hold no yes
node1 255 primary no yes
Redundancy group: 1 , Failover count: 7
node0 254 primary no no
node1 1 secondary no no

{secondary-hold:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 0
node0:
----------------------------------------------------------------------
No reset required for redundancy group 0.
node1:
----------------------------------------------------------------------
Successfully reset manual failover for redundancy group 0

{secondary-hold:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 0 node 0
node0:
----------------------------------------------------------------------
Manual failover is not permitted as redundancy-group 0 on node0 is in
secondary-hold state.

{secondary-hold:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 2
node0 254 secondary-hold no no
node1 1 primary no no
Redundancy group: 1 , Failover count: 7
node0 254 primary no no
node1 1 secondary no no

{secondary-hold:node0}
root@SRX210-A>
Here redundancy group 0 is failed over from node 0 to node 1, just as before: the priority on the new primary is set to 255 and the manual failover flag is set to yes. However, node 0 now shows secondary-hold as its status, indicating that it is in secondary mode but also on a hold-down timer. When the timer expires, the status will show secondary. In the event of a critical failure on the primary device, the secondary-hold unit can still take over. Finally, an attempt to manually fail the redundancy group back over is made, and it is not permitted because the node is still on the hold-down timer.
Sample Deployments
The most common chassis cluster deployment on an SRX is active/passive. This type of deployment has many benefits that outweigh its drawbacks. An active/passive deployment offers resiliency in the event of a failover and is fairly easy to operate. The downside is that the backup box lies dormant until it is needed to step in for the primary device, and the risk is that the backup device could run into an issue while waiting for its turn. If that occurs, your network will go down. So when running an SRX active/passive cluster, you should routinely fail the devices over to ensure both devices are operational.
For our sample deployment, we show a typical SRX100 branch deployment. Figure 7-11 shows our example topology.
In this deployment, we have two subnets: Trust and Untrust. The Untrust subnet is 10.0.2.0/24 and the Trust subnet is 10.0.1.0/24. We will implement one reth interface for each subnet. We will also utilize fxp0 interfaces for management. This is how our interfaces are configured:
{secondary:node1}[edit]
root@SRX-HA-1# show interfaces | display inheritance
fe-0/0/3 {
    fastether-options {
        redundant-parent reth0;
    }
}
fe-0/0/4 {
    fastether-options {
        redundant-parent reth1;
    }
}
fe-1/0/3 {
    fastether-options {
        redundant-parent reth0;
    }
}
fe-1/0/4 {
    fastether-options {
        redundant-parent reth1;
    }
}
fab0 {
    fabric-options {
        member-interfaces {
            fe-0/0/5;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            fe-1/0/5;
        }
    }
}
fxp0 {
    unit 0 {
        family inet {
            address 10.0.1.253/24 {
                master-only;
            }
            ##
            ## '10.0.1.252/24' was inherited from group 'node1'
            ##
            address 10.0.1.252/24;
        }
    }
}
reth0 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
            address 10.0.1.254/24;
        }
        family inet6 {
            address 2001:4270:8163::2/55;
        }
    }
}
reth1 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
            address 10.0.2.1/24;
        }
        family inet6 {
            address 2001:4720:8163:200::1/55;
        }
    }
}
From this output, we can see only one fxp0 interface. This is because we are on the secondary node. Using show interfaces | display inheritance, we can see that the fxp0 address is being imported into the configuration from node groups. Next you can see our node group configuration, which gives each host a unique hostname and management IP.
{secondary:node1}[edit]
root@SRX-HA-1# show groups
node0 {
    system {
        host-name SRX-HA-0;
        backup-router 10.0.1.2 destination 0.0.0.0/0;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                    address 10.0.1.251/24;
                }
            }
        }
    }
}
node1 {
    system {
        host-name SRX-HA-1;
        backup-router 10.0.1.2 destination 0.0.0.0/0;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                    address 10.0.1.252/24;
                }
            }
        }
    }
}
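Not shown in the output here is the chassis cluster stanza itself. For completeness, the following is a sketch of what it might look like for this topology, following the conventions used earlier in the chapter; the priorities, hold-down timer, and monitored interfaces are assumptions for this example rather than values taken from the deployment:
chassis {
    cluster {
        control-link-recovery;
        reth-count 2;
        redundancy-group 0 {
            node 0 priority 254;
            node 1 priority 1;
            hold-down-interval 300;
        }
        redundancy-group 1 {
            node 0 priority 254;
            node 1 priority 1;
            interface-monitor {
                fe-0/0/3 weight 255;
                fe-0/0/4 weight 255;
                fe-1/0/3 weight 255;
                fe-1/0/4 weight 255;
            }
        }
    }
}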
We can check to see that the devices are correctly working through the standard chassis cluster status commands.
{secondary:node1}
root@SRX-HA-1> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 0
node0 254 primary no no
node1 1 secondary no no
Redundancy group: 1 , Failover count: 0
node0 254 primary no no
node1 1 secondary no no

{secondary:node1}
root@SRX-HA-1> show chassis cluster interfaces
Control link status: Up
Control interfaces:
Index Interface Status
0 fxp1 Up
Fabric link status: Up
Fabric interfaces:
Name Child-interface Status
fab0 fe-0/0/5 Up
fab0
fab1 fe-1/0/5 Up
fab1
Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Up 1
reth1 Up 1

{secondary:node1}
root@SRX-HA-1>
Summary
Because the SRX will be placed in a mission-critical location in the network, it is extremely important to ensure that it is up and functional. Firewalls are placed in between the untrusted and trusted locations within a network. If the firewall fails, there is nothing left to bring the two networks together, causing a major outage. As you saw in this chapter, the SRX has a robust HA architecture that can survive the worst of tragedies.
The biggest benefit to the SRX HA design is the flexibility it gives to the end user. The ability to use redundancy groups and mix and match them with local interfaces is very powerful. It allows you to overcome the traditional limitations of a redundant firewall configuration and explore new design scenarios. At first, the new paradigm of mixing redundant interfaces, redundancy groups, and local interfaces is overwhelming. Hopefully, this chapter will allow you to think more freely and move away from past firewall limitations.
Study Questions
- Questions
What is the purpose of the control link?
What are the three types of communication that pass over the fabric link?
Can configuration groups be used for any other tasks on a Junos device? Be specific.
What feature needs to be enabled when using dynamic routing?
What are the two most important commands when troubleshooting an SRX cluster?
From what Juniper product did the SRX get part of its HA code infrastructure?
Which platform supports the automatic upgrade of the secondary node?
Are acknowledgments sent for session synchronization messages?
What is a redundancy group?
Why is the control port so important?
- Answers
The control link is used for the two REs to talk to each other. The kernels synchronize state between each other, the REs talk to the data plane on the other node, and jsrpd communicates. The jsrpd daemon sends heartbeat messages to validate that the other side is up and running.
Heartbeats are sent by the jsrpd daemon to ensure that the remote node is up and healthy. The heartbeats pass through the data planes of both devices and back to the other side. This validates the entire path end to end, making sure it is able to pass traffic. In the event that traffic needs to be forwarded between the two nodes, it is done over the data link. Last but not least, the data link is used to synchronize RTO messages between the two chassis. RTOs are used in the maintenance of the state between the two devices. This includes session creation and session closing messages.
Node-specific information is configured using Junos groups. This was one of the fundamental features that was created in Junos. Junos groups can also be thought of as configuration templates or snippets. They can be used to do such things as enabling logging on all firewall policies and configuring specific snippets of information. Using Junos groups where it makes sense simplifies the administration of the SRX and makes reading the configuration easier.
When using dynamic routing, the graceful restart feature should be enabled. It allows the data plane to keep dynamic routes active if the control plane fails over. It also allows for other routers that surround the SRX to assist it during a control plane failover.
The two most important commands are show chassis cluster status and show chassis cluster statistics. These show the current state of the cluster and the current status of communication between the two nodes. Anyone who is administering a cluster will use these two commands the most.
The SRX used code from the TX Series products. The TX Series are some of the largest and most scalable routing products in the world.
The data center SRXs support unified in-service software upgrades. This feature allows for an automatic upgrade of the backup node without impacting network availability.
Session synchronization messages are not acknowledged. This would take additional time and resources away from the processors by forcing the processing of an additional message.
A redundancy group is a logical collection of objects. It can contain either the control plane (redundancy group 0 only) or interfaces (redundancy group 1+).
The control port provides critical communication between the two REs. If this link is lost, the two REs cannot synchronize the state of the kernels. Because of this, if the control link goes down, the secondary node will go into a disabled state.