CA Spectrum - 10.2 to 10.2.3

Fault Isolation

Last update August 24, 2014


Fault management is one of the key requirements of network management. A fault differs from an error in that it is an abnormal condition that requires management attention and repair. Faults can be caused by bad firmware, bad hardware, or a bad network, and each of these causes requires a different response from the network manager. The goal is therefore to determine the exact location of the fault and to get the attention of the network administrators as quickly as possible.

CA Spectrum intelligence can isolate a network problem to the most probable faulty component. To speed up fault isolation and reduce unnecessary traffic, models use two actions:

  • Are-You-Down Action
    Upon losing contact with the device it represents, a model sends the Are-You-Down action to all of its neighbors to determine its own condition. A neighbor responds TRUE if it has also lost contact with its device and FALSE if it has contact. If all of the neighbors respond TRUE, the model’s condition color turns gray (meaning “my device might be down, but it is impossible to tell because all the neighbors are down”). However, if any neighbor responds FALSE, the model’s condition color turns red (meaning “my device must be down, because one of the neighbors is up”).
  • Are-You-Up Action
    Upon re-establishing contact with the device it represents, a model sends the Are-You-Up action to its neighbors to speed up fault isolation. Upon receiving this action, each neighbor returns TRUE if it has an established contact status. If the receiving neighbor’s contact status is lost and its next time to poll is more than 60 seconds away, that neighbor pings its device for quicker fault isolation.
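The two actions can be sketched as a pair of small decision functions. This is a minimal illustration under stated assumptions, not CA Spectrum’s implementation: the `Contact` enum, the function names, and the convention that an Are-You-Down reply is `True` when the neighbor has also lost contact are all hypothetical. The zero-neighbor case follows the category summary table below, where a significant device with no connections turns red.

```python
from enum import Enum
from typing import List, Tuple


class Contact(Enum):
    # Hypothetical encoding of a model's contact status.
    INITIAL = "initial"
    ESTABLISHED = "established"
    LOST = "lost"


PING_THRESHOLD_SECONDS = 60  # from the text: ping if the next poll is more than 60 s away


def condition_after_are_you_down(neighbor_replies: List[bool]) -> str:
    """Condition color of a model that lost contact, given its neighbors' replies.

    Each reply is True when that neighbor has also lost contact (TRUE),
    False when the neighbor still has contact (FALSE).
    """
    if not neighbor_replies:
        return "red"  # zero connector count: red after losing contact
    if all(neighbor_replies):
        return "gray"  # every neighbor is down too, so the cause cannot be pinned down
    return "red"  # at least one neighbor is up, so this device must be down


def handle_are_you_up(contact: Contact, seconds_to_next_poll: float) -> Tuple[bool, bool]:
    """Neighbor-side handling of Are-You-Up.

    Returns (reply, should_ping): reply is True when this neighbor is established;
    should_ping is True when it is lost and its next poll is far enough away
    that pinging now speeds up fault isolation.
    """
    reply = contact is Contact.ESTABLISHED
    should_ping = contact is Contact.LOST and seconds_to_next_poll > PING_THRESHOLD_SECONDS
    return reply, should_ping
```

Keeping the reply and the ping decision separate avoids assuming whether the reply is sent before or after the ping completes, which the text does not specify.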

Every time a model’s status changes, or the information available to CA Spectrum changes, a new assessment occurs. CA Spectrum intelligence keeps the topology presentation as current and as accurate as possible, but it depends on correct modeling to assess contact status and determine device failures on the network. Correct modeling includes placing your VNM model in the proper relation to the other models that represent your network: the VNM model must have a resolved connection in the Topology view of the model representing the device to which the VNM host is actually connected. When the VNM model is properly connected and CA Spectrum loses contact with a model, the icon representing that model displays a Gray, Orange, or Red condition color, which helps network administrators locate the fault immediately.

How Model Category Affects Contact Status

Each fault is associated with a particular condition, which is represented by a particular color that displays on the icon representing the model where the fault occurs. The condition color reflects both the contact status and the alarm status of the model. However, the contact status and condition color asserted for a model also depend on which of the following categories the model belongs to. The following list summarizes how the categories to which a model and its neighbors belong influence its contact status and condition color.

  • Significant Device Models
    Any device that requires an administrator’s attention for the smooth operation of the network is called a significant device. To change an insignificant model into a significant model, change the value of the Value_When_Red attribute (0x1000e) to 7.
  • Insignificant Device Models
    An insignificant device, such as an end-user PC, toggles between Blue and Green contact states and does not generate alarms or event messages to get the administrator’s attention. To change a significant model into an insignificant model, change the value of the Value_When_Red attribute (0x1000e) to 0.
  • Inferred Connectors
    These are passive models that do not poll but keep track of a list of their Data Relay neighbors. Examples of inferred connectors include WA_Segment and Fanout models. CA Spectrum automatically enables Live Pipes for all ports connected to a WA_Segment.
    Note: CA Spectrum intelligence does not expect Fanout models to be connected to each other; thus this configuration results in inaccurate contact status displays. If two Fanouts are connected to each other and each of them is in turn connected to a device with a green contact status, the Fanouts nonetheless turn red. If two Fanouts are connected to each other with no other devices connected to either one, both Fanouts turn gray.
  • Shared Media Link
    The Shared Media Link is a specialized inferred connector. These models are similar to Fanouts, but the fault management works differently. Unlike a Fanout model, the Shared Media Link model condition is based on configured threshold values.
    Example: If the critical threshold is set to 80, the Shared Media Link turns red when it loses contact with 80 percent of the downstream models.
  • Composite and Discrete Topology Models
    The contact status of LAN, LAN 802.3, LAN 802.5, and similar models is determined by the contact status of their collected children. A LAN model with a lost contact status turns either red or gray, depending on the condition of its collected models.
  • Wide Area Links
    Wide Area Links (WA_Links) are modeled in conjunction with wide area segment (WA_Segment) models. This allows for proper rollup of the Wide Area Link condition. WA_Link models can only represent point-to-point connections, such as T1 and T3 lines, and there can be no more than two devices connected to it at a time. Also, you must connect the WA_Segment model to the correct port of the device models.
    Note: WA_Link models can accommodate only one WA_Segment model. If you attempt to paste more than one WA_Segment model into a WA_Link model’s Topology view, the second one will be destroyed immediately and an alarm will be generated.
  • Wide Area Segments
    WA_Segments poll the InternalPortLinkStatus (IPLS) attribute of each interface model which Connects_To the WA_Segment. This is an active poll, meaning that the IPLS of each connected interface is read at every polling interval rather than simply watched for a change in the attribute. Therefore, CA Spectrum does not have to lose contact with one of the connected routers for a fault isolation alarm to be generated on a WA_Link.
    The polling of the connected ports’ IPLS is regulated by the WA_Link model’s Polling_Interval and PollingStatus attributes. When Polling_Interval changes to zero (0) or PollingStatus changes to FALSE, polling of the connected ports’ IPLS stops.
    If one of the connected interfaces has an IPLS of BAD (for example, Admin Status is ON, but Oper Status is OFF), then the WA_Segment’s Contact_Status is set to ‘lost’ and the WA_Segment turns gray. The WA_Link turns red.
    If one of the connected interfaces has an IPLS of ‘disabled’ (for example, Admin Status is OFF), then the WA_Segment’s Contact_Status is set to ‘lost’ and the WA_Segment turns gray. The WA_Link turns orange. This is because the alarm must be severe enough to be viewed in the Alarms tab, but it is not a “Contact Lost” alarm.
    If the DISABLED interface causes CA Spectrum to lose contact with the remote router, then the WA_Link turns red. This is the result of regular InferConnector-type fault isolation.
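Two of the rule-based categories in the list above, the Shared Media Link critical threshold and the WA_Segment IPLS poll, reduce to a few comparisons. The sketch below assumes IPLS values are encoded as the strings `"GOOD"`, `"BAD"`, and `"DISABLED"` and that BAD takes precedence when both occur; the function and class names are hypothetical, and only the critical threshold is modeled because the text gives no other threshold values.

```python
from dataclasses import dataclass
from typing import List, Optional


def shared_media_link_is_critical(lost: int, total: int, critical_pct: int) -> bool:
    """True when the lost share of downstream models reaches the critical threshold."""
    if total <= 0:
        return False  # assumption: with no downstream models there is nothing to assess
    # Integer cross-multiplication avoids float rounding in lost / total * 100 >= critical_pct.
    return lost * 100 >= critical_pct * total


@dataclass
class SegmentPoll:
    polled: bool
    segment_color: Optional[str] = None  # None: this poll found no IPLS fault
    link_color: Optional[str] = None


def poll_wa_segment(ipls: List[str], polling_interval: int, polling_status: bool) -> SegmentPoll:
    """One active poll of the IPLS of every interface that connects to the WA_Segment."""
    # Polling stops when the WA_Link's Polling_Interval is 0 or PollingStatus is FALSE.
    if polling_interval == 0 or not polling_status:
        return SegmentPoll(polled=False)
    if "BAD" in ipls:
        # e.g. Admin Status ON, Oper Status OFF: segment contact lost (gray), link red
        return SegmentPoll(polled=True, segment_color="gray", link_color="red")
    if "DISABLED" in ipls:
        # Admin Status OFF: segment contact lost (gray), link orange (not a Contact Lost alarm)
        return SegmentPoll(polled=True, segment_color="gray", link_color="orange")
    return SegmentPoll(polled=True)
```

With a critical threshold of 80, `shared_media_link_is_critical(8, 10, 80)` is true and `shared_media_link_is_critical(7, 10, 80)` is false, matching the Shared Media Link example.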
| Model Category | Connected Models (Neighbors) | Condition Color |
| --- | --- | --- |
| Significant Devices (Modeling Hub-types only) | Connected to a VNM | Red after losing contact |
| Significant Devices | No connections to other models (a zero connector count) | Red after losing contact |
| Significant Devices | Connected to an established Data Relay neighbor | Red after losing contact |
| Composite and Discrete Topologies | All of the collected children have a lost contact status, and at least one of them is Red | Red after losing contact |
| Inferred Connectors | The fanout model has lost contact, but one of its neighbors is good and the associated port has a bad port link status | Red after losing contact |
| Significant Devices, Inferred Connectors, and WA_Links | All neighbors have also lost contact status | Gray after losing contact |
| Composite and Discrete Topologies | All of the collected children have a lost contact status, and none of them is Red | Gray after losing contact |
| Significant Devices (Modeling Hub-types only) | Connected to an end-point neighbor (such as a PC) that has an established contact status | Orange after losing contact |
| WA_Links | The WA_Segment (or fanout) is good and one of the routers is lost | Orange after losing contact |
| Significant Devices | Connected to a model with an established contact status | Green |
| Composite/Discrete Topologies and WA_Links | Any of the collected children has an established contact status | Green |
| Inferred Connectors | Connected to a model with an established contact status, where at least one of its neighbors is Good and its associated port (the port connected to the Fanout) status is Good | Green |
| Significant and Insignificant Devices | Not yet connected to other devices | Blue |
| Composite/Discrete Topologies and WA_Links | All collected children have an initial contact status (the LAN then also has the initial contact status) | Blue |
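The Composite and Discrete Topology rows of the table amount to a rollup over the collected children. The sketch below assumes each child is given as a hypothetical `(contact_status, color)` pair of lowercase strings; combinations the table does not describe, such as a mix of initial and lost children, return `None` rather than a guessed color.

```python
from typing import List, Optional, Tuple


def lan_condition(children: List[Tuple[str, str]]) -> Optional[str]:
    """Roll up a LAN's condition from (contact_status, color) pairs of its collected children."""
    if not children:
        return None  # not covered by the table
    contacts = [contact for contact, _ in children]
    if "established" in contacts:
        return "green"  # any established child keeps the LAN green
    if all(contact == "lost" for contact in contacts):
        # All children lost: red if at least one of them is red, otherwise gray.
        return "red" if any(color == "red" for _, color in children) else "gray"
    if all(contact == "initial" for contact in contacts):
        return "blue"  # the LAN also has the initial contact status
    return None  # e.g. a mix of initial and lost children: not described in the table
```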

Fault Isolation Examples

The following examples illustrate how CA Spectrum fault isolation operates with various network configurations and problem scenarios.

Example: Proactive Fault Isolation

This example demonstrates that fault isolation is a proactive mechanism which does not depend upon polling all of the connected models.

Consider a simple network topology as shown in the following diagram. The device H1 is connected to the VNM model. Devices H1, H2, and H3 poll every 3 minutes. H4 polls every 5 minutes. The PC polls every 30 minutes.

[Diagram: proactive fault isolation topology with VNM, H1, H2, H3, H4, and PC]

Assume H2 is BAD. As a result, H2 turns red, H4 turns gray, and the PC (an insignificant model) turns blue, while H1 and H3 remain green.

Fault isolation is initiated as soon as H2, H4, or the PC polls. If H4 polls and finds that it has lost contact, it sends an Are-You-Down action to H2. If H2 has already lost contact by then, it returns TRUE to H4; otherwise, H2 pings its own device and then sends the result to H4. Because H2 is bad, the response is TRUE, which causes H4 to turn gray.

H2, now lost, sends an Are-You-Down action to H1. Because H1 has an established contact status, H2 must decide between the orange and red conditions, so H2 pings the PC. Because the PC cannot respond, H2 turns red. The ping from H2 also puts the PC in a lost state; because the PC is an insignificant device, it turns blue.
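The outcome of this example can be reproduced with a small simulation. It is a sketch under several assumptions: the topology (VNM to H1, H1 to H2, H1 to H3, H2 to H4, H2 to PC) is inferred from the prose because the diagram is not reproduced, H2 is assumed to be modeled as a hub type (the text says it chooses between orange and red), and contact status is derived by plain reachability from the VNM instead of real polling. The color rules are the ones from the category summary table; all names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Hypothetical topology inferred from the prose; the original diagram is not reproduced.
LINKS = [("VNM", "H1"), ("H1", "H2"), ("H1", "H3"), ("H2", "H4"), ("H2", "PC")]
INSIGNIFICANT = {"PC"}  # end-user device: shows blue instead of red or gray
END_POINTS = {"PC"}
HUB_TYPES = {"H2"}  # assumption: H2 is modeled as a hub type


def build_graph(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable_from_vnm(graph: Dict[str, Set[str]], failed: Set[str]) -> Set[str]:
    """Stand-in for polling: established means reachable without crossing a failed device."""
    seen = {"VNM"}
    queue = deque(["VNM"])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def assign_colors(graph: Dict[str, Set[str]], established: Set[str]) -> Dict[str, str]:
    colors = {}
    for device, neighbors in graph.items():
        if device == "VNM":
            continue
        if device in established:
            colors[device] = "green"
        elif device in INSIGNIFICANT:
            colors[device] = "blue"
        else:
            up = {n for n in neighbors if n in established}
            if not up:
                colors[device] = "gray"  # every neighbor answers Are-You-Down with TRUE
            elif device in HUB_TYPES and up & END_POINTS:
                colors[device] = "orange"  # hub type whose end-point neighbor is still up
            else:
                colors[device] = "red"  # a neighbor is up, so this device must be down
    return colors


graph = build_graph(LINKS)
print(assign_colors(graph, reachable_from_vnm(graph, failed={"H2"})))
# {'H1': 'green', 'H2': 'red', 'H3': 'green', 'H4': 'gray', 'PC': 'blue'}
```

Separating contact derivation from color assignment lets the same rules be exercised with other contact sets, for example one in which the PC still answers and H2 would therefore turn orange.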

Example: Modeling a Fanout

This example demonstrates fault isolation when modeling a fanout.

Assume the fanout is red and D2, D3, and PC are gray. The following diagram illustrates this scenario.

[Diagram: fanout modeling scenario with D1, D2, D3, PC, and the fanout]

The fanout registers a watch on D1's contact status. If D1 goes down, the fanout turns Gray as a result of the watch trigger.

When D3 eventually polls successfully, D3 has an established contact status and turns Green. D3 then sends an Are-You-Up action to the fanout. The fanout reads the internal port link status of P3 (D3’s port connection to the fanout). Assuming the port has a good status, the watch is cleared and the fanout turns Green with an established contact status. This means that as long as P1 (D1’s port connection to the fanout) has a good internal port link status, the contact status of the inferred connector remains good.

What if D2 goes bad? D2 loses its contact status and sends an Are-You-Down action to the fanout. The fanout pings D1 and finds D1 to be good. The intelligence then examines the status of P1. Assuming the link status of P1 is good, the fanout returns FALSE to model D2, which causes D2 to turn Red.

What if P1 is bad? This is the same case as disconnecting the network connection to the fanout. If D3 polls first, it loses its contact status and sends an Are-You-Down action to the fanout. The fanout pings D1 and finds it to be a good neighbor. The fanout then reads the internal port link status of P1. Because P1 is bad, the fanout loses its contact status, turns Red, and returns TRUE to model D3, which causes D3 to turn Gray. D2 also turns Gray in the same way as D3. The PC, being an insignificant device, turns Blue immediately after losing its contact status.
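The fanout’s side of these scenarios can be summarized as one decision function. This is a sketch, not the product’s logic: the names are hypothetical, the requester is assumed to have the fanout as its only neighbor (so a TRUE reply turns it gray), and the reply when D1 itself is down is assumed to be TRUE because the fanout has then lost contact.

```python
from dataclasses import dataclass


@dataclass
class FanoutReply:
    fanout_color: str
    reply: bool  # returned to the requester: True means "I am down too"
    requester_color: str


def fanout_handle_are_you_down(upstream_good: bool, upstream_port_good: bool) -> FanoutReply:
    """Fanout-side handling of an Are-You-Down from a downstream device such as D2 or D3.

    upstream_good: whether the ping to the upstream neighbor (D1) succeeded.
    upstream_port_good: internal port link status of D1's port to the fanout (P1).
    """
    if not upstream_good:
        # D1 is down: the watch on D1 turns the fanout gray (assumed reply: TRUE).
        return FanoutReply("gray", True, "gray")
    if upstream_port_good:
        # D1 and P1 are good, so the fanout is fine and the requester itself must be down.
        return FanoutReply("green", False, "red")
    # D1 answers but P1 is bad: the fanout loses contact, turns red, and answers TRUE.
    return FanoutReply("red", True, "gray")
```

The second branch corresponds to the "D2 goes bad" case and the third to the "P1 is bad" case.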

Example: Redundant Paths Fault Isolation

This example shows how CA Spectrum manages devices using redundant paths if a link is shut down administratively (i.e., admin-status equals down).

The following diagram depicts a network with redundant WA Links. Here, VNM manages Rtr3 through link WL-1 and Rtr2 through link WL-2. Assume that the network administrator shuts down the WL-1 link. This causes WL-1 to turn gray. Rtr3 turns red because VNM cannot reach it through WL-1. The redundancy intelligence of Rtr3 then modifies its agent address so that VNM can reach it through links WL-2 and WL-3, which causes Rtr3 to turn green again. The link WL-1 keeps its gray condition.

[Diagram: redundant WA Links WL-1, WL-2, and WL-3 connecting VNM, Rtr2, and Rtr3]
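The idea behind the agent-address change can be illustrated as choosing the first candidate address whose path avoids administratively down links. The text does not describe how CA Spectrum’s redundancy intelligence actually selects an address, so this is only a sketch: the addresses (taken from documentation ranges), the assumption that WL-3 connects Rtr2 to Rtr3, and the function name are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical candidate agent addresses for Rtr3 and the WA Links that the
# VNM's traffic crosses to reach each one (WL-3 assumed to join Rtr2 and Rtr3).
RTR3_PATHS: List[Tuple[str, List[str]]] = [
    ("192.0.2.3", ["WL-1"]),             # direct path over WL-1
    ("198.51.100.3", ["WL-2", "WL-3"]),  # alternate path through Rtr2
]


def select_agent_address(paths: List[Tuple[str, List[str]]], admin_down: Set[str]) -> Optional[str]:
    """Return the first candidate address whose path avoids administratively down links."""
    for address, links in paths:
        if not any(link in admin_down for link in links):
            return address
    return None  # no usable path: the device stays unreachable
```

With `admin_down={"WL-1"}` the function falls back to the address reached over WL-2 and WL-3, which mirrors Rtr3 turning green again while WL-1 stays gray.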

Example: Inferred Connector Fault Isolation

This example demonstrates that fault isolation for an Inferred Connector requires specific modeling. Assume that two routing devices, Rtr1 and Rtr2, are connected at both ends of the WA_Link and that their ports are P1 and P2 respectively.

WA_Link models need to be associated with a WA_Segment (or fanout) model through the Collects relation to enable the proper rollup of the WA_Link condition. The devices at either end of the WA_Link need to be connected to the WA_Segment collected by the WA_Link model. You do this by navigating into the device’s Device Topology view and resolving the WA_Segment off-page reference icon to the appropriate port. You can view the connections by navigating into the WA_Segment’s view.

This cross-connection is very important for fault isolation to work, as shown in the following diagram.

[Diagram: Rtr1 and Rtr2 connected through a WA_Segment collected by a WA_Link]

Assume P1 is the port on Rtr1 and P2 is the port on Rtr2. The contact status of the routers connected to the WA_Segment determines the WA_Link condition as described in the following table. Note that the port link status becomes important in determining the status of the WA_Link only when both routers are “contact established.”

| Rtr1 | Rtr2 | WA_Link |
| --- | --- | --- |
| Initial | Initial | Blue |
| Established | Lost | Red |
| Lost | Lost | Gray |
| Established | Established | Check Port States* |

* If both Rtr1 and Rtr2 have a contact status of established then the port status of P1 and P2 will determine the condition of the WA_Link. If any port is BAD, the WA_Link will be RED. If any port is DISABLED, the WA_Link will be ORANGE. Otherwise, the WA_Link will be GREEN.
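The table and its footnote translate directly into a lookup. The sketch assumes lowercase contact-status strings and `"GOOD"`/`"BAD"`/`"DISABLED"` port states, and it treats Lost/Established the same as the listed Established/Lost row, which is an assumption because the table lists only one order. Combinations the table does not cover return `None`.

```python
from typing import Optional


def wa_link_condition(rtr1: str, rtr2: str, p1: str, p2: str) -> Optional[str]:
    """WA_Link condition from router contact status and the states of ports P1 and P2."""
    routers = {rtr1, rtr2}
    if routers == {"initial"}:
        return "blue"
    if routers == {"lost"}:
        return "gray"
    if routers == {"established", "lost"}:
        # Table lists Established/Lost; assumption: Lost/Established behaves the same.
        return "red"
    if routers == {"established"}:
        # Both routers established: the port states decide.
        ports = {p1, p2}
        if "BAD" in ports:
            return "red"
        if "DISABLED" in ports:
            return "orange"
        return "green"
    return None  # combinations the table does not list (e.g. initial with lost)
```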

