Fault Management is one of the key requirements of network management. A fault differs from an error in that a fault is an abnormal condition that requires management attention and repair. Problems that give rise to a fault can be caused by bad firmware, bad hardware, or a bad network, and each of these problems requires a different response from the network manager. The goal, therefore, is to determine the exact location of the fault and to get the attention of the network administrators as quickly as possible.
CA Spectrum intelligence has the capability of isolating a network problem to the most probable faulty component. To speed up fault isolation and to reduce unnecessary traffic, two actions occur:
Every time a model’s status changes, or the information available to CA Spectrum changes, a new assessment occurs. CA Spectrum intelligence keeps the topology presentation as current and as accurate as possible, but it depends on correct modeling to accurately assess contact status and determine device failures on the network. Correct modeling includes placing your VNM model in proper relation to the other models that represent your network; it must have a resolved connection in the Topology view of a model that represents a device to which the VNM host is actually connected. When the VNM model is properly connected and CA Spectrum loses contact with a model, the icon representing that model displays a condition color of Gray, Orange, or Red, which helps the network administrators to locate the faults immediately.
Each fault is associated with a particular condition, which is represented by a particular color that displays on the icon representing the model where the fault occurs. The condition color reflects both the contact status and the alarm status of the model. However, the contact status and condition color asserted for a model also depend upon the category to which the model belongs. The following table summarizes how the categories to which a model and its neighbors belong influence its contact status and condition color.
|Model Category||Connected Models (Neighbors)||Condition Color|
|Significant Devices (Modeling Hub-types only)||connected to a VNM...||turn Red after losing contact.|
|Significant Devices||with no connections to other models (a zero connector count)...||turn Red after losing contact.|
|Significant Devices||connected to an established Data Relay neighbor...||turn Red after losing contact.|
|Composite and Discrete Topologies||in which all of the collected children have a lost contact status and at least one of those collected children is Red...||turn Red after losing contact.|
|Inferred Connectors||where the fanout model has lost contact but one of its neighbors is good and the associated port has bad port link status, then it...||turn Red after losing contact.|
|Significant Devices, Inferred Connectors, and WA_Links||where all neighbors have also lost contact status...||turn Gray after losing contact.|
|Composite and Discrete Topologies||in which all of the collected children have a lost contact status and none of those collected children are Red...||turn Gray after losing contact.|
|Significant Devices (Modeling Hub-types only)||connected to an end-point neighbor (such as a PC) that has established contact status...||turn Orange after losing contact.|
|WA_Links||WA_Segment (or fanout) is good and one of the routers is lost then...||turn Orange after losing contact.|
|Significant Devices||connected to a model with an Established contact status...||turn Green.|
|Composite/Discrete Topologies and WA_Links||in which any of the collected children has established contact status, then the LAN will also...||turn Green.|
|Inferred Connectors||connected to a model with an established contact status where at least one of its neighbors is Good and its associated port (port connected to the Fanout) status is Good...||turn Green.|
|Significant and Insignificant Devices||not yet connected to other devices...||turn Blue.|
|Composite/Discrete Topologies and WA_Links||when all collected children of a LAN have initial contact status, then the LAN will also have the initial contact status...||turn Blue.|
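The rollup rows for Composite and Discrete Topologies can be read as a small decision function. The following Python sketch is illustrative only: the function name `rollup_condition` and the status strings are hypothetical and are not part of any CA Spectrum API. It returns `None` for combinations that the table does not describe.

```python
def rollup_condition(children):
    """Return the rolled-up color for a topology container.

    `children` is a list of (contact_status, color) pairs, where
    contact_status is "established", "lost", or "initial".
    Returns None for combinations the table does not describe.
    """
    if not children:
        return None
    statuses = [status for status, _ in children]
    colors = [color for _, color in children]
    if "established" in statuses:
        # Any collected child with established contact: Green.
        return "Green"
    if all(s == "initial" for s in statuses):
        # All collected children still in initial contact status: Blue.
        return "Blue"
    if all(s == "lost" for s in statuses):
        # All children lost: Red if at least one child is Red, else Gray.
        return "Red" if "Red" in colors else "Gray"
    return None  # mixed lost/initial is not covered by the table


print(rollup_condition([("lost", "Red"), ("lost", "Gray")]))
print(rollup_condition([("lost", "Gray"), ("lost", "Gray")]))
```

The first call has a Red child among all-lost children, so the container rolls up to Red; the second has no Red child and rolls up to Gray.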
The following examples illustrate how CA Spectrum fault isolation operates with various network configurations and problem scenarios.
Example: Proactive Fault Isolation
This example demonstrates that fault isolation is a proactive mechanism which does not depend upon polling all of the connected models.
Consider a simple network topology as shown in the following diagram. The device H1 is connected to the VNM model. Devices H1, H2, and H3 poll every 3 minutes. H4 polls every 5 minutes. The PC polls every 30 minutes.
Assume H2 is BAD. As a result H2 turns red, H4 turns gray, PC (insignificant model) turns blue, while H1 and H3 remain green.
Fault isolation is initiated as soon as H2, H4, or PC polls. If H4 is lost, it sends an Are-You-Down action to H2. If H2 is lost by then, it sends TRUE to H4; otherwise, it pings itself and then sends the response to H4. This causes H4 to turn gray.
Now H2 is lost, and it sends an Are-You-Down action to H1. Since H1 is established, H2 has to decide between the orange and red conditions. H2 pings PC. Since PC cannot respond, H2 will turn red. The ping from H2 puts PC in a lost state. Since PC is an insignificant device, it will turn blue.
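The outcome of this example can be reproduced with a small static model. The sketch below is a simplification and does not implement the event-driven Are-You-Down exchange. The topology (H1 under the VNM, H2 and H3 under H1, H4 and PC under H2) is an assumption inferred from the text, and all names (`PARENT`, `condition`, the `down` and `mute` sets) are hypothetical. A device in `down` forwards no traffic; a device in `mute` still forwards traffic but does not answer management requests.

```python
# Assumed topology: each model's neighbor on the path toward the VNM.
PARENT = {"H1": "VNM", "H2": "H1", "H3": "H1", "H4": "H2", "PC": "H2"}
INSIGNIFICANT = {"PC"}  # end-point models


def path_ok(node, down):
    """True if every device from `node` up to the VNM forwards traffic."""
    while node != "VNM":
        if node in down:
            return False
        node = PARENT[node]
    return True


def has_contact(node, down, mute):
    return path_ok(node, down) and node not in mute


def condition(node, down, mute=frozenset()):
    if has_contact(node, down, mute):
        return "Green"
    if node in INSIGNIFICANT:
        return "Blue"
    parent = PARENT[node]
    if parent != "VNM" and not has_contact(parent, down, mute):
        return "Gray"  # upstream neighbor answers TRUE to Are-You-Down
    # Upstream neighbor is established: choose between Orange and Red
    # by checking whether an end-point neighbor is still in contact.
    endpoints = [c for c, p in PARENT.items() if p == node and c in INSIGNIFICANT]
    if any(has_contact(e, down, mute) for e in endpoints):
        return "Orange"
    return "Red"


colors = {name: condition(name, down={"H2"}) for name in PARENT}
print(colors)
```

With H2 in `down`, the model yields Red for H2, Gray for H4, Blue for PC, and Green for H1 and H3, which matches the scenario. Passing `mute={"H2"}` with an empty `down` set gives Orange for H2, because the PC behind it is still reachable.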
Example: Modeling a Fanout
This example demonstrates fault isolation when modeling a fanout.
Assume the fanout is red and D2, D3, and PC are gray. The following diagram illustrates this scenario.
The fanout registers a watch on D1's contact status. If D1 goes down, the fanout turns Gray as a result of the watch trigger.
When D3 eventually polls successfully, D3 will have an established contact status and turn Green. D3 then sends an Are-You-Up action to the fanout. The fanout reads device P3’s (D3’s port connection to the fanout) internal link port status. Assuming the port has a good status, the watch is cleared and the fanout turns Green with an established contact status. This means that as long as P1 (D1’s port connection to the fanout) has good internal link port status, the contact status of the inferred connector will remain good.
What if D2 goes bad? D2 will lose its contact status and send an Are-You-Down action to the fanout. The fanout will ping D1 and find D1 to be good. The intelligence then examines the status of P1. Assuming the link status of P1 is good, the fanout will return FALSE to model D2. This causes D2 to turn Red.
What if P1 is bad? This is the same case as disconnecting the network connection to the fanout. If D3 polls first, it will lose its contact status and send an Are-You-Down action to the fanout. The fanout will ping D1 and find it to be a good neighbor. The fanout then reads the internal port link status of port P1. Since P1 is bad, the fanout will lose its contact status and turn Red. The fanout will return TRUE to the model D3, which causes D3 to turn Gray. D2 will also turn Gray in the same way as D3. The PC, being an insignificant device, will turn Blue immediately after losing its contact status.
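The fanout's handling of an Are-You-Down action in these cases can be sketched as follows. The function names and the boolean parameters are hypothetical, not CA Spectrum interfaces. The answer returned when D1 itself is unreachable is an assumption; the text states only that the fanout turns Gray in that case.

```python
def fanout_are_you_down(d1_reachable, p1_link_good):
    """Return (answer, fanout_color) for an Are-You-Down action.

    `answer` is True when the fanout reports itself as down.
    """
    if not d1_reachable:
        # The text says the fanout turns Gray when D1 goes down.
        # Answering TRUE here is an assumption.
        return True, "Gray"
    if p1_link_good:
        # D1 is good and P1 has good link status: the fanout is up.
        return False, "Green"
    # D1 is good but P1 is bad: the fanout itself is the fault.
    return True, "Red"


def device_color(fanout_answer, significant=True):
    """Color of a device that lost contact, given the fanout's answer."""
    if not significant:
        return "Blue"
    return "Gray" if fanout_answer else "Red"


answer, fanout_color = fanout_are_you_down(d1_reachable=True, p1_link_good=False)
print(fanout_color, device_color(answer))
```

For the P1-is-bad case this prints `Red Gray`: the fanout turns Red and the device that asked turns Gray. With `p1_link_good=True`, the fanout answers FALSE and the asking device turns Red, as in the D2 case.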
Example: Redundant Paths Fault Isolation
This example shows how CA Spectrum manages devices using redundant paths if a link is shut down administratively (i.e., admin-status equals down).
The following diagram depicts a network with redundant WA Links. Here VNM manages Rtr3 through link WL-1 and Rtr2 using link WL-2. Assume that the network administrator shuts down the WL-1 link. This causes WL-1 to turn gray. Rtr3 will turn red because VNM cannot talk to it through WL-1. The redundancy intelligence of Rtr3 will modify its agent address, so that VNM can talk to it using links WL-2 and WL-3. This causes Rtr3 to turn green again. The link WL-1 will still have the gray condition.
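The agent address switch can be pictured as choosing the first candidate address that still answers. This is a conceptual sketch only: `pick_agent_address`, the candidate list, and the example addresses are hypothetical and do not describe how CA Spectrum's redundancy intelligence is actually implemented.

```python
def pick_agent_address(candidates, is_reachable):
    """Return the first address in `candidates` that is reachable, or None.

    `candidates` is ordered by preference (primary address first);
    `is_reachable` is a caller-supplied probe, for example a ping.
    """
    for address in candidates:
        if is_reachable(address):
            return address
    return None


# Hypothetical addresses for Rtr3: the first is reached through WL-1,
# the second through WL-2 and WL-3.
rtr3_addresses = ["192.0.2.1", "192.0.2.2"]
wl1_is_down = lambda address: address != "192.0.2.1"
print(pick_agent_address(rtr3_addresses, wl1_is_down))
```

With WL-1 shut down, the probe fails for the first address, so the second address is selected and Rtr3 can be managed again.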
Example: Inferred Connector Fault Isolation
This example demonstrates that fault isolation for an Inferred Connector requires specific modeling. Assume that two routing devices, Rtr1 and Rtr2, are connected at both ends of the WA_Link and that their ports are P1 and P2 respectively.
WA_Link models need to be associated with a WA_Segment (or fanout) model through the Collects relation to enable the proper rollup of the WA_Link condition. The devices at either end of the WA_Link need to be connected to the WA_Segment collected by the WA_Link model. You do this by navigating into the device’s Device Topology view and resolving the WA_Segment off-page reference icon to the appropriate port. You can view the connections by navigating into the WA_Segment’s view.
This cross-connection is very important for fault isolation to work, as shown in the following diagram.
Assume P1 is the port on Rtr1 and P2 is the port on Rtr2. The routers connected to the WA_Segment will cause it to behave as described in the following table. Note that the port link status becomes important in determining the status of the WA_Link only when both routers are “contact established.”
|Established||Established||Check Port States*|
* If both Rtr1 and Rtr2 have a contact status of established then the port status of P1 and P2 will determine the condition of the WA_Link. If any port is BAD, the WA_Link will be RED. If any port is DISABLED, the WA_Link will be ORANGE. Otherwise, the WA_Link will be GREEN.
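The port-state check in the footnote translates directly into code. The sketch below applies only when both routers have an established contact status; the enum and function names are hypothetical.

```python
from enum import Enum


class PortStatus(Enum):
    GOOD = "good"
    BAD = "bad"
    DISABLED = "disabled"


def wa_link_condition(p1, p2):
    """Condition of the WA_Link when Rtr1 and Rtr2 are both established."""
    ports = (p1, p2)
    if PortStatus.BAD in ports:
        return "RED"  # a bad port takes precedence over a disabled port
    if PortStatus.DISABLED in ports:
        return "ORANGE"
    return "GREEN"


print(wa_link_condition(PortStatus.GOOD, PortStatus.DISABLED))
```

The rules are checked in the order the footnote lists them, so a link with one bad port and one disabled port is reported as RED.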