CA Spectrum - 10.3
Establish Fault Tolerance

Last update May 12, 2017

Establishing Fault Tolerance

You can set up a fault-tolerant environment either when you first install CA Spectrum, before any models have been created, or at any time after installation.

The following procedure describes how to set up two SpectroSERVERs: a primary and a secondary. You can also set up a tertiary SpectroSERVER by taking the same steps. However, assign the tertiary SpectroSERVER a higher precedence number than the secondary SpectroSERVER.

Note: To establish fault tolerance in an environment with a Southbound Gateway integration, see the Southbound Gateway Toolkit.

Follow these steps:

  1. Install the same version of CA Spectrum with the same modeling catalog on both the primary SpectroSERVER and the secondary SpectroSERVER. Each server requires the same landscape handle.
  2. Verify that both the primary and secondary SpectroSERVERs have entries in their .hostrc files that give the SpectroSERVERs mutual access permissions.

    Note: If you are specifying secure users for the secondary SpectroSERVER in the .hostrc file on the primary SpectroSERVER, and the secondary SpectroSERVER is running in the Windows environment, include the user SYSTEM in the secure user list.
  3. Verify that the MAIN_LOCATION_HOST_NAME parameter in the .locrc file on the secondary SpectroSERVER points to the same system name as the .locrc file on the primary SpectroSERVER. Otherwise, synchronization fails.
  4. Configure the primary and secondary SpectroSERVERs so that the user running each SpectroSERVER is the same. If the users are not the same, the secondary SpectroSERVER fails or does not run properly after an Online Backup.
  5. Make a copy of the primary SpectroSERVER database by running Online Backup. Or, if the SpectroSERVER is shut down, use the SSdbsave utility with the -cm argument (to save the modeling catalog and any new models).

    For more information, see Database Management.

  6. Verify that the save file that you created is available to the server that hosts the secondary SpectroSERVER. Copy the file to the server if necessary.
  7. On the secondary server, with the SpectroSERVER shut down, navigate to the CA Spectrum SS directory and load the save file using the following command:

    ../SS-Tools/SSdbload -il -add precedence savefile

    • precedence
      Specifies a numeric value greater than the primary server default value of 10 (20 is recommended).
    • savefile
      Specifies the name of the save file that you created previously.
  8. (Optional) Add the line 'secondary_polling=yes' to the .vnmrc file to let the secondary SpectroSERVER function as a hot backup.
  9. Start the primary SpectroSERVER, if it is not already running.
  10. Start the secondary SpectroSERVER.
  11. To verify the setup, use the MapUpdate command with the view argument to display the current landscape map.

For more information, see Database Management.
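The .locrc check and the load command from the procedure above can be sketched as a small pre-flight script. This is only an illustration, not a CA Spectrum tool: the function names are hypothetical, the .locrc file is assumed to hold one KEY=value entry per line, and a copy of the primary server's .locrc is assumed to be available on the secondary host. The load helper prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these functions ship with CA Spectrum.

# Print the value of KEY from a file of KEY=value lines (assumed format).
get_param() {
    # $1 = file, $2 = key
    sed -n "/^[[:space:]]*$2[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$1" | head -n 1
}

# Succeed only if both .locrc files name the same main location host.
check_locrc_match() {
    # $1 = copy of the primary .locrc, $2 = secondary .locrc
    primary_host=$(get_param "$1" MAIN_LOCATION_HOST_NAME)
    secondary_host=$(get_param "$2" MAIN_LOCATION_HOST_NAME)
    if [ -z "$primary_host" ] || [ "$primary_host" != "$secondary_host" ]; then
        echo "MAIN_LOCATION_HOST_NAME mismatch: '$primary_host' vs '$secondary_host'" >&2
        return 1
    fi
    echo "MAIN_LOCATION_HOST_NAME matches: $primary_host"
}

# Print (do not run) the load command for the secondary server.
secondary_load_cmd() {
    # $1 = precedence, $2 = save file
    case $1 in
        ''|*[!0-9]*) echo "precedence must be a whole number" >&2; return 1 ;;
    esac
    if [ "$1" -le 10 ]; then
        echo "precedence must be greater than the primary default of 10" >&2
        return 1
    fi
    echo "../SS-Tools/SSdbload -il -add $1 $2"
}
```

Calling `secondary_load_cmd 20 savefile` prints the command with the recommended precedence so that it can be reviewed before it is run from the SS directory.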

The secondary SpectroSERVER is now available to take over automatically if the primary SpectroSERVER fails. If you previously activated secondary polling, the secondary SpectroSERVER is available immediately. Otherwise, polling begins when the server detects that it has lost contact with the primary SpectroSERVER.

When service switches from the primary SpectroSERVER to the secondary SpectroSERVER, the Connection Status icon displays yellow. To view the connection status of all servers in a landscape, click the Connection Status icon. In the Connection Status dialog, the Connection Status icon for each server in the landscape displays yellow to indicate the “switched” condition.

When the primary SpectroSERVER comes back online, the secondary SpectroSERVER stops polling (unless you have set secondary_polling to 'yes'). All the applications switch back to the primary SpectroSERVER. However, any edits that you make to the secondary SpectroSERVER while it is active are not automatically replicated to the primary SpectroSERVER. Manually recreate these modifications on the primary SpectroSERVER.

When you restart the primary SpectroSERVER, connections are accepted when all models are loaded, but before all models are activated. The models can take some time to activate. Because the secondary SpectroSERVER stops polling when the primary SpectroSERVER is restarted, a gap in your network management coverage can result.

To avoid this situation, edit the .vnmrc file on the primary SpectroSERVER so that the wait_active resource is set to 'yes'. This parameter causes the server to wait until all of the models are activated before accepting any connections. The message area in the CA Spectrum Control Panel also dynamically displays the percentage of models that are activated. The SpectroSERVER can appear to take longer to come up. However, once all the models are activated, the SpectroSERVER is ready to manage the network.

You can also set the wait_active resource to 'yes' on the secondary SpectroSERVER. During a planned shutdown of the primary SpectroSERVER, you can then verify in the CA Spectrum Control Panel that the secondary SpectroSERVER is ready to take over.

For more information, see Database Management.
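As a sketch of the wait_active edit described above, the following helper sets a KEY=value entry in a .vnmrc-style file, replacing an existing entry or appending a new one. The function name is hypothetical, and the one-entry-per-line KEY=value layout is an assumption based on the 'secondary_polling=yes' line shown earlier. Back up the file before trying anything like this on a real server.

```shell
#!/bin/sh
# Hypothetical helper, not part of CA Spectrum: set KEY=VALUE in a file that
# is assumed to hold one KEY=value entry per line.
set_vnmrc_param() {
    # $1 = file, $2 = key, $3 = value
    tmp_file="$1.tmp.$$"
    awk -v key="$2" -v val="$3" '
        # Replace the first form of the entry wherever it appears.
        $0 ~ ("^[ \t]*" key "[ \t]*=") { print key "=" val; found = 1; next }
        { print }
        # Append the entry if the file did not contain it.
        END { if (!found) print key "=" val }
    ' "$1" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
    # Write back through the original path so its permissions are kept.
    cat "$tmp_file" > "$1" && rm -f "$tmp_file"
}
```

For example, `set_vnmrc_param .vnmrc wait_active yes` leaves every other entry untouched and can be run repeatedly without adding duplicate lines.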

Validate Fault Tolerance Configuration

After you have set up fault tolerance in a distributed SpectroSERVER deployment, verify that the OneClick server has access to both primary and secondary SpectroSERVERs. Without connectivity to both servers, the OneClick server cannot fail over to the secondary SpectroSERVER.

Follow these steps:

  1. Access the OneClick Administration, Landscapes web page.
  2. Check the ‘Secondary Status’ column. Verify that OneClick has established contact with the secondary SpectroSERVER.
    The status also indicates whether Fault Tolerance is ready for failover.
    The Fault Tolerance configuration is validated.

Test Fault Tolerance

During an initial installation, the secondary SpectroSERVER might not have access to all the devices to which the primary SpectroSERVER has access. This situation causes the secondary SpectroSERVER to generate false alarms. To avoid false alarms, verify that the secondary SpectroSERVER can manage your network devices by testing fault tolerance.

Note: Test fault tolerance whenever new devices are added to the primary SpectroSERVER.

Follow these steps:

  1. With both the primary and secondary SpectroSERVERs up and running, bring down the primary SpectroSERVER.
    The Connection Status icon is yellow to indicate the "switched" condition.
    A red connector indicates that the OneClick server was not able to contact the secondary SpectroSERVER.
  2. Wait 15 to 20 minutes for the secondary SpectroSERVER to run.
  3. Verify the following conditions:
    • The Connection Status icon does not display red.
    • All device models and pingable models maintain SNMP or ICMP contact.
      If this contact is lost, verify that the secondary SpectroSERVER has access to your devices. Contact a Network Administrator to resolve this problem, if applicable.
    • CA Spectrum is managing all devices that have an established contact state. Verify the status by checking for device contact or management contact loss alarms from any of the device models.
  4. Restart the primary SpectroSERVER.
    The Connection Status icon displays green to indicate a normal contact state.
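If contact is lost during the test, one way to narrow the problem down is to probe the devices directly from the host of the secondary SpectroSERVER. The sketch below is an assumption-laden illustration: the function name and the device list file are hypothetical, and the default probe is a Linux-style single ping, which covers ICMP reachability only and says nothing about SNMP access. Set PROBE_CMD to a different command on platforms where ping takes other arguments.

```shell
#!/bin/sh
# Hypothetical helper, not part of CA Spectrum: probe each address listed in
# a file (one per line) and report the ones that do not answer.
check_device_contact() {
    # $1 = file of device addresses; blank lines and # comments are skipped
    failed=0
    while IFS= read -r device || [ -n "$device" ]; do
        case $device in ''|'#'*) continue ;; esac
        # PROBE_CMD is split into words on purpose so it can carry options.
        if ! ${PROBE_CMD:-ping -c 1} "$device" </dev/null >/dev/null 2>&1; then
            echo "no contact: $device"
            failed=$((failed + 1))
        fi
    done < "$1"
    # Succeed only when every listed device answered.
    [ "$failed" -eq 0 ]
}
```

Run it on the secondary host with a list of the addresses that the primary SpectroSERVER manages; any address it reports is a candidate for the access problem described in the verification step.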

Fault-Tolerant Recovery

Following are the two possible failure scenarios:

  • The primary SpectroSERVER stops. The secondary SpectroSERVER then forwards event and statistical information to the primary Archive Manager that is running on the server that hosts the primary SpectroSERVER. When the primary SpectroSERVER restarts, no event and statistical data have been lost.
  • The computer where the primary SpectroSERVER and the primary Archive Manager are running stops operating completely. The secondary SpectroSERVER then caches event and statistical data in its database until the primary SpectroSERVER computer comes back online. If a secondary Archive Manager is running, historical and real-time information is available in OneClick, but the information is still cached for transfer to the primary Archive Manager.

Restart both the primary Archive Manager and the primary SpectroSERVER if their server goes down, or if the primary SpectroSERVER stops operating.

Note: It is no longer necessary to start the Archive Manager before the SpectroSERVER. The cached events from the secondary SpectroSERVER can be transferred at any time, even after the primary SpectroSERVER has started logging new events.

Follow these steps:

  1. Start the SPECTRUM Control Panel on the primary SpectroSERVER host.
  2. To start the SpectroSERVER, click Start SpectroSERVER on the SPECTRUM Control Panel.
    When the primary Archive Manager is again operational, the secondary SpectroSERVER connects and transfers its cached event data to the primary Archive Manager.

Change the Host Names of the Primary and Secondary SpectroSERVERs

SpectroSERVERs in a fault-tolerant environment use a precedence value that is associated with their host names to recognize their relationship to one another. Therefore, to preserve the fault-tolerant relationship, use SSdbsave and SSdbload to change the host name of your primary SpectroSERVER.

Follow these steps:

  1. Save the database using SSdbsave with the -cm option.
  2. Change the host name.
  3. Reload the database with the save file that you created in the first step. Run SSdbload with the -il option and the -replace option:

    SSdbload -il -replace precedence savefile
    

    This command causes the database to associate the new host name with the precedence value (10) that designates a primary SpectroSERVER.
    The change in the host name is communicated to any warm or hot standby SpectroSERVERs the next time that the databases are synchronized as a result of Online Backup being run.
    In the meantime, however, the host name change prevents the standby SpectroSERVERs from detecting that the primary SpectroSERVER is running. As a result, any SpectroSERVER that is configured as a warm standby starts polling.

  4. Load the save file on the warm standby using SSdbload with the -il and -replace options, and specify a higher precedence value (for example, 20) that designates it as a standby.

Now you can change the host name of the secondary SpectroSERVER.

Follow these steps:

  1. Save the database using SSdbsave with the -cm option.
  2. Change the host name.
  3. Reload the database with the save file that you created in the first step. Run SSdbload with the -il option and the -replace option:

    SSdbload -il -replace precedence savefile
    

    This command causes the database to associate the new host name with the precedence value (20) that designates a secondary SpectroSERVER.
    When you restart the secondary SpectroSERVER, the server communicates the new host name and precedence to the primary SpectroSERVER.
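Both host name procedures reload the database with the same command and differ only in the precedence value. The following sketch makes that pairing explicit by printing the reload command for a given role. The function name is hypothetical, and the values 10 and 20 are the primary and secondary precedence values used in this section; substitute your own if your standby uses a different precedence.

```shell
#!/bin/sh
# Hypothetical helper, not part of CA Spectrum: print (do not run) the reload
# command that follows a host name change.
rename_reload_cmd() {
    # $1 = role (primary|secondary), $2 = save file
    case $1 in
        primary)   precedence=10 ;;   # value that designates a primary
        secondary) precedence=20 ;;   # example value for a standby
        *) echo "role must be primary or secondary" >&2; return 1 ;;
    esac
    echo "SSdbload -il -replace $precedence $2"
}
```

Printing the command first gives you a chance to confirm the precedence before the database on that server is replaced.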

For more information, see Database Management.
