Sources Flapping Between “Not Responding” and “Healthy” After Adding New VA to Cluster

Hi Developers,

Hope you are doing well.

We recently added a new Virtual Appliance (VA) to our existing VA cluster under the IDN tenant. Since this change, we’ve observed that several sources intermittently switch to a Not Responding state, and then recover to Healthy status without any manual intervention. This has been happening consistently for the past 3 days.

Key observations:

  • The issue only started after the new VA was added.
  • Both VAs in the cluster are showing as Healthy in the IdentityNow UI.
  • The affected sources begin working again after a short period, indicating that the issue is intermittent.
  • We suspect that the newly added VA is causing aggregation or connectivity failures when selected by the load balancer.
  • There are no obvious errors in the UI; however, connector-level or container logs may reveal transient failures when the new VA is used.
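One way to look for those transient failures is to filter the connector gateway log on each VA and compare the results. This is only a rough sketch: it assumes the log lives at /home/sailpoint/log/ccg.log (pass another path if your VA differs), and the keyword list and the `recent_failures` name are our own illustrative choices, not anything defined by SailPoint.

```shell
# Show the most recent log lines that mention common connection failures.
# Assumption: the connector gateway log is /home/sailpoint/log/ccg.log.
# The keyword list is a guess at useful terms; adjust it to what you see.
recent_failures() {
  logfile=${1:-/home/sailpoint/log/ccg.log}
  limit=${2:-20}
  grep -iE 'exception|timed out|timeout|handshake|connection refused' "$logfile" \
    | tail -n "$limit"
}
```

Running `recent_failures` on both VAs around the time a source flips to Not Responding should show whether the failures come only from the new VA.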

Troubleshooting steps we have tried:

  • Verified that both VAs are up and connected.
  • Ensured the new VA was fully updated.
  • Restarted the VA cluster.

Regards
Vatan

In my experience, missing certificates or network issues are usually the reasons for this behavior.

Do you have any certificates loaded on your old VAs? If yes, did you add them to the new VA?

Is the new VA in the same network subnet as the existing VAs? Can you confirm that the new VA can ping and connect over SSH/Telnet to your sources?
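If telnet is not available on the VA, a TCP check along these lines may help. It is a minimal sketch that relies on bash's built-in /dev/tcp redirection and the coreutils `timeout` command; the `tcp_check` name and the example host below are hypothetical.

```shell
# Check whether a TCP port on a source is reachable from this VA.
# Uses bash's /dev/tcp redirection, so neither telnet nor nc is required.
tcp_check() {
  host=$1
  port=$2
  # timeout exits non-zero if the attempt hangs (for example, a firewall drop)
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable"
    return 1
  fi
}
```

For example, `tcp_check ldap.example.com 636` (a made-up host) run on both the old and the new VA should give the same result if the network path is identical.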

Hi @Carlatto

Thank you for your response.

No, I did not copy all the certificates to the new VA, and I did not test ping or SSH connectivity from the new VA.

Do I need to add the certificates to the new VA as well?

Thanks

Yes, make sure the same certificates are on the new VA as the old VAs. Likely the new VA is trying to connect to your sources and the connections are failing because a certificate is missing.
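A quick way to confirm that both VAs hold the same certificate files is to compare checksums of the certificate directory on each one. The sketch below assumes the files were copied unchanged (same names, same encoding); it compares file checksums, not certificate fingerprints, and the `cert_checksums` name is just illustrative.

```shell
# Print a SHA-256 checksum for every regular file in a directory.
# Run it on each VA and compare the two outputs.
cert_checksums() {
  dir=$1
  (
    cd "$dir" || exit 1
    for f in *; do
      if [ -f "$f" ]; then
        sha256sum -- "$f"
      fi
    done
  )
}
```

Saving the output on each VA (for example `cert_checksums <cert-dir> > certs.txt`) and running `diff` on the two files shows any certificate that is missing or different on the new VA.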

Hi @Carlatto

I added the certificates to the new VA.
I tested with ping and the “openssl s_client -connect <host>:<port>” command.
Both are now working fine.
I will add the VA to the cluster again and test it.
Thanks

Resolution to Intermittent “Not Responding” State After Adding a New VA to the Cluster

Following the addition of a new Virtual Appliance (VA) to the existing VA cluster under the IDN tenant, we observed that some sources intermittently entered a “Not Responding” state before recovering without intervention. After investigation, the issue was traced back to configuration discrepancies between the existing and newly added VA.

Below is the step-by-step solution that resolved the problem:


Steps to Resolve

  1. Ensure Consistent Passphrase Usage
    • Confirm that the same encryption passphrase used for the original (primary) VA is also configured on the newly added VA.
    • A mismatch in passphrases can cause secure communication or decryption failures, especially under load-balanced conditions.
  2. Transfer the host.yaml Configuration
    • Copy the host.yaml file from the existing primary VA to the new VA.
    • This file contains essential cluster and host-specific configurations needed for seamless operation in a VA cluster setup.
  3. Transfer SSL Certificates
    • Copy the SSL certificates from the existing VA to the new VA.
    • Place the certificate files in the /sailpoint/certificate directory on the new VA.
    • This ensures secure connectivity and consistency in mutual TLS/SSL configurations across the cluster.
  4. Validate Network and SSL Connectivity
    • Run connectivity tests between sources and the new VA using:
      • ping to verify network-level reachability.
      • openssl s_client -connect <host>:<port> or equivalent tools to test SSL/TLS handshake and certificate integrity.
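The TLS check in the last step can be wrapped so it is easy to repeat for several sources. This is a sketch under a few assumptions: `check_tls` and `verify_result` are names invented here, and the optional third argument is a PEM file you want openssl to trust (for instance, one of the certificates you copied). Note that openssl uses the operating system trust store by default, which is not necessarily what the connectors use, so a passing result is a good sign but not proof. Also double-check the certificate directory on your VA; the SailPoint documentation, as far as we know, refers to /home/sailpoint/certificates.

```shell
# Extract the "Verify return code" line from openssl s_client output (stdin).
verify_result() {
  sed -n 's/^[[:space:]]*\(Verify return code:.*\)$/\1/p' | head -n 1
}

# Attempt a TLS handshake with host:port and report the verification result.
# Optional third argument: a PEM file to trust, passed to openssl as -CAfile.
check_tls() {
  host=$1
  port=$2
  cafile=${3:-}
  if [ -n "$cafile" ]; then
    out=$(timeout 15 openssl s_client -connect "$host:$port" \
      -servername "$host" -CAfile "$cafile" </dev/null 2>&1)
  else
    out=$(timeout 15 openssl s_client -connect "$host:$port" \
      -servername "$host" </dev/null 2>&1)
  fi
  result=$(printf '%s\n' "$out" | verify_result)
  echo "$host:$port ${result:-no TLS handshake completed}"
  # Succeed only when the certificate chain verified cleanly
  [ "$result" = "Verify return code: 0 (ok)" ]
}
```

Calling `check_tls <host> <port>` on the old and the new VA for each affected source makes it easy to spot a source whose certificate chain verifies on one VA but not on the other.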

Outcome

After aligning the configuration and certificate setup across the VAs, the issue of sources intermittently showing as “Not Responding” was resolved. Both VAs are now operating reliably under the load balancer with no further disruptions observed.