drwx------. 2 root root 4096 Jun 24 2025 /opt/sailpoint/share/secure
The secure/ directory is owned by root:root with mode 700, so it is completely inaccessible to the sailpoint user or the container’s entrypoint script. My theory is that vector-env tries to read the certificate/key files at startup and fails immediately because of this permission.
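To make the permission reasoning concrete, here is a minimal sketch of how standard Unix mode bits decide whether a user can enter a directory. The function name and the uid/gid 1000 used for the sailpoint user are assumptions for illustration only, not values taken from the VA. It also only models host-side permission bits; it says nothing about how a container is actually granted access through mounts.

```python
import stat


def can_traverse(mode: int, owner_uid: int, owner_gid: int,
                 uid: int, gids: set[int]) -> bool:
    """Return True if the user may enter (search) a directory with this mode.

    Follows the usual Unix rule: exactly one permission class applies
    (owner, then group, then other), and root bypasses the check.
    """
    if uid == 0:
        return True  # root is not restricted by the mode bits
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


# Hypothetical ids: 1000 stands in for the sailpoint user and group.
SAILPOINT_UID = 1000
SAILPOINT_GIDS = {1000}

# drwx------ root:root, as in the listing above.
blocked = can_traverse(0o700, 0, 0, SAILPOINT_UID, SAILPOINT_GIDS)
as_root = can_traverse(0o700, 0, 0, 0, {0})
```

With mode 700 and root ownership, a non-root host user is blocked while a process running as root is not, so the listing alone does not show whether the container can reach the files.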
I also noticed that otel_agent has been logging 503 errors when trying to push metrics to:
Looks like permission errors causing container startup failures, chown/chmod as a fix, and the relationship between vector-env failing and file access issues.
For a quick resolution, though, please raise a ticket with the SailPoint Support team so they can guide you properly, since most VA-related matters are in their hands.
From what you’ve described, this looks more like a broken Vector startup/configuration issue than a resource problem.
A few observations:
secure/ being root:root 700 is typically expected on the VA. Containers that need those certs are usually granted access through mounts/permissions, so I wouldn’t assume that’s the root cause by itself.
vector.log remaining 0 bytes and vector-start.log only showing vector-env failed suggests Vector is failing before the service even initializes.
The fact that only one VA in the cluster is affected points more toward local corruption, a failed update, or a bad container/image state on that specific VA.
The OTEL 503s are likely a symptom rather than the cause. If telemetry components can’t start correctly, you’ll often see downstream metric export failures.
Given that:
Compare the Vector container image/version and environment variables between VA1 and VA2.
Check whether the certificate/key files referenced by VA_CERTIFICATE_PATH and VA_PRIVATE_KEY_PATH actually exist on VA2.
If everything matches VA1, I would open a support case with SailPoint and attach the VA support bundle.
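The comparison and file checks above could be sketched roughly as follows. This is only an illustration: the helper names are made up, and it assumes you have already captured each VA's Vector environment as a plain dictionary by whatever means you have available. Only the two variable names come from this thread. Paths may also resolve differently inside the container than on the host, so treat the result as a hint rather than proof.

```python
import os
from pathlib import Path

# Variable names mentioned in this thread; everything else is hypothetical.
CERT_VARS = ("VA_CERTIFICATE_PATH", "VA_PRIVATE_KEY_PATH")


def check_cert_paths(env: dict[str, str]) -> dict[str, str]:
    """Classify each cert/key variable as unset, missing, unreadable, empty or ok."""
    results = {}
    for name in CERT_VARS:
        value = env.get(name)
        if not value:
            results[name] = "unset"
            continue
        try:
            if not Path(value).is_file():
                results[name] = "missing"
            elif os.path.getsize(value) == 0:
                results[name] = "empty"
            else:
                results[name] = "ok"
        except PermissionError:
            # Expected when run as a non-root user against a 700 directory.
            results[name] = "unreadable"
    return results


def diff_env(env_a: dict[str, str], env_b: dict[str, str]) -> dict[str, tuple]:
    """Return variables whose values differ, as name -> (value_a, value_b)."""
    keys = sorted(set(env_a) | set(env_b))
    return {k: (env_a.get(k), env_b.get(k))
            for k in keys if env_a.get(k) != env_b.get(k)}
```

Running check_cert_paths as root on VA2 and feeding both VAs' environments to diff_env gives a short list of differences that can be attached to the support case.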
In practice, I’ve seen redeploying the affected VA resolve similar single-node container corruption issues faster than prolonged troubleshooting.
Since the issue survives a restart and is isolated to one VA, my next step would be support bundle + VA redeployment.