VA Critical Error – vector container failing with "vector-env failed" on every restart

Jaey · May 26, 2026, 5:22am

Hi everyone,

I’ve been dealing with a persistent Critical Error on one of my Virtual Appliances and would appreciate any insights.

SETUP

1 VA cluster with 2 VAs
VA 1: Connected (healthy, handling all workloads normally)
VA 2: Connected + Critical Error
VA 2 vector version: 0.53.0
OS: Flatcar Linux

SYMPTOM

VA 2 triggered the following Critical Error:

“vector service has restarted 53 times in the last 30 minutes”
category: container / severity: errors

After investigating on the VA directly, I found:

vector container status: Exited (ExitCode 1) immediately on every start attempt
vector.log: 0 bytes (container never actually starts)
vector-start.log: filled entirely with “ERROR: vector-env failed”, nothing else
All other containers (ccg, va_agent, charon, fluent, otel_agent) are healthy
Memory: 12 Gi free, Disk: 104 G free — not a resource issue
OOMKilled: false

WHAT I FOUND

The vector container has these environment variables set:

VA_CERTIFICATE_PATH=/opt/sailpoint/share/secure/va-gateway.crt
VA_PRIVATE_KEY_PATH=/opt/sailpoint/share/secure/va-gateway.key

But when I checked the directory:

drwx------. 2 root root 4096 Jun 24 2025 /opt/sailpoint/share/secure

The secure/ directory is root:root 700 — completely inaccessible to the sailpoint user or the container’s entrypoint script. My theory is that vector-env is trying to read the certificate/key files at startup and failing immediately because of this permission.
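
One way to test this theory is to look at the mode and owner of every directory on the way to the two files, not just secure/. Below is a minimal Python sketch, assuming Python is available wherever you run it (it may not be on the VA itself). `describe_chain` is a hypothetical helper name for illustration, not a SailPoint tool.

```python
import os
import stat
from pathlib import Path


def describe_chain(path):
    """Return one (path, mode, uid, gid) row per component of `path`, outermost first.

    The walk stops at the first component that is missing or cannot be
    inspected, so the last row shows where access breaks down.
    """
    target = Path(path)
    # parents is ordered innermost-first, so reverse it to start from the root.
    components = list(target.parents)[::-1] + [target]
    rows = []
    for part in components:
        try:
            st = os.stat(part)
        except FileNotFoundError:
            rows.append((str(part), "missing", None, None))
            break
        except PermissionError:
            rows.append((str(part), "permission denied", None, None))
            break
        rows.append((str(part), stat.filemode(st.st_mode), st.st_uid, st.st_gid))
    return rows


if __name__ == "__main__":
    # Paths taken from the environment variables quoted above.
    for cert_path in (
        "/opt/sailpoint/share/secure/va-gateway.crt",
        "/opt/sailpoint/share/secure/va-gateway.key",
    ):
        for row in describe_chain(cert_path):
            print(row)
```

Run as the sailpoint user, a last row of "permission denied" would mean that user cannot see the file at all, while "missing" would mean the path resolves but the file is not there. Neither result says how the container itself mounts or reads the files, so treat it as a data point rather than a verdict.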

I also noticed that otel_agent has been logging 503 errors when trying to push metrics to:

https://edge-prometheus-us-east-1.identitynow-demo.com/api/v1/write

Last error was around 2026-05-21. Not sure if this is related.

WHAT I TRIED

Cluster restart — issue reproduced after about 30 minutes
Checked vector-start.log, vector.log, otel_agent.log
Confirmed no other errors in vector-start.log besides “vector-env failed”
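
For the last check, tallying the distinct lines in the log is a quick way to confirm nothing else is hiding in it. A minimal Python sketch, assuming you have a copy of the log somewhere Python can read it; `tally_lines` is a hypothetical helper name:

```python
from collections import Counter


def tally_lines(lines):
    """Count distinct non-empty lines, most frequent first."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common()


# Example usage (the path is a placeholder for wherever your copy of the log is):
# with open("vector-start.log") as log:
#     for text, count in tally_lines(log):
#         print(count, text)
```

If the result has a single entry, the log really does contain nothing but the one message.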

QUESTIONS

Has anyone seen “ERROR: vector-env failed” before? What was the root cause in your case?
Is the secure/ directory being root:root 700 expected, or does it suggest something went wrong during a VA update?
Did a VA re-deployment resolve this for anyone, or is this something that requires SailPoint Support intervention?
Any idea whether the otel_agent 503 errors are related to the vector issue?

Thanks in advance — any experience or pointers would be really helpful!

RAKGDS · May 31, 2026, 1:40pm

Hi,

I'd suggest raising a SailPoint Support ticket for this; they should be able to guide you through the next steps.

Jetendrakumar1991 · June 1, 2026, 7:57am

This looks like a permission error causing the container startup failure, with vector-env failing because of file access issues; chown/chmod could be a fix.
For a quick resolution, though, please raise a ticket with the SailPoint Support team so they can guide you properly, since most VA-related matters are in their hands.

577bdd863f3b8692a032ea4eaa2f50f · June 1, 2026, 10:20am

From what you’ve described, this looks more like a broken Vector startup/configuration issue than a resource problem.

A few observations:

secure/ being root:root 700 is typically expected on the VA. Containers that need those certs are usually granted access through mounts/permissions, so I wouldn’t assume that’s the root cause by itself.
vector.log remaining 0 bytes and vector-start.log only showing vector-env failed suggests Vector is failing before the service even initializes.
The fact that only one VA in the cluster is affected points more toward local corruption, a failed update, or a bad container/image state on that specific VA.
The OTEL 503s are likely a symptom rather than the cause. If telemetry components can’t start correctly, you’ll often see downstream metric export failures.

Given that:

Compare the Vector container image/version and environment variables between VA1 and VA2.
Check whether the certificate/key files referenced by VA_CERTIFICATE_PATH and VA_PRIVATE_KEY_PATH actually exist on VA2.
If everything matches VA1, I would open a support case with SailPoint and attach the VA support bundle.
In practice, I’ve seen redeploying the affected VA resolve similar single-node container corruption issues faster than prolonged troubleshooting.

Since the issue survives a restart and is isolated to one VA, my next step would be support bundle + VA redeployment.
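
For the comparison step, here is a minimal Python sketch, assuming you have saved each VA's vector container environment as plain KEY=VALUE lines (how you export them is up to you). `parse_env` and `diff_env` are hypothetical helper names, and the sample values in the usage comment are made up for illustration.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blank or malformed lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only, since values may contain "=" themselves.
        key, _, value = line.partition("=")
        env[key] = value
    return env


def diff_env(healthy_text, broken_text):
    """Report variables that are missing on one side or differ between the two."""
    healthy, broken = parse_env(healthy_text), parse_env(broken_text)
    shared = sorted(healthy.keys() & broken.keys())
    return {
        "only_in_healthy": sorted(healthy.keys() - broken.keys()),
        "only_in_broken": sorted(broken.keys() - healthy.keys()),
        "changed": {k: (healthy[k], broken[k]) for k in shared if healthy[k] != broken[k]},
    }


# Example usage with made-up dumps:
# va1 = "VA_CERTIFICATE_PATH=/opt/sailpoint/share/secure/va-gateway.crt\nEXAMPLE_VAR=1"
# va2 = "VA_CERTIFICATE_PATH=/opt/sailpoint/share/secure/va-gateway.crt"
# print(diff_env(va1, va2))
```

An empty result on all three keys would suggest the difference between the two VAs lies in the files or the image rather than in the environment.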
