JumpCloud Connector — Persistent 502 Bad Gateway During Account Aggregation (ISC Cloud)

Hi everyone,

We’ve been battling intermittent 502 Bad Gateway errors on our JumpCloud Web Services connector in SailPoint ISC (Cloud) for about a week now. I’ve done extensive analysis and wanted to share the findings in case others are seeing similar behavior, and to ask whether anyone has found a workable solution.

Environment

  • Platform: SailPoint Identity Security Cloud (ISC)

  • Connector: JumpCloud (Web Services type)

  • Account count: 601 accounts

  • Aggregation schedule: Every 30 minutes (account aggregation)

  • Entitlement types: user_group, active_directory, g_suite

The Problem

Account aggregation fails approximately 15–20% of the time with HTTP 502 Bad Gateway errors from the JumpCloud API. When a 502 occurs on any single user’s API call and the retries are exhausted, ISC terminates the entire aggregation run. ISC has no skip-and-continue capability (unlike IdentityIQ’s maxAllowedErrors).

Failure Pattern (7-Day Data)

Day         Passed  Failed  Total  Success
─────────────────────────────────────────────
2026-05-07    41       3      44     93%
2026-05-08    38      12      50     76%  ← worst day
2026-05-09    42       7      49     86%
2026-05-10    44       5      49     90%
2026-05-11    42       7      49     86%
2026-05-12    32       7      39     82%
2026-05-13    11       3      14     79%  (partial)
─────────────────────────────────────────────
TOTAL        250      44     294     85%
TARGET                              >95%
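
For anyone who wants to check the arithmetic, the per-day rates and the 7-day total above can be recomputed from the raw pass/fail counts. A minimal Python sketch (the variable and function names are only for illustration):

```python
# Pass/fail counts copied from the 7-day table above.
runs = {
    "2026-05-07": (41, 3),
    "2026-05-08": (38, 12),
    "2026-05-09": (42, 7),
    "2026-05-10": (44, 5),
    "2026-05-11": (42, 7),
    "2026-05-12": (32, 7),
    "2026-05-13": (11, 3),  # partial day
}


def success_rate(passed, failed):
    """Return the success rate as a percentage of all runs."""
    return 100.0 * passed / (passed + failed)


total_passed = sum(p for p, _ in runs.values())
total_failed = sum(f for _, f in runs.values())

for day, (passed, failed) in runs.items():
    print(f"{day}  {success_rate(passed, failed):5.1f}%")

print(f"TOTAL       {success_rate(total_passed, total_failed):5.1f}%  "
      f"({total_passed} passed, {total_failed} failed)")
```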

Failing Endpoints

The 502 errors hit these JumpCloud v2 endpoints during per-user association lookups:

GET /v2/users/{userId}/associations?targets=active_directory
GET /v2/users/{userId}/associations?targets=g_suite
GET /v2/users/{userId}/memberof

Key observations:

  • 502 hits at random positions in the user list (user 0, 100, 200, 300, 500 out of 601) — not user-specific

  • Failures are randomly distributed across all hours — no correlation with business hours or peak load

  • No HTTP 429 (rate limit) errors observed — only 502s

  • Test Connection to JumpCloud always passes; source stays SOURCE_STATE_HEALTHY

SailPoint Event Log Samples

Typical failure cluster from May 12 (UTC):

❌ 2026-05-12 07:02 UTC | SOURCE_ACCOUNT_AGGREGATION_FAILED | aggId=4d2d7e73...
❌ 2026-05-12 07:32 UTC | SOURCE_ACCOUNT_AGGREGATION_FAILED | aggId=fa5ab874...
❌ 2026-05-12 08:02 UTC | SOURCE_ACCOUNT_AGGREGATION_FAILED | aggId=150d1066...
✅ 2026-05-12 08:18 UTC | SOURCE_ACCOUNT_AGGREGATION_PASSED | aggId=f10083fd... (601 accounts)
✅ 2026-05-12 08:37 UTC | SOURCE_ACCOUNT_AGGREGATION_PASSED | aggId=7b883a3c...
...15 consecutive passes...
❌ 2026-05-13 03:46 UTC | SOURCE_ACCOUNT_AGGREGATION_FAILED
✅ ... passes ...
❌ 2026-05-13 08:11 UTC | SOURCE_ACCOUNT_AGGREGATION_FAILED
❌ 2026-05-13 09:49 UTC | SOURCE_ACCOUNT_AGGREGATION_FAILED

What We’ve Tried

1. Retry configuration increase (reverted — no improvement)

// Applied May 11, reverted May 12 after 24h monitoring showed no change
maxRetryCount:        3 → 5  (reverted to 3)
retryWaitTime:        10000ms → 20000ms  (reverted to 10000ms)
aggregationRetryErrors: ["502", "Bad Gateway", "Service Unavailable", "Connection reset", "Read timed out"]

// Result: success rate went from 83% to 81%, a difference well within run-to-run noise
// The 502s outlast even a 100-second retry window (5 retries × 20s)

2. Reduced parallel API load

// Reduced custom upload_users job from 2x/hour to 1x/hour (at :45 mark)
// to lower overall API pressure on JumpCloud endpoints
// Result: No measurable improvement in aggregation success rate

3. Attempted to remove AD/GS operations (reverted — data loss risk)

// Initially tried removing the 12 active_directory and g_suite connector operations
// to eliminate ~1,200 unnecessary API calls per aggregation cycle
// 
// REVERTED after discovering:
//   active_directory holders: 23/601 users (4%)
//   g_suite holders: 241/601 users (40%)
// Removing these would orphan entitlement data for at least 40% of users
// (up to 44% if the two groups do not overlap)
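
The call-count and coverage figures in item 3 follow from simple arithmetic. A sketch, assuming one API call per user per endpoint (the three endpoints listed under Failing Endpoints) and no pagination. We have not measured the overlap between AD and G Suite holders, so the affected share is given as a range:

```python
USERS = 601
ENDPOINTS_PER_USER = 3   # active_directory, g_suite, memberof (assumed one call each)
AD_HOLDERS = 23
GSUITE_HOLDERS = 241

calls_per_cycle = USERS * ENDPOINTS_PER_USER
calls_saved = USERS * 2  # dropping the two association lookups

# Bounds on the number of users holding at least one of the two entitlement types.
affected_min = max(AD_HOLDERS, GSUITE_HOLDERS)          # every AD holder also has G Suite
affected_max = min(USERS, AD_HOLDERS + GSUITE_HOLDERS)  # no overlap at all

print(f"calls per cycle: {calls_per_cycle}, saved by removal: {calls_saved}")
print(f"affected users: {affected_min}-{affected_max} "
      f"({100 * affected_min / USERS:.0f}%-{100 * affected_max / USERS:.0f}%)")
```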

Connector Configuration (Current)

{
  "connectorAttributes": {
    "maxRetryCount": "3",
    "retryWaitTime": "10000",
    "aggregationRetryErrors": ["502", "Bad Gateway", "Service Unavailable", "Connection reset", "Read timed out"]
  }
}

The connector also has 18 active operations (6 user_group + 6 active_directory + 6 g_suite), not shown here.
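
The retry budget implied by this configuration can be worked out directly. A sketch, assuming the connector waits a fixed retryWaitTime before each retry with no backoff or jitter (the same assumption behind the 5 × 20s = 100s figure in "What We’ve Tried"); `retry_window_seconds` is a name made up for this example:

```python
import json

# Retry-related attributes copied from the current configuration.
config = json.loads("""
{
  "connectorAttributes": {
    "maxRetryCount": "3",
    "retryWaitTime": "10000"
  }
}
""")


def retry_window_seconds(attrs):
    """Total wait across all retries, assuming a fixed wait before each retry."""
    return int(attrs["maxRetryCount"]) * int(attrs["retryWaitTime"]) / 1000


current = retry_window_seconds(config["connectorAttributes"])
tested = retry_window_seconds({"maxRetryCount": "5", "retryWaitTime": "20000"})

print(f"current retry window: {current:.0f}s")
print(f"May 11 test window:   {tested:.0f}s")
```

If the outages really do last longer than 100 seconds, neither window is long enough, which matches what we saw.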

Our Hypothesis: Cloudflare Proxy Layer

We believe the 502s originate at the Cloudflare proxy layer rather than from JumpCloud’s own rate limiting, based on:

  1. No 429s observed: JumpCloud’s rate limiter returns HTTP 429. We only see 502s, which are gateway/proxy errors.

  2. JumpCloud uses Cloudflare CDN — all API traffic passes through Cloudflare’s reverse proxy. Per Cloudflare’s docs, 502s can originate from:

    • Origin server (JumpCloud) being slow/overloaded → Cloudflare times out

    • Cloudflare’s traffic manager doing data center rebalancing (documented to cause brief 502s)

    • WAF/rate-limiting rules in JumpCloud’s Cloudflare configuration

  3. Random distribution, both temporally and by user position, is inconsistent with rate limiting of our client, which would track request volume

  4. Retries don’t help — even 100-second retry windows fail, suggesting the origin is genuinely unreachable during failure windows

  5. SailPoint isn’t the only caller — our OlaCrew API, Hangfire jobs, and other integrations all share the same Cloudflare-proxied gateway
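
One way to test this hypothesis, if the raw 502 response can be captured (for example from a standalone probe outside ISC), is to check whether the error was generated at the Cloudflare edge or passed through from the origin. The sketch below is a rough heuristic, not an official method. It assumes that responses proxied by Cloudflare carry a `CF-RAY` header or `Server: cloudflare`, and that an error page generated by Cloudflare itself mentions "cloudflare" in the body. `classify_502` and the sample inputs are hypothetical:

```python
def classify_502(status, headers, body):
    """Guess where a 502 came from, based on response headers and body."""
    if status != 502:
        return "not-a-502"
    # Header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    via_cloudflare = "cf-ray" in h or h.get("server", "").lower() == "cloudflare"
    if not via_cloudflare:
        return "not-via-cloudflare"
    if "cloudflare" in body.lower():
        # Cloudflare could not get a valid response from the origin.
        return "cloudflare-generated"
    # The origin itself answered 502 and Cloudflare relayed it.
    return "origin-502-via-cloudflare"


# Made-up sample responses, for illustration only.
samples = [
    (502, {"Server": "cloudflare", "CF-RAY": "placeholder"},
     "<html><center>cloudflare</center></html>"),
    (502, {"CF-RAY": "placeholder"}, '{"message": "Bad Gateway"}'),
    (502, {"Server": "nginx"}, "Bad Gateway"),
]
for status, headers, body in samples:
    print(classify_502(status, headers, body))
```

A consistent "cloudflare-generated" result would point at the edge-to-origin path, while "origin-502-via-cloudflare" would point at JumpCloud’s own infrastructure. Either way, the response headers and body would be useful evidence to attach to the support case.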

ISC Platform Limitation

SailPoint ISC does NOT support skip-and-continue. When all retries are exhausted on a single account’s API call, the entire aggregation is terminated. Skip-and-continue is an IdentityIQ-only capability (maxAllowedErrors). We have found no workaround in ISC: every single 502 that survives the retry budget kills the whole run.
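
To put a number on how fragile this makes a run: if each API call independently has a small probability q of failing even after retries, a run of N calls succeeds with probability (1 - q)^N. A back-of-the-envelope sketch, assuming N = 601 users × 3 endpoints = 1,803 calls (the real number may be higher if other calls or pagination are involved) and independent failures, which is only an approximation given the clustering in the event log:

```python
import math

N_CALLS = 601 * 3                 # assumed: one call per user per failing endpoint
observed_run_success = 250 / 294  # from the 7-day table
target_run_success = 0.95


def per_call_failure(run_success, n_calls):
    """Solve (1 - q)^n = run_success for q."""
    return 1.0 - math.exp(math.log(run_success) / n_calls)


q_observed = per_call_failure(observed_run_success, N_CALLS)
q_target = per_call_failure(target_run_success, N_CALLS)

print(f"implied unrecovered failure rate per call: {q_observed:.2e}")
print(f"rate needed for 95% run success:           {q_target:.2e}")
print(f"i.e. roughly 1 unrecovered 502 per {1 / q_observed:,.0f} calls")
```

Under these assumptions, an unrecovered failure on the order of one call in ten thousand is enough to produce the run failure rate we observe, which is why skip-and-continue would matter far more than retry tuning.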

Questions for the Community

  1. Has anyone else running the JumpCloud Web Services connector in ISC seen similar intermittent 502s on the /v2/users/{id}/associations endpoints?

  2. Is there a way to configure the Web Services connector to skip failed accounts and continue aggregation in ISC? (We believe there isn’t, but hoping to be wrong.)

  3. Has anyone had success with page size reduction or request spacing in the JumpCloud connector to reduce API pressure?

  4. Are there any undocumented connector attributes for error tolerance or request throttling?

  5. Has SailPoint engineering acknowledged this as a known limitation for connectors talking to APIs behind Cloudflare CDN?

We’ve also reached out to JumpCloud directly through their community Slack and are opening a support case, but any insights from the SailPoint side would be very helpful.

Thanks in advance.

Hi,
Have you tried increasing the account aggregation interval from 30 minutes to 1 hour, or maybe 2 hours, to see whether that helps?

Thanks