Iterator fetching huge identity ids from Rule caused server to crash

@aseelvn07 -

Why the server choked

| Step in your rule | What really happens | Cost |
| --- | --- | --- |
| context.search("sql … LIKE '%NativeChangeDetection%'") | Hibernate executes the raw SQL and hydrates a full Identity object for every matching row, not just the id. | ≈ 100 k objects × dozens of columns ⇒ tens of GB on-heap |
| IteratorUtils.toList(iterate) | Pulls the whole result set into a Java ArrayList all at once. | 1 object wrapper + 1 array slot + 1 Hibernate proxy per row ⇒ heap blow-up |
| No decache / batching | All those objects remain pinned in the session cache; GC thrashes, CPU spikes, OOM kills the JVM. | Crash |

The log line never printed because the JVM spent its time allocating, paging and finally dying before it reached that statement.
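
For reference, the failure mode looks roughly like this - a minimal reconstruction, so the HQL-style query string and variable names are placeholders, not your actual rule code:

import java.util.Iterator;
import java.util.List;
import org.apache.commons.collections.IteratorUtils;

// Placeholder query - your rule used a raw SQL LIKE '%NativeChangeDetection%' instead
Iterator iterate = context.search("from Identity where ...", null, null);

// One call walks the whole cursor and keeps every hydrated Identity reachable at once
List all = IteratorUtils.toList(iterate);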


A pattern that will survive 100 k+ cubes

import sailpoint.api.IdIterator;
import sailpoint.object.Identity;
import sailpoint.object.QueryOptions;
import sailpoint.object.Filter;
import sailpoint.tools.Util;

QueryOptions qo = new QueryOptions();
// identical semantics, but let Hibernate build the SQL for you
qo.addFilter(Filter.like("attributes", "NativeChangeDetection"));

IdIterator it = new IdIterator(context, Identity.class, qo);   // streams only the ID column
int processed = 0;

try {
    while (it.hasNext()) {
        String id = (String) it.next();        // just a GUID, negligible memory
        Identity cube = context.getObjectById(Identity.class, id);

        // …your business logic here…

        context.decache(cube);                 // keep session small
        if (++processed % 500 == 0) {          // batch-size tune as needed
            context.commitTransaction();       // lets other threads run, releases locks
        }
    }
} finally {
    Util.flushIterator(it);                    // closes the DB cursor even if an exception was thrown
}

Key changes

  1. Stream, don’t collect
     IdIterator (or its convenience wrapper IncrementalObjectIterator - see the first sketch after this list) fetches only the primary-key column, automatically decaches every 100 objects, and lets you commit/rollback inside the loop without losing your place - see "TaskResult Monitor UpdateProgress seems to save in memory objects?" - #10 by jeff_larson.
  2. Projection, not hydration
     If you prefer context.search, call the overload that lists only the columns you need (a full loop is sketched after this list):

Iterator<Object[]> iter = context.search(Identity.class, qo, "id");

     This avoids materialising full Identity objects until you explicitly call getObjectById - see "Context.search: Care and feeding of iterators" on Compass.
  3. Decache aggressively
     context.decache(obj) (or a periodic full context.decache() every N records) prevents the session cache from ballooning.
  4. Batch commits
     A small commitTransaction() every few hundred objects shortens database locks and gives the GC a chance to clean up.
  5. Flush the iterator
     Always call Util.flushIterator (or use the try/finally above) so JDBC cursors are closed even when an exception is thrown - see "Context.search: Care and feeding of iterators" on Compass.
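
For point 1, a minimal sketch of the IncrementalObjectIterator variant - the filter and the 500-record batch size are simply carried over from the pattern above, and the generic constructor usage is an assumption about your IIQ version:

import sailpoint.api.IncrementalObjectIterator;
import sailpoint.object.Filter;
import sailpoint.object.Identity;
import sailpoint.object.QueryOptions;

QueryOptions qo = new QueryOptions();
qo.addFilter(Filter.like("attributes", "NativeChangeDetection"));

// Streams the ids first, then loads each Identity on demand as you iterate
IncrementalObjectIterator<Identity> cubes =
        new IncrementalObjectIterator<Identity>(context, Identity.class, qo);

int processed = 0;
while (cubes.hasNext()) {
    Identity cube = cubes.next();

    // …your business logic here…

    context.decache(cube);                     // keep the session small
    if (++processed % 500 == 0) {
        context.commitTransaction();
    }
}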
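
And for point 2, the projection iterator returns Object[] rows - one element per requested column - so only the id crosses the wire until you explicitly fetch the object. A sketch, reusing the QueryOptions (qo) built above:

import java.util.Iterator;
import sailpoint.object.Identity;
import sailpoint.tools.Util;

Iterator<Object[]> iter = context.search(Identity.class, qo, "id");   // projection: id column only
try {
    while (iter.hasNext()) {
        String id = (String) iter.next()[0];                // first (and only) projected column
        Identity cube = context.getObjectById(Identity.class, id);

        // …your business logic here…

        context.decache(cube);
    }
} finally {
    Util.flushIterator(iter);                               // always close the JDBC cursor
}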


Optional refinements

| Technique | When it helps |
| --- | --- |
| qo.setCloneResults(true) | You must call commitTransaction() inside the loop but are stuck with a standard iterator. |
| Partition the workload (multiple “Run Rule” tasks, each filtering on an id range or shard key) | Clustered deployments where you want predictable runtimes and lower per-node memory. |
| Create a database index on the JSON attributes field (or split the flag into its own column) | If the LIKE '%NativeChangeDetection%' scan itself is the bottleneck. |
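
For the partitioning and setCloneResults rows, here is a sketch of how one “Run Rule” task instance could narrow the query - the shard key (first letter of the identity name) is purely an illustrative assumption:

import sailpoint.object.Filter;
import sailpoint.object.QueryOptions;

QueryOptions qo = new QueryOptions();
qo.addFilter(Filter.like("attributes", "NativeChangeDetection"));

// Hypothetical shard: this task instance only handles names starting with "a";
// launch sibling tasks for the remaining ranges
qo.addFilter(Filter.like("name", "a", Filter.MatchMode.START));

// Only needed if you commit inside the loop but are stuck with a standard iterator
qo.setCloneResults(true);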

TL;DR

Load only what you need, process in a streaming loop, decache, flush, and batch-commit.
With the pattern above, 100 k, 1 M, or even more cubes can be processed without exhausting the heap or hanging the server.

Cheers!!!
