Iterator fetching huge identity ids from Rule caused server to crash

Which IIQ version are you inquiring about?

8.3

Share all details about your problem, including any error messages you may have received.

We have the below code, run from a ‘Run Rule’ task, which caused the server it ran on to crash.

Iterator iterate = context.search("sql:  SELECT id FROM identityiq.identityiq.spt_identity si WHERE [attributes] LIKE '%NativeChangeDetection%';", null, null);
 
List tempList = IteratorUtils.toList(iterate);
log.error("Total no: of cubes with NCD : "+tempList.size());
 
iterate = tempList.iterator();
while (iterate.hasNext()) {
    Identity id = context.getObjectById(Identity.class, iterate.next());
     // remaining code
}

The SQL query (when tested directly against the DB) returned around 100k identity cubes. The task ran for a few minutes before the server crashed; during that time we saw CPU and memory utilization go very high (as seen in the admin console).
Even the log.error statement never printed, which means execution never reached that line.

What is the best way to tackle this scenario?

@aseelvn07 -

Why the server choked

Step by step in your rule, here is what really happens and what it costs:

  1. context.search("sql … LIKE '%NativeChangeDetection%'") - Hibernate executes the raw SQL and hydrates a full Identity object for every matching row, not just the id. Cost: ≈ 100k objects × dozens of columns ⇒ tens of GB on-heap.
  2. IteratorUtils.toList(iterate) - pulls the whole result set into a Java ArrayList all at once. Cost: one object wrapper, one array slot, and one Hibernate proxy per row ⇒ heap blow-up.
  3. No decache / batching - all those objects remain pinned in the session cache; GC thrashes, CPU spikes, and an OutOfMemoryError kills the JVM. Cost: crash.

The log line never printed because the JVM spent its time allocating, paging and finally dying before it reached that statement.


A pattern that will survive 100 k+ cubes

import sailpoint.api.IdIterator;
import sailpoint.object.Filter;
import sailpoint.object.Identity;
import sailpoint.object.QueryOptions;
import sailpoint.tools.Util;

QueryOptions qo = new QueryOptions();
// identical semantics, but let Hibernate build the SQL for you
qo.addFilter(Filter.like("attributes", "NativeChangeDetection"));

IdIterator it = new IdIterator(context, Identity.class, qo);   // streams only the ID column
int processed = 0;

try {
    while (it.hasNext()) {
        String id = (String) it.next();        // just a GUID, negligible memory
        Identity cube = context.getObjectById(Identity.class, id);

        // …your business logic here…

        context.decache(cube);                 // keep session small
        if (++processed % 500 == 0) {          // batch-size tune as needed
            context.commitTransaction();       // lets other threads run, releases locks
        }
    }
    context.commitTransaction();           // commit the final partial batch
} finally {
    Util.flushIterator(it);                    // closes DB cursor 
}

Key changes

  1. Stream, don’t collect
    IdIterator (or its convenience wrapper IncrementalObjectIterator) fetches only the primary-key column, automatically decaches every 100 objects, and lets you commit/rollback inside the loop without losing your place (see “TaskResult Monitor UpdateProgress seems to save in memory objects?”, reply #10 by jeff_larson). A sketch of the wrapper follows this item.
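
A minimal sketch of the IncrementalObjectIterator variant, which pages full objects for you so you can skip the explicit getObjectById call (assuming the same QueryOptions qo as above; the exact paging/decache behaviour is version-dependent, so verify against your IIQ javadocs):

import sailpoint.api.IncrementalObjectIterator;
import sailpoint.object.Identity;

IncrementalObjectIterator cubes = new IncrementalObjectIterator(context, Identity.class, qo);
while (cubes.hasNext()) {
    Identity cube = (Identity) cubes.next();   // full object, fetched in pages
    // ...your business logic here...
    context.decache(cube);                     // still keep the session small
}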
  2. Projection, not hydration
    If you prefer context.search, call the overload that lists only the columns you need:

Iterator<Object[]> iter = context.search(Identity.class, qo, "id");

This avoids materialising full Identity objects until you explicitly call getObjectById (see “Context.search: Care and feeding of iterators” on Compass). A sketch of consuming the rows follows this item.
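
For completeness, a minimal sketch of consuming that projection iterator (each row comes back as an Object[], even with a single selected column):

while (iter.hasNext()) {
    Object[] row = (Object[]) iter.next();
    String id = (String) row[0];               // the projected "id" column
    // fetch the full object only when you actually need it
    Identity cube = context.getObjectById(Identity.class, id);
    context.decache(cube);
}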
  3. Decache aggressively
    context.decache(obj) (or a periodic full context.decache() every N records) prevents the session cache from ballooning.
  4. Batch commits
    A small commitTransaction() every few hundred objects shortens database locks and gives the GC a chance to clean up.
  5. Flush the iterator
    Always call Util.flushIterator (or use the try/finally above) so JDBC cursors are closed even on exceptions (see “Context.search: Care and feeding of iterators” on Compass).


Optional refinements

Technique and when it helps:

  - qo.setCloneResults(true) - when you must call commitTransaction() inside the loop but are stuck with a standard iterator.
  - Partition the workload (multiple “Run Rule” tasks, each filtering on an id range or shard key) - for clustered deployments where you want predictable runtimes and lower per-node memory; a sketch follows this list.
  - Create a database index on the attributes column (or split the flag into its own column) - when the LIKE '%NativeChangeDetection%' scan itself is the bottleneck.
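
As a rough illustration of the partitioning idea (there is no built-in partitioning for ‘Run Rule’ tasks, so this is a manual split): give each rule/task instance a non-overlapping filter, for example on the first letter of the identity name. The boundaries here are arbitrary assumptions to tune:

// Instance 1 of the rule: names a-m (hypothetical shard boundaries)
QueryOptions qo = new QueryOptions();
qo.addFilter(Filter.and(
        Filter.ge("name", "a"),
        Filter.lt("name", "n")));

// A second instance would use Filter.ge("name", "n") instead, so the
// two tasks cover disjoint ranges and can run in parallel on different nodes.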

TL;DR

Load only what you need, process in a streaming loop, decache, flush, and batch-commit.
Following the pattern above, 100k, 1M, or even more cubes can be processed without exhausting the heap or hanging the server.

Cheers!!!


@sukanta_biswas
Thanks for providing a very detailed analysis and solution for this. Appreciate it!

If possible, could you please elaborate on the “Partition the workload” refinement you mentioned?

I’m aware of partitioning tasks like the Identity Refresh task, Aggregation task, Certification Staging, etc. using the RequestDefinition object configuration or by setting it directly on those tasks.

But I don’t see any option to partition a task of ‘Run Rule’ type. Could you please help with this?

@sukanta_biswas

There is one other issue:
qo.addFilter(Filter.like("attributes", "NativeChangeDetection"));

This filter is not returning any identities.

@aseelvn07 -

The filter is not working because the attributes column is not searchable. Also, I would not recommend making it searchable.

If you could explain the exact use-case scenario, I would be able to help you better. So far I understand you are trying to do something with Native Change Detection.
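
For background, Filter-based queries can only see searchable columns and searchable extended attributes. Purely as a hypothetical illustration (nativeChangeFlag is an invented, promoted extended attribute, not something that exists in your environment), a searchable flag would be queried like this:

QueryOptions qo = new QueryOptions();
// "nativeChangeFlag" is a hypothetical searchable extended attribute
qo.addFilter(Filter.eq("nativeChangeFlag", "true"));
Iterator ids = context.search(Identity.class, qo, "id");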

Also, if you must stick with raw SQL:

The same streaming pattern still applies; just keep these tweaks:

import sailpoint.object.Identity;
import sailpoint.tools.Util;

Iterator rows = context.search(
        "sql:SELECT id FROM spt_identity WHERE attributes LIKE '%demoexample%'",
        null, null);

int processed = 0;

try {
    while (rows.hasNext()) {
        Object row = rows.next();          // BeanShell: no generics
        String cubeId = row.toString();    // only column selected

        Identity cube = context.getObjectById(Identity.class, cubeId);

        log.info("First name = " + cube.getFirstname());

        context.decache(cube);             // free this one object
        processed++;

        if (processed % 200 == 0) {
            context.commitTransaction();   // shorter locks, GC window
        }
    }
    context.commitTransaction();           // commit the last partial batch
} finally {
    Util.flushIterator(rows);              // always close cursor 
}

Important: call context.decache(cube), not context.decache() with no argument, otherwise you flush the whole Hibernate session every loop iteration.
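
A two-line contrast of the two forms, for clarity:

context.decache(cube);   // evicts just this one Identity from the session
context.decache();       // clears the entire Hibernate session - fine every
                         // few hundred records, wasteful on every iteration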

Cheers!!!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.