Iterator fetching huge identity ids from Rule caused server to crash

Which IIQ version are you inquiring about?

8.3

Share all details about your problem, including any error messages you may have received.

We have the below code, run from a ‘Run Rule’ task, which caused the server it ran on to crash.

Iterator iterate = context.search("sql:  SELECT id FROM identityiq.identityiq.spt_identity si WHERE [attributes] LIKE '%NativeChangeDetection%';", null, null);
 
List tempList = IteratorUtils.toList(iterate);
log.error("Total no: of cubes with NCD : "+tempList.size());
 
iterate = tempList.iterator();
while (iterate.hasNext()) {
    Identity id = context.getObjectById(Identity.class, iterate.next());
     // remaining code
}

The SQL query (when tested directly against the DB) returned around 100k identity cubes. The task ran for a few minutes before the server crashed; during that time we saw CPU and memory utilization go very high (as seen in the admin console).
Even the log.error statement never printed, which means execution never reached that line.

What is the best way to tackle this scenario?

@aseelvn07 -

Why the server choked

Step by step in your rule, here is what really happens and what it costs:

  1. context.search("sql … LIKE '%NativeChangeDetection%'") - Hibernate executes the raw SQL and hydrates a full Identity object for every matching row, not just the id. Cost: ≈ 100k objects × dozens of columns ⇒ tens of GB on-heap.
  2. IteratorUtils.toList(iterate) - pulls the whole result set into a Java ArrayList all at once. Cost: one object wrapper, one array slot, and one Hibernate proxy per row ⇒ heap blow-up.
  3. No decache / batching - all those objects remain pinned in the session cache; GC thrashes, CPU spikes, and an OutOfMemoryError kills the JVM. Cost: crash.

The log line never printed because the JVM spent its time allocating, paging and finally dying before it reached that statement.


A pattern that will survive 100 k+ cubes

import sailpoint.api.IdIterator;
import sailpoint.object.Filter;
import sailpoint.object.Identity;
import sailpoint.object.QueryOptions;
import sailpoint.tools.Util;

QueryOptions qo = new QueryOptions();
// identical semantics, but let Hibernate build the SQL for you
qo.addFilter(Filter.like("attributes", "NativeChangeDetection"));

IdIterator it = new IdIterator(context, Identity.class, qo);   // streams only the ID column
int processed = 0;

try {
    while (it.hasNext()) {
        String id = (String) it.next();        // just a GUID, negligible memory
        Identity cube = context.getObjectById(Identity.class, id);

        // …your business logic here…

        context.decache(cube);                 // keep session small
        if (++processed % 500 == 0) {          // batch-size tune as needed
            context.commitTransaction();       // lets other threads run, releases locks
        }
    }
    context.commitTransaction();           // commit the final partial batch
} finally {
    Util.flushIterator(it);                    // closes DB cursor 
}

Key changes

  1. Stream, don’t collect
    IdIterator (or its convenience wrapper IncrementalObjectIterator) fetches only the primary-key column, automatically decaches every 100 objects, and lets you commit/rollback inside the loop without losing your place (see “TaskResult Monitor UpdateProgress seems to save in memory objects?”, reply #10 by jeff_larson). A sketch of the wrapper follows this item.
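
A minimal sketch of the IncrementalObjectIterator variant, which pages full objects for you so you can skip the explicit getObjectById call (assuming the same QueryOptions qo as above; the exact paging/decache behaviour is version-dependent, so verify against your IIQ javadocs):

import sailpoint.api.IncrementalObjectIterator;
import sailpoint.object.Identity;

IncrementalObjectIterator cubes = new IncrementalObjectIterator(context, Identity.class, qo);
while (cubes.hasNext()) {
    Identity cube = (Identity) cubes.next();   // full object, fetched in pages
    // ...your business logic here...
    context.decache(cube);                     // still keep the session small
}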
  2. Projection, not hydration
    If you prefer context.search, call the overload that lists only the columns you need:

Iterator<Object[]> iter = context.search(Identity.class, qo, "id");

This avoids materialising full Identity objects until you explicitly call getObjectById (see “Context.search: Care and feeding of iterators” on Compass). A sketch of consuming the rows follows this item.
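
For completeness, a minimal sketch of consuming that projection iterator (each row comes back as an Object[], even with a single selected column):

while (iter.hasNext()) {
    Object[] row = (Object[]) iter.next();
    String id = (String) row[0];               // the projected "id" column
    // fetch the full object only when you actually need it
    Identity cube = context.getObjectById(Identity.class, id);
    context.decache(cube);
}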
  3. Decache aggressively
    context.decache(obj) (or a periodic full context.decache() every N records) prevents the session cache from ballooning.
  4. Batch commits
    A small commitTransaction() every few hundred objects shortens database locks and gives the GC a chance to clean up.
  5. Flush the iterator
    Always call Util.flushIterator (or use the try/finally above) so JDBC cursors are closed even on exceptions (see “Context.search: Care and feeding of iterators” on Compass).


Optional refinements

Technique and when it helps:

  - qo.setCloneResults(true) - when you must call commitTransaction() inside the loop but are stuck with a standard iterator.
  - Partition the workload (multiple “Run Rule” tasks, each filtering on an id range or shard key) - for clustered deployments where you want predictable runtimes and lower per-node memory; a sketch follows this list.
  - Create a database index on the attributes column (or split the flag into its own column) - when the LIKE '%NativeChangeDetection%' scan itself is the bottleneck.
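
As a rough illustration of the partitioning idea (there is no built-in partitioning for ‘Run Rule’ tasks, so this is a manual split): give each rule/task instance a non-overlapping filter, for example on the first letter of the identity name. The boundaries here are arbitrary assumptions to tune:

// Instance 1 of the rule: names a-m (hypothetical shard boundaries)
QueryOptions qo = new QueryOptions();
qo.addFilter(Filter.and(
        Filter.ge("name", "a"),
        Filter.lt("name", "n")));

// A second instance would use Filter.ge("name", "n") instead, so the
// two tasks cover disjoint ranges and can run in parallel on different nodes.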

TL;DR

Load only what you need, process in a streaming loop, decache, flush, and batch-commit.
Following the pattern above, 100k, 1M, or even more cubes can be processed without exhausting the heap or hanging the server.

Cheers!!!


@sukanta_biswas
Thanks for providing a very detailed analysis and solution for this. Appreciate it!

If possible, could you please elaborate on the “Partition the workload” refinement you mentioned?

I’m aware of partitioning tasks like the Identity Refresh task, Aggregation task, Certification Staging, etc. using the RequestDefinition object configuration or by setting it directly on those tasks.

But I don’t see any option to partition a task of ‘Run Rule’ type. Could you please help with this?

@sukanta_biswas

There is one other issue:
qo.addFilter(Filter.like("attributes", "NativeChangeDetection"));

This filter is not returning any identities.

@aseelvn07 -

The filter is not working because the attributes column is not searchable. Also, I would not recommend making it searchable.

If you could explain the exact use-case scenario, I would be able to help you better. So far I understand you are trying to do something with Native Change Detection.
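
For background, Filter-based queries can only see searchable columns and searchable extended attributes. Purely as a hypothetical illustration (nativeChangeFlag is an invented, promoted extended attribute, not something that exists in your environment), a searchable flag would be queried like this:

QueryOptions qo = new QueryOptions();
// "nativeChangeFlag" is a hypothetical searchable extended attribute
qo.addFilter(Filter.eq("nativeChangeFlag", "true"));
Iterator ids = context.search(Identity.class, qo, "id");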

Also, if you must stick with raw SQL:

The same streaming pattern still applies; just keep these tweaks:

import sailpoint.object.Identity;
import sailpoint.tools.Util;

Iterator rows = context.search(
        "sql:SELECT id FROM spt_identity WHERE attributes LIKE '%demoexample%'",
        null, null);

int processed = 0;

try {
    while (rows.hasNext()) {
        Object row = rows.next();          // BeanShell: no generics
        String cubeId = row.toString();    // only column selected

        Identity cube = context.getObjectById(Identity.class, cubeId);

        log.info("First name = " + cube.getFirstname());

        context.decache(cube);             // free this one object
        processed++;

        if (processed % 200 == 0) {
            context.commitTransaction();   // shorter locks, GC window
        }
    }
    context.commitTransaction();           // commit the last partial batch
} finally {
    Util.flushIterator(rows);              // always close cursor 
}

Important: call context.decache(cube), not context.decache() with no argument, otherwise you flush the whole Hibernate session every loop iteration.
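
A two-line contrast of the two forms, for clarity:

context.decache(cube);   // evicts just this one Identity from the session
context.decache();       // clears the entire Hibernate session - fine every
                         // few hundred records, wasteful on every iteration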

Cheers!!!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.