Perform Maintenance task is progressing very slowly

Which IIQ version are you inquiring about?

Version 8.3

Share all details related to your problem, including any error messages you may have received.

Hi Sailors,
We are seeing a problem in our production environment and would appreciate input from the community.

The Perform Maintenance task, with only the options below enabled, runs for 7-8 hours to process around 3000 workflow events. Is there any way to speed up the processing of these events?
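To put the reported numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it takes 7.5 hours as the midpoint of the reported 7-8 hours (an assumption) and treats all 4 threads as busy the whole time, which gives an upper bound on thread time per event.

```python
# Back-of-the-envelope throughput from the figures in the post.
EVENTS = 3000      # workflow events in the queue (from the post)
RUN_HOURS = 7.5    # midpoint of the reported 7-8 hours (assumption)
THREADS = 4        # background workflow threads (from the post)


def throughput(events, hours, threads):
    """Return (events per hour, wall-clock seconds per event, thread-seconds per event)."""
    per_hour = events / hours
    wall_seconds = hours * 3600 / events
    # Upper bound: assumes every thread stays busy for the whole run.
    thread_seconds = wall_seconds * threads
    return per_hour, wall_seconds, thread_seconds


per_hour, wall_s, thread_s = throughput(EVENTS, RUN_HOURS, THREADS)
print(f"{per_hour:.0f} events/hour, {wall_s:.1f} s per event, "
      f"up to {thread_s:.1f} thread-seconds per event")
```

If individual events really take tens of seconds of thread time, that points at something per-event being slow (large workflow cases, locks, or database waits) rather than at the thread count alone.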

Impact: Requesters are getting access very late (it takes almost 10-12 hours for the approval process to kick in, and provisioning then takes another few hours).

Perform Maintenance task with only the options below enabled (scheduled to run every 5 minutes):
Process background workflow events: enabled
Number of background workflow threads: 4 (increased the count to 8, but the processing speed is still the same)
Partitioning: enabled

Server specs: 6 task servers with a good amount of CPU and RAM

Other information

  1. Application account and group aggregations plus identity refresh run for almost 12-13 hours every day.
  2. Via the iiq console we can see that a small number of WorkflowCases and WorkItems are getting locked (we unlocked them using the iiq console unlock command).
  3. Ran the DB performance tests as suggested in the IdentityIQ KB article (stats below) - https://community.sailpoint.com/t5/Other-Documents/IdentityIQ-Database-Performance-Tests/ta-p/78060
    Meter IIQDB-Test-DataSet-1k-Item: 1000 calls, 20887 milliseconds, 7 minimum, 118 maximum, 20 average, top five [118,68,66,62,59]
    Meter IIQDB-Test-DataSet-4k-Item: 1000 calls, 17841 milliseconds, 10 minimum, 86 maximum, 17 average, top five [86,78,56,51,47]
    Meter IIQDB-Test-DataSet-8k-Item: 1000 calls, 20312 milliseconds, 13 minimum, 56 maximum, 20 average, top five [56,55,50,37,37]
  4. Access request workload: 1500 per day, made up of role/entitlement requests.
  5. We can't enable logging and are not able to reproduce the load in other environments.
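As a cross-check on the meter output in item 3, the lines can be parsed and the mean latency recomputed from total milliseconds and call count. This is a minimal sketch that assumes the line format shown in the post; the regular expression and the `parse_meters` helper are illustrative, not part of IdentityIQ.

```python
import re

# Meter lines copied from the post above.
METER_LINES = """\
Meter IIQDB-Test-DataSet-1k-Item: 1000 calls, 20887 milliseconds, 7 minimum, 118 maximum, 20 average, top five [118,68,66,62,59]
Meter IIQDB-Test-DataSet-4k-Item: 1000 calls, 17841 milliseconds, 10 minimum, 86 maximum, 17 average, top five [86,78,56,51,47]
Meter IIQDB-Test-DataSet-8k-Item: 1000 calls, 20312 milliseconds, 13 minimum, 56 maximum, 20 average, top five [56,55,50,37,37]
"""

# Assumed line format, taken from the sample output only.
PATTERN = re.compile(
    r"Meter (?P<name>\S+): (?P<calls>\d+) calls, (?P<total>\d+) milliseconds, "
    r"(?P<min>\d+) minimum, (?P<max>\d+) maximum, (?P<avg>\d+) average"
)


def parse_meters(text):
    """Return one dict per meter line with the mean recomputed from total/calls."""
    rows = []
    for m in PATTERN.finditer(text):
        calls, total = int(m["calls"]), int(m["total"])
        rows.append({
            "name": m["name"],
            "calls": calls,
            "total_ms": total,
            "max_ms": int(m["max"]),
            "reported_avg_ms": int(m["avg"]),
            "mean_ms": total / calls,
        })
    return rows


for row in parse_meters(METER_LINES):
    print(f'{row["name"]}: {row["mean_ms"]:.1f} ms mean, {row["max_ms"]} ms max')
```

The recomputed means (about 20.9, 17.8 and 20.3 ms) agree with the reported whole-number averages, so the figures are internally consistent. Whether they are acceptable should be judged against the thresholds in the linked KB article.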

Please share your suggestions on what the next steps should be.

Vinod, quick questions:

How many servers do you have for processing?
How many threads do you have available to you?
Are you running things in Foreground or Background?

Best!

Hi @ipobeidi,
Thanks for your reply.

How many servers do you have for processing? - 6 task servers with 8 CPUs and 32 GB RAM
How many threads do you have available to you? -
Aggregate Partition: max threads - 8
Identity Refresh Partition: max threads - 8
Role Propagation Partition: max threads - 1
WorkItem Maintenance Request: max threads - 1
Certification Builder: max threads - 4
Certification Maintenance Request: max threads - 1
Are you running things in Foreground or Background? - Background

Let me know if you need any further details.

By any chance, are you running into thread contention? That is, all the threads are busy running aggregations and nothing is left for Perform Maintenance.

Perform Maintenance is processing, but it is very slow. We have the number of background workflow threads set to 4 to process events.
I don’t think there is any thread contention.
How can we find out whether there is?

Open the requests monitor and see whether you have requests hanging in there.
Count how many you have and compare that to the number of threads.

best
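The comparison suggested above can be sketched as follows. The max-thread values are the ones listed earlier in the thread. The pending counts are hypothetical placeholders to be replaced with what the requests monitor shows, and the sketch assumes the max-thread setting applies per server, so capacity is taken as max threads times the number of task servers.

```python
# Max threads per request type, as listed earlier in the thread.
MAX_THREADS = {
    "Aggregate Partition": 8,
    "Identity Refresh Partition": 8,
    "Role Propagation Partition": 1,
    "WorkItem Maintenance Request": 1,
    "Certification Builder": 4,
    "Certification Maintenance Request": 1,
}
SERVERS = 6  # task servers, from the thread


def find_backlogs(pending, max_threads, servers):
    """Return {request type: excess} where pending requests exceed thread capacity.

    Assumption: the max-thread setting applies per server, so capacity is
    max_threads * servers. Adjust if your configuration differs.
    """
    backlogs = {}
    for name, count in pending.items():
        capacity = max_threads.get(name, 0) * servers
        if count > capacity:
            backlogs[name] = count - capacity
    return backlogs


# Hypothetical counts read off the requests monitor; replace with real numbers.
pending_example = {"Aggregate Partition": 60, "Identity Refresh Partition": 20}
print(find_backlogs(pending_example, MAX_THREADS, SERVERS))
```

Any request type that shows up in the result has more requests waiting than threads to run them, which would be a sign of contention for that type.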

We have increased the background workflow threads from 4 to 7:
Number of background workflow threads: 4 to 7
Finisher threads: also from 4 to 7

Workflow thread timeout (seconds): 600
We added the above parameter because we saw that some workflow cases take a long time to process, as they are larger than 5 MB.

Partitioning enabled: yes

With the above options, the queue was cleared.

We needed the combination of the options above to resolve the issue.

One more thing I noticed is that the following 3 object types were getting locked:

WorkflowCase
WorkItem
Identity

When I run the listLocks command via the iiq console for each of the above object types, I see that some of them are locked.

What could be the reason for these objects getting locked?

We added a timeout value to the Perform Maintenance task. This way, the larger access-request workflows get interrupted, so Perform Maintenance completes at a faster pace.
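The effect described here can be illustrated with a small simulation. This is a toy model, not IdentityIQ's actual scheduler: a fixed pool of threads pulls cases in order, and a case that would run longer than the timeout is cut off at the timeout. The workload figures (10 very large cases of 2 hours, 2990 small cases of 5 seconds) are hypothetical; only the 7 threads and the 600-second timeout come from the thread.

```python
import heapq


def makespan(durations, threads, timeout=None):
    """Simulate a fixed pool of worker threads pulling cases in order.

    Each case occupies a thread for its duration, or for `timeout` seconds
    if a timeout is set and the case would run longer (it is interrupted).
    Returns (total wall-clock seconds, number of interrupted cases).
    """
    free_at = [0.0] * threads  # min-heap of times at which each thread is free
    heapq.heapify(free_at)
    interrupted = 0
    finish = 0.0
    for d in durations:
        if timeout is not None and d > timeout:
            d = timeout
            interrupted += 1
        start = heapq.heappop(free_at)  # earliest available thread
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish, interrupted


# Hypothetical workload: 10 very large cases (2 h each) ahead of 2990 small ones (5 s each).
cases = [7200.0] * 10 + [5.0] * 2990

for label, limit in (("no timeout", None), ("600 s timeout", 600.0)):
    total, cut = makespan(cases, threads=7, timeout=limit)
    print(f"{label}: {total / 3600:.2f} h, {cut} cases interrupted")
```

In this toy model the timeout shortens the run considerably because the large cases no longer hold threads for hours. Note that the sketch only counts interrupted cases; it does not model what happens to them afterwards, so they would still need to be followed up.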