Pause Source Aggregation/Provisioning during Source Maintenance

We have been notified that a Source will be undergoing an upgrade and regression testing. The noted maintenance window is 4 days for the upgrade/testing, so this will affect the aggregation schedule and provisioning to that Source.

I am looking for options for pausing or disabling aggregation and provisioning to this Source so that the system does not sit in an error state or cause other issues.

The current process being considered is as follows (see the API sketch after this list):

  • Disable Aggregation Schedules
  • Disable the Role that triggers create provisioning (no additional roles in this situation, but there may be for other sources)
  • Uncheck all Attribute Sync items
  • Remove the source from the Disable list in the Identity Profiles
  • Remove the Provisioning flags from the source.
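
To make the first step repeatable, I am also looking at scripting it against the API. The sketch below is untested and is only my assumption of how the beta source-schedule endpoints behave; verify the paths and payloads against the current API reference (and try it in a sandbox) before using it.

```python
import requests

TENANT = "https://example.api.identitynow.com"  # hypothetical tenant API base URL
TOKEN = "..."                                    # OAuth bearer token for your API client
SOURCE_ID = "2c91808a77ff216301780327a50f1234"   # placeholder source ID

headers = {"Authorization": f"Bearer {TOKEN}"}

# Save the current aggregation schedules so they can be restored after maintenance.
# NOTE: the /beta/sources/{id}/schedules endpoints are an assumption on my part;
# confirm the exact paths and response shape in the API docs for your tenant.
resp = requests.get(f"{TENANT}/beta/sources/{SOURCE_ID}/schedules", headers=headers)
resp.raise_for_status()
saved_schedules = resp.json()
print("Saved schedules:", saved_schedules)

# Pause account aggregation by deleting its schedule; to resume after the
# maintenance window, PUT the saved payload back to the same endpoint.
resp = requests.delete(
    f"{TENANT}/beta/sources/{SOURCE_ID}/schedules/ACCOUNT_AGGREGATION",
    headers=headers,
)
resp.raise_for_status()
```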

This would cover most, if not all, of the source access we encounter, but it does have some downsides:

  • Changes to several objects
  • Creation of Manual Work Items for anything that tries to provision while the flags are off
  • Not all processes will re-trigger when turned back on (terminations, for example)

Ideally there would be an option to put a source into Maintenance Mode, which would pause the aggregations and queue the provisioning requests until the source was taken out of Maintenance Mode.

What are some other ways people have handled this type of situation?

Hi @gmilunich, will this affect all tenants, or is it only for yours?

Hey @gmilunich

I’d probably handle this situation the same way; perhaps the only difference I’d consider is leaving some of the provisioning on despite knowing it will fail. That way I can at least go back afterwards, search for anything that failed, and address it manually. But I like your idea of a maintenance mode. I’m not sure whether it’s possible to leave some tasks queued while allowing others to proceed, but it would definitely make this kind of scenario easier.

First off, I think this is a great idea: a ‘maintenance mode’ or ‘pause’ mode on a source that instantly ensures nothing happens on or to that source, while allowing your business to continue (requesting access, etc.).

What I would do (and of course try first in a sandbox environment) is remove the provisioning capability of the source in question and then turn off the aggregation schedule. That way, all provisioning still ‘happens’ but turns into manual tasks in IdentityNow. Later you can put the capability back on the source and provisioning will happen again.

This might not fully work if you have a lot of requestable access profiles/entitlements, as they do not ‘stick’ on the identity and are not reprovisioned automatically, but attribute sync and role provisioning would at least happen.
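
If you want to script that part, something like the sketch below would drop the PROVISIONING feature from the source and can be reversed later by patching the original feature list back. It is untested; the GET/PATCH on /v3/sources and the JSON Patch content type are standard, but treat the rest (the URL, IDs, and the exact feature name) as assumptions and try it in your sandbox first.

```python
import requests

TENANT = "https://example.api.identitynow.com"  # hypothetical tenant API base URL
TOKEN = "..."                                    # OAuth bearer token for your API client
SOURCE_ID = "2c91808a77ff216301780327a50f1234"   # placeholder source ID

headers = {"Authorization": f"Bearer {TOKEN}"}

# Read the source and strip PROVISIONING from its feature list.
source = requests.get(f"{TENANT}/v3/sources/{SOURCE_ID}", headers=headers).json()
original_features = source.get("features", [])
reduced_features = [f for f in original_features if f != "PROVISIONING"]

# Replace the whole features array via JSON Patch; keep original_features
# somewhere safe so you can patch it back after the maintenance window.
patch = [{"op": "replace", "path": "/features", "value": reduced_features}]
resp = requests.patch(
    f"{TENANT}/v3/sources/{SOURCE_ID}",
    headers={**headers, "Content-Type": "application/json-patch+json"},
    json=patch,
)
resp.raise_for_status()
```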

This would just be for the single tenant it is configured for.

I thought about letting them fail as well. We considered changing the URL/credentials so any operation would fail, which would give us a mechanism to find those failed tasks. I think we will take a combination of those approaches.
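
For finding the failures afterwards, I am planning to lean on the Search API. A rough, untested sketch is below; the event field names, status value, source name, and date range are assumptions on my part, so check them against what your tenant actually records in the Search UI before trusting the results.

```python
import requests

TENANT = "https://example.api.identitynow.com"  # hypothetical tenant API base URL
TOKEN = "..."                                    # OAuth bearer token for your API client
headers = {"Authorization": f"Bearer {TOKEN}"}

# Hypothetical query: failed operations against the source during the maintenance
# window. The field names, status value, source name, and dates are placeholders.
query = {
    "indices": ["events"],
    "query": {
        "query": 'status:FAILED AND target.name:"My Maintenance Source" '
                 'AND created:[2024-06-01 TO 2024-06-05]'
    },
    "sort": ["-created"],
}
resp = requests.post(f"{TENANT}/v3/search?limit=250", headers=headers, json=query)
resp.raise_for_status()
for event in resp.json():
    print(event.get("created"), event.get("technicalName"), event.get("target"))
```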

I agree too that there likely is not a way to leave tasks queued, but I wanted to mention it in case they are looking into it. I created an Idea for Maintenance Mode in the Ideas forum, since that is what the product managers look at when determining new features to work on:
https://ideas.sailpoint.com/ideas/GOV-I-3920

See my link above for the Idea I created, and please add comments/suggestions to it so we can start a dialog with them.

The goal was to reduce the Manual Work Items created, if possible. We have noted that turning off the Role will pause the create process, and when it is turned back on it will ‘catch up’ all the users who should have had the role and thus an account. Likewise, as you mentioned, we can turn attribute sync back on when the upgrade is complete and then run a manual, global attribute sync to catch those values up. It is really just the terminations we need to monitor, and I think we can do that with a query that looks for terminated users with active accounts once the system is back and has had a chance to aggregate any changes (just in case they made changes manually during their regression testing after the upgrade).
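
Something along these lines is what I have in mind for that check. It is untested, and the lifecycle state value and account fields are assumptions based on our identity profile, so adjust them to your own configuration.

```python
import requests

TENANT = "https://example.api.identitynow.com"  # hypothetical tenant API base URL
TOKEN = "..."                                    # OAuth bearer token for your API client
headers = {"Authorization": f"Bearer {TOKEN}"}

# Hypothetical query: terminated identities that still hold an enabled account on
# the source after the post-maintenance aggregation has run. Field names and the
# lifecycle state value are placeholders; adjust to your tenant.
query = {
    "indices": ["identities"],
    "query": {
        "query": 'attributes.cloudLifecycleState:terminated '
                 'AND @accounts(source.name.exact:"My Maintenance Source" AND disabled:false)'
    },
}
resp = requests.post(f"{TENANT}/v3/search?limit=250", headers=headers, json=query)
resp.raise_for_status()
for identity in resp.json():
    print(identity.get("name"), identity.get("attributes", {}).get("cloudLifecycleState"))
```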