How to Retry Workflows in Identity Security Cloud

colin_mckibben · October 17, 2024, 1:59pm

Description

Workflows may fail for a number of reasons. Your workflow logic may have been flawed and resulted in an unforeseen error during an execution, a downstream API that you are invoking may have been unavailable, or the workflow service itself may have had an outage. Actions that would have been automated by the failed workflow execution now require you to find and remediate them manually… or do they? Join us as SailPoint Developer Advocate, Colin McKibben, demonstrates how to retry failed workflows.

Resources

The test workflow endpoint will only work if the workflow is disabled. Attempting to test an enabled workflow will result in an error. Either disable your workflow before invoking the test endpoint, or duplicate the workflow in a disabled state for the purpose of retrying executions.
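As a rough illustration of the note above, here is a minimal Python sketch that refuses to build a test request for an enabled workflow. The `/v3/workflows/{id}/test` path, the `{"input": ...}` body shape, and the `enabled` and `id` field names are assumptions based on the public API docs, and `build_test_request` is a hypothetical helper, not a SailPoint function.

```python
import json
import urllib.request


def build_test_request(base_url, workflow, trigger_input, token):
    """Build (but do not send) a test request for a disabled workflow.

    `workflow` is assumed to be the workflow object as a dict with
    `id` and `enabled` keys (assumed field names).
    """
    if workflow.get("enabled"):
        # The test endpoint rejects enabled workflows, so fail early.
        raise ValueError("workflow is enabled; disable it or use a disabled duplicate")
    url = f"{base_url}/v3/workflows/{workflow['id']}/test"  # assumed path
    body = json.dumps({"input": trigger_input}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, since it would execute every step of the workflow for real.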

angelo_mekenkamp · February 28, 2025, 4:01pm

This is not supported anymore, is it?

I sadly got this error when trying to test a workflow that was enabled.

{
  "detailCode":"Internal Server Error",
  "trackingId":"7023be1ca6ab48a7a5da558a34840124",
  "messages":[
    {
      "locale":"en-US",
      "localeOrigin":"DEFAULT",
      "text":"workflow is enabled but executed as a test workflow"
    }
  ]
}
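For anyone scripting retries, a small Python sketch like the following could tell this specific rejection apart from other server errors. It relies only on the response body shown above. `is_enabled_test_error` is a hypothetical helper name, and matching on the message text is an assumption that may break if the wording changes.

```python
import json

# Message text taken from the error response quoted above.
TEST_ON_ENABLED_TEXT = "workflow is enabled but executed as a test workflow"


def is_enabled_test_error(response_body):
    """Return True if the body is the 'enabled workflow run as test' error."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    messages = payload.get("messages") or []
    return any(
        isinstance(m, dict) and TEST_ON_ENABLED_TEXT in str(m.get("text", ""))
        for m in messages
    )
```

A retry script could use this check to fall back to a disabled duplicate instead of treating the response as a generic outage.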

This makes working with workflows more cumbersome.

I think there should be an out-of-the-box way to retry failed workflows, but also an easier way to execute them. It seems it is not even possible anymore for an ORG_ADMIN to execute an external workflow?

@tburt, is this a recent change? That it now starts giving errors when you attempt to retry a failed workflow execution? Has this change been announced, as it is breaking our process?

We currently have the following scenario:

Workflow was working
Workflow owner lost admin access
Workflow failed performing the get identity action, because apparently it performs this action on behalf of the workflow owner, even though no client id and secret had to be given.
We briefly disabled the workflow, because it is not possible to change ownership of enabled workflows. Then changed the owner to someone with admin access and quickly enabled the workflow again.
Now we want to retry the failed workflows without disabling the workflow, as it is still being used for current workflow executions, but it does not seem to be possible.
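The disable, change owner, re-enable sequence described above could be scripted roughly as follows. This is only a sketch: it assumes the workflow can be updated with JSON Patch documents on `/enabled` and `/owner`, and that the owner is an object with `type` and `id`. Both are assumptions, and `ownership_change_steps` is a hypothetical helper that only builds the payloads.

```python
def ownership_change_steps(new_owner_id):
    """Return the ordered JSON Patch documents for an ownership change.

    Each list entry is meant to be sent as its own PATCH request,
    in order: disable, change owner, re-enable.
    """
    return [
        # 1. Ownership cannot be changed while the workflow is enabled.
        [{"op": "replace", "path": "/enabled", "value": False}],
        # 2. Assign an owner that still has admin access (assumed owner shape).
        [{"op": "replace", "path": "/owner",
          "value": {"type": "IDENTITY", "id": new_owner_id}}],
        # 3. Re-enable quickly so that new trigger events are not missed.
        [{"op": "replace", "path": "/enabled", "value": True}],
    ]
```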

Kind regards,
Angelo

tburt · March 4, 2025, 10:37pm

Hello @angelo_mekenkamp

There was a bug fix applied that closes this loophole in how the API works. In the past, you could use the Test Workflow by ID API (test-workflow | SailPoint Developer Community) against Enabled workflows.

You can duplicate the existing workflow and run a “Test Workflow by ID” using the same API, but we should not allow workflows that are Enabled to be run using the TEST API.

angelo_mekenkamp · March 5, 2025, 8:02am

Hi @tburt, thank you for your response!

If this is truly a bug, why is SailPoint recommending using it in this way in the video @colin_mckibben shared here?

Being able to manually call an enabled workflow is used for:

Workflows that should be called both on a specific trigger and on an ad hoc basis.
Workflows that use recursion and should be able to call themselves. For example to deal with pagination.
Workflows that failed due to external factors (can’t disable account as source is disconnected) and should be retried.
Org Admins that want to execute a workflow with ad hoc trigger (external trigger) through a script or postman, using the default Personal Access Token they use from there. After all, if they are org admin, they can execute the steps in the workflow in the workaround you suggest anyway, so might as well allow them to call the workflow itself properly while it is in enabled state.

Your “bugfix” is breaking all this already built and used functionality.

Besides breaking this functionality, I also don’t agree with the statement that this change closes the loophole in how the API works. It is not a loophole to begin with, as we actually need this. But even if it were, you are literally describing how to use a loophole to circumvent the closure of this “loophole”. So the “issue” is still not solved. Those who could do this before can still do this, just with more tedious steps.

Besides requiring more actions, another downside of this workaround for the “loophole” is that the workflow execution is now recorded in a different location. And if the workflow uses recursion and calls itself, the duplicate will not recurse into itself, as it still references the original workflow. As a consequence, different iterations of the workflow will have their executions in different locations as well, adding more chaos to the process.
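To illustrate the recursion problem: after duplicating, every self-reference in the copy would have to be rewritten to point at the copy's own ID. A minimal Python sketch, assuming the definition is available as nested dicts and lists in which the original ID appears as a string value (the exact definition shape is an assumption, and `retarget_self_references` is a hypothetical helper):

```python
def retarget_self_references(definition, original_id, duplicate_id):
    """Return a copy of `definition` with the original ID replaced by the duplicate's ID."""
    if isinstance(definition, dict):
        return {
            key: retarget_self_references(value, original_id, duplicate_id)
            for key, value in definition.items()
        }
    if isinstance(definition, list):
        return [
            retarget_self_references(item, original_id, duplicate_id)
            for item in definition
        ]
    if isinstance(definition, str):
        # Covers both bare IDs and IDs embedded in a URL.
        return definition.replace(original_id, duplicate_id)
    return definition
```

Even with such a rewrite, the executions of the duplicate still end up under a different workflow than those of the original.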

Besides this, we would then have to have twice as many workflows: one in enabled mode and one in disabled mode. Disabled would usually suggest that a workflow is not being used, but due to your suggestion, we actually have to use these workflows in disabled mode in production for it to work at all.

I strongly request to revisit this strategy.

Kind regards,
Angelo

atarodia · March 5, 2025, 9:28am

Hi @angelo_mekenkamp,

Thank you for raising this concern.

I completely agree with your points. We were also leveraging the “Test Workflow by ID” API for more than just retrying failed workflows. One key use case for us was triggering a workflow from another workflow, independent of its enabled state.

With this change, we now have to maintain duplicate workflows performing the exact same actions just to work around this limitation. This not only adds unnecessary complexity but also makes it harder to explain to customers why we need multiple versions of the same workflow for proper execution.

I strongly support revisiting this decision.

Regards,
Animesh

tburt · March 5, 2025, 2:03pm

@angelo_mekenkamp The developer community provides tools and tips that go beyond what is officially documented and, in some cases, supported. As an example, the use of Request Response triggers in Workflows is not officially supported, but there are articles on how you can implement them in this community.

For this specific issue, there were negative outcomes that could be achieved using the Test API against “Enabled” workflows that were never intended. As an interesting side note, there is a support ticket that your company opened that pointed us to this issue and caused us to lock this down to only “Disabled Workflows” (the original intent of the API).

I will defer to @colin_mckibben for a more detailed explanation of Dev Community.

angelo_mekenkamp · March 5, 2025, 2:42pm

Can you please send me the reference of this support ticket, @tburt, so I can investigate this? I am interested in these negative outcomes you mentioned.

You can see, also based on the comment from @atarodia, that I am not alone with the request to be able to call workflows outside of the configured trigger.

If anything I would suggest to refrain from calling this endpoint ‘test’, because after all, it will perform all the operations anyway, like sending emails to production identities, regardless of if you call it ‘test’ or not.

tburt · March 5, 2025, 5:29pm

@angelo_mekenkamp I do not have the support ticket but this came out of the conversation we had November 1, 2024. You requested the conversation on that date using the ‘Provide Feedback’ link in-product.

angelo_mekenkamp · March 5, 2025, 6:48pm

Ahh, you are referring to that conversation.

Yes, there I was referring to the security vulnerability I found, where you could take a workflow with an external trigger, generate a client ID and secret from this workflow, and then use these credentials to trigger other workflows using the test API. That was a security issue, since you want to be able to pass these credentials to a third party that can then use them only to call that specific workflow; instead, they could run all workflows.

It still makes sense to me that org admins can use PATs to trigger workflows (both those with and without an external trigger).
I believe I also mentioned that it would make sense for this test endpoint to not be called a ‘test’ endpoint, as it still performs all the steps anyway, unlike test email template, which only sends the email to you.

But preventing manual executions of enabled workflows was not part of my suggestions.
Preventing those who can already create loopholes for executing workflows from directly executing these workflows doesn’t sound like solving a bug to me.

But regardless of what was discussed during that call, I hope you understand the points raised here by @atarodia and me, and see merit in having a way that allows us to retry failed workflows, perform the other operations suggested above, and let workflows recursively call themselves irrespective of trigger.

And given that this functionality was being used this way, a timely announcement would have been preferred as well.

Please don’t get me wrong here. I believe we have the shared goal of having workflows that allow customers to build processes that are user friendly, efficient and secure. And my feedback is given with that goal in mind.

One last thing I noticed: if a workflow executes another workflow in the regular way while that child workflow is disabled, the parent workflow will not get a 400 error, but instead a 200 message saying it cannot perform the requested operation as the workflow is disabled. Since no error code is returned, the parent workflow will not follow its error path, and it appears to have ended successfully even though it was not able to do what it was supposed to do.
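Until that is addressed, a caller that triggers the child workflow from a script could inspect the body as well as the status code. A minimal Python sketch, assuming the 200 response mentions the word "disabled" somewhere in its text (the exact message is not shown in this thread, so that match and the `child_call_failed` helper are hypothetical):

```python
def child_call_failed(status_code, body_text):
    """Return True if a child workflow call should be treated as a failure."""
    if status_code >= 400:
        return True
    # Assumed marker: the 200 response is described as saying the
    # operation cannot be performed because the workflow is disabled.
    return "disabled" in (body_text or "").lower()
```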

Kind regards,
Angelo

adunker · March 7, 2025, 3:26pm

I completely agree with @angelo_mekenkamp that taking away the functional use case is tough, as it is something we had built operational procedures around. For example, there was a SailPoint outage today that affected our workflow executions (the 503s) and now we need to re-execute workflows, but we are unable to use our previous procedure of calling the /test endpoint.