429 Errors on Account Aggregations: Fixing with Throttle and Retries (Web Services - BlackLine)

Which IIQ version are you inquiring about?

8.5p1

IAM wizards, I beseech thee!

Request:
I’m curious how to set up Account Aggregation retries when receiving a 429 error, and how to throttle API requests appropriately to avoid it.

I’m hoping to:

1. get some eyes on our implementation and critique it if needed

2. save someone else a headache if they have a similar issue

Context:

We currently run an Account Aggregation for the BlackLine (finance) application, which has a default global API rate limit of 1000 requests/minute (see here). Our aggregation pulls about 760 users, with a parent call to /v1/users and then 3 child calls per user (/teams, /entities, and /roles-products).

That’s a lot of calls very quickly, so we’ve changed a few things in the hope of fixing the 429 errors we’d occasionally see. Here’s what we did:

  1. Added a throttle to each call using a Before-Rule. The following BeanShell code makes the thread sleep 650 ms before each call (parent call included):
<Source>
  <![CDATA[
  import java.util.concurrent.TimeUnit;
  
  // Throttle all calls
  try {
    TimeUnit.MILLISECONDS.sleep(650); // 300ms and 400ms still saw occasional 429 errors...
  } catch (InterruptedException e) {
    log.warn("Throttle interrupted", e);
    Thread.currentThread().interrupt();
  }
  ]]>
</Source>
  2. Added this Before-Rule to each call (4 total) within the application object.
  3. Added retry logic to the BlackLine app object based on other topics found in the dev community (here and here). We added these entries to our app object:
<entry key="retryCount" value="5"/>
<entry key="retryWait" value="60000"/>
<entry key="aggregationRetryErrors">
  <value>
    <List>
      <String>429</String>
      <String>Too Many Requests</String>
      <String>Rate limit quota violation</String>
      <String>UsersAndDevices-V1_Default</String>
    </List>
  </value>
</entry>
<entry key="maxRetryCount" value="5"/>
<entry key="retryWaitTime" value="60000"/>
<entry key="retryableErrors">
  <value>
    <List>
      <String>429</String>
      <String>Too Many Requests</String>
      <String>Rate limit quota violation</String>
      <String>UsersAndDevices-V1_Default</String>
    </List>
  </value>
</entry>
  4. In the account aggregation task, we set sequential="true".
  5. Set the page size for /v1/users to 300 instead of the default 25.
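
A fixed sleep before every call adds the full 650 ms on top of however long the request itself takes. One possible alternative is to enforce a minimum interval between call start times and sleep only for the remainder. The sketch below is plain Java, and `CallPacer` is a hypothetical helper, not part of IdentityIQ or the Web Services connector; where the instance would live between rule invocations is left open.

```java
// Hypothetical helper: spaces call start times at least minIntervalMs apart.
final class CallPacer {
    private final long minIntervalMs;
    private long lastCallMs;
    private boolean hasCalled = false;

    CallPacer(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns the wait (ms) needed before a call requested at nowMs and records its start time. */
    synchronized long reserve(long nowMs) {
        long wait = 0L;
        if (hasCalled) {
            long earliest = lastCallMs + minIntervalMs;
            wait = Math.max(0L, earliest - nowMs);
        }
        lastCallMs = nowMs + wait; // the call starts once the wait is over
        hasCalled = true;
        return wait;
    }

    /** Blocks the current thread for the reserved delay, if any. */
    void pace() throws InterruptedException {
        long wait = reserve(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

With a 650 ms interval, a call requested 200 ms after the previous one started waits 450 ms, while a call requested 2 seconds later does not wait at all.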

With these changes, we are seeing a relatively high aggregation success rate, but I want to make sure this is the best solution here.

The average time after these changes is about 25-30 minutes for our 760 users, but success rate is higher (about 90%+).

The average time before these changes was about 7 minutes, but success rate was anywhere between 50% - 80%.

All recorded Fails and Cancels for these figures were due to 429 Errors.
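
The timings above line up with a simple call count. The sketch below assumes exactly 760 users, one request per page of /v1/users, and 3 child calls per user; the class and method names are made up for illustration.

```java
// Back-of-the-envelope call budget for the aggregation described above.
final class AggregationCallBudget {
    /** Page requests to the parent endpoint plus child calls for every user. */
    static long totalCalls(int users, int pageSize, int childCallsPerUser) {
        long pages = (users + pageSize - 1) / pageSize; // ceiling division
        return pages + (long) users * childCallsPerUser;
    }

    /** Minutes spent sleeping if every call is preceded by a fixed sleep. */
    static double sleepMinutes(long calls, long sleepMs) {
        return calls * sleepMs / 60_000.0;
    }

    public static void main(String[] args) {
        long calls = totalCalls(760, 300, 3);
        System.out.println("calls: " + calls);
        System.out.println("minutes sleeping at 650 ms: " + sleepMinutes(calls, 650));
    }
}
```

With a page size of 300 this gives 2283 calls, and 2283 sleeps of 650 ms add up to roughly 24.7 minutes, which is consistent with the observed 25-30 minute runs. At 1000 requests/minute, the same 2283 calls could not finish in much under 2.3 minutes regardless of tuning.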

When perusing the Web Services Connector docs, there is a setting for provisioning retries, but not for account aggregation retries… so I’m going off of the dev topics above, hoping that they’re working as intended?
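
Whether the connector honors these entries during aggregation is the open question. For reference, the generic pattern they describe (retry up to N times, wait a fixed time, only for errors whose message matches a list) can be sketched as follows. This is plain Java with hypothetical names and is not the connector's implementation.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Generic retry-on-matching-error sketch; not taken from the connector.
final class RetryOn429 {
    /** True if the error message contains any of the configured patterns. */
    static boolean isRetryable(String message, List<String> patterns) {
        if (message == null) {
            return false;
        }
        for (String pattern : patterns) {
            if (message.contains(pattern)) {
                return true;
            }
        }
        return false;
    }

    /** Runs op, retrying up to maxRetries times with a fixed wait between attempts. */
    static <T> T call(Callable<T> op, List<String> patterns, int maxRetries, long waitMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Give up on non-matching errors or once the retries are used up.
                if (attempt >= maxRetries || !isRetryable(e.getMessage(), patterns)) {
                    throw e;
                }
                Thread.sleep(waitMs);
            }
        }
    }
}
```

With the values from the app object above, this would correspond to `maxRetries = 5` and `waitMs = 60000`, so a call that keeps hitting 429 would be abandoned after about five minutes of waiting.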

I do not think there is much we can suggest apart from what you have already done.
The only other thing to try is to figure out whether you can avoid the child API calls somehow. For example, some applications let you add an additional attribute to the API call so that everything is returned in a single call rather than through multiple calls.

You are on the right track; if the increased aggregation time is acceptable and the errors have stopped, I would leave it at that.

Otherwise, as @pradeep1602 suggested, check whether there is an endpoint that returns the entitlements and roles for all users at once instead of per user. You can invoke such an endpoint in a pre-iterate rule and keep the result ready as a lookup map, which you can then use for each user instead of making an extra call to get that information.
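
The lookup-map idea could look roughly like this. It is a plain Java sketch under assumptions: the bulk response has already been parsed into rows, and the field names `userId` and `teamName` are hypothetical, not taken from the BlackLine API. BlackLine rules written in BeanShell would likely need the generics and the lambda removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Builds a per-user lookup from rows fetched once, before iterating accounts.
final class TeamLookup {
    /** Groups bulk team-membership rows by user id. */
    static Map<String, List<String>> byUser(List<Map<String, String>> rows) {
        Map<String, List<String>> lookup = new HashMap<>();
        for (Map<String, String> row : rows) {
            String userId = row.get("userId");   // hypothetical field name
            String team = row.get("teamName");   // hypothetical field name
            if (userId == null || team == null) {
                continue; // skip incomplete rows
            }
            lookup.computeIfAbsent(userId, k -> new ArrayList<>()).add(team);
        }
        return lookup;
    }
}
```

During aggregation, each account would then read its teams from the map with a single `get` on the user id, replacing one child call per user with a handful of paged calls up front.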

Thank you @pradeep1602 and @sanjaysutarc. Looking into the BlackLine API doc, it looks like there is a pagination-based call for users and for teams, but entities and roles-products still need to be called per id (user).

Paginating the teams call would definitely lower the number of API calls needed; do you know if this logic would fit better in the aggregation’s Before-Rule or in the Operation for the API call?

@noahseward Please check whether this post helps; it is also related to the BlackLine rate limit issue: Troubleshooting 429 Error: Rate Limit Quota Violation in SailPoint IIQ Webservice Application - #12 by neel193

Hi @noahseward

I personally worked on a BlackLine - IIQ integration in one of my projects, and this is what we did to resolve the issue -

  • Enable the Custom Authentication operation in the application - this handles your token generation logic
  • Use the same token in the Account Aggregation operation by passing a reference from the Custom Authentication operation
  • Configure paging in Account Aggregation so that pageStartIndex increases on every page. The BlackLine APIs return 25 users per page by default, so one page of aggregation gives you the data for 25 users.
  • Pass the same access token from the Account Aggregation AfterOperation rule to the Customization Rule by adding the token to the resource object for every user.
  • In the Customization Rule, call the additional APIs for teams, entities and roles using the same token. So 1 token generated before aggregation lets you pull in the data for 25 users along with their teams, entities and roles.
  • In the same rule, update aggCount in the state variable like this -
Integer aggCount = state.get("aggCount");
if (aggCount != null) 
{
	// Increment the existing count
	aggCount += 3;
	state.put("aggCount", aggCount);
	if (aggCount > 800) 
	{
		// Sleep for 90000 milliseconds
		Thread.sleep(90000);
		state.put("aggCount", 1);
	}
}
else 
{
	// Initialize the count
	state.put("aggCount", 1);
}
  • The above code makes the aggregation pause for 1.5 minutes once the count exceeds 800, after which the process resumes
  • Similarly, the next batch of 25 users and their data is retrieved, and the process continues until the last user
  • This way we limited the number of calls made to generate access tokens and also solved the 429 error by pausing the aggregation
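
A possible variation on the counter above is to track when the current one-minute window started and pause only for what is left of it, rather than always sleeping 90 seconds. The sketch below is plain Java with a hypothetical `WindowBudget` class. It assumes a fixed window, which may not match how BlackLine actually measures its quota, so a safety margin such as the 800 used above is still sensible.

```java
// Hypothetical fixed-window call budget; returns how long the caller should pause.
final class WindowBudget {
    private final int limit;      // calls allowed per window
    private final long windowMs;  // window length in milliseconds
    private long windowStartMs;
    private int used = 0;

    WindowBudget(int limit, long windowMs, long startMs) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.windowStartMs = startMs;
    }

    /** Returns the pause (ms) needed before issuing `calls` requests at time nowMs. */
    long reserve(int calls, long nowMs) {
        if (nowMs - windowStartMs >= windowMs) {
            // The previous window has expired: start a fresh one.
            windowStartMs = nowMs;
            used = 0;
        }
        if (used + calls <= limit) {
            used += calls;
            return 0L;
        }
        // Budget exhausted: wait for the window to end, then charge the new window.
        long wait = windowStartMs + windowMs - nowMs;
        windowStartMs = nowMs + wait;
        used = calls;
        return wait;
    }
}
```

In a Customization Rule, the idea would be to call `reserve(3, System.currentTimeMillis())` per user and pass any non-zero result to `Thread.sleep`, keeping the `WindowBudget` instance in the same `state` map that holds `aggCount` in the code above.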

Hi @noahseward, as everyone said, you are already on the right track. Regarding retries for aggregation, you can try the ‘possibleHttpErrors’ entry. Its purpose is different and I’m not sure whether it will help, but you can give it a try. You can find more information in the 8.2 connector notes at this link - https://community.sailpoint.com/t5/IdentityIQ-Connectors/8-2-Web-Services-Connector/ta-p/196607