A Complete Guide to Retrieving Identity Data via the Identity Security Cloud API

Introduction

One of the most common needs when working programmatically with the Identity Security Cloud (ISC, formerly IdentityNow) API is to get information about Identities. There are two different ways to get this data via API.

/identities API endpoint

While technically a pre-prod API, there is an /identities endpoint. It was originally released as https://<orgName>.api.identitynow.com/beta/identities and is now available at /v2025/identities with the X-SailPoint-Experimental header set to true.

This works in a standard RESTful way:

List Identities
GET /identities lists all the identities, with the standard pagination using offset (>=0) and limit (<=250)

Identity Details
GET /identities/:id lets you get one specific identity using the object guid (32-character hexadecimal string) as the value for the path variable :id

NOTE: the first endpoint returns an array of JSON objects, while the second just returns a single object. If the filter HTTP query parameter you apply to “List Identities” causes the endpoint to return only one identity, the result will be an array with one element.
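
As a minimal sketch of the offset/limit pagination described above, the Python loop below walks "List Identities" one page at a time. It rests on a few assumptions: fetch_page is a hypothetical caller-supplied function that performs GET /identities?offset=<offset>&limit=<limit> (with the X-SailPoint-Experimental header) and returns the parsed JSON array, and a page shorter than limit is taken to mean the end of the list. An in-memory stand-in replaces the real HTTP call so the sketch runs without a tenant.

```python
from typing import Callable, Dict, Iterator, List

MAX_LIMIT = 250  # page-size ceiling for /identities mentioned above


def list_identities(
    fetch_page: Callable[[int, int], List[Dict]],
    limit: int = MAX_LIMIT,
) -> Iterator[Dict]:
    """Yield every identity, requesting one page per call to fetch_page."""
    if not 0 < limit <= MAX_LIMIT:
        raise ValueError("limit must be between 1 and 250")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:  # assumption: a short page is the last page
            return
        offset += limit


# Stand-in for the real HTTP call: 600 fake identities served from memory.
FAKE_IDENTITIES = [{"id": f"{n:032x}"} for n in range(600)]


def fake_fetch_page(offset: int, limit: int) -> List[Dict]:
    return FAKE_IDENTITIES[offset:offset + limit]


identities = list(list_identities(fake_fetch_page))
print(len(identities))  # 600, gathered in 3 requests of up to 250 each
```

Passing the fetch function in as a parameter keeps the paging logic separate from whichever HTTP client and authentication flow you use.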

Response Schema

Let’s take a look at the response schema and point out some relevant details—see the example JSON below:

  • alias: the “technical name” of the Identity (For those coming from IIQ: the “cube” name)
  • emailAddress: this appears as a top-level attribute on the Identity, but it may also appear as one or more “attributes” based on how you configured the IdentityProfile [see “attributes” below]
  • managerRef: the manager relationship is represented as a pointer to another Identity object. Each Identity contains some minimum information to be useful to both machines and humans, namely managerRef.id and managerRef.name
  • “status” properties (lifecycleState/processingState/identityStatus): the set of properties returned, and their possible values, have changed over time, most notably when SailPoint introduced the concept of Identity State in addition to lifecycle state. As this is technically an experimental endpoint, I expect this to continue to change as concepts of status evolve. However, all of these are top-level properties in each result.
  • attributes: this is a JSON object containing all the attributes defined in the Identity Profile that was used to create this Identity. I’ve truncated the attributes Map in the example, but this object can get very long.
{
	"alias": "NCC1701",
	"emailAddress": "[email protected]",
	"isManager": true,
	"managerRef": {
		"type": "IDENTITY",
		"id": "ad75c841dceb4a208507fdce9da03f5a",
		"name": "Dee Bossa"
	},
	"lastRefresh": "2025-01-13T22:25:33.501Z",
	"lifecycleState": {
		"stateName": "active",
		"manuallyUpdated": false
	},
	"processingState": null,
	"identityStatus": "ACTIVE",
	"attributes": {
		"lastname": "Sandhu",
		"firstname": "Vip",
		"email": "[email protected]",
		"uid": "[email protected]",
		"title": "Tamyrlin Seat"
		"department": "CISO",
		//...
	},
	"id": "3d02660e4edc4c45a4dd54c37ccde6e2",
	"name": "Vip Sandhu",
	"created": "2020-02-06T0:10:48.869Z",
	"modified": "2025-01-13T22:25:37.415Z"
}

Search

What if we wanted to get manager information on 10,000 identities? With a maximum limit of 250 identities returned per page, that would take 40 pages’ worth of queries against the /identities endpoint. Now imagine we need to pull that information for 100,000 users. The wild truth is that you can do it in 10 queries, and probably do it faster than hitting the /identities endpoint for 10,000 users. The secret is in specifying what information you want to know about each identity.

The Deep Identity-Retrieval Magic

Search has long been a constant way to retrieve Identity information programmatically. It is an extremely powerful way to find information, and not just about Identities, although that is the “index” I will discuss in this article. The other indices are: Access Profiles, Account Activities, Entitlements, Events, and Roles. The /search endpoint is present in these currently available versions of the API: v3, v2024, and v2025. Search can seem daunting, but I’ll simplify its use by keeping the scope of this blog post to searching for Identity data in a way that unifies search in the UI and in the API.

Compose the POST body

Searching using an API query requires a POST operation. The request body is a JSON object with a variable number of properties, discussed in detail below. At a minimum, you should put two properties in the request body:

  • While the search index (the indices property in the POST body) is optional, I recommend that you specify this to speed up your queries and guarantee that only Identities are returned. Specifically, this value is an array of strings, and the array should have just one element, the all-lowercase string "identities".
  • The query property is a JSON object. To keep it simple, put just one property in that object: the query.query string itself. Any valid search query that you could use in the UI at https://<orgname>.identitynow.com/ui/search will work.

Example of minimal search POST body, looking for Active Employee Identities created since the beginning of 2020:

{
	"indices": ["identities"],
	"query": {
		"query": "status:ACTIVE AND attributes.workerType:Employee AND created:>=2020-01-01"
	}
}

When copy/pasting your search query from the UI, the only thing to keep in mind is that if your query contains double quotes, you must deal with that using either of two approaches:

1. Escape the double quotes with backslashes: If your search query is going into a double-quoted string, any double quotes inside the search query must be escaped with a backslash.

{
	"indices": ["identities"],
	"query": {
		"query": "firstName:\"Jim Bob\""
	}
}

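2. Use outer single quotes: If you build the body in a language whose string literals accept single quotes (JavaScript or Python, for example), wrapping the query.query property value in single quotes lets double quotes inside the search query work unescaped. Note that strict JSON does not allow single-quoted strings, so the form below is only valid as source code that gets serialized to JSON before it is sent.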

{
	"indices": ["identities"],
	"query": {
		"query": 'firstName:"Jim Bob"'
	}
}
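
If the POST body is built in code rather than typed by hand, a JSON library can take care of the quoting. The Python sketch below is one way to do that; build_search_body is a hypothetical helper name, not part of any SailPoint SDK. The query is written with outer single quotes (approach 2), and json.dumps emits the backslash-escaped form (approach 1).

```python
import json


def build_search_body(query: str) -> dict:
    """Return a minimal search body restricted to the identities index."""
    return {"indices": ["identities"], "query": {"query": query}}


# Outer single quotes are fine here because this is Python source, not JSON.
body = build_search_body('firstName:"Jim Bob"')

# json.dumps produces strict JSON, escaping the inner double quotes.
payload = json.dumps(body)
print(payload)
# {"indices": ["identities"], "query": {"query": "firstName:\"Jim Bob\""}}
```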

Default properties in the Search result schema

A note on the example results you will see below: search always includes 3 properties in each returned object, regardless of index. These are “type” and two properties that, according to the common JavaScript naming convention of a leading underscore, should be treated as “private” members for internal use only: “_type” and “_version”. They are only mentioned here for completeness and to reduce confusion.

{
	//...
	"_type": "identity",
	"type": "identity",
	"_version": "v2"
}

Two additional default properties show up when you include nested results (see two sections below for an explanation of nested results):

  • “pod”: <environment><serialNumber>-<aws region of the tenant>
  • “org”: <orgname> (identitynow.com subdomain)

How to Sort and Fetch more than 10,000 Results

If you’re potentially going to retrieve thousands of Identities and want to boost the performance of your code, first make sure you specify the page limit in the HTTP query parameters. The default value is 250, but on this /search endpoint, the maximum value is 10000. That is, you should query the endpoint using the following URL pattern:

https://<orgName>.api.identitynow.com/<version>/search?limit=10000
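
For illustration, here is one way the URL pattern above could be turned into a request object using only the Python standard library. The tenant name, API version, and token are placeholders, build_search_request is a hypothetical helper, and the Bearer authorization header is an assumption about how your client authenticates. Nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(org, version, token, body, limit=10000):
    """Build (but do not send) a POST to /search with an explicit page limit."""
    query_string = urllib.parse.urlencode({"limit": limit})
    url = f"https://{org}.api.identitynow.com/{version}/search?{query_string}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {"indices": ["identities"], "query": {"query": "status:ACTIVE"}}
request = build_search_request("acme", "v2025", "<access token>", body)
print(request.get_method(), request.full_url)
# POST https://acme.api.identitynow.com/v2025/search?limit=10000

# To actually send it (requires a real tenant and token):
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```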

An example of a broad search query that can return a very large number of Identities:

{
	"indices": ["identities"],
	"query": {
		"query": "status:ACTIVE"
	},
	"sort": [ "id" ]
}

Notice that I have included a new property in the POST body, sort. It takes an array of property names and lets you perform tiered sorting. For example, if you wanted to find the longest-serving person in every department, you could use the following, which would sort first by department then within each department by start date:

"sort": [ "attributes.department", "attributes.startDate"]

By default, search returns the results in ascending order, but you can specify the direction by prefixing the property name with ‘+’ for ascending, or ‘-’ for descending order. For example, let’s say you wanted to see the newest person in each department first, you could use the following (note that the ‘+’ is redundant but valid syntax):

"sort": [ "+attributes.department", "-attributes.startDate"]

Technically, you can sort results even if you expect only a few of them. However, sorting is mandatory when paginating beyond 10,000 results, because the offset HTTP query parameter doesn’t work past the first 10K. (I’m making an educated guess here, but I infer that on the backend, Elasticsearch grabs up to 10K results at a time, and the way for us to “talk past” the API HTTP server to the Elasticsearch engine is via the POST body.) Instead of using offset, you sort the results, then specify that you want to “searchAfter” the last result of the first 10,000. For this reason, you want to ensure that you’re sorting on a unique value. It is recommended that you always include "id" in the sort array to guarantee a fixed sort order. Since you will most likely handle 10,000+ results programmatically, sorting on "id" alone is a valid approach. Let’s say we run the search in the full JSON POST body above, i.e., searching for all users with an ACTIVE status. These are the summarized results (just the relevant properties of the first and last result of each 10,000):

First 10,000:

[
	{
		"displayName": "William Edward Kaiyode",
		"id": "001ad3e7d10e418a834de5fa7e9d0902",
		"_type": "identity",
		"type": "identity",
		"_version": "v2"
	},
//...
	{
		"displayName": "Zod Kryptonovich",
		"id": "2c91808471516cc6017154cebe745bca",
		"_type": "identity",
		"type": "identity",
		"_version": "v2"
	}
]

We should search after Zod Kryptonovich, whose id is 2c91808471516cc6017154cebe745bca. The searchAfter property’s value is an array that mirrors the structure of the sort array. If we had sorted by displayName, then by id, and Zod Kryptonovich had still been the last result on a page, we would put in both values for the result after which we want to search:

	"sort": [ "displayName", "id" ],
	"searchAfter": [ "Zod Kryptonovich", "2c91808471516cc6017154cebe745bca" ]

Since these results were received sorting only on id (which is the most common use case), we only need to specify Zod’s id as below:

{
	"indices": ["identities"],
	"query": {
		"query": "status:ACTIVE"
	},
	"sort": [ "id" ],
	"searchAfter": ["2c91808471516cc6017154cebe745bca"]
}

This returns the next 10,000 results, similarly summarized. Notice that the first id is very close (in terms of guids) to the last id of the previous page: the two are only 0x6700008 apart (108,003,336 in decimal).

[
	{
		"displayName": "Jack Sparrow",
		"id": "2c91808471516cc6017154cec4e45bd2"
	},
//...
	{
		"displayName": "J Jonah Jameson",
		"id": "ffb6b887a4cc4475bf149c5717698143"
	}
]

If we wanted to get the next 10,000, we would just search after "ffb6b887a4cc4475bf149c5717698143".
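
The search-after loop described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference client: post_search is a hypothetical caller-supplied function that POSTs the body to /search?limit=<limit> and returns the parsed result list, a page shorter than limit is treated as the last one, and every sort field is assumed to be present in the returned results so its value can be copied into searchAfter. An in-memory stand-in plays the role of the endpoint.

```python
from copy import deepcopy


def get_field(doc, dotted):
    """Follow a dotted path such as "attributes.department" into nested dicts."""
    for part in dotted.split("."):
        doc = doc[part]
    return doc


def search_all(post_search, body, limit=10000):
    """Yield every result, paging with searchAfter instead of offset."""
    body = deepcopy(body)  # leave the caller's body untouched
    sort_fields = [entry.lstrip("+-") for entry in body["sort"]]
    while True:
        page = post_search(body, limit)
        yield from page
        if len(page) < limit:  # assumption: a short page is the last page
            return
        last = page[-1]
        # searchAfter mirrors the sort array, one value per sort field.
        body["searchAfter"] = [get_field(last, field) for field in sort_fields]


# In-memory stand-in for the real endpoint; ids are already in sorted order.
FAKE_INDEX = [{"id": f"{n:032x}"} for n in range(25)]


def fake_post_search(body, limit):
    after = body.get("searchAfter", [""])[0]
    return [doc for doc in FAKE_INDEX if doc["id"] > after][:limit]


body = {"indices": ["identities"], "query": {"query": "status:ACTIVE"}, "sort": ["id"]}
results = list(search_all(fake_post_search, body, limit=10))
print(len(results))  # 25, fetched as pages of 10, 10, and 5
```

If the response is trimmed so that a sort field is no longer returned, this particular sketch would need another way to learn the last result's sort values.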

Include Nested Properties in Search Results

Search allows you to optionally include 3 nested object properties with the Identity Results: accounts, apps, and access. Each is an array of objects. This is the default behavior and can be turned off by setting "includeNested" to false in the POST body. For example:

{
	"indices": ["identities"],
	"query": {
		"query": "status:ACTIVE"
	},
	"sort": [ "id" ],
	"includeNested": false
}

Nested objects greatly “bulk up” the search results and drop performance. So, while the maximum value for the limit HTTP query parameter is 10,000, that’s only really usable without returning the nested arrays. Empirically, an experiment I ran showed that searches that just specify an index, a query and a sort, which have "includeNested": true by default, perform best when limit is 125. (See the last section: Addendum – Performance Experiments.)

Let’s discuss each of those nested object properties in the order they appear in the results schema:

Accounts [ ]

These are summaries of the accounts owned by the Identity. They do not provide account attributes like the /accounts endpoint does. Rather, they give a rundown of the Account (Link) object and the entitlements found on it. Here’s an example of what an Active Directory account would look like as part of this array property:

[
	{
		"id": "075bd3f0036d41af92d9eb3c8195f6d8",
		"name": "wkaiyode",
		"accountId": "CN=William Edward Kaiyode,OU=Explosive Testers,OU=Users,DC=acme,DC=com",
		"source": {
			"id": "c12900d1168e4b17b9966d3a795515e1",
			"name": "AD - Acme.com",
			"type": "Active Directory - Direct"
		},
		"disabled": false,
		"locked": false,
		"privileged": false,
		"manuallyCorrelated": false,
		"passwordLastSet": "2024-12-09T07:27:55.107Z",
		"entitlementAttributes": {
			"memberOf": [
				"CN=Trauma Team,OU=Benefits,OU=Groups,DC=acme,DC=com",
				"CN=Jet-Powered Pogo Stick,OU=R&D,OU=Groups,DC=acme,DC=com",
				"CN=Giant Magnet,OU=Customer Beta Testers,OU=Groups,DC=acme,DC=com"
			]
		},
		"created": "2025-01-13T18:31:08.212Z"
	},
//...
]

Apps [ ]

These are summaries of the (Logical) Applications to which the Identity has access. Unlike the data returned by the /source-apps endpoint, the information returned is less about how the Application is wired up, and more about why this Identity gets access to the App. The nested "account" object inside each app in the apps array represents the Source account which grants logical access to the Application:

  • the account.id is the account (Link) object’s id, and can be used to get the Account Details at /accounts/:id
  • the account.accountId is actually the nativeIdentity of the account object.
[
	{
		"id": "22193",
		"name": "Acme R&D",
		"source": {
			"id": "c12900d1168e4b17b9966d3a795515e1",
			"name": "AD - Acme.com"
		},
		"account": {
			"id": "075bd3f0036d41af92d9eb3c8195f6d8",
			"accountId": "CN=William Edward Kaiyode,OU=Explosive Testers,OU=Users,DC=acme,DC=com"
		}
	},
//...
]

Access [ ]

Access Items come in 3 types: Roles, Access Profiles, and Entitlements. The nested array contains summaries of those granted to the Identity, with certain features, some shared and some unique to a specific type:

  • Common features: id, name, displayName, and type (this value is in SCREAMING_SNAKE_CASE)
  • Roles and Access Profiles: have a description, an owner (an Identity reference), and a revocable attribute
  • Access Profiles and Entitlements: have a source reference.
  • Roles: indicate whether they are disabled
  • Entitlements: show the attribute/value pair whose presence on an account in the relevant Source was used by ISC to determine that the Identity has that entitlement. Also indicate whether they are standalone or granted to this Identity as part of a Role or Access Profile.
[
	{
		"id": "243621ccb36c41d898f42ae38bca02dc",
		"name": "Birthright - Super Genius",
		"displayName": "Birthright - Super Genius",
		"type": "ROLE",
		"description": "Business Role for those who are Super Geniuses - those who embrace 'Have Brain, Will Travel'",
		"owner": {
			"id": "ac09f514d6524b2280556b1a2da5e140",
			"name": "admin007",
			"displayName": "Vip Sandhu"
		},
		"disabled": false,
		"revocable": false
	},
	{
		"id": "515d3266e02b4158a74a6e1d15f6d446",
		"name": "Acme - Research & Development - Jet-Powered Pogo Stick",
		"displayName": "Acme - Research & Development - Jet-Powered Pogo Stick",
		"type": "ACCESS_PROFILE",
		"description": "Application access Acme's R&D UI - allows user to edit Field notes on the Jet-Powered Pogo Stick",
		"source": {
			"id": "c12900d1168e4b17b9966d3a795515e1",
			"name": "AD - Acme.com"
		},
		"owner": {
			"id": "ac09f514d6524b2280556b1a2da5e140",
			"name": "admin007",
			"displayName": "Vip Sandhu"
		},
		"revocable": true
	},
	{
		"id": "eda0a94523df431f86b6b1a749e8a6c8",
		"name": "Jet-Powered Pogo Stick",
		"displayName": "Acme.com - R&D - Jet-Powered Pogo Stick",
		"type": "ENTITLEMENT",
		"source": {
			"id": "c12900d1168e4b17b9966d3a795515e1",
			"name": "AD - Acme.com"
		},
		"privileged": false,
		"attribute": "memberOf",
		"value": "CN=Jet-Powered Pogo Stick,OU=R&D,OU=Groups,DC=acme,DC=com",
		"standalone": false
	},
//...
]

GET individual identities

All the same information is available using Get Document By Id using the index “identities” in the URL:
GET https://<orgname>.api.identitynow.com/<version>/search/identities/:id

  • This searches for a single indexed object by id, so we don’t need to POST a search query body.
  • The GET operation always includes nested objects, which isn’t too bad of a performance hit for 1 object. However, given the overhead of making each network call, GET search should be used very sparingly. If multiple Identities need retrieval, it is far more performant to build up a list of ids and get them all in a combined search query.
  • Because you’re Getting a single object, the response body will not be an array.
  • Additionally, the default properties will not be present. (see section “Default properties in the Search result Schema” above)
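
Here is a small Python sketch of the "combined search query" idea from the list above: it folds a list of ids into a single id:(... || ...) query, the same form used in the queryResultFilter example in this post. ids_query and chunked are hypothetical helpers, and the chunk size of 100 is an arbitrary assumption to keep each query string short, not a documented limit.

```python
def ids_query(ids):
    """Build one search query that matches any of the given identity ids."""
    if not ids:
        raise ValueError("at least one id is required")
    return "id:(" + " || ".join(ids) + ")"


def chunked(items, size=100):
    """Split a list into consecutive slices of at most size items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


wanted = [
    "001ad3e7d10e418a834de5fa7e9d0902",
    "2c91808471516cc6017154cebe745bca",
]
bodies = [
    {"indices": ["identities"], "query": {"query": ids_query(chunk)}, "sort": ["id"]}
    for chunk in chunked(wanted)
]
print(bodies[0]["query"]["query"])
# id:(001ad3e7d10e418a834de5fa7e9d0902 || 2c91808471516cc6017154cebe745bca)
```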

“Count” properties

Search provides a lot of “count” properties. Each of them counts an array property on the Identity of the same name, where the count property is <arrayProperty>Count and the corresponding property is called <arrayProperty>. Note that 3 of them count the nested object arrays that can optionally be included in the results (accounts, access, and apps); these count properties are returned even if you set includeNested to false. Interestingly, setting includeNested to false changes the order of the properties, but not their presence or values (this mostly matters if you’re visually inspecting the results).

  • accountCount: # of Accounts
  • sourceCount: # of Sources on which those accounts are found
  • appCount: # of Applications to which the Identity has access
  • accessCount: # of Access items
  • entitlementCount: # of Access items that have type: ENTITLEMENT
  • roleCount: # of Access items that have type: ROLE
  • accessProfileCount: # of Access items that have type: ACCESS_PROFILE
  • ownsCount: # of types of objects found in the owns array property, where types are among {sources, accessProfiles, roles, governanceGroups, and apps}. If there are any objects of a given type, owns.<type> will be an array containing the name and id of each. (Author’s note: this may not be the intended behavior, since it counts types instead of items)
  • tagsCount: # of tags applied to the Identity, which are found in the tags array property.
  • visibleSegmentCount: # of Access Request Segments of which the Identity is a part, allowing that person to see certain access items in the Request Center that are not visible to the general population of Identities.
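
As a quick sanity check of how the type-specific counts relate to the nested arrays, the Python sketch below recomputes them locally from one result. The sample record is made up and heavily trimmed, expected_counts is a hypothetical helper, and it assumes the result was fetched with nested objects included.

```python
from collections import Counter


def expected_counts(identity):
    """Recompute count properties from the nested arrays of one search result."""
    by_type = Counter(item["type"] for item in identity.get("access", []))
    return {
        "accountCount": len(identity.get("accounts", [])),
        "appCount": len(identity.get("apps", [])),
        "accessCount": sum(by_type.values()),
        "roleCount": by_type["ROLE"],
        "accessProfileCount": by_type["ACCESS_PROFILE"],
        "entitlementCount": by_type["ENTITLEMENT"],
    }


sample = {  # made-up, heavily trimmed search result
    "accounts": [{"id": "075bd3f0036d41af92d9eb3c8195f6d8"}],
    "apps": [{"id": "22193"}],
    "access": [
        {"type": "ROLE"},
        {"type": "ACCESS_PROFILE"},
        {"type": "ENTITLEMENT"},
        {"type": "ENTITLEMENT"},
    ],
}
counts = expected_counts(sample)
print(counts["accessCount"], counts["entitlementCount"])  # 4 2
```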

Shaping the Response Schema

One of the great powers of Search is the ability to retrieve a lot of results with a wide variety of data returned for each Identity in the response. The downside is that by transferring so much data, your requests can get bogged down. Which brings us to the next great power of Search: the queryResultFilter property lets you declare exactly what data you want returned for each Identity that matches your search query. This is analogous to the benefit of GraphQL. In a real-world test, the queryResultFilter below dropped the response time per Identity from 12.5 to 2.5 milliseconds when fetching 250 Identities, with even bigger gains for 10,000 Identities (dropping to 0.6 ms per Identity). The direct comparison only covered 250 Identities because including nested objects with a full response schema actually causes the per-Identity response time to rise when asking for more than 125 Identities. (See Addendum – Performance Experiments.)

{
	"indices": ["identities"],
	"query": {
		"query": "id:(001ad3e7d10e418a834de5fa7e9d0902 || 2c91808471516cc6017154cebe745bca || 2c91808471516cc6017154cec4e45bd2 || ffb6b887a4cc4475bf149c5717698143)"
	},
	"includeNested": true,
	"sort": [ "id" ],
	"queryResultFilter": {
		"includes": [ "displayName", "name", "email", "id", "manager", "apps" ],
		"excludes": [ "manager.displayName", "apps.source" ]
	}
}

If a term appears in both lists, excludes wins: that property will not appear. The best use of this is to exclude sub-fields from nested objects in the results. For example, the “manager” property on an Identity is an object reference with 3 properties:

"manager": {
	"displayName": "Chuck Jones",
	"name": "100N3Y",
	"id": "733f61f8cff94281bbe928bdafab9546"
}

Let’s say your code didn’t care about the id of the manager’s Identity object because it was generating a report that would only be consumed by humans. So, all you need is their displayName and their employeeNumber, which you have wisely chosen to make their identity name. You could leave the manager’s object id out of the results.

{
//...
	"queryResultFilter": {
		"includes": [ "displayName", "name", "email", "id", "manager", "apps" ],
		"excludes": [ "manager.id", "apps.source" ]
	}
}

Note: the above doesn’t work as intended unless you also have includeNested set to true, because otherwise the nested apps property won’t be returned.
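
To see what the includes and excludes lists do to a result, the Python sketch below applies the same idea to a local dictionary. It is only an emulation of the behavior described in this section, not the service's implementation: shape and _remove are hypothetical helpers, includes is assumed to hold top-level property names only, and the sample identity is partly made up.

```python
from copy import deepcopy


def _remove(node, parts):
    """Delete the dotted path given by parts from node, descending through lists."""
    if isinstance(node, list):
        for item in node:
            _remove(item, parts)
    elif isinstance(node, dict):
        if len(parts) == 1:
            node.pop(parts[0], None)
        elif parts[0] in node:
            _remove(node[parts[0]], parts[1:])


def shape(doc, includes, excludes):
    """Keep only the included top-level properties, then strip the excluded paths."""
    kept = {key: deepcopy(value) for key, value in doc.items() if key in includes}
    for path in excludes:  # applied last, so excludes wins over includes
        _remove(kept, path.split("."))
    return kept


identity = {  # partly made-up sample
    "displayName": "William Edward Kaiyode",
    "firstName": "William",
    "manager": {"displayName": "Chuck Jones", "name": "100N3Y",
                "id": "733f61f8cff94281bbe928bdafab9546"},
    "apps": [{"id": "22193", "name": "Acme R&D",
              "source": {"id": "c12900d1168e4b17b9966d3a795515e1"}}],
}
shaped = shape(
    identity,
    includes=["displayName", "manager", "apps"],
    excludes=["manager.id", "apps.source"],
)
print(shaped["manager"])  # {'displayName': 'Chuck Jones', 'name': '100N3Y'}
```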

As a final note, sort order holds even if the properties on which you are sorting are not returned with the result. (see section “How to Sort and Fetch more than 10,000 Results” above for a review of sorting) In the queryResultFilter above, you aren’t returning any Identity attributes that aren’t promoted to top-level properties in the search results (like firstName and lastName are), but if we were to include the following sort property in the POST body, the results would still be returned in order by department, then by startDate within each department, even though the results don’t include those properties.
"sort": [ "attributes.department", "attributes.startDate"]

Performance Takeaways

  • The real advantage of the experimental /identities endpoint is the optimization of developer time, not execution time.
  • Grab results 10,000 at a time, not in pages of 250. This is only possible through /search, and only by overriding the default limit.
  • If you need the unabridged Identity objects and also need to includeNested, drop the limit to 125 for best server-side performance.
  • Search allows you to simplify the response schema using a queryResultFilter, speeding up server-side processing and reducing the amount of data flying over the network to your client code. This is far more impactful on performance than trying to fine-tune the limit (see Addendum – Performance Experiments)

Disclaimer

The information here is current as of the time of publication. If you find any errors or things that have changed, let me know and I’ll update the post.

Addendum – Performance Experiments

Methods

To determine the optimal limit value (how big the page size should be), I ran two experiments: one where I shaped the results using a queryResultFilter, and one where I didn’t (full schema arm).
Other than that one difference, both experiments used the following POST body:

{
	"indices": ["identities"],
	"query": {
		"query": "attributes.cloudLifecycleState:active AND identityProfile.name:\"<Main_HR_Profile>\""
	},
	"includeNested": true,
	"queryResultFilter": {
		"includes": [ "displayName", "name", "email", "id", "manager", "apps" ],
		"excludes": [ "manager.displayName", "apps.source" ]
	}
}

To ensure that I got good statistics, I ran 10 trials at each limit value. To ensure that random effects that varied over time didn’t covary with the limit value, I ran through each limit value before coming back to a given value. The pseudocode for this approach is:

for trial in 1 to 10
	for limit in MIN_LIMIT to MAX_LIMIT
		POST search query and record timings
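
The pseudocode above could be fleshed out roughly as follows in Python. timed_post is a hypothetical caller-supplied function that sends the search with the given limit; a do-nothing stand-in is used here so the sketch runs without a tenant, and run_trials is likewise an illustrative name.

```python
import time


def run_trials(timed_post, limits, trials=10):
    """Return {limit: [elapsed_ms, ...]} with one entry per trial."""
    timings = {limit: [] for limit in limits}
    for _ in range(trials):
        # Visit every limit once per trial so that slow drift over time
        # is spread across all limits instead of piling onto a few.
        for limit in limits:
            start = time.perf_counter()
            timed_post(limit)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            timings[limit].append(elapsed_ms)
    return timings


call_order = []
timings = run_trials(call_order.append, limits=range(1, 4), trials=2)
print(call_order)  # [1, 2, 3, 1, 2, 3]
```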

Because I very quickly found an optimal limit for the full schema arm, I only had to set MAX_LIMIT to 250, starting from 1. For the slim schema, I had to set MAX_LIMIT all the way up to 10000, but I ran the script in 3 runs: 1 to 250, 251 to 500, then 501 to 10000. I did this because I was looking for a local minimum, but found none up to a limit of 500, so I changed tactics and gathered data on all possible values.

Full Schema Results

I started by just grabbing the mean and standard deviation of the response time in milliseconds (ms) across the 10 trials at each limit value, and noticed a roughly linear shape, but with a noticeable downward bend. The relatively linear standard deviation curve provides confidence that the pattern wasn’t affected by extreme outliers.

Looking at the response time per Identity showed a hyperbolic shape, indicating that some fixed cost was being amortized over the identities as the limit grew. It also showed a horizontal asymptote that was so low relative to the initial values that the downwards flexure of the first graph couldn’t be appreciated with the initial graph window.

Zooming in on the vertical axis shows the detail of the downward flexure, showing a clear minimum at a limit of 125, followed by an increasing time per Identity.
This kind of shape implies that there are two competing processes: A constant overhead that has less effect on the per-Identity time cost as the limit grows, and some costs that increase as the limit grows. I have created model curves for the two (320ms/n and 2.8ms + 0.035ms*n, respectively, where n is the limit). Summing those two produces a curve whose shape is very similar to the observed results. I did not concern myself with creating a perfect fit because demonstrating that a fitting curve could be the sum of a hyperbolic and linear function was my goal.
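
The two model curves quoted above can be summed and evaluated in a few lines of Python. This is a worked illustration of the stated model only; the constant names and their interpretations in the comments are assumptions. Note that the analytic minimum of this rough model falls a little below the observed optimum of 125, which is consistent with the remark that the curves were not tuned for a perfect fit.

```python
import math

FIXED_MS = 320.0   # overhead amortized over the n identities in a page
BASE_MS = 2.8      # per-identity cost that does not depend on n
SLOPE_MS = 0.035   # per-identity cost that grows with n


def model_ms_per_identity(n):
    """Sum of the hyperbolic term (320/n) and the linear term (2.8 + 0.035*n)."""
    return FIXED_MS / n + BASE_MS + SLOPE_MS * n


# Setting the derivative -320/n**2 + 0.035 to zero gives n = sqrt(320 / 0.035).
model_minimum = math.sqrt(FIXED_MS / SLOPE_MS)

for n in (125, 250):
    print(n, round(model_ms_per_identity(n), 1))  # roughly 9.7 and 12.8 ms
print(round(model_minimum, 1))  # roughly 95.6
```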

Slim Schema Results

I once again began by just grabbing the mean and standard deviation across the 10 trials at each limit value. This time, however, I noticed a more logarithmic shape. Indeed, the logarithmic best-fit curve produced very good results.

Looking at the per-Identity response times, a clear hyperbolic shape was once again noted, but this time with a much lower asymptote.

Zooming in on the vertical axis and plotting the best-fit curve (“normalized” by dividing by the limit x) showed exactly how good the logarithmic fit was for the asymptote. This curve is monotonically decreasing (always has a negative slope), meaning that the performance increases as the limit value increases. However, this per-Identity graph revealed some deviation for limits under 2000 that was worth investigating, specifically to rule out a local minimum.

This data for limits < 2000 was a lot noisier. I ran this arm of the experiment in 3 runs (1-250, 251-500, 501-10000) on my laptop, which explains the discontinuity of the data points in the 251-500 range: some external factors like open applications, WiFi strength, etc. must have pushed those response times up, because the first and third chunks seem to continue the same curve. The other issue was the existence of sharp upward deflections around the limit values of 300 and 750. I co-plotted the coefficient of variation (standard deviation divided by mean), which showed that those deflections are explainable by long outlier response times, so the “local minima” on either side can be safely ignored. Finally, to determine local minima in noisy data, it’s best to analyze a smoothed curve. By smoothing with a running average over +/-50 limit values, we can see the only local minimum is near 750, which we’ve already explained as an artifact of outliers. This means the whole graph is effectively a monotonically decreasing function.
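
For readers who want to repeat this kind of analysis, the two statistics used above can be computed with the Python standard library as sketched below. The helper names and sample numbers are made up, and the use of the sample standard deviation and of a window clipped at the ends of the series are assumptions, since the post does not say which variants were used.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean of the same samples."""
    return stdev(samples) / mean(samples)


def running_average(values, half_window=50):
    """Centered moving average over +/- half_window neighbours (clipped at the ends)."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Made-up trial timings (ms) for one limit value; the last trial is an outlier.
steady = [100.0, 102.0, 98.0, 101.0, 99.0]
spiky = [100.0, 102.0, 98.0, 101.0, 400.0]
print(coefficient_of_variation(steady) < coefficient_of_variation(spiky))  # True

print(running_average([1.0, 2.0, 3.0, 4.0, 5.0], half_window=1))
# [1.5, 2.0, 3.0, 4.0, 4.5]
```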

Conclusions

So, I conclude that if we use a reasonable queryResultFilter to shape the response schema, then Search gets increasingly efficient with page limits up to the maximum limit of 10,000 Identities.
When comparing the two arms, we can see a dramatic improvement in per-Identity response times. When the limit is 250, the full schema costs 12 ms per Identity, and the slim schema only 2.5 ms. Even the “optimum” limit of 125 with the full schema only lowers that to 10 ms per Identity. Compare that to the optimal value of maxing out the limit to 10,000 where we see the per-Identity time drop to an amazing 0.6ms.

References

Building a Search Query

Searchable Fields

Frequently Asked Questions and Sample Data Models
