Generating Reports with Identity Security Cloud Data and Pandas

Problem

In our environment we use SAML for SSO, so the samAccountName needs to be the same across systems for users to log in seamlessly. After going live with Identity Security Cloud, our team discovered that users’ source account IDs were becoming misaligned. ISC would generate a new samAccountName for a user, and some users would be provisioned a different user ID if the generated ID already existed on a given source. This would cause a user’s samAccountName from Workday to differ from Active Directory and ServiceNow. Because the samAccountName was misaligned across systems, these users could not log in to Workday, Active Directory, or ServiceNow.

Example of the issue:

| Workday | Identity Security Cloud | ServiceNow | Active Directory | Correct? |
| --- | --- | --- | --- | --- |
| userA | userA | userA | userA | true |
| userB | userB | userB1 | userB | false |
| userC | userC1 | userC01 | userC2 | false |
| userE | userE | userE | userE01 | false |

This report was created to give us the ability to proactively correct these users’ IDs prior to their start date. Once identified, an ISC administrator works with HR and ServiceNow to fix the misaligned IDs. This has reduced the number of tickets that come in to each team when new users start. If a user logs in before the IDs are corrected, remediation takes longer because the teams must merge records. This report has reduced the time to process a user from hours to minutes. While the report itself takes hours to generate, the amount of time a human needs to spend on these accounts has been greatly reduced, saving us labor hours.

Business Requirements

We wanted a report that shows the user ID for ServiceNow, Workday, NERM, and Active Directory, and flags any time these IDs do not match across all of these systems. This needs to be a nightly job that creates a report the team can work through prior to the user’s start date. The end goal is to improve the onboarding experience by ensuring that all applications work as intended.

WARNING:

This process is extremely resource intensive. While you will be able to repeat the steps laid out below, keep in mind that it takes quite a bit of RAM to run this report. The more sources you need to compare, the larger the memory pool will need to be. The steps below have been tested and work for around 818,629 objects; the process takes 2 hours 40 minutes to run. Before cleaning up Workday data, this process took over 4 hours to run and pushed the count to over 1,000,000 objects. Please be careful when using this method. These timings are from a workstation with 4 CPU cores/threads and 16GB of RAM.

Solution

For this project we needed to pull all the samAccountNames across the connected systems, so when asked to design this report I looked to the ISC REST APIs to pull all the accounts tied to all the users across our HR sources. After pulling a full listing of accounts across the sources, I decided to leverage the Pandas open-source Python library to build this report. The reason I picked Pandas over another solution is its ability to work with such a large dataset easily and quickly. Once all the data has been gathered, we do joins and merges on the data to build out the report. The end result is a report that is easy for admins to understand and that quickly surfaces user IDs that are misaligned. This allows admins to focus only on users who have mismatched IDs instead of the full population.

Python Configuration


| Application Name | Version | Link |
| --- | --- | --- |
| Python | 3.9.13 | Download Python \| Python.org |

Required Packages


| Package Name | Version |
| --- | --- |
| pandas | 2.2.2 |
| jupyter | 1.0.0 |
| jupyter-console | 6.6.3 |
| jupyter-events | 0.10.0 |
| jupyter-lsp | 2.2.5 |
| jupyter_client | 8.6.1 |
| jupyter_core | 5.7.2 |
| jupyter_server | 2.14.0 |
| jupyter_server_terminals | 0.5.3 |
| jupyterlab | 4.1.6 |
| jupyterlab_pygments | 0.3.0 |
| jupyterlab_server | 2.26.0 |
| jupyterlab_widgets | 3.0.10 |
| pip | 24.0 |
| requests | 2.31.0 |
| numpy | 1.26.4 |
| notebook | 7.1.3 |
| notebook_shim | 0.2.4 |

Creating Virtual Python Environment

Windows

  1. Install Python
  2. Open PowerShell
  3. Run python --version to confirm Python is installed correctly
  4. Run pip --version to confirm pip is installed correctly
  5. Run python -m venv "{FILEPATH}", where {FILEPATH} is the path where you want to create your virtual Python environment
  6. Run cd {FILEPATH}\scripts\
  7. Run .\activate
  8. You should now see that you are in your virtual Python environment

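If you want to double-check from within Python that the virtual environment is active, here is a quick sanity check (my own addition, not part of the original steps):

import sys

# Inside an active venv, sys.prefix points at the venv directory
# and differs from the base interpreter's sys.base_prefix
print("Running from:", sys.prefix)
print("Virtual environment active:", sys.prefix != sys.base_prefix)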

Installing Packages

We will be using pip to install all the required packages to get this project working. You might have to use pip3 instead of pip, depending on your Python install.

  1. pip install --upgrade pip
  2. pip install pandas
  3. pip install jupyter

or

  1. pip3 install --upgrade pip
  2. pip3 install pandas
  3. pip3 install jupyter

Setting up Jupyter Notebook

We will first show how the code is developed with Jupyter Notebook.

Windows

  1. Open PowerShell
  2. Run cd {FILEPATH} (FILEPATH is where your virtual Python environment is located)
  3. Run .\scripts\activate
  4. Run jupyter notebook
  5. This will open a browser window in the root path
  6. Create a new folder to store your project
  7. In the new folder, create a new .ipynb file by clicking File > New > Notebook
  8. A new file will open

Importing Packages

  1. In the first box of the new file we are going to list all our import statements
  2. import pandas as pd
  3. import requests
  4. import json
  5. import os
  6. import time
  7. from datetime import datetime

Running Jupyter Notebook

  1. Click the add a box icon in the upper right hand corner
  2. In the new text box we will start adding code that we want to run
  3. First we will create a function to generate a bearer token
def getBearerToken(clientId, clientSecret, baseUrl):
    token = requests.post(f"{baseUrl}/oauth/token?grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}")
    return token
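
As a small hardening step (my own suggestion, not part of the original), the same call can let requests build the query string and fail fast if the tenant URL or credentials are wrong. The function name here is hypothetical:

def getBearerTokenSafe(clientId, clientSecret, baseUrl):
    # Same token request as above, but requests builds the query string
    # and raise_for_status() surfaces bad credentials or a wrong base URL immediately
    response = requests.post(
        f"{baseUrl}/oauth/token",
        params={
            "grant_type": "client_credentials",
            "client_id": clientId,
            "client_secret": clientSecret,
        },
    )
    response.raise_for_status()
    return response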
  1. Next we will set the variables to pass into our function. These should be environment variables if you plan on running this from a server (a sketch of that follows the code below); for running on a local machine we will hard code these variables for now. Hard code credentials at your own risk, and always follow coding best practices.
baseUrl = f"{SAILPOINTURL}"
clientId = f"{CLIENTID}"
clientSec = f"{CLIENTSECRET}"
  1. Now we will create our bearer token
token = getBearerToken(clientId, clientSec, baseUrl)
  1. Now we extract the json response from the getBearerToken function
jsontoken = token.json()
bearerToken = jsontoken['access_token']
  1. Now we set up an empty dictionary for payload and another dictionary to hold our headers
payload = {}
headers = {
    'Accept': 'application/json',
  'Authorization': 'Bearer ' + bearerToken
}
  1. Add a new coding block
  2. Now we need to pull the first round of accounts for our report. Since this block of code is repeated for each source, we will turn it into a function later (see the Function code section below).
offset = 0
apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{SYSID}\""

#Make first api call to grab accounts
workdayJsonData = requests.request("GET", apiUrl, headers=headers, data=payload)

#convert to Json object
workdayResponseJsonData = workdayJsonData.json()

#get the full account number of records
numberOfRecords = int(workdayJsonData.headers['X-Total-Count'])

#build a list to store all the API responses
workdayAccounts = []
#add the first api call's data to the list
workdayAccounts.extend(workdayResponseJsonData)

#loop through all the pages to collect all the user accounts regardless of the user's status
while offset < numberOfRecords:
    offset += 250
    apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{SYSID}\""
    response = requests.request("GET", apiUrl, headers=headers, data=payload)
    workdayAccounts.extend(response.json())    

#Time to run 2392
time.sleep(2)

Function code

def get_sailpoint_accounts(base_url, sys_id, headers, payload):
    #set variables to be called later
    offset = 0
    apiUrl = f"{base_url}/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{sys_id}\""

    #Make first api call to grab accounts
    jsonData = requests.request("GET", apiUrl, headers=headers, data=payload)

    #convert to Json object
    responseJsonData = jsonData.json()

    #get the full number of account records
    numberOfRecords = int(jsonData.headers['X-Total-Count'])

    #build a list to store all the API responses
    accounts = []
    #add the first api call's data to the list
    accounts.extend(responseJsonData)

    #loop through all the pages to collect all the user accounts regardless of the user's status
    while offset < numberOfRecords:
        offset += 250
        apiUrl = f"{base_url}/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{sys_id}\""
        response = requests.request("GET", apiUrl, headers=headers, data=payload)
        accounts.extend(response.json())

    #short pause between sources, then return the collected accounts
    time.sleep(2)
    return accounts

Then it would be called once per source, passing that source's ID:

workday_accounts = get_sailpoint_accounts(base_url, workday_sys_id, headers, payload)
service_now_accounts = get_sailpoint_accounts(base_url, service_now_sys_id, headers, payload)
nerm_accounts = get_sailpoint_accounts(base_url, nerm_sys_id, headers, payload)
  1. Now that we have all of our accounts, we can start building our dataframes (a short illustration of what json_normalize produces follows the code below)
workdayDataFrame = pd.json_normalize(workdayAccounts)
adDataFrame = pd.json_normalize(adAccounts)
snowDataFrame = pd.json_normalize(snowAccounts)
nermDataFrame  = pd.json_normalize(nermAccounts)
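
For context, pd.json_normalize flattens each nested account record into dot-separated column names, which is where columns like identity.name and attributes.USERID in the next step come from. A minimal illustration with a made-up record:

sample = [{
    "identityId": "1234",
    "cloudLifecycleState": "active",
    "identity": {"name": "Doe, Jane"},
    "attributes": {"USERID": "jdoe", "FILENUMBER": "1001"},
}]
# Nested keys become 'identity.name', 'attributes.USERID', 'attributes.FILENUMBER'
print(pd.json_normalize(sample).columns.tolist())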
  1. Now we need to reindex the data so that we can work with it
workdayReindexed = workdayDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.USERID','attributes.FILENUMBER'])
adReindexed = adDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.newSamAccountName'])
snowReindexed = snowDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.user_name'])
nermReindexed = nermDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.uid', 'attributes.sailpoint_username_ne_attribute'])
  1. Now we need to group the data together
workdayGrouped = workdayReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
adGrouped = adReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
snowGrouped = snowReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
nermdayGrouped = nermReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
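
Note that because these columns hold strings, .sum() concatenates the values whenever an identity has more than one account row on a source. A small made-up example of that behavior:

demo = pd.DataFrame({
    "identityId": ["1234", "1234"],
    "cloudLifecycleState": ["active", "active"],
    "identity.name": ["Doe, Jane", "Doe, Jane"],
    "attributes.USERID": ["jdoe", "jdoe2"],
})
# The two duplicate rows collapse to one, and attributes.USERID becomes 'jdoejdoe2'
print(demo.groupby(["identityId", "cloudLifecycleState", "identity.name"], sort=False).sum().reset_index())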
  1. Now we need to merge our data together
baseReport = workdayGrouped.merge(adGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(snowGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(nermdayGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left')
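
If more sources need to be added later, the same chained merge can be written with functools.reduce; the standalone script further below carries a commented-out variant of this approach. A sketch using the frames above:

import functools

framesToMerge = [workdayGrouped, adGrouped, snowGrouped, nermdayGrouped]
baseReport = functools.reduce(
    lambda left, right: pd.merge(
        left, right,
        on=['identityId', 'identity.name', 'cloudLifecycleState'],
        how='left'),
    framesToMerge)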
  1. Now we rename the column headers to be more human readable
renamedHeadersBaseReport = baseReport.rename(columns={'identityId': "SailPointUID", 'cloudLifecycleState': "CloudLifeCycleState", 'identity.name': "UserDisplayName", 'attributes.USERID': "WorkdaySamAccountName", 'attributes.FILENUMBER': "WorkdayEEID", 'attributes.uid': "SecZettaEEID", 'attributes.newSamAccountName': "ADSamAccountName", 'attributes.user_name': "ServiceNowSamAccountName", 'attributes.sailpoint_username_ne_attribute': "SecZettaSamAccountName"})
renamedHeadersBaseReport[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
  1. Now that we have all of our data in a workable format and grouped together, we can start building our final report.
collectionDataFrame = []
matchingCollectionDataFrame = []
for index, row in renamedHeadersBaseReport.iterrows():
    if not str(row["WorkdaySamAccountName"]).lower() == str(row["ADSamAccountName"]).lower() == str(row["ServiceNowSamAccountName"]).lower():
        collectionDataFrame.append(row)
    else:
        matchingCollectionDataFrame.append(row)
collection = pd.DataFrame(collectionDataFrame)
matchingCollection = pd.DataFrame(matchingCollectionDataFrame)

finalFilter = collection.reset_index()
matchingFinalFilter = matchingCollection.reset_index()

finalReport = finalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
matchingFinalReport = matchingFinalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]

datetimenow = datetime.now()
filename = "MisMatched_Final_Report_" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
matchingFileName = "Matching_Final_Report" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)

finalReport.reset_index().to_csv(os.environ['USERPROFILE'] + '\\downloads\\' + filename + '.csv', index=False)
matchingFinalReport.reset_index().to_csv(os.environ['USERPROFILE'] + '\\downloads\\' + matchingFileName + '.csv', index=False)
fullFilterData = []
for index, row in finalFilter.iterrows():
    if not row['CloudLifeCycleState'] == 'inactive' and not pd.isna(row['ADSamAccountName']):
        fullFilterData.append(row)
  1. Finally we export the report to a file
fullFilterDataFrame = pd.DataFrame(fullFilterData)
filternonmatchingFileName = "filternonmatching" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
fullFilterDataFrame.reset_index().to_csv(os.environ['USERPROFILE'] + '\\downloads\\' + filternonmatchingFileName + '.csv', index=False)
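
As a possible optimization (my suggestion, not part of the original report), the row-by-row comparison above can be expressed as a vectorized mask, which is usually much faster on frames of this size and produces the same split:

workdayIds = renamedHeadersBaseReport["WorkdaySamAccountName"].astype(str).str.lower()
adIds = renamedHeadersBaseReport["ADSamAccountName"].astype(str).str.lower()
snowIds = renamedHeadersBaseReport["ServiceNowSamAccountName"].astype(str).str.lower()

# Rows where all three IDs agree; missing values become the string 'nan', matching the loop above
allMatch = (workdayIds == adIds) & (adIds == snowIds)

collection = renamedHeadersBaseReport[~allMatch]
matchingCollection = renamedHeadersBaseReport[allMatch]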

Python Code without Jupyter

import functools
import requests
import pandas as pd
import json
import os
from datetime import datetime
import time

print("Process Staring at:  " + str(datetime.now()))
def getBearerToken(clientId, clientSecret, baseUrl):
    token = requests.post(baseUrl + "/oauth/token?grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret)
    return token

#Get the bearer token that will be passed when making other API calls
baseUrl = "{REDACTED}"
clientId = "{REDACTED}"
clientSec = "{REDACTED}"

#Get the access token to use later on
token = getBearerToken(clientId, clientSec, baseUrl)

#Convert the response object into a JSON object
jsontoken = token.json()

#Put the access token into a variable for later use
bearerToken = jsontoken['access_token']

#Set payload and header variables to be passed to API calls
payload = {}
headers = {
    'Accept': 'application/json',
  'Authorization': 'Bearer ' + bearerToken
}

#Time to run 1 sec

print("Collecting Workday Data started at: " + str(datetime.now()))
#set variables to be called later
offset = 0
apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""

#Make first api call to grab accounts
workdayJsonData = requests.request("GET", apiUrl, headers=headers, data=payload)

#convert to Json object
workdayResponseJsonData = workdayJsonData.json()

#get the full account number of records
numberOfRecords = int(workdayJsonData.headers['X-Total-Count'])

#build a list to store all the API responses
workdayAccounts = []
#add the first api call's data to the list
workdayAccounts.extend(workdayResponseJsonData)

#loop through all the pages to collect all the user accounts regardless of the user's status
while offset < numberOfRecords:
    offset += 250
    apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""
    response = requests.request("GET", apiUrl, headers=headers, data=payload)
    workdayAccounts.extend(response.json())    

print("Collecting Workday Data ended at: " + str(datetime.now()))
#Time to run 2392
time.sleep(2)


print("Collecting Active Directory Data started at: " + str(datetime.now()))
#This section is to pull Active Directory Account in ISC
adOffset = 0
adApiUrl = baseUrl + "/v3/accounts?offset=" + str(adOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""

adJsonData = requests.request("GET", adApiUrl, headers=headers, data=payload)
adResponseData = adJsonData.json()

adNumberOfRecords = int(adJsonData.headers['X-Total-Count'])
adAccounts = []
adAccounts.extend(adResponseData)

while adOffset < adNumberOfRecords:
    adOffset += 250
    adApiUrl = baseUrl + "/v3/accounts?offset=" + str(adOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""
    response = requests.request("GET", adApiUrl, headers=headers, data=payload)
    adAccounts.extend(response.json())    
#Time to run 2286
print("Collecting Active Directory Data ended at: " + str(datetime.now()))
time.sleep(2)

#This section is to pull ServiceNow Account in ISC
print("Collecting ServiceNow Data started at: " + str(datetime.now()))
snowOffset = 0
snowApiUrl = baseUrl + "/v3/accounts?offset=" + str(snowOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\" and uncorrelated eq false"

snowJsonData = requests.request("GET", snowApiUrl, headers=headers, data=payload)
snowResponseData = snowJsonData.json()

snowNumberOfRecords = int(snowJsonData.headers['X-Total-Count'])
snowAccounts = []
snowAccounts.extend(snowResponseData)

while snowOffset < snowNumberOfRecords:
    snowOffset += 250
    snowApiUrl = baseUrl + "/v3/accounts?offset=" + str(snowOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\" and uncorrelated eq false"
    response = requests.request("GET", snowApiUrl, headers=headers, data=payload)
    snowAccounts.extend(response.json())      
#Time to run 7999
print("Collecting ServiceNow Data ended at: " + str(datetime.now()))
time.sleep(2)

#This section is to pull NERM Account in ISC
print("Collecting Non Employee Risk Management Data started at: " + str(datetime.now()))
nermOffset = 0
nermApiUrl = baseUrl + "/v3/accounts?offset=" + str(nermOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""

nermJsonData = requests.request("GET", nermApiUrl, headers=headers, data=payload)
nermResponseData = nermJsonData.json()

nermNumberOfRecords = int(nermJsonData.headers['X-Total-Count'])
nermAccounts = []
nermAccounts.extend(nermResponseData)

while nermOffset < nermNumberOfRecords:
    nermOffset += 250
    nermApiUrl = baseUrl + "/v3/accounts?offset=" + str(nermOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""
    response = requests.request("GET", nermApiUrl, headers=headers, data=payload)
    nermAccounts.extend(response.json())    

print("Collecting Non Employee Risk Management Data ended at: " + str(datetime.now()))
time.sleep(2)

print("Parsing Data started at: " + str(datetime.now()))

workdayDataFrame = pd.json_normalize(workdayAccounts)
adDataFrame = pd.json_normalize(adAccounts)
snowDataFrame = pd.json_normalize(snowAccounts)
nermDataFrame  = pd.json_normalize(nermAccounts)

workdayReindexed = workdayDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.USERID','attributes.FILENUMBER'])
adReindexed = adDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.newSamAccountName'])
snowReindexed = snowDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.user_name'])
nermReindexed = nermDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.uid', 'attributes.sailpoint_username_ne_attribute'])

workdayGrouped = workdayReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
adGrouped = adReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
snowGrouped = snowReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
nermdayGrouped = nermReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()

#framesToMerge = [workdayGrouped, adGrouped, snowGrouped,nermdayGrouped]
#baseReprot = functools.reduce(lambda left,right: pd.merge(left,right,on=['identityId'],how='outer'), framesToMerge)
baseReport = workdayGrouped.merge(adGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(snowGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(nermdayGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left')

renamedHeadersBaseReport = baseReport.rename(columns={'identityId': "SailPointUID", 'cloudLifecycleState': "CloudLifeCycleState", 'identity.name': "UserDisplayName", 'attributes.USERID': "WorkdaySamAccountName", 'attributes.FILENUMBER': "WorkdayEEID", 'attributes.uid': "SecZettaEEID", 'attributes.newSamAccountName': "ADSamAccountName", 'attributes.user_name': "ServiceNowSamAccountName", 'attributes.sailpoint_username_ne_attribute': "SecZettaSamAccountName"})
#column order preview; this bare expression has no effect outside a notebook
renamedHeadersBaseReport[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]

collectionDataFrame = []
matchingCollectionDataFrame = []
for index, row in renamedHeadersBaseReport.iterrows():
    if not str(row["WorkdaySamAccountName"]).lower() == str(row["ADSamAccountName"]).lower() == str(row["ServiceNowSamAccountName"]).lower():
        collectionDataFrame.append(row)
    else:
        matchingCollectionDataFrame.append(row)


collection = pd.DataFrame(collectionDataFrame)
matchingCollection = pd.DataFrame(matchingCollectionDataFrame)

finalFilter = collection.reset_index()
matchingFinalFilter = matchingCollection.reset_index()

finalReport = finalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
matchingFinalReport = matchingFinalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]

datetimenow = datetime.now()
filename = "MisMatched_Final_Report_" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
matchingFileName = "Matching_Final_Report" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)

finalReport.reset_index().to_csv('{REDACTED}' + filename + '.csv', index=False)
matchingFinalReport.reset_index().to_csv('{REDACTED}' + matchingFileName + '.csv', index=False)

fullFilterData = []
for index, row in finalFilter.iterrows():
    if not row['CloudLifeCycleState'] == 'inactive' and not pd.isna(row['ADSamAccountName']):
        fullFilterData.append(row)

fullFilterDataFrame = pd.DataFrame(fullFilterData)
filternonmatchingFileName = "filternonmatching" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
fullFilterDataFrame.reset_index().to_csv('{REDACTED}' + filternonmatchingFileName + '.csv', index=False)

print("Parsing Data ended at: " + str(datetime.now()))
print("Process Ending at:  " + str(datetime.now()))

Example of Report Output

Matching Report

index SailPointUID CloudLifeCycleState UserDisplayName WorkdayEEID SecZettaEEID WorkdaySamAccountName ADSamAccountName ServiceNowSamAccountName SecZettaSamAccountName
0 {REDACTED} active Tate, Tiffany {REDACTED} ttateh ttateh ttateh
1 {REDACTED} active Irwin, Michelle {REDACTED} mirwin mirwin mirwin
2 {REDACTED} active Waggo, Conley {REDACTED} cwaggo cwaggo cwaggo
3 {REDACTED} active Vandivere, Kim {REDACTED} kvandi kvandi kvandi
4 {REDACTED} leave Kopleman, Sara {REDACTED} skople skople skople
5 {REDACTED} prehire Dawson, Chris {REDACTED} cdawso cdawso cdawso
6 {REDACTED} active Graeler, Burt {REDACTED} bgrael bgrael bgrael
7 {REDACTED} active Richard, Donald {REDACTED} dricha dricha dricha
8 {REDACTED} active Monjarez, Jose {REDACTED} jmonja jmonja jmonja
9 {REDACTED} active Epleson, Emily {REDACTED} epleso epleso epleso
10 {REDACTED} active Fischer, Marie {REDACTED} mfisch mfisch mfisch
11 {REDACTED} active Cleaver, Harry {REDACTED} hcleav hcleav hcleav

Misaligned Report

index SailPointUID CloudLifeCycleState UserDisplayName WorkdayEEID SecZettaEEID WorkdaySamAccountName ADSamAccountName ServiceNowSamAccountName SecZettaSamAccountName
0 {REDACTED} inactive Hunter, David {REDACTED} dhunte0 dhunte0
1 {REDACTED} inactive Orstein, Rachel {REDACTED} rorste0 rorste0
2 {REDACTED} leave Howell, Marty {REDACTED} mhowel
3 {REDACTED} leave Kluge, Maureen {REDACTED} mkluge
4 {REDACTED} leave Mayall, Melissa {REDACTED} mmayal
5 {REDACTED} inactive Mennemeier, Kristen {REDACTED} kmenne kmenne
6 {REDACTED} active Farmer, Melina {REDACTED} mfarme
7 {REDACTED} leave Heart, Kristy {REDACTED} kheart
8 {REDACTED} active Perkins, Sandy {REDACTED} {REDACTED} sperkr sperkr sperkr
9 {REDACTED} leave Bailey, Joan {REDACTED} jbaile jbaile
10 {REDACTED} leave Bernson, Michael {REDACTED} mberns0
11 {REDACTED} inactive Millee, Sarah {REDACTED} smillw0 smillw0

Conclusion

This project came with a number of challenges to overcome. It really pushed my thinking about working with large datasets and about ways to better optimize how they are processed. It also helped us identify things we could do on the ISC side to improve processing times and make better use of memory. Since this project was completed, the team has praised this report for how much time it has saved them, and the other teams are grateful to no longer have so many tickets coming to them to fix these issues. I am sure this project has many areas where it can still be better optimized. This report was initially created as a stopgap but has become a critical process for my team, and it has been critical in showing that some changes we have made to our ISC instance are having the desired effect.