Generating Reports with Identity Security Cloud Data and Pandas

Problem

In our environment we use SAML for SSO, so the samAccountName needs to be the same across systems for users to log in seamlessly. After going live with Identity Security Cloud, our team discovered that users’ source account IDs were becoming misaligned. ISC would generate a new samAccountName for a user, and some users would be provisioned a different user ID if the generated ID already existed on a given source. This would cause a user’s samAccountName from Workday to differ from Active Directory and ServiceNow. Because the samAccountName was misaligned across systems, these users could not log in to Workday, Active Directory, or ServiceNow.

Example of the issue:

| Workday | Identity Security Cloud | ServiceNow | Active Directory | Correct? |
| --- | --- | --- | --- | --- |
| userA | userA | userA | userA | true |
| userB | userB | userB1 | userB | false |
| userC | userC1 | userC01 | userC2 | false |
| userE | userE | userE | userE01 | false |

This report was created to give us the ability to proactively correct these users’ IDs prior to their start date. Once identified, an ISC administrator works with HR and ServiceNow to fix the misaligned IDs. This has reduced the number of tickets that come in to each team when new users start. If a user logs in before the IDs are corrected, remediation takes longer because the teams must merge records. This report has reduced the time to process a user from hours to minutes. While the report itself takes hours to generate, the amount of time a human needs to spend on these accounts has been greatly reduced, saving us labor hours.

Business Requirements

We wanted a report that shows the user ID for ServiceNow, Workday, NERM, and Active Directory, and flags any time these IDs do not match across all of these systems. This needs to be a nightly job that creates a report the team can work through prior to the user’s start date. The end goal is to improve the onboarding experience by ensuring that all applications work as intended.

WARNING:

This process is extremely resource intensive. While you will be able to repeat the steps laid out below, keep in mind that it takes quite a bit of RAM to run this report. The more sources you need to compare, the larger the memory pool will need to be. The steps below have been tested and work for around 818,629 objects; the process takes 2 hours 40 minutes to run. Before cleaning up Workday data, this process took over 4 hours to run and pushed the count to over 1,000,000 objects. Please be careful when using this method. These timings are from a workstation with 4 CPU cores/threads and 16GB of RAM.

Solution

For this project we needed to pull all the samAccountNames across the connected systems, so when asked to design this report I looked to the ISC REST APIs to pull all the accounts tied to all the users across our HR sources. After pulling a full listing of accounts across the sources, I decided to leverage the Pandas open-source Python library to build this report. The reason I picked Pandas over another solution is its ability to work with such a large dataset easily and quickly. Once all the data has been gathered, we do joins and merges on the data to build out the report. The end result is a report that is easy for admins to understand and that quickly surfaces user IDs that are misaligned. This allows admins to focus only on users who have mismatched IDs instead of the full population.

Python Configuration


| Application Name | Version | Link |
| --- | --- | --- |
| Python | 3.9.13 | Download Python \| Python.org |

Required Packages


| Package Name | Version |
| --- | --- |
| pandas | 2.2.2 |
| jupyter | 1.0.0 |
| jupyter-console | 6.6.3 |
| jupyter-events | 0.10.0 |
| jupyter-lsp | 2.2.5 |
| jupyter_client | 8.6.1 |
| jupyter_core | 5.7.2 |
| jupyter_server | 2.14.0 |
| jupyter_server_terminals | 0.5.3 |
| jupyterlab | 4.1.6 |
| jupyterlab_pygments | 0.3.0 |
| jupyterlab_server | 2.26.0 |
| jupyterlab_widgets | 3.0.10 |
| pip | 24.0 |
| requests | 2.31.0 |
| numpy | 1.26.4 |
| notebook | 7.1.3 |
| notebook_shim | 0.2.4 |

Creating Virtual Python Environment

Windows

  1. Install Python
  2. Open PowerShell
  3. Run python --version to confirm Python is installed correctly
  4. Run pip --version to confirm pip is installed correctly
  5. Run python -m venv "{FILEPATH}", where {FILEPATH} is the path where you want to create your virtual Python environment
  6. Run cd {FILEPATH}\scripts\
  7. Run .\activate
  8. You should now see that you are in your virtual Python environment

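If you want to double-check from within Python that the virtual environment is active, here is a quick sanity check (my own addition, not part of the original steps):

import sys

# Inside an active venv, sys.prefix points at the venv directory
# and differs from the base interpreter's sys.base_prefix
print("Running from:", sys.prefix)
print("Virtual environment active:", sys.prefix != sys.base_prefix)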

Installing Packages

We will be using pip to install all the required packages to get this project working. You might have to use pip3 instead of pip, depending on your Python install.

  1. pip install --upgrade pip
  2. pip install pandas
  3. pip install jupyter

or

  1. pip3 install --upgrade pip
  2. pip3 install pandas
  3. pip3 install jupyter

Setting up Jupyter Notebook

We will first show how the code is developed with Jupyter Notebook.

Windows

  1. Open PowerShell
  2. Run cd {FILEPATH} (FILEPATH is where your virtual Python environment is located)
  3. Run .\scripts\activate
  4. Run jupyter notebook
  5. This will open a browser window in the root path
  6. Create a new folder to store your project
  7. In the new folder, create a new .ipynb file by clicking File > New > Notebook
  8. A new file will open

Importing Packages

  1. In the first box of the new file we are going to list all our import statements
  2. import pandas as pd
  3. import requests
  4. import json
  5. import os
  6. import time
  7. from datetime import datetime

Running Jupyter Notebook

  1. Click the add a box icon in the upper right hand corner
  2. In the new text box we will start adding code that we want to run
  3. First we will create a function to generate a bearer token
def getBearerToken(clientId, clientSecret, baseUrl):
    token = requests.post(f"{baseUrl}/oauth/token?grant_type=client_credentials&client_id={clientId}&client_secret={clientSecret}")
    return token
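
As a small hardening step (my own suggestion, not part of the original), the same call can let requests build the query string and fail fast if the tenant URL or credentials are wrong. The function name here is hypothetical:

def getBearerTokenSafe(clientId, clientSecret, baseUrl):
    # Same token request as above, but requests builds the query string
    # and raise_for_status() surfaces bad credentials or a wrong base URL immediately
    response = requests.post(
        f"{baseUrl}/oauth/token",
        params={
            "grant_type": "client_credentials",
            "client_id": clientId,
            "client_secret": clientSecret,
        },
    )
    response.raise_for_status()
    return response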
  1. Next we will set the variables to pass into our function. These should be environment variables if you plan on running this from a server (a sketch of that follows the code below); for running on a local machine we will hard code these variables for now. Hard code credentials at your own risk, and always follow coding best practices.
baseUrl = f"{SAILPOINTURL}"
clientId = f"{CLIENTID}"
clientSec = f"{CLIENTSECRET}"
  1. Now we will create our bearer token
token = getBearerToken(clientId, clientSec, baseUrl)
  1. Now we extract the json response from the getBearerToken function
jsontoken = token.json()
bearerToken = jsontoken['access_token']
  1. Now we set up an empty dictionary for payload and another dictionary to hold our headers
payload = {}
headers = {
    'Accept': 'application/json',
  'Authorization': 'Bearer ' + bearerToken
}
  1. Add a new coding block
  2. Now we need to pull the first round of accounts for our report. Since this block of code is repeated for each source, we will turn it into a function later (see the Function code section below).
offset = 0
apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{SYSID}\""

#Make first api call to grab accounts
workdayJsonData = requests.request("GET", apiUrl, headers=headers, data=payload)

#convert to Json object
workdayResponseJsonData = workdayJsonData.json()

#get the full account number of records
numberOfRecords = int(workdayJsonData.headers['X-Total-Count'])

#build a list to store all the API responses
workdayAccounts = []
#add the first api call's data to the list
workdayAccounts.extend(workdayResponseJsonData)

#loop through all the pages to collect all the user accounts regardless of the user's status
while offset < numberOfRecords:
    offset += 250
    apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{SYSID}\""
    response = requests.request("GET", apiUrl, headers=headers, data=payload)
    workdayAccounts.extend(response.json())    

#Time to run 2392
time.sleep(2)

Function code

def get_sailpoint_accounts(base_url, sys_id, headers, payload):
    #set variables to be called later
    offset = 0
    apiUrl = f"{base_url}/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{sys_id}\""

    #Make first api call to grab accounts
    jsonData = requests.request("GET", apiUrl, headers=headers, data=payload)

    #convert to Json object
    responseJsonData = jsonData.json()

    #get the full number of account records
    numberOfRecords = int(jsonData.headers['X-Total-Count'])

    #build a list to store all the API responses
    accounts = []
    #add the first api call's data to the list
    accounts.extend(responseJsonData)

    #loop through all the pages to collect all the user accounts regardless of the user's status
    while offset < numberOfRecords:
        offset += 250
        apiUrl = f"{base_url}/v3/accounts?offset=" + str(offset) + f"&limit=250&count=true&filters=sourceId eq \"{sys_id}\""
        response = requests.request("GET", apiUrl, headers=headers, data=payload)
        accounts.extend(response.json())

    #short pause between sources, then return the collected accounts
    time.sleep(2)
    return accounts

Then it would be called once per source, passing that source's ID:

workday_accounts = get_sailpoint_accounts(base_url, workday_sys_id, headers, payload)
service_now_accounts = get_sailpoint_accounts(base_url, service_now_sys_id, headers, payload)
nerm_accounts = get_sailpoint_accounts(base_url, nerm_sys_id, headers, payload)
  1. Now that we have all of our accounts, we can start building our dataframes (a short illustration of what json_normalize produces follows the code below)
workdayDataFrame = pd.json_normalize(workdayAccounts)
adDataFrame = pd.json_normalize(adAccounts)
snowDataFrame = pd.json_normalize(snowAccounts)
nermDataFrame  = pd.json_normalize(nermAccounts)
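
For context, pd.json_normalize flattens each nested account record into dot-separated column names, which is where columns like identity.name and attributes.USERID in the next step come from. A minimal illustration with a made-up record:

sample = [{
    "identityId": "1234",
    "cloudLifecycleState": "active",
    "identity": {"name": "Doe, Jane"},
    "attributes": {"USERID": "jdoe", "FILENUMBER": "1001"},
}]
# Nested keys become 'identity.name', 'attributes.USERID', 'attributes.FILENUMBER'
print(pd.json_normalize(sample).columns.tolist())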
  1. Now we need to reindex the data so that we can work with it
workdayReindexed = workdayDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.USERID','attributes.FILENUMBER'])
adReindexed = adDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.newSamAccountName'])
snowReindexed = snowDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.user_name'])
nermReindexed = nermDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.uid', 'attributes.sailpoint_username_ne_attribute'])
  1. Now we need to group the data together
workdayGrouped = workdayReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
adGrouped = adReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
snowGrouped = snowReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
nermdayGrouped = nermReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
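
Note that because these columns hold strings, .sum() concatenates the values whenever an identity has more than one account row on a source. A small made-up example of that behavior:

demo = pd.DataFrame({
    "identityId": ["1234", "1234"],
    "cloudLifecycleState": ["active", "active"],
    "identity.name": ["Doe, Jane", "Doe, Jane"],
    "attributes.USERID": ["jdoe", "jdoe2"],
})
# The two duplicate rows collapse to one, and attributes.USERID becomes 'jdoejdoe2'
print(demo.groupby(["identityId", "cloudLifecycleState", "identity.name"], sort=False).sum().reset_index())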
  1. Now we need to merge our data together
baseReport = workdayGrouped.merge(adGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(snowGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(nermdayGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left')
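
If more sources need to be added later, the same chained merge can be written with functools.reduce; the standalone script further below carries a commented-out variant of this approach. A sketch using the frames above:

import functools

framesToMerge = [workdayGrouped, adGrouped, snowGrouped, nermdayGrouped]
baseReport = functools.reduce(
    lambda left, right: pd.merge(
        left, right,
        on=['identityId', 'identity.name', 'cloudLifecycleState'],
        how='left'),
    framesToMerge)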
  1. Now we rename the column headers to be more human readable
renamedHeadersBaseReport = baseReport.rename(columns={'identityId': "SailPointUID", 'cloudLifecycleState': "CloudLifeCycleState", 'identity.name': "UserDisplayName", 'attributes.USERID': "WorkdaySamAccountName", 'attributes.FILENUMBER': "WorkdayEEID", 'attributes.uid': "SecZettaEEID", 'attributes.newSamAccountName': "ADSamAccountName", 'attributes.user_name': "ServiceNowSamAccountName", 'attributes.sailpoint_username_ne_attribute': "SecZettaSamAccountName"})
renamedHeadersBaseReport[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
  1. Now that we have all of our data in a workable format and grouped together, we can start building our final report.
collectionDataFrame = []
matchingCollectionDataFrame = []
for index, row in renamedHeadersBaseReport.iterrows():
    if not str(row["WorkdaySamAccountName"]).lower() == str(row["ADSamAccountName"]).lower() == str(row["ServiceNowSamAccountName"]).lower():
        collectionDataFrame.append(row)
    else:
        matchingCollectionDataFrame.append(row)
collection = pd.DataFrame(collectionDataFrame)
matchingCollection = pd.DataFrame(matchingCollectionDataFrame)

finalFilter = collection.reset_index()
matchingFinalFilter = matchingCollection.reset_index()

finalReport = finalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
matchingFinalReport = matchingFinalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]

datetimenow = datetime.now()
filename = "MisMatched_Final_Report_" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
matchingFileName = "Matching_Final_Report" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)

finalReport.reset_index().to_csv(os.environ['USERPROFILE'] + '\\downloads\\' + filename + '.csv', index=False)
matchingFinalReport.reset_index().to_csv(os.environ['USERPROFILE'] + '\\downloads\\' + matchingFileName + '.csv', index=False)
fullFilterData = []
for index, row in finalFilter.iterrows():
    if not row['CloudLifeCycleState'] == 'inactive' and not pd.isna(row['ADSamAccountName']):
        fullFilterData.append(row)
  1. Finally we export the report to a file
fullFilterDataFrame = pd.DataFrame(fullFilterData)
filternonmatchingFileName = "filternonmatching" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
fullFilterDataFrame.reset_index().to_csv(os.environ['USERPROFILE'] + '\\downloads\\' + filternonmatchingFileName + '.csv', index=False)
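
As a possible optimization (my suggestion, not part of the original report), the row-by-row comparison above can be expressed as a vectorized mask, which is usually much faster on frames of this size and produces the same split:

workdayIds = renamedHeadersBaseReport["WorkdaySamAccountName"].astype(str).str.lower()
adIds = renamedHeadersBaseReport["ADSamAccountName"].astype(str).str.lower()
snowIds = renamedHeadersBaseReport["ServiceNowSamAccountName"].astype(str).str.lower()

# Rows where all three IDs agree; missing values become the string 'nan', matching the loop above
allMatch = (workdayIds == adIds) & (adIds == snowIds)

collection = renamedHeadersBaseReport[~allMatch]
matchingCollection = renamedHeadersBaseReport[allMatch]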

Python Code without Jupyter

import functools
import requests
import pandas as pd
import json
import os
from datetime import datetime
import time

print("Process Staring at:  " + str(datetime.now()))
def getBearerToken(clientId, clientSecret, baseUrl):
    token = requests.post(baseUrl + "/oauth/token?grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret)
    return token

#Get the bearer token that will be passed when making other API calls
baseUrl = "{REDACTED}"
clientId = "{REDACTED}"
clientSec = "{REDACTED}"

#Get the access token to use later on
token = getBearerToken(clientId, clientSec, baseUrl)

#Convert the response object into a JSON object
jsontoken = token.json()

#Put the access token into a variable for later use
bearerToken = jsontoken['access_token']

#Set payload and header variables to be passed to API calls
payload = {}
headers = {
    'Accept': 'application/json',
  'Authorization': 'Bearer ' + bearerToken
}

#Time to run 1 sec

print("Collecting Workday Data started at: " + str(datetime.now()))
#set variables to be called later
offset = 0
apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""

#Make first api call to grab accounts
workdayJsonData = requests.request("GET", apiUrl, headers=headers, data=payload)

#convert to Json object
workdayResponseJsonData = workdayJsonData.json()

#get the full account number of records
numberOfRecords = int(workdayJsonData.headers['X-Total-Count'])

#build a list to store all the API responses
workdayAccounts = []
#add the first api call's data to the list
workdayAccounts.extend(workdayResponseJsonData)

#loop through all the pages to collect all the user accounts regardless of the user's status
while offset < numberOfRecords:
    offset += 250
    apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""
    response = requests.request("GET", apiUrl, headers=headers, data=payload)
    workdayAccounts.extend(response.json())    

print("Collecting Workday Data ended at: " + str(datetime.now()))
#Time to run 2392
time.sleep(2)


print("Collecting Active Directory Data started at: " + str(datetime.now()))
#This section is to pull Active Directory Account in ISC
adOffset = 0
adApiUrl = baseUrl + "/v3/accounts?offset=" + str(adOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""

adJsonData = requests.request("GET", adApiUrl, headers=headers, data=payload)
adResponseData = adJsonData.json()

adNumberOfRecords = int(adJsonData.headers['X-Total-Count'])
adAccounts = []
adAccounts.extend(adResponseData)

while adOffset < adNumberOfRecords:
    adOffset += 250
    adApiUrl = baseUrl + "/v3/accounts?offset=" + str(adOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""
    response = requests.request("GET", adApiUrl, headers=headers, data=payload)
    adAccounts.extend(response.json())    
#Time to run 2286
print("Collecting Active Directory Data ended at: " + str(datetime.now()))
time.sleep(2)

#This section is to pull ServiceNow Account in ISC
print("Collecting ServiceNow Data started at: " + str(datetime.now()))
snowOffset = 0
snowApiUrl = baseUrl + "/v3/accounts?offset=" + str(snowOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\" and uncorrelated eq false"

snowJsonData = requests.request("GET", snowApiUrl, headers=headers, data=payload)
snowResponseData = snowJsonData.json()

snowNumberOfRecords = int(snowJsonData.headers['X-Total-Count'])
snowAccounts = []
snowAccounts.extend(snowResponseData)

while snowOffset < snowNumberOfRecords:
    snowOffset += 250
    snowApiUrl = baseUrl + "/v3/accounts?offset=" + str(snowOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\" and uncorrelated eq false"
    response = requests.request("GET", snowApiUrl, headers=headers, data=payload)
    snowAccounts.extend(response.json())      
#Time to run 7999
print("Collecting ServiceNow Data ended at: " + str(datetime.now()))
time.sleep(2)

#This section is to pull NERM Account in ISC
print("Collecting Non Employee Risk Management Data started at: " + str(datetime.now()))
nermOffset = 0
nermApiUrl = baseUrl + "/v3/accounts?offset=" + str(nermOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""

nermJsonData = requests.request("GET", nermApiUrl, headers=headers, data=payload)
nermResponseData = nermJsonData.json()

nermNumberOfRecords = int(nermJsonData.headers['X-Total-Count'])
nermAccounts = []
nermAccounts.extend(nermResponseData)

while nermOffset < nermNumberOfRecords:
    nermOffset += 250
    nermApiUrl = baseUrl + "/v3/accounts?offset=" + str(nermOffset) +"&limit=250&count=true&filters=sourceId eq \"{REDACTED}\""
    response = requests.request("GET", nermApiUrl, headers=headers, data=payload)
    nermAccounts.extend(response.json())    

print("Collecting Non Employee Risk Management Data ended at: " + str(datetime.now()))
time.sleep(2)

print("Parsing Data started at: " + str(datetime.now()))

workdayDataFrame = pd.json_normalize(workdayAccounts)
adDataFrame = pd.json_normalize(adAccounts)
snowDataFrame = pd.json_normalize(snowAccounts)
nermDataFrame  = pd.json_normalize(nermAccounts)

workdayReindexed = workdayDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.USERID','attributes.FILENUMBER'])
adReindexed = adDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.newSamAccountName'])
snowReindexed = snowDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.user_name'])
nermReindexed = nermDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.uid', 'attributes.sailpoint_username_ne_attribute'])

workdayGrouped = workdayReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
adGrouped = adReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
snowGrouped = snowReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
nermdayGrouped = nermReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()

#framesToMerge = [workdayGrouped, adGrouped, snowGrouped,nermdayGrouped]
#baseReprot = functools.reduce(lambda left,right: pd.merge(left,right,on=['identityId'],how='outer'), framesToMerge)
baseReport = workdayGrouped.merge(adGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(snowGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(nermdayGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left')

renamedHeadersBaseReport = baseReport.rename(columns={'identityId': "SailPointUID", 'cloudLifecycleState': "CloudLifeCycleState", 'identity.name': "UserDisplayName", 'attributes.USERID': "WorkdaySamAccountName", 'attributes.FILENUMBER': "WorkdayEEID", 'attributes.uid': "SecZettaEEID", 'attributes.newSamAccountName': "ADSamAccountName", 'attributes.user_name': "ServiceNowSamAccountName", 'attributes.sailpoint_username_ne_attribute': "SecZettaSamAccountName"})
#column order preview; this bare expression has no effect outside a notebook
renamedHeadersBaseReport[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]

collectionDataFrame = []
matchingCollectionDataFrame = []
for index, row in renamedHeadersBaseReport.iterrows():
    if not str(row["WorkdaySamAccountName"]).lower() == str(row["ADSamAccountName"]).lower() == str(row["ServiceNowSamAccountName"]).lower():
        collectionDataFrame.append(row)
    else:
        matchingCollectionDataFrame.append(row)


collection = pd.DataFrame(collectionDataFrame)
matchingCollection = pd.DataFrame(matchingCollectionDataFrame)

finalFilter = collection.reset_index()
matchingFinalFilter = matchingCollection.reset_index()

finalReport = finalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
matchingFinalReport = matchingFinalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName",  "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]

datetimenow = datetime.now()
filename = "MisMatched_Final_Report_" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
matchingFileName = "Matching_Final_Report" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)

finalReport.reset_index().to_csv('{REDACTED}' + filename + '.csv', index=False)
matchingFinalReport.reset_index().to_csv('{REDACTED}' + matchingFileName + '.csv', index=False)

fullFilterData = []
for index, row in finalFilter.iterrows():
    if not row['CloudLifeCycleState'] == 'inactive' and not pd.isna(row['ADSamAccountName']):
        fullFilterData.append(row)

fullFilterDataFrame = pd.DataFrame(fullFilterData)
filternonmatchingFileName = "filternonmatching" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
fullFilterDataFrame.reset_index().to_csv('{REDACTED}' + filternonmatchingFileName + '.csv', index=False)

print("Parsing Data ended at: " + str(datetime.now()))
print("Process Ending at:  " + str(datetime.now()))

Example of Report Output

Matching Report

index SailPointUID CloudLifeCycleState UserDisplayName WorkdayEEID SecZettaEEID WorkdaySamAccountName ADSamAccountName ServiceNowSamAccountName SecZettaSamAccountName
0 {REDACTED} active Tate, Tiffany {REDACTED} ttateh ttateh ttateh
1 {REDACTED} active Irwin, Michelle {REDACTED} mirwin mirwin mirwin
2 {REDACTED} active Waggo, Conley {REDACTED} cwaggo cwaggo cwaggo
3 {REDACTED} active Vandivere, Kim {REDACTED} kvandi kvandi kvandi
4 {REDACTED} leave Kopleman, Sara {REDACTED} skople skople skople
5 {REDACTED} prehire Dawson, Chris {REDACTED} cdawso cdawso cdawso
6 {REDACTED} active Graeler, Burt {REDACTED} bgrael bgrael bgrael
7 {REDACTED} active Richard, Donald {REDACTED} dricha dricha dricha
8 {REDACTED} active Monjarez, Jose {REDACTED} jmonja jmonja jmonja
9 {REDACTED} active Epleson, Emily {REDACTED} epleso epleso epleso
10 {REDACTED} active Fischer, Marie {REDACTED} mfisch mfisch mfisch
11 {REDACTED} active Cleaver, Harry {REDACTED} hcleav hcleav hcleav

Misaligned Report

index SailPointUID CloudLifeCycleState UserDisplayName WorkdayEEID SecZettaEEID WorkdaySamAccountName ADSamAccountName ServiceNowSamAccountName SecZettaSamAccountName
0 {REDACTED} inactive Hunter, David {REDACTED} dhunte0 dhunte0
1 {REDACTED} inactive Orstein, Rachel {REDACTED} rorste0 rorste0
2 {REDACTED} leave Howell, Marty {REDACTED} mhowel
3 {REDACTED} leave Kluge, Maureen {REDACTED} mkluge
4 {REDACTED} leave Mayall, Melissa {REDACTED} mmayal
5 {REDACTED} inactive Mennemeier, Kristen {REDACTED} kmenne kmenne
6 {REDACTED} active Farmer, Melina {REDACTED} mfarme
7 {REDACTED} leave Heart, Kristy {REDACTED} kheart
8 {REDACTED} active Perkins, Sandy {REDACTED} {REDACTED} sperkr sperkr sperkr
9 {REDACTED} leave Bailey, Joan {REDACTED} jbaile jbaile
10 {REDACTED} leave Bernson, Michael {REDACTED} mberns0
11 {REDACTED} inactive Millee, Sarah {REDACTED} smillw0 smillw0

Conclusion

This project came with a number of challenges to overcome. It really pushed my thinking about working with large datasets and about ways to better optimize how they are processed. It also helped us identify things we could do on the ISC side to improve processing times and make better use of memory. Since this project was completed, the team has praised this report for how much time it has saved them, and the other teams are grateful to no longer have so many tickets coming to them to fix these issues. I am sure this project has many areas where it can still be better optimized. This report was initially created as a stopgap but has become a critical process for my team, and it has been critical in showing that some changes we have made to our ISC instance are having the desired effect.