Problem
We needed a way to compare user IDs across connected sources. Mismatched user IDs were being created across the downstream sources. The business asked for a report showing mismatched user IDs across the sources, and also asked that we remove the ability for ISC to increment the user ID.
Diagnosis
The user ID was already in use in the downstream system. The existing account was a stale account that had never been cleaned up, so ISC would add 1 to the user ID in order to create the new account.
Solution
Remove ISC's ability to increment the ID:
While this solution sounds straightforward, it was quite difficult because we have multiple HR sources: Workday and NERM. For Workday, we only needed to update the Identity Profile mapping to point directly to the user ID that Workday was sending us.
NERM was a much more involved process. We had to create an Attribute Generator to calculate the user ID. This works but is not perfect, and when we reviewed it with the business this was deemed an acceptable risk for the time being. We also had to create an attribute to house the generated user ID, and the non-employee profile was updated to include this new attribute. Workflows for onboarding a new non-employee profile type were updated to create the user ID only when no value was populated. (This allows us to enter an ID for a user who may already exist and keep the ID they are already using; a rough sketch of that check is shown below.) Finally, we had to set the correct permissions on the user ID attribute.
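For illustration only, here is a minimal Python sketch of the decision the onboarding workflow makes: keep an existing user ID if one was supplied, otherwise generate one. The parameter names and the generation rule (next unused numeric value) are assumptions for this example, not the actual NERM Attribute Generator or workflow configuration.
def resolveUserId(suppliedId, existingIds):
    #If an ID was entered on the profile (for example the person already exists), keep it.
    if suppliedId and str(suppliedId).strip():
        return str(suppliedId).strip()
    #Otherwise generate a new one; the "next unused number" rule below is hypothetical.
    numericIds = [int(i) for i in existingIds if str(i).isdigit()]
    return str(max(numericIds, default=100000) + 1)
#Example: a rehire keeps "100234"; a brand new profile gets the next free number.
print(resolveUserId("100234", ["100234", "100235"]))
print(resolveUserId("", ["100234", "100235"]))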
On the ISC side, we then had to update the mapping for the NERM identity profile to map to the newly created attribute. We also had to add this value to the source schema so it is included when we pull over user account information from NERM.
Create a report to show mismatched user IDs
This report was very tricky to get working properly. The requirements for my solution are experience with Python and the Pandas library. Please note that all sensitive information has been removed.
To start, I needed to pull a list of all active Identities with active Workday accounts. Then I take the Identity ID from those Identities and pull the accounts from each of the target sources. Once all the source data is collected, I join the dataframes on like information and select the required columns from each data set. The columns are renamed to be more report friendly, the data is combined on the Identity ID, and the results are pushed to a few CSV files: one showing any mismatched IDs and another showing non-mismatched IDs. The non-mismatched ID report is for validation purposes.
We then installed all the required dependencies on a server and scheduled the script to run daily during off hours. I know this is not the best solution, as it requires a lot of API calls.
misMatchReport.py
import requests
import functools
import pandas as pd
import json
import os
from datetime import datetime
import time
print("Process Staring at: " + str(datetime.now()))
def getBearerToken(clientId, clientSecret, baseUrl):
    token = requests.post(baseUrl + "/oauth/token?grant_type=client_credentials&client_id=" + clientId + "&client_secret=" + clientSecret)
    return token
#Get the bearer token that will be passed when making other api calls
baseUrl = "https://REMOVED.api.identitynow.com"
clientId = "REMOVED"
clientSec = "REMOVED"
#Get the access token to use later on.
token = getBearerToken(clientId, clientSec, baseUrl)
#Convert the response object into a json object
jsontoken = token.json()
#Put the access token into a variable for later use
bearerToken = jsontoken['access_token']
#Set payload and header variables to be passed to api calls.
payload = {}
headers = {
    'Accept': 'application/json',
    'Authorization': 'Bearer ' + bearerToken
}
#Time to run 1 sec
print("Collecting Workday Data started at: " + str(datetime.now()))
#set variables to be called later
offset = 0
apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\""
#Make first api call to grab accounts
workdayJsonData = requests.request("GET", apiUrl, headers=headers, data=payload)
#convert to Json object
workdayResponseJsonData = workdayJsonData.json()
#get the full account number of records
numberOfRecords = int(workdayJsonData.headers['X-Total-Count'])
#build dict to store all the api responses
workdayAccounts = []
#add first api data call to dict
workdayAccounts.extend(workdayResponseJsonData)
#loop through all the pages to collect all the user accounts regardless of the user's status
while offset < numberOfRecords:
    offset += 250
    apiUrl = baseUrl + "/v3/accounts?offset=" + str(offset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\""
    response = requests.request("GET", apiUrl, headers=headers, data=payload)
    workdayAccounts.extend(response.json())
print("Collecting Workday Data ended at: " + str(datetime.now()))
#Time to run 2392
time.sleep(2)
print("Collecting Active Directory Data started at: " + str(datetime.now()))
#This section is to pull Active Directory Account in ISC
adOffset = 0
adApiUrl = baseUrl + "/v3/accounts?offset=" + str(adOffset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\""
adJsonData = requests.request("GET", adApiUrl, headers=headers, data=payload)
adReponseData = adJsonData.json()
adNumberOfRecords = int(adJsonData.headers['X-Total-Count'])
adAccounts = []
adAccounts.extend(adReponseData)
while adOffset < adNumberOfRecords:
    adOffset += 250
    adApiUrl = baseUrl + "/v3/accounts?offset=" + str(adOffset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\""
    response = requests.request("GET", adApiUrl, headers=headers, data=payload)
    adAccounts.extend(response.json())
#Time to run 2286
print("Collecting Active Directory Data ended at: " + str(datetime.now()))
time.sleep(2)
#This section is to pull ServiceNow Account in ISC
print("Collecting ServiceNow Data started at: " + str(datetime.now()))
snowOffset = 0
snowApiUrl = baseUrl + "/v3/accounts?offset=" + str(snowOffset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\" and uncorrelated eq false"
snowJsonData = requests.request("GET", snowApiUrl, headers=headers, data=payload)
snowReponseData = snowJsonData.json()
snowNumberOfRecords = int(snowJsonData.headers['X-Total-Count'])
snowAccounts = []
snowAccounts.extend(snowReponseData)
while snowOffset < snowNumberOfRecords:
    snowOffset += 250
    snowApiUrl = baseUrl + "/v3/accounts?offset=" + str(snowOffset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\" and uncorrelated eq false"
    response = requests.request("GET", snowApiUrl, headers=headers, data=payload)
    snowAccounts.extend(response.json())
#Time to run 7999
print("Collecting ServiceNow Data ended at: " + str(datetime.now()))
time.sleep(2)
#This section is to pull NERM Account in ISC
print("Collecting Non Employee Risk Management Data started at: " + str(datetime.now()))
nermOffset = 0
nermApiUrl = baseUrl + "/v3/accounts?offset=" + str(nermOffset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\""
nermJsonData = requests.request("GET", nermApiUrl, headers=headers, data=payload)
nermReponseData = nermJsonData.json()
nermNumberOfRecords = int(nermJsonData.headers['X-Total-Count'])
nermAccounts = []
nermAccounts.extend(nermReponseData)
while nermOffset < nermNumberOfRecords:
    nermOffset += 250
    nermApiUrl = baseUrl + "/v3/accounts?offset=" + str(nermOffset) +"&limit=250&count=true&filters=sourceId eq \"REMOVED\""
    response = requests.request("GET", nermApiUrl, headers=headers, data=payload)
    nermAccounts.extend(response.json())
print("Collecting Non Employee Risk Management Data ended at: " + str(datetime.now()))
time.sleep(2)
print("Parsing Data started at: " + str(datetime.now()))
workdayDataFrame = pd.json_normalize(workdayAccounts)
adDataFrame = pd.json_normalize(adAccounts)
snowDataFrame = pd.json_normalize(snowAccounts)
nermDataFrame = pd.json_normalize(nermAccounts)
workdayReindexed = workdayDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.USERID','attributes.FILENUMBER'])
adReindexed = adDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.newSamAccountName'])
snowReindexed = snowDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.user_name'])
nermReindexed = nermDataFrame.reindex(columns=['identityId', 'cloudLifecycleState', 'identity.name', 'attributes.uid', 'attributes.sailpoint_username_ne_attribute'])
workdayGrouped = workdayReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
adGrouped = adReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
snowGrouped = snowReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
nermdayGrouped = nermReindexed.groupby(['identityId', 'cloudLifecycleState', 'identity.name'], sort=False).sum().reset_index()
#framesToMerge = [workdayGrouped, adGrouped, snowGrouped,nermdayGrouped]
#baseReprot = functools.reduce(lambda left,right: pd.merge(left,right,on=['identityId'],how='outer'), framesToMerge)
baseReport = workdayGrouped.merge(adGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(snowGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left').merge(nermdayGrouped,on=['identityId','identity.name','cloudLifecycleState'],how='left')
renamedHeadersBaseReprot = baseReport.rename(columns={'identityId': "SailPointUID", 'cloudLifecycleState': "CloudLifeCycleState", 'identity.name': "UserDisplayName", 'attributes.USERID': "WorkdaySamAccountName", 'attributes.FILENUMBER': "WorkdayEEID", 'attributes.uid': "SecZettaEEID", 'attributes.newSamAccountName': "ADSamAccountName", 'attributes.user_name': "ServiceNowSamAccountName", 'attributes.sailpoint_username_ne_attribute': "SecZettaSamAccountName"})
#Reorder the columns into report order
renamedHeadersBaseReprot = renamedHeadersBaseReprot[["SailPointUID", "CloudLifeCycleState", "UserDisplayName", "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
collectionDataFrame = []
matchingCollectionDataFrame = []
for index, row in renamedHeadersBaseReprot.iterrows():
    if not str(row["WorkdaySamAccountName"]).lower() == str(row["ADSamAccountName"]).lower() == str(row["ServiceNowSamAccountName"]).lower():
        collectionDataFrame.append(row)
    elif str(row["WorkdaySamAccountName"]).lower() == str(row["ADSamAccountName"]).lower() == str(row["ServiceNowSamAccountName"]).lower():
        matchingCollectionDataFrame.append(row)
collection = pd.DataFrame(collectionDataFrame)
matchingCollection = pd.DataFrame(matchingCollectionDataFrame)
finalFilter = collection.reset_index()
matchingFinalFilter = matchingCollection.reset_index()
finalReport = finalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName", "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
matchingFinalReport = matchingFinalFilter[["SailPointUID", "CloudLifeCycleState", "UserDisplayName", "WorkdayEEID", "SecZettaEEID","WorkdaySamAccountName", "ADSamAccountName", "ServiceNowSamAccountName", "SecZettaSamAccountName"]]
datetimenow = datetime.now()
filename = "MisMatched_Final_Report_" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
matchingFileName = "Matching_Final_Report_" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
finalReport.reset_index().to_csv('REMOVED' + filename + '.csv', index=False)
matchingFinalReport.reset_index().to_csv('REMOVED' + matchingFileName + '.csv', index=False)
fullFilterData = []
for index, row in finalFilter.iterrows():
    if not row['CloudLifeCycleState'] == 'inactive' and not pd.isna(row['ADSamAccountName']):
        fullFilterData.append(row)
fullFilterDataFrame = pd.DataFrame(fullFilterData)
filternonmatchingFileName = "filternonmatching" + str(datetimenow.year) + '-' + str(datetimenow.month) + '-' + str(datetimenow.day)
fullFilterDataFrame.reset_index().to_csv('REMOVED' + filternonmatchingFileName + '.csv', index=False)
print("Parsing Data ended at: " + str(datetime.now()))
print("Process Ending at: " + str(datetime.now()))
Final thoughts
While this does work, it is not an ideal solution to this problem, but with the limitations we have based on our licensing it was the best I could come up with. I did try to build this out in Workflows, but that was not possible at the time we built this script. This report has saved our business hours of time by letting us quickly see where the issues are and address them before any tickets are created by users having problems. I hope this helps anyone who is running into the same limitations and needs to collect this kind of data.