Monitoring use-cases within solutions that rely on various logs can be onerous when the logs are full of "noise" or low-value events. For provisioning use-cases, we prefer to focus on the "CrUD" (create/update/delete) use-cases and actions.
The CA/Symantec Identity Manager solution has a mid-tier component, the IM Provisioning Server, that captures quite a bit of information useful for monitoring success/failure. The default log level of the primary log file, etatrans*.log, is log level = 7. This log level captures all possible searches and informational activity within the Provisioning Server's service and its transactions to the connector tier.
We can reduce some of the "noise" of searches/informational entries and focus on the "CrUD" actions of add/mod/del by reducing the log level to level = 3.
This also helps reduce the impact on disk space and the roll-over of the etatrans*.log file during bulk or feed tasks.
Challenges:
However, even with log level = 3, we still have some “noise” in the etatrans*.log.
An additional "pain point": the etatrans*.log file is renamed upon every restart of the IMPS service and during rollover at a size of 1 MB.
Resolution:
To assist with “finding” the current file, and to remove the noise, we have created the following “function/alias” for the IMPS user ID.
1. Log into the IMPS service ID: sudo su - imps {Ensure you use the "dash" character so the .profile is sourced when you switch IDs}
2. Edit the .profile file: vi .profile
3. The current file will have only one line, which sources the primary IMPS environmental information: . /etc/.profile_imps
4. Add the following body after the IMPS environmental profile line
function tap () {
  # Change to the Provisioning Server log folder
  cd $ETAHOME/logs
  # Capture the name of the most recent etatrans*.log file
  a=$(ls -rt $ETAHOME/logs | grep etatrans | tail -1)
  pwd
  echo "Tail current log file with exclusions: "$a
  # Follow the current log, excluding non-CrUD "noise" (searches, binds, config/audit entries)
  tail -F $a | grep -v -e ":LDAP" -e ":Config" -e "AUDITCONFIG" -e ":EtaServer" -e ":Bind " -e ":Unbind " -e ":Search "
}
export -f tap
This new "function/alias" will cd to the correct log folder, tail the most recent etatrans*.log file, and exclude the noise of non-CrUD activity. Using the new "tap" alias on all provisioning servers will allow us to isolate any challenges during use-case validation.
5. Exit the IMPS user ID account, then sudo back into this account and test the "tap" alias (a quick check is shown below).
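A minimal check after the new login session, assuming a bash login shell for the imps service ID:

sudo su - imps
type tap     # should report that "tap is a function"
tap          # tails the newest etatrans*.log with the noise filters applied; Ctrl-C to stop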
6. While using the “tap” alias, exercise use-cases within the IM Provisioning Manager (GUI) and the IM User Console (browser); monitor the “Add/Mod/Deletes”. You will also be able to see the “Child” updates to endpoints and updates to the IMPS notification queue (IME Callback).
Prior releases of CA Identity Manager / Identity Suite have a bottleneck at the provisioning tier.
The top tier of the solution stack, Identity Manager Environment (IME/J2EE Application), may communicate to multiple Provisioning Servers (IMPS), but this configuration only has value for fail-over high availability.
This default deployment means we have a "many-to-one" challenge: multiple IMEs experiencing a bottleneck when provisioning communication funnels through a single IMPS server.
If this IMPS server is busy, transactions for one or more IMEs are paused or may time out. Unfortunately, the IME (J2EE) error messages and delays do not make it clear that the provisioning tier is the bottleneck. Clients may attempt to resolve this challenge by increasing the number of IME and IMPS servers, but they will still be impacted by the provisioning bottleneck.
Two (2) prior methods used to overcome this bottleneck challenge were:
a) Pseudo hostname entries for the Provisioning Tier on the J2EE servers, rotating the order of the pseudo hostnames in each local J2EE host file so their IP addresses point to different IMPS servers. This methodology gives us a 1:1 configuration where one (1) IME is locked to one (1) IMPS (by the pseudo hostname/IP address). This method is not perfect, but it ensures that all IMPS servers will be utilized if the number of IMPS servers equals the number of IME (J2EE) servers. Noteworthy, this method is used by the CA Identity Suite virtual appliance, where the pseudo hostnames are ca-prov-srv-01, ca-prov-srv-02, ca-prov-srv-03, etc. (see image above; a hosts-file sketch follows after this list)
b) A router placed in front of the IMPS servers (TCP 20389/20390) that provides "stickiness," ensuring that when a round-robin model is used, the same IMPS server handles all transactions from the IME that submitted them. This avoids any concerns about possible "RACE" conditions, where a modify operation may arrive before the create operation.
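As an illustration of method (a), a hypothetical hosts-file layout might look like the following; the IP addresses are placeholders and the pseudo hostnames follow the virtual-appliance naming above:

# On IME (J2EE) host #1 - /etc/hosts
192.168.10.21   ca-prov-srv-01
192.168.10.22   ca-prov-srv-02
#
# On IME (J2EE) host #2 - /etc/hosts (order rotated so this IME favors a different IMPS)
192.168.10.22   ca-prov-srv-01
192.168.10.21   ca-prov-srv-02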
The "RACE" challenge is a concern for both of the methods above, but the risk is low and can be managed with additional business rules that include pre-conditional checks, e.g., does the account exist before any modifications are attempted.
New CP2 Load-Balancing Feature – No more bottleneck.
Identity Manager can now use round-robin load balancing support, without any restrictions on either type of provisioning operations or existing runtime limitations. This load balancing approach distributes client requests across a group of Provisioning servers.
This feature is managed in the IME tier, and will also address any RACE conditions/concerns.
No configuration changes are required on the IMPS tier. After applying CP2, we can use the IME Management Console to export the directory.xml for the IMPS servers and update the <Connection XML tag. This configuration may also be deployed to the Virtual Appliances.
Before applying this patch, we recommend collecting metrics for feed operations that include multiple create operations and modify operations against a minimum of 1000 IDs. Monitor the IMPS etatrans logs as well as the JCS/CCS logs. After the patch, run the same feed operations to determine the value of the provisioning load-balancing feature and confirm which provisioning delays have been addressed. You may wish to increase the number of JCS/CCS servers (MS Windows) to speed up provisioning to Active Directory and other endpoints. A rough way to count operations during these runs is sketched below.
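As a rough, hypothetical way to compare the before/after runs, the etatrans logs can be counted for create/modify/delete entries; the exact tokens below are assumptions and should be verified against your own etatrans*.log format:

grep -c ":Add " $ETAHOME/logs/etatrans*.log   # per-file count of create operations (token assumed)
grep -c ":Mod " $ETAHOME/logs/etatrans*.log   # per-file count of modify operations (token assumed)
grep -c ":Del " $ETAHOME/logs/etatrans*.log   # per-file count of delete operations (token assumed)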
The CA Directory solution provides a mechanism to automate daily on-line backups, via one simple parameter:
dump dxgrid-db period 0 86400;
Where the first number is the offset from GMT/UTC (in seconds) and the second number is how often to run the backup (in seconds), e.g. Once a day = 86400 sec = 24 hr x 60 min/hr x 60 sec/min
Two Gaps/Challenges:
History: The automated backup process will overwrite the existing offline file(s) (*.zdb) for the Data DSA. Any requirement or need to perform an RCA is lost because of this: what was the data like 10 days ago? With the current process, only the CA Directory or IM logs would be of assistance.
Size: The automated backup will create an offline file (*.zdb) footprint of the same size as the data (*.db) file. If your Data DSA (*.db) is 10 GB, then your offline (*.zdb) will be 10 GB. The Identity Provisioning User store has four (4) Data DSAs, which multiplies this number, e.g., four (4) db files + four (4) offline zdb files at 10 GB each require a minimum of 80 GB of free disk space. If we attempt to retain a history of these offline files for fourteen (14) days, this grows to four (4) db + fifty-six (56) zdb (4 DSAs x 14 days) = sixty (60) files x 10 GB = 600 GB of disk space required.
Resolutions:
Leverage the CA Directory tool (dxdumpdb) to convert the binary data (*.db/*.zdb) to LDIF, and use the OS crontab for the 'dsa' account to automate a post-'online backup' export and conversion process.
Step 1: Validate that the 'dsa' user ID has access to crontab (to avoid using root for this effort): cat /etc/cron.allow
If access is missing, append the ‘dsa’ user ID to this file.
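A minimal sketch of that check, assuming cron.allow is in use on this host (run the append as root):

cat /etc/cron.allow                                                 # list accounts allowed to use crontab
grep -qx 'dsa' /etc/cron.allow || echo 'dsa' >> /etc/cron.allow     # append 'dsa' if missing (as root)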
Step 2: Validate that the online backup process has been scheduled for your Data DSAs. Use a find command to identify the offline files (*.zdb), and note the size of the offline Data DSA files (*.zdb).
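For example, assuming the default backup location referenced in the crontab comments below (/opt/CA/Directory/dxserver/backup/); adjust the path for your environment:

find /opt/CA/Directory/dxserver -type f -name "*.zdb" -exec ls -lh {} \;   # locate offline (*.zdb) files and show their sizes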
Step 3: Identify the online backup start time, as defined in the Data DSA settings DXC file (or possibly a DXI file). Convert this GMT/UTC offset to the local time on the CA Directory server. (See references to assist.)
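One way to locate and interpret the setting, assuming the settings files live under $DXHOME/config/settings/ as in the crontab comments below:

grep -R "dump dxgrid-db period" $DXHOME/config/settings/
# Example: "dump dxgrid-db period 27000 86400;" => 27000 s after 00:00 UTC = 07:30 UTC = 01:30 CST (UTC-6)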
Step 4: Use crontab -e as the 'dsa' user ID to create a new entry (use crontab -l to view existing entries). Use the dxdumpdb -z switch with the DSA_NAME to create the exported LDIF file, and redirect this output to gzip to avoid any need for temporary files. Note: crontab has limited variable expansion, and any % characters must be escaped.
Example of the crontab for 'dsa', scheduled to run at 2 am CST, 30 minutes after the online backup process runs at 1:30 am CST.
# Goal: Export and compress the daily DSA offline backup to ldif.gz at 2 AM every day
# - Ensure this crontab runs AFTER the daily automated backup (zdb) of the CA Directory Data DSAs
# - Review these two (2) tokens for DATA DSAs: ($DXHOME/config/settings/impd.dxc or ./impd_backup.dxc)
# a) Location: set dxgrid-backup-location = "/opt/CA/Directory/dxserver/backup/";
# b) Online Backup Period: dump dxgrid-db period 0 86400;
#
# Note1: The 'N' start time of the 'dump dxgrid-db period N M' is the offset in seconds from midnight of UTC
# For 24 hr clock, 0130 (AM) CST calculate the following in UTC/GMT => 0130 CST + 6 hours = 0730 UTC
# Due to the six (6) hour difference between CST and UTC TZ: 7.5 * 3600 = 27000 seconds
# Example(s):
# dump dxgrid-db period 19800 86400; [Once a day at 2330 CST]
# dump dxgrid-db period 27000 86400; [Once a day at 0130 CST]
#
# Note2: Alternatively, may force an online backup using this line:
# dump dxgrid-db;
# & issuing this command: dxserver init all
#
#####################################################################
# 1 2 3 4 5 6
# min hr d-o-m month d-o-w command(s)
#####################################################################
#####
##### Testing Backup Every Five (5) Minutes #####
#*/5 * * * * . $HOME/.profile && dxdumpdb -z `dxserver status | grep "impd-main" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-main" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
#####
##### Backup daily at 2 AM CST - 30 minutes after the online backup at 1:30 AM CST #####
#####
0 2 * * * . $HOME/.profile && dxdumpdb -z `dxserver status | grep "impd-main" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-main" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * * . $HOME/.profile && dxdumpdb -z `dxserver status | grep "impd-co" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-co" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * * . $HOME/.profile && dxdumpdb -z `dxserver status | grep "impd-inc" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-inc" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * * . $HOME/.profile && dxdumpdb -z `dxserver status | grep "impd-notify" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-notify" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
Example of the above lines placed in a bash shell script instead of being called directly via crontab. Note: within a script we are able to use variables and there is no need to escape the date % characters. A sketch of such a script follows.
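A minimal sketch, assuming the same DSA name prefixes (impd-main, impd-co, impd-inc, impd-notify) and the /tmp output location used in the crontab above; adjust paths and names for your environment:

#!/bin/bash
# Export each Data DSA to a compressed LDIF snapshot (sketch; verify paths/DSA names first)
. $HOME/.profile

OUTDIR=/tmp
TS=$(/bin/date --utc +%Y%m%d%H%M%S.0Z)       # no % escaping needed outside of crontab

for PREFIX in impd-main impd-co impd-inc impd-notify; do
    DSA_NAME=$(dxserver status | grep "$PREFIX" | awk '{print $1}')
    [ -z "$DSA_NAME" ] && continue           # skip if this DSA is not present/running
    dxdumpdb -z "$DSA_NAME" | gzip -9 > "$OUTDIR/$(hostname)_${DSA_NAME}_${TS}.ldif.gz"
done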
Monitor with tail -f /var/log/cron (or syslog, depending on your OS version) to confirm when the crontab is executed for your 'dsa' account.
View the output folder for the newly created gzip LDIF files. The files may be extracted back to LDIF format via gzip -d file.ldif.gz. Compare these file sizes with the original (*.zdb) files (e.g., 2 GB).
Recommendation(s):
Implement a similar process and retain this data for fourteen (14) days to assist with any RCA or similar analysis that may be needed for historical data. Avoid copying the (*.db or *.zdb) files for backup, unless using this process to force a clean sync between peer MW Data DSAs. A retention-pruning example follows.
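A hedged example of enforcing the fourteen (14) day retention, assuming the /tmp output location and file-naming pattern used in the crontab above:

# Prune exported LDIF archives older than fourteen (14) days (add to the 'dsa' crontab or the script above)
find /tmp -maxdepth 1 -name "*_impd-*_*.ldif.gz" -mtime +14 -delete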
The Data DSAs may be reloaded (dxloaddb) from these LDIF snapshots; the LDIF files do not have the same file-size impact as the binary db files, and as LDIF files they may be quickly searched for prior data using standard tools such as grep "text string" filename.ldif.
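The compressed snapshots can even be searched without extracting them first; the search string below is purely illustrative:

zgrep -i "uid=jdoe" /tmp/$(hostname)_impd-main_*.ldif.gz   # zgrep ships with gzip; "uid=jdoe" is a hypothetical search term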
This process will assist in site preparation for a disaster and recovery (DR) scenario. Protect your data.