Load Balancing Provisioning Tier

The prior releases of CA Identity Manager / Identity Suite have a bottleneck with the provisioning tier.

The top tier of the solution stack, Identity Manager Environment (IME/J2EE Application), may communicate to multiple Provisioning Servers (IMPS), but this configuration only has value for fail-over high availability.

This default deployment means we will have a “many-to-one” challenge, multiple IMEs experiencing a bottleneck with provisioning communication to a single IMPS server.

If this IMPS server is busy, then transactions for one or more IMEs are paused or may timeout. Unfortunately, the IME (J2EE) error messages or delays are not clear that this is a provisioning bottleneck challenge. Clients may attempt to resolve this challenge by increasing the number of IME and IMPS servers but will still be impacted by the provisioning bottleneck.

Two (2) prior methods used to overcome this bottleneck challenge were:


a) Pseudo hostname(s) entries, on the J2EE servers, for the Provisioning Tier, then rotate the order pseudo hostname(s) on the local J2EE host file to have their IP addresses access other IMPS. This methodology would give us a 1:1 configuration where one (1) IME is now locked to one (1) IMPS (by the pseudo hostname/IP address). This method is not perfect but ensures that all IMPS servers will be utilized if the number of IMPS servers equals IME (J2EE) servers. Noteworthy, this method is used by the CA identity Suite virtual appliance, where the pseudo hostname(s) are ca-prov-srv-01, ca-prov-srv-02, ca-prov-03, etc. (see image above)

<Connection
  host="ca-prov-srv-primary" port="20390"
  failover="ca-prov-srv-01:20390,ca-prov-srv-02:20390,ca-prov-srv-03:20390,ca-prov-srv-04:20390“
/>

b) A Router placed in-front of the IMPS (TCP 20389/20390), that contains “stickiness” to ensure that when round-robin model is used, that the same IMPS server is used for the IME that submitted a transaction, to avoid any concerns/challenges of possible”RACE” conditions, where a modify operations may occur before the create operation.


The “RACE” challenge is a concern of both of the methods above, but this risk is low, and can be managed with additional business rules that include pre-conditional checks, e.g., does the account exist before any modifications.

Ref: RACE https://en.wikipedia.org/wiki/Race_condition

Example of one type of RACE condition that may be seen.

Ref: PX Rule Engine: https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-identity-and-access-management/identity-manager/14-3/Release-Notes/Cumulative-Patches/Latest-Cumulative-Patch-14_3-CP2.html

New CP2 Loading Balance Feature – No more bottleneck.

Identity Manager can now use round-robin load balancing support, without any restrictions on either type of provisioning operations or existing runtime limitations. This load balancing approach distributes client requests across a group of Provisioning servers.

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-identity-and-access-management/identity-manager/14-3/Release-Notes/release-features-and-enhancement/Identity-Manager-14_3-CP2.html#concept.dita_b51ab03e-6e77-49be-8235-e50ee477247a_LoadBalancing

This feature is managed in the IME tier, and will also address any RACE conditions/concerns.


No configuration changes are required on the IMPS tier. After updates of CP2, we can now use the IME Management console to export the directory.xml for the IMPS servers and update the XML tag for <Connection. This configuration may also be deployed to the Virtual Appliances.

<Connection   
  host="ca-prov-srv-primary" port="20390”   
  loadbalance="ca-prov-srv-02:20390,ca-prov-srv-03:20390,ca-prov-srv-04:20390“   
  failover="ca-prov-srv-01:20390,ca-prov-srv-02:20390,ca-prov-srv-03:20390,ca-prov-srv-04:20390“ 
/>

View of CP2 to download.

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-identity-and-access-management/identity-manager/14-3/Release-Notes/Cumulative-Patches/Latest-Cumulative-Patch-14_3-CP2.html

Before applying this patch, we recommend collecting your metrics for feed operations that include multiple create operations, and modify operations to minimal of 1000 IDS, Monitor the IMPS etatrans logs as well; and the JCS/CCS logs. After the patch, run the same feed operations to determine the value of provisioning load-balance feature; and any provisioning delays that have been addressed. You may wish to increase the # of JCS/CCS servers (MS Windows) to speed up provisioning to Active Directory and other endpoints.

Be safe and automate your backups for CA Directory Data DSAs to LDIF

The CA Directory solution provides a mechanism to automate daily on-line backups, via one simple parameter:

dump dxgrid-db period 0 86400;

Where the first number is the offset from GMT/UTC (in seconds) and the second number is how often to run the backup (in seconds), e.g. Once a day = 86400 sec = 24 hr x 60 min/hr x 60 sec/min

Two Gaps/Challenge(s):

History: The automated backup process will overwrite the existing offline file(s) (*.zdb) for the Data DSA. Any requirement or need to perform a RCA is lost due to this fact. What was the data like 10 days ago? With the current state process, only the CA Directory or IM logs would be of assistance.

Size: The automated backup will create an offline file (*.zdb) footprint of the same size as the data (*.db) file. If your Data DSA (*.db) is 10 GB, then your offline (*.zdb) will be 10 GB. The Identity Provisioning User store has four (4) Data DSAs, that would multiple this number , e.g. four (4) db files + four (4) offline zdb files at 10 GB each, will require minimal of 80 GB disk space free. If we attempt to retain a history of these files for fourteen (14) days, this would be four (4) db + fourteen (14) zdb = eighteen (18) x 10 GB = 180 GB disk space required.

Resolutions:

Leverage the CA Directory tool (dxdumpdb) to convert from the binary data (*.db/*.zdb) to LDIF and the OS crontab for the ‘dsa’ account to automate a post ‘online backup’ export and conversion process.

Step 1: Validate the ‘dsa’ user ID has access to crontab (to avoid using root for this effort). cat /etc/cron.allow

If access is missing, append the ‘dsa’ user ID to this file.

Step 2: Validate that online backup process have been scheduled for your Data DSA. Use a find command to identify the offline files (*.zdb ). Note the size of the offline Data DSA files (*.zdb).

Step 3: Identify the online backup process start time, as defined in the Data DSA settings DXC file or perhaps DXI file. Convert this GMT offset time to the local time on the CA Directory server. (See references to assist)

Step 4: Use crontab -e as ‘dsa’ user ID, to create a new entry: (may use crontab -l to view any entries). Use the dxdumpdb -z switch with the DSA_NAME to create the exported LDIF file. Redirect this output to gzip to automatically bypass any need for temporary files. Note: Crontab has limited variable expansion, and any % characters must be escaped.

Example of the crontab for ‘dsa’ to run 30 minutes after (at 2 am CST) the online backup process is scheduled (at 1:30 am CST).

# Goal:  Export and compress the daily DSA offline backup to ldif.gz at 2 AM every day
# - Ensure this crontab runs AFTER the daily automated backup (zdb) of the CA Directory Data DSAs
# - Review these two (2) tokens for DATA DSAs:  ($DXHOME/config/settings/impd.dxc  or ./impd_backup.dxc)
#   a)   Location:  set dxgrid-backup-location = "/opt/CA/Directory/dxserver/backup/";
#   b)   Online Backup Period:   dump dxgrid-db period 0 86400;
#
# Note1: The 'N' start time of the 'dump dxgrid-db period N M' is the offset in seconds from midnight of UTC
#   For 24 hr clock, 0130 (AM) CST calculate the following in UTC/GMT =>  0130 CST + 6 hours = 0730 UTC
#   Due to the six (6) hour difference between CST and UTC TZ:  7.5 * 3600 = 27000 seconds
# Example(s):
#   dump dxgrid-db period 19800 86400;   [Once a day at 2330 CST]
#   dump dxgrid-db period 27000 86400;   [Once a day at 0130 CST]
#
# Note2:  Alternatively, may force an online backup using this line:
#               dump dxgrid-db;
#        & issuing this command:  dxserver init all
#
#####################################################################
#        1      2         3       4       5        6
#       min     hr      d-o-m   month   d-o-w   command(s)
#####################################################################
#####
#####  Testing Backup Every Five (5) Minutes ####
#*/5 * * * *  . $HOME/.profile && dxdumpdb -z `dxserver status | grep "impd-main" | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-main" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
#####
#####  Backup daily at 2 AM CST  -  30 minutes after the online backup at 1:30 AM CST #####
#####
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-main"   | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-main"   | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-co"     | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-co"     | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-inc"    | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-inc"    | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-notify" | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-notify" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz

Example of the above lines that can be placed in a bash shell, instead of called directly via crontab. Note: Able to use variables and no need to escape the `date % characters `

# set DSA=main &&   dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
# set DSA=co &&     dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
# set DSA=inc &&    dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
# set DSA=notify && dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
#

Example of the output:

Monitor with tail -f /var/log/cron (or syslog depending on your OS version), when the crontab is executed for your ‘dsa’ account

View the output folder for the newly created gzip LDIF files. The files may be extracted back to LDIF format, via gzip -d file.ldif.gz. Compare these file sizes with the original (*.zdb) files of 2GB.

Recommendation(s):

Implement a similar process and retain this data for fourteen (14) days, to assist with any RCA or similar analysis that may be needed for historical data. Avoid copied the (*.db or *.zdb) files for backup, unless using this process to force a clean sync between peer MW Data DSAs.

The Data DSAs may be reloaded (dxloadb) from these LDIF snapshots; the LDIF files do not have the same file size impact as the binary db files; and as LDIF files, they may be quickly search for prior data using standard tools such as grep “text string” filename.ldif.

This process will assist in site preparation for a DAR (disaster and recovery) scenario. Protect your data.

References:

dxdumpdb

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-identity-and-access-management/directory/14-1/administrating/tools-to-manage-ca-directory/dxtools/dxdumpdb-tool-export-data-from-a-datastore-to-an-ldif-file.html

dump dxgrid-db

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-identity-and-access-management/directory/14-1/reference/commands-reference/dump-dxgrid-db-command-take-a-consistent-snapshot-copy-of-a-datastore.html

If you wish to learn more or need assistance, contact us.