Disaster Recovery Scenarios for Directories

Restore processes may be done with snapshots-in-time for both databases and directories. We wished to provide clarity of the restoration steps after a snapshot-in-time is utilized for a directory. The methodology outlined below has the following goals: a) allow sites to prepare before they need the restoration steps, b) provide a training module to exercise samples included in a vendor solution.

In this scenario, we focused on the CA/Broadcom/Symantec Directory solution. The CA Directory provides several tools to automate online backup snapshots, but these processes stop at copies of the binary data files.

Additionally, we desired to walk-through the provided DAR (Disaster and Recovery) scenarios and determine what needed to be updated to reflect newer features; and how we may validate that we did accomplish a full restoration.

Finally, to assist with the decision tree model, where we need to triage and determine if a full restore is required, or may we select partial restoration via extracts and imports of selected data.

Cluster Out-of-Sync Scenario


The first indicator that a userstore (CA Directory DATA DSA) is out-of-sync will be the CA Directory logs themselves, e.g. alarm or trace logs.

Another indication will be inconsistent query results for a user object that returns different results when using a front-end router to the DATA DSAs.

After awareness of the issue, the team will exercise a triage process to determine the extent of the out-of-sync data. For a quick check, one may execute LDAP queries direct to the TCP port of each DATA DSA on each host, and examine the results directory or even the total number of entries, e.g. dxTotalEntryCount.

The returned count value will help determine if the number of entries for each DATA DSA on the peer MW hosts are out-of-sync for ADD or DEL operations. The challenge/GAP with this method is it will not show any delta due to modify operations on the user objects themselves, e.g. address field changed.

Example of LDAP queries (dxsearch/ldapsearch) to CA Directory DATA DSA for the CA Identity Management solution (4 DATA DSA and 1 ROUTER DSA)

su - dsa    OR [ sudo -iu dsa ]
echo -n Password01 > .impd.pwd ; chmod 600 .impd.pwd

LDAPTLS_REQCERT=never  dxsearch -LLL -H ldaps://`hostname`:20404 -D 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' -y .impd.pwd -s sub -b 'dc=notify,dc=etadb' '(objectClass=*)' dxTotalEntryCount
dn: dc=notify,dc=etadb

# INC BRANCH (TCP 20398)
LDAPTLS_REQCERT=never  dxsearch -LLL -H ldaps://`hostname`:20398 -D 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' -y .impd.pwd -s sub -b 'eTInclusionContainerName=Inclusions,eTNamespaceName=CommonObjects,dc=im,dc=etadb' '(objectClass=*)' dxTotalEntryCount

# CO BRANCH (TCP 20396)
LDAPTLS_REQCERT=never  dxsearch -LLL -H ldaps://`hostname`:20396 -D 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' -y .impd.pwd -s sub -b 'eTNamespaceName=CommonObjects,dc=im,dc=etadb' '(objectClass=*)' dxTotalEntryCount

LDAPTLS_REQCERT=never  dxsearch -LLL -H ldaps://`hostname`:20394 -D 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' -y .impd.pwd -s sub -b 'dc=im,dc=etadb' '(objectClass=*)' dxTotalEntryCount

# ALL BRANCHES - Router Port (TCP 20391)
LDAPTLS_REQCERT=never  dxsearch -LLL -H ldaps://`hostname`:20391 -D 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' -y .impd.pwd -s sub -b 'dc=etadb' '(objectClass=*)' dxTotalEntryCount

# Scroll to see entire line 

A better process to identify the delta(s) will be automating the daily backup process, to build out LDIF files for each peer MW DATA DSA and then performing a delta process between the LDIF files. We will walk through this more involve step later in this blog entry.

Recovery Processes

The below link has examples from CA/Broadcom/Symantec with recovery notes of CA Directory DATA DSA that are out-of-sync due to extended downtime or outage window.

The below image pulled from the document (page 9.) shows CA Directory r12.x using the latest recovery processes of “multiwrite-DISP” (MW-DISP) mode.

This recovery process of MW-DISP is default for the CA Identity Management DATA DSAs via the install wizard tools, when they create the IMPD DATA DSAs.


The above document is dated, and still mentions additional file structures that have been retired, e.g. oc/zoc, at,zat.

An enhancement request has been submitted for both of these requests:


The modified version we have started for CA Directory r14.x adds some clarity to the <dsaname>.dx files; and which steps may be adjusted to support the split data structure for the four (4) IMPD DATA DSAs.

The same time flow diagram was used. Extra notes were added for clarity, and if possible, examples of commands that will be used to assist with direct automation of each step (or maybe pasted in an SSH session window, as the dsa service ID).

Step 1, implicit in the identification/triage process, is to determine what userstore data is out-of-sync and how large a delta do we have. If the DSA service has been shut down (either deliberately or via a startup issue), if the shutdown delay is more than a few days, then the CA Directory process will check the date stamp in the <dsaname>.dp file and the transaction in the <dsaname>.tx file; if the dates are too large CA Directory will refuse to start the DATA DSA and issue a warning message.

Step 2, we will leverage the dxdisp <dsaname> command to generate a new time-stamp file <dsaname>.dx, that will be used to prevent unnecessary sync operations with any data older than the date stamp in this file. 

This command should be issued for every DATA DSA on the same host—Especially true for split DATA DSAs, e.g. IMPD (CA Identity Manager’s Provisioning Directories). In our example below, to assist with this step, we use a combination of commands with a while-loop to issue the dxdisp command.

This command can be executed regardless if the DSA is running or shutdown. If an existing <dsaname>.dx file exists, any additional execution of dxdisp will add updated time-stamps to this file.  

Note: The <dsaname>.dx file will be removed upon restart of the DATA DSA.

STEP 2: ISSUE DXDISP COMMAND [ Create time-stamp file for re-sync use ] ON ALL IMPD SERVERS.

su - dsa OR [ sudo -iu dsa ]
dxserver status | grep -v router | awk '{print $1}' | while IFS='' read -r LINE || [ -n "$LINE" ] ; do dxdisp "$LINE" ;done ; echo ; find $DXHOME -name "*.dx" -exec ls -larth {} \;

# Scroll to see entire line 

Step 3 will then ask for an updated online backup to be executed. 

In earlier release of CA Directory, this required a telnet/ssh connection to the dxconsole of each DATA DSA. Or using the DSA configuration files to contain a dump dxgrid-db; command that would be executed with dxserver init all command. 

In newer releases of CA Directory, we can leverage the dxserver onlinebackup <dsaname> process. 

This step can be a challenge to dump all DATA DSAs at the same time, using manual procedures. 

Fortunately, we can automate this with a single bash shell process; and as an enhancement, we can also generate the LDIF extracts of each DATA DSA for later delta compare operations.

Note: The DATA DSA must be running (started) for the onlinebackup process to function correctly. If unsure, issue a dxserver status or dxserver start all prior. 

Retain the LDIF files from the “BAD” DATA DSA Servers for analysis.

su - dsa OR [ sudo -iu dsa ]

dxserver status | grep started | grep -v router | awk '{print $1}' | while IFS='' read -r LINE || [ -n "$LINE" ] ; do dxserver onlinebackup "$LINE" ; sleep 10; dxdumpdb -w -z -f /tmp/`date '+%Y%m%d_%H%M%S_%s'`_$LINE.ldif $LINE ;done ; echo ; find $DXHOME -name "*.zdb" -exec ls -larth {} \; ; echo ; ls -larth --time-style=full-iso /tmp/*.ldif | grep  `date '+%Y-%m-%d'`

# Scroll to see entire line 

Step 4a Walks through the possible copy operations from “GOOD” to the “BAD” DATA DSA host, for the <dsaname>.zdb files. The IMPD DATA DSA will require that three (3) of four (4) zdb files are copied, to ensure no impact to referential integrity between the DATA DSA.

The preferred model to copy data from one remote host to another is via the compressed rsync process over SSH, as this is a rapid process for the CA Directory db / zdb files.


Below are the code blocks that demonstrate examples how to copy data from one DSA server to another DSA server.

sudo -iu dsa

time rsync --progress -e 'ssh -ax' -avz --exclude "User*" --exclude "*.dp" --exclude "*.tx" dsa@ $DXHOME/data

# Scroll to see entire line 
sudo -iu dsa

scp   REMOTE_ID@$HOST:./data/<folder_impd_data_dsa_name>/*.zdb   /tmp/dsa_data
/usr/bin/mv  /tmp/dsa_data/<incorrect_dsaname>.zdb   $DXHOME/data/<folder_impd_data_dsa_name>/<correct_dsaname>.db

# Scroll to see entire line 

Step 4b Walk through the final steps before restarting the “BAD” DATA DSA.

The ONLY files that should be in the data folders are <dsaname>.db (binary data file) and <dsaname>.dx (ASCII time-stamp file). Ensure that the copied <prior-hostname-dsaname>.zdb file has been renamed to the correct hostname & extension for <dsaname>.db

Remove the prior <dsaname>.dp (ASCII time-stamp file) { the DATA DSA will auto replace this file with the *.dx file contents } and the <dsaname>.tx (binary data transaction file).

Step 5a Startup the DATA DSA with the command

dxserver start all

If there is any issue with a DATA or ROUTER DSA not starting, then issue the same command with the debug switch (-d)

dxserver -d start <dsaname>

Use the output from the above debug process to address any a) syntax challenges, or b) older PID/LCK files ($DXHOME/pid)

Step 5b Finally, use dxsearch/ldapsearch to query a unit-test of authentication with the primary service ID. Use other unit/use-case tests as needed to confirm data is now synced.

echo -n Password01 > .impd.pwd ; chmod 600 .impd.pwd

LDAPTLS_REQCERT=never dxsearch -LLL -H ldaps://`hostname`:20394 -D 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' -y .impd.pwd -s base -b 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' '(objectClass=*)' | perl -p00e 's/\r?\n //g'

# Scroll to see entire line 

LDIF Recovery Processes

The steps above are for recovery via a 100% replacement method, where the assumption is that the “bad” DSA server does NOT have any data worth keeping or wish to be reviewed.

We wish to clarify a process/methodology, where the “peer” Multi-write DSA may be out-of-sync. Still, we are not sure “which” is truly the “good DSA” to select, or perhaps we wished to merge data from multiple DSA before we declare one to be the “good DSA” (with regards to the completeness of data).

Using CA Directory commands, we can join them together to automate snapshots and exports to LDIF files. These LDIF files can then be compared against their peers MW DATA DSA exports or even to themselves at different snapshot export times. As long as we have the LDIF exports, we can recover from any DAR scenario.

Example of using CA Directory dxserver and dxdumpdb commands (STEP 3) with the ldifdelta and dxmodify commands.

The output from ldifdelta may be imported to any remote peer MW DATA DSA server to sync via dxmodify to that hostname, to force a sync for the few objects that may be out-of-sync, e.g. Password Hashes or other.

dxserver status | grep started | grep -v router | awk '{print $1}' | while IFS='' read -r LINE || [ -n "$LINE" ] ; do dxserver onlinebackup "$LINE" ; sleep 10; dxdumpdb -z -f /tmp/`date '+%Y%m%d_%H%M%S_%s'`_$LINE.ldif $LINE ;done ; echo ; find $DXHOME -name "*.zdb" -exec ls -larth {} \; ; echo ; ls -larth --time-style=full-iso /tmp/*.ldif | grep  `date '+%Y-%m-%d'`

ldifdelta -x -S ca-prov-srv-01-impd-co  /tmp/20200819_122820_1597858100_ca-prov-srv-01-impd-co.ldif   /tmp/20200819_123108_1597858268_ca-prov-srv-01-impd-co.ldif  |  perl -p00e 's/\r?\n //g'  >   /tmp/delta_file_ca-prov-srv-01-impd-co.ldif   ; cat /tmp/delta_file_ca-prov-srv-01-impd-co.ldif

echo -n Password01 > .impd.pwd ; chmod 600 .impd.pwd
dxmodify -v -c -h`hostname` -p 20391  -D 'eTDSAContainerName=DSAs,eTNamespaceName=CommonObjects,dc=etadb' -y .impd.pwd -f /tmp/delta_file_ca-prov-srv-01-impd-co.ldif

# Scroll to see entire line 

The below images demonstrate a delta that exists between two (2) time snapshots. The CA Directory tool, ldifdelta, can identify and extract the modified entry to the user object.

The following examples will show how to re-import this delta using dxmodify command to the DATA DSA with no other modifications required to the input LDIF file.

In the testing example below, before any update to an object, let’s capture a snapshot-in-time and the LDIF files for each DATA DSA.

Lets make an update to a user object using any tool we wish, or command line process like ldapmodify.

Next, lets capture a new snapshot-in-time after the update, so we will be able to utilize the ldifdelta tool.

We can use the ldifdelta tool to create the delta LDIF input file. After we review this file, and accept the changes, we can then submit this LDIF file to the remote peer MW DATA DSA that are out-of-sync.

Hope this has value to you and any challenges you may have with your environment.