WAN Latency: Rsync versus SCP

We were curious about which methods we could use to manage large files that must be copied between sites over WAN-type latency, while restricting ourselves to the processes available on the CA Identity Suite virtual appliance / Symantec IGA solution.

Leveraging VMware Workstation’s ability to introduce network latency between images allows us to validate a global password reset solution.

If we experience deployment challenges with native copy operations, we need to ensure we have alternatives to address any out-of-sync data.

The embedded CA Directory maintains the data tier in separate binary files, using a software router to join the data tier into a virtual directory. This allows for scalability and growth to accommodate the largest of sites.

We focused on the provisioning directory (IMPD) as our likely candidate for re-syncing.

Test Conditions:

  1. To ensure the data was copied securely, we kept the requirement for SSH sessions between two (2) different nodes of a cluster.
  2. We introduced latency on the VMware Workstation NIC of one of the nodes.
  3. The four (4) IMPD Data DSAs were resized to 2500 MB each (a size similar to what we have seen in production at many sites).
  4. We removed the data and the folder structure from the receiving node, so that no checksum-restart process could gain an unfair advantage.
  5. If a process allowed for exclusions, we took advantage of that feature.
  6. The feature/process/commands must be available on the vApp to the ‘config’ or ‘dsa’ userIDs.
  7. The reference host/node being pulled from had its CA Directory Data DSAs offline (dxserver stop all) to prevent ongoing changes to the files during the copy operation.

Observations:

SCP without compression: Unable to exclude files (*.tx, *.dp, UserStore). This process took over 12 minutes to copy 10,250 MB of data.

SCP with compression: Unable to exclude files (*.tx, *.dp, UserStore). This process still took over 12 minutes to copy 10,250 MB of data.
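
For reference, a minimal sketch of the SCP variants we timed; the exact commands are not part of the transcript below, and the host/path mirror the rsync example, so treat them as assumptions:

# SCP baseline (no exclusions possible); pull the data tier from the reference node.
time scp -rp dsa@192.168.242.135:data $DXHOME/           # without compression
time scp -rpC dsa@192.168.242.135:data $DXHOME/          # with compression (-C)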

Rsync without compression: This process can exclude files/folders, has built-in checksum features (allowing a file transfer to restart if the connection is broken), and works over SSH as well. Note that if the target folder is not deleted beforehand, this process reports artificially high speeds. We were able to exclude the UserStore DSA files and the transaction files (*.dp & *.tx), which do not need to be copied for use on a remote server, so only 10,000 MB (4 x 2500 MB) was copied instead of the full 10,250 MB.
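
A sketch of the uncompressed rsync variant; the full transcript below shows the compressed run, and the only difference assumed here is dropping the -z flag:

# Pull the IMPD data over SSH, excluding the UserStore DSA and transaction files.
time rsync --progress -e 'ssh -ax' -av --exclude "User*" --exclude "*.dp" --exclude "*.tx" dsa@192.168.242.135:./data/ $DXHOME/data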

Rsync with compression: This process can exclude files/folders, has built-in checksum features (allowing a file transfer to restart if the connection is broken), and works over SSH as well. This process was the clear winner, with performance far beyond the other methods.

Total time: 1 minute 10 seconds to copy 10,000 MB of data over a WAN latency of 70 ms (140 ms round trip).

Now that we have found our winner, a few post-copy steps are needed before the files can be used. To maintain uniqueness between peer members of the multi-write (MW) group, CA Directory requires a unique name for each data folder and data file. On the CA Identity Suite / Symantec IGA Virtual Appliance, this naming convention uses a two (2) digit node suffix.

The next step is to rename the folder and the files. Since the vApp is locked down against installing other tools that might otherwise be used for rename operations, we utilized the find and mv commands with a bash substitution pattern to handle these two (2) steps, as sketched below.
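
The general pattern is shown below and in the full transcript; the ‘01’ and ‘03’ suffixes belong to this lab’s nodes, so adjust them for your own deployment:

# Rename the copied folders, then the *.db files, swapping the source node suffix (01)
# for this node's suffix (03) via bash parameter substitution.
find $DXHOME/data -mindepth 1 -type d  -exec bash -c 'mv "$0" "${0/01/03}"' {} \;
find $DXHOME/data -depth -name '*.db' -exec bash -c 'mv "$0" "${0/01/03}"' {} \;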

Complete Process Summarized with Validation

The process below was written for the default shell of the ‘dsa’ userID, which is ‘csh’. If the shell is changed to ‘bash’, update accordingly.

The process below also utilizes an SSH RSA private/public key pair that was previously generated for the ‘dsa’ user ID. If you are using the vApp, log in as the ‘config’ userID and ‘su - dsa’ to complete the necessary steps. You may need to add a copy operation between the dsa & config userIDs.
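
If the ‘dsa’ key pair does not exist yet, a minimal sketch for generating one (mirroring the config key rotation shown later; the passphrase and peer IP are placeholders):

# As the dsa userID on each node; passphrase and peer IP are placeholders.
ssh-keygen -b 4096 -N Password01 -C $USER -f $HOME/.ssh/id_rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
IP=192.168.242.135;ssh-keyscan -p 22 $IP >> $HOME/.ssh/known_hosts
# Then distribute $HOME/.ssh to the peer nodes, as shown for the config userID below.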

Summary of using rsync with find/mv to rename copied IMPD *.db files/folders
[dsa@pwdha03 ~/data]$ dxserver status
ca-prov-srv-03-impd-main started
ca-prov-srv-03-impd-notify started
ca-prov-srv-03-impd-co started
ca-prov-srv-03-impd-inc started
ca-prov-srv-03-imps-router started
[dsa@pwdha03 ~/data]$ dxserver stop all > & /dev/null
[dsa@pwdha03 ~/data]$ du -hs
9.4G    .
[dsa@pwdha03 ~/data]$ eval `ssh-agent` && ssh-add
Agent pid 5395
Enter passphrase for /opt/CA/Directory/dxserver/.ssh/id_rsa:
Identity added: /opt/CA/Directory/dxserver/.ssh/id_rsa (/opt/CA/Directory/dxserver/.ssh/id_rsa)
[dsa@pwdha03 ~/data]$ rm -rf *
[dsa@pwdha03 ~/data]$ du -hs
4.0K    .
[dsa@pwdha03 ~/data]$ time rsync --progress -e 'ssh -ax' -avz --exclude "User*" --exclude "*.dp" --exclude "*.tx" dsa@192.168.242.135:./data/ $DXHOME/data
FIPS mode initialized
receiving incremental file list
./
ca-prov-srv-01-impd-co/
ca-prov-srv-01-impd-co/ca-prov-srv-01-impd-co.db
  2500000000 100%  143.33MB/s    0:00:16 (xfer#1, to-check=3/9)
ca-prov-srv-01-impd-inc/
ca-prov-srv-01-impd-inc/ca-prov-srv-01-impd-inc.db
  2500000000 100%  153.50MB/s    0:00:15 (xfer#2, to-check=2/9)
ca-prov-srv-01-impd-main/
ca-prov-srv-01-impd-main/ca-prov-srv-01-impd-main.db
  2500000000 100%  132.17MB/s    0:00:18 (xfer#3, to-check=1/9)
ca-prov-srv-01-impd-notify/
ca-prov-srv-01-impd-notify/ca-prov-srv-01-impd-notify.db
  2500000000 100%  130.91MB/s    0:00:18 (xfer#4, to-check=0/9)

sent 137 bytes  received 9810722 bytes  139161.12 bytes/sec
total size is 10000000000  speedup is 1019.28
27.237u 5.696s 1:09.43 47.4%    0+0k 128+19531264io 2pf+0w
[dsa@pwdha03 ~/data]$ ls
ca-prov-srv-01-impd-co  ca-prov-srv-01-impd-inc  ca-prov-srv-01-impd-main  ca-prov-srv-01-impd-notify
[dsa@pwdha03 ~/data]$ find $DXHOME/data/ -mindepth 1 -type d -exec bash -c 'mv  $0 ${0/01/03}' {} \; > & /dev/null
[dsa@pwdha03 ~/data]$ ls
ca-prov-srv-03-impd-co  ca-prov-srv-03-impd-inc  ca-prov-srv-03-impd-main  ca-prov-srv-03-impd-notify
[dsa@pwdha03 ~/data]$ find $DXHOME/data -depth -name '*.db' -exec bash -c 'mv  $0 ${0/01/03}' {} \; > & /dev/null
[dsa@pwdha03 ~/data]$ dxserver start all
Starting all dxservers
ca-prov-srv-03-impd-main starting
..
ca-prov-srv-03-impd-main started
ca-prov-srv-03-impd-notify starting
..
ca-prov-srv-03-impd-notify started
ca-prov-srv-03-impd-co starting
..
ca-prov-srv-03-impd-co started
ca-prov-srv-03-impd-inc starting
..
ca-prov-srv-03-impd-inc started
ca-prov-srv-03-imps-router starting
..
ca-prov-srv-03-imps-router started
[dsa@pwdha03 ~/data]$ du -hs
9.4G    .
[dsa@pwdha03 ~/data]$


Note: An enhancement request has been opened to allow the ‘dsa’ userID to use remote SSH processes, to address any challenges when the IMPD Data DSAs need to be copied or retained for backup processes.

https://community.broadcom.com/participate/ideation-home/viewidea?IdeationKey=7c795c51-d028-4db8-adb1-c9df2dc48bff

Example for vApp Patches:

Note: There is no major difference in speed if the files being copied are already compressed. In our testing the initial copy ran at the rate of the network with latency. The value gained from using rsync is still its checksum feature, which allows a transfer to restart where it left off.
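
If a large patch transfer is interrupted mid-file, one option (not used in the process below, and noted here as an assumption) is rsync's --partial flag, so a re-run keeps and reuses the partially transferred file:

# Keep partial files on interruption; re-running the same command reuses them.
IP=192.168.242.136;rsync --progress --partial -e 'ssh -ax' -av $HOME/patches config@$IP: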

vApp Patch process refined to a few lines (for three nodes of a cluster deployment)

# PATCHES
# On Local vApp [as config userID]
mkdir -p patches  && cd patches
curl -L -O ftp://ftp.ca.com/pub/CAIdentitySuiteVA/cumulative-patches/14.3.0/CP-VA-140300-0002.tar.gpg
curl -L -O ftp://ftp.ca.com/pub/CAIdentitySuiteVA/cumulative-patches/14.3.0/CP-IMV-140300-0001.tgz.gpg
screen    [will open a new bash shell ]
patch_vapp CP-VA-140300-0002.tar.gpg           [Patch VA prior to any solution patch]
patch_vapp CP-IMV-140300-0001.tgz.gpg
exit          [exit screen]
cd ..
# Push from one host to another via scp
IP=192.168.242.136;scp -r patches  config@$IP:
IP=192.168.242.137;scp -r patches  config@$IP:
# Push from one host to another via rsync over ssh          [Minor gain for compressed files]
IP=192.168.242.136;rsync --progress -e 'ssh -ax' -avz $HOME/patches config@$IP:
IP=192.168.242.137;rsync --progress -e 'ssh -ax' -avz $HOME/patches config@$IP:
# Pull from one host to another via rsync over ssh          [Minor gain for compressed files]
IP=192.168.242.135;rsync --progress -e 'ssh -ax' -avz config@$IP:./patches $HOME

# View the files were patched
IP=192.168.242.136;ssh -tt config@$IP "ls -lart patches"
IP=192.168.242.137;ssh -tt config@$IP "ls -lart patches"

# On Remote vApp Node #2
IP=192.168.242.136;ssh $IP
cd patches
screen    [will open a new bash shell ]
patch_vapp CP-VA-140300-0002.tar.gpg
patch_vapp CP-IMV-140300-0001.tgz.gpg
exit          [exit screen]
exit          [exit to original host]

# On Remote vApp Node #3
IP=192.168.242.137;ssh $IP
cd patches
screen    [will open a new bash shell ]
patch_vapp CP-VA-140300-0002.tar.gpg
patch_vapp CP-IMV-140300-0001.tgz.gpg
exit          [exit screen]
exit          [exit to original host]

View of rotating the SSH RSA key for CONFIG User ID

# CONFIG - On local vApp host
ls -lart .ssh     [view any prior files]
echo y | ssh-keygen -b 4096 -N Password01 -C $USER -f $HOME/.ssh/id_rsa
IP=192.168.242.135;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
IP=192.168.242.136;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
IP=192.168.242.137;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
cp -r -p .ssh/id_rsa.pub .ssh/authorized_keys
rm -rf /tmp/*.$USER.ssh-keys.tar
tar -cvf /tmp/`/bin/date -u +%s`.$USER.ssh-keys.tar .ssh
ls -lart /tmp/*.$USER.ssh-keys.tar
eval `ssh-agent` && ssh-add           [Enter Password for SSH RSA Private Key]
IP=192.168.242.136;scp `ls /tmp/*.$USER.ssh-keys.tar`  config@$IP:
IP=192.168.242.137;scp `ls /tmp/*.$USER.ssh-keys.tar`  config@$IP:
USER=config;ssh -tt $USER@192.168.242.136 "tar -xvf *.$USER.ssh-keys.tar"
USER=config;ssh -tt $USER@192.168.242.137 "tar -xvf *.$USER.ssh-keys.tar"
IP=192.168.242.136;ssh $IP "/bin/date -u +%s"		[validate passwordless login by running date remotely]
IP=192.168.242.137;ssh $IP "/bin/date -u +%s"		[validate passwordless login by running date remotely]
IP=192.168.242.136;ssh -vv $IP              [Use -vv to troubleshoot ssh process]
IP=192.168.242.137;ssh -vv $IP 				[Use -vv to troubleshoot ssh process]

Avoid Data Quality Issues during Testing (TDM)

Why do we see data quality challenges in lower environments (Test, Dev, QA) that we do not see in production environments?

If the project team was asked to set up lower environments for a new solution, it may be that TDM (test data management) is not a formal corporate process.

TDM may be simply described as capturing non-PII (i.e., non-sensitive) production data and copying a full or limited set of that data to the non-production environments. This non-PII data may be copied 1:1 or masked during the process.

A TDM process for a new environment may be a challenge if there is no current production environment, or if the current production environment comes from a prior solution or from M&A (mergers/acquisitions).

While there are formal paid tools/solutions for TDM, a project team may wish to leverage the CLI (command line) and/or scripts to create this subset of non-PII production data for the lower environments.

This process may be as simple as exporting the full DIT (directory information tree) of an LDAP store with all of its current group names, but replacing userIDs, full names, and other sensitive data with “dummy”/masked values. The exported data would then be loaded into the lower environments as near-production data, allowing full use-case and negative use-case testing.
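
As an illustrative sketch only, assuming an OpenLDAP-style store and hypothetical host, bind DN, and attribute names (a formal TDM tool would also handle consistency, referential integrity, and many more attributes):

# Export the full DIT from a hypothetical production LDAP store.
ldapsearch -LLL -x -H ldap://prod-ldap.example.com:389 \
  -D "cn=admin,dc=example,dc=com" -W -b "dc=example,dc=com" > prod-export.ldif

# Naive masking pass: replace uid/cn/mail values with generated placeholders,
# keeping the DIT structure and group names intact.
awk 'BEGIN { n = 0 }
     /^uid: /  { n++; print "uid: testuser" n; next }
     /^cn: /   { print "cn: Test User " n; next }
     /^mail: / { print "mail: testuser" n "@example.com"; next }
     { print }' prod-export.ldif > masked-export.ldif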

The goal? Avoid show-stopper or high-severity issues caused by missed data quality concerns during a go-live or business release cycle. This is especially important when there is only a small maintenance window to add new functionality.

Let us help with the knowledge transfer and the building of representative environments. We see this challenge often in IAM solutions that manage thousands of endpoints, where even the basic Active Directory representation is missing the same DIT structure and group objects as the project’s AD domains, especially for M&A business projects.

Writing Successful Test Plans

One of the challenges we see is that project team members dislike writing.

Documentation that is highly visible to business owners and team leads, e.g., business/technical requirements, design, or project management documents, is not greatly impacted, due to the maturity of the senior resources producing it.

However, one area does seem to suffer, and it does have an impact on project timelines & future go-live estimates: test plan documentation, which may range from very simplistic to very detailed.

Projects suffer timeline challenges when test plans & test scripts are too simplistic.

The business QA resources assigned to execute the test plan/test scripts can NOT be assumed to have an in-depth background in the solution. If the initial conditions and expected final output are not clearly called out (along with how to reset them), we have seen project timelines drawn out as teams are pushed into a seemingly never-ending cycle of QA testing.

To ensure your project is successful, demand that the test scripts for the test plans are written as if they were to be executed by your great-grandparents. This includes which hyperlinks to use, which browser to use, which initial conditions to reset and with which tool, which steps to follow, how to record the final answer, where to capture the results, and where and how screenshots are to be captured.

The above methodology ensures that we do not have a “black box” of a solution, e.g. something-goes-in and we-hope-that-something-good-comes-out.

With the above process, the QA team lead can then scale out their team as needed.

When expected input/output information is captured, automated testing can be introduced with enhanced reporting and validation. This becomes exponentially valuable for IAM solutions that manage hundreds of endpoints, from legacy platforms [AS/400, HP NonStop (NSK), Mainframe (ACF2/TSS/RACF/TSO)] to SaaS cloud solutions.

So don’t contemplate; spend the time and reap the value. Make your grandparents proud!

Transparency through Automated Testing

One of the challenges that businesses have for projects is an awareness of the true status of tasks.

Project methodology continues to advance with concepts such as Agile project management, which work well for larger projects. One of the valuable practices from Agile is asking project resources when they can complete a task. The answer provides a view into the resource’s skill set and their confidence in meeting the task goal. If the resource is junior or has limited skill for the task, the effort estimate provided to the team will be high. Even with this Agile process, it is very easy for resources, while they frantically research, to inadvertently drain the project’s bucket of effort, e.g., a 4-hour task that turns into a week-long effort.

Another area that has great success in enforcing transparency is automated testing. Automated testing may be used for unit, integration, use-case, performance, and scale testing. However, for project transparency, to lower business risk and avoid project cost overruns, we would state that the value of automated testing comes from use-case & regression testing.

After the technical and business requirements are complete, ensure that the project schedule or WBS (work breakdown structure) has a defined milestone to migrate ALL manual use-case testing to automation. The effort to convert from manual use-case testing to automated testing will be considered by some to have little value. However, when the final phase of a project is to meet a go-live over a weekend, or to add a new business release with adjusted business logic, what would you trust to reach your goals 100%?

Below are two (2) common scenarios:

  1. Solution upgrade go-live over a weekend. You may be allocated 48 hours to back up the solution data & all platforms, perform a data snapshot, migrate data, integrate with newer solution components (possibly new agents), combine with production data, and validate all use-cases for all business logic. You must also allow time for rollback if, during triage of issues, the business team determines that show-stopper issues cannot be addressed in that window. If you fail, you may be allowed one more attempt on another of your weekends, with the same 2-20 people.
  2. Solution business release cycle, over a weekend or a business day. You have the option to deploy new business logic to your solution. You can lower business risk by deploying during a business day, but this will require additional use-case and regression testing. If you have no automation, you will leverage a QA team of 2-10 people to exercise the use-cases, and sometimes the negative use-cases.

Math: Assume your solution has twenty (20) use-cases & sub-use-cases, where each use-case may have twenty (20) test scripts. Assume that you have excellent QA/business/technical resources who have adequately captured the initial conditions (which must be reset every time) for each test script & who are checking for data quality challenges as well. Assume each test script takes about ten (10) minutes to execute, and that your QA team resources (not the same skill set) will follow them exactly and record success/failure. Perhaps you have trained them to use QA tools to screen-capture failure messages and assign a technical project team resource to address them.

20 use-cases x 20 scripts/use-case x 10 min/script = 4,000 minutes for one QA resource. There are 1,440 minutes in a day, so 4,000/1,440 = 2.78 days, or 66.7 hours. If we add ten (10) QA business resources, we lower the QA cycle from 66.7 hours to 6.7 hours, but we will still be required to “freeze” any additional updates during this QA cycle, and will likely impact our maintenance window while remediating “found” issues in either scenario above.

Be aware of the “smoke testing” folly: this type of testing still leaves issues “burning.”

Enforce transparency for project owners, project managers, and team members.

Ensure that the effort put into building the automated tests is retained for future regression runs when the next business logic phase is implemented. Prove to yourselves that prior business logic will NOT be impacted.

Many tools can be leveraged for automation, e.g., open-source JMeter (used by many customers), Selenium, SoapUI, or paid tools such as Broadcom/CA Technologies BlazeMeter.
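
For example, a JMeter test plan built from the written test scripts can be executed headless from the command line and produce an HTML report; the file names here are placeholders:

# Run a JMeter test plan in non-GUI mode, log results, and generate an HTML dashboard.
jmeter -n -t iam_use_cases.jmx -l results.jtl -e -o ./jmeter-report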

Let us help.

We firmly believe in, encourage, and perform knowledge transfer to our customers to help them succeed, and to ensure that the introduction of automated testing lowers the TCO of any solution. We can train your staff very quickly to leverage JMeter from their desktops/servers to automate any written test plans. These JMeter processes can then be shared with all project team members.