View the JMS HornetQ Queue

Typically, we use various tools to view JMS queue metrics for trends and stale/stuck activity. During issues with a J2EE JMS queue, though, it is helpful to be able to view and trace transactions to assist with a resolution. With proper logging levels enabled, the Wildfly/JBoss logs show detailed information, including the JMS IDs associated with each transaction. However, the JMS transactions we see in the logs are already ‘in-flight’ and being processed by a message handler.

On the Symantec Identity Suite Virtual Appliance, the Wildfly and HornetQ processes run under the ‘wildfly’ service ID. The HornetQ journals are located in the Wildfly data folder and are stored in a binary format that is efficient for processing. When we tried to analyze the data within these journals, though, we ran into read-permission issues on the HornetQ files, even when the Wildfly/Java process was not running.

To avoid this issue on the Virtual Appliance, copy the HornetQ files to a temporary folder. Remember to copy the entire folder, including sub-folders.

mkdir -p /tmp/hornetq; cd /tmp/hornetq

cp -r -p /opt/CA/wildfly-idm/standalone/data/live-hornetq ./

Once the live-hornetq folder is available in a tmp location, execute the command below to print the journal and bindings content.

java -cp "/opt/CA/wildfly-idm/modules/system/layers/base/io/netty/main/*:/opt/CA/wildfly-idm/modules/system/layers/base/org/hornetq/main/*:/opt/CA/wildfly-idm/modules/system/layers/base/org/jboss/logging/main/*" org.hornetq.tools.Main print-data /tmp/hornetq/live-hornetq/bindings  /tmp/hornetq/live-hornetq/journal

Print HornetQ Journal and Bindings

To export the HornetQ journal files to a text file, the Java class “org.hornetq.core.journal.impl.ExportJournal” requires the journal sub-folder, the journal file prefix (“hornetq-data”), the file extension (hq), the journal file size in bytes, and the output file to export to (export.dat). The prefix and file extension (hq) are unique to the Identity Suite vApp.

mkdir -p /tmp/hornetq; cd /tmp/hornetq

cp -r -p /opt/CA/wildfly-idm/standalone/data/live-hornetq ./

java -cp "/opt/CA/wildfly-idm/modules/system/layers/base/io/netty/main/*:/opt/CA/wildfly-idm/modules/system/layers/base/org/hornetq/main/*:/opt/CA/wildfly-idm/modules/system/layers/base/org/jboss/logging/main/*" org.hornetq.core.journal.impl.ExportJournal  /tmp/hornetq/live-hornetq/journal hornetq-data hq  25485760  /tmp/hornetq/export.dat
Export HornetQ Journal

The body/rows of the JMS export are partially base64-encoded. You may parse through this information as you wish.
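Because the data@ values are base64, a small decode pass makes the export easier to scan. This is a minimal sketch that assumes each exported record is one line of comma-separated key@value fields (for example operation@AddRecord,id@...,data@...); confirm that layout against your own export.dat first. decode_journal_data is a hypothetical helper name. Strings inside HornetQ records may be stored as two bytes per character, which is why non-printable bytes are stripped after decoding.

```shell
#!/usr/bin/env bash
# Sketch: decode the base64 "data@" field of each record in an ExportJournal dump.
# ASSUMPTION: one record per line, comma-separated key@value fields,
# e.g. operation@AddRecord,id@42,...,data@<base64>. Verify on your own export.dat.

# decode_journal_data FILE  (hypothetical helper)
#   Prints "<id>: <printable text>" for each record that has a data@ field.
decode_journal_data() {
  awk -F',' '{
    id = "?"; data = ""
    for (i = 1; i <= NF; i++) {
      split($i, kv, "@")                  # kv[1] is the key, e.g. "id" or "data"
      if (kv[1] == "id")   id   = substr($i, 4)
      if (kv[1] == "data") data = substr($i, 6)
    }
    if (data != "") print id, data       # skip records without a body (e.g. deletes)
  }' "$1" | while read -r id b64; do
    # Decode each record on its own and keep printable characters only,
    # because message bodies are binary and may contain control or NUL bytes.
    text=$(printf '%s' "$b64" | base64 --decode 2>/dev/null | tr -cd '[:print:]')
    printf '%s: %s\n' "$id" "$text"
  done
}

# Usage (path from the export step above):
# decode_journal_data /tmp/hornetq/export.dat | less
```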

Use this information to trace through transactions in the JMS queue.

For cleanup within the Symantec Identity Suite vApp, there are a few options. The first is to delete the JMS queue journals before starting the Wildfly service, using the built-in alias ‘deleteIDMJMSqueue’.

alias deleteIDMJMSqueue='sudo /opt/CA/VirtualAppliance/scripts/.firstrun/deleteIDMJMSqueue.sh'

Another option is to remove a selected JMS entry from the queue using the /opt/CA/wildfly-idm/bin/jboss-cli.sh tool. If you run these commands from an input script, escape the colons in the GUID.

/subsystem=transactions/log-store=log-store/:probe()

ls /subsystem=transactions/log-store=log-store/transactions

/subsystem=transactions/log-store=log-store/transactions=0:ffffa409cc8a:1c01b1ff:5c7e95ac:eb:delete() 
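As a sketch of the input-script route, the helper below writes the probe and delete commands to a .cli file, with every colon in the transaction ID escaped as \:. The helper name and the /tmp/delete-tx.cli default path are hypothetical; the transaction ID is the example shown above.

```shell
#!/usr/bin/env bash
# Sketch: build a jboss-cli input script that deletes one transaction from the log store.
# When the commands come from a script, the ":" characters inside the transaction ID
# are escaped as "\:"; the ":" before the operation name (delete) stays unescaped.

# build_delete_cli TXID [OUTFILE]  (hypothetical helper; prints the script path)
build_delete_cli() {
  local txid="$1" out="${2:-/tmp/delete-tx.cli}" escaped
  escaped=$(printf '%s' "$txid" | sed 's/:/\\:/g')   # 0:ffff... -> 0\:ffff...
  {
    echo '/subsystem=transactions/log-store=log-store/:probe()'
    echo "/subsystem=transactions/log-store=log-store/transactions=${escaped}:delete()"
  } > "$out"
  echo "$out"
}

# Usage:
# build_delete_cli '0:ffffa409cc8a:1c01b1ff:5c7e95ac:eb'
```

The resulting file can then be run with /opt/CA/wildfly-idm/bin/jboss-cli.sh --connect --file=/tmp/delete-tx.cli while the Wildfly service is up.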

(Video: a description of JMS processing from the Broadcom Engineering/Support teams.)

This write-up provides the tools required for a deeper analysis. Debugging JMS issues may test one’s patience; stay the course, stay persistent, and have fun!

References: (Delete JMS queue and remove a single entry)

https://knowledge.broadcom.com/external/article/233003/inprogress-task-issues-a-clients-guide.html

https://knowledge.broadcom.com/external/article/129101/arjuna016037-could-not-find-new-xaresour.html

Kubernetes and VMware Workstation

Kubernetes was designed for deploying containerized applications to cloud architecture. Another way of thinking about Kubernetes: it gets us out of the “install-binaries” business and lets us focus our efforts on the business value of a solution. We have documented the process we use to train our resources and partners; it will help your team excel and gain confidence with cloud technologies.

One of the business challenges of Kubernetes in the cloud is the ongoing cost ($300-$600/month per resource) during the learning or development process. To lower this cost per resource, we focused on a method that uses on-prem Kubernetes deployments.


We found online examples of using minikube and Oracle VirtualBox to keep costs low with an on-prem deployment, but did not find many satisfactory examples using VMware Workstation. Our goal was to use a solution that we are very familiar with and that supports rollback via snapshots.

We have used VMware Workstation for many years while working on service projects. We cannot overstate its usefulness as a “playground” and development environment independent of a client’s environment. Snapshots allow for negative use-case testing and “what-if” scenarios that break or damage the solution under test, with minimal lasting impact, since the image can be rolled back.

In this entry, we discuss the use of VMware Workstation with CentOS (or Ubuntu) as the Kubernetes nodes. Cloud providers use CentOS and/or Ubuntu for their Kubernetes nodes, so this on-prem process translates well.

Some of our team members run the Kubernetes environment from their laptop, a collection of individual servers, or a larger server that may scale to the number of vCPU/RAM required for the Kubernetes solution.

Decision 1: Choose an OS to be used.

Either CentOS or Ubuntu is acceptable for on-prem use. When we checked the OSes used by the cloud providers, we noted they used one of these two (2) Linux distributions. We chose CentOS 7 because Kubernetes uses iptables for routing, and iptables is used by default in CentOS 7. Other OSes may work fine as well.

Decision 2: Build a reference image

Identify all expected binaries to be used within this image. This reference image will be cloned for the Kubernetes control plane node (1) and the worker nodes (3-4). We will also use this image to build a supporting (non-Kubernetes) node for SiteMinder integration and a Docker repository for the Kubernetes Docker images, for a total of six (6) nodes.

Decision 3: DNS and Certificates

Recommendation: Please do not attempt to deploy a Kubernetes solution on-prem without purchasing a DNS domain and using wildcard certificates tied to that domain.

Without these two (2) supporting components, it is a challenge to have a working Kubernetes solution that reflects what you will experience in a cloud deployment.

For example, we purchased a domain for $12/year and then created several “A” records holding the IP addresses we may redirect to cloud or on-prem. Using sub-domain “A” records, we can have as many cloud addresses as we wish.

DNS "A" Records Example:    
aks.iam.anapartner.net (MS Azure),  
eks.iam.anapartner.net (Amazon),  
gke.iam.anapartner.net (Google).      

DNS "CNAME" Records Example:  
alertmanager.aks.iam.anapartner.net, 
grafana.aks.iam.anapartner.net, 
jaeger.aks.iam.anapartner.net,
kibana.aks.iam.anapartner.net, 
mgmt-ssp.aks.iam.anapartner.net, 
sm.aks.iam.anapartner.net, 
ssp.aks.iam.anapartner.net.       
Example of using a Synology DNS Server for the Kubernetes cluster’s applications, with “A” and “CNAME” records.

Finally, we prefer to use wildcard certificates for these domains to avoid challenges within our Kubernetes deployment. There are several services out there offering free certificates.

We chose Let’s Encrypt (https://letsencrypt.org/). While Let’s Encrypt offers automated renewal processes, we chose to use its DNS validation process with Certbot, renewing these certificates every 90 days for on-prem use. The DNS validation process requires a unique string generated by Let’s Encrypt to be placed in a DNS “TXT” record, such as _acme-challenge.aks.iam.anapartner.net. See the example at the bottom of this blog entry on this process.
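Because the DNS-validated certificates are renewed manually, it helps to check how close they are to expiry. This sketch uses openssl x509 -checkend; the helper name is hypothetical, and the certificate path in the usage comment assumes Certbot's default /etc/letsencrypt/live/<name>/ layout.

```shell
#!/usr/bin/env bash
# Sketch: flag a certificate that expires within N days so the DNS validation
# can be rerun in time (Let's Encrypt certificates are valid for 90 days).

# cert_expires_within CERT_FILE DAYS  (hypothetical helper)
#   Returns 0 if CERT_FILE expires within DAYS days (or already has), 1 if not,
#   and 2 if the file cannot be read.
cert_expires_within() {
  local cert="$1" days="$2"
  [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
  openssl x509 -enddate -noout -in "$cert"          # prints notAfter=<date>
  # -checkend N exits 0 when the certificate is still valid N seconds from now.
  if openssl x509 -checkend $(( days * 86400 )) -noout -in "$cert" >/dev/null; then
    return 1
  fi
  return 0
}

# Usage (assumed Certbot path for the wildcard certificate):
# cert_expires_within /etc/letsencrypt/live/aks.iam.anapartner.net/fullchain.pem 30 && echo "renew now"
```

When it reports an upcoming expiry, the manual DNS challenge can be rerun with, for example, certbot certonly --manual --preferred-challenges dns -d '*.aks.iam.anapartner.net', and the published TXT value checked with dig +short TXT _acme-challenge.aks.iam.anapartner.net before continuing.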

Decision 4: Supporting Components: Storage, Load-Balancing, DNS Resolution (Local)

The last decision required for an on-prem deployment is where to place persistent storage for your Kubernetes cluster. We chose to use an NFS share.

We first tested with the NFS share on the control-plane node, then moved it to a Synology NAS. Similarly for DNS resolution: at first we ran a DNS service on the control-plane node, then moved it to the Synology NAS.

For load balancing, Kubernetes offers the NodePort and LoadBalancer service types. Outside a cloud provider, a LoadBalancer service has no external load balancer to provision, so in practice it behaves like a NodePort service. To add load balancing on-prem, we deployed HAProxy on the control-plane node in front of Kubernetes NodePort services.

After these decisions have been made, we can walk through the steps to set up a VMware environment for Kubernetes.

Reference Image

Step 1: Download the OS DVD ISO image for deployment on VMware Workstation (CentOS 7 / Ubuntu).

Determine the specs for the future solution to be deployed on Kubernetes. Some solutions have pods that require only minimal memory/disk space. For the solution we decided to deploy, we need a minimum of 16 GB RAM and 4 vCPUs; we confirmed these specs by previously deploying the solution in a cloud environment.

Without these memory/CPU specs, the solution we chose would stall the deployment of Kubernetes pods to the nodes. You may or may not see error messages stating that the nodes did not have enough resources for some or all of the pods.

For disk size, we selected 100 GB to future-proof the solution during testing. For networking, please select BRIDGED mode so the VMware images have minimal network issues when routing within your local network. Please avoid double NAT’ing the deployment to reduce your headaches.
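The CPU, RAM, and network choices above map to standard .vmx keys (memsize in MB, numvcpus, ethernet0.connectionType), so they can be scripted against a powered-off image. This is a sketch: set_vmx_key is a hypothetical helper, it assumes GNU sed and that the first NIC is ethernet0, and the 100 GB disk size is not a .vmx key (set it when creating the disk in Workstation).

```shell
#!/usr/bin/env bash
# Sketch: apply the reference-image specs to a powered-off VM's .vmx file.
# Assumes GNU sed (Linux host) and that the VM's first NIC is ethernet0.

# set_vmx_key VMX KEY VALUE  (hypothetical helper)
#   Replaces an existing 'KEY = "..."' line, or appends one if KEY is absent.
set_vmx_key() {
  local vmx="$1" key="$2" value="$3"
  if grep -q "^${key} *=" "$vmx"; then
    sed -i "s|^${key} *=.*|${key} = \"${value}\"|" "$vmx"
  else
    printf '%s = "%s"\n' "$key" "$value" >> "$vmx"
  fi
}

# Usage (power the VM off first):
# REF=/home/me/vmware/kub/CentOS7/CentOS7.vmx
# set_vmx_key "$REF" memsize 16384                     # 16 GB RAM, in MB
# set_vmx_key "$REF" numvcpus 4                        # 4 vCPU
# set_vmx_key "$REF" ethernet0.connectionType bridged  # BRIDGED networking
```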

Step 2: Install useful base packages and disable any UI tools. Install an entropy daemon to avoid delays caused by certificate operations reading /dev/random when entropy is low.

### The CentOS 7 UI updater (PackageKit) was blocking yum deployments - not required for our solution to be tested (e.g. VIP Auth Hub)
# su to root to run the below commands.   We will add sudo access later.

su - 
systemctl disable packagekit; systemctl stop packagekit; systemctl status packagekit

### Installed base useful packages.

yum -y install dnf epel-release yum-utils nfs-utils 

### Install useful secondary tools.

yum -y install openldap-clients jq python3-pip tree

pip3 install yq
yum -y upgrade


### Install Entropy process (epel repo)

dnf -y install haveged
systemctl enable haveged --now
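Once haveged is running, the kernel's available entropy can be read from /proc. The helper and its default threshold of 1000 are arbitrary assumptions for illustration, not a documented requirement.

```shell
#!/usr/bin/env bash
# Sketch: report the kernel entropy pool and warn when it looks low.

# check_entropy [MIN]  (hypothetical helper; MIN defaults to an arbitrary 1000)
check_entropy() {
  local min="${1:-1000}" avail
  avail=$(cat /proc/sys/kernel/random/entropy_avail)
  echo "entropy_avail=${avail}"
  if [ "$avail" -lt "$min" ]; then
    echo "WARNING: low entropy; check 'systemctl status haveged'" >&2
    return 1
  fi
  return 0
}

# Usage:
# check_entropy
```

On the CentOS 7 kernel this value fluctuates with demand; much newer kernels may report a constant 256, in which case the threshold check is not meaningful.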

Step 3: Install docker and update the docker configuration for use with Kubernetes. Update the path & storage-driver for the docker images for initial deployment.

Ref: https://docs.docker.com/storage/storagedriver/overlayfs-driver/

### Install Docker repo & docker package

yum-config-manager --add-repo  https://download.docker.com/linux/centos/docker-ce.repo
dnf -y install docker-ce
docker version
systemctl enable docker --now
docker version

### Update docker image info after deployment and restart service

cat << EOF > /etc/docker/daemon.json
{
  "debug": false,
  "data-root": "/home/docker-images",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

### Restart docker to load updated image info.
systemctl restart docker; systemctl status docker; docker version
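A typo in daemon.json prevents Docker from starting, so a quick check before the restart can save a troubleshooting round. This sketch validates the JSON syntax with python3 -m json.tool (python3 is installed in Step 2), then does simple text checks for the systemd cgroup driver and overlay2 settings; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: validate /etc/docker/daemon.json before restarting Docker.

# validate_docker_daemon_json [FILE]  (hypothetical helper)
validate_docker_daemon_json() {
  local file="${1:-/etc/docker/daemon.json}"
  # 1) Syntax: json.tool exits non-zero on malformed JSON (e.g. a trailing comma).
  if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "invalid JSON: $file" >&2
    return 1
  fi
  # 2) Settings: rough text checks for the two values this setup relies on.
  grep -q '"native.cgroupdriver=systemd"' "$file" || { echo "missing systemd cgroup driver" >&2; return 1; }
  grep -q '"storage-driver": *"overlay2"' "$file" || { echo "storage-driver is not overlay2" >&2; return 1; }
  echo "ok: $file"
}

# Usage:
# validate_docker_daemon_json && systemctl restart docker
```

After the restart, docker info --format '{{.CgroupDriver}} {{.Driver}} {{.DockerRootDir}}' should report systemd, overlay2, and /home/docker-images.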

Step 4: Install the three (3) primary Kubernetes binaries (kubelet, kubeadm, kubectl) and the Helm binary.

Ensure you select a Kubernetes version that matches the solution you wish to deploy and work with. This can be a gotcha if the Kubernetes binaries are updated during a dnf/yum upgrade and your solution has not been vetted against the newer Kubernetes release. See the reference link below on how to upgrade Kubernetes binaries.

Ref: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

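### Add k8s repo
### Note: the legacy packages.cloud.google.com repositories have since been deprecated in favor of pkgs.k8s.io; if this repo is unavailable, point baseurl/gpgkey at pkgs.k8s.io for your vetted minor version.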

cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

### When upgrading the OS, be sure to use the correct version of kubernetes (remove and add) - Example to force version 1.20.11 ###

dnf upgrade -y
dnf remove -y kubelet kubeadm kubectl
dnf install -y kubelet-1.20.11-0.x86_64 kubeadm-1.20.11-0.x86_64 kubectl-1.20.11-0.x86_64 --disableexcludes=kubernetes


### Start the k8s process.

systemctl enable kubelet --now;  systemctl status kubelet
systemctl daemon-reload && systemctl enable kubelet --now
yum-config-manager --save --setopt=kubernetes.skip_if_unavailable=true

### Add HELM binary 

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
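To spot the version gotcha described above before it happens, the versions offered by the repo can be compared with the version you vetted. This sketch relies on GNU sort -V; version_gt and newer_than_pin are hypothetical helpers, and the dnf parsing in the usage comment assumes the usual name.arch / version-release / repo column layout.

```shell
#!/usr/bin/env bash
# Sketch: list repo versions that are newer than the vetted (pinned) Kubernetes version.

# version_gt A B: succeeds if version A is strictly newer than version B (GNU sort -V).
version_gt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# newer_than_pin PINNED VERSION...  (hypothetical helper): prints each VERSION newer than PINNED.
newer_than_pin() {
  local pinned="$1" v
  shift
  for v in "$@"; do
    if version_gt "$v" "$pinned"; then echo "$v"; fi
  done
}

# Usage (installed version: kubeadm version -o short):
# newer_than_pin 1.20.11 $(dnf list --showduplicates kubeadm --disableexcludes=kubernetes 2>/dev/null \
#   | awk '/^kubeadm/ {split($2, a, "-"); print a[1]}')
```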

Step 5: OS configurations required or useful for Kubernetes. The kubelet requires swap to be disabled.

Ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

### Stop FirewallD - May add ports later for security

systemctl stop firewalld;systemctl disable firewalld; iptables -F

### Update OS Parameters for kubernetes

setenforce 0
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
modprobe br_netfilter

cat << EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system

### Note:  IP forwarding is enabled by default.

sysctl -a | grep -i forward

### Note: Update /etc/fstab to comment out the swap line with a # character.
### Warning: kubeadm init (and kubeadm join) will fail if swap is left on for the CP or any worker node.

swapoff -a
sed -i 's|UUID\=\(.*\)-\(.*\)-\(.*\)-\(.*\)-\(.*\) swap|#UUID\=\1-\2-\3-\4-\5 swap|g' /etc/fstab
cat /etc/fstab
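The sed above only matches swap entries that begin with UUID=; a default CentOS 7 LVM install usually lists swap as /dev/mapper/centos-swap, which it would leave untouched. The sketch below instead keys on the filesystem type (the third fstab field); the helper name is hypothetical, and it keeps a .bak copy first.

```shell
#!/usr/bin/env bash
# Sketch: comment out every active fstab entry whose filesystem type is swap,
# regardless of whether it is referenced by UUID=, LABEL= or a device path.

# comment_out_swap [FSTAB]  (hypothetical helper; defaults to /etc/fstab)
comment_out_swap() {
  local file="${1:-/etc/fstab}"
  cp -p "$file" "${file}.bak"                       # keep a backup
  # Leave comment lines alone; prefix "#" to lines whose 3rd field is "swap".
  awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "${file}.bak" > "$file"
}

# Usage:
# swapoff -a; comment_out_swap; free -h   # the Swap row should show no swap
```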

Step 6: Create an SSH key for root or other service IDs to allow remote script updates from the CP to the Worker Nodes

### Create SSH key for root to allow remote script updates from CP to Worker Nodes - empty passphrase (-N "").

su - 
rm -rf ~/.ssh; ssh-keygen -t rsa -b 4096 -N "" -C "$USER" -f ~/.ssh/id_rsa

### Copy the public rsa key to authorized keys to avoid password between cp/worker nodes for remote ssh commands.

cp -r -p ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys;chmod 600 ~/.ssh/authorized_keys;ls -lart .ssh

### Test for remote connection with no password:   
  
ssh -i ~/.ssh/id_rsa  root@localhost    

### Copy the id_rsa key to your host system for ease of testing.

### Add your local non-root user to sudo wheel group.  Change vip to your user ID.

LOCALUSER=vip
gpasswd -a $LOCALUSER wheel

### Update sudoers file to allow wheel group with no-password

sed -i 's|# %wheel|%wheel|g' /etc/sudoers

### View the updated wheel group entries.

grep "%wheel" /etc/sudoers

# Example output:
# %wheel  ALL=(ALL)       ALL
# %wheel  ALL=(ALL)       NOPASSWD: ALL
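The Step 6 sed uncomments every "# %wheel" line, including the NOPASSWD rule, which is the intended effect here. To preview the change safely, the sketch below runs the same substitution on a scratch file; the sample lines are assumptions modeled on the example output above.

```shell
#!/usr/bin/env bash
# Sketch: preview the Step 6 sudoers edit on a scratch copy instead of /etc/sudoers.
# The sample lines are assumptions modeled on the example output above.
scratch=$(mktemp)
printf '%s\n' \
  '## Allows people in group wheel to run all commands' \
  '# %wheel  ALL=(ALL)       ALL' \
  '## Same thing without a password' \
  '# %wheel  ALL=(ALL)       NOPASSWD: ALL' > "$scratch"

# Same substitution as Step 6: BOTH wheel lines are uncommented, so wheel members
# end up with passwordless sudo through the NOPASSWD rule.
sed -i 's|# %wheel|%wheel|g' "$scratch"
grep '^%wheel' "$scratch"

# On the real host, syntax-check sudoers after editing; a broken file can lock out sudo:
# visudo -c
```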

Step 7: Stop or adjust the OS network manager, shut down the reference image, and create a VMware snapshot

### Adjust or Disable the OS NetworkManager (to avoid overwriting /etc/resolv.conf)
### Important when using an internal DNS server.

systemctl disable NetworkManager;systemctl stop NetworkManager

### reboot CentOS7 Image and validate no issues upon reboot.
reboot

### Shutdown image and manually create snapshot called  "base"

VMware Workstation Cloning

Step 8: Now that we have a reference image, we can make clone images for the control plane (1), the worker nodes (4), and the supporting node (1). This is a fairly quick process.

export BASE=/home/me/vmware/kub
export REF=/home/me/vmware/kub/CentOS7/CentOS7.vmx

VM=cp;mkdir       -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker01;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker02;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker03;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker04;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=sm;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
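The repeated clone and start lines in Steps 8 and 9 can also be written as one loop over the VM names. This is a sketch: the run/clone_and_start helpers and the DRY_RUN switch are hypothetical, and the argument order (full, then -snapshot and -cloneName) follows vmrun's documented clone usage rather than the order shown above.

```shell
#!/usr/bin/env bash
# Sketch: Step 8 cloning plus the Step 9 start commands as one loop.
# Set DRY_RUN=1 to print the commands instead of executing them.
BASE=${BASE:-/home/me/vmware/kub}
REF=${REF:-/home/me/vmware/kub/CentOS7/CentOS7.vmx}
VMS=(cp worker01 worker02 worker03 worker04 sm)

# run CMD...: execute the command, or only echo it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# clone_and_start (hypothetical helper): full clone from the "base" snapshot, then boot headless.
clone_and_start() {
  local vm
  for vm in "${VMS[@]}"; do
    run mkdir -p "$BASE/$vm"
    run vmrun -T ws clone "$REF" "$BASE/$vm/$vm.vmx" full -snapshot=base -cloneName="$vm"
    run vmrun -T ws start "$BASE/$vm/$vm.vmx" nogui
  done
  run vmrun -T ws list
}

# Usage: DRY_RUN=1 clone_and_start   # review the commands first
#        clone_and_start             # then run them for real
```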

Step 9: Start the clone images and remotely assign new hostname/IP addresses to the images

# Start cloned images for CP and Worker Nodes - Update any files as needed. 
 
export DOMAIN=aks.iam.anapartner.net
export PASSWORD_VM=Password01

### Start the cloned images for CP and Worker Nodes.

VM=cp;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker01;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker02;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker03;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker04;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=sm;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
vmrun -T ws list | sort -rn


### Update Hostnames for CP and Worker Nodes with Domain.

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait


### Update IP Address and Domain for NIC (ifcfg-ens33)

export CP=192.168.2.60
export WK1=192.168.2.61
export WK2=192.168.2.62
export WK3=192.168.2.63
export WK4=192.168.2.64
export SM=192.168.2.65

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$CP\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK1\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK2\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK3\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK4\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$SM\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait

Step 10: Enable the network gateway, disable DHCP, and reboot the images

export DOMAIN=aks.iam.anapartner.net
export PASSWORD_VM=Password01

### Update to create a new default GATEWAY HOST to address routing issues to external IP addresses.
GATEWAY=192.168.2.1

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait

### Disable DHCP (to avoid overwriting /etc/resolv.conf)

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
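The ifcfg-ens33 edits from Steps 9 and 10 can be collected into one function and rehearsed locally on a copy of the file before running them in the guests. This sketch assumes GNU sed (which expands \n in the replacement); apply_static_ip and the NODE_IP table are hypothetical names, with the IPs taken from Step 9. Unlike the one-liners above, it skips a file that already has IPADDR, so a second run does not duplicate lines.

```shell
#!/usr/bin/env bash
# Sketch: the Step 9 (static IP + domain) and Step 10 (disable DHCP) ifcfg edits
# as one function. Assumes GNU sed, which turns \n in the replacement into a newline.

# apply_static_ip IFCFG_FILE IP DOMAIN  (hypothetical helper)
apply_static_ip() {
  local f="$1" ip="$2" domain="$3"
  # Inserting after TYPE= twice would duplicate lines, so skip files already edited.
  if grep -q '^IPADDR=' "$f"; then
    echo "IPADDR already present in $f; leaving it unchanged" >&2
    return 0
  fi
  sed -i \
    -e "s|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=${ip}\nDOMAIN=${domain}|" \
    -e 's|BOOTPROTO="dhcp"|BOOTPROTO="none"|' "$f"
}

# One host-to-IP table (values from Step 9).
declare -A NODE_IP=(
  [cp]=192.168.2.60 [worker01]=192.168.2.61 [worker02]=192.168.2.62
  [worker03]=192.168.2.63 [worker04]=192.168.2.64 [sm]=192.168.2.65
)

# Usage (rehearse on a copy of a guest's ifcfg-ens33):
# apply_static_ip /tmp/ifcfg-ens33.copy "${NODE_IP[worker01]}" aks.iam.anapartner.net
```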

 
### Reboot VIP Auth Hub CP and Nodes 

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait

Step 11: Update DNS on the clone images remotely using vmrun

### Update /etc/resolv.conf for correct DNS server.
### Ensure DHCP and NetworkManager are disabled to prevent these services from overwriting this file.

export DOMAIN=aks.iam.anapartner.net
export PASSWORD_VM=Password01
DNSNEW=192.168.2.20

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
 
 
### Reboot VIP Auth Hub CP and Nodes
 
VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait

Step 12: Copy the root private SSH key (id_rsa) to your main host, rename it to a useful name, and then test your newly deployed clone images for DNS resolution using ssh. Please confirm this step is successful before continuing with the configuration of the control plane and worker nodes.

### Copy the root id_rsa file to host system to allow ease of testing with ssh.

export CP=192.168.2.60
export WK1=192.168.2.61
export WK2=192.168.2.62
export WK3=192.168.2.63
export WK4=192.168.2.64
export SM=192.168.2.65

### Add the hosts for ssh pre-validation. 

ssh-keyscan -p 22 $CP >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK1 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK2 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK3 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK4 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $SM >> ~/.ssh/known_hosts


### Rename the copied id_rsa to vip_kub_root_id_rsa (chmod 600 so ssh accepts it), then confirm /etc/resolv.conf on each node.

ssh -tt -i ~/vip_kub_root_id_rsa root@$CP 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK1 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK2 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK3 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK4 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$SM 'cat /etc/resolv.conf'


### Validate DNS resolution from the CP and Worker Nodes at their new IP addresses.

FQDN=ssp.aks.iam.anapartner.net
ssh -tt -i ~/vip_kub_root_id_rsa root@$CP  "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK1 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK2 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK3 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK4 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$SM "ping -c 2 $FQDN"

Update CP (controlplane) Node

Step 13a: Copy files to the CP node from the VMware Workstation host and configure the CP node for dedicated CP usage. We recommend using two terminals/sessions to speed up the process. Install HAProxy for load balancing, copy the Let’s Encrypt wildcard certificates, and copy the Kubernetes solution you will be deploying (scripts/yaml).

### Open Terminal 1 to CP host.
### Add bash completion to have better use of TAB to view parameters.

CP=192.168.2.60
ssh -tt -i ~/vip_kub_root_id_rsa root@$CP
dnf -y install bash-completion
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf'  >>~/.bashrc
kubectl completion bash >/etc/bash_completion.d/kubectl
echo 'alias k=kubectl' >>~/.bashrc; echo 'complete -o default -F __start_kubectl k' >>~/.bashrc

### Install HAProxy and replace the haproxy.cfg file.
dnf -y install haproxy
systemctl enable haproxy --now
netstat -anp | grep -i -e haproxy

### Open Terminal 2 to host and push files to CP node.
### Copy HAProxy configuration, certs, and scripts
scp -i ~/vip_kub_root_id_rsa  haproxy.cfg   root@$CP:/etc/haproxy/haproxy.cfg
scp -i ~/vip_kub_root_id_rsa  cloud-certs-aks-eks-gke_exp-202X-01-12.tar  root@$CP:
scp -i ~/vip_kub_root_id_rsa  202X-11-03_vip_auth_hub_working_centos7_v2.tar   root@$CP:

### On Terminal 1 - on CP host - Restart to use new haproxy configuration file.
systemctl restart haproxy
netstat -anp | grep -i -e haproxy

### Extract CERTS to root home folder
tar -xvf cloud-certs-aks-eks-gke_exp-202X-01-12.tar

### Extract Working Scripts 
tar -xvf 202X-11-03_vip_auth_hub_working_centos7_v2.tar

### Update env variables for unique environment within step00 file.
vi step00_kubernetes_env.sh

### Add the env variables to the .bashrc file
echo '. ~/step00_kubernetes_env.sh' >>~/.bashrc

Step 13b: Example /etc/haproxy/haproxy.cfg configuration providing Kubernetes load balancing for the on-prem worker nodes, with HAProxy deployed on the control plane (CP) node. This example configuration routes TCP 80/443/389 round-robin to one (1) of the four (4) worker nodes. If a Kubernetes NodePort service exposes TCP 389 on node port 31888, this load balancer will also route LDAP traffic correctly.

[root@cp ~]# cat /etc/haproxy/haproxy.cfg
global
    user haproxy
    group haproxy
    chroot /var/lib/haproxy
    log /dev/log    local0
    log /dev/log    local1 notice
defaults
    mode http
    log global
    retries 2
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 10m
    timeout server 10m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000
frontend ingress
    bind *:80
    option httplog
    mode http
    option forwardfor
    option http-server-close
    default_backend kubernetes-ingress-nodes
backend kubernetes-ingress-nodes
    mode http
    balance roundrobin
    server k8s-ingress-0 worker01.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-1 worker02.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-2 worker03.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-3 worker04.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
frontend ingress-https
    bind *:443
    option tcplog
    mode tcp
    default_backend kubernetes-ingress-nodes-https
backend kubernetes-ingress-nodes-https
    mode tcp
    balance roundrobin
    server k8s-ingress-0 worker01.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-1 worker02.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-2 worker03.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-3 worker04.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
frontend ldap
    bind *:389
    option tcplog
    mode tcp
    default_backend kubernetes-nodes-ldap
backend kubernetes-nodes-ldap
    mode tcp
    balance roundrobin
    server k8s-ldap-0 worker01.aks.iam.anapartner.net:31888  check fall 3 rise 2
    server k8s-ldap-1 worker02.aks.iam.anapartner.net:31888  check fall 3 rise 2
    server k8s-ldap-2 worker03.aks.iam.anapartner.net:31888  check fall 3 rise 2
    server k8s-ldap-3 worker04.aks.iam.anapartner.net:31888  check fall 3 rise 2
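For the LDAP frontend to work, something must listen on node port 31888 on each worker. A minimal sketch of such a NodePort Service follows; the service name, the app: directory selector, and the target port are hypothetical and must match your directory pods.

```shell
#!/usr/bin/env bash
# Sketch: write a NodePort Service that exposes LDAP (389) on node port 31888,
# matching the HAProxy "kubernetes-nodes-ldap" backend above.

# write_ldap_nodeport FILE  (hypothetical helper)
write_ldap_nodeport() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: ldap-nodeport          # hypothetical name
spec:
  type: NodePort
  selector:
    app: directory             # hypothetical label; must match your directory pods
  ports:
    - name: ldap
      protocol: TCP
      port: 389
      targetPort: 389          # hypothetical container port
      nodePort: 31888          # must fall in the default 30000-32767 NodePort range
EOF
}

# Usage:
# write_ldap_nodeport /tmp/ldap-nodeport.yaml && kubectl apply -f /tmp/ldap-nodeport.yaml
```

Before restarting HAProxy with the edited file, haproxy -c -f /etc/haproxy/haproxy.cfg checks the configuration without starting the service.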

Deploy Solution on Kubernetes

Step 14: Validate that DNS and storage are ready before deploying any solution, or before using the base Kubernetes environment with the control-plane node and four (4) worker nodes.

### Step:  Setup NFS Share either on-prem remote server or Synology NFS
### Use version 4.x checkbox for Synology.

### Example of lines on remote Linux Host with NFS share.

yum -y install nfs-utils
systemctl enable --now nfs-server rpcbind
mkdir -p /export/nfsshare ; chown nobody /export/nfsshare ; chmod -R 777 /export/nfsshare
echo "/export/nfsshare *(rw,sync,no_root_squash,insecure)" >> /etc/exports
exportfs -rav; exportfs -v

firewall-cmd --add-service=nfs --permanent
firewall-cmd --add-service={nfs3,mountd,rpc-bind} --permanent 
firewall-cmd --reload 
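Because the export line above is appended with `>>`, running the setup twice adds a duplicate entry to /etc/exports. A minimal sketch of an idempotent alternative, assuming GNU grep; `add_export_once` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): append an NFS export line only if it is not
# already present, so re-running the setup does not duplicate entries.
add_export_once() {
  local line=$1 file=${2:-/etc/exports}
  # -x: match the whole line, -F: treat it as a fixed string (no regex)
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "already present: $line"
  else
    printf '%s\n' "$line" >> "$file"
    echo "added: $line"
  fi
}
```

Usage: `add_export_once "/export/nfsshare *(rw,sync,no_root_squash,insecure)" && exportfs -rav`. From a worker node, `showmount -e <nfs-server>` should then list the share.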



#### Set up DNS entries (A, NS, and CNAME) for the following fourteen (14) items (on-prem DNS or Synology DNS)

ns.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.60)
aks.iam.anapartner.net  NS ns.aks.iam.anapartner.net
cp.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.60)
worker01.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.61)
worker02.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.62)
worker03.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.63)
worker04.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.64)
sm.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.65)
kibana CNAME cp.aks.iam.anapartner.net 
grafana CNAME cp.aks.iam.anapartner.net 
jaeger CNAME cp.aks.iam.anapartner.net 
alertmanager CNAME cp.aks.iam.anapartner.net 
ssp CNAME cp.aks.iam.anapartner.net 
ssp-mgmt CNAME cp.aks.iam.anapartner.net 
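If your DNS server accepts BIND-style zone files, the A and CNAME records above can be generated rather than typed by hand. This is a sketch under that assumption: the hypothetical `print_zone_records` helper assumes `$ORIGIN aks.iam.anapartner.net.` (so names are relative), uses the example IPs from the list, and leaves the NS delegation to the parent zone.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the lab's A and CNAME records in
# BIND zone-file syntax, relative to $ORIGIN aks.iam.anapartner.net.
print_zone_records() {
  local -A a_records=(
    [ns]=192.168.2.60 [cp]=192.168.2.60
    [worker01]=192.168.2.61 [worker02]=192.168.2.62
    [worker03]=192.168.2.63 [worker04]=192.168.2.64
    [sm]=192.168.2.65
  )
  local name alias
  # Associative arrays are unordered, so sort the names for stable output.
  for name in $(printf '%s\n' "${!a_records[@]}" | sort); do
    printf '%-14s IN A     %s\n' "$name" "${a_records[$name]}"
  done
  # Every web UI alias points at the control-plane host (ingress entry point).
  for alias in kibana grafana jaeger alertmanager ssp ssp-mgmt; do
    printf '%-14s IN CNAME %s\n' "$alias" "cp"
  done
}
```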

### Pre-Step:  Enable DNS resolution for external IP addresses
### Enable forwarding to external h/w router and 8.8.8.8

Step 15: Recommendation: deploy your solution in stages using Kubernetes YAML or Helm charts, so any deployment issues are easier to debug. Use kubectl logs and kubectl describe to isolate startup or cert issues.

### Run the scripts one-by-one. Each includes a watch command that
### provides feedback on the startup processes.
### Total startup from scratch to the final VIP Sample App takes about 15-20 minutes.
### Note: Step04 uses different chart variables for on-prem Symantec Directory.
### Note: ./step00_kubernetes_env.sh is called by each script.


./step01_kubernetes_cluster_init_with_worker_nodes.sh
./step02_kubernetes_cluster_with_ingress_and_other_charts.sh
./step03_kubernetes_cluster_with_vip_auth_hub_charts.sh
./step04_kubernetes_cluster_with_vip_auth_hub_sample_app.sh
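Running the steps one-by-one is what makes failures easy to isolate. A minimal sketch that keeps that property when chaining them: the hypothetical `run_steps` helper runs each script in order and stops at the first non-zero exit. It assumes it is run from the scripts' folder, since each step sources ./step00_kubernetes_env.sh.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run the step scripts in order and stop at
# the first failure, so a broken step is not masked by the next one.
run_steps() {
  local step
  for step in "$@"; do
    echo "=== Running $step ==="
    if ! bash "$step"; then
      echo "!!! $step failed; check kubectl logs / kubectl describe before continuing" >&2
      return 1
    fi
  done
  echo "=== All steps completed ==="
}
```

Usage: `run_steps ./step01_*.sh ./step02_*.sh ./step03_*.sh ./step04_*.sh`.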

Docker Registry for On-Prem

There are two (2) types of docker registries we have found useful.

a. The standard mirror (pull-through cache) method caches the docker images pulled from the “docker.io” site in a local mirror. With Kubernetes or Helm deployments, only the docker daemon configuration file needs to be adjusted to check the local mirror; the Kubernetes YAML files and Helm charts do not need to change.

b. The second method collects the full list of images after they have been deployed once, then uses docker tag and docker push to place them in a local registry. The challenge with this method is that the Kubernetes YAML files and/or Helm charts must be updated to use this local registry.

Either method lowers the bandwidth cost of re-downloading the same docker images if you use docker prune to keep the worker nodes' disk usage “clean”. If docker prune is not used, the worker nodes may run out of disk space due to temporary docker images/containers that were not cleaned up properly.
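A minimal sketch of a prune that only runs when it is needed, suitable for cron on each worker node. The threshold, the default path, and the `prune_if_needed` name are assumptions; pass your Docker data-root (for example /home/docker-images if you use the daemon.json shown in this post). `docker image prune -a -f` removes all images not used by a container.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): prune unused Docker images only when the
# filesystem holding the Docker data-root is above a usage threshold (%).
# PRUNE_CMD can be overridden; the default removes all unused images.
prune_if_needed() {
  local path=${1:-/var/lib/docker} threshold=${2:-80}
  local used
  # POSIX df output: 5th column of the 2nd line is "Use%" for $path.
  used=$(df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ "$used" -ge "$threshold" ]; then
    echo "usage ${used}% >= ${threshold}%: pruning"
    ${PRUNE_CMD:-docker image prune -a -f}
  else
    echo "usage ${used}% < ${threshold}%: nothing to do"
  fi
}
```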

#!/bin/bash
#################################################################################
#  Create a local docker mirror registry for docker-io
#  and local docker non-mirror registry for all other images
#  to minimize download impact
#  during restart of the kubernetes solution
#
#  All registry images will be placed on NFS share
#  mount -v -t nfs 192.168.2.30:/volume1/nfs /mnt  &>/dev/null
#
# Certs will be provided by Let's Encrypt every 90 days
#
#  For docker-io mirror registry, all clients must have the following line in
#  /etc/docker/daemon.json     {Note:  Use commas as needed}
#
#    "registry-mirrors":
#     [
#      "https://sm.aks.iam.anapartner.net:444"
#     ],
#
#
#
# ANA 11/2021
#
#################################################################################
# To remove all containers - to allow restart of process
docker rm -f $(docker ps -aq) ; docker image rm $(docker image ls | grep -v -e REPOSITORY | grep -e minutes -e hour -e days -e '2 weeks' | awk '{print $3}') &>/dev/null


#################################################################################
# Update HOST name for local server for docker image
HOST=sm.aks.iam.anapartner.net
NFS_SERVER=192.168.2.30
NFS_SHARE=/volume1/nfs


#################################################################################
function start_registry {

    local_port=$1
    remote_registry_name=$2

    if [ "$3" == "" ]; then
        remote_registry_url=$remote_registry_name
    else
        remote_registry_url=$3
    fi

    echo -e "$local_port $remote_registry_name $remote_registry_url"


mount -v -t nfs $NFS_SERVER:$NFS_SHARE /mnt  &>/dev/null
mkdir -p /mnt/registry/${remote_registry_name}  &>/dev/null

docker run -d --name registry-${remote_registry_name}-mirror  \
-p $local_port:443 \
--restart=always \
-e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
-e REGISTRY_PROXY_REMOTEURL="https://${remote_registry_url}/" \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/fullchain.pem \
-e REGISTRY_HTTP_TLS_KEY=/certs/privkey.pem \
-e REGISTRY_COMPATIBILITY_SCHEMA1_ENABLED=true \
-v /mnt/registry/certs:/certs \
-v /mnt/registry/${remote_registry_name}:/var/lib/registry \
registry:latest

sleep 1
echo "#################################################################################"
curl -s -X GET  https://$HOST:$local_port/v2/_catalog | jq
echo "#################################################################################"

}

#################################################################################
# start_registry <local_port>    <remote_registry_name>  <remote_registry_url>
#################################################################################

start_registry   444             docker-io               registry-1.docker.io

#################################################################################
# Non-Proxy configuration to allow 'docker tag & docker push' for all other images
#################################################################################

remote_registry_name=all
local_port=455
mkdir -p /mnt/registry/${remote_registry_name}  &>/dev/null
docker run -d --name registry-${remote_registry_name}-mirror  \
-p $local_port:443 \
--restart=always \
-e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/fullchain.pem \
-e REGISTRY_HTTP_TLS_KEY=/certs/privkey.pem \
-e REGISTRY_COMPATIBILITY_SCHEMA1_ENABLED=true \
-v /mnt/registry/certs:/certs \
-v /mnt/registry/${remote_registry_name}:/var/lib/registry \
registry:latest

sleep 1
echo "#################################################################################"
curl -s -X GET  https://$HOST:$local_port/v2/_catalog | jq
echo "#################################################################################"
docker ps -a
echo "#################################################################################"

echo "##### To tail the log of the docker-io container - useful for monitoring helm deployments  #####"
echo "docker logs `docker ps -a  --no-trunc | grep -v NAMES | grep 'docker-io' | awk '{print $1}'` -f "
echo "#################################################################################"
echo "##### To tail the log of the ALL container - useful for monitoring helm deployments  #####"
echo "docker logs `docker ps -a  --no-trunc | grep -v NAMES | grep 'all' | awk '{print $1}'` -f  "
echo "#################################################################################"
echo "##### Location of Registry Files on NFS share #####"
echo "ls -lart /mnt/registry/docker-io/docker/registry/v2/repositories"
echo "ls -lart /mnt/registry/all/docker/registry/v2/repositories"
echo "#################################################################################"
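For method (b), each image must be re-tagged with the local registry's host:port before it can be pushed to the non-mirror "all" registry on port 455. A minimal sketch, assuming Docker's reference rule (the first path component is a registry host only if it contains "." or ":" or is "localhost"); `to_local_ref` and `push_to_local` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch of method (b): re-tag an already-pulled image for the local
# non-mirror registry (port 455 above) and push it.
LOCAL_REGISTRY=${LOCAL_REGISTRY:-sm.aks.iam.anapartner.net:455}

# Replace the source registry host (if any) with the local registry.
to_local_ref() {
  local ref=$1 first=${1%%/*}
  if [[ $ref == */* && ( $first == *.* || $first == *:* || $first == localhost ) ]]; then
    ref=${ref#*/}            # drop e.g. "quay.io/"
  elif [[ $ref != */* ]]; then
    ref="library/$ref"       # Docker Hub official images live under library/
  fi
  echo "$LOCAL_REGISTRY/$ref"
}

# Tag and push one image; set DRY_RUN=1 to only print the commands.
push_to_local() {
  local src=$1 dst
  dst=$(to_local_ref "$src")
  ${DRY_RUN:+echo} docker tag "$src" "$dst"
  ${DRY_RUN:+echo} docker push "$dst"
}
```

The YAML files or Helm chart values then need to reference the `$LOCAL_REGISTRY/...` name that `to_local_ref` prints.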

Example of the /etc/docker/daemon.json configuration file that uses a local mirror for docker.io; see the “registry-mirrors” parameter. Unfortunately, we were unable to use this process for the other docker registries, because the Docker daemon applies “registry-mirrors” only to Docker Hub (docker.io) pulls.

{
  "debug": false,
  "data-root": "/home/docker-images",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "storage-driver": "overlay2",
  "registry-mirrors": [
    "https://sm.aks.iam.anapartner.net:444"
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  }
}

Let’s Encrypt Certbot and DNS validation

We use Let’s Encrypt Certbot with manual DNS validation to create our 90-day wildcard certificates. Manual DNS validation allows us to avoid setting up a public-facing component for our internal labs.

Ref: https://letsencrypt.org/docs/challenge-types/

# Step 1: Install the snapd service for Certbot usage on your host OS

cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 (Ootpa)

sudo yum install -y  snapd
Updating Subscription Management repositories.
Package snapd-2.49-2.el8.x86_64 is already installed.

systemctl enable --now snapd.socket

### Wait 1 min

snap install core; sudo snap refresh core



# Step 2: Remove prior certbot (if installed by yum/dnf)

yum remove -y certbot


# Step 3:  Install new "classic" Certbot

sudo snap install --classic certbot
certbot 1.17.0 from Certbot Project (certbot-eff✓) installed

sudo ln -s /snap/bin/certbot /usr/bin/certbot



# Step 4: Issue certbot command with wildcard cert & update your DNS TXT record with the string provided.


sudo certbot certonly --manual --preferred-challenges dns -d '*.aks.iam.anapartner.org' --register-unsafely-without-email

Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf. You must
agree in order to register with the ACME server. Do you agree?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: Y
Account registered.
Requesting a certificate for *.aks.iam.anapartner.org

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please deploy a DNS TXT record under the name:

_acme-challenge.aks.iam.anapartner.org.

with the following value:

u2cXXXXXXXXXXXXXXXXXXXXc

Before continuing, verify the TXT record has been deployed. Depending on the DNS
provider, this may take some time, from a few seconds to multiple minutes. You can
check if it has finished deploying with aid of online tools, such as the Google
Admin Toolbox: https://toolbox.googleapps.com/apps/dig/#TXT/_acme-challenge.aks.iam.anapartner.org.
Look for one or more bolded line(s) below the line ';ANSWER'. It should show the
value(s) you've just added.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Step 5: Before pressing <ENTER> at the verification request, use a 2nd terminal to validate that the DNS record has been updated and can be seen by a standard DNS query.

# Example:
nslookup -type=txt _acme-challenge.aks.iam.anapartner.org
Non-authoritative answer:
_acme-challenge.aks.iam.anapartner.org  text = "u2cXXXXXXXXXXXXXXXXXXXXc"
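Instead of re-running nslookup by hand, the 2nd terminal can poll until the TXT record returns the expected value. A minimal sketch, assuming `dig` (from bind-utils) is installed; `wait_for_txt` is a hypothetical helper, and `LOOKUP_CMD` exists only so the lookup can be swapped out.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): poll DNS until the _acme-challenge TXT record
# returns the expected value; then it is safe to press <ENTER> in certbot.
wait_for_txt() {
  local name=$1 expected=$2 tries=${3:-30} interval=${4:-10} i
  for ((i = 1; i <= tries; i++)); do
    # dig +short prints TXT values wrapped in double quotes; strip them.
    if ${LOOKUP_CMD:-dig +short TXT} "$name" | tr -d '"' | grep -qxF -- "$expected"; then
      echo "TXT record for $name is visible (attempt $i)"
      return 0
    fi
    sleep "$interval"
  done
  echo "TXT record for $name not visible after $tries attempts" >&2
  return 1
}
```

Usage: `wait_for_txt _acme-challenge.aks.iam.anapartner.org <value-from-certbot>`.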


# Step 6:  Press <ENTER> after you have validated the TXT record.

Press Enter to Continue
Waiting for verification...
Cleaning up challenges
Subscribe to the EFF mailing list (email: nala@baugher.us).

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/aks.iam.anapartner.org/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/aks.iam.anapartner.org/privkey.pem
  


# Step 7: View certs of fullchain.pem & privkey.pem  

cat /etc/letsencrypt/live/aks.iam.anapartner.org/fullchain.pem
-----BEGIN CERTIFICATE-----

<REMOVED>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<REMOVED>
-----END CERTIFICATE-----

cat /etc/letsencrypt/live/aks.iam.anapartner.org/privkey.pem
-----BEGIN PRIVATE KEY-----

<REMOVED>
-----END PRIVATE KEY-----




# Step 8: Use the two files (fullchain.pem and privkey.pem) in your Kubernetes solution
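One common way to use the two files is as a Kubernetes TLS secret referenced by each Ingress. A minimal sketch, assuming kubectl 1.18+ (for `--dry-run=client`); the secret name and namespaces are hypothetical examples, and `create_tls_secrets` is a hypothetical helper. Piping through `kubectl apply` lets the same command update the secret after each 90-day renewal.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): load the wildcard cert as a TLS secret in
# each namespace whose ingress uses it. KUBECTL can be overridden.
create_tls_secrets() {
  local cert_dir=$1 secret_name=$2; shift 2
  local ns
  for ns in "$@"; do
    # "create --dry-run=client -o yaml | apply" updates an existing secret
    # instead of failing with "already exists".
    ${KUBECTL:-kubectl} create secret tls "$secret_name" \
      --cert="$cert_dir/fullchain.pem" --key="$cert_dir/privkey.pem" \
      -n "$ns" --dry-run=client -o yaml | ${KUBECTL:-kubectl} apply -f -
  done
}
```

Usage (hypothetical names): `create_tls_secrets /etc/letsencrypt/live/aks.iam.anapartner.org wildcard-tls monitoring ingress-nginx`.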

# Step 9: Ensure the domain in /etc/resolv.conf on the host OS, control-plane (cp), and worker nodes is set to aks.iam.anapartner.org, so that hostnames covered by the wildcard cert resolve correctly.

# Step 10: Ensure the Synology NAS DNS service is configured with all aliases (CNAME records)


# Step 11:  Optional: Validate certs with openssl


# Show the kubernetes self-signed cert

true | openssl s_client -connect kibana.aks.iam.anapartner.org:443 2>/dev/null | openssl x509 -inform pem -noout -text

# Show the new wildcard cert for same hostname &  port

curl -vvI  https://kibana.aks.iam.anapartner.org/app/home#/

curl -vvI  https://kibana.aks.iam.anapartner.org/app/home#/   2>&1 | awk 'BEGIN { cert=0 } /^\* SSL connection/ { cert=1 } /^\*/ { if (cert) print }'

nmap -p 443 --script ssl-cert kibana.aks.iam.anapartner.org
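Because these certs expire after 90 days and the manual DNS challenge must be repeated, a simple expiry check is worth scheduling. A minimal sketch using `openssl x509 -checkend`, which exits 0 if the cert is still valid N seconds from now; `cert_expires_within` is a hypothetical helper name, and the 30-day window is an assumption.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed (exit 0) when a certificate expires
# within the given number of days, is already expired, or cannot be read.
cert_expires_within() {
  local cert=$1 days=$2
  # -checkend N: exit 0 = still valid N seconds from now, 1 = will expire.
  if openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null; then
    return 1   # still valid beyond the window
  fi
  return 0     # expires inside the window (or unreadable)
}
```

Usage: `cert_expires_within /etc/letsencrypt/live/aks.iam.anapartner.org/fullchain.pem 30 && echo "renew now"`; `openssl x509 -in <file> -noout -enddate` prints the exact expiry date.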


Kubernetes Side Note: Let's Encrypt certs do NOT show up in the Kubernetes cluster cert check, which only reports on the kubeadm-managed cluster certificates.

kubeadm certs check-expiration

View of the DNS TXT records to be updated with your DNS service provider. Let’s Encrypt Certbot must be able to query these records before it assigns you wildcard certificates. Create the _acme-challenge hostname entry as a TXT record and paste in the string provided by Certbot. Wait 5 minutes or test the TXT record with nslookup; upon positive validation, continue the Certbot process.

View your kubernetes cluster / nodes for any constraints

After your cluster is created and worker nodes have joined it, you may wish to monitor your on-prem deployment for resource constraints. The kubectl describe and kubectl top commands are very useful for this goal.

kubectl describe nodes worker01
kubectl top node
kubectl top pod
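The `kubectl top node` table can be filtered to show only nodes under pressure. A minimal sketch, assuming the default column order (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%) and a cluster with metrics-server installed, which `kubectl top` requires; `flag_busy_nodes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): flag nodes whose CPU% or MEMORY% from
# "kubectl top node" is at or above a threshold. Reads the table on stdin.
flag_busy_nodes() {
  local threshold=${1:-80}
  awk -v t="$threshold" '
    NR == 1 { next }                    # skip the header line
    {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      if (cpu + 0 >= t || mem + 0 >= t)
        printf "%s cpu=%s%% mem=%s%%\n", $1, cpu, mem
    }'
}
```

Usage: `kubectl top node | flag_busy_nodes 80`, then `kubectl describe nodes <name>` on any node it prints.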

Kubernetes Training (Formal)

If you are new to Kubernetes, we recommend the following class. You may need to dedicate 4-8 weeks to complete the course and then take the CKA exam via the Linux Foundation.

https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/ .

Kubernetes.io site has most of the information you need to get started.

https://kubernetes.io/docs/reference/kubectl/cheatsheet/

Parallel provisioning for Active Directory and MS Exchange mailboxes – Improve Birthright/DayOne Access

One of the challenges IAM/IAG solutions may have is single-threaded processing for select endpoints. For the CA/Symantec Identity Management solution, before IM r14.3cp2, we lived with a single-threaded connector to manage MS Active Directory endpoints.

To address this challenge, we deployed multiple connector servers and let the IM Provisioning Server (IMPS) use its built-in round-robin load balancing to send separate transactions to different connector servers, all servicing the same Active Directory endpoints.

The IME may be running as fast as it can in its clustered deployment, but as soon as a task involves MS Active Directory and the CCS Service becomes a bottleneck, we begin to see the IME JMS queue reporting that it is stuck and the IME View Submitted Tasks reporting “In Progress” for all tasks. If the CCS service is restarted, all IME tasks are then reported as “Failed.”

This is/was the bottleneck for the solution for sites that have MS Active Directory for Birthright/DayOne Access.

We can now avoid this bottleneck. [*** (5/24/2021) – There is an enhancement to CP2 to address im_ccs.exe crashes during peak loads, discovered using this testing process.]

Via the newly delivered enhancement https://community.broadcom.com/participate/ideation-home/viewidea?IdeationKey=7154e15b-085d-469e-bff0-ac588ff6bd5b , we now have full parallel provisioning to MS Active Directory from a single connector server (JCS/CCS).

The new attribute that regulates this behavior is eTADSMaxConnectionsInPool. After CP2 is deployed, this attribute is applied to every existing ADS endpoint managed by the IM Provisioning Server. Note: The default value is 10, but after much testing we recommend setting it, along with the IMPS->JCS and JCS->CCS values, to 200.

During testing within the IME using Bulk Tasks or the IM BLC, we observed the CCS->ADS traffic reaching 20-30 connections when allowed. You may set this attribute to a value of 200 via JXplorer and/or an ldapmodify/dxmodify script.

echo "############### SET ADS MAX CONNECTIONS IN POOL SIZE ##################"
IMPS_HOST=192.168.242.135
IMPS_PORT=20389
IMPS_USER='eTGlobalUserName=etaadmin,eTGlobalUserContainerName=Global Users,eTNamespaceName=CommonObjects,dc=im,dc=eta'
IMPS_PWD="Password01"
NAMESPACE=exchange2016
LDAPTLS_REQCERT=never dxmodify -H ldap://$IMPS_HOST:$IMPS_PORT -c -x -D "$IMPS_USER" -w "$IMPS_PWD" << EOF
dn: eTADSDirectoryName=$NAMESPACE,eTNamespaceName=ActiveDirectory,dc=im,dc=eta
changetype: modify
eTADSMaxConnectionsInPool: 200
EOF
LDAPTLS_REQCERT=never dxsearch -LLL -H ldap://$IMPS_HOST:$IMPS_PORT -x -D "$IMPS_USER" -w "$IMPS_PWD" -b "eTADSDirectoryName=$NAMESPACE,eTNamespaceName=ActiveDirectory,dc=im,dc=eta" -s base eTADSMaxConnectionsInPool | perl -p00e 's/\r?\n //g'
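Sites that manage more than one ADS endpoint need the same change on each endpoint. A minimal sketch that generates the LDIF for a list of endpoint names, written in the standard RFC 2849 `replace:` form; `pool_size_ldif` is a hypothetical helper and the endpoint names are examples. Its output can be fed to the dxmodify command above with `-f` in place of the here-document.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): emit an eTADSMaxConnectionsInPool modify
# entry for each ADS endpoint name given after the pool size.
pool_size_ldif() {
  local size=$1; shift
  local ep
  for ep in "$@"; do
    printf 'dn: eTADSDirectoryName=%s,eTNamespaceName=ActiveDirectory,dc=im,dc=eta\n' "$ep"
    # RFC 2849 modify block: replace the attribute, end with "-" and a blank line.
    printf 'changetype: modify\nreplace: eTADSMaxConnectionsInPool\neTADSMaxConnectionsInPool: %s\n-\n\n' "$size"
  done
}
```

Usage (example endpoint names): `pool_size_ldif 200 exchange2016 corp2019 > pool.ldif`.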

To confirm the number of open connections is greater than one (1), we can issue a Bulk IM Task or use a performance tool like CA Directory dxsoak.

In this example, we showcase using CA Directory dxsoak to execute 100 parallel threads that create 100 ADS accounts with MS Exchange mailboxes. We also include this script for download, for others to review and use.

Performance Lab:

Pre-Steps:

  1. Leverage CA Directory samples’ dxsoak binary (performance testing). You may wish to use CA Directory on an existing IM Provisioning Server (Linux OS) or you may deploy CA Directory (MS Windows version) to the JCS/CCS connector. Examples are provided for both OSes.
  2. Create LDIF files for the IM Provisioning Server and/or IM Connector Tier. These files are needed to ‘push’ the solution to failure; the IME Bulk Task and/or etautil scripts against the IM Provisioning Tier will not provide the transaction speed needed to break the CCS service, if that is possible.
  3. Within the IM Provisioning Manager, enable the ADS Endpoint TXT Logs on the Logging TAB, for all checkboxes.
  4. Monitor the IMPS etatrans* logs, monitor the JCS ADS logs, monitor the CCS ADS logs, monitor the number of CCS-> ADS (LDAP/S – TCP 389/636) threads. [Suggest using MS Sysinternals Process Explorer and select im_ccs.exe & then TCP/IP TAB]
  5. Monitor the MS ADS Domain via MS ADUC (AD Users & Computers UI) and MS Exchange Mailbox (Mailbox UI via Browser)

Execution:

6. Perform a UNIT TEST with dxmodify/ldapmodify to confirm the LDIF file input is correct with the correct suffix.

time dxmodify -H ldap://192.168.242.135:20389 -c -x -D "eTGlobalUserName=etaadmin,eTGlobalUserContainerName=Global Users,eTNamespaceName=CommonObjects,dc=im,dc=eta" -w Password01 -f ads_user_with_exchange_dc_eta.ldif

7. Perform the PERFORMANCE TEST with dxsoak binary with the same LDIF file & correct suffix. Rate observed = 23 K ids/hr

./dxsoak -c -l 60 -t 100 -h 192.168.242.135:20389 -D "eTGlobalUserName=etaadmin,eTGlobalUserContainerName=Global Users,eTNamespaceName=CommonObjects,dc=im,dc=eta" -w Password01 -f ads_user_with_exchange_dc_eta.ldif

Observations:

8. IMPS etatrans*.log – Count the number of operations per second. Note any race conditions and/or data collisions, e.g., ADS accounts deleted before they were added across the 100 threads, or the same ADS account creation attempted multiple times in different threads.

9. IM CCS ADS <endpoint>.log – Will only have useful data if the ADS Endpoint Logging TAB has been checked for TXT logs.

10. Finally, validate directly in MS Active Directory with ADUC or a similar tool, and confirm the MS Exchange mailboxes are being created/deleted.

11. Count the number of threads from im_ccs.exe to ADS – Suggest using MS Sysinternals Process Explorer tool and/or Powershell to count the number of connections.

MS PowerShell script to count the number of LDAP (TCP 389) connections from im_ccs.exe. [Note: TCP 389 is used more if the ADS Endpoint is set up to use SASL authentication; TCP 636 is used more if the ADS Endpoint uses the older TLS authentication.]

$i=1
Do {
cls
(Get-NetTCPConnection -State Established -OwningProcess (Get-Process -name im_ccs).id -RemotePort 389).count
Start-Sleep -s 1
$i++
}
while ($i -le 5)

Direct Performance Testing to JCS/CCS Service

While this testing has limited value, it can offer confidence and help troubleshoot challenges. We can reuse the prior LDIF files with a slightly different suffix, dc=etasa (instead of dc=eta), and use dxsoak to push the connector tier to failure. This step helped provide memory dumps to the CA/Symantec Engineering teams to isolate challenges within the parallel processing. The CCS Service is exposed only on localhost; to test it remotely, update the MS Registry key for the CCS service to use the external IP address of the JCS/CCS server. Rate observed = 25 K ids/hr

Script to generate 100 ADS Accounts with MS Exchange Mailbox Creation

You may wish to review this script and adjust it for your ADS / MS Exchange domains for testing. You can also create a simple LDIF file with password resets or ADS group membership adds. Just remember that the IMPS Service (TCP 20389/20390) uses the suffix dc=eta, and the IM JCS/CCS Services (TCP 20410/20411) & (TCP 20402/20403) use the suffix dc=etasa. Additionally, if using CA Directory dxsoak, use only the non-TLS ports, as this binary is not equipped to use TLS certs.

#!/bin/bash
#######################################################################################################################
# Name:  Generate ADS Feed Files for IM Solution Provisioning/Connector Tiers
#
# Goal:  Validate the new parallel processes from the IM Connector Tier to Active Directory with MS Exchange
#
#
# Generate ADS User LDIF file(s) for use with unit (dxmodify) and performance testing (dxsoak) to:
#  - {Note: dxsoak will only work with non-TLS ports}
#
# IM JCS (20410)  "dc=etasa"    {Ensure MS Windows Firewall allows this port to be exposed}
# IM CCS (20402)  "dc=etasa"    {This port is localhost only, may open to network traffic via registry update}
# IMPS (20389)    "dc=eta"
#
#
# Monitor:  
#
# The IMPS etatrans*.log  {exclude searches}
# The JCS daily log
# The JCS ADS log {Enable the ADS Endpoint TXT logging for all checkboxes}
# The CCS ADS log {Enable the ADS Endpoint TXT logging for all checkboxes}
#
# Execute per the examples provided during run of this file
#
#
# ANA 05/2021
#######################################################################################################################

# Unique Variables for an ADS Domain
NAMESPACE=exchange2016
ADSDOMAIN=exchange.lab
DCDOMAIN="DC=exchange,DC=lab"
OU=People

#######################################################################################################################


MAX=100
start=00001
counter=$start
echo "###############################################################"
echo "###############################################################"
START=`/bin/date --utc +%Y%m%d%H%M%S,%3N.0Z`
echo `/bin/date --utc +%Y%m%d%H%M%S,%3N.0Z`" = Current OS UTC time stamp"
echo "###############################################################"
FILE1=ads_user_with_exchange_dc_etasa.ldif
FILE2=ads_user_with_exchange_dc_eta.ldif
echo "" > $FILE1
while [ $counter -le $MAX ]
do
    n=$((10000+counter)); n=${n#1}
    tz=`/bin/date --utc +%Y%m%d%H%M%S,%3N.0Z`
   echo "Counter with leading zeros = $n   at time:  $tz"


cat << EOF >> $FILE1
dn:  eTADSAccountName=firstname$n aaalastname$n,eTADSOrgUnitName=$OU,eTADSDirectoryName=$NAMESPACE,eTNamespaceName=ActiveDirectory,dc=im,dc=etasa
changetype: add
objectClass:  eTADSAccount
eTADSobjectClass:  user
eTADSAccountName:  firstname$n aaalastname$n
eTADSgivenName:  firstname$n
eTADSsn:  aaalastname$n
eTADSdisplayName:  firstname$n aaalastname$n
eTADSuserPrincipalName:  aaatestuser$n@$ADSDOMAIN
eTADSsAMAccountName:  aaatestuser$n
eTPassword:  Password01
eTADSpwdLastSet:  -1
eTSuspended:  0
eTADSuserAccountControl:  0000000512
eTADSDescription:  description $tz
eTADSphysicalDeliveryOfficeName:  office
eTADStelephoneNumber:  111-222-3333
eTADSmail:  aaatestuser$n@$ADSDOMAIN
eTADSwwwHomePage:  web.page.lab
eTADSotherTelephone:  111-222-3333
eTADSurl:  other.web.page.lab
eTADSstreetAddress:  street address line01
eTADSpostOfficeBox:  pobox 111
eTADSl:  city
eTADSst:  state
eTADSpostalCode:  11111
eTADSco:  UNITED STATES
eTADSc:  US
eTADScountryCode:  840
eTADSscriptPath:  loginscript.cmd
eTADSprofilePath:  \profile\path\here
eTADShomePhone:  111-222-3333
eTADSpager:  111-222-3333
eTADSmobile:  111-222-3333
eTADSfacsimileTelephoneNumber:  111-222-3333
eTADSipPhone:  111-222-3333
eTADSinfo:  Notes Here
eTADSotherHomePhone:  111-222-3333
eTADSotherPager:  111-222-3333
eTADSotherMobile:  111-222-3333
eTADSotherFacsimileTelephoneNumber:  111-222-3333
eTADSotherIpPhone:  111-222-3333
eTADStitle:  title
eTADSdepartment:  department
eTADScompany:  company
eTADSmanager:  CN=manager_fn manager_ln,OU=$OU,$DCDOMAIN
eTADSmemberOf:  CN=Backup Operators,CN=Builtin,$DCDOMAIN
eTADSlyncSIPAddressOption: 0000000000
eTADSdisplayNamePrintable: aaatestuser$n
eTADSmailNickname: aaatestuser$n
eTADShomeMDB: (Automatic Mailbox Distribution)
eTADShomeMTA: CN=DC001,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=First Organization,CN=Microsoft Exchange,CN=Services,CN=Configuration,$DCDOMAIN
eTAccountStatus: A
eTADSmsExchRecipientTypeDetails: 0000000001
eTADSmDBUseDefaults: TRUE
eTADSinitials: A
eTADSaccountExpires: 9223372036854775807

EOF
 counter=$(( $counter + 00001 ))
done


#  Create the delete ADS Process
start=00001
counter=$start
while [ $counter -le $MAX ]
do
    n=$((10000+counter)); n=${n#1}
    tz=`/bin/date --utc +%Y%m%d%H%M%S,%3N.0Z`
   echo "Counter with leading zeros = $n   at time:  $tz"


cat << EOF >> $FILE1
dn:  eTADSAccountName=firstname$n aaalastname$n,eTADSOrgUnitName=$OU,eTADSDirectoryName=$NAMESPACE,eTNamespaceName=ActiveDirectory,dc=im,dc=etasa
changetype: delete

EOF
 counter=$(( $counter + 00001 ))
done

echo ""
echo "################################### ADS USER OBJECT STATS ################################################################"
echo "Number of add objects: `grep "changetype: add" $FILE1 | wc -l`"
echo "Number of delete objects: `grep "changetype: delete" $FILE1 | wc -l`"
rm -rf $FILE2
cp -r -p $FILE1 $FILE2
sed -i 's|,dc=im,dc=etasa|,dc=im,dc=eta|g' $FILE2
ls -lart $FILE1
ls -lart $FILE2

echo ""
echo "################################### SET ADS MAX CONNECTIONS IN POOL SIZE ################################################################"
IMPS_HOST=192.168.242.135
IMPS_PORT=20389
IMPS_USER='eTGlobalUserName=etaadmin,eTGlobalUserContainerName=Global Users,eTNamespaceName=CommonObjects,dc=im,dc=eta'
IMPS_PWD="Password01"
LDAPTLS_REQCERT=never dxmodify  -H ldap://$IMPS_HOST:$IMPS_PORT -c -x -D "$IMPS_USER" -w "$IMPS_PWD"  << EOF
dn: eTADSDirectoryName=$NAMESPACE,eTNamespaceName=ActiveDirectory,dc=im,dc=eta
changetype: modify
eTADSMaxConnectionsInPool: 200
EOF
LDAPTLS_REQCERT=never dxsearch -LLL  -H ldap://$IMPS_HOST:$IMPS_PORT -x -D "$IMPS_USER" -w "$IMPS_PWD" -b "eTADSDirectoryName=$NAMESPACE,eTNamespaceName=ActiveDirectory,dc=im,dc=eta" -s base eTADSMaxConnectionsInPool | perl -p00e 's/\r?\n //g'

echo ""
echo "################################### CCS UNIT & PERF TEST ################################################################"
CCS_HOST=192.168.242.80
CCS_PORT=20402
CCS_USER="cn=root,dc=etasa"
CCS_PWD="Password01"
echo "Execute this command to the CCS Service to test single thread with dxmodify or ldapmodify"
echo "dxmodify  -H ldap://$CCS_HOST:$CCS_PORT -c -x -D $CCS_USER -w $CCS_PWD -f $FILE1 "
echo "Execute this command to the CCS Service to test 100 threads with dxsoak "
echo "./dxsoak -c -l 60 -t 100 -h $CCS_HOST:$CCS_PORT -D $CCS_USER -w $CCS_PWD -f $FILE1 "

echo ""
echo "################################### JCS UNIT & PERF TEST ################################################################"
JCS_HOST=192.168.242.80
JCS_PORT=20410
JCS_USER="cn=root,dc=etasa"
JCS_PWD="Password01"
echo "Execute this command to the JCS Service to test single thread with dxmodify or ldapmodify "
echo "dxmodify  -H ldap://$JCS_HOST:$JCS_PORT -c -x -D $JCS_USER -w $JCS_PWD -f $FILE1 "
echo "Execute this command to the JCS Service to test 100 threads with dxsoak "
echo "./dxsoak -c -l 60 -t 100 -h $JCS_HOST:$JCS_PORT -D $JCS_USER -w $JCS_PWD -f $FILE1 "


echo ""
echo "################################### IMPS UNIT & PERF TEST ################################################################"
IMPS_HOST=192.168.242.135
IMPS_PORT=20389
IMPS_USER='eTGlobalUserName=etaadmin,eTGlobalUserContainerName=Global Users,eTNamespaceName=CommonObjects,dc=im,dc=eta'
IMPS_PWD="Password01"
echo "Execute this command to the IMPS Service to test single thread with dxmodify or ldapmodify "
echo "dxmodify  -H ldap://$IMPS_HOST:$IMPS_PORT -c -x -D \"$IMPS_USER\" -w $IMPS_PWD -f $FILE2 "
echo "Execute this command to the IMPS Service to test 100 threads with dxsoak "
echo "./dxsoak -c -l 60 -t 100 -h $IMPS_HOST:$IMPS_PORT -D \"$IMPS_USER\" -w $IMPS_PWD -f $FILE2 "
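Before pushing a generated file with dxsoak, a quick sanity check helps separate real race conditions from problems in the input itself. A minimal sketch that counts adds/deletes and reports any sAMAccountName appearing twice; `check_ldif` is a hypothetical helper and assumes the attribute layout produced by the script above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): count adds/deletes in a generated LDIF and
# report any duplicated sAMAccountName (a collision across parallel threads).
check_ldif() {
  local file=$1
  echo "adds:    $(grep -c '^changetype: add' "$file")"
  echo "deletes: $(grep -c '^changetype: delete' "$file")"
  # Split "attribute:  value" on the colon plus any following spaces.
  awk -F': *' '$1 == "eTADSsAMAccountName" { if (++n[$2] == 2) print "duplicate: " $2 }' "$file"
}
```

Usage: `check_ldif ads_user_with_exchange_dc_eta.ldif`; for the script above, expect 100 adds, 100 deletes, and no duplicates.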




Address the new bottleneck of MS Exchange / O365 Provisioning.

After parallel provisioning has been introduced with the new im_ccs.exe service, you may notice that the number of transactions is still being throttled during performance testing.

Out of the box, the MS Exchange global throttling policy sets the PowerShellMaxConcurrency parameter to a default of 18 connections. Any provisioning that uses MS PowerShell for MS Exchange and/or MS O365 will be limited by this default.

To address this bottleneck, we can create a new throttling policy and assign it only to the service ID that manages identities, avoiding a global change.

Example:

New-ThrottlingPolicy MaxPowershell -PowerShellMaxConcurrency 100
Set-Mailbox "User Name" -ThrottlingPolicy MaxPowershell

After this change has been made, restart the IM JCS/CCS Services and retest with your performance tools. Review the CCS ADS log for the number of creations in 60 seconds; you will be pleasantly surprised at the rate. The logs are the strong confirmation we are looking for.

Performance test (947 ADS accounts w/Exchange mailboxes in 60 seconds, 08:59:54 to 09:00:53) => rate of 15 ids/second (or 54 K ids/hr) with the updated MaxPowershell = 100 throttling policy.

The last bottleneck appears to be CPU availability for the MS Exchange supporting services, specifically w3wp.exe (the MS IIS worker process), which appears to manage the MS PowerShell connections, per its startup string:

c:\windows\system32\inetsrv\w3wp.exe -ap "MSExchangePowerShellAppPool" -v "v4.0" -c "C:\Program Files\Microsoft\Exchange Server\V15\bin\GenericAppPoolConfigWithGCServerEnabledFalse.config" -a \\.\pipe\iisipme304c50e-6b42-4b26-83a4-229ee037be5d -h "C:\inetpub\temp\apppools\MSExchangePowerShellAppPool\MSExchangePowerShellAppPool.config" -w "" -m 0

WAN Latency: Rsync versus SCP

We were curious which methods we could use to manage large files that must be copied between sites with WAN-type latency, while restricting ourselves to processes available on the CA Identity Suite virtual appliance / Symantec IGA solution.

Leveraging VMware Workstation’s ability to introduce network latency between images allows us to validate a global password reset solution.

If we experience deployment challenges with native copy operations, we need to ensure we have alternatives to address any out-of-sync data.

The embedded CA Directory maintains the data tier in separate binary files, using a software router to join the data tier into a virtual directory. This allows for scalability and growth to accommodate the largest of sites.

We focused on the provisioning directory (IMPD) as our likely candidate for re-syncing.

Test Conditions:

  1. To ensure the data was copied securely, we kept the requirement for SSH sessions between two (2) different nodes of a cluster.
  2. We introduced latency with the VMware Workstation NIC settings for one of the nodes.
  3. The four (4) IMPD Data DSAs were resized to 2500 MB each (a size similar to what we have seen in production at many sites).
  4. We removed the data and the folder structure from the receiving node to prevent any checksum/restart process from gaining an unfair advantage.
  5. If the process allowed exclusions, we took advantage of this feature.
  6. The feature/process/commands must be available on the vApp to the ‘config’ or ‘dsa’ userIDs.
  7. The reference host/node being pulled from has the CA Directory Data DSAs offline (dxserver stop all) to prevent ongoing changes to the files during the copy operation.

Observations:

SCP without Compression: Unable to exclude other files (*.tx,*.dp, UserStore) – This process took over 12 minutes to copy 10,250 MB of data

SCP with Compression: Unable to exclude other files (*.tx,*.dp, UserStore) – This process still took over 12 minutes to copy 10,250 MB of data

Rsync without compression: This process can exclude files/folders, has built-in checksum features (to allow a restart of a file if the connection is broken), and works over SSH as well. If the destination folder had not been deleted beforehand, this process would have given artificially high-speed results. This process was able to exclude the UserStore DSA files and the transaction files (*.dp & *.tx) that are not required for use on a remote server, so only 10,000 MB (4 x 2500 MB) was copied instead of the full 10,250 MB.

Rsync with compression: This process can exclude files/folders, has built-in checksum features (to allow a restart of a file if the connection is broken), and works over SSH as well. This process was the clear winner, with far better performance than the other processes.

Total Time: 1 min 10 seconds for 10,000 MB of data over a WAN latency of 70 ms (140 ms R/T)

Now that we have found our winner, we need a few post-steps to use the copied files. To maintain uniqueness between peer members of the multi-write (MW) group, CA Directory uses a unique name for each data folder and data file. On the CA Identity Suite / Symantec IGA Virtual Appliance, these names embed a two (2)-digit identifier (e.g., ca-prov-srv-01-impd-main on the source node versus ca-prov-srv-03-impd-main on the receiving node).

The next step is to rename the folders and the files. Since the vApp is locked down against installing other tools that could handle rename operations, we used the find and mv commands with bash pattern substitution (${0/01/03}) to complete these two (2) steps.
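Below is a minimal bash sketch of the same two rename steps, assuming the two-digit -NN- naming shown in the transcript below. The one-liners in that transcript apply ${0/01/03} to the full path that find returns, which replaces the first "01" anywhere in that path; this sketch changes only the -01- segment of each folder or file name. The function name rename_node is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: rename copied DSA folders and their *.db files from one node number
# to another, e.g. ca-prov-srv-01-impd-main -> ca-prov-srv-03-impd-main.
# Assumes the two-digit "-NN-" naming seen on the vApp; only that segment changes.
set -euo pipefail

rename_node() {
  local dir=$1 from=$2 to=$3 path base
  # Pass 1: top-level DSA folders (the trailing / makes the glob match directories only).
  for path in "$dir"/*-"$from"-*/; do
    [ -d "$path" ] || continue            # glob matched nothing
    path=${path%/}
    base=${path##*/}
    mv -- "$path" "$dir/${base/-$from-/-$to-}"
  done
  # Pass 2: the .db file inside each (now renamed) folder.
  for path in "$dir"/*/*-"$from"-*.db; do
    [ -f "$path" ] || continue
    base=${path##*/}
    mv -- "$path" "${path%/*}/${base/-$from-/-$to-}"
  done
}

# Example (as the dsa user, with the DSAs stopped):
# rename_node "$DXHOME/data" 01 03
```

Renaming folders before files keeps the second glob simple: it only has to look one level down inside folders that already carry the new name.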

Complete Process Summarized with Validation

The process below was written in the default shell of the ‘dsa’ userID, csh. If the shell is changed to bash, update the commands accordingly.

The process below also uses an SSH RSA private/public key pair that was previously generated for the ‘dsa’ userID. On the vApp, log in as the ‘config’ userID and use su - dsa to complete the necessary steps. You may need to add a copy operation between the dsa and config userIDs.

Summary of using rsync with find/mv to rename copied IMPD *.db files/folders
[dsa@pwdha03 ~/data]$ dxserver status
ca-prov-srv-03-impd-main started
ca-prov-srv-03-impd-notify started
ca-prov-srv-03-impd-co started
ca-prov-srv-03-impd-inc started
ca-prov-srv-03-imps-router started
[dsa@pwdha03 ~/data]$ dxserver stop all > & /dev/null
[dsa@pwdha03 ~/data]$ du -hs
9.4G    .
[dsa@pwdha03 ~/data]$ eval `ssh-agent` && ssh-add
Agent pid 5395
Enter passphrase for /opt/CA/Directory/dxserver/.ssh/id_rsa:
Identity added: /opt/CA/Directory/dxserver/.ssh/id_rsa (/opt/CA/Directory/dxserver/.ssh/id_rsa)
[dsa@pwdha03 ~/data]$ rm -rf *
[dsa@pwdha03 ~/data]$ du -hs
4.0K    .
[dsa@pwdha03 ~/data]$ time rsync --progress -e 'ssh -ax' -avz --exclude "User*" --exclude "*.dp" --exclude "*.tx" dsa@192.168.242.135:./data/ $DXHOME/data
FIPS mode initialized
receiving incremental file list
./
ca-prov-srv-01-impd-co/
ca-prov-srv-01-impd-co/ca-prov-srv-01-impd-co.db
  2500000000 100%  143.33MB/s    0:00:16 (xfer#1, to-check=3/9)
ca-prov-srv-01-impd-inc/
ca-prov-srv-01-impd-inc/ca-prov-srv-01-impd-inc.db
  2500000000 100%  153.50MB/s    0:00:15 (xfer#2, to-check=2/9)
ca-prov-srv-01-impd-main/
ca-prov-srv-01-impd-main/ca-prov-srv-01-impd-main.db
  2500000000 100%  132.17MB/s    0:00:18 (xfer#3, to-check=1/9)
ca-prov-srv-01-impd-notify/
ca-prov-srv-01-impd-notify/ca-prov-srv-01-impd-notify.db
  2500000000 100%  130.91MB/s    0:00:18 (xfer#4, to-check=0/9)

sent 137 bytes  received 9810722 bytes  139161.12 bytes/sec
total size is 10000000000  speedup is 1019.28
27.237u 5.696s 1:09.43 47.4%    0+0k 128+19531264io 2pf+0w
[dsa@pwdha03 ~/data]$ ls
ca-prov-srv-01-impd-co  ca-prov-srv-01-impd-inc  ca-prov-srv-01-impd-main  ca-prov-srv-01-impd-notify
[dsa@pwdha03 ~/data]$ find $DXHOME/data/ -mindepth 1 -type d -exec bash -c 'mv  $0 ${0/01/03}' {} \; > & /dev/null
[dsa@pwdha03 ~/data]$ ls
ca-prov-srv-03-impd-co  ca-prov-srv-03-impd-inc  ca-prov-srv-03-impd-main  ca-prov-srv-03-impd-notify
[dsa@pwdha03 ~/data]$ find $DXHOME/data -depth -name '*.db' -exec bash -c 'mv  $0 ${0/01/03}' {} \; > & /dev/null
[dsa@pwdha03 ~/data]$ dxserver start all
Starting all dxservers
ca-prov-srv-03-impd-main starting
..
ca-prov-srv-03-impd-main started
ca-prov-srv-03-impd-notify starting
..
ca-prov-srv-03-impd-notify started
ca-prov-srv-03-impd-co starting
..
ca-prov-srv-03-impd-co started
ca-prov-srv-03-impd-inc starting
..
ca-prov-srv-03-impd-inc started
ca-prov-srv-03-imps-router starting
..
ca-prov-srv-03-imps-router started
[dsa@pwdha03 ~/data]$ du -hs
9.4G    .
[dsa@pwdha03 ~/data]$
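The rsync summary in the transcript above reports a speedup of 1019.28, meaning only about 9.8 MB actually crossed the WAN for 10,000 MB of DSA files. The freshly resized DSA files evidently compress extremely well (likely because they are mostly unused, pre-allocated space, which is an assumption), so data files with more real content should be expected to copy more slowly. Below is a minimal bash sketch, using the hypothetical name rsync_speedup, that recomputes this ratio from rsync's two summary lines; commas (printed by newer rsync versions) are stripped first.

```bash
#!/usr/bin/env bash
# Sketch: recompute rsync's "speedup" from its end-of-run summary on stdin.
# speedup = total size / (bytes sent + bytes received)
set -euo pipefail

rsync_speedup() {
  tr -d ',' | awk '
    /^sent /          { wire = $2 + $5 }   # "sent N bytes  received M bytes ..."
    /^total size is / { total = $4 }       # "total size is T  speedup is X"
    END {
      if (wire == 0) { print "no rsync summary found"; exit 1 }
      printf "on-wire %.0f bytes, total %.0f bytes, speedup %.2f\n", wire, total, total / wire
    }'
}

rsync_speedup <<'EOF'
sent 137 bytes  received 9810722 bytes  139161.12 bytes/sec
total size is 10000000000  speedup is 1019.28
EOF
```
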


Note: An enhancement request has been opened asking that the ‘dsa’ userID be able to use remote SSH processes, to address any challenges when the IMPD Data DSAs need to be copied or retained for backup processes.

https://community.broadcom.com/participate/ideation-home/viewidea?IdeationKey=7c795c51-d028-4db8-adb1-c9df2dc48bff

Example for vApp Patches:

Note: There is no major difference in speed if the files being copied are already compressed. The image below shows that the initial copy runs at the rate of the network with latency. The value gained from rsync is still its checksum feature, which lets an interrupted copy be rerun without re-sending the files that already completed.
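By default, rsync deletes a partially transferred file when a transfer is interrupted; its --partial option keeps that file so the next run can reuse it. Below is a minimal bash sketch that reruns an rsync copy until it succeeds. The retry_rsync name and the RETRY_DELAY variable are hypothetical, and the example arguments simply reuse the pull command from the patch steps.

```bash
#!/usr/bin/env bash
# Sketch: rerun an rsync copy until it succeeds. --partial keeps a partly
# transferred file so the next attempt can reuse it instead of starting over.
# RETRY_DELAY (seconds between attempts) is a hypothetical knob for this sketch.
set -euo pipefail

retry_rsync() {
  local max=$1 n; shift
  for (( n = 1; n <= max; n++ )); do
    if rsync --partial "$@"; then
      return 0
    fi
    echo "rsync attempt $n/$max failed; retrying" >&2
    sleep "${RETRY_DELAY:-10}"
  done
  return 1
}

# Example, pulling the patches folder from node #1:
# retry_rsync 5 --progress -e 'ssh -ax' -avz config@192.168.242.135:./patches "$HOME"
```
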

vApp Patch process refined to a few lines (to three nodes of a cluster deployment)

# PATCHES
# On Local vApp [as config userID]
mkdir -p patches  && cd patches
curl -L -O ftp://ftp.ca.com/pub/CAIdentitySuiteVA/cumulative-patches/14.3.0/CP-VA-140300-0002.tar.gpg
curl -L -O ftp://ftp.ca.com/pub/CAIdentitySuiteVA/cumulative-patches/14.3.0/CP-IMV-140300-0001.tgz.gpg
screen                                     # opens a new bash shell
patch_vapp CP-VA-140300-0002.tar.gpg       # patch the VA prior to any solution patch
patch_vapp CP-IMV-140300-0001.tgz.gpg
exit                                       # exit screen
cd ..
# Push from one host to another via scp
IP=192.168.242.136;scp -r patches  config@$IP:
IP=192.168.242.137;scp -r patches  config@$IP:
# Push from one host to another via rsync over ssh (minor gain for compressed files)
IP=192.168.242.136;rsync --progress -e 'ssh -ax' -avz $HOME/patches config@$IP:
IP=192.168.242.137;rsync --progress -e 'ssh -ax' -avz $HOME/patches config@$IP:
# Pull from one host to another via rsync over ssh (minor gain for compressed files)
IP=192.168.242.135;rsync --progress -e 'ssh -ax' -avz config@$IP:./patches $HOME

# Verify the patch files were copied
IP=192.168.242.136;ssh -tt config@$IP "ls -lart patches"
IP=192.168.242.137;ssh -tt config@$IP "ls -lart patches"

# On Remote vApp Node #2
IP=192.168.242.136;ssh $IP
cd patches
screen                                     # opens a new bash shell
patch_vapp CP-VA-140300-0002.tar.gpg
patch_vapp CP-IMV-140300-0001.tgz.gpg
exit                                       # exit screen
exit                                       # return to the original host

# On Remote vApp Node #3
IP=192.168.242.137;ssh $IP
cd patches
screen                                     # opens a new bash shell
patch_vapp CP-VA-140300-0002.tar.gpg
patch_vapp CP-IMV-140300-0001.tgz.gpg
exit                                       # exit screen
exit                                       # return to the original host
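Because the patch files are already compressed, the main risk when pushing them is an incomplete or corrupted copy rather than speed. Below is a minimal bash sketch, assuming key-based SSH as the config userID and sha256sum on every node, that pushes the patches folder with the same rsync options used above and then compares checksums. The push_and_verify name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: push the patches folder to each node, then confirm every file arrived
# intact by comparing sha256 checksums. Assumes key-based SSH as the config
# userID and that sha256sum exists on every node.
set -euo pipefail

push_and_verify() {
  local dir=$1; shift                     # e.g. "$HOME/patches"
  local name ip want got rc=0
  name=$(basename -- "$dir")
  want=$(cd -- "$dir" && sha256sum -- * | sort)
  for ip in "$@"; do
    # No trailing slash on $dir: the folder itself lands in the remote home.
    rsync --progress -e 'ssh -ax' -avz "$dir" "config@$ip:"
    got=$(ssh "config@$ip" "cd $name && sha256sum -- * | sort")
    if [ "$got" = "$want" ]; then
      echo "$ip: checksums match"
    else
      echo "$ip: checksum MISMATCH" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example:
# push_and_verify "$HOME/patches" 192.168.242.136 192.168.242.137
```
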

Rotating the SSH RSA key for the CONFIG userID

# CONFIG - On local vApp host
ls -lart .ssh     # view any prior files
echo y | ssh-keygen -t rsa -b 4096 -N Password01 -C $USER -f $HOME/.ssh/id_rsa
IP=192.168.242.135;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
IP=192.168.242.136;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
IP=192.168.242.137;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
cp -r -p .ssh/id_rsa.pub .ssh/authorized_keys
rm -rf /tmp/*.$USER.ssh-keys.tar
tar -cvf /tmp/`/bin/date -u +%s`.$USER.ssh-keys.tar .ssh
ls -lart /tmp/*.$USER.ssh-keys.tar
eval `ssh-agent` && ssh-add           # enter the password for the SSH RSA private key
IP=192.168.242.136;scp `ls /tmp/*.$USER.ssh-keys.tar`  config@$IP:
IP=192.168.242.137;scp `ls /tmp/*.$USER.ssh-keys.tar`  config@$IP:
USER=config;ssh -tt $USER@192.168.242.136 "tar -xvf *.$USER.ssh-keys.tar"
USER=config;ssh -tt $USER@192.168.242.137 "tar -xvf *.$USER.ssh-keys.tar"
IP=192.168.242.136;ssh $IP '/bin/date -u +%s'
IP=192.168.242.137;ssh $IP '/bin/date -u +%s'
IP=192.168.242.136;ssh -vv $IP              # use -vv to troubleshoot the ssh process
IP=192.168.242.137;ssh -vv $IP              # use -vv to troubleshoot the ssh process
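After the key rotation, every node can be checked in one pass. Below is a minimal bash sketch, assuming the rotated key is already loaded in ssh-agent (ssh-add) so that BatchMode can use it without prompting. The check_nodes name is hypothetical. Printing each node's UTC epoch next to the local one also makes clock drift between nodes visible.

```bash
#!/usr/bin/env bash
# Sketch: confirm key-based SSH reaches every node without a password prompt,
# and show each node's UTC epoch next to the local one to spot clock drift.
set -euo pipefail

check_nodes() {
  local user=$1; shift
  local ip remote rc=0
  for ip in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if remote=$(ssh -o BatchMode=yes -o ConnectTimeout=10 "$user@$ip" '/bin/date -u +%s'); then
      echo "$ip OK: remote $remote, local $(date -u +%s)"
    else
      echo "$ip FAILED" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example:
# check_nodes config 192.168.242.135 192.168.242.136 192.168.242.137
```
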

Avoid Data Quality Issues during Testing (TDM)

Why do we see data quality challenges in lower environments (Test, Dev, QA) that we do not see in Production environments?

If the project team was asked to set up the lower environments for a new solution, it may be that TDM (test data management) is not a formal corporate process.

TDM may be described simply as capturing production data, excluding PII (sensitive data), and copying a full or limited set of it to the non-production environments. This data may be copied 1:1 or masked during the process.

A TDM process for a new environment may be a challenge if there is no current production environment, or if the current production environment comes from a prior solution or an M&A (merger/acquisition).

While there are formal paid tools/solutions for TDM, a project team may wish to use command-line (CLI) tools and/or scripts to create this subset of non-PII production data for the lower environments.

This process may be as simple as exporting the full DIT (directory information tree) of an LDAP store with all its current group names, but replacing the userID, full name, and other sensitive data with “dummy/masked” data. This exported data would then be loaded into the lower environments as near-production data, allowing full use-case and negative use-case testing.
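Below is a minimal bash/awk sketch of the masking step. It assumes an LDIF export where users are named by a uid= RDN, person entries carry objectClass person, inetOrgPerson, or organizationalPerson, lines are not folded, and no values are base64-encoded. The attribute list, the userNNNNN dummy format, and the mask_ldif name are hypothetical choices. Group entries keep their names, and member/uniqueMember references are rewritten with the same dummy IDs so that group memberships stay intact.

```bash
#!/usr/bin/env bash
# Sketch: mask personal data in an LDIF export (stdin -> stdout) while keeping
# the DIT and group names. Assumptions (hypothetical, adjust to your directory):
#   - users have RDN "uid=<id>" and a person/inetOrgPerson/organizationalPerson objectClass
#   - no folded lines and no base64 ("attr::") values
#   - the attribute list in split() below holds the sensitive data
set -euo pipefail

mask_ldif() {
  awk '
    BEGIN {
      split("cn sn givenname displayname mail telephonenumber", a, " ")
      for (i in a) mask[a[i]] = 1
    }
    # Return a stable dummy id for each distinct real uid.
    function dummy(id) {
      if (!(id in map)) map[id] = sprintf("user%05d", ++n)
      return map[id]
    }
    # Rewrite every "uid=<value>" RDN inside a DN-valued line.
    function fixdn(line,   out, v) {
      out = ""
      while (match(line, /uid=[^,]+/)) {
        v = substr(line, RSTART + 4, RLENGTH - 4)
        out = out substr(line, 1, RSTART - 1) "uid=" dummy(v)
        line = substr(line, RSTART + RLENGTH)
      }
      return out line
    }
    # Print one buffered entry, masking values as needed.
    function flush(   i, c, attr, val) {
      for (i = 1; i <= cnt; i++) {
        c = index(buf[i], ":")
        attr = tolower(substr(buf[i], 1, c - 1))
        val = substr(buf[i], c + 2)
        if (attr == "dn" || attr == "member" || attr == "uniquemember")
          print fixdn(buf[i])
        else if (attr == "uid")
          print substr(buf[i], 1, c) " " dummy(val)
        else if (person && (attr in mask))
          print substr(buf[i], 1, c) " " dummy(uidval) "-" attr
        else
          print buf[i]
      }
      cnt = 0; person = 0; uidval = ""
    }
    /^$/ { flush(); print; next }
    {
      buf[++cnt] = $0
      if (tolower($0) ~ /^objectclass: *(person|inetorgperson|organizationalperson)$/) person = 1
      if (tolower($0) ~ /^uid:/) uidval = substr($0, index($0, ":") + 2)
    }
    END { flush() }
  '
}

# Example:
# mask_ldif < prod-export.ldif > masked.ldif
```

Each entry is buffered until its blank separator line so that the uid is known before masking attributes such as cn that may appear earlier in the entry.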

The Goal? Avoid show-stopper or high-level issues due to missed data quality concerns during a Go-Live or Business Release Cycle. This is very important when we have a small maintenance window to add new functionality.

Let us help with knowledge transfer and with building representative environments. We often see this challenge with IAM solutions that manage 1000’s of endpoints, where even the basic Active Directory representation is missing the DIT structure and group objects of the project AD domains, especially for M&A business projects.

Writing Successful Test Plans

One of the challenges we see is that project team members dislike writing.

Documentation that is highly visible to business owners and team leads, e.g., business/technical requirements, design, or project management documents, is not greatly impacted, due to the maturity of the senior resources.

However, one area does suffer, and it has an impact on project timelines and future go-live estimates: documentation for test plans, which may be either very simplistic or very detailed.

Projects suffer timeline challenges when test plans and test scripts are too simplistic.

The business QA resources assigned to execute the test plans/test scripts can NOT be assumed to have in-depth background/knowledge of the solution. If the initial conditions and final output (and how to reset them) are not clearly called out, we have seen project timelines drawn out as QA is pushed into a seemingly never-ending cycle of testing.

To ensure your project is successful, demand that the test scripts for the test plans are written as if they were to be executed by your great-grandparents. This includes which hyperlinks to use, which browser to use, which initial conditions to reset, which tool to use to reset them, which steps to follow, how to record the final answer, where to capture the results, and where and how screenshots are to be captured.

The above methodology ensures that we do not have a “black box” of a solution, e.g. something-goes-in and we-hope-that-something-good-comes-out.

With the above process, the QA team lead can then scale out their team as needed.

When expected input/output information is captured, automated testing can be introduced with enhanced reporting and validation. This becomes exponentially more valuable for IAM solutions that manage 100’s of endpoints, from legacy [AS/400, HP-NONSTOP NSK, Mainframe (ACF2/TSS/RACF/TSO)] to SaaS cloud solutions.
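To illustrate how captured input/output lets a script replace a manual checklist, below is a minimal bash sketch of a table-driven runner. The pipe-separated case-file format and the run_cases name are hypothetical; a real project would more likely express the same cases in a tool such as JMeter.

```bash
#!/usr/bin/env bash
# Sketch: a table-driven test runner. Each case records how to reset its initial
# conditions, the step to run, and the expected output, so results are repeatable.
# Hypothetical case-file format, one case per line:  id|reset|command|expected
set -euo pipefail

run_cases() {
  local file=$1 id reset cmd expected actual pass=0 fail=0
  while IFS='|' read -r id reset cmd expected; do
    [[ -z $id || $id == \#* ]] && continue                  # skip blank and comment lines
    bash -c "$reset" </dev/null >/dev/null 2>&1 || true     # restore initial conditions
    actual=$(bash -c "$cmd" </dev/null 2>&1) || true        # run the step, capture output
    if [[ $actual == "$expected" ]]; then
      echo "PASS $id"; pass=$((pass + 1))
    else
      echo "FAIL $id: expected '$expected', got '$actual'"; fail=$((fail + 1))
    fi
  done < "$file"
  echo "passed=$pass failed=$fail"
  (( fail == 0 ))
}

# Example:
# run_cases regression-cases.txt
```

Redirecting each step's stdin from /dev/null keeps a test command from consuming the rest of the case file.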

So don’t hesitate: spend the time and reap the value. Make your grandparents proud!

Transparency through Automated Testing

One of the challenges businesses face on projects is awareness of the true status of tasks.

Project methodology continues to advance with the concepts of Agile project management, which work well for larger projects. One of the value statements of Agile is asking project resources when they can complete a task. This question provides a view into the mindset of the resource, their skill set, and their confidence in meeting the task goal. If the resource is junior or has limited skill in the task, then the effort they estimate for the team will be high. With Agile methodology using this process, it becomes very easy for resources, while they frantically research, to inadvertently drain the project bucket of effort, e.g., a 4-hour task that turns into a week.

Another area with great success in enforcing transparency is automated testing. Automated testing may be used for unit, integration, use-case, performance, and scale testing. However, for project transparency, and to lower business risk and project cost overruns, we would state that the value of automated testing comes from use-case and regression testing.

After technical and business requirements are complete, ensure that the project schedule or WBS (work-breakdown-structure) has a defined milestone to migrate ALL manual use-case testing to automation. A few will consider the effort to convert from manual to automated use-case testing to have little value. However, when the final part of a project is a go-live over a weekend, or a new business release with adjusted business logic, what would you trust to reach your goals 100%?

Below are two (2) common scenarios:

  1. Solution Upgrade Go-Live over a weekend. You have been allocated 48 hours to back up solution data and all platforms, perform a data snapshot, migrate data, integrate with newer solution components (possibly new agents), combine with production data, and validate all use-cases for all business logic, while allowing time for roll-back if, during triage of issues, the business team determines that show-stopper issues will not be addressed in the period. If you fail, you may be allowed one more attempt on another of your weekends, with all 2-20 people.
  2. Solution Business Release Cycle – over a weekend or a business day. You have the option to deploy new business logic to your solution. You can lower the business risk of deploying during a business day, but this requires additional use-case and regression testing. If you have no automation, you will leverage a QA team of 2-10 people to exercise the use-cases, and sometimes negative use-cases.

Math: Assume your solution has twenty (20) use-cases and sub-use-cases, where each use-case may have twenty (20) test scripts. Assume that you have excellent QA/business/technical resources who have adequately captured the initial conditions (which must be reset every time) for each test script and who are checking for data quality challenges as well. Assume each test script takes about ten (10) minutes to execute, and that your QA team resource (not the same skill set) will follow it exactly and record success/failure. Perhaps you have trained them to use QA tools to screen-capture failure messages and assign a technical project team resource to address them.

20 use-cases x 20 scripts/use-case x 10 min/script = 4000 minutes for one QA resource. With 1440 minutes in a day, 4000/1440 = 2.78 days, or 66.7 hours. If we use ten (10) QA business resources, we lower the QA cycle from 66.7 hours to 6.7 hours, but we will be required to “freeze” any additional updates during this QA cycle, which will likely impact our maintenance window for remediation of “found” issues in either scenario above.
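The same arithmetic as a small calculator, so the numbers can be rerun for your own use-case counts (a bash/awk sketch; qa_cycle is a hypothetical name). Note that 4000/1440 = 2.78 days assumes testing around the clock; at 8 working hours per day, one tester needs about 8.3 workdays.

```bash
#!/usr/bin/env bash
# Sketch: the QA-cycle arithmetic as a reusable calculator.
# Arguments: use-cases, scripts per use-case, minutes per script, testers.
set -euo pipefail

qa_cycle() {
  awk -v u="$1" -v s="$2" -v m="$3" -v q="$4" 'BEGIN {
    total = u * s * m                       # minutes for one tester
    printf "%d min = %.1f h for 1 tester (%.1f workdays at 8 h/day); %.1f h for %d testers\n",
      total, total / 60, total / 480, total / 60 / q, q
  }'
}

qa_cycle 20 20 10 10
```
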

Be aware of the “smoke” testing follies. This type of testing still leaves issues “burning.”

Enforce transparency for project owners, project managers, and team members.

Ensure that the effort to build the automated testing is kept for future regression when the new business logic phase is implemented. Prove to yourselves that prior business logic will NOT be impacted.

Many tools can be leveraged for automation, e.g., open-source JMeter (used by many customers), Selenium, and SoapUI, or paid tools such as Broadcom/CA Technologies BlazeMeter.

Let us help.

We firmly believe in, encourage, and perform knowledge transfer to our customers to help them succeed, and to ensure that the introduction of automated testing lowers the TCO of any solution. We can train your staff very quickly to leverage JMeter from their desktops/servers to automate any written test plans for solutions. These JMeter processes can then be shared with all project team members.
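As a starting point for running shared JMeter plans from a desktop or server without the GUI, below is a minimal bash sketch, assuming the jmeter launcher is on the PATH. The plan file name and the run_jmeter_plan function are hypothetical. The -n, -t, -l, -e, and -o options are JMeter's standard non-GUI command-line options.

```bash
#!/usr/bin/env bash
# Sketch: run a saved JMeter test plan headless and keep a timestamped HTML report.
# Assumes the jmeter launcher is on PATH; "plan.jmx" below is a hypothetical file name.
set -euo pipefail

run_jmeter_plan() {
  local plan=$1 outdir=${2:-results}
  local stamp run
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  run="$outdir/$(basename "$plan" .jmx)-$stamp"
  mkdir -p "$run"
  # -n non-GUI, -t test plan, -l raw results file,
  # -e/-o build the HTML dashboard into a folder that does not exist yet.
  jmeter -n -t "$plan" -l "$run/results.jtl" -e -o "$run/report"
  echo "$run"
}

# Example:
# run_jmeter_plan plan.jmx
```

Keeping each run in its own timestamped folder lets the team compare regression results from one business release to the next.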