Kubernetes and Vmware Workstation

Kubernetes was designed for the deployment of applications to cloud architecture with containers. Another way of thinking about Kubernetes; it gets us “out-of-the-install-binaries” business and focuses our efforts on the business value of a solution. We have documented our process of how we train our resources and partners. This process will help your team to excel and gain confidence with cloud technologies.

One of the business challenges of Kubernetes in the cloud architecture is the ongoing cost ($300-$600/month per resource) during the learning or development process. To lower this ongoing cost per resource, we focused on a method to use on-prem Kubernetes deployments.


We have found examples online of using minikube and Oracle Virtualbox to assist with keeping costs low while using an on-prem deployment but did not find many examples of using Vmware Workstation to our satisfaction. Our goal was to utilize a solution that we are very familiar with and has the supporting capabilities for rollback via snapshots.

We have used Vmware Workstation for many years while working on service projects. We cannot overstate its usefulness to offer a “play-ground” and development environment independent of a client’s environment. The features of snapshots allow for negative use-case testing or “what-if” scenarios to destroy or impact solutions being tested with minimal impact.

In this entry, we will discuss the use of Vmware Workstation and CentOS (or Ubuntu) as the primary Kubernetes Nodes. Both CentOS and/or Ubuntu OS are used by the cloud providers as their Kubernetes nodes, so this on-prem process will translate well.

Some of our team members run the Kubernetes environment from their laptop, a collection of individual servers, or a larger server that may scale to the number of vCPU/RAM required for the Kubernetes solution.

Decision 1: Choose an OS to be used.

Either CentOS or Ubuntu OS is acceptable to use for on-prem. When we checked the OSes used by the cloud providers, we noted they used one of these two (2) OS for Linux OS. We decided on CentOS 7, as iptables for routing are used within Kubernetes; and iptables are used by default in CentOS 7. You may find that other OSes will work fine as well.

Decision 2: Build a reference image

Identify all expected binaries to be used within this image. This reference image will be cloned for the Kubernetes control plane node (1) and the worker nodes (3-4). We will also use this image to build a supporting node (non-Kubernetes) for SiteMinder integration and a docker repository for the Kubernetes docker images. For a total of six (6) nodes.

Decision 3: DNS and Certificates

Recommendation: Please do not attempt to deploy a Kubernetes solution on-prem without having purchased a DNS domain/site and use wild card certificates tied to the DNS domain.

Without these two (2) supporting components, it is a challenge to have a working Kubernetes solution that reflects what you will experience in a cloud deployment.

For example, we purchased a domain for $12/year, and then created several “A” records that will host the IP addresses we may use to redirect to cloud or on-prem. Using sub-domains “A” records, we can have as many cloud addresses as we wish.

DNS "A" Records Example:    
aks.iam.anapartner.net (MS Azure),  
eks.iam.anapartner.net (Amazon),  
gke.iam.anapartner.net (Google).      

DNS "CNAME" Records Example:  
alertmanager.aks.iam.anapartner.net, 
grafana.aks.iam.anapartner.net, 
jaeger.aks.iam.anapartner.net,
kibana.aks.iam.anapartner.net, 
mgmt-ssp.aks.iam.anapartner.net, 
sm.aks.iam.anapartner.net, 
ssp.aks.iam.anapartner.net.       
Example of using Synology DNS Server for Kubernetes cluster’s application. With “A” and “CNAME” records.

Finally, we prefer to use wildcard certificates for these domains to avoid challenges within our Kubernetes deployment. There are several services out there offering free certificates.

We chose Let’sEncrypt https://letsencrypt.org/. While Let’sEncrypt has automated processes to renew their certs, we chose to use their DNS validation process with a CertBot solution. We can renew these certificates every 90 days for on-prem usage. The DNS validation process requires a unique string generated by the Let’sEncrypt process to be populated in a DNS “TXT” record like so: _acme-challenge.aks.iam.anapartner.net . See the example at the bottom of this blog entry on this process.

Decision 4: Supporting Components: Storage, Load-Balancing, DNS Resolution (Local)

The last step required for on-prem deployment is where will you decide to place persistence storage for your Kubernetes cluster. We chose to use an NFS share.

We first tested using the control-plane node, then decided to move the NFS share to a Synology NAS solution. Similar for the DNS resolution option, at first we used a DNS service on the control-plane node and then moved to to the Synology NAS solution.

For Load-Balancing, Kubernetes has a service option of NodePort and LoadBalancing. The LoadBalancing service if not deployed in the cloud, will default to NodePort behavior. To introduce load balancing for on-prem, we introduced the HA-proxy service on the control-plane node, along with Kubernetes NodePort service to meet this goal.

After the decisions have been made, we can now walk through the steps to set up a Vmware environment for Kubernetes.

Reference Image

Step 1: Download the OS DVD ISO image for deployment on Vmware Workstation (Centos 7 / Ubuntu ).

Determine specs for the future solution to be deployed on Kubernetes. Some solutions have pods that may require minimal memory/disc space. For the solution we decided on deploying, we confirmed that we need 16 GB RAM and 4vCPU minimal. We have confirmed these specs were required by previously deploying the solution in a cloud environment.

Without these memory/cpu specs, the solution that we chose would pause the deployment of Kubernetes pods to the nodes. You may or may not see error messages in the deployment of pods stating that the nodes did not have enough resources for all or some of the pods.

For disc size, we selected 100 GB to future-proof the solution during testing. For networking, please select BRIDGED mode, to allow the Vmware images to have minimal network issues when routing within your local network. Please avoid double NAT’ing the deployment to reduce your headaches.

Step 2: Install useful base packages and disable any UI tools. Please install an Entropy Daemon to avoid delays due to certificates usage of /dev/random and low entropy.

### UI Update for CentOS7 was stopping yum deployment - not required for our solution to be tested (e.g. VIP Auth Hub)
# su to root to run the below commands.   We will add sudo access later.

su - 
systemctl disable packagekit; systemctl stop packagekit; systemctl status packagekit

### Installed base useful packages.

yum -y install dnf epel-release yum-utils nfs-utils 

### Install useful 2nd tools.

yum -y install openldap-clients jq python3-pip tree

pip3 install yq
yum -y upgrade


### Install Entropy process (epel repo)

dnf -y install haveged
systemctl enable haveged --now

Step 3: Install docker and update the docker configuration for use with Kubernetes. Update the path & storage-driver for the docker images for initial deployment.

Ref: https://docs.docker.com/storage/storagedriver/overlayfs-driver/

### Install Docker repo & docker package

yum-config-manager --add-repo  https://download.docker.com/linux/centos/docker-ce.repo
dnf -y install docker-ce
docker version
systemctl enable docker --now
docker version

### Update docker image info after deployment and restart service

cat << EOF > /etc/docker/daemon.json
{
"debug": false,
"data-root": "/home/docker-images",
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF

### Restart docker to load updated image info.
systemctl restart docker; systemctl status docker; docker version

Step 4: Deploy the three (3) primary Kubernetes & the HELM binaries.

Ensure you select a Kubernetes version that matches what solution you wish to deploy and work with. This can be a gotcha if the Kubernetes binaries update during a dnf / yum upgrade process and your solution has not been vetted for the newer release of Kubernetes. See the reference link below on how to upgrade Kubernetes binaries.

Ref: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

### Add k8s repo

cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

### When upgrading the OS, be sure to use the correct version of kubernetes (remove and add) - Example to force version 1.20.11 ###

dnf upgrade -y
dnf remove -y kubelet kubeadm kubectl
dnf install -y kubelet-1.20.11-0.x86_64 kubeadm-1.20.11-0.x86_64 kubectl-1.20.11-0.x86_64 --disableexcludes=kubernetes


### Start the k8s process.

systemctl enable kubelet --now;  systemctl status kubelet
systemctl daemon-reload && systemctl enable kubelet --now
yum-config-manager --save --setopt=kubernetes.skip_if_unavailable=true

### Add HELM binary 

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

Step 5: OS configurations required or useful for Kubernetes. Kubernetes kubelet binary requires SWAP to be disabled.

Ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

### Stop FirewallD - May add ports later for security

systemctl stop firewalld;systemctl disable firewalld; iptables -F

### Update OS Parameters for kubernetes

setenforce 0
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
modprobe br_netfilter

cat << EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system

### Note:  IP forwarding is enabled by default.

sysctl -a | grep -i forward

### Note: Update /etc/fstab to comment out swap line with # character
### Warning:  kubectl init will fail if swap is left on cp or any worker node.

swapoff -a
sed -i 's|UUID\=\(.*\)-\(.*\)-\(.*\)-\(.*\)-\(.*\) swap|#UUID\=\1-\2-\3-\4-\5 swap|g' /etc/fstab
cat /etc/fstab

Step 6: Create SSH key for root or other services IDs to allow remote script updates from CP to Worker Nodes

### Create SSH key for root to allow remote script updates from CP to Worker Nodes - Enter a Blank/Null PASSWORD.

su - 
rm -rf ~/.ssh; echo y | ssh-keygen -b 4096  -C $USER -f ~/.ssh/id_rsa

### Copy the public rsa key to authorized keys to avoid password between cp/worker nodes for remote ssh commands.

cp -r -p ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys;chmod 600 ~/.ssh/authorized_keys;ls -lart .ssh

### Test for remote connection with no password:   
  
ssh -i ~/.ssh/id_rsa  root@localhost    

### Copy the id_rsa key to your host system for ease of testing.

### Add your local non-root user to sudo wheel group.  Change vip to your user ID.

LOCALUSER=vip
gpasswd -a $LOCALUSER wheel

### Update sudoers file to allow wheel group with no-password

sed -i 's|# %wheel|%wheel|g' /etc/sudoers

###  View update wheel group.

grep "%wheel" /etc/sudoers

# Example of return query.
# %wheel  ALL=(ALL)       ALL
# %wheel  ALL=(ALL)       NOPASSWD: ALL

Step 7: Stop or adjust the OS network manager, shutdown the reference image, and create a Vmware Snapshot

### Adjust or Disable the OS NetworkManager (to avoid overwriting /etc/resolv.conf)
### Important when using an internal DNS server.

systemctl disable NetworkManager;systemctl stop NetworkManager

### reboot CentOS7 Image and validate no issues upon reboot.
reboot

### Shutdown image and manually create snapshot called  "base"

Vmware Workstation Cloning

Step 8: Now that we have a reference image, we can now make clone images for the control-plane (1), the worker nodes (4), and the supporting node (1). This is a fairly quick process.

export BASE=/home/me/vmware/kub
export REF=/home/me/vmware/kub/CentOS7/CentOS7.vmx

VM=cp;mkdir       -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker01;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker02;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker03;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=worker04;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full
VM=sm;mkdir -p $BASE/$VM; time vmrun -T ws clone $REF $BASE/$VM/$VM.vmx -cloneName=$VM -snapshot=base full

Step 9: Start the clone images and remotely assign new hostname/IP addresses to the images

# Start cloned images for CP and Worker Nodes - Update any files as needed. 
 
export DOMAIN=aks.iam.anapartner.net
export PASSWORD_VM=Password01

### Start the cloned images for CP and Worker Nodes.

VM=cp;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker01;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker02;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker03;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=worker04;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
VM=sm;vmrun -T ws start $BASE/$VM/$VM.vmx nogui
vmrun -T ws list | sort -rn


### Update Hostnames for CP and Worker Nodes with Domain.

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash "hostnamectl set-hostname $VM.$DOMAIN" -noWait


### Update IP Address and Domain for NIC (ifcfg-ens33)

export CP=192.168.2.60
export WK1=192.168.2.61
export WK2=192.168.2.62
export WK3=192.168.2.63
export WK4=192.168.2.64
export SM=192.168.2.65

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$CP\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK1\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK2\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK3\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$WK4\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|TYPE=\"Ethernet\"|TYPE=\"Ethernet\"\nIPADDR=$SM\nDOMAIN=$DOMAIN|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait

Step 10: Enable the network gateway, disable DHCP, and reboot the images

export DOMAIN=aks.iam.anapartner.net
export PASSWORD_VM=Password01

### Update to create a new default GATEWAY HOST to address routing issues to external IP addresses.
GATEWAY=192.168.2.1

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|# Created by anaconda|# Created by anaconda\nGATEWAY=$GATEWAY|g' /etc/sysconfig/network" -noWait

### Disable DHCP (to avoid overwriting /etc/resolv.conf)

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "sed -i 's|BOOTPROTO=\"dhcp\"|BOOTPROTO=\"none\"|g'   /etc/sysconfig/network-scripts/ifcfg-ens33" -noWait

 
### Reboot VIP Auth Hub CP and Nodes 

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait

Step 11: Update DNS on the clone images remotely using vmrun

### Update /etc/resolv.conf for correct DNS server.
### Ensure DHCP and Network Manager are disable to prevent these services from overwrite behavior.

export DOMAIN=aks.iam.anapartner.net
export PASSWORD_VM=Password01
DNSNEW=192.168.2.20

VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "echo 'nameserver $DNSNEW' >>  /etc/resolv.conf" -noWait
 
 
### Reboot VIP Auth Hub CP and Nodes
 
VM=cp;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker01;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker02;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker03;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=worker04;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait
VM=sm;vmrun -T ws -gu root -gp $PASSWORD_VM runScriptInGuest $BASE/$VM/$VM.vmx  /bin/bash  "reboot" -noWait

Step 12: Copy the root .ssh public cert to your main host, rename it to a useful name and these test your newly deployed clone images for DSN resolution using ssh. Please confirm this step is successful prior to continuing with the configuration of the control plane and worker nodes.

### Copy the root id_rsa file to host system to allow ease of testing with ssh.

export CP=192.168.2.60
export WK1=192.168.2.61
export WK2=192.168.2.62
export WK3=192.168.2.63
export WK4=192.168.2.64
export SM=192.168.2.65

### Add the hosts for ssh pre-validation. 

ssh-keyscan -p 22 $CP >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK1 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK2 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK3 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $WK4 >> ~/.ssh/known_hosts
ssh-keyscan -p 22 $SM >> ~/.ssh/known_hosts


### Rename from id_rsa to vip_kub_root_id_rsa

ssh -tt -i ~/vip_kub_root_id_rsa root@$CP 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK1 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK2 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK3 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK4 'cat /etc/resolv.conf'
ssh -tt -i ~/vip_kub_root_id_rsa root@$SM 'cat /etc/resolv.conf'


### Validate Access with ssh to CP and Worker Nodes new IP addresses.

FQDN=ssp.aks.iam.anapartner.net
ssh -tt -i ~/vip_kub_root_id_rsa root@$CP  "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK1 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK2 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK3 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$WK4 "ping -c 2 $FQDN"
ssh -tt -i ~/vip_kub_root_id_rsa root@$SM "ping -c 2 $FQDN"

Update CP (controlplane) Node

Step 13a: Copy files to CP Node from Vmware Workstation host and configure the CP node for dedicated CP usage. Recommend using two terminals/sessions to speed up the process. Install HAproxy for Load Balancing, copy the Let’s Encrypt wild card certificates, and copy the Kubernetes solution you will be deploying (scripts/yaml).

### Open Terminal 1 to CP host.
### Add bash completion to have better use of TAB to view parameters.

CP=192.168.2.60
ssh -tt -i ~/vip_kub_root_id_rsa root@$CP
dnf -y install bash-completion
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf'  >>~/.bashrc
kubectl completion bash >/etc/bash_completion.d/kubectl
echo "alias k=kubectl | complete -F __start_kubectl k" >>~/.bashrc

### Install HAProxy and replace the haproxy.cfg file.
dnf -y install haproxy
systemctl enable haproxy --now
netstat -anp | grep -i -e haproxy

### Open Terminal 2 to host and push files to CP node.
### Copy HAProxy configuration, certs, and scripts
scp -i ~/vip_kub_root_id_rsa  haproxy.cfg   root@$CP:/etc/haproxy/haproxy.cfg
scp -i ~/vip_kub_root_id_rsa  cloud-certs-aks-eks-gke_exp-202X-01-12.tar  root@$CP:
scp -i ~/vip_kub_root_id_rsa  202X-11-03_vip_auth_hub_working_centos7_v2.tar   root@$CP:

### On Terminal 1 - on CP host - Restart to use new haproxy configuration file.
systemctl restart haproxy
netstat -anp | grep -i -e haproxy

### Extract CERTS to root home folder
tar -xvf cloud-certs-aks-eks-gke_exp-202X-01-12.tar

### Extract Working Scripts 
tar -xvf 202X-11-03_vip_auth_hub_working_centos7_v2.tar

### Update env variables for unique environment within step00 file.
vi step00_kubernetes_env.sh

### Add the env variables to the .bashrc file
echo ". ./step00_kubernetes_env.sh"

Step 13b: Example of /etc/haproxy/haproxy.cfg configuration for Kubernetes Load Balancing functionality for on-prem worker nodes. HAproxy deployed on control plane (CP) node. The example configuration file will route TCP 80/443/389 to one (1) of the four (4) worker nodes. If a Kubernetes NodePort service is enabled for TCP 389 (31888) ports, then this load balancer will function correctly and route the traffic for LDAP traffic as well.

[root@cp ~]# cat /etc/haproxy/haproxy.cfg
global
    user haproxy
    group haproxy
    chroot /var/lib/haproxy
    log /dev/log    local0
    log /dev/log    local1 notice
defaults
    mode http
    log global
    retries 2
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 10m
    timeout server 10m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000
frontend ingress
    bind *:80
    option tcplog
    mode http
    option forwardfor
    option http-server-close
    default_backend kubernetes-ingress-nodes
backend kubernetes-ingress-nodes
    mode http
    balance roundrobin
    server k8s-ingress-0 worker01.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-1 worker02.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-2 worker03.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-2 worker04.aks.iam.anapartner.net:80 check fall 3 rise 2 send-proxy-v2
frontend ingress-https
    bind *:443
    option tcplog
    mode tcp
    option forwardfor
    option http-server-close
    default_backend kubernetes-ingress-nodes-https
backend kubernetes-ingress-nodes-https
    mode tcp
    balance roundrobin
    server k8s-ingress-0 worker01.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-1 worker02.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-2 worker03.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
    server k8s-ingress-2 worker04.aks.iam.anapartner.net:443 check fall 3 rise 2 send-proxy-v2
frontend ldap
    bind *:389
    option tcplog
    mode tcp
    default_backend kubernetes-nodes-ldap
backend kubernetes-nodes-ldap
    mode tcp
    balance roundrobin
    server k8s-ldap-0 worker01.aks.iam.anapartner.net:31888  check fall 3 rise 2
    server k8s-ldap-1 worker02.aks.iam.anapartner.net:31888  check fall 3 rise 2
    server k8s-ldap-2 worker03.aks.iam.anapartner.net:31888  check fall 3 rise 2
    server k8s-ldap-2 worker04.aks.iam.anapartner.net:31888  check fall 3 rise 2

Deploy Solution on Kubernetes

Step 14: Validate that DNS and Storage are ready before deploying any solution or if you wish to have a base Kubernetes environment to use with the control-plane and four (4). worker nodes.

### Step:  Setup NFS Share either on-prem remote server or Synology NFS
### Use version 4.x checkbox for Synology.

### Example of lines on remote Linux Host with NFS share.

yum -y install nfs-utils
systemctl enable --now nfs-server rpcbind
mkdir -p /export/nfsshare ; chown nobody /export/nfsshare ; chmod -R 777 /export/nfsshare
echo "/export/nfsshare *(rw,sync,no_root_squash,insecure)" >> /etc/exports
exportfs -rav; exportfs -v

firewall-cmd --add-service=nfs --permanent
firewall-cmd --add-service={nfs3,mountd,rpc-bind} --permanent 
firewall-cmd --reload 



#### Setup DNS entries (A and CNAME) for twelve (12) items ( May be on-prem DNS or Synology DNS)

ns.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.60)
aks.iam.anapartner.net  NS ns.aks.iam.anapartner.net
cp.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.60)
worker01.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.61)
worker02.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.62)
worker03.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.63)
worker04.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.64)
sm.aks.iam.anapartner.net  A  IP_ADDRESS (192.168.2.65)
kibana CNAME cp.aks.iam.anapartner.net 
grafana CNAME cp.aks.iam.anapartner.net 
jaeger CNAME cp.aks.iam.anapartner.net 
alertmanager CNAME cp.aks.iam.anapartner.net 
ssp CNAME cp.aks.iam.anapartner.net 
ssp-mgmt CNAME cp.aks.iam.anapartner.net 

### Pre-Step:  Enable DNS resolution for external IP addresses
### Enable forwarding to external h/w router and 8.8.8.8

Step 15: Recommendation. Deploy your solution in steps using Kubernetes yaml or Helm charts to assist with debugging any deployment issues. Do not forget to use kubectl logs, and kubectl describe to isolate startup or cert issues.

### Run scripts one-by-one.  They will have a watch command in each that will 
### provide feedback on the startup processes.
### Total startup from scratch to final with VIP Sample App is about 15-20 minutes.
### Note:  Step04 has a different chart variables for on-prem for Symantec Directory.
### Note:  /step00_kubernetes_env.sh is called by each script.


./step01_kubernetes_cluster_init_with_worker_nodes.sh
./step02_kubernetes_cluster_with_ingress_and_other_charts.sh
./step03_kubernetes_cluster_with_vip_auth_hub_charts.sh
./step04_kubernetes_cluster_with_vip_auth_hub_sample_app.sh

Docker Registry for On-Prem

There are two (2) types of docker registries we have found useful.

a. The standard Mirror method will capture all docker images from “docker.io” site to a local mirror. When Kubernetes or Helm deployments are used, the docker configuration file can be adjusted to check the local mirror without updating Kubernetes yaml files or Helm charts.

b. The second method is a full query of all images after they have been deployed once, and using the docker push process into a local registry. The challenge of the second method is that the Kubernetes yaml files and/or Helm charts do have to be updated to use this local registry.

Either method will help lower bandwidth cost to re-download the same docker images, if you use a docker prune method to keep your worker nodes disc size “clean”. If the docker prune process is not used, you may notice that the worker nodes may run out of disc space due to temporary docker images/containers that did not clean up properly.

#!/bin/bash
#################################################################################
#  Create a local docker mirror registry for docker-ios
#  and local docker non-mirror registry for all other images
#  to minimize download impact
#  during restart of the kubernetes solution
#
#  All registry iamges will be placed on NFS share
#  mount -v -t nfs 192.168.2.30:/volume1/nfs /mnt  &>/dev/null
#
# Certs will be provided by Let's Encrypt every 90 days
#
#  For docker-io mirror registry, all clients must have the following line in
#  /etc/docker/daemon.json     {Note:  Use commas as needed}
#
#    "registry-mirrors":
#     [
#      "https://sm.aks.iam.anapartner.net:444"
#     ],
#
#
#
# ANA 11/2021
#
#################################################################################
# To remove all containers - to allow restart of process
docker rm -f `docker ps -a | grep -v -e CONTAINER | awk '{print $1}'` ; docker image rm `docker image ls | grep -v -e REPOSITORY | grep -e minutes -e hour -e days -e '2 weeks'|  awk '{print $3}'` &>/dev/null


#################################################################################
# Update HOST name for local server for docker image
HOST=sm.aks.iam.anapartner.net
NFS_SERVER=192.168.2.30
NFS_SHARE=/volume1/nfs


#################################################################################
function start_registry {

    local_port=$1
    remote_registry_name=$2

    if [ "$3" == "" ]; then
        remote_registry_url=$remote_registry_name
    else
        remote_registry_url=$3
    fi

    echo -e "$local_port $remote_registry_name $remote_registry_url"


mount -v -t nfs $NFS_SERVER:$NFS_SHARE /mnt  &>/dev/null
mkdir -p /mnt/registry/${remote_registry_name}  &>/dev/null

docker run -d --name registry-${remote_registry_name}-mirror  \
-p $local_port:443 \
--restart=always \
-e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
-e REGISTRY_PROXY_REMOTEURL="https://${remote_registry_url}/" \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/fullchain.pem \
-e REGISTRY_HTTP_TLS_KEY=/certs/privkey.pem \
-e REGISTRY_COMPATIBILITY_SCHEMA1_ENABLED=true \
-v /mnt/registry/certs:/certs \
-v /mnt/registry/${remote_registry_name}:/var/lib/registry \
registry:latest

sleep 1
echo "#################################################################################"
curl -s -X GET  https://$HOST:$local_port/v2/_catalog | jq
echo "#################################################################################"

}

#################################################################################
# start_registry <local_port>    <remote_registry_name>  <remote_registry_url>
#################################################################################

start_registry   444             docker-io               registry-1.docker.io

#################################################################################
# Non-Proxy configuration to allow 'docker tag & docker push' for all other images
#################################################################################

remote_registry_name=all
local_port=455
mkdir -p /var/lib/docker/registry/${remote_registry_name}  &>/dev/null
docker run -d --name registry-${remote_registry_name}-mirror  \
-p $local_port:443 \
--restart=always \
-e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/fullchain.pem \
-e REGISTRY_HTTP_TLS_KEY=/certs/privkey.pem \
-e REGISTRY_COMPATIBILITY_SCHEMA1_ENABLED=true \
-v /mnt/registry/certs:/certs \
-v /mnt/registry/${remote_registry_name}:/var/lib/registry \
registry:latest

sleep 1
echo "#################################################################################"
curl -s -X GET  https://$HOST:$local_port/v2/_catalog | jq
echo "#################################################################################"
docker ps -a
echo "#################################################################################"

echo "##### To tail the log of the docker-io container - useful for monitoring helm deployments  #####"
echo "docker logs `docker ps -a  --no-trunc | grep -v NAMES | grep 'docker-io' | awk '{print $1}'` -f "
echo "#################################################################################"
echo "##### To tail the log of the ALL container - useful for monitoring helm deployments  #####"
echo "docker logs `docker ps -a  --no-trunc | grep -v NAMES | grep 'all' | awk '{print $1}'` -f  "
echo "#################################################################################"
echo "##### Location of Registry Files on NFS share #####"
echo "ls -lart /mnt/registry/docker-io/docker/registry/v2/repositories"
echo "ls -lart /mnt/registry/all/docker/registry/v2/repositories"
echo "#################################################################################"

Example of the /etc/docker/daemon.json configuration file to use a local mirror for docker.io. See the parameter of “registry-mirrors”. Unfortunately, we were unable to use this process for the other docker registries.

{
"debug": false,
"data-root": "/home/docker-images",
"exec-opts": ["native.cgroupdriver=systemd"],
"storage-driver": "overlay2",
"registry-mirrors":
[
"https://sm.aks.iam.anapartner.net:444"
],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
}
}

Let’s Encrypt Certbot and DNS validation

Use Let’sEncrypt Certbox and manual DNS validation, to create our 90-day wild card certificates. Manual DNS validation allows us to avoid setting up a public-facing component for our internal labs.

Ref: https://letsencrypt.org/docs/challenge-types/

# Step 1:  Install SNAP service for Certbot usage on your host OS

cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 (Ootpa)

sudo yum install -y  snapd
Updating Subscription Management repositories.
Package snapd-2.49-2.el8.x86_64 is already installed.

systemctl enable --now snapd.socket

### Wait 1 min

snap install core; sudo snap refresh core



# Step 2: Remove prior certbot (if installed by yum/dnf)

yum remove -y certbot.


# Step 3:  Install new "classic" Certbot

sudo snap install --classic certbot
certbot 1.17.0 from Certbot Project (certbot-eff✓) installed

sudo ln -s /snap/bin/certbot /usr/bin/certbot



# Step 4: Issue certbot command with wildcard cert & update your DNS TXT record with the string provided.


sudo certbot certonly --manual  --preferred-challenges dns -d *.aks.iam.anapartner.org --register-unsafely-without-email

Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf. You must
agree in order to register with the ACME server. Do you agree?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: Y
Account registered.
Requesting a certificate for *.aks.iam.anapartner.org

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please deploy a DNS TXT record under the name:

_acme-challenge.iam.anapartner.org.

with the following value:

u2cXXXXXXXXXXXXXXXXXXXXc

Before continuing, verify the TXT record has been deployed. Depending on the DNS
provider, this may take some time, from a few seconds to multiple minutes. You can
check if it has finished deploying with aid of online tools, such as the Google
Admin Toolbox: https://toolbox.googleapps.com/apps/dig/#TXT/_acme-challenge.iam.anapartner.org.
Look for one or more bolded line(s) below the line ';ANSWER'. It should show the
value(s) you've just added.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Step 5:  In a 2nd terminal, validate that the DNS record has been updated and can be seen by a standard DNS query.   Have the 2nd console window open to test the DNS record, prior to <ENTER> key on verification request

# Example:
nslookup -type=txt _acme-challenge.aks.iam.anapartner.org
Non-authoritative answer:
_acme-challenge.aks.iam.anapartner.org  text = "u2cXXXXXXXXXXXXXXXXXXXXc"


# Step 6:  Press <ENTER> after you have validated the TXT record.

Press Enter to Continue
Waiting for verification...
Cleaning up challenges
Subscribe to the EFF mailing list (email: nala@baugher.us).

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/aks.iam.anapartner.org/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/aks.iam.anapartner.org/privkey.pem
  


# Step 7: View certs of fullchain.pem & privkey.pem  

cat /etc/letsencrypt/live/aks.iam.anapartner.org/fullchain.pem
-----BEGIN CERTIFICATE-----

<REMOVED>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<REMOVED>
-----END CERTIFICATE-----

cat /etc/letsencrypt/live/aks.iam.anapartner.org/privkey.pem
-----BEGIN PRIVATE KEY-----

<REMOVED>
-----END PRIVATE KEY-----




# Step 8:  Use the two files for your kubernetes solution 

# Step 9:  Ensure domain on host OS, cp, worker nodes in /etc/resolv.conf is set correctly to aks.iam.anapartner.org    to allow the certs to be resolved correctly.

# Step 10:  Ensure Synology NAS DNS service is configurated with all alias 


# Step 11:  Optional: Validate certs with openssl


# Show the kubernetes self-signed cert

true | openssl s_client -connect kibana.aks.iam.anapartner.org:443 2>/dev/null | openssl x509 -inform pem -noout -text

# Show the new wildcard cert for same hostname &  port

curl -vvI  https://kibana.aks.iam.anapartner.org/app/home#/

curl -vvI  https://kibana.aks.iam.anapartner.org/app/home#/   2>&1 | awk 'BEGIN { cert=0 } /^\* SSL connection/ { cert=1 } /^\*/ { if (cert) print }'

nmap -p 443 --script ssl-cert kibana.aks.iam.anapartner.org


Kubernetes Side Note:   Let's Encrypt certs do NOT show up within the Kubernetes cluster certs check process.

kubeadm certs check-expiration

View of the DNS TXT records to be updated with your DNS service provider. The Let’sEncrypt Certbot will need to be able to query these records for it to assign you wildcard certificates. Create the _acme-challenge hostname entry as a TXT type, and paste in the string provided by the Let’sEncrypt Certbot process. Wait 5 minutes or test the TXT record with nslookup, then upon positive validation, continue the Let’sEncrypt Certbot process.

View your kubernetes cluster / nodes for any constraints

After your cluster is created and you have worker nodes joined to the cluster, you may wish to monitor for any constraints of your on-prem deployment. Kubectl command with the action verb of describe or top is very useful for this goal.

kubectl describe nodes worker01
kubectl top node / kubectl top pod

Kubernetes Training (Formal)

If you are new to Kubernetes, we recommend the following class. You may need to dedicate 4-8 weeks to complete the course and then take the CKA exam via the Linux Foundation.

https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/ .

Kubernetes.io site has most of the information you need to get started.

https://kubernetes.io/docs/reference/kubectl/cheatsheet/

Global Password Reset

The recent DNS challenges for a large organization that impacted their worldwide customers bring to mind a project we completed this year, a global password reset redundancy solution.

We worked with a client who desired to manage unplanned WAN outages to their five (5) data centers for three (3) independent MS Active Directory Domains with integration to various on-prem applications/ endpoints. The business requirement was for self-service password sync, where the users’ password change process is initialed/managed by the two (2) different MS Active Directory Password Policies.

Without the WAN outage requirement, any IAM/IAG solution may manage this request within a single data center. A reverse password sync agent process is enabled on all writable MS Active Directory domain controllers (DC). All the world-wide MS ADS domain controllers would communicate to the single data center to validate and resend this password change to all of the users’ managed endpoint/application accounts, e.g. SAP, Mainframe (ACF2/RACF/TSS), AS/400, Unix, SaaS, Database, LDAP, Certs, etc.

With the WAN outage requirement, however, a queue or components must be deployed/enabled at each global data center, so that password changes are allowed to sync locally to avoid work-stoppage and async-queued to avoid out-of-sync password to the other endpoint/applications that may be in other data centers.

We were able to work with the client to determine that their current IAM/IAG solution would have the means to meet this requirement, but we wished to confirm no issues with WAN latency and the async process. The WAN latency was measured at less than 300 msec between remote data centers that were opposite globally. The WAN latency measured is the global distance and any intermediate devices that the network traffic may pass through.

To review the solution’s ability to meet the latency issues, we introduced a test environment to emulate the global latency for deployment use-cases, change password use-cases, and standard CrUD use-cases. There is a feature within VMWare Workstation, that allows emulation of degraded network traffic. This process was a very useful planning/validation tool to lower rollback risk during production deployment.

VMWare Workstation Network Adapter Advance Settings for WAN latency emulation

The solution used for the Global Password Rest solution was Symantec Identity Suite Virtual Appliance r14.3cp2. This solution has many tiers, where select components may be globally deployed and others may not.

We avoided any changes to the J2EE tier (Wildfly) or Database for our architecture as these components are not supported for WAN latency by the Vendor. Note: We have worked with other clients that have deployment at two (2) remote data centers within 1000 km, that have reported minimal challenges for these tiers.

We focused our efforts on the Provisioning Tier and Connector Tier. The Provisioning Tier consists of the Provisioning Server and Provisioning Directory.

The Provisioning Server has no shared knowledge with other Provisioning Servers. The Provisioning Directory (Symantec Directory) is where the provisioning data may be set up in a multi-write peer model. Symantec Directory is a proper X.500 directory with high redundancy and is designed to manage WAN latency between remote data centers and recovery after an outage. See example provided below.

https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/directory/14-1/ca-directory-concepts/directory-replication/multiwrite-mw-replication.html

The Connector Tier consists of the Java Connector Server and C++ Connector Server, which may be deployed on MS Windows as an independent component. There is no shared knowledge between Connector Servers, which works in our favor.

Requirement:

Three (3) independent MS Active Directory domain in five (5) remote data centers need to allow self-service password change & allow local password sync during a WAN outage. Passwords changes are driven by MS ADS Password Policies (every N days). The IME Password Policy for IAG/IAM solution is not enabled, IME authentication is redirected to an ADS domain, and the IMPS IM Callback Feature is disabled.

Below is an image that outlines the topology for five (5) global data centers in AMER, EMEA, and APAC.

The flow diagram below captures the password change use-case (self-service or delegated), the expected data flow to the user’s managed endpoints/applications, and the eventual peer sync of the MS Active Directory domain local to the user.

Observation(s):

The standalone solution of Symantec IAG/IAM has no expected challenges with configurations, but the Virtual Appliance offers pre-canned configurations that may impact a WAN deployment.

During this project, we identified three (3) challenges using the virtual appliance.

Two (2) items needed the assistance of the Broadcom Support and Engineering teams. They were able to work with us to address deployment configuration challenges with the “check_cluster_clock_sync -v ” process that incorrectly increments time delays between servers instead of resetting a value of zero between testing between servers.

Why this is important? The “check_cluster_clock_sync” alias is used during auto-deployment of vApp nodes. If the time reported between servers is > 15 seconds then replication may fail. This time check issue was addressed with a hotfix. After the hot-fix was deployed, all clock differences were resolved.

The second challenge was a deployment challenge of the IMPS component for its embedded “registry files/folders”. The prior embedded copy process was observed to be using standard “scp”. With a WAN latency, the scp copy operation may take more than 30 seconds. Our testing with the Virtual Appliance showed that a simple copy would take over two (2) minutes for multiple small files. After reviewing with CA support/engineering, they provided an updated copy process using “rsync” that speeds up copy performance by >100x. Before this update, the impact was provisioning tier deployment would fail and partial rollback would occur.

The last challenge we identified was using the Symantec Directory’s embedded features to manage WAN latency via multi-write HUB groups. The Virtual Appliance cannot automatically manage this feature when enabled in the knowledge files of the provisioning data DSAs. Symantec Directory will fail to start after auto-deployment.

Fortunately, on the Virtual appliance, we have full access to the ‘dsa’ service ID and can modify these knowledge files before/after deployment. Suppose we wish to roll back or add a new Provisioning Server Virtual Appliance. In that case, we must disable the multi-write HUB group configuration temporarily, e.g. comment out the configuration parameter and re-init the DATA DSAs.

Six (6) Steps for Global Password Reset Solution Deployment

We were able to refine our list of steps for deployment using pre-built knowledge files and deployment of the vApp nodes in blank slates with the base components of Provisioning Server (PS) and Provisioning Directory) with a remote MS Windows server for the Connector Server (JCS/CCS).

Step 1: Update Symantec Directory DATA DSA’s knowledge configuration files to use the multiple group HUB model. Note that multi-write group configuration is enabled within the DATA DSA’s *.dxc files. One Directory servers in each data center will be defined as a “HUB”.

Ref: https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/directory/14-1/ca-directory-concepts/directory-replication/multiwrite-mw-groups-hubs/topology-sample-and-disaster-recovery.html

To assist this configuration effort, we leveraged a serials of bash shell scripts that could be pasted into multiple putty/ssh sessions on each vApp to replace the “HUB” string with a “sed” command.

After the HUB model is enabled (stop/start the DATA DSAs), confirm that delayed WAN latency has no challenge with Symantec Directory sync processes. By monitoring the Symantec Directory logs during replication, we can see that sync operation with the WAN latency is captured with the delay > 1 msecs between data centers AMER1 and APAC1.

Step 2: Update IMPS configurations to avoid delays with Global Password Reset solution.

Note for this architecture, we do not use external IME Password Policies. We ensure that each AD endpoint has the checkbox enabled for “Password synchronization agent is installed” & each Global User (GU) has “Enable Password Synchronization Agent” checkbox enabled to prevent data looping. To ensure this GU attribute is always enabled, we updated an attribute under “Create Users Default Attributes”.

Step 3a: Update the Connector Tier (CCS Component)

Ensure that the MS Windows Environmental variables for the CCS connector are defined for Failover (ADS_FAILOVER) and Retry (ADS_RETRY).

Step 3b: Update the CCS DNS knowledge file of ADS DCs hostnames.

Important Note: Avoid using the refresh feature “Refresh DC List” within the IMPS GUI for the ADS Endpoint. If this feature is used, then a “merge” will be processed from the local CCS DNS file contents and what is defined within the IMPS GUI refresh process. If we wish to manage the redirection to local MS ADS Domain Controllers, we need to control this behavior. If this step is done, we can clean out the Symantec Directory of extra entries. The only negative aspect is the local password change may attempt to communicate to one of the remote MS ADS Domain Controllers that are not within the local data center. During a WAN outage, a user would notice a delay during the password change event while the CCS connector timed out the connection until it connected to the local MS ADS DC.

Step 3c: CCS ADS Failover

If using SSL over TCP 636 confirm the ADS Domain Root Certificate is deployed to the MS Windows Server where the CCS service is deployed. If using SASL over TCP 389 (if available), then no additional effort is required.

If using SSL over TCP 636, use the MS tool certlm.msc to export the public root CA Certificate for this ADS Domain. Export to base64 format for import to the MS Windows host (if not already part of the ADS Domain) with the same MS tool certlm.msc.

Step 4a: Update the Connector Tier for the JCS component.

Add the stabilization parameter “maxWait” to the JCS/CCS configuration file. Recommend 10-30 seconds.

Step 4b: Update JCS registration to the IMPS Tier

You may use the Virtual Appliance Console, but this has a delay when pulling the list of any JCS connector that may be down at this time of the check/submission. If we use the Connector Xpress UI, we can accomplish the same process much faster with additional flexibility for routing rules to the exact MS ADS Endpoints in the local data center.

Step 4c: Observe the IMPS routing to JCS via etatrans log during any transaction.

If any JCS service is unavailable (TCP 20411), then the routing rules process will report a value of 999.00, instead of a low value of 0.00-1.00.

Step 5: Update the Remote Password Change Agent (DLL) on MS ADS Domain Controllers (writable)

Step 6a: Validation of Self-Service Password Change to selected MS ADS Domain Controller.

Using various MS Active Directory processes, we can emulate a delegated or self-service password change early during the configuration cycle, to confirm deployment is correct. The below example uses MS Powershell to select a writable MS ADS Domain Controller to update a user’s password. We can then monitor the logs at all tiers for completion of this password change event.

A view of the password change event from the Reverse Password Sync Agent log file on the exact MS Domain Controller.

Step 6b: Validation of password change event via CCS ADS Log.

Step 6c: Validation of password change event via IMPS etatrans log

Note: Below screenshot showcases alias/function to assist with monitoring the etatrans logs on the Virtual Appliance.

Below screen shot showcases using ldapsearch to check timestamps for before/after of password change event within MS Active Directory Domain.

We hope these notes are of some value to your business and projects.

Appendix

Using the MS Windows Server for CCS Server 

Get current status of AD account on select DC server before Password Change:

PowerShell Example:

get-aduser -Server dc2012.exchange2020.lab   "idmpwtest"  -properties passwordlastset, passwordneverexpires | ft name, passwordlastset

LdapSearch Example:  (using ldapsearch.exe from CCS bin folder - as the user with current password.)

C:\> & "C:\Program Files (x86)\CA\Identity Manager\Connector Server\ccs\bin\ldapsearch.exe" -LLL -h dc2012.exchange2012.lab -p 389 -D "cn=idmpwtest,cn=Users,DC=exchange2012,DC=lab" -w "Password05" -b "CN=idmpwtest,CN=Users,DC=exchange2012,DC=lab" -s base pwdLastSet

Change AD account's password via Powershell:
PowerShell Example:

Set-ADAccountPassword -Identity "idmpwtest" -Reset -NewPassword (ConvertTo-SecureString -AsPlainText "Password06" -Force) -Server dc2016.exchange.lab

Get current status of AD account on select DC server after Password Change:

PowerShell Example:

get-aduser -Server dc2012.exchange2020.lab   "idmpwtest"  -properties passwordlastset, passwordneverexpires | ft name, passwordlastset

LdapSearch Example:  (using ldapsearch.exe from CCS bin folder - as the user with NEW password)

C:\> & "C:\Program Files (x86)\CA\Identity Manager\Connector Server\ccs\bin\ldapsearch.exe" -LLL -h dc2012.exchange2012.lab -p 389 -D "cn=idmpwtest,cn=Users,DC=exchange2012,DC=lab" -w "Password06" -b "CN=idmpwtest,CN=Users,DC=exchange2012,DC=lab" -s base pwdLastSet

Using the Provisioning Server for password change event

Get current status of AD account on select DC server before Password Change:
LDAPSearch Example:   (From IMPS server - as user with current password)

LDAPTLS_REQCERT=never  ldapsearch -LLL -H ldaps://192.168.242.154:636 -D 'CN=idmpwtest,OU=People,dc=exchange2012,dc=lab'  -w  Password05   -b "CN=idmpwtest,OU=People,dc=exchange2012,dc=lab" -s sub dn pwdLastSet whenChanged


Change AD account's password via ldapmodify & base64 conversion process:
LDAPModify Example:

BASE64PWD=`echo -n '"Password06"' | iconv -f utf8 -t utf16le | base64 -w 0`
ADSHOST='192.168.242.154'
ADSUSERDN='CN=Administrator,CN=Users,DC=exchange2012,DC=lab'
ADSPWD='Password01!’

ldapmodify -v -a -H ldaps://$ADSHOST:636 -D "$ADSUSERDN" -w "$ADSPWD" << EOF
dn: CN=idmpwtest,OU=People,dc=exchange2012,dc=lab 
changetype: modify
replace: unicodePwd
unicodePwd::$BASE64PWD
EOF

Get current status of AD account on select DC server after Password Change:
LDAPSearch Example:   (From IMPS server - with user's account and new password)

LDAPTLS_REQCERT=never  ldapsearch -LLL -H ldaps://192.168.242.154:636 -D 'CN=idmpwtest,OU=People,dc=exchange2012,dc=lab' -w  Password06   -b "CN=idmpwtest,OU=People,dc=exchange2012,dc=lab" -s sub dn pwdLastSet whenChanged

LDAP MITM Methodology to isolate data challenge

The Symantec (CA/Broadcom) Directory solution provides a mechanism for routing LDAPv3 traffic to other solutions. This routing mechanism allows Symantec Directory to act as a virtual directory service for other directories, e.g., MS Active Directory, SunOne, Novell eDirectory, etc.


The Symantec Identity Suite solution uses the LDAP protocol for its mid-tier and connector-tier components. The Provisioning Server is exposed on TCP 20389/20390, the JCS (Java Connector Server) is exposed on TCP 20410/20411, and the CCS (C++ Connector Server) is exposed on TCP 20402/20403.


We wished to isolate provisioning data challenges within the Symantec Identity Management solution that was not fully viewable using the existing debugging logs & features of the provisioning tier & connector tiers. Using Symantec Directory, we can leverage the routing mechanism to build a MITM (man-in-the-middle) methodology to track all LDAP traffic through the Symantec Identity Manager connector tier.


We focused on the final leg of provisioning and created a process to track the JCS -> CCS LDAP traffic. We wanted to understand what and how the data was being sent from the JCS to the CCS to isolate issues to the CCS service and MS Active Directory. Using the trace level of Symantec Directory, we can capture all LDAP traffic, including binds/queries/add/modify actions.

The below steps showcase how to use Symantec Directory as an approved MITM process for troubleshooting exercises. We found this process more valuable than deploying Wireshark on the JCS/CCS Server and decoding the encrypted traffic for LDAP.

Background:

Symantec Directory documentation on routing. Please note the concept / feature of “set transparent-routing = true;” to avoid schema challenges when routing to other directory/ldap solutions.

https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/directory/14-1/ca-directory-concepts/directory-distribution-and-routing.html

MITM Methodology for JCS->CCS Service:

The Symantec Identity Management connector tier may be deployed on MS Windows or Linux OS. If the CCS service is being used, then MS Windows OS is required for this MS Visual C++ component/service. As we are focused on the CCS service, we will introduce the Symantec Directory solution on the same MS Windows OS.

NOTE: We will keep the MITM process contained on a single host, and will not redirect the network traffic beyond the host.

Step 1: Deploy the latest Symantec Directory solution on MS Windows OS. This deployment is a blank slate for the next steps to follow.

Step 2: Copy the folders of schema, limits, and ssld from an existing Symantec Directory deployment of the Symantec Identity Manager solution. Using the existing schema files, references, and certificates will allow us to avoid any challenges during startup of the Router DSA due to the pre-defined provisioning/connector tier configurations. Please note when copying from a Linux OS version of Symantec Directory, we will need to update the path from Linux format to MS Windows format in the SSLD impd.dxc file for “cert-dir” and “ca-file” parameters.

# DXserver/config/ssld/impd.dxc

set ssl = {
cert-dir = "C:\Program Files\CA\Directory\dxserver\config\ssld\personalities"
ca-file = "C:\Program Files\CA\Directory\dxserver\config\ssld\impd_trusted.pem"
cipher = "HIGH:!SSLv2:!EXP:!aNULL:!eNULL"
#protocol = tlsv12
fips = false
};

Step 3: Create a new Router DSA DXI configuration file. This is the primary configuration file for Symantec Directory DSA. It will referenced the schema, knowledge, limits, and certificates for the DSA. Note the parameters for “transparent-routing” to avoid schema challenges with other solutions. Note the trace level used to trace the LDAP traffic in the Symantec Directory Router DSA trace log.

# DXserver/config/servers/admin_router_ccs_30402.dxi

# logging and tracing 
close summary-log; 
close trace-log; 
source "../logging/default.dxc"; 
 
# schema 
clear schema; 
source "../schema/impd.dxg";
 
# access controls 
clear access; 
# source "../access/"; 
 
# ssld
source "../ssld/impd.dxc";

# knowledge 
clear dsas; 
source "../knowledge/admin_router_ccs_group.dxg"; 
 
# operational settings 
source "../settings/default.dxc"; 
 
# service limits 
source "../limits/impd.dxc"; 

# database  - none - transparent router
set transparent-routing=TRUE;

# tunnel through eAdmin server error code and  messages
set route-non-compliant-ldap-error-codes = true;

set trace=ldap,time,stats;
#set trace=dsa,time;

Step 4: Create the three (3) knowledge files. The “group” knowledge file will be used to redirect to the other two (2) knowledge files of the router DSA and the re-direct DSA to the CCS service.

# DXserver/config/knowledge/admin_router_ccs_group.dxg 
# The admin_router_ccs_30402.dxc PORT 30402 
# will be used for the IAMCS (JCS) CCS port override configuration file
# server_ccs.properties via proxyConnectionConfig.proxyServerPort=30402

source "admin_router_ccs_30402.dxc";
source "admin_ccs_server_01.dxc";
# DXserver/config/knowledge/admin_router_ccs_30402.dxc 
# This file is sourced by admin_router_ccs_group.dxg.
 
set dsa admin_router_ccs_30402 =  
{ 
    prefix        = <> 
    dsa-name      = <dc etasa><cn admin_router_ccs_30402> 
    dsa-password  = "secret"
    address       = ipv4 localhost port 30402
    snmp-port     = 22500
    console-port  = 22501
    auth-levels   = clear-password
    dsp-idle-time = 100000 
    trust-flags = allow-check-password, trust-conveyed-originator
    link-flags    = ssl-encryption-remote
};
# DXserver/config/knowledge/admin_ccs_server_01.dxc
# This file is sourced by admin_router_ccs_group.dxg.

set dsa admin_ccs_server_01 =  
{ 
     prefix        = <dc etasa> 
     dsa-name      = <dc etasa><cn admin_ccs_server_01> 
     dsa-password  = "secret"
     address       = ipv4 localhost port 20402
     auth-levels   = clear-password
     dsp-idle-time = 100000
     dsa-flags     = load-share
     trust-flags   = allow-check-password, no-server-credentials, trust-conveyed-originator
     link-flags    = dsp-ldap
     #link-flags    = dsp-ldap, ssl-encryption
     # Note:  ssl will require update to /etc/hosts with:  <IP_Address>  eta_server

};

Step 5: Update the JCS configuration file that contains the TCP port that we will be redirecting to. In this example, we will declare TCP 30402 to be the new port.

#C:\Program Files (x86)\CA\Identity Manager\Connector Server\jcs\conf\override\server_ccs.properties

ccsWindowsController.ccsScriptPath=C:\\Program Files (x86)\\CA\\Identity Manager\\Connector Server\\ccs\\bin
proxyCCSManager.enabled=true
proxyCCSManager.startupWait=30
proxyConnectionConfig.proxyServerHostname=localhost
#proxyConnectionConfig.proxyServerPort=20402
proxyConnectionConfig.proxyServerPort=30402
proxyConnectionConfig.proxyServerUser=cn=root,dc=etasa
proxyConnectionConfig.proxyServerPassword={AES}pbj27RvWGakDKCr+DhRH4Q==
proxyConnectionConfig.proxyServerUseSsl=false
proxyCCSManager.controller.ref=ccsWindowsController

Overview of all files updated and their relationship to each other.

Validation

Start up the solution in the following order. Ensure that the new Symantec Directory Router DSA is starting with no issue. If there are any syntax issues, isolate them with the command: dxserver -d start DSA_NAME.

Start the Router DSA first, then restart the im_jcs (JCS) service. The im_ccs (CCS) service will be auto-started by the JCS service. Wait one (1) minute, then check that both TCP Ports 20402 (CCS) and 30402 (Router DSA) are both in the LISTEN state. If we do not see these both ports, please stop and restart these services.

May use MS Sysinternals ProcessExplorer to monitor both services and using the TCP/IP tab, to view which ports are being used.

A view of the im_ccs.exe and dxserver.exe services and which TCP ports they are listening on.

Use a 3rd party LDAP client tool, such as Jxplorer to authenticate to both the CCS and the Router DSA ports, with the embedded service ID of “cn=root,dc=etasa”. We should see exactly the SAME data.

Use the IME or IMPS to perform a query to MS Active Directory (or any other endpoint that uses the CCS connector tier). We should now see the “cache” on the CCS service be populated with the endpoint information, and the base DN structure. We can now track all LDAP traffic through the Router DSA MITM process.

View of trace logs

We can monitor when the JCS first binds to the CCS service.

We can monitor when the IMPS via the JCS queries if the CCS is aware of the ADS endpoint.

Finally, we can view when the IMPS service decrypt its stored information on the Active Directory endpoint, and push this information to the CCS cache, to allow communication to MS Active Directory. Using Notepad++ we can tail the trace log.

Please note, this is a secure LDAP/S tunnel from the IMPS -> JCS -> CCS -> MS ADS.

We can now view how this data is pushed via this secure tunnel with the MITM process.

> [88] 
> [88] <-- #1 LDAP MESSAGE messageID 5
> [88] AddRequest
> [88]  entry: eTADSDirectoryName=ads2016,eTNamespaceName=ActiveDirectory,dc=im,dc=etasa
> [88]  attributes:
> [88]   type: eTADSobjectCategory
> [88]   value: CN=Domain-DNS,CN=Schema,CN=Configuration,DC=exchange,DC=lab
> [88]   type: eTADSdomainFunctionality
> [88]   value: 7
> [88]   type: eTADSUseSSL
> [88]   value: 3
> [88]   type: eTADSexchangeGroups
> [88]   value: CN=Mailbox Database 0840997559,CN=Databases,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=ExchangeLab,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=lab
> [88]   value: CN=im,CN=Databases,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=ExchangeLab,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=lab
> [88]   type: eTLogWindowsEventSeverity
> [88]   value: FE
> [88]   type: eTAccountResumable
> [88]   value: 1
> [88]   type: eTADSnetBIOS
> [88]   value: EXCHANGE
> [88]   type: eTLogStdoutSeverity
> [88]   value: FE
> [88]   type: eTLog
> [88]   value: 0
> [88]   type: eTLogUnicenterSeverity
> [88]   value: FE
> [88]   type: eTADSlockoutDuration
> [88]   value: -18000000000
> [88]   type: objectClass
> [88]   value: eTADSDirectory
> [88]   type: eTLogETSeverity
> [88]   value: FE
> [88]   type: eTADSmsExchSystemObjectsObjectVersion
> [88]   value: 13240
> [88]   type: eTADSsettings
> [88]   value: 3
> [88]   type: eTADSconfig
> [88]   value: ExpirePwd=0
> [88]   value: HomeDirInheritPermission=0
> [88]   type: eTLogDestination
> [88]   value: F
> [88]   type: eTADSUserContainer
> [88]   value: CN=BuiltIn;CN=Users
> [88]   type: eTADSbackupDirs
> [88]   value: 000;DEFAULT;192.168.242.156;0
> [88]   value: 001;DEFAULT;dc2016.exchange.lab;0
> [88]   value: 002;site1;server1.domain.com;0
> [88]   value: 003;site1;server2.domain.com;0
> [88]   value: 004;site2;server3.domain.com;0
> [88]   value: 005;site2;server4.domain.com;0
> [88]   type: eTADSuseFailover
> [88]   value: 1
> [88]   type: eTLogAuditSeverity
> [88]   value: FE
> [88]   type: eTADS-DefaultContext
> [88]   value: exchange.lab
> [88]   type: eTADSforestFunctionality
> [88]   value: 7
> [88]   type: eTADSAuthDN
> [88]   value: Administrator
> [88]   type: eTADSlyncMaxConnection
> [88]   value: 5
> [88]   type: eTADShomeMTA
> [88]   value: CN=Microsoft MTA,CN=EXCHANGE2016,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=ExchangeLab,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=lab
> [88]   type: eTADSAuthPWD
> [88]   value: CAdemo123
> [88]   type: eTADSexchangelegacyDN
> [88]   value: /o=ExchangeLab/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=EXCHANGE2016/cn=Microsoft Private MDB
> [88]   type: eTLogFileSeverity
> [88]   value: F
> [88]   type: eTADSprimaryServer
> [88]   value: dc2016.exchange.lab
> [88]   type: eTADScontainers
> [88]   value: CN=Builtin,DC=exchange,DC=lab
> [88]   value: CN=Computers,DC=exchange,DC=lab
> [88]   value: OU=Domain Controllers,DC=exchange,DC=lab
> [88]   value: OU=Explore,DC=exchange,DC=lab
> [88]   value: CN=ForeignSecurityPrincipals,DC=exchange,DC=lab
> [88]   value: CN=Keys,DC=exchange,DC=lab
> [88]   value: CN=Managed Service Accounts,DC=exchange,DC=lab
> [88]   value: OU=Microsoft Exchange Security Groups,DC=exchange,DC=lab
> [88]   value: OU=o365,DC=exchange,DC=lab
> [88]   value: OU=People,DC=exchange,DC=lab
> [88]   value: CN=Program Data,DC=exchange,DC=lab
> [88]   value: CN=Users,DC=exchange,DC=lab
> [88]   value: DC=ForestDnsZones,DC=exchange,DC=lab
> [88]   value: DC=DomainDnsZones,DC=exchange,DC=lab
> [88]   type: eTADSTimeBoundMembershipsEnabled
> [88]   value: 0
> [88]   type: eTADSexchange
> [88]   value: 1
> [88]   type: eTADSdomainControllerFunctionality
> [88]   value: 7
> [88]   type: eTADSexchangeStores
> [88]   value: CN=EXCHANGE2016,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=ExchangeLab,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=lab
> [88]   value: CN=Mailbox,CN=Transport Configuration,CN=EXCHANGE2016,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=ExchangeLab,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=lab
> [88]   value: CN=Frontend,CN=Transport Configuration,CN=EXCHANGE2016,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=ExchangeLab,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=lab
> [88]   type: eTADSKeepCamCaftFiles
> [88]   value: 0
> [88]   type: eTADSmsExchSchemaVersion
> [88]   value: 15333
> [88]   type: eTADSCamCaftTimeout
> [88]   value: 0000001800
> [88]   type: eTADSMaxConnectionsInPool
> [88]   value: 0000000101
> [88]   type: eTADSPortNum
> [88]   value: 389
> [88]   type: eTADSDCDomain
> [88]   value: DC=exchange,DC=lab
> [88]   type: eTADSServerName
> [88]   value: 192.168.242.156
> [88]   type: eTADSDirectoryName
> [88]   value: ads2016
> [88]   type: eTAccountDeletable
> [88]   value: 1
> [88] controls:
> [88]   controlType: 2.16.840.1.113730.3.4.2
> [88]   non-critical

We can now monitor all traffic and assist with troubleshooting any CCS/MS-ADS challenges.

This same MITM methodology/process may also be used for the IMPS (TCP 20389/2039) and the JCS (TCP 20410/20411) services. We have used this process to capture the IME (JIAM) LDAP traffic to the IMPS Service, to isolate multiple queries for Child Provisioning Roles. Which has been used by the product team to enhance the solution to lower startup durations of the IME in the latest releases.

Binds/queries/add/modification all work with this approach, but we do see an issue with OID for IMPS ADS endpoint “explore process” on ADS OU object. We are reviewing how to address this last challenge that states “critical extension is unavailable” for a LDAP control property of the OU object. The OIDs captured appear to be related to SunOne/Iplanet.

Avoid locking a userID in a Virtual Appliance

The below post describes enabling the .ssh private key/public key process for the provided service IDs to avoid dependency on a password that may be forgotten, and also how to leverage the service IDs to address potential CA Directory data sync challenges that may occur when there are WAN network latency challenges between remote cluster nodes.

Background:

The CA/Broadcom/Symantec Identity Suite (IGA) solution provides for a software virtual appliance. This software appliance is available on Amazon AWS as a pre-built AMI image that allows for rapid deployment.

The software appliance is also offered as an OVA file for Vmware ESXi/Workstation deployment.

Challenge:

If the primary service ID is locked or password is allowed to expire, then the administrator will likely have only two (2) options:

1) Request assistance from the Vendor (for a supported process to reset the service ID – likely with a 2nd service ID “recoverip”)

2) Boot from an ISO image (if allowed) to mount the vApp as a data drive and update the primary service ID.

Proposal:

Add a standardized SSH RSA private/pubic key to the primary service ID, if it does not exist. If it exists, validate able to authentication and copy files between cluster nodes with the existing .SSH files. Rotate these files per internal security policies, e.g. 1/year.

The focus for this entry is on the CA ‘config’ and ‘ec2-user’ service IDs.

An enhancement request has been added, to have the ‘dsa’ userID added to the file’/etc/ssh/ssh_allowed_users’ to allow for the same .ssh RSA process to address challenges during deployments where the CA Directory Data DSA did not fully copy from one node to another node.

https://community.broadcom.com/participate/ideation-home/viewidea?IdeationKey=7c795c51-d028-4db8-adb1-c9df2dc48bff

AWS vApp: ‘ec2-user’

The primary service ID for remote SSH access is ‘ec2-user’ for the Amazon AWS is already deployed with a .ssh RSA private/public key. This is a requirement for AWS deployments and has been enabled to use this process.

This feature allows for access to be via the private key from a remote SSH session using Putty/MobaXterm or similar tools. Another feature may be leveraged by updating the ‘ec2-user’ .ssh folder to allow for other nodes to be exposed with this service ID, to assist with the deployment of patch files.

As an example, enabling .ssh service between multiple cluster nodes will reduce scp process from remote workstations. Prior, if there were five (5) vApp nodes, to patch them would require uploading the patch direct to each of the five (5) nodes. With enabling .ssh service between all nodes for the ‘ec2-user’ service ID, we only need to upload patches to one (1) node, then use a scp process to push these patch file(s) from one node to another cluster node.

On-Prem vApp: ‘config’

We wish to emulate this process for on-prem vApp servers to reduce I/O for any files to be uploaded and/or shared.

This process has strong value when CA Directory *.db files are out-of-sync or during initial deployment, there may be network issues and/or WAN latency.

Below is an example to create and/or rotate the private/public SSH RSA files for the ‘config’ service ID.

An example to create and/or rotate the private/public SSH RSA files for the ‘config’ service ID.

Below is an example to push the newly created SSH RSA files to the remote host(s) of the vApp cluster. After this step, we can now use scp processes to assist with remediation efforts within scripts without a password stored as clear text.

Copy the RSA folder to your workstation, to add to your Putty/MobaXterm or similar SSH tool, to allow remote authentication using the public key.

If you have any issues, use the embedded verbose logging within the ssh client tool (-vv) to identify the root issue.

ssh -vv userid@remote_hostname

Example:

config@vapp0001 VAPP-14.1.0 (192.168.242.146):~ > eval `ssh-agent` && ssh-add
Agent pid 5717
Enter passphrase for /home/config/.ssh/id_rsa:
Identity added: /home/config/.ssh/id_rsa (/home/config/.ssh/id_rsa)
config@vapp0001 VAPP-14.1.0 (192.168.242.146):~ >
config@vapp0001 VAPP-14.1.0 (192.168.242.146):~ > ssh -vv config@192.168.242.128
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.242.128 [192.168.242.128] port 22.
debug1: Connection established.
debug1: identity file /home/config/.ssh/identity type -1
debug1: identity file /home/config/.ssh/identity-cert type -1
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug2: key_type_from_name: unknown key type 'Proc-Type:'
debug2: key_type_from_name: unknown key type 'DEK-Info:'
debug2: key_type_from_name: unknown key type '-----END'
debug1: identity file /home/config/.ssh/id_rsa type 1
debug1: identity file /home/config/.ssh/id_rsa-cert type -1
debug1: identity file /home/config/.ssh/id_dsa type -1
debug1: identity file /home/config/.ssh/id_dsa-cert type -1
debug1: identity file /home/config/.ssh/id_ecdsa type -1
debug1: identity file /home/config/.ssh/id_ecdsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug2: fd 3 setting O_NONBLOCK
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-dss-cert-v00@openssh.com,ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96
debug2: kex_parse_kexinit: hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-sha1,hmac-sha2-256,hmac-sha2-512
debug2: kex_parse_kexinit: hmac-sha1,hmac-sha2-256,hmac-sha2-512
debug2: kex_parse_kexinit: none
debug2: kex_parse_kexinit: none
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: found hmac-sha1
debug1: kex: server->client aes128-ctr hmac-sha1 none
debug2: mac_setup: found hmac-sha1
debug1: kex: client->server aes128-ctr hmac-sha1 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<2048<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug2: dh_gen_key: priv key bits set: 141/320
debug2: bits set: 1027/2048
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host '192.168.242.128' is known and matches the RSA host key.
debug1: Found key in /home/config/.ssh/known_hosts:2
debug2: bits set: 991/2048
debug1: ssh_rsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /home/config/.ssh/id_rsa (0x5648110d2a00)
debug2: key: /home/config/.ssh/identity ((nil))
debug2: key: /home/config/.ssh/id_dsa ((nil))
debug2: key: /home/config/.ssh/id_ecdsa ((nil))
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug2: we did not send a packet, disable method
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure.  Minor code may provide more information
Improper format of Kerberos configuration file

debug1: Unspecified GSS failure.  Minor code may provide more information
Improper format of Kerberos configuration file

debug2: we did not send a packet, disable method
debug1: Next authentication method: publickey
debug1: Offering public key: /home/config/.ssh/id_rsa
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-rsa blen 533
debug2: input_userauth_pk_ok: SHA1 fp 39:06:95:0d:13:4b:9a:29:0b:28:b6:bd:3d:b0:03:e8:3c:ad:50:6f
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug2: channel 0: send open
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug2: callback start
debug2: client_session2_setup: id 0
debug2: channel 0: request pty-req confirm 1
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
debug2: channel 0: request env confirm 0
debug2: channel 0: request shell confirm 1
debug2: fd 3 setting TCP_NODELAY
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel_input_status_confirm: type 99 id 0
debug2: PTY allocation request accepted on channel 0
debug2: channel 0: rcvd adjust 2097152
debug2: channel_input_status_confirm: type 99 id 0
debug2: shell request accepted on channel 0
Last login: Thu Apr 30 20:21:48 2020 from 192.168.242.146

CA Identity Suite Virtual Appliance version 14.3.0 - SANDBOX mode
FIPS enabled:                   true
Server IP addresses:            192.168.242.128
Enabled services:
Identity Portal               192.168.242.128 [OK] WildFly (Portal) is running (pid 10570), port 8081
                                              [OK] Identity Portal Admin UI is available
                                              [OK] Identity Portal User Console is available
                                              [OK] Java heap size used by Identity Portal: 810MB/1512MB (53%)
Oracle Database Express 11g   192.168.242.128 [OK] Oracle Express Edition started
Identity Governance           192.168.242.128 [OK] WildFly (IG) is running (pid 8050), port 8082
                                              [OK] IG is running
                                              [OK] Java heap size used by Identity Governance: 807MB/1512MB (53%)
Identity Manager              192.168.242.128 [OK] WildFly (IDM) is running (pid 5550), port 8080
                                              [OK] IDM environment is started
                                              [OK] idm-userstore-router-caim-srv-01 started
                                              [OK] Java heap size used by Identity Manager: 1649MB/4096MB (40%)
Provisioning Server           192.168.242.128 [OK] im_ps is running
                                              [OK] co file usage: 1MB/250MB (0%)
                                              [OK] inc file usage: 1MB/250MB (0%)
                                              [OK] main file usage: 9MB/250MB (3%)
                                              [OK] notify file usage: 1MB/250MB (0%)
                                              [OK] All DSAs are started
Connector Server              192.168.242.128 [OK] jcs is running
User Store                    192.168.242.128 [OK] STATS: number of objects in cache: 5
                                              [OK] file usage: 1MB/200MB (0%)
                                              [OK] UserStore_userstore-01 started
Central Log Server            192.168.242.128 [OK] rsyslogd (pid  1670) is running...
=== LAST UPDATED: Fri May  1 12:15:05 CDT 2020 ====
*** [WARN] Volume / has 13% Free space (6.2G out of 47G)
config@cluster01 VAPP-14.3.0 (192.168.242.128):~ >

A view into rotating the SSH RSA keys for the CONFIG UserID

# CONFIG - On local vApp host
ls -lart .ssh     [view any prior files]
echo y | ssh-keygen -b 4096 -N Password01 -C $USER -f $HOME/.ssh/id_rsa
IP=192.168.242.135;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
IP=192.168.242.136;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
IP=192.168.242.137;ssh-keyscan -p 22 $IP >> .ssh/known_hosts
cp -r -p .ssh/id_rsa.pub .ssh/authorized_keys
rm -rf /tmp/*.$USER.ssh-keys.tar
tar -cvf /tmp/`/bin/date -u +%s`.$USER.ssh-keys.tar .ssh
ls -lart /tmp/*.$USER.ssh-keys.tar
eval `ssh-agent` && ssh-add           [Enter Password for SSH RSA Private Key]
IP=192.168.242.136;scp `ls /tmp/*.$USER.ssh-keys.tar`  config@$IP:
IP=192.168.242.137;scp `ls /tmp/*.$USER.ssh-keys.tar`  config@$IP:
USER=config;ssh -tt $USER@192.168.242.136 "tar -xvf *.$USER.ssh-keys.tar"
USER=config;ssh -tt $USER@192.168.242.137 "tar -xvf *.$USER.ssh-keys.tar"
IP=192.168.242.136;ssh $IP `/bin/date -u +%s`
IP=192.168.242.137;ssh $IP `/bin/date -u +%s`
IP=192.168.242.136;ssh -vv $IP              [Use -vv to troubleshoot ssh process]
IP=192.168.242.137;ssh -vv $IP 				[Use -vv to troubleshoot ssh process]

Be safe and automate your backups for CA Directory Data DSAs to LDIF

The CA Directory solution provides a mechanism to automate daily on-line backups, via one simple parameter:

dump dxgrid-db period 0 86400;

Where the first number is the offset from GMT/UTC (in seconds) and the second number is how often to run the backup (in seconds), e.g. Once a day = 86400 sec = 24 hr x 60 min/hr x 60 sec/min

Two Gaps/Challenge(s):

History: The automated backup process will overwrite the existing offline file(s) (*.zdb) for the Data DSA. Any requirement or need to perform a RCA is lost due to this fact. What was the data like 10 days ago? With the current state process, only the CA Directory or IM logs would be of assistance.

Size: The automated backup will create an offline file (*.zdb) footprint of the same size as the data (*.db) file. If your Data DSA (*.db) is 10 GB, then your offline (*.zdb) will be 10 GB. The Identity Provisioning User store has four (4) Data DSAs, that would multiple this number , e.g. four (4) db files + four (4) offline zdb files at 10 GB each, will require minimal of 80 GB disk space free. If we attempt to retain a history of these files for fourteen (14) days, this would be four (4) db + fourteen (14) zdb = eighteen (18) x 10 GB = 180 GB disk space required.

Resolutions:

Leverage the CA Directory tool (dxdumpdb) to convert from the binary data (*.db/*.zdb) to LDIF and the OS crontab for the ‘dsa’ account to automate a post ‘online backup’ export and conversion process.

Step 1: Validate the ‘dsa’ user ID has access to crontab (to avoid using root for this effort). cat /etc/cron.allow

If access is missing, append the ‘dsa’ user ID to this file.

Step 2: Validate that online backup process have been scheduled for your Data DSA. Use a find command to identify the offline files (*.zdb ). Note the size of the offline Data DSA files (*.zdb).

Step 3: Identify the online backup process start time, as defined in the Data DSA settings DXC file or perhaps DXI file. Convert this GMT offset time to the local time on the CA Directory server. (See references to assist)

Step 4: Use crontab -e as ‘dsa’ user ID, to create a new entry: (may use crontab -l to view any entries). Use the dxdumpdb -z switch with the DSA_NAME to create the exported LDIF file. Redirect this output to gzip to automatically bypass any need for temporary files. Note: Crontab has limited variable expansion, and any % characters must be escaped.

Example of the crontab for ‘dsa’ to run 30 minutes after (at 2 am CST) the online backup process is scheduled (at 1:30 am CST).

# Goal:  Export and compress the daily DSA offline backup to ldif.gz at 2 AM every day
# - Ensure this crontab runs AFTER the daily automated backup (zdb) of the CA Directory Data DSAs
# - Review these two (2) tokens for DATA DSAs:  ($DXHOME/config/settings/impd.dxc  or ./impd_backup.dxc)
#   a)   Location:  set dxgrid-backup-location = "/opt/CA/Directory/dxserver/backup/";
#   b)   Online Backup Period:   dump dxgrid-db period 0 86400;
#
# Note1: The 'N' start time of the 'dump dxgrid-db period N M' is the offset in seconds from midnight of UTC
#   For 24 hr clock, 0130 (AM) CST calculate the following in UTC/GMT =>  0130 CST + 6 hours = 0730 UTC
#   Due to the six (6) hour difference between CST and UTC TZ:  7.5 * 3600 = 27000 seconds
# Example(s):
#   dump dxgrid-db period 19800 86400;   [Once a day at 2330 CST]
#   dump dxgrid-db period 27000 86400;   [Once a day at 0130 CST]
#
# Note2:  Alternatively, may force an online backup using this line:
#               dump dxgrid-db;
#        & issuing this command:  dxserver init all
#
#####################################################################
#        1      2         3       4       5        6
#       min     hr      d-o-m   month   d-o-w   command(s)
#####################################################################
#####
#####  Testing Backup Every Five (5) Minutes ####
#*/5 * * * *  . $HOME/.profile && dxdumpdb -z `dxserver status | grep "impd-main" | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-main" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
#####
#####  Backup daily at 2 AM CST  -  30 minutes after the online backup at 1:30 AM CST #####
#####
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-main"   | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-main"   | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-co"     | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-co"     | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-inc"    | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-inc"    | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz
0 2 * * *    . $HOME/.profile &&  dxdumpdb -z `dxserver status | grep "impd-notify" | awk "{print $1}"` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-notify" | awk '{print $1}'`_`/bin/date --utc +\%Y\%m\%d\%H\%M\%S.0Z`.ldif.gz

Example of the above lines that can be placed in a bash shell, instead of called directly via crontab. Note: Able to use variables and no need to escape the `date % characters `

# set DSA=main &&   dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
# set DSA=co &&     dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
# set DSA=inc &&    dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
# set DSA=notify && dxdumpdb -z `dxserver status | grep "impd-$DSA" | awk '{print $1}'` | gzip -9 > /tmp/`hostname`_`dxserver status | grep "impd-$DSA" | awk '{print $1}'`_`/bin/date --utc +%Y%m%d%H%M%S.0Z`.ldif.gz
#

Example of the output:

Monitor with tail -f /var/log/cron (or syslog depending on your OS version), when the crontab is executed for your ‘dsa’ account

View the output folder for the newly created gzip LDIF files. The files may be extracted back to LDIF format, via gzip -d file.ldif.gz. Compare these file sizes with the original (*.zdb) files of 2GB.

Recommendation(s):

Implement a similar process and retain this data for fourteen (14) days, to assist with any RCA or similar analysis that may be needed for historical data. Avoid copied the (*.db or *.zdb) files for backup, unless using this process to force a clean sync between peer MW Data DSAs.

The Data DSAs may be reloaded (dxloadb) from these LDIF snapshots; the LDIF files do not have the same file size impact as the binary db files; and as LDIF files, they may be quickly search for prior data using standard tools such as grep “text string” filename.ldif.

This process will assist in site preparation for a DAR (disaster and recovery) scenario. Protect your data.

References:

dxdumpdb

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-identity-and-access-management/directory/14-1/administrating/tools-to-manage-ca-directory/dxtools/dxdumpdb-tool-export-data-from-a-datastore-to-an-ldif-file.html

dump dxgrid-db

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-identity-and-access-management/directory/14-1/reference/commands-reference/dump-dxgrid-db-command-take-a-consistent-snapshot-copy-of-a-datastore.html

If you wish to learn more or need assistance, contact us.

API Gateway and Docker Lab

While assisting a site with their upgrade process from CA API Gateway 9.2 (docker) to the latest CA API Gateway 9.4 image, we needed to clarify the steps. In this blog entry, we have capture our validation processes of the documented and undocumented features of API Gateway docker deployment ( https://hub.docker.com/r/caapim/gateway/ ), pedantic verbose steps to assist with training of staff resources; and enhanced the external checks for a DAR (disaster and recovery) scenario using docker & docker-compose tools.

Please use this lab to jump start your knowledge of the tools: ‘docker’, ‘docker-compose’ and the API Gateway. We have added many checks and the use of bash shell to view the contents of the API Gateway containers. If you have additional notes/tips, please leave a comment.

To lower business risk during this exercise, we made the follow decisions:

1) Avoid use of default naming conventions, to prevent accidental deletion of the supporting MySQL database for CA API Gateway. The default ‘docker-compose.yml’ was renamed as appropriate for each API Gateway version.

2) Instead of using different folders to host configuration files, we defined project names as part of the startup process for docker-compose.

3) Any docker container updates would reference the BASH shell directly instead of a soft-link, to avoid different behaviors between the API GW container and the MySQL container.

Challenges:

Challenge #1: Both the API Gateway 9.2 and 9.4 docker container have defects with regards to using the standardized ‘docker stop/start containerID‘ process. API Gateway 9.2 would not restart cleanly; and API Gateway 9.4 container would not update the embedded health check process, e.g. docker ps -a OR docker inspect containerID

Resolution #1: Both challenges were addressed in the enclosed testing scripts. Docker-compose is used exclusively for API Gateway 9.2 container, and touching an internal file in the API Gateway 9.4 container.

Challenge #2: The docker parameters between API Gateway 9.2 and API Gateway 9.4 had changed.

Resolution #2: Identify the missing parameters with ‘docker logs containerID’ and review of the embedded deployment script of ‘entrypoint.sh’

Infrastructure: Seven (7) files were used for this lab on CentOS 7.x (/opt/docker/api)

  1. ssg_license.xml (required from Broadcom/CA Sales Team – ask for 90 day trial if a current one is not available)
  2. docker-compose-ssg94.yml (the primary install configuration file for API GW 9.4)
  3. docker-compose-ssg92.yml (the primary install configuration file for API GW 9.2)
  4. docker-compose-ssg94-join-db.xml (the restart configuration file – use as needed)
  5. docker-compose-ssg92-join-db.xml (the restart configuration file – use as needed)
  6. 01_create_both_ssg92_and_ssg94_docker_deployments.sh (The installation of ‘docker’ and ‘docker-compose’ with the deployment of API GW 9.2 [with MySQL 5.5] and API GW 9.4 [with MySQL 5.7] ; with some additional updates)
  7. 02_backup_and_migrate_mysql_ssg_data_ from_ssg92_to_ssg94_db.sh (The export/import process from API GW 9.2 to API GW 9.4 and some additional checks)

Example of the seven (7) lab files’ contents:

  1. ssg_license.xml ( a view of the header only )
<?xml version="1.0" encoding="UTF-8"?>
<license Id="5774266080443298199" xmlns="http://l7tech.com/license">
    <description>LIC-PRODUCTION</description>
    <licenseAttributes/>
    <valid>2018-12-10T19:32:31.000Z</valid>
    <expires>2019-12-11T19:32:31.000Z</expires>
    <host name=""/>
    <ip address=""/>
    <product name="Layer 7 SecureSpan Suite">
        <version major="9" minor=""/>
        <featureset name="set:Profile:EnterpriseGateway"/>
    </product>

2. docker-compose-ssg94.yml

version: "2.2"
services:
    ssg94:
      container_name: ssg94
      image: caapim/gateway:latest
      mem_limit: 4g
      volumes:
         - /opt/docker/api/ssg_license.xml:/opt/SecureSpan/Gateway/node/default/etc/bootstrap/license/license.xml
      expose:
      - "8777"
      - "2142"
      ports:
        - "8443:8443"
        - "9443:9443"
      environment:
        ACCEPT_LICENSE: "true"
        SSG_CLUSTER_COMMAND: "create"
        SSG_CLUSTER_HOST: "localhost"
        SSG_CLUSTER_PASSWORD: "7layer"
        SSG_DATABASE_TYPE: "mysql"
        SSG_DATABASE_HOST: "mysql57"
        SSG_DATABASE_PORT: "3306"
        SSG_DATABASE_NAME: "ssg"
        SSG_DATABASE_USER: "gateway"
        SSG_DATABASE_PASSWORD: "7layer"
        SSG_DATABASE_JDBC_URL: "jdbc:mysql://mysql57:3306/ssg?useSSL=false"
        SSG_DATABASE_WAIT_TIMEOUT: "120"
        SSG_DATABASE_ADMIN_USER: "root"
        SSG_DATABASE_ADMIN_PASS: "7layer"
        SSG_ADMIN_USERNAME: "pmadmin"
        SSG_ADMIN_PASSWORD: "7layer"
        SSG_INTERNAL_SERVICES: "restman wsman"
        EXTRA_JAVA_ARGS: "-Dcom.l7tech.bootstrap.env.license.enable=false -Dcom.l7tech.bootstrap.autoTrustSslKey=trustAnchor,TrustedFor.SSL,TrustedFor.SAML_ISSUER -Dcom.l7tech.server.transport.jms.topicMasterOnly=false  -Dcom.l7tech.service.metrics.enabled=false -Dcom.l7tech.server.disableFileLogsinks=false "
      links:
        - mysql57
    mysql57:
      container_name: ssg94_mysql57
      image: mysql:5.7
      restart: always
      mem_limit: 2g
      ports:
       - "3306:3306"
      environment:
         - MYSQL_ROOT_PASSWORD=7layer
         - MYSQL_USER=gateway
         - MYSQL_PASSWORD=7layer
         - MYSQL_DATABASE=ssg

3. docker-compose-ssg92.yml

version: "2.2"
services:
    ssg92:
      container_name: ssg92
      image: caapim/gateway:9.2.00-9087_CR10
      mem_limit: 4g
      expose:
      - "8778"
      - "2143"
      ports:
        - "8444:8443"
        - "9444:9443"
      environment:
        SKIP_CONFIG_SERVER_CHECK: "true"
        ACCEPT_LICENSE: "true"
        SSG_CLUSTER_COMMAND: "create"
        SSG_CLUSTER_HOST: "localhost"
        SSG_CLUSTER_PASSWORD: "7layer"
        SSG_DATABASE_TYPE: "mysql"
        SSG_DATABASE_HOST: "mysql55"
        SSG_DATABASE_PORT: "3306"
        SSG_DATABASE_NAME: "ssg"
        SSG_DATABASE_USER: "root"
        SSG_DATABASE_PASSWORD: "7layer"
        SSG_DATABASE_JDBC_URL: "jdbc:mysql://mysql55:3306/ssg?useSSL=false"
        SSG_DATABASE_WAIT_TIMEOUT: "120"
        SSG_DATABASE_ADMIN_USER: "root"
        SSG_DATABASE_ADMIN_PASS: "7layer"
        SSG_ADMIN_USERNAME: "pmadmin"
        SSG_ADMIN_PASSWORD: "7layer"
        SSG_ADMIN_USER: "pmadmin"
        SSG_ADMIN_PASS: "7layer"
        SSG_INTERNAL_SERVICES: "restman wsman"
        EXTRA_JAVA_ARGS: "-Dcom.l7tech.bootstrap.env.license.enable=true -Dcom.l7tech.bootstrap.autoTrustSslKey=trustAnchor,TrustedFor.SSL,TrustedFor.SAML_ISSUER -Dcom.l7tech.server.transport.jms.topicMasterOnly=false  -Dcom.l7tech.service.metrics.enabled=false "
        SSG_LICENSE: "$SSG_LICENSE_ENV"
      links:
        - mysql55
    mysql55:
      container_name: ssg92_mysql55
      image: mysql:5.5
      restart: always
      mem_limit: 2g
      ports:
      - "3307:3306"
      environment:
        - MYSQL_ROOT_PASSWORD=7layer

4. docker-compose-ssg94-join-db.yml

version: "2.2"
services:
    ssg94:
      container_name: ssg94
      image: caapim/gateway:latest
      mem_limit: 4g
      volumes:
         - /opt/docker/api/ssg_license.xml:/opt/SecureSpan/Gateway/node/default/etc/bootstrap/license/license.xml
      expose:
      - "8777"
      - "2142"
      ports:
        - "8443:8443"
        - "9443:9443"
      environment:
        ACCEPT_LICENSE: "true"
        #SSG_CLUSTER_COMMAND: "create"
        SSG_CLUSTER_COMMAND: "join"
        SSG_CLUSTER_HOST: "localhost"
        SSG_CLUSTER_PASSWORD: "7layer"
        SSG_DATABASE_TYPE: "mysql"
        SSG_DATABASE_HOST: "mysql57"
        SSG_DATABASE_PORT: "3306"
        SSG_DATABASE_NAME: "ssg"
        SSG_DATABASE_USER: "gateway"
        SSG_DATABASE_PASSWORD: "7layer"
        SSG_DATABASE_JDBC_URL: "jdbc:mysql://mysql57:3306/ssg?useSSL=false"
        SSG_DATABASE_WAIT_TIMEOUT: "120"
        SSG_DATABASE_ADMIN_USER: "root"
        SSG_DATABASE_ADMIN_PASS: "7layer"
        SSG_ADMIN_USERNAME: "pmadmin"
        SSG_ADMIN_PASSWORD: "7layer"
        SSG_INTERNAL_SERVICES: "restman wsman"
        EXTRA_JAVA_ARGS: "-Dcom.l7tech.bootstrap.env.license.enable=false -Dcom.l7tech.bootstrap.autoTrustSslKey=trustAnchor,TrustedFor.SSL,TrustedFor.SAML_ISSUER -Dcom.l7tech.server.transport.jms.topicMasterOnly=false  -Dcom.l7tech.service.metrics.enabled=false -Dcom.l7tech.server.disableFileLogsinks=false "
      links:
        - mysql57
    mysql57:
      container_name: ssg94_mysql57
      image: mysql:5.7
      restart: always
      mem_limit: 2g
      ports:
       - "3306:3306"
      environment:
         - MYSQL_ROOT_PASSWORD=7layer
         - MYSQL_USER=gateway
         - MYSQL_PASSWORD=7layer
         - MYSQL_DATABASE=ssg

5. docker-compose-ssg92-join-db.yml

version: "2.2"
services:
    ssg92:
      container_name: ssg92
      image: caapim/gateway:9.2.00-9087_CR10
      mem_limit: 4g
      expose:
      - "8778"
      - "2143"
      ports:
        - "8444:8443"
        - "9444:9443"
      environment:
        SKIP_CONFIG_SERVER_CHECK: "true"
        ACCEPT_LICENSE: "true"
        SSG_CLUSTER_COMMAND: "join"
        SSG_CLUSTER_HOST: "localhost"
        SSG_CLUSTER_PASSWORD: "7layer"
        SSG_DATABASE_TYPE: "mysql"
        SSG_DATABASE_HOST: "mysql55"
        SSG_DATABASE_PORT: "3306"
        SSG_DATABASE_NAME: "ssg"
        SSG_DATABASE_USER: "root"
        SSG_DATABASE_PASSWORD: "7layer"
        SSG_DATABASE_JDBC_URL: "jdbc:mysql://mysql55:3306/ssg?useSSL=false"
        SSG_DATABASE_WAIT_TIMEOUT: "120"
        SSG_DATABASE_ADMIN_USER: "root"
        SSG_DATABASE_ADMIN_PASS: "7layer"
        SSG_ADMIN_USERNAME: "pmadmin"
        SSG_ADMIN_PASSWORD: "7layer"
        SSG_ADMIN_USER: "pmadmin"
        SSG_ADMIN_PASS: "7layer"
        SSG_INTERNAL_SERVICES: "restman wsman"
        EXTRA_JAVA_ARGS: "-Dcom.l7tech.bootstrap.env.license.enable=true -Dcom.l7tech.bootstrap.autoTrustSslKey=trustAnchor,TrustedFor.SSL,TrustedFor.SAML_ISSUER -Dcom.l7tech.server.transport.jms.topicMasterOnly=false  -Dcom.l7tech.service.metrics.enabled=false "
        SSG_LICENSE: "$SSG_LICENSE_ENV"
      links:
        - mysql55
    mysql55:
      container_name: ssg92_mysql55
      image: mysql:5.5
      restart: always
      mem_limit: 2g
      ports:
      - "3307:3306"
      environment:
        - MYSQL_ROOT_PASSWORD=7layer

6. 01_create_both_ssg92_and_ssg94_docker_deployments.sh

#!/bin/bash
##################################################################
#
# Script to validate upgrade process from CA API GW 9.2 to 9.4 with docker
#  - Avoid using default of 'docker-compose.yml'
#  - Define different project names for API GW 9.2 and 9.4 to avoid conflict
#  - Explictly use bash shell  /bin/bash  instead of soft-link
#
# 1. Use docker with docker-compose to download & start
#      CA API GW 9.4 (with MySQL 5.7) &
#      CA API GW 9.2 (with MySQL 5.5)
#
# 2. Configure CA API GW 9.4 with TCP 8443/9443
#              CA API GW 9.2 with TCP 8444/9444 (redirect to 8443/9443)
#
# 3. Configure MySQL 5.7 to be externally exposed on TCP 3306
#              MySQL 5.5 to be externally exposed on TCP 3307
#  - Adjust 'grant' token on MySQL configuration file for root account
#
# 4. Validate authentication credentials to the above web services with curl
#
#
# 5. Add network modules via yum to API GW 9.4 container
#   - To assist with troubleshooting / debug exercises
#
# 6. Enable system to use API GW GUI to perform final validation
#   - Appears to be an issue to use browers to access the API GW UI TCP 8443/8444
#
#
# Alan Baugher, ANA, 10/19
#
##################################################################


echo ""
echo ""
echo "################################"
echo "Install docker and docker-compose via yum if missing"
echo "Watch for message:  Nothing to do "
echo ""
echo "yum -y install docker docker-compose "
yum -y install docker docker-compose
echo "################################"
echo ""


echo "################################"
echo "Shut down any prior docker container running for API GW 9.2 and 9.4"
cd /opt/docker/api
pwd
echo "Issue this command if script fails:  docker stop \$(docker ps -a -q)  && docker rm \$(docker ps -a -q)   "
echo "################################"
echo ""


echo "################################"
export SSG_LICENSE_ENV=$(cat ./ssg_license.xml | gzip | base64 --wrap=0)
echo "Execute  'docker-compose down'  to ensure no prior data or containers for API GW 9.4"
docker-compose -p ssg94 -f /opt/docker/api/docker-compose-ssg94.yml down
echo "################################"
echo "Execute  'docker-compose down'  to ensure no prior data or containers for API GW 9.2"
docker-compose -p ssg92 -f /opt/docker/api/docker-compose-ssg92.yml down
echo "################################"
echo ""


echo "################################"
echo "Execute  'docker ps -a'   to validate no running docker containers for API GW 9.2 nor 9.4"
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}"
echo "################################"
echo ""


echo "################################"
echo "Change folder to execute docker-compose script for API GW 9.4 with MySql 5.7 with TCP 8443/9443"
echo "Execute  'docker-compose up -d'  to start docker containers for API GW 9.4 with MySql 5.7 with TCP 8443/9443"
docker-compose -p ssg94 -f /opt/docker/api/docker-compose-ssg94.yml up -d
echo "################################"
echo "Change folder to execute docker-compose script for API GW 9.2 with MySql 5.5 with TCP 8444/9444"
echo "Execute  'docker-compose up -d'  to start docker containers for API GW 9.2 with MySql 5.5 with TCP 8444/9444"
docker-compose -p ssg92 -f /opt/docker/api/docker-compose-ssg92.yml up -d
echo "################################"
echo ""


echo "################################"
echo "Backup current API GW 9.4 running container for future analysis"
echo "docker export ssg94 > ssg94.export.`/bin/date --utc +%Y%m%d%H%M%S.0Z`.tar "
docker export ssg94 > ssg94.export.`/bin/date --utc +%Y%m%d%H%M%S.0Z`.tar
echo "################################"
echo ""


echo "################################"
echo "Update API GW 9.4 running container with additional supporting tools with yum"
echo "docker exec -it -u root -e TERM=xterm ssg94 /bin/sh -c \"yum install -y -q net-tools iproute unzip vi --nogpgcheck\" "
docker exec -it -u root -e TERM=xterm ssg94 /bin/sh -c "yum install -y -q net-tools iproute unzip vi --nogpgcheck "
echo "Export API GW 9.4 running container after supporting tools are added"
echo "docker export ssg94 > ssg94.export.tools.`/bin/date --utc +%Y%m%d%H%M%S.0Z`.tar "
docker export ssg94 > ssg94.export.tools.`/bin/date --utc +%Y%m%d%H%M%S.0Z`.tar
echo "################################"
echo ""


echo "################################"
echo "Validate network ports are exposed for API GW Manager UI "
netstat -anpeW | grep -e docker -e "Local" | grep -e "tcp" -e "Local"
echo "################################"
echo ""

echo "################################"
echo "Sleep 70 seconds for both API GW to be ready"
echo "################################"
sleep 70
echo ""


echo ""
echo "################################"
echo "Extra:  Open TCP 3306 for mysql remote access "
docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /bin/bash -c "echo -e '\0041includedir /etc/mysql/conf.d/\n\0041includedir /etc/mysql/mysql.conf.d/\n[mysqld]\nskip-grant-tables' > /etc/mysql/mysql.cnf && cat /etc/mysql/mysql.cnf "
#docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /bin/bash -c "/etc/init.d/mysql restart"
#docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /bin/bash -c "/etc/init.d/mysql status && echo -n"
echo "################################"
docker restart ssg94_mysql57
echo ""



echo "################################"
echo "Execute  'docker ps -a'   to validate running docker containers for API GW 9.2 and 9.4 with their correct ports"
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}"
echo "################################"
echo ""


echo "################################"
echo "Test authentication with the SSG backup URL for API 9.2 TCP 8444 - should see six (6) lines"
echo "curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8444/ssg/backup | grep -e 'title' -e 'Gateway node' -e 'input' -e 'form action' "
echo "#########           ############"
curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8444/ssg/backup | grep -e "title" -e "Gateway node" -e "input" -e "form action"
echo "################################"
echo ""


echo "################################"
echo "Test authentication with the SSG backup URL for API 9.4 TCP 8443 - should see six (6) lines"
echo "curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8443/ssg/backup | grep -e 'title' -e 'Gateway node' -e 'input' -e 'form action' "
echo "#########           ############"
curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8443/ssg/backup | grep -e "title" -e "Gateway node" -e "input" -e "form action"
echo "################################"
echo ""


echo "################################"
echo "Next Steps:"
echo "       Open the API GW UI for 9.2 and create a new entry in the lower left panel"
echo ""
echo "Example: "
echo "       Right click on hostname entry and select 'Publish RESTful Service Proxy with WADL' "
echo "       Select Manual Entry, then click Next"
echo "       Enter data for two (2) fields:"
echo "                  Service Name:  Alan "
echo "                  Resource Base URL:  http://www.anapartner.com/alan "
echo "       Then select Finish Button "
echo "################################"
echo ""

7. 02_backup_and_migrate_mysql_ssg_data_from_ssg92_to_ssg94_db.sh

#!/bin/bash
#######################################################################
#
# Script to validate upgrade process from CA API 9.2 to 9.4 with docker
#  - Avoid using default of 'docker-compose.yml'
#  - Define different project names for API GW 9.2 and 9.4 to avoid conflict
#  - Explictly use bash shell  /bin/bash  instead of soft-link /bin/sh
#
# 1. Stop docker containers for CA API GW 9.2 & 9.4 (leave mysql containers running)
#    - To prevent any updates to mysql db during migration
#
# 2. Use mysqldump command to export CA API 9.2 MySQL 5.5 ssg database with stored procedures (aka routines)
#   - Review excluding the audit tables to avoid carrying over excessive data
#
# 3. Use mysql command to import sql file to CA API 9.4 MySQL 5.7 ssg database
#   - Review if dropping / recreate the ssg database will avoid any install issues
#   - Keep eye on table cluster_info {as this has the Gateway1 defination with the host IP address}
#
# 4. Restart CA API GW 9.2 & 9.4 containers
#
#    - Challenge 1: CA API GW 9.2 docker image has issue with docker stop/start process
#    the reference /root/entrypoint.sh will loop with creation of a license folder
#    - Addressed with custom docker-compose file to recreate image to join existing MySQL 5.5 container
#
#    - Challenge 2: CA API GW 9.4 docker image has issue with docker stop/start process
#    The new heathcheck.sh process calls a base.sh script that compare the date-time stamp for two files
#    , the datestamp for one file is not updated correctly upon docker start process.
#    - Addressed with custom docker bash script to "touch" the primary file to allow date stamp to be updated.  Validate with: docker logs ssg94
#      WARNING 1      com.l7tech.server.boot.GatewayBoot: Unable to touch /opt/SecureSpan/Gateway/node/default/var/started:
#                  /opt/SecureSpan/Gateway/node/default/var/started (Permission denied)
#
#    - Challenge 3: CA API GW 9.4 docker image appears to have similar issue for hazelcast startup
#    The container may hold for 300 seconds due to hazelcast configuration not completing correctly
#     SEVERE  1      com.hazelcast.instance.Node: [172.17.0.3]:8777 [gateway] [3.10.2] Could not join cluster in 300000 ms. Shutting down now!
#     Unable to start the server: Error starting server : Lifecycle error: Could not initialize Hazelcast cluster
#     WARNING 107    com.hazelcast.cluster.impl.TcpIpJoiner: [172.17.0.3]:8777 [gateway] [3.10.2] Error during join check!
#    - Addessed with different project names to avoid conflict between API GW 9.2 broadcast to API GW 9.4
#
#    - Challenge 4: CA API GW 9.2 appears to be stuck in a while loop for /opt/docker/entrypoint.sh
#      apim-provisioning: INFO: waiting for the shutdown file at "/userdata/shutdown" to be created
#    - Addressed:  Does not seem to have impact with current testing.  Ignore.  Validate with:  docker logs ssg92
#
# 5. Important Note: Ensure that the SSG_CLUSTER_HOST and SSG_CLUSTER_PASSWORD values for CA API GW 9.4 docker-compose file
#    match those set in the configured MySQL database.
#    After CA API GW 9.4 container connects to the existing Gateway database, the Container Gateway will automatically upgrades
#    the ssg database if the ssg database version is lower than the version of the Container Gateway.
#    - Ensure the jdbc hostname
#
#    - Ref:  https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-api-management/api-gateway/9-4/other-gateway-form-factors/using-the-container-gateway/getting-started-with-the-container-gateway/connect-the-container-gateway-to-an-existing-mysql-database.html
#
#
# Alan Baugher, ANA, 10/19
#
#######################################################################

echo ""
echo "####################################"
echo "Early check: Address a file permission issue with the API GW 9.4 container"
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c 'chmod 666 /opt/SecureSpan/Gateway/node/default/var/started' "
docker exec -it -u root -e TERM=xterm `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c "chmod 666 /opt/SecureSpan/Gateway/node/default/var/started"
echo "May validate issue with:  docker logs ssg94 "
echo "####################################"


echo ""
echo "####################################"
echo "Temporarily shutdown the API GW containers for 9.2 and 9.4 to avoid any updates to the mysql db during export & import"
echo "docker stop ssg92 ssg94 "
docker stop ssg92 ssg94
echo "####################################"
echo ""


echo "####################################"
echo "Validate API GW container are down and the MySQL db containers are up and working"
echo "Pause for 5 seconds:  sleep 5"
sleep 5
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}"
echo "####################################"
echo ""


echo "####################################"
echo "Export the API GW 9.2 MySQL 5.5 ssg db with stored procedures (aka routines)"
echo "time docker exec -i `docker ps -a | grep mysql:5.5 | awk '{print $1}'` mysqldump -u root --password=7layer ssg  --routines >  ssg92.backup_with_routines.sql  2> /dev/null "
time docker exec -i `docker ps -a | grep mysql:5.5 | awk '{print $1}'` mysqldump -u root --password=7layer ssg  --routines >  ssg92.backup_with_routines.sql  2> /dev/null
echo "View the size of the MySQL 5.5. ssg db for API GW 9.2"
ls -lart | grep ssg92.backup_with_routines.sql
echo "####################################"
echo ""


echo "####################################"
echo "Export the API GW 9.4 MySQL 5.7 ssg db with stored procedures (aka routines) as a 'before' reference file"
echo "time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysqldump -u root --password=7layer ssg  --routines >  ssg94.before.backup_with_routines.sql  2> /dev/null "
time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysqldump -u root --password=7layer ssg  --routines >  ssg94.before.backup_with_routines.sql  2> /dev/null
echo "View the size of the MySQL 5.7. ssg db for API GW 9.4 as the 'before' reference file"
ls -lart | grep ssg94.before.backup_with_routines.sql
echo "####################################"
echo ""


echo "####################################"
echo "Import the MySQL 5.5 ssg db with stored procedures (aka routines) into MySQL 5.7 for API GW 9.4"
echo "time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysql -u root --password=7layer ssg    < ssg92.backup_with_routines.sql 2> /dev/null "
time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysql -u root --password=7layer ssg    < ssg92.backup_with_routines.sql 2> /dev/null
echo "####################################"
echo ""


echo "####################################"
echo "Export the API GW 9.4 MySQL 5.7 ssg db wht stored procedures (aka routines) as a 'after' import reference file"
echo "time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysqldump -u root --password=7layer ssg  --routines >  ssg94.after.backup_with_routines.sql 2> /dev/null "
time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysqldump -u root --password=7layer ssg  --routines >  ssg94.after.backup_with_routines.sql 2> /dev/null
echo "View the size of the MySQL 5.7. ssg db for API GW 9.4 as the 'after' reference file"
ls -lart | grep ssg94.after.backup_with_routines.sql
echo "####################################"
echo ""


echo "####################################"
echo "Restart the API GW containers for 9.2 and 9.4 "
# Note: Restart of the ssg94 container will 'auto' upgrade the ssg database to 9.4 tags
echo "docker restart ssg94 "
docker restart ssg94
#docker rm ssg94
#docker-compose -p ssg94 -f /opt/docker/api/docker-compose-ssg94-join-db.yml up -d
echo "####################################"
# Note:  API GW 9.2 docker image was not designed for stop/start correctly; rm then redeploy
export SSG_LICENSE_ENV=$(cat ssg_license.xml | gzip | base64 --wrap=0)
echo "Remove the API GW 9.2 container via:  docker rm ssg92"
docker rm ssg92
echo "Redeploy the API GW 9.2 container "
echo "docker-compose -p ssg92 -f /opt/docker/api/docker-compose-ssg92-join-db.yml up -d "
docker-compose -p ssg92 -f /opt/docker/api/docker-compose-ssg92-join-db.yml up -d
echo "####################################"
echo ""



echo "####################################"
echo "Validate API GW container are up and the mysql db containers are working"
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}"
echo "####################################"
echo ""


echo "####################################"
echo "Export the API GW 9.4 MySQL 5.7 ssg db after import & after the 'auto' upgrade as an 'after' auto upgrade reference file"
docker stop ssg94
echo "time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysqldump -u root --password=7layer ssg  --routines >  ssg94.auto.after.backup_with_routines.sql 2> /dev/null "
time docker exec -i `docker ps -a | grep mysql:5.7 | awk '{print $1}'` /usr/bin/mysqldump -u root --password=7layer ssg  --routines >  ssg94.auto.after.backup_with_routines.sql 2> /dev/null
echo "View all the exported MySQL files to compare process flow"
ls -lart ssg*.sql
docker start ssg94
echo "View the auto upgrade from version 9.2 to version 9.4 with a delta compare of the exported sql files"
echo "diff ssg94.after.backup_with_routines.sql  ssg94.before.backup_with_routines.sql  | grep -i \"INSERT INTO .ssg_version.\" "
diff ssg94.after.backup_with_routines.sql  ssg94.before.backup_with_routines.sql  | grep -i "INSERT INTO .ssg_version."
echo "####################################"
echo ""


echo "####################################"
echo "Execute  'docker ps -a'   to validate running docker containers for API GW 9.4 and 9.2"
echo "docker ps --format \"table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}\" "
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}"
echo "####################################"
echo ""


echo "####################################"
echo "Show current API GW 9.4 MySQL 5.7 databases"
echo "Validate that 'ssg' database exists "
echo "docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer  -e \"show databases;\" "
docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer  -e "show databases;"
echo "####################################"
echo ""


echo "####################################"
echo "Review for any delta of the MySQL ssg database after import"
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.5 | awk '{print $1}'`  mysql --user=root --password=7layer  ssg -e \"show tables;\" "
docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.5 | awk '{print $1}'`  mysql --user=root --password=7layer  ssg -e "show tables;" > ssg92.tables.txt
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer  ssg -e \"show tables;\" "
docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer  ssg -e "show tables;" > ssg94.tables.txt
echo "Observer for any delta from the below command"
echo "diff ssg92.tables.txt ssg94.tables.txt"
diff ssg92.tables.txt ssg94.tables.txt
echo "####################################"
echo ""


echo "####################################"
echo "Show current API GW 9.4 admin user in the MySQL 5.7 ssg database"
echo "docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer ssg -e \"SELECT name,login,password,enabled,expiration,password_expiry FROM internal_user;\" "
docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer ssg -e "SELECT name,login,password,enabled,expiration,password_expiry FROM internal_user;"
echo "####################################"
echo ""


echo "####################################"
echo "Show current API GW 9.4 admin user in the intermediate configuration file on the AIP GW 9.4 container"
echo "docker exec -it -u root -e TERM=xterm ssg94 /bin/bash -c \"grep -i -e l7.login -e l7.password /opt/SecureSpan/Gateway/node/default/etc/bootstrap/bundle/001_update_admin_user.xml.req.bundle\" "
docker exec -it -u root -e TERM=xterm ssg94 /bin/bash -c "grep -i -e l7.login -e l7.password /opt/SecureSpan/Gateway/node/default/etc/bootstrap/bundle/001_update_admin_user.xml.req.bundle"
echo "####################################"
echo ""


echo "####################################"
echo "Show all 'new' files created or linked in API GW 9.4 container with mtime of 1 day. Excluding lock (LCK) files"
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c \"find /opt -type f -mtime -1 -ls | grep -i -v -e '.LCK'\" "
docker exec -it -u root -e TERM=xterm `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c "find /opt -type f -mtime -1 -ls | grep -i -v -e '.LCK'"
echo "####################################"
echo ""


echo "####################################"
echo " View the license.xml file that was copied to the API GW 9.4 container bootstrap folder before copied to the MySQL 5.7 ssg db table "
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c \"ls -lart  /opt/SecureSpan/Gateway/node/default/etc/bootstrap/license \" "
docker exec -it -u root -e TERM=xterm `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c "ls -lart  /opt/SecureSpan/Gateway/node/default/etc/bootstrap/license "
echo "####################################"
echo ""


echo "####################################"
echo "View logon count for the API GW 9.4 admin user via MySQL 5.7 ssg db"
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=gateway --password=7layer ssg -e \"select hex(goid), version, hex(provider_goid), login, fail_count, last_attempted, last_activity, state from logon_info;\" "
docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=gateway --password=7layer ssg -e "select hex(goid), version, hex(provider_goid), login, fail_count, last_attempted, last_activity, state from logon_info;"
echo "####################################"
echo ""


echo "####################################"
echo "View the API GW 9.4 MySQL 5.7 mysql.user table"
### docker logs `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  2>&1 | grep  "GENERATED ROOT PASSWORD"
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer  ssg -e \"SELECT User,account_locked,password_expired,password_last_changed,authentication_string FROM mysql.user;\" "
docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer  ssg -e "SELECT User,account_locked,password_expired,password_last_changed,authentication_string FROM mysql.user;"
echo "####################################"
echo ""


echo "####################################"
echo "To remove any locked account (including pmadmin SSG Admin User ID) from the MySQL 5.7 ssg logon_info table  {or any account}"
echo "docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer ssg -e \"delete from logon_info where login ='pmadmin';\" "
echo "docker exec -it -u root -e TERM=xterm  `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=7layer ssg -e \"truncate logon_info;\"  "
echo "####################################"
echo ""


echo "####################################"
echo "To change root password for MySQL 5.7 mysql.user db"
echo "docker exec -it -u root -e TERM=xterm `docker ps -a | grep mysql:5.7 | awk '{print $1}'`  mysql --user=root --password=OLD_PASSWORD -e  \"SET PASSWORD FOR 'root'@'localhost' = PASSWORD('7layer');\" "
echo "####################################"
echo ""


echo "####################################"
echo "Sleep 30 seconds to address restart health check time-stamp issue with API GW 9.4"
sleep 30
echo "####################################"
echo ""


echo "####################################"
echo "Address API GW 9.4 container health check upon stop/start or restart gap.  (base.sh script)"
echo "docker exec -it -u root -e TERM=XTERM `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c \"date +%s -r /opt/SecureSpan/Gateway/node/default/var/started  && date +%s -r /opt/SecureSpan/Gateway/node/default/var/preboot\" "
docker exec -it -u root -e TERM=XTERM `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c "date +%s -r /opt/SecureSpan/Gateway/node/default/var/started  && date +%s -r /opt/SecureSpan/Gateway/node/default/var/preboot"
echo "Touch to update date-time stamp for one file: /opt/SecureSpan/Gateway/node/default/var/started"
echo "docker exec -it -u root -e TERM=XTERM `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c \"touch /opt/SecureSpan/Gateway/node/default/var/started\" "
docker exec -it -u root -e TERM=XTERM `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c "touch /opt/SecureSpan/Gateway/node/default/var/started"
docker exec -it -u root -e TERM=XTERM `docker ps -a | grep caapim/gateway:latest | awk '{print $1}'` /bin/bash -c "date +%s -r /opt/SecureSpan/Gateway/node/default/var/started  && date +%s -r /opt/SecureSpan/Gateway/node/default/var/preboot"
echo "####################################"
echo ""


echo "####################################"
echo "Sleep 30 seconds to allow health check status to update for API GW 9.4"
echo "May also monitor health and overall status with:   docker inspect ssg94 "
sleep 30
echo "####################################"
echo ""


echo "####################################"
echo "Execute  'docker ps -a'   to validate running docker containers for API GW 9.4 and 9.2"
echo "docker ps --format \"table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}\" "
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}"
echo "####################################"
echo ""


echo "################################"
echo "Test authentication with the SSG backup URL for API 9.2 TCP 8444 - should see minimal of six (6) lines"
echo "curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8444/ssg/backup | grep -e 'title' -e 'Gateway node' -e 'input' -e 'form action' "
echo "#########           ############"
curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8444/ssg/backup | grep -e "title" -e "Gateway node" -e "input" -e "form action"
echo "################################"
echo ""


echo "################################"
echo "Test authentication with the SSG backup URL for API 9.4 TCP 8443 - should see minimal of six (6) lines"
echo "curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8443/ssg/backup | grep -e 'title' -e 'Gateway node' -e 'input' -e 'form action' "
echo "#########           ############"
curl -s --insecure  -u pmadmin:7layer  https://$(hostname -s):8443/ssg/backup | grep -e "title" -e "Gateway node" -e "input" -e "form action"
echo "################################"
echo ""

View of the API Gateway via the MS Windows API GW UI for both API GW 9.2 (using the 9.3 UI) and API 9.4 (using the 9.4 UI). The API GW Policies will be migrated from API 9.2 to API 9.4 via the export/import of MySQL ssg database. After import, the API GW 9.4 docker image will ‘auto’ upgrade the ssg database to the 9.4 version.

Interesting view of the API GW 9.4 MySQL database ‘ssg’ after import and a restart (that will ‘auto’ upgrade the ssg database version). Note multiple Gateway “nodes” that will appear after each ‘docker restart containerID’