Building a 2-Node Direct Connect vSAN 6.7 Cluster for ROBO

2-Node vSAN deployments can be a great choice for remote office/branch office (ROBO) scenarios, as they can be managed by the same vCenter as the rest of your infrastructure. Setting up the two nodes in a direct connect configuration can be beneficial if the remote site has a) limited switch port availability, or b) no 10Gb switching available.

Note: Items in bold are of particular significance, as they were sticking points for me during configuration. Paying attention to them will save you a little trouble.

Each design will vary depending on network capabilities at the location, VM workload size, etc. In my case, we had limited network availability and a very light VM workload for our vSAN ROBO clusters. For our physical design, we directly connected two 10Gb ports for vSAN traffic, and two other ports for mgmt/VM/vMotion traffic. Our configuration incorporates NIC card/controller redundancy, as any configuration should. In a more demanding environment, I would recommend splitting the vMotion or VM traffic out onto its own vmk using separate ports. This image illustrates our logical wiring design (sans iLO).

For this setup, 6 IP addresses are needed:

  • 1 routable management IP per host (ROBO site VLAN)
  • 1 non-routable vSAN vmk IP per host (ROBO site non-routable VLAN)
  • 1 routable management IP for the witness appliance (Witness site VLAN)
  • 1 routable witness vmk IP for the witness appliance (alternate Witness site VLAN)

Once all the cabling and IP requirements have been satisfied, ESXi 6.7 can be installed following your normal procedures. Ensure that all disks that will be used by vSAN are configured in pass-through mode, and have caching disabled on their storage controller. Don’t forget to set the NTP server configuration on the hosts!
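If you manage hosts with PowerCLI, a minimal sketch like the one below can handle the NTP step on both nodes. The NTP server address and host names are placeholder assumptions, not values from my build:

# Hypothetical values; substitute your own NTP server and host names
$ntpServer = "10.0.0.10"
foreach ($vmhost in Get-VMHost "robo-esx1","robo-esx2") {
    Add-VMHostNtpServer -VMHost $vmhost -NtpServer $ntpServer
    # Set the ntpd service to start with the host and start it now
    Get-VMHostService -VMHost $vmhost | Where-Object {$_.Key -eq "ntpd"} |
        Set-VMHostService -Policy On | Start-VMHostService
}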

vSAN is completely dependent on proper network configuration and functionality, so it is important to take extra care during the setup of the DVS and port groups that the vSAN cluster will be using. In my case, these were the first vSphere deployments at their respective sites, so we created a new Distributed Virtual Switch (DVS) for each datacenter with 4 uplinks. Then, 2 port groups need to be created within those DVSs. The first port group is for host management, VM traffic, and vMotion and will be set to only use uplinks 1 and 2. Traffic on that port group will be load balanced based on physical NIC load. 

Management, VM, and vMotion traffic port group configuration

The second port group will be used for vSAN traffic only, and should be configured to only use uplinks 3 and 4.

vSAN port group configuration
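For reference, here is a minimal PowerCLI sketch of the DVS and port group layout described above. The switch name, port group names, and VLAN IDs are my own placeholder assumptions:

# Placeholder names and VLAN IDs for illustration
$dc  = Get-Datacenter "ROBO-Site"
$vds = New-VDSwitch -Name "ROBO-DVS" -Location $dc -NumUplinkPorts 4

# Port group 1: mgmt/VM/vMotion on uplinks 1-2, load balanced on physical NIC load
$mgmtPg = New-VDPortgroup -VDSwitch $vds -Name "Mgmt-VM-vMotion" -VlanId 10
Get-VDUplinkTeamingPolicy -VDPortgroup $mgmtPg |
    Set-VDUplinkTeamingPolicy -LoadBalancingPolicy LoadBalanceLoadBased `
        -ActiveUplinkPort "Uplink 1","Uplink 2" -UnusedUplinkPort "Uplink 3","Uplink 4"

# Port group 2: vSAN traffic only, on uplinks 3-4
$vsanPg = New-VDPortgroup -VDSwitch $vds -Name "vSAN" -VlanId 20
Get-VDUplinkTeamingPolicy -VDPortgroup $vsanPg |
    Set-VDUplinkTeamingPolicy -ActiveUplinkPort "Uplink 3","Uplink 4" -UnusedUplinkPort "Uplink 1","Uplink 2"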

At this point you’ll add your new hosts to the DVS. Assign the vmnic ports designated for management to the appropriate port group, and assign the vSAN vmnic ports to the newly created vSAN port group. Then, migrate the management vmk to the correct port group. The hosts should not be running anything through the standard vSwitch. This is a good opportunity to make sure you have vMotion enabled on the management vmk.
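A rough PowerCLI equivalent of that migration might look like the sketch below. The vmnic numbering and the names carried over from my earlier placeholders are assumptions, so check them against your own hosts:

# Assumes vmnic0/vmnic1 carry mgmt/VM/vMotion, vmnic2/vmnic3 carry vSAN, and vmk0 is management
$vds    = Get-VDSwitch "ROBO-DVS"
$mgmtPg = Get-VDPortgroup "Mgmt-VM-vMotion"
foreach ($vmhost in Get-VMHost "robo-esx1","robo-esx2") {
    Add-VDSwitchVMHost -VDSwitch $vds -VMHost $vmhost

    # Move the management uplinks and migrate vmk0 onto the DVS in one step
    $mgmtNics = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name vmnic0,vmnic1
    $vmk0     = Get-VMHostNetworkAdapter -VMHost $vmhost -Name vmk0
    Add-VDSwitchPhysicalNetworkAdapter -DistributedSwitch $vds -VMHostPhysicalNic $mgmtNics `
        -VMHostVirtualNic $vmk0 -VirtualNicPortgroup $mgmtPg -Confirm:$false

    # Attach the vSAN-facing uplinks
    $vsanNics = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name vmnic2,vmnic3
    Add-VDSwitchPhysicalNetworkAdapter -DistributedSwitch $vds -VMHostPhysicalNic $vsanNics -Confirm:$false

    # Make sure vMotion is enabled on the management vmk
    Get-VMHostNetworkAdapter -VMHost $vmhost -Name vmk0 |
        Set-VMHostNetworkAdapter -VMotionEnabled:$true -Confirm:$false
}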

Add another VMkernel port and attach it to the vSAN port group, with vSAN being the only enabled service on the vmk. This will need to be done on each host.

vSAN vmk configuration
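In PowerCLI this step might look like the following sketch; the non-routable addressing is a placeholder assumption:

# Hypothetical non-routable vSAN vmk addressing, one per host
$vds = Get-VDSwitch "ROBO-DVS"
$vsanIPs = @{ "robo-esx1" = "172.16.0.1"; "robo-esx2" = "172.16.0.2" }
foreach ($vmhost in Get-VMHost "robo-esx1","robo-esx2") {
    # vSAN is the only service enabled on this vmk
    New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vds -PortGroup "vSAN" `
        -IP $vsanIPs[$vmhost.Name] -SubnetMask "255.255.255.0" -VsanTrafficEnabled:$true
}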

Once the vSAN vmks are in place on each host, you can confirm the vSAN kernel ports can ping each other by running the command below from each host:

vmkping -I vmk1 <IP of vSAN vmk on target host>

Update (3/19): As pointed out to me by Benjamin Colart, RDMA-enabled network adapters require additional configuration; alternatively, you may choose to disable RDMA in the BIOS. Leaving RDMA enabled without configuring it properly may result in connectivity problems within your vSAN environment.

vSAN 2-node deployments require a witness appliance to maintain quorum. The witness is simply another ESXi host running in an alternate site. VMware provides a virtual appliance, available for download on their website, that can be deployed within your environment. Deploy the OVF at your witness site, and be sure to split the witness and management networks onto different subnets. It will need to be configured similarly to any other host in your environment. Upon completion, right click your witness site datacenter object in vCenter and add your witness host to it. Up until this point, the witness appliance was simply a VM running ESXi, but it needs to be added to vCenter as a host in order to function as a vSAN witness. Once added, navigate to the VMkernel configuration for the witness appliance and modify the witness vmk so it uses the static IP obtained earlier. Be sure to check the box to override the default gateway for the adapter, and enter the gateway for the subnet the vmk resides on.
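If you prefer to script the witness onboarding, a rough PowerCLI sketch follows. The names and addresses are placeholder assumptions, and I set the default gateway override in the GUI as described above rather than from PowerCLI:

# Hypothetical witness name and addressing
$witnessDc = Get-Datacenter "Witness-Site"
Add-VMHost -Name "vsan-witness.example.com" -Location $witnessDc `
    -User root -Password "password" -Force

# Assign the static IP to the witness vmk
Get-VMHostNetworkAdapter -VMHost "vsan-witness.example.com" -Name vmk1 |
    Set-VMHostNetworkAdapter -IP "10.20.30.40" -SubnetMask "255.255.255.0" -Confirm:$false
# The override-default-gateway checkbox has no direct switch on this cmdlet;
# set it in the vSphere client as described above.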

Let’s stop for a moment and consider what’s configured:

  • Two vSAN hosts are configured and reside within the ROBO site datacenter in vCenter.
  • One witness appliance is configured and resides in the witness site datacenter in vCenter.
  • A cluster has not yet been created, and vSAN is not enabled.

Before creating the vSAN cluster, patch the hosts and the witness appliance to the latest version supported in your environment. It is recommended that the witness appliance run at the same patch level as the nodes in the vSAN cluster.

In vCenter, create a new cluster within your ROBO site. Do not enable DRS or HA while creating the vSAN cluster object. Then, add the two vSAN hosts to the cluster. Within the cluster configuration, select vSAN -> Services and click Configure in the pane on the right. The vSAN configuration wizard will walk through the configuration of the core vSAN components (a PowerCLI sketch of the cluster-creation steps follows the list below):

  • vSAN configuration type – Two host vSAN cluster
  • Optional – enable deduplication and compression (for all flash storage hosts)
  • Optional – enable encryption (with enterprise vSAN license)
  • Claim disks for vSAN (see Associating NAA IDs with Physical Drive Bay Location on HPE Servers for help on HPE servers)
  • Select the previously configured witness appliance host as the vSAN cluster witness
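Here is the promised sketch of the cluster-creation steps leading into the wizard; the cluster and host names remain my placeholder assumptions:

# Create the cluster with DRS and HA left off, then move the two nodes in
$dc      = Get-Datacenter "ROBO-Site"
$cluster = New-Cluster -Name "ROBO-vSAN" -Location $dc
Move-VMHost -VMHost (Get-VMHost "robo-esx1","robo-esx2") -Destination $cluster
# From here, run the wizard itself: vSAN -> Services -> Configure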

Typically, that is all that is required to enable vSAN for two host deployments; however, direct connect clusters require an additional step. By default, vSAN sends witness traffic over the vSAN network.

As can be seen in the image above, the vSAN network is directly connected between the two hosts and is not routable to the witness. To fix the witness communication issue, witness traffic needs to be tagged onto a routable VMkernel interface on each of the vSAN hosts (a feature known as witness traffic separation):

esxcli vsan network ip add -i vmk0 -T=witness

This command instructs the hosts to send any traffic destined for the witness out vmk0, which in this example is the management vmk. Once you have tagged the interface on each host, run:

esxcli vsan network list

You should see vmk1 used for vSAN, and vmk0 used for witness traffic.
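If you would rather push the witness tag from PowerCLI than SSH into each node, a sketch using Get-EsxCli -V2 might look like this. The V2 argument names mirror the esxcli long options; treat them as an assumption and verify against your build:

foreach ($vmhost in Get-VMHost "robo-esx1","robo-esx2") {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    # Equivalent of: esxcli vsan network ip add -i vmk0 -T=witness
    $esxcli.vsan.network.ip.add.Invoke(@{interfacename = "vmk0"; traffictype = "witness"})
    # Equivalent of: esxcli vsan network list
    $esxcli.vsan.network.list.Invoke()
}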

Witness traffic is now running through the management vmk

The only remaining tasks are to enable HA and DRS (license permitting) and to license the cluster. Within vCenter, navigate to your new vSAN cluster -> Monitor -> vSAN -> Health and run a retest of the vSAN health. Assuming everything else has been configured properly, you should be welcomed by a screen of green check marks!


Associating an NAA ID with Physical Drive Bay Location on HPE Servers Running ESXi 6.7

While deploying vSAN clusters on HPE servers, I came across a problem. The vSAN wizard did not display the physical bay number of the drive, so I had to figure out another way to associate the displayed NAA ID of the drive with the physical drive location. 

Disks as they appear in the vSAN configuration wizard

In my situation, I want the drives in bays 1, 3, and 5 to be in disk group 1, and the drives in bays 2, 4, and 6 to be in disk group 2. To visualize this end goal, check out the image below.

Desired disk group layout

In order for this solution to work, ESXi must have been installed from the HPE custom image. The HPE custom image includes the Smart Storage Administrator CLI (SSACLI) utility, which is the secret sauce for programmatically identifying drive locations in HPE servers. For ESXi 6.5 and later, the following command returns physical drive information:

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show detail

The output will look something like this, with an entry for each drive.

physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Unassigned Drive
Interface Type: Solid State SAS
Size: 1.9 TB
Drive exposed to OS: True
Logical/Physical Block Size: 512/4096
Firmware Revision: HPD2
Serial Number: –
WWID: 58CE38EE2057D3C6
Model: HP MO001920JWFWU
Current Temperature (C): 29
Maximum Temperature (C): 30
Usage remaining: 100.00%
Power On Hours: 592
SSD Smart Trip Wearout: False
PHY Count: 2
PHY Transfer Rate: 12.0Gbps, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: True
Sanitize Estimated Max Erase Time: 1 minute(s), 10 second(s)
Unrestricted Sanitize Supported: True
Shingled Magnetic Recording Support: None
Drive Unique ID: 58CE38EE2057D3C56193000858CE38EE

Pay attention to the WWID value; I have intentionally left off its last character. That string is the same as the NAA ID that appears in the vSphere wizard, minus the last character, so you can match drives on everything except the final character. We now know that the drive in this example belongs in bay 4 and is a capacity drive in disk group 2. Using the output for the other drives, you can identify the physical location behind each NAA ID and assign each drive to the proper disk group for your design.

Alternatively, you can use the serial number (removed from this example) to match up with what you can find in the iLO interface. 

However, logging into every host and running the command can be tedious. Instead, I threw together a quick PowerShell script that leverages Plink to pull that information from specified hosts. You could easily modify it to process multiple hosts at once or handle the output however you like.
 

########## Define the below values ##########
$vmhost = "esx1"
$user = "root"
$Passwd = "password"
$PathToPlink = "C:\plink.exe"
######## DO NOT EDIT BELOW THIS LINE ########

$cmd = "/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show detail"

# Build the plink command line, auto-accepting the SSH host key prompt
$plink = "echo y | $PathToPlink -ssh $user@$vmhost -pw $Passwd `"$cmd`""

Connect-VIServer $vmhost -User $user -Password $Passwd

# Start the SSH service on the host if it is not already running
$sshstatus = Get-VMHostService -VMHost $vmhost | Where-Object {$PSItem.Key -eq "tsm-ssh"}
if ($sshstatus.Running -eq $False) {
    Get-VMHostService -VMHost $vmhost | Where-Object {$PSItem.Key -eq "tsm-ssh"} | Start-VMHostService
    }
Write-Verbose -Message "Executing $cmd on $vmhost"

$cmdOutput = Invoke-Expression -Command $plink

# If SSH was stopped on the host before running this script, put it back to a stopped state
if ($sshstatus.Running -eq $False) {
    Get-VMHostService -VMHost $vmhost | Where-Object {$PSItem.Key -eq "tsm-ssh"} | Stop-VMHostService -Confirm:$False
    }

Disconnect-VIServer $vmhost -Confirm:$False

$cmdOutput
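As a follow-on, you could parse $cmdOutput into bay/NAA pairs instead of reading it by eye. This sketch is my own addition, assuming the output format shown earlier and the match-all-but-the-last-character relationship described above:

# Pair each drive's bay number with a lowercase naa. string for matching in the wizard
$drives = @()
$bay = $null
foreach ($line in $cmdOutput) {
    if ($line -match '^\s*Bay:\s*(\d+)') { $bay = $Matches[1] }
    if ($line -match '^\s*WWID:\s*(\S+)') {
        # All but the last character of the WWID lines up with the NAA ID in the wizard
        $wwid = $Matches[1]
        $drives += [pscustomobject]@{
            Bay       = $bay
            NaaPrefix = ("naa." + $wwid.Substring(0, $wwid.Length - 1)).ToLower()
        }
    }
}
$drives | Sort-Object {[int]$_.Bay} | Format-Table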

VMworld 2018: Lessons Learned

VMworld 2018 came and went in a flash. Not only was it my first time attending VMworld, but it was my first time at a tech conference, as well as my first time in Las Vegas.

There were a lot of lessons learned during VMworld 2018 for me, so I’ll try to summarize my experience and advice in a few quick lists.

Do:

  • Meet new people. For a conference full of nerds, these people are remarkably social and willing to talk to complete strangers. They collectively hold an ocean of insight that will go untapped if you don’t make those connections.
  • Participate in the hackathon. You may miss out on a party, but what you’ll get in return is time working on a project in small groups with some of the sharpest minds the conference has to offer. Simply choose a team doing a project related to something of interest to you; the learning will follow.
  • Wear comfortable shoes. I was putting in a couple of miles per day, and although I was doing alright, I heard plenty of complaints of hurting feet at the end of day 1.
  • Drink lots of water. Whether you choose to imbibe at any of the parties or not, you are still likely to be much more active during the conference than you would be on a typical day. Stay hydrated.
  • Try to get the playlist that runs in the hands-on labs area, before sessions, and in the village. That playlist is amazing but I can’t find it anywhere, and my phone was having a hard time detecting what was playing.

Don’t:

  • Try to do it all. Take some time for yourself, and relax. Your brain can only take so much information at once. Pace yourself.
  • Stay in your comfort zone. I would argue that the connections I made at the conference are more valuable than the content of the sessions. Talk to the experts. Talk to other customers. Talk about problems you’re facing, recent victories, hobbies, whatever. Do the hackathon. Attend roundtables. Attending sessions is great, but if that’s all you do, you’re missing a large part of what the conference offers.
  • Attend 100 level sessions. These are largely geared towards managers or entry level administrators. If you’ve been working in vSphere for any amount of time, focus on 200 and 300 level sessions.
  • Go back to work on Friday. VMworld is incredible, but at the end of it, you’ll likely want to crash for a few days.

Takeaways:

My first VMworld experience was overwhelming, but incredible. That's in large part thanks to my TAM. He put in a lot of work making sure I was getting into valuable sessions and attending events I wouldn't otherwise have known about or chosen to participate in on my own. I credit him for helping me out of my comfort zone. Attending VMworld has been something I've always wanted to do, and it did not disappoint. It has inspired a drive for improvement not only in the technical aspects of my day job, but also in my own contributions back to the community.

Cheers to moving forward.


Perl Script to install MPI Libraries on Raspberry Pi

This particular script runs through the first 16 Pis in our super-computer cluster and installs three MPI library dependencies: libcr-dev, mpich2, and mpich2-doc.

#!/usr/bin/perl
use strict;
use warnings;

# Loop through all of the Pis in the cluster (192.168.0.2 through 192.168.0.17)
for my $count (2 .. 17) {
    my $host = "192.168.0.$count";

    # Install each of the three MPI library dependencies
    for my $package ('libcr-dev', 'mpich2', 'mpich2-doc') {
        system("ssh pi\@$host 'sudo apt-get --yes --force-yes install $package'");
        if ($? == -1) {
            print "command failed: $!\n";
        }
        else {
            printf "command exited with value %d\n", $? >> 8;
        }
    }
}

Perl Script to Modify Hostname and IP

I have taken on Perl scripting as a directed study. One of the projects going on at work is to take our 32 Raspberry Pis and turn them into a “super computer” of sorts. We have the master and node images captured, but I have been working on a script that we can run to modify the hostnames and IP addresses of all the nodes post-imaging.

Below is my script, which accepts user input for the hostname and IP.

#!/usr/bin/perl
use strict;
use warnings;
# This needs to be run with sudo

print "Please enter the new hostname: \n";
my $newName = <STDIN>;
chomp($newName);

# Attempt to edit the hostname file
my $hostFile = "/etc/hostname";

open(FILE1, ">$hostFile") or die "Can't open $hostFile: $! \n";
print FILE1 "$newName\n";
close(FILE1);

print "Hostname file has been modified! \n";

# Edit the hosts file and replace the hostname
my $hostsFile = "/etc/hosts";

open(FILE, "<$hostsFile") || die "Can't open $hostsFile: $! \n";
my @lines = <FILE>;
close(FILE);

my @newlines;
foreach (@lines) {
    $_ =~ s/NodePi01/$newName/g;
    push(@newlines, $_);
}
print "Hosts original contents have been modified with new hostname... \n";
open(FILE2, ">$hostsFile") || die "Can't open $hostsFile: $! \n";
print FILE2 @newlines;
close(FILE2);
print "Hosts file has been successfully modified! \n";

# Do the IP configuration
print "Please enter the new IP Address: \n";
my $newIP = <STDIN>;
chomp($newIP);
my $static = "static";
my $ipFile = "/etc/network/interfaces";

open(IP, "<$ipFile") || die "Can't open $ipFile: $! \n";
my @lines2 = <IP>;
close(IP);

my @newlines2;
foreach (@lines2) {
    # Replace the DHCP stanza with a static configuration for eth0
    $_ =~ s/iface eth0 inet dhcp/iface eth0 inet $static\n address $newIP\n netmask 255.255.255.0\n gateway 192.168.0.1/g;
    push(@newlines2, $_);
}
print "Interface file original contents have been modified with new settings... \n";
open(FILE3, ">$ipFile") || die "Can't open $ipFile: $! \n";
print FILE3 @newlines2;
close(FILE3);
print "IP Address and network settings have been successfully modified! \n";

# Reboot the node to apply changes
my $reboot = '/sbin/init 6';
print "You must reboot to apply changes, do you want to reboot now? (yes/no)\n";
chomp(my $input = <STDIN>);

if ($input eq "yes") {
    print "Now rebooting...\n";
    system($reboot);
}
else {
    print "Bye!\n";
}