I ran into a situation lately where we had a disk throw a predictive failure alert on an HPE Proliant server at a remote site running VMware ESXi 6.7. Using the iLO is a great way to figure out which drive bay contains the failed drive, but it does not provide a mechanism to “blink” the UID so that the on-site technician can easily identify and replace the correct drive.
This particular server belonged to a vSAN cluster. vSAN disk management within vSphere provides the ability to be able to enable the LED on a drive, but that functionality did not work for us. In an earlier post, I found that I could use the Smart Storage Administrator CLI (SSACLI) utility that is included with the custom HPE ESXi iso in order to determine drive bay location, so I was wondering if we could use the same tool to enable the UID (blinking light) for a specific drive.
I found this guide on github which had a lot of great examples of the usage of the SSACLI utility, including the ability to enable the UID on a particular drive. First, I ran the blow command to list all of the disks in the server to confirm the bay location of the failed drive.
/opt/smartstorageadmin/ssacli/bin/ssaclictrl slot=0 pd all show detail
The output will look something like this, with an entry for each disk in the system:
physicaldrive 1I:1:4 Port: 1I Box: 1 Bay: 4 Status: OK Drive Type: Unassigned Drive Interface Type: Solid State SAS Size: 1.9 TB Drive exposed to OS: True Logical/Physical Block Size: 512/4096 Firmware Revision: HPD2 Serial Number: – WWID:58CE38EE2057D3C6 Model: HP MO001920JWFWU Current Temperature (C): 29 Maximum Temperature (C): 30 Usage remaining: 100.00% Power On Hours: 592 SSD Smart Trip Wearout: True
After identifying the failed drive, and its location of 1I:1:4, I can then tailor the command to enable the UID of that particular drive.
By default, a vRealize Automation (vRA) 8.x deployment sets the password expiration policy on the root account to one year. If the password is not manually updated in that year, the account will expire which will prevent any future upgrades from succeeding. This issue is well documented in VMware KB 2150647, and if needed, the method for resetting the root password. David Ring did a great write-up on modifying the password expiration policy causing this condition. I would recommend bookmarking that procedure, because I ended up needing it more than one time.
I deployed the vRA environment at 8.0, and have updated it incrementally up to 8.4. So far, I’ve noticed with every upgrade that the password policy would be reset back to the default expiration value of 365 days. I have had to add an additional validation check at the end of our upgrade process to ensure that the account expiration polices remain correct after an update is completed.
I recently attempted to deploy the mon_disk_space script from VMware KB 2058187. The instructions from the KB are straightforward; users only need to modify the below two values to get started:
# Please provide email for alert messages
# Please provide percentage threshold for PostgreSQL used disk space
The script should send an email to the address provided when the PostgreSQL volume is utilizing more capacity than the provided (as a percentage) threshold. For my testing, I put the initial value at 10 knowing it would trigger the email to send.
After copying the script to /etc/cron.hourly on the VCSA and running ‘chmod 700 /etc/cron.hourly/mon_disk_space‘ to ensure the script is executable by cron, emails still were still not showing up, even after waiting over an hour. The troubleshooting began…
First, make sure cron is attempting to execute the script by running:
grep cron /var/log/vmware/messages
You should find entries similar to this in the log:
If you see those entries, then cron is able to execute the script, so the problem seems to be within the script itself. If you take a look at line 9 of the provided script, the variable ‘db_type’ is populated by running:
Let’s take a look at the provided script again. Lines 10-12 are looking for a single “PostgreSQL” entry, but the VCSA is providing back two values. This condition causes the script to exit, which explains why no emails are sent.
Simply adding a ‘uniq’ to line 9 will cause the script to produce a single, unique value. Line 9 of mon_disk_space ends up looking like this:
After making the change, I manually triggered the cron job by running run-parts /etc/cron.hourly. The alert properly triggered, and the email showed up in my inbox. Lastly, don’t forget to go back and modify the alerting threshold on line 6 of the script to something more sensible.
Disclaimer: As an attendee of Tech Field Day 19, my flights, accommodations, and other expenses were paid for by Tech Field Day. I was not required to write about any of the content presented, and I have not been compensated in any way for this post. Some presenting companies chose to provide gifts to attendees; any such gifts do not impact my reviews or opinions as expressed on this site. All opinions in this post are my own, and do not necessarily represent that of my employer or Gestalt IT.
The Cloud Automation Services Suite is a SaaS-based offering designed for multi-cloud management and automation and is currently composed of 3 products:
Cloud Assembly serves as the blueprinting engine, allowing users to deploy workloads, such as infrastructure or containers, to any connected public or private cloud environment. Service Broker is the “storefront” of sorts. It functions as a catalog of services available to users. Tailored policies and request forms can be applied to those available services to non-disruptively maintain organizational controls such as naming, access, cost controls, etc. Code Stream is the CI/CD platform of the product. It leverages the concept of “pipelines” to automate the delivery of applications or infrastructure. Users can integrate existing tools like Gitlab and Jenkins while using Code Stream to orchestrate the flow. Cody does an absolutely excellent job of explaining and demonstrating these products in his Tech Field Day presentations, so be sure to check those out for all the juicy details.
Those familiar with VMware’s current vRealize Automation product (vRA) will recognize that CAS clearly is a logical progression of the technology vRA offers. Improving the on-boarding process and developing new integrations with third-party tools and platforms are just two of the ways they’ve used customer feedback to improve the product. What remains to be seen is exactly what parallels will exist between CAS and the next version of vRA, other than the obvious difference in deployment models. Cody hints that we should pay attention to announcements at VMWorld 2019 for more information, and I intend to do just that.
What could not be ignored during the Tech Field Day presentations on CAS was just how flexible this product is. Perhaps the most concise description of that comes from Pietro Piutti:
Being able to connect to both public and private clouds and deploy workloads in just a matter of minutes provides an easy on-ramp for customers. Achieving similar functionality in recent versions of vRA is possible, but the configuration required to do so was more complicated.
That flexibility doesn’t end with the Cloud Assembly product. The entire Cloud Automation Services suite was designed with an “API-first” mentality. That allows the product to be extremely extensible. VMware isn’t asking customers to give up their tools. Do you want to continue to leverage GitHub or GitLab for your code repos? CAS supports that. Are you using Ansible or Puppet for your configuration management? No problem. While watching the demonstrations live at Tech Field Day, I couldn’t help but notice that VMware’s focus for this platform is to make it consumable, regardless of technical approach.
“We’ve taken concepts that could be very complex, and we’ve given people an on-ramp to use them.”
– Cody De Arkland, Technical Marketing Architect at VMware, on using Blueprints in VMware Cloud Assembly
Working in this field, it’s common to see a new product or platform that is impressive in function but requires users to abandon their existing tools or processes. Those processes then have to be rebuilt on the new platform with new methods. That isn’t the play by VMware with Cloud Automation Services. They understand that for this product to be adopted, it must be usable, and they must allow users, administrators, and developers to bring their own tools and processes.
Keep in mind that VMware Cloud Automation Services is a SaaS offering, and that comes with the added benefit of not having to manage the infrastructure to perform these functions. But, SaaS products aren’t for everyone. Although CAS is being touted as the next evolution of vRA, I don’t see vRA being deprecated in favor of CAS. I hope that feature parity is maintained between CAS and vRA moving forward so that the customer can decide what product is right for them, without sacrifice. Cody is refreshingly transparent in his presentations and makes clear that all of a customers’ desired product integrations may not exist yet, but that they take feedback very seriously and are rapidly developing to accommodate for customers’ needs. I’m looking forward to getting an update on the future of these products at VMWorld 2019.
In a nutshell, VMware’s Cloud Automation Services platform allows organizations to embrace DevOps methodologies without attempting to funnel customers into using a particular set of tools. I’m excited to see what is added and refined in the product, as this platform only became generally available early in 2019. If you want to get your hands on the product to learn more, VMware offers a hands-on lab specific to Cloud Automation Services.
Recently I had the opportunity to present on my journey deploying vSAN 2-node clusters at the Central Ohio VMUG UserCon, as well as a local Pittsburgh VMUG event. Overall, it was a great experience and I’m thankful to have had the opportunity! Check out the recording of the session below, and don’t forget to check out my post on Building a 2-node Direct Connect vSAN cluster.