Remote Proof of Concept (POC) testing seems to be gaining popularity recently. The major difference between on-site and remote testing is physical access to the hardware, for example to pull a drive or cause a physical network failure. For disk failure testing in a vSAN cluster, I use the vSAN Disk Fault Injection script that ships with ESXi. There is no need to download anything: it is there by default in the `/usr/lib/vmware/vsan/bin` path. Use the script for POC/homelab only.
data:image/s3,"s3://crabby-images/deaaa/deaaafacf5ef46f9649d6ea2235fc5ad1e6c40aa" alt=""
We need a device ID to run the script, and we can test either a cache or a capacity drive in the chosen disk group. In the example below I picked mpx.vmhba2:C0:T0:L0, which was a cache drive (Is Capacity Tier: false).
You can use `esxcli vsan storage list` for that:
data:image/s3,"s3://crabby-images/94490/9449091d23b9e60784767d3e58e4961e488ca5a2" alt=""
Or check in the vCenter console under Storage Devices:
data:image/s3,"s3://crabby-images/500e2/500e228f63f12295a49e80a1e61e4e41fd3d9585" alt=""
Or under Disk Management:
data:image/s3,"s3://crabby-images/ed41c/ed41cd34568640899cf17f82c2bf4777690fb898" alt=""
`python vsanDiskFaultInjection.pyc` has the following options:
data:image/s3,"s3://crabby-images/bc9b1/bc9b104363bf0d60c1942634204c618c6d5b9cd0" alt=""
I am using `-u` to inject a hot unplug.
data:image/s3,"s3://crabby-images/8ac30/8ac301f72651529b3767efee528b21626ea93579" alt=""
`/var/log/vmkernel.log` is where you can verify the disk status:
data:image/s3,"s3://crabby-images/e3eb5/e3eb575606903eff53c33db5ad91a35c5766b83b" alt=""
vSAN > Disk Management will also show what is happening to a disk group that has suffered a drive failure.
data:image/s3,"s3://crabby-images/3a416/3a416e702232c49dada90dc7df35e32075f2d952" alt=""
Now we can observe the state of the data and the resyncing of objects due to “compliance”.
data:image/s3,"s3://crabby-images/12323/12323161aa0229c869e60379fa0ad603f9d1377c" alt=""
After we are done with testing, a simple rescan for storage devices on the host will bring the drive back.
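The cleanup can be scripted the same way. This sketch assumes the standard `esxcli storage core adapter rescan --all` command for the rescan, then checks whether the device ID shows up again as a device line in `esxcli vsan storage list`. It uses `stdout=subprocess.PIPE` and `universal_newlines=True` rather than newer `subprocess` keywords so it also works on older Python 3 builds. The helper names are hypothetical.

```python
import subprocess

RESCAN = ["esxcli", "storage", "core", "adapter", "rescan", "--all"]
VSAN_LIST = ["esxcli", "vsan", "storage", "list"]


def rescan(dry_run=True):
    """Print the rescan command; execute it only when dry_run is False."""
    print(" ".join(RESCAN))
    if dry_run:
        return None
    return subprocess.run(RESCAN, check=True)


def vsan_storage_list():
    """Return the text output of `esxcli vsan storage list` (host only)."""
    result = subprocess.run(VSAN_LIST, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def device_listed(device_id, listing_text):
    """True if device_id appears as its own device line in the listing."""
    return any(line.strip() == device_id for line in listing_text.splitlines())
```

On the host, `rescan(dry_run=False)` followed by `device_listed("mpx.vmhba2:C0:T0:L0", vsan_storage_list())` is one way to confirm the drive is back before starting the next test.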