Lately we have seen a lot of Event ID 5120 entries with a status code of STATUS_IO_TIMEOUT or STATUS_CONNECTION_DISCONNECTED when a node is rebooted.

Here is a statement from Microsoft about the issue and what to do when rebooting a node.

In the May cumulative update we introduced SMB Resilient Handles for the S2D intra-cluster network to improve resiliency to transient network failures. This has had some side effects in increased timeouts when a node is rebooted, which can affect a system under stress. Symptoms include Event ID 5120 entries with a status code of STATUS_IO_TIMEOUT or STATUS_CONNECTION_DISCONNECTED when a node is rebooted.

Until a fix is available, a workaround that addresses the issue is to invoke Storage Maintenance Mode prior to rebooting a node in a Storage Spaces Direct cluster, for example when patching.

So, first drain the node, then invoke Storage Maintenance Mode, then reboot.

Here’s the syntax:

Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<NodeName>"} | Enable-StorageMaintenanceMode
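
Before rebooting, it can be worth confirming that the node's drives actually entered Storage Maintenance Mode. The following is a minimal sketch, assuming it is run on a cluster node where the Storage module is available; the drives that belong to the node in maintenance should report "In Maintenance Mode" as their operational status.

```powershell
# List every physical disk with its current status.
# Drives of the node in maintenance should show "In Maintenance Mode"
# in the OperationalStatus column.
Get-PhysicalDisk |
    Sort-Object OperationalStatus |
    Format-Table FriendlyName, SerialNumber, OperationalStatus, HealthStatus -AutoSize
```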

Once the node is back online, disable Storage Maintenance Mode with this syntax:

Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<NodeName>"} | Disable-StorageMaintenanceMode
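
Put together, the drain, maintenance mode, reboot, and recovery steps could look roughly like the sketch below. It is not an official Microsoft script. It assumes it is run from another cluster node (not the one being rebooted) with the FailoverClusters and Storage modules available, and that PowerShell remoting to the target node works. The $NodeName variable is a placeholder introduced here for illustration, and the final wait loop assumes that Get-StorageJob reports finished jobs with a JobState of "Completed".

```powershell
# Placeholder: replace with the name of the node you are about to reboot.
$NodeName = "<NodeName>"

# 1. Drain the node: move its roles away and pause it in the cluster.
Suspend-ClusterNode -Name $NodeName -Drain -Wait

# 2. Put the node's drives into Storage Maintenance Mode.
Get-StorageFaultDomain -Type StorageScaleUnit |
    Where-Object { $_.FriendlyName -eq $NodeName } |
    Enable-StorageMaintenanceMode

# 3. Reboot the node and wait until PowerShell remoting responds again.
Restart-Computer -ComputerName $NodeName -Wait -For PowerShell -Force

# 4. Take the node's drives out of Storage Maintenance Mode.
Get-StorageFaultDomain -Type StorageScaleUnit |
    Where-Object { $_.FriendlyName -eq $NodeName } |
    Disable-StorageMaintenanceMode

# 5. Resume the node and move its roles back.
Resume-ClusterNode -Name $NodeName -Failback Immediate

# 6. Wait for storage jobs (for example resync/repair) to finish
#    before moving on to the next node.
#    Assumption: finished jobs report JobState "Completed".
while (Get-StorageJob | Where-Object { $_.JobState -ne 'Completed' }) {
    Start-Sleep -Seconds 30
}
```

Waiting for the storage jobs to finish before touching the next node keeps you from taking a second copy of the data offline while the first one is still resynchronizing.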

Please note that Cluster-Aware Updating does not put your nodes in Storage Maintenance Mode.