Hello friends, it’s been a while since my last post. With covid and all, my inspiration has not been the best, and with all my spare time going into renovating the house there has not been much time left for blogging. But there will be more posts coming soon.
I got the chance to borrow some Lenovo MX1021 Azure Stack HCI nodes to play with and test. With the public preview of the 21H2 release coming soon, I wanted to start getting some real hardware experience. So stay tuned for some Azure Stack HCI blog posts.
I set up the servers as a normal 2019 Storage Spaces Direct cluster and registered it to Azure with the PowerShell command Register-AzStackHCI. It went fine, and the cluster was registered and happy. You need to register the cluster within 30 days, and it needs to report to Azure at least every 30 days. If it loses connectivity for more than 30 days, you can’t deploy new VMs to the cluster. It can still run what is on it, but it can’t accept new VMs.
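For reference, a registration call looks roughly like the sketch below. It is not the exact command I ran: the subscription ID and node name are placeholders, and it assumes the Az.StackHCI module is installed and that the nodes run the Azure Stack HCI OS, where the Get-AzureStackHCI cmdlet is available.

```powershell
# Sketch only: all values below are placeholders, not taken from my environment.
$params = @{
    SubscriptionId = '00000000-0000-0000-0000-000000000000'  # placeholder subscription
    ComputerName   = 'node1.contoso.com'                     # hypothetical cluster node
}

# Registers the cluster with Azure (prompts for an Azure sign-in).
Register-AzStackHCI @params

# Run on a cluster node: shows the local registration and connection status.
Get-AzureStackHCI
```

Checking the output of Get-AzureStackHCI now and then is a cheap way to spot a cluster that has stopped reporting before the 30 days run out.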
A little over a week ago I noticed in Windows Admin Center that the cluster was no longer reporting to Azure, and it said the cluster was not registered.
This had me puzzled. I looked in Azure, and it had been 22 days since the cluster last reported. So I tried to register again, but that failed.
Then I noticed there is a -RepairRegistration switch on the Register-AzStackHCI command. But that failed horribly as well.
Looking at the command I ran before trying to register the cluster again, you can see that it fails on setting and verifying the certificate that it generates to secure the communication between the cluster and Azure.
Running the repair command does actually communicate with Azure, and the portal shows that the cluster has communicated with Azure. I ran it one week ago, and that is the date the Azure Portal shows.
A fellow MVP had the exact same issue and created a support ticket in Azure for a client. He got a quick reply, and after some troubleshooting they found out that the cluster had had a time sync issue. This is logged in the Microsoft-AzureStack-HCI/Admin event log. You can export it to a CSV by running this command:
Get-WinEvent -LogName Microsoft-AzureStack-HCI/Admin | Sort-Object TimeCreated | Select-Object TimeCreated, MachineName, Id, ContainerLog, LevelDisplayName, UserId, Message | Export-Csv c:\diag\AzureHCI-Admin.csv -NoTypeInformation
"14.05.2021 09:52:26","HYP220007.contoso.com","510","Microsoft-AzureStack-HCI/Admin","Error","S-1-5-18","Azure Stack HCI detected unexpected changes to the system clock that interfere with syncing state to the cloud. Azure Stack HCI is now out of policy. Please reinstall the operating system on this computer."
Now this is strange, as the nodes are synced with the domain controller, and I have not noticed any time drift in the domain, and neither had he.
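If you want to double check the time configuration yourself, something like the following can be run from one of the cluster nodes. This is a minimal sketch, assuming PowerShell remoting is enabled between the nodes; dc01.contoso.com is a hypothetical domain controller name.

```powershell
# Sketch only: collect the time source and current UTC time from every node.
$nodes = (Get-ClusterNode).Name

Invoke-Command -ComputerName $nodes -ScriptBlock {
    [pscustomobject]@{
        Node   = $env:COMPUTERNAME
        Source = (w32tm /query /source) -join ' '   # configured time source
        UtcNow = (Get-Date).ToUniversalTime()       # node's current clock
    }
}

# Measure the offset between this node and a domain controller (hypothetical name).
w32tm /stripchart /computer:dc01.contoso.com /samples:5 /dataonly
```

If all nodes point at the domain hierarchy and the offsets are in the millisecond range, time drift is unlikely to be visible from this side.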
I, on the other hand, got a different event, but with the same result: the cluster was not synced.
","515","Microsoft-AzureStack-HCI/Admin","Error","S-1-5-18","Azure Stack HCI detected unexpected changes to the system clock that interfere with syncing state to the cloud.
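To see whether your own cluster has logged either of these two events without exporting the whole log, a filtered query is enough. This is a sketch under the same assumption that remoting works between the nodes; the event IDs 510 and 515 are the two shown above.

```powershell
# Sketch only: look for the clock-related errors (510 and 515) on every node.
$nodes = (Get-ClusterNode).Name

Invoke-Command -ComputerName $nodes -ScriptBlock {
    Get-WinEvent -FilterHashtable @{
        LogName = 'Microsoft-AzureStack-HCI/Admin'
        Id      = 510, 515
    } -ErrorAction SilentlyContinue   # suppress the error raised when no events match
} | Sort-Object TimeCreated |
    Select-Object TimeCreated, MachineName, Id, Message
```

An empty result means none of the nodes has logged these events.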
Now if you look at event 510, the one my friend got, it says: Please reinstall the operating system on this computer. And I was like, ehmmm, what?? The HCI support person confirmed that, for now, the only way to clear this error is to reinstall the nodes one by one and add them back to the cluster.
If your cluster is empty of VMs, you can simply remove the S2D configuration and destroy the cluster. Then create the cluster again, enable S2D and register the HCI cluster back to Azure. This works, but it is only viable on a new cluster that does not have any workloads on it.
I will come back with a resolution from the support team or the MS HCI team on if and when there will be a repair option for this.
But if this happens to you, the only two options today are to recreate the cluster, or to reinstall each cluster node, re-add it to the cluster and run the registration again.
Update: The feedback from support is that it seems not to be a time sync issue but something else causing this, as no time changes have been detected in the logs.