
Vembu Cluster Links: Best Practices and Recommendations for Cluster Deployment



This may be a strange question, but can't we configure a failover cluster on top of a RAID configuration, so that if one node malfunctions, the second node can run the VM workloads? The book and documentation I have only use Storage Spaces Direct (S2D) for the storage configuration, which is why this question came up...







The other scenario is Hyper-V Replica, but that looks like a manual process: if the primary VM goes down, the replica does not start automatically. This is why I am considering failover clustering instead of replication... or what would be the best practice?


You can use StarWind VSAN as shared storage on top of the underlying RAID array to build a failover cluster. StarWind creates a replicated shared storage pool that you can bring into the cluster as a CSV. The following guide describes the configuration process: -library/starwind-virtual-san-for-hyper-v-2-node-hyperconverged-scenario-with-windows-server-2016/


Zoho Corp., which builds software for businesses, is creating a cluster of small rural offices that could each house about 25 people, so that no employee would need to travel more than 10-20 km. This model will be replicated in other districts and States, said its CEO Sridhar Vembu. Excerpts:


We did a survey to find out where our people are. Based on that census and location data, we are putting up regional centres, say in a cluster that has a hundred people or so. These regional centres will have their own office and we will set these up. We will take up an existing building and convert it into an office. This way, we can actually move in within 2-3 weeks.


One of the most distinguishing features of VMware Virtual SAN 6.1 is the availability of stretched cluster deployment. A stretched cluster allows the Virtual SAN customer to configure two geographically separated sites while synchronously replicating data between them. This provides high availability and protection against a single site failure. This VMware white paper examines the performance aspects of a Virtual SAN stretched cluster deployment. Specifically, it examines the overhead of synchronously replicating data across two geographical sites by benchmarking against a regular, single-site Virtual SAN cluster deployment.


Two failure scenarios are considered: the failure of a single hard disk and the failure of an entire site. The Virtual SAN stretched cluster handles both scenarios robustly. The stretched cluster architecture also behaves differently from a regular (non-stretched, single fault domain) Virtual SAN cluster. The following are the main differences.


1) Write latency: In a regular Virtual SAN cluster, the mirrored copies of a write are all local, so they incur similar latency. In a stretched Virtual SAN cluster, each write must be prepared on both sites, so one of the write operations has to traverse the inter-site link and therefore incurs the inter-site latency. The higher that latency, the longer write operations take to complete.


2) Read locality: A regular cluster services read operations in a round-robin pattern across the mirrored copies of an object. A stretched cluster services all reads from the single object copy available at the local site, so reads never cross the inter-site link. A toy model of both effects is sketched below.
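To make these two differences concrete, here is a minimal Python sketch of a toy latency model; it is not VMware's implementation, and the 0.5 ms local service time and the example inter-site latencies are assumptions chosen purely for illustration.

```python
# Toy model of stretched-cluster I/O latency (illustrative only, not VMware code).

def write_latency_ms(local_service_ms: float, inter_site_latency_ms: float) -> float:
    """A synchronous mirrored write completes only when BOTH copies acknowledge:
    the local copy costs the local service time, while the remote copy also
    pays the inter-site latency on top of it."""
    local_copy = local_service_ms
    remote_copy = local_service_ms + inter_site_latency_ms
    return max(local_copy, remote_copy)

def read_latency_ms(local_service_ms: float, inter_site_latency_ms: float,
                    read_locality: bool) -> float:
    """With read locality, every read is served from the local copy. Without it
    (round-robin across copies), roughly half the reads would cross the link."""
    if read_locality:
        return local_service_ms
    return local_service_ms + inter_site_latency_ms / 2  # average over the two copies

if __name__ == "__main__":
    for latency in (0.0, 1.0, 5.0):  # no link (regular cluster), 1 ms link, 5 ms link
        print(f"inter-site {latency:.1f} ms -> write {write_latency_ms(0.5, latency):.1f} ms, "
              f"local read {read_latency_ms(0.5, latency, True):.1f} ms")
```

The model reproduces the qualitative behaviour described above: read latency is unaffected by the inter-site link thanks to read locality, while write latency grows roughly one-for-one with the inter-site latency.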


One of the major distinguishing factors of a Virtual SAN stretched cluster deployment is the active/active site configuration and the metropolitan-area link between the two sites. The performance of the stretched cluster depends largely on the bandwidth and latency available on this inter-site link. An inter-site link of at least 10 Gbps is recommended, and larger clusters may require more. This bandwidth is needed mainly to accommodate recovery from failure: when a failure occurs, the replicas on the failed node must be recreated from the data available on the other site, so recovery traffic can be very high even for a single HDD failure. In the site-failure experiments, peak recovery traffic was measured at over 2 Gbps.

Inter-site latency, in turn, directly affects the latency of write I/O. The experiments show that with the DVD Store benchmark, the stretched cluster can sustain a 5 ms inter-site latency without much impact on benchmark performance; however, a 5 ms inter-site latency increases write latency many-fold compared with a regular Virtual SAN deployment. It is therefore recommended to keep inter-site latency on the order of 1 ms unless the customer applications can tolerate high write latency.
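As a back-of-the-envelope illustration of why the link is sized for recovery rather than for steady-state replication, the sketch below estimates how long rebuilding a failed component's data across the inter-site link would take. The 1.2 TB drive, 60% utilization, and the assumption that only half of the raw bandwidth is usable for recovery are invented for the example and are not figures from the white paper.

```python
# Rough estimate of rebuild time over the inter-site link (illustrative assumptions only).

def rebuild_hours(data_to_rebuild_tb: float, link_gbps: float,
                  usable_fraction: float = 0.5) -> float:
    """data_to_rebuild_tb: replica data that must be recreated from the other site.
    link_gbps: raw inter-site bandwidth.
    usable_fraction: share of the link left for recovery once VM I/O and protocol
    overhead are accounted for (an assumption, not a measured value)."""
    data_bits = data_to_rebuild_tb * 1e12 * 8        # decimal TB -> bits
    usable_bps = link_gbps * 1e9 * usable_fraction
    return data_bits / usable_bps / 3600

if __name__ == "__main__":
    data_tb = 1.2 * 0.6  # assumed: a failed 1.2 TB HDD that was 60% full
    for link_gbps in (2.0, 10.0):
        hours = rebuild_hours(data_tb, link_gbps)
        print(f"{link_gbps:>4.0f} Gbps link -> ~{hours:.1f} h to rebuild {data_tb:.2f} TB")
```

Under these assumptions a 10 Gbps link finishes the rebuild in well under an hour, while a 2 Gbps link takes several times longer, which is why the sizing recommendation is driven by recovery traffic rather than by normal write replication.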


A higher value of DOMResyncDelayMultiplier delays recovery traffic, making the recovery process take longer but intrude less on virtual machine I/O traffic. Caution must be exercised when changing this flag: higher values can make recovery very time consuming, while lower values can make recovery traffic very disruptive. There are also extreme failure scenarios, such as a disk group failure caused by an SSD failure. Such a failure generates a heavy volume of recovery traffic because the large number of data objects in the failed disk group must be rebuilt elsewhere, and HDD performance is on the critical path in these scenarios. Therefore, if such failures are foreseen, one remedy is to design the cluster with higher-RPM HDDs that can sustain higher random I/O performance.


This white paper examines the various performance overheads that can exist in the Virtual SAN stretched cluster design. Testing shows that the Virtual SAN stretched cluster provides protection from site failure without introducing a significant performance penalty. This paper also describes the performance of the cluster under several failure scenarios. Testing shows that the Virtual SAN stretched cluster can adequately recover from site failure and balance recovery traffic with virtual machine I/O traffic.


Figure 1: The development of intratumor heterogeneity and subclonal reconstruction. (i) Tumor composition over time; (ii) the resulting distribution of variant allele frequencies (VAFs); (iii) the result of successful inference of the VAF clusters; (iv) the desired output of subclonal inference. SSM, simple somatic mutation; VAF, variant allele frequency.


SSM-based subclonal reconstruction algorithms attempt to reconstruct the subpopulation genotypes based on VAF clusters (and their associated mutation sets) identified by fitting statistical mixture models to the VAF data, either without phylogenic reconstruction [18,19,21,32], before phylogenic reconstruction [33], or concurrently with it [16,17]. Often, as in Figure 1, the clusters overlap, which introduces uncertainty in the exact number of mutation sets represented in the tumor (as well as in the assignment of SSMs to clusters). Adding more clusters to the model always provides a better data fit, so to prevent overfitting, the cluster number is selected by balancing data fit against a complexity penalty (e.g. the Bayesian information criterion) or by Bayesian inference in a non-parametric model [17,18,32]. In panel (iii) of Figure 1, the correct number of clusters has been recovered along with appropriate central VAFs.
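As a concrete, hypothetical illustration of this model-selection step, the sketch below fits Gaussian mixture models with different component counts to simulated VAF data and keeps the count with the lowest Bayesian information criterion. The cluster centers, noise level, and sample sizes are invented for the example; the cited methods use their own, more specialized mixture models rather than this generic one.

```python
# Illustrative only: choose the number of VAF clusters with a GMM and BIC.
# Simulated data, not real tumor VAFs.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Three mutation sets at population frequencies 1.0, 0.6 and 0.25; for heterozygous
# SSMs in diploid regions of a 100%-pure sample, expected VAF = frequency / 2.
true_centers = [0.50, 0.30, 0.125]
vafs = np.concatenate([rng.normal(c, 0.03, size=200) for c in true_centers])
vafs = vafs.clip(0.0, 1.0).reshape(-1, 1)

# Fit mixtures with 1..6 components; more components always fit the data better,
# so the BIC's complexity penalty is what prevents overfitting.
fits = {k: GaussianMixture(n_components=k, random_state=0).fit(vafs) for k in range(1, 7)}
best_k = min(fits, key=lambda k: fits[k].bic(vafs))

print("selected number of VAF clusters:", best_k)
print("estimated cluster centers:",
      [round(float(m), 3) for m in sorted(fits[best_k].means_.ravel(), reverse=True)])
```

With well-separated clusters the BIC recovers the correct count; as the centers move closer together or the noise grows, the selected count becomes less reliable, which is exactly the uncertainty described above.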


Assuming that the correct VAF clusters can be recovered, the subclonal lineages corresponding to each mutation set must still be defined. Defining the subclonal lineages is equivalent to defining the tumor phylogeny, and often multiple phylogenies are consistent with the recovered VAF clusters (e.g. panel (iv) in Figure 1). Complete and correct reconstruction of subpopulation genotypes requires resolving this ambiguity. To do so, reconstruction methods make one of a handful of assumptions about the process of tumor evolution.


Unfortunately, the infinite sites assumption (ISA) alone is often unable to fully resolve reconstruction ambiguity. As such, some methods [16,33] also make a sparsity assumption to select among ISA-respecting phylogenies consistent with the VAF data. This assumption, which we call strong parsimony, posits that due to expansion dynamics, only a small number of subpopulations are still present in the tumor [16,33] and that many of the VAF clusters are vestigial. These methods therefore select the phylogeny (or phylogenies) that maximizes the number of vestigial VAF clusters [16] or, equivalently, the number of branchpoints where the parental subpopulation has zero frequency in the current tumor [16,33]. The strong parsimony assumption does resolve some ambiguity, and it leads to the correct reconstruction in Figure 1, but it is risky because its empirical validity has not yet been established. For example, under some conditions a linear (i.e. non-branching) phylogeny can be mistaken for a branching one, and the risk of this occurring increases as the VAF measurement noise or the number of subpopulations in the tumor increases. This background distribution of false-positive vestigiality is not yet considered by either of the methods that assume strong parsimony.
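To make the selection rule concrete, here is a small, hypothetical sketch; the cluster frequencies and candidate trees are invented, and this is not the actual procedure from [16] or [33]. Given lineage frequencies derived from the VAF clusters, each candidate tree implies a frequency for every subpopulation (its lineage frequency minus its children's), and strong parsimony prefers the tree with the most near-zero, i.e. vestigial, subpopulations.

```python
# Toy illustration of the strong-parsimony selection rule (invented numbers).
# lineage_freq[c] = fraction of tumor cells carrying mutation set c, from VAF clusters.
lineage_freq = {"A": 1.0, "B": 0.55, "C": 0.45}

# Two candidate ISA-respecting phylogenies, encoded as child -> parent. Both satisfy
# the constraint that children's lineage frequencies sum to at most the parent's.
candidate_trees = {
    "linear A -> B -> C": {"B": "A", "C": "B"},
    "branched A -> B, A -> C": {"B": "A", "C": "A"},
}

def subpop_freqs(tree: dict[str, str]) -> dict[str, float]:
    """Frequency of cells whose most recent mutation set is c: the lineage frequency
    of c minus the lineage frequencies of c's direct children."""
    freqs = dict(lineage_freq)
    for child, parent in tree.items():
        freqs[parent] -= lineage_freq[child]
    return freqs

def vestigial_count(tree: dict[str, str], tol: float = 0.05) -> int:
    """Number of subpopulations that are (nearly) absent from the current tumor."""
    return sum(1 for f in subpop_freqs(tree).values() if f <= tol)

for name, tree in candidate_trees.items():
    freqs = {k: round(v, 3) for k, v in subpop_freqs(tree).items()}
    print(f"{name}: subpopulation frequencies {freqs}, vestigial clusters = {vestigial_count(tree)}")

best = max(candidate_trees, key=lambda name: vestigial_count(candidate_trees[name]))
print("strong parsimony selects:", best)
```

Both trees are consistent with the same lineage frequencies, but strong parsimony selects the branched one because it makes subpopulation A vestigial; this also illustrates the risk noted above, since a truly linear history would be reported as branched whenever the frequencies happen to line up this way.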


By assigning all SSMs within a VAF cluster to the same mutation set, reconstruction methods make another implicit assumption, which we call weak parsimony. This assumption does not hold if two mutation sets have the same population frequency. Note that if the ISA is valid, by the pigeonhole principle, weak parsimony is guaranteed to be valid whenever the population frequency of the mutation set is >50%.


PhyloWGS, like its predecessor PhyloSub [17], does not make the strong parsimony assumption, nor does it report only a single tree. Instead, it reports samples from the posterior distribution over phylogenies. Because the clustering of the VAFs is performed concurrently with phylogenic reconstruction, PhyloWGS is able to perform accurate reconstruction even when the weak parsimony assumption is violated in a strict subset of the available samples, for example if the VAF clusters overlap in one sample but not another. Our Markov chain Monte Carlo (MCMC) procedure samples phylogenies from the model posterior that are consistent with the mutation frequencies and does not rule out phylogenies that are equally consistent with the data. From this collection of samples, areas of certainty and uncertainty in the reconstruction can be determined.

