With Advanced Cluster Management, an add-on for OpenShift, Red Hat promises a tool to manage multiple clusters: monitoring their state, configuring them, and controlling them. “One cluster to rule them all, one cluster to find them, one cluster to bring them all and to the HUB bind them”, so to speak.
On our own OpenShift cluster, which we are running in the context of gepardec-run, we’ve often had the problem of being blind when a cluster stopped working, so we cobbled together our own solution, sending the most important metrics and log files from the OpenShift clusters to a separate, so-called HUB-cluster.
Since the HUB-cluster kept buckling under the load and implementing these functionalities proved less trivial than it may sound, we ended up looking for the Cluster of Power. While searching, we quickly stumbled upon Advanced Cluster Management (ACM) by Red Hat.
Red Hat promises to simplify the management of multiple OpenShift clusters with ACM and claims to offer the following benefits:
- Improved visibility and control: ACM offers a central console for administering and monitoring multiple OpenShift clusters.
- Heightened security: policies can be enforced across all managed clusters.
- Automated cluster management: ACM automates many manual administrative tasks, reducing the time and effort needed to manage these environments.
- Improved scalability: with ACM, multiple clusters can be scaled through a single console.
Since we at Gepardec don’t trust things that sound too good to be true, we took a closer look at Red Hat’s claims in a Learning Friday project. Over the course of three Fridays we used ACM’s features and tried to find out which promises have been kept, how comfortable it is to use, and where we believe there’s still room for improvement.
We used the first day for basic setup and playful exploration of ACM. We used the installer to set up a small OpenShift cluster on AWS, since we didn’t want to deal with the resource consumption and complexity of a bare-metal installation and preferred to focus on ACM itself. After installation, we promoted the newly created cluster to a HUB through the ACM operator and were ready to start.
One of ACM’s most prominent features is the ability to easily create and destroy new clusters. Both cloud providers and private setups, e.g. on vSphere, are supported. We put in the necessary credentials and a couple of configuration parameters and had two additional OpenShift clusters on AWS, ready to go.
The Resource Search
A neat feature of ACM is the little search icon located at the top of the overview section’s header. You can query resources across all managed clusters with rather simple filter statements. For instance, to look up non-running pods you use an expression like “kind:pod status:!Running”. Other options include filtering for labels, namespaces, clusters or restarts. The latter can be combined with a condition like “>” or “!=” in order to enhance the filtering.
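A few expressions of this kind, for illustration (the label, namespace, and cluster values are made-up placeholders):

```text
kind:pod status:!Running            # pods in any state other than Running
kind:pod restarts:>5                # pods with more than five restarts
kind:deployment label:app=frontend namespace:prod cluster:prod-cluster
```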
ACM also includes a feature for saving your search queries. The overview section of the web console displays various preconfigured or custom search results. This is a good choice for large filter constructs or important attributes.
The search function is implemented to be fast and easy to use. For us, it is a core reason for using ACM.
A fleet of clusters offers great potential for a centralized monitoring solution. Red Hat must’ve thought the same, and that’s how the Multicluster Observability Operator was born and packaged with ACM.
Configuring observability is straightforward. All you need is object storage such as S3, GCS, ABS or ODF, and a namespace called “open-cluster-management-observability” that contains:
- A “multiclusterhub-operator-pull-secret” which you can copy from the “open-cluster-management” namespace
- A secret called “thanos-object-storage” which contains the object storage configuration
- An instance of a CRD “MultiClusterObservability” which references the “thanos-object-storage” secret
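Put together, a minimal setup could look roughly like this – bucket name, endpoint, and credentials are placeholders, and the CR is a sketch following the MultiClusterObservability v1beta2 API:

```yaml
# Secret with the Thanos object storage configuration (S3 in this sketch)
apiVersion: v1
kind: Secret
metadata:
  name: thanos-object-storage
  namespace: open-cluster-management-observability
type: Opaque
stringData:
  thanos.yaml: |
    type: s3
    config:
      bucket: my-observability-bucket      # placeholder
      endpoint: s3.eu-central-1.amazonaws.com
      access_key: <ACCESS_KEY>
      secret_key: <SECRET_KEY>
---
# The CR that triggers the installation of the observability stack
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  observabilityAddonSpec: {}
  storageConfig:
    metricObjectStorage:
      name: thanos-object-storage
      key: thanos.yaml
```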
The observability service consists of a Thanos instance for storing metrics and an Observatorium instance that provides a unified interface for different observability types like metrics, logs, and traces. The data can be visualized via a Grafana instance.
This infrastructure is installed and configured out of the box, which is the greatest strength of ACM observability. Beyond metrics, however, it hardly provides any features. In theory, Observatorium should integrate established tools for metrics, logs, and traces, but it is not finished yet: it only supports Thanos and Loki, and even for these it lacks feature completeness, configuration options, and documentation. Other systems are not included yet, though Tempo/Jaeger and OpenTelemetry are being considered for tracing.
In conclusion, ACM Observability is rather ACM Metrics. ACM suits this use case really well, but is not ready for applications beyond it. Loki can be configured separately from ACM, but there are currently no further options. Still, it is worth keeping an eye on!
Security standards and compliance are managed in ACM via the use of Policies. Policies define what state is desired across a subset of managed clusters and whether to alert on noncompliance or attempt to correct the issue, if possible. Which clusters get supervised is defined by PlacementRule CRs. Policies can be used to oversee the cluster concerning the existence and definition of Pods, memory consumption, container image vulnerabilities and more. A particularly cool feature is the ability to check compliance against an entire security standard, such as E8 or CIS, with a single policy. For this to work, various operators have to be installed on the managed clusters, depending on which policy types are to be deployed – perhaps most notably the Compliance Operator.
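As a sketch of what such a policy looks like – all names and the target namespace are made up – a simple ConfigurationPolicy that merely reports a missing namespace could be written like this:

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-namespace-exists
  namespace: default
spec:
  remediationAction: inform      # "enforce" would let ACM create the namespace itself
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: namespace-must-exist
        spec:
          remediationAction: inform
          severity: low
          object-templates:
            - complianceType: musthave   # the object below must exist on the cluster
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: monitored-ns     # placeholder
```

Bound to a PlacementRule, this policy is then evaluated on every matching managed cluster.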
Policies can be created through the interface, which includes a number of example templates. The extent of customization possible through the UI is limited, though, and anything beyond the simplest use cases is impossible to configure without editing the YAML or writing it from scratch.
The Governance Dashboard is immensely useful for assessing compliance and possible risks across clusters. Offending clusters and the specific policies being violated can be identified at a glance, including – if defined in the policies’ metadata – which security or company-internal standard they relate to. Automating compliance supervision in this manner heavily reduces the threat of configuration drift as well as the administrative overhead of audits.
There is also the possibility of enforcing policies automatically, e.g. installing missing operators, changing pod definitions or even defining custom Ansible Jobs to be executed on policy violation.
ACM ships with an integrated GitOps operator, which delivers a configured ArgoCD. It is responsible for deploying your applications (extended with some CRDs we’ll describe below). Only the hub cluster requires an instance of GitOps which is responsible for rolling out resources on managed clusters.
Let’s get an overview of the CRDs used by ACM for a deployment:
Channel
This CRD is used to declare a source for your applications. It is a pointer to a Git or Helm repository.
PlacementRule
As the name suggests, this CRD is used to decide on which clusters/namespaces subscriptions/applications should be deployed.
Subscription
Subscriptions observe channels for new versions of Kubernetes resources.
Application
You can view applications as sets containing subscriptions, simplifying management and monitoring of your deployments.
You basically define a channel for your resources, then define the target clusters with a placement rule – a “prod-cluster”, for instance. Thereafter, you can subscribe to minor versions, and finally your application is rolled out. 🙂
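The flow above, sketched as manifests – repository URL, names, and the environment label are placeholders:

```yaml
# Source: a Git repository containing the application manifests
apiVersion: apps.open-cluster-management.io/v1
kind: Channel
metadata:
  name: my-app-channel
  namespace: my-app
spec:
  type: Git
  pathname: https://github.com/example/my-app.git
---
# Target: every managed cluster labeled as a prod environment
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: prod-placement
  namespace: my-app
spec:
  clusterSelector:
    matchLabels:
      environment: prod
---
# Glue: deploy whatever the channel delivers to the selected clusters
apiVersion: apps.open-cluster-management.io/v1
kind: Subscription
metadata:
  name: my-app-subscription
  namespace: my-app
spec:
  channel: my-app/my-app-channel
  placement:
    placementRef:
      kind: PlacementRule
      name: prod-placement
```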
However, there are more ways to deploy your apps. We’ve covered deploying to one environment; you can also work with multiple environments using individual placement rules.
ACM primarily supports Git repositories (e.g. on GitHub) with static resource definitions. Templating with tools like Helm, Kustomize or Jsonnet is important in our opinion, especially in multi-cluster environments where each individual cluster might need slightly different values. ACM supports Helm and Kustomize, but their configuration is a bit finicky. It might be faster to just use ArgoCD without the ACM overhead.
While still in tech preview, cluster pools look like a useful feature. They are essentially hibernated or running clusters with no workload, ready to be claimed whenever applications need to scale across clusters. In that sense, they are the equivalent of the machine autoscaler applied at the cluster level. A pool can be configured to automatically create new clusters when the number of claimed clusters reaches its maximum, and conversely to destroy or hibernate unneeded ones.
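Cluster pools are backed by the Hive operator; a rough sketch of a pool and a claim against it could look like this (names, domain, region, and the referenced ClusterImageSet and credentials secret are placeholders, and a real pool needs a few more references such as an install-config template):

```yaml
apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: aws-pool
  namespace: cluster-pools
spec:
  size: 2                    # clusters kept ready in the pool
  runningCount: 0            # how many stay running instead of hibernating
  baseDomain: example.com    # placeholder
  imageSetRef:
    name: openshift-v4.12    # a ClusterImageSet available on the hub
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-central-1
---
# Claiming a cluster wakes one up and hands it over
apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: dev-cluster-claim
  namespace: cluster-pools
spec:
  clusterPoolName: aws-pool
```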
ACM provides a common interface to easily connect your managed clusters via VPN. This is based on the Submariner project, which spawns a gateway plane in every cluster and a control plane on the hub for VPN service discovery. Unfortunately, Submariner requires the clusters to have non-overlapping pod networks to function properly – kind of obvious, since you can’t connect identical networks – so we didn’t investigate this feature all too much.
Constantin, Simon, Shahin, Felix