Executive Summary
This document describes the risk factors, probability assessment and actions that should be taken when running an OpenShift Container Platform (OCP) environment in production. The document focuses on four main aspects: application security risks (code), platform security risks (Kubernetes), node security risks (cloud) and deployment security risks (CD processes and containers).
Risk Assessment methodology
The document is written from a security perspective, classifying risks by the following threats:
- Unauthorized access
- Misuse of information
- Data leakage or unintentional exposure of information
- Data loss
Each threat is assessed on the following scales:
- Inherent risk and impact (Low, Medium, High)
- Likelihood (Low, Medium, High)
Existing risk controls are described, along with recommendations where applicable.
A “Risk rating” is established for each risk in accordance with the impact and likelihood of the threat, calculated as follows:
Impact (if exploited) * Likelihood = Risk Rating
Risk rating will be one of the following:
- Severe – a high risk requiring immediate attention
- Elevated – a high risk to system and data integrity exists; remediation should be prompt
- Low – a normal, general threat; while acceptable, it is advised to remediate or mitigate these threats
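The document does not assign numbers to Low, Medium and High. The sketch below assumes Low=1, Medium=2, High=3 and uses thresholds (a score of 9 is Severe, 3 or more is Elevated, anything lower is Low) chosen so that the Impact * Likelihood formula reproduces the ratings assigned in the risk entries that follow, reading their top tier as Severe. Both the weights and the thresholds are assumptions, not part of the methodology.

```python
# Hypothetical numeric weights for the qualitative scale (an assumption).
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


def risk_rating(impact: str, likelihood: str) -> str:
    """Return the risk rating for an Impact * Likelihood pair."""
    score = LEVELS[impact] * LEVELS[likelihood]
    if score >= 9:      # High * High
        return "Severe"
    if score >= 3:      # e.g. High * Low, Medium * Medium, High * Medium
        return "Elevated"
    return "Low"        # e.g. Medium * Low, Low * Medium


if __name__ == "__main__":
    print(risk_rating("High", "Medium"))  # Node SSH Access -> Elevated
```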
Risk Assessment
This section is divided into four main categories (the 4 C’s):
- Cloud – node OS and networking
- Cluster – The Kubernetes platform
- Containers – the containers in which applications are packed
- Code – the application code behavior
Cloud
The cloud (the co-located servers that make up the cluster) is the base layer on which the cluster and its applications run. Security measures on this layer are imperative, as a misconfigured or vulnerable cloud layer may impact all the other layers described here.
Node SSH Access
Nodes should not be accessible via SSH from any machine other than the bastion or install server used for management purposes.
All SSH connections to nodes should be tracked and monitored for off-hours or unauthorized access.
If exploited: A user gaining SSH access to the nodes can interfere with, disable or inject Pods into running nodes, thereby gaining access to application information and/or disabling the node. In the case of master nodes, a user gaining SSH access can disable the whole cluster, remove deployments and applications, etc.
Impact: High
Likelihood: Medium
Risk Rating: Elevated
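Monitoring SSH connections for off-hours or unauthorized access can start from the sshd log. The sketch below scans OpenShift node sshd log lines in the usual syslog form ("... sshd[PID]: Accepted publickey for USER from IP port N ...") and flags logins that do not come from the bastion or fall outside working hours. The bastion address and the hours are hypothetical placeholders.

```python
import re

# Matches OpenSSH "Accepted ..." lines, e.g.
# "Jan 12 03:14:07 node1 sshd[812]: Accepted publickey for core from 10.0.0.5 port 51234 ssh2"
ACCEPTED = re.compile(
    r"(?P<hour>\d{2}):\d{2}:\d{2} \S+ sshd\[\d+\]: "
    r"Accepted \S+ for (?P<user>\S+) from (?P<ip>\S+) port \d+"
)

# Hypothetical values: replace with the real bastion address and working hours.
ALLOWED_SOURCES = {"10.0.0.5"}
BUSINESS_HOURS = range(8, 19)  # 08:00 to 18:59


def audit_ssh_logins(lines):
    """Yield (user, ip, reason) for every accepted login that needs an alert."""
    for line in lines:
        m = ACCEPTED.search(line)
        if not m:
            continue  # not a successful login line
        user, ip, hour = m["user"], m["ip"], int(m["hour"])
        if ip not in ALLOWED_SOURCES:
            yield user, ip, "source is not the bastion"
        elif hour not in BUSINESS_HOURS:
            yield user, ip, "off-hours login"
```

In practice the lines would come from the node journal and the alerts would be forwarded to the monitoring system; only successful logins are checked here.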
Bastion / management server access
Access to the bastion/management server should not be done as root. In addition, only a limited number of users should be able to reach and log in to this server.
Auditing user logins to this server is imperative, and alerts should be set for off-hours or unauthorized access.
If exploited: Since the bastion/management server has passwordless (SSH key) access to all nodes in the cluster, gaining access to this server results in the same damage as described in Node SSH Access.
Impact: High
Likelihood: High
Risk Rating: Severe
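Disallowing root logins and limiting who may log in are typically enforced in the bastion's sshd_config with the OpenSSH `PermitRootLogin`, `AllowUsers` and `AllowGroups` keywords. The sketch below is a simplified checker for that file's contents; it does not evaluate `Match` blocks or `Include` directives, and the finding messages are illustrative.

```python
import re


def check_sshd_config(text: str) -> list:
    """Return findings for a bastion's sshd_config contents.

    Simplified: Match blocks and Include directives are not evaluated.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, *rest = re.split(r"[\s=]+", line, maxsplit=1)
        # Keywords are case-insensitive; sshd keeps the first value it reads.
        settings.setdefault(key.lower(), rest[0] if rest else "")
    findings = []
    if settings.get("permitrootlogin", "").lower() != "no":
        findings.append("PermitRootLogin is not set to 'no'")
    if "allowusers" not in settings and "allowgroups" not in settings:
        findings.append("no AllowUsers/AllowGroups restriction")
    return findings
```

An absent `PermitRootLogin` is flagged on purpose, since the OpenSSH default still allows key-based root logins.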
SELinux and change tracking
SELinux should be enabled on the bastion/management server, along with change tracking, to prevent and detect changes to server configuration (SSH keys, sudoers, etc.).
If exploited: Users log in to the bastion/management server as normal users who can impersonate root (sudo) for certain operations. As root, a user can grant elevated permissions, for example by adding SSH keys or sudoers entries. Since the bastion/management server has passwordless (SSH key) access to all nodes in the cluster, such unauthorized changes can result in the same damage as described in Node SSH Access.
Impact: High
Likelihood: High
Risk Rating: Severe
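Change tracking boils down to comparing the current state of sensitive files against a trusted baseline. Production systems usually rely on a dedicated tool such as AIDE; the sketch below only illustrates the idea with SHA-256 digests. The tracked paths are hypothetical examples, and reading files such as /etc/sudoers requires root.

```python
import hashlib
from pathlib import Path

# Hypothetical list of files to track on the bastion; adjust to your environment.
TRACKED = ["/etc/sudoers", "/etc/ssh/sshd_config", "/home/admin/.ssh/authorized_keys"]


def snapshot(paths):
    """Map each existing file path to the SHA-256 digest of its contents."""
    digests = {}
    for p in map(Path, paths):
        if p.is_file():
            digests[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    return digests


def compare(baseline, current):
    """Return (added, removed, modified) path lists between two snapshots."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(k for k in baseline.keys() & current.keys()
                      if baseline[k] != current[k])
    return added, removed, modified
```

The baseline itself must be stored somewhere the monitored users cannot write to, otherwise an attacker can simply refresh it.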
Nodes incoming communication
The nodes should accept no incoming connections from anywhere except other nodes in the cluster. It is imperative to establish a separate VLAN for the cluster and make sure that no incoming communication is allowed into this VLAN apart from the cluster's ingress points.
If exploited: Network access to nodes may allow an attacker to perform DDoS attacks on the kubelet (tcp/10250), the SDN (udp/4789) and other key components running on each node. For masters it is even more severe, as etcd (tcp/2379, tcp/2380) can be flooded with packets until it halts, or targeted by brute-force attacks.
Impact: Medium
Likelihood: High
Risk Rating: Elevated
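A review of the VLAN's firewall rules can check that the ports listed above are only reachable from inside the cluster network. In the sketch below each allow rule is a hypothetical `(source_cidr, protocol, port)` tuple rather than the format of any particular firewall, and the cluster CIDR is an assumed input.

```python
import ipaddress

# Ports named in this section; exposing them outside the cluster is a finding.
SENSITIVE_PORTS = {
    ("tcp", 10250): "kubelet",
    ("udp", 4789): "SDN (VXLAN)",
    ("tcp", 2379): "etcd client",
    ("tcp", 2380): "etcd peer",
}


def exposed_ports(rules, cluster_cidr):
    """Return findings for allow rules that open sensitive ports to non-cluster sources.

    Each rule is a hypothetical (source_cidr, protocol, port) tuple (IPv4 only).
    """
    cluster = ipaddress.ip_network(cluster_cidr)
    findings = []
    for source, proto, port in rules:
        name = SENSITIVE_PORTS.get((proto, port))
        if name and not ipaddress.ip_network(source).subnet_of(cluster):
            findings.append(f"{name} ({proto}/{port}) open to {source}")
    return findings
```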
Cluster
Apiserver Access
Access to the Kubernetes API should be limited to specific users in the organization. API access should be regulated and audited in order to detect spikes caused by runaway automated tasks, as well as hacking attempts.
If exploited: The master VIP and masters (tcp/8443 in OCP 3.x, tcp/6443 in OCP 4.x) could be used to perform brute-force password guessing and/or a DDoS attack on the Kubernetes API server, delaying or completely blocking valid API requests, including requests from the nodes. This can degrade cluster performance and/or interfere with master-to-node communication, causing various malfunctions all the way up to the application layer.
Impact: High
Likelihood: High
Risk Rating: Severe
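Password guessing against the API server shows up as bursts of HTTP 401 responses. Assuming the audit policy records these requests, the sketch below reads API server audit log lines (one JSON event per line, using the `sourceIPs` and `responseStatus.code` fields of Kubernetes audit events) and reports source addresses above a threshold. The threshold is an arbitrary example, and the count is per audit event rather than per request.

```python
import json
from collections import Counter


def failed_auth_by_source(lines, threshold=20):
    """Count HTTP 401 responses per source IP in Kubernetes audit log lines.

    Returns {ip: count} for sources at or above `threshold`.
    """
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if event.get("responseStatus", {}).get("code") == 401:
            for ip in event.get("sourceIPs", []):
                counts[ip] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}
```

Running this over a sliding time window (for example the last five minutes of log) turns it into a simple brute-force alert.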
Identity provider
Kubernetes can accommodate multiple identity providers, so a login attempt may be made against more than one authentication source. It is important to establish a single authentication source in the production environment, to avoid identities being merged or replaced inside the cluster, and to monitor authentication attempts from unauthorized identity providers.
If exploited: If the cluster authentication scheme is changed to allow more than one identity provider (for example, the identity provider of choice is Kerberos and a cluster admin adds an option to authenticate using an htpasswd file), users can then be added to that htpasswd file, allowing undetected authentication and/or misconfigured Role Bindings for users from different sources.
Impact: Medium
Likelihood: Low
Risk Rating: Low
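In OCP 4.x the identity providers are listed under `spec.identityProviders` of the cluster `OAuth` resource (`oc get oauth cluster -o json`); OCP 3.x keeps them in the master configuration file instead. The sketch below checks the parsed JSON for exactly one provider of an expected type. The expected type and the provider names in the usage are hypothetical.

```python
def check_identity_providers(oauth: dict, expected_type: str) -> list:
    """Return findings for the cluster OAuth resource (parsed JSON)."""
    providers = oauth.get("spec", {}).get("identityProviders") or []
    findings = []
    if len(providers) != 1:
        findings.append(f"expected exactly one identity provider, found {len(providers)}")
    for idp in providers:
        if idp.get("type") != expected_type:
            findings.append(
                f"unexpected provider {idp.get('name')!r} of type {idp.get('type')}"
            )
    return findings
```

For example, an OAuth resource holding an `LDAP` provider plus an added `HTPasswd` provider produces two findings when `LDAP` is the expected type.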
PSP / SCC
Pod Security Policies (PSP) and Security Context Constraints (SCC) can enable a container to access the node it is running on. This type of access should not be allowed except in special circumstances (such as monitoring or security compliance applications), and even then these applications should be closely audited, since a container with access to node resources is equivalent to root access on the node.
Pods should normally run with the restricted SCC.
If exploited: If a privileged container is attacked by a local or remote attacker, it can damage the node or any process running on it (including other Pods). In addition, it is fairly simple to access other running containers when holding privileged or elevated permissions.
Impact: High
Likelihood: Medium
Risk Rating: Elevated
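Auditing which SCCs grant node access can start from `oc get scc -o json`. The sketch below flags SCCs that set any of the host-access fields (`allowPrivilegedContainer`, `allowHostNetwork`, `allowHostPID`, `allowHostIPC`, `allowHostDirVolumePlugin`) or allow any UID (`runAsUser.type: RunAsAny`). Which of these count as findings is a judgment call for the reviewer.

```python
# SCC fields that grant a container access to the node it runs on.
HOST_ACCESS_FLAGS = [
    "allowPrivilegedContainer",
    "allowHostNetwork",
    "allowHostPID",
    "allowHostIPC",
    "allowHostDirVolumePlugin",
]


def permissive_sccs(scc_list: dict) -> dict:
    """Map SCC name to its host-access findings, for `oc get scc -o json` output."""
    report = {}
    for scc in scc_list.get("items", []):
        findings = [flag for flag in HOST_ACCESS_FLAGS if scc.get(flag)]
        if scc.get("runAsUser", {}).get("type") == "RunAsAny":
            findings.append("runAsUser: RunAsAny")
        if findings:
            report[scc["metadata"]["name"]] = findings
    return report
```

The built-in privileged SCC is expected to appear in the report; the `users` and `groups` fields of each flagged SCC then show who can actually use it.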
QoS
A Pod's QoS class determines which Pods are evicted first under resource pressure. Setting a Pod's QoS to Guaranteed may cause a Pod that is under attack to survive longer, since Burstable and BestEffort Pods are considered for eviction first.
If exploited: If a Pod with Guaranteed QoS is undergoing a DDoS attack, it may load the node while being the last to be evicted by the kubelet. This can potentially slow down or even disable the node and the Pods running on it.
Impact: Low
Likelihood: Medium
Risk Rating: Low
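Which class a Pod lands in follows from its CPU and memory requests and limits. The sketch below approximates the Kubernetes rules (Guaranteed when every container has CPU and memory limits equal to its requests, BestEffort when no container sets any, Burstable otherwise). It ignores init containers and compares quantities as strings, so it is an illustration rather than a faithful reimplementation.

```python
RESOURCES = ("cpu", "memory")


def qos_class(pod: dict) -> str:
    """Approximate the QoS class Kubernetes assigns to a Pod.

    Simplified: init containers are ignored and quantities are compared as
    strings, so "1" and "1000m" are treated as different values.
    """
    resources = [c.get("resources", {}) for c in pod["spec"]["containers"]]
    # BestEffort: no container sets any CPU or memory request or limit.
    if not any(r.get(kind, {}).get(res)
               for r in resources
               for kind in ("requests", "limits")
               for res in RESOURCES):
        return "BestEffort"
    for r in resources:
        limits = r.get("limits", {})
        # When a request is omitted, Kubernetes defaults it to the limit.
        requests = {**limits, **r.get("requests", {})}
        for res in RESOURCES:
            if res not in limits or requests[res] != limits[res]:
                return "Burstable"
    return "Guaranteed"
```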
Network Policies
Once a CNI plugin is installed, most Kubernetes networking topologies allow Pods to communicate with other Pods and services inside the same namespace; depending on the network plugin, communicating with entities in other namespaces may require additional configuration. The preferred mechanism is NetworkPolicies, rule-based objects that allow or deny ingress/egress traffic to and from selected entities.
If exploited: Misconfigured or missing network policies may give Pods access to other Pods they should not reach. An attacker running malicious code inside a Pod may use this to access other Pods and attempt various attacks inside the cluster.
Impact: Medium
Likelihood: Medium
Risk Rating: Elevated
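As an illustration, the sketch below builds a `networking.k8s.io/v1` NetworkPolicy that admits ingress only from Pods in the same namespace, emitted as JSON (which `oc apply -f -` accepts like YAML). The policy name and project name are hypothetical.

```python
import json


def allow_same_namespace(namespace: str) -> dict:
    """Build a NetworkPolicy that admits ingress only from Pods in `namespace`.

    Once a policy selects a Pod, ingress traffic not allowed by any policy is dropped.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every Pod in the namespace
            "policyTypes": ["Ingress"],
            # An empty podSelector in "from" matches Pods of this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    # Pipe the output into `oc apply -f -`.
    print(json.dumps(allow_same_namespace("my-project"), indent=2))
```

On OpenShift, additional policies are usually needed alongside this one, for example to admit traffic from the ingress routers.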
RBAC Governance
As with all systems, Kubernetes clusters have a single owner (a team or a person). Every other kind of admin is subordinate to the owner and should not be allowed to perform potentially destructive cluster-wide operations, nor to bind elevated-permission roles to other users. Developers should have access to their own projects only; the recommendation is to pre-create projects for continuous development, thus avoiding unknown deployments and/or orphaned projects on the cluster.
Cluster role bindings should be heavily audited, and approval should be required for any cluster role binding created on a production cluster.
If exploited: A user with elevated permissions, especially one who is able to bind users to cluster roles, may accidentally or intentionally bind the cluster-admin role to unauthorized users.
Impact: High
Likelihood: High
Risk Rating: Severe
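Auditing cluster role bindings can be partly automated by comparing every subject bound to `cluster-admin` against an approved list. The sketch below works on the output of `oc get clusterrolebindings -o json` using the standard `roleRef` and `subjects` fields; the identifier format and the approved list are assumptions. A real cluster has built-in cluster-admin bindings that belong in the approved list.

```python
def unapproved_cluster_admins(crbs: dict, approved: set) -> list:
    """List (binding, subject) pairs granting cluster-admin outside `approved`.

    Subjects are identified as "Kind/name", or "ServiceAccount/namespace/name".
    """
    found = []
    for crb in crbs.get("items", []):
        if crb.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        for s in crb.get("subjects") or []:
            ident = "/".join(filter(None, [s["kind"], s.get("namespace"), s["name"]]))
            if ident not in approved:
                found.append((crb["metadata"]["name"], ident))
    return found
```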
Encryption of secret data at rest
Secrets in a Kubernetes cluster hold sensitive and potentially destructive information such as certificates, access keys, usernames and passwords. Gaining access to such information may compromise the data held in the target system (for example, reading a secret that holds the username and password for a database allows whoever has the credentials to access the database and leak data).
Encryption of secret data at rest adds a layer of security by encrypting secrets while they are stored in etcd.
If exploited: An attacker can read the secret content outside the Pod and potentially obtain a username and password for a secure system.
Impact: High
Likelihood: Medium
Risk Rating: Elevated
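Secret values are only base64-encoded, not encrypted, which is why reading a Secret object (or an unencrypted etcd record) is enough to recover them. The sketch below decodes the `data` map of a Secret as returned by `oc get secret NAME -o json`; the credentials in the demo are made up, and values are assumed to be UTF-8 text.

```python
import base64


def decode_secret(secret: dict) -> dict:
    """Decode the `data` map of a Secret into plain strings."""
    return {k: base64.b64decode(v).decode() for k, v in secret.get("data", {}).items()}


if __name__ == "__main__":
    demo = {"data": {"username": base64.b64encode(b"dbuser").decode(),
                     "password": base64.b64encode(b"s3cret").decode()}}
    print(decode_secret(demo))  # {'username': 'dbuser', 'password': 's3cret'}
```

In OCP 4.x, etcd encryption is enabled by setting `spec.encryption.type` to `aescbc` on the `apiserver` cluster resource; access to Secrets through the API still has to be restricted with RBAC.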
Code
Deployment Privileges
Many organizations use deployment applications such as Helm. These applications should not have elevated permissions (for example cluster-admin), because developers and DevOps engineers have access to them; a tool with cluster-admin privileges makes RBAC moot, as a user may use the tool to bind any role or cluster role to any user.
If exploited: If an attacker gains access to such a deployment application (for example Helm), the attacker can apply virtually any YAML he or she desires, including modification, creation or deletion of Pods, nodes, deployments and even cluster infrastructure such as logging, monitoring and identity provider definitions.
Impact: High
Likelihood: High
Risk Rating: Severe
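A deployment tool can instead run under a namespace-scoped Role that covers only the objects it manages. The sketch below builds such an `rbac.authorization.k8s.io/v1` Role; the role name and resource list are hypothetical and should be tailored to what the tool deploys. Because it is a Role without access to `roles` or `rolebindings`, it cannot touch nodes, cluster-scoped objects or RBAC.

```python
import json

VERBS = ["get", "list", "watch", "create", "update", "patch", "delete"]


def deployer_role(namespace: str) -> dict:
    """Build a namespace-scoped Role for a deployment tool's service account."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "app-deployer", "namespace": namespace},
        "rules": [
            {"apiGroups": ["apps"], "resources": ["deployments", "statefulsets"],
             "verbs": VERBS},
            {"apiGroups": [""], "resources": ["services", "configmaps", "pods"],
             "verbs": VERBS},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deployer_role("my-project"), indent=2))
```

The Role could then be bound to the tool's service account, for example with `oc create rolebinding app-deployer --role=app-deployer --serviceaccount=my-project:deployer -n my-project` (names hypothetical). Helm 3 stores release state in Secrets in the release namespace by default, so a Helm deployer would also need access to `secrets`.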
Artifact repository access
Artifact repositories are a very important part of the cloud native approach. The artifact repository keeps artifacts ready to deploy and aligned with current versions for every application. In order to maintain continuity and allow for fast recovery (design for failure) production artifacts should reside in a separate artifact repository.
If exploited: If the same artifact repository is used for production and any other environment, an attacker can push a malformed or malicious artifact to the production cluster thereby performing unwanted actions or getting classified information from the cluster or the applications running on it. In addition, a dev/debug version of an application may be mistakenly deployed on the production cluster which may hold severe vulnerabilities and compromise cluster and application security and integrity.
Impact: High
Likelihood: Low
Risk Rating: Elevated
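A simple guard is to verify that every image in a production workload comes from the production registry. The sketch below extracts the registry host from an image reference (the first path component counts as a registry only if it contains a dot or a colon, or is `localhost`; otherwise Docker Hub is implied) and lists images from any other registry. The production registry host name is hypothetical.

```python
# Hypothetical production registry host; replace with your own.
PROD_REGISTRY = "registry.prod.example.com"


def registry_of(image: str) -> str:
    """Return the registry host of an image reference."""
    first, sep, _ = image.partition("/")
    # The first component is a registry only if it looks like a host name.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def non_production_images(pod_spec: dict) -> list:
    """List container images in a Pod spec that do not come from PROD_REGISTRY."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [c["image"] for c in containers if registry_of(c["image"]) != PROD_REGISTRY]
```

In OCP 4.x a comparable restriction can be enforced cluster-wide with `spec.registrySources.allowedRegistries` on the `image.config.openshift.io/cluster` resource.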
Pods running in Host namespace
Given the proper PSP or SCC, Pods can share the node's kernel namespaces, thereby accessing the host's network, process space and storage, or run as specific users on the node. Such Pods should not be allowed except in special circumstances, such as running a security or monitoring application, and even then such Pods should be heavily audited.
If exploited: A Pod with privileged abilities can make changes on the node itself, kill processes, modify or delete files, etc. Should an attacker gain access to such a Pod, damage could be done to both the cluster and the applications running on it, and data flowing through the node may be compromised.
Impact: High
Likelihood: Medium
Risk Rating: Elevated
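Auditing such Pods can start by scanning Pod manifests (for example the items of `oc get pods -A -o json`) for the standard fields that request host access: `hostNetwork`, `hostPID`, `hostIPC`, `hostPath` volumes and privileged containers. The sketch below reports them for a single Pod; the finding strings are illustrative.

```python
def host_access(pod: dict) -> list:
    """Return the host-level access requested by a Pod manifest."""
    spec = pod.get("spec", {})
    findings = [f for f in ("hostNetwork", "hostPID", "hostIPC") if spec.get(f)]
    findings += [f"hostPath volume {v['hostPath']['path']}"
                 for v in spec.get("volumes", []) if "hostPath" in v]
    for c in spec.get("containers", []):
        if c.get("securityContext", {}).get("privileged"):
            findings.append(f"privileged container {c['name']}")
    return findings
```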
Project annotations
Project annotations can be destructive even without malicious activity, and even more so if an attacker can use them, for example, to deploy Pods on the masters (the project nodeselector annotation).
If exploited: A wrongly annotated project may have a grave effect on the way Pods are deployed in that project, along with other significant issues (quotas, etc.). Annotating projects should be reserved for the user or team that owns the cluster, to prevent accidental or malicious outcomes such as deploying Pods on master nodes, as in the case of the project nodeselector: “”.
Impact: Medium
Likelihood: High
Risk Rating: Elevated
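The project node selector is stored in the `openshift.io/node-selector` namespace annotation, and an empty value means the cluster-wide default node selector is not applied to that project. The sketch below scans `oc get namespaces -o json` output for empty values or values that mention master or control-plane labels; this substring match is a crude heuristic. Platform namespaces may legitimately carry the annotation, so the assumed prefixes `openshift-` and `kube-` are skipped.

```python
ANNOTATION = "openshift.io/node-selector"


def risky_node_selectors(namespaces: dict, ignore_prefixes=("openshift-", "kube-")) -> list:
    """Flag projects whose node-selector annotation is empty or targets masters."""
    flagged = []
    for ns in namespaces.get("items", []):
        name = ns["metadata"]["name"]
        annotations = ns["metadata"].get("annotations") or {}
        if name.startswith(ignore_prefixes) or ANNOTATION not in annotations:
            continue
        value = annotations[ANNOTATION]
        # Heuristic: empty selector, or one pointing at master/control-plane nodes.
        if value == "" or "master" in value or "control-plane" in value:
            flagged.append((name, value))
    return flagged
```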
Isolated Pods
By default, Pods inside a project are non-isolated, meaning any Pod in the namespace can access any other Pod in the same namespace. Some applications require complete isolation from other Pods, except for specific communication requirements, due to the sensitive nature of the information flowing through them.
It is important to apply the proper NetworkPolicies to isolated Pods.
If exploited: Non-isolated Pods may be accessed or attacked from within the cluster, should an attacker gain access to or plant code inside a neighboring container and attack a sensitive Pod within the same namespace.
Impact: Medium
Likelihood: Low
Risk Rating: Low
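Isolating a sensitive Pod typically means a NetworkPolicy that selects it and allows ingress only from the specific clients it must serve. The sketch below builds such a policy; it assumes Pods are identified by an `app` label, and the label values, namespace and port in the usage are hypothetical.

```python
import json


def isolate_pods(namespace: str, app: str, allowed_client: str, port: int) -> dict:
    """NetworkPolicy: Pods labelled app=<app> accept ingress only from
    Pods labelled app=<allowed_client>, and only on TCP <port>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"isolate-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},  # the sensitive Pods
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": {"app": allowed_client}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(isolate_pods("payments", "card-vault", "checkout", 8443), indent=2))
```

Because the selected Pods are now covered by a policy, any other ingress to them, including from neighboring Pods in the same namespace, is dropped.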