Pod Network Loss
Introduction¶
- It injects packet loss on the specified container by starting a traffic control (tc) process with netem rules to add egress loss
- It can be used to test the application's resilience to a lossy/flaky network
Scenario: Induce network loss of the target pod
Uses¶
View the uses of the experiment
coming soon
Prerequisites¶
Verify the prerequisites
- Ensure that Kubernetes Version > 1.16
- Ensure that the Litmus Chaos Operator is running by executing kubectl get pods in the operator namespace (typically, litmus). If not, install from here
- Ensure that the pod-network-loss experiment resource is available in the cluster by executing kubectl get chaosexperiments in the desired namespace. If not, install from here
Default Validations¶
View the default validations
The application pods should be in running state before and after chaos injection.
Minimal RBAC configuration example (optional)¶
NOTE
If you are using this experiment as part of a Litmus workflow scheduled, constructed & executed from chaos-center, then you may be making use of the litmus-admin RBAC, which is pre-installed in the cluster as part of the agent setup.
View the Minimal RBAC permissions
apiVersion: v1
kind: ServiceAccount
metadata:
name: pod-network-loss-sa
namespace: default
labels:
name: pod-network-loss-sa
app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: pod-network-loss-sa
namespace: default
labels:
name: pod-network-loss-sa
app.kubernetes.io/part-of: litmus
rules:
# Create and monitor the experiment & helper pods
- apiGroups: [""]
resources: ["pods"]
verbs: ["create","delete","get","list","patch","update", "deletecollection"]
# Performs CRUD operations on the events inside chaosengine and chaosresult
- apiGroups: [""]
resources: ["events"]
verbs: ["create","get","list","patch","update"]
# Fetch configmap details and mount them to the experiment pod (if specified)
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get","list",]
# Track and get the runner, experiment, and helper pod logs
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get","list","watch"]
# for creating and managing pods/exec to execute commands inside the target container
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["get","list","create"]
# deriving the parent/owner details of the pod (if parent is any of {deployment, statefulset, daemonset})
- apiGroups: ["apps"]
resources: ["deployments","statefulsets","replicasets", "daemonsets"]
verbs: ["list","get"]
# deriving the parent/owner details of the pod (if parent is deploymentConfig)
- apiGroups: ["apps.openshift.io"]
resources: ["deploymentconfigs"]
verbs: ["list","get"]
# deriving the parent/owner details of the pod (if parent is deploymentConfig)
- apiGroups: [""]
resources: ["replicationcontrollers"]
verbs: ["get","list"]
# deriving the parent/owner details of the pod (if parent is argo-rollouts)
- apiGroups: ["argoproj.io"]
resources: ["rollouts"]
verbs: ["list","get"]
# for configuring and monitoring the experiment job by the chaos-runner pod
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["create","list","get","delete","deletecollection"]
# for creation, status polling and deletion of litmus chaos resources used within a chaos workflow
- apiGroups: ["litmuschaos.io"]
resources: ["chaosengines","chaosexperiments","chaosresults"]
verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: pod-network-loss-sa
namespace: default
labels:
name: pod-network-loss-sa
app.kubernetes.io/part-of: litmus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: pod-network-loss-sa
subjects:
- kind: ServiceAccount
name: pod-network-loss-sa
namespace: default
Experiment tunables¶
Check the experiment tunables
Optional Fields

| Variables | Description | Notes |
| --- | --- | --- |
| NETWORK_INTERFACE | Name of the ethernet interface considered for shaping traffic | Defaults to eth0 |
| TARGET_CONTAINER | Name of the container subjected to network loss | Applicable for containerd & CRI-O runtimes only. Even with these runtimes, if the value is not provided, it injects chaos on the first container of the pod |
| NETWORK_PACKET_LOSS_PERCENTAGE | The packet loss in percentage | Defaults to 100 (percent) |
| CONTAINER_RUNTIME | Container runtime interface for the cluster | Defaults to containerd; supported values: docker, containerd and crio for the litmus LIB, and only docker for the pumba LIB |
| SOCKET_PATH | Path of the containerd/crio/docker socket file | Defaults to /run/containerd/containerd.sock |
| TOTAL_CHAOS_DURATION | The time duration for chaos insertion (in seconds) | Defaults to 60s |
| TARGET_PODS | Comma-separated list of application pod names subjected to pod network loss chaos | If not provided, it will select target pods randomly based on the provided appLabels |
| DESTINATION_IPS | IP addresses of the services or pods, or the CIDR blocks (range of IPs), the accessibility to which is impacted | Comma-separated IP(s) or CIDR(s) can be provided. If not provided, it will induce network chaos for all IPs/destinations |
| DESTINATION_HOSTS | DNS names/FQDNs of the services, the accessibility to which is impacted | If not provided, it will induce network chaos for all IPs/destinations, or DESTINATION_IPS if already defined |
| SOURCE_PORTS | Ports of the target application, the accessibility to which is impacted | Comma-separated port(s) can be provided. If not provided, it will induce network chaos for all ports |
| DESTINATION_PORTS | Ports of the destination services or pods, or the CIDR blocks (range of IPs), the accessibility to which is impacted | Comma-separated port(s) can be provided. If not provided, it will induce network chaos for all ports |
| PODS_AFFECTED_PERC | The percentage of total pods to target | Defaults to 0 (corresponds to 1 replica); provide numeric value only |
| LIB | The chaos lib used to inject the chaos | Defaults to litmus; supported values: litmus and pumba |
| TC_IMAGE | Image used for traffic control in Linux | Defaults to gaiadocker/iproute2 |
| LIB_IMAGE | Image used to run the netem command | Defaults to litmuschaos/go-runner:latest |
| RAMP_TIME | Period to wait before and after injection of chaos (in seconds) | |
| SEQUENCE | It defines the sequence of chaos execution for multiple target pods | Default value: parallel; supported: serial and parallel |
Experiment Examples¶
Common and Pod specific tunables¶
Refer to the common attributes and pod-specific tunables to tune the common tunables for all experiments and the pod-specific tunables, as shown in the sketch below.
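For instance, the pod-specific tunable PODS_AFFECTED_PERC and the common tunables SEQUENCE and RAMP_TIME (all listed in the experiment tunables above) can be set alongside the experiment-specific ENVs. The following is a minimal sketch; the values shown are only illustrative:

# tune the common and pod-specific tunables alongside the experiment ENVs
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: pod-network-loss-sa
  experiments:
    - name: pod-network-loss
      spec:
        components:
          env:
            # percentage of total pods to target (0 corresponds to 1 replica)
            - name: PODS_AFFECTED_PERC
              value: '50'
            # sequence of chaos execution for multiple target pods
            # supports serial and parallel
            - name: SEQUENCE
              value: 'parallel'
            # period to wait before and after chaos injection (in seconds)
            - name: RAMP_TIME
              value: '10'
            - name: TOTAL_CHAOS_DURATION
              value: '60'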
Network Packet Loss¶
It defines the network packet loss percentage to be injected into the targeted application. It can be tuned via the NETWORK_PACKET_LOSS_PERCENTAGE ENV.
Use the following example to tune this:
# it injects the network loss for the egress traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-network-loss-sa
experiments:
- name: pod-network-loss
spec:
components:
env:
# network packet loss percentage
- name: NETWORK_PACKET_LOSS_PERCENTAGE
value: '100'
- name: TOTAL_CHAOS_DURATION
value: '60'
Destination IPs And Destination Hosts¶
The network experiments interrupt traffic for all the IPs/hosts by default. The interruption of specific IPs/hosts can be tuned via the DESTINATION_IPS and DESTINATION_HOSTS ENVs.
- DESTINATION_IPS: It contains the IP addresses of the services or pods, or the CIDR blocks (range of IPs), the accessibility to which is impacted.
- DESTINATION_HOSTS: It contains the DNS names/FQDNs of the services, the accessibility to which is impacted.
Use the following example to tune this:
# it injects the chaos for the egress traffic for specific IPs/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-network-loss-sa
experiments:
- name: pod-network-loss
spec:
components:
env:
# supports comma separated destination ips
- name: DESTINATION_IPS
value: '8.8.8.8,192.168.5.6'
# supports comma separated destination hosts
- name: DESTINATION_HOSTS
value: 'nginx.default.svc.cluster.local,google.com'
- name: TOTAL_CHAOS_DURATION
value: '60'
Source And Destination Ports¶
The network experiments interrupt traffic for all the source & destination ports by default. The interruption of specific port(s) can be tuned via the SOURCE_PORTS and DESTINATION_PORTS ENVs.
- SOURCE_PORTS: It contains the ports of the target application, the accessibility to which is impacted.
- DESTINATION_PORTS: It contains the ports of the destination services or pods, or the CIDR blocks (range of IPs), the accessibility to which is impacted.
Use the following example to tune this:
# it injects the chaos for the ingress and egress traffic for specific ports
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-network-loss-sa
experiments:
- name: pod-network-loss
spec:
components:
env:
# supports comma separated source ports
- name: SOURCE_PORTS
value: '80'
# supports comma separated destination ports
- name: DESTINATION_PORTS
value: '8080,9000'
- name: TOTAL_CHAOS_DURATION
value: '60'
Blacklist Source and Destination Ports¶
By default, the network experiments disrupt traffic for all the source and destination ports. Specific ports can be blacklisted via the SOURCE_PORTS and DESTINATION_PORTS ENVs.
- SOURCE_PORTS: Provide the comma-separated source ports, preceded by !, that you'd like to blacklist from the chaos.
- DESTINATION_PORTS: Provide the comma-separated destination ports, preceded by !, that you'd like to blacklist from the chaos.
Use the following example to tune this:
# blacklist the source and destination ports
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-network-loss-sa
experiments:
- name: pod-network-loss
spec:
components:
env:
# it will blacklist 80 and 8080 source ports
- name: SOURCE_PORTS
value: '!80,8080'
# it will blacklist 8080 and 9000 destination ports
- name: DESTINATION_PORTS
value: '!8080,9000'
- name: TOTAL_CHAOS_DURATION
value: '60'
Network Interface¶
It defines the name of the ethernet interface considered for shaping traffic. It can be tuned via the NETWORK_INTERFACE ENV. Its default value is eth0.
Use the following example to tune this:
# provide the network interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-network-loss-sa
experiments:
- name: pod-network-loss
spec:
components:
env:
# name of the network interface
- name: NETWORK_INTERFACE
value: 'eth0'
- name: TOTAL_CHAOS_DURATION
value: '60'
Container Runtime Socket Path¶
It defines the CONTAINER_RUNTIME and SOCKET_PATH ENVs to set the container runtime and the socket file path.
- CONTAINER_RUNTIME: It supports docker, containerd, and crio runtimes. The default value is containerd.
- SOCKET_PATH: It contains the path of the containerd socket file by default (/run/containerd/containerd.sock). For other runtimes, provide the appropriate socket path.
Use the following example to tune this:
# provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-network-loss-sa
experiments:
- name: pod-network-loss
spec:
components:
env:
# runtime for the container
# supports docker, containerd, crio
- name: CONTAINER_RUNTIME
value: 'containerd'
# path of the socket file
- name: SOCKET_PATH
value: '/run/containerd/containerd.sock'
- name: TOTAL_CHAOS_DURATION
value: '60'
Pumba Chaos Library¶
It specifies the Pumba chaos library for the chaos injection. It can be tuned via the LIB ENV. The default chaos library is litmus.
Provide the traffic control image via the TC_IMAGE ENV for the pumba library.
Use the following example to tune this:
# use pumba chaoslib for the network chaos
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-network-loss-sa
experiments:
- name: pod-network-loss
spec:
components:
env:
# name of the chaoslib
# supports litmus and pumba lib
- name: LIB
value: 'pumba'
# image used for the traffic control in linux
# applicable for pumba lib only
- name: TC_IMAGE
value: 'gaiadocker/iproute2'
- name: TOTAL_CHAOS_DURATION
value: '60'
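Similarly, the image used to run the netem command can be tuned via the LIB_IMAGE ENV; per the experiment tunables above, it defaults to litmuschaos/go-runner:latest. The following is a minimal sketch that simply pins the default image; substitute your own image reference if you mirror it:

# override the image used to run the netem command
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: pod-network-loss-sa
  experiments:
    - name: pod-network-loss
      spec:
        components:
          env:
            # image used to run the netem command
            # defaults to litmuschaos/go-runner:latest
            - name: LIB_IMAGE
              value: 'litmuschaos/go-runner:latest'
            - name: TOTAL_CHAOS_DURATION
              value: '60'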