AWS SSM Chaos By ID
Introduction
- AWS SSM Chaos By ID disrupts the state of infrastructure resources. The experiment induces chaos on AWS EC2 instances using Amazon SSM Run Command. It does this through SSM docs, which define the actions that Systems Manager performs on your managed instances (instances with the SSM agent installed) and so allow chaos experiments to run on those instances.
- It injects chaos (such as stress, network, disk, or IO faults) into the AWS EC2 instances identified by the given instance ID(s), using SSM docs, for the configured chaos duration.
- By default, the experiment uses an SSM doc that injects stress chaos. You can add your own SSM docs through a configMap (.spec.definition.configMaps) in the ChaosExperiment CR.
- It tests deployment sanity (replica availability and uninterrupted service) and the recovery workflows of the target application pod (if provided).
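The custom-doc path described above can be sketched as follows. The manifests mount a ConfigMap holding a custom SSM Command document into the experiment pod and point the experiment at it. Everything here is an illustrative assumption, not the shipped configuration:

- the ConfigMap name (custom-ssm-doc), the file name (my-ssm-doc.yml), the mount path (/tmp/custom-docs), the document name, and the document body;
- the parameters the experiment passes to the document. These are not specified on this page, so mirror the parameters of the default LitmusChaos-AWS-SSM-Doc in your Litmus version.

The ChaosExperiment part is only an excerpt to merge into the existing CR.

```yaml
# Hypothetical ConfigMap holding a custom SSM Command document (schema 2.2).
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-ssm-doc            # illustrative name
  namespace: litmus               # assumed CHAOS_NAMESPACE
data:
  my-ssm-doc.yml: |-
    schemaVersion: '2.2'
    description: Illustrative doc that keeps one CPU busy for a fixed duration
    parameters:
      Duration:                   # assumed parameter name; align with the default doc
        type: String
        default: '60'
        description: Chaos duration in seconds
    mainSteps:
      - action: aws:runShellScript
        name: CpuBurn
        inputs:
          runCommand:
            # busy-loop until timeout stops it; '|| true' keeps the step successful
            - timeout {{ Duration }} sh -c 'while :; do :; done' || true
---
# Excerpt of the ChaosExperiment CR: mount the ConfigMap via
# .spec.definition.configMaps and point the document tunables at the mounted file.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: aws-ssm-chaos-by-id
spec:
  definition:
    configMaps:
      - name: custom-ssm-doc
        mountPath: /tmp/custom-docs   # assumed mount path
    env:
      - name: DOCUMENT_NAME
        value: 'my-custom-ssm-doc'    # hypothetical document name
      - name: DOCUMENT_FORMAT
        value: 'YAML'
      - name: DOCUMENT_TYPE
        value: 'Command'
      - name: DOCUMENT_PATH
        value: '/tmp/custom-docs/my-ssm-doc.yml'
```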
Scenario: AWS SSM Chaos
Uses
Coming soon.
Prerequisites
- Ensure that the Kubernetes version is > 1.16.
- Ensure that the Litmus Chaos Operator is running by executing kubectl get pods in the operator namespace (typically, litmus). If not, install from here.
- Ensure that the aws-ssm-chaos-by-id experiment resource is available in the cluster by executing kubectl get chaosexperiments in the desired namespace. If not, install from here.
- Ensure that you have the required AWS access and that your target EC2 instances have an IAM instance profile attached. To know more, check out the Systems Manager Docs.
- Ensure that a Kubernetes secret containing the AWS access configuration (keys) is created in the CHAOS_NAMESPACE. A sample secret file looks like:

    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-secret
    type: Opaque
    stringData:
      cloud_config.yml: |-
        # Add the cloud AWS credentials respectively
        [default]
        aws_access_key_id = XXXXXXXXXXXXXXXXXXX
        aws_secret_access_key = XXXXXXXXXXXXXXX

- If you change the secret key name (from cloud_config.yml), also update the AWS_SHARED_CREDENTIALS_FILE ENV value in experiment.yaml to the same name.
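For example, suppose the secret key were renamed to a hypothetical aws_creds.yml. The secret and the experiment env would then need to agree, roughly as sketched below. This sketch assumes the secret is mounted at /tmp/ in the experiment pod, which is consistent with the default /tmp/cloud_config.yml listed under the tunables. The key name and the mount path are illustrative, and the ChaosExperiment part is only an excerpt.

```yaml
# Secret with a renamed key (aws_creds.yml is an illustrative name)
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  aws_creds.yml: |-
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXX
---
# Matching excerpt of experiment.yaml (ChaosExperiment CR)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: aws-ssm-chaos-by-id
spec:
  definition:
    secrets:
      - name: cloud-secret
        mountPath: /tmp/              # assumed mount path for the secret
    env:
      # must point at <mountPath>/<secret key name>
      - name: AWS_SHARED_CREDENTIALS_FILE
        value: '/tmp/aws_creds.yml'
```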
Default Validations
- The EC2 instance should be in a healthy state.
Minimal RBAC configuration example (optional)
NOTE
If you are using this experiment as part of a litmus workflow constructed, scheduled, and executed from chaos-center, you may be using the litmus-admin RBAC, which is preinstalled in the cluster as part of the agent setup.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ssm-chaos-by-id-sa
  namespace: default
  labels:
    name: aws-ssm-chaos-by-id-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aws-ssm-chaos-by-id-sa
  labels:
    name: aws-ssm-chaos-by-id-sa
    app.kubernetes.io/part-of: litmus
rules:
  # Create and monitor the experiment & helper pods
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create","delete","get","list","patch","update","deletecollection"]
  # Performs CRUD operations on the events inside chaosengine and chaosresult
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create","get","list","patch","update"]
  # Fetch configmaps & secrets details and mount it to the experiment pod (if specified)
  - apiGroups: [""]
    resources: ["secrets","configmaps"]
    verbs: ["get","list"]
  # Track and get the runner, experiment, and helper pods log
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get","list","watch"]
  # for creating and managing to execute commands inside target container
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get","list","create"]
  # for configuring and monitor the experiment job by the chaos-runner pod
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create","list","get","delete","deletecollection"]
  # for creation, status polling and deletion of litmus chaos resources used within a chaos workflow
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines","chaosexperiments","chaosresults"]
    verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aws-ssm-chaos-by-id-sa
  labels:
    name: aws-ssm-chaos-by-id-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: aws-ssm-chaos-by-id-sa
subjects:
- kind: ServiceAccount
  name: aws-ssm-chaos-by-id-sa
  namespace: default
Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example consists of the minimum necessary role permissions to execute the experiment.
Experiment tunables
Mandatory Fields
Variables | Description | Notes |
---|---|---|
EC2_INSTANCE_ID | Instance ID of the target EC2 instance. Multiple IDs can be provided as comma (,) separated values | Multiple IDs can be provided as id1,id2 |
REGION | The region name of the target instance | |
Optional Fields
Variables | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | The total time duration for chaos injection (sec) | Defaults to 30s |
CHAOS_INTERVAL | The interval (in sec) between successive chaos injections | Defaults to 60s |
AWS_SHARED_CREDENTIALS_FILE | The path to the AWS secret credentials file | Defaults to /tmp/cloud_config.yml |
DOCUMENT_NAME | The name of the added SSM doc (if not using the default doc) | Defaults to LitmusChaos-AWS-SSM-Doc |
DOCUMENT_FORMAT | The format of the SSM doc: YAML or JSON | Defaults to YAML |
DOCUMENT_TYPE | The document type of the added SSM doc (if not using the default doc) | Defaults to Command |
DOCUMENT_PATH | The document path, if the doc is added using configmaps | Defaults to the litmus SSM docs path |
INSTALL_DEPENDENCIES | Whether to install the dependencies used to run stress-ng with the default doc: True or False | Defaults to True |
NUMBER_OF_WORKERS | The number of workers running stress-chaos with the default SSM doc | Defaults to 1 |
MEMORY_PERCENTAGE | The memory consumption on the instance, as a percentage, for the default SSM doc | Defaults to 80 |
CPU_CORE | The number of CPU cores to run stress-chaos on with the default SSM doc | Defaults to 0, which consumes all available CPU cores on the instance |
SEQUENCE | The sequence of chaos execution for multiple instances | Default value: parallel. Supported: serial, parallel |
RAMP_TIME | Period (in sec) to wait before and after chaos injection | |
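The ChaosEngine excerpt below sketches how the default stress-chaos doc could be tuned through the optional fields above. It requests two workers, 50% memory consumption, and two CPU cores. It also skips dependency installation, on the assumption that stress-ng is already present on the instance. The values are illustrative rather than recommendations, and the instance ID and region are placeholders.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: aws-ssm-chaos-by-id-sa
  experiments:
  - name: aws-ssm-chaos-by-id
    spec:
      components:
        env:
        - name: EC2_INSTANCE_ID
          value: 'instance-01'
        - name: REGION
          value: '<region of the EC2_INSTANCE_ID>'
        # number of stress workers started by the default doc
        - name: NUMBER_OF_WORKERS
          value: '2'
        # memory consumption (%) on the instance
        - name: MEMORY_PERCENTAGE
          value: '50'
        # CPU cores to stress (0 would mean all available cores)
        - name: CPU_CORE
          value: '2'
        # assumes stress-ng is already installed on the instance
        - name: INSTALL_DEPENDENCIES
          value: 'False'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```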
Experiment Examples
Common and AWS-SSM specific tunables
Refer to the common attributes and AWS-SSM specific tunables to tune the tunables common to all experiments and those specific to AWS SSM.
Stress Instances By ID
It contains a comma separated list of instance IDs subjected to SSM chaos. It can be tuned via the EC2_INSTANCE_ID ENV.
Use the following example to tune this:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: aws-ssm-chaos-by-id-sa
  experiments:
  - name: aws-ssm-chaos-by-id
    spec:
      components:
        env:
        # comma separated list of ec2 instance id(s)
        # all instances should belong to the same region (REGION)
        - name: EC2_INSTANCE_ID
          value: 'instance-01,instance-02'
        # region of the ec2 instance
        - name: REGION
          value: '<region of the EC2_INSTANCE_ID>'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
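When several instance IDs are supplied, the SEQUENCE, CHAOS_INTERVAL, and RAMP_TIME tunables from the table above shape the run. A minimal sketch of a serial run over two placeholder instances might look like the following. The values are illustrative, and the exact interplay between CHAOS_INTERVAL and TOTAL_CHAOS_DURATION in serial mode depends on the experiment implementation in your Litmus version.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: aws-ssm-chaos-by-id-sa
  experiments:
  - name: aws-ssm-chaos-by-id
    spec:
      components:
        env:
        - name: EC2_INSTANCE_ID
          value: 'instance-01,instance-02'
        - name: REGION
          value: '<region of the EC2_INSTANCE_ID>'
        # target one instance at a time instead of all at once (default: parallel)
        - name: SEQUENCE
          value: 'serial'
        # interval (sec) between successive chaos injections
        - name: CHAOS_INTERVAL
          value: '60'
        # wait (sec) before and after chaos injection
        - name: RAMP_TIME
          value: '10'
        - name: TOTAL_CHAOS_DURATION
          value: '120'
```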