
Security Chaos Engineering


Last updated 1 year ago

Principle of Chaos Engineering

Practising Chaos

Steps to Perform

As mentioned in the overview, the theory will now be put into practice.

  1. Identify areas within the system to test

    • Rather than testing the whole system at once, test it in parts: smaller scopes are easier to manage and give a better understanding of the clusters of services that together make up the entire system

    • Only run experiments on the whole system once there is enough confidence in an in-depth understanding of it

    • Focusing on individual areas can surface niche edge cases that cause faults

  2. Ask questions about potential technical gaps to form the baseline of the chaos engineering experiments

    • Define the objectives of the experiment to minimise the impact radius on services outside its scope

    • Identify services within the testing area that have a large enough sample size to yield the essential data

    • Threat modeling methods and other frameworks can help identify attack vectors

  3. Detect vulnerabilities within the area

    • Tactics, techniques and procedures listed in MITRE ATT&CK can be replicated

    • Random fault injection into servers can test for unexpected faults and exercise recovery automation (if any)

  4. Observe, measure and log the statistical data to guide further improvements

    • Data from monitoring the chaos tests establishes a baseline of how resilient the system currently is

    • Faulty areas can be improved to prevent the same scenarios from playing out in production

    • Other areas can benefit, since similar issues can reuse the same solution
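
The four steps above can be sketched as a small, self-contained simulation. This is only an illustration under stated assumptions: the `Service` class, the "at least half the instances are healthy" steady-state rule, the `payments-*` names and the always-successful `auto_recover` function are all hypothetical stand-ins, not part of any real chaos tooling.

```python
import random
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical in-memory stand-in for a real service instance."""
    name: str
    healthy: bool = True


@dataclass
class ExperimentResult:
    target: str
    steady_state_before: bool
    steady_state_after: bool
    recovered: bool


def steady_state(services):
    """Assumed hypothesis: the cluster is fine while at least half its instances are healthy."""
    healthy = sum(s.healthy for s in services)
    return healthy * 2 >= len(services)


def inject_fault(services, rng):
    """Step 3: take one randomly chosen healthy instance down."""
    victim = rng.choice([s for s in services if s.healthy])
    victim.healthy = False
    return victim


def auto_recover(service):
    """Placeholder for recovery automation; in this sketch it always succeeds."""
    service.healthy = True
    return service.healthy


def run_experiment(services, seed=0):
    rng = random.Random(seed)             # fixed seed keeps the run repeatable
    before = steady_state(services)       # step 2: baseline for the hypothesis
    victim = inject_fault(services, rng)  # step 3: random fault injection
    after = steady_state(services)        # step 4: observe while the fault is active
    recovered = auto_recover(victim)      # exercise recovery automation
    return ExperimentResult(victim.name, before, after, recovered)


# Step 1: scope the experiment to one cluster rather than the whole system.
cluster = [Service(f"payments-{i}") for i in range(4)]
result = run_experiment(cluster)
print(result)
```

In a real experiment the steady-state check would come from monitoring data, and the logged results would form the resilience baseline described in step 4.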

Practical Usage

Netflix spearheaded Chaos Engineering as a way to fully test its complex system. However, most organisations cannot match Netflix's budget or its practice of testing in the production environment, where failures put business revenue at risk. Experiments should therefore be performed in a separate pre-production environment.

Even testing in non-production environments can expose faults and poor resilience in the current infrastructure, which can then be fixed to achieve greater resilience when deploying to production. Once these gaps are filled, it is also important to validate the system again and monitor it in real time to watch for unexpected changes.
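
Re-validation after a fix can be as simple as comparing the metrics of a new experiment run against the earlier baseline. The sketch below assumes every metric is "lower is better"; the metric names, the values and the 5% tolerance are made up for illustration.

```python
def find_regressions(baseline, rerun, tolerance=0.05):
    """Return metrics that got worse by more than `tolerance` (relative).

    Both arguments map a metric name to a value where lower is better.
    """
    regressions = {}
    for metric, before in baseline.items():
        after = rerun.get(metric)
        if after is None:
            continue  # metric was not collected in the rerun
        if after > before * (1 + tolerance):
            regressions[metric] = (before, after)
    return regressions


# Hypothetical numbers: the error rate improved, recovery time got worse.
baseline = {"error_rate": 0.08, "recovery_seconds": 120.0}
rerun = {"error_rate": 0.02, "recovery_seconds": 150.0}
regressions = find_regressions(baseline, rerun)
print(regressions)
```

Any metric that appears in the result is a candidate for another round of investigation before the change reaches production.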

Best Practices

Integration into DevOps pipelines

Allows development teams to automate resilience and fault tolerance testing at various stages. Weak areas can be identified and rectified earlier, saving time in later development stages. With continuous integration, the impact of each code change on system stability can be observed.
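
One way a pipeline stage could act on chaos experiment results is a simple gate that fails the build when too few experiments pass. This is a minimal sketch: the result format, the experiment names and the 90% threshold are assumptions, not the interface of any particular CI system or chaos tool.

```python
def resilience_gate(results, min_pass_rate=0.9):
    """Return (gate_passed, pass_rate) for a list of experiment results.

    Each result is a dict with a boolean "passed" key (assumed format).
    An empty result list fails the gate, since nothing was verified.
    """
    if not results:
        return False, 0.0
    pass_rate = sum(1 for r in results if r["passed"]) / len(results)
    return pass_rate >= min_pass_rate, pass_rate


# Hypothetical results collected from an earlier pipeline stage.
results = [
    {"name": "kill-one-replica", "passed": True},
    {"name": "drop-db-connection", "passed": False},
    {"name": "expire-auth-token", "passed": True},
]

gate_passed, rate = resilience_gate(results)
exit_code = 0 if gate_passed else 1  # a non-zero exit code fails the stage
print(f"pass rate {rate:.0%}, exit code {exit_code}")
```

In an actual pipeline step, the script would end with `sys.exit(exit_code)` so that the CI system marks the stage as failed.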

Integration with Recovery Plans

Working with the incident response team to exercise recovery plans, and to automate the recovery process after a system failure, shows how effective those plans are. Weaknesses in the recovery plans can then be addressed if recovery times are not fast enough.
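
Recovery timing can be checked against a target with a few lines of code. The sketch below assumes the team has agreed on a recovery time objective (RTO); the drill name, the timestamps and the five-minute objective are invented for illustration.

```python
from datetime import datetime, timedelta


def evaluate_drill(name, failure_at, restored_at, rto):
    """Compare the measured recovery time of one drill against the RTO."""
    elapsed = restored_at - failure_at
    return {
        "drill": name,
        "elapsed_seconds": elapsed.total_seconds(),
        "within_rto": elapsed <= rto,
    }


# Hypothetical drill: the service was restored 7.5 minutes after the fault.
report = evaluate_drill(
    name="primary-db-failover",
    failure_at=datetime(2024, 1, 1, 12, 0, 0),
    restored_at=datetime(2024, 1, 1, 12, 7, 30),
    rto=timedelta(minutes=5),
)
print(report)
```

A drill that falls outside the objective points at a part of the recovery plan worth revisiting with the incident response team.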

Pre-launch Checklist

Minimises the risk of faulty experiments, which waste development time. A checklist ensures that the required applications and their configurations are in place and suited to chaos engineering testing, so that experimental results are accurate and not skewed by careless development mistakes.
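
A checklist like this can also be encoded so that it runs automatically before each experiment. The items and configuration keys below are hypothetical examples; a real checklist would reflect the team's own environment and tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckItem:
    description: str
    check: Callable[[dict], bool]


# Hypothetical checklist items and configuration keys.
CHECKLIST = [
    CheckItem("Target environment is not production",
              lambda cfg: cfg.get("environment") != "production"),
    CheckItem("Monitoring is enabled",
              lambda cfg: cfg.get("monitoring_enabled") is True),
    CheckItem("Rollback procedure is defined",
              lambda cfg: bool(cfg.get("rollback_plan"))),
    CheckItem("Experiment scope lists at least one service",
              lambda cfg: len(cfg.get("scope", [])) > 0),
]


def failed_checks(cfg):
    """Return the descriptions of every checklist item the config fails."""
    return [item.description for item in CHECKLIST if not item.check(cfg)]


experiment_config = {
    "environment": "staging",
    "monitoring_enabled": True,
    "rollback_plan": "",  # left empty by mistake
    "scope": ["payments"],
}
failures = failed_checks(experiment_config)
print(failures or "All pre-launch checks passed")
```

The experiment would only start when `failed_checks` returns an empty list.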

Interview Questions

  • What are the constraints faced when performing chaos testing in a production environment?

  • What best practices ensure effective chaos testing?

  • How do you perform chaos engineering safely without crashing the entire system?

Author

Zheng Jie

References

  • CSA - Security Chaos Engineering

  • Maddevs - Chaos Engineering

  • Datadog - Security Chaos Engineering for the cloud