Faced with the challenge of maintaining stability across hundreds of tightly coupled microservices, the Central Reliability Engineering team at Flipkart developed a solution that moves beyond reactive incident response. By integrating LitmusChaos with their Kubernetes infrastructure, the team now executes 90% of their chaos experiments in staging environments before major traffic events, such as India’s flagship festive sales.
To bridge the gap between their microservices and legacy virtual machine workloads, the engineers built four custom extensions, including a DaemonSet-based high-availability model for parallel fault injection. This shift has enabled the company to identify bottlenecks and validate observability frameworks without the risk of production outages. Beyond hardening their own systems, Flipkart contributed five core fixes and enhancements back to the upstream LitmusChaos project, addressing long-standing issues like database index uniqueness and workflow configuration errors.
Aditya Sridasyam, a software development engineer at Flipkart, presented the implementation during a keynote at KubeCon + CloudNativeCon India 2026. The company now plans to make automated chaos testing a mandatory phase of its software development lifecycle and intends to open-source its custom DaemonSet injection model for the broader cloud-native community.

