My Life as a Hacker: The Reality of Modern Data Loss, and Why Your CI/CD Isn't Enough

This autumn, the team at IssTech is publishing a series of blog posts and videos. For this series, I, a Kubernetes backup expert, will play the role of a hacker. In theory, I will "hack" one of our customers' Kubernetes environments to see whether they can recover from each attack without a backup.

We're all ready to go, but are you? 

In the world of modern DevOps, we often hear that our infrastructure is resilient, our code is safe in Git, and our CI/CD pipelines are the ultimate safety net. But what if I told you that this very mindset could be a major vulnerability?

To understand this, I "hacked" a typical Kubernetes environment. Not to cause damage, but to learn how it's built. I dove into modern setups, consulted with experts, and researched what a CI/CD pipeline really protects. Here's what I found:

  • CI/CD pipelines can rebuild, but they can't recover data. Think of your pipeline as a powerful automation tool. It can stand up a new Kubernetes cluster and redeploy your applications from your Git repository in minutes. But if a ransomware attack deletes your critical customer data, your CI/CD pipeline can't bring it back. It will simply redeploy your app, but with an empty database (the short sketch after this list shows the kind of state a pipeline cannot bring back).

  • Git is a "Source of Truth," not a "Source of Data." Git is perfect for versioning your code and configuration files. It's your blueprint for rebuilding the house. However, it doesn't contain the actual persistent data inside your applications, like user profiles, images, or financial records. And what happens if the attacker gets access to your Git repository and wipes it? That's a disaster most companies aren't prepared for.

  • Docker repositories speed up recovery, but they don't back up data. Storing pre-built Docker images is fantastic for a fast recovery. It's like having pre-fabricated walls for your new house. You can rebuild your app quickly, but the images don't contain any of the data that the app has created.
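
A minimal sketch of that gap, assuming the kubernetes Python client and a working kubeconfig (the namespace name is made up): it lists the persistent volumes an application uses, which is exactly the state a pipeline can re-attach to but never re-create.

```python
# List the persistent state in an application namespace. The manifests that
# describe these volumes live in Git; the bytes inside them do not.
from kubernetes import client, config

config.load_kube_config()              # or load_incluster_config() inside a pod
v1 = client.CoreV1Api()

namespace = "webshop"                  # hypothetical application namespace
for pvc in v1.list_namespaced_persistent_volume_claim(namespace).items:
    size = (pvc.spec.resources.requests or {}).get("storage", "unknown")
    print(f"{pvc.metadata.name}: {size} that no pipeline or image can bring back")
```

The same gap exists for data living outside the cluster: the connection strings are in Git, but the rows and objects behind them are not.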

This leads to a critical realization: relying on these tools alone leaves you with a massive data protection gap. The "cloud is safe" myth is just that: a myth. The responsibility for protecting your data falls on you.

Understanding the Risks: A Tale of Three Attacks

To show this in action, we created a test application: a simple web app with an external database for metadata and object storage for files. We built it the modern way, with a CI/CD pipeline, Terraform, and a Git repository. Let's see how it holds up against three different attacks.
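
To make the setup concrete, here is a hedged sketch of how the test app is wired; the hostnames, bucket name, and environment variable names are illustrative, not taken from the real environment. All persistent state sits outside the cluster and is reached through configuration only.

```python
import os

# Everything below is configuration, and configuration lives in Git.
DATABASE_URL  = os.environ.get("DATABASE_URL", "postgres://db.example.internal:5432/app")
OBJECT_BUCKET = os.environ.get("OBJECT_BUCKET", "s3://app-file-uploads")

# The rows behind DATABASE_URL and the objects in OBJECT_BUCKET are the only
# things that are *not* in Git, Terraform, or the Docker registry.
print(f"metadata -> {DATABASE_URL}")
print(f"files    -> {OBJECT_BUCKET}")
```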

Test 1: The Kubernetes Cluster is Killed

A ransomware attack wipes out our Kubernetes cluster. This is where modern DevOps shines. Our Infrastructure as Code and CI/CD pipeline immediately go to work. A new cluster is spun up, and our containers are redeployed. Because our data was stored externally, our new application instance can connect to the database and object storage, and we're back online in minutes. It feels like we're untouchable.
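
To show why this recovery is so fast, here is a hedged sketch of the whole rebuild, assuming the Terraform code and manifests from our Git repository (the paths are illustrative, and this is a simplification of a real pipeline, not a copy of ours):

```python
import subprocess

# 1. Recreate the cluster and its cloud resources from the Terraform code in Git.
subprocess.run(["terraform", "apply", "-auto-approve"], cwd="infra/", check=True)

# 2. Redeploy the application manifests; they point straight back at the
#    untouched external database and object storage.
subprocess.run(["kubectl", "apply", "-f", "manifests/"], check=True)
```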

Test 2: The Data is Targeted

This time, the attacker is smarter. They bypass our cluster and go directly for our valuable data, deleting the object storage bucket and our external database.

  • Object Storage: We might have some luck if our object storage has a snapshot or quarantine feature. With some manual work, we might be able to roll back the deletion.

  • Database: A cloud database might only be covered by a large, shared backup. Restoring it can take hours, and restoring it to its original location might corrupt other applications using the same database. So we have to restore it to a new location, export our data, and then import it, which takes a lot of time.

In our test, getting our app back up and running took us approximately 10 hours. We were live again, but the business lost revenue for half a day.
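
Most of those hours went into steps like the ones below. This is a hedged sketch assuming a PostgreSQL database; the hostnames and database names are made up. The point is how many manual, sequential, privileged steps are involved.

```python
import subprocess

RESTORED_HOST = "db-restore.example.internal"   # backup restored to a new location
NEW_APP_HOST  = "db-new.example.internal"       # fresh database for the rebuilt app

# 1. Export only our application's data from the restored copy.
subprocess.run(
    ["pg_dump", "--host", RESTORED_HOST, "--dbname", "app", "--file", "app.sql"],
    check=True,
)

# 2. Import it into the database the application will actually use.
subprocess.run(
    ["psql", "--host", NEW_APP_HOST, "--dbname", "app", "--file", "app.sql"],
    check=True,
)

# 3. Repoint the application at NEW_APP_HOST and have a DBA re-grant privileges
#    (not shown) before traffic is allowed back.
```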

Test 3: The Total Annihilation

Now, let's get nasty. Inspired by real-world attacks where secrets were stolen, our attackers gain access to our Git repository, our CI/CD pipeline, and our data.

This is a worst-case scenario. Everything is gone.

  • Git Repository: Most teams using cloud DevOps platforms don't keep a separate backup of their Git repositories. We were lucky: we had one, and it took us 3 hours to restore the Git repository to a new tenant.

  • Rebuilding the Environment: With our Git code back, we still have to rebuild everything from scratch. This means manually provisioning a new Kubernetes cluster, configuring all security tools, and redeploying our applications. This took us another 8 hours.

  • Restoring Data: While the environment is being rebuilt, we also need to restore our database and object storage. Restoring the entire database, with all the applications that share it, to a new location took around 10 hours and required a database administrator to manually change privileges and grant access.

In total, this "small" test on a simple app took us over 3 days to get back online.

The New Standard: From Reactive to Proactive with Backup-as-Code

Our conclusion is clear: traditional thinking is a liability. Relying on separate backups for each component (Git, database, object storage) is complex, slow, and unsustainable, and a business interruption of days, or even weeks, is unacceptable in today's world. That's why we at IssTech created IssProtect for DevOps, built on Veeam's Kasten technology. We've moved beyond the "tick-the-box" thinking of backing up individual components and embraced Backup-as-Code. Our solution understands your application's code: it follows the paths of your ConfigMaps, Ingress, and secrets to automatically back up all the data your app uses, whether it's inside or outside Kubernetes.
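
To illustrate the idea, here is a simplified sketch of the mindset; it is not IssProtect's actual implementation, and the namespace name and key-matching heuristic are assumptions. "Following the paths" means discovering an application's state from its own objects instead of ticking off components one by one.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
namespace = "webshop"                              # hypothetical application namespace

discovered = []

# Persistent volumes inside the cluster.
for pvc in v1.list_namespaced_persistent_volume_claim(namespace).items:
    discovered.append(("volume", pvc.metadata.name))

# ConfigMaps (and Secrets, handled the same way) often point at external
# databases and object storage buckets.
for cm in v1.list_namespaced_config_map(namespace).items:
    for key in (cm.data or {}):
        if any(hint in key.lower() for hint in ("url", "bucket", "host")):
            discovered.append(("external reference", f"{cm.metadata.name}/{key}"))

for kind, name in discovered:
    print(f"include in the application backup: {kind} {name}")
```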

Each namespace or application is backed up independently. In a disaster, you can restore a single namespace or multiple applications to an entirely new Kubernetes cluster, even in a different cloud provider. You don't need access to your Git or Docker repository. All you need is access to your secure, object-locked backup storage. This isn't just a backup; it's a disaster recovery plan that reduces your restoration time from days to a few minutes. While others are still trying to piece together a fragmented environment, you're already back to business.

One more note on the Kubernetes rebuild in Test 3: those 8 hours included restoring and configuring all of the critical security and operational namespaces. That covers components like Cilium for network policy enforcement, Istio for service mesh capabilities, CSI drivers for persistent storage, and the re-establishment of network policies. This step is crucial not just for functionality, but for the security posture and operational integrity of the newly rebuilt cluster. Because IssProtect for DevOps is built on a true Backup-as-Code mindset, these namespaces can be restored as part of the disaster recovery plan and are back in minutes instead of hours.

Timeline Verification

The recovery timelines carry much of the argument, so it is worth breaking down where the hours actually go.

  • Alternative Restoration (Test 2): The 10-hour estimate for an alternative restoration is realistic. The process involves multiple manual steps:

    • Restoring the database backup to a new, isolated location.

    • Exporting the specific application data.

    • Importing that data into the new application database.

    • Manually reconfiguring the application to point to the new database. Each of these steps is time-consuming and often requires senior-level staff (like a DBA), which adds to the delay.

  • Total Annihilation (Test 3): The estimated total of 3 days is a conservative figure. Breaking down the numbers:

    • Git Repository Restoration: 3 hours (realistic for a full, alternative-location restore).

    • Kubernetes Rebuild: 8 hours (realistic to provision a new cluster, configure security, and test it for production readiness).

      • Making the cluster fully production-ready would likely take even longer than 8 hours.

    • Database Restoration: 10 hours (same reasoning as Test 2).

    • Total Time: The total is not a simple sum of these hours, since several tasks run in parallel. Even so, the time from the start of the disaster to a fully functional application still spans multiple days once you account for manual intervention, coordination between teams, and troubleshooting (see the sketch below).
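
Here is the arithmetic behind that, as a small sketch. The task hours are the estimates above; the stretch to calendar days is an assumption about hand-offs and working hours, not a precise model.

```python
task_hours = {
    "git repository restore": 3,
    "kubernetes rebuild": 8,
    "database restoration": 10,
}

hands_on = sum(task_hours.values())    # 21 hours of actual work
best_case = max(task_hours.values())   # 10 hours if everything ran perfectly in parallel

print(f"hands-on work: {hands_on} h, best-case critical path: {best_case} h")
# Add coordination between teams, troubleshooting, and normal working hours,
# and the elapsed time stretches to roughly three calendar days.
```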

These figures illustrate the difference between a fragmented, traditional approach and a modern, integrated one: even a simple restore can take hours, and a full-scale disaster can take days. You don't need to be deeply technical to see what that means for the business.

This is the last part of the blog series "My Life as a Hacker." We hope the series has helped you understand a bit more about the importance of protecting your data from cyberattacks and what to do once it happens to you. If you have any questions or are curious to know how we can help you, please feel free to contact us by clicking the button below.

Have you missed the other parts of the series? You will find part 1 here and part 2 here.  

What is IssTech?

IssTech is a dynamic company specialising in data protection for modern IT environments, from Kubernetes and automation to cloud and SaaS. Our goal is simple: to keep you secure when it matters most. With over 20 years of industry experience behind us, we combine deep expertise with a passion for innovation. As a fast-growing company, we work with leading businesses to deliver secure, future-proof solutions in DevOps and cloud. 

Join our newsletter to keep up with our latest blog posts.
