How Do You Test Cyber-Security Response?

It was 11:15 a.m. on a Thursday. It started with an outage, and then chaos ensued. HipChat rooms were ablaze with stressed, emergency-response chatter from the ops folks. A team of eight people sat in a room, palms sweaty, staring in confusion at www.warbyparker.com. Why, oh goodness, why? Everybody’s worst nightmare had come true. Your site’s been taken over:

[Screenshot: the ransom page shown in place of www.warbyparker.com]

“Can you think up some fun tests for our Cyber Security Response team?”

Two weeks prior, a member of our Cyber Security Incident Response Team (CSIRT!) stopped by my desk. “We’re due for a test of our Cyber Security Incident Response Plan (CSIRP!). I’d love to make it a bit more of a surprise, maybe do some tabletop exercises? Can you think up some fun surprises for the team to work through?”

Over the next few days, my mind raced through various mean scenarios. I wondered if I could ask Brian Krebs to send an email to our PR inbox: “I’m about to publish a story about all your data being for sale on the darknet. It’s going live in one hour. Care to comment on the story?” That’d feel real! But it had no technical element, so it wouldn’t be a comprehensive test.

The cyber-security incident response plan includes elements like:

  • Customers: How do we communicate with them that something bad is happening? How do we communicate with our Customer Experience team? Do we need to prep our retail staff for customers asking questions on the floor of the store?
  • Legal: Do we need to call the FBI? Do we need to call our legal partners for guidance? What are our notification obligations? Do we need to bring in our insurance provider?
  • Technical: Can we stop or mitigate the current incident? Do we have forensics available? Should we escalate to our response partner? Is this resulting in an outage? Can we recover?
  • HR: How do we communicate this to employees?

I needed something to exercise the bulk of that response plan.

Ultimately, I landed on a scenario where “bad guys” (i.e., me) would take over www.warbyparker.com and replace it with a nefarious ransom site. Because we run an internal DNS, it would be possible to take over the site ONLY for employees, not for actual customers. That is, if you were on our corporate network and went to www.warbyparker.com, you’d see a site under ransom.

At 10:00 a.m. on Thursday, I ran around to the executive team, giving them a heads-up on what was to come.

At 11:20 a.m. we started our dastardly plan:

  1. We changed the www.warbyparker.com entry on our BIND server to point to a nefarious website, http://wpsharklaser.github.io/site/
  2. We deleted SSH keys for most users from the BIND server, blocking an easy technical resolution
  3. A ransom email went to our legal team

[Screenshot]

The stress was immediately palpable to those not in the know:

[Screenshot: worried HipChat messages from employees]

On the other hand, the three of us “bad actors” were as gleeful as could be in our own HipChat room:

[Screenshot: the “bad actors” HipChat room]

The test really worked. Across the company, it felt a lot like a real cyber-security incident. It was intense: lots of employees were worried, and some were scared!

The response team jumps into action!

11:26 a.m.: Our CSIRT sprang into action. They quickly realized that we had a security incident and declared it as such.

Minutes later, a HipChat message went to our Customer Experience team advising them of the situation.

11:28 a.m.: An email went to all of our customer-facing leaders (Social Media, CX, Retail, etc.):

[Screenshot: email to customer-facing leaders]

11:30 a.m.: We pulled our CTO out of an interview. We grabbed one of our CEOs out of a meeting to advise him and get approval on company-wide communications.

We called our PR agency to inform them about the (practice) situation.

We pretended to call our outside legal counsel (yeah, we didn’t see a reason to actually pay them for a practice run).

We pretended to call our contacts at the FBI (we didn’t want to go to jail, so we didn’t actually call them).

11:35 a.m.: Our Security Engineering team cleared all non-security engineers from the incident and devised a plan to quickly snap an image of the existing DNS servers (for forensic purposes), terminate the compromised DNS servers, and bring up new DNS servers with limited access:

[Screenshot: the Security Engineering team’s remediation plan]

11:46 a.m.: We emailed the entire Warby Parker team:

[Screenshot: email to the entire Warby Parker team]

By about noon, we felt we had exercised a fairly comprehensive, effective response, and we started winding down.

Learnings

While we got communications out to our customer-facing team very rapidly (within minutes), we didn’t have pre-written communications ready for all employees, so we lost time both crafting the communications and getting them approved. It took us about 30 minutes to get communications to all employees. (Update: We have since drafted a communications plan for different scenarios so that it’s ready to go immediately.)

Practice Makes Perfect

A lot of companies think cyber-security is largely an engineering issue. In some ways it is—detecting and preventing outages requires incredible engineering effort and know-how.

But cyber-security response is truly a communications issue. Your team has to learn to communicate efficiently in the midst of a crisis. The clock is running: your team can’t sit and wait an hour to get guidance on customer communication. Your staff on the floor can’t wait either; they are getting customer inquiries within minutes of the compromise.

You need to train and practice under pressure, in conditions as close to real as possible. I highly recommend that you design similar cyber-security tests and run your team through them.
