Facebook’s Project Storm – Crashing Data Centers in Real-World Stress Tests

Posted on September 5, 2016

“It’s easier to take a data center down than to put it back together,” says Facebook vice president of engineering Jay Parikh. But the company’s software engineers are getting better at the putting-it-back-together part, thanks to a series of regular stress tests conducted on Facebook’s operational network by the company’s disaster special weapons and tactics, or SWAT, team. Parikh described the effort, dubbed “Project Storm,” to the audience of invited engineers at the third annual @Scale conference held in San Jose this week. @Scale brings together engineers who build or maintain systems designed for vast numbers of users, including companies like Google, Airbnb, Dropbox, Spotify, Netflix, and others.

Facebook’s Project Storm originated in the wake of 2012’s Hurricane Sandy, Parikh reported. The superstorm threatened two of Facebook’s data centers, each carrying tens of terabits of traffic. Both got through Sandy unscathed, Parikh said, but watching the storm’s progress led the engineering team to consider what exactly would be the impact on Facebook’s global services if the company did indeed suddenly lose a data center or an entire region. The company assembled a SWAT team comprising the leaders of the various Facebook technology groups, who, in turn, marshaled the entire engineering workforce to figure out the answers.
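The question the team was asking, what happens to everyone else when one site disappears, can be illustrated with a small capacity check. The sketch below is not Facebook's tooling; the site names, the traffic figures (in terabits per second), and the rule of spreading displaced traffic in proportion to spare capacity are all assumptions made for illustration.

```python
def redistribute(load, capacity, failed):
    """Return per-site load after `failed` drops out.

    Assumption for this sketch: the failed site's traffic is spread
    across the survivors in proportion to their spare capacity.
    """
    survivors = [site for site in load if site != failed]
    if not survivors:
        raise ValueError("no surviving sites to absorb traffic")

    spare = {s: max(capacity[s] - load[s], 0.0) for s in survivors}
    total_spare = sum(spare.values())
    displaced = load[failed]

    new_load = {}
    for s in survivors:
        if total_spare > 0:
            share = displaced * spare[s] / total_spare
        else:
            # No headroom anywhere: fall back to an even split.
            share = displaced / len(survivors)
        new_load[s] = load[s] + share
    return new_load


def overloaded(new_load, capacity):
    """Sites whose projected load exceeds their capacity."""
    return sorted(s for s, l in new_load.items() if l > capacity[s])


# Hypothetical numbers, in Tbps.
load = {"east-1": 30.0, "east-2": 30.0, "west-1": 20.0}
capacity = {"east-1": 40.0, "east-2": 40.0, "west-1": 50.0}

after = redistribute(load, capacity, failed="east-1")
print(after)                        # {'east-2': 37.5, 'west-1': 42.5}
print(overloaded(after, capacity))  # []
```

A paper exercise like this only shows whether the remaining sites have enough headroom in principle; it says nothing about how the shift behaves in practice, which is why the team went on to run live tests.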

The group began running tests and fine-tuning the mechanisms for shifting traffic should a data center drop from the network, Parikh reported. They created tools and checklists of tasks, both manual and automated, and set time standards for completing each task. The team wanted, Parikh said, “to run like a pit stop at a race; to get everything fixed on the car in the shortest period of time, realizing, however, that this is like taking apart an aircraft carrier and putting it back together in a few hours, not just taking apart a toy that I got for Christmas.”
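A checklist with a time standard per task can be represented very simply. The following is a minimal sketch, assuming a drill records how long each task actually took; the task names, durations, and the `Task` structure are hypothetical and are not taken from Parikh's talk.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    automated: bool   # True for scripted steps, False for manual ones
    budget_s: float   # time standard for the task, in seconds
    actual_s: float   # time measured during the drill, in seconds


def over_budget(tasks):
    """Return (task name, seconds over) for every task that missed its standard."""
    return [(t.name, t.actual_s - t.budget_s)
            for t in tasks if t.actual_s > t.budget_s]


# Hypothetical drill results.
drill = [
    Task("drain traffic from site", automated=True, budget_s=300, actual_s=280),
    Task("verify dashboards", automated=False, budget_s=120, actual_s=180),
    Task("restore traffic to site", automated=True, budget_s=600, actual_s=600),
]

for name, excess in over_budget(drill):
    print(f"{name}: {excess:.0f}s over the time standard")
# verify dashboards: 60s over the time standard
```

Comparing measured times against the standard after each drill is one way to see which steps, manual or automated, are holding up the pit stop.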

In 2014, Parikh decided Project Storm was ready for a real-world test: The team would take down an actual data center during a normal working day and see if they could orchestrate the traffic shift smoothly.

Other Facebook leaders didn’t think he’d actually do it, Parikh recalls. “I was having coffee with a colleague just before the first drill. He said, ‘You’re not going to go through with it; you’ve done all the prep work, so you’re done, right?’ I told him, ‘There’s only one way to find out [if it works].’”

That first takedown, which involved virtually the entire engineering team and a lot of people from the rest of the company, turned out to be a bit of a mess, at least from the inside. But users didn’t appear to notice. Parikh presented a chart tracking the traffic loads on various software systems during the drill; it should have displayed smooth curves, and it did not. “If you’re an engineer and see a graph like that, you’ve got bad data, your control system is not working right, or you have no idea what you’re doing.”

The Project Storm team forged ahead, continuing to hit Facebook’s networks with stress tests—although, Parikh recalls, there never seemed to be a good time to do them. “Something always ended up happening in the world or the company. One was during the World Cup final, another during a major product launch.” And the switchovers got smoother.

The live takedowns continue today, with the Project Storm team members coming up with crazier and crazier ambitions for just what to take offline, Parikh says. “You need to push yourself to an uncomfortable place to get better.”

spectrum.
