Sunday, October 07, 2007

Internet's unsung guardians labor in obscurity to keep Web moving

Ask Derek Schlecht what he does for a living and he'll tell you he's an IBX site engineer. He may then hit you with a string of technical jargon about cooling units and backup power systems.

What Schlecht's job really is, though, is keeping the Internet running.

Not the whole Internet, of course. By the very nature of the Web, there is no central control room. Schlecht's responsibility is to maintain a small piece of it. And it's thanks to thousands of people like him around the world that your home page shows up when you log on in the morning.

Schlecht works at the SV3 data center belonging to Equinix Inc., a nondescript low-slung building located next to a strip mall in San Jose. It's one of hundreds of sites across the globe where the Internet's biggest users hook up with each other.

Inside the cavernous facility, past stern-looking security guards and multiple sets of doors protected by biometric hand readers, lie the guts of the Internet. In giant rooms linked by a series of corridors, banks of server computers owned by telecommunications companies like AT&T and content providers such as digital game seller Electronic Arts are housed in dozens of locked cages. They connect to other servers at the site and ultimately to the rest of the Web via bundles of cable that exit the building, linking to the fiber-optic trunk lines that speed digital bits and bytes around the world.

Look up at the cables traveling overhead in long trays suspended from the ceiling and you could be seeing part of the path taken by an e-mail your mother sent you this morning.

Schlecht, one of four engineers at SV3, doesn't carry the weight of that precious e-mail on his shoulders as he makes his daily rounds. He's too intent on the task at hand, which could be changing lightbulbs one day and performing critical maintenance on a diesel generator the next.

"I don't think on that large a scale," said the 26-year-old Sunnyvale resident. "What's important is what's in front of me."

To appreciate the part Schlecht plays in keeping the Web going, you need to understand the nature of the Internet. It's really just a collection of computer networks that connect with each other - a network of networks.

Forget for the moment about Wi-Fi, which operates primarily on the edges of the wired world. The networks that compose the Internet must have hard-wire bridges between them to shuttle data back and forth.

"They need a physical location to interconnect and exchange traffic," said Equinix spokesman Jason Starr.

Around the world, hundreds of data centers like SV3 - called co-location facilities - perform that service. They provide space where the Web's biggest users can link up, creating the seamless network we take for granted.

The Bay Area has one of the world's densest concentrations of co-location centers, with 56 in the region, according to Tier 1 Research.

"People think of the Internet as incredibly massive," said Aristotle Balogh, chief technology officer at VeriSign Inc., the Mountain View company that runs software systems that keep the network operating. "It's really a couple of hundred data centers that interconnect and another 10,000 to 20,000 centers that house content and services."

That's both the genius and the weakness of the Web.

On the plus side, because there are hundreds of co-location sites, thousands of content sites and no center, there's little danger of a widespread Internet failure.

"You couldn't lose a single site and get rid of the Internet," said Ernest Holloway, Equinix's manager of facility operations.

But the fact that there are so many points of connection on the Internet means there are many points of vulnerability. Data centers can and do go down, often with significant consequences.

In July, a San Francisco co-location center belonging to 365 Main Inc. lost power for almost half of its customers for up to 45 minutes when some backup generators failed to function during a PG&E electrical failure. Several Web sites operating at the center, including community bulletin board Craigslist and virtual world site Second Life, went down. Craigslist says it was offline for 11 hours.

That's just the sort of minor catastrophe Schlecht is charged with preventing at SV3. His 7 a.m.-to-4 p.m. shift is devoted largely to the exacting routines of preventive maintenance.

His binder of monthly maintenance tasks is more than an inch thick.

During a recent tour of SV3, Schlecht showed off the emergency generators, the cooling equipment and the bundles of cable snaking below the ceiling.

"That's the Internet running over your head," he said.

With a trim build and a head of close-cropped hair, Schlecht has the earnest, polite manner of a soldier and speaks with the precision and literalness of an engineer.

"My job is to keep the building up and running," he said.

Uninterrupted power

Data centers like SV3 are simple in concept - essentially they're protected places where computer networks can plug into each other. But the execution is not simple. To keep everything running right, the center must provide uninterrupted power and keep computers and network equipment from overheating.

It must also keep out anybody who might want to do damage - whether they try to enter physically by the front door or digitally by hacking.

To the uninitiated, power failure might seem like the greatest threat. Yet, the 365 Main outage aside, it's not a tough technical problem to keep the lights on and the servers running. SV3 has five 2-megawatt diesel systems that kick in without lag time if outside power goes down.

"Responding to power outages is not a very stressful situation," Schlecht said.

The limiting factor in most data centers is keeping the equipment cool. All those computers, stuffed with microprocessors, churn out enormous amounts of heat - at SV3, the equivalent of 480 barbecues running continuously. Standing behind a bank of servers is like being in front of an open oven. If the cooling systems crashed, the room would heat up to the point of equipment failure within an hour.

To keep the center between 66 and 74 degrees - which to a computer feels like sweater weather - SV3 uses five 500-ton chiller units, enormous machines that circulate cooled water through the building. Schlecht's manuals spell out maintenance procedures on power, cooling and other systems in mind-boggling detail. The first step requires that two engineers sign a form verifying they are working on the right piece of equipment. Nothing is left to chance.

Schlecht - a high school graduate with vocational training - has been working at SV3 since the site opened in 2001, when he was still a teenager. For most of that period, networking company AboveNet owned the facility. Several years ago, Equinix bought the site, named it SV3 and redesigned it to its own specifications. Schlecht approached Equinix and was hired. Now, barely in his mid-20s, he's by far the longest-tenured employee in the place.

"When someone says a panel number (for a piece of equipment), I know where it is," Schlecht said. "It's kind of freakish."

Schlecht's job has to do with the Internet's hardware and physical infrastructure. He has counterparts who maintain critical software and make sure digital traffic is flowing smoothly through the Web.

One of them is Manoj Koshti, a senior network engineer at VeriSign.

VeriSign performs several functions vital to the Internet, including routing digital traffic and maintaining the collection of Web addresses known as the domain name system. You might say the company puts the dot in .com - or in .edu, for that matter. The company is one of several operators whose servers store data on Web addresses, which computers around the world can use instantly to make sure they're traveling to the right location on the Internet.

Koshti, 35, who is from Bhopal, India, works in a data center at VeriSign's Mountain View headquarters. Like Schlecht's, his workday is often routine - monitoring traffic to VeriSign's servers, making changes to software systems, updating and installing programs. Unlike his counterpart at Equinix, who roams a big data center, Koshti spends most of his day in his cubicle, watching computer screens.

Keeping it secure

Things can get exciting for Koshti. Because of the key role VeriSign plays, its servers are sometimes attacked by hackers with visions of bringing down the Internet by disabling the system of Web addresses. The fingerprints of such intruders can often be detected by unusual surges of traffic.

If they see something, Koshti and the 11 other members of his networking and security team "can pull the fire alarm," said Balogh, VeriSign's technology chief.

Koshti himself is matter-of-fact about this part of his job.

"We want to find out if somebody is doing something illicit," he said. "We have devices all over the world which find out what is the source and what that guy is trying to do. We see the bandwidth graph is increasing, so we get the alert. We find out who is trying to get to us, and then we block the source."

All in a day's work.

What Schlecht and Koshti have in common is a sensibility that turns the extraordinary into the ordinary. To them, the work is never grandiose. It's always about the task at hand, which they approach with calm, methodical competence.

Asked how often he has found himself in a situation at SV3 that got his adrenaline pumping, Schlecht thought for a moment.

"That's a big part of it," he replied. "Not getting your adrenaline pumping."

E-mail Sam Zuckerman at szuckerman@sfchronicle.com.

http://sfgate.com/cgi-bin/article.cgi?f=/c/a/2007/10/07/BUR6S4N9R.DTL

This article appeared on page F - 1 of the San Francisco Chronicle
