Today, countless websites are facing the epic amounts of online data that first hit Facebook a half decade ago. But according to Facebook engineering bigwig Jay Parikh, these sites have it so much easier.
That’s because many of the web’s largest operations — including Facebook — spent the last few years building massive software platforms capable of juggling online information across tens of thousands of servers. And they’ve shared much of this “Big Data” software with anyone who wants it.
Alongside Yahoo, Facebook helped drive the rise of Hadoop, a sweeping software platform for processing and analyzing the enormous amounts of data streaming across the modern web. Yahoo started the open source project as a way of constructing the index that underpinned its web search engine, but others — Facebook among them — soon plugged it into their own online operations, enhancing the code as they went.
The result is a platform that can juggle as much as 100 petabytes of data — more than 100 million gigabytes. “Five years ago, when we started on these technologies, there were limitations on what we could do and how fast we could grow. What’s happened with the open source community is that a lot of those limitations, those hindrances, have been removed,” says Parikh, who oversees the vast hardware and software infrastructure that drives Facebook. “People are now able to go through the tunnel a lot faster than we did.”