Data Engineering with Avishkar: Hadoop Architecture and its Usage at Facebook

February 20, 2013

Hadoop Architecture and its Usage at Facebook

Lots of data is generated on Facebook
– 300+ million active users
– 30 million users update their statuses at least once each day
– More than 1 billion photos uploaded each month
– More than 10 million videos uploaded each month
– More than 1 billion pieces of content (web links, news stories, blog posts, notes, photos, etc.) shared each week

Data Usage
Statistics per day:
– 4 TB of compressed new data added per day
– 135 TB of compressed data scanned per day
– 7500+ Hive jobs on production cluster per day
– 80K compute hours per day
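To put these daily figures in proportion, here is a rough back-of-the-envelope sketch. It uses only the numbers listed above and makes two assumptions that the post does not state: that all 80K compute hours are attributable to Hive jobs, and that TB means decimal terabytes. Because "7500+" is a lower bound on the job count, the per-job averages are upper bounds.

```python
# Daily figures quoted in the list above.
new_data_tb = 4          # compressed TB added per day
scanned_tb = 135         # compressed TB scanned per day
hive_jobs = 7500         # "7500+" jobs per day, treated as a lower bound
compute_hours = 80_000   # compute hours per day

# How many times more data is read than written each day.
scan_to_ingest_ratio = scanned_tb / new_data_tb

# Per-job averages. Assumption: every compute hour belongs to a Hive job.
avg_hours_per_job = compute_hours / hive_jobs
avg_scanned_gb_per_job = scanned_tb * 1000 / hive_jobs  # decimal TB -> GB

print(f"scan/ingest ratio: {scan_to_ingest_ratio:.2f}x")
print(f"avg compute hours per job: {avg_hours_per_job:.1f}")
print(f"avg data scanned per job: {avg_scanned_gb_per_job:.0f} GB")
```

Under these assumptions the warehouse reads roughly 34 times more data than it ingests each day, which fits a workload dominated by analytical scans rather than writes.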
Barrier to entry is significantly reduced:
– New engineers go through a Hive training session
– ~200 people/month run jobs on Hadoop/Hive
– Analysts (non-engineers) use Hadoop through Hive

Where is this data stored?
Hadoop/Hive Warehouse
– 4800 cores, 5.5 PB
– 12 TB per node
– Two-level network topology:
  – 1 Gbit/sec from node to rack switch
  – 4 Gbit/sec to top level rack switch
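The cluster figures above can be combined into a few rough estimates. This is only a sketch under stated assumptions: "compute hours" is taken to mean core-hours, petabytes and terabytes are decimal, and the 5.5 PB and 12 TB figures are assumed to be measured the same way (the post does not say whether they are raw or usable capacity). The post does not give the number of nodes per rack, so `nodes_per_rack=40` in the usage line is a purely hypothetical value.

```python
# Figures quoted in the post.
cores = 4800
capacity_pb = 5.5
tb_per_node = 12
compute_hours_per_day = 80_000  # from the daily statistics

# Implied node count. Assumption: decimal units, same capacity basis.
implied_nodes = capacity_pb * 1000 / tb_per_node

# Average utilization. Assumption: compute hours are core-hours.
utilization = compute_hours_per_day / (cores * 24)


def oversubscription(nodes_per_rack, node_gbit=1, uplink_gbit=4):
    """Ratio of total node bandwidth in a rack to the rack's uplink."""
    return nodes_per_rack * node_gbit / uplink_gbit


print(f"implied nodes: about {implied_nodes:.0f}")
print(f"average core utilization: {utilization:.1%}")
# nodes_per_rack=40 is hypothetical; the post does not state it.
print(f"uplink oversubscription at 40 nodes/rack: {oversubscription(40):.1f}x")
```

Any oversubscription ratio above 1 means cross-rack traffic is scarcer than in-rack traffic, which is the usual reason Hadoop tries to schedule tasks on or near the node that holds the data.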

Data Flow into Hadoop Cloud

Move old data to cheap storage
