How to PWN Facebook, or "Why Nothing on the Internet is Private"

by Scott Peacock

Disclaimer

Why I Wrote This

Why it's Easy

The Basic Structure of the Attack

Guessing Passwords

Crawling Profiles

Spreading Through the Network

The End Game

Some Happier News

What to Do About It

Notes

Disclaimer

Disclaimer: This document is a thought experiment which describes how a smart attacker could access and archive the vast majority of Facebook's content. I haven't taken any of these steps or actions, nor do I intend to. But I looked at what I think is possible, and I think that this attack is very possible and very simple. Don't shoot the messenger. I may have written with a slightly sensationalist and paranoid tone. Think nothing of it -- it's only because I'm naturally pretty tedious and I don't want you to get too bored. Most of the statistics I provide have been gleaned from various internet sources, and passed through the "does that seem about right?" filter. If you see something you disagree with, please let me know.

Why I Wrote This

People think of Facebook as private - it's not. People think of it as secure - it's not. People seem to get the impression that you can post something on the internet and control what happens to it - you can't. Facebook has (relatively speaking) strong, granular privacy settings, and yet it all means nothing in the face of a moderately smart, moderately determined attacker, who can then distribute their findings however they wish. I'm not a big fan of that.

One thing I'm not going to talk about here is why privacy is important, or all the kinds of things an attacker might do with Facebook data (I hint at a couple). The risks involved with loss of privacy are a subject for another discussion (one that I hope to have at some point ;-). This thought experiment is meant to consider what is possible.

Why it's Easy

If I claimed that I could steal and archive all of Yahoo's content, or Hotmail's, or Gmail's, or (pick forum here), that would be pretty sensationalist. You'd challenge me to do it, and I wouldn't be able to do it. So why do I think it's possible (easy, even) to do with Facebook?

My guess is that when most people think of security for an online system, they think of something like Yahoo or Hotmail or Google. In order to 'get' someone's account, you have to crack their password. In order to steal most of the data, you have to crack most of the accounts and passwords (millions). Unfortunately, Facebook isn't like other online systems. There are key differences, and all our old assumptions break hideously.

On Facebook, the attacker only needs to know the password of someone who can see your profile. This would be any of your friends, or (if you, like about 75% of people, didn't change the default settings) anyone on any of your networks. [1]

If you are a member of a large regional network, this could be any of 100,000+ people.

Now, how much do you want to bet that one of them has a bad password? I mean, a password that really stinks. Something like their login name, "password", "abc123" or a clever strengthening of it like "loginname1", "password1" or "pass1word".

Judging from the password research I've found available, this might be up in the region of 0.66% - pretty small. But if you're looking at 100,000 people, this is about 660, which is more than enough for an attacker.

Based on these numbers, on a network (or friends list) of 300 people, there's an 85% probability that you'll find someone with an atrocious password.

On a network of 150 people, this drops to a 63% chance, which is still not comforting, considering that the average number of friends a person has on Facebook ranges from 130-160.
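For anyone who wants to check the arithmetic, here is a minimal sketch of where those percentages come from. It assumes (my simplification, not something the password research establishes) that each person independently has a 0.66% chance of using one of the worst passwords, so the chance of at least one bad password among n people is 1 - (1 - p)^n. The function names are made up for illustration.

```python
WEAK_RATE = 0.0066  # the 0.66% estimate used above


def chance_of_weak_password(group_size, weak_rate=WEAK_RATE):
    """Probability that at least one of group_size people has a weak password.

    Assumes each person is weak independently with probability weak_rate.
    """
    return 1.0 - (1.0 - weak_rate) ** group_size


def expected_weak_accounts(group_size, weak_rate=WEAK_RATE):
    """Average number of weak-password accounts in a group of group_size."""
    return group_size * weak_rate


if __name__ == "__main__":
    for size in (150, 300, 100_000):
        print(
            size,
            f"{chance_of_weak_password(size):.1%}",
            round(expected_weak_accounts(size)),
        )
```

Under that independence assumption the results land within a point or two of the figures quoted above. Real friend groups are probably not independent (people with similar habits tend to cluster), so treat the output as a ballpark.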

The attack leverages the same principle behind "six degrees of separation" (you've received the group invite, right? ;). Despite how hokey the handwaving behind that particular initiative may be, it's a heck of a powerful and well documented phenomenon. This attack spreads virally, because once you crack an account, you can attempt to crack the accounts of all their friends. An attack that spreads virally is no joke.

The Basic Structure of the Attack

At the simplest level, the attack guesses bad passwords, downloads everything that the user can see, and attempts to crack the passwords of all the user's friends. In slightly more detail, it does the following:

  1. The attacker runs a script that tries to crack a user's password. It records the username on a master 'attempted' list.
  2. When the password is cracked, a screen scraping script [2] is run which logs on as the user. It browses and archives everything that they can see.
  3. The script lifts a list of usernames (e-mail addresses) from discovered profiles.
  4. The script checks the master list and attempts to crack every username (step 1) which has not yet been attempted.

This attack depends on a methodical, patient approach which spreads virally and is difficult to detect. It's quite effective, because it doesn't shoot for perfection and adopts a somewhat sloppy approach. The following sections look at different aspects of the attack in detail.

Guessing Passwords

This section is long, because it's the most important, and the most difficult portion of the attack. Let's first examine some of the obstacles to guessing passwords for normal online services, and why it tends to take more effort than your average casual attacker is willing to expend.

The general security consensus is that it's fairly easy to discourage attackers from online password guessing. It's slow to begin with, due to network delays. If the login screen enforces a delay between attempts, it gets even slower. Add a lockout feature, and it limits the total number of attempts you can make. The core technique of automated password cracking (trying millions of combinations extremely rapidly) becomes impossible.

Facebook's login security is pretty good. I was impressed. You may only enter 5 wrong passwords without punishment. When I tried, it locked me out after the sixth wrong attempt. Not just a "try again in 10 minutes" lock-out, but a "you must reset your password" lockout. I don't believe that bad password attempts are tied to the computer you are trying from, either, so trying from different computers will not give you more attempts. Six attempts really puts the screws to your malicious intent.

But the very heart and soul of this attack is the premise that you only have to crack a fraction of the existing accounts in order to gain near total coverage. But what is that fraction? A simplistic calculation would lead you to believe that since every person has 100+ friends, you'd only have to guess less than 1% of passwords to achieve near total coverage. Add the network hole and this will drop further. Talking about a fraction of a percent of passwords almost seems plausible, doesn't it ;)

There's another massive advantage that this attack has which basic brute force attacks don't have: it only cares about bad passwords. I mean lousy passwords. I mean really bad, straight up terrible passwords, the worst of the worst.

If you google for something along the lines of "myspace password analysis" you'll discover that a phishing scam swiped some 100,000 usernames and passwords which were released into the wild. Security researchers were very happy - it gave them a chance to examine a large collection of real-world passwords, something they don't get to do very often ;) The result? 0.34% used the user portion of their e-mail address. The three (non-username) most popular passwords account for another massive 0.33 percent (I'll bet you've never heard the word massive used to describe 0.33% before ;) of passwords.

Bear in mind, I'm a patient, focused attacker. I don't need your password. I don't need every password. I don't need the password right away. My script won't waste its time - it will try to cherry pick the top passwords (oh, say thirty or so). It won't waste computing resources - it will try newly discovered accounts first, and will only return to crunching on difficult accounts if it's got nothing better to do.

If you didn't choose one of the thirty worst passwords known to mankind, I will never crack your password. But how sure are you that all of your friends are as diligent? This attack works because the (extremely) poor habits of the bottom percent or so are enough to undermine everyone else.

I suspect that bad passwords are somewhat evenly distributed across demographics. It seems to me more of a personality and personal interest issue than an age, location or occupation (perhaps not occupation) issue. What this means is that most people probably know someone with a bad password (6 degrees of separation, right?)

My script will be patient - it will only try 4 or 5 attempts at a time, after which it will let the account 'cool off' for a couple days. People who are influential on Facebook (people with lots of friends; people whose passwords I really want to crack) will probably be pretty compulsive about logging on. [3] If the script lets the account sit for a couple days, they'll log on, and reset the counter ;-). In a month, I can try 60 different passwords - in a fortnight, 30. I'm not absolutely sure where the sweet spot for cool-off is. It might be as short as 24-36 hours. It's not really important - a smart script would analyze its database of previous attempts and adjust its cool-off period and its top password list as it went along.

If password guessing proves difficult, I'll just cheat and 'phish' a little bit to boost numbers. If I find things going really tough, I'll start making my script smarter, optimising my top password list by crawling profiles for birthdates, partners' names and other stuff that people often use to create passwords. Password choosing tendencies are pretty well understood and documented. If you have personal information about someone, you can make some pretty specific guesses about what their password is likely to be.

I'll be smart and correlate my Facebook attack with attacks on other sites. If another site has poor security (let's say they don't lock you out at the login), I can crack passwords there and match up e-mail addresses with known Facebook users. How many people 'share' passwords across different forums and websites? Even if passwords don't match directly, I can derive their password convention and make intelligent guesses.

The bottom line is that I'm confident in my ability to steal the (extremely small) critical mass of passwords needed to discover most (I don't really care about all, perfection is for math professors) profiles.

Crawling Profiles

It's not a difficult task to write a web-spider that logs on as a user, visits their profile and the profiles of their friends, and dumps the data into a database. The basic term for this is "screen-scraping". Facebook doesn't know that it's not a real person asking for the HTML page ;)

This attack won't steal private messages - they're a piece of data that you need to crack a password to access. You'd have to crack about 50% of all accounts to get near total coverage of messages (everyone sees both halves of a conversation). But it will grab everything else. Wall posts, photos, groups, events, etc.

Probably the most important piece of information that the script will grab is the user's e-mail address, because it's also their username. I want to try and crack it, so I can see *their* friends. It's reasonable to expect that some people will not list their e-mail address publicly. It's reasonable to expect that some people will change their listed e-mail address, so that it's not the same one as their user name. I don't care about those people - people who are security conscious enough to keep their (already private) e-mail address hidden probably aren't using passwords that I'll have success cracking. The people who are my goldmine are those who have hideously bad passwords and who list their e-mail addresses. This attack just takes advantage of the fact that it's their bad security which is shooting you in the foot.

Facebook does check its logs for suspected scripting activity. If a 'user' browses every single friend in order, at lightning speed, it's probably not a real person. If it does *anything* at lightning speed, it's probably not a real person. This is why my script will be smart. It will have as many accounts as it likes to crawl. While it's being smart, waiting for an account to cool off, it will be stealing from another account. It will be processing accounts as soon as they are cracked - no need to wait for all passwords to be discovered before archiving begins. A good script will be able to keep a great many computers busy, all the time, running in parallel. All the while it will avoid being too hasty, remaining invisible to the logs. If you surf like a user, you're a user. If you're a trifle too aggressive, no biggie. Some random user (not you) gets their account locked, and you adjust your script to be a little more careful.

If I'm a halfway skilled hacker, I'm going to recruit a few extra machines for my purpose, and I'm going to make sure that no single computer is doing all this work. It just makes sense - I can work faster, and it's harder to detect, because generally a single computer doesn't access millions of Facebook accounts by itself.

Spreading Through the Network

As mentioned before, this attack spreads virally - most compromised subjects will yield 100-100,000 new accounts to try.

People aren't connected evenly with one another. Our relationships are clustered, mostly around geography. Think of the attack as a plague, and cities/regional networks as little (semi) isolated Caribbean islands of people who mostly know each other (although not everyone within the island will know everyone else within the island). Once you infect an island, the plague spreads quickly among all the residents. This includes those who never joined the regional network (they're fishing in canoes just off the island) - one of their friends who *has* joined the regional network will be exposed and fall sick, exposing them. The plague will quickly spread through schools and colleges on the island, infecting individuals, their families and their friends.

In order to spread from one island to another, the plague needs a certain number of 'shared people' or friendship connections. As mentioned before, if there are 300 connections, the chance of spreading is about 85%. For 600 it is about 98%. It should be fairly obvious that the plague won't have any difficulty spreading from island to island. Again, think of six degrees of separation, and realize that a given outbreak doesn't have to spread very far at all.
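The same numbers can be turned around: instead of asking how likely a jump is for a given number of connections, ask how many connections are needed for a given likelihood. The sketch below does that under the same simplifying assumption as before (each connection independently has a 0.66% chance of a terrible password); the function name and the independence assumption are mine, for illustration only.

```python
import math

WEAK_RATE = 0.0066  # the 0.66% estimate from earlier in this article


def connections_needed(target, weak_rate=WEAK_RATE):
    """Smallest n for which 1 - (1 - weak_rate)**n reaches the target.

    target is a probability strictly between 0 and 1. Connections are
    assumed to be independent of one another.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - weak_rate))


if __name__ == "__main__":
    for target in (0.50, 0.85, 0.98):
        print(f"{target:.0%} chance needs about {connections_needed(target)} connections")
```

By this estimate an even chance takes only a little over a hundred shared connections, which is why two islands don't need to be very friendly with each other for the plague to hop across.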

There will be isolated people and corners of the archipelago that the plague won't be able to reach. If all of your friends are security conscious and you haven't joined any networks, you might be one of the few who fall through the cracks. The fewer friends you have and the less connected you are, the lower the probability that you'll be infected. Unless, of course, you chose your login or 'password1' as your password. The vast majority of people will be infected - those who have left their profiles exposed to a large regional network are virtually guaranteed to catch the plague.

The End Game

So, after the attacker successfully compromises Facebook... then what?

Once all the (needed) accounts are cracked and all the data archived, the script just has to keep up with new developments (no, I don't mean your feed) and new users. It will just log in periodically and rescan pages. It's not going to go over all your old wall posts again - it can just scrape until it hits the first duplicate. People who choose lousy passwords won't keep a rigid policy of changing them often - good times ;-) Any new users will have a crack attempted against them. After the initial crack, keeping an archive up to date is relatively simple.

Smart attackers won't be greedy. They won't do anything other than browse Facebook. They won't make fake wall posts or notes or anything like that - they'll just browse. They'll be invisible and non-intrusive (apart from the fact that they're groping through your private profile). And they'll be hard to catch.

Smart attackers will use what they know from Facebook (cracked passwords, personal information, etc) to try and crack e-mail addresses on other sites. Your e-mail password isn't based off of any "private" information revealed in your Facebook profile, is it? Identity thieves will have a field day.

If you can't think of something to do with (or some way to massively profit from) all the personal information on Facebook, you haven't thought very hard. Think of raw data as a little bit like crude oil. It doesn't do a whole lot of good to anyone if it's stuck in the ground, but if you can dig it out and refine it, it has a whole lot of value to a lot of people.

Think of writing a search tool that isolates all people in a city with wall posts mentioning "vacation", "away" and/or "feed (pet)". I can think of a few people who would pay some money to know that. But don't feed off the crumbs of my imagination - use your own to come up with some ways that you could abuse your profile data.

The point is that the data is not private. One of the big problems in computer security is that all it takes is one person to write an automated attack tool that anyone can use, and then anyone can attack. Additionally, once the data is archived, there's no way of controlling what happens to that archive.

Some Happier News

Realistically, no one's going to archive *all* of Facebook. It's quite plausible that someone would grab data for a city or a region, but archiving *all* of Facebook is beyond the reach of the average attacker. They're certainly not going to steal your photographs. My back of the envelope calculations put the possible size of the Facebook photo collection at as much as 1100 TB (terabytes) which would require several thousand consumer 250GB hard disks to store. That's a lot of money to buy them, a lot of power to keep them going, and (more importantly) a humongous pipe to fill them up. Try downloading 1100 TB and see what your ISP has to say about that ;-) [4].
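To put rough numbers on that, here is a small back-of-the-envelope sketch. The 1100 TB estimate, the 250 GB disks and the 30 GB per month cap (see note [4]) all come from this article; the only thing I'm adding is the assumption of decimal units (1 TB = 1000 GB), and the function names are made up for illustration.

```python
import math

PHOTO_ARCHIVE_TB = 1100    # estimated size of the photo collection
DISK_GB = 250              # consumer hard disk size
CAP_GB_PER_MONTH = 30      # residential bandwidth cap from note [4]
GB_PER_TB = 1000           # assumption: decimal units


def disks_needed(archive_tb, disk_gb=DISK_GB):
    """Number of whole disks required to hold archive_tb terabytes."""
    return math.ceil(archive_tb * GB_PER_TB / disk_gb)


def years_to_download(archive_tb, cap_gb_per_month=CAP_GB_PER_MONTH):
    """Years needed to pull archive_tb terabytes through a monthly cap."""
    months = archive_tb * GB_PER_TB / cap_gb_per_month
    return months / 12


if __name__ == "__main__":
    print("disks:", disks_needed(PHOTO_ARCHIVE_TB))
    print("years per TB:", round(years_to_download(1), 1))
    print("years for everything:", round(years_to_download(PHOTO_ARCHIVE_TB)))
```

On a single capped residential connection the full photo collection works out to thousands of years of downloading, which is the real obstacle here, more so than the cost of the disks.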

There's also the problem of masking the attack. Any attack will necessarily involve millions of password attempts and page requests. As previously mentioned, a single computer which spawns millions of login requests and generates millions of login sessions will stick out. It's a far cry from typical usage, and leaves one almighty footprint. This means that anyone who is seriously attempting to archive Facebook content on a grand scale will need to recruit some machines or play some tricks to stay hidden. In order to run the crawler in a timely fashion, an attacker will certainly want to have multiple machines available so they can perform activities in parallel.

This won't necessarily deter someone who's interested in just a local portion of Facebook (their city perhaps), or someone who's looking for a specific profile. [5] Someone just has to write a crawler tool once, and anyone can use it - an attacker doesn't have to be smart or original to be a threat. If an attacker is willing to ignore photographs and surf from the public library, many of the practical obstacles are greatly diminished. In addition, petty hackers who aren't afraid of gathering a few zombie machines [6] will have no qualms about trying this kind of thing.

What to Do About It

There is no way to guarantee that your profile will stay private. You cannot change this fact.

You can acknowledge it and adjust your profile content accordingly. You can choose what you post.

There are some things you can do to make it less likely that your profile will be exposed:

Notes

[1]
Yes, the default setting on Facebook is that all members of your networks can see your profile. If you didn't know this, I strongly recommend that you set your profile to "Friends only" and/or never join any regional networks. Exposing your profile to several thousand people is not exactly prudent. The exception to this is if you *want* to be found by strangers online, in which case you don't really care about privacy. I would consider this to be foolhardy but it remains your choice.

[2]
Screen scraping is a technique where you write a program that requests web-pages from a server, and which, instead of displaying them in a browser, looks through the source of the returned web-pages for information. It is most certainly against Facebook's terms of service, and running a web-bot that does it will most likely get you banned. We don't care about that though; we're attacking Facebook ;-) What we care about is hiding the fact that we're screen scraping so that we don't get caught.

[3]
Facebook brags that more than half of its active users log in at least once a day. Sweet!

[4]
My ISP says that at 30 GB per month residential high speed, I'd better be prepared to wait 3 years per TB, unless I want to pay their outrageous rates for exceeding my bandwidth cap. No thanks.

[5]
It would be really trivial to add a search option to a crawling script so that it stops when it reaches a specific profile, or to restrict its search to a limited area. The amount of traffic required to locate a single profile could be drastically reduced as you wouldn't be snooping through people's groups, events, wall archives, etc.

[6]
A "zombie computer" is a computer connected to the internet that has been compromised by a virus or a trojan which allows a cracker to control it. Typically zombies are grouped in "bot-nets" which are commonly used to launch denial-of-service attacks and general internet mischief. If a bad person wants to get at your Facebook data, getting the necessary resources won't be a technical or an ethical hurdle for them.