Latest SEP (Search Engine Poisoning) Research, Part 1
[This is the first of a series of blog posts providing some of the backstory on my upcoming RSA presentation on Search Engine Poisoning. There were a lot of screenshots and accompanying notes that simply wouldn't fit into a 45-minute presentation...]
Two years ago, I gave a presentation at RSA on Search Engine Poisoning. It was fun, but my malware research afterwards gradually moved on to other topics, since there's a lot of malware happening out there.
Oh, there was an occasional blog post when we noticed something new, but in general we prefer to focus more on the "pointy end" of malnets (the attack sites that actually serve the exploits and payloads), rather than on the earlier portions of the network (the vectors used to get the victims to the attack sites).
So why am I back at RSA this year with a presentation on SEP?
Mostly because of this:

It's a graphic from our mid-year report last summer on Web security and malware, and it shows research from one of my colleagues (thanks, Tim!) on automatically calculating where malnet attacks originated. SEP was the dominant #1, accounting for more than the other three named categories put together. (The calculated daily averages were as follows: SEP 39.2%, Webmail 6.9%, Porn 6.7%, SocNet 5.1%. I'll discuss the "Unrated" set in a minute.)
That got my attention. Wow, I wondered, how are the Bad Guys doing this, when Google and Bing are putting a lot of effort into keeping this stuff out of their search results? And so a research project was born...
But first, here's the corresponding chart from the just-released Annual Report, summarizing the data for the last half of 2011:

(The average daily percentages with amount of change shown in [brackets] are: SEP 40.8% [+1.6], Webmail 14.7% [+7.8], SocNet 7.4% [+2.3], Porn 2.9% [-3.8].)
It's important to understand just what these charts are saying (and what they're not). The basic idea is to chart data from our malnet tracking module. Every day, thousands of new URLs are flagged in real time as members of malicious or suspicious malnets, representing a lot of new sites coming online. One topic of interest for us is how the would-be victims are being "led" to those sites.
(I should point out that we start with a basic assumption that our users are not deliberately surfing for malware; they're busy with other goals in mind when the malware "happens".)
So what are those other goals the users are busy with when they are ambushed by a malnet? To find out (or at least, approximate), the system starts with the site where we caught the Bad Stuff, and then traces back, to see which site the victim came from. Then it repeats the process until it gets to a "well-known" site. What's a "well-known" site? Basically, it's any large site that is known to not be (deliberately) evil. Good examples would be: search engines like Google and Bing; Webmail sites like Hotmail and Gmail; and social networking sites like Facebook and Twitter. (We also stop back-tracing when we hit known Porn sites, and sites in other categories, even though they're not necessarily "well-known" sites. You get the idea...)
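To make the back-tracing idea concrete, here is a minimal Python sketch. It is only an illustration of the description above, not the actual WebPulse code: the referrer map, the stop-site table, the placeholder host names, the hop limit, and the function name trace_to_origin are all assumptions invented for this example.

```python
from typing import Dict, Optional

# Hypothetical table of sites where back-tracing stops, with their category.
# (Placeholder host names; a real system would consult a ratings database.)
STOP_CATEGORIES: Dict[str, str] = {
    "search.example": "Search Engine",
    "webmail.example": "Webmail",
    "socnet.example": "Social Networking",
    "porn.example": "Porn",
}

def trace_to_origin(attack_site: str,
                    referrers: Dict[str, Optional[str]],
                    max_hops: int = 20) -> str:
    """Walk the referrer chain backwards from attack_site.

    Returns the category of the first stop site reached, or "Unrated"
    when the chain is missing, loops, or exceeds max_hops.
    """
    current = attack_site
    seen = set()
    for _ in range(max_hops):
        if current in STOP_CATEGORIES:
            return STOP_CATEGORIES[current]
        if current in seen:
            # Referrer loop: the origin cannot be determined.
            return "Unrated"
        seen.add(current)
        previous = referrers.get(current)
        if previous is None:
            # Missing referrer, e.g. a link opened from an e-mail client.
            return "Unrated"
        current = previous
    return "Unrated"

# Example chain: search result -> doorway page -> relay -> attack site.
example_referrers = {
    "attack.example": "relay.example",
    "relay.example": "doorway.example",
    "doorway.example": "search.example",
}
print(trace_to_origin("attack.example", example_referrers))
```

In this toy chain the trace stops at the search engine, so the attack would be counted under SEP. A chain with no recorded referrer falls into the Unrated bucket.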
Anyway, once we hit the well-known sites, it's a simple task to tabulate the results, and we found that most of the time we ended up in one of the "Big Four" categories, or Unrated*.
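The tabulation step might look something like the following sketch. The input list is made-up sample data, not figures from the report, and the function name category_shares is hypothetical; whether the real system computes its daily averages exactly this way is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable

def category_shares(origins: Iterable[str]) -> Dict[str, float]:
    """Turn a list of traced origin categories into percentage shares."""
    counts = Counter(origins)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the categories from largest to smallest count.
    return {cat: round(100.0 * n / total, 1)
            for cat, n in counts.most_common()}

# Made-up sample: ten traced attacks from a single day.
sample_origins = (["Search Engine"] * 4
                  + ["Unrated"] * 3
                  + ["Webmail"] * 2
                  + ["Porn"] * 1)
for category, share in category_shares(sample_origins).items():
    print(f"{category}: {share}%")
```

Running this once per day and averaging the per-day shares would give numbers in the same form as the ones quoted for the charts.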
Note that these charts are NOT saying that 40% of all malware on the Internet is driven by search engine traffic. (And we're certainly not claiming that 40% of all visits to Search Engine sites result in a malware attack!) Remember that the malnet tracker is only one of the modules in WebPulse, and this data mostly represents its view of the world. We think it's an important view, chiefly because of the highly organized nature of the attacks driven by malnets, but it's not the only view.
But anyway, the SEP research started by simply wondering what the Bad Guys were doing that was (still) being so successful....
--C.L.
* Okay, so here's the quick scoop on Unrated. Some sites are very small and/or very new, and so they're not rated in our database. Sometimes there is an incomplete or missing referrer chain. Examples of this include requests that originate from e-mail clients like Outlook, where you're not in your browser when you're looking at your mail, but as soon as you click a link in a message, your browser launches to display the page. Any other app that launches a browser to display a Web page would be similar. Also, some of our customers do not configure their systems to include the referring site when they call WebPulse. And so on... Even though I think that we've gotten a bit smarter since the first report, there is still a sizable chunk of malicious traffic where the referrer chain is uncertain.