Latest SEP (Search Engine Poisoning) Research, Part 4
[This is part four of a series of blog posts providing some of the backstory for my RSA presentation on Search Engine Poisoning. There was a lot of material that simply wouldn't fit into 45 minutes...]
RESEARCH QUESTION #2:
Seeing that no really interesting results -- well, at least, not enough for a conference-length presentation -- were going to come from the "who's the safest search engine?" research (see Part 3), I looked for something that promised to be more interesting.
Remember that this whole project started with our malnet-tracking tools (see Part 1): they start "at the pointy end of the harpoon" (the malware attack), trace back to see where the attacks began, and tell us which were SEP attacks. In Part 3, I discussed the idea of taking that one step further: once the tool had determined that an attack began at a search engine site, it would look for the actual search terms the user had typed in.
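To make that last step concrete, here is a minimal sketch of pulling search terms out of a search-engine URL. It is not our tracker's actual code: the host table, the function name, and the assumption that the back-trace ends with a referrer URL carrying the query in a "q" parameter are all illustrative.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical table: search-engine host -> name of its query parameter.
SEARCH_HOSTS = {"www.google.com": "q", "www.bing.com": "q"}

def extract_search_terms(referrer_url):
    """Return the search terms from a search-engine URL, or None."""
    parsed = urlparse(referrer_url)
    param = SEARCH_HOSTS.get(parsed.netloc.lower())
    if param is None:
        return None  # the trace did not end at a known search engine
    values = parse_qs(parsed.query).get(param)
    return values[0] if values else None

print(extract_search_terms(
    "http://www.google.com/search?q=sample+letter+for+teacher&hl=en"))
```

A URL from any host not in the table simply returns None, which is how a sketch like this would separate SEP attacks from attacks that began somewhere else.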
Since the tool had been dutifully logging those search terms for some time, I had a ready stock of terms to type into the engines. As I looked at the lists from the database, day after day, I noticed that there were recurring "themes" in the searches that people had been doing, which had led them into the SEP attacks. The next interesting question, then, was "what types of searches are the most dangerous?", and this would drive the rest of the project, spawning another three sub-questions in the process.
Over 2300 "search term sets" were collected from a five-to-six-week period in October and November 2011. Each of these was manually assigned to a specific category "bucket" -- such as Porn, Non-English, etc. -- with the set of buckets being based on categories that recurred noticeably as I browsed through several days' data. Here is the final table of results, with Definitions and Observations following:
DEFINITIONS AND OBSERVATIONS:
- Proxy/Unblocker: Basically, these were people searching for ways around the office/school Web Filter, and the SEP gangs were waiting for them. Not a huge number, but a pretty consistent category -- at least, on Mondays through Fridays. (The number was zero on Saturdays, Sundays, and Holidays.)
- Porn: Includes everything from the soft-porn area of "adult" content on up. Note that searches that could have gone into other buckets (e.g., Non-English, Celebrity, Video/Stream) but were clearly looking for porn-type content went into this bucket.
- Non-English: In the "old days" of SEP research, we rarely came across link-farms that contained non-English content pages. (In other words, the link-spam might be propagated on a non-English site, like a Japanese BBS page, or a comment page in a Chinese forum, but when you followed the links, you ended up on link-farm pages that had content targeting English search terms.) Seeing this many SEP attacks targeting non-English content was an eye-opener. The search term sets were generally in Russian, Spanish, Portuguese, or Arabic, with a smattering of other languages.
- Celebrity: As mentioned, if the search term set was something like "[celebrity name] [porn terms]" then it was counted as Porn, not Celebrity. This eliminated most of the celebrities you've actually heard of. The "celebs" that were left were mostly people I'd never heard of -- I had to google them to find out who they were. (Oh, BTW, if I searched for a name and found out it was a porn star, then that also went into the Porn bucket.) Mostly, these turned out to be personalities on obscure cable TV shows, or (female) news reporters from various cities around America, with a few book authors mixed in.
(This turned out to be an interesting enough area that it warranted its own phase of the research, which will be covered in a later post.)
- Video/Stream: People looking to watch movies, TV shows, or anime on-line. (Many of these searches were clearly for copyrighted material, in case you're wondering.)
- Specific Site: There are a lot of people who actually type complete domain names into Google or Bing, and then click on the top result, instead of simply typing it into their browser's address bar. (My kids do this. It drives me crazy.) The Bad Guys seem to know this, and have a lot of content designed to show up high in these sorts of searches. (BTW, a lot of these searches were clearly for sites in the Video/Stream bucket, so you can imagine that number being a bit bigger if I'd chosen to slice the data differently.)
- App/Software: These were split about 50/50 between people searching for mobile phone "apps" and those searching for more traditional software. As with Video/Stream, many of these searches were obviously from people looking for "warez" (pirated) versions.
- Holiday: Since I was doing this research in October and November, I made a special category for all searches related to Halloween, Thanksgiving, and Christmas. (I didn't notice any for Hanukkah or Kwanzaa, etc. If I had, they would have gone in this bucket as well.)
- Misc: Not really a shock, but still striking to see just how many of the search term sets simply didn't fit into one of the other buckets. Most of the SEP activity is clearly focused on the "long tail" of the Web. (More thoughts on this in some of the later posts...)
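The one firm rule in the definitions above is precedence: anything that was clearly a porn search went into Porn, no matter which other bucket it also matched, and whatever matched nothing fell into Misc. The sketch below illustrates only that rule. The real assignment was done by hand, so the keyword lists are invented for illustration, and the order of the buckets after Porn is arbitrary.

```python
# Hypothetical keyword lists; the real bucketing was manual.
# Order matters: Porn is checked first, so it wins over any other match.
BUCKET_RULES = [
    ("Porn", {"porn", "xxx", "nude"}),
    ("Proxy/Unblocker", {"proxy", "unblocker", "unblock"}),
    ("Video/Stream", {"watch", "stream", "episode"}),
    ("Holiday", {"halloween", "thanksgiving", "christmas"}),
]

def assign_bucket(search_terms):
    """Return the first bucket whose keywords appear in the search terms."""
    words = set(search_terms.lower().split())
    for bucket, keywords in BUCKET_RULES:
        if words & keywords:
            return bucket
    return "Misc"  # the "long tail": nothing matched

for terms in ("watch some celebrity nude video",
              "free proxy for school",
              "how to fix a bicycle chain"):
    print(terms, "->", assign_bucket(terms))
```

The first example matches both Video/Stream and Porn keywords and lands in Porn, which mirrors how the Celebrity and Video/Stream counts were reduced by the precedence rule.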
In addition, by the time I was through the 2300+ search term sets, I had decided that if I were to do something like this again, I would break a couple of other recurring categories out of the "Misc" bucket:
- Health/Medical: Pretty self-explanatory. (Note, however, that I don't remember seeing any "viagra" type searches in the SEP attack logs. Apparently, the Bad Guys are only after your money with such sites, not your computer.)
- Sample Letters: This one is interesting. There are plenty of people who are faced with a need to write a formal letter of some sort, and, no doubt due to the pernicious influence of first e-mail, and then texting and Facebook wall posts, are not sure how to proceed with anything like that. So they turn to Google or Bing and search for something like "sample letter for friend going through divorce" or "sample letter for child's school teacher". And again, the Bad Guys are attuned to this, and have prepared SEP content targeting those types of searches...
One other observation from this phase of the research: our database of search term sets also includes a "malnet ID code" for each entry. In other words, each of the malnets we track has an ID code, and our tracker tool knows which malnet it's tracking when it hits the search engine page on a back-trace, so it's easy to include that code along with the search term set.
It was very interesting to see that the same malnet ID code kept recurring for most of the Holiday search term sets. And a different code occurred over and over on Sample Letter searches. Several codes accounted for the majority of the Porn SEP attacks. And so forth. So the SEP gangs do tend to "specialize" somewhat in the content they serve to the search engines.
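For anyone who wants to check this kind of specialization in their own logs, here is a rough sketch of the tally. The records and malnet ID codes below are made up for illustration (they are not our data or our ID format); the idea is simply to count malnet IDs per bucket and see how much of each bucket the top ID accounts for.

```python
from collections import Counter, defaultdict

# Hypothetical (bucket, malnet_id) pairs, one per logged search term set.
records = [
    ("Holiday", "M07"), ("Holiday", "M07"), ("Holiday", "M12"),
    ("Sample Letters", "M03"), ("Sample Letters", "M03"),
    ("Porn", "M01"), ("Porn", "M05"), ("Porn", "M01"),
]

def dominant_malnet_per_bucket(rows):
    """Map each bucket to (most frequent malnet ID, its share of the bucket)."""
    counts = defaultdict(Counter)
    for bucket, malnet_id in rows:
        counts[bucket][malnet_id] += 1
    result = {}
    for bucket, counter in counts.items():
        malnet_id, hits = counter.most_common(1)[0]
        result[bucket] = (malnet_id, hits / sum(counter.values()))
    return result

for bucket, (malnet_id, share) in dominant_malnet_per_bucket(records).items():
    print(f"{bucket}: {malnet_id} ({share:.0%} of sets)")
```

A share near 100% suggests one gang owns that topic; a bucket like Porn, where several codes split the attacks, would show a lower top share.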
Here's a screenshot from the presentation illustrating an "old school" (but still successful) link-farm site that happens to show some specialized content (Christmas-themed searches, in this case, mixed in with other content):
- This site is hosted on a "free host" domain. (One characteristic that makes it "old school".)
- There are 20 folders on the site; the red-box inset shows some of the files in one of them (there were 100 HTML pages in this one). Assuming the other folders are similar, that's about 2000 pages in this link-farm.
- A typical link-farmer would have dozens and dozens of similar sites, so the total page count could easily reach into the hundreds of thousands.
- Notice that it was created in November of 2010. I collected this sample in November 2011, and it was showing up in searches on multiple search engines, so even though it's "old school" and a year old, it's still working (i.e., the engines haven't figured out that it's a malicious site)...
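The page-count estimate in the bullets above is simple multiplication, sketched here. The folder and page counts come from the screenshot; the assumption that every folder holds about 100 pages, and the site counts standing in for "dozens and dozens", are mine.

```python
folders_per_site = 20     # folders visible on the sample site
pages_per_folder = 100    # seen in one folder; assumed typical of the rest
pages_per_site = folders_per_site * pages_per_folder

# Hypothetical numbers of similar sites run by one link-farmer.
for sites in (24, 50, 100):
    print(f"{sites} sites -> {sites * pages_per_site} pages")
```

At 50 similar sites the total is already 100,000 pages, which is where the "hundreds of thousands" range begins.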