
Anatomy of a Website Compromise

Recently I had the "pleasure" of cleaning up one of the websites we host, and encountered one of the sneakiest website compromises ("hacks") I've seen so far. I've decided to document the details, in case the information is of any use to other people whose sites have been compromised the same way. This incident is also a good example of how sophisticated (or at least sneaky) these attacks have become, and of the amount of effort required to clean up a site compromised in this way.

Many website compromises are easy to spot - typically attackers will add code that redirects visitors to other sites (or opens pop-up windows in their web browser), or they add content to pages on the site, or they outright replace entire pages. This compromise was much more subtle, though, and it was only noticed when spam links began appearing in the Google search results for that site.

When we were asked for help with the problem, I immediately started on the standard cleanup tasks:

  • creating a compressed backup of the infected site & the recent web logs (for later analysis)
  • searching through the site's files for words that were in the spam links/search results with "grep" (a UNIX tool that searches through files for specific text)
  • and creating a list of all the site's files sorted by their modification date
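For readers without shell access, the last two tasks can be approximated in a few lines of Python. This is only a minimal sketch of the idea, not the commands I actually ran: the function names are made up for illustration, and it reads each file fully into memory, which is fine for a typical site but not for very large files.

```python
from pathlib import Path


def files_containing(root, keywords):
    """Return files under root whose contents include any keyword (case-insensitive)."""
    needles = [k.lower().encode() for k in keywords]
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes().lower()
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if any(needle in data for needle in needles):
            hits.append(path)
    return sorted(hits)


def files_by_mtime(root):
    """Return (modification_time, path) pairs under root, newest first."""
    entries = [(p.stat().st_mtime, p) for p in Path(root).rglob("*") if p.is_file()]
    return sorted(entries, key=lambda entry: entry[0], reverse=True)
```

Keep in mind that attackers can reset modification times, so a file that looks old in this list is not necessarily clean.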

The "grep" search turned up several files that contained key words from the spam links/search results, which I promptly removed. The list of recently-modified files turned up some more infected files, mostly "backdoor" scripts hidden in the /images folder, which I also removed. Backdoor scripts are usually web-based file managers, which attackers can use to modify files on the website. To use a real-world analogy, it's like breaking into a house by picking the front-door lock, and then making some other more subtle change to make it easy to break in again - like disabling the lock on a window.

So far, this was all fairly standard stuff that I'd seen before on other compromised sites. But the recently-modified files list also showed that the site's index.php file had been modified, which I found a bit surprising because there were no obvious signs of infection when viewing the site in a web browser. Checking the index.php file, I found an "include" statement that had been added & was used to load another file, which is where things started getting interesting.

Most scripting languages used for web sites have some sort of "include" function that lets files load content or scripting code from separate files - that's typically done to make development easier, by organizing content (or portions of web-based applications) into separate files. In this case, though, the include statement was loading a file with a .pdf extension, which had been hidden in the folder /home/username/mail/tmp (probably to make it harder to find).

When I examined the include'd .pdf file, it contained some binary data (in base64 format) along with PHP code used to decode that data. In other words, the file contained PHP code, and most of the actual content had been obfuscated by converting it to base64 - and when the file was run (via the include statement in index.php), the base64 data was decoded & the obfuscated PHP code was run.
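A file like this can be hunted for directly, because a real PDF or image has no reason to contain a PHP opening tag. The following Python sketch lists files that contain "<?php" but don't have a PHP-style extension. The extension list and the function name are my own assumptions, so adjust them to whatever your server actually treats as PHP.

```python
from pathlib import Path

# Assumption: extensions that are expected to hold PHP code on this server.
PHP_EXTENSIONS = {".php", ".phtml", ".inc"}


def disguised_php_files(root):
    """Return files under root that contain a PHP opening tag despite a non-PHP extension."""
    suspects = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() in PHP_EXTENSIONS:
            continue
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file: skip it
        if b"<?php" in data.lower():
            suspects.append(path)
    return sorted(suspects)
```

It is worth pointing a scan like this at the whole account (in this case including the mail folders), not just the web root, since the file was deliberately stored outside the site's own folders.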

A brief explanation: attackers often use this method, because it's a fairly effective way to hide the malicious code they've uploaded to a website. Say, for example, an attacker modifies a website to add advertisements for fake Rolex watches - normally, if the files are in plain ASCII (text), you can find the infected files by just running a grep for the word "rolex". But if the attackers encode the content using base64 (or some other similar method), then a grep for "rolex" won't find any matches - because in base64, the word "rolex" would be encoded as "cm9sZXg=" instead, and it would only be converted to plain text when the PHP script is executed (typically by being viewed in a web browser).
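Here is a small Python demonstration of that effect. The spam snippet and the spam.example address are invented for the example, not taken from the infected site.

```python
import base64

# Made-up spam snippet standing in for the attackers' content.
page = b'<a href="http://spam.example/">cheap rolex watches</a>'
encoded = base64.b64encode(page)

print(b"rolex" in page)                    # True: a plain-text search finds it
print(b"rolex" in encoded)                 # False: the encoded copy hides it
print(base64.b64encode(b"rolex"))          # b'cm9sZXg='
print(base64.b64decode(encoded) == page)   # True: decoding restores the original
```

One caveat: searching for "cm9sZXg" is not a reliable substitute either. Base64 works on groups of three bytes, so inside a longer file the encoded form of a word depends on where the word happens to fall, and it will often look different from the stand-alone encoding.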

So in this particular case, the attackers had taken some PHP & HTML code containing their spam advertisements and then they had run it through a base64 encoder. But they didn't stop there - instead they took the initial base64 encoded data, and then ran it through the base64 encoder again, effectively adding a second layer of obfuscation to the file. It's most likely that this was done in order to evade virus & malware (malicious software) scanners, which many servers use to detect these types of compromises. Many of these scanners will automatically decode base64-encoded data, but not all of them are sophisticated enough to detect that the decoded data is itself base64-encoded.
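Peeling off repeated layers like this is easy to automate. The sketch below keeps decoding for as long as the data is still valid base64, and reports how many layers it removed. The function name and the layer limit are my own choices for illustration. It only handles plain base64; attackers often mix in other tricks (compression, string reversal and so on) that would need extra steps.

```python
import base64
import binascii


def peel_base64(data, max_layers=10):
    """Repeatedly base64-decode data; return (decoded_bytes, layers_removed)."""
    layers = 0
    while layers < max_layers:
        # Encoded blobs are often wrapped across lines, so ignore whitespace.
        candidate = b"".join(data.split())
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            break  # not valid base64 any more: assume we've reached the payload
        if not decoded:
            break
        data = decoded
        layers += 1
    return data, layers


# Hypothetical payload, encoded twice as described above.
payload = b"<?php echo 'spam'; ?>"
twice = base64.b64encode(base64.b64encode(payload))
print(peel_base64(twice))
```

Because the function only decodes the data and never executes it, it's a safe way to look at what a suspicious file really contains.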

That's still not the end of the story, however - I was puzzled by the fact that I couldn't spot any modifications to the website when viewed in a browser, even before I removed the malicious file (and the include statement pointing to it). So I decided to take a copy of the file with the base64 encoded data & make a small modification to it, so it would simply display the decoded data when viewed in a web browser instead of actually running the code. The actual data was a PHP script, and quite a long one (nearly 8,500 lines) - but the main thing that jumped out is that it contained code to check the IP address & user agent of any visitors against the indexing bots used by many search engines, primarily Google.

Remembering that the only publicly-visible results of the infection were some spam content in the Google search results for the compromised site, I used the "Fetch as Googlebot" tool (in Google's Webmaster Tools suite) to load the malicious page. Viewing the results of how the page appeared to Google's indexing bot finally gave me (almost) the complete story. It appears that the code was set up to detect if it was being viewed by Google's indexing bot, and display the spam content - while hiding it from visitors who viewed the page in a web browser (probably to reduce the chance that the website's owners or visitors would spot the compromise).
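A rough first check for this kind of cloaking is to request the same page twice, once with a browser's user agent and once with Googlebot's, and compare the results. The Python sketch below does that using only the standard library. The function names are illustrative, and the word-based comparison is a deliberately crude assumption on my part. Note its main limitation: the script I found also checked the visitor's IP address, so a request from your own machine that merely claims to be Googlebot may get the clean page, which is why Google's own tool gave the definitive answer.

```python
import re
import urllib.request

BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch(url, user_agent, timeout=10):
    """Download url while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def bot_only_words(url, fetcher=fetch):
    """Return words (4+ letters) that appear only when the page is fetched as Googlebot."""
    as_browser = set(re.findall(r"[a-z]{4,}", fetcher(url, BROWSER_UA).lower()))
    as_bot = set(re.findall(r"[a-z]{4,}", fetcher(url, GOOGLEBOT_UA).lower()))
    return as_bot - as_browser
```

An empty result doesn't prove the site is clean, and pages with dynamic content can differ between requests for innocent reasons, so treat any words it reports as a prompt for a closer look rather than proof.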

As for the exact goal of the attackers, it's difficult to say from looking at the code. They were able to add code to the site that added spam content to that site's Google results, but there were no actual links visible in the search results - the links were only visible through Google's cached copies of those pages. So it's unlikely that anyone would have clicked on the links, which largely defeats the purpose of adding the spam content. I believe that the most likely explanation is that the attackers were trying to boost the Google ranking of the site(s) that the spam links pointed to, essentially trying to leech off the Google ranking of a legitimate site in order to boost their own search ranking. And a few of the links also appeared to contain affiliate IDs, so it's possible that the attackers were attempting to drive traffic to their affiliate link in order to defraud another site's affiliate system.

Lastly, it appears that the attackers were able to compromise the site by exploiting a security vulnerability in an old, unused (and out-of-date) copy of WordPress that was present on the site. Even though the current site isn't using WordPress, the attackers were still able to break in using a WordPress vulnerability (or a vulnerability in one of its plugins) - and they were able to use that access to modify the active website.

Unfortunately there's no magic bullet to prevent these types of attacks - although there are a few general practices that will help. If your website runs WordPress, make sure it's updated regularly (along with any installed plugins). Don't leave unused applications installed on your website, because they may contain security vulnerabilities that attackers can use as a backdoor into your site. And it's a good idea to check the Google search results for your site from time to time, because that can help alert you to problems that may not be immediately apparent.
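Forgotten copies of WordPress can be tracked down with a quick scan. The sketch below assumes the usual WordPress layout, where each installation has a wp-includes/version.php file that sets $wp_version. The function name is made up, and a copy that has been renamed or heavily modified could slip past it.

```python
import re
from pathlib import Path

# Matches a line such as: $wp_version = '3.5.1';
VERSION_RE = re.compile(r"\$wp_version\s*=\s*['\"]([^'\"]+)['\"]")


def find_wordpress_installs(root):
    """Return (install_directory, version_or_None) for each WordPress copy under root."""
    installs = []
    for version_file in Path(root).rglob("version.php"):
        if version_file.parent.name != "wp-includes":
            continue  # some other application's version.php
        try:
            text = version_file.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        match = VERSION_RE.search(text)
        installs.append((version_file.parent.parent, match.group(1) if match else None))
    return sorted(installs, key=lambda item: str(item[0]))
```

Any installation it finds that you aren't actively using is a candidate for removal, and any that you are using can be checked against the current WordPress release.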





