
AOL Hits Rock Bottom AND Pisses Off Google

Aug 07, 2006

Usually, after hitting rock bottom once, a company learns from its mistakes and fixes things. Apparently this is not the case with AOL. Yesterday they released a 439MB file (~2GB uncompressed) of approximately 20 million search queries collected from about 650k unique users over roughly 3 months. The AOL research site (the relevant page of which has since been taken down) described the purpose of the data as follows:

The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research.

Now the story gets interesting when you realize that AOL search is simply rebranded Google search. You can imagine what kind of hissy fit Google will throw now that sensitive information, such as its most powerful keywords, is circulating in P2P communities at this very moment. With the aforementioned file in hand, an SEO expert can work out which keywords perform best and pay well on services like Google AdSense and Google AdWords. Spammers who study this file will have a field day. As one site put it, “Google is gonna get mega spammed.” Follow the rest of the blogosphere’s reaction on Techmeme.

The utter stupidity of this is staggering. AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.
Techcrunch

17 Comments

  1. What can you say? AOL is and will always stay my most hated ISP.
    Did you know AOL dishes out US-IPs in Germany?
    I guess I’ve got some SEO to do on my site.

  2. When I woke up and saw this hitting the headlines I couldn’t believe what I was reading. Just another reason to boycott AOL and switch to a proper broadband service.

  3. I couldn’t believe what I was reading either…

  4. AOL really broke the trust with its members.

  5. Is AOL on drugs again? They’ve really done it now. Looks like I got some SEO myself to do. :P Somebody’s gotta take advantage of these.

  6. How low can you go….

  7. Why is AOL still around, anyway?

  8. Nice write-up. Did you purposely not link to a backup of the file? I’m just wondering, as other tech sites, especially Techcrunch, were called irresponsible by the blogosphere.
    Also, you might be interested in this other AOL news (bad, really shocking) from STLToday.com: http://tinyurl.com/z2nku

  9. I didn’t link to it for the same reason I don’t link to piracy and bittorrent sites. In case something goes down, I don’t want to have linked to it. =)

  10. I was able to grab this download. Does anyone have any good ideas or even websites that have popped up on the best ways to analyze this data? So far I have just come up with simple things like filtering by url and looking at search terms resulting in clicks for that url.

  11. I think datamining just went mainstream

  12. Lesson to be learned: when you give your money to stupid companies, you enable them to do stupid things.

    I’m not going to touch this log file; it’s unethical and anyone who does look at it should be ashamed of themselves.

  1. [...] Paul notes that the AOL data is really Google data, since AOL search is rebranded Google. Zoli has the post that started it all. [...]

  2. [...] Imagine being an AOL customer or using their search engine… Paul Stamatiou reports that last weekend AOL published a list of 20 million searches from 650,000 users. Although the data was anonymized by replacing the AOL identifier with a number, the file still contains very personal information. TechCrunch gives an overview of the possibilities for combining the data: AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box. - Techcrunch [...]

  3. [...] AOL search is simply rebranded Google search. AOL released information about Web searches conducted by 658,000 of its members between March and May. Elliot Black shows that a huge number of social security numbers were included in the AOL data. Some more examples of the search keywords and phrases that could cause privacy problems can be found here. More bloggers covering the topic can be found here and here. AOL Search Data Shows Users Planning to Commit Murder. AOL Releases Google’s Most Prized Keyword List… Google is gonna get mega spammed. August 7th, 2006 at 9:45 am. Mirrored on yousendit and rapidshare:
    http://www.yousendit.com/transfer.php?action=download&ufid=DDD1D4D0017BB5BE
    http://rapidshare.de/files/28486410/user-ct-test-collection-01.txt.gz
    http://rapidshare.de/files/28486473/user-ct-test-collection-02.txt.gz
    http://rapidshare.de/files/28487603/user-ct-test-collection-03.txt.gz
    http://rapidshare.de/files/28487606/user-ct-test-collection-04.txt.gz
    http://rapidshare.de/files/28490426/user-ct-test-collection-05.txt.gz
    http://rapidshare.de/files/28491016/user-ct-test-collection-06.txt.gz
    http://rapidshare.de/files/28491416/user-ct-test-collection-07.txt.gz
    http://rapidshare.de/files/28491781/user-ct-test-collection-08.txt.gz
    http://rapidshare.de/files/28492144/user-ct-test-collection-09.txt.gz
    http://rapidshare.de/files/28492729/user-ct-test-collection-10.txt.gz
    AOL’s accidental unleashing of hundreds of thousands of AOL customers’ private searches has already resulted in the discovery of at least one specific person. The New York Times explains how 62-year-old Thelma Arnold’s search keywords and phrases were revealed to all. No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.” [...]

  4. [...] Paul notes that the AOL data is really Google data, since AOL search is rebranded Google. Zoli has the post that started it all. aol search, reason aol, social security numbers, asian pornography, xxxx, grep, queries, social secruity, scandal, txt, search engine research, research arena, tab delimited files, and card patterns [...]

  5. [...] AOL Hits Rock Bottom AND Pisses Off Google - PaulStamatiou.com Man this is a mess. on. so. many. levels. [...]

Copyright © 2005 - 2008 PaulStamatiou.com