Controlling Search Engine Access with Cookies & Session IDs

Monday, February 4, 2008


What's a Cookie?

A cookie is a small text file that websites can leave on a visitor's hard disk, helping them to track that person over time. Cookies are the reason Amazon.com remembers your username between visits and the reason you don't necessarily need to log in to your Hotmail account every time you open your browser. Cookie data typically contains a short set of information about when you last accessed a site, an ID number, and, potentially, information about your visit.

As a website developer, you can create options to remember your visitors using cookies for tracking purposes or to display different information to users based on their actions or preferences. Common uses include remembering a username, maintaining a shopping cart, or keeping track of previously viewed content. For example, if you've signed up for an account with SEOmoz, we'll give you options in your My Account page about how you want to view the blog and remember that the next time you visit.
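To make the mechanics concrete, here is a minimal sketch in Python of the two halves of a cookie exchange: building a `Set-Cookie` response header that remembers a username, and reading it back from a later request's `Cookie` header. The cookie name `username` and the one-year lifetime are illustrative choices, not anything the original post prescribes.

```python
from http.cookies import SimpleCookie
from typing import Optional

def build_set_cookie(username: str) -> str:
    """Build a Set-Cookie header value that remembers a username for ~1 year."""
    cookie = SimpleCookie()
    cookie["username"] = username
    cookie["username"]["max-age"] = 60 * 60 * 24 * 365  # persist across visits
    cookie["username"]["path"] = "/"
    return cookie["username"].OutputString()

def read_username(cookie_header: str) -> Optional[str]:
    """Parse an incoming Cookie request header and recover the username, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("username")
    return morsel.value if morsel else None
```

On the next request the browser sends the stored value back automatically, which is all "remembering" a visitor amounts to.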

What are Session IDs?

Session IDs work almost identically to cookies, with one big difference: when you close (or restart) your browser, the session information typically disappears from your machine. The website you were interacting with may still have a record of your data or actions on its servers, but it can no longer retrieve the session ID, since session IDs expire by default when the browser shuts down. In essence, they're temporary cookies (although, as we'll see below, there are options to control this).

Technically speaking, a session ID is just a cookie without an expiration date, and it is possible to give session IDs far-off expiration dates (going out decades), making them behave exactly like persistent cookies. Session IDs do come with an important caveat, though: they are frequently passed in the URL string, which can create serious problems for search engines, as every request produces a unique URL containing duplicate content. A simple fix uses conditional 301 redirects to show bots a non-sessioned version of the page (described in detail in the post on search engine friendly cloaking by removing session IDs).
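The conditional-301 fix can be sketched in a few lines of Python: detect a crawler by its user-agent string, strip the session parameter from the URL, and redirect the bot to the clean address. The bot signatures and session parameter names below (`sid`, `sessionid`, `PHPSESSID`) are common examples, not an exhaustive list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative values -- real deployments would maintain fuller lists.
BOT_SIGNATURES = ("googlebot", "slurp", "msnbot")
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def is_bot(user_agent: str) -> bool:
    """Crude user-agent sniff for the major search engine crawlers."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def strip_session_id(url: str) -> str:
    """Remove session-ID query parameters, leaving the rest of the URL intact."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

def handle_request(url: str, user_agent: str):
    """301 bots to the non-sessioned URL; serve humans the URL as requested."""
    clean = strip_session_id(url)
    if is_bot(user_agent) and clean != url:
        return 301, clean
    return 200, url
```

Because every bot lands on the same clean URL, the engines see one page instead of an endless stream of session-stamped duplicates.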

IMPORTANT NOTE: Any user can turn off cookies and session IDs in their browser settings. This often makes web browsing considerably more difficult, and many sites will display a page saying that cookies/sessions are required to view their content or interact with it. Even persistent cookies are deleted by users on a semi-regular basis: a comScore study from 2007 found that 33% of web users deleted their cookies at least once per month.

How do Search Engines Interpret Cookies & Session IDs?

They don't. Search engine spiders aren't built to accept cookies or session IDs; they behave like browsers with this functionality shut off. However, unlike visitors with cookie-rejecting browsers, crawlers can sometimes reach sequestered content when webmasters specifically choose to let them through. Many sites have pages that require cookies or sessions to be enabled, but include special rules for search engine bots, permitting them to access the content as well. Although this is technically cloaking, the search engines generally allow this type of segmented content delivery.

Despite the access occasionally granted to engines on cookie/session-restricted pages, the vast majority of cookie and session ID usage creates content, links, and pages that limit access. As web developers, we can leverage this "accepted cloaking" to build more intelligent sites and pages that function optimally for both humans and engines.

Why Would I Want to Use Cookies or Session IDs to Control Search Engine Access?

There are numerous potential tactics for leveraging cookies and session IDs to control search engines. Below, I've listed the major strategies you can implement with these tools, though many other possibilities exist:

  • Show Multiple Navigation Paths While Sculpting the Flow of Link Juice

Visitors to a website often have complex needs for how they'd like to view or access content. Your site may benefit from offering many paths to content (by date, topic, tag, relationship, ratings, etc.), but those extra paths spread PageRank and link juice that would be better concentrated in a single, search-engine-friendly navigational structure. By showing one navigation scheme to cookied users and another to the engines, you can effectively have your cake and eat it, too.
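A rough sketch of this split in Python: requests that carry a marker cookie (real browsers on a return visit) get the full navigation, while cookieless requests (first-time visitors and, crucially, every crawler) get the single crawl-friendly structure. The cookie name `nav` and the variant labels are hypothetical.

```python
from http.cookies import SimpleCookie

NAV_COOKIE = "nav"  # hypothetical marker-cookie name

def choose_navigation(cookie_header: str):
    """Return (navigation_variant, set_cookie_header_or_None).

    Spiders never store cookies, so they always take the first branch's
    alternative: the single, link-juice-friendly navigation."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if NAV_COOKIE in jar:
        # Cookied (human) visitor: every browse path -- date, topic, tag, rating.
        return "full-navigation", None
    # First visit or a search engine spider: one clean structure,
    # plus an attempt to set the marker cookie for next time.
    return "crawl-navigation", f"{NAV_COOKIE}=1; Path=/; Max-Age=2592000"
```

Humans upgrade to the rich menus on their second request; bots, which drop the cookie, keep seeing the sculpted structure indefinitely.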

  • Keep Limited Pieces of a Page's Content Out of the Engines' Indices

Many pages may contain both content that you'd like to show to search engines and pieces you'd prefer only appeared for human visitors. These could include ads, login-restricted information, links, or even rich media. Once again, showing non-cookied users the plain version and cookie-accepting visitors the extended information can be invaluable. Note that this is often used in conjunction with a login, so only registered users can access the full content (think sites like Facebook or LinkedIn).

  • Grant Access to "Human-Only" or "Registered User-Only" Pages

As with snippets of content, there are often entire pages or sections of a site to which you'd like to restrict search engine access. This is easily accomplished with cookies/sessions and can even bring in search traffic that converts to "registered-user" status. For example, if you had desirable content you wished to restrict, you could create a page showing a short snippet and an offer to continue reading upon registration, with the full work served at the same URL after login. Registered visitors keep linking to the very URL the spiders index and rank, yet the content is never given away for free in a cached version. Be aware that in these instances the search engines can only "see" the content on the non-registered-user version of the page, so target your titles and snippets with keywords to receive the most traffic possible. You can see examples of this at sites like the Economist, the New York Times, and WebmasterWorld.
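The registration wall described above can be sketched as one handler that renders two versions of the same URL, keyed off a login cookie. The cookie name `auth_token` and the `/register` link are illustrative assumptions; a real site would verify the token server-side rather than trusting its mere presence.

```python
from http.cookies import SimpleCookie

def is_registered(cookie_header: str) -> bool:
    """True if the request carries our (hypothetical) login cookie.
    NOTE: presence-only check for illustration; validate the token in practice."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    return "auth_token" in jar

def render_article(cookie_header: str, title: str,
                   snippet: str, full_text: str) -> str:
    """One URL, two renderings: spiders and anonymous visitors get the
    keyword-targeted title, the snippet, and a registration prompt;
    logged-in readers get the whole piece at the same address."""
    if is_registered(cookie_header):
        return f"<h1>{title}</h1><div>{full_text}</div>"
    return (f"<h1>{title}</h1><p>{snippet}</p>"
            '<p><a href="/register">Register to keep reading</a></p>')
```

Because both renderings live at one URL, every inbound link from a registered reader strengthens the exact page the engines rank.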

  • Avoid Duplicate Content Issues

One of the most promising uses of cookies/sessions is to keep spiders from reaching multiple versions of the same content while letting visitors get the version they prefer. As an example, here at SEOmoz, logged-in users see full blog entries on our blog homepage, but search engines and non-registered users see only the snippets. This prevents the same content from being indexed on multiple pages (the blog homepage and the individual post pages) while providing a positive experience for our members. I discussed this specifically in this post on dealing with pagination and duplicate content on blogs.

  • Display Content Based on a User's Actions or Patterns of Action

Many sites like to keep track of their users' activities and serve targeted content that is more likely to fit their interests. In the case of many media websites, this means advertising, while for e-commerce sites, it's more likely to be related or recently-viewed products. Bluefly.com is a good example of this - showing visitors the clothing they've most recently browsed.


Source: http://www.seomoz.org/blog/controlling-search-engine-access-with-cookies-session-ids
