Content marking for Google: Using Data Highlighter

The purpose of this document is to explore the Data Highlighter and to serve as a step-by-step manual for using this tool.

This guide was created for the content teams of websites, with the intention of freeing the content team (and the site) from the need to employ professional programmers to apply code markup.

1. General introduction: content marking for search engines

In the past, search engines were able to index the information served on web pages and retrieve it when a search was performed for relevant “keywords” (i.e. words that exist on the page or are used in links to the page). In other words, by using keywords, search engines knew how to identify that certain information exists on a page, but not how to understand it. In recent years, search engine companies have worked hard to develop methods not only to index and retrieve information based on keywords, but also to “understand” the content of the information and its character, in order to serve the most relevant information in answer to a broader range of queries, not limited to the keywords on the page.

One of the tools that help search engines understand the type of information being indexed is data marking. In this method, certain specific code tags are used to indicate what type of information exists in the indexed content. There are several different methods of marking data, and while all of them are meant to ensure that the content will be recognized by search engines (SE), they are all still recommendations only, and none of them can guarantee that an SE will indeed accept and make use of the data that was marked. However, in most cases, and especially with Google, the data is indeed used.

When the marked data is accepted and put to use, the marked page and content receive a different visual representation in search results, usually much larger and more attractive than regular results. Such enhanced representations give a distinct advantage over other search results and competing sites, contributing to increased traffic volume and, above all, attracting users who are more focused on the site’s content (and, as a result, improving the conversion ratio and other usability parameters).

Until recently, all forms of content marking were based on adding code snippets to the page. This markup is somewhat cumbersome and requires many small adaptations to the snippets, so it usually requires the assistance of a programmer (which means time-consuming and costly work). As a result, many websites avoided content marking, and those that did perform it gained a tremendous advantage over the competition. Google has recently released a new tool, the Data Highlighter, which enables users to quickly and easily mark content without altering code, and to get an instant response regarding the validity of the content markup.
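To give a sense of what the code-based markup described above involves, here is a minimal sketch in Python that builds one such snippet. It assumes the schema.org vocabulary in JSON-LD form, which is only one of several markup formats; the function name `build_event_markup` and the event details are invented for illustration.

```python
import json


def build_event_markup(name, start_date, venue, address):
    """Return an HTML <script> snippet describing one event in JSON-LD.

    Assumption: schema.org "Event" and "Place" types. The arguments are
    plain strings supplied by whoever maintains the page.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,
        "location": {
            "@type": "Place",
            "name": venue,
            "address": address,
        },
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical event, for illustration only.
    print(build_event_markup(
        "Spring Jazz Night",
        "2013-05-18T20:00",
        "Example Hall",
        "1 Example Street, Springfield",
    ))
```

Every page that carries an event needs its own adapted snippet of this kind, which is the repetitive work that code markup demands.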

Data Highlighter: a Google tool for codeless content marking

The ease of use of this new tool is quite impressive: it takes less than a minute to mark a page. In addition, the Data Highlighter has an auto-tagging ability, which allows it to automatically suggest content marking based on examples provided by the user.

The ease of use of this tool, especially when compared to code marking, has made it the tool of choice for content marking, particularly where time-related content is concerned.

However, it should be noted that because this is a Google tool, the marking made with it can only be used by Google, and other SEs will not have access to the content markings (the content itself will still be available to them). This is not a problem in markets where Google is the only viable SE (e.g. Israel, Croatia etc.), but it could pose a problem in markets where Google is not the only big player (e.g. Russia, China and even the US). In such cases, where other SEs are crucial to the success of the site, it is advisable to continue using code marking.

When both methods are applied simultaneously to the same content, Google ignores the Data Highlighter markup and follows only the code markup.

Another important point to remember is that any site that has content that can be marked may mark it: there is no ownership of markup in these cases. For example, if website A contains information about an exhibition at the Smithsonian, that information can be marked on website A, and not only on the Smithsonian website.

 

2.1. Accessing the Data Highlighter and starting to use it

The Data Highlighter tool is accessed through the site’s Google Webmaster Tools (GWT) account (http://www.google.com/webmasters/). It is located in GWT, under the Optimization tab. If this is the first time that the tool is being used on this website, you will see the welcome screen. Start by pressing the blue “Start Highlighting” button.

Screenshot: the Data highlighter tool under the Optimization tab, and the “start highlighting” button on the welcome screen


If this is not the first time the tool is being used on this website, the opening screen will look a bit different, and the “start highlighting” button will be red and located in a different place.

Screenshot: the red “start highlighting” button, located in a different place

After pressing the “start highlighting” button, a dialogue box will appear. Enter the URL of the page that contains the data you want to mark. Note that the entire URL must be entered, including the protocol (http or https). In the example below, if we try to continue, an error message will appear, because the protocol was not entered.

Screenshot: entering the URL of the page that contains the data you want to mark
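The protocol requirement can be checked before a URL is pasted into the dialogue box. The following is a minimal sketch in Python; the function name `has_protocol` and the example addresses are hypothetical, and the check is only an approximation of whatever validation the tool itself performs.

```python
from urllib.parse import urlparse


def has_protocol(url):
    """Return True if the URL starts with http:// or https:// and names a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for candidate in ("www.example.com/events/jazz-night",
                      "http://www.example.com/events/jazz-night"):
        status = "ok" if has_protocol(candidate) else "missing protocol"
        print(candidate, "->", status)
```

This is useful mainly when a list of page URLs is prepared in advance, for example in a spreadsheet maintained by the content team.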

After the URL has been entered, the appropriate content type must be chosen. These are the types currently available:

Screenshot: choosing the appropriate content type

  • Articles: Textual content characterized as an article, similar to magazine, newspaper or even blog content. Requires the marking of at least a title and an author. For more information on this type of content, see https://support.google.com/webmasters/bin/answer.py?hl=en&answer=3108687&topic=2774098&ctx=topic
  • Events: Content regarding actual events with a time aspect. Requires the marking of at least a name, date and address. This content type is meant for events such as concerts, shows, festivals etc., not for any content that has a time aspect (for example, a weekend deal at a hotel is not considered an event, but a weekend of special sales in a marketplace is, if the data is organized in the right way). For more information on this type of content, see https://support.google.com/webmasters/bin/answer.py?hl=en&answer=2774099&topic=2774098&ctx=topic
  • Local Business: This content type refers to physical places of business for which you are able to supply (at least) a name, address and phone, such as hotels, shops etc. Through this markup you can also supply reviews and opinions about the place of business. For more information on this type of content, see https://support.google.com/webmasters/bin/answer.py?hl=en&answer=3106959&topic=2774098&ctx=topic
  • Restaurant: Refers only to restaurants that can supply a physical address. This is a special case of the Local Business markup, and requires at least a name, address and phone. For more information on this type of content, see https://support.google.com/webmasters/bin/answer.py?hl=en&answer=3110704&topic=2774098&ctx=topic
  • Movie: Marking information about movies. Requires at least the movie title. For more information on this type of content, see https://support.google.com/webmasters/bin/answer.py?hl=en&answer=3110700&topic=2774098&ctx=topic
  • TV episode: Marking information about TV episodes. Requires at least the name of the series, the season number and the episode number. For more information, see https://support.google.com/webmasters/bin/answer.py?hl=en&answer=3110839&topic=2774098&ctx=topic
  • Software: Marking information about software products (the product information, not the code itself). Requires only the product name. For more information on this type of content, see https://support.google.com/webmasters/bin/answer.py?hl=en&answer=3110870&topic=2774098&ctx=topic
  • Product: Currently under trial due to misuse for promotions. This content type is still available for marking in the Data Highlighter tool, and as long as it is available it is the most relevant and most attractive markup for website operators, as almost anything that contains a name and a price can be marked as a product (a hotel weekend, a bartender course, car insurance). No link to Google information sources is available, as this data type is still under consideration.
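The minimum requirements listed above can be summarised as a lookup table, which a content team could use as a checklist before starting to mark a page. The sketch below is in Python; the tag labels are taken from the descriptions in this guide and are not necessarily the exact tag names shown in the tool, and the `missing_tags` helper is hypothetical.

```python
# Minimum information per content type, as described in the list above.
# Assumption: labels are descriptive, not the tool's official tag names.
REQUIRED_TAGS = {
    "Articles": ["title", "author"],
    "Events": ["name", "date", "address"],
    "Local Business": ["name", "address", "phone"],
    "Restaurant": ["name", "address", "phone"],
    "Movie": ["title"],
    "TV episode": ["series name", "season number", "episode number"],
    "Software": ["name"],
    # The guide describes a product as any data with a name and a price.
    "Product": ["name", "price"],
}


def missing_tags(content_type, marked):
    """Return the required labels that have no value in `marked`.

    `marked` maps a label to the text highlighted for it on the page.
    """
    return [tag for tag in REQUIRED_TAGS[content_type] if not marked.get(tag)]


if __name__ == "__main__":
    # Hypothetical page content, for illustration only.
    page = {"name": "Spring Jazz Night", "date": "18 May 2013, 20:00"}
    print(missing_tags("Events", page))
```

For the hypothetical event page in the example, the address has not been marked yet, so it is the one label reported as missing.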

 

For the sake of simplicity, we will demonstrate in this guide how to operate the data highlighter tool using the events content type as an example. The tool works in the same manner for all other content types.

Screenshot: the Data Highlighter dialogue box

The remaining options in the above dialogue box, “tag this page and others like it” and “tag just this page”, allow the user to choose between two work modes: marking a sample page that the tool will later use as the basis for auto-marking other pages (“tag this page and others like it”), or marking just one page (“tag just this page”). When the structure in which the content is displayed is not unique to a single page, it is advisable to choose the first option and make use of the tool’s learning and auto-marking abilities. The tool does a good job of identifying content, and it also requires the user’s confirmation, so there is little risk that incorrect data will be marked.

2.2. Content markup process

After entering the URL, choosing the content type, choosing between single-page and multiple-page markup, and clicking OK, the system loads the chosen page into the Highlighter tool. At the top of the page there is a “progress bar”, showing the user’s current location in the marking process.

 

Screenshot: the “progress bar” at the top of the page

On the right hand side of the loaded page, a box appears with all the tags that may be applied for this data type- some are obligatory (marked “required”), while others are optional. For example, in the screenshot below, we can see that the name, date and address tags are shown as required for the “Events” data type.

Screenshot: the box with all the tags that may be applied for this data type

At this point, the user should highlight the text that is to be marked: press and hold the left mouse button, and drag the marker over the segment of text. Immediately after highlighting, a menu pops up, offering a choice of tags to apply to this content. In the example shown in the following screenshot, a random segment of text was chosen and highlighted (marked yellow), the tag menu is shown to the right of the highlighted text, and the tag box is shown on the right side of the page.

Screenshot: choosing a tag for the highlighted text

After choosing the tag we want to apply to the chosen text, the tag menu will disappear, and in the tag box (on the right-hand side of the page) the highlighted text will appear in the box of the chosen tag. The following screenshot demonstrates how the page looks after choosing which tag to apply. Note that the chosen text can be cancelled by clicking the X to the right of the tag.

Screenshot: the tag box after a tag has been chosen

This stage should be repeated until all the content that can be marked has been tagged with the appropriate tags. The following table explains the content that can be fed to each tag:

Screenshot: table of the content that can be fed to each tag

The above table applies, of course, to the content type we chose as the example in this guide: Events. For each content type, an appropriate table can be found through the links given above (in the description of each content type), or at the Data Highlighter support center: https://support.google.com/webmasters/bin/topic.py?hl=en&topic=2774098&parent=2692946&ctx=topic

Going back to our example: the name tag is particularly problematic, as it is not always easy to tell what is acceptable and what is not under Google’s guidelines. Here are a few examples of good choices (source: Google Webmaster Tools):

Screenshot: name tag suggestions

The following are additional general recommendations for tagging content (source: GWT):