This document explores the Data Highlighter and serves as a step-by-step manual for using the tool.
This guide was written for website content teams, with the aim of freeing the content team (and the site) from the need to employ professional programmers to apply code markup.
1. General introduction: content marking for search engines
In the past, search engines could index the information served on web pages and retrieve it when a search was performed for relevant “keywords” (i.e. words that appear on the page or are used in links to the page). In other words, keywords let search engines identify that certain information exists on a page, but not understand it. In recent years, search engine companies have worked hard to develop methods that not only index and retrieve information based on keywords, but also “understand” the content and nature of that information, so that they can serve the most relevant information in answer to a broader range of queries, not limited to the keywords of the page.
One of the tools that help search engines understand the type of information being indexed is data marking. In this method, certain specific code tags are used to indicate what type of information exists in the indexed content. There are several different methods for marking data, and while all of them are meant to help search engines (SE) recognize the content, they are all recommendations only: none of them can guarantee that an SE will actually accept and use the marked data. However, in most cases, and especially with Google, the data is indeed used. When the marked data is accepted and put to use, the marked page and content receive a different visual representation in search results, usually much larger and more attractive than regular results. Such enhanced representations give a distinct advantage over other search results and competing sites, contributing to increased traffic volume and, above all, attracting users who are more focused on the site’s content (and, as a result, improving the conversion ratio and other usability parameters).
Until recently, all forms of content marking were based on adding code snippets to the page. This markup is somewhat cumbersome and requires many small adaptations to the snippets, so it usually requires the assistance of a programmer (which costs time and money). As a result, many websites avoided content marking, and those that did perform it gained a tremendous advantage over the competition. Google has recently released a new tool, the Data Highlighter, which enables users to mark content quickly and easily without altering code, and to get an instant response on the validity of the content markup.
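To give a sense of what code-based marking involves, here is a minimal sketch that builds one such snippet for an event. It assumes the schema.org vocabulary expressed as HTML microdata, which is only one of several possible markup methods. The function name and the event details are hypothetical placeholders, not taken from any real site.

```python
from html import escape


def event_microdata(name: str, start_date: str, address: str) -> str:
    """Build an HTML snippet that marks an event using schema.org microdata.

    Assumption: schema.org/Event with name, startDate and location.
    Values are HTML-escaped so they are safe inside text and attributes.
    """
    return (
        '<div itemscope itemtype="https://schema.org/Event">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        f'  <time itemprop="startDate" datetime="{escape(start_date)}">'
        f'{escape(start_date)}</time>\n'
        '  <div itemprop="location" itemscope itemtype="https://schema.org/Place">\n'
        f'    <span itemprop="address">{escape(address)}</span>\n'
        '  </div>\n'
        '</div>'
    )


# Hypothetical event used purely for illustration.
snippet = event_microdata(
    "Spring Jazz Night", "2024-05-01T20:00", "1 Example Street, Example City"
)
print(snippet)
```

Every page template that displays an event would need a snippet like this woven into its HTML, which is the kind of programmer work the Data Highlighter is meant to avoid.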
2. Data Highlighter: a Google tool for codeless content marking
The ease of use of this new tool is quite impressive: marking a page takes less than a minute. In addition, the Data Highlighter has an auto-tagging ability, which allows it to suggest content marking automatically, based on examples provided by the user.
Its ease of use, particularly compared with code marking, has made it the tool of choice for content marking, especially for time-related content.
However, it should be noted that because this is a Google tool, markings made with it can only be used by Google; other SEs will not have access to the content markings (the content itself will still be available to them). This is not a problem in markets where Google is the only viable SE (e.g. Israel, Croatia), but it could pose a problem in markets where Google is not the only big player (e.g. Russia, China and even the US). In such cases, where other SEs are crucial to the success of the site, it is advisable to continue using code marking.
When both methods are applied simultaneously to the same content, Google will disregard the Data Highlighter markup and follow only the code markup.
Another important point to remember is that any site with content that can be marked may mark it; there is no ownership of markup in these cases. For example, if website A contains information about an exhibition at the Smithsonian, the exhibition can be marked on website A, and not only on the Smithsonian website.
2.1. Accessing the Data Highlighter and starting to use it
The Data Highlighter tool is accessed through the site’s Google Webmaster Tools (GWT) account (http://www.google.com/webmasters/). It is located in GWT, under the Optimization tab. If this is the first time the tool is being used on this website, you will see the welcome screen. Start by pressing the blue “Start Highlighting” button.
Screenshot: The Data Highlighter tool under the Optimization tab, and the “Start Highlighting” button on the welcome screen
If this is not the first time the tool is being used on this website, the opening screen will look a bit different: the “Start Highlighting” button will be red and located in a different place.
After pressing the “Start Highlighting” button, a dialogue box will appear. Enter the URL of the page that contains the data you want to mark. Note that the entire URL must be entered, including the protocol (http or https). In the example below, trying to continue would produce an error message, because the protocol was not entered.
After typing the URL, choose the appropriate content type. These are the types currently available:
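The requirement to enter the full URL, including the protocol, can be checked before pasting an address into the dialogue box. The following is a minimal sketch using Python's standard library. The function name is hypothetical, and the check only approximates the rule described here; it is not the tool's own validation logic.

```python
from urllib.parse import urlparse


def has_required_protocol(url: str) -> bool:
    """Return True if the URL starts with http:// or https:// and names a host."""
    parts = urlparse(url.strip())
    # urlparse lowercases the scheme, so "HTTP://..." is accepted as well.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical addresses for illustration.
print(has_required_protocol("www.example.com/events"))         # False: no protocol
print(has_required_protocol("http://www.example.com/events"))  # True
```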
- Articles: textual content characterized as an article, similar to magazine, newspaper or even blog content. Requires the marking of at least a title and an author. For more information on this type of content, see Data Highlighter: Articles
- Events: content regarding actual events with a time aspect. Requires the marking of at least a name, date and address. This content type is meant for events such as concerts, shows and festivals, not for any content that has a time aspect (e.g. a weekend hotel deal is not considered an event, but a weekend of special sales in a marketplace is, if the data is organized in the right way). For more information on this type of content, see Data Highlighter: Events
- Local Business: refers to physical places of business, such as hotels and shops, for which you are able to supply (at least) a name, address and phone. Through this markup you can also supply reviews and opinions about the place of business. For more information on this type of content, see Data Highlighter: Local Businesses
- Restaurant: refers only to restaurants that can supply a physical address. This is a special case of the Local Business markup and requires at least a name, address and phone. For more information on this type of content, see Data Highlighter: Restaurants
- Movie: marking information about movies. Requires at least the movie title. For more information on this type of content, see Data Highlighter: Movies
- TV episode: marking information about TV episodes. Requires at least the name of the series, the season number and the episode number. For more information, see Data Highlighter: TV Episodes
- Software: marking information about software products (but not the software itself, i.e. not the code). Requires only the product name. For more information on this type of content, see Data Highlighter: Software Applications
- Product: currently under trial due to misuse for promotions. This content type is still available for marking in the Data Highlighter tool, and as long as it remains available it is the most relevant and most attractive markup for website operators, as almost anything that has a name and a price can be marked as a product (hotel weekend, bartender course, car insurance). No link to Google’s information sources is available, as this data type is still under consideration.
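The minimum requirements listed above can be summarized as a small lookup table, which a content team could use as a checklist before starting to tag a page. This is only a sketch: the tag names are informal labels taken from this guide rather than identifiers used by the tool, the function name is hypothetical, and Product is left out because the guide does not state formal requirements for it.

```python
# Minimum tags per content type, as described in this guide.
# Labels are informal, not the tool's own identifiers.
REQUIRED_TAGS = {
    "Articles": ["title", "author"],
    "Events": ["name", "date", "address"],
    "Local Business": ["name", "address", "phone"],
    "Restaurant": ["name", "address", "phone"],
    "Movie": ["title"],
    "TV episode": ["series name", "season number", "episode number"],
    "Software": ["product name"],
}


def missing_required_tags(content_type: str, tagged: dict) -> list:
    """List the required tags that are absent or empty for a content type."""
    return [tag for tag in REQUIRED_TAGS[content_type] if not tagged.get(tag)]


# Hypothetical page content: the address has not been found yet.
page = {"name": "Spring Jazz Night", "date": "2024-05-01"}
print(missing_required_tags("Events", page))  # ['address']
```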
For the sake of simplicity, this guide demonstrates how to operate the Data Highlighter tool using the Events content type as an example. The tool works in the same manner for all other content types.
The remaining options in the above dialogue box, “Tag this page and others like it” and “Tag just this page”, allow the user to choose between two work modes: marking a sample page that the tool will later use as the basis for auto-marking other pages (“Tag this page and others like it”), or marking just one page (“Tag just this page”). When the structure in which the content is displayed is not unique to this page, it is advisable to choose the first option and make use of the tool’s learning and auto-marking abilities. The tool does a good job of identifying content and also requires the user’s confirmation, so there is little risk that incorrect data will be marked.
2.2. Content markup process
After typing the URL, choosing the content type, choosing single or multiple page markup and clicking OK, the system will load the chosen page into the Highlighter tool. At the top of the page there is a “progress bar” showing the user’s current position in the marking process.
On the right-hand side of the loaded page, a box appears with all the tags that can be applied to this data type. Some are obligatory (marked “required”), while others are optional. For example, in the screenshot below, the name, date and address tags are shown as required for the “Events” data type.
At this point, the user should highlight the text to be marked: press and hold the left mouse button and drag the marker over the relevant segment of text. Immediately after highlighting, a menu pops up, offering a choice of tags to apply to this content. In the example shown in the following screenshot, a random segment of text was highlighted (marked yellow); the tag menu is shown to the right of the highlighted text, and the tag box is shown on the right side of the page.
After choosing the tag to apply to the highlighted text, the tag menu disappears, and the highlighted text appears in the tag box (right-hand side of the page), in the box of the chosen tag. The following screenshot shows how the page looks after a tag has been chosen. Note that the selection can be cancelled by clicking the X to the right of the tag.
This stage should be repeated until all the content that can be marked has been tagged with the appropriate tags. The following table explains the content that can be fed to each tag:
The above table applies, of course, to the content type chosen as the example in this guide (Events). For each content type, the corresponding table can be found via the links given above (in the description of each content type), or at the Data Highlighter support center:
Going back to our example: the name tag is particularly problematic, as it is not always easy to tell what is and isn’t acceptable under Google’s guidelines. Here are a few examples of good choices (source: Google Webmaster Tools)
The following are additional general recommendations for tagging content (source: GWT)
As mentioned above, the Highlighter tool cannot handle content that is already marked by code, but other parts of the content on the same page can still be marked. After applying all the desired tags, check the tag box for error marks (orange triangles) and correct any errors that have occurred. When all is in order, click the “Done” button (the red button above the tag box) to save the content markup.
2.3. Create Data Set
To understand the next stage, one must first get to know two terms:
- Data set: a group of pages that the Data Highlighter tool predicts have a data structure similar to the example provided by the user.
- Publish: making the marked content available to Google. The next time Google crawls the content, it will also index the marked tags. It is important to note that Google does not commit to making any use of the marked tags or to displaying the marked content in any of its products; the content markup is a recommendation only. However, when the marked tags are valid and the content is acceptable under Google’s regulations, the marked content is usually displayed in search results.
Back to working with the tool: after pressing the “Done” button in the previous stage, our data will be saved, and the tool will begin to create a data set:
If “Tag just this page” was chosen at the beginning of the process, the data set has no real significance: the tool automatically creates a “one page set” and publishes it, thus completing the markup process.
However, if “Tag this page and others like it” was chosen at the beginning, the tool will now scan the site and try to locate pages that may have a similar content structure. When the scan is complete, the user is presented with a dialogue box offering a choice: select a set of pages that the tool identified (displayed as thumbnails of the pages), or select “Custom” and type a URL structure that the tool should use to locate similar pages.
The Data Highlighter’s auto scan works by URL structure, so if a site is built with a clear hierarchy and organized folders, the tool is likely to perform well. On sites with less organized URL structures, it will be necessary to opt for the “Custom” option and do more manual work.
If the site lacks any organized content-URL pattern, the user will have to give up on the auto-markup function and on large data sets, and mark each page manually (using the “one page data set” option for each page).
The dialogue box also offers an option to name the data set.
After choosing between the auto-generated data set and the custom option and typing a name, click the “Create Page Set” button. The system will save the chosen data set and move to the next stage.
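To judge in advance whether a site's URLs are organized enough for a page set, it can help to test a candidate URL structure against a list of the site's pages. The sketch below assumes a simple `*` wildcard pattern. This guide does not specify the exact syntax the “Custom” option accepts, so the pattern format, the function name and the URLs are all illustrative assumptions.

```python
from fnmatch import fnmatchcase


def pages_matching(pattern: str, urls: list) -> list:
    """Return the URLs that fit a wildcard pattern ('*' matches any text).

    Note: fnmatch also treats '?' and '[' as special characters, so this
    sketch suits plain path patterns rather than URLs with query strings.
    """
    return [url for url in urls if fnmatchcase(url, pattern)]


# Hypothetical site pages for illustration.
urls = [
    "http://www.example.com/events/jazz-night",
    "http://www.example.com/events/food-festival",
    "http://www.example.com/about",
]
print(pages_matching("http://www.example.com/events/*", urls))
```

If one pattern captures all the event pages and nothing else, the site is a good candidate for a multi-page data set. If no pattern does, marking each page individually is likely to be the only option.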
A screenshot of the aforementioned box:
2.4. Checking the auto-tagging and completing information (“Tag more examples” phase)
After the data set is created in the previous stage, the system displays content that it tagged automatically, based on the example provided in the first stage. It is very important to review all the tags the system applied and correct them if necessary: cancel the incorrect tags and re-tag the correct content, as explained in the previous stages. If a warning sign (alert icon) is displayed but the tag is correct, click the warning symbol and choose “clear warning”.
When all the tags on the page are correct, click “Next” (a red button above the tag box).
If the page does not contain any content that should be marked for Google, it can be removed from the data set by clicking “Remove Page”.
After checking all the pages in the set and all the tags, click “Done” (the Done button appears where the Next button was previously located).
2.5. Reviewing and publishing a multi-page data set
After clicking “Done” in the previous stage, the data is saved and the process moves to the “Review and Publish” phase, the last phase in the process.
In this stage, the system displays all the tags marked in the data set. If there are still errors, they must be addressed first, as explained in the previous chapter. It is possible to click any tag and go directly to the relevant page to make changes.
When examining a large data set, it is possible to use filters offered by the system to improve the accuracy of the tool. On the “Review and Publish” page, click the right side of the search box. The system will display several filters (see screenshot below). Choose one of the filters and examine the displayed pages for errors. When done, choose another filter and repeat the process until there are no more errors. Screenshot of the “Review and Publish” page, showing the filters:
When no errors remain, click the “Publish” button (the red button above the tag box).
At this point, the content markup becomes available to Google, and the user is returned to the homepage of the Data Highlighter tool.
Cancelling the publication of a page or data set instructs Google to stop using the marked tags, but does not delete or change the data set or the tags marked within it. Thus, an unpublished data set effectively functions as a draft: it is possible to take the time to review or change the data set as needed, and publish it again when convenient.
To un-publish, go to the Data Highlighter homepage and select the relevant data set by ticking the box to its left. The “Un-Publish” button will then appear at the top of the table. Clicking it un-publishes the data set, and the button changes to a “Publish” button; clicking that publishes the data set again.
2.6.2. Save draft
At any point in the process, it is possible to exit the markup process and go back to the Data Highlighter homepage. Doing so saves the work as a draft. To leave the process and save the draft, click the “Back To Webmasters Tools” button, a left-arrow button (see below). This button may appear in either of two locations, depending on the stage of the process. The following screenshot shows the two possible locations (the left button only appears when hovering over the area).
2.6.3. Resume work on a draft or edit a published data set
It is possible to resume work on a draft or edit a published data set at any time by clicking the draft (or data set) name on the Data Highlighter page.
2.6.4. Deleting a data set or a draft
Please note: this action is not reversible!
The way to delete a data set depends on the current location within the process.
On the Data Highlighter homepage: select the data set or draft. A “delete” button will appear at the top of the table.
On the publish page: clicking the trash can icon deletes the data set.
To delete a draft that has not yet reached the publish page, press the “cancel” button.
3. Links to additional information:
- A description of the types of data the tool can mark, and details about the tags for each data type (Data Highlighter guide, Google)
- Tips for using the Data Highlighter tool (Data Highlighter guide, Google)
- Solving common problems related to the Data Highlighter tool (Data Highlighter guide, Google)
- General information about content markup without using the Data Highlighter tool (Google)