Increasingly, search engines like Google and Yahoo are supporting a family of HTML markup schemas called microformats. A microformat is a kind of embedded document inside a standard web page (which can be thought of as a sort of macroformat, if you will).

There are two flavors of microformats: one for HTML (which introduces no new elements or attributes) and one for XHTML (which adds a few new attributes, but no new tags). The proposed RDFa standard can be thought of as the flavor of microformats for XHTML, since it is based on the RDF XML format. The reality is that there are two groups of people (RDFa, microformats) working in the same space, so there is overlap. For now, however, the HTML vs. XHTML partitioning works well to categorize these two embedded document efforts.

There are a few things you can do with these embedded syntaxes. If you regularly review businesses or products (like yelp.com does), you might consider using the hReview microformat to identify the parts of your review, as shown in the following snippet:

<div class="hreview">
   <span class="item">
      <span class="fn">Taskboy Feed Bag</span>
   </span>
   Reviewed by <span class="reviewer">Joe Johnston</span>
   on 
   <span class="dtreviewed">
      <span class="value-title" title="2010-02-07"> </span>Feb 7, 2010
   </span>
   <span class="summary">Terrific news source for free</span>
   <span class="description">The Feed Bag RSS aggregator on Taskboy 
has replaced Google News as a one-stop shop to get the news of what's 
going on in tech and politics now.</span>
   <span class="rating">4.5</span>
</div>

As you can see, that’s a lot of semantic markup for so little visible text. Google can pull this code apart and display it under the link for the product or service discussed.

In addition to reviews, Google and Yahoo understand at least four other species of embedded documents: people/businesses, products, events, and video. Because there are two standards bodies at work, there are microformats and RDFa specs for each. The following table summarizes this with links to give examples of these embedded documents.

To test your embedded document, try Google’s microformat tool.

I have a few concerns about microformats. The first is that they require a lot of additional markup. I understand that a blogging system can create a form to collect these discrete features, but it still seems like a lot of work for casual use. The second concern I have is that microformats reuse the class attribute that is normally used for CSS. This creates a whole bunch of reserved words to avoid for class names in your site’s CSS. Perhaps it’s not that big a deal, but I do not like namespace conflicts. I prefer the RDFa spec, which simply introduces new attributes (typeof, property, etc.) specific to its purpose. That seems a lot cleaner to me. However, the various RDFa formats are not as well documented as their microformat counterparts. As in all things, “good enough” often trumps “clean design.”
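
For comparison, here is roughly how the hReview example from earlier might look in RDFa. Treat it as a sketch and not a definitive rendering: it assumes the data-vocabulary.org review vocabulary (the v: prefix and names such as v:Review, v:itemreviewed, and v:dtreviewed) that Google has documented for its rich snippets, none of which appears in the microformat version, so check the current documentation before relying on the exact property names.

```html
<!-- Assumption: the v: prefix is bound to the data-vocabulary.org namespace -->
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
   <span property="v:itemreviewed">Taskboy Feed Bag</span>
   Reviewed by <span property="v:reviewer">Joe Johnston</span>
   on
   <!-- content carries the machine-readable date; the text is for humans -->
   <span property="v:dtreviewed" content="2010-02-07">Feb 7, 2010</span>
   <span property="v:summary">Terrific news source for free</span>
   <span property="v:description">The Feed Bag RSS aggregator on Taskboy
has replaced Google News as a one-stop shop to get the news of what's
going on in tech and politics now.</span>
   <span property="v:rating">4.5</span>
</div>
```

Notice that the class attribute is never touched, so none of these names can collide with the class names in your stylesheet.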

There is no doubt that embedded documents are a bit of a moving target. I don’t expect the formats for things already defined to change, but more objects will be described by new specifications.