Google snippet dates — a bag of hurt

I have a lot of painful experience trying to persuade Google to display correct dates for my PainScience.com articles in their search results. It should be easy, but it’s not. It took me years of trial and error to finally get it right.

Even now, with the dust mostly settled, I am still amazed by how poorly Google handles these dates, and a particularly bizarre example inspired this post: Google not only picked a snippet date for an article from an embedded Tweet, but pre-empted their own recommended machine-readable metadata!

Super weird. I’m still shaking my head.

Tweet date hijacks my article’s publication date

The correct date for the article should have been September 6, 2020. Instead, the search snippet was July 10, a date that Google perversely extracted from a spurious and trivial source, way below the fold: the date in an embedded Tweet, of all things. Brilliant.

I have seen many examples of wrong dates that Google cherry-picked from a page, but there was something more sinister going on here: Google not only ignored the standard plain-text publication date at the top of the article (weird enough), but it also ignored “structured data” I had just installed with high hopes.

Google-endorsed “structured data” explicitly declares machine-readable dates of publication and update. It should pre-empt any other date — even a clearly stated date of publication right under the title — but it’s truly wacky that it was passed over for the date in an embedded tweet.

Here’s the definitely-wrong search snippet for the article in question. It should be Sep 6, 2020 — a “should” based on a lot of painful experience.
And this is where Google got that spurious date. Wut?

The importance of snippet dates

Google publishes the date of publication or last-update in search result snippets, and those dates matter.

Even if Google didn’t care about them, people surely do. I find myself increasingly unwilling to waste my time on pages that don’t have clear and suitably recent dates on them. I rarely need to look at an eight-year-old article about tech.

But Google does care about those dates. Page “freshness” is widely thought to be a Google search ranking factor. While the effect of updating a page is unpredictable, I have seen many examples of pages that got a good rank boost after relatively minor updates.

Snippets without dates, or — much worse — really old dates

Years ago I noticed that many of the snippet dates for PainScience.com were missing or stale, and the stale ones exactly matched the dates of old updates listed at the bottom of the page.

Strange. 🤔

I began a tedious process of trial and error trying to clean this up. Two years later, I finally had a reliable method of getting good snippet dates, and many examples of how to get the wrong ones. All that work boiled down to this simple formula:

I know it seems like it shouldn’t have taken two years to figure that out, but I had to be careful, and the solution was simple but not obvious. There were quite a few stages of disbelief to work through. “Really, Google? You’re using that date?” It took me a long time to even consider eliminating other dates, because I thought that my real document dates would surely take precedence if I declared them the right way. Ha ha! So cute!

Just something weird about PainScience.com?

In a way. There is something about PainScience.com that undoubtedly made my experience with snippet dates more frustrating: I publish logs of dated updates for my articles, which isn’t exactly standard practice, or even common (even if it should be for YMYL content, Google’s “Your Money or Your Life” category for pages that can affect health or finances).

Eventually I stopped using full dates for any of those updates, and replaced them with rough estimates, just “April” or “2014,” so they wouldn’t pre-empt the real last-modified date.

Countless pages around the internet contain dates other than the publication date. Is Google actually systematically mistaking those dates for publication dates? Not every time, but often, yes: I have seen countless examples over the years. But few webmasters ever seem to notice, because it probably only affects a few pages on any given site. From a publisher’s perspective, it’s a rare problem: very few of them routinely use other dates that can pre-empt the real publication date.

Otherwise, PainScience.com pages are bog standard HTML and prose, and shouldn’t be giving Google any trouble correctly detecting their publication dates.
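For anyone who wants to hunt for this kind of trouble on their own pages, here is a minimal sketch of an audit that lists every fully written-out date in a page's visible text other than the one you intend Google to use. It is only an illustration: the function names, the date pattern (English month names like "July 10, 2020"), and the sample text are all assumptions made for this example, and nothing here reflects how Google actually chooses a snippet date.

```python
import re
from datetime import date

# Month abbreviations mapped to month numbers (English only; an assumption).
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Matches full dates such as "July 10, 2020" or "Sep 6, 2020".
# Partial dates like "April" or "2014" are deliberately not matched.
FULL_DATE = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+"
    r"(\d{1,2}),\s+(\d{4})\b"
)


def find_full_dates(text):
    """Return every full date found in the text, in order of appearance."""
    found = []
    for month, day, year in FULL_DATE.findall(text):
        try:
            found.append(date(int(year), MONTHS[month.lower()], int(day)))
        except ValueError:
            continue  # skip impossible dates such as "Feb 31, 2020"
    return found


def competing_dates(text, intended):
    """Return the full dates in the text that differ from the intended one."""
    return [d for d in find_full_dates(text) if d != intended]


# Hypothetical visible text of a page: a dated byline, an embedded tweet,
# and an update log that uses only rough dates.
page_text = (
    "Chronic Pain and Inequality. Updated Sep 6, 2020. "
    "Embedded tweet, posted July 10, 2020. "
    "Update log: minor edits in April; original notes from 2014."
)

print(competing_dates(page_text, date(2020, 9, 6)))
# prints [datetime.date(2020, 7, 10)]
```

The rough dates in the update log ("April", "2014") are not flagged, which is the point of using them; the tweet date is.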

“Structured data” didn’t clean up this mess

“Structured data” is basically an enhanced form of metadata — like the HTML meta-tags that define the page title and description, but richer and creamier.

Google is “pushing” structured data, insofar as it’s now thoroughly documented, and their (alleged) support for it is greatly expanded. I remember scratching my head about it a few years ago and thinking, “Where’s the beef?” There were so few supported data-types it hardly seemed worth it.

But there are a lot of options now, and it sure seems like Google wants us to use them.

When I saw that they now support structured data for the “article” type (which includes dates of publication and last-update), it seemed like a hallelujah moment. This is exactly what I needed five years ago when I was tearing my hair out over mysteriously wrong snippet dates: machine-readable canonical doc dates would have felt like a cold beer in hell.

Assuming that you could trust the machine to actually read them.

This is exactly what I published that did not work

I was excited to take structured data out for a test drive, so I added a simple sample to five test articles. Why only five? Because I am not crazy. Anything that can have the slightest effect on Google search results is like juggling old dynamite. So just five minor articles to start, and here’s what the new bit looked like:

<script type="application/ld+json">{
 "@context": "https://schema.org",
 "@type": "Article",
 "headline": "Chronic Pain and Inequality",
 "datePublished": "2020-06-11T00:00:00-07:00",
 "dateModified": "2020-09-06T00:00:00-07:00"
 }</script>

Ugly? Yes. But … specific and completely machine readable (and valid, I checked). This should entirely eliminate the need for Google to “figure out” the page date. If it’s just declared like this, that should be the end of it. Just use that date, Google! But Google did not.
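To show just how machine-readable those dates are, here is a minimal sketch that pulls the JSON-LD block out of a page using only Python's standard library and parses both dates. The class and function names and the sample HTML wrapper are assumptions made for this illustration; it is not the validator mentioned above and says nothing about how Google parses the same markup.

```python
import json
from datetime import datetime
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def article_dates(html):
    """Return the declared Article dates as timezone-aware datetimes."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        if isinstance(block, dict) and block.get("@type") == "Article":
            return {
                key: datetime.fromisoformat(block[key])
                for key in ("datePublished", "dateModified")
                if key in block
            }
    return {}


# The structured data from the article, wrapped in a hypothetical page.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{
 "@context": "https://schema.org",
 "@type": "Article",
 "headline": "Chronic Pain and Inequality",
 "datePublished": "2020-06-11T00:00:00-07:00",
 "dateModified": "2020-09-06T00:00:00-07:00"
 }</script>
</head><body><h1>Chronic Pain and Inequality</h1></body></html>
"""

dates = article_dates(SAMPLE_HTML)
print(dates["dateModified"].isoformat())
# prints 2020-09-06T00:00:00-07:00
```

A couple dozen lines of standard-library code can recover the declared modification date unambiguously, which is what makes the wrong snippet date so hard to excuse.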

An exasperating-but-helpful coincidence

Out of more than 200 articles, by coincidence I picked the only one with this embedded tweet problem. (I don’t embed a lot of Tweets.)

After publishing the new structured data, I asked Google to reindex those pages, and then I monitored their search snippets. In particular, I was curious about the one that had just been updated, waiting for the snippet to reflect its shiny new modification date, Sep 6 — something I have done with many hundreds of other articles over the last few years. I noticed the incorrect date on the snippet but assumed it was just the legitimate previous modification date, destined to be replaced with Sep 6 — having been read from the structured data! Or, even failing that new-fangledness, surely it would be taken from the date under the title.

When it didn’t update, I got suspicious and discovered — to my horror — that the snippet was using that damned Tweet date, and ignoring my shiny new structured data. And if I hadn’t tested one of the only pages on the whole site that just happened to have an embedded tweet, it would have taken me a lot more testing to discover that the date in the structured data was not being respected.

Why u no like structured data, Google?!

The mind boggles. As cynical as I have been about this, even now I cannot quite believe that Google is ignoring that structured data.

It’s extremely disappointing. Even when Google promotes and thoroughly documents a method of declaring machine-readable page dates… still not good enough?! This is not only disappointing for document dates, but it makes me seriously question the value of using any structured data at all.

People justifiably give Google a lot of credit for changing the world with their technology. But I often accuse Google search of being a bit “janky” despite its obvious miraculousness, and people often ask me what I mean by that.

This is what I mean by that.