Microsoft’s Scott Prevost Interviewed by Eric Enge


Published: May 24, 2009

For over a decade, Dr. Scott Prevost has worked to bring natural language processing technology to the marketplace. As a graduate student at the University of Pennsylvania, he developed theoretical models of prosody for synthetic speech, as well as technology to generate dialogue for autonomous agents. In post-doctoral research at the MIT Media Lab and FX Palo Alto Lab, he integrated gestures, facial expressions, and other interactional cues into his research, creating lifelike 3D characters with speech recognition, dialogue processing, and vision capabilities.

Dr. Prevost co-founded and served as CEO of Headpedal, Inc., a software company that specialized in creating virtual character interfaces for customer-facing applications on the web. Dr. Prevost also previously served as CEO of Animated Speech Corporation, which produces interactive, animated tutors for speech and language development. Dr. Prevost was General Manager and Director of Product at Powerset, where he was focused on developing the user experience for natural language search. Powerset was acquired by the Microsoft Live Search division in August 2008, where Dr. Prevost currently holds the position of Principal Development Manager.

Interview Transcript

Eric Enge: Can you provide a quick overview about yourself and Powerset?

Scott Prevost: I have been working on natural language systems with the goal of helping information retrieval in particular for quite a while now. Powerset was founded with the notion that we can improve search results by having a much better understanding of the meaning of the documents and of what people intend with their search queries.

The way that we do this is to apply very deep natural language processing technologies to the documents as we are creating an index. And we also apply that to the queries at runtime so we can do a better job of actually matching meaning to meaning as opposed to just finding the keywords.

Powerset was founded in 2006 and we launched our product in May of 2008, which was initially a Wikipedia search engine. Then we were acquired by Microsoft in the summer and closed the deal on August 1, 2008.

Eric Enge: Can you talk a little bit more about the goal of better understanding a searcher’s intent and the mechanics that you use after doing that?

Scott Prevost: One of the key points that I want to make is that Powerset is not just about understanding intent in queries. That’s part of the equation for getting better search results, but once you have that, you also have to have a much better understanding of what’s in the documents as well. So, it’s not enough to know that a user is looking for a certain kind of search result, you also need to be able to match that to what’s actually in the document.

So, what we propose to do is very different from what most other search engine startups do. Most search engine startups are trying to take the existing keyword search model and add some bells and whistles to it or put a new front-end on it. What we did is completely reinvent how the index is built by applying technology that we licensed from PARC, which allows us to do very deep linguistic processing.

We essentially look at a document, break it into sentences and then analyze each sentence using a very robust linguistic parser. We extract semantic representations from that analysis, and those semantic representations are what we store in our index.

We do similar processing on queries at runtime, and then we look to match these semantic properties, the keyword properties and other document properties. What this means is that we can find sentences that may have the right meaning, but use slightly different words. If you type in “When did earthquakes hit Tokyo,” you will see answers that use words like strike instead of hit. Then you will see that we are actually able to highlight dates in the captions for those answers, because we’ve done the linguistic analysis on the sentences, not merely matched keywords.
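To make the hit/strike example concrete, here is a minimal sketch of matching meaning to meaning rather than keyword to keyword. The synonym table and concept labels below are invented stand-ins for illustration; Powerset’s actual system derives such relations through full linguistic parsing, not a lookup table:

```python
# Toy synonym table: "hit" and "strike" map to one concept label.
# (Invented data; a real system derives these relations linguistically.)
SYNONYM_CLASSES = {
    "hit": "IMPACT", "hits": "IMPACT",
    "strike": "IMPACT", "struck": "IMPACT",
}

def normalize(verb):
    """Map a verb to its concept class, falling back to the word itself."""
    return SYNONYM_CLASSES.get(verb.lower(), verb.lower())

def verbs_match(query_verb, doc_verb):
    """True when two verbs share a meaning, even with different surfaces."""
    return normalize(query_verb) == normalize(doc_verb)

# "When did earthquakes hit Tokyo" can match a sentence saying an
# earthquake "struck" Tokyo, because both verbs normalize to IMPACT.
print(verbs_match("hit", "struck"))   # True
print(verbs_match("hit", "renamed"))  # False
```

A keyword engine would miss the "struck" sentence entirely; normalizing to a shared meaning is what lets slightly different wordings match.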

Eric Enge: So how is this different from Latent Semantic Analysis or Latent Semantic Indexing?

Scott Prevost: We are actually doing the semantic processing upfront, and we are doing all the hard work on the backend, so that’s one big difference from all the other approaches that we’ve seen out there.

Eric Enge: So you are doing some preprocessing?

Scott Prevost: Yes. We are processing the documents as we index them. We also try to limit the analysis we do at query time, because natural language technology is still quite expensive in terms of the compute power that’s needed. So, the degree to which we can compile all of that out into the index means we can produce a runtime that’s on par with a keyword search runtime in terms of latency properties.

Eric Enge: So, when we talk about the problems with traditional search engines, one of the things that I saw you focus on was the fact that they required users to speak their language?

Scott Prevost: That’s right, yes. We’ve all gone through the process of trying to figure out the right collection of words that will pull up the document we’re looking for. That means you have to start thinking like the author of the document, imagining how the thing you are looking for might have been expressed.

We generally try our query a few times before we find what we are looking for. By adding the semantic analysis, we are allowing people to be a little more natural in the way they express themselves. You don’t necessarily have to worry about the specific keyword, because we are likely to find a synonym.

You also don’t have to worry about excluding stop words or which words are going to be matched with which words in the matching algorithm. We just want people to be able to write a natural phrase or even a question, and then let the search engine do the hard part: figuring out what the appropriate matches are.

Eric Enge: Right. In existing search engines it can be a disadvantage to have extra words that aren’t actually necessary to the query. This is a result of using a more basic method for matching up the words in the query with the words on the page.

Scott Prevost: That’s right, yes. And of course it creates some interesting issues for us, because now we are trying to change users’ behavior a little bit. They have grown very accustomed to thinking of a search engine as a box where you type words and get back documents that include those words. So now that we are messing with that interaction model, our hope is that people’s behavior will gradually change as they start to realize the power of the system we are introducing.

One thing that we have been very careful with at Powerset is trying to maintain the old model as much as possible. So, if you just type keywords into Powerset, you will still get results that are just as good as those from Google, Live Search or Yahoo.

Eric Enge: So you have talked a little bit about stop words, can you expand upon that a little bit? Define what they are, how they are treated by regular search engines and why making use of them in Powerset is important?

Scott Prevost: Stop words are words that the search engine simply disregards; prepositions or words like “what” and “where.” It’s a salient limitation of the standard implementation. The idea is that if you try to match documents on those words, they tend to be less important in the query because they would match so many documents. But in reality they are the linguistic glue in the query and in language. They start to tell you how the other important words in the query link together, and by processing them linguistically we can look for those links in the document when we are matching a query. Let’s go back to the earthquake example. I am not specifically searching for the word “did,” but that word is still part of the verb complex in that query. So the parser knows that “did” and “hit” go together. We are not matching on that specific word, but we are matching verbs that semantically match. So instead of “did hit,” the document can use the word “strike.”
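The “did hit” point can be sketched roughly as follows. Instead of discarding “did” as a stop word, group the auxiliary with its main verb into one verb complex whose head carries the meaning. A real parser does this with full syntactic analysis; the word lists here are invented for illustration:

```python
# Hypothetical auxiliary list (not Powerset's grammar): auxiliaries
# contribute tense, not content, so they attach to the head verb
# rather than being thrown away or matched on their own.
AUXILIARIES = {"do", "does", "did", "has", "have", "had", "will", "was"}

def verb_complex(tokens):
    """Split a simple verb group into (auxiliaries, head verb)."""
    aux = [t for t in tokens if t.lower() in AUXILIARIES]
    main = [t for t in tokens if t.lower() not in AUXILIARIES]
    return aux, (main[-1] if main else None)

aux, head = verb_complex(["did", "hit"])
# aux == ["did"], head == "hit": the head is what gets matched
# semantically (e.g., against "strike"); "did" signals past tense.
```

The payoff is that “did hit” in a question and “struck” in a document can be compared as one verb meaning in the past tense, rather than as three unrelated keywords.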

Eric Enge: Right. So for example, you could accidentally get something like “did not hit?”

Scott Prevost: Yes. We are not currently processing negation in the parser at a very detailed level, because it is such a tricky problem. We may actually return matches that get the negation wrong, but that is generally useful information for the user anyway, because it is relevant to their query even if it isn’t an exact answer.

Eric Enge: Right. So, that’s an example of something that you would be working on in the future?

Scott Prevost: Oh, absolutely. That and things like sentiment analysis are all things that we will be working on in the future. For sentiment analysis, say you want to know what positive things a particular politician said about a particular topic. You would get a different set of results than if you just asked what they said about the topic.

Right now we are basically working on sentence level linguistic matching along with other broader document properties like keywords, anchor text and using all of these things to rank our results. But as the technology improves, we’ll start to look at many more of these kinds of discourse level properties so we can really understand what the most important sentences in the document are and how they relate to each other. And as we can learn from these kinds of approaches, I think we’ll see the relevance of search results improving with time.

Eric Enge: Right. For example, if someone types in “The Office,” they probably don’t just want to search the phrase “Office.” They probably mean the TV show.

Scott Prevost: Yes. And in fact if you type that into Powerset, you will get a result that’s tabbed at the very top, for The Office television show. There is also a tab for the UK television series by that name, one for the band and one for Microsoft Office. So that’s a pretty ambiguous query, but chances are you probably meant the television show by phrasing it that way. That’s the one that comes up first.

Eric Enge: Right. So let’s get back to Latent Semantic Analysis. One of the things that you do is look at the entire set of documents, and determine relationships between words by proximity and frequency. This way you might discover that doctor and physician probably mean the same thing, or at least almost the same thing. What I am getting at here is the analysis of the corpus of documents to extract relationships.

Scott Prevost: We are not using what you are thinking about as Latent Semantic Analysis. We are actually using more of a symbolic approach to the linguistic processing. That’s the first phase of what we are doing. We look at a document and break it into sentences, and then we actually parse the sentences using technology that we’ve licensed from PARC. What this does is it allows us to create fairly complex semantic representations of the meaning of those sentences.

It also allows us to represent ambiguity in those interpretations. This way we can index the most likely reading of that sentence, and the other possible readings as well. These then become semantic features that get thrown into the mix with keyword and other document property features that are used by our retrieval and ranking systems.

We are not retrieving results based solely on meaning matches and partial meaning matches. Those get thrown into the mix, and the retrieval and ranking system is a machine-learning-based algorithm. In that sense we are starting to use statistical approaches, but we start with a very symbolic representation of the meaning in the document. Then that is used by a machine learning algorithm to retrieve and rank the documents.
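A toy illustration of how symbolic semantic features might sit alongside keyword features in a learned ranker. The feature names, weights, and data below are invented for the sketch; the real system learns its weights from data over many more features:

```python
# Invented feature extractor: one keyword feature, one semantic feature.
def extract_features(query, doc):
    return {
        "keyword_overlap": len(set(query["words"]) & set(doc["words"])),
        "triple_match": 1.0 if query["triple"] == doc["triple"] else 0.0,
    }

# Stand-in weights; a machine-learned ranker would fit these from data.
WEIGHTS = {"keyword_overlap": 0.3, "triple_match": 2.0}

def score(query, doc):
    """Weighted sum of features, as a linear learned ranker would compute."""
    feats = extract_features(query, doc)
    return sum(WEIGHTS[name] * value for name, value in feats.items())

query = {"words": {"earthquakes", "tokyo"},
         "triple": ("earthquake", "IMPACT", "Tokyo")}
semantic_hit = {"words": {"tokyo"},
                "triple": ("earthquake", "IMPACT", "Tokyo")}
keyword_only = {"words": {"earthquakes", "tokyo"}, "triple": None}

# A document whose extracted meaning matches can outrank a document
# that merely shares more surface keywords.
print(score(query, semantic_hit) > score(query, keyword_only))  # True
```

The point of the sketch is the mixing: meaning matches are not the whole story, they are one feature family among several that a statistical ranker weighs.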

We are not pulling the relationships based on things like frequency, we are actually uncovering the linguistic and semantic relationships through symbolic approaches.

We actually do have other projects going on within the company that are looking at more statistical approaches to these problems. But I would currently characterize that system as a hybrid.

Eric Enge: What exactly does it mean to say that it’s a symbolic approach?

Scott Prevost: It means that it’s rule-based semantic processing, as opposed to uncovering relationships through machine-learned approaches. For example, we might have a rule in our system that says if you kill something, it dies.

Eric Enge: What are some examples of search queries that highlight the power of this approach?

Scott Prevost: Let’s start with something like Siddhartha. The first thing you will see is the summary of Wikipedia pages that are relevant and that you can tab through. You probably were looking for Siddhartha, the founder of Buddhism, when you typed it in, but there is also a film, a novel and an American rock band by that name as well. You can just click on the tabs to see those different snippets.

In the section below that, you will see something called facts from Wikipedia, and these are some of the semantic relations that we have automatically extracted using these linguistic techniques. In the second line you will see “Siddhartha renounced the world,” and if you click on world, you will see sentences from which we extracted that fact. We extracted that from three different sentences on three different Wikipedia pages, and you will see that it’s not the case that we are using proximity in the second one.

Siddhartha is actually pretty far away from the word renounced, but linguistically they are tightly tied together. It’s just that there is another phrase intervening. So this starts to show you how we are taking data that’s in Wikipedia and starting to structure it. If you click the More link at the bottom of that section, you’ll see a bunch of other relationships that we’ve pulled from the text.

Eric Enge: They are just a little less tightly matched.

Scott Prevost: Exactly. Now you can also get to this structured information pretty directly. So, if you type in “What did Siddhartha attain,” you will see Enlightenment and Nirvana. So, in a sense, these subject-relation-object semantic triples are great for answering questions.

So, try something like “What was banned by the FDA.” Now, if you look at the right part of the screen, you will see More. If you click that, you will see the longer list. And if you click on something like “cyclamate,” you will see the sentences from which we extracted that fact.

We are basically allowing a whole new type of interaction. I type a simple subject-relation-object question, and now I get a list of answers that are supported by the text that we’ve uncovered through this linguistic analysis. And you’ll also note that we can start to make distinctions between a query, like “who defeated Hulk Hogan,” and “who did Hulk Hogan defeat?”

If you search “who defeated Hulk Hogan,” and you click on More you will see the whole list. And if you do the other query, “who did Hulk Hogan defeat,” you will see that the lists are different because we are actually looking for these things in the correct relationship to each other in the text. We are not just looking for the keywords “Hulk,” “Hogan,” and “defeat.”
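The Hulk Hogan example can be sketched with subject-relation-object triples, where the question word marks which slot is unknown. The index contents below are invented purely for illustration, not actual extracted facts:

```python
# Invented triple index; real facts would be extracted from parsed text.
TRIPLES = [
    ("The Ultimate Warrior", "defeat", "Hulk Hogan"),
    ("Hulk Hogan", "defeat", "Andre the Giant"),
]

def answer(subject=None, relation=None, obj=None):
    """Return triples matching the filled slots; None marks the unknown."""
    return [
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (relation is None or t[1] == relation)
        and (obj is None or t[2] == obj)
    ]

# "Who defeated Hulk Hogan?"   -> unknown subject, object is Hulk Hogan.
# "Who did Hulk Hogan defeat?" -> subject is Hulk Hogan, unknown object.
# Same keywords in both queries; different slot assignments give
# different answers, which pure keyword matching cannot distinguish.
print(answer(relation="defeat", obj="Hulk Hogan"))
print(answer(subject="Hulk Hogan", relation="defeat"))
```

Word order in the question determines which slot each name fills, which is exactly the information a bag-of-keywords model throws away.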

That’s an example of a pair of queries that would be very hard for a typical search engine to distinguish between, because the key phrases are the same and the word order is what defines the difference. So let’s pick a query for the regular search results. Let’s type in “how many nuclear reactors does Japan have?” Now, here is a query with a lot of stop words, right? But it’s a query where I think it is pretty easy to tell what the user is looking for. In the very first caption we can see that Japan has 55 reactors.

We are basically interpreting the fact that you typed in “how many” as the fact that you are looking for the particular number of nuclear reactors. This is just something that you don’t get when you use Google, Yahoo or Live Search, or any of the keyword search engines.

Let’s try “Who mocked Sarah Palin?” Now obviously, the other search engines do a pretty good job of finding relevant results for this. But what I want to show you are some of the captions in the blue link results. We get things about impersonating Palin and parodies of Palin. We are not necessarily just looking for the specific words “mocked Sarah Palin,” but we find synonyms that are semantically related and can highlight those right in the answers.

The hope here is that we can help users better understand when one of these blue link results is actually truly relevant to them, and save them the clickthroughs when it is not. Another thing that we can talk about is pulling search results from structured data. So, if you type “GM board of directors,” we actually connect with Freebase in order to produce this result at the top.

Eric Enge: Along with the pictures of each of the members.

Scott Prevost: Right. If you type in “what movies did Heath Ledger star in,” you will get the same results as if you typed in “films with Heath Ledger,” because we are actually doing semantic analysis and you are essentially looking for the same thing whether you type in the first phrase or the second.

Eric Enge: The list of movies shown didn’t change at all. There were just some subtle changes to the results below that. Those are interesting examples. Currently you are operating this on Wikipedia?

Scott Prevost: That’s right.

Eric Enge: What was the reason why you chose Wikipedia in particular?

Scott Prevost: Well, there are a few reasons. First of all, as we were developing the technology, Wikipedia was a great test bed because it covers just about every topic that there is to cover. We wanted to make it very clear that our technology was about linguistic processing, and that we didn’t have to be within a specific, very narrow semantic domain for the technology to work. Some other natural language approaches have taken that very narrow approach, and that’s not what we’ve done. So the fact that Wikipedia is so broad was very appealing to us.

The second reason is that Wikipedia is well written, so it parses pretty nicely. Our technology is designed so that when we can’t parse something, we still index it as keywords. There has to be graceful degradation into the keyword world.

The final reason is that Wikipedia is prevalent in so many search results these days. It’s almost hard to find a search query that doesn’t have a Wikipedia result in the top ten. So we know it is a very valuable set of documents to index. When it came time to define a product to launch, we had some resource constraints. It takes a lot of hardware to spin up an index that has as much information as the Powerset index.

So we had to find a smaller set of documents, and then it becomes a challenge to find a small set of documents that hangs together for the user in a meaningful way. So we decided initially to restrict ourselves to Wikipedia alone, rather than having Wikipedia and a few other smaller document sets that might not fit in.

We are now expanding the index, and we’ve been continually experimenting with other kinds of documents. The technology is not wedded to anything that’s specific to Wikipedia, but Wikipedia is such a valuable set of documents on the web that so many people use.

Eric Enge: So, thinking about runtime: when someone enters a query, is there reason to believe that Powerset is more or less compute-intensive than regular search?

Scott Prevost: It’s marginally more compute-intensive at runtime, but only marginally, because we do the really compute-intensive things at index time.

Eric Enge: I assume that at that time it’s probably significantly more compute-intensive.

Scott Prevost: Actually, the only thing that’s more intensive at runtime is that we are parsing the query. Once we’ve parsed the query, the actual retrieval is very similar to keyword retrieval, except that we are retrieving on semantic features as well as keyword features. But it’s a very similar apparatus.

Eric Enge: Right. But you probably have a higher level of investment to build the index, because you are doing all that preprocessing?

Scott Prevost: That’s right. We are doing very deep processing on the documents as opposed to just pulling out the words.

Eric Enge: Is there any insight you can give us as to how much more difficult it is?

Scott Prevost: It depends on the degree to which we do it. It’s a very granular system and we can adjust a lot of knobs. It can be anywhere from ten to one hundred times more expensive. I am sure we could make it a thousand times more expensive if we thought we would get the benefit from it. Our goal initially has been to improve relevance while disregarding cost, in some sense. But obviously, we are pragmatic when push comes to shove. The goal was to find out which of these features are most important for improving relevance. Then, as we learn more, we can simplify and skip some of the computation that’s not giving us as much bang for the buck.

Eric Enge: Are there any components of Powerset that are integrated in the Live Search at this point?

Scott Prevost: We’ve integrated a few things. We’ve integrated some of our direct answers using Freebase, and some improved captions and snippets under the blue links for Wikipedia. And we’ve also done some things with related searches. Of course, we are working on a much more robust integration plan, although I don’t have anything to announce today. But some exciting stuff will be coming down the pipe for sure.

Eric Enge: Any closing comments?

Scott Prevost: We are excited at Powerset to have the opportunity to take this technology to scale and to integrate it into a product like Live Search. We are really thrilled because it allows us to see our dream actually come to fruition. And I think we have a lot of exciting stuff coming down the road.

Eric Enge: Thanks Scott!

Scott Prevost: Yes, thank you Eric!



About the Author

Eric Enge is the President of Stone Temple Consulting. Eric is also a founder in Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns.

Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at:

For more information on Web Marketing Services, contact us at:

Stone Temple Consulting
(508) 485-7751 (phone)
(603) 676-0378 (fax)


Orion House Bed and Breakfast


Visit the Orion House Bed and Breakfast site. Compared to many other bed and breakfast (B&B) websites that we’ve seen, the Orion House Bed and Breakfast site is outstanding. We’re convinced that everyone’s online search for B&B accommodation would be far easier and more pleasant if all such sites were of this standard.


YouTube’s Product Management Team Interviewed by Eric Enge


Published: May 17, 2009

Our first interviewee, Tracy Chan, is a Product Manager at YouTube. Prior to working at YouTube, he was a Financial Analyst at Google. He has also worked as an Associate at Stockamp & Associates and as a Corporate Strategy Intern at eBay. He got his degree at the University of California, San Diego.

Our second interviewee, Matthew Liu, is the lead product manager on YouTube Sponsored Videos. In this role, he focuses on building an advertising platform that allows video creators — from the everyday user to a Fortune 500 advertiser — to reach people who are interested in their content, products, or services, with relevant videos. Previously, Matthew led numerous other projects at YouTube for advertising, content partnerships and rights management, and community.

Matthew has an MS in Management Science & Engineering and a BS in Electrical Engineering from Stanford University.

Interview Transcript

Eric Enge: Can you provide an overview of what Insight is, and why you created it?

Matthew Liu: There are millions of people watching hundreds of millions of videos every single day on YouTube. We started to hear from advertisers, content providers and everyday users that they wanted to understand YouTube’s audience. They were asking questions such as: “How do we really stand out, how do we understand our ecosystem, and how do we know how our videos are performing?” They basically wanted to learn more about their audience in order to make better content.

As YouTube was growing, it started turning into the world’s largest focus group. So basically what we did was build up a pretty powerful analytics tool that helps content providers, advertisers and users better understand their performance on YouTube.

This is a tool that’s free to anyone who has ever uploaded a video. When I uploaded my first video, I got a hundred views in the first two days. And that was actually surprising to me, because it was a little animation video that was actually not that interesting. I was wondering if my mom was just watching it over and over or if other people around the world were watching my video.

With this in mind, we built a product almost a year ago, which we launched on March 26 of last year. We started with basic functionality that could give you information on your views over a certain period of time, maybe a month. On a personal note, it helped me figure out that my video was watched 50 times by my mom in California, but it also got a lot of views in Spain and the UK. So it was really interesting, because you could finally see where your audience was coming from and what the lifecycle of your video looked like.

On top of that, we built a feature called Popularity, which analyzes how your video’s performance compares to other videos. You can see how well your video performed on any given day relative to all the videos within YouTube or within specific geographic regions. And what is really interesting is that we found businesses were starting to use this basic data in very interesting ways.

The obvious value was in understanding the lifecycle of your video, and on what days of the week it was most popular. This can help content owners really start to plan their programming strategies on YouTube. If they get most of their views within the first three weeks, for example, serial content providers could start uploading their videos every three weeks. Then they could maximize the number of views they get on YouTube.

Another interesting phenomenon we observed was that bands would put up concert footage or their new video clip, and they would have interesting pockets of audiences in different areas across the US. They’d actually start planning the touring schedules around them, because nothing is worse than scheduling a concert and having no one show up. But by having their content on YouTube, they could understand where the views were coming from and they could better plan their concert strategies.

Another really interesting use of the tool involves measuring ad effectiveness. This could save you money on promotional dollars within the YouTube ecosystem. You could really start to see the effect of specific advertising campaigns as you ran them, and whether you got the views and spikes you expected. Here is an interesting example: if you ran a homepage ad on YouTube, you would expect that the video you ran the ad on would get a spike in views.

But what we also saw was that all the other videos within that uploader’s channel got spikes in views, even if they put just one video on the homepage. So, you could really start to see the halo effects of advertising. Interestingly enough, you could also see the effectiveness of the different offline promotions you were doing.

If you had a movie screening in Michigan, for example, you could see if that made people in Michigan start looking for your YouTube content, and then the halo effects of the surrounding states that potentially heard of it as well. So a lot of really interesting stuff is coming off of the first features we created for YouTube Insight, showing basic views trended over time and space.

A couple of weeks after we launched Insight, we added a discovery feature that allows publishers to understand how people get to their videos. They can see whether viewers found a video through a search on YouTube or Google, or through an external link somewhere on the web.

It may be an embedded video across the web or a part of the YouTube site that drove traffic back to your video. Now this is actually pretty obvious, and again there is an opportunity to devise optimization strategies around how people find your content. For example, if there were blogs that embedded your video, you could reach out to them and form business relationships.

One of the interesting stories that we heard involved the band Weezer. Weezer debuted one of their videos off of their latest album on YouTube, and what they found is they got almost 2,000,000 views within the first couple of days, which is a fantastic performance. When they looked in Insight, they found that a lot of those views were actually driven by tech blogs such as Valleywag and TechCrunch, which was a big surprise to them.

So what they did with this information was actually more interesting than the information itself. The single preceded the album release, so when they were promoting the album release and their tour, they actually spent a lot of their media money on tech blogs since they knew they were already established there.

Eric Enge: So they reached out directly to the tech blog, because clearly the tech blog had an interest in them at that point as well.

Matthew Liu: You can imagine all the types of relationships you could form from that. Not only do we show you the sources of traffic, we allow you to drill down more specifically. So, for example, you can actually see the search terms that led people to your videos. We have a great promotional product called Promoted Videos, which is basically AdWords for YouTube and allows you to advertise against specific keywords. So you can have your video results show up alongside organic search results on the site.

Again, Insight has proven to be a very powerful product, because now you can know which search terms are really effective and which are less effective. And the combination of the two really helps people find the audience that is looking for their content, whether they be advertisers or content providers.

Eric Enge: Right. So if you are a commercial entity that produced a neat video that you put on YouTube, you may want to buy advertising just to create visibility for your video. Then you could use the analytics functionality to see how that campaign performed.

Matthew Liu: Absolutely. Another really interesting thing about YouTube is that a lot of people just come to the site to be entertained. So for example, we get a lot of crazy, funny videos. You may find that the term “funny video” actually drives a lot of video views to a video such as Tea Partay. Because you now have access to this information, you can understand those general search terms that you may not have thought about before and really start to optimize. Insight is very real time. You can optimize in the middle of your campaigns, and it will really start to tell you what your strategies should be.

The next feature that we launched was the Demographics function, which basically shows you the makeup of your audience in terms of a sex breakdown and an age breakdown. This is pretty important to both advertisers and content providers, because they need to see if they are reaching their target demographic.

One of the things that we’ve realized about YouTube is that since it has such a massive audience, you can find any niche audience you want. An example we had was of a PBS producer who produced a show. He wanted to put the pilot up on YouTube, but the management at PBS wasn’t really sure that YouTube was the right place, because they thought YouTube was geared towards a younger audience.

What they did was put the pilot up on YouTube and let it run for two weeks, and they found that actually 75% of their audience was over 35, which was their target demographic. So it really proves that there is an audience on YouTube for any type of content. We also found that people are starting to use the demographic information provided by YouTube Insight to close deals.

One of the most popular comedians on YouTube is a guy named Paul Telner. He used the demographic information in Insight to show that he appealed to the right target audience and signed a deal with MuchMusic, which is Canada’s #1 cable music network. Another example is Chris Bosh, who is an NBA All-Star for the Toronto Raptors and also a member of the US Olympic team. Sharing information on his YouTube demographics helped him get a sponsorship deal with AOL Sports.

Eric Enge: You could view it from the opposite point of view, which is if you are a content provider who needs to decide who you want to target as a potential advertiser.

Matthew Liu: Absolutely. And we think people experiment with their content too. They put up multiple creatives to see which demographics the different creatives resonate with. It’s like using a focus group in a very, very controlled way, but it’s quick and free as well. And you have access to such a wide audience, so you can really see how things resonate within different groups.

The most recent feature that we’ve launched is Hot Spots. All the previous features focused on aggregate data, but Hot Spots starts to dig deep into specific views. It shows how your audience related to the video during playback.

What you basically see is a graph alongside your video, so you can actually play the video and see how your audience is responding second-by-second. If people dropped off, your graph would go downwards. If people rewound, you’d see spikes in attention. We also give you an overall attention score so you can understand how your video is performing relative to others.

We show this attention score and your Hot Spots graph relative to videos of a similar length. This is important because, in general, viewers drop off more and more the longer a video runs, so the comparison is made against an aggregate of similar-length videos.

Eric Enge: Can you talk about exportable reports?

Matthew Liu: Basically we heard from our content providers, our power users, and our advertisers, loud and clear, that they want broad access to the data. So we have launched exportable reports. The premise behind them is that we want to give these power users the data how they want it, when they want it, and where they want it.

Exportable reports provide a lot more flexibility on top of the tools we already give you today. Publishers often want to look at groupings of videos that they are never going to describe to YouTube. So, for example, if one marketing department focuses on one set of videos and another focuses on a different set, there was previously no way to group those arbitrarily, because YouTube had no way of knowing who works on which set of videos.

Now they can download analytics for the specific videos and then make those comparisons. Until today, we gave you discovery sources by geography and by time, but in order to see things over time you had to select different date ranges. So if you wanted to see the number of times a specific keyword was searched on a daily basis, you could do it in Insight prior to this release, but it required some manual work, because you had to switch filters and things like that.

With exportable reports you can target the specific types of data that you want.

If it is a keyword term, you can select it, filter it in the list, and then put it on a timeline. Or if you want to compare views from certain keywords against views from having your video embedded on a certain blog, you can compare those sources side-by-side. There are a lot of interesting things that we have heard content providers and advertisers want to do with this data, such as plugging it into their own systems and comparing their advertising campaigns on YouTube against those on the radio.

Now they are able to have that flexibility, and if they want to plug into a wider ecosystem, exports can take them a long way in getting there.

Eric Enge: Is the export a manual process?

Matthew Liu: Yes. It is a link and we provide it on a per-video and a per-channel basis. We are going to make improvements in terms of including more types of data and making it easier to access it, but we actually launched this feature very quickly from its conception. It was a 3-week cycle, so our goal was to launch it very fast, get users access to the features that we were promoting and then make improvements as we get feedback.

Eric Enge: Can you export any of the data in Insight or just specific things?

Matthew Liu: Right now, we basically have two reports. The first report gives you views, uniques, popularity information and engagement information. You can see comments, ratings and favorites on a daily basis, by country and by video. The second report is referral data: views by referral source, broken down with all the granularity that we have, on a daily and per-country basis.
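As a sketch of how such an exported report might be consumed, here is a small Python example that totals views per country from a CSV export. The column names and sample rows are hypothetical; the headers in an actual Insight export may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of an exported Insight report; the real export's
# column names and layout may differ.
sample = """date,country,video_id,views,unique_users
2009-05-01,US,abc123,1500,1200
2009-05-01,CA,abc123,300,250
2009-05-02,US,abc123,900,700
"""

def views_by_country(report_text):
    """Sum the views column per country across all rows of the report."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(report_text)):
        totals[row["country"]] += int(row["views"])
    return dict(totals)

print(views_by_country(sample))  # → {'US': 2400, 'CA': 300}
```

The same approach extends to the referral-data report, or to feeding the rows into an agency’s own internal reporting system.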

Eric Enge: That is some good stuff for people to pull out. They can combine it with their other analytics data as well.

Matthew Liu: Absolutely. We think that would be a great use of the exported data. We have heard that some advertising agencies have their own internal reporting tools, and anytime there is a reporting system they can plug into, it makes them more efficient in terms of optimizing campaigns.

Eric Enge: Right. You can just export the CSV file and then run it through your other tools.

Matthew Liu: Yes, absolutely. We are excited about this new feature, and we have received pretty good press from the blogosphere and from comments back on the YouTube blog where we can see people are finding it useful.

Eric Enge: Any comments you can make on plans to enhance the analytics further?

Matthew Liu: Insight had just two features when we launched a year ago, and now we have about six full-featured modules. So we are evolving very, very quickly. I can’t speak specifically to features that we are going to build next, but you can imagine there is a lot we can do with all the data that YouTube has. We display a lot of data, such as engagement within the site and how people are commenting on and rating videos. You can imagine that expanding over a number of dimensions.

Eric Enge: And now YouTube has become the #2 search engine on the web, so that really adds to the value of this data.

Matthew Liu: We are looking forward to helping people use the tool, because quite frankly we’ve been surprised about all the different use cases. Optimizing for search is a great way that people can enhance their experience on YouTube.

Eric Enge: Thanks for joining us today, Tracy!

Matthew Liu: Thanks for having me!

Eric Enge: Hi Matt! Can you give us an overview of your role with YouTube?

Matthew Liu: Hi Eric, my name is Matthew Liu. I am a Product Manager, working alongside Tracy and others on YouTube’s advertising platforms. I am working on one of our newest launches, Promoted Videos, which happened at the end of last year. We think of it as the equivalent of AdWords on YouTube, as it is a paid video search product.

Eric Enge: From an optimization point of view, the first thing you have to do is produce content that is interesting to people who end up discovering it on YouTube, which sort of goes without saying.

Matthew Liu: Yes, absolutely. I think we’ve always had the philosophy at YouTube, whether we’re talking to our users, content partners or advertisers, that whatever it is that you want to share should be good content. So when we speak to advertisers, we ask them to try to make their advertisements videos that people would want to watch anyway.

By using our advertising products, advertisers are able to put a little bit of gasoline on the fire and allow content to spread more quickly and potentially become viral. Similarly, our content partners and everyday users trying to get viewership should really think about what the community is looking for at a specific moment. And they should really try to tailor their video for the YouTube community, as opposed to simply taking content that might otherwise have run on television or some other medium.

Eric Enge: So I think there are a couple of key non-SEO things that people typically talk about. For example, advertising and allowing people to share your videos are good things to do. Also, making sure that the content in some way reinforces the brand, rather than just being entertainment without purpose, so to speak. Allowing ratings and selecting good thumbnails are good promotional strategies as well, right?

Matthew Liu: Yes, absolutely. You touched on a couple of those things, such as ratings, comments and also on embedding. One of the larger paradigms is that a lot of people put content on YouTube and they allow themselves to engage in conversation with the community. Sometimes we see our larger content partners or advertisers shy away from that, because they are afraid of what comments and what ratings they are going to get.

Accepting comments and ratings may feel a bit more risky, but it definitely offers you very valuable instant feedback. So if we are able to get a couple thousand views and see what the ratings are and what people’s comments are, it empowers you to make changes. And if you are getting positive feedback, not only is your video getting out there, but you are spurring positive conversation as well. So that’s definitely one thing we recommend.

Eric Enge: I guess it gets back to the old social media lesson, the conversation is going to take place with or without you.

Matthew Liu: Yes, that’s a perfect statement.

Eric Enge: The choice becomes very obvious once you think about it that way. So do you have any interesting case study examples of someone who used advertising as a way to really launch a successful video?

Matthew Liu: Yes. The first example involves OfficeMax, which is a large retail supplier of various office products. OfficeMax is a traditional brand advertiser that in most cases runs its own TV commercials, but they knew they wanted to do something a little bit edgier, with the potential to go viral.

They commissioned The Escape Pod to be their agency, because they wanted to do something much more creative. So they came up with an interesting series of videos, the Penny Pranks videos, for their Back to School campaign. These involved a funny-looking guy who would go to various places in New York City and try to pay for everything with pennies, and everyone would be outraged. He would try to buy a car with 200,000 pennies, or something similar to that.

They decided to use advertising to drive those initial views. They wanted to accelerate viewership and, as a byproduct, increase discoverability in organic search on YouTube. So they worked with us using Promoted Videos and some other paid media.

What they found was that they were able to get fairly efficient views, so they were very pleased with the price. They were able to get a ton of clicks, which drove a lot of traffic to their videos. And as a result, they started that viral loop. Over time, on the organic side, their videos became the top search result for some of their target queries.

OfficeMax became so embraced by the community that our search engine deemed their videos to be the most relevant for that time period. And they also saw additional lift on their other videos; not just on the videos that they promoted, from users watching and clicking on more from OfficeMax, but more views on the related videos as well.

They were very pleased, because they had a very successful campaign that they were able to conduct in a very efficient way. That’s one major example where you can think of brand advertisers trying to efficiently drive traffic to their online videos, engage in positive conversation and even potentially engage in that viral spreading of video.

The second example involves a producer of consumer gadgets and products. During the launch of Promoted Videos, they participated with us in producing a couple of videos that highlighted their iPhone 3G cases. The company is Zagg, and their product is called the Invisible Shield. It’s an invisible, scratch-resistant film that goes on the iPhone. You could take a key or a knife to it, and it will prevent your iPhone from being scratched.

So in the video they show two iPhones side-by-side, one with the cover and one without it, and they show the different results. When promoted against terms such as iPhone and iPod, the campaign was not only able to drive traffic to that video, but Zagg was able to convert the traffic into sales.

The amazing thing about it is that they were actually able to drive conversions at a lower cost than they would have been able to on Google and other competing search engines. One of the hypotheses we have is that for certain types of products, where the user may not be aware of exactly what the product is, being able to see it is far more compelling than just three lines of text.

Eric Enge: So what about the power of send to a friend, and other options for sharing?

Matthew Liu: There are a bunch of different sharing options, from sending to a friend, to embedding that video, to sharing on Facebook or MySpace, to even just copying and pasting the URL so you can go back to it later. So these all have various different positive benefits. I won’t go into the details as to which ones we found most successful, but I think there is a reason why we encourage video distribution through different means beyond just YouTube, whether it’s IM, Connections on YouTube or posting to third party sites. They definitely have a lot of positive values driving additional viewership and potentially even subscriptions. It just creates an overall deeper engagement.

Eric Enge: Let’s get into more basic SEO kinds of things. Standard advice in the industry places a lot of emphasis on category selection, titles and descriptions, and the use of tags. Can you talk about that a little bit?

Matthew Liu: If you pull up a YouTube watch page, you’ll see three main metadata fields that the user can fill in: the title, the description, and the tags, just as you mentioned. I think what a lot of people are missing when they use these three fields is comprehensiveness. A lot of times we see videos with very short titles, very short descriptions and somewhat erratic tags.

The first thing I would say is, if your video has subtopics or a subtitle, include them in the original title, and include all the details in the description. We offer a lot of space for you to type in all the details, and obviously we are indexing all those descriptions and tags, and they surface in both YouTube video search and Google video search. So it’s important that you have comprehensive data.

Secondly, we would say be consistent. A lot of videos we see have a good title and a good description, but then totally random tags. We actually do have measures that penalize this poor behavior. We recognize when videos are trying to spam, and that’s something we penalize. So be consistent with your title, description and tags. Make them clearly about that video, and don’t try putting unrelated keywords in any of those fields.

Another layer of video SEO is to make your video open. Allow it to be embedded and allow users to comment on it and rate it. We definitely do take user feedback as an additional ranking mechanism. This can hurt you if you end up getting a lot of negative ratings, but the positive benefit of getting higher ratings outweighs that risk.

Now let’s talk a little bit more about engaging with the user. You mentioned the thumbnail, which is probably one of the most basic things. Pick a thumbnail that is both representative of your video and engaging. Right now we give you three thumbnails taken from areas that we think are representative of your video, so any user who uploads a video should definitely take the time to pick the best one.

There are some positive benefits to higher quality videos. Users may or may not care as much about the quality of the video itself, but because we take the thumbnail from the video, a higher quality video will make for a higher quality thumbnail as well. And higher quality thumbnails are something that we definitely notice attract our users.

Eric Enge: Right. So you’ve got to care about the content and the quality of the thumbnail.

Matthew Liu: Absolutely. Then, going further along with engagement, we’ve launched features such as video annotations. These are becoming more and more powerful over time, as they are an additional way for you to communicate with your users. You are able to put speech bubbles or links to your other YouTube videos on top of a video.

Oftentimes, savvy users do very interesting video tours where videos link back to one another, or they even build games you can play by clicking on different annotations. It’s interesting how you can create an extended cycle of viewership through annotations.

Then, rather than just interacting with their own user base, successful users also interact with the rest of the YouTube community. What we’ve seen is that a lot of successful people cluster together. A lot of our top users have formed a community where they send video responses to each other, comment on each other’s videos and subscribe to each other. So we definitely encourage people who are trying to get increased viewership to engage back.

We don’t want to have people spamming or just randomly adding irrelevant videos as video responses, or comment spamming, and we definitely penalize videos that do these things, but when it is legitimate, posting video responses is a good way to network with other community members. Think of it almost as a message that you would get back on a social networking feed or a Twitter feed.

Just continue that dialogue with important members of the community. Oftentimes, if the original video gets traffic, then your video response may get additional traffic and help viewers discover you as a new source of quality videos.

Eric Enge: You get value by building relationships.

Matthew Liu: Yes, completely.

Eric Enge: Should people strive to avoid “stop words” in their titles? Similarly, should you include the word video in your title or description, so that if somebody searches on tech software video, for example, you have a better chance of coming up? Do those things make sense as well?

Matthew Liu: Yes, they do. Especially in the context of discovery from Google, because Google also indexes YouTube videos. Another thought that I forgot to mention is if your video was shot at a particular location or on a particular day, then you should also include some of that information in the video’s description.

Eric Enge: Another suggestion I’ve heard is to use adjectives such as happy or sad to pick up mood-based searches.

Matthew Liu: What I can tell you is that YouTube search and Google search are a bit different at times. It’s not true in all cases, but we have seen that some users tend to search in more generic terms. You’ll see users searching for very specific pieces of content, such as “CBS video” or “NBA video”. You will also see users searching for terms such as funny videos. What I would say is that video owners should target the very specific terms, and they should also potentially broaden out a little bit so that more generic queries appear in the description and the tags.

Eric Enge: What’s the best way to get a sense of the best keywords within the YouTube environment?

Matthew Liu: Great question. We don’t have anything to announce for now, but we are working on various keyword tools. We have a couple of very basic keyword tools as part of Promoted Videos right now, which allow you to look up similar keywords. The Insight tool that Tracy talked about also helps you understand the keywords that are already driving traffic to your video.

We are working on a couple of other similar projects where we’ll be able to have much more robust keyword suggestions in the near future. But in general, I would say use Insight and use the keyword tools that are already available in Promoted Videos, and those are probably going to be your best bet in the short term.

Eric Enge: I have also heard a suggestion that you go to the search tool and start entering a query, and that the search suggestions you get there may be in volume order from largest to smallest. Is that right?

Matthew Liu: I can’t comment specifically about that. Those are suggested queries that we think users might be searching for as they start typing certain letters.

I will add a caution that publishers should avoid keyword stuffing because it’s very easy for you to potentially broaden the scope for your video by adding a couple of keywords. But, it only takes one or two irrelevant keywords to trigger us to think that the video is trying to spam the system. Our penalties will outweigh the benefits you can get with keyword stuffing.

Eric Enge: But you did say earlier that it’s important to be comprehensive, which means that you should include all the keywords that are in fact relevant (without putting too many total keywords), correct?

Matthew Liu: Yes, there is definitely a balance you have to find. It’s actually more of an art than a science. Use keywords that are related, but don’t type in every letter in the alphabet. Just come up with the most important relevant keywords and add all those words into your description and tags.

Eric Enge: It’s got to be highly relevant and something that people can search to discover your video, and then have a good chance of being happy when they get there. At a minimum, they get relevant content, even if it is not exactly what they were looking for.

Matthew Liu: Yes, absolutely.

Eric Enge: Thanks a lot Matt!

Matthew Liu: Yes, thank you Eric!



About the Author

Eric Enge is the President of Stone Temple Consulting. Eric is also a founder in Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns.

Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at:

For more information on Web Marketing Services, contact us at:

Stone Temple Consulting
(508) 485-7751 (phone)
(603) 676-0378 (fax)


Google’s John Mueller Interviewed by Eric Enge


Published: May 11, 2009

John Mueller is currently a Webmaster Trends Analyst at Google Zurich. Prior to working at Google he became well known for his active participation in Google Groups and a variety of SEO forums.

Interview Transcript

Eric Enge: Can you provide me with your definition of cloaking?

John Mueller: The standard definition of cloaking is to show Googlebot something different than you would show your users. So, in a worst case situation, you would show Googlebot a nice family-friendly homepage, and when a user comes to visit that page, they would see something completely different.

Eric Enge: Like porn or casino ads or something of that nature?

John Mueller: Exactly. So if the user was searching for something and finds what he thinks is a good result, he clicks on it, and then there is nothing even related to what he was searching for on that page.

Eric Enge: Right. So that’s clearly an extreme form of cloaking. There are many different levels of cloaking, and I’d like to explore some of those.

Some people, for example, may have a content management system that just insists on appending session IDs or superfluous parameters on the URLs. They may not be superfluous from the CMS’ point of view because they are using the parameters to pull information from a database or something like that. And given the content management systems that they have, it’s actually very difficult and very expensive to fix this problem at its core. So one solution would be to serve the same content to users and to Googlebot, but to modify the URL seen by Googlebot to remove the superfluous parameters and the session IDs.

John Mueller: That’s something that we’ve seen a lot of in the past. We currently have a great new tool that can really help fix that problem without doing any redirects or really changing much at all, and that’s the rel="canonical" link element. You can place it in the head of your pages and specify the canonical URL that you would like to have indexed. So you can take away all the session ID parameters or anything else that you don’t need, and just specify the one URL that you want to have indexed.
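To make that concrete, a page that is reachable under several parameterized URLs might declare its preferred URL like this (the domain and parameter names here are hypothetical):

```html
<!-- In the <head> of http://www.example.com/product?id=42&sessionid=a1b2c3 -->
<link rel="canonical" href="http://www.example.com/product?id=42" />
```

Search engines can then consolidate the duplicate variants onto the one URL named in the href.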

Eric Enge: Right. And that’s something that you announced with the other search engines just a few weeks ago, correct?

John Mueller: Yes, it’s fairly new. Not a lot of people have implemented it yet, but those who have are already using it to clean up this problem. Crawling a website and finding many duplicate versions of the same content under different URL parameters, such as session IDs, can confuse search engines. Using this link element helps make things a bit clearer and can help resolve the problem.

Eric Enge: So you basically implement the canonical tag on various pages to tell search engines what the canonical URL is. If, for example, somebody has different sort orders for their products in an e-commerce catalogue (e.g. by price, brand, size, color), you can point Googlebot back to the canonical version of the URL. It’s supposed to behave much the same way a 301 redirect would, except that it does not actually take the user to the specified URL. Is that a fair summary?

John Mueller: Yes. It’s not a command that you would give Googlebot; it’s more like a hint that you would give us. One thing we’ve also seen is that people try to use it, but use it incorrectly. For instance, they specify their homepage as the canonical for the whole site. If we were to follow that as a 301 redirect, we might completely remove their website. So we have to take that information and determine whether it is really a canonical for the other URL, or whether the user may be doing something incorrect.

Eric Enge: And of course one way you could do that is by making sure the content on the two pages is identical.

John Mueller: Yes.

Eric Enge: So if you make a mistake and use the canonical tag to send everyone to the home page of your site, presumably the content will differ from that of the other pages. And, as I understand it, the gold-standard solution is to fix the problem at its core and not have to rely on the canonical tag.

John Mueller: If you can move to cookie-based session tracking, then that would really help. But we know it’s not always easy to change to a system like that. There might be a lot of money involved. So at least with this element there is a fairly simple way to fix that problem.

Eric Enge: Right. So it’s the backup plan that should be used if you can’t fix it at its core or if it’s just too expensive to fix it at its core?

John Mueller: Exactly.

Eric Enge: Yes, that makes sense. Now I imagine there are also people out there who served a different URL to Googlebot than to users before the canonical tag existed. Is that problematic?

John Mueller: I would suggest doing that for all new users who come to the site without cookies, instead of just for Googlebot. This way, if a user accesses an old URL that has a session ID, you can just redirect him to the proper canonical. That would treat users and search engines in the same way, and it would still help solve this problem.

Sites that are currently showing prettier URLs to Googlebot should not panic, as long as their intent is genuine and it is properly implemented. But I’d advise against this for sites that are in the process of a redesign or sites that are being newly created. Using rel=”canonical” is the current best practice for tackling this problem.

Eric Enge: But if the system is relying on the session IDs, then it’s there for a reason, right?

John Mueller: Yes, but usually most CMSs resort to session IDs only if they can’t set a cookie. So if you see that a user doesn’t have a cookie, you can redirect them away from the session ID. I think the important thing here is that you find a way to treat users and search engines the same.
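A minimal sketch of that idea in Python: compute the clean URL by stripping session parameters, then 301-redirect any cookie-less visitor whose requested URL differs from it. The parameter names below are assumptions; real CMSs use different ones.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; adjust for your CMS.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonical_url(url):
    """Return the URL with session-tracking parameters stripped."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(canonical_url("http://example.com/p?id=3&PHPSESSID=a1b2"))
# → http://example.com/p?id=3
```

Because Googlebot does not send cookies, it gets exactly the same redirect as any other cookie-less client, which is the equal treatment Mueller describes.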

Eric Enge: Right. You could use JavaScript to append your various tracking parameters to the URL upon the click. So that, in principle, is treating users and Googlebot the same.

John Mueller: Yes, but that doesn’t really solve the problem on its own, because search engines don’t execute the JavaScript when they crawl a site. So the site would have to work both with and without JavaScript enabled.

Eric Enge: Right. So users that don’t have JavaScript would of course be handled in an identical fashion to the search engine robots, and users who do have JavaScript would be able to benefit from whatever the tracking parameters are meant to give them.

John Mueller: Exactly. That’s similar to using AJAX on a website. If you have a normal HTML website and you start adding AJAX components to it, a user with a limited browser, maybe on a mobile phone, or even a search engine crawler, would still be able to navigate your site using standard HTML. But someone who has JavaScript enabled would be able to use all those fancy AJAX elements, and that would usually generate slightly different URLs, so I think that’s completely normal.
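As an illustration of that progressive-enhancement pattern (the element id and helper function are hypothetical), the crawlable markup stays a plain link, and script-enabled browsers get the enhanced behavior:

```html
<!-- Works for crawlers and limited browsers: an ordinary link. -->
<a href="/catalog?page=2" id="next-page">Next page</a>

<script>
  // With JavaScript enabled, intercept the click and load the next
  // page in place instead of navigating to the URL.
  document.getElementById("next-page").onclick = function () {
    loadPageViaAjax(this.href);  // hypothetical AJAX helper
    return false;
  };
</script>
```

Without JavaScript, the link simply navigates, so users and crawlers see equivalent content either way.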

Eric Enge: Right. So, let’s talk a bit about A/B or multivariate testing, which is something supported by Google’s Website Optimizer product. It creates a scenario where users come to a page and some piece of code, usually implemented in JavaScript, decides which version of the page to show them. And of course Googlebot will only see one version; it won’t see the alternate versions.

John Mueller: Exactly. So, the key here is that the intent matters, as is generally the case with Google. If the intent is really that the webmaster wants to test various versions of the same content, then that’s no problem. But if the intent is to show the user something completely different, then that would be on the border. You would have to look at that.

Eric Enge: I mean, you can always take any technique that was created with good intentions and find ways to abuse it. So let’s say somebody is testing out four different versions of a key landing page on their site to see which performs best for them. Maybe they are changing the logos and moving elements around, they might be changing the messaging a bit to see if one tagline is more effective than another, or they may be changing the call to action.

John Mueller: If you are doing that with good intent, to find the best solution for your users, and you are showing more or less the same content, then I wouldn’t really worry about it.

Eric Enge: Say you have a graphic of some sort, an image file on your site that might be a menu link or a logo. And there are various techniques for showing the search engine’s robot, or any specific user agent, text instead of the graphic. What are your general thoughts in that area?

John Mueller: Generally speaking, if you do that with good intent and you more or less match the content up, then it’s fine. So, for example, you could have a menu where you use JavaScript and graphics to create a really nice user experience, with an alternate version in static HTML behind the graphic menu. If it matches up, that’s fine. And if the home link has an alt text attribute, then that’s fine too. But if you have a home link with alt text that says, “click here to see our great cleaning products available in these 22 cities,” then that’s kind of sneaky, and not something that we would like to see.

Eric Enge: So, there are various grades of this, correct? One level is where the text matches up a hundred percent with what is in the image. Then there is a notion of substantially similar, and you could actually define several more grades: somewhat similar, and then completely different. I think you just highlighted an example that’s completely different. An identical match is an easy case; I think you already addressed that. What if something is substantially similar, but not word-for-word identical?

John Mueller: I would say it depends on the case, but if you are not trying to deceive the search engine crawler or the user, then it’s generally okay. In general, though, I would be cautious as soon as the content is not identical. So if you have a link that goes to your homepage and it has a graphic of a house, then you wouldn’t have to use “house” as the alt text. You could just say “go to homepage,” or something like that, and it’s fine.

Eric Enge: So again it gets back to the notion of intent that you’ve already raised?

John Mueller: Exactly.

Eric Enge: And, of course, one flavor of this is sIFR, which stands for Scalable Inman Flash Replacement. sIFR takes the text that is in your HTML and renders it in Flash, so the two versions are guaranteed to be identical.

John Mueller: Exactly. Where we start to see problems is when a website has a completely Flash-based interface and a lot of different pages all hidden behind the same URL. Then it would be hard to include ten pages of HTML on a single page that match exactly what is written in the Flash file. So you have to find a solution for yourself there: decide how much really makes sense, and how much you might have to cut back, leaving just the basics in HTML and keeping the bulk of your content in Flash.

Eric Enge: Right. And of course when you get to that scale, you are past what you can do with sIFR, which is a more limited technology really intended for putting anti-aliased fonts on your page. But once you get into the more complex situations, you can use SWFObject, correct?

John Mueller: Yes, it would be something like that.

Eric Enge: That technology doesn’t guarantee that the alternate version shown in text is identical to what is in Flash.

John Mueller: Exactly.

Eric Enge: So it is open for potential abuse, but I would imagine that the policy again gets back to what you actually do and what your intent is in doing it.

John Mueller: Yes. And there are two other things that also play a role in that. The first factor is that we have started crawling and indexing Flash files. If you have a lot of content in your Flash file, we will try to at least get to that and include it in our search results.

The second is that there are still a lot of devices out there that can’t use Flash. So if you have a website that relies on Flash and you suddenly notice that there are a bunch of mobile users who are trying to use their iPod, iPhone or Android phone to access your website, then you would start seeing problems because they wouldn’t see the Flash content at all. And if the HTML content doesn’t match up with what you are trying to bring across to the user, they will simply leave the site.

Eric Enge: One grade of this problem occurs when you implement something in Flash, but not with the intent of rendering the same thing that you could easily render in HTML. You are probably using it because you want to create a highly graphical experience. It is not always the case of course, but certainly one of the things that’s appealing about Flash is that you can create a really attractive visual experience. Say you have a man driving a fast car on the German autobahn; the Flash isn’t going to narrate the course of the drive.

But in your text rendering of what is in the Flash, you would want to describe what is happening. For example, “it’s a nice day and a man gets into his expensive car and heads out onto the Autobahn”. So you are actually implementing text that isn’t in Flash, but the content essentially is.

John Mueller: Yes, that’s generally fine. If the intent is okay and it matches up so you can see that there is a car and a man driving on the autobahn, then that would be fine.

Eric Enge: So again, it is about making sure that you are pretty much rendering the same information, so that there isn’t anything confusing in the user experience when you flip from one mode to another, with Flash, JavaScript or AJAX enabled or disabled, so to speak.

John Mueller: Yes. Think about it from a user-experience point of view: if the user sees the HTML content in the search results and clicks on that page, does that match up with what he would be expecting?

Eric Enge: So what about serving different content based on an IP address to address things like language and national or even regional issues? Just to think of a regional issue, the products that your customer base in Florida buys could be quite different from the products your customer base in Minnesota buys. So you want to serve the Florida user one set of offerings and the Minnesota user a different set of offerings.

John Mueller: That is something that I see a lot as a European user, because in Switzerland we have four official languages, and as soon as you start using a website, it automatically tries to pick the language it thinks is right. It is wrong most of the time, and it is something that really bothers me a lot. So I guess I might be a little bit emotional about that.

One thing that I have noticed is that you have to differentiate between whether or not your content is really limited to a specific language or geographic location. For example, say you have a casino website that you can show to users in Germany and in France, but you can’t show it to users in the US. That’s kind of an extreme situation, but in a situation like that you would still have to treat Googlebot like any other user coming from that same location.

So if we crawl your website from the US, and your website recognizes us as an American visitor, then you should show us exactly the content that an American visitor would see. And it would be a little bit problematic if the website started blocking all American users because of legal reasons. So what you would do then is make a public website that everyone can access and then just link to your private website that has been limited to users in a specific region.

So, for example, you would have a general homepage that tells what your website does, gives some information and provides something that search engines can crawl and index. Then when users get to the right location they can click through to your actual content.

Eric Enge: So are you suggesting that if a user accesses that website from Germany, they come to some initial page and then they have to click further to get through to the page they are actually looking for?

John Mueller: Exactly.

Eric Enge: So it is not acceptable to just serve them the content directly?

John Mueller: Yes, that might cause problems when Googlebot visits. The other problem there is that IP location and language detection is often incorrect. Even at Google, we run into situations where we think an IP address is from Germany, so we would show German content, but in reality the user may be based in France. It is really hard to get that right, so if you try to do that automatically for the user, you are almost guaranteed to do something wrong at some point.

That leads to the other version of this problem, where users in the wrong location can still access your website. In a case like that, we would be able to crawl and index the website normally, but I recommend that you include elements on your website that help the user find the version of the website that they really want to use.

The important thing there is that you use different URLs for the different locations or different languages so that we would be able to crawl all of the specific content. So when I go to Amazon from Germany, for example, I have a little banner on top that says “Hey, don’t you want to go to Amazon Germany? We are much closer; we have free shipping”. That way, the search engine can still see all the content, but users still find their way to the right website.

Eric Enge: So this of course is a little bit different than the scenario where you implement a website on country-specific domains, a .de, or .fr, or .com, where you really are creating versions that are meant to be indexed in the local versions of the search engines?

John Mueller: Exactly, yes.

Eric Enge: So that’s a different scenario that someone could use if they wanted to.

John Mueller: I think the key point is whether or not users are allowed to access the wrong version of the website, or if there is a legal reason why it is blocked completely.

Eric Enge: So if the legal reason isn’t there, and you just want to set the default language that a German user sees, and you are willing to accept the fact that you are right about 90% of the time and wrong about 10% of the time, then users can click the French link if they are really from France?

John Mueller: Yes. I think the important part, especially with languages, is that you really provide separate URLs so that Google can crawl all language versions. This way you also don’t have to do language detection on your site. The user will search for something on the German or French version of Google, and we will show the German or French pages appropriately.

Eric Enge: So they end up in the right place through that mechanism?

John Mueller: Yes. And you don’t even have to do anything on your side. Maybe if you have a homepage you could show a little drop-down and let the user choose. Or you could have it pre-populated with the determined location by default, but you are still giving the user a choice between the different language versions. You give the search engine a choice and we will try to send the users directly to the right version.

Eric Enge: What are your thoughts on serving up different content based on cookies, such as explicit or inferred user preferences?

John Mueller: I think the general idea there is also to make sure that you are not trying to do anything deceptive. Say, for example, you have a website where you show general information to a normal unregistered user. If you show that same general information to Googlebot, that is fine, even though a logged-in user would find more information when he accesses the same URL. So if you make sure that it matches up with what a user would see, then that’s generally not a problem.

Eric Enge: And since we are talking about cookies, presumably we are talking about a user who has been at the site before. So if they come back, their expectations may be for somewhat of an enhanced experience based on their interactions.

John Mueller: Exactly. So if you have it set up in a way that logged-in users or users who have preferences get to see more detailed content, then that’s fine in general. But if you have it set up so that users who are logged in see less content or completely different content, then that would be problematic.

Eric Enge: Right. Can you give us an overview of First Click Free and what its purpose is?

John Mueller: We started First Click Free for Google News so that publishers could provide a way to bring premium content to their users. For example, if you have a subscription-based model for your website, you could still include those articles in the Google News search results, and a user who goes to those articles would still be able to read them normally. But as soon as they try to access more on your website, they would see a registration banner, for example.

Now, we have extended that to all websites, because we know not everyone can be accepted into Google News; it is kind of a special community. So if you have some kind of subscription or premium content, you can show that to Googlebot and to users who come in through search results. But as soon as something else is accessed on that site, you are free to show a registration banner so that users who are really interested in this content have a way to sign up and actually see it.

Eric Enge: So the idea here is you have subscription-based content and Google wants to make its users aware that that content exists.

John Mueller: Exactly.

Eric Enge: So the user goes to Google, they see the article, they decide to go read it, the site implementing First Click Free checks the referrer and makes sure it is from Google, in which case they show the full article including all pages of a multi-page article, not just the first page?

John Mueller: Yes.

Eric Enge: And then the user potentially gets the registration banner or a subscription box when they go on to a different article?

John Mueller: Exactly.

Eric Enge: Now, can a user just go back to Google and search on something and try to find that same article somewhere else in the search results?

John Mueller: Theoretically, yes. That would be possible, but we found that most users don’t do that. It is more work that way, and if it is content they are really interested in, they will figure out a way to access it normally. When you like the content, you might buy a subscription and say, “Okay, this is a good website. I want to come back and read more of this content. It is fine if I just pay a small amount for it”.

Eric Enge: I would imagine that for most subscription-based sites that it is an effective program to expose their content and increase their subscriptions.

John Mueller: Yes, absolutely.

Eric Enge: Exposure is really good. To do this, you basically bypass the login screen when Googlebot comes to the site and give it access to all the content that you want indexed.

John Mueller: Exactly, yes. I would expect that you could probably do the same for other search engines. You might want to check with them, but I think that is generally acceptable if the user sees the same content as the search engine crawler would see.

One thing that I have noticed when I talk to people about this is that they are kind of unsure how they would actually implement it and if it would really make a difference in their subscription numbers. It is generally fine to run a test and take a thousand articles and make them available for First Click Free, make them available for Googlebot to crawl and make them available for users to click on.

You can leave the rest of your articles blocked completely from Googlebot and from users. Feel free to just run a test and see if it is going to make a difference or not. If you notice it is helping your subscriptions after a month or so, then you can consider adding more and more content to your First Click Free content.

Eric Enge: Right. You can take it in stages. Are there other questions on these topics that you hear from people at conferences or out on the boards?

John Mueller: Another thing about cloaking is that we sometimes run into situations where a website is accidentally cloaking to Googlebot. That happens, for example, with some websites that throw an error when they see a Googlebot user agent. It is something that can happen with Microsoft IIS websites, for example, and that would technically also be cloaking. But in a case like that, you are really shooting yourself in the foot, because Googlebot keeps seeing all these errors and can’t index your content.
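One way to test for this kind of accidental cloaking is to fetch the same URL with a normal user agent and again with Googlebot’s user-agent string, then compare the responses. A rough standard-library sketch; the browser user-agent string and the comparison are illustrative.

```python
# Fetch the same URL with two different User-Agent headers and flag a
# mismatch, e.g. a server that throws an error only for Googlebot.
from urllib.request import Request, urlopen

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +"
BROWSER_UA = "Mozilla/5.0"

def fetch(url, user_agent):
    """Return (status, body) for the URL as seen with this user agent."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        return resp.status, resp.read()

def looks_like_cloaking(normal, googlebot):
    """Compare (status, body) pairs from the two fetches."""
    return normal[0] != googlebot[0] or normal[1] != googlebot[1]
```

If the two fetches differ in status or content, the server is treating Googlebot differently, whether or not anyone intended that.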

One thing that you can do to see if that is happening is to access your website without JavaScript, with the Googlebot user agent, and see what happens. If you can still access your website, then that’s generally okay.

Another problem that sometimes comes up with language detection is that a website will use the same URLs for all languages and just change the content based on user or browser preferences.

The problem here is that Googlebot will only find one language, and we will just crawl the whole website in that one language. So, for example, we have seen cases where Googlebot was accidentally recognized as a German-based user, and we re-crawled the whole website in German, and suddenly all the search results were only showing up for German users.

Eric Enge: So people in the UK couldn’t see the UK-English version of the site, because the Googlebot wasn’t aware the content was there?

John Mueller: Users in the UK would be able to see that content, but since the Googlebot was recognized as a German user, it was seeing the content in German only. In this case, the old pages would be re-indexed in German, so if someone was searching for an English term, they wouldn’t even find that site anymore.

The lesson here is to really make sure you have separate URLs for your content in different languages and locations.

Eric Enge: Right, for purposes of this example, we are assuming that the content is identical but translated.

John Mueller: Exactly.

Eric Enge: And, you want to have separate pages for the different language versions of the content.

John Mueller: Exactly.

Eric Enge: Excellent, thanks John!

John Mueller: Excellent, thank you!

Have comments or want to discuss? You can comment on the John Mueller interview here.


About the Author

Eric Enge is the President of Stone Temple Consulting. Eric is also a founder in Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns.

Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at:

For more information on Web Marketing Services, contact us at:

Stone Temple Consulting
(508) 485-7751 (phone)
(603) 676-0378 (fax)

May 15th 2009 News


Underdogs

My latest New Yorker piece, on how David beats Goliath, is here. 

I’ve been very pleased with the reaction. I did want to respond, though, to a number of comments that have been made about the parts of the piece dealing with Rick Pitino and college basketball. (Nothing is quite as fun as arguing about sports.)

Since most of the commenters make the same arguments, I’m going to pick a post by Ben Mathis-Lilley, over at New York magazine’s blog. He writes, in part:

The truth is that almost every team tries to make its opponents work for all 94 feet in some fashion, and not every underdog is born to run a full-court press. For example, take a team of mediocre players plus two pretty good athletes — one a tiny but quick guard, the other a big man who’s strong but slow on his feet. If that team ran a full-court press, the opposition would exploit the big guy by sending the player he guards sprinting down the floor on a fast break, while the small guard would be wasted guarding someone who probably doesn’t have possession, since the standard reaction to a press is to pass the ball around. A better strategy would be for the quick guard to pressure the opposition’s ball handler while the other players retreat, giving the big guy time to lurk near the basket and shot-block.

The first sentence–that almost every team makes its opponents work for all 94 feet–is, of course, nonsense. But the rest of the paragraph makes perfect sense. The press is not for everyone. But then the piece never claimed that it was. I simply pointed out that insurgent strategies (substituting effort for ability and challenging conventions) represent one of David’s only chances of competing successfully against Goliath, so it’s surprising that more underdogs don’t use them. The data on underdogs in war is quite compelling in this regard. But it’s also true on the basketball court.  The press isn’t perfect. But given its track record, surely it is under-utilized. Isn’t that strange?

The New York magazine piece then goes on:

The most misleading part of Gladwell’s case concerns Rick Pitino, the Kentucky coach who was famously defeated on a last-second play by Duke in the 1992 NCAA tournament when he decided not to guard Grant Hill, who was inbounding the ball (ignoring the inbounder is a key component of the press).

Hmmm. Small point. Ignoring the inbounder is not a key component of the press. It is a key component of some versions of the press. Pitino also uses a version of the press that does guard the inbounder. (Also, Pitino is no longer the coach at Kentucky. He’s now the Louisville coach.) The piece then objects to my attempt to “shoehorn Pitino’s teams into the underdog category” because Pitino’s 1996 Kentucky team “featured a staggering nine players who would go on to play in the NBA.” A number of others have pointed this out, and I’m still somewhat baffled by the criticism.

Pitino has been a college head coach since 1978 at four schools: Boston University, Providence College, Kentucky, and the University of Louisville. At BU, he took over a team that had won 17 games in the two years before his arrival. He went 91-51 in five years and took the team to the NCAA tournament. At Providence, he took over a team that had gone 11-20 the year before. Two years later, he won 25 games and went to the Final Four with what may have been one of the most spectacularly untalented teams ever to reach that level. And at Louisville, he took his team to its first Final Four in 19 years in 2005. The star of that squad? Francisco Garcia. Ever heard of him? Exactly. Not to mention this year’s Louisville squad, which reached the Elite Eight with really only one NBA-caliber player. You can also make an argument (and Bill Simmons at ESPN does) that Pitino did an awful lot with very little while at the Boston Celtics, briefly, in 1998. Pitino’s Kentucky experience is an anomaly. And by the way, the nine players who got drafted into the NBA off that anomalous 1996 Kentucky squad consisted of eight journeymen and one marginal star, Antoine Walker. Pitino has had a fraction of the talent that his contemporaries at Kansas, Carolina, Duke or Connecticut have had.


May 13th 2009 News

Under New Management


Late in 2005 I had the notion that most people felt accessible web sites are boring and basically without merit offering only the most simplistic functionality and style. As long as that notion persisted, I thought, it would effectively hold back the masses from embracing the needs of all users, not just those they felt were important. I thought that I could perhaps alter that notion, dispelling the myth, by showcasing sites that are accessible yet still look and work great.

May 12th 2009 Uncategorized

Up Close Look at Eric Ward’s Link Building Desktop


I was reading Matt Cutts’ post the other day titled My 8.7M Pixel Display, and it hit me. There is no site devoted to showing the various desktop rigs for those of us who earn our livings in the SE/SEO-SEM/Link Building/Online Publicity field ( is available, btw). I’ve found a couple of other people, like Matt and Danny Sullivan, who’ve posted pictures and descriptions of their desktop setups.

May 5th 2009 News