Showing posts with label google. Show all posts

Saturday, January 27, 2007

Sitemaps and Blogger.com

Let's say you're fond of Google Webmaster Tools, which help you manage your site in relation to the world's best-known search engine. With your own site it's quite simple: you verify ownership with an uploaded file or a meta tag, see how Google sees your site, and help it index your pages by submitting a sitemap (which you either create yourself or produce with one of the available generators). I like it quite a lot; it seems to me that it speeds up the inclusion of my pages in Google's search index.

Then you start using Blogger.com for your domain and, being used to Google Webmaster Tools, you would like to use them for your blog too. You want to know how well Google is indexing it, and you don't quite trust Google to index the blog in any way other than with its web spider (which only follows the links it finds), even though as the operator of Blogger.com it knows all your pages, do you?

The first problem may be verifying your site. You can't upload a file, but you can use meta tag verification (there's a thorough description on this blog, but you probably don't need step-by-step instructions, otherwise you wouldn't be interested, would you?). The second problem is submitting the sitemap. You can't create a proper one, but at least you can use the Atom feed that Blogger provides. It doesn't list all your pages, but that doesn't matter much; the new ones are there to be indexed. Without knowing the old Blogger.com, though, it can be a bit confusing.

Originally Blogger.com used this kind of URL for your blog feed: http://en.zubicek.eu/atom.xml (yep, that would be mine). The new system introduced Cool URIs and a new feed URL (http://en.zubicek.eu/feeds/posts/default), which can't be added as a sitemap. Strangely, Google accepts a sitemap only for the directory it sits in and the directories below it (it's a little odd; the logic is probably that someone who can add files to a higher directory also controls its subdirectories, but not vice versa). So you won't be able to add this URL.

Fortunately, people at Google aren't stupid and thought about those who switch from the old Blogger to the new one. Anyone who had the feed in a reader would see it stop working after the change, so the old URL keeps working. This means you just add the old-style URL (for me http://en.zubicek.eu/atom.xml) as a sitemap in Webmaster Tools.
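The directory rule can be sketched in a few lines of Python. This is only my reading of the behaviour described above, not Google's actual check: the function names `sitemap_scope` and `covers` are made up for illustration, and the post URL is a hypothetical example of a page on the blog.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def sitemap_scope(sitemap_url):
    """Return the URL prefix (directory) that a sitemap at this location covers."""
    parts = urlsplit(sitemap_url)
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def covers(sitemap_url, page_url):
    """True if page_url lies in the sitemap's directory or below it."""
    return page_url.startswith(sitemap_scope(sitemap_url))


old_feed = "http://en.zubicek.eu/atom.xml"
new_feed = "http://en.zubicek.eu/feeds/posts/default"
post = "http://en.zubicek.eu/2007/01/some-post.html"  # hypothetical post URL

print(sitemap_scope(old_feed), covers(old_feed, post))
print(sitemap_scope(new_feed), covers(new_feed, post))
```

Under this assumption the old feed sits in the root directory and so covers the whole blog, while the new feed only covers /feeds/posts/, where no posts live.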

That's it. Now you can see how Google sees your blog and even help it index it.

Why I need keyword-based filtering for AdSense

Imagine a website about the game Half-Life 2. It's in Czech, and AdSense for Czech content has been available for only about two months, so there are not many advertisements and the bids are not very high.

As a Half-Life 2 website, it's full of articles containing words like "Valve" (the software publisher), "Source" (the 3D game engine) or "Steam" (the game distribution software). And, of course, lots of Czech text.

Now, when the AdSense servers try to match appropriate adverts to my site, they most probably choose the ones for the word with the highest bid. So I mostly get adverts for steam sterilizers, water controls or, in the best case, software development. And, of course, in English. I can't speak for all visitors, but I'm sure most of them aren't interested in those topics, and when they click on such ads, they do so by accident. So it's a lose-lose-lose situation for me (less income), Google (the same) and the advertisers (worse conversion).

Yes, it is possible to filter inappropriate ads by their URL, but it doesn't work well (too often I see an ad I blocked many days ago), and there are so many unwanted ads that I'm unable to filter them all out.

There are a few ways to get rid of them:

  1. Google fixes its algorithms so that they take into account not only the words but also their context (the most difficult and least probable option)
  2. Google fixes its algorithms so that ads in a different language get lower priority (maybe they do this already, but the priority for Czech ads is still too low)
  3. Google lets me choose which words aren't relevant to the content of my site

I hope it's obvious that the last one is the best. It can't be misused to get irrelevant but high-priced ads, and it would let me tell the algorithm, which doesn't truly understand human language, where the irregularities in my texts are. Fortunately, people at Google are preparing this feature, and it's even said to be in beta testing. So I hope it will be publicly available soon (or that I will be invited to the beta - do you hear me, someone from Google? ;))
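To show what I mean, here is a toy model in Python. It is not how AdSense really works (I have no idea about the real algorithm); it just assumes that the highest-bid ad whose keyword appears on the page wins. The function `pick_ad`, the ad list and all bid values are invented for illustration.

```python
def pick_ad(page_words, ads, excluded=()):
    """Return the highest-bid ad whose keyword is on the page and not excluded.

    ads is a list of (keyword, bid, title) tuples; returns None if nothing matches.
    """
    words = {w.lower() for w in page_words}
    blocked = {w.lower() for w in excluded}
    candidates = [
        ad for ad in ads
        if ad[0].lower() in words and ad[0].lower() not in blocked
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda ad: ad[1])


# Words from a page of my site; "hra" is Czech for "game".
page = ["Valve", "Source", "Steam", "hra"]

# Invented ads with invented bids.
ads = [
    ("steam", 0.50, "Steam sterilizers"),
    ("valve", 0.40, "Water controls"),
    ("hra", 0.05, "Czech game shop"),
]

print(pick_ad(page, ads))
print(pick_ad(page, ads, excluded=["steam", "valve", "source"]))
```

Without the exclusion list the English sterilizer ad wins on bid alone; once I mark "steam", "valve" and "source" as not relevant, the low-bid Czech ad is the only one left to match.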