Saturday, January 27, 2007

Sitemaps and Blogger.com

Let's say you're fond of Google Webmaster Tools, which help you manage your site in relation to the world's best-known search engine. When you're working with your own site, it's quite simple: you verify the site with a file or a meta tag, you see how Google sees your site, and you can help it index your site by submitting a sitemap (which you either create yourself or generate with one of the available generators). I like it quite a lot; it seems to me that it speeds up the process of getting my pages into Google's index.

Then you start using Blogger.com for your domain and, being used to Google Webmaster Tools, you would like to use it for your blog too. You would like to know how well Google is indexing it, and you don't trust that it indexes the blog any differently than with a web spider (following only the links the spider finds), although as the operator of Blogger.com it knows all your pages, doesn't it?

The first problem may be verifying your site. You can't upload a file, but you can use meta tag verification (there's a thorough description on this blog, but you probably don't need step-by-step instructions, otherwise you wouldn't be interested, would you?). The second problem is submitting the sitemap. You can't create a proper one, but at least you can use the Atom feed that Blogger provides. It doesn't list all your pages, but that doesn't matter much: the new ones are there to be indexed. Still, if you don't know the old Blogger.com, it can be a bit confusing.

Originally Blogger.com used this kind of URL for your blog feed: http://en.zubicek.eu/atom.xml (yep, that would be mine). But the new system introduced Cool URIs and a new feed URL (http://en.zubicek.eu/feeds/posts/default), which can't be added as a sitemap. Strangely, Google treats a sitemap as valid only for its own directory, not for the whole domain. (It's a little odd; the logic is probably that someone who can add files to a higher directory also controls its subdirectories, but not vice versa.) So you won't be able to add this URL.
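To make the directory rule concrete, here is a minimal sketch in Python of how I understand it: a sitemap may only list URLs that start with the sitemap's own directory. The function names and the post URL are my own inventions for illustration, and this is not Google's actual check.

```python
import posixpath
from urllib.parse import urlsplit


def sitemap_scope(sitemap_url):
    """Return the URL prefix that a sitemap at this location may cover."""
    parts = urlsplit(sitemap_url)
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def covers(sitemap_url, page_url):
    """True if page_url lies in or below the sitemap's own directory."""
    return page_url.startswith(sitemap_scope(sitemap_url))


# Hypothetical post URL, made up for this example.
post = "http://en.zubicek.eu/2007/01/some-post.html"

old_feed = "http://en.zubicek.eu/atom.xml"
new_feed = "http://en.zubicek.eu/feeds/posts/default"

print(sitemap_scope(old_feed), covers(old_feed, post))
print(sitemap_scope(new_feed), covers(new_feed, post))
```

The old feed sits in the root directory, so its scope is the whole blog. The new feed sits in /feeds/posts/, so under this rule it could only describe pages below that directory, and no blog post lives there.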

Fortunately, the people at Google aren't stupid and thought about those who switch from the old Blogger to the new one. If someone had your feed in their reader, it would stop working after this change, so the old URL keeps working. This means you just add the old-style URL (for me http://en.zubicek.eu/atom.xml) as a sitemap in Webmaster Tools.
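If you're curious which pages such a feed actually offers for indexing, a small Python sketch can list them. It assumes a standard Atom 1.0 feed; the sample feed and its blog.example.com URLs are invented for illustration, and a real Blogger feed carries many more elements.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Invented sample; a real feed would be downloaded from the blog.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>Second post</title>
    <link rel="replies" href="http://blog.example.com/feeds/2/comments"/>
    <link rel="alternate" type="text/html"
          href="http://blog.example.com/2007/01/second-post.html"/>
  </entry>
  <entry>
    <title>First post</title>
    <link rel="alternate" type="text/html"
          href="http://blog.example.com/2007/01/first-post.html"/>
  </entry>
</feed>"""


def feed_urls(feed_xml):
    """Return the page URL (rel="alternate" link) of every entry in the feed."""
    root = ET.fromstring(feed_xml)
    urls = []
    for entry in root.findall(f"{ATOM}entry"):
        for link in entry.findall(f"{ATOM}link"):
            # In Atom, a link without a rel attribute counts as "alternate".
            if link.get("rel", "alternate") == "alternate":
                urls.append(link.get("href"))
                break
    return urls


for url in feed_urls(SAMPLE_FEED):
    print(url)
```

Only the posts currently in the feed show up, which is exactly the limitation mentioned earlier: the newest pages are listed, the older ones are not.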

That's it. Now you can see how Google sees your blog and even help it index it.

Why do I need keyword based filtering for AdSense

Imagine a website about the game Half-Life 2. It's in Czech, and AdSense for Czech content has been available for only about two months, so there aren't many advertisements yet and the bids aren't very high.

As a Half-Life 2 website, it's full of articles containing words like "Valve" (the software publishing company), "Source" (the 3D game engine) or "Steam" (the software for game distribution). And of course lots of Czech text.

Now, when the AdSense servers try to match appropriate adverts to my site, they most probably choose the ones for the word with the highest bid. So I mostly get adverts for steam sterilizers, water valves or, in the best case, software development. And of course in English. I can't speak for all of the visitors, but I'm sure most of them aren't interested in those topics, and when they click on them, they do so by accident. So it's a lose-lose-lose situation for me (less income), Google (the same) and the advertisers (worse conversion).

Yes, it's possible to filter inappropriate ads by their URL, but first, it doesn't work well (too many times I see an ad I blocked many days ago), and second, there are so many unwanted ads that I'm unable to filter them all out.

There are a few ways to get rid of them:

  1. Google fixes its algorithms so they take into account not only the words but also their context (the most difficult and improbable option)
  2. Google fixes its algorithm so that ads in a different language have lower priority (maybe they do this already, but the priority for Czech ads is still too low)
  3. Google lets me choose which words aren't relevant to the content of my site

I hope it's obvious that the last one is the best. It can't be misused to get irrelevant but high-priced ads, and it would enable me to tell the algorithm, which doesn't truly understand human language, where the irregularities in my texts are. Fortunately, people at Google are preparing this feature, and it's even said to be in beta testing. So I hope it will be publicly available soon (or that I will be invited to the beta test - do you hear me, someone from Google? ;))
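To show what I mean, here is a toy model in Python. I have no idea how AdSense really picks ads, so everything here is an assumption: the `Ad` class, the `pick_ad` function, the keywords and the bids are all invented. The model simply takes the highest-bidding ad whose keyword appears on the page, unless I've marked that keyword as irrelevant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    title: str
    keyword: str
    bid: float  # invented numbers, in no particular currency


def pick_ad(page_words, ads, ignored_keywords=frozenset()):
    """Toy model: highest bid wins among ads whose keyword is on the page."""
    page = {word.lower() for word in page_words}
    ignored = {keyword.lower() for keyword in ignored_keywords}
    candidates = [
        ad for ad in ads
        if ad.keyword.lower() in page and ad.keyword.lower() not in ignored
    ]
    # With no candidate left there is simply no ad to show.
    return max(candidates, key=lambda ad: ad.bid, default=None)


ads = [
    Ad("Steam sterilizers", "steam", 1.20),
    Ad("Industrial water valves", "valve", 0.90),
    Ad("Czech game shop", "hra", 0.10),
]
page_words = ["Half-Life", "Valve", "Steam", "Source", "hra"]

print(pick_ad(page_words, ads))
print(pick_ad(page_words, ads, ignored_keywords={"steam", "valve"}))
```

Without the filter the sterilizers win because of the higher bid; once "steam" and "valve" are ignored, the low-bid but relevant Czech ad gets through.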

Thursday, January 18, 2007

Moving to a new place


View.
I've finally managed to move from the university dorm to a new place. Living at the dorm had become more and more restrictive, and my budget now allows for something better.
Who'll be able to figure out the address (it's not that difficult ;))?

Sunday, January 14, 2007

Blogger.com validation errors

Wow. I tried running my new blog through validation. I'm not the kind of person who gets mad about validation errors unless they break the page in Firefox or Opera, but still, 725 errors is a bit too much. I don't see why you would declare a standard W3C doctype when you're practically using your own, distantly related one.

And yes, I know this service is free. But making something free has never meant the right not to be criticised for mistakes, and using something for free doesn't strip me of my right to express my opinion.

If you're asking what the point of this post is, you can consider it a by-product of me forming my final opinion of this service. And since you're still reading this here, it's obvious that validation isn't the most important thing. Nothing more, nothing less.

Saturday, January 13, 2007

WordPress multilingual dilemma

Originally I intended to dedicate the first post on my new English blog to the way I made my WordPress blog bilingual. But since this post is on Blogger.com, something obviously went wrong.

As I've said before, I'd had the idea of starting to write in English for some time. Some posts just need a bigger audience (yes, I'm a bit of an egoist, and the vision of thousands of people from all over the world, compared to the few Czech visitors reading my site now, is an attractive one). So I was thinking about what I needed to start and how to implement it.

Also, I like the WordPress software. It seems to me a nicely written piece of software that is highly customizable. Unlike phpBB, for example, it has a fine plugin API which lets you use plugins without modifying the original software, and the templating system is based on PHP, so you don't have to learn another totally stupid system. Both make administration much easier; upgrading in particular is a matter of only a few minutes (have you ever tried to upgrade a modified phpBB? And you have to upgrade frequently, as it's very poorly written and full of security bugs). OK, the phpBB bashing stops NOW.

What I wanted was to be able to write some articles in Czech, some in English and some even in both languages. I knew I wouldn't be able (or even want) to translate everything. I wanted it to run on my domain and, in the best case, to use the same installation as my Czech blog: the same database, the same set of files, the same administration. So I tried to find a plugin to meet my needs. After all, it shouldn't be hard, there are so many plugins for WP, you know? Or not?

So I navigated to the translation and language plugins section of the WordPress website, quite sure I would find what I was looking for. Of the 14 plugins available, I could dismiss 8 at first glance. Most of them were automatic translation plugins (not only was I not looking for translation, but automatic translation is also more funny than useful and, moreover, not available for Czech). I also couldn't find any use for Finnish quotes or encoding conversions (I use UTF-8).

With not many left, it was easy to look at each one. A few were only for WP 1.5, so there was nothing to think about. Others (even one that calls itself "The most advanced multilingual plugin for WordPress of these days", which proves how relative the word "most" is) used cookies for language selection. Cookies are a very bad idea for localization, as they make it virtually impossible for search engines to index the individual language versions. Other problems included showing every article in every language, or showing all the languages together (very confusing for readers and search engines).
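The alternative to cookies is to put the language into the URL, so that every language version has its own address a search engine can crawl. A minimal Python sketch of the idea follows; the `/en/` path prefix scheme, the names and the blog.example.com URLs are my assumptions for illustration, not how any particular plugin does it.

```python
from urllib.parse import urlsplit

SUPPORTED_LANGUAGES = ("cs", "en")  # assumed set of languages
DEFAULT_LANGUAGE = "cs"             # assumed fallback


def language_from_url(url):
    """Pick the language from the first path segment, e.g. /en/some-post/."""
    path = urlsplit(url).path
    first_segment = path.strip("/").split("/", 1)[0]
    if first_segment in SUPPORTED_LANGUAGES:
        return first_segment
    return DEFAULT_LANGUAGE


print(language_from_url("http://blog.example.com/en/some-post/"))
print(language_from_url("http://blog.example.com/some-post/"))
```

Because the choice depends only on the URL, a crawler that sends no cookies still gets a stable English page at one address and a stable Czech page at another.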

In the end, there seems to be one plugin that would be OK and would meet my needs. Its name is Gengo. It supports translation of everything, you can use summaries in other languages, you can mark an article as a translation of another article, and the language selection is URL based.

Why the hell am I not using it? The main reason is that the demo page didn't work at the time I was examining it and I was too lazy (especially after being disappointed by all the other plugins) to try to find out more. There were also a few reported problems that kept me from trying it (MySQL 4.1 possibly required, and problems with WP 2.0.6). And then I read about the new Blogger.com for domains.

So, I'm finally here. I'll keep an eye on Gengo, and maybe some day I'll even switch to it. But for now I'll stick with Blogger and its large and more anonymous community ;).

(Yes, I know, what I'm writing about is basically how lazy I am).

first and hope not last...

It's been a while since I started my own blog in Czech. I don't post on a regular basis, and I don't feel like describing my everyday life to the whole (Czech-speaking) world. Mostly I use it as a notebook to help my poor memory or to vent my anger.

There are times when I would like to address a much larger audience, with things I think could be helpful for anyone. Hence the blog in English. And, of course, the much larger audience and the number of bloggers using Blogger.com make me feel more anonymous, so maybe I'll be able to post here more often than on my Czech blog.

Don't expect me to translate everything I put on my Czech site, I'm too lazy for this. Some things I will put on both blogs, others will go only to one of them.

Well, that would be all for the beginning.