Google Under European Fire.

The European Union is something of a mess. On one hand it plays the role of protector of monopolies, as with the approval of legislation that will unplug European citizens from the web in cases of piracy (there goes internet access as a fundamental right, down the drain). On the other hand it wants to protect citizens from monopolies.

Curiously, this apparent contradiction isn’t really a contradiction. The monopolies aren’t equal. In the first case Europe wants to protect industries that are whining about losing money and that, they say, are at risk of bankruptcy. In the other case the monopolies belong to companies that don’t complain, that innovate constantly and that have enough money in their bank accounts to save Ireland (or Portugal) several times over in this economic crisis.

So, if you’re doing well, making money and you don’t whine… the EU will investigate you, accuse you, and ask you for a bribe… (oops, fine you). If your company wants to keep doing business as it did 50 years ago, then the EU will ask its citizens to pay up whatever these old farts want.

Get your act together, EU!

For each action there’s a reaction…

In Physics this is true, and it probably is in many other things in life too…

Australia wanted to force a ban of infected computers from the networks, but suddenly, in a moment of lucidity, the government saw that this could backfire. The problem with infected computers is that the persistence of infections has nothing to do with the removal of single nodes but with the topology of the network as a whole. I fear that this type of measure is guided by other intentions. If a system that removes users’ computers from the network is in place and users are accustomed to it, won’t copyright agencies be the next ones to ask for this? (Well, they already passed the 3 strikes law.) Or what will happen when someone’s computer is publishing political views different from those of the government? Or could, for example, a rugby team forbid the opposing team’s computers from being online? You see where this could be heading.

The internet grew in a self-organised way (self-organised does not mean random). Measures of this type are constraints that affect the network. For now the negative feedback that these constraints impose is still minor compared to the positive feedback loops that drive the network to expand and grow, but one day they will be too much and might hurt the network to the point that the giant component breaks into smaller parts. Then all that these governments will have is a bunch of sticks that don’t really make a tree anymore.
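
To make the giant component idea concrete, here is a minimal sketch in Python. The toy network below is invented purely for illustration (a hub joined to three short chains) and the function name is hypothetical; it only shows that how much the largest connected piece shrinks depends on where in the topology a node is removed, not on the mere fact that one node was removed.

```python
from collections import deque


def largest_component_size(adjacency, removed=frozenset()):
    """Return the size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over the component that contains `start`
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


# Hypothetical toy network: node 0 is a hub that links three short chains
toy_network = {
    0: [1, 3, 5],
    1: [0, 2],
    2: [1],
    3: [0, 4],
    4: [3],
    5: [0, 6],
    6: [5],
}

print(largest_component_size(toy_network))               # whole network
print(largest_component_size(toy_network, removed={2}))  # a peripheral node removed
print(largest_component_size(toy_network, removed={0}))  # the hub removed
```

Removing the peripheral node barely changes the largest component, while removing the hub leaves only small disconnected pieces.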

The ways of the world are in some ways incomprehensible to politicians (not all, but the majority). Trying to rule on matters that are out of their control will end in failure of the rules or catastrophe for the system. Let’s hope that they stick to what they are best at (whatever that is).

Boilerplate: Article extraction from webpages

The amount of clutter text present on webpages makes the task of discovering what is important a pain. At the observatorium I’ve been using a simple Tag to Text ratio to try to extract the important sections of text from webpages. The results are good, but not great; the method is fast, and it works if one takes into consideration that noise exists and can’t be totally eliminated.
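
As a rough illustration of the ratio idea, here is a minimal sketch in Python. It is not the code running at the observatorium: the regular expression, the line-by-line granularity, the threshold of 20 and the sample page are all assumptions made for the example.

```python
import re

# Naive tag matcher; good enough for a sketch, not a real HTML parser
TAG_PATTERN = re.compile(r"<[^>]+>")


def text_to_tag_ratio(line):
    """Characters of visible text per tag on one line of HTML."""
    tags = len(TAG_PATTERN.findall(line))
    text = TAG_PATTERN.sub("", line).strip()
    return len(text) / max(tags, 1)  # avoid dividing by zero on tag-free lines


def extract_content(html, threshold=20.0):
    """Keep the text of lines whose ratio reaches the (assumed) threshold."""
    kept = []
    for line in html.splitlines():
        if text_to_tag_ratio(line) >= threshold:
            kept.append(TAG_PATTERN.sub("", line).strip())
    return kept


# Hypothetical page: a menu, one real paragraph and a row of share buttons
sample_page = """<div><a href="/">Home</a> <a href="/about">About</a></div>
<p>The amount of clutter text on webpages makes finding the important parts a pain.</p>
<div><span>Share</span> <span>Tweet</span></div>"""

for paragraph in extract_content(sample_page):
    print(paragraph)
```

Lines dense with markup (menus, buttons) score low and are dropped, while long runs of text wrapped in few tags survive. The noise mentioned above shows up whenever a short but meaningful line falls under the threshold.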

The other day I found another technique that I think might become my de facto standard for text extraction from webpages, as its first results are better than I expected. The algorithm is able to detect the meaningful sections of pages with high accuracy and also has the benefit of being truly fast.

It is derived from the paper “Boilerplate Detection using Shallow Text Features” by Christian Kohlschütter et al., which was presented at WSDM 2010, and there’s a Google Code repository available with the Java source and binaries to download.
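
To give a feel for what “shallow text features” can look like, the sketch below computes two cheap per-block numbers, word count and link density (the share of words inside links), using only the Python standard library. It is an illustration under my own assumptions, not the paper’s classifier and not the API of the Java library: the class name, the set of block tags and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class BlockFeatureParser(HTMLParser):
    """Collect (word_count, link_word_count) for each text block."""

    # Assumed set of tags that start or end a block of text
    BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._words = 0
        self._link_words = 0
        self._in_link = 0

    def flush(self):
        """Close the current block, keeping it only if it has words."""
        if self._words:
            self.blocks.append((self._words, self._link_words))
        self._words = 0
        self._link_words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link += 1
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        count = len(data.split())
        self._words += count
        if self._in_link:
            self._link_words += count


def block_features(html):
    """Return a list of (word_count, link_density) pairs, one per block."""
    parser = BlockFeatureParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text that no closing tag flushed
    return [(words, link_words / words) for words, link_words in parser.blocks]


# Hypothetical input: a navigation block followed by a paragraph
sample_html = (
    '<div><a href="/">Home</a> <a href="/x">About us</a></div>'
    '<p>one two three four five six <a href="#">seven</a> eight</p>'
)
print(block_features(sample_html))
```

A block made almost entirely of link text is a likely candidate for boilerplate, while a longer block with few links looks like content. Any rule or threshold applied on top of these numbers would be a further assumption.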

ZFS and Novell or not Novell?

Today I’m following with some interest two Linux-related stories:

  • The first one is that ZFS performance on Linux is not that great. Linux users (and Mac users, for what it’s worth) have been told about the super benefits of Sun’s ZFS file system for ages. Well… tests don’t show much. Personally I’m sticking with ext4.
  • The other story that hit the news today is that Novell is being sold for some gazillion dollars. Hm… We are witnessing a lot of cash moving around companies that have a big role in open source this year. What’s next? Canonical? Red Hat? The truth is that this purchase has a fishy side: part of Novell’s assets is being bought by a consortium put together by Microsoft. Taking into account that Novell and Microsoft have been best friends for some time now…

Microsoft Kinect side projects

Microsoft Kinect might become one of the most interesting projects to come out of Redmond. This bright future for the Kinect is probably due to the open source driver, which has already made Microsoft say that they welcome what people invent. (That’s new, for a change.) But let’s stop talking about politics and list some interesting, cool things people are doing with Kinect (in no particular order):

I’ll update this list with new projects and ideas. If you find one, please put it in the comments so I can check it and update this list. (more…)

Java in two versions… one of which is paid…

Oracle has finally started to show what it intends to do with Java, and naturally it isn’t good news. Oracle intends to have two versions of Java, a free one (as until now) and a paid one, and plans to merge its JRockit with the HotSpot it bought from Sun. Apparently there will be different performance levels of Java: those who pay will get more performance while the rest will drag along.

In my view this two-product strategy is only the first step towards Oracle, at some point, claiming some incompatibility/economic cost/etc. in order to abandon the free version… Hence it is urgent that more heavyweight members of the open source community take part in the process (IBM?), otherwise Oracle will end up locking Java into a market niche from which it will extort fortunes.

As for the programming language war, this could be an excellent time for other languages to emerge in university curricula… hm…

Can we haz python, plz?

WordPress cache problem

I’ve set up my WordPress so it does object caching. This improves performance but is a pain when you have to upgrade your system. After the latest WP 3.0 -> WP 3.0.1 upgrade via automatic upgrade I noticed that I was stuck at the upgrade page, which said that I didn’t need to upgrade my database and whose OK button redirected me to my front page. Well, this is all well and good, but I wanted to go into my admin section, not my front page. There was a cache problem in WordPress. After fiddling around, the only thing that managed to solve the problem was to delete the cache folder and rebuild it.

So, if you’re experiencing some post-upgrade WordPress problems and you’re using some sort of caching mechanism…

Delete the /cache folder at /wp-content
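
If you would rather script that step, here is a minimal sketch in Python. The function name and the way the WordPress root is passed in are hypothetical; the only thing taken from the post is the location, `cache` inside `wp-content`. Whether your caching plugin rebuilds the folder by itself afterwards is an assumption you should check for your own setup.

```python
import shutil
from pathlib import Path


def clear_wp_cache(wordpress_root):
    """Delete wp-content/cache under the given WordPress root.

    Returns True if a cache folder was found and removed, False otherwise.
    """
    cache_dir = Path(wordpress_root) / "wp-content" / "cache"
    if not cache_dir.is_dir():
        return False  # nothing to delete
    shutil.rmtree(cache_dir)  # removes the folder and everything inside it
    return True
```

Calling `clear_wp_cache("/path/to/your/wordpress")` (a placeholder path) removes the folder once and simply returns False on any later call.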

And while I was losing my time trying to figure out why I was locked out of my WordPress admin pages, the website kept running because I had the UNIVERSAL CACHING MECHANISM FOR WORDPRESS: static files

Google Wave: What went wrong?

Google Wave

It’s hard to understand what could have gone wrong with Google Wave. Now that the product is finally stable and ready to be used by the masses, nobody (or very few people) is interested in using it. From a new paradigm that would replace email to a project about to be shut down, what happened? (more…)