Tuesday, May 26, 2009

MobileMe vs. SugarSync vs. DropBox

I have now tested MobileMe, SugarSync, and DropBox for quite a while to decide which service to buy for syncing my “electronic life” between my Macs (soon I’ll be managing two OSX Server blades, one Mini, and two MBPs!). After this period, there is no doubt in my mind: I’m syncing my iCal calendars and Address Book content via Google, my bookmarks with XMarks, and everything else via DropBox.

MobileMe’s iDisk is nothing more than a pain and a piece of junk, which I honestly did not expect. After all the problems they had last year launching Me.com, I thought they would have created a working service by now. But the iDisk and syncing my PIM (Personal Information Manager; I still use Yojimbo, as Evernote’s and Together’s handling of encryption are purely patched-on add-ons) was just a [bad] joke: You even need to buy extra software if you want to do file syncing, as iDisk’s “offline” sync is so slow and error-prone that I could not believe Apple dares to offer something like that. So you need to either use Lingon and rsync to sync to your online-mode iDisk, which doesn’t win a medal for simplicity, or buy something like ChronoSync, and that takes hours (!!!) to ensure 10 GB of data in about 30-40k files are synced, every time. And this even though ChronoSync may be the fastest and safest syncer in the wild! What finally drove me mad was the sync agent using 90% CPU all of the time, at times virtually locking you out of your own machine, while accomplishing almost nothing. Finally, if you ever try to navigate that online iDisk, get yourself a cup of tea: you will have plenty of time to drink it before that file is open…

DropBox, with its simplicity and real versioning of files, performs significantly better than SugarSync, especially when it comes to uploads and real volume, and if you have ever tried SugarSync, you know it is a resource hog (not as bad as iDisk, but it will stop your workflow). So in the end the choice for me was made by elimination: there is simply still no service that can hold a candle to DropBox, and I just got myself a Pro account. As I am writing this, I am syncing up dozens of gigs of data to my 50 GB DropBox, and I hardly notice it happening!

Oh, if you get yourself an account for DropBox, either the free 2 GB or a full Pro account, I would appreciate it if you registered via this link, as it earns me some 500 MB of extra space for referring you :o).

Thursday, April 30, 2009

News, Swines & Pigs

Usually, I prefer to steer clear of the day-to-day news reporting, yet even I have to accept a low level of "noise" if I want to know at least something about the most significant things going on. However, currently I get the overwhelming feeling that the whole news world is grunting and snorting like a pigsty. You guessed it, I am concerned about all this "swine flu" reporting going on. As can be easily demonstrated, this whole "pandemic alert" and panic-making has gone completely out of proportion. It is yet another example of how ridiculously news agencies exaggerate or even blur the facts, and this attitude might be much more lethal than most biological illnesses if measured by its indirect impact.

First off, some flu virus classification: There are three types of influenza virus: A, B, and C. B and C play only very minor roles as they mutate more slowly, are less virulent, and affect far fewer species. Usually, when talking about a flu virus, type A is implied. Type A influenza is then further categorized by the HxNx nomenclature. The H refers to the hemagglutinin (HA) lectin and the N to the neuraminidase (NA) glycoprotein. Both are found on the outside coat of the virus particle. HA mediates the virus' binding to target cells, while NA is responsible for the release of progeny virus from infected host cells. The numbers denote the antibody response to the virus, ordered by historic discovery - meaning, viruses with the same H/N number are identified by the same type of antibodies, which form part of your immune system/defense. HA and NA are essential for the virulence (the relative ability to cause disease) in terms of infectiveness and the epidemic capabilities of the strain. H1N1 denotes the class of flu virus HA/NA proteins that is most commonly found in human influenza. Our immune system defends us by recognizing mainly those two proteins (the "antigens") through our antibodies. This means that, from a pure immune-defense point of view, this kind of strain is the best known to the human body and immune system. This is one of the reasons why the new Hong Kong avian flu, with its H5N1 composition, is much more virulent than the current H1N1 swine flu or any other "regular" flu.

Endemic states and lethal properties: H1N1's deadliest appearance, and the worst pandemic in modern human history, was what we now know as the "Spanish Flu" of 1918. This specific influenza virus transformed its endemic properties (the ability to propagate within one kind of species, in this case birds) to a panzootic state (affecting animals including humans - epizootic would be the intermediate state that does not affect humans). Note that this pandemic occurred during WW I, which largely facilitated its spread. In general, any kind of virus capable of overcoming the species barrier is potentially more dangerous, as it is likely to carry genetic material and protein structures the newly infected species has never seen before, and is therefore much more virulent and lethal than the existing endemic strains. Indeed, the Spanish Flu killed somewhere around 20 million people (taking conservative, low estimates), and its symptoms were so strong it was often misdiagnosed as some much more severe infection. Apart from the HA/NA properties already mentioned, there is the actual RNA (viruses commonly use RNA instead of DNA to carry their genetic information) that a virus uses to encode its proteins and other functions that are important to the survival and impact of a virus. Our immune defense looks for RNA sequences that are different from any sequence found in our body and "destroys" those foreign sequences by cleaving the strands into non-functional pieces with the help of so-called RNases. For this to work, the immune system therefore has to identify the RNA [as foreign]. But a strand of RNA coming from another species might actually contain nucleotide base sequences our immune system does not recognize (because it only recognizes already known foreign sequences), making the virus much more lethal. This process of changing the RNA and protein configuration is amplified by two related mechanisms called "antigenic drift" and "antigenic shift".
In the case of the Spanish flu, the RNA was very "new" to the human immune system and had a very high mutation rate: how often the RNA sequence changes, which is roughly the meaning of the aforementioned antigenic drift and antigenic shift. In contrast, the new "Mexican" swine flu virus has a very similar RNA composition to regular virus strains found in humans, i.e. it does not appear to be significantly more lethal than any other flu. The only known difference to the regular flu circulating in humans is that it seems to affect younger people more than older people, possibly due to the fact that older people's immune systems already have "seen" a similar version of this virus some time ago and therefore have antibodies that recognize the HA/NA glycoproteins.

Now compare this swine flu against any regular flu by numbers: The regular flu kills about 250,000 to half a million people per year. This overhyped swine flu has managed to kill eight (8!) humans so far, and it has been confirmed to have infected about 150 people worldwide (WHO data, 29th of April, 2009). I.e., on a daily average, about a thousand times more people die from the regular flu than from this new strain. Regular flu is almost continuously spreading somewhere in the world, i.e., if the WHO took this into account, we would be living in a nearly constant influenza pandemic. The swine flu has just now made it to the last stage before even being defined as a WHO "pandemic": There must be a few known infections in at least two countries. Recall this definition and check the real numbers when somebody is talking about a new "pandemic". Infection with swine flu seems to be no more lethal than with any other flu, so your chances of dying from the swine flu outside of Mexico are so marginal it makes no sense to take them into account, while if you do go to Mexico, all you seriously need to do is make sure your current health state is good enough to survive any regular flu anywhere, which is much more prevalent and 1,000 times more likely to kill you by a global statistic. But the main point is: there is nothing dangerous or wrong about going to Mexico, at least concerning the flu. I would be much more worried about drug gangsters and hijackers there if I were you: They have managed to kill several thousand people this year alone already. In other words, the WHO rulings and suggestions, which are close to ridiculous given the circumstances, combined with the media hype are about to isolate Mexico from the world, which leads me to my final and most important point.
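The "about a thousand times" figure can be checked with a rough back-of-the-envelope sketch. It uses only the yearly death range quoted above and, as a labeled assumption, an upper bound of one swine flu death per day (this post later puts the daily toll "below one"); the variable names are made up for illustration.

```python
DAYS_PER_YEAR = 365

# Yearly deaths from regular flu, as quoted in the text.
seasonal_low, seasonal_high = 250_000, 500_000

daily_low = seasonal_low / DAYS_PER_YEAR
daily_high = seasonal_high / DAYS_PER_YEAR

# Assumption: at most one swine flu death per day (an upper bound, not WHO data).
swine_daily_upper_bound = 1.0

ratio_low = daily_low / swine_daily_upper_bound
ratio_high = daily_high / swine_daily_upper_bound

print(f"regular flu: {daily_low:.0f} to {daily_high:.0f} deaths per day")
print(f"ratio to swine flu: at least {ratio_low:.0f}x to {ratio_high:.0f}x")
```

Since the assumed swine flu figure is an upper bound, the true ratio would be at least this large, which is consistent with the order of magnitude claimed.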

In general, this "pig-hyped" flu reporting, without any review of its background, with no factual content, and exclusively based on beliefs and propaganda, has only one really worrisome influence: it is weakening and isolating Mexico, both socially and economically. Our ignorance of the real facts is estimated to cost Mexico City ("D.F.") alone around $88 million per day, and this figure will need huge updates for the crash that Mexico's main (legal) economic sector, tourism, will suffer, plus the costs incurred by the country as a whole. This number will be similar to, or more likely even exceed, the daily cost of the U.S. oil war in Iraq (estimated at about $250 million, in case you didn't know) - the only good news being that instead of about 100 deaths per day (direct, not counting the indirect toll, which is estimated to be around five times higher), the swine flu's daily death toll is still below one. In other words, the combined direct and indirect negative impact of this insubstantial media hype about a close to imaginary "Mexican Flu" will cost and destroy many more lives than the virus itself most likely ever would have been capable of. Media propaganda crusades against a country nowadays have the same socioeconomic impact as the largest "real" war in decades if measured by the daily cost. Keep this in mind the next time you read news about swine from pigs.

Monday, April 27, 2009

Why I love Python 3.0: Unicode + UTF-8

Sorry if this is a nerdy topic, but it might be more than useful for anybody intending to write programs that use more than the ASCII characters (A-Z, a-z, 0-9, and some symbols), which, given how i18n'ed most applications are today, is the norm rather than the exception. I also hope to encourage my fellow Pythoneers to update to 3.0 as soon as humanly possible, not only because of this change, but because of the general advantages of Python 3.0 (aka "nowhere near 3000"...).

In case you do not understand the difference between Unicode and String arrays, here is a short paragraph to get you started. A String (str in pre-3.0 Python, bytes/bytearray in Python 3.x+) is a byte array that only makes sense together with a specific character-lookup table (e.g. ASCII, Latin-1, UTF-8, etc.), which is needed to find the correct characters for that String. Note that this is not the glyph itself you see on-screen, as this depends on, e.g., what font you are using, and is handled by the GUI toolkit or the terminal. A Unicode array (unicode in pre-3.0, str in 3.x+), on the other hand, is an array of "universal" numbers, so-called code points, which the interpreter typically stores as two- or four-byte integers depending on the build, but it has no native byte representation. Therefore, to write a Unicode object out as bytes, you have to encode its code points using a codetable, such as ASCII or UTF-16, to the correct String representation ("bind the Unicode array to a code table"). Conversely, to create a Unicode array from a String array, you need to decode ("unbind") the String's coding to get the "universal" (in quotation marks, as not all programming languages store code points as 16-bit integers, i.e. two bytes) Unicode. If you are not used to thinking in these terms, a general tip for pre-3.0 Python: your program should, when handling String input (SAX parsers, for example, already do the conversion for you), convert it to Unicode (decode the Strings), and when outputting your Unicode arrays, convert them back to the desired String representation (encode them) - while working with Unicode internally to avoid bugs and possible exploits. A (rather stupid, but you can extrapolate the danger, I hope) snippet from Python's Unicode HOWTO might exemplify this:


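# Python 2 code; the flaw is intentional and discussed below.
def read_file(filename, encoding):
    # The check runs on the still-encoded byte string ...
    if '/' in filename:
        raise ValueError(u"'/' not allowed in filename")
    else:
        # ... and the name is only decoded afterwards.
        return open(filename.decode(encoding), 'r')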


Looks good at first, but what about sending that function a String in an unexpected encoding? For example, a valid UTF-7 encoding of u"/etc/passwd" is "+AC8-etc+AC8-passwd", which contains no '/' byte at all - a nasty mistake if that file is presented to a user... (the work-around in this trivial example is obvious: just decode before the if-clause - or, even better, when the string enters your program - and compare to u'/'). To summarize, in Python (not so in C, for example!) a Unicode array consists of code points (stored as 16-bit integers on the common builds), while Strings are arrays of bytes that are bound to a codetable, which helps the Python interpreter look up the bytes' character representations and send them to your terminal or GUI. Unicode to String conversion is called encoding ("binding"), String to Unicode conversion is decoding ("unbinding"). The fact that, when using the Python shell, you see "real" characters for a String or Unicode object is pure convenience and should not distract you from how they truly work internally.
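The work-around described above (decode first, then check) can be sketched in Python 3 syntax as follows. The helper name is_safe_filename is hypothetical and only serves the illustration; the byte string is the UTF-7 example from the text.

```python
def is_safe_filename(raw_name: bytes, encoding: str) -> bool:
    """Decode first, then validate the decoded text (hypothetical helper)."""
    name = raw_name.decode(encoding)
    return "/" not in name


raw = b"+AC8-etc+AC8-passwd"  # UTF-7 bytes from the example above

# Checking the raw bytes misses the slash hidden by the encoding ...
naive_check_passes = b"/" not in raw

# ... while checking after decoding catches it.
decoded = raw.decode("utf-7")
safe = is_safe_filename(raw, "utf-7")

print(naive_check_passes, decoded, safe)
```

The point is only the order of operations: validation has to happen on the decoded text, never on the encoded bytes.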

After this lengthy Unicode vs. String intro, the best news first: if you can allow yourself the luxury to program with any Python version and are not dependent on external libraries, Python 3.0 is just made for you: The new native String object is always a Unicode representation, and the default encoding chosen for representing your strings is UTF-8. In other words, if you use Python 3.0 and are happy with UTF-8, you no longer have to worry about decoding your (byte) strings to Unicode arrays or binding your Unicode code points to the right (byte) string representations. While this might seem like something that should have been done long ago, for historic reasons older programming languages (plus Python pre-3.0) use ASCII as the default encoding, meaning you had to look after de-/encoding the whole time when working with the input/output functionality of your programs and using languages other than English - and even there you might want to have special characters (don't be so naïve...). The sad side to this: what I am talking about here is standard in Java...
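A small sketch of what this looks like in practice, assuming a Python 3 interpreter. The word is the one from the paragraph above, written with an escape so the example does not depend on how this page itself is encoded.

```python
text = "na\u00efve"                 # 5 code points; \u00ef is the i with diaeresis

utf8_bytes = text.encode()          # no argument: str.encode defaults to UTF-8
latin1_bytes = text.encode("latin-1")

round_trip = utf8_bytes.decode("utf-8")

print(len(text), len(utf8_bytes), len(latin1_bytes))
```

The str object counts characters, while the byte length depends on the encoding chosen at the moment the text leaves the program.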

However, you no longer need to worry with 3.0: First, the totally useless old String object (str) has been removed (to be exact, it could be said it is now "integrated" into the bytes and bytearray objects), including the even more ridiculous "encode" method for old str objects: bytes and bytearray only support a "decode" method (to the new Unicode str objects). The other use of the old str.encode, transforming byte data that was represented as str objects in pre-3.0 with codecs like zip or base64, is no longer offered through a method on bytes; as far as I can tell, it now has to be done with the dedicated standard-library modules (zlib, base64, and so on), while text goes through encode on the new str object. Having str.encode in pre-3.0 was a dangerous duck-typing strategy: Unicode objects can and should have this method, too, but as you could not tell whether you were calling encode on a Unicode object or a String object (without something like writing

assert isinstance(my_obj, unicode)

before every call to encode, at least), you could have been decoding Unicode and encoding Strings - and because Python was (yes, was (!) - see below) as "nice" as to do auto-coercion for you, without very thorough tests such a bug could go unnoticed for a long time in pre-3.0. So, my praise to whoever was responsible for that decision!
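For the byte-to-byte transformations mentioned above (zip, base64), one possible Python 3 approach is to use the standard-library modules base64 and zlib directly. This is only a sketch of that option, with a made-up payload.

```python
import base64
import zlib

payload = "na\u00efve text".encode("utf-8")   # text -> bytes via str.encode

b64 = base64.b64encode(payload)               # bytes -> bytes
packed = zlib.compress(payload)               # bytes -> bytes

restored_b64 = base64.b64decode(b64)
restored_zip = zlib.decompress(packed)

# bytes offers decode only, str offers encode only.
bytes_has_encode = hasattr(payload, "encode")
str_has_decode = hasattr("", "decode")
```

With the two directions split between the two types, calling the wrong one fails immediately with an AttributeError instead of silently coercing.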

On the other hand, the unicode object is now the new str object, sans the even more useless and dangerous "decode" functionality: the new (Unicode) str object only supports str.encode (for cases where you want something other than UTF-8), while str.decode has finally been dropped from the Python Standard Library. Obviously, you might have a system that does not want UTF-8. For the source code itself, Python uses the "coding" declaration in the first lines of your program to decide how the file is to be read (UTF-8 is the default in 3.0). I.e., writing

# -*- coding: funny-arab-dialect -*-

will be enough if your source file is stored in some other encoding (provided Python ships a codec of that name), or you might want to set it back to ASCII (the default in pre-3.0) if you really need to ensure nothing other than good, old "7-bit" appears in your program text. Note that this declaration only covers the source file; for input and output, you pass the encoding you need when opening the stream or calling str.encode. On a side note: UTF-8 is compatible with ASCII, while UTF-16 is not; i.e., a pure ASCII byte string decoded using the UTF-8 codetable still gives the right characters, while trying this with UTF-16 does not - which is a good explanation for why we have still not moved to UTF-16 in general.
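The ASCII compatibility of UTF-8, and the lack of it in UTF-16, can be sketched like this (Python 3; the sample string is arbitrary, and "utf-16-le" is picked here only to avoid the byte-order mark that plain "utf-16" would prepend).

```python
ascii_bytes = "plain text".encode("ascii")

# Pure ASCII text produces identical bytes under UTF-8 ...
same_in_utf8 = ascii_bytes == "plain text".encode("utf-8")
decoded_as_utf8 = ascii_bytes.decode("utf-8")

# ... but not under UTF-16, which uses two bytes per character here.
utf16_bytes = "plain text".encode("utf-16-le")

print(same_in_utf8, len(ascii_bytes), len(utf16_bytes))
```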

Finally, the really dangerous auto-coercion of Python between Strings, Unicode representations, and Byte arrays is gone for good. Your message's argument types must now match the receiving object's type and comparisons between the different types always evaluate to false. This last change might sound drastic if seen from a purely rapid prototyping view, but everybody with some intent on not going crazy while programming will greatly appreciate this change. The bugs and exploits stemming from wrong (en/de-) coding, or, let's say, too much duck typing the str and unicode objects in pre-3.0 Python (yeah, I love to put the fault on somebody else...) are finally gone! Also, as all Strings are now represented as Unicode str objects, you no longer need to worry if, while comparing two str objects, they are using the same encoding - which was another fountain of bugs in pre-3.0 Python - as any String is internally managed as universal Unicode.
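A minimal sketch of this stricter behavior in Python 3, using throwaway values:

```python
text = "x"
data = b"x"

equal = (text == data)              # str and bytes never compare equal

try:
    text + data                     # mixing str and bytes is a TypeError
    mixing_failed = False
except TypeError:
    mixing_failed = True

explicit = text + data.decode("ascii")   # convert explicitly instead

print(equal, mixing_failed, explicit)
```

The explicit decode in the last step is exactly the conversion that pre-3.0 Python performed silently behind your back.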

What is left to say? These changes are dramatic (even if they should have been made already long ago with 2.0), and it will take a while until Python 3.0 will have replaced 2.7 (the final, upcoming stable 2.x release, which will warn you about code that will break with 3.0). But the message should be clear: the effort of converting your libraries to the next generation of Python is more than worth it, and the 2to3 converter should help if you had your encoding/decoding correct. If not, converting to 3.0 might help you uncover some nasty bugs you were not even aware of! Other reasons to "convert" would be:

  • no more longs, which are now ints and unlimited in size (think of what happened when reaching maxint before...),
  • iterators/views from most operations formerly returning lists (think: time used for creating and garbage collecting those temporary lists),
  • function annotations for metaclassing and advanced decorators,
  • nonlocal scope (similar to Lisp's lexical scope),
  • dictionary comprehensions ("{k: v for k, v in my_dict.items()}") and set literals ("my_set = {1, 2}"),
  • and tons of streamlining the syntax and Standard Library.
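A few of the items in this list, sketched in Python 3 with made-up names and values:

```python
big = 2 ** 100                       # ints grow as needed, no separate long type

prices = {"tea": 2, "coffee": 3}
keys_view = prices.keys()            # a view, not a freshly built list
prices["cocoa"] = 4                  # the view reflects later changes

doubled = {k: v * 2 for k, v in prices.items()}   # dictionary comprehension
unique = {1, 2, 2}                                # set literal


def make_counter():
    count = 0

    def increment():
        nonlocal count               # rebind the variable of the enclosing scope
        count += 1
        return count

    return increment


counter = make_counter()
first, second = counter(), counter()
```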

Exec Summary:

Python pre-3.0              Python 3.0 and later
str                         bytes/bytearray
str.encode                  (new) str.encode for text; codec modules (zlib, base64) for byte data
str.decode                  bytes.decode
unicode                     str
unicode.encode              str.encode
unicode.decode              n/a
str("x") == unicode("x")    b"x" != "x"


Wednesday, April 1, 2009

Pictures of our Argentina trip


Houses of la Boca II
Originally uploaded by fnl.es
Here are the pictures of Mayte's and my trip to Argentina. We went to Buenos Aires, Iguazu, and the La Rioja region, which boasts two national parks and several national reserves, a wonderful region. The photos are now hosted on Flickr, as I do not like that face-recognition stuff from Google coming up and want to keep my private data as decentralized as possible (email and this blog are far more than enough already). Hope you do not mind!

Unlock the Camps in Sri Lanka