Monday, February 28, 2005


Mr. Ed published another article on his site. This one is called Wikiphilia and tries to present Wikis as an inadequate tool for most (all?) of the applications they have been put to. Or at least that is how I understood him. He writes controversial articles and (IMO) hasn't written anything particularly convincing since a couple of good earlier ones.

The "wikiphilia illness" goes too far as well. While I agree that Wikis (like any other tool) can be misused or used to procrastinate, I don't buy his general criticism. To me, it mostly sounds like a projection of his own (or made-up) phobias and behavioral models onto Wiki users. A quote:
Whilst anyone can add their opinion to a Wiki page, anyone can come along and remove it. Therefore it is vacuous to claim that a Wiki affords equal opportunity for all users to express their opinion, when it allows any user to suppress that opinion through simple deletion. To ensure that their contribution persists, a user has to watch over their content, re-inserting it after anyone else has deleted it. Some users resort to writing bots for this purpose.
I'd say it is a highly atypical (and unethical) pattern for a user to keep reinserting his own statements after the community has deleted them (for a reason, I suppose), let alone to write bots (sic!) for this purpose. Of course, a malicious user may do so, but that's a different matter.

I don't find it worthwhile to argue with the rest of the article; interested readers may read it themselves and draw their own conclusions.

Python Grimoire Wiki

Just stumbled upon an impressive Python Grimoire document. Wouldn't it be great to have this content available as a Wiki, to make it easier to maintain and to expand it into some kind of uber-knowledge base for Python programmers?

I've been looking for this kind of repository ever since I came to appreciate its analogue in the Tcl world. It's just great and helps tremendously.

Yes, I know about ActiveState's cookbook, but IMHO this is a different thing.

update: Fix formatting issues and add missing link
update: There is an effort-in-progress to convert Grimoire to a Wiki. See this.

Friday, February 25, 2005

The good, the bad and the ugly

disclaimer: I live in a developing country and work for overseas customers, so keep this in mind while reading the following. This post is not about advocating the offshore development model either.

A bit too often for my taste, I encounter on the Web the statement that "third-world" developers are inferior to those from the USA or, say, Germany. Just to illustrate my point, here is a quote from a recent thread on the JoS business forum:

I find (on average) that off-shore coders know one or two programming languages and how to implement things in them specifically. I find (on average) that on-shore coders [ that get disgusted with and leave RAC] are the kind that are multi-talented and flexible due to this.

I see little reason to fight this legend, and it's sort of natural to see this point of view emerge. It's convenient for those who resent this trend and gives them some kind of moral satisfaction. People in general are lazy, and our minds usually try to absorb ideas in a way that doesn't alter our existing picture of the world much. And so it sticks.

Plus, there are always plenty of people telling horror stories about how bad the source code produced by these "code monkeys" was. And if it's not bad, then it must have been stolen from some GPL product googled off the web, rendering the developers not simply unprofessional but practically unlawful.

And yes, there are indeed plenty of cases supporting the charge against the "code monkeys", so calling this a legend is probably a stretch. But a simplification it is. And yet, to throw in some anecdotal evidence, the worst code base I've dealt with was handed to us from "on-shore", and the best programmers I have had the luck to work with were sitting next to me.

To repeat, I'm not here to defend the offshore model in general. It is inefficient and currently viable only because of the peculiarities of the world economy and the drastic changes of the last 50 years. But people are just that: people. There are cultural, historical, economic and other differences, but they are minor and have little inherent impact on one's professional abilities. Of course, these differences do affect the opportunities one may have to realize oneself.


Thursday, February 24, 2005

Blog of the week

Added Mike Spille to my blogroll. A couple of standout posts: Don't Let Yourself Get Unitized and Prevayler: There really must be one born every minute. Amazing reading.

Unfortunately, his full-text feed is badly formatted, so I ended up subscribing to the excerpts.

Monday, February 21, 2005

announcing ua-devtalk

Just in case this blog is being read by someone from Ukraine...

I invite you to join the ua-devtalk Google group. It's an attempt to gather and organize the community of Ukrainian IT professionals.

Project postmortem

As my project reached the 1.0 milestone, I decided to run a postmortem. This blog post contains some parts of the lengthy postmortem document I actually wrote. I excluded project-specific details/conclusions to protect my IP (so to say ;-)) from potential competitors, but kept the sections a casual reader may find useful.

Btw, if you're interested in postmortems in general, here are some links I found useful myself:
And a couple of seemingly interesting books (I haven't had a chance to read them yet):
I'm still not sure whether I should have published this, but I'm driven mainly by curiosity; we'll see if it leads to anything.

So, enough of the foreword; here it goes.

List of known problems

Overall, few bugs were discovered during and after the deployment. Partly this is because the system is new, small, freshly developed and thus completely understood; partly (I hope) because I adhered to a test-heavy development process.
  • Database connection getting stuck. This was the most severe one: the site was dysfunctional for several days before I noticed the problem. Luckily, the quick fix was trivial: just restart the AppServer. The problem was in SQLObject's handling of db connections: if the db was restarted, any further attempt to perform a db operation led to an OperationalError exception. Unfortunately, I haven't (yet) found a way to fix the root cause.
  • HTML mangling in XSL transformations. Another problem (more of an annoyance, really) is related to the two-step XSL transformation algorithm employed while rendering web pages. The second step folded a pair of empty tags into a single empty one, e.g. <p></p> was folded into <p/>. This turned out to be a big problem for some HTML tags, as it caused the page to be rendered incorrectly. The problematic tags I had to "fix" were TEXTAREA, TABLE and TD. Again, I failed to fix the root cause and ended up inserting a non-breaking space character to prevent XSL from doing the fold.
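The folding from the second item is not specific to XSL: any XML serializer may write an empty element as a self-closing tag, which an HTML parser reads as an opening tag only. As an illustration (not the project's actual XSL pipeline), Python's standard xml.etree.ElementTree serializer shows the same behavior and the non-breaking-space workaround. This is a minimal sketch assuming Python 3.4+ (for `short_empty_elements`); `render` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # non-breaking space, the workaround described above

def render(tag, text=None, **options):
    """Hypothetical helper: serialize a single element to a str."""
    elem = ET.Element(tag)
    elem.text = text
    return ET.tostring(elem, encoding="unicode", **options)

# Default XML serialization folds an empty element into a self-closing tag.
print(render("textarea"))  # <textarea />

# Workaround: give the element some content so there is nothing to fold.
print(render("textarea", NBSP) == "<textarea>\u00a0</textarea>")  # True

# Alternatives that keep the element genuinely empty:
print(render("textarea", short_empty_elements=False))  # <textarea></textarea>
print(render("textarea", method="html"))               # <textarea></textarea>
```

The serializer options avoid polluting the page with a stray space, but they only help where you control the final serialization step.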
That's it for the list of known problems.
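For the stuck-connection problem in the first item, a common stopgap (not a fix for the root cause, and not SQLObject's own API) is to wrap the connection so that an operation failing because the database went away reopens the connection and retries once. The sketch below is hypothetical: `ReconnectingConnection` and its parameters are illustrative, and the standard sqlite3 module stands in for the real database. Against a server database, you would list that driver's OperationalError in `retry_on`.

```python
import sqlite3

class ReconnectingConnection:
    """Hypothetical wrapper: reopen a dead DB-API connection and retry once."""

    def __init__(self, connect, retry_on):
        self._connect = connect    # factory returning a fresh DB-API connection
        self._retry_on = retry_on  # exception types meaning "connection is dead"
        self._conn = connect()

    def execute(self, sql, params=()):
        try:
            return self._run(sql, params)
        except self._retry_on:
            # The old connection is unusable: discard it, reconnect, retry once.
            try:
                self._conn.close()
            except Exception:
                pass
            self._conn = self._connect()
            return self._run(sql, params)

    def close(self):
        self._conn.close()

    def _run(self, sql, params):
        cur = self._conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()

db = ReconnectingConnection(
    lambda: sqlite3.connect(":memory:"),
    retry_on=(sqlite3.OperationalError, sqlite3.ProgrammingError),
)
db.close()                      # simulate the database dropping the connection
print(db.execute("SELECT 1"))   # [(1,)]
```

Retrying blindly is only safe for idempotent statements; a write that fails mid-flight may already have been applied.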

What went right

Here is a list of things that I think I did right or was lucky enough to get right:
  • Development platform/tools. I was already a rather experienced Python programmer and had used Apache/WebKit in several projects before. They didn't bring any bad surprises during development, and I feel this was a solid choice. The only new library I used for this project was SQLObject, to handle object persistence. While I had some minor problems and one major one with it, it was certainly a net win for me and saved a considerable amount of time.
  • Software quality. There were very few bugs discovered during and after the deployment. I attribute this (partly) to a test-heavy development process: unit-test coverage stayed constantly at about 95%.
  • Good architecture. The system is organized in a simple, solid and straightforward way, providing a good starting point for further evolution. All major interfaces with third-party code, such as db persistence, the web application server and XML toolkits, are properly abstracted. Of course, there are some skeletons buried here and there, but overall I'm quite satisfied with it.
  • Project management tool. Following Joel's advice, I chose Excel as my project management tool. It worked out nicely. Not that it went without any friction, but it was good enough and had minimal overhead. I explored some other alternatives, such as Microsoft Project, a plain-text file with outline mode (in Vim) or some Palm-based project tool. But in retrospect, Excel seems like the right choice.
  • Project management process. Being the business head and the programming team in a single person meant I could judge very accurately what should be done next. In previous projects, lacking some important details of the business context, we (the development team) often found ourselves spending a lot of time and effort on things that were not, strictly speaking, very important or sometimes not even needed at all. The ability to guide development from the business perspective was very satisfying. I didn't understand this at first, but in the end I really appreciated it. The trick is not to get charmed by technical problems but to concentrate on users and their needs.
  • Hosting provider. Being a web-project newbie, I had to figure out things like hosting, domain name registration, etc. Luckily, my choice of hosting provider was solid and I've had no problems with it.
  • Credit card processing. It certainly made little sense to shoot for full on-site card processing, so I had to select a third-party vendor to do it for me. I selected 2CheckOut and it was OK for me. At least it has successfully processed a number of sales for me, with a relatively easy setup.
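To illustrate what "properly abstracted" third-party interfaces can look like, here is a minimal sketch in modern Python: application code depends on a small abstract interface, and an ORM-backed implementation (SQLObject, in this project's case) would be just one subclass. `CustomerStore`, `InMemoryCustomerStore` and `register` are hypothetical names; the post doesn't describe the project's actual interfaces.

```python
from abc import ABC, abstractmethod

class CustomerStore(ABC):
    """Hypothetical persistence interface; application code depends only on it."""

    @abstractmethod
    def add(self, name: str) -> int:
        """Store a customer and return its id."""

    @abstractmethod
    def get(self, customer_id: int) -> str:
        """Return the stored customer's name."""

class InMemoryCustomerStore(CustomerStore):
    """Test double; a production subclass would delegate to the ORM instead."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def add(self, name):
        customer_id = self._next_id
        self._rows[customer_id] = name
        self._next_id += 1
        return customer_id

    def get(self, customer_id):
        return self._rows[customer_id]

def register(store: CustomerStore, name: str) -> str:
    """Application logic: normalizes the name, knows nothing about SQL or ORMs."""
    customer_id = store.add(" ".join(name.split()).title())
    return store.get(customer_id)

print(register(InMemoryCustomerStore(), "  jane   doe "))  # Jane Doe
```

A side benefit of this design is testability: the in-memory store lets unit tests run without a database.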

What went wrong

  • Business opportunity evaluation. It's quite possible that I shouldn't blame my lack of marketing skills too much. There is a real chance that the potential market is so small that no amount of marketing effort would lead to meaningful results. I should have thought harder about the project's feasibility.
  • Grossly overengineered solution. An awful lot of time (initially) was spent on things I didn't really need (yes, YAGNI again). A remarkable example was a sophisticated, XSL-based, multilingual site generator, which produced all of the site's static pages in all supported languages. I ended up with a fully dynamic and, moreover, single-language site. Hence the effort was an (almost) complete waste of time. In retrospect, I could have cut development time by a factor of about five and still delivered a functional solution.
  • Marketing strategy. Well, I should probably say it was absent. Mind you, I had an idea, but after it failed to bring the expected results, I fell short.
  • Site design. With a one-man project and no good design skills, I ended up with a poorly designed (even to my taste) site. Moreover, lacking proficiency in web front-end technologies (like CSS), I countless times found myself in a situation where I knew what I'd like the page to look like but didn't know how to implement it.
  • Credit card processing. I did mention this in the what went right section, and now I mention it here. While 2CheckOut worked OK for me, it is not that great a choice for several reasons, especially compared with others, like iKobo (which I didn't know about at the time). The weak points of 2CheckOut for me were: an up-front setup fee, a relatively big per-transaction fee, and a very cumbersome and expensive way to transfer money out of the 2CheckOut account if you're outside the USA/Canada (like me).
That's it.

While I intended to list at least ten things, I don't know what else I should mention. The above items were real problems; the other difficulties I had are minor in comparison.

Monday, February 14, 2005

Concurrent Java books

Recently I have been involved in a Java project and decided to refresh my multithreaded programming skills and, more importantly, to grok how it is done in Java.

I'm currently reading Concurrent and Real-Time Programming in Java by Andy Wellings. While the crux of the book is real-time systems programming, it devotes about a third of its content to generally applicable concurrency topics and how they are exploited in Java. Plus, it even covers the concurrency-related changes introduced in Java 1.5.

This is nice, especially compared to the latest edition of the classic Concurrent Programming in Java: Design Principles and Patterns, published in ... 1999! There are rumours that a third edition is underway, but until then this book could serve as a good, modern and concise introduction to concurrent programming in Java.

A pressure for concurrency

Herb Sutter published an article called The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. It's fascinating reading, in which he argues that we (programmers) are now facing the next major paradigm shift since the introduction of object-oriented programming. The reason?
Applications will increasingly need to be concurrent if they want to fully exploit CPU throughput gains.
This is driven by the new hardware trend towards multicore systems, in contrast with the CPU clock-speed wars we had until recently.

IMO, the need for concurrent programming will mostly impact the "high" end of the programmer spectrum: system programming (OS, RDBMS, Web servers, game engines, etc.) and library authors. For many (most?) end-user applications, execution speed hasn't been crucial for a long time.

PS: It will be interesting to watch how tool vendors react to this trend. For instance, CPython has the infamous GIL, which looks like a real obstacle for Python programs on multicore systems.
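A rough way to observe the GIL's effect on CPU-bound code is to run the same work through a thread pool and a process pool from the standard concurrent.futures module. This is a hedged sketch, not a benchmark: `count_primes`, `timed` and the workload sizes are arbitrary illustrative choices, and actual timings depend on the machine and interpreter. With CPython's GIL, the thread pool typically gains little over sequential execution for pure-Python work, while the process pool can spread the chunks across cores.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_primes(limit):
    """Deliberately naive CPU-bound work: count primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def timed(executor_cls, limits, workers=4):
    """Run count_primes over `limits` in the given pool; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_primes, limits))
    return results, time.perf_counter() - start

if __name__ == "__main__":  # required for process pools on spawn-based platforms
    limits = [20_000] * 4
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, seconds = timed(executor_cls, limits)
        print(f"{executor_cls.__name__}: {results} in {seconds:.2f}s")
```

Processes sidestep the GIL at the cost of pickling arguments and results between interpreters, which is why this approach suits coarse-grained, CPU-heavy chunks.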

Friday, February 11, 2005

Java may not be that bad after all

There is a popular belief (which I'm not going to argue with) that programmer productivity in Java is "inferior" to that in more dynamic languages, like Python.

That's OK, but Python seems "inferior" to Java when it comes to innovation (in the ecosystem, not in the language itself). There are countless (OK, shall I say many?) Python packages that started as a port of some Java library. Those that come to mind first: the unittest and logging packages, which are now even part of the Python standard library. Other examples: PyContainer, PyFIT, Cheetah, you name it. To some degree, larger frameworks borrow as well: WebKit (servlets, request/response architecture), pyworks (similar to WebWork), probably others.

Sometimes these pale imitations evolve into more pythonic things; sometimes they get replaced by better alternatives built from scratch (like, say, py.test). Of course, there are genuine Python packages that shine even compared with the brightest Java counterparts (Twisted comes to mind first), but these are comparatively few.
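The JUnit heritage of unittest is still visible in its API: test classes derived from TestCase and camelCase methods such as setUp and assertEqual. The sketch below contrasts it with the plain-function, bare-assert style that py.test popularized. `slugify` is a hypothetical function under test, and the py.test-style function is shown without importing pytest, since such a runner only needs to discover `test_*` functions.

```python
import unittest

def slugify(title):
    """Hypothetical function under test."""
    return "-".join(title.lower().split())

# JUnit lineage: a TestCase subclass with camelCase setUp/assertEqual.
class SlugifyTest(unittest.TestCase):
    def setUp(self):
        self.title = "Java May Not Be That Bad"

    def test_slug(self):
        self.assertEqual(slugify(self.title), "java-may-not-be-that-bad")

# py.test style: a plain function and a bare assert; the runner discovers
# test_* functions and reports assertion details itself.
def test_slug_plain():
    assert slugify("Java May Not Be That Bad") == "java-may-not-be-that-bad"
```

Both can be run from the command line, e.g. `python -m unittest` for the first style and py.test (which also collects TestCase classes) for either.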

The circulation of ideas between languages/platforms is a good thing and there is nothing wrong with it. Still, why is the Python-Java relationship so asymmetric? Does it just reflect the difference in size of the respective communities? Is it something else? Or is my perception skewed?

Sunday, February 06, 2005

Efficient XML processing in Python

In a recent article Decomposition, Process, Recomposition on, Uche Ogbuji talks about strategies for processing large XML documents in Python. This led me to recall my own experience with processing multi-megabyte XML documents.

In general, there are at least three popular strategies for dealing with XML in Python: SAX-based, DOM-based and, er, shall I say, pythonic. SAX offers a speedy and memory-light approach, but it shifts the burden of keeping track of the processing context onto the programmer. DOM offers a cross-language, standard, verbose and memory-hungry strategy that scales poorly for large documents. By pythonic I mean a range of libraries, like gnosis.xml.objectify, ElementTree and xmltramp, which share a common aim of providing a nice, Python-friendly API.
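To make the contrast concrete, here is a small sketch that extracts the titles of book elements (ignoring other titles) with each approach, using only the standard library (xml.sax, xml.dom.minidom and xml.etree.ElementTree; gnosis.xml.objectify and xmltramp aren't shown). The document and the `BookTitles` handler are made up for illustration. Note how the SAX handler has to keep its own element stack to know whether a title belongs to a book.

```python
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOC = b"""<catalog>
  <book><title>Concurrent Programming in Java</title></book>
  <book><title>Python Cookbook</title></book>
  <magazine><title>Dr. Dobb's Journal</title></magazine>
</catalog>"""

class BookTitles(xml.sax.ContentHandler):
    """SAX: events arrive one by one, so the handler keeps its own context."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open elements
        self.titles = []
        self._text = []   # character chunks of the current element

    def startElement(self, name, attrs):
        self.stack.append(name)
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self.stack[-2:] == ["book", "title"]:
            self.titles.append("".join(self._text))
        self.stack.pop()

handler = BookTitles()
xml.sax.parseString(DOC, handler)
sax_titles = handler.titles

# DOM: a standard, cross-language API, but verbose and memory-hungry.
dom = minidom.parseString(DOC)
dom_titles = [book.getElementsByTagName("title")[0].firstChild.data
              for book in dom.getElementsByTagName("book")]

# ElementTree: the whole tree is in memory and the API is compact.
et_titles = [title.text for title in ET.fromstring(DOC).findall("book/title")]

print(sax_titles == dom_titles == et_titles)  # True
```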

I used the ElementTree library (and recently, its new C-based reincarnation). The task was as follows: given a large (~10 MB) XML document with a complex structure, the program had to significantly "extend" it with new information and write the result to a new file. The new data was partially taken from external sources and partially computed from the document itself, according to a bunch of arcane rules. The computation involved several traversals of the document structure to gather the needed information.
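The general shape of such a task in ElementTree can be sketched as: parse, run several passes that add computed elements, then serialize. The document structure, the rules and the currency value below are hypothetical stand-ins for the "arcane rules" and external sources, which the post doesn't describe; the output goes to an in-memory buffer instead of a file.

```python
import io
import xml.etree.ElementTree as ET

SOURCE = """<orders>
  <order id="1"><item qty="2" price="10.0"/><item qty="1" price="5.5"/></order>
  <order id="2"><item qty="3" price="4.0"/></order>
</orders>"""

def extend(root, currency):
    """Hypothetical rules: a <total> per order, then a document-wide <summary>."""
    # Pass 1: per-order totals computed from the document itself.
    # list() so we don't walk the tree while adding elements to it.
    for order in list(root.iter("order")):
        total = sum(int(item.get("qty")) * float(item.get("price"))
                    for item in order.iter("item"))
        ET.SubElement(order, "total", currency=currency).text = f"{total:.2f}"
    # Pass 2: a summary that depends on the results of pass 1.
    grand_total = sum(float(t.text) for t in root.iter("total"))
    summary = ET.SubElement(root, "summary", orders=str(len(root.findall("order"))))
    summary.text = f"{grand_total:.2f}"
    return root

# "USD" stands in for data taken from an external source.
root = extend(ET.fromstring(SOURCE), currency="USD")

out = io.BytesIO()  # a real run would pass a file name instead
ET.ElementTree(root).write(out, encoding="utf-8", xml_declaration=True)
print(out.getvalue().decode("utf-8"))
```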

The program (and transformation) itself was just a single link in a lengthy chain of transformations. There were other parts, written as XSLT procedures and DOM-based Javascripts that used to perform other

The decompose, process, recompose approach outlined in the article was realized by ElementTree itself; my task was only to provide the necessary processing steps. Unlike SAX, ElementTree builds an accurate in-memory representation of the entire XML document, but unlike DOM, its memory footprint grows much more slowly with the size of the document, thanks to an efficient representation. This gives the best of both worlds: a versatile and convenient representation model with good scalability for large documents.

My only major complaint was the lack of XPath support. While ElementTree does offer some very limited support for basic XPath expressions, it falls a long way short of a full-blown implementation. Thus I was forced to write a bunch of custom finders where a single XPath expression would have done the trick. Luckily, they're pretty straightforward to write.
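As an illustration of such custom finders (the post doesn't show the originals): ElementTree's findall() accepts only a limited XPath subset, e.g. equality predicates like [@price='40'], but no numeric comparisons and no ancestor:: axis. The two helpers below are hypothetical stand-ins for //book[@price > 30] and ancestor::*. Since elements carry no parent pointers, the second builds a child-to-parent map first and returns ancestors nearest-first.

```python
import xml.etree.ElementTree as ET

def find_where(root, tag, predicate):
    """Stand-in for an XPath predicate such as //book[@price > 30]."""
    return [el for el in root.iter(tag) if predicate(el)]

def ancestors(root, target):
    """Stand-in for XPath's ancestor:: axis, nearest ancestor first."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = parent_of.get(target)
    while node is not None:  # compare with None: empty elements are falsy
        chain.append(node)
        node = parent_of.get(node)
    return chain

doc = ET.fromstring(
    "<shop><shelf id='a'><book price='25'/><book price='40'/></shelf>"
    "<shelf id='b'><book price='35'/></shelf></shop>")

pricey = find_where(doc, "book", lambda b: float(b.get("price", 0)) > 30)
print([b.get("price") for b in pricey])            # ['40', '35']
print([a.tag for a in ancestors(doc, pricey[0])])  # ['shelf', 'shop']

# Equality predicates, by contrast, are supported natively:
print(len(doc.findall(".//book[@price='40']")))    # 1
```

Building the parent map costs one pass over the tree, so for repeated lookups it is worth computing once and reusing.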