Canto RSS

0.6.10

You can grab the 0.6.10 release in tar or OSX portfile format from the download page.

Make HTML parser more resistant to broken HTML
Fix minor exception in c-f thread
Finally make exceptions play nice with ncurses

When it rains, it pours. Just a couple of other fixes, all pretty self-explanatory. There was a case that caused the HTMLParser to croak and raise an exception, but that's been fixed so that it can't happen again, and the fix should give better output for broken HTML anyway. The minor exception in c-f shouldn't have caused any headaches beyond some cron output in your email, if you run it that way. Lastly, it's now more likely that canto won't kill your terminal if it throws an exception (it's not 100%: there are still some cases in which it won't shut curses down properly, but I think the exceptions stemming from those cases should be pretty rare by now).

Anyway, have fun. Report more bugs. Thanks to evg_krsk and cyborg for reporting these.

0.6.9

You can grab the 0.6.9 release in tar or OSX portfile format from the download page.

Fix setup.py generating null bytes in const.py
Add 30 second timeout to canto-fetch
Make selection data persist for hooks / filters
Unset signals before exit (avoid shell garbage)
Set User-Agent to Canto/x.y.z (fixes some 403'd feeds)
Fix multiple c-f subtle corruption bug
Sync docs now that site runs out of git.

I wasn't really planning on putting out another bugfix release, but so it goes. This release is a combination of fixes and minor improvements.

The 30 second timeout was implemented to keep canto-fetch from hanging on non-responsive feeds for a long, long time. I think 30 seconds is long enough, since we're usually talking about a handful of kB per feed, and even grabbing that over a (now practically non-existent) dial-up line or a torrent-saturated broadband line shouldn't take that long.
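One common way to get this kind of behavior in Python is a process-wide default socket timeout. The following is only a sketch of that technique, not necessarily how canto-fetch does it; the names FETCH_TIMEOUT and set_fetch_timeout are made up for illustration.

```python
import socket

# Assumed constant name; the notes above only give the value (30 seconds).
FETCH_TIMEOUT = 30

def set_fetch_timeout(seconds=FETCH_TIMEOUT):
    """Apply a default timeout to every socket created after this call."""
    socket.setdefaulttimeout(seconds)
    return socket.getdefaulttimeout()

set_fetch_timeout()
```

With a default like this in place, a fetch against a dead server raises a timeout error instead of blocking indefinitely.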

Unsetting the signals keeps the shell from printing "zsh: alarm canto" after you exit sometimes.
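As a rough illustration of what "unsetting" could look like, the sketch below cancels any pending alarm and ignores SIGALRM right before exit. That SIGALRM is the signal involved is inferred from the "alarm" shell message; the function name unset_signals is hypothetical.

```python
import signal

def unset_signals():
    """Cancel any pending alarm and ignore a late SIGALRM before exiting."""
    if hasattr(signal, "SIGALRM"):      # SIGALRM does not exist on Windows
        signal.alarm(0)                 # cancel a scheduled alarm, if any
        signal.signal(signal.SIGALRM, signal.SIG_IGN)
```

Calling something like this just before exiting means a stray alarm can no longer terminate the process and leave the shell to report it.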

The User-Agent addition has been a long time coming, especially since it's essentially a one-liner to add. I just never had any reason to mess with it since converting canto to feedparser. Then I discovered that a number of sites, particularly Wikipedia, discriminate against urllib2's default User-Agent, so I changed it to "Canto/x.y.z".
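For readers who want to see the idea in isolation, here is a minimal sketch using urllib.request (the modern counterpart of urllib2). It only builds the request and does not send it; the helper name build_feed_request and the example.com URL are placeholders, and Canto itself sets the header through its own fetch code.

```python
import urllib.request

def build_feed_request(url, version):
    """Return a Request that identifies itself as Canto/<version>."""
    return urllib.request.Request(url, headers={"User-Agent": "Canto/" + version})

# Placeholder URL and version, for illustration only.
req = build_feed_request("http://example.com/feed.xml", "0.6.9")
print(req.get_header("User-agent"))  # prints Canto/0.6.9
```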

The last thing I'll mention is the subtle corruption bug. 99% of you would probably never run into it. It took me running 12 simultaneous synchronized canto-fetch daemons trying to update every one of my feeds every minute to get the bug to regularly reproduce, but if you've ever had Canto magically forget an entire feed's worth of item state, this was probably the culprit. However, I will say that if you're running multiple canto-fetch daemons at a time over the same feed directory, you might want to stop that. Not because it won't work (because it should and does) but just because it's a waste. This is why I initially designed canto-fetch to be run as a cron-job and not a daemon.

Now, before you ask why oh why I'm doing homebrew Python disk storage when I could be using sqlite or something, let me explain. The first thing is that using cPickle instead of a real database gives me Python objects right off the bat: no populating, no encoding changes. I can literally read or write the feed state with a single call to the cPickle module once I have a lock on the file. There is no restructuring of the data each time. I imagine this could be achieved by storing the pickled data in a sqlite database, but at that point you're just using sqlite to handle locking for you. The second thing is that there is no advanced querying going on. Every operation would fetch every item from a feed (table). If I wanted to use a SQL-like query language for filters, etc., this would be a different story. Lastly, I just don't fucking like SQL.
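To make the "lock, then one pickle call" pattern concrete, here is a small sketch. It uses Python 3's pickle (which replaced cPickle) and fcntl.flock, so it is Unix-only; save_state and load_state are illustrative names, and the exact locking scheme Canto uses may differ.

```python
import fcntl
import pickle

def save_state(path, state):
    """Write the whole feed state with one pickle call under an exclusive lock."""
    # "a+b" creates the file if needed without truncating it before we hold the lock.
    with open(path, "a+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            f.truncate()                # drop the old contents only once locked
            pickle.dump(state, f, pickle.HIGHEST_PROTOCOL)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def load_state(path):
    """Read the whole feed state back with one pickle call under a shared lock."""
    with open(path, "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return pickle.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

What comes back from load_state is already a normal Python object, which is the whole appeal: nothing has to be mapped to or from rows and columns.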

Anyway, now that that's straightened out, have fun! I'm back to working on new stuff, rather than fixing old stuff.

Git Awareness

Once again, in coordination with a major release, I've recoded the site. This time to add more git awareness. Since I already set it up to render out of git, the rest of the site now resides in a git repository, rather than a sqlite3 database. This is similar to the engine I wrote for my blog, although I'm still using Django as my glue, rather than my hand-crafted mod_python handler.

I'm curious about the performance of the site, but I'm also fairly certain that drawing on a git repo is only marginally slower than a sqlite3 file, especially when you're pretty low traffic anyway. The amount of content being hosted is also really small.

The advantage of the new system is that the site can be updated from the canto source, and as such the documentation in the source and on the site can never be out of sync. Also, in the future, I'd like to make the site more branch aware, so that the current version of the documentation for the master and experimental branches can be viewed just by changing the URL.

Documentation and other content that is useful to distribute with the source are taken from the canto source proper. So stuff like configuration and styling information is rendered straight from the master branch.

Other content, like news and download information, is taken from a separate, new repository just for the site. This repository also includes all of the combined git / django code.
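For the curious, reading a file straight out of a branch can be done with the standard "git show <branch>:<path>" command. The sketch below wraps that in Python; it is not the site's actual code, and the function names and any paths passed to them are hypothetical.

```python
import subprocess

def git_show_args(branch, path):
    """Build the git command that prints one file as it exists on a branch."""
    return ["git", "show", "%s:%s" % (branch, path)]

def read_from_branch(repo_dir, branch, path):
    """Return the file's bytes from the given branch of the repository."""
    return subprocess.check_output(git_show_args(branch, path), cwd=repo_dir)
```

Making the branch name a parameter is what would let a URL select between, say, the master and experimental versions of the same page.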

In addition, the screens page has been taken down temporarily because I'd like to take more selective shots of a more recent release (I hope to do this next). Oh, and of course, I had to slap a new coat of paint over the whole site. I think this one's a little easier on the eyes.

Have fun, report bugs!

Materializing features and some status.

0.7.0 Work

I've been doing some work in git (although less, now that I have real work at real work) towards the features for 0.7.0. The first thing that's shown major progress is the new interface_draw hooks. These provide a much simpler way to add content to the interface (reader or gui) than the previous class inheritance method. For example, I moved over the slashdot department information renderer from canto/extra.py.

In < 0.7.0 you had to do this to get the information:

class slashdot_renderer(interface_draw.Renderer):
    def reader_head(self, dict):
        title = self.do_regex(dict["story"]["title"], self.story_rgx)
        return [(u"%1%B" + title, u" ", u" "),\
                (u"%bfrom the " + dict["story"]["slash_department"] +\
                u" department%B", u" ", u" "),(u"┌",u"─",u"┐%C")]

default_renderer(slashdot_renderer())

Which is a total pain in the ass for a number of reasons. First, if you ever wrote two snippets to add content to the reader_head function, then you'd have to really work hard to try and combine the content. So far this hasn't been an issue, but I believe that's because people have no idea how to do this, or it requires too much Python, or people just don't know what's in their feeds (more on this later). Second, if you applied this custom renderer to just a feed or two, this is a needless waste of memory because two Renderers (with their boilerplate code) have been instantiated when you make such a simple change. Lastly, the changes that are made are full of very cryptic garbage that the user doesn't care about 99% of the time.

So, the new syntax is much easier. Instead of editing the format that is convenient to the renderer, you can edit things in either HTML or plaintext. The above can be achieved with the following, much clearer bit of code.

# Define a hook
def add_slash_dept(dict):
    if "slash_department" in dict["story"]:
        dict["content"] = "%1from the " + dict["story"]["slash_department"]\
            + " department%0<br /><br />" + dict["content"]

# Register hooks
r = get_default_renderer()
...
add_hook_pre_reader(r, add_slash_dept, before="reader_convert_html")
...

This is still subject to change, but so far I'm happy with the conventions. Each hook gets a dict with the current state of the data being worked on, particularly dict["content"]. The hook makes its changes and exits, then the next hook is called, makes its changes, and so on. The default renderer comes with a number of basic hooks already included, like a hook to render HTML into normal text. This example is added to the pre_reader hooks before "reader_convert_html" so that its content is rendered as HTML as well (notice the br tags in the hook). Changing data before conversion makes it simple to add things like links, or formatting like block quotes and strong tags.

This will obviously need to be well documented, but the bottom line is that it's 10x simpler to just add strings together than to fuck around with the previous format, with its unnecessary Unicode handling and undocumented functions.

The above code actually works with the HEAD of the experimental-idraw branch of git.
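To show the hook-chain idea on its own, here is a toy, self-contained sketch. It is not Canto's implementation: the Renderer class, run_pre_reader, and the crude reader_convert_html below are stand-ins written for this example, and add_hook_pre_reader only mimics the call shown earlier.

```python
class Renderer:
    """Toy stand-in for a renderer: just an ordered list of pre-reader hooks."""
    def __init__(self):
        self.pre_reader = []

def add_hook_pre_reader(renderer, hook, before=None):
    """Append hook, or insert it ahead of the hook whose name matches `before`."""
    names = [h.__name__ for h in renderer.pre_reader]
    if before in names:
        renderer.pre_reader.insert(names.index(before), hook)
    else:
        renderer.pre_reader.append(hook)

def run_pre_reader(renderer, data):
    """Call each hook in order; every hook edits the same dict in place."""
    for hook in renderer.pre_reader:
        hook(data)
    return data

def reader_convert_html(data):
    # Crude placeholder for HTML-to-text conversion: only handles <br />.
    data["content"] = data["content"].replace("<br />", "\n")

def add_slash_dept(data):
    if "slash_department" in data["story"]:
        data["content"] = ("%1from the " + data["story"]["slash_department"]
                           + " department%0<br /><br />" + data["content"])

r = Renderer()
add_hook_pre_reader(r, reader_convert_html)
add_hook_pre_reader(r, add_slash_dept, before="reader_convert_html")
result = run_pre_reader(r, {"story": {"slash_department": "news"},
                            "content": "Body"})
```

Because add_slash_dept runs before the conversion hook, the br tags it adds are converted along with the rest of the content.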

canto-inspect

As I mentioned above, one of the problems with previous versions of Canto and, in fact, all feed readers, is that it's hard to take advantage of the great, non-standard content that's in feeds. For example, Slashdot's got extra content, and Digg has all sorts of extras, like the current number of diggs a particular item has and how many comments it has. I'm sure there are many more.

This is why 0.7.0 is now going to contain a third binary, canto-inspect, that will either use feed data already downloaded for canto or just grab a URL with feedparser, and then print out the interesting contents. The point is to let users find interesting data that's in their RSS feeds and help them write an interface hook that exploits that information.
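As a rough idea of the kind of output such a tool could produce, the sketch below lists the keys of one entry that are not ordinary fields. It is not canto-inspect: the entry is a plain dict standing in for the dict-like entries feedparser returns, and COMMON_KEYS is an assumed, deliberately short list.

```python
# Assumed set of "ordinary" fields; anything else is reported as an extra.
COMMON_KEYS = {"title", "link", "summary", "id", "updated"}

def extra_fields(entry):
    """Return the sorted keys of a dict-like entry that are not common fields."""
    return sorted(k for k in entry if k not in COMMON_KEYS)

# Made-up entry for illustration.
entry = {"title": "Example story", "link": "http://example.com/1",
         "slash_department": "news"}
for key in extra_fields(entry):
    print("%s: %r" % (key, entry[key]))
```

Any key that shows up here is a candidate for use in an interface hook.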

More status

In other news, the proposed speed-ups for larger lists of items have been temporarily put on hold until I can figure out why the code is as fast as it is. I implemented some work on speeding up the rendering process, but after putting some hours in on the "new and improved" code, the profiler says it's actually 5% slower. Which is ridiculous, but I think it may mean that Python is doing something intelligent in the background, or that the rendering task is much more IO bound than I believed. It could also just be that I didn't stress either codebase hard enough. Before 0.7.0 I will figure this out.

Please feel free to discuss these topics on the ML; I'd love to get some user feedback on the new style syntax.

In the meantime, have fun!

Lists migrated.

I finally got around to setting up Mailman on the server, and migrated canto-reader and canto-reader-announce to it. The Google groups are now a read-only mirror, but if you were subscribed to either of those lists, then you've automatically been subscribed to the new list.

The new canto-reader is here

And the new canto-reader-announce is here