computer programming


Over the last month, I’ve been learning a lot by participating on stackoverflow.com daily, and I am slowly growing obsessed with it. Its ‘help/learn/repeat’ cycle is well optimized for my learning style.

I’ve even had decent luck recommending it to other people. More and more of my search results for development questions were turning up there, so eventually I signed up, and I’m happy I did.


So, I’ve based a couple of apps on a combination of Authlogic and the iPhone ObjectiveResource package, and when I mentioned this on the relevant mailing lists, a bunch of people asked me questions, so I thought I’d post a general overview of what I did.

There are a couple of steps, most of which are geared toward having Authlogic return HTTP status codes that the client applications can interpret, instead of redirects. Here’s an example of what I mean, from the ApplicationController of one app:

  def require_user
    unless current_user
      store_location
      flash[:notice] = "You must be logged in to access this page"
      respond_to do |format|
        format.html { redirect_to new_user_session_url }
        format.xml  { render :text => "you must be logged in to access this page.", :status => :unauthorized }
      end
 
      return false
    end
  end

The default implementation of this filter always redirects. ObjectiveResource can follow the redirect (if you have a recent enough version), but that doesn’t do the client much good. This way, you get a status code that you can act on.

Since Authlogic focuses on RESTful creation of things, a lot of operations map naturally: to create an account on the Rails app, create a User object, and to test whether the user can log in, create a UserSession. My application stores credentials between launches of the iPhone app, so it creates a UserSession at startup if there are stored credentials.

- (void)applicationDidFinishLaunching:(UIApplication *)application {
//
// ... blah blah blah all sorts of stuff removed.
//
 
	UserPrefsManager* prefs = [UserPrefsManager sharedInstance];
 
	if ([prefs isLoggedIn]) {
		UserSession * us = [[[UserSession alloc] init] autorelease];
 
		us.login = [prefs username];
		us.password = [prefs password];
 
		NSError * err = nil;
 
		if(![us createRemoteWithResponse:&err])
		{
//
//  ... my actual detection stuff removed and just the default left ...
//
			[AlertHelper showAlertWithError:err];
			[prefs clearLogin];
		}
	}
	[ObjectiveResourceConfig setUser:[prefs username]];
	[ObjectiveResourceConfig setPassword:[prefs password]];
//
// ... and proceed on with your app's normal course of things ...
//	
    [window addSubview:tabBarController.view];
}

Here I’m creating a UserSession, testing whether it works, and setting the ObjectiveResourceConfig properties if it does. If there aren’t saved credentials, some logic later on shows a dialog asking you to log in or create an account (or whatever), which is the point of [prefs clearLogin];. This is matched by a change to the create method of the UserSessionController:

      respond_to do | format |
        format.html do
          flash[:notice] = "Login successful!"
          redirect_back_or_default account_url
        end
        format.xml do
          render :xml => @user_session, :status => :created
        end
      end

(For brevity, I’ve shown just the successful branch.)

This should be enough to get people going: all the other changes you have to make can be worked out from these two examples, and when I have some spare time I’m going to create a version of the demo application that works with ObjectiveResource. The only other suggestion I have is to use the withResponse: versions of everything and code defensively; note that in some versions of ObjectiveResource, an error response from your Rails application will cause the iPhone application to throw an exception. In the meantime, feel free to ask me any questions.


Hey folks, I’ve been working on a couple of iPhone applications recently, and things are at the point where I could use a few beta users. If you’re interested in any of these, let me know by commenting or by emailing me at temp200911@corprew.org. You can also use the contact page at corprew.org.

  • I’m looking for some users for App A. For this one to be helpful to you (and for you to provide useful data for me), it would help if you lived in the Seattle or Portland area and had Celiac disease.
  • I’m looking for some users for App B, mostly for people who travel a lot. If you’re traveling by air this holiday season, you’re welcome to give it a shot.
  • I’m also testing a game for the iPhone. For this, it would be handy if you lived in the Capitol Hill region of Seattle and liked fun. As strange as it seems, some people don’t like fun. This should be fun.

All of these are useful and/or fun. Ideally, Android versions will follow relatively soon after the iPhone versions.

I’m making this post to elucidate some conversations I had late last night; none of this is particularly rocket science, or necessarily even model rocket science. One hilarious thing that keeps coming up is federated search: combining search results from multiple datastores. That’s a moderately hard problem to solve in general, but (frequently) relatively easy to solve for a particular purpose.

Complicated general solutions do exist (such as the one in GeoNames for a lot of content information, although it doesn’t federate that with other results, and it uses a couple of other data sources that aren’t relevant here).

Here’s the (relatively trivial) code that does ‘content’ searching (normally called fulltext search, but called ‘content search’ in content management systems).

module Contentsearch
  module ClassMethods; end
  def self.included(klass)
    klass.extend(ClassMethods)
  end
 
  def ft_index
    logger.debug("[contentsearch::ftindex] submitting #{self.id} #{self.name}")
    Bj.submit "./script/runner jobs/add_to_consearch.rb -t #{self.class.name.downcase} -i #{self.id}"
  end
 
  def ft_deindex
    logger.debug("[contentsearch::ftdeindex] removing #{self.id} #{self.name}")
    Bj.submit "./script/runner jobs/remove_from_consearch.rb -t #{self.class.name.downcase} -i #{self.id}"
  end
 
  module ClassMethods
    def ft_search(kw_string)
      clsname = self.name.downcase + "s"
      return self.find_by_sql(["select distinct #{clsname}.id, lots of stuff i deleted here, MATCH(content) against (?) as relevance FROM #{clsname},consearches WHERE consearches.ftable_id = #{clsname}.id and consearches.ftable_type='#{self.name}' and match(content) against (?) order by relevance limit 6",kw_string,kw_string])
    end
  end
end

This example is written in Ruby (on Rails), and the first part is just a convention for mixing class methods into a Ruby class. How Ruby (and Smalltalk and similar languages) handle methods is a fascinating but separate discussion; essentially, it’s a metaprogramming party and everyone’s invited.
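For anyone new to that idiom, here is a minimal plain-Ruby sketch of the same `self.included` plus `extend(ClassMethods)` shape, with no Rails involved. `Searchable`, `Recipe`, `index_key`, and `search_label` are hypothetical names chosen for illustration.

```ruby
# A mixin that adds both instance methods and class methods to the class
# that includes it (the same shape as Contentsearch above).
module Searchable
  # Ruby calls this hook when the module is included; we use it to also
  # extend the including class with ClassMethods.
  def self.included(klass)
    klass.extend(ClassMethods)
  end

  # Instance method: available on every object of the including class.
  def index_key
    "#{self.class.name.downcase}:#{id}"
  end

  module ClassMethods
    # Class method: available on the including class itself.
    def search_label
      "search over #{name.downcase}s"
    end
  end
end

class Recipe
  include Searchable
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

puts Recipe.search_label      # => search over recipes
puts Recipe.new(7).index_key  # => recipe:7
```

The forward declaration `module ClassMethods; end` in the original is harmless but not required, because the hook only runs once the whole module body has been evaluated.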

Ruby makes it relatively simple to add this as a module to pretty much any class. ft_index and ft_deindex run in a background process (submitted through Bj) because adding a document to, or removing one from, a fulltext-indexed MySQL table is far slower than you would want a user to wait for in an interactive request. This is common in web applications, and it’s part of why so many of them say things like “your [whatever] may not appear in searches right away.” Left to run in the background, these jobs are fast enough; run inline, they would generally make the user unhappy.
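The original hands the slow work to Bj, which spawns a script/runner process. As a hedged illustration of the same idea inside a single Ruby process, this sketch uses a core `Queue` drained by a worker thread; `slow_index` and its `sleep` are hypothetical stand-ins for the MySQL fulltext write, not part of the real code.

```ruby
# Hypothetical stand-in for the slow fulltext insert (sleep simulates the cost).
def slow_index(type, id)
  sleep 0.05
  "#{type}:#{id}"
end

jobs = Queue.new # thread-safe FIFO (Thread::Queue)
indexed = []

# Background worker: pops jobs until it sees the :stop sentinel.
worker = Thread.new do
  while (job = jobs.pop) != :stop
    indexed << slow_index(*job)
  end
end

# The "interactive" side only enqueues, so it returns almost immediately.
jobs << ["recipe", 1]
jobs << ["recipe", 2]
puts "request handled; indexing continues in the background"

# Shut the worker down and wait for it to finish the queued work.
jobs << :stop
worker.join
p indexed # => ["recipe:1", "recipe:2"]
```

A separate process (as with Bj) survives web-server restarts and keeps the indexing load out of the request-serving process, which a thread does not; the thread version just shows the enqueue-and-return shape.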

But basically, what’s going on here is that there are two separate tables using different MySQL table types: one supports fulltext searching and the other has ACID properties. By joining these two tables, you can search against the content table and get results back from the main table that stores domain objects. This is probably the simplest version of federating two different search types, and it works pretty smoothly; this sort of thing shows up in a number of different products.

(For Rails people, the relevant declaration in the model classes is has_one :consearch, :as => :ftable.)
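To make the join concrete without MySQL, here is a hedged in-memory sketch. `CONSEARCHES` stands in for the fulltext table (pointing back polymorphically through `ftable_type`/`ftable_id`), `RECIPES` for the main table, and a naive word-count score stands in for `MATCH ... AGAINST` relevance; it does not emulate MySQL's actual scoring, and all the data is made up.

```ruby
# Main table: domain objects keyed by id.
RECIPES = {
  1 => { id: 1, name: "Gluten-free bread" },
  2 => { id: 2, name: "Rice noodles" },
  3 => { id: 3, name: "Corn tortillas" }
}

# Content table: one row per indexed object, pointing back at it
# through ftable_type / ftable_id.
CONSEARCHES = [
  { ftable_type: "Recipe", ftable_id: 1, content: "bread made with rice flour and tapioca" },
  { ftable_type: "Recipe", ftable_id: 2, content: "thin rice noodles for soup" },
  { ftable_type: "Recipe", ftable_id: 3, content: "corn masa tortillas" }
]

# Naive relevance: how many query words occur in the content.
def relevance(content, query)
  words = content.downcase.split
  query.downcase.split.count { |q| words.include?(q) }
end

def ft_search(query, limit: 6)
  CONSEARCHES
    .select { |row| row[:ftable_type] == "Recipe" }                  # WHERE ftable_type = 'Recipe'
    .map { |row| [RECIPES[row[:ftable_id]], relevance(row[:content], query)] } # the join
    .select { |_, score| score > 0 }                                 # matching rows only
    .sort_by { |_, score| -score }                                   # ORDER BY relevance
    .first(limit)                                                    # LIMIT
end

ft_search("rice flour").each { |recipe, score| puts "#{recipe[:name]} (#{score})" }
```

The search runs entirely against the content rows; the main table is only consulted to turn matching ids back into domain objects, which is the whole trick of the two-table setup.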

But this is obviously trivially simple: the objects in the content search table are representations of the objects in the main table, and the two tables have entirely separate semantics (unfortunately I deleted the examples that use both the main and consearch tables, but it’s a join and you get the idea). One of the tables does ‘field operator value’ searching (i.e., relational) and the other does the kind referred to these days as ‘Google searching.’

Things get progressively more difficult when either of these things isn’t true: when there isn’t a single store holding the unified version of each document, or when the semantics are related but neither identical nor entirely different. For example, if I’m searching two different instances of my own product, it’s fairly easy, because all the fields mean the same thing in both databases.

If the products differ by schema or by the meaning of the schema, you have to make a (semantic) translation between the two to make the search work, and you also have to translate the search results so they’re displayed to the user in a way that makes sense. This might be as simple as ‘one repository has names and the other has titles,’ or it might be more complex (names versus ids, names in particular formats, URLs versus descriptive strings, dates accurate to the second versus dates accurate to the day, etc.).
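A minimal sketch of that translation step, assuming two hypothetical repositories: one returns `name` and a full timestamp, the other returns `title` and a day-accurate date. Each record is mapped into a common shape before results are merged; the field names and formats are illustrative only.

```ruby
require "time"
require "date"

# Hypothetical raw results from two repositories with different schemas.
repo_a = [{ "name" => "Quarterly report", "modified" => "2009-06-01T14:22:05Z" }]
repo_b = [{ "title" => "Budget memo", "date" => "2009-06-01" }]

# Per-repository translators into one common schema (title + day-accurate date).
# A's timestamp is truncated to the day, because B can't be more precise than that.
TRANSLATORS = {
  a: ->(r) { { title: r["name"], date: Time.iso8601(r["modified"]).utc.to_date, source: :a } },
  b: ->(r) { { title: r["title"], date: Date.iso8601(r["date"]), source: :b } }
}

def normalize(repo, records)
  records.map { |r| TRANSLATORS.fetch(repo).call(r) }
end

merged = normalize(:a, repo_a) + normalize(:b, repo_b)
merged.each { |doc| puts "#{doc[:date]}  #{doc[:title]}  (#{doc[:source]})" }
```

Truncating to the coarser precision is one design choice among several; the important part is that the translation lives in one place per repository instead of being scattered through the search code.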

It’s when you start combining these sorts of things that stuff gets more complex. (This also leaves aside the fact that the protocols for accessing all of this information differ, although these days more and more of this is an adventure in XML parsing rather than DLL hell.) Let’s take a simple example: sorting.

Say I have three different datastores, Repo1 through Repo3, which all return objects with titles, and I’m sorting by title:

Repo1: [Alpha, Bravo, Charlie, Delta, Echo, Foxtrot ]
Repo2: [Able, Baker, Charlie, Dog, Easy, Fox]
Repo3: [Alligator, Crocodile, Pterosaur]

It’s fairly easy to implement this sort. There are a few small issues (like the page size of the results versus the page sizes of the underlying repositories), but regardless, alphabetic sorts are well understood in most locales.
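Because each repository already returns its titles in sorted order, one way to produce the federated list is a k-way merge instead of a full re-sort, and a page is then just a slice of the merged stream. A minimal sketch using the three repositories above; it ignores the per-repository paging a real system would also need, and the `merge_sorted`/`page` helpers are hypothetical.

```ruby
# Each repository returns its own results already sorted by title.
REPOS = {
  "Repo1" => %w[Alpha Bravo Charlie Delta Echo Foxtrot],
  "Repo2" => %w[Able Baker Charlie Dog Easy Fox],
  "Repo3" => %w[Alligator Crocodile Pterosaur]
}

# k-way merge: repeatedly take the smallest head among the repositories.
# Each cursor is [repo name, items, index of next unread item].
def merge_sorted(lists)
  cursors = lists.map { |name, items| [name, items, 0] }
  merged = []
  loop do
    live = cursors.select { |_, items, i| i < items.size }
    break if live.empty?
    best = live.min_by { |_, items, i| items[i] }
    merged << [best[1][best[2]], best[0]] # [title, repo it came from]
    best[2] += 1
  end
  merged
end

# Return one page of the federated result (page numbers start at 1).
def page(merged, number, size)
  merged.slice((number - 1) * size, size) || []
end

all = merge_sorted(REPOS)
p page(all, 1, 5).map(&:first) # => ["Able", "Alligator", "Alpha", "Baker", "Bravo"]
```

Keeping the source repository alongside each title matters once you page: to fill page N you only need to have read far enough into each repository, which is where the page-size mismatch mentioned above comes in.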

However, if you’re searching by something more like ‘relevance,’ you get back a number associated with each result (so the document titled ‘Alpha’ above might have a score of 0.91). It’s simple to order numbers too, but how do you tell whether a number from one datastore corresponds to a number from another? For one thing, those numbers are calculated (mostly) relative to the particular collection of documents on a given datastore; for another, one repository may simply tend to return higher numbers for documents that are, in theory, just as relevant (there isn’t any agreement about what 0.91 means; it’s just what a function returns).
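One common, admittedly imperfect, workaround is to rescale each repository’s scores onto a shared range before merging, for example with per-repository min-max normalization. This is a generic technique, not something these datastores agree on, and it only fixes scale, not the differences in what the scores mean; the scores below are made up.

```ruby
# Hypothetical relevance scores returned by two repositories.
results = {
  "Repo1" => { "Alpha" => 0.91, "Bravo" => 0.85, "Charlie" => 0.80 },
  "Repo2" => { "Able" => 0.40, "Baker" => 0.25, "Dog" => 0.10 }
}

# Min-max normalize one repository's scores onto 0.0..1.0.
def min_max(scores)
  lo, hi = scores.values.minmax
  return scores.transform_values { 1.0 } if hi == lo # all equal: treat all as top hits
  scores.transform_values { |s| (s - lo) / (hi - lo) }
end

merged = results.flat_map do |repo, scores|
  min_max(scores).map { |title, s| [title, repo, s.round(3)] }
end
merged.sort_by! { |_, _, s| -s }
merged.each { |title, repo, s| puts format("%-8s %-6s %.3f", title, repo, s) }
```

Without normalization every Repo2 result would rank below every Repo1 result; afterwards each repository’s best hit scores 1.0. Whether that is actually fairer depends on the collections, which is exactly the customization problem described above.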

So those two things are where it starts to get more complex and to need actual customization and specialization.

In conclusion, this is way too (f) long for the blog, but I was typing it up to explain things that I was talking about yesterday anyway. HTH. Feel free to comment, but if you’re the person I’m proximately writing this for, you should probably send email.


So, for the last while I’ve been working on celiaq.com. It’s a social network for people with Celiac disease and other forms of Gluten Intolerance. It’s designed around the resources that that community has trouble getting reliable information about on the internet, and I think that it’s coming along pretty well.

Right now, it looks like it will be done and in a somewhat stable state by 7/15. It’s a Rails/MySQL application for the most part, although it would probably run on other database backends. It’s hosted at Slicehost right now, although I suspect it will move to Amazon EC2, CloudFront, and S3 as it scales, to make things easier.

It’s been an interesting time taking this entire application from vision to deployment, and it’s been a good time. Currently it’s in a beta state, and is soft launched to let interested people enter resources and test the system.


This is more or less a good guide to what I am up to at the moment, although it should be noted that I’ve written this same code fragment in three languages in the last while (this one is Ruby; the others were PHP and Java, and the Java one was a festival of reflection due to type wackiness).

I actually have another version of the same code that writes a URL to the debug log that you can click to go directly to Google Maps. Why? I don’t know. I’m beginning to value Aptana Studio’s remix of Eclipse more and more as time goes on, though, because I now have Java, PHP, and Ruby/Rails all in the same (mostly) high-performing IDE. The alleged iPhone mode doesn’t work on my computer, but I have CRAZY LIBERRIES installed at the moment and I suspect that’s in large part my own fault; the Apple tools still work.

  def geo_desc(geo_loc, extended = false)
     #
     #  specialized pretty printer for address types.
     #  note that there is pretty much a standard mixin for geo stuff and
     #  this works across all the geocoding packages and model types.
     #
     return "[nil location]" if geo_loc.nil?
     desc = "["
     desc << geo_loc.country_code.downcase unless geo_loc.country_code.nil?
     desc << "." + geo_loc.state.downcase unless geo_loc.state.nil?
     desc << "." + geo_loc.city.downcase unless geo_loc.city.nil?
     desc << "." + geo_loc.zip.to_s unless geo_loc.zip.nil?
     desc << "] "
     desc << "["
     desc << geo_loc.lat.to_s unless geo_loc.lat.nil?
     desc << ","
     desc << geo_loc.lng.to_s unless geo_loc.lng.nil?
     # precision is optional; "unknown" is treated the same as missing
     if geo_loc.precision.nil? || geo_loc.precision == "unknown"
       desc << " ?"
     else
       desc << " (" + geo_loc.precision.to_s + ")"
     end
     desc << "]"
     if extended
       desc << " " + geo_loc.full_address unless geo_loc.full_address.nil?
     end
     return desc
  end


Drupal 5 has a few problems in its security layer, as I’ve mentioned other places, and some of them stem from the sort of ‘it-works-for-me’ philosophy of open source. This is particularly a problem in a complex system like Drupal, which in most installations is made up of a few dozen modules in addition to the core.

The current issue I’m having is that nodes created by the aggregation module get their taxonomy stripped when they’re updated because of how another module uses the security functionality, which is just hilarious in a site that’s largely organized organically by taxonomy. So, after talking with the people I’m working for on the site, I ended up creating a simple PHP script to run through cron that fixes the issues ‘the hard way.’

If you check out this query…

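function fix_object($name, $sqlcon)
{
  // Find aggregation_item nodes with matching titles. The LEFT JOINs leave
  // termid NULL for nodes that have lost their taxonomy terms.
  $query = "SELECT term_data.name name, term_data.tid termid, node.nid nodeid, node.title title"
         . " FROM node LEFT JOIN term_node ON ( term_node.nid = node.nid )"
         . " LEFT JOIN term_data ON ( term_data.tid = term_node.tid )"
         . " WHERE node.type = 'aggregation_item'"
         . " AND node.title LIKE 'Xxxxx " . mysql_real_escape_string($name, $sqlcon) . "%'";

  // Perform Query on the connection that was passed in
  $result = mysql_query($query, $sqlcon);
 // ... and so on...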

You can see that this is a fairly normal SQL query that looks for all the nodes of type aggregation_item whose titles match a particular pattern. Because of the way the joins are structured, any nodes that have lost their taxonomies will have NULL for name and termid. Those node ids with NULL termids can then have the proper taxonomy entries stuffed back into them…

function insert_taxo_4_node($node_id, $taxo_id, $con)
{
  // Cast to int so only numeric ids can reach the query.
  $query = "INSERT INTO term_node (nid, tid) VALUES (" . (int) $node_id . "," . (int) $taxo_id . ")";

  $result = mysql_query($query, $con);
  // Check result
  // This shows the actual query sent to MySQL, and the error. Useful for debugging.
  if (!$result)
    {
      $message  = 'Invalid query: ' . mysql_error($con) . "\n";
      $message .= 'Whole query: ' . $query;
      die($message);
    }
}
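The overall repair loop, sketched in Ruby to keep the new examples in one language: take the rows the LEFT JOIN query returns, keep the node ids whose termid came back NULL (nil), and build an INSERT equivalent to the one insert_taxo_4_node issues for each. The row data and the taxonomy id 42 are hypothetical.

```ruby
# Rows as the LEFT JOIN query above might return them (hypothetical data).
# A node that lost its taxonomy comes back with termid = nil.
rows = [
  { "nodeid" => 101, "termid" => 7,   "title" => "Xxxxx Seattle news" },
  { "nodeid" => 102, "termid" => nil, "title" => "Xxxxx Seattle weather" },
  { "nodeid" => 103, "termid" => nil, "title" => "Xxxxx Seattle traffic" }
]

# Node ids needing repair: no term attached at all.
def orphaned_node_ids(rows)
  rows.select { |r| r["termid"].nil? }.map { |r| r["nodeid"] }.uniq
end

# Integer() raises ArgumentError for anything that isn't a whole number,
# so no quoting is needed in the generated SQL.
def insert_sql(node_id, taxo_id)
  "INSERT INTO term_node (nid, tid) VALUES (#{Integer(node_id)}, #{Integer(taxo_id)})"
end

orphaned_node_ids(rows).each { |nid| puts insert_sql(nid, 42) }
```

Because the script only touches nodes that currently have no terms at all, running it repeatedly from cron is safe: once a node has its term back, it stops showing up with a NULL termid.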

I’m largely posting this in case people run into the same problem. This is a hilariously simple fix for a difficult-to-fix problem in Drupal, but it’s also a generic information architecture issue: what to do when the system you’re working on is unreliable. I should probably mention that the security issues in Drupal I’m describing aren’t related to authentication; they come from item ACLs denying access to things for strange reasons, and they are not critical security bugs in the OMG MUST PATCH NOW sense.


There have been a lot of people asking angry questions to Apple today because the Apple β that they gave out to iPhone developers was timed to expire today and a lot of devs now have bricked their main mobile phone until an update appears. Lots of people appear angry, but they’re missing the main issue for Apple:

Dear Apple, why are you letting people this stupid into your β programs?

People frequently forget what beta for software means in these days where everything is β until people find a way to make money off of it. It means untested, believed working properly but may blow up at any time, not ready for production. So, I’m halfway between bemused and annoyed at the outrage that some folks seem to be fielding on various fora.

Also, calling a phone ‘bricked’ when you can easily recover it by downloading new software hours later is hitting the epistemological puff pastry with a hammer.


My first access to a Unix machine was around 19 years ago, and I’m still amazed that sudo tcsh is a valid command on most systems.

I’m not saying that it isn’t convenient, mind you, but the fact that I can then execute emacs is also hilarious. Especially because sudo emacs is prohibited.

Here is your system log; let me save you the trouble of auditing it by running a shell.

(I’m aware, incidentally, that it’s basically impossible to stop people from running a shell as long as they can run any naive Turing-complete interpreter or compiler. Maybe it’s time to only fight battles you can win.)


Note that the latest revision of this blog’s theme seems to have introduced a weird bug with the code layout plugin (wp-syntax) in some browsers. I’m looking into it.

I think the single most useful thing I’ve figured out recently in programming for MacOSX and the iPhone is this little snippet right here.

- (void) updateListForEntityNamed:(NSString*) entityName andSearchString:(NSString*) queryString
{
[...]
 
	MyDocument* current = [[NSDocumentController sharedDocumentController] currentDocument];
	if(current && current != self)
	{
		NSLog(@"CurrentDocument:%@ != self:%@", current, self);
		[current updateListForEntityNamed: entityName andSearchString: queryString];
		return;
	}
[...]
}

What this does is catch messages that arrive at a document other than the current one and forward them to the current document instead. I’ve run into this problem in Leopard (Mac OS X 10.5). To some extent, it’s probably a misconfiguration in Interface Builder somewhere, but it’s also an issue when using Core Data, because each NSPersistentDocument instance has its own NSManagedObjectContext, and problems arise if you end up using the wrong context.
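Stripped of Cocoa, the pattern is just “if I’m not the current document, forward the message to the one that is, and stop.” A minimal plain-Ruby sketch, where `DocumentController.current` and `Document#update_list` are hypothetical stand-ins for `[[NSDocumentController sharedDocumentController] currentDocument]` and `updateListForEntityNamed:andSearchString:`:

```ruby
# Hypothetical stand-in for the shared document controller.
class DocumentController
  class << self
    attr_accessor :current
  end
end

class Document
  attr_reader :name, :updates

  def initialize(name)
    @name = name
    @updates = []
  end

  def update_list(entity, query)
    current = DocumentController.current
    # Forward to the current document if the message landed on another one.
    if current && !current.equal?(self)
      puts "current document #{current.name} != self #{name}; forwarding"
      return current.update_list(entity, query)
    end
    # Reached only on the current document (or any document if none is current),
    # so the work always uses that document's own data.
    @updates << [entity, query]
  end
end

a = Document.new("A")
b = Document.new("B")
DocumentController.current = b
a.update_list("Recipe", "rice") # lands on A, handled by B
p b.updates # => [["Recipe", "rice"]]
p a.updates # => []
```

The identity check (`equal?`) mirrors the pointer comparison `current != self` in the Objective-C version, and the early return is what keeps the wrong document from touching its own context.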

I am slowly becoming a great fan of CoreData, it’s a great persistence/object-graph-management layer. More on this later.


For the last while, I’ve been working on a project that involves scanning large numbers of RSS/Atom feeds and then using Bayesian[1] classifiers to sort each item into one of a number of categories for summarization and display (the system I’m using to do this is available as a sample website, but it really needs more data in the training sets before it’s ready to entertain all of you). The categories are pretty straightforward, and they fit into a somewhat neat controlled vocabulary (ontology/thesaurus/whatever).

There’s a relation, though, between the terms in this sort of classification and the training data used to build the Bayesian classifier. If the terms are arranged in a hierarchy (and certain assumptions are made about that hierarchy, like each subterm covering part of its parent term’s range of meaning and nothing else)[2], then the training data used for classifying terms can be shared.

For example, all positive training data for the child terms can also be used for the parent. So, for a (constructed) example, positive training data for tamiflu also belongs in the positive data for bird flu vaccines. The reverse is true of negative training data: the negative data for the parent can also be used for the child terms.

This is highly useful information when you’re building a large-scale text classifier that decides whether texts belong to given categories, as opposed to just clustering texts into whatever categories actually appear. It’s easier to use things like Bayesian classifiers to do this if you’re looking for somewhat fine-grained detail.
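A small sketch of those two sharing rules, assuming a hypothetical term hierarchy stored as child => parent: a term’s positive examples are its own plus all its descendants’, and its negative examples are its own plus all its ancestors’. The terms follow the constructed example above; the document ids are made up.

```ruby
# Hypothetical hierarchy: child term => parent term.
PARENT = {
  "bird flu vaccines" => "influenza",
  "tamiflu" => "bird flu vaccines"
}

# Hand-labeled training data per term (document ids).
POSITIVE = { "influenza" => [1], "bird flu vaccines" => [2], "tamiflu" => [3] }
NEGATIVE = { "influenza" => [9], "bird flu vaccines" => [8], "tamiflu" => [7] }

def term_ancestors(term)
  chain = []
  chain << term while (term = PARENT[term])
  chain
end

def term_descendants(term)
  PARENT.keys.select { |t| term_ancestors(t).include?(term) }
end

# Positives flow up: a child's positives are also the parent's positives.
def positives(term)
  ([term] + term_descendants(term)).flat_map { |t| POSITIVE.fetch(t, []) }.uniq
end

# Negatives flow down: a parent's negatives are also the child's negatives.
def negatives(term)
  ([term] + term_ancestors(term)).flat_map { |t| NEGATIVE.fetch(t, []) }.uniq
end

p positives("influenza") # => [1, 2, 3]
p negatives("tamiflu")   # => [7, 8, 9]
```

The rules only hold under the hierarchy assumption stated above; if a subterm can drift outside its parent’s meaning, its positives stop being safe positives for the parent.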

Currently, I’ve been using Classifier4J for the classification and text summarization[3]. The text summarization is sort of annoying, though, because it’s based on a simple statistical choice of sentences, which occasionally picks up datelines and partial phrases because of what’s ‘important.’ I’m resisting the urge to go completely POS-tagging nuts on the whole thing and only select sentences of certain types or completeness, because this is, after all, a side project. (The number of times I see things like ‘this sentence no verb.’ is astounding, though, and slowly driving me nuts.)

So, another day in the life.

[1] Although I’m also using a vector space classifier for a related, larger project, and it’s driving me less nuts to train.
[2] This is called a meronymous (‘part-of’) relationship, and given that half the people who regularly read this blog were in LIS530 or its equivalent at some point, you should remember this.
[3] And I will probably eventually switch to jBNC http://jbnc.sourceforge.net/ before I go nuts.

So, I was talking to someone today about their application (which was Ruby on Rails-based), and we had a long conversation about locking. There are a couple of different sorts of locks that show up in software development, but there’s one in particular that mostly shows up only in enterprise software development: the Long-lived Lock.

Locks are used to keep other processes from modifying resources in the system. They show up at a variety of levels, ranging from critical sections (in Java, Windows, etc.) that synchronize access to particular pieces of code, to database locks, which keep others from reading from or writing to rows or tables while operations are done.

However, all of these locks are held for short periods of time. You can’t keep a read or write lock on a database row for an extended period (or, in cases where you can, you almost certainly shouldn’t). About the longest a row should be locked is the duration of a single transaction (which may span multiple databases, rows, or what have you, but that time covers just the changes for the transaction, not all the time people spend staring at a screen and entering data before hitting the return key).

But how do you let a user lock information for an extended period of time? For example, say the user is locking a row in the database that represents a document that they’re updating (a frequent setup in most ECM/DM systems.) Well, since that’s part of the ECM system, that should happen inside the logic of that application. It shouldn’t be achieved through database locking, but should instead be stored as information within the database.

It’s possible to set this up a number of different ways, but let’s assume you have a document table named document with, by convention, an id column as its primary key. I’m also going to assume that writing to a document is done by a particular user. Your application’s security system may vary.

So, let’s look at a table set up for locking on the document table:

TABLE doc_lock
     document_id : INTEGER
     user_id : INTEGER
     lock_expires: DATETIME
END

You join this table in when you need to know whether there are locks on a particular object, and otherwise create and delete locks as needed. One consequence of this locking strategy is that expired locks accumulate on documents, so you want to clean those up periodically, and when you join in the lock table you want only non-expired locks.
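A minimal in-memory sketch of the doc_lock life cycle under the same assumptions (each lock belongs to one user and carries an expiry time). `DocLock`, `LockTable`, and its methods are hypothetical names, and an Array stands in for the table; a database-backed version would issue the equivalent INSERT, SELECT, and DELETE statements against doc_lock.

```ruby
# One row of the doc_lock table.
DocLock = Struct.new(:document_id, :user_id, :lock_expires)

class LockTable
  def initialize
    @rows = []
  end

  # Only non-expired locks count ("join in non-expired locks only").
  def active_lock(document_id, now = Time.now)
    @rows.find { |l| l.document_id == document_id && l.lock_expires > now }
  end

  # Take a lock unless someone else holds an unexpired one.
  # Re-locking by the same user replaces (extends) their lock.
  def acquire(document_id, user_id, duration, now = Time.now)
    held = active_lock(document_id, now)
    return false if held && held.user_id != user_id
    @rows.delete(held) if held
    @rows << DocLock.new(document_id, user_id, now + duration)
    true
  end

  def release(document_id, user_id)
    @rows.reject! { |l| l.document_id == document_id && l.user_id == user_id }
  end

  # Periodic cleanup so expired rows don't accumulate; returns rows left.
  def purge_expired(now = Time.now)
    @rows.reject! { |l| l.lock_expires <= now }
    @rows.size
  end
end

t0 = Time.now
locks = LockTable.new
locks.acquire(1, :alice, 3600, t0)      # => true
locks.acquire(1, :bob, 3600, t0)        # => false, alice holds it
locks.acquire(1, :bob, 3600, t0 + 7200) # => true, alice's lock has expired
locks.purge_expired(t0 + 7200)          # => 1, alice's stale row removed
```

Note that the check-then-insert in `acquire` is not atomic across processes; in a real database you would want a unique constraint or a transaction around it, which is the kind of short-lived locking that is appropriate here.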

Your app also needs policies surrounding this, like the security model for locks (who can know about them, whether they’re on a user/group/role basis, etc.) and when a lock can be broken. Sooner or later you’ll need to break locks, for example for an employee on vacation who has documents locked. But that’s all above the database structure and the immediate operations on the lock table, which are what I’m discussing here.

Well, that’s part one of three. The next segment will be the Ruby-on-Rails implementation I sketched out for my interlocutor, and the last will be some variations on and exceptions to this idea. I consider long-lived locks a design pattern, because they’re a recurring solution to a recurring problem in enterprise computing.

Some comments on Hivelogic – The Narrative – Building Ruby, Rails, Subversion, Mongrel, and MySQL on Mac OS X

I’ve been using this set of instructions to install Ruby on Rails on Mac OS X for a while (in case you’ve ever wondered, which you haven’t, I use a MacBook Pro set up to run Windows XP and Mac OS X 10.4.x). The instructions as written don’t work well for me, because I use ‘tcsh’ and not ‘bash’ as my shell. I also like confining changes to my own account.

So, I use the instructions given in the cited article, with the following difference.

Paths
Here, add the following line to the end of your .cshrc

setenv PATH /usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:/sw/bin:$PATH

(This is all just one long line)

For the rest, I replace all instances of ‘sudo command’ with ‘sudo tcsh’ followed by the command. More concretely, instead of:

curl -O ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
tar xzvf ruby-1.8.6.tar.gz
cd ruby-1.8.6
./configure --prefix=/usr/local --enable-pthread --with-readline-dir=/usr/local
make
sudo make install
sudo make install-doc
cd ..

I do:

curl -O ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
tar xzvf ruby-1.8.6.tar.gz
cd ruby-1.8.6
./configure --prefix=/usr/local --enable-pthread --with-readline-dir=/usr/local
make
sudo tcsh
make install
make install-doc
exit
cd ..

This keeps root’s environment clean and leaves root’s shell as bash; those were the drawbacks of the other solutions I’ve seen for this sort of thing. There’s a related question of whether you should be able to sudo a shell at all, but arguing that isn’t the point of this article; this article is basically about making sure you have the right environment variables when you type ‘make install.’

I haven’t provided exact conversions of every set of commands, because if you can’t figure the rest out, you might want to switch your account shell back to bash to avoid more trouble later. In particular, you will want to run the ‘rehash’ shell command after installing new executables.

[beansidhe:~/ruby-1.8.6] zeitgeis% ruby -v
ruby 1.8.2 (2004-12-25) [universal-darwin8.0]
[beansidhe:~/ruby-1.8.6] zeitgeis% rehash
[beansidhe:~/ruby-1.8.6] zeitgeis% ruby -v
ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.9.1]

‘rehash’ makes tcsh rebuild its cached table of the commands on your PATH, which you need after new executables appear (here, the new ruby in /usr/local/bin) since the shell started.


Enterprise Content Management (ECM) Team Blog : Taxonomy/Tagging Starter Kit for SharePoint Server, also at the Sharepoint blog

Microsoft has made a kit available for SharePoint that makes taxonomy and tagging easier. Authors can tag items and can also have controlled vocabularies on particular multi-valued properties. Users can incorporate the controlled vocabularies into searches and also search by tags.

In the default configuration, users cannot tag items on the fly (although I suspect that they could change taxonomy values if they have permissions.)

I used to work (in engineering) at an ECM company, so using the phrase ‘controlled vocabulary’ in place of taxonomy for this is somewhat second nature. Since I took a lot of classification classes at the Information School, it’s interesting to see how companies implement these concepts. It could be interesting if these features became widely available in SharePoint.


Why is releasing code with TODOs and FIXMEs in it ‘The Ruby Way’?


Document modeling is important to any IR approach — the bag of words approach assumes word independence, and this is simple, but inappropriate to natural language. There have been a bunch of approaches to this sort of thing in the past, but here’s a relatively new one that does well versus various TREC collections.

Here’s a link to the paper: LDA-based Document Models for Ad-hoc Retrieval.

The presentation was largely a crawl of the paper section by section, and I’m going to emulate that approach by just referring to the paper so you can have that experience.

However, it beats previous models because it maps { document vs. topic } for all topics and documents, as opposed to, for example, cluster-based approaches, which largely assume that each document belongs to exactly one cluster (or, in many practical approaches, to whichever cluster it matches best). Because each document d_i belongs to each topic n with some probability p(n | d_i), this does better than searching against bag-of-words models.
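To make “documents belong to topics with some probability” concrete: under LDA, the probability of a word in a document is a mixture over topics, P(w | d) = Σ_z P(w | z) P(z | d). The sketch below ranks documents for a query using that mixture alone, with two made-up topics and made-up probabilities. As I understand it, the paper interpolates this with a conventional smoothed document model rather than using it alone, so treat this as an illustration of the LDA term only.

```ruby
# Hypothetical topic-word distributions P(w | z).
TOPIC_WORDS = {
  genetics: { "gene" => 0.5, "protein" => 0.3, "cell" => 0.2 },
  clinical: { "patient" => 0.5, "trial" => 0.3, "cell" => 0.2 }
}

# Hypothetical per-document topic mixtures P(z | d); each sums to 1.
DOC_TOPICS = {
  "d1" => { genetics: 0.9, clinical: 0.1 },
  "d2" => { genetics: 0.2, clinical: 0.8 }
}

# P(w | d) = sum over topics z of P(w | z) * P(z | d)
def p_word(word, doc)
  DOC_TOPICS.fetch(doc).sum { |topic, p_z| TOPIC_WORDS[topic].fetch(word, 0.0) * p_z }
end

# Query likelihood: product over query words, computed as a sum of logs.
def log_score(query, doc)
  query.split.sum { |w| Math.log(p_word(w, doc)) }
end

ranking = DOC_TOPICS.keys.sort_by { |d| -log_score("gene cell", d) }
p ranking # => ["d1", "d2"]
```

A query word that appears in no topic gets probability 0 here, and `Math.log` then returns -Infinity; that is one reason to combine the topic model with a smoothed document model.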

All the papers in this section are pretty oriented toward the idea that searching by automatically generated topics beats word-based searching. See the papers themselves for the differentiators, as a lot of it is math that I’m not going to break out the LaTeX for on the fly. I will also note that most presentations in this area are pretty high on the UMLS fetishism.


For the last couple of days, I’ve been working on a couple of interesting things, the most important of which has been getting my resumes together, and the most interesting of which has been getting Ruby on Rails installed on my Macintosh.

Rails is interesting because it dispatches incoming web requests in a way I greatly approve of. I spent a lot of time in the late 90s trying to convince people that this sort of URL syntax was the right way to do things, and now, through Rails, I’m feeling vindicated-ish. Actually, I hadn’t really thought about it until I was talking to an ex-coworker who remembered me talking a lot about Object-Action syntax being a good start on Model-View-Controller. At the time, pages named stuff like performSpecificActionOnSpecificObject.extension were more common. Anyway, this sort of thing lends itself to cleaner code design in a number of ways (including patterns, etc.), so it’s the sort of thing that generally ought to be encouraged.
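A toy sketch of Object-Action dispatch in plain Ruby: a path like `/recipes/show/7` names the object (controller), the action, and an id, and the dispatcher calls that method. `RecipesController`, `CONTROLLERS`, and the route format are hypothetical; Rails’ real router is far more involved.

```ruby
class RecipesController
  def show(id)
    "showing recipe #{id}"
  end

  def index(_id = nil)
    "listing recipes"
  end
end

CONTROLLERS = { "recipes" => RecipesController }

# /object/action/id  ->  ObjectController#action(id)
def dispatch(path)
  object, action, id = path.split("/").reject(&:empty?)
  controller = CONTROLLERS.fetch(object) { return "404: no such object #{object}" }
  action ||= "index"
  # Only allow actions the controller itself defines publicly.
  unless controller.public_instance_methods(false).include?(action.to_sym)
    return "404: no action #{action}"
  end
  controller.new.public_send(action, id)
end

puts dispatch("/recipes/show/7")  # => showing recipe 7
puts dispatch("/recipes")         # => listing recipes
puts dispatch("/recipes/destroy") # => 404: no action destroy
```

The whitelist check matters: without it, a URL could name any public method the controller inherits, which is exactly the sort of thing a real router guards against.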

I’m doing a project in Ruby at the moment, mostly to get my hand back into programming. Because of the sorts of work and school I’ve been doing for the last while, I’m much better at design and analysis than I’ve ever been, but my hacking skills are a little weaker than I recall. Learning a new language in a different paradigm is very useful, especially one that’s rich in hacky syntax/generator crap, like Ruby seems to be at the top level.

More specifically, I’m taking on a lightweight social networking application. I’ve always wanted to build one, and the middle of a job search seems like a good time.

So, for those of you who don’t know, I’ve been working part-time at a local company to help pay my way through grad school. That’s a simplification, as I’m a part-owner of the company and I’m mostly getting benefits rather than cash, but for now I’m the main system administrator on one of the main systems they run.

For a while now, I’ve been tracking down problems in the spam-checking software we use, and it’s been a merry time. Most of the work has been getting everything on the server into a single, known, compatible state, a concept I greatly commend to you if you’re running a server and don’t want to spend lots of time messing with it.

Today’s project was tracking down and eliminating the source of a bunch of error messages that get mailed to the administrators’ mailbox every night. They’re known to be harmless, but they’re aggravating, and they might hide other problems.

So, I was looking through the codebase, and I found this little gem:

SPAMD=`ps aux | awk --posix '{ if (($1 ~ /popuser/) && ($0 ~ /\/spamd[[:blank:]]/)) print $2; }' | wc -l | awk '{print $1}'`

You might ask yourself what that does. It’s pretty easy to figure out: it counts the processes whose owner (the first column of the ps aux output) matches ‘popuser’ and whose command line contains ‘/spamd’ followed by whitespace, which is useful for figuring out whether or not SpamAssassin is running on your server. It’s part of 4PSA Server Assistant. However, it may not work depending on how your server is configured; on my server, it never works because of how ps formats its output.
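For readers who don’t speak awk, here is a Ruby sketch of the same selection logic. The sample ps aux lines are made up for illustration, and count_spamd_by_selection is a hypothetical name. It shows how much the filter leans on the exact layout of ps output: the first column has to be the user, and the main process has to be listed under popuser.

```ruby
# Mirrors the awk filter: keep `ps aux` lines whose first column (USER)
# matches /popuser/ and whose text contains "/spamd" followed by a blank.
def count_spamd_by_selection(ps_aux_lines)
  ps_aux_lines.count do |line|
    user = line.split.first.to_s
    user.match?(/popuser/) && line.match?(%r{/spamd[[:blank:]]})
  end
end

# Hypothetical `ps aux` output, for illustration only.
SAMPLE_PS_AUX = [
  "popuser   1001  0.0  1.2 51234 30120 ?  Ss  09:00  0:02 /usr/bin/spamd -u popuser -d -m 5 -x --socketpath=/tmp/spamd_full.sock",
  "popuser   1002  0.0  1.1 51234 28000 ?  S   09:00  0:00 spamd child",
  "root      4242  0.0  0.1  4000   800 ?  Ss  09:00  0:00 /usr/sbin/sshd -D"
].freeze
```

On this sample, count_spamd_by_selection(SAMPLE_PS_AUX) returns 1. In a hypothetical listing where the main spamd process shows up under a different user (say root), the same filter returns 0 even though spamd is running.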

My main point here is that that’s a crazy way to write that code. What the person is actually trying to do is make sure that they’re only getting the main SpamAssassin process and not any of the child processes. The child processes display as “spamd child”; the main SpamAssassin process displays as something like “/usr/bin/spamd -u popuser -d -m NUMBER -x --virtual-config-dir=/MAIL/DIR/FOR/YOUR/SERVER/%d/%l --socketpath=/tmp/spamd_full.sock”. So they’ve got to distinguish between those two lines, and they’ve decided to check for random text in the first one and have written a fairly complex little shell script (calling awk twice!) to do so. They can’t check for just the word ‘popuser’ because it might appear in the path, in case you were wondering.

I replaced this with the following line:

SPAMD=`ps ax | grep -v "grep\|spamd child" | grep -i "spamd " | wc -l | awk '{print $1}'`

This checks for all spamd processes after first eliminating the ‘spamd child’ processes (and the grep processes themselves). Why this way? If you’re trying to choose between two things, and one of them changes from system to system while the other is fixed and simple, you should probably try to select on the fixed one.
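The same elimination-first approach, sketched in Ruby: drop the lines known to be unwanted, then count what’s left. The sample ps ax lines are made up, and count_spamd_by_elimination is a hypothetical name. Note that nothing here depends on which user owns the process or on how many columns ps prints before the command.

```ruby
# Mirrors the replacement pipeline:
#   grep -v "grep\|spamd child" | grep -i "spamd " | wc -l
# First throw away lines we know we don't want, then count the remaining
# lines that mention "spamd " (case-insensitively, like grep -i).
def count_spamd_by_elimination(ps_ax_lines)
  ps_ax_lines
    .reject { |line| line.include?("grep") || line.include?("spamd child") }
    .count { |line| line.downcase.include?("spamd ") }
end

# Hypothetical `ps ax` output (PID TTY STAT TIME COMMAND), for illustration only.
SAMPLE_PS_AX = [
  " 1001 ?        Ss     0:02 /usr/bin/spamd -u popuser -d -m 5 -x --socketpath=/tmp/spamd_full.sock",
  " 1002 ?        S      0:00 spamd child",
  " 1003 ?        S      0:00 spamd child",
  " 2000 pts/0    S+     0:00 grep -i spamd ",
  " 3000 ?        Ss     0:00 /usr/sbin/sshd -D"
].freeze
```

On this sample, count_spamd_by_elimination(SAMPLE_PS_AX) returns 1: the two children and the grep process are discarded before anything is counted.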

So here, since I didn’t want the fixed ones, I eliminated them (‘grep -v’). That saved me from having to pick out the one I wanted. In computer programs, it’s generally as easy to select for elimination as it is to select for further processing. This is also true in card tricks, incidentally, just in case you want to do some card tricks.

The basic idea behind a lot of card tricks where you choose between two things is that the magician knows beforehand which of the two (s)he wants you to end up with. So the magician decides, only once you’ve made your choice, whether you were selecting an item to keep or selecting an item for elimination, to make sure that you end up with the right item.
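The logic of that two-item force fits in a few lines of Ruby. This is a hypothetical sketch (magicians_choice is a made-up name), but it shows the point: the magician never has to control the pick, only how the pick is interpreted.

```ruby
# Two-card "magician's choice": the spectator points at one card, and only
# afterwards does the magician decide whether that meant "keep" or "eliminate".
def magicians_choice(cards, target, picked)
  raise ArgumentError, "need exactly two cards" unless cards.size == 2
  unless cards.include?(target) && cards.include?(picked)
    raise ArgumentError, "target and pick must both be on the table"
  end

  if picked == target
    [:keep, picked]                        # treat the pick as a selection
  else
    [:eliminate, (cards - [picked]).first] # treat the pick as an elimination
  end
end
```

Whichever card is picked, the second element of the result is always the target; only the first element, the story told to the spectator, changes.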

Actually, this technique is mostly used in really bad card tricks. The ‘decisive moments’ blog describes how a process similar to the magician’s force is used in many video games to keep the plot moving in a somewhat linear fashion, transparently to the user, and why it fails.

I wonder how much the folks who make massive interactive games, like 4orty2wo, use this tactic, and whether they’ve found good ways to disguise that it’s happening.

A couple of months ago, I wrote some sample stemmers for a class I was taking in the iSchool. I’ve put some of them up on the website.

The stemmers on this site are the Porter and the Lovins stemmers, both implemented in PHP. The Porter stemmer was downloaded from one of the several sites on the internet that host it; the Lovins stemmer was converted to PHP from the Java version available at SourceForge. Copies of the source are available on request.
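To give a flavor of how the Porter stemmer works: it runs a word through a series of suffix-rewriting steps, and its very first step (1a) handles plural endings with four rules, where only the longest matching suffix applies: sses → ss, ies → i, ss → ss, s → (nothing). The Ruby sketch below covers just that step; it is not the PHP code on the site, and porter_step1a is a hypothetical name.

```ruby
# Porter step 1a only: rewrite plural endings, checking longer suffixes first
# so that exactly one rule (the longest match) fires.
def porter_step1a(word)
  if word.end_with?("sses")
    word[0...-2]   # caresses -> caress
  elsif word.end_with?("ies")
    word[0...-2]   # ponies -> poni
  elsif word.end_with?("ss")
    word           # caress -> caress (unchanged)
  elsif word.end_with?("s")
    word[0...-1]   # cats -> cat
  else
    word
  end
end
```

Results like "poni" look odd, but stems only need to be consistent, not real words; later steps of the full algorithm handle other endings.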

The stemmers are available here. There’s also a call into PHP’s soundex implementation that I added at some point to demonstrate a few things.
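For comparison with stemming, Soundex works on sound rather than suffixes: keep the first letter, replace the following consonants with digit classes, collapse adjacent repeats, and pad or truncate to four characters. Below is a Ruby sketch of the commonly documented American Soundex rules, including the rule that h and w do not separate letters with the same code. I haven’t checked it against PHP’s soundex() on every input, and the names are hypothetical.

```ruby
# Digit class for each coded consonant; vowels, y, h, and w get no code.
SOUNDEX_CODES = {}
{ "bfpv" => "1", "cgjkqsxz" => "2", "dt" => "3", "l" => "4", "mn" => "5", "r" => "6" }
  .each { |letters, digit| letters.each_char { |ch| SOUNDEX_CODES[ch] = digit } }

def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "")
  return "" if letters.empty?

  result = letters[0].upcase
  last_code = SOUNDEX_CODES[letters[0]]   # the first letter's code also suppresses a repeat
  letters[1..-1].each_char do |ch|
    code = SOUNDEX_CODES[ch]
    if code
      result << code unless code == last_code
      last_code = code
    elsif ch != "h" && ch != "w"
      last_code = nil                     # vowels and y separate repeated codes; h and w do not
    end
    break if result.length == 4
  end
  result.ljust(4, "0")                    # pad short codes with zeros
end
```

For instance, "Robert" and "Rupert" both come out as R163, which is the point of the algorithm: similar-sounding names collide.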

“hors d’oeuvres funroll-loops” is my new favorite meaningless expression. It sounds completely meaningless, but it came up today with regard to a technical problem I’m working on. It’s just ludicrously fun to say, although probably not as funny as ‘boss’ or ‘grody.’

Also, on the subject of hilarity, check out these references to ‘auderves’ in Google.

Surprisingly, -funroll-loops isn’t the funniest-sounding of the gcc options; -malign-double is. Just in case you’re having deus ex machina problems and need to specify that your program has an evil twin.