So, for the last while I’ve been working on celiaq.com. It’s a social network for people with Celiac disease and other forms of Gluten Intolerance. It’s designed around the resources that that community has trouble getting reliable information about on the internet, and I think that it’s coming along pretty well.
Right now, it’s looking like that will be done and in a somewhat stable state by 7/15. It’s based on rails and hosted at slicehost right now, although I suspect that it will be moving to Amazon EC2, CloudFront and S3 as it scales to make things easier. It’s a rails/mysql application for the most part, although it would probably run on other db backends.
It’s been an interesting time taking this entire application from vision to deployment, and it’s been a good time. Currently it’s in a beta state, and is soft launched to let interested people enter resources and test the system.
No question now, what had happened to the faces of the pigs. The creatures outside looked from pig to man, and from man to pig, and from pig to man again; but already it was impossible to say which was which. –Animal Farm
The last week or so has, obviously, been somewhat transfigured by pandemic paranoia. Pandemic is ancient greek for either “all peoples” or “freak the fsck out,” it’s hard to tell. This can be easily distinguished from epidemic, which is greek for “upon peoples” and endemic, which is greek for “within peoples.”
Here’s a handy table for disambiguation of various words with demic in them, described in terms of internet memes:
| greek root | english literal | meaning |
|---|---|---|
| pan-demic | all peoples | the call’s coming from inside the house. |
| epi-demic | upon peoples | getting mediæval on your ass. |
| en-demic | within peoples | never gonna give you up, never gonna let you down, never gonna run around and desert you. |
| syn-demic | together with peoples | I put a disease in your disease, so you could get sick while you’re sick |
So, with this out of the way, we can proceed upon our merry way. It’s been an exciting week, I have been interested in public health and related things (epidemiology, etc…) for a long time so I’ve been watching the news, websites, twitter, and any other mechanism of public information I could get my hands on.
As for myself, I’m sort of guardedly watching the news (considering I’m traveling soon to Texas, where both US fatalities have occurred), but am mostly concerned about the Northern hemisphere flu season. That said, there’ve been a lot of people coughing in this coffee shop today.
This is a test to see if some stuff at celiaciq is working properly. Some of the navigation has yet to be folded into the main site, so things are wonky.
I’ve been doing more stuff with geodata and geoapis recently in support of a project. It’s been fun, as you can see from the previous entry.
One of the things that this project has made me aware of is the importance of caching. A number of the free, public services have occasionally been hit by surges in usage (apparently frequently coming from iPhone apps that weren’t caching or otherwise reusing results) and I’m writing an article about this.
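As a rough illustration of the caching point, here is a minimal sketch of an in-memory cache sitting in front of a geocoding lookup. The class name, the `misses` counter, and the `fake_lookup` lambda are all made up for the example; a real client would make an HTTP request to whichever geocoding service it uses and would probably want expiry as well.

```ruby
# A tiny in-memory cache in front of a geocoding lookup.
# `lookup` is any callable that takes an address string; it is a
# hypothetical stand-in, not a real geocoding API.
class CachedGeocoder
  attr_reader :misses

  def initialize(lookup)
    @lookup = lookup
    @cache = {}
    @misses = 0
  end

  # Normalize the address so trivially different spellings share one entry,
  # and only call the underlying service when the key has not been seen.
  def geocode(address)
    key = address.strip.downcase.squeeze(" ")
    return @cache[key] if @cache.key?(key)
    @misses += 1
    @cache[key] = @lookup.call(address)
  end
end

# Hypothetical lookup that would normally be a network request.
fake_lookup = lambda { |address| { :lat => 47.61, :lng => -122.33, :query => address } }

geocoder = CachedGeocoder.new(fake_lookup)
geocoder.geocode("Seattle, WA")
geocoder.geocode("  seattle,  wa ")
puts geocoder.misses  # the second call is served from the cache
```

Normalizing the key is the design choice that matters here: without it, near-identical queries each cost a round trip to the public service.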
The code fragment below is more or less a good guide to what I am up to at the moment, although it should be noted that I’ve written this same code fragment in three languages in the last while (this is ruby, the others were PHP and Java, although the Java one was a festival of reflection due to type wackiness.)
I actually have another version of the same code that puts a URL in the debug log that can be used to click directly to google maps. Why? I don’t know. I’m beginning to value Aptana Studio’s remix of Eclipse more and more as time goes on though because I now have Java, PHP, and Ruby/Rails all in the same highly (mostly) performing IDE. The alleged iPhone mode doesn’t work on my computer but I have CRAZY LIBERRIES installed at the moment and I suspect that that’s in large part my own fault — the apple tools still work.
def geo_desc(geo_loc, extended = false)
  #
  # specialized pretty printer for address types.
  # note that there is pretty much a standard mixin for geo stuff and
  # this works across all the geocoding packages and model types.
  #
  return "[nil location]" if geo_loc.nil?
  desc = "["
  desc << geo_loc.country_code.downcase unless geo_loc.country_code.nil?
  desc << "." + geo_loc.state.downcase unless geo_loc.state.nil?
  desc << "." + geo_loc.city.downcase unless geo_loc.city.nil?
  desc << "." + geo_loc.zip unless geo_loc.zip.nil?
  desc << "] "
  desc << "["
  desc << geo_loc.lat.to_s unless geo_loc.lat.nil?
  desc << ","
  desc << geo_loc.lng.to_s unless geo_loc.lng.nil?
  unless geo_loc.precision.nil? or geo_loc.precision == "unknown"
    desc << " (" + geo_loc.precision + ")"
  else
    desc << " ?"
  end
  desc << "]"
  if extended
    desc << " " + geo_loc.full_address unless geo_loc.full_address.nil?
  end
  return desc
end
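The other version mentioned above, the one that logs a clickable map link, is not shown here. A minimal sketch of the idea might look like the following, assuming a location object that responds to `lat` and `lng` (the `GeoLoc` struct is a hypothetical stand-in) and assuming the common `?q=lat,lng` query form for Google Maps links.

```ruby
# Hypothetical stand-in for whatever location object the geocoder returns.
GeoLoc = Struct.new(:lat, :lng)

# Build a Google Maps link for a location, or nil if there is nothing to
# link to. The ?q=lat,lng URL form is an assumption, not taken from the post.
def maps_url(geo_loc)
  return nil if geo_loc.nil? || geo_loc.lat.nil? || geo_loc.lng.nil?
  "https://maps.google.com/?q=#{geo_loc.lat},#{geo_loc.lng}"
end

puts maps_url(GeoLoc.new(47.6062, -122.3321))
```

Writing that string to the debug log next to the output of `geo_desc` would give a link that can be clicked from most log viewers.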
The Right One by Scott McCloud is an interesting piece (and contains mildly nsfw cartoon images.) It’s particularly interesting to me because it’s primarily a story about categories, similarity, and ordering, and I highly recommend reading it. I thought it was astounding, and have played through the comic several times.
I’m not going to go into detail about how, because that would kill the awesome, but if you read it, you should get a sense fairly easily.
When Stephen King came to the end of his Dark Tower series, he felt that he had to rewrite the first book The Gunslinger because it was now no longer in sync with the rest of the series.
I’ve been watching this series by John Cleese from the BBC on the Human Face. It’s a lot about how the face is perceived in terms of beauty, expression, emotion, and communication, with occasional insights into the nature of fame. I greatly recommend it, as it’s great for thinking about what is meant by beauty, and about how beauty, appearance, and what we do with our faces affect our lives. I thought it was really interesting, and I don’t think it’s just the cold medicine talking.
Antelope-as-document is a famous article in information science/librarianship, which this song seems to be ‘about,’ more or less.
“Honking Antelope”
Why dont you go photograph
Everything that ever passed in time,
Indigenous traces, tribal chiefs,
Vanishing hereditary lines,
Poets gone wild on the muse
Prophets all destroying the Tao,
When you see that honking antelope,
The secret dance of snakes, the tales of it all,
So, I decided to put this on my blog as there’s about 1000 mentions of this problem and none of this simple solution. First, let me start by saying, if you encounter this problem with a program other than Eclipse, please let me know… this seems to be a pervasive problem in Eclipse and its derivatives (Aptana Studio, RadRails, etc…)
To get your application’s title bar back on the window, hit F8 to bring up Spaces, and then use the Spaces view to select the window that you want and drag it to some place where it’s more useful.
Anyway, this will probably be more helpful for people on MacOSX Leopard (and onwards) than otherwise, but it’s been very helpful to me. It seems to affect mostly Eclipse for some reason, but the bug/feature (this wouldn’t be an issue on XWindows) has been around basically forever.
Earlier this summer, I was working on the technology for the websites grid.org and groups.grid.org. These are largely drupal-based websites. My responsibilities in them were varied, but like many drupal projects, a lot of what ended up taking the time was upgrading, patching, trying to find ways to achieve what the customer thought was possible based on inaccurate or optimistic module descriptions, and fixing incompatibilities.
It’s a good site though, and as it evolves, the value of the platform that it’s based on will make its value to the grid community stronger and stronger.
So, I haven’t been posting much recently because I’m working as engineering lead at a stealth startup. I’m doing things stealthily at the moment, mainly, sneaking up behind people with large amounts of source code, clandestinely attending meetings in exotic locales, that sort of thing.
I have a lot of belief in the product(s), service(s), and/or solution(s) that we may or may not be making, it’s an exciting time.
InfoCamp gives the power to the people! It turns the traditional conference structure upside-down, with most of the conference content being designed and delivered by the people who attend.
The result is a highly collaborative, vibrant atmosphere in which participants can discuss nascent ideas, emerging technologies, thorny problems, practical suggestions, and current issues.
InfoCamp Seattle 2008 is an unconference open to anyone interested in information architecture, user-centered design, librarianship, information management, usability, and related topics. It doesn’t matter if you’re a student or an established professional, in the private sector, a library, or a government agency — all interested folks are invited to register!
What will you encounter at InfoCamp? Since it’s an unconference, we can’t predict exactly what will happen! But we do know that you’ll hear from our keynote and plenary speakers, Jacob O. Wobbrock and Tamara Adlin. You’ll also participate in discussions about topics such as social media, the future of information systems, or other subjects already being suggested on the InfoCamp wiki (http://infocampseattle2008.pbwiki.com)!
InfoCamp Seattle 2008
September 27-28
http://www.infocamp.info
Join the InfoCamp revolution! Revolution has never been so fun!
Thanks,
Aaron, Andy, Corprew, Genevieve, Joshua, Kathryn, Kristen, and Rachel
Your InfoCamp Seattle 2008 Organizers
I went to the Neal Stephenson reading at Third Place Books Sunday afternoon; he’s celebrating the release of his new book Anathem. I have read the first several hundred pages of this book so far and greatly recommend it.
I bought this book in hardcover because I’m guessing that, like many of his books, I’ll return to it again and again, sometimes in hostile conditions that destroy the glue of paperbacks. After the reading, I got on the signing line and had him sign it, which is something I don’t normally do, but it elevated the status of the book from mechanism for conveying a stream of text to artifact, which I find funnier for his books than generally.
Third Place Books isn’t a natural place to have a reading on a Sunday afternoon, and the crowd for the reading was large. Neal’s voice was quiet in the crowd, but carried well. The questions people asked were often more revealing of their personal psychology than actually informative, although Neal frequently twisted them around to convey useful information. A lot of them, unsurprisingly, were about his process of work.
This is the monthly get-together for the organization whose regional chapter I’m the secretary/treasurer of. If this sort of thing interests you, c’mon by.
Once a month, we get together to have drinks, chat, network, and geek out with fellow information architects, librarians, usability experts, user experience designers, and other like-minded user-centered professionals and students. It’s open to anyone, so bring a friend — especially those in other local organizations! The format will be casual, but all are encouraged to bring something to discuss — recent work, an interesting topic, or even your resume. This event is organized by the Pacific Northwest chapter of the American Society for Information Science & Technology.
What: Seattle Monthly Information Architecture Meetup
http://ia.meetup.com/57
Where: Elysian Pub, 1221 E. Pike St., Seattle, WA
When: 7-10pm, May 13th (2nd Tuesday of every month)
This and the next several are when the students will come in a big drove most likely, so if you’re looking to hire new grads in the information professions, it’s a good bet.
The last while, I’ve been working on the North Atlantic Bloom 08 Collaboratory Website at UW(ashington) APL. I finished that today (mostly, I imagine requests will continue to come in over time), and the science team is out in the North Atlantic doing science and working with their autonomous instruments as they measure phytoplankton and thus carbon flux.
Last week, we went to Rainier Vista and had a reception with the Seattle Housing Authority, Ignition NW board members, and a lot of residents of Rainier Vista and members of other Seattle communities.
This tree, designed by Seattle’s Iron Monkeys, is the first permanent art piece installed by a grant from Ignition NW. It’s going into Rainier Vista, a mixed income housing project in Seattle that is pretty much ideal as far as neighborhoods go, and right next to public transit.
Our efforts at getting involved in our local community are paying off!
Cory Doctorow will be reading from his recent Little Brother at the Ballard branch of SPL on May 19th. I’m planning to be there, the progreff of nature depending.
So, I’ve been doing a lot of research into logging & dispatching recently for one of the volunteer organizations that I’m a lead in. In the course of this, I came across this on http://www.history.navy.mil/faqs/faq73-1.htm
Deck logs are not “Captain’s Logs”
A deck log is not a daily diary written by the ship’s captain. The “captain’s log” was a dramatic device used by the creators of the television series Star Trek to introduce each episode, and does not exist in the U.S. Navy.
People must have been asking. This is strangely hilarious.
This will be my last town hall of this term on the INW board. I am probably going to take a year off before seeking another term. It should be an interesting time, if a bit tear-jerked. Well, actually not tear-jerked, but it will be strangely bittersweet to leave the board of an organization that I’ve worked on first creating and then serving on the board of for 5-6 years now (and 9 years in some ways.)
What: The Ignition Northwest Spring Town Hall
When: Tonight, 4/17 at 7:00
Where: The CHAC Lower Level (just down the ramp) at 12th and Pine
Come on down TONIGHT for the INW Spring Town Hall. You could learn things to your advantage. On the agenda:
Critical Massive Updates
Election Information
INW Outreach
New Festivals
Art Art ART!
New INW Programs
Community Announcements
Oh, and did I mention…T-Shirts?
Come on down. See and be seen. Meet your community. Learn what we’ve done and what we’re going to do and how YOU can be a part of it. I hope to see you all there.
Drupal 5 has a few problems in its security layer, as I’ve mentioned other places, and some of them stem from the sort of ‘it-works-for-me’ philosophy of open source. This is particularly a problem in a complex system like Drupal, which in most installations is made up of a few dozen modules in addition to the core.
The current issue I’m having is that nodes created by the aggregation module get their taxonomy stripped when they’re updated because of how another module uses the security functionality, which is just hilarious in a site that’s largely organized organically by taxonomy. So, after talking with the people I’m working for on the site, I ended up creating a simple PHP script to run through cron that fixes the issues ‘the hard way.’
If you check out this query…
function fix_object($name, $sqlcon) {
    $query = "SELECT term_data.name name, term_data.tid termid,
                     node.nid nodeid, node.title title
              FROM node
              LEFT JOIN term_node ON ( term_node.nid = node.nid )
              LEFT JOIN term_data ON ( term_data.tid = term_node.tid )
              WHERE node.type = 'aggregation_item '
                AND node.title LIKE 'Xxxxx " . $name . "%'";
    // Perform Query
    $result = mysql_query($query);
    // ... and so on...
You can see that this is a fairly normal sql query that looks for all the nodes of type aggregation_item whose titles match a particular pattern. Because of the way the joins are structured, any nodes that have lost their taxonomies will have NULL for name and termid. Those nodeids with NULL termids can then have the proper taxonomy entries stuffed back into them…
function insert_taxo_4_node($node_id, $taxo_id, $con) {
    $query = "INSERT INTO term_node (nid, tid) VALUES (" . $node_id . "," . $taxo_id . ")";
    $result = mysql_query($query);
    // Check result
    // This shows the actual query sent to MySQL, and the error. Useful for debugging.
    if (!$result) {
        $message  = 'Invalid query: ' . mysql_error() . "\n";
        $message .= 'Whole query: ' . $query;
        die($message);
    }
}
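The step that connects the two functions, picking out the rows with a NULL termid and turning them into inserts, is elided above. A rough sketch of that step, written in Ruby to match the other snippets on this blog, might look like this. The sample rows, the helper name `missing_term_pairs`, and the term id 7 are all invented for illustration; the real script is PHP and works on MySQL result rows.

```ruby
# Rows as they might come back from the LEFT JOIN above, one hash per row.
# A nil :termid means the node has lost its taxonomy. Sample data is made up.
rows = [
  { :nodeid => 101, :title => "Xxxxx alpha",           :termid => 7 },
  { :nodeid => 102, :title => "Xxxxx alpha again",     :termid => nil },
  { :nodeid => 103, :title => "Xxxxx alpha once more", :termid => nil }
]

# Return [nid, tid] pairs for every node that is missing its term.
def missing_term_pairs(rows, taxo_id)
  orphans = rows.select { |row| row[:termid].nil? }
  orphans.map { |row| [row[:nodeid], taxo_id] }
end

# Print the statements that the repair pass would run for these rows.
missing_term_pairs(rows, 7).each do |nid, tid|
  puts "INSERT INTO term_node (nid, tid) VALUES (#{nid}, #{tid})"
end
```

Because only rows with a nil termid are selected, running the pass repeatedly from cron should not add duplicate entries for nodes that have already been repaired.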
I’m largely posting this up in case people run into the same problem — this is a hilariously simple fix for a difficult to fix problem in drupal, but it’s a generic information architecture issue of what to do when the system that you’re working on is unreliable. I should probably mention that the issues with security in drupal aren’t related to authentication, but instead are related to item ACLs denying access to things for strange reasons, and are not crucial security bugs in the OMG MUST PATCH NOW sense.
Although, as the trainers noted, it sounds like the set up to a joke, Burning Man takes providing management training to its senior volunteers and staff members very seriously. As part of my Quinfecta of Frenetic Activity down in SFO, I went to the 2008 version of said training.
Because of BM’s nature as a largely community-driven and volunteer-based organization, their management priorities are a little different than other organizations. It’s very simple, really, and I can sum up the difference in three words:
RESPECT EMERGENT BEHAVIOR
This shows up in a number of ways, in respecting the ideas that volunteers come up with, in coping with what emerges from chaos every year, and in generally planning from middle-up or bottom-up instead of top-down when possible. The “respect emergent behavior” phrasing is a spin of mine on it, but it’s something that I’ve had as a part of my core management philosophy for a while, so I was surprised and pleased to hear Burning Man talking about it.
I was down in the bay area this weekend, so I went over to Yuri’s Night Bay Area 08, which was a party held in honor of Yuri Gagarin’s mission to earth orbit, the first time a human had left the surface of the planet in that dramatic a fashion. It’s turned into a somewhat generic celebration of humanity’s scientific achievements and hope for the future, at least locally, but it’s still held on an active NASA airfield at a NASA base (Moffett Field at NASA Ames.) I’m very much in favor of the idea of progress and not giving up on the future, so I’m pretty enthusiastic that there’s a party celebrating this sort of thing.
I’d heard about Yuri’s Night in previous years, and there was one in Seattle this year, but I very much wanted to go to the Bay Area one, so I was happy when my plans fell along with it. I was fortunate enough to be able to volunteer as security staff for the event, and got to hang out with a number of other friends of mine who were also volunteering. It was, all in all, an awesome time; although as someone who’s hilariously allergic to wheat, I need to remember to bring Lara Bars in infinite numbers to events when there isn’t easy-in, easy-out.
The exhibits were awesome, the theme was awesome, my friends were very much in evidence, and it was altogether a great time. I’m very glad that I was able to go.
I’ve been working on a website in RoR for the last while, and it’s about to go live in the private beta sort of way that seems to be so popular these days. It’s handy that way, because that way I can set up the site at slicehost or similar and not have to worry (too much) about my server slowing from getting overloaded. This same site’s next incarnation is going to be facebook related, so that should overwhelm any sense of moderation (if I’m lucky.)
So, the key method of invitation to a private beta is that you mail someone a code allowing them access to the system, for these purposes, let’s just assume that the code is some reasonably long unique string (in my code, it’s actually a uuid.) So, set up a migration something like this to manage them:
used_yet isn’t a boolean for reasons that are too laborious to go into here, but reflect some functionality in the code that I’m not going to display. Assuming that you’re using acts_as_authenticated and are redirecting anyone who tries to access your app to the default welcome page according to the usual methods, set up something like this in your routes.rb:
map.root :controller => "welcome"
This is probably the case in like half the rails apps out there. Have the index method of the welcome controller put up a form with a field like:
# let's see if the formatter can handle rails erb without exploding.
<% form_tag('welcome/checkinvite', :method => :get) do -%>
  <%= text_field_tag 'invite' %>
  <%= submit_tag 'begin' %>
<% end -%>
This lets the user enter their invite in more or less the normal method. Now in your ‘welcome’ controller, you’ll need a ‘checkinvite’ method that looks something like the following:
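As a rough, framework-free sketch of what such a check has to do (this is not the controller code itself), the logic comes down to looking up the code and refusing it if it is unknown or already used. The `InviteBook` class and its method names are hypothetical; `guid` and `used_yet` mirror the columns described above, and in a real Rails app the in-memory hash would be the invites table.

```ruby
require "securerandom"

# In-memory stand-in for an invites table. guid and used_yet mirror the
# columns described in the post; everything else here is hypothetical.
Invite = Struct.new(:guid, :used_yet)

class InviteBook
  def initialize
    @invites = {}
  end

  # Create a new unused invite and return the code to mail out.
  def issue
    invite = Invite.new(SecureRandom.uuid, 0)
    @invites[invite.guid] = invite
    invite.guid
  end

  # True only for a known invite that has not been used yet.
  def valid?(guid)
    invite = @invites[guid.to_s.strip]
    !invite.nil? && invite.used_yet == 0
  end

  # Mark the invite as used once the account has been created.
  def redeem(guid)
    return false unless valid?(guid)
    @invites[guid.to_s.strip].used_yet = 1
    true
  end
end

book = InviteBook.new
code = book.issue
puts book.valid?(code)     # fresh invite
book.redeem(code)
puts book.valid?(code)     # same invite after signup
puts book.valid?("bogus")  # code that was never issued
```

In the controller, a failed check would presumably redirect back to the welcome form, and a successful one would carry the code along to signup.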
After this, you’ll need to have some code in your HTML page that links you to the account/signup functionality of acts_as_authenticated. I’m not going to include that because I’m too lazy to fish it out of my app functionality, but you can do that pretty much with a link_to using :invite => @invite_guid as an extra parameter.
You need to put the same invite detection code in account/signup, and then when you’ve created the account, set invite.used_yet = 1. This is about as simple a method as I can think of for doing the private beta functionality that seems to be so much in vogue these days. Enjoy.
The Telegraph Arts Blog is the latest media source that I look at on a regular basis to mention LOLCat Wasteland. Somehow, that counts more than friends’ blogs and more livejournals than I thought likely.
There have been a lot of people asking angry questions to Apple today because the Apple β that they gave out to iPhone developers was timed to expire today and a lot of devs now have bricked their main mobile phone until an update appears. Lots of people appear angry, but they’re missing the main issue for Apple:
Dear Apple, why are you letting people this stupid into your β programs?
People frequently forget what beta for software means in these days where everything is β until people find a way to make money off of it. It means untested, believed working properly but may blow up at any time, not ready for production. So, I’m halfway between bemused and annoyed at the outrage that some folks seem to be fielding on various fora.
Also, calling a phone ‘bricked’ when you can easily recover it by downloading new software hours later is hitting the epistemological puff pastry with a hammer.
Yesterday morning, because my life hasn’t been busy enough the last while, I woke up, got my shit together, and then went down to the 37th precinct. My district went fairly strongly for Obama (6:1 or 8:1), and I was an alternate delegate for Obama. I figured to go and see what was happening, and it was generally a good time, but long. I generally don’t talk much about my personal beliefs here, but someone asked me to put up this writeup that I originally put in a public forum.
Act I: the beginning (1 hour)
The first hour was spent registering for my precinct, and then sitting in the bleachers waiting for things to happen. I noticed something in particular, which was that a lot of the Clinton delegates who showed up were fairly hostile, which was strange, and generally about 10 years older than I was. This (the age) was a surprise, because at my precinct caucus, the people who showed up were more or less demographically indistinguishable from any other crowd of Seattleites. The Obama delegates had about the same number of people of that age, but it was as if the Clinton campaign only took that demographic. (This is, of course, a generalization and completely anecdotal evidence.)
I had a lot of fun talking to a delegate (for Clinton) that I knew from a company I worked at several years ago. I also made some new friends who were also Obama delegates. At the precinct level, there were a lot of people who I knew through their involvement in the Seattle tech community. The only person I saw from my precinct at the district caucus was a really bad Clinton delegate whose speech at the precinct caucus switched a bunch of people over to Obama.
Act II: the interlude (2 hours)
A lot of the second act was spent writing email on my iPhone to various people, because not a lot was happening. A record number of people wanted to be delegates for Obama at the next level (around 500, one announcement said), which was a number that was past unprecedented and into WTF? areas. I spent a bunch of time talking to the union organizer who was there (because I was in UAW for a while (graduate assistants at the UW), I got a nifty UNION DELEGATE sticker.)
We had a bunch of speakers during this time. A lot of the judges that we have the opportunity to vote for came up and spoke. We also had Ed Murray, our district’s state senator talk for a while, which is always fun. Ed Murray is my favorite Seattle politician because he always brings up his ties to minority communities by stressing that his fiancee (male) of 17 years is Asian. Most Seattle/Washington politicians at least once you get past the Seattle level are mostly Irish, and their attempts to connect to the whole crowd are frequently bizarre logic jumps. (My ancestry is mostly also Scots and Irish, I just hate bad logic.)
Also speaking was Jim McDermott, whose speech was essentially ‘My colleagues in Washington are always amazed by the fact that I keep acting like an ultraliberal freak and you people keep sending me back. But that’s why you do it, Right?’ … the crowd then goes wild, which it generally does when he says things like this. This is a true thing he is saying, they don’t call this the People’s Republic of Washington for nothing, folks.
Eventually, the mass crowds died out and they processed everyone through. I was in the last group of alternates to be called out of the ‘alternate and guest bleachers,’ the methodology for this strangely resembled The Price Is Right.
Act III: Dramatis Personae & Exeunt Omnes (1 hour)
As I was walking over to my credentials check across the high school gym, the announcer called out that Sean Astin was in the crowd and would be addressing the audience. I’d been getting ‘dodgeballs’ that ‘Sam’ was at other caucuses, but there was a lot of idiot SMS traffic that day and I was more or less ignoring it. But, Sean Astin came out to speak on behalf of Hillary Clinton. However, by the time he reached our crowd, all the delegates but me and about 30 other people were already seated in the other room.
I walked over to the auditorium where the delegates were sitting, and it was so packed that I ended up sitting outside and chatting with my friends Eric K (Clinton) and Brian W (Obama). Eric has been a perpetual nuisance on the local burningman-bcwa lists with his strident Clinton cheering, which I only object to when he’s just promulgating media spin and not giving his own opinions. So, I chatted with them for about an hour. Mostly, we just talked about the event and the day, because we all know each other well enough to know that we’re all fixed in opinion.
Act IV: Performance Art and Resolution (1.5 hours or more)
Eventually, the Obama delegates left the Clinton delegates in the auditorium and went back to the gym (because it was bigger), and each delegate who wanted to go on to the state level got 30 seconds to describe their qualifications to go. Lots of people spoke, but I think the total list was only about 100 - 200 rather than the original 500. So, people spoke about why they should go, some were funny, some were insightful, some were strident. After a really long time, we got our delegate voting sheets and I voted for the people I thought should go and then left. I missed some of the speakers, but I had said I’d actually get work done today. I think I selected a good range of people, but I was also pretty damn tired by this point.
My overall thought on the process is that the Obama campaign seems to be so successful here that it’s actually overwhelming the machinery of the Democratic party through the amount of turnout it’s causing. Things that would usually just be a short exercise by the party faithful are getting many times the normal turnout. This doesn’t seem to be a plan on the Obama campaign’s part, it’s just occasioning such high turnouts and enthusiasm that the small number of volunteers that would typically suffice for these things are getting overrun.
I’ve currently got about a half dozen small projects going of various sorts (but am always looking for more and/or an actual full time job), but one group that’s been a real pleasure to work with for the last month or so has been the North Atlantic Bloom Project. I’m making them a website, based on drupal, that includes various features like maps, instrument data that populates into websites for site navigation, and the like.
My current interests (this week) are macintosh development, semantics (RDF), embodied philosophy and category (Lakoff), drupal, iPhone development, good coding practices, and project lifecycle management (from soup to nuts.) My interests next week will be getting a website off the ground that I unexpectedly ended up owning after a contract fell through. That was surprising and not entirely pleasant, but we’ll see what can be done to have matters turn out positively.
I’ve been working for the scientists over at the North Atlantic Bloom 08 as part of the team making their collaboratory. This project has mostly been based on Drupal, which is a content management system (in the web sense, not the ECM sense) based largely on PHP.
One part of this is taking a lot of content over email — the scientists creating a lot of the content on the site are in the middle of the North Atlantic. Because of this, I’ve been using the drupal module mailhandler for a lot of things that would normally be done directly through the site.
Basically, changing the comparison from $data[0]=='taxonomy' to a case-insensitive comparison:
allows users to send in messages and have their mostly unambiguous intention followed. This is, in general, a good thing, and reflects the general net protocol practice of “be conservative in what you send, be liberal in what you accept.” This is a general guide to good behavior and successful implementation and is generally called the Robustness Principle. I advise everyone to check out RFC 1122, because it has many entertaining things to say about the nature of the Internet at the protocol level.
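To make the idea concrete, here is a minimal sketch in Ruby rather than the module’s PHP. It assumes the mail commands are simple “key: value” lines, and the function name `taxonomy_command` is invented for the example; the point is only that the key is matched without regard to case.

```ruby
# Parse a "key: value" command line from a mail body, accepting the key in
# any capitalization. The command format and the name of this helper are
# assumptions made for illustration.
def taxonomy_command(line)
  key, value = line.split(":", 2)
  return nil if value.nil?
  # casecmp returns 0 when the two strings are equal ignoring case.
  return nil unless key.strip.casecmp("taxonomy") == 0
  value.strip
end

puts taxonomy_command("Taxonomy: [phytoplankton]")
puts taxonomy_command("TAXONOMY: [floats]")
```

A strict comparison would quietly drop the first two spellings a tired scientist on a ship is most likely to type, which is exactly the case the robustness principle is about.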
Anyway, so when you can actually figure out what the person meant to be doing, you should probably accept it. This is a generic problem that pops up every now and then with computers. Surprisingly often, it pops up with case sensitivity at the protocol level. Here’s another moment for me with that same issue from about 7 years ago in Apache SOAP:
I proposed changing it to:
for (int i = 0; i < pds.length; i++) {
    if (0 == propertyName.compareToIgnoreCase(pds[i].getName())) {
        return pds[i].getWriteMethod();
    }
}
from:
for (int i = 0; i < pds.length; i++) {
    if (propertyName.equals(pds[i].getName())) {
        return pds[i].getWriteMethod();
    }
}
This does more or less the same thing, but is in Java. This was a case where the apache implementation wasn’t interoperating with the Microsoft implementation because the MSFT implementation had a different valid interpretation of the SOAP specification than the Apache server did. Both implementations had completely reasonable behavior, but when the MS SOAP implementation responded to Apache, it did so with its own capitalization semantics (based on COM — PropertyName) instead of the Apache SOAP capitalization semantics (based on Java — propertyName). So, this is an ongoing problem that keeps showing up again and again.
I appear to be destined to run into it and advocate for the robustness principle in every technology I use. Either that, or it’s just a slow day and I’m rambling.
My first access to a unix machine was around 19 years ago, and I’m still amazed that sudo tcsh is a valid command on most systems.
I’m not saying that it isn’t convenient, mind you, but the fact that I can then execute emacs is also hilarious. Especially because sudo emacs is prohibited.
here is your system log, let me save you the trouble of auditing it by running a shell.
(I’m aware, incidentally, that it’s basically impossible to stop people from running a shell as long as they can run any naive-turing-complete interpreter or compiler. Maybe it’s time to only fight battles you can win.)
note that the latest revision of this blog’s theme seems to have introduced a weird bug with the code layout plugin (wp-syntax) on some browsers. i’m looking into it.
I think the single most useful thing I’ve figured out recently in programming for MacOSX and the iPhone is this little snippet right here.
What this does is intercept incoming messages that are supposed to go to the current window, and redirect them to that instance. I’ve run into cases in Leopard (MacOSX 10.5) where this is an issue. To some extent, this is probably a misconfiguration in Interface Builder somewhere, but it is also an issue when using CoreData, because the ManagedObjectContexts are particular to instances of NSPersistentDocument, and there are issues that arise if you end up using the wrong context.
I saw Corprew today
outside the window of my bus.
He was standing at the corner
stuffing a laptop or something like that
into his compact bag.
He looked intent.
This is my favorite non-ovary related poem written on, by, or about mass transit in Seattle.
It IS a fine morning. I was coming from a consulting gig I currently have going with folks whose offices are in the University District, and I’d just bought a plastic hardshell for my MacBook Pro.
(Incidentally, the two videos on that page use Apple’s Victoria text-to-speech voice for the narration instead of a human voiceover, a practice I’ve not noticed before. It took a few seconds to pick out what was “wrong” - the occasional clipped words and odd pronunciations that indicated an artificial voice.)
This is a reference to the computer game Portal. The main antagonist in that game is a computer named GLaDOS, who has a voice somewhat similar to the Victoria voice, and that’s also the reason that there are some grammatical irregularities in the dialogue. (“Aperture [Noun],” which is said several times, alludes to how the computer in that game says “Aperture Science,” the name of the company.)
This last week was the InfoCamp 2008 kick off meeting. After a successful 2007 event, we’ve decided to do it again and to expand it further. This year we have Aaron, Kristen, Andy, Rachel and myself back again, and we’re also joined by Genevieve, a librarian from PLU, and Joshua, a student at the UW Information School.
Aaron and Kristen graciously cooked food for the lot of us, and we had a great initial planning meeting in which we identified roles and people responsible for roles, and then talked for a bit about the future of the ASIS&T PNW. I’m very much looking forward to doing another InfoCamp with this team; it should be a lot of fun.
We’re looking to have a much easier time this year since we have the experience of doing the conference last year, and are also starting much earlier in the year with our planning. It will continue to be an unconference serving (primarily) the PNW Information Science community.
gloriously, i upgraded to the latest version of wordpress today, and the upgrade ate all my tags. that fails to be awesome in real ways, but i remain generally happy with wordpress.
Follow-up to the Total Eclipse (of the Heart) event I organized at Cal Anderson park. There’s been some media coverage, which has for the most part been quite nice.
More singers climbed the hill already singing the song, like a weird version of that old “I’d Like to Buy the World a Coke” commercial. And it was an honestly touching event: It’s hard to convey the power of dozens of people repeatedly singing “Forever’s gonna start tonight” all at once, but if you were there and you didn’t feel tingles, you must be a little dead inside.
Please join me in Cal Anderson park on February 20th at 7:01PM to sing Total Eclipse of the Heart karaoke style for the 51 minutes that the eclipse is, in fact, total. There is a page for this event, and if you want to hold a similar event in another place, I will post it there.
If you want to bring along music to sing along to with your boombox, feel free. If you want to play an instrument, feel free. If you want to bring your whole band, go ahead. I will have a megaphone so that people can take their turn on the top of the hill if they want. If you bring your own megaphone, don’t be a dick.
Cal Anderson Park: http://www.seattle.gov/parks/park_detail.asp?id=3102
All told, we had 95 people known through the door for the conference, and probably a few more (so, say, an even 100) if you included the people who didn’t bother to register. (It was free for many attendees, and we think a bunch of people, knowing it was free, ghosted the conference.) That’s pretty awesome for a first-time conference in a community which hasn’t really done the unconference thing before.
InfoCamp 2007 was a great success, and I am really happy about the way that it went. There were roughly 50 sessions over 2 days, and roughly 85 participants. It was a great relief, and everyone seemed hopeful that there’d be another one next year and there wasn’t too much negative feedback, and that’s about as high praise as you can expect.
My session was called “Thesaurus, Ontology, and Inference” and was about the benefits you get from having a minimal amount of semantic data associated with documents — mostly exposing metadata that’s already present in the system and some things you can do with it. I think my presentation confused a bunch of people because I had to cut so much information out (my session was compressed from 60 minutes to 10 for various reasons), but we’ll see how it plays in the long run. I think there’s a lot of difference in the assumptions that I make based on previous work experience and what the IAs/IxDs in the audience have from their work experience. At any rate, got some decent feedback for refining the presentation.
I made it to about half the sessions I wanted; keeping folks moving in the right direction was a time-consuming project. Low time between interrupts, but a lot of good fun. It was especially fun talking to an audience and getting people to introduce their sessions, and I also had a lot of valuable hallway conversations with people. One particular thing I found was a good venue for publishing professional work other than the one I have already, and the fact that there’s a difference in focus between the two helps, so what isn’t wanted by one may be by another.
All in all, I really enjoyed being one of five people running this conference, it was a great time with a great group of people… I look forward to this community growing.
For a bit of a change of pace, I rewrote T.S. Eliot’s The Waste Land in LOLCAT: here. I’ve been off busy for a bit doing various things, but should have time for blogging again now.
One of the problems I’m running into a bit with the whole cuneiverse whokno.ws project is distinguishing between discussions about ‘movies’ and discussions about particular movies.
You can currently see this in the whokno.ws test category ‘Shrek,’ which pretty much exists to help test the difference between talking about the concept of movies in general and talking about particular movies.
There’s an algorithm in the whokno.ws core that distinguishes between unrelated concepts with similar written representation, distinguishing between, say China the place and China the porcelain. So, say “China (porcelain) exports from China (country)” still gets a little rocky, but “China (porcelain) exports from, say, Ireland” gets figured out pretty handily. Ideally, though, these concepts are automatically distinguished between and only rarely will a document turn up positive for both of them.
Whokno.ws uses, in its non-trivial web dump form, a SKOS concept map linked to large numbers of data files, term lists, and other impedimenta. A lot of this impedimenta is generated automatically, but some of it isn’t. It isn’t an ontology in the most formal sense of the word, although it is going to approach that in September as a source of incredibly accurate information gets mined somewhat to fill out some of the nascent semantic web bits.
The nascent semantic web bits will enable me to accomplish the original goal of whokno.ws pretty easily.
Of course, this would be more impressive if there wasn’t a bunch of bad data in today’s test run, probably causing it to be much… less relevant than usual.
One of the recent changes that I had to make was to the whokno.ws concept thesaurus, taking it from (roughly) pre-coordinate to (roughly) post-coordinate. The problem here was that the algorithm used to determine group membership wasn’t blending well with the way the principles of division work. I think this is a problem with statistical techniques generally.
But here’s a simple way to explain it to you: Say I have two sets, and whether you’re in that set is defined as whether an article contains a particular concept. These sets aren’t disjoint, which is to say that there’s an overlap between the two sets. For Whokno.ws, because the modelling of set membership is statistical, doing a computation *combining* the parameters appeared to be the way to go — the ranker would give high scores to things that were about *both* topics. However, it also gives a high score to things that are only about one of the two topics, but *really* about that topic.
What this means is that, say, articles about “bird flu in vietnam” are best found by looking for articles about “bird flu,” and then looking in that set of articles for articles about “vietnam.” This is very interesting to me, because it means that the proper way to do this is actually to use the “POM” or “BOTD” engines that I’ve already written. Strange.
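To make the difference concrete, here is a small sketch comparing the two approaches. It is not the whokno.ws ranker: the per-concept scores, the 0.5 cutoff, and the sample articles are all invented, and the "combined" ranking is simplified to a sum of the two scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PostCoordinateSearch {
    // Hypothetical article: a title plus per-concept relevance scores in [0, 1].
    static class ScoredArticle {
        final String title;
        final double birdFlu;
        final double vietnam;

        ScoredArticle(String title, double birdFlu, double vietnam) {
            this.title = title;
            this.birdFlu = birdFlu;
            this.vietnam = vietnam;
        }
    }

    static final double THRESHOLD = 0.5; // assumed cutoff for "is about this concept"

    // Combined ranking: add the two scores and apply one cutoff to the sum.
    // An article that is *really* about one topic can clear the bar alone.
    static List<String> combined(List<ScoredArticle> articles) {
        List<String> hits = new ArrayList<>();
        for (ScoredArticle a : articles) {
            if (a.birdFlu + a.vietnam >= 2 * THRESHOLD) {
                hits.add(a.title);
            }
        }
        return hits;
    }

    // Post-coordinate: find the bird flu set first, then look for vietnam inside it.
    static List<String> postCoordinate(List<ScoredArticle> articles) {
        List<ScoredArticle> aboutBirdFlu = new ArrayList<>();
        for (ScoredArticle a : articles) {
            if (a.birdFlu >= THRESHOLD) {
                aboutBirdFlu.add(a);
            }
        }
        List<String> hits = new ArrayList<>();
        for (ScoredArticle a : aboutBirdFlu) {
            if (a.vietnam >= THRESHOLD) {
                hits.add(a.title);
            }
        }
        return hits;
    }

    static List<ScoredArticle> sampleArticles() {
        return Arrays.asList(
                new ScoredArticle("Bird flu outbreak in Vietnam", 0.7, 0.6),
                new ScoredArticle("Bird flu vaccine trials", 0.99, 0.05),
                new ScoredArticle("Vietnam tourism", 0.02, 0.9));
    }

    public static void main(String[] args) {
        System.out.println(combined(sampleArticles()));
        System.out.println(postCoordinate(sampleArticles()));
    }
}
```

With this made-up data the combined ranking lets the vaccine article through on the strength of one topic, while the two-step filter keeps only the article that is about both.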
Today’s whokno.ws task: improving the way that stopwords are calculated. It’s made a decent difference in the article scores, very much for the better.
Tomorrow’s whokno.ws task: adding a (blocking) queue system to the interface between the feed retriever and the parser.
Monday’s whokno.ws task: changing the way that the classifier matches concepts to phrases. (also known as word -> lexical element transition step 1/3.)
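As a rough idea of what the blocking queue task looks like, here is a sketch using java.util.concurrent. The FeedPipeline class, the end-of-feeds sentinel, and the trim-only "parser" are stand-ins for illustration, not the actual whokno.ws retriever or parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class FeedPipeline {
    // Sentinel telling the parser that no more feeds are coming (an assumption of this sketch).
    private static final String END_OF_FEEDS = "\u0000END-OF-FEEDS";

    // Pushes raw feeds through a bounded queue from a retriever thread to a parser thread.
    static List<String> run(List<String> rawFeeds, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        List<String> parsed = new ArrayList<>();

        Thread retriever = new Thread(() -> {
            try {
                for (String feed : rawFeeds) {
                    queue.put(feed); // blocks while the queue is full
                }
                queue.put(END_OF_FEEDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread parser = new Thread(() -> {
            try {
                while (true) {
                    String feed = queue.take(); // blocks while the queue is empty
                    if (END_OF_FEEDS.equals(feed)) {
                        break;
                    }
                    parsed.add(feed.trim()); // stand-in for real parsing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        retriever.start();
        parser.start();
        try {
            retriever.join();
            parser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" feed one ", "feed two", " feed three"), 1));
    }
}
```

The bounded capacity is the point: a fast retriever has to wait for the parser instead of piling up unparsed feeds in memory.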
Saturday, I went to the morning session of BarCampBankSeattle — I wasn’t able to make it to the other sessions due to work and a wedding — and I attended a couple of sessions about various trends in banking and commerce. I didn’t get to have my session on vertical applications for banking, which is a bit of a shame, but there you go.
I went to three different sessions:
Loyalty Economics
The key question in measuring satisfaction for banking applications (and for many things, I suspect) isn’t “how satisfied are you?” but “how likely would you be to recommend this [whatever] to others?” This, followed up with “why?” is apparently a more useful metric for predicting growth than most others, and is called the Net Promoter Score.
The NPS is measured by asking people to answer that question on a scale of 0-10, and then subtracting the percentage of people who answered 0-6 (your “detractors”) from the percentage of people who answered 9-10 (your “promoters”). Apparently, the highest rated companies on this score are Harley Davidson, Costco, and USAA.
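The arithmetic is simple enough to sketch. This assumes answers on a 0-10 scale, with 9-10 counted as promoters and 0-6 as detractors; the NetPromoter class name and the sample answers are made up.

```java
class NetPromoter {
    // Returns the Net Promoter Score: percent promoters (9-10) minus percent detractors (0-6).
    static double score(int[] responses) {
        if (responses.length == 0) {
            throw new IllegalArgumentException("no responses");
        }
        int promoters = 0;
        int detractors = 0;
        for (int r : responses) {
            if (r < 0 || r > 10) {
                throw new IllegalArgumentException("answer out of range: " + r);
            }
            if (r >= 9) {
                promoters++;
            } else if (r <= 6) {
                detractors++;
            }
            // Answers of 7 or 8 count toward the total but toward neither group.
        }
        return 100.0 * (promoters - detractors) / responses.length;
    }

    public static void main(String[] args) {
        // 5 promoters, 3 detractors, 10 answers: 50% - 30% = 20.
        int[] sample = {10, 9, 9, 8, 7, 6, 3, 10, 5, 9};
        System.out.println(score(sample)); // prints 20.0
    }
}
```

The score therefore ranges from -100 (everyone a detractor) to 100 (everyone a promoter).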
Money as Primary Social Network
This speaker proposed moving beyond single alternative currencies (such as Ithaca Hours, and various forms of local currencies and hour exchanges) to a system where any group that was large enough in a community could issue their own currency.
This is a strange solution to a well-known problem: sometimes using money to mediate exchange of goods and services fails. This can happen when you have a group of people all willing to exchange with each other, but they don’t actually have enough money around to mediate the exchange. In the system that currently exists, we’d have to work through informal networks to do that exchange, but in this guy’s proposed system, we’d be able to *issue* each other currency (essentially formalized network IOUs) to mediate the exchange.
This is all true enough, but I’m not sure that the solution he proposed (100s of currencies in a city the size of Seattle all done virtually over the network through a terminal at each vendor) fits the scope of the problem well.
Facebook and Financial Applications
This was a session about whether Facebook is a useful tool for financial applications to use. It mostly came out split between arrant tech utopiana and people talking about human factors in a technical frame, which I think was useful but not relevant.
In short, the conclusion was that “like AOL was, Facebook is a tiny closed environment, and it’s hard to see what the impulse is for institutions and apps other than ‘getting stuff up there.’” One participant added “I expect open clients such as the iPhone to be a better investment of my development dollars than working heavily inside of Facebook.”
So, that’s my participation wrap-up. Also many congratulations to Betsy and Jason whose wedding celebration I went to Saturday night.
For the last while, I’ve been working on a project that involves scanning large numbers of RSS/Atom feeds, and then using Bayesian [1] classifiers to sort each item into one of a number of categories for summarization and display. (The system that I’m using to do this is available as a sample website, but really needs more data in the training sets before it’s ready to entertain all of you.) The categories are pretty straightforward, and they fit into a somewhat neat controlled vocabulary (ontology/thesaurus/whatever).
There’s a relation, though, between the different terms in this sort of classification and the training data used to build the Bayesian classifier. If the terms are arranged in a hierarchy (and certain assumptions are made about that hierarchy, like subterms encompassing part of the range of meaning of their parent term and nothing else) [2], then the training data used for classifying terms can be shared.
For example, all positive training data that belongs to the child terms can also be used for the parent. So, for (a constructed) example, positive training data for tamiflu also belongs in the positive data for bird flu vaccines. The reverse is true of negative training data. For negative data, the negative data for the parent can also be used for the child terms.
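The sharing rule can be sketched as a small tree structure. This is only an illustration of the two directions of inheritance, not the code behind the project: the ConceptNode class and the document IDs are invented, and it assumes the strict hierarchy described above.

```java
import java.util.ArrayList;
import java.util.List;

class ConceptNode {
    final String name;
    final ConceptNode parent;
    final List<ConceptNode> children = new ArrayList<>();
    final List<String> ownPositive = new ArrayList<>(); // examples labeled for this term
    final List<String> ownNegative = new ArrayList<>(); // counter-examples labeled for this term

    ConceptNode(String name, ConceptNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // Positive data flows up: a term can use its own examples plus all of its descendants'.
    List<String> positiveTrainingData() {
        List<String> all = new ArrayList<>(ownPositive);
        for (ConceptNode child : children) {
            all.addAll(child.positiveTrainingData());
        }
        return all;
    }

    // Negative data flows down: a term can use its own counter-examples plus all of its ancestors'.
    List<String> negativeTrainingData() {
        List<String> all = new ArrayList<>(ownNegative);
        for (ConceptNode n = parent; n != null; n = n.parent) {
            all.addAll(n.ownNegative);
        }
        return all;
    }

    public static void main(String[] args) {
        ConceptNode vaccines = new ConceptNode("bird flu vaccines", null);
        ConceptNode tamiflu = new ConceptNode("tamiflu", vaccines);
        vaccines.ownPositive.add("doc-vaccine-overview");
        tamiflu.ownPositive.add("doc-tamiflu-stockpile");
        vaccines.ownNegative.add("doc-football-scores");

        System.out.println(vaccines.positiveTrainingData());
        System.out.println(tamiflu.negativeTrainingData());
    }
}
```

The practical payoff is that labeling one document for a narrow term also grows the training set of every broader term above it.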
This is highly useful information when you’re making a large scale text classifier (and having it classify texts as belonging to categories or not, as opposed to just clustering texts into the categories that actually appear). It’s easier to use things like Bayesian classifiers to do this if you’re looking for somewhat fine-grained detail.
Currently, I’ve been using Classifier4J for doing the classification and text summarization [3]. The text summarization is sort of annoying, though, because it’s based on a simple statistical choice of sentences which occasionally picks up date-lines and partial phrases because of what’s ‘important.’ I’m resisting the urge to go completely POS-tagging nuts on the whole thing and only select sentences of certain types or completeness because this is, after all, a side project. (The number of times I see things like ‘this sentence no verb.’ is astounding, though, and slowly driving me nuts.)
So, another day in the life.
[1] although i’m also using a vector space classifier for a related, larger project and it’s driving me less nuts training it.
[2] this is called a meronymous (‘part-of’) relationship, and given that half the people who regularly read this blog were in LIS530 or its equivalent at some point, you should remember this.
[3] and will probably eventually switch to jBNC http://jbnc.sourceforge.net/ before i go nuts
A while ago, I went to hear Chuck Palahniuk speak at Seattle Town Hall, and while he was speaking he said a lot of really entertaining things, but he also said some really interesting things about communication in general. He said that the basis of humor is surprising people, telling them something that they’re not expecting to hear. In his context (like, say, in Fight Club), there’s usually a stark difference between how people act and how we expect them to in everyday life. In addition to all the other sorts of reactions his work provokes, there’s usually some shockingly funny stuff there.
So, that’s humor. But what about other things? Generally, most of us go through life in a state of continuous partial attention, and people find themselves not ever really focusing on one particular thing, too distracted by one thing or another to ever actually get anything done. One reaction to this is the whole Getting Things Done system for actually doing things. It’s very oriented towards keeping tasks somewhat minutely focused, and scoping tasks in and out as needed. I’m not a big fan of GTD, because it’s designed to solve a different personal organization problem than I usually have.
Claude Shannon described a problem of identifying information in his paper A Mathematical Theory of Communication. In it, he describes information as that which you can’t predict from the previously received parts of the message. So, basically, successful time-management people are able to discriminate between the new information (where they focus) and the rest (where they pay continuous partial attention). You see people attempting to do this (and usually failing) when they’re using their laptops in meetings. Getting Things Done is a management strategy for people who are bad at this (and most are).
When giving information (presenting, teaching, etc…) I try to give people information in a way that allows for this behavior, pushing the main themes much more than the subsidiary and supporting information. This allows the continuous partial attention people to blink in and out. However, I also use a lot of humor to get my point across, and to keep people paying attention. Until I was reflecting on Palahniuk’s talk this morning, I didn’t see how this is essentially the same strategy: by confounding what people are expecting to hear, you force them to pay closer attention.
This is my annual GPG housekeeping. Some people use the information contained in this message to send secure mail to me. If this doesn’t make any sense to you, feel free to ignore it. If it does make sense to you and you can’t verify this, contact me through whatever way seems appropriate. This block contains a revocation of my 5/2006 to 5/2007 key, and then a new 6/2007 to 6/2008 key.
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.7 (Darwin)
Comment: A revocation certificate should follow
iGsEIBECACsFAkZolkIkHQFrZXkgc3VwZXJzZWRlZCBieSA1QjQ5MkY3QzA2RDQx
MTlDAAoJEJ/DSim6jwbQqFgAn261hqJ1imw2OAeSlX862XpeXbAOAJ49zySpdXH4
nTIDDC2NZ6k8Y/xmVw==
=hmH8
-----END PGP PUBLIC KEY BLOCK-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
My new GPG key has signature 5E74 7A06 C111 400A 0DE4 39C9 5B49 2F7C 06D4 119C
Its ID is 5B492F7C06D4119C
Here's an ASCII armored version:
- -----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.7 (Darwin)
mQGiBEZojWoRBACeKSrUJ30x6ZPvbdnGHRupMPhyNPsjJQAEYDkqRwFx714nee0k
/bhCRmMioRP5E1CDxCukOSJq9GngRsiDH0SCDu1eeRnII7ZM68cX7/guxGylrMtP
7JW4LwZnIJjLO5EIpajJE4fpd46FOx3ggJyXgnbGvnCjM+TSUwQTXLqp0wCg/fuM
meHVru5C1VUat/72DIM2PVkD/3Uyd0r647fHnHJkTYVj1uFNlY0YvSxbeiw9P3bE
z29XTpCd2e07Np5ftbaWoEIwSkH9iP4iXmWR0eR/PwKRdnxqRlb5xQm/z787HSmK
5eAaYf8sacKYAXzxwbkWat0BOcyfi5Bh5TXlJsDy+9RIhLMahKdpkeGOdZ9+31NL
01yzA/0Szbb2wEtJ8c0higvwpULLXZK77u/MSESVQyX8PeFg9DluXhEGgAo7EFhe
KnZg2j94Rdq0Gn6aqvm/aXv9+hMhnvmCnJbaqgWKAlFenNaH+N18zCLQxhs9rzVG
xJrRcW8n58qb9vyQ8JCxLJA5JvqhAZGJrMW/qEgQICkSUbyPp7QeRWR3aW4gUmVl
ZCA8Y29ycHJld0BnbWFpbC5jb20+iGYEExECACYFAkZojWoCGyMFCQHhpIYGCwkI
BwMCBBUCCAMEFgIDAQIeAQIXgAAKCRBbSS98BtQRnLZIAKDhFdKSb2lf5H6DVOiz
O4WD1HhMVwCcDq6+fBG6Yi+Dp79KjYw9sWmp7L60IkNvcnByZXcgUmVlZCA8Y29y
cHJld0Bjb3JwcmV3Lm9yZz6IaQQTEQIAKQIbIwUJAeGkhgYLCQgHAwIEFQIIAwQW
AgMBAh4BAheABQJGaJTBAhkBAAoJEFtJL3wG1BGc5ekAoOCQw4D8mpfyADrwxIYS
WCbHuBF+AJ9ldaZ8E50BGWofTWJ9ycHJldyBSZWVkIChvbmx5
IHVzZSBmb3IgaVNjaG9vbCBidXNpbmVzcykgPGNvcnByZXdAdS53YXNoaW5ndG9u
LmVkdT6IZgQTEQIAJgUCRmiULwIbIwUJAeGkhgYLCQgHAwIEFQIIAwQWAgMBAh4B
AheAAAoJEFtJL3wG1BGc1WcAn2q2DNrUFaEiVFSBFbmSZas1Np39AKD3rzjqF5Nk
8jTOqJBACXntDfBTx7Q5Q29ycHJldyBSZWVkIChvYnNvbGV0ZSBlbWFpbCkgPGNv
cnByZXdAc3RhdGljZmFjdG9yeS5vcmc+iGYEExECACYFAkZolLYCGyMFCQHhpIYG
CwkIBwMCBBUCCAMEFgIDAQIeAQIXgAAKCRBbSS98BtQRnDcuAKC/IycdR/mMAG65
aRZZEaexi3XaNgCfdaSAXprfV+zbyNqVyRsydkLFVH+5BA0ERmiNahAQAJf0DiOQ
riIq+/Q+tAjw1R86NFb0IwpGbaW3a9GE/YlABdnPGDbdrT7XHJtvTo/IW0LzLC3e
Mjc3qf2SB+OO1DTgFLpYQmtHA7Hd9w6qqTRUDK2L52zHeZ+/DNbbWnplvxGH0Hm3
2gh0Qk30KSat0WtsRipVPDjfcDKGATcNVEE12qxLmdvCSgJZHQgRgF9c10DcdN5e
DrTA1f8ABrOXZk0DEXoFdNEL4nFWcfPS5yvckwTpH7D7z9ok0wL1dfLO2o1xZ7+T
LFjOJ+KqF2gWj65dgNu53rC+VmPNXX5XU29OqhXXyE4Flx85uSDrrk3YRQxgoTHE
x3IJvlyFNzHdKQmmNc5ir3cqyqeiqdjre3brfdttxF7zFjNPT++dV9VSsK+oP6zB
tKzxI9PVAZzV/ho7DLmftSS7m7bi73Ec+geGKgpV4rZgBDtUJfGQp6uL1Bgh5xbb
Osfr/Qyo/NLDcxNRlSdK1sFCJ8/NPwc35r8KSR5+qxhWK42HYh5V42Vdm63+I+8g
bFenI+b6U8xkabs02QglkCCFGoZuG7ZtpDeZgeTrApvULUp7jaOyaJnigSQ+ngvj
3BRRmzI12k6XlfYAJ2ceKF6jg7ZBisfJRUb7789I8N02NMbs08j8Rl8vT94YQPT3
Uo3xwyM4aUHj3N0O4S9gAGnDCFYzJafmbrH3AAMHD/9bT76EfaBT2MkwqjC4beMX
hrazkQt8Se24NyuytNGa77qZLG0cRUhKp17iJG6UO75qaV/V8oUUrfwb4So2+POg
wIYj2ObQ/Spi5lNjhJswDjFkcR5edoImnyWvMiemzuSD1cHMx+q5urnSRp/on+qB
uR++pdr34YRP/kkINVo0VtmPtcwqWk2u9KHgdhbjFbJyP+HAOKuM/2m+eLPkFdkS
gtJ2DeLfg7IEaA9vWzUHmOSD/5j2l5ICjE7Iy/ikesRIG2EmkeBxfWeEgCpVCqYz
tGPVKq/z/joLAI2Tc/nsRrgltKW1kE/sYxtNrFsnyXUrFOVUpvy4uYuOWXzKROS3
A3iXdBcMUIdUjGYdRv1QoRl4ryMrVDodGpgmYffHHutTIe+SDiCC+uJkoOjRpPAt
aVdlUnRsPKDr8FoCaTs25mlhBxW/X0+0R2JMyiVMWripkdCCdGR59kB1gm4J3/7H
+vfryFAEswL0jTQgaWr1fnrbK2mVoLtfk6RgSZ3Uy+Ob801tCq9jZQymmHOkoRpB
eeSkmytLK1ZoHPL4zjblH64Kzd/Yg4Hq2F6FWpZHvvNPvWUTBstvIFprt6MZkBsW
v6WYZHSroLdvvboPMWFtndeoPYgQIkWjTDzb2Ma62I3k3KFBfsOk49HZucAI4i8z
yWuUWsI2f4Ltowl+e8+Y0YhPBBgRAgAPBQJGaI1qAhsMBQkB4aSGAAoJEFtJL3wG
1BGcMmMAoI9/yAed7gHIYNYI9nC1m39PfxbmAKDFcVcfl17woPcsZRyKw1AN1fDJ
QA==
=C5MF
- -----END PGP PUBLIC KEY BLOCK-----
But I'd really suggest you get it from the keyserver and save yourself the import/export hassles.
Thanks,
Corprew Reed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
iD8DBQFGaJlzW0kvfAbUEZwRArZhAKD8y38eCUqPXltB7OH5vCeWtu8/SwCeLQpB
CQAY+6d4oF4zKbSwgdNnePo=
=Qz9a
-----END PGP SIGNATURE-----
Because it’s come up a lot in recent days, a list of ‘popular’ (really ‘prevalent’) cognitive biases from Wikipedia. A lot of what I’ve been doing over the last several days has been wondering why people misestimate things to the extent that they do, and how to help with that. So, it’s been relevant. A good contrast to cognitive bias is ‘field sense,’ which is some people’s ability to sense their surroundings at an apparently supernatural level. While I don’t aspire to the level that Gretzky has this, it’s a useful perception style to try to develop.
This blog is (again) getting up to steam after a couple of weeks of me working on various projects (and writing more elsewhere), and then vacationing and moving. I’ve also collapsed one of my other blog projects into this one.
Since I last wrote here, I’ve been doing a bit of program management for a company based out of Austin, TX (which is not related to the cognitive biases) and it’s been fun and entertaining although it’s a very short duration project.
More on Cruel 2 B Kind (previous post)– a blog post from the Seattle PI by Monica Guzman. The PI has someone covering the tech news who also blogs about what she’s up to when it overlaps with her professional responsibility. Here’s my overlap with her experience.
Then came the finest kill of the day.
Corprew Reed was playing on a team of one. As we came around a building — bam! He got us with a “Happy Bee Day” before we knew what was coming and became the fearless leader we all desperately needed.
I recognized Reed. He had spoken at Ignite Night not too long ago. Actually, I recognized a lot of the players, I realized.
[...]
“OK. We have two options,” Reed announced to the mass. “Either we sit and wait or we take this show down Broadway.”
Moments later we were on Capitol’s Hill main drag wishing everyone we saw a happy Handshake Day. Well, why not?
Since this is my blog, I figured to only include the parts that feature me. You should read the actual article for more information on the fun.
This is a social night for the local chapter of the American Society for Information Science and Technology. You should come by and check it out if that’s the sort of thing you find interesting. I’m chapter secretary for this year, and we’re doing all sorts of neat stuff that we’ll tell you about at the meeting or post as relevant on our site.
Join us for some good company and geeky conversation next Thursday (5/10) at the Elysian Pub in Capitol Hill!
What: Seattle Monthly Meet-up, organized by the Pacific Northwest chapter of the American Society for Information Science & Technology.
http://asistpnw.org
Where: Elysian Pub, 1221 E. Pike St., Seattle, WA
When: 7-10pm, 2nd Thursday of every month
Once a month, we’ll get together to have drinks, chat, network, and geek out with fellow information architects, librarians, usability experts, user experience designers, and other like-minded people. It’s open to anyone, so bring a friend — especially those in other local organizations! The format will be casual, but all are encouraged to bring something to discuss — recent work, an interesting topic, or even your resume.
So, I was talking to someone today about their application (which was Ruby on Rails-based), and we had a long conversation about locking. There’s a couple of different sorts of locks that show up in software development, but there’s one in particular that mostly only shows up in enterprise software development, the Long-lived Lock.
Locks are used to keep other processes from modifying resources in the system. These can show up at a variety of levels ranging from Critical Sections (Java / Win) that synchronize access to particular pieces of code, to database locks, which keep people from reading from or writing to rows or tables while operations are done.
However, all of these operations are for short periods of time. You can’t keep a read or write lock on a row in a database for an extended period of time (or in cases where you can, you almost certainly shouldn’t). About the longest time a row in a database should be locked is the time it takes to perform a single transaction (which may be spread between multiple databases, rows, or what have you, but the time is just the changes for the transaction, not all the time that people spend staring at a screen and entering data before hitting the return key).
But how do you let a user lock information for an extended period of time? For example, say the user is locking a row in the database that represents a document that they’re updating (a frequent setup in most ECM/DM systems.) Well, since that’s part of the ECM system, that should happen inside the logic of that application. It shouldn’t be achieved through database locking, but should instead be stored as information within the database.
It’s possible to set this up a number of different ways, but let’s assume you have a document table named document and it has, by convention, an id column that represents the primary key on the table. I’m also going to make the assumption that writing to a document is done by a particular user. Your application’s security system may vary.
So, let’s look at a table set up for locking on the document table:
And you just join this table in when you need to know if there are locks on a particular object, and you otherwise create and delete locks as needed. One particular thing about this sort of locking strategy is that you end up with expired locks accumulating on documents, so you want to clean those up, and also when you join in the lock table you want to have non-expired locks only.
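To show the bookkeeping without tying it to a particular schema, here is an in-memory sketch of the same idea. The map stands in for the lock table, and everything in it is an assumption for illustration: the DocumentLocks class, the documentId/userId/expiresAt fields, and the rule of at most one lock per document.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class DocumentLocks {
    // One row of the hypothetical lock table.
    static final class Lock {
        final long documentId;
        final String userId;
        final Instant expiresAt;

        Lock(long documentId, String userId, Instant expiresAt) {
            this.documentId = documentId;
            this.userId = userId;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<Long, Lock> locksByDocument = new HashMap<>();

    // Creates (or renews) a lock unless another user holds a non-expired one.
    synchronized boolean acquire(long documentId, String userId, Duration ttl, Instant now) {
        Lock existing = locksByDocument.get(documentId);
        boolean heldByOther = existing != null
                && existing.expiresAt.isAfter(now)
                && !existing.userId.equals(userId);
        if (heldByOther) {
            return false;
        }
        locksByDocument.put(documentId, new Lock(documentId, userId, now.plus(ttl)));
        return true;
    }

    // The "join": only non-expired locks count.
    synchronized boolean isLocked(long documentId, Instant now) {
        Lock lock = locksByDocument.get(documentId);
        return lock != null && lock.expiresAt.isAfter(now);
    }

    // Deletes the lock if the given user holds it.
    synchronized void release(long documentId, String userId) {
        Lock lock = locksByDocument.get(documentId);
        if (lock != null && lock.userId.equals(userId)) {
            locksByDocument.remove(documentId);
        }
    }

    // Cleans up expired locks and reports how many were removed.
    synchronized int purgeExpired(Instant now) {
        int before = locksByDocument.size();
        locksByDocument.values().removeIf(lock -> !lock.expiresAt.isAfter(now));
        return before - locksByDocument.size();
    }

    public static void main(String[] args) {
        DocumentLocks locks = new DocumentLocks();
        Instant morning = Instant.parse("2007-05-01T09:00:00Z");
        System.out.println(locks.acquire(42L, "alice", Duration.ofHours(8), morning));
        System.out.println(locks.acquire(42L, "bob", Duration.ofHours(8), morning.plusSeconds(60)));
        System.out.println(locks.isLocked(42L, morning.plus(Duration.ofHours(9))));
    }
}
```

Passing the current time in explicitly keeps the expiry rule easy to test; in a real application the same checks would be queries and deletes against the lock table.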
Your app needs behavior about various things to surround this, like what’s the security model surrounding locks (who can know about them, are they on a user/group/role basis, etc…), and when can a lock be broken. Sooner or later, you’ll need to break locks, like for an employee on vacation who’s got documents locked or similar. But that’s all above the database structure and the immediate operations on the lock table, which I’m discussing here.
Well, that’s part one of three. The next segment will be the Ruby-on-Rails implementation I sketched out for my interlocutor, and the last will be some variations on and exceptions to this idea. I consider long-lived locks a design pattern, because it’s a recurring pattern in enterprise computing.
The first meeting (that I attended) concerning this fall’s ASIST PNW chapter meeting was held yesterday. Aaron Louie (chapter president) and I met with the UW iSchool student chapter president and vice-president. A year ago, I was just turning over the control of the student chapter to the next year’s president, and these people are one down the line from that (the officers are typically 2nd year students in one of the Masters programs at the iSchool.) Now, the students seem pretty young to me, but in some ways they did at the time as well.
So, I’m not going to go into details yet — that would be jumping the gun, but I think that we’ve got something very exciting lined up. We’d talked previously about how we can best revitalize the chapter, which had been faltering somewhat in recent years. I think we’re doing a pretty good job so far — the ‘Information People Get-Together’ that we’ve had the last two months at the Elysian has been going well, and the way we’re planning to run this conference (unconference style, with ask later sessions similar to Ignite Seattle’s) will be fun, exciting, and informative for folks. We’re on target for our goals, consistent in message, and serving our identified audience.
About that last sentence. (One of) The real benefit(s) from being at the iSchool for me was getting more in line with the User Centered view of the universe. Before the school, I had been largely feature/product/use case oriented (largely as a result of many years of dev background with light project management), and I think the iSchool helped better my sense of the overall context — both social and technical — in which systems exist and are created.
The last several months have been integrative of all the different things I’ve learned in different periods of my life. Someone remarked to me the other day (@ the ASIST Info Social Hour) that I sound like a consultant, but it sounds to me like I’ve integrated all the different things I know.
Last Saturday, I ‘won’ the first Seattle instance of ‘Cruel 2 B Kind,’ the Benevolent Assassination Game. I’m very interested in ARGs in general, so I was really looking forward to playing this limited time/scope version for a while.
The essence of the game is that players go around complimenting people, and these compliments serve as weaponry against other players. When defeated, players join the team that defeated them. This rapidly causes teams to grow and become larger, and the number of teams to become smaller. To onlookers, it’s an interesting and hopefully pleasant experience, and hopefully a good counterpart to the usual Seattle Chill.
My initial strategy was to stay at the borders of the site while the initial killing spree of compliments took off. I saw S-’s team in disguise a few dozen yards away from me, and sent fake dodgeballs out to attempt to get them to leave their fixed position to attack me (allowing me to sneak up behind them.) After I got bored of that, I went over to the main part of Cal Anderson park.
After a couple minutes in the park spent talking to various people about the game and not finding any players (but hearing large teams off in the distance), I started walking in to the main part of the field. I walked around the game for a while, using ‘internalized ki’ to reduce my normal presence level.
Eventually, it ended up with a team of about 50 walking towards me (one of three left in the game), and so I hid as comically as possible figuring that I would just be captured easily and join them. However, because the team was large and non-focused, they didn’t manage to attack in even a vaguely coordinated fashion, so I defeated them all with a challenge.
Since I was now team leader, we came up with a strategy to keep that from happening to us, which was good because one of the other remaining teams met us again and again (and we tied each time, which was funny.) I enlisted people to watch the front and back of the group, and we went up on Broadway greeting people and eventually ended up on teletubby hill.
We faced off against the other group with us on teletubby hill in the park and them about 20 meters away. We regaled them with ‘You Are My Sunshine’ and ‘Red Rover,’ and then walked towards them to make the final attack. It was pretty funny, and the picture in this post is from right before the end.
So, I guess I made the single largest capture in the game, and my team was the biggest at the end. This was very amusing. I ‘won’ primarily because I had a huge amount of fun playing the game. I got to lead a group of 60+ people down Broadway wishing people ‘Happy Handshake Day,’ and did a lot of fun running around. It also required me to engage leadership skills to keep the team organized as it walked, in a way I usually don’t in Seattle.
As far as ARGs go, this was a great activity to demonstrate the power of games. I went home and read a book on the subject of play. It was all in all an interesting time.
I’m very interested in organizing these sorts of events, and also using them for training and the like. But, it’s great to have the chance to just participate and play in one of them, and I’m very glad that Brady and Jen organized it.
Somewhat randomly, I got contacted by someone I knew from CTY today. CTY is a program for ‘gifted children’ that I went to back when I was a ‘gifted child.’ It was somewhat strange to be contacted by someone I hadn’t talked to in 20 years, but they were doing well.
Morgan Stanley threw down some economic forecasts that make for interesting reading, and probably should be read carefully by anyone in the US who uses money or buys and sells products and/or services.
We don’t pretend to have all the answers, but here are our guesses: Job gains have already slowed, and payrolls will continue to decelerate, but not fast enough to undermine consumer wherewithal. The housing recession is far from over, but strong global growth likely will sustain both output and employment. The productivity slowdown is cyclical, but the trend may also have slipped. We still think core inflation has peaked, but inflation risks are rising again. And margin compression implies that profits likely will stall in 2007.
If you live in the rest of the world and not the US, you should probably check out this report instead, which deals with the growing decoupling of the world economies from the US’, as the US begins to lose its unitary global economic superpower status.
On the surface, the latest global trends seem quite consistent with a decoupling scenario. America has slowed but the rest of the world has picked up. In particular, there seems to have been a meaningful shift in the mix of growth in the industrial world. The US economy has downshifted from 3.4% growth over the 2003-05 period to only about 2% over the past year while trend growth in Europe and Japan has accelerated from around 1.5% to 2.5%.
Looks like it is going to be an interesting time, at any rate. I’m adding this to this blog because I want to give a closer idea of the sorts of things I read from day to day. Currently, the economic decoupling of the rest of the world from the US Economy is a fascinating topic for reading. After Bretton Woods, most of the world used the US Dollar as the chief medium of financial exchange. In the last several years, the USD has started to be replaced by the Euro for many things, and the US is no longer the unitary financial superpower. What this means overall remains to be seen, and many apparently discount this ‘decoupling’ idea, but with the rest of the world’s financial indicators going upwards while the US’ goes down, there may be sufficient pudding for this proof in the next several years.
I’ve been using this set of instructions to install Ruby on Rails on MacOSX for a while (in case you’ve ever wondered, which you haven’t, I use a MacBook Pro set up to run Windows XP and MacOSX 10.4.x.) It doesn’t work well for me, because I use ‘tcsh’ and not ‘bash’ as my shell on the computer. I also like confining changes to my own account.
So, I use the instructions given in the cited article, with the following difference.
Paths
Here, add the following line to the end of your .cshrc
For the rest, I replace all instances of ‘sudo command’ with ‘sudo tcsh’ followed by the command. More concretely, instead of:
curl -O ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
tar xzvf ruby-1.8.6.tar.gz
cd ruby-1.8.6
./configure --prefix=/usr/local --enable-pthread --with-readline-dir=/usr/local
make
sudo make install
sudo make install-doc
cd ..
I do:
curl -O ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
tar xzvf ruby-1.8.6.tar.gz
cd ruby-1.8.6
./configure --prefix=/usr/local --enable-pthread --with-readline-dir=/usr/local
make
sudo tcsh
make install
make install-doc
exit
cd ..
This has the advantage of keeping my root environment clean and leaving root’s own shell as bash; the other solutions I’ve seen for this sort of thing don’t manage either. There’s a related issue of whether you should be able to sudo a shell, but that’s not the point of this article. This article is about making sure you have the right environment variables when you type ‘make install,’ basically.
I haven’t provided exact conversions of all the sets of commands because if you can’t figure the rest out, you might want to switch your account shell back to bash to avoid more trouble later. In particular, you will want to execute the ‘rehash’ shell command on occasion.
One of the other things I’m involved in is I’m on the board of Ignition NW, a local community development nonprofit.
People were concerned that the tone of the Organization’s blog (which I’m responsible for as the secretary of the organization) was too formal, so I’ve started pushing out updates like this. It’s interesting how much the desire for tone varies, and it’s especially interesting when you have an organization like INW whose site is for disparate audiences (members, government agencies, other art and burning man organizations) who are used to seeing different tones.
Managing tone is in some ways as important as being consistently on message. I watch it carefully, because I’m prone to subtle jokes, wordplay, and sarcasm when talking. Anyway, if you’re in the area, you should come to the town hall tonight.
The Ignition NW Board had its spring retreat at Portaplaya’s family’s cabin two weekends ago. The board has two of these a year, where we review the stuff we said we’d do in the last six months, plan for the future, and write things on our knuckles.
The town hall’s tonight, and there will be a lot of information about the stuff we’ve done, and the stuff we’re going to do. There will also be presentations from Burners without Borders and community members talking about what they’re doing. If you join us in the CHAC Lower Level afterwards, you can probably get stuff written on your knuckles too.
The town hall is at the Capitol Hill Arts Center. 1321 12th Avenue Seattle. Starts at 7pm sharpish.
I’m currently pushing features out like a madman at gameofthe.net and related projects. There are actually four different applications (‘websites’) behind gameofthe.net, each of which displays information from Amazon and various RSS feeds in different ways (‘different slices of the same data.’) Over time, I think this will be an interesting resource for people to work with. I don’t know if this is a ‘startup’ in the classical sense [1], but it’s an interesting project for me from an information science/information theoretic point of view.
Stepping back, it’s been interesting how sensitive this project has been to its initial conditions. A few small changes in user sampling have caused things to go down completely different paths than I originally would have expected. I think this is, in many ways, useful, as it enabled me to get user feedback early on. However, given a slightly different sample set of initial users, the direction would have been quite different.
[1] In fact it’s not, as I’ve been at startups and this isn’t it.
So, the latest parts of gameofthe.net will be delayed for a couple of days because the power was out at my house last night. Since I mostly work on this application evenings, it will be a couple of days before time comes ’round again for me to work on it. Hopefully, I will be able to get to it on Thursday, but it takes a couple of hours to roll out a new release (when it isn’t a couple of days of going off the rails like a crazy train like it was last time.)
Until then, I will work on the remainder of the new features list for 3.0 as I have time. Right now, I’m busy applying for jobs. Want to hire a corprew? I’ll send you a resume. There’s one online as well, but I have a couple of different ones for different sorts of jobs (as I can be everything from a Web Systems Director to a Tech Lead depending on the size and type of company you have.) Full time or contract.
One of the books that I should probably have gotten early on when I set out to learn Ruby/Rails for this site, but didn’t, is the Ruby Cookbook.
It has a lot of useful stuff in it, stylistically as well as performance-wise. It’s possible to find all the information in this book using google, but that’s probably a massive waste of time unless you’re being paid by the hour. I wasn’t, and I also had the goal of learning rails the hardest way possible, because I know that I learn better that way. So, after buying this book, I went back and took out some of the more inelegant madness that I’d put into the system.
In general, I really like the O’Reilly cookbooks, and this is a good contribution to the series.
Microsoft has made a kit available for Sharepoint that makes it easier to have taxonomy and tagging. The tagging allows authors to tag items and to also have controlled vocabularies on particular multi-valued properties. Users can incorporate the controlled vocabularies into searches and also search by tags.
In the default configuration, users cannot tag items on the fly (although I suspect that they could change taxonomy values if they have permissions.)
I used to work (engineering) at an ECM company, so using the phrase ‘controlled vocabulary’ in place of taxonomy for this is somewhat second nature. Since I took a lot of classification classes at the Information School, it’s interesting to see how companies implement these concepts. It could be interesting if these features became widely available in Sharepoint.
Someone is sending out spam with a return address of corprew.org that contains Windows viruses. We aren’t the sending host: we don’t run Windows, and no one here runs Outlook.
I thought this might make an interesting counterpoint to many idealistic discussions of information. It discusses how information gets used, misused, and not used at a bank.
I was talking to someone today about my realization that a lot of my exposures to the ideas that are important in Enterprise Content Management and Knowledge Management were actually contained in the books of Robert Anton Wilson, a counter-culture author who recently passed away.
RAW had a lot of insights into how people’s minds worked, and although he’d have probably found this pretty horrifying or hilarious, a lot of insight into how businesses worked as a result.
The prime example of this is the Snafu Principle, and I’ll reproduce the poem associated with it here:
In the beginning was the plan, and then the specification; And the plan was without form, and the specification was void.
And darkness was on the faces of the implementors thereof; And they spake unto their leader, saying: “It is a crock of shit, and smells as of a sewer.”
And the leader took pity on them, and spoke to the project leader: “It is a crock of excrement, and none may abide the odor thereof.”
And the project leader spake unto his section head, saying: “It is a container of excrement, and it is very strong, such that none may abide it.”
The section head then hurried to his department manager, and informed him thus: “It is a vessel of fertilizer, and none may abide its strength.”
The department manager carried these words to his general manager, and spoke unto him saying: “It containeth that which aideth the growth of plants, and it is very strong.”
And so it was that the general manager rejoiced and delivered the good news unto the Vice President. “It promoteth growth, and it is very powerful.”
The Vice President rushed to the President’s side, and joyously exclaimed: “This powerful new software product will promote the growth of the company!”
And the President looked upon the product, and saw that it was very good.
Don’t know if it was originally his, but his writings were my first exposure to the ideas of people changing how they transferred information in regards to the power dynamics around them. That idea has stuck with me, and as I’ve gotten to work more in the ECM and KM spaces, I’ve gotten to see how a lot of the software written there is actually about attempting to get around how people communicate in regards to power and threats to their own security (siloing, withholding, spinning, eliding.)
Anyway, I’d just thought I’d share that, because I was thinking about RAW and KM at the same time today.
After looking at the periodic table of visualization methods, a lot of people have commented that the periodic table is, in and of itself, a visualization method. Ironically, however, it isn’t apparently contained within the periodic table.
The question I have is whether the periodic table is a treemap with all equal size boxes, with ordering determined by the row being the number of electron shells and the column being the number of electrons in the unfilled shells. The grouping is by similar chemical properties (so, for example, Noble Gases, Metals, Lanthanides/Actinides, etc. are all their own groups, and could also be described by the configuration of electrons in the shells.)
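One way to poke at the treemap question is to treat each element as an equal-size box whose row is its period (number of electron shells) and whose column is its group. The sketch below is illustrative only: the `ELEMENTS` sample and the `layout` and `render` helpers are hypothetical names, and only a handful of elements are included.

```python
# Hypothetical sample data: (symbol, period, group) for a few elements.
ELEMENTS = [
    ("H", 1, 1), ("He", 1, 18),
    ("Li", 2, 1), ("C", 2, 14), ("O", 2, 16), ("Ne", 2, 18),
    ("Na", 3, 1), ("Cl", 3, 17), ("Ar", 3, 18),
]


def layout(elements, cell=1):
    """Map each symbol to an equal-size box: (x, y, width, height)."""
    return {
        symbol: ((group - 1) * cell, (period - 1) * cell, cell, cell)
        for symbol, period, group in elements
    }


def render(elements, groups=18):
    """Draw the boxes as a text grid, one row per period."""
    periods = max(period for _, period, _ in elements)
    grid = [["  "] * groups for _ in range(periods)]
    for symbol, period, group in elements:
        grid[period - 1][group - 1] = symbol.ljust(2)
    return "\n".join(" ".join(row).rstrip() for row in grid)


boxes = layout(ELEMENTS)
print(render(ELEMENTS))
```

Every box has the same area, so the "treemap" carries no size information here; all of the meaning sits in the row and column position.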
Passwords are not particularly secure. It’s hard to make, remember, and use one that is hard for programs to guess. Therefore, people who care about security frequently use a package called OpenSSH for logging into remote hosts to work instead of telnet.
I’ve been working with a bunch of people for the last while, who wanted to have their sessions logging into remote machines not be trivially snoopable and also wanted to not have to remember their passwords for a bunch of machines. I wrote up a set of instructions based on openssh that are designed to make this relatively straightforward.
This is how I normally use remote unix boxes: set my password to a ludicrously long, difficult to crack/remember/whatever string and then use ssh and public key encryption to actually log in. It works for me. These instructions are probably most useful for MacOSX and other unix (Linux) users (and MS Windows users using cygwin on the command line.) Most Windows users will want to use PuTTY, which is a fine package of encryption tools for that platform.
So, I was watching the iPhone’s release yesterday, and I thought the interface was very, very cool at first.
But after a while, I was thinking that I’ve seen touch-sensitive application dependent interfaces before. Eventually, it occurred to me that it was the computer interface from Star Trek.
This selection of PDA-like devices might be clearer. Apparently, the interfaces adapt to the application that you’re running, and are touch-sensitive on the same sort of black screen.
So, in regards to this, I have only one thing to say:
Also, apparently by posting about Jordan Weisman’s comments at Dorkbot Seattle, I managed to help people figure out who was doing the Vanishing Point game that started yesterday at CES. Interesting. Apparently, entry points to this game are in a lot of the Vista-filled gift/review laptops that Microsoft has been sending out to high-end bloggers and others. It’s pretty hilarious, overall. I’m hoping to get into making these games in the next several months, at least as a small scale hobby. So, we’ll see what happens.
Also, I’ve switched from using ecto, which is a blogging program for the macintosh to using a blog program built into firefox. Ideally, this should help me keep things updated.
And that’s it for updates — I’m going to be updating a lot of the pages (as opposed to blog posts) on corprew.org in the next while, and I’ll probably post again when I’m done with that. Currently, there’s going to be more ‘review this specification’ stuff coming through the blog for the next while. I’ve got a couple of different ones on the queue.
In ISO 5963-1985, sections 1.3 and 1.4 define the scope of the document in a couple of important ways. Mostly, that it’s designed to help indexers index documents in ways that are helpful for users. It helps with this by providing a consistent set of guidelines for analysis that indexers use to promote useful indexes inside organizations and between organizations that exchange indexes. (A note from before that this document specifically deals with humans doing indexing, and not algorithmic indexing done by computers.)
So, the goal of the document is consistent subject indexing through making a guide to the document analysis and concept identification stages of the indexing process. To what degree is consistency possible, likely, or desirable?
Certainly some consistency is useful. As users of this index are (presumably) part of a domain or community of practice that actually exists out in the world somewhere, they probably have a common shared vocabulary that they use to describe things and the index should reflect that as much as possible. But, if you interchange between different groups of people, they will probably have different vocabularies for the same (or similar) things, and parts of documents analyzed may be more or less important, causing the subject of the document to change (with regard to the other groups.)
Depending on the type and scope of documents and the vocabulary used, indexer consistency may be quite low. In some ways, this mirrors the usual problems of recall versus precision when trying to retrieve information from a system. If the indexer has a relatively small set of terms that they’re choosing from, or is only trying to cover things in the broadest of terms, then it is easier to come up with common terms than if they’re creating their own terms on the fly or are trying to be very specific.
It isn’t clear, however, that lack of consistency between indexers is actually a bad thing. As long as the users are well-supported in their searches, which is the point of this exercise, why does it matter if the results are nonstandard. In section 1.4, they indicate that they’re specifically trying to standardize practice rather than results.
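To make the consistency and recall-versus-precision points above a little more concrete, here is a minimal sketch. It assumes that consistency between two indexers is measured as the overlap of their term sets (shared terms divided by all terms used), which is one common choice and not something ISO 5963 prescribes. The function names and the sample terms are made up for illustration.

```python
def consistency(terms_a, terms_b):
    """Overlap between two indexers' term sets: shared terms / all terms used."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty indexes agree trivially
    return len(a & b) / len(a | b)


def precision_recall(retrieved, relevant):
    """Precision and recall of a result set against the relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Broad terms from a small vocabulary: easy to agree on.
broad = consistency({"medicine"}, {"medicine"})

# Specific, free-form terms: agreement drops quickly.
specific = consistency(
    {"celiac disease", "gluten", "diet"},
    {"coeliac sprue", "gluten", "nutrition"},
)

precision, recall = precision_recall(
    retrieved={"d1", "d2", "d3", "d4"},
    relevant={"d1", "d2", "d5"},
)
print(broad, specific, precision, recall)
```

The two indexers in the second example would probably both serve their own users well, which is the point made above: low measured consistency does not by itself show that either index is bad.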
This entry is part of my ongoing blog entry series on specs and standards, done largely so I have reference to my thoughts later on. It’s also put up with the expectation that it will be helpful to other people. You’re welcome to comment on this, and I may make new versions of this document later that incorporate other remarks or just reflect my changing understanding. One particular note is that I won’t send you a copy of this spec — you should buy it or get it from your local (university) library.
Well, we appear to be safely up and running at our new home. I moved corprew.org from the old Static Factory Media hosting machine to a new home at dreamhost. Largely, I needed to install some demo applications I’ve written which require more resources than SFM’s host machine can spare for any particular client (it remains a good machine for general hosting needs and is operated by BizNik, who acquired SFM’s technical assets approximately six months ago and are excellent people whose services I highly recommend.)
SFM was an interesting company, with a lot of promise but plagued by many of the problems that plague startups. It didn’t really consistently keep a single message, but I learned a lot about running a business from it. I wasn’t one of the executives, but had experience running businesses, and therefore advised on various areas.
Over the years I was involved with them, I had to jump in and help with administrating the SFM machine a lot. So, moving my stuff off that machine represents an end to my role there. It’s strange, and bittersweet, because there’s a lot of happiness in being done with particular things even when they carry a lot of good memories, which being involved with SFM did.
corprew.org is in the process of moving to a new host. please excuse any difficulties between 12/29 and 1/3 as stuff gets transferred to a new site (and various new technologies.)
At some point in most people’s lives, they sit down and try to do something. There are a variety of methods of figuring out how people do things, but I thought I’d try and explain this in sort of an information science friendly way. So, here’s a lovely(ish) explanation of why you can’t get anything done (in information science, with pictures and diagrams.)
Obviously, actually doing something presupposes an actor, and generally, this actor is a person. This causes a number of inconveniences, but as of yet there hasn’t been much we can do about it. Occasionally, you get computers doing things on behalf of people and complicated rube goldberg machines with gears, rope, and most of the contents of a decrepit hardware store, but let’s not try to draw one of those. We’ll also ignore robots, because we know how that sort of thing generally turns out.
So, yeah, you have this person sitting around and then suddenly they get a crazy idea that they actually want to do something. If we can’t dissuade them of this, we’ll call it ‘performing an action,’ and put it on the diagram as follows. It’s customary to represent the action as an arrow with subject and object at the head and tail, but here we’re going to make the action a circle and the arrow can be doing. For example, consider bob goes to the store — typically, you’d represent this as subject ‘bob’ action: ‘goes to the store’, but here we’re representing this as subject: ‘bob’ object: ‘goes to store’, with the action ‘does.’
This is called ‘reification,’ and is like ‘objectification’ for verbs. It’s either a complex intellectual process or a handy trick for making diagrams. At any rate, it’s a suitably long word.
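As a rough sketch of the trick, assuming a statement is just a subject, an action, and an optional object: reifying moves the original action into the object slot and puts a generic ‘does’ in its place. The `Statement` class and `reify` function are hypothetical names for illustration, not part of any particular modeling toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    action: str
    obj: Optional[str] = None


def reify(statement: Statement) -> Statement:
    """Turn the action into a thing: it becomes the object of a generic 'does'."""
    if statement.obj is not None:
        raise ValueError("this sketch only reifies subject/action statements")
    return Statement(statement.subject, "does", statement.action)


plain = Statement("bob", "goes to the store")
reified = reify(plain)
print(reified)
```

Once the action is an object in its own right, other things (information, context, domain) can be attached to it in a diagram, which is what the rest of this explanation relies on.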
You might say okay, that sounds fairly reasonable and non-problematical, but it just gets more complex from there.
Most people have a very complicated view of the world, but despite that manage to get through life without too many unusually horrible things happening to them. There are occasional exceptions to this, like people whose worldview consists entirely of ‘stuff’ or desire for products advertised on television. In general, though, people’s conceptual world consists of cloudy masses of ‘things’ and ‘stuff.’
Every action requires specific pieces of information to do it. For example, getting to work in the morning generally requires one to know where one’s pants (or pants equivalent) are.
People, with the possible exception of astronauts, don’t operate in a vacuum. They generally operate in a context, which can be summed up as ‘all the information that they bring with them to a particular task.’ This context is accumulated throughout people’s entire lives, and can conveniently be thought of as ‘why other people are wrong.’ But are people actually wrong, or do they just have different opinions on what makes up reality?
The last part of this is the domain, which reflects that people, even though they may be doing the same action with the same information and same context, may do something entirely different depending on the ‘field’ they’re working in or trying to serve. To use a somewhat gross simplification, the action “this hamburger should have cheese on it” may have a very different response depending on the environment (sort of restaurant) the person is working in, even if it’s the same person.
To top this off, things like Cognitive Work Analysis divide domain up into about 7 different layers, all of which come together in a big bullseye, but I’m only making so many lines in these damn diagrams. Personally, I think about this as a massive oversimplification and call it the PICAD model, because calling things ‘ACRONYM model’ makes it seem more impressive than a bunch of circles with arrows in a box.
I went to the gaming session of Seattle dorkbot last night. I went to two of the sessions, and then spent the third in the bar. The third session was a real yawner, correctly identified as such by Ario, which made me sad, as I’m very interested in the topic of “Games for Social Change.” Here are my notes from the thing, as they might be of interest to readers. I mainly took notes on Jordan Weisman’s session on Alternate Reality Games.
To some extent, it was a marketingish presentation, although as you can see JW is particularly rife with geekcred.
So, the notes:
ARGs tell stories interactively. The premise began for them, based on the Kubrick/Spielberg movie AI, because they’d been licensed to make a number of games based on the product, but the movie wasn’t particularly given to making games. Instead of making games based on the movie itself, they made it based on the universe that the movie took place in.
Their question was, how to tell that story. But they came up with an idea based on the narrative structure organic to the web.
This is essentially a community effort: the people who take part in the exploration form a ‘hivemind’ in response to finding shards, and tell stories to each other. The story goes from being the original narrative to being a consensus narrative that comes from the audience’s experience.
The community effect produced is that the hivemind has essentially every skill on the planet, and it can go everywhere and do everything and anything. It is also, by that same factor, smarter than the people writing the game.
Use life as a game board — it took place all over the world.
radio drama told on payphones — fragments of the story were released as people talked
Name of game… campaign for prerelease of Halo 2
This is, in its essence, pop culture hacking: it’s about the audience creating fiction and inseminating your references into their everyday consciousness. However, this is against the everyday experience of marketing staff, who want to put up as much collateral as possible and advertise its existence as widely as possible to get as many people to notice as possible. But that turns out to not work well for getting people to want to experience this; what you want to do is draw people down the rabbit hole.
How to get audience in? Spend time creating content, not telling them about it.
Allow communication about shards of content to draw people in… People will start looking with a few small clues.
Highlights of their work: all of their big campaigns have led to marriages, because players collaborate and share rather than compete. Story drives communities; competition drives individualism. This is, to a large extent, their goal: the building of a temporary community, possibly tied to awareness of some product or service that they were asked to make the game for. It’s an interesting balance between entertainment, advertising, and ‘using the real world as the gameboard.’
William Gibson’s Pattern Recognition was a tip of the hat to I Love Bees. I’m wondering if WG’s perception of ARG makes this a must-read for anyone interested in ARGs. I’m probably going to pick up the book in the next couple of weeks to find out. If anyone has any opinions on that, please feel free to let me know via email or comment.
One thing that JW mentioned at the end of his talk, and I suspect that this was a deliberate seed effort of his, was to say that if you were in front of the Bellagio during CES on 1/6/2007, you might see something interesting in the fountains. Anyway, in the spirit of thanking him for coming to the event, I thought I’d pass this on.
Narrativization is a process by which we help provide the context that takes things from being a mere series of events to being story and history. However, this narrativization may not be limited to effects strictly within the text. It may, in fact, function as a version of hypnosis, according to Scott Adams, when it works in appeals to different senses and sense perceptions.
Neal Stephenson, who has used Bicameralism as a plot device, was the first person who came to mind, so I flipped through Cryptonomicon looking for stuff that met Scott Adams’s description of the techniques. A bunch of the lengthy digressions that sort of litter Cryptonomicon are full of the sort of appeal to the senses that Adams describes, which makes me wonder whether it was a deliberate technique for manipulation or just an accident of style. Some apparently would claim that Tolkien did something similar.
It makes me wonder how much of all this is tied to the overall topic of framing the message, though.
One of the better google mashups I’ve seen recently. It remixes information available from a variety of epidemiology sources with google’s now ubiquitous mapping program.
Get Your Daily Plague Forecast:
A new website mashes health data with Google Maps to track global disease outbreaks. By Seán Captain.
It’s an interesting mashup because epidemiology information is very hard to assemble into a coherent picture, being based (as it is) on data about people in particular locations and suchlike. Reports linked to maps are probably clearer than the old agglomeration-of-reports style that used to be used, especially given that you’re talking about locations on a global scale.
After the playa this year, I dropped a friend who’d rolled his vehicle off at his house in Corvallis, and then went out to the Oregon Coast and cruised up it into Washington. This marks the completion of the entire length of 101 for me, although I plan to redo Corvallis -> Bay Area at some point.
While I was out there, I took some time to see the major sites and enjoy some of the parks in the area. It was a quick trip, but felt good after the two weeks of heat on the playa.
The session today has a lot of stuff based on the concept of “the distance between two arbitrary faces of a hyperdimensional cube,” where they actually mean more like hyperdimensional rectangles from what I can tell. There’s apparently some benefit to this, but I like to think of it as ‘rocking out on the hypercubes.’
Because the hypercubes, they rock out. Aside from that, it’s straight up XML Element Retrieval. Feel free to observe the exhibits carefully, don’t touch the points of the <, they’re quite sharp. Please take extreme care not to become entangled in the forest of literal references, the & are quite difficult to detach from one’s clothing once they become caught.
The tendency to use Σ and Π as iteration operators has made for crazy space madness algorithm writing on the slides over the last couple of days. I’m going to be going back and looking at a lot of the math that I’ve forgotten over the last couple of years. Intense. Learned a lot. It’s about over now, a couple hours are left.
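For anyone else whose math has rusted: read as iteration operators, Σ is a loop that accumulates a sum and Π is a loop that accumulates a product. A minimal sketch, where `sigma` and `big_pi` are made-up helper names and the bounds are inclusive as they would be on a slide:

```python
import math


def sigma(f, lo, hi):
    """Sum of f(i) for i = lo..hi inclusive, i.e. the Σ on the slide."""
    total = 0
    for i in range(lo, hi + 1):
        total += f(i)
    return total


def big_pi(f, lo, hi):
    """Product of f(i) for i = lo..hi inclusive, i.e. the Π on the slide."""
    product = 1
    for i in range(lo, hi + 1):
        product *= f(i)
    return product


sum_1_to_4 = sigma(lambda i: i, 1, 4)
product_1_to_4 = big_pi(lambda i: i, 1, 4)
print(sum_1_to_4, product_1_to_4)
```

Python's built-in `sum` and `math.prod` do the same job over any iterable; the explicit loops are only there to show what the notation is shorthand for.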
I’ve learned a lot at this, and gotten a couple new ideas about things I should be working on learning. Fun. Now, on to the Semantic Grail meeting tonight.
Adapting Ranking SVM to Document Retrieval
Yunbo Cao, Microsoft Research Asia
Jun Xu, Nankai University
Tie-Yan Liu, Hang Li, Microsoft Research Asia
Yalou Huang, Nankai University
Hsiao-Wuen Hon, Microsoft Research Asia
traditional: tf, idf, document length
currently: page rank, structural features of document, others
future: ??? Relevance SVM ???
Check it out, they gave a similar talk on Relevance SVM at the WWW conference. The talk itself was hard for me to understand due to speaker ESL issues, so your guess is as good as mine.
Document modeling is important to any IR approach — the bag of words approach assumes word independence, and this is simple, but inappropriate to natural language. There have been a bunch of approaches to this sort of thing in the past, but here’s a relatively new one that does well versus various TREC collections.
The presentation was largely a crawl of the paper section by section, and I’m going to emulate that approach by just referring to the paper so you can have that experience.
However: it beats previous models because it maps { document vs. topic } for all topics and documents, as opposed to the cluster approaches, for example, which largely assume that all documents belong to one cluster, or for many practical approaches, belong to whatever cluster it matches best. Because documents belong to n topics with probability p(d[i], n), this is better than searching against bag of words models.
Context-Sensitive Semantic Smoothing for the Language Modeling Approach to Genomic IR Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, Xia Lin, Il-Yeol Song
It is possible to disambiguate homonyms in a probabilistic manner by using Topic Signatures that let you identify which of the topics that the questionable-homonym is actually retrieving. Using Topic Signatures is also more effective for finding documents than the ‘bag of words’ model.
So “‘terms’ -> ‘topics’ -> ‘find documents for topic’” is more effective for both precision and relevance than “‘terms’ -> ‘find documents for terms.’” Doing this topic modeling is called ‘smoothing’ or ‘semantic smoothing.’
My reflection on this is that it’s a lot like using an automatically built controlled vocabulary and mapping both documents and terms to it algorithmically. Strangely, this presentation reminds me a lot of a math-intensive version of Jens-Erik’s classes on Indexing, but, I suspect, only if you’ve already heard JEM talking and have that context.
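Here’s a toy sketch of that terms -> topics -> documents idea in Python. Everything in it is made up for illustration (the topics, the probabilities, the document names), and it is not the model from the paper; it just shows how routing a query through topics lets the other query terms disambiguate a homonym like “bank.”

```python
# Toy illustration of "terms -> topics -> documents".
# All topics, weights, and documents are hypothetical, not from the paper.

# p(topic | term): how strongly each query term points at each topic.
TERM_TOPICS = {
    "bank": {"finance": 0.7, "rivers": 0.3},
    "loan": {"finance": 1.0},
    "erosion": {"rivers": 1.0},
}

# p(topic | document): each document belongs to several topics at once.
DOC_TOPICS = {
    "doc_mortgages": {"finance": 0.9, "rivers": 0.1},
    "doc_floodplain": {"finance": 0.1, "rivers": 0.9},
}


def query_topics(terms):
    """Average the topic distributions of the known query terms."""
    known = [t for t in terms if t in TERM_TOPICS]
    totals = {}
    for term in known:
        for topic, p in TERM_TOPICS[term].items():
            totals[topic] = totals.get(topic, 0.0) + p / len(known)
    return totals


def rank(terms):
    """Order documents by how well their topics overlap the query's topics."""
    q = query_topics(terms)
    scores = {
        doc: sum(q.get(topic, 0.0) * p for topic, p in topics.items())
        for doc, topics in DOC_TOPICS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank(["bank", "loan"]))     # "bank" leans towards finance here
print(rank(["bank", "erosion"]))  # the same word leans towards rivers here
```

The same word ends up retrieving different documents depending on what it travels with, which is the disambiguation effect described above, minus all of the actual math.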
It works better than WordNet (according to a person who asked a question), because it uses math to eliminate ambiguity of meaning.
Anyway: I plan to read the rest of their stuff. It looks interesting. It would be interesting to see what sort of ontologies (InfoSci sense) it can work with. However, nothing to do with Genomic IR I can see other than that’s probably the non-described domain they’re using.
Goal: figure out how to make abstract data models and languages for implementing IR models. (BIR / Language Modeling / Poisson Model are the three existing models.)
This was a very strange talk, that explained a paper they’d written. It had many fine visual aids, including dice describing the different sorts of models. The dice had different things on them for the different models.
BIR has a die for each term with one side for each document, the side has a one or zero on it, you judge probability that way.
LM has one die with a side for each word-position in the collection, with what word’s there.
PM is like BIR, but it involves numbers other than one or zero on a side?
But what’s the point, really? This is fairly obvious based on the normal expression of these models… what does this demonstrate:
1. BIR and PM assume the collection to be a set of non-relevant documents, the LM assumes you’re selecting terms from relevant documents.
2. Poisson Bridge: see the paper for this, I dropped out of applied math grad school for a reason.
3. BIR and LM deliver the same ranking if the BIR term weight == LM term weight. This seems unlikely to happen, but is true.
4. TF-IDF. If you apply the poisson bridge to BIR + LM, you get TF-IDF.
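For reference, here’s a minimal sketch of the TF-IDF weight that item 4 ends up at. This is the textbook tf × log(N/df) form, which is my assumption about the variant meant; it is not the paper’s derivation and has nothing to do with the Poisson bridge itself. The three tiny documents are made up.

```python
import math
from collections import Counter

# Hypothetical three-document collection, already tokenized.
DOCS = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog sat on the log".split(),
    "d3": "cats and dogs".split(),
}


def tf_idf(term, doc_id, docs=DOCS):
    """Textbook TF-IDF: term frequency times log(N / document frequency)."""
    tf = Counter(docs[doc_id])[term]
    df = sum(1 for words in docs.values() if term in words)
    if df == 0:
        return 0.0  # the term appears nowhere in the collection
    return tf * math.log(len(docs) / df)


for term in ("the", "cat", "sat"):
    print(term, round(tf_idf(term, "d1"), 3))
```

Even though “the” occurs twice in d1 and “cat” only once, “cat” gets the higher weight because it appears in fewer documents.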
This was more an ‘interesting alternate view’ of probability than anything new. It was pretty enjoyable though.
Semantics: Knowledge and Domain of Clinical Medicine — Jimmy Lin, Dina Demner-Fushman
Their approach is “more than just a ‘bag of words,’” and explores the role of knowledge in specific domains. ‘Bag of words’ refers to the classic IR technique of searching for and ranking results on word occurrence, frequency, and probability, ignoring word order.
Parts of a problem: Problem Structure / User Tasks / Domain knowledge
A framework for clinical reasoning, and an instantiation in the domain of clinical medicine: retrieval of relevant articles in medline. Concept-based retrieval in this sense moves from matching on keywords to matching on ‘little knowledge elements.’
What are these LKEs? Things that are extracted from documents, little chunks of inference, stuff like that. They have built a series of extractors that is described in an AAAI article from the same authors. effectiveness: 90%
Why medical? Evidence-based Medicine is a paradigm that emphasizes evidence based on clinical research. In this framework, the documents describing the research done provide the LKEs.
Then: use a ranking system to go back over, compare to medline’s original and a current state of the art article. See chart in actual talk for comparison.
Result: Knowing the problem structure helps a lot, better than purely statistical methods. This may only be applicable to the medical domain, and it depends on the quality, depth and size of the resources available for other domains.
I will be at SIGIR 2006 this Monday through Wednesday on the University of Washington Campus. Although I’m going to be doing a lot of networking and attending conference sessions, this would be a pleasant time for lunch and hanging out if you happen to be around the campus/UD. SIGIR is the ACM special interest group on information retrieval, and is interesting and fun if you find that sort of thing interesting and fun. At any rate, I hope to learn a lot. I’ll be around as a conference volunteer on Thursday as well; they were low on volunteers and I wasn’t doing anything in particular that day.
A couple of weeks ago, the Seattle Memorial Temple crew asked if the NW Rangers could help with the procession from the 2112 house where the bluehouse slayings happened to the place in Seattle Center where the temple was being installed proper. I went out with a couple of other rangers to help them.
It was a fairly emotional day. I know that excitement has been running high during the temple day, but a lot of that was replaced in the installation of the temple with a sense of the purpose that folks were there for. Pictures in the set include various views of the procession, the temple, the Seattle police chief, a bunch of temple/space needle views, and one or two pictures of friends.
It was good to see people out and about, and to see the installed temple. I put up a view of the temple with the space needle in this post that I thought was nice. The procession was good, a nice 2.5 mile walk. There were a lot of police escorting us, many more than I would have expected for a parade of this size. I suspect that was related to Friday’s killings.
It’s weird, I’m largely past the 2112 events– there’s been a huge amount of personal perceptual time since those events, but the story that someone read at the end of the events about someone dying and finding themselves in a dust storm moved me to tears and just thinking of that story still does, probably will for a while.
Indexing schemes are largely similar in their outputs – the right answers to many questions, such as ‘controlled vs. non-controlled,’ how many terms to use, and how expressive or deep the vocabulary should be, are all matters of indexing policy considered on a per-collection basis and more or less well-defined in the specification (ISO, 1985). The biggest difference between indexing methods is whether they take an objective or subjective view of the world; much of the remaining difference comes down to how objective or subjective that view is, and what the right approach to dealing with that is.
The most subjective approach to indexing is ‘social indexing,’ which makes no assumptions about underlying reality beneath the words it uses for indexing, and in fact many of the original proponents of it as a method felt that this was a benefit, because each user may mean something different by tags (Guy & Tonkin, January 2006). Folksonomy proponents are backing down from this extreme approach over time, but denying a possible shared reality is as far towards subjective as possible.
A large distinction is between document-oriented and request-oriented indexing, with document-oriented indexing using the properties of the document to describe the document and request-oriented indexing considering what the requests of the user community are likely to be (Fidel, 1994). Another formulation of request-oriented indexing is tying indexing to supporting the information-seeking behavior of the users; indexing in this sense is about making it possible for users to find documents that satisfy their information needs (Hjørland, 1997).
Document-oriented indexing and request-oriented indexing are present in all collections to various extent. In general, the more general the collection (the broader the scope of the collection and of the population the collection serves), the more oriented on the properties of the document itself the indexer has to be. This is because the broader the scope of the collection, the less adapted the indexing can be to particular topics. Both the depth of indexing and specificity of meaning of terms have to be limited in a more general collection. (Hjørland, 1997) For a collection more limited in scope, more attention can be paid to supporting the sorts of requests that will be coming in from a user community. (Fidel, 1994; Swift, Winn, & Bramer, 1979) Document-oriented indexing is more objective than request-oriented indexing, because it presupposes (or at least pretends) that there is a single reality that the documents are being described in relation to. Request-oriented indexing is more subjective, because it admits the concept of communities of practice (or discourse communities) with their own conceptions of the world (Hjørland, 1997; Mai, 2001).
The process of indexing is in fact more subjective than this, because the subjectivity of the indexer must be taken into account – there is a series of interpretations that the indexer makes in moving from text to a representation in the subject index(es) (Mai, 2001). The indexers’ interpretation can be informed by the domain, which is called domain-centered indexing. When analyzing a document, domain-centered indexing starts with an analysis of the domain and then moves on to the users’ needs (and indexer interpretation) and then the role of the document with regards to these (Mai, 2005).
Indexing approaches vary in a number of ways, but one of the most important is how subjective their view of the world is, and how granular that subjectivity is. Document-centered indexing is least granular (most objective), with a single representation claiming to serve all needs; after that is domain indexing, which considers documents in light of their role in a domain (although other information is taken into account). User (or request)-oriented indexing considers the information needs of the users as primary, assuming that the users of the catalog have specific purposes in mind for the catalog. Democratic indexing goes further in assuming the fragmentation of reality, assuming that there isn’t consensus in meaning among users, except possibly in aggregate.
For indexing practice, which of these methods should be considered is tied to the collection. The more focused a collection on particular users or domains, the more tied to requests or domain analysis the collection can be. The more general the collection, the more likely it is to need an assertion that there is a single objective view of reality (document-centered) or that there isn’t a meaningful consensus (social-indexing) – and the decision between document-centered and social-indexing may be more usefully made on other means, like whether the collection is developed (like a library or a set of resources chosen for a purpose) or arbitrary (like pages taken off the web.)
Bibliography
Fidel, R. (1994). User-Centered Indexing. Journal of the American Society for Information Science, 45 (8), 572-576.
Guy, M., & Tonkin, E. (January 2006). Folksonomies: Tidying up Tags? D-Lib Magazine, 12(1).
Hjørland, B. (1997). Chapter 2: Subject Searching and Subject Representation Data. In Information Seeking and Subject Representation: An Activity-Theoretical Approach to Information Science (pp. 11-37). Westport, CT: Greenwood.
ISO. (1985). 5963: Documentation — Methods for Examining Documents, Determining their Subjects and Selecting Indexing Terms. (No. ISO 5963-1985): International Organization for Standardization.
Mai, J.-E. (2001). Semiotics and Indexing: An Analysis of the Subject Indexing Process. Journal of Documentation, 57(5), 591-622.
Mai, J.-E. (2005). Analysis in Indexing: Document and Domain Centered Approaches. Information Processing and Management, 41(3), 599-611.
Swift, D. F., Winn, V. A., & Bramer, D. A. (1979). A Sociological Approach to the Design of Information Systems. Journal of the American Society for Information Science, 30, 215-223.
For the last couple of days, I’ve been working on a couple of interesting things, of which the most important has been getting my resumes together, and the most interesting of which has been getting rubyonrails installed on my Macintosh.
Rails is interesting, because it handles the idea of dispatching incoming web requests in a way of which I greatly approve. I spent a lot of time in the late 90s trying to convince people that this sort of URL syntax was the right way to do things, and now through rails I’m feeling vindicatedish. Actually, I hadn’t really thought about it until I was talking to an ex-coworker who remembered me talking a lot about the Object-Action syntax being a good start on the Model-View-Controller. At the time, a lot of pages named stuff like performSpecificActionOnSpecificObject.extension were more common. Anyway, this sort of thing lends itself to cleaner code design in a number of ways (including patterns, etc…) so it’s the sort of thing that generally ought to be encouraged.
I’m doing a project in Ruby at the moment mostly to get my hand back into programming. Because of the sorts of work/school I’ve been doing for the last while, I’m much much better at design and analysis than I’ve ever been, but my hacking skills are a little weaker than I recall. Learning a new language in a different paradigm is very useful, especially something that’s rich in hacky syntax/generator crap like Ruby seems to be at the top level.
More specifically, I’m taking on a social networking light application. I’ve always wanted to do a social networking application, and during the job search seems like a good time.
Section 1.2 of ISO5963 just discusses the relevance of the document to the community that might want to use it. It’s common for standards documents to have this sort of information in them. A lot of them specify the duration of the standard, what standards they replace or augment, and how generally applicable the standards are. This is important information when you make a standard because it’s important to say what the standard is, who it is for, and how long it is in effect for.
However, I’ve always found the wording ‘can be employed by any agency by which human indexers…’ to be pretty funny, because ‘agency’ can also mean ‘means.’ So, it’s also a play on words, although I doubt that that was their intention. Personally, I am subject to compulsive misprisions, which is what that sort of misreading is technically called.
Being prone to verbal misunderstanding at a somewhat compulsive level makes doing the indexing thing fun, probably more fun than it should be.
ISO5963 deals with indexing, which is the process of representing a document by a couple of subject terms. The specific parts of the indexing process it deals with are “examining documents, determining [the documents'] subjects, and selecting appropriate indexing terms.” (1.1) It doesn’t deal with the other parts of the indexing process, which are usually called ‘indexing policy,’ and reflect broader issues like what sort of vocabulary to use (are indexers using thesauri, folksonomies, simple word lists, or what have you), how many terms to use, and so on. There are also other things, like how the system will be presented to the users, that are driven by the underlying system. This document is just about determining the terms to use.
One of the important points about this is that documents in a collection are represented by their subjects in an index. Why would you want to do this? It’s really hard to read all the documents in a collection when you want to find a specific one, or to find a document on a specific topic. To make this faster, the subjects of a document are ‘extracted’ from the document, and these subjects are put into an index. This index is then presented to the user in some form; common forms are hierarchies, alphabetized lists, and search engines that include things like ‘subject’ or ‘topic’ as a field that one can search on.
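To make that concrete, here’s a minimal sketch of a subject index as a data structure, in Python. The document ids and subject terms are made up, and the ‘extraction’ step (the hard part, and the part ISO5963 is actually about) is assumed to have already been done by a human indexer.

```python
from collections import defaultdict

# Hypothetical output of the indexing step: subjects assigned per document.
ASSIGNED_SUBJECTS = {
    "doc-001": ["indexing", "standards"],
    "doc-002": ["indexing", "folksonomies"],
    "doc-003": ["information retrieval"],
}


def build_subject_index(assigned):
    """Invert document -> subjects into an alphabetized subject -> documents index."""
    index = defaultdict(list)
    for doc_id, subjects in assigned.items():
        for subject in subjects:
            index[subject].append(doc_id)
    return {subject: sorted(ids) for subject, ids in sorted(index.items())}


INDEX = build_subject_index(ASSIGNED_SUBJECTS)

# Present the index as an alphabetized list, one of the common forms.
for subject, doc_ids in INDEX.items():
    print(f"{subject}: {', '.join(doc_ids)}")
```

Looking something up is then a matter of reading one entry in the index rather than every document in the collection.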
well, now that i’ve graduated, i suppose i should start posting again here more often. Daily posts will start again in about a week. I’m queuing up some content that, while it isn’t going to thrill anyone to death, will at least be somewhat relevant to the topic at hand.
Originally uploaded by msim2006 (a friend of mine from one of the iSchool’s masters programs.) It’s me (r) talking to Dave Winer (l) about something or other. Eventually, someone’s going to do a ‘1000 kung fu postures of Corprew’ given the way I move my hands around when I’m talking.
This is a photograph from one of the sessions at Mind Camp 2.0 that I participated a lot in, it was called ‘del.icio.us inside,’ which was a presentation about delicious-type stuff and its possible role in corporate intranets. The main talkers there were dave winer, some ms guys, a msim guy, and myself. It was an interesting time and I learned a lot. Apparently, though, my tendency to ask difficult questions that derail conversations and agendas continues.
The second was called ‘a penguin is a bad bird,’ and was about Lakoff and basic-level categories (I picked the name, which seems to be a generic example of prototype effects.) It was at 9:00 AM in the morning, so I warmed up the largely comatose people who came with a category theory-based comedy routine. I think saying ‘category theory-based comedy routine’ is pretty much admitting you have a problem, but the people who were crazy enough to be there seemed to have enjoyed it. It was originally going to be ‘a penguin is a bad example of a bird,’ but there wasn’t enough space on the schedule.
Overheard (as reported to me):
guy1: “What is this penguin one?”
guy2: “Someone said that guy is awesome but has a weird sense of humor and it would probably be worth going to if it wasn’t sunday morning.”
So, it went pretty well. There was something I wanted to go to at 66% of the sessions, and the other sessions I just hung around and hung out with my friends (er, networked with colleagues). I had a lot of fun, and I’m getting the hang of this self-organizing conference thing. I’m also having better luck at choosing information science topics to talk about in that sort of combo presentation/rap session format. The next mindcamp is scheduled for November and I’m going to try to be more involved in that one; this one had too much grad school work in big piles around it for anything to happen other than just attending and gibbering. Ran into people from Hackers, BM, Stronghold, and various other things.
I thought it would be interesting to give a presentation about some of the ideas (Prototypes, Basic Categories) in Lakoff’s Women, Fire, and Dangerous Things at MindCamp 2.0 in Seattle.
A penguin being a bad example of a bird is sort of a running theme in almost every presentation on prototype categories I’ve run into. It’s clearly a bird, but it doesn’t have all the attributes that most people would think of if you asked them to describe a bird.
Also amusingly, I was clearly in the low-rent porn and love district when I gave this presentation (which was on Sunday morning at 9am.) I’d like to thank the people who actually came out for it, it was a good time.
I think this quote from The Name of the Rose by Umberto Eco best sums up my Indexing and Abstracting class and in many ways life generally at the moment.
“I have never doubted the truth of signs, Adso; they are the only things man has with which to orient himself in the world. What I did not understand is the relation among signs . . . I behaved stubbornly, pursuing a semblance of order, when I should have known well that there is no order in the universe.”
“But in imagining an erroneous order you still found something. . . .”
“What you say is very fine, Adso, and I thank you. The order that our mind imagines is like a net, or like a ladder, built to attain something. But afterward you must throw the ladder away, because you discover that, even if it was useful, it was meaningless . . . The only truths that are useful are instruments to be thrown away.”
I love this class, incidentally. Life, also, is fairly beautiful
ISO5963 (ISO, 1985) states indexing ‘extracts concepts from documents by a process of intellectual analysis.’ Any sort of intellectual analysis involves context – at least the context the indexer personally operates in. Hjørland (Hjørland, 1992) distinguishes between ‘content-oriented indexing’ and ‘request-oriented indexing.’ This division also describes what additional contexts are important to the indexing process.
‘Request-oriented indexing’ means indexing documents in a collection in relation to requests that will be made against that collection – using the terminology of whatever field is the object of study. Request-oriented indexing has a high return on investment when the purpose to which the collection is to be put and the user population are well known – both the context of the collection and the user population can be used to develop indexing, both for particular documents and the overall collection policy. Using large amounts of context in this manner for subject indexing would seem to be the clear winner, but it fails to work generally.
For request-oriented indexing to work well, several suppositions have to hold. First, that the purpose of the collection remains stable over time. This may fail for several reasons, such as the gradual evolution of a field or members of a different field using the collection (ISO, 1985). Second, that the terminology and object of study of a field remain constant over the lifespan of a collection; but most fields change over time, and it is generally hard to know when creating something how long it will last. Third, that there is a single useful context shared among all the users of a document that will make retrieval in the index system possible. Although it would be possible to index documents within many possible contexts, it becomes an exponential problem as the number of contexts and documents grows (Hjørland, 1992).
Subject indexing is a representation of a document; the purpose for that representation is finding and using that document at some future time – ‘search for items with potential’ for various purposes (problem solving, meeting information needs, gaining general subject understanding) in Hjørland’s formulation (Hjørland, 1997). If the document representations aren’t effective, the searching based on the representation will be of low quality (Mai, 2000). Considering searching as the goal, subject indexing can be analyzed in terms of searching’s metrics, recall and precision. When the indexer is able to focus specifically on analyzing the user’s domain, the user can have high precision and recall. However, as the indexer’s analysis of the documents in a collection moves out of alignment with the user’s context, either precision or recall will drop. Recall drops when the indexer chooses terms that are different than terms the user chooses for searching on a particular subject. Precision drops when the user and indexer categorize the subjects differently.
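Precision and recall are easy to state as arithmetic: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. Here’s a minimal Python sketch with made-up document ids, showing the two failure modes described above (a vocabulary mismatch hurting recall, and a categorization mismatch hurting precision).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are actually relevant to the user.
RELEVANT = ["a", "b", "c", "d"]

# Indexer and user agree on terms and categories.
print(precision_recall(["a", "b", "c", "d"], RELEVANT))  # (1.0, 1.0)

# Indexer used different terms than the user searched on: documents are missed.
print(precision_recall(["a"], RELEVANT))  # (1.0, 0.25)

# Indexer drew the category more broadly than the user: extra documents come back.
print(precision_recall(["a", "b", "c", "d", "w", "x", "y", "z"], RELEVANT))  # (0.5, 1.0)
```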
What determines the right amount of context to use? With a more general collection, the domain will be less specific and the collection less purposively unified and the indexer must make less use of context, leaning more towards a document-centered approach to serve the searches of a broader audience. More specific collections, however, can use a more domain-oriented approach to indexing more aligned to the context that the documents exist in to better serve their specific users.
Bibliography
Hjørland, B. (1992). The Concept of “Subject” in Information Science. Journal of Documentation, 48(2), 172-200.
Hjørland, B. (1997). Chapter 2: Subject Searching and Subject Representation Data. In Information Seeking and Subject Representation: An Activity-Theoretical Approach to Information Science (pp. 11-37). Westport, CT: Greenwood.
ISO. (1985). Documentation — Methods for Examining Documents, Determining their Subjects and Selecting Indexing Terms. (No. ISO 5963-1985): International Organization for Standardization.
Recently, a bunch of evites have gone out where a mailing list got invited, with instructions for people to invite themselves to the party and then RSVP to the invite that they sent themselves. That works, but it’s complex and usually a couple of people end up RSVPing for the mailing list as a whole and other craziness ensues.
Here are some instructions on how to use evite’s suggested method for doing this: giving the event a URL of its own, and then mailing the URL to people.
1. choose the ‘Change Reminders/Include Event Link’ option. This is usually on the right side of the website’s content a ways down. Depending on your template, it will look something like this, but may vary slightly based on the graphics in the site and other things.
2. Enter the name of the party at the end of the URL. After you select that, it will cause a new area to appear further down on the screen where you can schedule reminders and insert a party name at the end of a URL.
If you put something there (like gradparty) as seen here, you’ll then be able to mail people a link to the URL for the party, and they’ll just be able to rsvp there, and they won’t have to go through the RSVP process.
3. Profit. For example, for this (non-existent) party, you can send out a message with
http://www.evite.com/app/publicUrl/evite@corprew.org/gradparty
Over at The Universe of Discourse, MjD has been doing a monthly wrapup of search terms that people have been using to find content on his site. I thought I’d inaugurate that feature here, and do it on a monthly basis. I’m doing it early this month because the next four weeks are going to be an unremitting hell as I attempt to finish grad school. Here goes:
“Lauren Classification” — this was a relatively frequent search term this month. The Lauren Classification is a method for classifying tumors. I have a blog post about classification that my friend Lauren commented on, so this is an entirely useless search result.
“verbs” and “vodka” — these were pretty common searches, and I can’t figure out what they were looking for on my site with the possibility that
“sample stemmers” — I do actually have sample stemmer implementations on my website, so it was presumably a happy day for someone.
“/tmp/spamd_full.sock” — I actually have an article about card tricks and development methodology that has the answer to the usual problem contained within, so assuming they looked through my purple prose, they found what they were looking for.
“pop culture management” — possibly a good idea, but nothing will help you with that here.
“aboutness” and “pre-coordinate” remain things that people come here a lot for. For a while, my post about “pre-coordinate” vs. “post-coordinate” was the #1-#3 entry on google on the topic, beating out more definitive answers like the spec and various professional societies in my field.
There were a bunch of other search results, but most of them were either to fiction that I’ve written (and which is contained elsewhere on this site) or they were fairly quotidian queries that people more or less got the answers they wanted for.
Someone asked me to move a post about what I thought of the IA summit onto a public forum, presumably so they could refer to what I said, but you never know. Here goes…
I was sort of struck by the way that the conference sessions were running, in that lots of people were grabbing onto particular strategies (design patterns, for example) and running with them. It struck me as both a good thing and a bad thing. Good, because it will improve the overall quality of IA; bad, because it seems to be an easy hit for toolmakers. Coming back down, for example, I figured out a way to make tools that generate a company’s commonly used design patterns in a relatively straightforward fashion given a couple of assumptions about their site architecture and database storage (but only assumptions found in LIS 540-543.)
What that means is that at some point a tools vendor will come along and make a lot of the things that we’re seeing now available as part of some suite. So, if I have to guess based on previous examples, it means that there will be a large growth in the number of IAs over the next two years, and then it will probably go back to around the number that we have now after that as things move from art to craft. I dunno, I’m guessing that’s not what I was supposed to take away from the conference. I did learn a lot about a variety of different things that I hadn’t been exposed to much, like design patterns for websites and a lot of things about tagging that I hadn’t considered.
Personally, I think that the most insightful presentation with the biggest implications for design of projects whose information organization deliverables have long expected lifecycles was the presentation by Campbell and Fast, described here: http://www.iasummit.org/2006/conferencedescrip.htm#164 I think that their work had insights into the nature of the lifecycle of websites versus the use of information organization that were pretty useful, and I’m hoping to get more details on it over time.
There are some references to iSchool specific stuff here, LIS54# classes are all databases and information retrieval systems.
I found this interesting, reminded me a lot of the book Women, Fire, and Dangerous Things by George Lakoff. Lakoff writes about prototype categories, where category membership is a fuzzy concept. Explicit rules for category membership, like ‘terrorists buy one way tickets,’ ‘pit bulls are dangerous dogs,’ and other concrete rules like that are less useful in establishing membership in categories than general rules like ‘is there something suspicious about this person?’ or ‘does this dog (or the dog’s owner, apparently more useful) have a history of violent behavior?’
The key, apparently, is that some generalizations are stable and some are unstable. Unstable generalizations are things like ‘drug smugglers buy one way tickets’ and ‘pit bulls are dangerous dogs.’ Drug smugglers can change their behavior, and it used to be other dogs that were the dangerous ones. (It turns out, according to Gladwell, that the most dangerous kind of dog is the kind that people buy to seem dangerous themselves, and this has varied from era to era.) So, the key to building a category is to figure out which are the stable generalizations and which are the unstable ones. Gladwell gives examples ranging from terrorists, to NYC subway searches, to dogs.
Anyway, I thought it made for interesting reading. I recommend it.
So, for those of you who don’t know, I’ve been working part-time at a local company to help pay my way through grad school. That’s actually a simplification of the actual truth, as I’m a part-owner of the company and I also am mostly getting benefits more than cash, but for now I’m the main system administrator on one of the main systems they run.
For the last bit, I’ve been tracking down problems in the spam checking software that we use, and it’s been a merry time. Most of the problems have been getting everything on the server to be in a single known compatible state, which is a concept I greatly commend to you if you’re running a server and don’t want to spend lots of time messing with it.
Today’s project was figuring out the source of and eliminating a bunch of error messages that get mailed out to the administrators’ mailbox every night. They’re known harmless, but it’s just aggravating and it might hide other problems.
So, I was looking through the codebase, and I found this little gem:
You might ask yourself what that does. It’s pretty easy to figure out… it counts the number of instances of processes matching ‘spamd ’ followed by ‘popuser’, which is useful for figuring out whether or not spamassassin is running on your server. It’s part of 4psa server assistant. However, this may not work depending on how your server is configured. On my server, this never works because of how ps does its output.
My main point here is that that’s a crazy way to write that code. What the person is actually trying to do is make sure that they’re only getting the main spamassassin process and not any of the child processes. The child processes display as “spamd child”, the main spamassassin process displays as something like “/usr/bin/spamd -u popuser -d -m NUMBER -x --virtual-config-dir=/MAIL/DIR/FOR/YOUR/SERVER/%d/%l --socketpath=/tmp/spamd_full.sock”. So, they’ve got to distinguish between those two lines, and they’ve decided to check for random text in the first one, and written a fairly complex little shell script (calling awk twice!) to do so. They can’t check for just the word ‘popuser’ because it might appear in the path, in case you were wondering.
This checks for all spamd processes, and just eliminates the ‘spamd child’ processes first. Why this way? If you’re trying to choose between two things, and one of them changes from system to system, and one of them is fixed and simple, you probably should try to select the fixed one.
So, here I didn’t want the fixed ones, so I eliminated (‘grep -v’) them. It saved me from having to try to pick the one I wanted. It’s generally as easy to select for elimination as it is to select for further processing in computer programs. This is also true in card tricks, incidentally. Just in case you want to do some card tricks.
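The shell snippets themselves didn’t survive in this copy of the post, so what follows is only a rough Python sketch of the select-by-elimination idea, not the original code. The function name and the sample process lines are made up for illustration; on a real server the lines would come from the output of ps.

```python
# Hypothetical sketch: count main spamd processes by throwing away the
# fixed, predictable 'spamd child' lines instead of trying to match the
# variable command line of the parent process.

def count_main_spamd(ps_lines):
    """Count lines that mention spamd but are not 'spamd child' workers."""
    count = 0
    for line in ps_lines:
        if "spamd child" in line:
            # The fixed case: eliminate it, the way 'grep -v' would.
            continue
        if "spamd" in line:
            # Whatever is left that still mentions spamd is a main process.
            count += 1
    return count


# Made-up sample of process listing lines (not real output).
SAMPLE_PS_LINES = [
    "/usr/bin/spamd -u popuser -d -m 5 -x --socketpath=/tmp/spamd_full.sock",
    "spamd child",
    "spamd child",
    "/usr/sbin/sshd -D",
]

if __name__ == "__main__":
    print(count_main_spamd(SAMPLE_PS_LINES))
```

The check that does the real work is the one for the fixed string; the second check can stay as loose as you like, because the awkward cases are already gone by the time it runs.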
The basic idea behind a lot of card tricks where you choose between two things is that the magician knows beforehand which one of the two things (s)he wants you to have. So, the magician decides whether you’re selecting an item or selecting an item for elimination at the time you make the choice, to make sure that you get the right item.
I wonder how much the folks who do massive interactive games like 4orty2wo use this tactic, and whether they’ve found good ways to disguise that it’s happening.
Since it comes up now and again, Tom Boutell has published instructions on how to host bit torrents from your home PC. This is highly useful if you want to have information downloaded from a server, but want to limit the amount of bandwidth you use. Why this is handy (and actually relevant) will be seen shortly.
Apparently, I still remember obscure things I learned about British calendars, about 14 years after college. I’m fascinated by the measurement of time, especially in periods that had a radically different notion than we do today of what time is, how it should be measured, and how it passes.
An example of this would be a medieval book of days, which described what people did in particular seasons; but also see an old British book of days, for things that happened on particular dates that people in the 1880s felt were important.
So, I upgraded to WP 2.0 today, and everything appears to be going reasonably well so far. I’ve started producing new posts offline, but content production largely will have to wait until I’ve finished writing up a bunch of papers.
I added a secondary blog full of old posts, mostly representative samples posting to usenet. This is going to be a testbed for a bunch of metadata stuff that I’m going to do over the next several weeks.
reposted from a private forum by request, we’re talking here about code4lib, but it also applies to other things like Seattle Mind Camp.
About participating in hackfests: I would advise everyone to participate in things like this as an acculturational process if nothing else. Technical conferences (and technical communication generally) work in different ways than what a lot of us see day to day, and it’s better to learn how to participate and communicate in them when the stakes are low than otherwise.
I’ve seen an LIS professor (not from here) go down in flames really hard when trying to communicate with a small group of technical people. The individual was making the point very strongly that users didn’t understand boolean search, when everyone else in the conversation was talking about something else entirely that happened to contain the words ‘user,’ ‘boolean,’ and ‘search.’
(insert statement about ‘training’ programs versus the theoretical aspects of the mlis program that the ischool uses to justify a lot of its skulduggery.)
There’s every reason that the professor could have figured out what was going on, but he/she was clearly communicating in his/her own world, without reference to the context of the people around him/her, and was condescendingly ignorant of the fact that the people around him/her actually knew what they were talking about.
One[1] of the reasons that the Roman Legions got more or less paved directly into their beautifully constructed roads in the 4th C. was that the technical superiority they’d enjoyed over the ‘barbarians’ had faded through years of communication and trade over the Danube. The recent acquisitions of Flickr and Delicious (among other things) indicate that people outside of library science realize Even More Than Before there is immense value in metadata and books like Ambient Findability and Information Architecture indicate that the understanding gap between ‘information professionals’ and ‘people who want to exploit information like it was liquid dinosaur’ is closing sharply.
One plan for dealing with this sort of thing is to go fetal and hope that nothing bad happens to you and that ‘civilization’ wins eventually. In general, gothic cathedrals are very nice and lovely, but the sack of Rome kind of sucked for the Romans. Other ideas might involve meeting people half-way, and admittedly 540 is a retarded way to start down that path, but hey, nothing’s perfect.
So, overall, I’m ranting and rambling and you know how that sort of thing goes, but my advice is to participate in the hack-fest type things.
[1] it’s over-simplistic analysis day. whoopee! hackfest librarianship technical communication
In general, I disagree with that diagram. There’s a level *below* application called ‘system,’ and it isn’t in that diagram because IBM wants to sell you systems that you write your applications on.
A relatively large number of systems suffer from lack of LIS-type knowledge at their cores, which then bubbles up through the various levels of architecture, and causes things like current OPAC implementations to be produced. Librarians (and people with librarian-type knowledge) are needed at all the different levels.
It also doesn’t discriminate between the different sorts of programmability that you might build into a system. Let’s divide programming (or ‘development’) into two fields, which we’ll call ‘Programming’ and ‘Scripting.’[1] ‘Programming’ belongs at the lowest levels of the system, and, as you might imagine, largely in the ‘Development’ phase of construction.
However, the graph is also missing a column called ‘Deployment,’ which comes after Development. This is where ‘Scripting’ comes in. ‘Scripting’ is the process of writing code to retrieve and manipulate information[2] from the system, and also changing the state of the system. In extension, it’s linking together systems made through the process of the three columns given in the article’s graphs. Why doesn’t it have this column? Because IBM would like you to pay lots of money to have all that nasty system linking done by their tools and IBM Global Services.
This nasty system linking is to a large extent easy to do on a system that makes hooks for it possible[3], and it’s a place where people who understand the information that they’re working with can add huge amounts of value to a business, institution, or whatever.
Following this logic, ‘Scripting’ is highly useful for librarians, etc… to know how to do. To the extent that it isn’t harmful to your business, keeping money and not paying it to large consulting companies is handy[4], but that’s a whole other discussion.
[1] note that the distinction between ‘programming’ and ‘scripting’ is to some extent informed by my own prejudices and may offend some people, namely most people with programming ‘degrees’ and ‘certifications’ from institution name elided and also, to some extent, students in an undergraduate major at UW. People who don’t have ideological stakes in being thought of as ‘real programmers’ usually don’t care.[1.5]
[1.5] I suggest that having an ideological stake in whether someone thinks you are a ‘real programmer’ is a stupid thing to do, even for ‘real programmers.’
[2] antelopes mostly.
[3] unlike, say, most OPACs.
[4] see how fast i change my tune if i get a job at a large consulting company.
development process sei level negative infinity antelope theory system analysis librarianship opac
The Intellectual Foundation of Information Organization is somewhat heavy going, but it’s the definitive work about a lot of areas in information organization. A lot of people encounter information organization issues professionally in the technical fields, but a lot of these issues have been around for ages, appearing in business and libraries for a long period of time.
This book has a huge amount of information in a very small amount of space, and can be somewhat heavy going. It has something to say about almost every issue having to do with organizing information. I also highly advise it for anyone in an MLIS/MSIM program; I went and looked this book up today for someone whose class wasn’t reading it for some reason. To use this book, an Information Architect or other web designer has to be able to think in basic principles about the stuff that they’re doing, looking for information about controlled vocabularies instead of whatever the latest buzzword is. However, the payoff from having done so is high due to the clarity of the information presented.
The writing in this book is in the ‘little red schoolhouse’ academic style from the University of Chicago. I found it very easily digestible and understandable, and it had a profound effect on how I thought about information organization; I credit doing very well in my classes on the subject and being able to speak intelligibly on the subject outside of class to having started out by reading this book. I recommend it highly.
So, just as a helpful hint for people doing Taxonomy type stuff, or whatever controlled vocabulary: the difference between pre-coordinate and post-coordinate terms is pretty obvious when you can actually remember it, but here’s a reminder:
Precoordinate: concepts are combined into terms before the thesaurus is created.
Postcoordinate: concepts are combined into terms after the thesaurus is created, i.e. usually at time of use.
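If it helps to see the difference as data, here’s a toy Python sketch. Every heading, term, and document number in it is invented for illustration, and real thesauri and catalogs are a great deal richer than two dictionaries.

```python
# Hypothetical toy indexes: all terms and document ids are invented.

# Precoordinate: the combined heading already exists in the vocabulary,
# so a lookup is a single exact match on the compound term.
PRECOORDINATED = {
    "children -- nutrition": {1, 3},
    "children -- education": {2},
}

# Postcoordinate: only single-concept terms are stored; the searcher
# combines them at the time of use.
POSTCOORDINATED = {
    "children": {1, 2, 3},
    "nutrition": {1, 3, 4},
    "education": {2, 5},
}


def search_precoordinated(heading):
    """Look up one ready-made compound heading."""
    return PRECOORDINATED.get(heading, set())


def search_postcoordinated(*terms):
    """Combine single terms at query time (a Boolean AND of their sets)."""
    postings = [POSTCOORDINATED.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


if __name__ == "__main__":
    print(search_precoordinated("children -- nutrition"))
    print(search_postcoordinated("children", "nutrition"))
```

In the first index somebody decided ahead of time which combinations exist; in the second the combining is left to whoever is searching.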
I know that someday someone who needs this information will find it searching the web, and that makes me (relatively) happy.
A lot of writing in general, and a lot of web pages specifically, concerns some topic. That writing is about a topic. There are a bunch of different ways of figuring out what something is about, and many of these are hilariously wrong. But that isn’t what this post is about, this post is about ‘aboutness assertions,’ which is how you say what things are about once you’ve decided that something is about something.
Kiva.org is an organization that focuses on making rural microloans. A microloan for these purposes is a smaller development loan, useful for buying, say, a couple of goats or a truck instead of a large infrastructure loan for buying schools or new highways. It’s useful on more of a personal level, and it’s a smaller need that doesn’t get served well by traditional NGOs. Kiva is a platform for these rural focused loans, with an initial focus in Uganda.
Why this is significant is that Kiva has managed to remove several levels of aggregation of loan in order to more effectively reach the people with the needs and the money, and to connect them directly. Removing layers like this is called disintermediation.
The earlier poster said that he didn’t get Tim Patrick’s talk, so I thought I’d give my own interpretation of it.
There are two different meanings of collaboration in this session. One is collaboration between different groups of people, internal or external, covered by the other speakers. Collaboration in Tim’s sense is also called…
Last week was the annual Lazerow lecture at the Information School. This year’s was given by Gary Marchionini on Human Computer Information Retrieval. It was an interesting talk, about building search engines and how people who are more involved in searching can get better results.
Leilani (the vice chair of UW ASIS&T) taped the lecture, and you can watch it here. Due to the way that the site pages are organized, you’ll have to scroll down to the lecture if you’re viewing this page in the future. It was given on 2005-10-16. The slide deck from Gary’s similar HCIR talk at MIT is online, in case you want to find out what the talk is about before jumping in.
The UW has a copy of the original in Special Collections, and an earlier facsimile edition from 1968. I’d recommend looking at the facsimile one if you have the time. I’m desirous of a facsimile copy of my own, because it’s fascinating and because I have this image of Wilkins as the ultimate erudite crank.
I’d read a bunch of people writing in the past about how ‘blogging’ a conference enriched their understanding of the material covered in the talks that they went to, so when I saw the ASIS&T conference make a call for bloggers, I signed up. During the conference, my posts will be visible here. After the conference, I’ll post here whether it changed my impressions. I really enjoyed last year’s conference, and I’m looking forward to it.
For those of you who don’t know, one of the things that I’ve been doing outside of school these years is serving on the board of what’s currently unfortunately called the PNW LLC Thingie. This board was elected by a large community to form an LLC to do a number of things, potentially including arts grants, curating art events, building community centers, overseeing events, and other things.
It has meetings every other Thursday, and has various committees on topics like naming, board representation, and stuff like that. It’s an organic outgrowth of stuff that’s been going on for the last couple of years, and I think that it will grow to be useful and powerful over time. At the moment, the wheels are grinding exceedingly fine, though.
I’m going to ASIS&T’s national conference this year. I really enjoyed it last year, and it changed my perceptions of the field a great deal — both what I was interested in and what I wanted to get out of school. I’m hoping that this conference is as interesting.
It has an upcoming reference here, for those of you who are fond of upcoming. If you’re going, feel free to drop me a line.
This document is from the Renascence Texts collection at UOregon. It’s useful for a project that I’m working on (and hope to have time to get back to at some point in the future.) OTOH, imagine trying to retrieve this document through an information retrieval system.
By the Queene.
A Proclamation agaynst the maintenaunce of Pirates.
THE Queenes Maiestie vnderstandeth, that although by her former commaundementes notified by proclamation to all her subiectes, and namely to her officers in her Portes, for the staying, ceassing, and suppressing of all occasions of piracies: yet some numbers of vessels armed with certayne disordered persons mixt of sundry nations, do still haunt the narowe seas, and resort secretly into small Creekes and obscure places of this Realme for reliefe of vitayles, and suche lyke: And for their better defence to escape apprehension, do colourably pretende that they be licenced to serue on the seas, and are not to be accompted culpable as pirates.
I’ve added, with some reluctance, tagging to this journal. To commemorate this fact, I somewhat less than helpfully will tag this post with the tag ‘tagging.’
Today, we were talking for a while about why Capitalization might be important in an information retrieval system. One reason is that for different words that are spelled the same (or the same word that has different definitions, depending on your dictionary), there can be different meanings depending on the Caps, and you might want to factor that in.
Here are some examples:
china: In capitalized form, it refers to a country; in lowercase form, it refers to a form of porcelain or dishware made thereof.
aids: Aids are helpful, AIDS certainly is not.
ira: Ira is a male first name (as in Ira Glass), IRA is the Irish Republican Army.
it: ‘it’ is a pronoun in English and IT is an abbreviation for Information Technology.
lox: ‘Lox’ is smoked salmon, and LOX is liquid oxygen. Only one of these is at all good for bagels.
dos: DOS is a primordial operating system, and ‘dos’ is Spanish for two.
This is complicated somewhat by the fact that words are capitalized at the beginning of sentences, obviously. The customary solution is to search against lowercase but to display to the user the results as they exist in the original document.
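As a rough illustration of that customary solution, here’s a minimal Python sketch: the index is built from lowercased words, but what comes back from a search is the untouched original text. The sample documents and the function names are made up, and the tokenizer is about as naive as a tokenizer gets.

```python
# Hypothetical sketch: match on lowercased tokens, display the originals.

DOCUMENTS = [
    "China exports a great deal of fine china.",
    "LOX is liquid oxygen.",
    "Lox goes well on a bagel.",
]


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    return [word.strip(".,;:!?'\"") for word in text.split()]


def build_index(documents):
    """Map each lowercased token to the positions of documents containing it."""
    index = {}
    for position, text in enumerate(documents):
        for token in tokenize(text):
            if token:
                index.setdefault(token.lower(), set()).add(position)
    return index


def search(index, documents, query):
    """Fold the query to lowercase, then return the original documents."""
    hits = sorted(index.get(query.lower(), set()))
    return [documents[i] for i in hits]


INDEX = build_index(DOCUMENTS)

if __name__ == "__main__":
    for result in search(INDEX, DOCUMENTS, "lox"):
        print(result)
```

A search for ‘lox’ brings back both the bagel topping and the rocket fuel, which is exactly the information that folding throws away. If the distinction matters, one option (an assumption on my part, not something we covered) would be to keep the original-case token next to the folded one and use it to rank or filter results.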
A couple of months ago, I wrote some sample stemmers for a class I was taking in the iSchool. I’ve put some of them up on the website.
The stemmers on this site are the Porter and the Lovins stemmers, both implemented in PHP. The Porter stemmer was downloaded from one of the several sites on the internet that have it; the Lovins stemmer was converted to PHP from the Java version available at SourceForge. Copies of the source are available on request.
The stemmers are available here. There’s also a call into the php implementation of the soundex algo that I added to demonstrate some points at some point.
“hors d’oeuvres funroll-loops” is my new favorite meaningless expression. It sounds completely meaningless, but came up today with regards to a technical problem I’m working on. It’s just ludicrously fun to say, although probably not as funny as ‘boss’ or ‘grody.’
Also, on the subject of hilarity, check out these references to auderves in google.
Surprisingly, -funroll-loops isn’t the funniest sounding of the gcc options; -malign-double is. Just in case you’re having deus ex machina problems and need to specify your program has an evil twin.
The Wilkins rampage continues with these modifiers for English’s modal auxiliary verbs. These would be attached to a verb (built through Wilkins’ system). Notice how the closely related verbs are similar in shape, with just the tip of the tail changed, and how the overall collection of verbs are rotations of a single character.
For more detail on modal verbs, go here. It may also amuse you to read RFC 2119: Key Words for Use to Indicate Requirement Levels.
Part of Wilkins’ Essay towards a Real Character… was a numerical system organized along the principles of his classification system. Note that the basic character in all these columns is made up of variances on Wilkins’ glyph representing category ‘Measure’, a subtype of ‘Quantity.’
Here’s a picture of the top-level categories back for your amusements, measure is in the middle category about halfway down.
Also note that the power of 10 for the particular number being represented is given as a modifier in the lower right corner of the number.
i was cleaning out the old corprew.org blog today, and i figured adding this post here would be amusing, i think a lot of it is still truish.
For the last while, I’ve been astounded that everyone who sets out to create an open source content management system either creates a blogging tool or a slashdot clone. It is really strange, and I don’t know why those are seen as replacements for general-purpose CMSs.
Anyway, I found this discussion of same over at kalsey.com interesting, as it has a more or less current analysis of the issues at hand.
At some point, I hope to go to the information school at the UDub, and I will not contribute to open source content management for obvious reasons, but it is interesting to see where these systems go. My own assumptions about what a content management system is supposed to do are shaped by several years of work and thinking, but it is definitely thinking about different things.