There are quite a few regulations to bear in mind and comply with when developing a website for a Government organisation or any large public body. This has led to a lot of sites being developed in a very defensive manner, ensuring safe compliance at the expense of a great and consistent user experience.
This video features a presentation about how large, publicly accountable institutions can and should embrace the latest web technologies without sacrificing standards; by embracing them, in fact.
The content of this was developed while planning and building the National Museums Scotland website which launched last November. The messages presented are applicable to museums, galleries, swimming pools, councils, anywhere, really.
If you're a techie person in the cultural or government sector, you might find this useful in convincing others to take advantage of the latest cool (and useful) technologies.
The source for the slides is available online, although it's mostly WebKit-friendly. I realise the irony of a presentation about cross-platform HTML5 not being a great example of that itself, but it does degrade adequately. If I get the time in the future, I'll tidy it up. A genuinely good (and relevant) example of cross-platform web technologies is the National Museums Scotland website itself, which performs fantastically across all manner of platforms.
Twitter has an amazing amount of data and there's no end to the number of ideas a coffee-fuelled mind can come up with to take advantage of that data. One of the greatest advantages of Twitter – its immediacy, its currency, its of-the-moment-ness – is, however, also one of its drawbacks. It is very easy to find out what people are thinking about now, yesterday or a few days ago, but trying to find something from more than a week ago is surprisingly tricky. The standard, unauthenticated search only returns the last 7 days of results; beyond that, there's nothing. You can get around this by building custom applications which authenticate themselves correctly with Twitter and provide a level of traceability, but that's quite tricky.
The easiest way to archive search results for your own personal use is actually surprisingly simple. Every search results page includes an RSS feed link. This will be updated every time there are new results. Simply subscribe to this feed with a feed reader (Google Reader is simple and free) and your results will be archived permanently. This is great if you're searching for personal interest stuff but doesn't work so well if you want to share the results with the public.
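As a concrete illustration, the feed URL for a given search can be built by URL-encoding the query. Note that the search.twitter.com address below reflects the unauthenticated search API as it was at the time, so treat the exact host and path as an assumption:

```javascript
// Build the Atom feed URL for a Twitter search results page.
// NOTE: the search.twitter.com host and search.atom path are an
// assumption based on the unauthenticated search API of the time,
// not a guaranteed current endpoint.
function searchFeedUrl(query) {
  return 'http://search.twitter.com/search.atom?q=' +
    encodeURIComponent(query);
}

searchFeedUrl('#MusMem');
// → 'http://search.twitter.com/search.atom?q=%23MusMem'
```

Subscribing a feed reader to the resulting URL is all the "archiving" needed for personal use.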
This was the problem I was presented with when I was asked to build a tweet-archiving mechanism for Museum140's Museum Memories (#MusMem) project. Jenni wanted some kind of system that would grab search results, keep them permanently and display them nicely. I didn't want to create a fully OAuth-enabled, authenticated search system simply because that seemed like more work than should really be necessary for such a simple job. Instead, I went down the RSS route, grabbing the results every 10 minutes and storing them in a database using RSSIngest by Daniel Iversen. The system stores the unique ID of each tweet along with the time it was tweeted and the search term used to find it. The first time a tweet is displayed, a request is made to the Twitter interface to ask for all the details, not only of the tweet, but also of the user who tweeted it. These are then stored in the database as well. This way, we don't make too many calls to Twitter and we don't get blocked.
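The "fetch details once, then serve from our own store" pattern can be sketched like this. This is a simplified JavaScript illustration of the logic, not the actual PHP implementation; fetchDetails stands in for the call made to Twitter:

```javascript
// Illustrative sketch of the detail-caching logic described above.
// fetchDetails is a stand-in for the request to Twitter's API that
// returns the full tweet and user details for a given tweet ID.
function makeTweetStore(fetchDetails) {
  var cache = {};
  return function getTweet(id) {
    if (!cache[id]) {
      // First display: ask Twitter for the full details...
      cache[id] = fetchDetails(id);
    }
    // ...every later display is served from our own store, keeping
    // the number of calls to Twitter low enough to avoid being blocked.
    return cache[id];
  };
}
```

Wiring it up with a stubbed fetch shows that repeated displays of the same tweet cost only one API call.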
If you want your own Tweet Archive, I've put the code on GitHub for anyone to use. It requires PHP, MySQL and, ideally, a techie-type to set it up.
Archiving Tweets - Non-technical
With the technical side out of the way, we're left with the human issues to deal with. If you're automatically saving all results with a particular phrase, all a malicious person needs to do is include that phrase in a spam-filled tweet or one with inappropriate language and suddenly, it's on your site, too. If you aren't going to individually approve each tweet before it is saved, you must keep a vigilant eye on it.
The other thing which has turned out to be a problem is the Signal-to-Noise ratio. When we initially decided on the hashtag #MusMem, nobody else was using it. To use Social Media parlance, there was no hashtag collision. The idea was to encourage people to use it when they wanted their memories stored in the MemoryBank. Unfortunately, it is now being used by anyone tweeting anything related to Museums and Memories. This is particularly troublesome at the moment as this month is International Museum month, one of the themes of which is ‘Memory’ (which is why we built the MemoryBank in the first place). This means that the permanent memories we want stored (the Signal) are getting lost in the morass of generic Museum Memories (the Noise). There is no way to solve this problem algorithmically. If we are to thin it down, we actually need to manually edit the several thousand entries stored.
If anyone can think of a solution to this issue, please let everybody know – the world needs you.
I'm currently working on a tool which uses JS to parse an XML document and output it as JSON. Straightforward enough, you'd think. The issue I'm fighting against crops up when the XML tags have an arbitrary namespace. True enough, some of the files I'll be processing have this namespace defined at the top of the document but the tool has to be robust enough to cope when they don't.
To cut a long story short, IE6, IE7 and IE8 have an interesting attitude to innerHTML when the tags you are trying to insert have a namespace. IE9 seems to do the job as you'd expect. I've created some jsFiddle pages which try and identify the issues. They both use QUnit and the test code is included below.
I started off using jQuery to help identify this, as the tool uses jQuery for the XML heavy-lifting. The two tests in this demo create elements in two different ways. First, we create the element using document.createElement and grab the nodeName; then we use jQuery's constructor and use get(0) to grab the bare DOM element's nodeName. Also, in this first set of tests, we're creating non-standard elements.
    test("Compare elements without namespace", function() {
        var element1, element2;
        element1 = document.createElement('spud').nodeName;
        element2 = $('<spud/>').get(0).nodeName;
        equals(element1, element2, "We expect these to match");
    });
The code above runs fine everywhere – IE, Firefox, Opera, Chrome and so on. Good.
    test("Compare elements with namespace", function() {
        var element1, element2;
        element1 = document.createElement('a:spud').nodeName;
        element2 = $('<a:spud/>').get(0).nodeName;
        equals(element1, element2, "We expect these to match");
    });
This runs fine in non-IE browsers; they all report the nodeName as 'a:spud'. IE, however, reports the nodeName as 'spud'. Ah. I dug through the jQuery source, tracking down the bare roots of the constructor, and eventually figured out that just looking at the element itself isn't going to provide any clues. The bit that does the actual string-to-elements work (somewhere around line 5619 in jQuery 1.5.2) creates a container div then injects the (slightly modified) code as innerHTML. The issue must be in IE's interpretation of innerHTML, I thought to myself. And then to you, by writing it here.
.innerHTML aside
or ‘jQuery is clever’
Before we continue with this long and, ultimately, unnecessary investigation into namespaces, I have to take a small diversion to cover some smart stuff jQuery does. One thing in particular, in fact. Around that line I mentioned earlier (5619-ish), an extra bit of text is inserted into the innerHTML to cope with IE's oddity. If you try to create a non-standard element using innerHTML, IE will not complain but will also do pretty much nothing at all:
    var div = document.createElement('div');
    div.innerHTML = '<spud></spud>';
    alert(div.innerHTML);
The above code will alert '<spud></spud>' in most browsers but '' in IE. What jQuery does is firstly wrap your element in an extra <div></div> (producing '<DIV></DIV>') then prepends the word 'div' to that. The innerHTML reported by IE is now 'div<DIV><SPUD></SPUD></DIV>'! There it is! Next, the extra gubbins is removed by calling .lastChild and you're left with innerHTML = '<SPUD></SPUD>'. That's pretty darned clever.
Back on track. Armed with this little trick, we can reliably test innerHTML in IE using non-standard elements.
    module("Known elements (span)");

    test("Compare elements without namespace", function() {
        var div1, div2;
        div1 = document.createElement('div');
        div1.innerHTML = '<span></span>';
        div2 = document.createElement('div');
        div2.appendChild(document.createElement('span'));
        equals(div1.innerHTML.toLowerCase(), div2.innerHTML.toLowerCase(),
            "We expect these to match");
    });

    test("Compare elements with namespace", function() {
        var div1, div2;
        div1 = document.createElement('div');
        div1.innerHTML = '<u:span></u:span>';
        div2 = document.createElement('div');
        div2.appendChild(document.createElement('u:span'));
        equals(div1.innerHTML.toLowerCase(), div2.innerHTML.toLowerCase(),
            "We expect these to match");
    });
The first test in this pair runs fine everywhere exactly as we'd hope and expect. The second fails miserably in IE. Let us quickly run the same test with unknown elements just to make sure we're identifying the right problem:
    module("Unknown elements (spud)");

    test("Compare elements without namespace", function() {
        var div1, div2;
        div1 = document.createElement('div');
        div1.innerHTML = 'div<div>' + '<spud></spud>' + '</div>';
        div1 = div1.lastChild;
        div2 = document.createElement('div');
        div2.appendChild(document.createElement('spud'));
        equals(div1.innerHTML.toLowerCase(), div2.innerHTML.toLowerCase(),
            "We expect these to match");
    });

    test("Compare elements with namespace", function() {
        var div1, div2;
        div1 = document.createElement('div');
        div1.innerHTML = 'div<div>' + '<u:spud></u:spud>' + '</div>';
        div1 = div1.lastChild;
        div2 = document.createElement('div');
        div2.appendChild(document.createElement('u:spud'));
        equals(div1.innerHTML.toLowerCase(), div2.innerHTML.toLowerCase(),
            "We expect these to match");
    });
As before, the first test in this pair works fine, the second fails. Cool. Or not. Either way, you can now see that it doesn't really matter whether the elements are standard or custom and that little diversion we took earlier really was unnecessary. Still, you know more now about some of the cleverness in jQuery than you did before.
It turns out the reason IE reports the nodeNames as the non-namespaced ones is because it has been busy behind the scenes and added an extra XML namespace prefix into our current context. The innerHTML of the div we filled up using innerHTML has been modified to:
<?xml:namespace prefix = u />
<u:span></u:span>
Where'd that namespace declaration come from?! Goshdarnit, IE. From its point of view, within that little context, u:span is equivalent to span.
The last line there is true for all browsers except IE 6, 7 and 8!
In conclusion?
Ultimately, there are no winners here. Identifying the problem is quite different from fixing it. I've added a note to the relevant jQuery bug in the tracker, but it's not so much a bug in jQuery as a humorous IE quirk. There's some talk of refactoring the .find() method to handle more complicated tag names, so this might get picked up then. The fix will probably be something along the lines of checking that the resulting innerHTML doesn't contain an unexpected namespace declaration when the selector has a colon in it:
div.replace( /<\?[^>]*>/g, '' )
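As a standalone sketch, the idea is simply to strip IE's auto-inserted declaration from the markup string before doing anything else with it (stripNamespaceDecls is a hypothetical helper name, not anything in jQuery):

```javascript
// Hypothetical helper: remove IE's auto-inserted XML namespace
// declarations (e.g. '<?xml:namespace prefix = u />') from a
// fragment of innerHTML before it is processed further.
function stripNamespaceDecls(html) {
  return html.replace(/<\?[^>]*>/g, '');
}

stripNamespaceDecls('<?xml:namespace prefix = u /><u:span></u:span>');
// → '<u:span></u:span>'
```

Markup without any such declaration passes through untouched, so the helper is safe to apply unconditionally.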
I'd submit the patch myself but I'm having difficulty getting unmodified jQuery to build on any of my machines without failing most of the QUnit tests. I've probably typed something wrong.
After seeing this collection of the 892 different ways you can partition a 3 x 4 grid, I was struck by a thought. If these were generated as HTML templates, they could be combined with a couple of other useful websites and become a nice, API-driven site builder.
The process
On the site-building webpage, you'd enter a few keywords describing the site you want and drag a slider between 1 and 12 to specify how many content areas you want. The value from the slider would be used to pick a template at random from those available for that number of panels.