geek – thingsinjars

6 Oct 2011

Text Editors

Many years ago during the text-editor holy wars, I sided neither with the Vi Ninjas nor the Emacs Samurai but instead went a third way – Pico (short for Pine Composer). It was the text editing part of the e-mail program I used (Pine). For many years, this served me well. Even today, Pico's successor - Nano – is installed pretty much everywhere. It isn't , however, quite powerful enough for fast-paced web development. Serious, full-time development needs shortcuts and macros, syntax highlighting and snippets. When you spend 10 or more hours every day pressing buttons to change flashing lights, you need to optimise the way the lights flash.

After Pico, I found Crimson Editor which served me well for almost 10 years. I eventually started working on a Mac and became a TextMate user for most of the last 5 years.

In my new job, I find myself jumping from computer to computer to desktop to server quite a lot. The only constant editor available is Vi. Or Vim (Vi Improved). I've been trying to pick it up as a way to ensure I can always jump into a friendly text editor no matter where I am. Besides, these days Vim is the old-new-cool thing that all the cool-old-kids use, particularly MacVim so I thought it was worth giving it a go to see what the fuss was about.

Incidentally, the first thing that always jumped into my head whenever anybody mentioned Vi was the Vim Golf challenge two of my lecturers played while I was at Uni. It's a well-known and surprisingly popular game where two Seventh-dan Vi proficients try to accomplish an edit on a file using the fewest keystrokes. Each morning, one lecturer would lay down a challenge and the other had 24 hours to come up with a solution before presenting it and the next challenge the following day. The (possibly apocryphal) conclusion to the long-running game came when it was discovered that the lecturer who won most often was actually recompiling the network install of Vim every night to include his own custom keyboard shortcuts.

One of the biggest deciding factors in trying it out was actually fellow Edinburgh Web Dev/International Traveller Drew Neil (@nelstrom), creator and voice of the vimcasts.org series of screencasts who is actually writing a book on Vim this very second as I write this. Most people are evangelical about their choice of text editor to the point of rabid fundamentalist, frothing-at-the-mouth, intoning-keyboard-shortcuts craziness (hence my allusions to text-editor holy wars). When I mentioned to Drew that I used Pico instead, his response was long the lines of "Fair enough". This lack of confrontation actually inspired me to try it out. Well played, sir.

In an unrelated aside, I once did an illustration for the 'Collective Nouns Illustrated' project Drew ran.

Anyway, I'll give it a go and see what happens. If you're interested, I recommend reading Yehuda Katz' post 'Everyone Who Tried to Convince Me to use Vim was Wrong'. Don't worry, I'm definitely not going to try and convince anyone to use one code editor over another. You should probably stop using FrontPage, though.

Geek, Development
3 Oct 2011

Whitehat Syndication
When I refer to Google or Googlebot here, I could really be talking about any web search or web crawler. When I refer to Yahoo, I actually mean Yahoo. Don't get confused.

I recently ran into an interesting SEO problem on a project which has led to a question I just don't know the answer to:

How do you display syndicated content without triggering Google's duplicate content flag?

Hmm... intriguing.

Background

To explain the problem more fully (without giving out any project specifics), imagine you have a website. You probably already do so that shouldn't be hard. Now, imagine you fill that website full of original content. Again, that shouldn't be hard. For the sake of example, let's assume you run a news blog where you comment on the important stories of the day.

Next, you figure that your readers also want to read important related facts about the news story. Associated Press (AP) syndicates its content and through the API, you can pull in independently-checked related facts about whatever your original content deals with. So far, so good.

Unfortunately, a thousand other news blogs also pull in the same AP content alongside their original (and some not-so-original) content. Now, when the Googlebot crawls your site, it finds the same content there as it does in a thousand other places. Suddenly, you're marked with a 'duplicate content' black flag and all the lovely google juice you got from writing original articles has been taken away. Boo.

Your first thought might be to reach for the rel="canonical" attribute but that really only applies to entire pages. We need something that only affects a portion of the page.

Solution

What you need to do is find a way to include the content in your page when a visitor views it (providing extra value for readers) but prevent Google from reading it (hurting your search ranking). Fortunately, there are some methods for doing this. One involves having the content in an external JS file which is listed in your robots.txt to prevent Google from reading it. Another similar method involves having the content in an external HTML and including it as an iframe, again, preventing crawling via robots.txt. When the reader visits the page, the content is there, when Google visits, it isn't.

The Problem with the Solution

Both of the techniques mentioned here involve an extra HTTP request. You are including an external file so the visitor's browser has to go to your server, grab the file and include it. This isn't a huge problem for most sites but when you're dealing with high-traffic, highly-optimised websites, every file transferred counts. You go to all the trouble of turning all background images into sprites, why waste extra unnecessary connections on content?

There is an extra problem with the JS solution in that the content isn't available to visitors that don't have JS enabled but as I've mentioned before, that's only a problem if any of your visitors actually have JS disabled. Only you know that for sure.

Yahoo's Solution

Yahoo have a nice solution to this problem. If you include the attribute class="robots-nocontent" on any element, the Yahoo spider (called 'slurp') will ignore the content. Perfect. This does, however, only work for Yahoo. Not perfect.

My solution

My attempt at solving this problem which is a combination of SEO and high front-end performance was inspired by the technique GMail uses to deliver JS to mobile devices. In their article, Google delivers JS that they don't want run immediately in the initial payload. They figure that the cost of serving a single slightly larger HTTP request is less than the delay in retrieving data on demand.

I use HTML embedded in a JS comment in the original page which is then processed on DOMReady to strip out the comments and inject it into wherever it is supposed to be (identified by the data-destination attribute). I'm doing this on a page which already loads jQuery so this can all be accomplished with a simple bit of code.
```
<script type="text/html" class="norobot" data-destination=".content-destination">
 /*!
  <p>This content is hidden on load but displayed afterwards using javascript.</p>
 */
</script>
<div class="content-destination"></div>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js"></script>
<script>
 $('.norobot').each(function() {
  _this = $(this);
  $(_this.data('destination')).html(_this.html().replace('/*!','').replace('*/',''));
 });
</script>
```
Notes on the code

You may have noticed the type="text/html" attribute on the script block. That's to get around the fact that jQuery parses and executes any script blocks it finds when moving elements around (in an appendTo() or an html(), for example. Adding this attribute tells jQuery to process this as template code.

Also, the opening JS comment here begins /*!. The exclamation mark in this is a directive to any minifiers you might use on the code to tell them not to remove this comment block.

This is also available in a standalone page.

This is all a very long setup for my initial question. Does Google read this and, if so, does this affect duplicate content rankings?

Plus and Minus
- Minus: The duplicate content is definitely in the page.
- Plus: It's hidden in JavaScript
- Minus: we're using JavaScript to serve different content to users and to google.
- Plus: we're showing less to google than users. Spam techniques show more to increase keyword matches.
- Plus: faster response due to a single http request (Google likes fast pages)
Obviously, we could add an extra step of obfuscating the 'hidden' content by reversing it or encoding it. This would definitely hide it from google and it would be trivial to undo the process before showing it to the user but is this step necessary? Part of my reasoning for concluding that Google ignores JS comments is that thousands of sites include the same standard licences bundled with their JS library of choice and don't get penalised. This may, of course, be specific to licences, though.

I can find no definitive answer anywhere on this subject. If you have any good references, please let me know. Alternatively, if you happen to know Matt Cutts, ask him for me. If I get any conclusive answer, I'll update here.

Development, Geek
9 Sep 2011

Testing

The web is a visual medium. Well, mostly.

There's no better way to test a visual medium than by looking at it. Look at your site in as many browsers as you can. If you've already got as many browsers installed on your development computer as you can fit, get another computer and install some more. Either that or run a Virtual Machine.

Definition: Virtual Machine (VM)

Applications like VirtualBox, VMWare or Parallels allow you to run an entire computer within your computer. It is a self-contained system that doesn't interact with your own machine meaning you can have IE6 installed on one VM, IE7 on another and IE8 on a third. All running in a window on your iMac. Shiny.

If you can't do that easily, you could use one of the growing number of browser testing services. These are server rooms packed with computers running Virtual Machines and automated systems to which you supply a URL, wait a few moments and get shown an image (or several hundred images) showing your URL in different browsers on different platforms. Some of the more sophisticated services allow you to scroll down a long page or specify different actions, text entry or mouse events you want to see triggered. These services can be exceptionally useful when it comes to developing HTML e-mails as there are some rare and esoteric e-mail clients out in the wild. Litmus does an excellent job at virtualised testing for HTML e-mails. On that note, the Campaign Monitor library of free HTML e-mail templates is a great place to start, learn and possibly finish when working on an HTML e-mail.

There is also a place for automated testing for some things. Recently, there has been a bit of a movement away from validating code as the purpose of web development is not to make it 'check a box' on a merely technical level, it is to get the message across via the design however possible. However, validation is still the best and easiest way to check your syntax. Syntax errors are still the main cause for mistakes appearing in your web sites and are the easiest thing to fix. Don't assume IE is wrong. Again, if you're keen on HTML e-mails, here's a great post on the Litmus blog.

This article is modified from a chapter in a book Andrew and I were writing a couple of years ago about web development practical advice. Seeing as we both got too busy to finish it, I'm publishing bits here and there. If you'd like to see these in book form, let me know.

Geek, Development, Guides
29 Aug 2011

Don’t be seduced by the drag-and-drop side

You don’t have to be a survivor of the vi and Emacs holy wars to appreciate the beauty of fully hand-crafted code. There was a bit of attention a couple of weeks ago on the soon-to-be-launched Adobe Muse which lets you “Design and publish HTML websites without writing code”. If you want to be a kick-ass developer, you must realise that tools like this aren't designed for you. They're designed for people who want to do what you can do but either don't have the time or the inclination to learn how. Although drag 'n' drop application do lower the barrier to entry for creating a website, there is still a need for web developers to know exactly what's going on in their pages.

I'm not saying that there will never be a time when visual design can be automatically translated into as good a product as a hand-crafted site, I'm just saying it’s not yet.

In much the same way as with JavaScript (See “You must be able to read before you get a library card”), building your HTML using that extra level of abstraction might work for almost every situation but will eventually leave you stuck somewhere you don’t want to be. By all means, pimp up your text editor with all manner of handy tools, shortcuts and code snippets but make sure you still know exactly what each bit of mark up means and does. If you structure your code well (more on that in a later post), your natural feel for the code will be as good a validator as anything automated (by which I mean prone to errors and worth double-checking).

Learn the keyboard shortcuts. If you learn nothing else from this, learn the importance of keyboard shortcuts. You might start off thinking you'll never need a key combination to simply swap two characters around but one day, you'll find yourself in the middle of a functino reaching for ctrl-T.

Also, there is no easy way to tell if a text editor is fit for you until you have tried it, looking at screenshots won’t work. You don't need to build an entire project to figure out whether or not you're going to get on with a new text editor, just put together a couple of web pages, maybe write a jQuery plugin. Do the key combinations stick in your head or are you constantly looking up the same ones again and again? Do you have to contort your hand into an unnatural and uncomfortable claw in order to duplicate a line?

The final thing to cover about text editors is that it's okay to pay for them. Sure, we all love free software. “Free as in pears and free as in peaches” (or whatever that motto is) but there are times when a good, well-written piece of software will cost money. And that's okay. You're a web developer. You are going to be spending the vast proportion of your day using this piece of software. If the people that made it would like you to spend $20 on it, don't instantly balk at the idea. Think back to the idea of web developers as artisan craftsmen. You're going to be using this chisel every day to carve out patterns in stone. Every now and then, you might need to buy your own chisel.

This article is modified from a chapter in a book Andrew and I were writing a couple of years ago about web development practical advice. Seeing as we both got too busy to finish it, I'm publishing bits here and there. If you'd like to see these in book form, let me know.

Geek, Opinion, Guides

Background

Solution

The Problem with the Solution

Yahoo's Solution

My solution

Notes on the code

Plus and Minus

The web is a visual medium. Well, mostly.

Categories

Shop

I'm currently reading

Projects

@thingsinjars.com