Googling Your Brain: How (and why) to build your own everything database

Nov 1, 2011

Do you have the soul of a librarian? If not, go away. Bugger off. Do not read this. My intended audience probably consists of no more than a few dozen people globally. This is a dorky, hard-core, altruistic, labour-of-love guide to modern filing philosophy and tactics — seriously — for people with data-o-centric careers like writing, management, and software development. (Or maybe you’re just a data pack rat.)

I am a writer, and I don’t have to-do lists so much as (vast) lists of ideas and reference material that I need to “deal with.” And checking something off doesn’t mean I’m done with it — most information is recyclable. (The GTD faithful will recognize a problem there.) My productivity challenge is not “getting things done” but “filing things well.” I need a kickass filing system that I can put nearly anything into almost effortlessly, and then get it back later when I need it — even if I forgot I ever filed it in the first place.

I call it my “everything database” (ED) — a collection of all your brainy assets, mostly abstract or digitized, but not necessarily. PDFs and quotes, napkin scribbles and shower brainstorms, URLs and photos. “Asset” is the only word broad (and vague) enough to encompass all the possibilities: anything that has potential relevance to any project, ever.

The superpower bestowed by a good ED is like being able to Google your own brain. Suppose I could wave my magic database wand and capture every asset in my life, beautifully and thoroughly labelled with keywords that tell me what each item means to me. Such a system could respond to the command, “Show me every amusing item I’ve collected about placebo effects since 2006.” It would cough up a list of placebo-themed media, brainstorms, quotes, and scientific papers — much like Google search results, but (crucially) also including my own notes and assets and resources that I made a point of saving.

Wouldn’t that be something!

It is. I’ve had a pretty decent ED for over a year now, and it’s great. I now search my ED almost as often as I search the web, and the writing is on the wall: eventually it will take the lead.

Go digital or go home: the necessity of digitization

If I didn’t scare you off in the first paragraph, here’s another warning: if the subject of ebooks makes you squirm and say things like “I just really like the feel and smell of paper,” then you have just stepped into the wrong dark alley and you’re about to get conceptually mugged. Your kind aren’t welcome here. Unless you’re prepared to convert.

This probably doesn’t need to be said, but I’m going to say it anyway for the benefit of readers who haven’t fully embraced life as a cyborg: building an ED is fundamentally a digital, virtual endeavour. Everything that can be digitized — with a reasonably good ratio of effort to reward — should be digitized. For instance, for the purposes of an ED, ebooks are fundamentally superior to paper books — not aesthetically, but because the ED must be searchable. While physical assets can be included (and are unavoidable for some people, e.g. anyone working with original and historical primary sources), they are definitely more awkward to manage.

iDatabase? The role of Mac

A Mac is certainly not required for an ED: this is a discussion of principles, not platform-o-centric implementation details. Nevertheless, this is my article, and I built my ED on a Mac, and I’m a Mac user from way before there were iThings. There are going to be some Mac-ish moments ahead. Starting with the tools — because the concepts are so entangled with the tools.

Notably, the most important software tool (Evernote) is quite deliberately cross-platform.

Key tools for making your Mac into an extension of your brain

To build an ED, there are some tools you’re going to need, mostly software tools. You need a powerful digital scrapbook, which will be the main or most general database of assets: the dumping ground for most notes and lots of stuff that doesn’t fit anywhere else. You’ll also want some kind of task manager, something for handling metadata (keywords, tags, ratings) in your computer’s filing system, some tools to make it more efficient to file and retrieve, and almost certainly a few specialized databases for whatever special data types you have to deal with. Finally, and humbly, you will need some cheatsheets to help you remember how to use all this crap.

Evernote (Evernote Corporation) is the closest thing there is to an actual “ED application.” With client software for every imaginable platform plus cloud storage, Evernote makes it possible to grab and get most kinds of assets from nearly anywhere. Crucially, what it can’t actually store, it can store a reference to (i.e. you describe and link to an external file). Evernote is the flagship of my ED software fleet. There are certainly other applications and services in its class, but for various reasons (too detailed even for this article) Evernote is the clear winner for me, and a whole lot of other people — the company is thriving, as I expected it would.
Leap (Ironic Software) is the app I use to assign meaning (tags) to files that I cannot store in Evernote, such as a spreadsheet: a work in progress that I tinker with several times a week. That file has to be a file (it cannot be stored in Evernote, because Evernote is not a spreadsheet application). But data that has to live in my computer’s maze of folders needs to be part of the same overall organizational scheme as data that lives in my main database, Evernote. Leap uses a clever, robust hack (see OpenMeta) to assign tags to files, and it also has a powerful interface for exploring your filing system by using that metadata.
Things (Cultured Code) is where I store my to-dos. To-dos (tasks, actions, jobs) are not reference information — but it’s vital to constantly distinguish between the two, and have a place to put the to-dos. I carefully selected Things as an ideal task manager to integrate with a life dominated by filing and management of an ED. Crucially, Things supports tagging, so you can use the same tags in Things that you do in your ED (and that consistency matters). Things only handles about 10% of the flood of stuff coming at me every day — but it’s a vital 10%, so it’s all the more important that it be handled by a good tool, and Things is great for information wranglers with minimalistic task management needs. As Dan Frakes wrote, Things is in “the near-ideal middle ground between simple to-do lists and overly complex project organizers” (like OmniFocus, which I abandoned because, for all its bewildering complexity, it did not support tags, and so I couldn’t make it play nicely with my ED).
LaunchBar (Objective Development) is a miraculous little app, fiendishly difficult to describe, and a perfect companion for the serious filer and keyboardist. It is primarily a lubricant between systems, enabling ultra-efficient capture and retrieval. LB indexes a wide variety of resources on your system and gives lightning fast access to them via the keyboard, no matter where you are. I particularly use LaunchBar to facilitate rapid access to some resources, and to activate capture/search scripts that enable very low-friction filing and retrieval. So, for instance, any selected text can be grabbed by LaunchBar and then passed to a script that plonks it into the specialized database of my choice … all without changing apps. Such awesome efficiencies are critical to the ED — if it’s hard to get information in and out of the ED, you won’t use the ED. All databases have their own idiosyncratic methods of getting stuff into them, including universal shortcuts. But the last thing I need is a dozen conceptually similar-yet-different universal shortcuts. So I created a suite of automation scripts — and all of those scripts can be activated the same way via LaunchBar. That consistency is extremely valuable, and it really starts to show its power when you start creating scripts with slightly different end goals. With LaunchBar, it is trivial to choose from several similar scripts; but trying to remember shortcuts for six very similar scripts would be nearly impossible.
Special email addresses, an idea I got from Shawn Blanc, are another valuable capture aid. LaunchBar and scripts are great … if I’m at my computer. But I now routinely consume information on my iPhone and iPad in a variety of apps, several of which do not integrate natively with Evernote. However, emailing whatever you’re looking at is an option in almost every app. So I have three dedicated, distinctive email addresses where I can send different kinds of information. Once it arrives in my email inbox, rules and scripts pass the data on to the appropriate software: e.g. messages sent to “tttask@” are automatically filed in Things, and messages sent to “nnnote@” are filed in Evernote. (And, of course, there’s the “paper” trail of the messages themselves, which is handy if the automatic filing ever fails.)
Evernote can’t handle everything. Specialized databases are vital for most people. You’re still going to want to store addresses in Address Book, music in iTunes, pictures in iPhoto, scientific papers in Papers, and so on. In addition to a few staples, most serious EDers are going to need and want a couple more tools for special data types. For instance, I use BibDesk — a BibTeX front-end — to handle a database of a couple thousand bibliographic records that need to be in BibTeX format. Yet it would also be rather nice if those records would turn up in searches in Evernote — and they do, because I periodically export the entire BibTeX database into its own Evernote notebook, tags and all! But BibDesk is where I actually manage my bibliographic data.
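The email-address routing described in the list above can be sketched in a few lines of Python. This is a minimal sketch, not the author’s actual mail rules: it assumes the destination is decided purely by the local part of the To: address (the part before the @), it uses only the two addresses named in the text, `example.com` is a placeholder because the real domain is not given, and the `"Inbox"` fallback is a hypothetical choice.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical routing table: local part of the address -> destination app.
ROUTES = {
    "tttask": "Things",
    "nnnote": "Evernote",
}

def route(message: EmailMessage) -> str:
    """Return the app a captured message should be filed in, or 'Inbox' if no rule matches."""
    _, address = parseaddr(str(message.get("To", "")))
    local_part = address.split("@", 1)[0].lower()
    return ROUTES.get(local_part, "Inbox")

msg = EmailMessage()
msg["To"] = "tttask@example.com"  # placeholder domain
msg["Subject"] = "Buy more cheatsheet paper"
print(route(msg))  # Things
```

Falling back to a plain inbox, rather than guessing, keeps unmatched messages visible for manual processing.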

Warning! Low-tech ahead! There is one more tool that deserves its own section …

I use a sheet to cheat (several sheets, actually)

I need cheatsheets to tie all of the above together. For instance, I have a cheatsheet just to remind me of all the various places that I may have fresh, unfiled data (stuff that isn’t even in an inbox). Processing new assets is a disciplined daily ritual, and I literally cannot remember all the places I have to look. So I have a cheatsheet, a checklist of inboxes, of places to look for assets that I have grabbed in the last day: new notes? New tasks? New files? New photos? New bibliographic records? And so on. Without a cheatsheet, I will invariably forget to check one or three of my own inboxes. Even with a cheatsheet I tend to get distracted by what I find, but it helps to keep me on track.

Naturally I also have a cheatsheet to remind me how to use the software I use — know your tools! And I also keep copies of their manuals handy. RTFM.

I also have a set of tagging cheatsheets, which help me remember which kinds of tags to apply to my assets. Without those cheatsheets, I am invariably slow and inconsistent with my tags. With them, I rip through the options in moments.

I also use tricks like making a cheatsheet into a desktop background, which I can expose with Exposé … or just printing the damned thing out (aaaaagh, paper!) and putting it in front of me on the desk for a couple of hours while I tackle a particularly large batch of tagging. Some cheatsheets are saved as image files, because I can open them almost instantly with LaunchBar and Preview.

Cheatsheets also train me. They help me to rememberize all this stuff over time, so that I slowly become far better at freestyle filing without them. And yet the cheatsheets rarely become obsolete, either, because sooner or later I always turn my attention to a project that deals with a different kind of information — and I need a cheatsheet to help me get back into that domain. I also inexorably upgrade and fine tune the cheatsheets. They are never done.

Dumb inboxes

Inboxes are repositories of incoming (candidate) tasks or assets waiting to be considered and then trashed, acted on, or filed (basic GTD stuff). Inboxes are where new data piles up — hopefully not too deep — waiting for you to decide what it means to you. Dumb inboxes are managed “manually.” Smart inboxes are populated dynamically by the database software itself. I have four main inboxes: a physical inbox (for mail, receipts, etc.), my Things inbox, my Evernote inbox, and my email inbox. Then there are several more specialized and lower-priority inboxes.

Smart or dumb, there should be an inbox in every database. If you collect some kind of data somewhere, it should have an inbox. One of the most interesting things I did when setting up my ED — interesting if you’re a librarian at heart — was to go through all of my various databases and set up some kind of inboxing. Trying to implement this parallel structure was surprisingly difficult and rewarding. No two databases are quite the same.

For instance, iTunes is one of the most widely used databases of all time, and also has some of the most robust and standardized intrinsic metadata support you can find (ID3) … and yet it also lacks any direct support for tagging (to do keywords in iTunes, you have to hijack a field not really intended for keywords), and no concept of an inbox at all. You must get by with a custom playlist.
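Hijacking a field for keywords amounts to inventing a tiny encoding of your own. A minimal sketch, assuming tags are appended to a free-text field (a comment field, say) after a `tags:` marker and separated by commas; the marker, the separator, and the function names are hypothetical conventions, not anything iTunes defines.

```python
MARKER = "tags:"  # hypothetical marker separating ordinary text from tags

def write_tags(comment: str, tags: set[str]) -> str:
    """Return the field text with any previous tag section replaced by `tags`."""
    body = comment.split(MARKER, 1)[0].rstrip()
    tag_text = f"{MARKER} {', '.join(sorted(tags))}"
    return f"{body} {tag_text}" if body else tag_text

def read_tags(comment: str) -> set[str]:
    """Extract the tags stored after the marker; empty set if there are none."""
    if MARKER not in comment:
        return set()
    _, tag_text = comment.split(MARKER, 1)
    return {t.strip() for t in tag_text.split(",") if t.strip()}

field_text = write_tags("Great live version", {"upbeat", "road trip"})
print(field_text)             # Great live version tags: road trip, upbeat
print(sorted(read_tags(field_text)))  # ['road trip', 'upbeat']
```

The weakness is built in: if ordinary text in the field ever contains the marker, the tags get mangled. That fragility is the price of borrowing a field that was never meant for keywords.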

iPhoto bafflingly thumbs its nose at existing standards for intrinsic metadata (IPTC), and yet it actually has a fairly robust custom keywording system — the mirror image of iTunes. But the idea of an inbox is not native to iPhoto any more than it is in iTunes, so creating an inbox in iPhoto means creating an “album” for it (conceptually identical to a playlist), to which you must make a point of manually dragging every imported photo.

By contrast, databases like Gmail and Things speak “inbox” perfectly: inboxing is literally imposed on you when using them: anything you add/receive starts out in an inbox, by design.

When every database application you use has a similar structure in principle — when they all have some kind of inbox for new stuff, and some smart inboxes for the backlog — the consistency becomes quite powerful. In many cases it was necessary to settle for rudimentary, dumb inboxes: something I drag data into and out of manually. Wherever possible, though, you want to leverage the intelligence of database software and make smart inboxes.

Smart inboxes that tell you what assets don’t mean anything yet

It’s as certain as death and taxes that you already have large quantities of data of every description in every kind of database that has not yet been tagged or rated or otherwise given much meaning.

Your iTunes may have literally thousands of songs that aren’t tagged, rated, or even identified by artist or album. Every determined filer must come to grips with this inevitable data backlog. What do you do with 50,000 untagged assets? That’s not an implausible number! I suspect my backlog is actually even larger than that.

Ironically, even though several database apps lack a native concept of an inbox, most of them do support some kind of smart grouping feature — a much more complex idea and much harder to implement programmatically — which can show you every item that lacks a tag or a rating, or that meets other, more complicated logical criteria. The implementation varies from app to app, but the principle is usually there. Even incredibly stupid apps like Apple’s Address Book support this.

So, to deal with data backlog, you can often create “smart” inboxes that automatically identify untagged items … and then you start chipping away, probably in some kind of prioritized way, accepting that you will never actually get through it all. For instance, I often start with ratings. By assigning importance to assets, I can then create a smart inbox to show me the most important stuff that I haven’t really dealt with (e.g. show every item that has more than 3 stars but is otherwise untagged).
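The logic of that ratings-first smart inbox is just a filter over the collection. A minimal sketch, assuming each asset is a simple record with a 0–5 star rating and a set of tags; the `Asset` class and `smart_inbox` function are hypothetical stand-ins for whatever smart-grouping feature a given app provides.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    title: str
    rating: int = 0  # 0-5 stars; 0 means unrated
    tags: set[str] = field(default_factory=set)

def smart_inbox(assets, min_stars=3):
    """Important-but-meaningless assets: rated above min_stars, with no tags yet."""
    return [a for a in assets if a.rating > min_stars and not a.tags]

library = [
    Asset("Placebo paper", rating=5),
    Asset("Old receipt", rating=1),
    Asset("Lolcat", rating=4, tags={"funny"}),
]
print([a.title for a in smart_inbox(library)])  # ['Placebo paper']
```

Because the inbox is a query rather than a place, an item drops out of it automatically the moment it gets its first tag.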

If you have smart inboxes, then you always know where to go to find data that needs metadata.

Die, friction, die!

I’m not sure where the idea of “friction” came from originally, but it certainly comes up a lot if you hang around productivity nuts and user interface designers (as one does). Friction is anything that slows you down when working with your tools. It’s a thoroughly generalizable concept — you have “friction” in a woodshop, a town hall meeting, a video game — but of course in the context of building and maintaining an ED, it refers pretty much exclusively to the user-friendliness and efficiency of your filing software.

For example, several modern note-taking apps — including Evernote — are notable in that they do not require the user to save a file. This is the classic example (see Gruber’s Untitled Document Syndrome). To the non-librarian-at-heart, worrying about the effort required to save a file probably sounds pretty weird, but hopefully every reader still with me isn’t going to put up much of a fight at this point. Trust me: the friction of file-saving is real. If I had to pick a filename and location for every note, it would really add up.

It really just comes down to raw speed. How fast can you get ten assets meaningfully filed? Anything at all that slows you down — that’s the friction.

Friction must be exterminated wherever possible. There is an inverse relationship between how much friction you have in your system and the A.Q. — awesome quotient — of your ED.

What to file

WTF? That is the question. Also: what to capture in the first place?

“Capture” is the correct geeky term for getting something, anything, into an inbox in the first place. It’s not quite the same as filing. Filing is what you do with the stuff that you decide to keep. Capture should be as fast and easy as possible, no thinking allowed. You capture stuff so that you can file it later … or not.

You should capture at least 50% more than you actually keep. Capture generous, file stingy.

Are you going to need that thing someday? Maybe? Then capture it and then probably file it. File every asset you can that you “might need someday,” assuming it’s reasonably “cheap” (low investment) to do so. The easier filing is, the more you can file. The size and power of your ED will be limited mainly by how quickly and efficiently you can assign meaning to its contents. If you can’t file an item easily, stop and think: maybe you shouldn’t file it.

Warning: you can’t capture or file everything. Be sensible.

This is one of the most important rules of building an ED. Some assets will never be captured, or can’t even be captured in principle. Maybe in some spooky cybernetic future you will be able to store, index and retrieve every clever thought you ever have … but not yet. In the EDs of the 22nd century, you won’t just have a record for every book and article, but for every meaningful subdivision of them, every clause and sentence, every section and paragraph … but not yet. If you try to file everything, you will go off your nut. In three months you’ll be living in a cardboard box, muttering about metadata, and classifying bits of shiny garbage.

You have to draw the line somewhere. Digitization is one of the most obvious places to draw the line.

Digitization

On the one hand, you must embrace digitization. But, on the other, you may wish to avoid filing things that can’t go directly into the ED in digital form.

Assets are either digital or they aren’t. They can either go right into the ED, or they can’t. And it will always be more efficient to file something purely digital that can live right in the database.

Assets that have to live outside of the database — paper books, say — will always be more of a pain in the ass to keep track of, because they do not already have a digital representation. When you realize that you’re looking at a webpage you want to save, it’s already on your computer.

Not so with books.

One of the reasons for the popularity and success of the Delicious Library application is how easy (frictionless!) it makes getting a book pleasingly digitized — just scan the bar code using your Mac’s camera! So much easier than typing in the bibliographic information. And yet many physical assets are still quite awkward and tedious, and you should think hard about whether or not you really need to get them into the database.

A photographer with a bunch of negatives may decide that he really can’t avoid digitization — the assets are too mission critical to ignore. But you should probably pass on a bunch of old books on a topic you probably won’t ever deal with again.

Search and metadata

Metadata surrounds us, penetrates us, and binds the ED together. It is the secret sauce. It’s the skeleton. Without metadata (and search), an ED is about as useful as a pile of stuff in an attic. With it, it’s more like a magic filing cabinet. You need metadata to tag assets with meaning, especially overlapping meaning — i.e. an item can belong to many categories at once.

(A quick reminder for the n00bier n00bs reading this: metadata is data about data, or information about information. For instance, a song file in MP3 format is the data, and the title of the song is metadata. Metadata can actually be a part of the file, or firmly attached to it like a tick, or simply associated with it.)

The idea of an ED is slowly emerging from search and metadata technologies, like life from a primordial ooze. The raw material is still surprisingly primitive, especially metadata. Searching (and indexing) tech has a long ways to go yet, but it’s got a strong lead at a full gallop, with Google leading that charge and many other smaller players innovating like crazy. There are several familiar and beloved search tools, such as “the Google” itself, your TiVo directory, Apple’s Spotlight, personal apps like Quicksilver and LaunchBar, and so on. These tools have already transformed computing, making it possible to find digital needles in virtual haystacks the size of mountains. It’s pointless to have a massive database of information assets that you can’t search. Searching will continue to evolve and you will be blown away by what is possible in just another few years, but it’s ready for personal-scale EDing right now.

Not so much with metadata.

Metadata is frankly a mess. Even though it is a mind-bogglingly useful idea, it is not yet baked into any computer operating system, not in a way that’s useful to ordinary mortals. Arbitrary files cannot be rated and keyworded in a handy way, let alone a standard way. Apple is getting there. They’ve been building foundations, planting seeds. Bear with me and read this passage from a 2007 review of Mac OS X 10.5 by John Siracusa. You don’t have to “get” this — the techy particulars are not relevant — but soak in the flavour of what he’s saying about metadata. Note the reverential tone …

Apple finally gets it! See how useful this stuff is? Just imagine the insane contortions the pre-metadata-enlightenment Apple would have gone through to store and track all this stuff, each application going off in its own direction with a custom implementation. So much wasted effort, so many unique bugs. No more! Extended attributes provide a general-purpose facility for doing the same things, written and debugged in one place.

What he’s talking about there is the beginning of the “metadata-enlightenment,” and it’s a Very Good Thing. But I’m afraid none of this means much to mere computer users yet. So far it’s mostly just useful for programmers and other über-geeks.

Except …

Thanks to those foundations, we have OpenMeta: the first glimmerings of useful, universal, reliable tagging and rating of arbitrary files in OS X. It’s a hack — an unofficial, unsupported exploitation that Apple does not promise to support in future versions of the operating system — but it’s a good hack.

Tagging and rating — the holy grail

There are lots of kinds of metadata. Take a photo on your digital SLR, and it embeds information about the aperture, shutter speed and much more right into the file.

Yawn. Handy for photographers. Useless cruft for anyone else.

But ratings and “tags” (a.k.a. “keywords”) are useful for any kind of asset and any kind of user. Every file could use some tags. They solve a basic problem of filing that has existed since filing cabinets have existed: what do you do if it really makes sense to file an item in two places? Three places? Seven?

This basic problem was tragically carried over into the filing system of computers, which enforced a strict (and unnecessary) hierarchy where a file could only live in one directory/folder at once. (Tech note: it’s always been possible to put a file in more than one place on a computer — magic! — but this groovy power has simply never been made accessible to people who don’t get the joke “sudo make me a sandwich,” which is, let’s face it, almost everyone. I know how to do it, but it’s not worth it.)

But what if the location of a file didn’t matter? What if you could just label a file with any meaningful category? This file is “fun” and it’s about the “kids” and definitely has “sentimental” value and it’s got five stars. Then, as needed, search your entire hard drive and find 182 “sentimental” files with more than 3 stars … and it doesn’t matter where they are, or what they are.
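A location-independent search like that can be sketched as a query over a metadata index. A minimal sketch, assuming the index maps each file path to its tags and star rating; the paths and the `INDEX` and `find` names are hypothetical, and a real system would read this metadata from a tagging tool rather than hard-coding it.

```python
# Hypothetical metadata index: file path -> (tags, star rating).
INDEX = {
    "/Users/me/Pictures/beach.jpg": ({"fun", "kids", "sentimental"}, 5),
    "/Users/me/Documents/letter.pdf": ({"sentimental"}, 4),
    "/Users/me/Desktop/taxes.pdf": ({"finance"}, 5),
    "/Users/me/Music/song.mp3": ({"sentimental"}, 2),
}

def find(index, tag, min_stars=0):
    """Return paths carrying `tag` with more than `min_stars` stars, wherever they live."""
    return sorted(
        path for path, (tags, stars) in index.items()
        if tag in tags and stars > min_stars
    )

print(find(INDEX, "sentimental", min_stars=3))
# ['/Users/me/Documents/letter.pdf', '/Users/me/Pictures/beach.jpg']
```

Note that the folder a file sits in plays no part in the query: the path is only returned so you can open the result.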

That’s tagging.

(And rating. Rating is basically a specialized form of tagging — a weird, idiot-savant sibling of tagging.)

With tagging, you don’t have to decide where to put a file, just what it means to you. This is exactly how specialized databases have worked for years: in databases, the programmer is the only person who cares where the information is. The user only cares what information means, what category it belongs to. With universal arbitrary file-tagging, theoretically you could store all your files in a single folder, because in principle tags make file location irrelevant!

But we’re a long way from the kind of implementation of tags that would make that practical.

For 30 years of personal computing, arbitrary, standardized tagging has been pretty much completely unexploited. It has certainly been possible for at least 15 years, but still it is only just barely available, and not at all standardized.

Some caution required when using tagging tools based on OpenMeta

There are now several Mac programs that exploit OpenMeta, so that ordinary schmucks can finally start tagging their files, assigning meaning to nearly anything stored on a hard drive. Yay. But with great power comes great responsibility.

I repeat that OpenMeta is a hack. It requires some digital hanky panky that Apple does not officially endorse. Although it’s unlikely, the technological rug could get pulled out from under it. The nightmare scenario is that you could spend the next two or three years organizing your data this way, and wake up one morning to find that a software update has nuked all your organizational efforts.

Fortunately, the rudimentary precautions required are generally worthwhile and sensible anyway — i.e. a good backup strategy — so I don’t want to blow this out of proportion. Just be wary and savvy and don’t rely exclusively on tagging via OpenMeta.

Regardless of what technology you use, beware generally of over-investing in metadata and tags. Like any other data, metadata is fundamentally fragile, prone to failure over time. In broad strokes, protect yourself by being generally minimalistic with your tags, by indicating the meaning of assets in as many different ways as possible, and by understanding the tools you use to create and manage metadata.

More rules of thumb …

Some rules of thumb for sane, useful tagging

Only use tags that will actually help you find something later. The sole purpose of tags is to help you find things later — so don’t use a tag unless it truly does that. They have no other value. It can be awfully tempting to classify things for the sake of classifying. It’s easy to dream up tags that are aesthetically pleasing representations of categories that are … completely useless. Example: my brain simply loves the “personal” tag, because I can easily and cleanly mentally distinguish between data related to “personal” and “work” data. However, I never, ever use this distinction to actually find data. I never go looking for “personal” stuff, or “work” stuff. So it’s basically a useless tag.

Avoid tagging unimportant assets. The more you think you “might need this someday,” the more important it is to file (tag) it properly. But most unimportant assets need minimal tagging, if any. One tag might be fine, and even that might be too much. Example: I have a folder full of several dozen web receipts in PDF form. They are not very important individually, and they have virtually no relevance to anything else. It is completely fine to leave these untagged and stored the old fashioned way, in a folder.

Aim for better filing, not perfect filing: the most important 20% of your tags can address 80% of your future needs. Filing and tagging is a bottomless pit of potential, and perfection is not even remotely attainable.

Be more free with transient tags, i.e. tags with expiring relevance, such as a tag representing a project. Transient tags have good short term utility, and they are very “safe” — your entire filing system could blow up next year, and it wouldn’t really matter as far as this tag is concerned, because the project will be over by then. Example: when I’m working on a big, hot project, I usually assign it a tag with the prefix “px” and use that tag willy nilly. A month later I may have dozens of items tagged with that … and so what?

Use “natural tagging” as much as possible: that is, embed tags and keywords in titles and content as much as possible without it being a problem, and generally develop a habit of being as descriptive as possible in titles and filenames. Filenames have always been a simple form of metadata, and many kinds of data “describe themselves.” Example: I have a photo of Alexa Ray Joel on my computer because it illustrates a story I wrote about how she tragicomically tried to kill herself with a homeopathic remedy. The file has very limited relevance, low importance, and needs little or no tagging except her name … which is in the filename.
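Descriptive filenames work as tags because a search tool can split them into words. A minimal sketch of that idea, assuming words are separated by any non-word character or underscore; the `natural_tags` and `matches` names and the sample filename are hypothetical, and real search tools tokenize far more cleverly.

```python
import re
from pathlib import PurePath

def natural_tags(filename: str) -> set[str]:
    """Split a filename's stem (no extension) into lowercase words usable as free tags."""
    stem = PurePath(filename).stem
    return {word.lower() for word in re.split(r"[\W_]+", stem) if word}

def matches(filename: str, query: str) -> bool:
    """True if every word in the query appears among the filename's natural tags."""
    return set(query.lower().split()) <= natural_tags(filename)

print(sorted(natural_tags("Placebo-Study_2006.pdf")))  # ['2006', 'placebo', 'study']
print(matches("Placebo-Study_2006.pdf", "placebo 2006"))  # True
```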

Use specialized databases to store an asset if you possibly can. “Where do I put this?” is one of the basic questions you have to ask about every asset. I default to putting assets in specialized databases, which usually have metadata management baked in. I only store something in the filing system if I have to — if there’s nowhere else to put it. You can’t get away from the specialized database: you are inevitably going to have to use specialized databases for certain kinds of data.

Tags can be both descriptive and prescriptive, serving as a mental aid to limiting the scope of what I clutter my mind and filing system with: if I don’t seem to have a tag for it, maybe it’s not actually interesting to me; if I resist throwing it out, maybe it needs a tag? Example: I have a list of tags that represent about 20 topics that I am keen on. Several of those topics are perpetually marginalized, things I like but really barely have time to even think about. If I come across an item that can’t be described by one of those topic categories, chances are it’s not really worth storing. In this sense, my topic tags are prescriptive.

Loosely categorize tags into groups for mental convenience. When you’re tagging an asset, you are basically trying to remember every possible tag that might be relevant — a “check all that apply” mental exercise that’s nearly impossible without some categorization. Which basically means that you have to … gold star if you can see it coming … tag your tags. Seriously. Example: two really obvious tag categories I have are my “topic” tags (science! tech! publishing!) and my “text type” tags (anecdote! list! essay!). Those don’t overlap at all, but some tags do. For instance, the tag “snippet” — a handy reusable chunk of text — indicates both the type of asset and how I might use it, and so it appears in two places on my tag cheatsheet.
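“Tagging your tags” is just a second, small mapping from categories to tags, and it is exactly what a tag cheatsheet encodes. A minimal sketch: the “topic” and “text type” categories and their members come from the text, while the “usage” category, its members, and the function names are hypothetical illustrations.

```python
# Hypothetical tag cheatsheet: category -> tags. "usage" is an illustrative assumption.
CHEATSHEET = {
    "topic": ["science", "tech", "publishing"],
    "text type": ["anecdote", "list", "essay", "snippet"],
    "usage": ["snippet", "inspiring"],
}

def categories_of(tag, cheatsheet=CHEATSHEET):
    """List every category a tag belongs to; overlapping tags appear in several."""
    return [name for name, tags in cheatsheet.items() if tag in tags]

def checklist(cheatsheet=CHEATSHEET):
    """Render one 'check all that apply' line per category."""
    return "\n".join(f"{name}: {', '.join(tags)}" for name, tags in cheatsheet.items())

print(categories_of("snippet"))  # ['text type', 'usage']
print(checklist())
```

Letting a tag appear under more than one category, as “snippet” does here, costs nothing and mirrors how it sits in two places on a printed cheatsheet.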

Don’t be afraid of obvious tags; in fact, you should emphasize them! Ask yourself, “What is obvious about this item?” Of course tags are obvious when you’re looking at the asset: I can tell at a glance if the tag “Kim” (my wife) or “inspiring” is appropriate for an item that’s right in front of me. The issue is findability: you want to be able to find it later, when you aren’t looking at it.

Beware of redundant tagging, because it’s just unnecessary work. Files and assets often already have quite a bit of metadata, such as their filename and file type. An image file is obviously an image file, and is well-represented as such in the filing system, so you don’t need to apply your own “image” tag. That would kind of be like a librarian running around a library applying a “book” label to every book.
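A small Python sketch of this check, assuming you keep a short list of generic type tags you would rather not apply by hand. It uses the standard library’s `mimetypes.guess_type` to infer the file type from the extension; the `TYPE_TAGS` set, the function name, and the filename in the usage line are hypothetical.

```python
import mimetypes

# Hypothetical generic tags that merely restate a file's broad media type.
TYPE_TAGS = {"image", "video", "audio", "text"}


def redundant_tags(filename: str, tags: set[str]) -> set[str]:
    """Return the tags that only repeat what the file extension already says."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:  # unknown extension: nothing to compare against
        return set()
    major = mime.split("/", 1)[0]  # e.g. "image/jpeg" -> "image"
    return {major} & tags & TYPE_TAGS
```

For example, `redundant_tags("alexa-ray-joel.jpg", {"image", "homeopathy"})` flags `{"image"}`: the digital equivalent of a “book” label on a book.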

Err on the side of duplicate filing, using two different technologies. File twice if you gotta. It’s always better to have stuff crop up in multiple places than be entirely unfindable. Duplicate filing does take effort and cause friction, though, so be wary of filing twice deliberately; but it’s no biggie if duplication happens easily or accidentally. Example: a perpetual filing duplication that I’ve come to accept is, ridonkulously, the case of lolcats: that’s right, funny cat pictures with silly captions. They are a fun, light-hearted way to illustrate all kinds of odd little things. I go looking for relevant lolcats pretty often, and I want them stored in iPhoto and I want them in Evernote.

Define your standard, core tags. There can be many more, but definitely have a core set in mind. Your “main” tags will be endlessly useful.

Don’t worry about tagging excessively if the tags are flowing. Don’t sit there straining to come up with tags, but if tags are easily occurring to you, then by all means throw them out there. Tag profusion is not a big deal, because you can always collapse similar tags later (e.g. if you use “star wars new hope” one day and “star wars part IV” the next, you can eliminate the redundancy later, no problemo). One concern, though, is that some tagging tools make it difficult to manage large numbers of tags, so cleanup could become a bit of a hassle. Do beware of that.
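Collapsing similar tags later can be as simple as keeping an alias table. Here is a minimal Python sketch, assuming tags are plain strings; the `ALIASES` entries and function names are hypothetical, and how you would apply the result depends on your tagging tool.

```python
# Hypothetical alias table: variant tag -> canonical tag, grown during periodic cleanup.
ALIASES = {
    "star wars part iv": "star wars new hope",
}


def normalize(tag: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(tag.lower().split())


def collapse(tags: set[str], aliases: dict[str, str] = ALIASES) -> set[str]:
    """Map every tag to its canonical form; duplicates merge because the result is a set."""
    return {aliases.get(normalize(t), normalize(t)) for t in tags}
```

With that table, `collapse({"Star Wars Part IV", "star wars new hope"})` returns `{"star wars new hope"}`. Keeping the aliases around (rather than just renaming once) means the same variant gets folded in automatically if it turns up again.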

Use folders to represent major tags. The whole idea of tags is that an asset belongs in multiple overlapping categories, making it impossible to store it meaningfully in only one location. Nevertheless, most assets fit best in a single category (a single tag) and can (and should) be stored in a folder for that category. I call this “folderization” of tags, and it’s a critical tag integrity/security strategy. If all your tags got nuked somehow, it would be a tragic loss, but at least you’d still have files stored in folders representing the main ones. So always encode metadata into a hierarchy by moving files into folders that correspond to high-level keywords: e.g. if a file has the high-level tag “health science,” then sooner or later it should be moved into a folder named “health science.” If you ever lost the tag, at least you’d still have a folder full of assets that you knew were primarily related to health science.
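A minimal Python sketch of folderization, assuming assets are ordinary files and that you keep a short, ranked list of high-level tags that deserve folders. `HIGH_LEVEL_TAGS`, the function names, and the example paths are hypothetical; the code refuses to overwrite an existing file rather than guess what you want.

```python
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical high-level tags, in priority order; only these get their own folders.
HIGH_LEVEL_TAGS = ["health science", "tech", "publishing"]


def primary_tag(tags: set[str], high_level: list[str] = HIGH_LEVEL_TAGS) -> Optional[str]:
    """Pick the first high-level tag the asset carries, or None if it has none."""
    return next((t for t in high_level if t in tags), None)


def destination(file: Path, tag: str, root: Path) -> Path:
    """Where the asset belongs: a folder named after its primary tag."""
    return root / tag / file.name


def folderize(file: Path, tags: set[str], root: Path) -> Optional[Path]:
    """Move `file` into root/<primary tag>/ so the folder itself records the main tag.

    Returns the new path, or None if the asset has no high-level tag yet.
    """
    tag = primary_tag(tags)
    if tag is None:
        return None
    dest = destination(file, tag, root)
    if dest.exists():
        raise FileExistsError(dest)  # never silently overwrite an existing asset
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(file), str(dest))
    return dest
```

The folder name is simply the tag, so even if the tag metadata were lost, `root/health science/` would still say what its contents are primarily about.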

Defy the custom of storing files in folders by file type; it’s stupidly redundant. Where you choose to put a file is the most important organizational decision you will ever make about that file. So why would you store it in a location that tells you what you already know, i.e. that it’s a movie? Use that mental energy to categorize the file in a way that isn’t obvious; otherwise you might as well not be involved at all.

Consider using prefixes to identify tag categories, to discriminate between contexts. Example: without a prefix, the tag “writing” could be quite ambiguous. It might mean that the asset is writing or that it is about writing. The tags “is-writing” and “about-writing” could solve that.
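If your tags ever pass through a script (for export, cleanup, or searching), prefixes are easy to split back out. This Python sketch assumes the only prefixes are “is-” and “about-”; restricting parsing to a known list keeps ordinary hyphenated tags like “sci-fi” intact. The function names are hypothetical.

```python
from collections import defaultdict

# Assumed context prefixes; anything else is treated as part of the tag itself.
KNOWN_PREFIXES = {"is", "about"}


def split_tag(tag: str) -> tuple[str, str]:
    """Split 'is-writing' into ('is', 'writing'); unprefixed tags get context ''."""
    prefix, sep, rest = tag.partition("-")
    if sep and prefix in KNOWN_PREFIXES and rest:
        return prefix, rest
    return "", tag


def by_context(tags: set[str]) -> dict[str, set[str]]:
    """Group an asset's tags by prefix: what it *is* versus what it's *about*."""
    groups: dict[str, set[str]] = defaultdict(set)
    for tag in tags:
        context, name = split_tag(tag)
        groups[context].add(name)
    return dict(groups)
```

So `split_tag("is-writing")` gives `("is", "writing")` while `split_tag("sci-fi")` leaves the tag alone as `("", "sci-fi")`.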

To wrap up, I’d like to say a little about the why …

The ED as a GTD replacement

I need an Everything Database because GTD just wasn’t cutting it any more.

Three years ago I discovered the productivity marvel that is the GTD way of life, or “getting things done.” Months of study and experimentation can be distilled to a single nugget:

To free your mind from the job of remembering everything that needs to be done, you have to get tasks out of your head and into a trusted system.

GTD rests on this principle: that a person needs to move tasks out of the mind by recording them externally. That way, the mind is freed from the job of remembering everything that needs to be done (which it sucks at), and can concentrate on getting things done.

GTD was great psychotherapy for me and played a crucial role in saving me from severe chronic insomnia. I started sleeping a lot better once I knew that most of my commitments were organized into some pretty clever lists.

Unfortunately, changes in technology and my job description started to eat away at my GTD-powered peace of mind. I am a pure researcher and writer now, trying to wring meaning out of an influx of data and media so torrential and untameable that I might as well try to drink the Amazon River.

Many of GTD's principles are still extremely useful and standard operating procedure for me, but there are two key ways that GTD just isn’t cutting it any more.

•

GTD depends heavily on the concept of narrowing the scope of possible actions by “context,” based on the sensible idea that not all tasks are possible at all times, so why waste time thinking about who you need to call when you’re nowhere near, say, a phone?

Hey, wait a sec … my iPhone is welded to my hand …

See the problem? In my unbelievably digitized and virtual futuristic life, all kinds of work are possible all of the time. Almost. This is certainly not the case for everyone: other people still have meetings, places to be, and tangible tools for tangible jobs. But quite a lot of people are now virtualized to an amazing degree, and for me the evolution is nearly complete: I don’t have anywhere I have to be. I don’t need anyone or anything except my Macs, my iPad, and my iPhone. My choices are always the same. If I need to figure out what to do with my morning, the classic GTD question is pretty much the most useless question I can think of: “What tasks are possible right now, in this context?”

Answer: any of them.

Context is durned near meaningless to me. I cannot narrow down my choices that way. The only thing that has meaning is the information I’m working with. Which brings me to the next problem.

•

In GTD, a filing system for “reference information” is important … but less important than tracking commitments. Certainly the GTDer needs a filing system, and a good one; you need to be able to easily store and retrieve information related to your projects. But that’s just mechanics. The meat of GTD is built into the name: it’s GTD for getting things done, not FTW for filing things well.

There’s not actually much that I need to “do,” but a very great deal that I need to know, integrate, consider, cite, and so on.

GTD is primarily for managing actions or commitments, which are defined and managed using concepts like start and due dates and dependencies (e.g. can’t do b until a is done). But I need to manage data. Data doesn’t have due dates, and it can’t get “checked off.” It takes a whole different skill set to manage a large reference library.

Those are the skills I need: information wrangling skills.

Without them, reference information has been taking over my life. Instead of waking up at night because of hazy commitments (no worries there, thanks to GTD), I was starting to wake up with information overwhelm, actually suffering anxiety over knowledge and data management: what I have or have not stored, whether or not I will remember that awesome article I skimmed the day before.

•

So if I can work on anything at any time, and every task is the same kind of task, how can I discriminate? How can I narrow it down? How can I decide what’s next? How do I decide what information to work with next?

By what it means.

GTD got tasks out of my head and into a trusted (task management) system. Now I almost exclusively need to get information out of my head and into a trusted (filing) system. An ED lets you file and retrieve by meaning.

What have I got on “placebo”? Forty-two scientific papers, seven books, five podcasts, seventeen quotes, two YouTube videos …

That’s how I decide what to do next.

So a few months ago I started thinking about what a really good filing system would be like, and how I could start to create a database of every information asset I have: the everything database.

Document History

August 23, 2012 — Shiny new look.

April 26, 2012 — Revision and expansion. I integrated some of the lessons I’ve learned over the last couple years.

Dec 19, 2010 — Initial publication. A bit raw and rough, but an acceptable v1.0.