ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


What's on Jason's Hard Drive

by Jason Hunter
11/02/2006

Several years ago I noticed something funny about my habits as a technologist. My hard drive was always immaculately organized, while my office looked like a three-year-old had spent the day locked inside. To help organize my papers I tried a few physical-world organization ideas--like using cubby holes to store documents (quicker for inserting than vertical files)--but no matter what I tried, eventually all my documents ended up in a big pile. I honestly don't think techies are by nature motivated to keep the real world as organized as they keep their virtual world.

That led to a solution I've used now for several years: virtualize my document management! It's a system that worked great for me and I think will work for any tech-savvy person, so I'll share it here in the hope it might help you. It involves a Perforce revision control system, a document scanner, and several hard drive organizational conventions and file editing habits. If you're a techie and want your physical life more organized using virtual tools, read on.

Step One: Perforce

To start with, every important document in my personal, professional, or business life goes into a Perforce repository--everything from source code to scanned legal contracts to vacation snapshot photos. Perforce is a commercial and very high-quality revision control system. Its price is a bit high for basic personal organization, so I use and recommend the free version, which supports a maximum of two clients. That works out perfectly for me: the server is installed on my Mac, one client is on my laptop, and the other client is on a separate hard drive on the same Mac that runs the server. This setup means every file gets replicated on check-in across two machines and three hard drives.

In 1997, when I first started my virtual organization, the best free alternative to Perforce was CVS, a much more awkward revision control system. These days there's Subversion, an excellent CVS follow-on, and I might recommend that someone starting fresh go with Subversion. It's open source, supports any number of clients, and has better disconnected-from-the-world behavior than Perforce. Its main downsides are a more complicated server setup and a desire to consume double the disk space on the client side (annoying when storing binaries).

Step Two: Scanner

For many years I used Perforce to organize all my assets that started out electronic. Then in 2002 I bought a document scanner, and it has forever changed the way I manage paper assets. Every tax-deductible receipt, every contract I've signed, every loan agreement, and basically every important document in my life has been scanned and stored as a multi-page TIFF file under my Perforce repository.

A digital library of scanned documents makes everything immediately available, even when I'm on the road, with no cabinet required--all for the price of maybe 15 minutes a week scanning. I put documents to scan on top of the scanner itself and work through them when it's late, I don't feel like working, and I don't feel like vegging. It's fun the same way ripping CDs is fun: a mindless accomplishment. After scanning, the documents get piled unceremoniously into an 8.5"x11" box (just like before but this time without any guilt!). Each box represents one calendar year. I find the 10 ream printer paper boxes work marvelously. The papers need no extra organization in the real world because they're organized online. I keep the paper copies in case of audit, at which point I'll have the motivation to pore through the stack looking for the physical document that matches the digital file.

Some advice when buying a scanner: get one with a document feeder in addition to the flat glass plate. Come April 15th when you have a long tax return to scan, you want to just push a button and let things go without intervention. If you can afford it, get one with a duplex scanning feature. Nothing's better than loading a long dual-sided contract and letting it auto-run, or so I imagine. I haven't yet splurged!

When you're scanning personal or financial documents, make sure to secure your machine. To start with, protect your account with a strong password--letters and numbers, and using "3" for "e" doesn't count. Of course that alone isn't enough, as someone with physical access to your machine can boot from other media, so set up a BIOS password also. Still, that doesn't help if the person with your machine can remove your hard drive to read your data, so I like to set up a hard drive controller password too. Taken together, these passwords should make it sufficiently tricky to read your data even if your laptop is stolen. Encryption is always a good idea too, although the Windows built-in drive encryption trusts the Administrator account overly much.

Step Three: Organize the Scanned Files

Where do I keep the scanned files? Under the Perforce repository of course! Scanned documents go under /perforce/scans, under which I keep many subdirectories.

For example, /perforce/scans/financial stores bank statements, credit card statements, investment summaries, those annual FICA mailings, tax returns, etc. These days most of these documents come electronically or can be printed to PDF from a website, so they don't even have to be scanned. Each financial institution or concept gets its own subdirectory, like fidelity, vanguard, loans, and taxes. Within each specific subdirectory I place files with a date prefix and subject suffix. For example, 20050630-statement.tif is the June 30th statement and 20030519-form5498.tif is the Form 5498 sent in 2003. The year-first date makes file listings naturally appear in chronological order, and lets me use a suffix such as statement across many different days. The scans support multiple pages within the same file, so there's no convention needed for different pages.
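The naming convention above can be sketched in a few lines of Python. This is only an illustration, not a tool from my setup: the helper names scan_name and scan_date are made up for the example, and the third file name is invented to give the sort something to do.

```python
from datetime import date

def scan_name(day: date, subject: str, ext: str = "tif") -> str:
    """Build a name like 20050630-statement.tif (date prefix, subject suffix)."""
    return f"{day:%Y%m%d}-{subject}.{ext}"

def scan_date(name: str) -> date:
    """Recover the date from the eight-digit prefix of a scan name."""
    prefix = name.split("-", 1)[0]
    return date(int(prefix[:4]), int(prefix[4:6]), int(prefix[6:8]))

names = [
    scan_name(date(2005, 6, 30), "statement"),
    scan_name(date(2003, 5, 19), "form5498"),
    scan_name(date(2005, 3, 31), "statement"),  # invented example file
]

# Because the year comes first, a plain string sort is also a date sort.
assert sorted(names) == sorted(names, key=scan_date)
print(sorted(names))
```

The point of the assert is that no date parsing is needed for everyday use: any file browser or ls that sorts by name already shows the files oldest first.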

Having well-organized digitized financial records proved amazingly helpful when applying for a home mortgage. Whenever the loan officer asked for a document (several years' worth of tax returns, investment records, etc.), I could print a quick copy. Sometimes, with less sensitive documents, I just fired off an email. My organization proved itself again when I applied for a home equity line of credit. The HELOC loan officer asked for everything the first mortgage company did, plus details about the first mortgage and assessment records on the house. Happy me, I had those scanned and didn't have to waste an hour digging.

My receipts, in /perforce/scans/receipts, are similar to financial records, but they get their own turf. Under receipts I keep various subdirectories: personal for personal receipts (good for tracking warranties), selfempl for tax-deductible self-employment expenses (ready to be zipped up into an email to my accountant come January), donations for donated goods, and reimbursed for copies of expensed items (lest anyone lose my invoice). A full file path might be /perforce/scans/receipts/selfempl/2005/20050224-cell.tif. Notice how under each major receipt category there's a year-by-year subdirectory. I've found that keeps the directories from growing too long. The paper receipts go into the same 8.5"x11" box as everything else, piled right on top.
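Here is a rough Python sketch of the receipts layout and of the zip-it-for-the-accountant step. The function names receipt_path and zip_year are hypothetical, and the sketch assumes every receipt name starts with the eight-digit date prefix described earlier.

```python
import zipfile
from pathlib import Path, PurePosixPath

def receipt_path(category: str, name: str) -> PurePosixPath:
    """Place a dated receipt under its category and year subdirectory."""
    year = name[:4]  # the date prefix starts with the four-digit year
    return PurePosixPath("/perforce/scans/receipts") / category / year / name

def zip_year(receipts_root: Path, category: str, year: str, out_zip: Path) -> int:
    """Zip every file in one category/year directory; return the file count."""
    year_dir = receipts_root / category / year
    files = sorted(p for p in year_dir.iterdir() if p.is_file())
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store only the file name so the archive has no directory prefix.
            archive.write(path, arcname=path.name)
    return len(files)

print(receipt_path("selfempl", "20050224-cell.tif"))
```

Calling something like zip_year(Path("/perforce/scans/receipts"), "selfempl", "2005", Path("selfempl-2005.zip")) would then bundle one year of self-employment receipts into a single attachment. The year-by-year subdirectories are what make that a one-directory job.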

Under /perforce/scans/autos I place my auto purchase, insurance, and maintenance records, broken out by car and organized with the same date-oriented naming convention. For example, scans/autos/2004-tl/20050511-15k.tif contains the 15,000 mile service report. I've found this particularly useful at resale time as I have proof of a good maintenance record (and people think anyone with such great records must do great maintenance!). Plus it helps track when I need service and has saved me money by reminding me that I did in fact change the timing belt last year! There's no way I'd search through the paperwork to find out, but when I can PgDn through documents, I will. If you see someone in the auto shop with a laptop out paging through documents, that's probably me.

/perforce/scans/house is a directory for house-related items. A tax assessment mailer might get stored as 20040815-assessment.tif. It's a good idea to have subdirectories for different properties so when you move you can ignore the records of the past location. Of course, because the organization is all virtual, if you forget to do that initially it's trivial to shuffle things around later. Reorganize without paper cuts!

Every taxonomy needs an odds and ends folder, and /perforce/scans/misc-legal is mine, geared toward legal items I want to keep but that don't fit the previous categories. It holds, for example, a nice color scan of my passport (just in case), my driver's license, various group membership materials, and random legal contracts.

The recent addition of /perforce/scans/fun is a grab bag of stuff I want to keep for sentimental reasons, such as a scan of my wedding program. I'm sure I'll have the file long after I've lost the paper copy.

What don't I have? A way to text search my TIFF scans. I look forward to when that will be feasible (meaning low effort and cheap).

Step Four: Other Assets

Under Perforce I keep a lot more than just scans. Each employer or client I've had gets a subdirectory related to that employment and the work I've done there. By convention I give each employer or client an immediate biz subdirectory holding the electronic paystub or PO and invoice records along with scans of any agreements. This is one of those cases where maybe those things should go in misc-legal. When in doubt, soft link. Try that with paper!
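For the "when in doubt, soft link" habit, here is a minimal Python sketch that runs against a throwaway temporary directory. The helper link_into and the agreement file name are invented for the example. It assumes a filesystem that allows symbolic links (on Windows that can require extra privileges), and it says nothing about how a given revision control system stores links.

```python
import os
import tempfile
from pathlib import Path

def link_into(target: Path, link_dir: Path) -> Path:
    """Create a relative soft link to target inside link_dir; return the link."""
    link_dir.mkdir(parents=True, exist_ok=True)
    link = link_dir / target.name
    # A relative link keeps working if the whole tree moves to another machine.
    link.symlink_to(os.path.relpath(target, link_dir))
    return link

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "perforce"
    agreement = root / "marklogic" / "biz" / "20060101-agreement.tif"  # hypothetical file
    agreement.parent.mkdir(parents=True)
    agreement.write_bytes(b"scan")
    link = link_into(agreement, root / "scans" / "misc-legal")
    print(link.is_symlink(), link.read_bytes() == agreement.read_bytes())
```

The file lives in one place, under the client's biz directory, and merely shows up in misc-legal as well.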

These days electronic books are becoming popular, so I created a /perforce/ebooks area. In there I keep, for example, ebooks/pragprog/pragmatic-automation.pdf, a book from my friend Mike Clark.

I've long had a /perforce/hacks directory to hold little personal hack programs I don't want to lose. It holds the code from my freshman year poker-playing game (written in Pascal!), my senior year lread program (to read a Linux filesystem from DOS), and of course several recent handy XQuery utilities. It's also a good repository for hacks friends share with me that aren't public. By putting the files into Perforce, they easily move with me during machine upgrades, while other little-used files get left behind.

Ever since my first digital camera I've had a /perforce/photos area to hold any digipic I would be upset to lose. They take up space (raw space times three because of my replicated system) but that way I know I'll always have them. Too many people lose all their photos when a hard drive fails. Photos that don't make the Perforce cut get stored on an external drive hanging off the Mac server. It's a single point of failure, but oh well. I didn't like them much anyway.

Under the /perforce/photos directory I keep subdirectories oriented by date. For example, photos/20050506-hawaii holds images from a Hawaiian vacation in May. Sometimes when various people take pictures of an event, I put their copies in subdirectories named after them. I use ACDSee to view my photos. It makes it easy to view directories or groups of directories at a time. I wish iPhoto on my Mac would understand hard drive organizations more.

Last but not least, I keep a /perforce/writing subdirectory. It holds things like, well, this!

Here's a skeleton view of everything I've described. A quick check shows I presently have 28,000 files under Perforce with 1,500 of them under scans, so naturally what you see here is just a wee sample:

perforce/
    scans/
        financial/
            taxes/
                20030519-form5498.tif
            loans/
                20041111-acura-tl.tif
            fidelity/
                20060531-statement.pdf
            vanguard/
                20060531-statement.pdf
        receipts/
            personal/
                2005/
                2006/
            donations/
                2005/
                2006/
            reimbursed/
                2005/
                2006/
            selfempl/
                2005/
                2006/
                    20060224-cell.tif
        autos/
            2004-tl/
                20060511-15k.tif
        house/
        misc-legal/
        fun/
    ebooks/
        pragprog/
            pragmatic-automation.pdf
    hacks/
        lread/
        nedpoker/
    photos/
        20050506-hawaii/
            IMG_0447.jpg
    writing/
        javanet/
        javaworld/
        oracle/
    sgi/
        biz/
    marklogic/
        biz/
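A "quick check" of file counts like the one mentioned before the skeleton could look something like this in Python. It is a sketch, not the command I actually ran; count_files_by_area is a made-up name, and the relative path perforce in the last line is an assumption about where the tree lives.

```python
import os
from collections import Counter
from pathlib import Path

def count_files_by_area(root: Path) -> Counter:
    """Count files beneath each top-level directory of root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        if rel.parts:  # skip files sitting directly in root
            counts[rel.parts[0]] += len(filenames)
    return counts

# Prints an empty list if the directory does not exist.
print(count_files_by_area(Path("perforce")).most_common())
```

Summing the values of the returned Counter gives the total for everything below the top-level areas, and the scans entry gives the number of scanned documents.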

Last Tip

Last tip: I've had a great experience keeping a "work journal" file (stored under Perforce of course). My journal file consists of two parts, separated by an easy-to-search-for and appears-nowhere-else marker such as ----. Above the line I place past accomplishments in chronological order, organized by day. Below the line I list my to-do items, priority ranked so the more urgent ones are on top. To be future-proof and OS-resistant I keep the file as simple text.
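The two-part journal format is simple enough to sketch in Python. I edit the file by hand, so treat this purely as an illustration: the function names, the sample entries, and the assumption that the marker sits alone on its own line exactly once are all mine for the sake of the example.

```python
MARKER = "----"  # assumed to appear on a line of its own, exactly once

def split_journal(text: str) -> tuple[list[str], list[str]]:
    """Return (done lines above the marker, to-do lines below it)."""
    lines = text.splitlines()
    at = lines.index(MARKER)
    return lines[:at], lines[at + 1:]

def complete(text: str, todo: str, entry: str) -> str:
    """Drop a to-do line from below the marker and append entry above it."""
    done, todos = split_journal(text)
    todos.remove(todo)
    return "\n".join(done + [entry, MARKER] + todos) + "\n"

# Hypothetical journal contents, newest accomplishment last, urgent to-do first.
journal = "2006-11-01 Outlined article\n----\nFinish article\nScan tax return\n"
print(complete(journal, "Finish article", "2006-11-02 Finished article"))
```

Appending the new entry directly above the marker keeps the accomplishments in chronological order, and the remaining to-do items keep their priority ranking.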

I started the journal the day I met Tim O'Reilly and received the offer to write the book Java Servlet Programming. It helped me track my progress (I averaged one chapter every three weeks), record discoveries, and remember people I needed to talk with and their contact information (I can't put everyone in the Palm Pilot). After a hard day, the journal shows me exactly what I accomplished, and years later the record is still there. By keeping the record in the virtual world it seems more real to me. Go figure.

Sometimes I feel like my best job description is someone who moves things from below a line to above it, eternally changing "to do" items into "how I did it" entries. I do search back on the entries quite often to remember things. The steps to do a JDOM release, how to move funds to Fidelity, and the location where I found the cool tcsh shell for Windows are all things I look up. As of today it's over 440,000 words in a 2.6 MB text file. It seems that I average 250 "words of work" per workday.
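As a back-of-the-envelope check on those figures, here is the arithmetic in Python. The numbers come straight from the paragraph above; the only assumption is that a megabyte is counted as a million bytes.

```python
words = 440_000
words_per_workday = 250
size_bytes = 2.6e6  # assumes 1 MB = 10**6 bytes

# Number of workdays implied by the average.
workdays = words / words_per_workday
print(workdays)

# Average bytes per word, including the spaces and newlines between words.
print(round(size_bytes / words, 1))
```

That works out to 1,760 workdays of entries and a little under six bytes per word, which is about what plain English text should look like.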

I hope my habits and conventions can inspire you to dig out of the paper pile in your office. If you have ideas to share, I'd like to hear them. Now, if you'll excuse me, I have to go and note in my work journal that I finished this article!

Jason Hunter is Principal Technologist with Mark Logic and the author of Java Servlet Programming.



Copyright © 2009 O'Reilly Media, Inc.