ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Organizing Files

by Karl Vogel
12/15/2005

The problem: the filesystem on my Unix workstation was a mess. I couldn't find anything without grepping all over creation. About half the time, I'd actually find something useful. Usually I'd get no hits at all, or I'd match something like a compiled binary and end up hosing my display beyond belief.

I wrote this from a Unix/Linux perspective, but Mac users running a recent version of the operating system should be able to make sense of it. Here are some terms for non-Unix types:

What Didn't Work

This is what didn't do the trick.

Making my own categories

I tried about six different categorization schemes, all of which went nowhere. Poking around in the IBM and Red Hat distribution web pages gave me setups that looked like this:

Making the directory tree was easy. Figuring out where to put things under it wasn't. For example, I learned how to use tcpdump, a program that's great for examining network traffic to find out why two of your systems aren't working and playing nicely together.

Where does it go? Should it be Networks? Well, only systems people can use the program, because it's for troubleshooting and you can see vital information such as passwords and the contents of email messages in transit--so how about Admin? I'm not storing the software as such, just instructions on how to use it--so how about Documentation?

Notice the Misc category? Where do you suppose most stuff ended up?

Making a projects directory

A friend of mine had a well-organized system, and I happened to notice that he had a directory called projects under his home directory. It sounded good, so I made one, too.

Do I use the project name for each separate project? Oops, some projects have either duplicate or very close names. How about using the name of the person requesting the work? Does a two-minute job count as a project?

In short, this had all the same defects I encountered when making up my own categories, plus some brand-new ones.

Dumping it all in $HOME

I tried this during my "to hell with it" phase, and wound up with more than 1,000 files in my home directory. Take a wild guess how well that worked out.

Separating files by file type

I then tried putting text files in one area, PDF files in a second, web pages used for presentations in a third, and so on.

This only gave me a bunch of strangely named files without much in common, in no particular order. At least I could grep through the text files without accidentally getting a match from a PDF file and seeing gibberish all over my screen.
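For the narrower problem of keeping grep away from binaries, GNU grep and the BSD greps accept -I, which treats any file that looks binary as if it had no match. A minimal sketch, assuming such a grep; textgrep is a made-up name:

```shell
#!/bin/sh
# textgrep (hypothetical helper) PATTERN [DIR...]
# Recursively list files matching PATTERN, skipping anything grep
# classifies as binary.  Assumes a grep with -r and -I (GNU and BSD do).
textgrep() {
    pattern=$1
    shift
    [ $# -eq 0 ] && set -- .    # default: search the current directory
    grep -rIl -e "$pattern" -- "$@"
}
```

Running textgrep tcpdump ~/notes lists only the text files that mention tcpdump, so a PDF or compiled binary can't splatter gibberish across the terminal.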

Treating it all as email

I noticed someone else storing most of his daily work as email. Anything having to do with a given project all ended up in a mail folder, and he could just use the mail reader of his choice to navigate around in it. That sounded cool.

Six hundred fifty-eight email folders later, I noticed a tiny problem. Where did I put that security notice about Solaris operating systems? Was it in the security folder or the solaris folder? And why did I put it in this one when it clearly should have gone into the other one? (Pick whatever values you like for "this" and "the other.")

All these methods had one thing in common: putting a bunch of semistructured information somewhere and expecting it to magically organize itself. Trying to impose additional order on stuff like this is a waste of time when you consider how much new semistructured information we send and receive every day.

What Might Work Elsewhere

I tried a few things that almost worked ....

Dewey decimal system

I ran into something called CyberDewey, written by David Mundie. This sounded like the neatest thing since sliced bread, especially because he seemed to have the same problems I did when trying to organize files.

I went so far as to buy a copy of the Abridged Dewey Decimal Catalog, which is actually pretty nifty; if you're looking to organize your paper files, you could do a lot worse than use an existing classification scheme like this.

For example, suppose that I want to know where to file an article on hurricane relief. I flip to the Relative Index portion of the catalog (about 200 pages out of 1,000 total), look for "Hurricanes," and see something like this:

Hurricanes: 551.55
   Weather forecasting: 551.64
   See also: Disasters

"Disasters" sounds promising, so I try that:

Disasters: 904
   ...
   Social services: 363.34
   Public administration: 353.9

This gives me some category numbers to check in the front portion of the catalog:

363.34    Disasters, including floods and war

353.9     Safety administration
          ...
          Disaster and emergency planning

904       Collected accounts of events

551.55    Atmospheric disturbances, including cyclones, hurricanes, ...

At this point, I have some choices about how to file my article: make a folder called 904 if it's an interview with a Katrina survivor, or 353.9 if it's about FEMA or government response, and so forth. It's nice because someone else has already done the hard stuff (figuring out the categories and where they go).

Unfortunately, the stuff available in the Abridged version is a little too general for my job. I also didn't feel like typing in the categories by hand or forking over $275 for a copy of WebDewey. You can get some of the DDC headings in digital form, but not enough to solve my problem.

See also Three and Four Digit Headings from the Abridged Dewey Decimal Classification (about 2,000 entries).

DMOZ setup

The Open Directory Project wants to be the largest, most comprehensive human-edited directory of the Web. Among other things, it provides a nice set of categories that are well-organized, free, and already in digital form. It's not hard to take its category list and turn it into a directory tree suitable for a web page. Figure 1 shows a section of my home page based on that list.

Figure 1. DMOZ categories on my home page

The 00hierarchy link holds the computer-related topics from the DMOZ category list that I thought would be most useful. This is OK for a setup with a small number of files and reasonably clean delineation between topics, but it didn't quite do the trick for my daily work.
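Turning a category list into a directory tree takes only a few lines of shell. This sketch assumes a plain-text input with one slash-separated DMOZ-style path per line (for example, Top/Computers/Security); it drops the leading Top/ and makes one directory per path. The input format and the mkcattree name are assumptions, not the author's actual tooling.

```shell
#!/bin/sh
# mkcattree (hypothetical helper) FILE ROOT
# Assumed input: one category path per line, e.g. "Top/Computers/Security".
# Makes ROOT/Computers/Security and so on; blank and '#' lines are skipped.
mkcattree() {
    file=$1
    root=$2
    while IFS= read -r path || [ -n "$path" ]; do
        case $path in
            '' | '#'*) continue ;;
        esac
        mkdir -p "$root/${path#Top/}" || return 1
    done < "$file"
}
```

Because comment and blank lines are ignored, the category list can carry notes about why a branch was kept or pruned.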

Canadian government setup

"But Minister, it isn't like this film is the first troublesome thing to come out of Canada. Let us not forget Bryan Adams."

"No, no. The Canadian government has apologized for Bryan Adams on several occasions."

--South Park: Bigger, Longer & Uncut

Canada took a stab at making a set of consistent categories for government records management called ARCS. It's a block numeric records classification system based on function and subject. Each functional or subject grouping of records is assigned a unique three- or four-digit number; this is a primary number, and it's the main building block for the system. The system uses these numbers to classify all information related to a subject or function, regardless of physical format.

Most government offices deal with a similar set of administrative requirements, and the ARCS setup is a pretty nice representation. The documentation comes in PDF form and is very thorough. I made a directory tree suitable for web access based on this setup:

The plain-text categories file I used to generate the directory tree and indexes is here. The University of Calgary has a similar system more suitable for colleges and universities.

This works better for administrative work than for something like my job.
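The categories file isn't reproduced here, so this sketch invents a simple format: one category per line, a three- or four-digit primary number followed by its title. For each line it makes a directory named after the number, records the title in a README, and adds an entry to a top-level index.html. The format, the file names, and the mkarcs name are all assumptions.

```shell
#!/bin/sh
# mkarcs (hypothetical helper) FILE ROOT
# Assumed input: one category per line, "NUMBER Title", where NUMBER is a
# three- or four-digit primary number.  Other lines are ignored.
# Creates ROOT/NUMBER/README for each category, plus ROOT/index.html.
mkarcs() {
    file=$1
    root=$2
    mkdir -p "$root" || return 1
    {
        echo '<html><body><ul>'
        while read -r num title; do
            case $num in
                [0-9][0-9][0-9] | [0-9][0-9][0-9][0-9]) ;;
                *) continue ;;
            esac
            mkdir -p "$root/$num" || return 1
            printf '%s %s\n' "$num" "$title" > "$root/$num/README"
            # Escape &, <, > so a title can't break the HTML.
            html=$(printf '%s' "$title" |
                sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
            printf '<li><a href="%s/">%s</a> %s</li>\n' "$num" "$num" "$html"
        done < "$file"
        echo '</ul></body></html>'
    } > "$root/index.html"
}
```

Naming directories by number keeps the tree stable even if a title is later reworded; only the README and the index need to change.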

Trouble-ticket system

I looked at several trouble-ticket and user-request systems, and they're dandy for collaboration between help desk people. Unfortunately, they don't really do the trick for organizing my day-to-day information. Here are some of the more interesting packages and papers I saw:

The one that seems to have improved the most is Roundup. Its features include:

What Seems to Work So Far

Nothing was quite right, so I invented my own system.

Cherry-picking GTD

The idea of doing a complete brain dump and storing everything in a trusted location was probably the most useful thing I learned from the Getting Things Done book.

My job as a system administrator doesn't change every day, but it's much easier to keep track of things via date rather than via subject. I tend to remember things in time order, so I finally stopped trying to change the way I work to fit some hierarchy. Instead, I made a directory structure on the machine to match my work habits.

I have a top directory cleverly named notebook with subdirectories in the form yyyy/mmdd, so every day has its own folder (Figure 2).

Figure 2. Every day has its own folder

Someone might ask me, "Remember that thing we broke last Wednesday, and then we fixed it on Thursday before anyone noticed?" (See Figure 3.)

Figure 3. Finding a file by date
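Because each folder name is just a formatted date, the path for "last Wednesday" can be computed rather than remembered. A minimal sketch, assuming GNU date (the mknbdir script below has the same requirement) and the ~/notebook/yyyy/mmdd layout; nbdir is a hypothetical name:

```shell
#!/bin/sh
# nbdir (hypothetical helper) [DATESPEC]
# Print the notebook folder for any date GNU date understands, such as
# "last wednesday" or "2005-12-14"; with no argument, today's folder.
nbdir() {
    spec=${1:-today}
    d=$(date -d "$spec" +%Y/%m%d) || return 1
    printf '%s\n' "$HOME/notebook/$d"
}
```

For example, ls "$(nbdir 'last wednesday')" lists whatever was saved that day.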

Make sure to index the files you use and modify most often for rapid retrieval. grep is fine for doing a quick-and-dirty search through a few files, but it doesn't work nearly as well for anything larger.
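To see why an index scales better than grep, here is a toy inverted index in shell: each file is read once to record word/file pairs, and later queries scan only that one index file instead of every document. It illustrates the idea under the assumption of plain-text input; it is not a replacement for a real indexer such as SWISH-E, and mkindex and lookup are made-up names.

```shell
#!/bin/sh
# Toy inverted index (illustration only; helper names are hypothetical).
# mkindex INDEX FILE...: record one "word<TAB>file" line per distinct word.
mkindex() {
    index=$1
    shift
    for f in "$@"; do
        # Break text into lowercase alphanumeric words, one per line.
        tr -cs '[:alnum:]' '\n' < "$f" |
            tr '[:upper:]' '[:lower:]' |
            sort -u |
            F=$f awk 'NF { print $0 "\t" ENVIRON["F"] }'
    done | sort > "$index"
}

# lookup INDEX WORD: print the files whose text contains WORD.
lookup() {
    w=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    awk -F '\t' -v w="$w" '$1 == w { print $2 }' "$1"
}
```

Rebuilding the index costs one pass over the files, but every lookup afterward touches a single file, which is the trade a real indexer makes on a much larger scale.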

Things That Started to Go Right

The most noticeable improvement was finally being able to find the things that happened on a given date.

Here's what convinced me that I was on the right track: I'd think of a problem, and a moment later a fix would occur to me that was consistent and doable. Most of the scripts and utilities I needed were either already available or took fewer than 20 minutes to write.

Where is today's stuff?

About 95 percent of the time, I'm dealing with stuff for yesterday, today, and the coming week. A few symbolic links made navigation pretty easy (Figure 4).

Figure 4. Symlinks for navigation

I can get to next Monday's folder by typing cd ~/monday at the prompt, no matter where I am in the filesystem. Because I use the Z shell, I can also get there by just typing monday. Here are the shell settings:

cdpath=(.. ~ )  # specify a search path for the cd command.
setopt autocd   # cd to a directory if it's the first word on the command line.
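Bash users can get much the same effect. A sketch for ~/.bashrc, assuming Bash 4.0 or later (where autocd first appeared). Listing . first keeps ordinary relative cd commands behaving as usual; leaving CDPATH unexported keeps it from changing how scripts behave.

```shell
# ~/.bashrc fragment: a Bash counterpart to the zsh settings above.
# Assumes Bash 4.0 or later for autocd.
CDPATH=.:..:$HOME      # search path for cd; deliberately not exported
shopt -s autocd        # a bare directory name on the command line runs cd
```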

How to do scheduling

Remind is a calendar and reminder program for Linux and most Unix systems, and it's made to order for this sort of thing. I've found two good articles on it:

Notebook directories

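A cron job that runs just after midnight every day makes the symlinks for today, yesterday, and the rest of the week, and then uses Remind to create a year's worth of notebook directories in advance: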

#!/bin/sh
# mknbdir: make notebook directory for the day.
# should be run just after midnight.
# make symbolic links for today, tomorrow, etc.

PATH=/usr/local/bin:/bin:/sbin:/usr/sbin:/usr/bin
export PATH

die () {
    echo "$@" >& 2
    exit 1
}

cd "$HOME" || die "$HOME: cannot cd"
top='notebook'
test -d "$top" || die "$top: dir not found"

#
# Handle pages for today and yesterday.
#

cur="`date +%Y/%m%d`"
test -d "$top/$cur" || mkdir -p "$top/$cur"
test -d "$top/$cur" || die "unable to make $cur"
test -L yesterday && rm yesterday
test -L today && rm today
ln -s $top/$cur today

#
# Handle pages for other dates.
# NOTE: "date" must have "-d" option for this to work.
#

other='yesterday tomorrow sunday monday tuesday wednesday
       thursday friday saturday'

for day in $other
do
    cur=`date -d "$day 03:00" +%Y/%m%d`
    test -d "$top/$cur" || mkdir -p "$top/$cur"
    test -d "$top/$cur" || die "unable to make $cur"
    test -L $day && rm $day
    ln -s $top/$cur $day
done

#
# Keep a year's worth in advance.
# NOTE: must have "remind" installed for this to work.
#

cd "$HOME/$top" || die "$top: cannot cd"

echo 'REM MSG daily' |
    remind -s+53 - |
    sed -e 's!^\(....\)/\(..\)/\(..\).*!\1/\2\3!' |
while read dir
do
    test -d $dir || mkdir -p $dir
done

exit 0

This script also illustrates using Remind to generate dates without having to worry about leap years and other calendar quirks. Figure 5 shows how to print this week plus the next 52 weeks.

Figure 5. Printing the next year of weeks
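On a machine without Remind, GNU date can do the same calendar arithmetic, leap years included. A sketch, assuming GNU date's -d option; nbdays is a hypothetical name, and unlike remind -s+53, which prints whole weeks, it just counts days from a start date. Doing the arithmetic in UTC keeps daylight-saving changes from shifting any date.

```shell
#!/bin/sh
# nbdays (hypothetical helper) START COUNT
# Print COUNT notebook paths (yyyy/mmdd), one per day, beginning with
# START (YYYY-MM-DD).  Requires GNU date for -d.
nbdays() {
    start=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # UTC has no DST, so "+ N days" always lands on the intended date.
        TZ=UTC date -d "$start + $i days" +%Y/%m%d || return 1
        i=$((i + 1))
    done
}

# Example: make a year's worth of folders under ~/notebook, starting today.
# nbdays "$(date +%Y-%m-%d)" 371 |
#     while read -r d; do mkdir -p "$HOME/notebook/$d"; done
```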

Daily agendas

Anything on my to-do list for today goes in a file called agenda under today's notebook directory. The a command simply runs Remind on today's agenda file (Figure 6).

Figure 6. Finding today's agenda

I've also written a slightly more advanced script to display your agenda for the next few days.
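That script isn't listed here, but the idea is easy to sketch. This hypothetical agendas function walks today's folder and the next few, assuming GNU date and the notebook/yyyy/mmdd layout, and prints each agenda file under a date heading. It shows the files as-is; running them through Remind, as the a command does, would also evaluate the REM lines.

```shell
#!/bin/sh
# agendas (hypothetical helper) [DAYS]
# Print today's agenda file and those for the following days, each under
# a yyyy/mmdd heading.  Assumes GNU date; NOTEBOOK overrides the default
# location.  Files are shown as-is, not run through Remind.
agendas() {
    days=${1:-3}
    nb=${NOTEBOOK:-$HOME/notebook}
    today=$(date +%Y-%m-%d)
    i=0
    while [ "$i" -lt "$days" ]; do
        d=$(TZ=UTC date -d "$today + $i days" +%Y/%m%d) || return 1
        if [ -f "$nb/$d/agenda" ]; then
            printf '== %s ==\n' "$d"
            cat "$nb/$d/agenda"
        fi
        i=$((i + 1))
    done
}
```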

The rem script allows me to edit or create new agendas quickly:

#!/bin/ksh
# rem: edits the agenda file in today's notebook directory
# (default), or one entered on the command line.  If the
# agenda file doesn't exist, a new one is created.

PATH=/bin:/usr/bin:/usr/local/bin
: ${EDITOR="/usr/bin/vi"}
export PATH EDITOR
umask 022

afile=$HOME/today/agenda

for ac_option
do
    case "$ac_option" in
        -*) ;;
        *)  afile="$ac_option" ;;
    esac
done

test -e "$afile" || echo "REM MSG new agenda" > "$afile"
exec $EDITOR "$afile"
exit 0

Timed pop-up reminders

Suppose that I have a meeting at 2 p.m. in room 205, and I'd like a reminder to show up on my screen 5 minutes before it starts. Figure 7 shows all that it takes. A small program called showcal runs once every minute looking for a file called hhmm.rem in today's notebook directory. If it finds one, it runs the contents through Remind and then to another program that handles screen pop-ups.

Figure 7. Creating a pop-up reminder

What if I want a reminder every day at a certain time? I just put the same type of file in the ~/.calendar directory instead (Figure 8).

Figure 8. Adding a daily reminder

This reminder will show up at 8:35 p.m. to ensure that I don't miss anything crucial if I happen to be at work late on Sunday or Wednesday. It won't show up on any other days.
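showcal itself isn't listed, so here is a hypothetical reconstruction of the behavior described above: check today's folder and ~/.calendar for a file named after the current time, run any match through Remind, and pipe the output to a pop-up program. The defaults are assumptions: remind -q -h (don't queue timed reminders, and print nothing when no reminder is triggered, so a Sunday/Wednesday reminder stays quiet on other days) and xmessage as the pop-up. REMIND_CMD and POPUP_CMD override them.

```shell
#!/bin/sh
# showcal (hypothetical reconstruction) [HHMM]
# Looks for HHMM.rem in today's notebook folder and in ~/.calendar, runs
# each one through Remind, and pops up any output.
# Assumptions: "remind -q -h" prints nothing when no reminder triggers;
# xmessage stands in for the article's pop-up program.
showcal() {
    hhmm=${1:-$(date +%H%M)}
    remind_cmd=${REMIND_CMD:-remind -q -h}
    popup_cmd=${POPUP_CMD:-xmessage -file -}
    for f in "$HOME/today/$hhmm.rem" "$HOME/.calendar/$hhmm.rem"; do
        [ -f "$f" ] || continue
        msg=$($remind_cmd "$f") || continue
        [ -n "$msg" ] || continue    # nothing triggered today
        printf '%s\n' "$msg" | $popup_cmd
    done
}
```

To use it from cron, put the function in a script that ends with a plain showcal call and run that script every minute with DISPLAY set (for example, DISPLAY=:0). For the 2 p.m. meeting above, the file would be named 1355.rem.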

How to record progress

Search, Don't Categorize

It's a lot easier to break up my workstation text files by frequency of updates, and then set up searching appropriately:

I installed SWISH-E to index anything in text format. That plus one or two shell scripts handles my search needs.
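Splitting files by how often they change lets the busy set be reindexed often and cheaply while the rest is reindexed rarely. A sketch using find's -mtime test; the bucket names, the 1- and 30-day cutoffs, and the bucketfiles name are assumptions, not the author's settings. Each list can then be handed to whatever indexer you use.

```shell
#!/bin/sh
# bucketfiles (hypothetical helper) DIR OUTDIR
# Write three lists of regular files under DIR:
#   OUTDIR/daily    modified within the last 24 hours
#   OUTDIR/monthly  older than 24 hours but newer than 30 days
#   OUTDIR/archive  30 days old or older
bucketfiles() {
    dir=$1
    out=$2
    mkdir -p "$out" || return 1
    find "$dir" -type f -mtime -1 > "$out/daily"
    find "$dir" -type f ! -mtime -1 -mtime -30 > "$out/monthly"
    find "$dir" -type f ! -mtime -30 > "$out/archive"
}
```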

Other Things That Help

That's not all; I've developed several other tricks.

A good shell

I used tcsh and Bash for several years. They're fine programs, but I haven't found anything to match the flexibility of the Z shell:

A good window manager

I started using the FVWM window manager in 1999, and really liked it when configured using Eric Raymond's Big Blue-Steel Desktop. However, I wanted to make better use of the open desktop space I had, so I moved to the IceWM window manager a few months ago.

I have four virtual desktops for my basic work environment:

The function keys F1-F4 take me to desktops 1-4, respectively. F5 locks my keyboard. F6 lets me either restart IceWM or log out.

IceWM takes up very little screen real estate. I have one toolbar at the bottom of my screen that is barely 3/8 inch high, so I have plenty of room for applications (Figure 12).

Figure 12. My IceWM toolbar

The leftmost button (the IceWM logo) brings up a programs menu. Button 2 (which looks like an underscore) dismisses all the applications on your current desktop. It's a toggle; click it again and they all come back. Button 3 (which looks like overlapping windows) gives a one-click menu to get to any open application on any workspace. Button 4 (which looks like a monitor) opens a tiny Xterm window. Button 5 (Web) starts Mozilla in workspace 2; button 6 (Xpdf) starts Xpdf in workspace 3; button 7 (Lock) runs xlock; button 8 (Leo) runs the Leo outliner in the current workspace; and button 9 (PrtScrn) uses the ImageMagick program import to save a screenshot to the file $HOME/prn/screendump.png. Finally, the buttons containing numbers take me directly to a respective workspace.

My IceWM setup files are $HOME/.icewm/keys, $HOME/.icewm/menu, $HOME/.icewm/preferences, $HOME/.icewm/theme, $HOME/.icewm/toolbar, and $HOME/.icewm/winoptions.

My general X-windows setup files are $HOME/.Xdefaults, $HOME/.xinitrc, $HOME/.xmodmaprc, and $HOME/.xsession.

A decent browser

You need a web browser that does something besides act like a virus delivery service. Mozilla or Firefox will do the trick; I prefer Mozilla because on Firefox I had too much trouble trying to get the keyboard working the way I liked.

My only complaint was that the backspace key took me to the previous page instead of simply scrolling the current page back by one screen. I added lines 34 and 35 to /usr/X11R6/lib/mozilla/res/builtin/htmlBindings.xml to remap the backspace and delete keys accordingly:

 1  <?xml version="1.0"?>
...
32  <handler event="keypress" keycode="VK_LEFT" command="cmd_scrollLeft" />
33  <handler event="keypress" keycode="VK_RIGHT" command="cmd_scrollRight" />
34  <handler event="keypress" keycode="VK_BACK" command="cmd_scrollPageUp" />
35  <handler event="keypress" keycode="VK_DELETE" command="cmd_scrollPageUp" />
36  <handler event="keypress" keycode="VK_HOME" command="cmd_scrollTop"/>
37  <handler event="keypress" keycode="VK_END" command="cmd_scrollBottom"/>
38  <handler event="keypress" key="x" command="cmd_cut" modifiers="accel"/>
39  <handler event="keypress" key="c" command="cmd_copy" modifiers="accel"/>
...

This doesn't seem to work with Mozilla-1.7.12, unfortunately. You have to unzip the file toolkit.jar in /usr/X11R6/lib/mozilla/chrome, go into the content/ directory, modify the platformHTMLBindings.xml file to hold the key mappings you like, and create a new toolkit.jar file holding the modified files.

Mozilla has a nice list of keyboard shortcuts available.

My Mozilla setup files are $HOME/.mozilla/userid/odd-string/user.js, $HOME/.mozilla/userid/odd-string/prefs.js, $HOME/.mozilla/userid/odd-string/chrome/userChrome.css, and $HOME/.mozilla/userid/odd-string/chrome/userContent.css.

Configurable mail delivery

I use qmail as my message transfer agent because it's secure and extensible. My user ID is vogelke, so I own any address starting with vogelke- and can filter mail to that address accordingly.

I have several qmail files in my home directory. The basic .qmail file is for any mail addressed to me without any dash extensions on the end. Each message runs through the procmail program for further filtering (Figure 13), and a copy of that same message gets appended to my backup/vogelke file.

Figure 13. Filtering incoming email

When I want to save an outgoing message, I send a BCC to vogelke-bcc, which appends a copy of the message as seen by the mail delivery software to my mail/sentmail file (Figure 14). This way, the message in sentmail is identical to the one seen by the recipient, in case I need to do any troubleshooting. Some mail readers can save a copy of your outgoing mail, but they don't do such a good job saving the headers you might need in case there's a delivery problem.

Figure 14. Saving a copy of a message I've sent
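Figures 13 and 14 show the real files. As a rough approximation of their shape, this hypothetical mkqmail function writes a .qmail and a .qmail-bcc into a directory you name. It relies on dot-qmail conventions: a line starting with | is a command that receives the message, and a line starting with . or / names an mbox file to append to. preline is the qmail helper that prepends the envelope lines (including the mbox From_ line) that procmail expects; the exact lines here are assumptions.

```shell
#!/bin/sh
# mkqmail (hypothetical helper) DIR
# Write approximate dot-qmail files into DIR (normally $HOME).
mkqmail() {
    dir=$1
    # Plain address: filter through procmail, keep a raw mbox backup too.
    printf '%s\n' '|preline procmail' './backup/vogelke' > "$dir/.qmail"
    # vogelke-bcc: append the message, headers and all, to mail/sentmail.
    printf '%s\n' './mail/sentmail' > "$dir/.qmail-bcc"
    chmod 600 "$dir/.qmail" "$dir/.qmail-bcc"
}
```

Every other extension (vogelke-blog, vogelke-header, vogelke-xnote) follows the same pattern: one .qmail-extension file whose lines say where that address's mail goes.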

If I want to send mail to someone and have a copy of that message posted to my weblog, I send a BCC to vogelke-blog. This sends a copy of the message to the newpost program (Figure 15), which readies it for web display.

Figure 15. A weblogging alias

This example is a little more complex. I like to keep a record of the messages I've sent without having to keep copies of every single message, so I always send a BCC to vogelke-header. This sends a copy of the outgoing message through the formail program, which extracts the most useful lines from the header and appends them to a file in my mail folder called SENT.year-week (Figure 16).

Figure 16. Extracting headers from sent messages

This provides me with a record of everyone I've mailed, broken out by week (Figure 17).

Figure 17. Everyone I've mailed
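The author does this with formail. Where formail isn't installed, awk can produce a similar summary from a message on standard input. A sketch, assuming Date, To, Cc, and Subject are the useful lines (the article doesn't list them) and that the year-week suffix is the ISO year and week from date +%G-%V; folded continuation lines stay with their header. sentheaders is a hypothetical name.

```shell
#!/bin/sh
# sentheaders (hypothetical helper): read one message on stdin and print
# its Date, To, Cc, and Subject header lines, including folded lines.
sentheaders() {
    awk '
        /^$/ { exit }                        # blank line ends the header
        /^[ \t]/ { if (keep) print; next }   # continuation of previous field
        { keep = (tolower($0) ~ /^(date|to|cc|subject):/) }
        keep { print }
    '
}

# Possible delivery command (an assumption), appending one week per file:
# sentheaders >> "$HOME/mail/SENT.$(date +%G-%V)"
```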

Mail to vogelke-xnote goes through a program called xnote (Figure 18), which displays pop-up messages. Sending something to vogelke-xnote with the subject line Wake up causes a pop-up to appear (Figure 19). This is very useful, because any host that can send you mail can pop a message on your screen. The servers I maintain are all set up to send mail to me in this way whenever they shut down nicely and reboot, whenever hourly checks indicate that disk space is getting tight, and the like.

Figure 18. An alias to pop up messages

Figure 19. An emailed pop-up

The xnote script is basically a wrapper for another program called xalarm.
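Since xnote isn't listed, here is a hypothetical stand-in: it reads a message on standard input, pulls out the Subject, and shows it with a pop-up command. The article's version wraps xalarm; this sketch defaults to xmessage instead (an assumption) and lets XNOTE_POPUP name any other program. It reads the whole message so the delivering program never sees a broken pipe.

```shell
#!/bin/sh
# xnote (hypothetical stand-in): pop up the Subject of the message on stdin.
# The pop-up program is an assumption; override it with XNOTE_POPUP.
xnote() {
    popup=${XNOTE_POPUP:-xmessage}
    subject=$(awk '
        /^$/ { in_body = 1 }                          # header ends here
        !in_body && !found && tolower($0) ~ /^subject:/ {
            sub(/^[^:]*:[ \t]*/, "")                  # strip "Subject:"
            print
            found = 1
        }
    ')
    [ -n "$subject" ] || subject='(no subject)'
    $popup "$subject"
}
```

As with any X program started from mail delivery, DISPLAY has to be set for the pop-up to reach the screen.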

A good email reader

Mutt is a fast, customizable mail reader with lots of features:

My setup is almost identical to Dave's mutt config. Figure 20 shows message 1 of 27 from my inbox. My screen displays 47 lines at a time, and most of my email messages are shorter than that, so I rarely have to scroll through multiple pages to decide whether I need to keep or act on a message.

Figure 20. Reading my inbox

Templates

Most wheels aren't worth reinventing. If you find yourself constantly rewriting the same code snippets or email, it's time to pick a language and a template setup.

Perl and the Text::Template package suit me fine, but if push comes to shove, any decent scripting language with variable substitution can serve as a template engine.
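To show that even plain shell will do in a pinch, here is a minimal placeholder substitution: @NAME@ markers in a template file are replaced by values given as NAME=value arguments. The marker syntax and the fill name are assumptions made for this sketch, not Text::Template's conventions; names should be plain alphanumerics, and values may not contain newlines.

```shell
#!/bin/sh
# fill (hypothetical helper) TEMPLATE NAME=value ...
# Print TEMPLATE with every @NAME@ replaced by its value.
fill() {
    template=$1
    shift
    script=
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        # Escape the characters that are special in a sed replacement.
        value=$(printf '%s' "$value" | sed -e 's/[\/&]/\\&/g')
        script="$script
s/@$name@/$value/g"
    done
    sed -e "$script" "$template"
}
```

For example, fill reply.txt NAME=Dave TICKET=1234 would turn a stock reply into a personalized one without retyping it.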

Here are some of the better articles I've seen on choosing (or writing) a template system.

Code fragments or cliches

If you spend more than five minutes figuring out how some language function or WordSmasher-2000 utility works, write it down. I have a directory called ~/cliche that holds things that held me up, things I don't want to lose, or things I don't feel like typing in again (Figure 21).

Figure 21. Snippets I'll use again

I called it cliche because most of the snippets are the moral equivalent of "I'm just here for the team"; people expect them, and I'll probably end up using them sooner or later. For example, the file ~/cliche/ASCII/alphabet simply keeps me from having to stumble all over the keyboard if I need to loop through the alphabet for some reason:

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v w x y z

The file ~/cliche/perl/yesterday is a four-line function in the Perl language providing the time 24 hours ago, and so on.
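For comparison, here is the same cliche as a shell function, assuming a date that supports +%s (GNU and BSD date both do). It prints the time 24 hours ago as seconds since the epoch; around a daylight-saving change, that isn't necessarily the same calendar day as "yesterday."

```shell
#!/bin/sh
# yesterday: print the Unix time (seconds since the epoch) 24 hours ago.
yesterday() {
    echo $(( $(date +%s) - 86400 ))
}

# GNU date can then format it, e.g.:  date -d "@$(yesterday)" +%Y/%m%d
```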

Useful Links

Karl Vogel is a Solaris/BSD system administrator at Wright-Patterson Air Force Base, Ohio.



Copyright © 2009 O'Reilly Media, Inc.