Jeromy Anglim's Blog: Psychology and Statistics


Monday, March 8, 2010

Choosing an Auto Generation Pattern for BibTeX Keys in JabRef

This post discusses how to choose a default pattern for the BibTeX key generator in JabRef.

THE CONTEXT
If you haven't already heard, JabRef is an open source reference manager built on Java, and BibTeX is a file format for storing references. I've recently been making the transition from Word to LaTeX and from EndNote to BibTeX and JabRef (although you can also use JabRef with MS Word).

Once I had imported my references from EndNote into JabRef, I needed to generate BibTeX keys. These keys are what link citations in a document to references in the BibTeX database. JabRef makes this task easier with an automatic key generator, which builds each key from fields of the reference. A typical key pattern is [auth][year], which would create a key like "Smith2000" for an article by Smith published in 2000.

However, this presented a challenge. Once created, keys should not be changed. If the choice of key mattered and I chose a poor key, it would be a big issue once I had many documents integrated with many citations all using these key formats. The key can also be used to link article PDFs if you rename your files to match the key. Thus, I decided to think about whether the choice of key mattered and if so, what makes a good key?

The remainder of this post discusses my thoughts and sets out the auto-key generation pattern that I adopted.

DECIDING ON A KEY

A good key is: 
Unique: Within the database a key must be unique in order to link citations to the appropriate reference. While JabRef automatically appends letters to duplicates in the database (e.g., Smith2000a, Smith2000b), if you need to combine two databases from two different researchers, it would be better if there were no identical keys.
Short: Short keys are easier to type. They also take up less space when cited in LaTeX source.
Readable in LaTeX: If the key suggests the reference, this can make the LaTeX text more readable. It also means that if the database is lost, damaged, or not available, a reasonable guess can be made about what was the intended citation, at least if you know the area of research.
Excludes problematic characters: Certain characters would prevent the key from being used in LaTeX or prevent it from being used as a file name. Using only letters and numbers and starting with a letter seems like a safe option.
Never changes: A good key never changes. If it were to change, every citation in a document and every file name that uses the key would also need to be updated.
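
Some of these criteria can be checked mechanically. The following is a minimal Python sketch, not anything JabRef provides: the function names are hypothetical, the "safe" rule is simply the letters-and-digits, starting-with-a-letter convention suggested above, and the letter-suffix scheme only approximates how JabRef disambiguates duplicates.

```python
import re
import string

# Assumption: a "safe" key is letters and digits only, starting with a letter.
SAFE_KEY = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_safe_key(key):
    """Return True if the key uses only letters and digits and starts with a letter."""
    return SAFE_KEY.fullmatch(key) is not None


def make_unique(key, existing):
    """Append a, b, c, ... until the key does not collide with an existing one."""
    if key not in existing:
        return key
    for letter in string.ascii_lowercase:
        candidate = key + letter
        if candidate not in existing:
            return candidate
    raise ValueError("ran out of single-letter suffixes for " + key)


def clashing_keys(keys_a, keys_b):
    """Return the keys that appear in both databases, sorted alphabetically."""
    return sorted(set(keys_a) & set(keys_b))


print(is_safe_key("Smith2000JAP"))              # True
print(is_safe_key("Smith2000M&C"))              # False
print(make_unique("Smith2000", {"Smith2000"}))  # Smith2000a
print(clashing_keys(["Smith2000", "Jones1999"], ["Smith2000", "Lee2005"]))
```

Running clashing_keys on the key lists of two databases before merging them shows how many collisions a given pattern would produce, which is one way to compare candidate patterns.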

Decision:
In the end I adopted the following pattern:
[auth][year][journal:abbr]
This key consists of the first author's surname, the year, and the initials of the words in the journal name. It is of moderate length (perhaps 10 to 20 characters), very likely to be unique, and quite recognisable. It also represents how I file articles in my head. The choice may also reflect my training in APA format, which adopts an author-year style of citation. Thus, when speaking to others about an article, I might refer to it as "Smith 2000" or "Smith's 2000 JAP article".
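
To make the pattern concrete, here is a rough Python sketch of what [auth][year][journal:abbr] produces. It is an illustration only, not JabRef's implementation: the helper names are hypothetical, and it assumes that the abbr modifier keeps the first character of every whitespace-separated word, so that small words such as "of" contribute a letter too.

```python
import re


def first_author_surname(author_field):
    """Return the surname of the first author in a BibTeX-style author field."""
    # BibTeX separates authors with the word "and".
    first = author_field.split(" and ")[0].strip()
    if not first:
        return ""
    if "," in first:
        # "Smith, John" form: the surname precedes the comma.
        surname = first.split(",")[0]
    else:
        # "John Smith" form: assume the last word is the surname.
        surname = first.split()[-1]
    # Keep letters only so the surname is safe to use in a key.
    return re.sub(r"[^A-Za-z]", "", surname)


def abbreviate(text):
    """Keep the first character of each whitespace-separated word (assumed abbr rule)."""
    return "".join(word[0] for word in text.split())


def make_key(entry):
    """Build a key following the [auth][year][journal:abbr] pattern."""
    return (
        first_author_surname(entry["author"])
        + str(entry["year"])
        + abbreviate(entry["journal"])
    )


entry = {
    "author": "Smith, John and Jones, Mary",
    "year": 2000,
    "journal": "Journal of Applied Psychology",
}
print(make_key(entry))  # Smith2000JoAP
```

Under that assumption a journal name such as "Memory & Cognition" abbreviates to "M&C", so any symbol that starts a word in the journal name ends up in the key.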

I also had to overcome one extra issue:

Avoiding Problematic Characters



Problem: I found that some of my journal names included ampersands (&) (e.g., Memory & Cognition; Journal of Personality & Social Psychology). This led to BibTeX keys containing ampersands, which is problematic for LaTeX.

Solution: I changed the journal names to use "and" instead of "&". For example, "Memory and Cognition" replaced "Memory & Cognition". I also filled in the regular expression replacement fields in JabRef's key generator preferences:
Replace (regular expression): \&
by: 
i.e., the expression to be replaced was "&" preceded by an escape character (a backslash), and the "by" field was left blank, indicating that any such character should be removed from the key.
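
The effect of that replacement can be reproduced outside JabRef. The following is a small sketch using Python's re module, with a hypothetical function name; JabRef itself is written in Java, so its regular expression engine is not identical to Python's, but an escaped ampersand means the same thing in both.

```python
import re


def strip_ampersands(key):
    """Remove every ampersand from a generated key (replace "\\&" by nothing)."""
    return re.sub(r"\&", "", key)


print(strip_ampersands("Smith2000M&C"))  # Smith2000MC
```

Keys without an ampersand pass through unchanged, so the replacement is safe to apply to every generated key.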

Here's a screenshot of my BibTeX key generator preferences:

What happens if the details of the reference change?
Once a key has been generated it should almost never change. Thus, even if a field used to generate the key was incomplete or contained a typographical error, the key should not be regenerated. Similarly, when references are imported with keys that follow a different pattern, those keys should not be altered.
In short, I think it's better not to have to think about it too much. In JabRef I also select "Do not overwrite existing keys" to further guard against losing keys.




LOOKING UP THE KEY
I have JabRef open while I edit my documents. My procedure:
1. Alt + Tab to bring up JabRef
2. Control + F (Find) and type identifying information, e.g., author and year
3. Control + Shift + E (focus entry table)
4. Down key (to highlight the first reference)
5. Control + Shift + K (to copy the BibTeX key), or Control + K to include the \cite{...} text
6. Alt + Tab to return to the text editor
7. Paste the citation into the text editor

I believe some editors have even better integration with JabRef than the one that I use.