Constructing a list of valid species from Catalog of Fishes 1: The Emacs approach

One of the most common tasks in many studies is preparing a list of accepted names. This can be hard for groups that lack centralized, curated taxonomic resources, but fortunately for fishes we have Eschmeyer’s Catalog of Fishes, hosted by the California Academy of Sciences (CAS). It is by far the best taxonomic database in existence, and even other well-known ones such as FishBase tend to follow it, because it is manually curated by specialists. In contrast to FishBase, it does not allow programmatic access through APIs or packages, so it is often necessary to fetch and hand-pick the information of interest.

Below is an Emacs recipe for fetching a list of accepted names from a taxonomic search, so that we can easily build those lists regardless of the size of the taxonomic group. The only requirement is a working install of Emacs (tested in Emacs 24 and 25 under Linux). Remember that semicolons (;) start comments in Emacs Lisp, Emacs’ programming language. The recipe follows common Emacs conventions: C = Control, M = Meta (usually Alt), S = Shift, and RET = the Return key; these letters combined with other keys represent key combinations or keystrokes. For instance, C-S-x means pressing x while holding Control and Shift, and M-w means pressing w while holding Alt.
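If a key sequence in the recipe does not behave as described, it may be bound to something else in your configuration. Besides asking interactively with C-h k, the bindings can be checked with a small Emacs Lisp sketch like the one below. The function name my/describe-recipe-keys is hypothetical (it is not part of Emacs); kbd and key-binding are standard functions.

```elisp
;; Hypothetical helper: report which command each key sequence runs
;; in the current buffer.  It only queries bindings, it changes nothing.
(defun my/describe-recipe-keys (key-strings)
  "Return an alist mapping each of KEY-STRINGS to the command it runs."
  (mapcar (lambda (keys)
            ;; `kbd' turns the "C-x h" notation into an internal key sequence;
            ;; `key-binding' looks that sequence up in the active keymaps.
            (cons keys (key-binding (kbd keys))))
          key-strings))

;; Evaluate in *scratch* with C-j to see the result for the recipe's keys.
(my/describe-recipe-keys '("C-s" "C-x h" "M-w" "C-y" "C-M-%"))
```

In an unmodified configuration these should resolve to isearch-forward, mark-whole-buffer, kill-ring-save, yank, and query-replace-regexp; anything else means the key has been rebound, and the M-x form of the command can be used instead.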

Steps to obtain a list of valid species from CAS:


; open eww and search the CAS catalog
M-x eww RET CAS catalog of fishes RET

; incremental search and then click into the link
C-s Search the catalog

; go to the search box and look for the genus of 
; interest
Chaetostoma

; select all the text in the page
C-x h

; copy the selection
M-w

; paste (yank) it to another non-read-only 
; buffer, e.g., *scratch*
C-y

; back to beginning of the buffer
C-<home> or M-x beginning-of-buffer

; all semicolons are comments in elisp! 
; replace them with commas
C-M-% ; RET , RET !

; a bunch of multiple blank characters 
; are messing with the info, reduce all 
; of these to single spaces
C-M-% [[:blank:]]+ RET ' ' (single space) RET !

; be sure that no further spaces are present
C-M-% \s-+ RET ' ' (single space) RET !

; back to beginning of the buffer
C-<home> or M-x beginning-of-buffer

; the field of interest is "Current status"; 
; to homogenize the text, replace all 
; newlines with spaces
C-M-% C-q C-j RET ' ' (single space) RET !

; back to beginning of the buffer
C-<home> or M-x beginning-of-buffer

; be sure that no further spaces are present
C-M-% \s-+ RET ' ' (single space) RET !

; insert newlines just before the regex 
; "Current status"
C-M-% 'Current status' RET C-q C-j 'Current status' RET !

; back to beginning of the buffer
C-<home> or M-x beginning-of-buffer

; be sure that no further spaces are present
C-M-% \s-+ RET ' ' (single space) RET !

; back to beginning of the buffer
C-<home> or M-x beginning-of-buffer

; keep only lines matching "Current status: Valid as"; 
; this drops the single header line and every record 
; that is not currently valid
M-x keep-lines RET "Current status: Valid as" RET

; remove the text before the species name, as it is 
; uniform across records, by replacing it with "nothing"
C-M-% "Current status: Valid as " RET RET !

; now remove starting with "Loricariidae" until end 
; of line replacing the regexp with "nothing"
C-M-% Loricariidae.+$ RET RET !

; first select all text and then sort the lines
C-x h
M-x sort-lines

; keep only lines with Chaetostoma, as some names 
; were once associated with Chaetostomus or Chaetostoma 
; but no longer are
M-x keep-lines RET "Chaetostoma" RET
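
For repeated use, the keystrokes above can be collected into a single command. What follows is a minimal sketch, not a tested replacement for the recipe: the names my/cof-replace-all and my/cof-valid-names and the arguments GENUS and FAMILY are hypothetical, and it assumes the search results have already been pasted into a writable buffer such as *scratch*. It folds the repeated whitespace and newline passes into one replacement and uses a re-search-forward loop, since the interactive replace commands are meant for keyboard use.

```elisp
;; Hypothetical helper names (my/cof-...); nothing here ships with Emacs.

(defun my/cof-replace-all (regexp replacement)
  "Replace every match of REGEXP in the buffer with the literal REPLACEMENT."
  (goto-char (point-min))
  (while (re-search-forward regexp nil t)
    (replace-match replacement t t)))

(defun my/cof-valid-names (genus family)
  "Reduce pasted Catalog of Fishes results to a sorted list of valid names.
Works in place on the current buffer.  GENUS is the genus that was
searched for and FAMILY the family name that follows each valid name."
  (interactive "sGenus: \nsFamily: ")
  ;; Match case exactly, as the interactive commands do for patterns
  ;; that contain uppercase letters.
  (let ((case-fold-search nil))
    ;; Semicolons become commas.
    (my/cof-replace-all ";" ",")
    ;; Collapse every run of blanks and newlines into a single space.
    (my/cof-replace-all "[[:blank:][:space:]\n]+" " ")
    ;; Start a new line just before each "Current status".
    (my/cof-replace-all "Current status" "\nCurrent status")
    ;; Keep only currently valid records; this also drops the header line.
    (goto-char (point-min))
    (keep-lines "Current status: Valid as")
    ;; Strip the prefix that is uniform across records.
    (my/cof-replace-all "Current status: Valid as " "")
    ;; Remove everything from the family name to the end of each line.
    (my/cof-replace-all (concat (regexp-quote family) ".+$") "")
    ;; Sort, then keep only names still placed in GENUS.
    (sort-lines nil (point-min) (point-max))
    (goto-char (point-min))
    (keep-lines (regexp-quote genus))))
```

After evaluating both definitions, M-x my/cof-valid-names RET Chaetostoma RET Loricariidae RET in the buffer holding the pasted text should leave one valid name per line. Because it edits the buffer in place, it is safer to run it on a copy of the pasted text.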

Author: gaballench

Ichthyologist with an interest in computing and music, and a supporter of open source initiatives.
