Referencing zoological name authorships in LaTeX


During the preparation of a manuscript using \LaTeX, I came across an issue with references for the authors of taxonomic names (e.g., species and genus names). At first I wrote taxonomic authors as plain text, hoping to solve the cross-referencing later by citing the same works elsewhere in the text. However, some references only needed to be cited at the name and nowhere else, and this pushed me to look for a solution. I initially considered coding a variant of the \citet and \citep commands of the natbib package, which is very useful for handling bibliographies and in-text references.

The problem with referencing the authors of names is that the International Code of Zoological Nomenclature (ICZN) has specific rules for citing such information in taxonomic works, and to the uninitiated these citations may look like regular in-text references. Also, several pre-ICZN works handled these cases inconsistently, as there was no Code to rule them all (fortunately, now we do have one).

This approach is specific to zoological nomenclature and may not be directly useful for botanical nomenclature, which uses author abbreviations (e.g., Planch. for Planchon, or even L. for Linnaeus). The instructions below would need modification to render the author in abbreviated form while retaining the full name in the bibliography, and I do not know whether that is possible at all.

Please note that all instances of Author1990 are examples of a reference key that you need to define in your BibTeX bibliography file (in this example, manuscript.bib). A typical BibTeX entry looks like this:

@article{Author1990,
author = {Author, Bob A},
journal = {Tertiary Research},
number = {4},
pages = {131--137},
title = {{Fossils from the Bracklesham Group exposed in the M27 Motorway excavations, Southampton, Hampshire}},
volume = {12},
year = {1990}
}

Regular citations

natbib formats in-text references differently depending on whether the author is part of the sentence (textual citation) or both author and year belong inside the parenthetical citation:

  • Reference between parentheses: \citep{Author1990}. Example: Genus A resembles genus B (Author, 1990).
  • Reference without parentheses: \citet{Author1990}. Example: Author (1990) suggested that this genus resembles the other genus.
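A minimal compilable sketch of the two regular citation forms, assuming that manuscript.bib contains a complete entry with the key Author1990 and using the natbib-compatible plainnat style (swapped in here for illustration; the example document later in this post uses apa). Compile with pdflatex, bibtex, pdflatex, pdflatex.

```latex
\documentclass{article}
\usepackage{natbib} % author-year citations with round parentheses by default

\begin{document}
% Textual citation: the author is part of the sentence
\citet{Author1990} suggested that this genus resembles the other genus.
% renders as: Author (1990) suggested that ...

% Parenthetical citation: author and year both inside the parentheses
Genus A resembles genus B \citep{Author1990}.
% renders as: Genus A resembles genus B (Author, 1990).

\bibliographystyle{plainnat}
\bibliography{manuscript} % reads manuscript.bib
\end{document}
```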

Taxonomic authorship

The ICZN (Chapter 11, Article 51, p. 54) considers two cases: either we cite the author of the name but not the year of publication, or we cite both the author and the year of publication. In the latter case, the author name and the year must be separated by a comma and a space, and there must be no mark or character between the specific epithet and the author. The style of citation also depends on whether the name is used as first proposed by the author, that is, a species combined with its original genus, or whether the specific epithet has been moved to a genus other than the original one. In the first case, the author (or author and year) is never put between parentheses (Article 51.2), whereas in the latter the parentheses around the author (or, again, author and year) are mandatory (Article 51.3). This latter usage confuses readers unaware of the citation of authors in zoological nomenclature, who tend to read these as bibliographical citations, which are put between parentheses everywhere. As is clear from the Code, the parentheses indicate that the specific name has been moved from its original genus, that is, someone proposed a new combination with another genus.

The proper way to get the desired, ICZN-compliant results is as follows:

  • Only the author without year, as originally proposed (sensu Art. 51.1): \citeauthor*{Author1990}. Please note that for multi-authored names the Code requires all author names to be written in full (i.e., without abbreviating to et al. for three or more authors), which is what the starred form does. For one or two authors there is no difference between \citeauthor{Author1990} and \citeauthor*{Author1990}; both produce the author name without parentheses or year.
  • Both author and year, as originally proposed (sensu Art. 51.2): \citealp{Author1990} or \citealp*{Author1990}. Here the author and year are produced without parentheses, hence the "al" in the command. These are identical to \citep{Author1990} and \citep*{Author1990} except that they omit the parentheses.
  • Only the author without year, for a name that has been recombined or moved to a genus other than the original one (sensu Arts. 51.1 and 51.3): use \citeauthor{Author1990} or \citeauthor*{Author1990}, but place the parentheses around the command manually. This renders the output of \citeauthor between parentheses. Example: (\citeauthor*{Author1990}) produces (Author) instead of just Author.
  • Both author and year, recombined or moved to a genus other than the original one (sensu Art. 51.3): \citealp{Author1990} or \citealp*{Author1990}, again with manual parentheses around the command. Example: (\citealp*{Author1990}) produces (Author, 1990) instead of just Author, 1990.
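Since I first thought of coding variants of the natbib commands, here is a sketch of how the four cases could be wrapped into macros. The macro names (\taxauth, \taxauthyr, \taxrecauth, \taxrecauthyr) are hypothetical and not part of natbib; they only combine the existing \citeauthor* and \citealp* with manual parentheses. The keys Agassiz1843 and Leidy1877 are assumed to exist in manuscript.bib, as in the example document below, and plainnat is swapped in for illustration.

```latex
\documentclass{article}
\usepackage{natbib}

% Hypothetical convenience macros (NOT part of natbib) for ICZN-style
% authorships; the starred natbib forms always list every author.
% Original combination, author only (e.g., Agassiz)
\newcommand{\taxauth}[1]{\citeauthor*{#1}}
% Original combination, author and year (e.g., Agassiz, 1843)
\newcommand{\taxauthyr}[1]{\citealp*{#1}}
% New combination, author only, in parentheses (e.g., (Leidy))
\newcommand{\taxrecauth}[1]{(\citeauthor*{#1})}
% New combination, author and year, in parentheses (e.g., (Leidy, 1877))
\newcommand{\taxrecauthyr}[1]{(\citealp*{#1})}

\begin{document}
\textit{Sphyraena bolcensis} \taxauthyr{Agassiz1843} and
\textit{Sphyraenodus speciosus} \taxrecauthyr{Leidy1877}.

\bibliographystyle{plainnat}
\bibliography{manuscript}
\end{document}
```

Wrapping the commands this way keeps the choice between original and new combinations in one place, so switching a name between the two forms only means changing the macro name.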

Note that with this approach you do not need to worry about journals that require every name authorship in a taxonomic work to be cited in the references (e.g., Neotropical Ichthyology). You can produce a document with cross-references and formatted authorship citations without inserting these references by hand (which, as far as I know, is impossible in \LaTeX anyway). Whenever you use any of the natbib commands, the reference is added to the bibliography and formatted automatically.

A hypothetical example

The following \LaTeX code generates a PDF showing the regular citations and the taxonomic references, handling the use cases discussed above.


\documentclass{article}
\usepackage{natbib} % this package manages the citations and provides the \citeX commands
\usepackage[utf8]{inputenc} % allows UTF-8 input (e.g., São Paulo)
\usepackage[colorlinks=true,citecolor=blue]{hyperref} % colors the references and links
\usepackage[noblocks]{authblk} % for the affiliation block in authorships

\author{Gustavo A. Ballen}
\affil{Museu de Zoologia da Universidade de São Paulo}
\title{On the genus \textit{Sphyraena}}

\begin{document}
\maketitle

This article reviews the nomenclatural status of \textit{Sphyraena bolcensis} 
\citealp{Agassiz1843}, or alternatively, of \textit{Sphyraena bolcensis} 
\citeauthor{Agassiz1843} without the year. A species originally described in 
\textit{Sphyraena} but currently removed from it by \citet{Woodward1901} is 
\textit{Sphyraenodus speciosus} (\citeauthor{Leidy1877}), or with the year, 
\textit{Sphyraenodus speciosus} (\citealp{Leidy1877}).

These usages are compliant with the relevant articles of the \citet{ICZN1999}. This 
kind of bibliographical reference has been used erroneously in the past, especially 
in pre-ICZN works \citep[e.g.,][p.]{Rapp1946}. In the latter example, 
\textit{Sphyraenodus silovianus} (\citeauthor{Cope1875}) should have been written 
as \textit{Sphyraenodus silovianus} \citeauthor{Cope1875} instead, as this 
species was originally described in \textit{Sphyraenodus} and not in another genus.

\bibliographystyle{apa} % citation style to be used, here 'APA'
\bibliography{manuscript} % the BibTeX file 'manuscript.bib'

\end{document}


Final comment

As we have seen, \LaTeX can be used successfully to prepare an ICZN-compliant taxonomic manuscript while taking advantage of the automation power of this typesetting system and its huge package ecosystem. Now we need to convince publishers and the scientific societies that edit and publish journals to better support (or simply support, to begin with) \LaTeX submissions. In the end, most (if not all) manuscript submission systems send reviewers a PDF, not the Word file submitted by the authors. Accepting \LaTeX would also be a step towards supporting free software, removing the requirement to use a commercial package such as MS Office.


Merry Christmas!

It’s Christmas time, and I decided to get into the mood by making a Christmas tree representing my current interests. Back in 2015 I built a Christmas tree from the shape of an arbitrary cladogram and then added colors and shapes to make it look like a Christmas tree. Below I describe how I did it, along with the final output. Then I describe how to build another Christmas tree, this time by plotting several realizations of a random walk stochastic process with some automatic decorations. As with the cladistic Christmas tree, I present the output along with the script for constructing it. These two graphics represent my shift from a hard-core parsimony guy to a fanatic of all quantitative things.

Cladistic, manual version

The first step is to create a taxon-only matrix in Mesquite, then to manually create the relationships and rotate (transpose) the daughter branches at each node. The output looks something like this:

#NEXUS
[written Wed Dec 23 14:54:24 BRST 2015 by Mesquite  version 3.04 (build 725) at ware/]

BEGIN TAXA;
	TITLE Taxa;
	DIMENSIONS NTAX=20;
	TAXLABELS
		taxon_1 taxon_2 taxon_3 taxon_4 taxon_5 taxon_6 taxon_7 taxon_8 taxon_9 taxon_10 taxon_11 taxon_12 taxon_13 taxon_14 taxon_15 taxon_16 taxon_17 taxon_18 taxon_19 taxon_20
	;

END;


BEGIN TREES;
	Title Trees;
	LINK Taxa = Taxa;
	TRANSLATE
[0] 		1 taxon_1,
[1] 		2 taxon_2,
[2] 		3 taxon_3,
[3] 		4 taxon_4,
[4] 		5 taxon_5,
[5] 		6 taxon_6,
[6] 		7 taxon_7,
[7] 		8 taxon_8,
[8] 		9 taxon_9,
[9] 		10 taxon_10,
[10] 		11 taxon_11,
[11] 		12 taxon_12,
[12] 		13 taxon_13,
[13] 		14 taxon_14,
[14] 		15 taxon_15,
[15] 		16 taxon_16,
[16] 		17 taxon_17,
[17] 		18 taxon_18,
[18] 		19 taxon_19,
[19] 		20 taxon_20;
	TREE tree1 = (20,(18,((1,((3,((15,((4,((5,((7,((11,((9,10),12)),8)),6)),13)),14)),16)),2)),17)),19)[% ] [% ] [%  setBetweenBits = selected ];

END;


Then I edited the tree manually in FigTree, mostly making the branches thicker and fixing the fonts, resulting in the following file:

#NEXUS
begin taxa;
	dimensions ntax=20;
	taxlabels taxon_1 taxon_2 taxon_3 taxon_4 taxon_5 taxon_6 taxon_7 taxon_8 taxon_9 taxon_10 taxon_11 taxon_12 taxon_13 taxon_14 taxon_15 taxon_16 taxon_17 taxon_18 taxon_19 taxon_20;
end;

begin trees;
	tree tree_1 = [&R] (taxon_20:1.0,(taxon_18:1.0,((taxon_1:1.0,((taxon_3:1.0,((taxon_15:1.0,((taxon_4:1.0,((taxon_5:1.0,((taxon_7:1.0,((taxon_11:1.0,((taxon_9:1.0,taxon_10:1.0):1.0,taxon_12:1.0):1.0):1.0,taxon_8:1.0):1.0):1.0,taxon_6:1.0):1.0):1.0,taxon_13:1.0):1.0):1.0,taxon_14:1.0):1.0):1.0,taxon_16:1.0):1.0):1.0,taxon_2:1.0):1.0):1.0,taxon_17:1.0):1.0):1.0,taxon_19:1.0);
end;

begin figtree;
	set appearance.backgroundColorAttribute="Default";
	set appearance.backgroundColour=#ffffff;
	set appearance.branchColorAttribute="User selection";
	set appearance.branchColorGradient=false;
	set appearance.branchLineWidth=7.0;
	set appearance.branchMinLineWidth=0.0;
	set appearance.branchWidthAttribute="Fixed";
	set appearance.foregroundColour=#000000;
	set appearance.hilightingGradient=false;
	set appearance.selectionColour=#2d3680;
	set branchLabels.colorAttribute="User selection";
	set branchLabels.displayAttribute="Branch times";
	set branchLabels.fontName="Abyssinica SIL";
	set branchLabels.fontSize=8;
	set branchLabels.fontStyle=0;
	set branchLabels.isShown=false;
	set branchLabels.significantDigits=4;
	set layout.expansion=642;
	set layout.layoutType="RECTILINEAR";
	set layout.zoom=0;
	set legend.attribute=null;
	set legend.fontSize=10.0;
	set legend.isShown=false;
	set legend.significantDigits=4;
	set nodeBars.barWidth=4.0;
	set nodeBars.displayAttribute=null;
	set nodeBars.isShown=false;
	set nodeLabels.colorAttribute="User selection";
	set nodeLabels.displayAttribute="Node ages";
	set nodeLabels.fontName="Abyssinica SIL";
	set nodeLabels.fontSize=8;
	set nodeLabels.fontStyle=0;
	set nodeLabels.isShown=false;
	set nodeLabels.significantDigits=4;
	set nodeShape.colourAttribute=null;
	set nodeShape.isShown=false;
	set nodeShape.minSize=10.0;
	set nodeShape.scaleType=Width;
	set nodeShape.shapeType=Circle;
	set nodeShape.size=4.0;
	set nodeShape.sizeAttribute=null;
	set polarLayout.alignTipLabels=false;
	set polarLayout.angularRange=0;
	set polarLayout.rootAngle=0;
	set polarLayout.rootLength=100;
	set polarLayout.showRoot=true;
	set radialLayout.spread=0.0;
	set rectilinearLayout.alignTipLabels=false;
	set rectilinearLayout.curvature=0;
	set rectilinearLayout.rootLength=100;
	set scale.offsetAge=0.0;
	set scale.rootAge=1.0;
	set scale.scaleFactor=1.0;
	set scale.scaleRoot=false;
	set scaleAxis.automaticScale=true;
	set scaleAxis.fontSize=8.0;
	set scaleAxis.isShown=false;
	set scaleAxis.lineWidth=1.0;
	set scaleAxis.majorTicks=1.0;
	set scaleAxis.origin=0.0;
	set scaleAxis.reverseAxis=false;
	set scaleAxis.showGrid=true;
	set scaleBar.automaticScale=true;
	set scaleBar.fontSize=10.0;
	set scaleBar.isShown=true;
	set scaleBar.lineWidth=1.0;
	set scaleBar.scaleRange=0.0;
	set tipLabels.colorAttribute="User selection";
	set tipLabels.displayAttribute="Names";
	set tipLabels.fontName="Abyssinica SIL";
	set tipLabels.fontSize=16;
	set tipLabels.fontStyle=2;
	set tipLabels.isShown=true;
	set tipLabels.significantDigits=4;
	set trees.order=false;
	set trees.orderType="increasing";
	set trees.rooting=false;
	set trees.rootingType="User Selection";
	set trees.transform=false;
	set trees.transformType="cladogram";
end;

FigTree can export the tree in SVG vector format, so I saved it as SVG and did a bit of editing in Inkscape to produce the final Christmas tree.


The process is fairly manual, but it pleased both my contacts on social networks and myself, and it was fun to tweak things so that an evolutionary construct based on graph theory could mimic a Christmas tree. Everything was done with open source programs, so you can try it yourself without paying for any software 🙂

Stochastic, automated Christmas tree

Stochastic processes are models that describe phenomena incorporating uncertainty over time, that is, phenomena whose trajectory and final result cannot be determined exactly but can still be characterized, so that we can build expectations about them. A great deal of real-world phenomena can be described using stochastic processes, from time series analysis (including financial applications, medical imaging, and world-scale phenomena such as global warming) to models of evolution (e.g., Brownian motion, the birth-death process).

One of these is the random walk: the values a variable takes through time vary randomly within certain admissible values, so that the change from one step to the next cannot be predicted exactly and differs each time we repeat the realization. Mathematically, for a time step t and its previous step t - 1, the value of the variable x at step t is the previous value x_{t-1} plus or minus a random quantity w_t drawn for the current step:

x_{t} = x_{t - 1} + w_{t}

Often we describe w_t with a normal distribution centered at \mu = 0, with a variance \sigma^2 that is smaller or larger depending on whether we want the process to vary a little or a lot; the larger the variance, the larger the values that w_t is likely to add to or subtract from the previous value.

w_t \sim N(\mu, \sigma^2)

If we simulate many realizations of this process, the resulting trajectories, plotted together, take the shape of a pine tree with its tip at the origin.
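Why the trajectories fan out into a cone can be seen by unrolling the recursion. A short derivation, assuming x_0 = 0 and independent increments w_t with \mu = 0 as above:

```latex
\begin{align*}
x_t &= x_{t-1} + w_t = x_{t-2} + w_{t-1} + w_t = \dots = x_0 + \sum_{i=1}^{t} w_i,\\
\mathrm{E}[x_t] &= x_0 + \sum_{i=1}^{t} \mathrm{E}[w_i] = 0,\\
\mathrm{Var}[x_t] &= \sum_{i=1}^{t} \mathrm{Var}[w_i] = t\,\sigma^2
  \quad\Longrightarrow\quad \mathrm{SD}[x_t] = \sigma\sqrt{t}.
\end{align*}
```

The spread therefore grows with the square root of the number of steps, which gives the widening silhouette of the pine. With \sigma = 0.7 and 500 steps, as in the script below, the standard deviation at the last step is about 0.7\sqrt{500} \approx 15.7, so the horizontal window of \pm 60 spans roughly \pm 3.8 standard deviations at the base of the tree.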

Using Unicode characters, we can add a colored star to the plot area with the points function (a negative pch value is read as a Unicode code point; 9733 is U+2605, the black star ★), and choose colors with finer control with the rgb function, which combines values of red, green, and blue along with a transparency value alpha. A color picker can help choose a color visually and read off the corresponding amounts of red, green, and blue.

# set seed for repeatable results (arbitrary value; the seed used for
# the original figure is not given)
set.seed(2015)
# initialize the plot with the tip star (pch = -9733 is Unicode U+2605)
plot(x = 0, y = 0, xlim = c(-60, 60), ylim = c(-500, 0),
     xlab = NA, ylab = NA, pch = -9733, cex = 3,
     col = rgb(red = 255/255, green = 191/255, blue = 0/255))
# list for saving the values from each realization
trajectories <- vector(mode = "list", length = 1000)
# iterate over realizations and numbers of steps for producing
# 1000 different trajectories, each with 500 steps
for (j in 1:1000) {
    x <- 0
    y <- x
    for (i in 1:500) {
        # x to the left of the arrow is x_t and the one to the right is x_{t-1}
        # plus or minus an amount drawn at random from a normal(0, 0.7)
        x <- x + rnorm(1, sd = 0.7)
        # append the value to the series
        y <- c(y, x)
    }
    # plot the line with the values saved before: the walk goes on the
    # horizontal axis and the (negated) step number on the vertical axis
    lines(x = y, y = -seq_along(y),
          col = rgb(red = 22/255, green = 128/255, blue = 20/255,
                    alpha = 0.3))
    # save the data of this realization
    trajectories[[j]] <- data.frame(x = y, y = -seq_along(y))
}
# combine the starting point and all realizations into one data frame
# (binding once at the end is much faster than rbind inside the loop)
allPoints <- rbind(data.frame(x = 0, y = 0), do.call(rbind, trajectories))
# plot the star again, on top of the lines
points(x = 0, y = 0, pch = -9733, cex = 3,
       col = rgb(red = 255/255, green = 191/255, blue = 0/255))
# red balls, y axis: rows every 80 steps (80, 160, ..., 480)
redY <- seq(from = 0, to = 500, by = 80)[-1]
# for each row, x positions every 20 units from the leftmost to the
# rightmost trajectory, plus the rightmost point itself
for (i in redY) {
    rowX <- allPoints[allPoints$y == -i, "x"]
    redX <- c(seq(from = min(rowX), to = max(rowX), by = 20), max(rowX))
    points(x = redX, y = rep(x = -i, times = length(redX)),
           pch = 21, bg = "red", cex = 2)
}
# yellow balls, y axis: rows halfway between the red ones (40, 120, ..., 440)
yellowY <- seq(from = 40, to = 440, by = 80)
# same x spacing as for the red balls
for (i in yellowY) {
    rowX <- allPoints[allPoints$y == -i, "x"]
    yellowX <- c(seq(from = min(rowX), to = max(rowX), by = 20), max(rowX))
    points(x = yellowX, y = rep(x = -i, times = length(yellowX)),
           pch = 21, bg = "yellow", cex = 2)
}


A couple of tricks were needed to produce the orientation of the trajectories and the position of the red and yellow balls. First, the axes were transposed: the step number, flipped to negative values so that the tree hangs down from the star, is plotted on the vertical axis, while the random-walk values go on the horizontal axis, swapping the usual roles of the independent and dependent variables. Second, the coordinates of the balls were computed automatically, so they may look odd on the right-hand side: I did not find a way to automatically calculate coordinates and variable spacing among points such that the first and last balls always fall on the smallest and largest values. It might be possible, but I got tired of thinking about an alternative; in the end, a script that builds the whole thing seems fair enough.

That’s all for now, happy holidays to everybody!

Assorted tools for interacting with servers

Personal list of commands to remember when running analyses on servers. This post is expected to grow from time to time as new commands/command-combinations are found to be useful.

# login with ssh
ssh -l user ip # e.g., ssh -l myUser 192.0.2.10 (same as ssh myUser@192.0.2.10)

# create a virtual screen
screen -S screenName # use something identifying the analysis in screenName

# back to virtual screen called "screenName"
screen -r screenName

# list screens
screen -ls

# detach virtual screen keeping the analysis running
Ctrl + a, then d

# interrupt (kill) the job running in the current screen; the screen itself stays open
Ctrl + c

# close the current screen session once the job has finished or been killed
exit

# copy files through ssh
# supports the -r flag for copying directories recursively
scp user@ip:/path/to/files path/to/local/directory # from the server to a local directory
scp path/to/local/file user@ip:/destination/directory # from local to the server
scp -r path/to/local/directory user@ip:/destination/directory # a whole directory to the server

# monitor processes while they are running based on the filetype and modification time
find . -name "*.log" -ls | grep "date-right-now" # e.g., "Apr  6" or "Apr 21"; note the two spaces in the former

Common compilation errors in Beamer and possible solutions (part 1)

This post is the first of a series documenting the most common errors I've found when preparing presentation slides in Beamer while using the knitr package to compile \LaTeX and R code in the same document, along with their explanations and solutions. Most of these are, of course, based on a huge pile of Stack Overflow answers, and most were found about a year ago when preparing the slides for my quals; unfortunately, I did not save those sources. From the date of this post on, I will link to the solutions found elsewhere, which I frequently use partially or in combination with other sources and tests. As a final note, some of the problems and solutions apply only when authoring slides with R code evaluation, so you are unlikely to run into them unless you use knitr. Please note that the error messages come from Emacs+ESS, so I'm not sure whether you will find them spelled exactly the same in other tools such as RStudio; I'm not even sure that you can compile this compactly in tools other than Emacs. I will try to reproduce some examples as code so that you can compare them to your own. That said, it suffices to point out that Emacs rules!

Caveat. I am testing the \LaTeX code in Emacs+ESS, compiling with the keystrokes M-n r, then M-n P (uppercase) and RET in the buffer of the .Rnw file to produce the PDF slides. I haven't tried other \LaTeX environments, but would love to hear about alternative error messages for the cases highlighted here.

'Missing $ inserted'

Most likely you used a character or symbol reserved for math mode (e.g., the underscore _). These characters need to be escaped with a backslash \ or replaced by their respective commands, or, for math-only commands such as Greek letters, enclosed in an in-line math "phrase":

Good: $\alpha$-diversity; Bad: \alpha-diversity
Good: filename\_without\_spaces; Bad: filename_without_spaces



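\documentclass{beamer}

\title{Awesome beamer presentation}
\author{Gustavo A. Ballen, D.Sc.(c)}
\institute{University of Sao Paulo \\ 
           Ichthyology}

\begin{document}

\begin{frame}
  \titlepage
\end{frame}

\begin{frame}
  \frametitle{My Slide}
  \begin{itemize}
    \item First item with good use\_of\_subscripts
    \item Second item with bad use_of_subscripts 
  \end{itemize}
\end{frame}

\end{document}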


The code above will produce the error:

./mathSymbol.tex:71: Missing $ inserted.
<inserted text> 
l.71 \end{frame}

./myFile.tex:71: Emergency stop.
<inserted text> 
l.71 \end{frame}
./myFile.tex:71:  ==> Fatal error occurred, no output PDF file produced!
Transcript written on myFile.log.
/usr/bin/texi2dvi: pdflatex exited with bad status, quitting.

Please also note that the error message refers to the .tex file and its line number, not to the .Rnw file. Actually, this error message does not tell us that the problem is the underscore, yet it gives a clue about math mode: the $ being inserted is the character used to open and close in-line math (e.g., formulae).

Whenever this error happens, check whether your text contains characters or commands that are reserved for math mode. For instance:

% Let's assume you have exactly the same code 
% before this point as lines 1-14
  \frametitle{My Slide}
  \begin{itemize}
    \item First item with good use of $\alpha$-diversity in-line math mode
    \item Second item with bad \alpha-diversity in-line math mode
  \end{itemize}
\end{frame}

\end{document}

The code above produces the same error as the underscore case, but this time associated with \alpha, a math-mode command for the Greek letter α. Enclosing the \alpha command in $...$ solves the problem.
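For comparison, a minimal sketch of a frame that should compile without the "Missing $ inserted" error: the underscore is escaped (or written with the text-mode command \textunderscore), and \alpha is enclosed in $...$. The bare preamble here is an assumption for brevity, not the one used in the examples above.

```latex
\documentclass{beamer}

\begin{document}

\begin{frame}
  \frametitle{My Slide}
  \begin{itemize}
    % escape the underscore outside math mode
    \item Good use\_of\_subscripts
    % or use the text-mode command \textunderscore
    \item Also good: use\textunderscore of\textunderscore subscripts
    % math-only commands such as \alpha go inside $...$
    \item Good use of $\alpha$-diversity in-line math mode
  \end{itemize}
\end{frame}

\end{document}
```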