Merry Christmas!

It’s Christmas time, and I decided to enter the mood by making a Christmas tree representing my current interests. Back in 2015 I constructed a Christmas tree based on the shape of an arbitrary cladogram and then added color and shapes in order to represent a Christmas tree. Below I describe how I did it long with the final output. Then I describe the process for constructing another Christmas tree, this time using a random walk stochastic process and some automatic additions to the plot of several realizations of such process. As with the cladistic Christmas tree, I present below the output along with the script for constructing it. These two graphics represent my shift from a hard-core parsimony guy to a fanatic of all quantitative things.

Cladistic, manual version

First thing is to create a taxon-only matrix in mesquite, and then to manually create the relationships and also to transpose the daughter branches for each node. The output would be something like this:

[written Wed Dec 23 14:54:24 BRST 2015 by Mesquite  version 3.04 (build 725) at ware/]

	TITLE Taxa;
		taxon_1 taxon_2 taxon_3 taxon_4 taxon_5 taxon_6 taxon_7 taxon_8 taxon_9 taxon_10 taxon_11 taxon_12 taxon_13 taxon_14 taxon_15 taxon_16 taxon_17 taxon_18 taxon_19 taxon_20


	Title Trees;
	LINK Taxa = Taxa;
[0] 		1 taxon_1,
[1] 		2 taxon_2,
[2] 		3 taxon_3,
[3] 		4 taxon_4,
[4] 		5 taxon_5,
[5] 		6 taxon_6,
[6] 		7 taxon_7,
[7] 		8 taxon_8,
[8] 		9 taxon_9,
[9] 		10 taxon_10,
[10] 		11 taxon_11,
[11] 		12 taxon_12,
[12] 		13 taxon_13,
[13] 		14 taxon_14,
[14] 		15 taxon_15,
[15] 		16 taxon_16,
[16] 		17 taxon_17,
[17] 		18 taxon_18,
[18] 		19 taxon_19,
[19] 		20 taxon_20;
	TREE tree1 = (20,(18,((1,((3,((15,((4,((5,((7,((11,((9,10),12)),8)),6)),13)),14)),16)),2)),17)),19)[% ] [% ] [%  setBetweenBits = selected ];


Then I manually edited it in figtree, mostly making the branches thicker, and fixing the fonts, resulting in the following file:

begin taxa;
	dimensions ntax=20;

begin trees;
	tree tree_1 = [&R] (taxon_20:1.0,(taxon_18:1.0,((taxon_1:1.0,((taxon_3:1.0,((taxon_15:1.0,((taxon_4:1.0,((taxon_5:1.0,((taxon_7:1.0,((taxon_11:1.0,((taxon_9:1.0,taxon_10:1.0):1.0,taxon_12:1.0):1.0):1.0,taxon_8:1.0):1.0):1.0,taxon_6:1.0):1.0):1.0,taxon_13:1.0):1.0):1.0,taxon_14:1.0):1.0):1.0,taxon_16:1.0):1.0):1.0,taxon_2:1.0):1.0):1.0,taxon_17:1.0):1.0):1.0,taxon_19:1.0);

begin figtree;
	set appearance.backgroundColorAttribute="Default";
	set appearance.backgroundColour=#ffffff;
	set appearance.branchColorAttribute="User selection";
	set appearance.branchColorGradient=false;
	set appearance.branchLineWidth=7.0;
	set appearance.branchMinLineWidth=0.0;
	set appearance.branchWidthAttribute="Fixed";
	set appearance.foregroundColour=#000000;
	set appearance.hilightingGradient=false;
	set appearance.selectionColour=#2d3680;
	set branchLabels.colorAttribute="User selection";
	set branchLabels.displayAttribute="Branch times";
	set branchLabels.fontName="Abyssinica SIL";
	set branchLabels.fontSize=8;
	set branchLabels.fontStyle=0;
	set branchLabels.isShown=false;
	set branchLabels.significantDigits=4;
	set layout.expansion=642;
	set layout.layoutType="RECTILINEAR";
	set layout.zoom=0;
	set legend.attribute=null;
	set legend.fontSize=10.0;
	set legend.isShown=false;
	set legend.significantDigits=4;
	set nodeBars.barWidth=4.0;
	set nodeBars.displayAttribute=null;
	set nodeBars.isShown=false;
	set nodeLabels.colorAttribute="User selection";
	set nodeLabels.displayAttribute="Node ages";
	set nodeLabels.fontName="Abyssinica SIL";
	set nodeLabels.fontSize=8;
	set nodeLabels.fontStyle=0;
	set nodeLabels.isShown=false;
	set nodeLabels.significantDigits=4;
	set nodeShape.colourAttribute=null;
	set nodeShape.isShown=false;
	set nodeShape.minSize=10.0;
	set nodeShape.scaleType=Width;
	set nodeShape.shapeType=Circle;
	set nodeShape.size=4.0;
	set nodeShape.sizeAttribute=null;
	set polarLayout.alignTipLabels=false;
	set polarLayout.angularRange=0;
	set polarLayout.rootAngle=0;
	set polarLayout.rootLength=100;
	set polarLayout.showRoot=true;
	set radialLayout.spread=0.0;
	set rectilinearLayout.alignTipLabels=false;
	set rectilinearLayout.curvature=0;
	set rectilinearLayout.rootLength=100;
	set scale.offsetAge=0.0;
	set scale.rootAge=1.0;
	set scale.scaleFactor=1.0;
	set scale.scaleRoot=false;
	set scaleAxis.automaticScale=true;
	set scaleAxis.fontSize=8.0;
	set scaleAxis.isShown=false;
	set scaleAxis.lineWidth=1.0;
	set scaleAxis.majorTicks=1.0;
	set scaleAxis.origin=0.0;
	set scaleAxis.reverseAxis=false;
	set scaleAxis.showGrid=true;
	set scaleBar.automaticScale=true;
	set scaleBar.fontSize=10.0;
	set scaleBar.isShown=true;
	set scaleBar.lineWidth=1.0;
	set scaleBar.scaleRange=0.0;
	set tipLabels.colorAttribute="User selection";
	set tipLabels.displayAttribute="Names";
	set tipLabels.fontName="Abyssinica SIL";
	set tipLabels.fontSize=16;
	set tipLabels.fontStyle=2;
	set tipLabels.isShown=true;
	set tipLabels.significantDigits=4;
	set trees.order=false;
	set trees.orderType="increasing";
	set trees.rooting=false;
	set trees.rootingType="User Selection";
	set trees.transform=false;
	set trees.transformType="cladogram";

We can save the file above in svg vectorial format so that then we can do a bit of edition in Inkscape I ended up with this:


The process is fairly manual, but it pleased both my contacts in social networks and myself, and it was funny to tweak things so that an evolutionary construct based on graph theory could mimic a Christmas tree. It is noteworthy that all was carried out with open source programs, so you can try it yourself without need to pay for the programs 🙂

Stochastic, automated Christmas tree

Stochastic processes are models that describe phenomena incorporating uncertainty across time, that is, where the final result and the trajectory of a sequence of events can not be determined, however, they can be characterized so that we can build expectations about them. A big deal of real-world phenomena can be described using stochastic processes, from time series analysis (including financial applications, medical imaging, and world-scale phenomena such as global warming) to modeling of evolution (e.g., Brownian motion, the birth-death process).

One of these is often called random walk, and says that for a given variable, the values it takes through time are allowed to show variation (or to show uncertainty) between certain admissible values, so that the change in the value from a step 1 to a step 2 can not be predicted exactly and can change if we repeat the realization of the events. Mathematically we then say that, for a time step t and its previous step t - 1 in variable x, the value of x at time step t would depend on the previous value t - 1 plus or minus an uncertain quantity that applies to the current step w_t:

x_{t} = x_{t - 1} + w_{t}

Oftentimes, we use the normal distribution for describing w_t centered at \mu = 0 and with variance \sigma^2 taking smaller or larger values depending on whether we want the process to vary a lot or just a little; the larger the variance, the larger the likely values that w_t can sum or rest from the previous value.

w_t = N(\mu,\sigma^2)

If we repeat a lot of times such stochastic processes we can consider a whole bunch of possible trajectories, that once plotted together, can produce the shape of a pine with tip on the origin.

Using unicode characters we can plot a point with a colored star to the plot area with the points function, and choose colors with finer control with the function RGB that combines values of the colors red, green, and blue, along with a transparency value alpha. We can use this source for picking a color visually and then look for the adequate amounts of red, green, and blue.

# set seed for repeatable results
# initialize the plot with the tip star
plot(x = 0, y = 0, xlim = c(-60, 60), ylim = c(-500, 0),
     xlab = NA, ylab = NA, pch = -9733, cex = 3,
     col = rgb(red = 255/255, green = 191/255, blue = 0/255))
# set initial values
x <- 0
y <- x
# design a data frame for saving the values from each realization
allPoints <- data.frame(x, y, stringsAsFactors = FALSE)
# iterate over realizations and numbers of steps for producing
# 1000 different trajectories, each with 500 steps
for (j in 1:1000) {
    x <- 0
    y <- x
    for (i in 1:500) {
        # x to the left of the arrow is x_t and the one to the right is x_{t-1}
        # plus or minus an amount drawn at random from a normal(0,0.7)
        x <- x + rnorm(1, sd = 0.7)
        # append the value to the series
        y <- c(y, x)
    # plot the line with the values saved before
    lines(x = y, y = -seq_along(y),
          col = rgb(green = 128/255,
                    red = 22/255,
                    blue = 20/255,
                    alpha = 0.3))
    # save the data to the data frame
    allPoints <- rbind(allPoints, data.frame(x = y, y = -seq_along(y)))
# plot the star again
points(x = 0, y = 0, pch = -9733, cex = 3,
       col = rgb(red = 255/255, green = 191/255, blue = 0/255))
# red balls
# red balls, y axis
redY <- seq(from = 0, to = 500, by = 80)[-1]
# sequence for calculating the values of each x component for a given y in red balls
for (i in redY) {
    redX <- c(seq(from = min(allPoints[allPoints$y == -i, "x"]),
                  to = max(allPoints[allPoints$y == -i, "x"]),
                  by = 20),
              max(allPoints[allPoints$y == -i, "x"]))
    points(x = redX, y = rep(x = -i, times = length(redX)),
           pch = 21,  bg = "red", cex = 2)
# yellow balls
# yellow balls, y axis
yellowY <- (seq(from = 0, to = 500, by = 80) + 40)[-length(seq(from = 0,
                                                               to = 500,
                                                               by = 80) + 40)]
# sequence for calculating the values of each x component for a given y in red balls
for (i in yellowY) {
    yellowX <- c(seq(from = min(allPoints[allPoints$y == -i, "x"]),
                     to = max(allPoints[allPoints$y == -i, "x"]),
                     by = 20), max(allPoints[allPoints$y == -i, "x"]))
    points(x = yellowX, y = rep(x = -i, times = length(yellowX)),
           pch = 21, bg = "yellow", cex = 2)


A couple of tricks were needed in order to produce the orientation of trajectories and position of red and yellow balls: explicit transposition of variables where the values of x were flipped to its negative -x and the y as the independent variable and x as the dependent one. Also, the coordinate points for the balls were automated and therefore they may look weird to the right, since I did not found a way to automatically calculate coordinates and variable distance among points so that the initial and final ones were always on the largest or smallest point. It might be possible but I just got tired of thinking about an alternative, in the end, a script that builds the whole thing seems fair enough.

That’s all for now, happy holidays to everybody!


Assorted tools for interacting with servers

Personal list of commands to remember when running analyses on servers. This post is expected to grow from time to time as new commands/command-combinations are found to be useful.

# login with ssh
ssh -l user ip # e.g., myUser 186.333.444.111

# create a virtual screen
screen -S screenName # use something identifying the analysis in screenName

# back to virtual screen called "screenName"
screen -r screenName

# list screens
screen -r

# detach virtual screen keeping the analysis
Ctrl + a + d

# detach virtual screen AND kill the job
Ctrl + c

# copy files through ssh to server
# supports the -r recursive tag for several files/directories
scp user@ip:/path/to/files path/to/local/directory # from directory in user at ip to local directory 
scp path/to/local/file user@ip:/destination/directory
scp -r path/to/local/directory user@ip:/destination/directory

# monitor processes while they are running based on the filetype and modification time
find . -name *.log -ls | grep "date-right-now" # e.g., "Apr  06" or "Apr 21", note the space in the former

Common compilation errors in Beamer and possible solutions (part 1)

This blog post is the first of a series that aims at documenting the most common errors I’ve found when preparing presentation slides in beamer AND using the knitr package for compiling both \LaTeX and R code in the same document and the explanations and solutions found for them. Most of these, of course, are based on a huge pile of stack overflow and most of them were found about a year or so ago when preparing the slides for my quals, unfortunately, I did not save any of these sources since. As for the date of preparation of this post, I will link the solutions found elsewhere, that frequently are used partially or in addition to other sources/tests. As a final note, some of the problems/solutions apply when authoring slides with R code evaluation, so you will not likely find them unless using knitr. Please note that the error messages come from Emacs+ESS, so I’m not sure if you will find them spelled exactly the same in other tools such as RStudio; I’m not even sure that you can compile them so compactly tools other than Emacs. I will try to reproduce some examples of them as code so that you can compare them to your own code. That said, it just suffices to point out that Emacs rules!

Caveat. I am testing the \LaTeX code with Emacs+ESS compiling with the keystrokes M-n r and then M-n P (uppercase) and RET in the buffer of the .Rnw file in order to compile the PDF slides. I haven’t tried other latex environment but would love to hear about alternative error messages for the same cases herein highlighted.

'Missing $ inserted'

Most likely you used a character or symbol reserved for math mode (e.g., underscore _). Please note that these characters need to be escaped with the \ or its respective reserved word, or even enclosed into a math inline “pharse”, for instance when using greek letters:

Good: $\alpha$-diversity; Bad: \alpha-diversity
Good: filename\_without\_spaces; Bad: filename_without_spaces



\title{Awesome beamer presentation}
\author{Gustavo A. Ballen, D.Sc.(c)}
\institute{University of Sao Paulo \\ 
           Ichthyology \\



  \frametitle{My Slide}
    \item First item with good use\_of\_subscripts
    \item Second item with bad use_of_subscripts 


The code above will produce the error:

./mathSymbol.tex:71: Missing $ inserted.
<inserted text> 
l.71 \end{frame}

./myFile.tex:71: Emergency stop.
<inserted text> 
l.71 \end{frame}
./myFile.tex:71:  ==> Fatal error occurred, no output PDF file produced!
Transcript written on myFile.log.
/usr/bin/texi2dvi: pdflatex exited with bad status, quitting.

Please also note that the error message refers to the .tex file with the line number of the problem, not to the .Rnw file. Actually, this message error does not tell us that the problem is with the underline, yet it gives a clue about the math mode as the $ is inserted, and this one is used in order to open and close in-line math text (e.g., formulae).

Whenever this error happens, check whether you are using uncommon characters in your text that might be expected to play a reserved role in math mode. For instance:

% Let's assume you have exactly the same code 
% before this point as lines 1-14
  \frametitle{My Slide}
    \item First item with good use of $\alpha$-diversity in-line math mode
    \item Second item with bad \alpha-diversity in-line math mode

will produce the same error as the underscore case but this time associated to \alpha, that is a command for inputting the greek letter \alpha. Enclosing the \alpha command with the $ operators will solve the problem.