Thursday, January 30, 2014

Intro to Vim

If you have 25 minutes to spare, watch Derek Wyatt's video about why coders should learn Vim.

Derek is charismatic and fun to watch, for one: about as good as any TV show I waste my time on. Better still, he has a whole series of videos teaching the curious coder about Vim. And why not be entertained while learning how to wield this mighty weapon of choice?

The Vim Screen
I learned Vim very slowly over the years (read: "I didn't really know how to use Vim over the years"), using it only occasionally when I had to do something quick from the command line. I remember the first time I saw its extremely plain, featureless screen and thought, "This thing can't possibly be as good as everyone says it is." I figured Vim must be one of those "back in the old days" kind of things... But every time I needed to edit something at the command line real quick, I'd think of Vim. I was intrigued, and intimidated, by all the good things people had to say about it.

Recently I've been working at the command line quite a bit and frequently switching between languages, which usually means switching between editors and development environments. I wanted some consistency. I thought, "Why not try Vim again?" A few Google searches later, I was playing Vim Adventures. It cost 25 bucks, but I figured, hey, this will be a fun way to pick up a new skill in my off time... and I've blown 25 bucks on worse things!

Fun it was, but even after learning all it had to teach, I often find there's something I need to do that I don't know how to do, or I wonder whether there is a Vim extension for such-and-such. I'm currently learning about various Vim plug-ins (e.g., Vim-LaTeX) and can't help but realize I still know diddly squat about the inherent powers lurking beneath Vim's illusory simplicity.

So I'm going to give these videos a go and, hopefully, learn a thing or two.

Tuesday, January 7, 2014

The Craft of Homebrew-ing for Mac OS

Author: John Urban
Original publish date: circa Dec. 2012
Originally appeared: Brown University Genomics Club

Do you want to install a bunch of bioinformatics programs, such as bowtie, bedtools, and samtools, on your MacBook Pro?  Want to do it in a super-fast, super-easy way?

Homebrew is the solution. What is Homebrew? The 'More Help' section at the bottom of the page may help you understand it better. For now, suffice it to say that it makes installing software packages quite straightforward. All you will have to do to get bowtie, for example, is type 'brew install bowtie' at the command line. Wait a minute or so. Now you have bowtie.

First you need to install Homebrew. Unfortunately, you cannot just do 'brew install homebrew'. Fortunately, the installation is straightforward. Read on!

Instructions:
You will need Xcode on your MacBook Pro. It can be downloaded for free from the Mac App Store; download it before you start installing Homebrew.

General tip: It is best to stay on the latest version of Mac OS X (10.8.2 at the time of writing). Consider upgrading before anything else; you may have to in order to get the latest version of Xcode. The upgrade is cheap and, to quell any fears, it will not erase anything on your computer. For the most part, things continue unperturbed after the upgrade, and if you do hit some issues, they can be overcome with Time Machine backups.



(1) Get Homebrew
Find out how to get Homebrew on the Homebrew website.

That link basically says to enter this at the command line:
    ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"

It will then give you a number of prompts to continue or abort, which can be scary if you're new to all this.

If you continue through all of the prompts, then proceed to Step (2).



(2) Doctor Homebrew
Homebrew will tell you to do this, but in case you're not the type to pay attention to the silly messages programs print (and assuming you are paying attention to the words you are reading right now), here is what to do.

The first thing to do after installing Homebrew is to type at the command line:
    brew doctor

This runs a self-diagnosis and prints all the potential 'illnesses and ailments' you should 'cure' before doing anything else. Below I cover the warnings I took care of. The list may not be comprehensive, but simply pasting your warnings into Google is usually enough to find further advice.

I received the following types of warnings. 

(a) I had to update my Xcode
If you just downloaded it, you should already have the latest version. For me, it was just a matter of updating it through the App Store; my computer had been prompting me to update it for months.


(b) I had to update to the latest version of XQuartz
When I upgraded my Mac OS from 10.6.8 to 10.8.x, I found the new OS no longer came with X11; XQuartz was the answer to this. I have since been using Terminal instead of X11/XQuartz, but you still need the most up-to-date version. If you already have an older version of XQuartz, just open it and it will offer to update automatically. Otherwise, download the latest version.


(c) I had to download Xcode command line tools
These do not download automatically with Xcode, as most people don't need them. If you already have the latest version of Xcode (do that first), open Xcode and click through the following menu bar options:
    Xcode --> Preferences --> Downloads --> Components
Then choose to install the Command Line Tools.


(d) It suggested reordering my PATH variable
The point is to have Homebrew's /usr/local/bin searched before the system directories such as /usr/bin. This can be done by adding a line like the following to the .bash_profile (and/or .bashrc) file in your home directory:
    export PATH=/usr/local/bin:/usr/local/sbin:~/bin:$PATH

Alternatively, you can edit the system-wide path file at:
    /etc/paths
and reorder the paths inside it accordingly (the file is owned by root, so you will need sudo to edit it).
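
To check that the reordering took effect (open a new Terminal window first so the profile is re-read), it helps to print PATH one entry per line. Below is a minimal Bash sketch of that check. The function names show_path, path_index, and searched_before are hypothetical helpers, not part of Homebrew, and each one falls back to your current $PATH when you don't pass a PATH-style string.

```shell
# Hypothetical helpers (not part of Homebrew) for inspecting PATH order.

# Print a PATH-style string one directory per line, in search order.
# Defaults to the current $PATH if no argument is given.
show_path() {
  printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

# Print the 1-based position of directory $1 in the PATH-style string $2
# (default: $PATH). Prints nothing if the directory is not present.
path_index() {
  show_path "${2:-$PATH}" | grep -n -x -F -- "$1" | head -n 1 | cut -d: -f1
}

# Succeed (exit status 0) if directory $1 is searched before directory $2
# in the PATH-style string $3 (default: $PATH).
searched_before() {
  local a b
  a=$(path_index "$1" "${3:-$PATH}")
  b=$(path_index "$2" "${3:-$PATH}")
  [ -n "$a" ] && { [ -z "$b" ] || [ "$a" -lt "$b" ]; }
}
```

For example, `searched_before /usr/local/bin /usr/bin && echo "Homebrew comes first"` should print the message once your profile change is in place.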


(e) I had to update python

Verbatim warning:

Warning: "config" scripts exist outside your system or Homebrew directories.
`./configure` scripts often look for *-config scripts to determine if software packages are installed, and what additional flags to use when compiling and linking.

Having additional scripts in your path can confuse software installed via Homebrew if the config script overrides a system or Homebrew provided script of the same name. We found the following "config" scripts:

/Library/Frameworks/Python.framework/Versions/2.7/bin/python-config
/Library/Frameworks/Python.framework/Versions/2.7/bin/python2-config
/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7-config

Solution:
I had installed Python 2.7 at one point (my Mac came with 2.6), and the installer modified my .bash_profile file in the following way:

    #Setting PATH for Python 2.7
    #The orginal version is saved in .bash_profile.pysave
    PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
    export PATH

In other words, it put the new Python's bin directory at the front of my PATH variable, which is how the python*-config scripts in the warning ended up being found. Homebrew has that covered now, so the solution was simply to silence (or erase) the modifications made by the Python installation:

    #Setting PATH for Python 2.7
    #The orginal version is saved in .bash_profile.pysave
    #PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
    #export PATH

I chose to use '#' to comment them out in case I uninstall Homebrew in the future, which I doubt I will do. They can just as well be erased.
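
If you would rather not edit the file by hand, the same change can be scripted with sed. This is a sketch under stated assumptions: comment_out_python_path is a hypothetical helper name, and it assumes the installer's lines look exactly like the ones shown above (a PATH= line mentioning Python.framework, immediately followed by a bare "export PATH" line). It keeps a .bak copy of the original file.

```shell
# Hypothetical helper: comment out the Python 2.7 PATH lines in a profile.
# Assumes a "PATH=...Python.framework..." line directly followed by a
# bare "export PATH" line, as written by the Python installer above.
comment_out_python_path() {
  local profile=${1:-$HOME/.bash_profile}
  # -i.bak edits in place and saves the original as "$profile.bak";
  # this spelling is accepted by both BSD (Mac) and GNU sed.
  sed -i.bak \
    -e '/^PATH=.*Python\.framework/{' \
    -e 's/^/#/' \
    -e 'n' \
    -e 's/^export PATH$/#&/' \
    -e '}' \
    "$profile"
}
```

Afterwards, open a new Terminal window and re-run 'brew doctor' to confirm the "config" warning is gone.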



(f) I had to sudo chown something

Verbatim warning:

Warning: Some directories in /usr/local/share/man aren't writable.
This can happen if you "sudo make install" software that isn't managed
by Homebrew. If a brew tries to add locale information to one of these
directories, then the install will fail during the link step.
You should probably `chown` them:

    /usr/local/share/man/de
    /usr/local/share/man/de/man1

Only use the following solution if this is your own machine and you are its only user (for example, I am the only user of my MacBook Pro). That also means you know the computer's password, which you will need.

To be sure of your username, type into the command line:
    whoami
It prints your exact username.

Next, use the following commands to chown the directories as Homebrew wants (replace 'username' with what whoami printed, or type $(whoami) and let the shell fill it in):
    sudo chown username /usr/local/share/man/de
    ## Enter your computer password when prompted, then:
    sudo chown username /usr/local/share/man/de/man1

Those were all of my warnings. After each attempted fix, re-run 'brew doctor' to see whether the diagnosis has changed (i.e., make sure it no longer includes the problem you sought to fix).

When all ailments are fixed, 'brew doctor' will return:
    "Your system is raring to brew."

At this point, you may still need (or just want) to type:
    brew update
This makes sure Homebrew and its list of formulae are fully up to date now that everything is fixed.



(3) Use Homebrew
Homebrew is easy to use, and there are a lot of tutorials on the web.

Try:
brew list
    to see the list of formulae you currently have installed

brew search
    to see the full list of formulae you may want to get

brew info 'formula'
    to find out more about a given formula

brew install 'formula'
    to install a given formula from the search list

brew uninstall 'formula'
    to remove a formula you have installed, or one that did not install correctly

brew tap 'keg'
    e.g. brew tap homebrew/science
    --> I had to 'tap' this 'keg' for quite a few things I wanted,
          so you might as well just tap that keg now.
    --> The homebrew/science repository on GitHub lists the full set of formulae that depend on tapping this keg.

To give you an idea, here are all the formulae I installed with 'brew install formula':
    gfortran (needed for octave)
    octave (an open-source MatLab-like program)
    bedtools 
    samtools
    bamtools
    bowtie
    bowtie2
    vcftools 
    blast (type 'brew info blast' first. You may want the smaller 'dynamic version')
    tophat
    cufflinks
    velvet
    abyss
    blat 
    bwa
    clustal-omega
    fastx_toolkit
    phyml
    emboss
    mira
    sga
    gnuplot*

*A note on gnuplot, Octave, and AquaTerm:
Gnuplot is needed for plotting in Octave, but you may also need (or want) AquaTerm. If so, a slight nuance (read: annoyance) is that you must install AquaTerm before gnuplot. If you installed gnuplot before AquaTerm, plotting in Octave will not work while Octave's GNUTERM variable is set to aqua (you can try setting GNUTERM to x11 instead). There is an easy Homebrew solution. First, run 'brew uninstall gnuplot'. If AquaTerm is already installed, just run 'brew install gnuplot' and you are done. If AquaTerm has not been installed yet, install it first (following its own instructions), then run 'brew install gnuplot'. Finally, if you want an open-source language that feels like MatLab, Julia (a quickly developing and impressive language) may be a better option than Octave these days.


If you are unfamiliar with any of the above bioinformatics tools, either:
1 - Type the name along with 'bioinformatics' into Google (to narrow down possible interpretations)
    e.g. sga bioinformatics
OR
2 - Type 'brew info formula'
    e.g. brew info sga



More Help:
1 - type 'brew help'
2 - type 'man brew'




Bashing Through Bioinformatics (Part 1)

Every now and then, my brother John and I decide to do an online course together. Recently, it was a refreshingly good course on Bioinformatics with Pavel Pevzner. Being a dedicated masochist, I decided to try doing the class purely by shell scripting in Bash.

What is Bash? That's a question I'm not entirely sure I'm fit to answer. The succinct but potentially ambiguous answer is that Bash is a type of UNIX shell. To be incrementally more transparent: a shell is both a language of commands and an interpreter of those commands, which is why another name for a shell is a "command-line interpreter." Your job is to know the command-line language; the shell's job is to tell the operating system what you want it to do.

But Bash is more than a set of commands. It's also a scripting language, with for loops, if statements, string operations, basic calculations... Interestingly, some of the commands (perhaps more aptly called "tools" or "utilities") available in the Bash environment are scripting languages themselves, like Awk, R, or Python. For general-purpose programming, languages like Python are often more powerful and flexible than Bash, so one might write Bash off as a scripting language. And yet the power of these other languages can easily be accessed from the Bash environment in a variety of ways (piping, heredocs, batch processing, etc.). So the line between what is and is not Bash becomes blurry to me: where does Bash end and the rest of the shell environment begin? Bash is not Python or R, yet Bash can seamlessly glue Python and R commands together.
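
To make that concrete, here is a small illustrative sketch (my own example, not code from the course). The first function uses only Bash itself: a C-style for loop, an if statement, substring extraction, and integer arithmetic. The second glues in Awk through a pipe to get the floating-point math Bash lacks. The function names gc_percent and mean_length are hypothetical.

```shell
#!/usr/bin/env bash

# gc_percent SEQ
# Pure Bash: prints the share of G/C bases in SEQ as a whole-number percent.
gc_percent() {
  local seq=$1 gc=0 i base
  if [ ${#seq} -eq 0 ]; then
    echo 0
    return
  fi
  for (( i = 0; i < ${#seq}; i++ )); do
    base=${seq:i:1}                 # one character at position i
    if [[ $base == [GCgc] ]]; then
      gc=$(( gc + 1 ))
    fi
  done
  # Bash arithmetic is integer-only, so the result is truncated.
  echo $(( 100 * gc / ${#seq} ))
}

# mean_length SEQ...
# Glue: Bash pipes its arguments, one per line, into Awk, which
# computes the average length with floating-point arithmetic.
mean_length() {
  printf '%s\n' "$@" |
    awk '{ total += length($0) } END { if (NR > 0) printf "%.2f\n", total / NR }'
}
```

For example, `gc_percent ACGT` prints 50 and `mean_length ACGT ACGTACGT ACG` prints 5.00. If the data were written inline instead, the same Awk program could read it from a heredoc (`awk '...' <<EOF` ... `EOF`).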

About two years ago, my brother was learning to program and would occasionally ask me questions, sometimes about the shell environment. They were simple ones, like how to compress a data file. At that point, my familiarity with the shell environment was pretty basic: I wasn't using shells for anything much more advanced than navigating my file system, making and removing folders ("directories" in shell parlance), and starting sessions in languages like Python or IDL. I used a UNIX shell as a means to an end: open up Terminal or X11, type "python" or "idl", and BANG! The shell had served its purpose. I'm exaggerating a little (e.g., I used FTP and SSH during that time), but overall I didn't demand much of the command line, and I didn't know you could.

My perspective on shells and shell scripting began to change dramatically some time last year, when John started talking about all these weird things, like Awk and Sed. I kindly ignored him so I wouldn't have to consider that he knew more than me... Then he wrote some shell scripts in Bash to grab stock data from Yahoo! Finance and do some basic analyses.

My mouth hung agape, imaginary cigarette dangling. My cup of coffee, conveniently positioned, fell into my lap at that exact moment, forcing me to spit whatever coffee was in my mouth onto my brother's face.

You see, the purpose of the scripts (what the scripts did) didn't surprise or wow me. This was the same type of stuff I did in MatLab for my day-to-day research. But these scripts weren't written in MatLab or Java or C++ or any of the other languages I was aware of... They were written in Bash. That was new to me. It had never occurred to me that I could stay right in the shell and script something useful. To me, the shell seemed clunky and primitive, a relic of times past. This is embarrassing to admit, but it's true.

For the first time in a long time, there was something John and I could really talk about: programming and, hey, what the hell, finance. I studied physics; John studied bio. Programming and finance seemed like a good intersection: programming because we both liked it, and finance because it was an equal-footing domain where neither of us enjoyed much expertise (no ego to get in the way). Although we never really discussed finance again, we became interested in exploring ways to apply my physics and mathematics background to molecular biology and genomics research, which brings us back to the opening of this discussion: I'm a masochist who wished to script bioinformatics algorithms purely in Bash.

Before going any further: because of the inherent ambiguity of where Bash begins and ends, here are the basic rules I followed:
  1. Basic Awk usage was allowed, even though Awk is itself considered a scripting language. "Basic" because that's about my pay grade with Awk, but more importantly because Awk has some advanced features I wanted to keep myself from using, e.g., multi-dimensional arrays. One of my goals was to hack up data structures more complex than those native to Bash (i.e., almost anything more complex than a 1D array).

  2. Other scripting languages that can double up as command-line tools, like Python or R, were definitely NOT allowed. Otherwise, what would be the point of this exercise?

Sticking strictly to these rules interested me because I had begun wondering: at exactly what point while coding do I *need* to resort to a specialized language like Python, MatLab, or R? In various research projects I've worked on, much of my coding has involved importing some data, divvying it up, cleaning it, organizing it, maybe restructuring or searching it, converting characters, and writing the alterations to a file. That is, while the primary purpose of my programming is mathematically oriented (e.g., computing power spectra of a geomagnetic time series), much of the coding surrounding the mathematical features is non-mathematical. I have found that many of these non-mathematical, text-oriented tasks are easier and more efficient to do at the Bash command line, or in a Bash script, using tools like Awk, Sed, and grep.
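
As a small example of the kind of text-oriented task that suits these tools, the pipeline below counts every k-mer (substring of length k) in a DNA string and lists the most frequent first, a problem in the spirit of the course's early "frequent words" exercises. Basic Awk does the slicing (allowed under rule 1), and sort and uniq do the counting. This is an illustrative sketch; kmer_counts is a hypothetical name.

```shell
# kmer_counts TEXT K
# Print "count kmer" for every distinct k-mer in TEXT,
# most frequent first, ties broken alphabetically.
kmer_counts() {
  printf '%s\n' "$1" |
    # emit every length-K substring, one per line
    awk -v k="$2" '{ for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k) }' |
    sort |
    uniq -c |                   # collapse identical k-mers into "count kmer"
    sort -k1,1nr -k2,2 |        # highest count first, then alphabetical
    awk '{ print $1, $2 }'      # normalize uniq's padded spacing
}
```

For example, `kmer_counts ACGTACGT 3 | head -n 1` prints `2 ACG`.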

Two things I quickly learned while coding bioinformatics algorithms in Bash: (1) Bash for-loops are slow, and (2) complex data structures that one might take for granted in more general-purpose programming languages are incredibly useful linguistic advancements, a fact that becomes relentlessly unmistakable as one's scripting demands on Bash grow.
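
To illustrate point (1), here is a pure-Bash pattern count: how many times PATTERN occurs in TEXT, overlaps included. The name pattern_count is hypothetical, and this is a sketch rather than a course submission. It gives correct answers, but Bash interprets and re-slices the string on every iteration, so loops like this get slow as the text grows toward genome length. Handing the same loop to a tool like Awk is usually much faster.

```shell
#!/usr/bin/env bash

# pattern_count TEXT PATTERN
# Count (possibly overlapping) occurrences of PATTERN in TEXT using only
# Bash built-ins: a C-style for loop over every start position.
pattern_count() {
  local text=$1 pattern=$2
  local n=${#text} k=${#pattern} count=0 i
  for (( i = 0; i <= n - k; i++ )); do
    # quoting "$pattern" makes [[ == ]] a literal string comparison
    if [[ ${text:i:k} == "$pattern" ]]; then
      count=$(( count + 1 ))
    fi
  done
  echo "$count"
}
```

For example, `pattern_count GCGCG GCG` prints 2, because the two occurrences overlap.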

These two issues make a class on bioinformatics dreadfully interesting! In fact, I'm ashamed to admit I had to hang up my masochistic towel. While at first it was fun trying to write Bash functions to mimic the data structures bioinformatics needs (e.g., lists, multi-dimensional arrays, and so on), this quickly began eating up all my time; I do have to work on my PhD research every now and then! Even if I had continued in this vein, speed seemed likely to keep me from reaching my goal: for each programming task in the class, one's program had to run in under five minutes, and beyond simple tasks my Bash scripts were not meeting this requirement.
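
For a flavor of what "mimicking data structures" means, here is one common trick for a two-dimensional table, such as the kind dynamic-programming algorithms fill in. It stores the table in a single Bash indexed array and maps row r, column c to the flat index r * ncols + c. This is an illustrative sketch, not code from the original exercise. The names grid_init, grid_set, and grid_get, and the globals GRID, GRID_ROWS, and GRID_COLS, are hypothetical. It sticks to indexed arrays, so it should also run on the Bash 3.2 that ships with Mac OS X.

```shell
#!/usr/bin/env bash
# Emulate a 2D grid with one flat Bash array: cell (r, c) lives at r*GRID_COLS + c.
GRID=()
GRID_ROWS=0
GRID_COLS=0

# grid_init ROWS COLS FILL: allocate ROWS x COLS cells, all set to FILL.
grid_init() {
  GRID_ROWS=$1 GRID_COLS=$2
  GRID=()
  local i
  for (( i = 0; i < GRID_ROWS * GRID_COLS; i++ )); do
    GRID[i]=$3
  done
}

# grid_set ROW COL VALUE: store VALUE at (ROW, COL).
grid_set() {
  local idx=$(( $1 * GRID_COLS + $2 ))
  GRID[idx]=$3
}

# grid_get ROW COL: print the value stored at (ROW, COL).
grid_get() {
  local idx=$(( $1 * GRID_COLS + $2 ))
  echo "${GRID[idx]}"
}
```

For example, after `grid_init 3 4 0` and `grid_set 1 2 7`, `grid_get 1 2` prints 7. Every read and write goes through a function call plus index arithmetic, which is part of the overhead that adds up in loop-heavy scripts.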

It wasn't all bad, though: I did learn a lot! If you love learning new languages even when it's probably not totally necessary, I'd recommend repeating this exercise yourself. Learning Bash felt like a history lesson in computer languages. In the coming weeks, I will expound in gory detail on what I learned during my short stint as a strictly-Bash scripter.