Wednesday, February 24, 2016

Count lines in source files

Following up on the last article, here's how to count the total number of lines in your source files:

find **/src/* -regex ".*\.\(ts\|java\|less\)" -print0 | xargs -0 wc -l | tail -1

Zsh, which I'm using, supports **. Bash can do it too since version 4, but only after you enable it with shopt -s globstar. **/src/* is equivalent to src/*, */src/*, */*/src/*, etc.
After finding these filenames, the find command prints them to standard output (that is the -print0 option), but instead of separating them with a newline character, it uses NUL or \0 (the null character).
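
Back to the ** pattern for a moment: if your shell has no ** support, one option is to let find do that part of the filtering itself. This is only a sketch, assuming GNU find for the -regex syntax; the function name find_in_src is made up for this example.

```shell
# Hypothetical helper: list .ts, .java and .less files that live somewhere
# under a directory named "src", without relying on ** in the shell.
find_in_src() {
    # -path matches against the whole path, so '*/src/*' means
    # "anything below a src directory, at any depth".
    find "${1:-.}" -path '*/src/*' -regex '.*\.\(ts\|java\|less\)'
}

# Usage: find_in_src .
```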

xargs then reads this. We tell it that the filenames are separated by NUL (with the option -0), and then we give it the command to be executed on the files, wc -l. wc stands for word count, but it's also capable of counting characters and lines. When we only want lines, we use the -l argument (l as in lucky). xargs passes as many filenames as it can to a single wc, and when wc receives several files it prints one line per file followed by a total. As we are not interested in seeing how many lines each of the files contains, we "select" only the last line of the output with tail, because that last line contains the total.
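
One caveat: with a very large number of files, xargs may have to run wc more than once, and each run prints its own total, so tail -1 would only show the last one. Here is a minimal sketch of one way around that, assuming GNU find for the -regex syntax. The function name count_lines is made up, and -type f is my addition to skip directories.

```shell
# Hypothetical helper: total number of lines in all .ts, .java and .less
# files below the given directory (default: current directory).
count_lines() {
    # cat concatenates every file into one stream, so the single wc at the
    # end prints one number even if xargs has to run cat several times.
    find "${1:-.}" -type f -regex '.*\.\(ts\|java\|less\)' -print0 |
        xargs -0 cat |
        wc -l
}

# Usage: count_lines .
```

The price is that you lose the per-file counts, which we were throwing away anyway.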

Here is an alternative way to do the same thing:

find **/src/* -regex ".*\.\(ts\|java\|less\)" -exec wc -l {} \; | awk '{print $1;}' | paste -s -d+ | bc

In this case we use find's built-in feature to execute a command on each file it finds, with the -exec argument. It is described in the manual of find. Basically, everything up to the semicolon is the command to run (the semicolon, like all characters that would otherwise be interpreted by the shell instead of being sent to the command, needs to be escaped). {} is replaced by the filename (the path is relative to the current directory).
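
To see what the escaped semicolon does, here is a small sketch on a throwaway directory (the file names are made up). It also shows the + terminator, which is not used in this article: it makes find pass many files to one command, much like xargs does.

```shell
# Demo directory with two small files.
demo=$(mktemp -d)
printf 'a\nb\n' > "$demo/one.ts"
printf 'c\n' > "$demo/two.ts"

# \; runs wc once per file: two lines of output, no total.
per_file=$(find "$demo" -type f -name '*.ts' -exec wc -l {} \;)

# + hands as many files as possible to a single wc, so wc adds a
# total line when it receives more than one file.
batched=$(find "$demo" -type f -name '*.ts' -exec wc -l {} +)

printf '%s\n---\n%s\n' "$per_file" "$batched"
rm -rf "$demo"
```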


wc prints the number of lines, a space, and then the filename. We use awk to print the first "word" it finds (by default, awk assumes "words" are separated by one or more spaces or tabs). I am not going to explain awk syntax, but it is a very useful tool. In fact we could have used it to do the rest of the job.
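
As a sketch of that last remark, awk can add up the first column by itself, which would replace the paste and bc steps. The input below is made-up sample data in the shape that wc -l prints.

```shell
# Sample input shaped like `wc -l` output: a count, then a filename.
# awk adds the first field of every line to `total` and prints it at the end.
# `total + 0` makes awk print 0 instead of an empty line when there is no input.
sum=$(printf '10 src/a.ts\n 5 src/b.java\n20 src/c.less\n' |
    awk '{ total += $1 } END { print total + 0 }')

echo "$sum"   # prints 35
```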

paste is a handy tool too, yet most people have never heard of it and try to replicate its features with complicated shell scripts... In this example it is going to transform this:

1
2
3

into this:

1+2+3

bc, which I believe stands for basic calculator, will be used to compute the sum.
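
Here are the two steps in isolation, as a small sketch you can paste into a terminal. The last part is my own addition, not something the pipeline above uses: for plain integers, the shell's arithmetic expansion gives the same result when bc is not installed.

```shell
# paste -s joins all input lines into one; -d+ puts "+" between them.
# The trailing "-" names standard input explicitly, which some paste
# implementations require.
expression=$(printf '1\n2\n3\n' | paste -s -d+ -)
echo "$expression"

# bc evaluates the expression it reads on standard input.
echo "$expression" | bc

# Assumption for illustration: integer-only sums can also be computed
# by the shell itself.
sum=$(( $expression ))
echo "$sum"
```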


Now, there are tools that define what "lines of code" are and count them with more context than just plain text. But let's simply ignore blank lines:

find **/src/* -regex ".*\.\(ts\|java\|less\)" -printf "sed '/^$/d' %p | wc -l\n" | sh | awk '{print $1;}' | paste -s -d+ | bc

sed, with these arguments, deletes empty lines (the regex describes "a line that begins and ends with nothing in between").
Unfortunately find does not support pipes ( | ) in the -exec argument, so we use a little trick.
That trick is to print commands to standard output (with -printf, where %p is replaced by the path of the file), and then to interpret them with the shell, as if these commands were part of a shell script.
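
A word of caution about this trick: the path is pasted into the generated command unquoted, so a filename containing spaces or quotes will break it. Below is a sketch of a different technique that avoids this. It lets -exec start a small shell per file with sh -c and does the piping inside that shell. The function name count_nonblank is made up, the regex assumes GNU find, and the summing is done with awk instead of paste and bc.

```shell
# Hypothetical helper: count the non-empty lines of all .ts, .java and
# .less files below the given directory (default: current directory).
count_nonblank() {
    # sh -c runs the quoted mini-script once per file; the path arrives
    # as "$1", properly quoted, so spaces in names are harmless.
    # The extra "sh" fills $0 of the mini-script.
    find "${1:-.}" -type f -regex '.*\.\(ts\|java\|less\)' \
        -exec sh -c 'sed "/^\$/d" "$1" | wc -l' sh {} \; |
        awk '{ total += $1 } END { print total + 0 }'
}

# Usage: count_nonblank .
```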

Find files by extension with the terminal

Today I was trying to find source files in a development project. These files have the extensions .java, .ts and .less.
I realized I had never needed to do such a search before, and here's how I did it:

find . -regex ".*\.\(ts\|java\|less\)"

Let me know if you find something better.
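
One possible alternative, offered as a sketch rather than a definitive answer: -regex is not part of POSIX find, and as far as I know the \| alternation depends on the regex flavor GNU find uses by default. Plain -name tests combined with -o should work with any find. The function name find_sources is made up for this example, and -type f is an addition that leaves directories out of the results.

```shell
# Hypothetical helper: list files ending in .ts, .java or .less below
# the given directory (default: current directory).
find_sources() {
    # The escaped parentheses group the three -name tests so that
    # -type f applies to all of them.
    find "${1:-.}" -type f \( -name '*.ts' -o -name '*.java' -o -name '*.less' \)
}

# Usage: find_sources .
```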