Monday, January 2, 2012

Joining text files

Sometimes I get several text files containing capacity data for disks, memory and so on. Here are two examples:

# cat root.log
/ Size,/ Used
16525754,15004443
...
# cat home.log
/home Size,/home Used
209665920,53370295
...

The two examples above show the size of the root and home file systems and how much of that space is used. The values are in KB: / is about 15 GB in size, of which about 14 GB are used, and /home is about 200 GB in size, of which about 50 GB are used. In most cases I just want one big comma-separated file for all file systems. The easiest way to join these two files is the paste command:

# paste -d "," root.log home.log
/ Size,/ Used,/home Size,/home Used
16525754,15004443,209665920,53370295
...
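
To try this without real log files, here is a minimal sketch that recreates the two sample files in a scratch directory and runs paste once with -d "," and once without it. The scratch directory and the variable names (tmpdir, comma_joined, tab_joined) are my own assumptions for illustration and are not part of the original setup.

```shell
# Recreate the two sample files in a scratch directory (hypothetical location).
tmpdir=$(mktemp -d)
printf '%s\n' '/ Size,/ Used' '16525754,15004443' > "$tmpdir/root.log"
printf '%s\n' '/home Size,/home Used' '209665920,53370295' > "$tmpdir/home.log"

# Comma as delimiter, as in the command above.
comma_joined=$(paste -d "," "$tmpdir/root.log" "$tmpdir/home.log")

# Without -d, paste puts a tab between the lines of the two files.
tab_joined=$(paste "$tmpdir/root.log" "$tmpdir/home.log")

printf '%s\n' "$comma_joined"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

Because the fields inside each file are already comma separated, the comma variant yields one consistent CSV file, while the tab variant mixes two kinds of separators.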

With paste you can merge both files line by line into one file: line n of root.log is put next to line n of home.log. The -d option defines the delimiter, a comma in my case; the default delimiter is a tab. Sometimes I get files with a time field like this:

# cat root.log
Time,/ Size,/ Used
10:12:54,16525754,15004443
10:22:54,16525754,15004443
10:32:54,16525754,15004443
10:42:54,16525754,15004443
...
# cat home.log
Time,/home Size,/home Used
10:22:54,209665920,53370295
10:32:54,209665920,53370295
10:42:54,209665920,53370295
10:52:54,209665920,53370295
...

The issue is that the two files do not cover the same time stamps. The first entry of the root.log file is at 10:12:54, but the first entry of the home.log file is 10 minutes later at 10:22:54. Since paste only combines lines by position, it would put values from different times next to each other. To join these two files by time stamp you can use the command join:

# join -t "," -1 1 -2 1 root.log home.log
Time,/ Size,/ Used,/home Size,/home Used
10:22:54,16525754,15004443,209665920,53370295
10:32:54,16525754,15004443,209665920,53370295
10:42:54,16525754,15004443,209665920,53370295

The -t option defines the delimiter again, and the -1 and -2 options define the field to join on in the first and second file. In both files that is the first field, the time stamp (field 1 is also what join uses by default). join expects the lines of both files to be sorted on the join field, which holds for the ascending time stamps here. The first field in the third line of root.log is the time stamp 10:22:54, which is also the first field of the second line of home.log. That means the third line of root.log and the second line of home.log are joined. The second line of root.log (10:12:54) has no partner in home.log and is ignored. The first line of each file is the header; since both headers start with the field Time, they are joined as well.
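
The matching described above can be reproduced with the following minimal sketch. It recreates the sample files in a scratch directory, leaving out the elided "..." lines, and runs the same join call. As a variation that is not part of the original workflow, it also uses join's -a 1 option, which prints the lines of the first file that have no partner in the second file. The scratch directory and the variable names (tmpdir, inner, with_unpaired) are assumptions for illustration.

```shell
# Recreate the sample files (without the elided "..." lines) in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Time,/ Size,/ Used' \
    '10:12:54,16525754,15004443' \
    '10:22:54,16525754,15004443' \
    '10:32:54,16525754,15004443' \
    '10:42:54,16525754,15004443' > "$tmpdir/root.log"
printf '%s\n' 'Time,/home Size,/home Used' \
    '10:22:54,209665920,53370295' \
    '10:32:54,209665920,53370295' \
    '10:42:54,209665920,53370295' \
    '10:52:54,209665920,53370295' > "$tmpdir/home.log"

# Same call as above: only time stamps present in both files are kept.
inner=$(join -t "," -1 1 -2 1 "$tmpdir/root.log" "$tmpdir/home.log")

# Variation: -a 1 also prints root.log lines without a partner in home.log.
with_unpaired=$(join -t "," -a 1 "$tmpdir/root.log" "$tmpdir/home.log")

printf '%s\n' "$with_unpaired"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

With -a 1 the 10:12:54 line from root.log is kept, but it only carries the three root.log fields because there are no /home values to append. Whether that is useful depends on how the resulting file is processed afterwards.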