Personal Text Logger

Created by Steven Baltakatei Sandoval on 2021-03-09T06:11Z under a CC BY-SA 4.0 license and last updated on 2021-03-10T02:46Z.

Summary

I wrote a bash script a while ago for compressing and encrypting log text streams produced by other scripts. I call it bklog. A static copy (version 0.1.33) is available here but I version track it in an environmental sensor script repository here.

Background

Born from desire to record my surroundings

I wanted a program to write to disk the observations captured by environmental sensor data loggers. I wanted such a program to permit me to also encrypt captured sensor data against a public key in case I have it capturing sensitive data (e.g. my personal smartphone's location data) and may need to transfer the data in shared spaces (e.g. a cloud service provider or someone's computer.

The reason I wanted this program was to be able to do something with data produced by Raspberry Pi devices I have been tinkering with. I could see myself generating temperature, air pressure, location, and other trendable data. At first, I started making an adhoc bash script to record each of these items but I decided to make a general script that could compress and encrypt any text stream via stdin.

Encrypted with age

I had seen age recommended as a command line encryption tool whose main feature was that it had fewer configuration options than gnupg, a tool often used for encrypting files. I was attracted to age because it accepted input data in the form of stdin and its public keys took the form of 63-char strings. Also, the key generation process was all done in the command line. Here, secretkey.txt is an example of what a private key in age beta1 looks like:

baltakatei@mycomputer:~$ age-keygen
# created: 2021-03-09T06:32:52Z
# public key: age14a65znxam4k45kd4xg0lc8uu0yvlpyj376kap970mcxf066wycqq57xzqd
AGE-SECRET-KEY-1VEDW7UEKRCE6P2NUDEDXZF0RJ23QTND5PZP97SHHD5TG6Z7CL70Q09A4D5

As of 2021-03-09, age is in beta7. The current version of bklog, 0.1.33 assumes use of beta2.

Recently useful for recording uptime statistics

I haven't recently touched bklog in a while after getting it to work successfully in recording temperature and location data with a set of Raspberry Pi Zero W devices I played with during the 2020 COVID-19 restrictions. However, recently I did find a use for it when I decided to try and collect some uptime, bandwidth, and system process data from the server that runs this blog. I was pleased that past-me had decided to include usage information for the many options I made in the program. I wrote some bash scripts whose only purpose was to output a single stream of data continuously. For example, here's the simple script named uptime_continuous.sh for producing uptime data:

#!/bin/bash
# Desc: Outputs `uptime` every 15 seconds

while true; do
    uptime &
    sleep 15;
done;

Such a script produces output like this:

07:13:47 up  1:30,  1 user,  load average: 0.36, 0.38, 0.37
07:14:02 up  1:30,  1 user,  load average: 0.42, 0.40, 0.37
07:14:17 up  1:31,  1 user,  load average: 0.40, 0.39, 0.37

A separate bash script to be run by cron would pipe the output of this initial stream into bklog which would take care of the job of compressing, encrypting, and writing the data to disk. Because bklog was intended to run on small GNU/Linux systems such as Raspberry Pi devices that use SD cards (flash memory with a limited number of writes), it collects a buffer for a period of time (default: 10 minutes) before compressing, encrypting, and writing a separate file to a memory-only directory (default: /dev/shm). Then, this file is appended to a time-stamped output tar file (default: one tar file per day). I provided several option flags that allow on to adjust time zone, output file name patterns, time periods, etc. An example use of uptime_continuous.sh without encryption is here:

# Desc: Logs system statistics
# Note: Run at boot or every day at midnight UTC
# Depends: bklog, ifstat, top, uptime

~/.local/bin/uptime_continuous.sh | /usr/local/sbin/bklog -v -e \
  -r age14a65znxam4k45kd4xg0lc8uu0yvlpyj376kap970mcxf066wycqq57xzqd \
  -o "/home/admin/logs" -l "uptime" -w ".log" -c -z "UTC" -b "600" -B "day" \
  1>~/$(date +%s)..uptime_logger.log 2>&1 &

Here is usage information that can be obtained by running $ bklog --help:

USAGE:
    cmd | bklog [ options ]

OPTIONS:
    -h, --help
            Display help information.
    --version
            Display script version.
    -v, --verbose
            Display debugging info.
    -e, --encrypt
            Encrypt output.
    -r, --recipient [ string pubkey ]
            Specify recipient. May be age or ssh pubkey.
            May be specified multiple times for multiple pubkeys.
            See https://github.com/FiloSottile/age
    -o, --output [ path dir ]
            Specify output directory to save logs. This option is required
            to save log data.
    -p, --process-string [ filter command ] [ output file extension] 
            Specify how to create and name a processed version of the stdin.
            For example, if stdin is 'nmea' location data:

            -p "gpsbabel -i nmea -f - -o gpx -F - " ".gpx"

            This option would cause the stdin to 'bklog' to be piped into
            the 'gpsbabel' command, interpreted as 'nmea' data, converted
            into 'gpx' format, and then appended to the output tar file
            as a file with a '.gpx' extension.
            This option may be specified multiple times in order to output
            results of multiple different processing methods.
    -l, --label [ string ]
            Specify a label to be included in all output file names.
            Ex: 'location' if stdin is location data.
    -w, --store-raw [ file extension ]
            Specify file extension of file within output tar that contains
            raw stdin data. The default behavior is to always save raw stdin
            data in a '.stdin' file. Example usage when 'bklog' receives
            'nmea' data from 'gpspipe -r':

            -w ".nmea"

            Stdin data is saved in a '.nmea' file within the output tar.
    -W, --no-store-raw
            Do not store raw stdin in output tar.
    -c, --compress
            Compress output with gzip (before encryption if enabled).
    -z, --time-zone
            Specify time zone. (ex: "America/New_York")
    -t, --temp-dir [path dir]
            Specify parent directory for temporary working directory.
            Default: "/dev/shm"
    -R, --recipient-dir [path dir]
            Specify directory containing files whose first lines are
            to be interpreted as pubkey strings (see '-r' option). Only
            one directory may be specified.
    -b, --buffer-ttl [integer]
            Specify custom buffer period in seconds (default: 300 seconds)
    -B, --script-ttl [time element string]
            Specify custom script time-to-live in seconds (default: "day")
            Valid values: "day", "hour"

Here, bklog is located within /usr/local/sbin/. The output directory is specified to be /logs. Each file saved in the output tar archive contains "uptime" and ends with .log. The time zone is specified to be "UTC" (what my server uses and will be useful since I am programming the cron job to run at midnight every day). Files are written by 600 seconds. The verbose diagnostic output (optional; 1>) and any error messages (2>&1) is written to a time-stamped (UNIX epoch seconds) file in the home folder ($(date +%s)..uptime_logger.log). Files are encrypted against the age public key defined by the string "age14a65znxam4k45kd4xg0lc8uu0yvlpyj376kap970mcxf066wycqq57xzqd".

The result is a tar file named 20210309..mycomputer_uptime.gz.age.tar. The hostname mycomputer is included by default. The file's contents after about an hour are:

$ tar --list -f 20210309..mycomputer_uptime.gz.age.tar
20210309T090729+0000..VERSION
20210309T090726+0000--PT2M45S..mycomputer_uptime.log.gz.age
20210309T091011+0000--PT10M0S..mycomputer_uptime.log.gz.age
20210309T092011+0000--PT10M1S..mycomputer_uptime.log.gz.age
20210309T093012+0000--PT10M0S..mycomputer_uptime.log.gz.age
20210309T094012+0000--PT10M1S..mycomputer_uptime.log.gz.age
20210309T095013+0000--PT10M0S..mycomputer_uptime.log.gz.age
20210309T100013+0000--PT10M0S..mycomputer_uptime.log.gz.age
20210309T101013+0000--PT10M1S..mycomputer_uptime.log.gz.age
20210309T102014+0000--PT10M0S..mycomputer_uptime.log.gz.age

Each file's name includes an ISO 8601 time period before the ... I make use of the PT separator which indicates a time period. -- is a recommended by the standard as a replacement for / since / causes problems when used within UNIX file names.

VERSION files contain the version of bklog and age used as well as some other metadata useful for someone interpreting the archive.

Other scripts and commands can be used to automatically extract and reconstitute a continuous uptime file but the stream of uptime data produced by uptime_continuous.sh is all saved.

Some example commands that can decrypt the files are:

$ tar -xf 20210309..mycomputer_uptime.gz.age.tar  # extract files
$ for file in ./*.age; do
  age -d -i ~/secretkey.txt "$file" | gunzip > "${file%.gz.age}";
done;

Where secretkey.txt is the same file generated by age-keygen described earlier.

Conclusion

I found that an older script I wrote for for recording environmental sensor data was also useful in recording system statistics. I described how uptime data could be regularly produced by a custom script uptime_continuous.sh and piped into bklog for compression, encryption, and writing.