Using Puppet’s “tidy” resource to clean up old files

This will be a short one, I promise!

Let’s say you’re managing backups on a host and only want to keep the last couple of days’ worth of snapshots. You could construct some find | xargs shenanigans to do the job, but why not manage that with Puppet? This is what tidy is for.

tidy { '/srv/backups/':
  age     => '3d',
  rmdirs  => true,
  recurse => true,
}

This snippet of Puppet will clean every file and (empty) directory out of /srv/backups that is 3 days old or older. At least… that’s what it says it does. Let’s check out the contents of the backups directory after a successful Puppet run:

[root@crab-battle ~]# ls -al /srv/backups/
total 28
drwxr-xr-x. 133 root root     8192 Apr 16 00:00 .
drwxr-xr-x.   3 root root       20 May  3  2017 ..
drwxrwxr-x.   2 root postgres    6 Apr 12 21:13 backup-20180409.0000
drwxrwxr-x.   2 root postgres    6 Apr 13 21:14 backup-20180410.0000
drwxrwxr-x.   2 root postgres    6 Apr 14 21:13 backup-20180411.0000
drwxrwxr-x.   2 root postgres    6 Apr 15 21:14 backup-20180412.0000
drwxrwxr-x.   2 root postgres 4096 Apr 13 00:00 backup-20180413.0000
drwxrwxr-x.   2 root postgres 4096 Apr 14 00:00 backup-20180414.0000
drwxrwxr-x.   2 root postgres 4096 Apr 15 00:00 backup-20180415.0000
drwxrwxr-x.   2 root postgres 4096 Apr 16 00:00 backup-20180416.0000

Why is it keeping all those older (and empty) directories!? I told it to rmdirs.

Well, it turns out that tidy defaults to atime rather than mtime to decide whether it’s time to age out a file. atime represents the “last time a file / directory was accessed” (rather than modified). When Puppet comes through to determine what needs to be cleaned up, it reads through each directory, which bumps the atime, so those directories never look old enough to remove. Check it out:

[root@engr-for-200 ~]# ls -l --time=atime /srv/backups/
total 16
drwxrwxr-x. 2 root postgres    6 Apr 16 01:43 backup-20180409.0000
drwxrwxr-x. 2 root postgres    6 Apr 15 22:43 backup-20180410.0000
drwxrwxr-x. 2 root postgres    6 Apr 15 22:43 backup-20180411.0000
drwxrwxr-x. 2 root postgres    6 Apr 15 22:43 backup-20180412.0000
drwxrwxr-x. 2 root postgres 4096 Apr 16 01:43 backup-20180413.0000
drwxrwxr-x. 2 root postgres 4096 Apr 16 00:13 backup-20180414.0000
drwxrwxr-x. 2 root postgres 4096 Apr 16 00:13 backup-20180415.0000
drwxrwxr-x. 2 root postgres 4096 Apr 16 00:00 backup-20180416.0000
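
If you want to watch the two timestamps drift apart without involving Puppet at all, here is a minimal sketch. It assumes GNU coreutils (touch -d with an @epoch value, stat -c) and works in a throwaway temp directory rather than the real /srv/backups; the backup-old name is made up for illustration.

```shell
#!/bin/sh
# Sketch: atime and mtime are independent timestamps.
# Assumes GNU coreutils; uses a temp directory, not /srv/backups.
demo=$(mktemp -d)
mkdir "$demo/backup-old"   # hypothetical snapshot directory

now=$(date +%s)

# Backdate both atime and mtime to exactly 10 days ago.
touch -d "@$((now - 10 * 86400))" "$demo/backup-old"

# Simulate something "accessing" the directory: bump only the atime.
touch -a "$demo/backup-old"

# %X is the last access time, %Y the last modification time (epoch seconds).
atime_age=$(( (now - $(stat -c %X "$demo/backup-old")) / 86400 ))
mtime_age=$(( (now - $(stat -c %Y "$demo/backup-old")) / 86400 ))

echo "atime age: ${atime_age}d, mtime age: ${mtime_age}d"

rm -rf "$demo"
```

Judged by atime, that directory looks brand new; judged by mtime, it is ten days old. On a real host, something like stat -c '%n atime=%x mtime=%y' /srv/backups/* should show both timestamps side by side.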

Luckily, the tidy resource has a type parameter we can use to specify that it use “mtime” rather than “atime”. Updating our resource to the following will work as expected:

tidy { '/srv/backups/':
  age     => '3d',
  rmdirs  => true,
  recurse => true,
  type    => 'mtime',
}

Easy!