Directory report generator

A script for recording the size of each directory within a drive or folder and tracking those sizes over time.


Background

This script emerged out of a need to better track changes made to a large number of directories in a network volume used by many people at once. The Benson digital initiatives team is responsible for overseeing the use of certain UT Libraries network volumes used by all Benson staff to store digital collections materials and project files.

For basic tasks like monitoring volume capacity and total usage, Windows Explorer works well enough. Over time, however, the need to track specific areas of growth on each volume has become clear. This tool supports that tracking by keeping a running log of all top-level directories in a given directory or volume, along with the size of their contents.

This tool is meant to be run at regular intervals, to allow growth to be consistently tracked over time. The tool accommodates folders that have been added or removed in between report intervals, as well as empty folders.

The script

The script asks the user to provide two paths: one to a drive or directory to be scanned, and a second to a directory where report output files will be saved.
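The prompt step could be sketched roughly as follows. This is a minimal illustration, not the script's own code: the function name ask_for_directory, the retry loop, and the quote stripping are all assumptions made for the example.

```python
import os


def ask_for_directory(prompt, ask=input):
    """Keep asking until the user supplies a path to an existing directory.

    Hypothetical helper: `ask` defaults to the built-in input() and can be
    swapped out for testing.
    """
    while True:
        # Strip whitespace and any double quotes wrapped around a pasted path.
        path = ask(prompt).strip().strip('"')
        if os.path.isdir(path):
            return os.path.abspath(path)
        print(f"Not a directory: {path!r}. Please try again.")
```

It would be called once per path, for example ask_for_directory("Directory to scan: ") and then ask_for_directory("Directory for report output: ").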

user_input

If no reports have previously been saved to the second path, the script creates a temporary output file at that location. If the same path has been used to save reports before, the script appends the new report output to the previous output.

Script part one: Scanning and spreadsheet

The script then creates two lists. The first is a list of all directories (ignoring “loose” files) at the user-supplied path, obtained using the os.listdir and os.path.isdir functions. The second is a list of all the values in the first column of the previous report. These lists may or may not be identical: directories may be added or deleted between report intervals, and the tool must still record them. A new spreadsheet file is then created, containing only a header row made up of report dates.
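A rough sketch of how the two lists might be built is shown here. It assumes the spreadsheet is a CSV file with a header row and a row labelled "Total"; the function names are hypothetical, and filtering out the header and Total rows is a choice made for this example rather than something the script is known to do.

```python
import csv
import os


def list_current_directories(scan_path):
    """Return the names of top-level directories in scan_path, skipping loose files."""
    return sorted(
        name for name in os.listdir(scan_path)
        if os.path.isdir(os.path.join(scan_path, name))
    )


def list_previous_directories(report_path):
    """Return first-column values of an earlier CSV report (assumed layout).

    The header row and the "Total" row are left out in this sketch.
    """
    if not os.path.exists(report_path):
        return []  # first run: there is no earlier report to read
    with open(report_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    return [row[0] for row in rows[1:] if row and row[0] != "Total"]
```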

lists

The script then calculates the size of each directory currently found at the chosen path, as well as the total size of all directories. If a folder was already scanned in a previous report, the sizes recorded for it earlier are copied to the new spreadsheet file, with the current size added in the last column.

To account for folders that were not in previous reports, as well as folders that have been deleted and re-added in between reports, the script compares the length of each row to the total number of reports, and inserts blank columns as needed to ensure that all values are being written in the correct column.
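The size calculation and column padding described above might look something like this. It is a sketch under stated assumptions: sizes are summed with os.walk and os.path.getsize, a row is a list of the form [folder name, size 1, size 2, ...], and the names directory_size_gib and extend_row are invented for the example.

```python
import os

GIB = 1024 ** 3  # bytes per GiB


def directory_size_gib(path):
    """Sum the sizes of all files below path and return the total in GiB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total_bytes += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # skip files that vanish or cannot be read mid-scan
    return total_bytes / GIB


def extend_row(previous_row, report_count, current_size):
    """Pad a row so the new size lands in the column for the current report.

    previous_row is [name, size_1, ...] from the old report, or just [name]
    for a folder with no history. report_count includes the current report.
    """
    row = list(previous_row)
    # A complete row holds the name plus one cell per earlier report,
    # so any shortfall is filled with blank cells before appending.
    row.extend([""] * (report_count - len(row)))
    row.append(current_size)
    return row
```

For a third report, extend_row(["new_folder"], 3, 0.5) gives ["new_folder", "", "", 0.5], so the size sits under the third date rather than the first.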

calculate size

The script then identifies folders that were present in a previous report but which are not currently in the directory. The contents of these rows are copied to the new spreadsheet.

The “Total” row is copied last, and the total size of all folders is added to the new column.
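One way to picture these last two steps is the sketch below. It assumes the previous report has been read into a dictionary keyed by the first-column value, and that the summary row is labelled "Total"; the function name and data layout are hypothetical.

```python
def carry_over_rows(previous_rows, current_names, report_count, total_size):
    """Return rows for folders no longer present, followed by the Total row.

    previous_rows maps a first-column value to its full row in the old report.
    report_count is the number of reports including the current one.
    """
    rows = []
    for name, row in previous_rows.items():
        if name != "Total" and name not in current_names:
            rows.append(list(row))  # history is copied unchanged
    # The Total row is written last, with the new total in the new column.
    total_row = list(previous_rows.get("Total", ["Total"]))
    total_row.extend([""] * (report_count - len(total_row)))
    total_row.append(total_size)
    rows.append(total_row)
    return rows
```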

read_previous_report

Finally, the script removes the old version of the spreadsheet and replaces it with the new one.
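A compact way to do this kind of swap is sketched below. Note that it uses os.replace, which overwrites the destination in one call, rather than a separate remove-then-rename; that substitution, the ".tmp" suffix, and the CSV format are assumptions of this example, not a description of the script's code.

```python
import csv
import os


def write_and_swap(rows, report_path):
    """Write rows to a temporary CSV beside report_path, then swap it into place."""
    temp_path = report_path + ".tmp"  # hypothetical temporary file name
    with open(temp_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    # os.replace overwrites an existing destination on both Windows and POSIX.
    os.replace(temp_path, report_path)
```

Writing the new file in full before touching the old one means a crash mid-write leaves the previous report intact.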

replace_report

At this point, the tool has output a spreadsheet listing each folder’s current size in GiB, under the current date. When reports have been run over the same directory several times, each column shows the sizes recorded on that column’s date. Past folders that are no longer present are included, so if they are ever added again they can be tracked too.

spreadsheet

Script part two: human-readable text file

The second half of the script produces a more human-readable text file summarizing the changes to each directory. It does this by checking each value in the last column against the previous column. Like the spreadsheet output file in part one, the text file is updated each time the report is run and the same output directory is specified.
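Picking out the two values to compare could be sketched as follows. The helper name last_two_cells is hypothetical, and the sketch assumes rows of the form [name, size 1, size 2, ...] that may be shorter than the header when a folder has been removed.

```python
def last_two_cells(row, report_count):
    """Return (previous, current) cells for a row, treating missing cells as blank.

    report_count is the number of date columns in the header row.
    """
    sizes = list(row[1:report_count + 1])
    sizes.extend([""] * (report_count - len(sizes)))  # pad short rows on the right
    sizes = ["", ""] + sizes  # guarantees two cells even for the very first report
    return sizes[-2], sizes[-1]
```

For example, last_two_cells(["gone", "1.0"], 2) returns ("1.0", ""), which signals a folder that had a size last time and has none now.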

Because the script is meant to accommodate folders being deleted and added, there are certain cases where folders cannot be compared against a previous report, or where folders from a previous report are not found in the latest report. The script attempts to account for all of these conditions and output the most concise and useful information to the text file.

This requires a large number of conditional statements. These ensure that no impossible calculations (e.g. dividing by zero or doing arithmetic on an empty cell) are attempted, and that each output is as readable and useful as possible.

conditionals

Folders that have been added since the previous report are marked “(new folder)”, and folders that have been removed since the last report are marked “removed” alongside their previous size. The script allows for empty folders, but whenever a folder is non-empty in both of the reports being compared, the text file shows the change in size both in absolute terms and as a percentage.
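The branching described above might be organized along these lines. This is only a sketch: the function names, the use of None for a blank cell, and the exact wording and number formatting of each message are assumptions for illustration, not the script's actual output.

```python
def parse_cell(cell):
    """Turn a spreadsheet cell into a float, or None when the cell is blank."""
    text = str(cell).strip()
    return float(text) if text else None


def describe_change(previous, current):
    """Return a short phrase comparing two sizes in GiB.

    None means the folder was absent from that report; 0 means it was empty.
    """
    if previous is None and current is None:
        return "not present"
    if previous is None:
        return f"{current:.2f} GiB (new folder)"
    if current is None:
        return f"removed (was {previous:.2f} GiB)"
    difference = current - previous
    if previous == 0 or current == 0:
        # A percentage is skipped unless both sizes are non-zero,
        # which also avoids dividing by zero.
        return f"{current:.2f} GiB ({difference:+.2f} GiB)"
    percent = difference / previous * 100
    return f"{current:.2f} GiB ({difference:+.2f} GiB, {percent:+.1f}%)"
```

Handling the absent cases first keeps the arithmetic at the bottom free of blank values.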

textoutput

The script also converts data from GiB to TiB when the output would be more than 1024 GiB. This makes it easier to work with volumes or directories whose subfolders are a mix of sizes.
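The unit switch can be illustrated with a small formatter. The name format_size and the two-decimal formatting are assumptions; the threshold follows the description above (values over 1024 GiB are shown in TiB).

```python
def format_size(size_gib):
    """Format a size given in GiB, switching to TiB above 1024 GiB."""
    # abs() lets negative differences (shrinking folders) switch units too;
    # that handling is an assumption of this sketch.
    if abs(size_gib) > 1024:
        return f"{size_gib / 1024:.2f} TiB"
    return f"{size_gib:.2f} GiB"
```

With this, format_size(512) gives "512.00 GiB" and format_size(1536) gives "1.50 TiB".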

textoutput2

Final thoughts

This was my most technically complex script to date, and my first real attempt at designing something for general use, rather than for just one collection.

You can download this script from GitHub at the following link:

https://github.com/DavidABliss/directory_report_generator/blob/master/directory_report_generator.py

I encourage you to experiment with it, tweak it, test it for errors, and share your findings with me. If you use it in your work, please let me know!