Replace dev_ino with mtime, sha256.

This change was prompted by my discovery that under DrivePool, two files can have the same (dev, ino) pair. That's understandable, but I no longer want to rely on inodes. Hashing is slower, but considering the time already invested in tagging files in the first place, I think the tradeoff is worthwhile.
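
To give a rough sense of what per-file hashing costs, here is a minimal sketch of chunked SHA-256 hashing using only the standard library. It is not the code this commit uses: the diff calls `spinal.hash_file`, whose internals are not shown here. The function name `sha256_file` and the `chunk_size` parameter are hypothetical, and the `bytes_per_second` throttle is only a crude stand-in for the option of the same name mentioned in the diff.

```python
import hashlib
import time

def sha256_file(path, chunk_size=2**20, bytes_per_second=None):
    '''
    Return the 64-character sha256 hexdigest of the file at `path`.
    Hypothetical helper for illustration, not part of Etiquette.
    '''
    hasher = hashlib.sha256()
    with open(path, 'rb') as handle:
        while True:
            # Read in chunks so that large media files never have to fit
            # in memory all at once.
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
            if bytes_per_second:
                # Crude throttle (an assumption, not spinal's behavior):
                # sleep long enough that the average read speed stays near
                # the requested limit.
                time.sleep(len(chunk) / bytes_per_second)
    return hasher.hexdigest()
```

The whole file has to be read once, so the cost scales with file size, which is why the diff also compares mtime and byte size before falling back to a hash.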
2021-02-03 12:12:47 -08:00
commit 4bf5b6d824
parent f8efc9d569
8 changed files with 358 additions and 65 deletions
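
The rename check that the diff below adds to `photodb.py` (`check_renamed`) works in two stages: a cheap mtime and byte size comparison first, then a hash comparison. The following is a simplified sketch of that idea over plain dictionaries rather than the real database. The function name `find_renamed`, the record keys, and the `get_sha256` callable are all hypothetical and exist only for illustration.

```python
def find_renamed(records, mtime, size, get_sha256):
    '''
    Look for a known record that the file on disk is probably a rename of.

    records: list of dicts with the keys 'id', 'mtime', 'bytes', 'sha256',
        and 'exists' (whether the recorded path is still a file on disk).
    get_sha256: zero-argument callable that hashes the file on disk. It is
        only called if the cheap metadata check is inconclusive.

    Returns (record_or_None, sha256_or_None).
    '''
    # Only records whose old path has disappeared can be rename candidates.
    missing = [r for r in records if not r['exists']]

    # Stage 1: exactly one missing record with the same mtime and size.
    same_meta = [r for r in missing if r['mtime'] == mtime and r['bytes'] == size]
    if len(same_meta) == 1:
        return (same_meta[0], None)

    # Stage 2: pay for the hash, then look for missing records that share it.
    sha256 = get_sha256()
    same_hash = [r for r in missing if r['sha256'] == sha256]
    if len(same_hash) > 1:
        # Several identical files went missing, so use mtime as a tiebreaker.
        same_hash = [r for r in same_hash if r['mtime'] == mtime]
    if len(same_hash) == 1:
        return (same_hash[0], sha256)

    # No match, but the computed hash can still be reused for a new record.
    return (None, sha256)
```

Returning the hash even when no match is found mirrors the diff's `known_hash` idea: the work of hashing is not thrown away when the file turns out to be new.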
--- a/README.md
+++ b/README.md
@ -1,7 +1,7 @@
 Etiquette
 =========

-I am currently running a demonstration copy of Etiquette at http://etiquette.voussoir.net where you can browse around. This is not yet permanent.
+I am currently running a read-only demonstration copy of Etiquette at http://etiquette.voussoir.net where you can browse around.

 ### What am I looking at

@ -126,6 +126,28 @@ In order to prevent the accidental creation of Etiquette databases, you must use

 </details>

+### Basic usage
+
+Let's say you store your photos in `D:\Documents\Photos`, and you want to tag the files with Etiquette. You can get started with these steps:
+
+1. Open a Command Prompt / Terminal. Decide where your Etiquette database will be stored, and `cd` to that location. `cd D:\Documents\Photos` is probably fine.
+2. Run `etiquette_cli.py init` to create the database. A folder called `_etiquette` will appear.
+3. Run `etiquette_cli.py digest . --ratelimit 1 --glob-filenames *.jpg` to add the files into the database. You can use `etiquette_cli.py digest --help` to learn about this command.
+4. Run `etiquette_flask_dev.py 5000` to start the webserver on port 5000.
+5. Open your web browser to `localhost:5000` and begin browsing.
+
+### Why does Etiquette hash files?
+
+When adding new files to the database or reloading their metadata, Etiquette will create SHA256 hashes of the files. If you are using Etiquette to organize large media files, this may take a while. I was hesitant to add hashing and incur this slowdown, but the hashes greatly improve Etiquette's ability to detect when a file has been renamed or moved, which is important when you have invested your valuable time into adding tags to them. I hope that the hash time is perceived as a worthwhile tradeoff.
+
+### Maintaining your database with Etiquette CLI
+
+I highly recommend storing batch/bash scripts of your favorite `etiquette_cli` invocations, so that you can quickly sync the database with the state of the disk in the future. Here are some suggestions for what you might like to include in such a script:
+
+- `digest`: Storing all your digest invocations in a single file makes ingesting new files very easy. For your digests, I recommend including `--ratelimit` to stop Photos from having the exact same created timestamp, and `--hash-bytes-per-second` to reduce IO load. In addition, you don't want to forget your favorite `--glob-filenames` patterns.
+- `reload-metadata`: In order for Etiquette's hash-based rename detection to work properly, the file hashes need to be up to date. If you're using Etiquette to track files which are being modified, you may want to get in the habit of reloading metadata regularly. By default, this will only reload metadata for files whose mtime and/or byte size have changed, so it should not be very expensive. You may add `--hash-bytes-per-second` to reduce IO load.
+- `purge-deleted-files` & `purge-empty-albums`: You should only do this after a `digest`, because if a file has been moved / renamed you want the digest to pick up on that before purging it as a dead filepath. The Photo purge should come first, so that an album containing entirely deleted photos will be empty when it comes time for the Album purge.
+
 ### Project stability

 You may notice that Etiquette doesn't have a version number anywhere. That's because I don't think it's ready for one. I am using this project to learn and practice, and breaking changes are very common.
--- a/etiquette/constants.py
+++ b/etiquette/constants.py
@ -41,7 +41,7 @@ ffmpeg = _load_ffmpeg()

 # Database #########################################################################################

-DATABASE_VERSION = 18
+DATABASE_VERSION = 19
 DB_VERSION_PRAGMA = f'''
 PRAGMA user_version = {DATABASE_VERSION};
 '''
@ -84,10 +84,11 @@ CREATE INDEX IF NOT EXISTS index_bookmarks_author_id on bookmarks(author_id);
 CREATE TABLE IF NOT EXISTS photos(
    id TEXT PRIMARY KEY NOT NULL,
    filepath TEXT COLLATE NOCASE,
-    dev_ino TEXT,
    basename TEXT COLLATE NOCASE,
    override_filename TEXT COLLATE NOCASE,
    extension TEXT COLLATE NOCASE,
+    mtime INT,
+    sha256 TEXT,
    width INT,
    height INT,
    ratio REAL,
--- a/etiquette/objects.py
+++ b/etiquette/objects.py
@ -4,6 +4,7 @@ but are returned by the PDB accesses.
 '''
 import abc
 import bcrypt
+import hashlib
 import os
 import PIL.Image
 import re
@ -786,6 +787,8 @@ class Photo(ObjectBase):
        self.author_id = self.normalize_author_id(db_row['author_id'])
        self.override_filename = db_row['override_filename']
        self.extension = self.real_path.extension.no_dot
+        self.mtime = db_row['mtime']
+        self.sha256 = db_row['sha256']

        if self.extension == '':
            self.dot_extension = ''
@ -1144,14 +1147,15 @@ class Photo(ObjectBase):

    @decorators.required_feature('photo.reload_metadata')
    @decorators.transaction
-    def reload_metadata(self):
+    def reload_metadata(self, hash_kwargs=None):
        '''
        Load the file's height, width, etc as appropriate for this type of file.
        '''
        self.photodb.log.info('Reloading metadata for %s.', self)

+        self.mtime = None
+        self.sha256 = None
        self.bytes = None
-        self.dev_ino = None
        self.width = None
        self.height = None
        self.area = None
@ -1160,10 +1164,8 @@ class Photo(ObjectBase):

        if self.real_path.is_file:
            stat = self.real_path.stat
+            self.mtime = stat.st_mtime
            self.bytes = stat.st_size
-            (dev, ino) = (stat.st_dev, stat.st_ino)
-            if dev and ino:
-                self.dev_ino = f'{dev},{ino}'

        if self.bytes is None:
            pass
@ -1181,15 +1183,20 @@ class Photo(ObjectBase):
            self.area = self.width * self.height
            self.ratio = round(self.width / self.height, 2)

+        hash_kwargs = hash_kwargs or {}
+        sha256 = spinal.hash_file(self.real_path, hash_class=hashlib.sha256, **hash_kwargs)
+        self.sha256 = sha256.hexdigest()
+
        data = {
            'id': self.id,
+            'mtime': self.mtime,
+            'sha256': self.sha256,
            'width': self.width,
            'height': self.height,
            'area': self.area,
            'ratio': self.ratio,
            'duration': self.duration,
            'bytes': self.bytes,
-            'dev_ino': self.dev_ino,
        }
        self.photodb.sql_update(table='photos', pairs=data, where_key='id')

--- a/etiquette/photodb.py
+++ b/etiquette/photodb.py
@ -1,4 +1,5 @@
 import bcrypt
+import hashlib
 import json
 import os
 import random
@ -439,16 +440,6 @@ class PDBPhotoMixin:
    def get_photo(self, id):
        return self.get_thing_by_id('photo', id)

-    def get_photo_by_inode(self, dev, ino):
-        dev_ino = f'{dev},{ino}'
-        query = 'SELECT * FROM photos WHERE dev_ino == ?'
-        bindings = [dev_ino]
-        photo_row = self.sql_select_one(query, bindings)
-        if photo_row is None:
-            raise exceptions.NoSuchPhoto(dev_ino)
-        photo = self.get_cached_instance('photo', photo_row)
-        return photo
-
    def get_photo_by_path(self, filepath):
        filepath = pathclass.Path(filepath)
        query = 'SELECT * FROM photos WHERE filepath == ?'
@ -484,6 +475,14 @@ class PDBPhotoMixin:
            if count <= 0:
                break

+    def get_photos_by_hash(self, sha256):
+        if not isinstance(sha256, str) or len(sha256) != 64:
+            raise TypeError(f'sha256 should be the 64-character hexdigest string.')
+
+        query = 'SELECT * FROM photos WHERE sha256 == ?'
+        bindings = [sha256]
+        yield from self.get_photos_by_sql(query, bindings)
+
    def get_photos_by_sql(self, query, bindings=None):
        return self.get_things_by_sql('photo', query, bindings)

@ -496,6 +495,8 @@ class PDBPhotoMixin:
            author=None,
            do_metadata=True,
            do_thumbnail=True,
+            hash_kwargs=None,
+            known_hash=None,
            searchhidden=False,
            tags=None,
        ):
@ -503,6 +504,16 @@ class PDBPhotoMixin:
        Given a filepath, determine its attributes and create a new Photo object
        in the database. Tags may be applied now or later.

+        hash_kwargs:
+            Additional kwargs passed into spinal.hash_file. Notably, you may
+            wish to set bytes_per_second to keep system load low.
+
+        known_hash:
+            If the sha256 of the file is already known, you may provide it here
+            so it does not need to be recalculated. This is primarily intended
+            for digest_directory since it will look for hash matches first
+            before creating new photos and thus can provide the known hash.
+
        Returns the Photo object.
        '''
        # These might raise exceptions
@ -514,6 +525,11 @@ class PDBPhotoMixin:

        author_id = self.get_user_id_or_none(author)

+        if known_hash is None:
+            pass
+        elif not isinstance(known_hash, str) or len(known_hash) != 64:
+            raise TypeError(f'known_hash should be the 64-character sha256 hexdigest string.')
+
        # Ok.
        photo_id = self.generate_id(table='photos')
        self.log.info('New Photo: %s %s.', photo_id, filepath.absolute_path)
@ -529,7 +545,8 @@ class PDBPhotoMixin:
            'author_id': author_id,
            'searchhidden': searchhidden,
            # These will be filled in during the metadata stage.
-            'dev_ino': None,
+            'mtime': None,
+            'sha256': known_hash,
            'bytes': None,
            'width': None,
            'height': None,
@ -543,7 +560,8 @@ class PDBPhotoMixin:
        photo = self.get_cached_instance('photo', data)

        if do_metadata:
-            photo.reload_metadata()
+            hash_kwargs = hash_kwargs or {}
+            photo.reload_metadata(hash_kwargs=hash_kwargs)
        if do_thumbnail:
            photo.generate_thumbnail()

@ -594,6 +612,7 @@ class PDBPhotoMixin:
            has_thumbnail=None,
            is_searchhidden=False,
            mimetype=None,
+            sha256=None,
            tag_musts=None,
            tag_mays=None,
            tag_forbids=None,
@ -729,6 +748,7 @@ class PDBPhotoMixin:
        has_tags = searchhelpers.normalize_has_tags(has_tags)
        has_thumbnail = searchhelpers.normalize_has_thumbnail(has_thumbnail)
        is_searchhidden = searchhelpers.normalize_is_searchhidden(is_searchhidden)
+        sha256 = searchhelpers.normalize_sha256(sha256)
        mimetype = searchhelpers.normalize_extension(mimetype)
        within_directory = searchhelpers.normalize_within_directory(within_directory, warning_bag=warning_bag)
        yield_albums = searchhelpers.normalize_yield_albums(yield_albums)
@ -908,6 +928,9 @@ class PDBPhotoMixin:
        elif is_searchhidden is False:
            wheres.append('searchhidden == 0')

+        if sha256:
+            wheres.append(f'sha256 IN {sqlhelpers.listify(sha256)}')
+
        for column in notnulls:
            wheres.append(column + ' IS NOT NULL')
        for column in yesnulls:
@ -1497,9 +1520,12 @@ class PDBUtilMixin:
            *,
            exclude_directories=None,
            exclude_filenames=None,
+            glob_directories=None,
+            glob_filenames=None,
+            hash_kwargs=None,
            make_albums=True,
            natural_sort=True,
-            new_photo_kwargs={},
+            new_photo_kwargs=None,
            new_photo_ratelimit=None,
            recurse=True,
            yield_albums=True,
@ -1521,6 +1547,10 @@ class PDBUtilMixin:
            This list works in addition to, not instead of, the
            digest_exclude_files config value.

+        hash_kwargs:
+            Additional kwargs passed into spinal.hash_file. Notably, you may
+            wish to set bytes_per_second to keep system load low.
+
        make_albums:
            If True, every directory that is digested will be turned into an
            Album, and the directory path will be added to the Album's
@ -1582,9 +1612,14 @@ class PDBUtilMixin:
            return exclude_filenames

        def _normalize_new_photo_kwargs(new_photo_kwargs):
-            new_photo_kwargs = new_photo_kwargs.copy()
-            new_photo_kwargs.pop('commit', None)
-            new_photo_kwargs.pop('filepath', None)
+            if new_photo_kwargs is None:
+                new_photo_kwargs = {}
+            else:
+                new_photo_kwargs = new_photo_kwargs.copy()
+                new_photo_kwargs.pop('commit', None)
+                new_photo_kwargs.pop('filepath', None)
+
+            new_photo_kwargs.setdefault('hash_kwargs', hash_kwargs)
            return new_photo_kwargs

        def _normalize_new_photo_ratelimit(new_photo_ratelimit):
@ -1598,43 +1633,63 @@ class PDBUtilMixin:
                raise TypeError(new_photo_ratelimit)
            return new_photo_ratelimit

-        def check_renamed_inode(filepath):
-            stat = filepath.stat
-            (dev, ino) = (stat.st_dev, stat.st_ino)
-            if dev == 0 or ino == 0:
-                return
+        def check_renamed(filepath):
+            '''
+            We'll do our best to determine if this file is actually a rename of
+            a file that's already in the database.
+            '''
+            same_meta = self.get_photos_by_sql(
+                'SELECT * FROM photos WHERE mtime == ? AND bytes == ?',
+                [filepath.stat.st_mtime, filepath.stat.st_size]
+            )
+            same_meta = [photo for photo in same_meta if not photo.real_path.is_file]
+            if len(same_meta) == 1:
+                photo = same_meta[0]
+                self.log.debug('Found mtime+bytesize match %s.', photo)
+                return photo

-            try:
-                photo = self.get_photo_by_inode(dev, ino)
-            except exceptions.NoSuchPhoto:
-                return
+            self.log.loud('Hashing file %s to check for rename.', filepath)
+            sha256 = spinal.hash_file(
+                filepath,
+                hash_class=hashlib.sha256, **hash_kwargs,
+            ).hexdigest()

-            if photo.real_path.is_file:
-                # Don't relocate the path if this is actually a hardlink, and
-                # both paths are current.
-                return
+            same_hash = self.get_photos_by_hash(sha256)
+            same_hash = [photo for photo in same_hash if not photo.real_path.is_file]

-            if photo.bytes != stat.st_size:
-                return
+            # fwiw, I'm not checking byte size since it's a hash match.
+            if len(same_hash) > 1:
+                same_hash = [photo for photo in same_hash if photo.mtime == filepath.stat.st_mtime]
+            if len(same_hash) == 1:
+                return same_hash[0]

-            photo.relocate(filepath.absolute_path)
-            return photo
+            # Although we did not find a match, we can still benefit from our
+            # hash work by passing this as the known_hash to new_photo.
+            return {'sha256': sha256}

-        def create_or_fetch_photo(filepath, new_photo_kwargs):
+        def create_or_fetch_photo(filepath):
            '''
            Given a filepath, find the corresponding Photo object if it exists,
            otherwise create it and then return it.
            '''
            try:
                photo = self.get_photo_by_path(filepath)
+                return photo
            except exceptions.NoSuchPhoto:
-                photo = None
-            if not photo:
-                photo = check_renamed_inode(filepath)
-            if not photo:
-                photo = self.new_photo(filepath.absolute_path, **new_photo_kwargs)
-                if new_photo_ratelimit is not None:
-                    new_photo_ratelimit.limit()
+                pass
+
+            result = check_renamed(filepath)
+            if isinstance(result, objects.Photo):
+                result.relocate(filepath.absolute_path)
+                return result
+            elif isinstance(result, dict) and 'sha256' in result:
+                sha256 = result['sha256']
+            else:
+                sha256 = None
+
+            photo = self.new_photo(filepath, known_hash=sha256, **new_photo_kwargs)
+            if new_photo_ratelimit is not None:
+                new_photo_ratelimit.limit()

            return photo

@ -1672,6 +1727,7 @@ class PDBUtilMixin:
        directory = _normalize_directory(directory)
        exclude_directories = _normalize_exclude_directories(exclude_directories)
        exclude_filenames = _normalize_exclude_filenames(exclude_filenames)
+        hash_kwargs = hash_kwargs or {}
        new_photo_kwargs = _normalize_new_photo_kwargs(new_photo_kwargs)
        new_photo_ratelimit = _normalize_new_photo_ratelimit(new_photo_ratelimit)

@ -1682,6 +1738,8 @@ class PDBUtilMixin:
            directory,
            exclude_directories=exclude_directories,
            exclude_filenames=exclude_filenames,
+            glob_directories=glob_directories,
+            glob_filenames=glob_filenames,
            recurse=recurse,
            yield_style='nested',
        )
@ -1690,7 +1748,15 @@ class PDBUtilMixin:
            if natural_sort:
                files = sorted(files, key=lambda f: helpers.natural_sorter(f.basename))

-            photos = [create_or_fetch_photo(file, new_photo_kwargs=new_photo_kwargs) for file in files]
+            photos = [create_or_fetch_photo(file) for file in files]
+
+            # Note, this means that empty folders will not get an Album.
+            # At this time this behavior is intentional. Furthermore, due to
+            # the glob/exclude rules, we don't want albums being created if
+            # they don't contain any files of interest, even if they do contain
+            # other files.
+            if not photos:
+                continue

            if yield_photos:
                yield from photos
--- a/etiquette/searchhelpers.py
+++ b/etiquette/searchhelpers.py
@ -376,6 +376,31 @@ def normalize_positive_integer(number):

    return number

+def normalize_sha256(sha256, warning_bag=None):
+    if sha256 is None:
+        return None
+
+    if isinstance(sha256, (tuple, list, set)):
+        pass
+    elif isinstance(sha256, str):
+        sha256 = stringtools.comma_space_split(sha256)
+    else:
+        raise TypeError('sha256 should be the 64 character hexdigest string or a set of them.')
+
+    shas = set(sha256)
+    goodshas = set()
+    for sha in shas:
+        if isinstance(sha, str) and len(sha) == 64:
+            goodshas.add(sha)
+        else:
+            exc = TypeError(f'sha256 should be the 64-character hexdigest string.')
+            if warning_bag is not None:
+                warning_bag.add(exc)
+            else:
+                raise exc
+
+    return goodshas
+
 def normalize_tag_expression(expression):
    if not expression:
        return None
--- a/frontends/etiquette_cli.py
+++ b/frontends/etiquette_cli.py
@ -145,6 +145,7 @@ def search_by_argparse(args, yield_albums=False, yield_photos=False):
        has_tags=args.has_tags,
        has_thumbnail=args.has_thumbnail,
        is_searchhidden=args.is_searchhidden,
+        sha256=args.sha256,
        mimetype=args.mimetype,
        tag_musts=args.tag_musts,
        tag_mays=args.tag_mays,
@ -202,18 +203,34 @@ def delete_argparse(args):
        photodb.commit()

 def digest_directory_argparse(args):
-    directory = pathclass.Path(args.directory)
+    directories = pipeable.input(args.directory, strip=True, skip_blank=True)
+    directories = [pathclass.Path(d) for d in directories]
+    for directory in directories:
+        directory.assert_is_directory()
+
    photodb = find_photodb()
-    digest = photodb.digest_directory(
-        directory,
-        make_albums=args.make_albums,
-        recurse=args.recurse,
-        new_photo_ratelimit=args.ratelimit,
-        yield_albums=True,
-        yield_photos=True,
-    )
-    for result in digest:
-        print(result)
+    need_commit = False
+
+    for directory in directories:
+        digest = photodb.digest_directory(
+            directory,
+            exclude_directories=args.exclude_directories,
+            exclude_filenames=args.exclude_filenames,
+            glob_directories=args.glob_directories,
+            glob_filenames=args.glob_filenames,
+            hash_kwargs={'bytes_per_second': args.hash_bytes_per_second},
+            make_albums=args.make_albums,
+            new_photo_ratelimit=args.ratelimit,
+            recurse=args.recurse,
+            yield_albums=True,
+            yield_photos=True,
+        )
+        for result in digest:
+            # print(result)
+            need_commit = True
+
+    if not need_commit:
+        return

    if args.autoyes or interactive.getpermission('Commit?'):
        photodb.commit()
@ -309,6 +326,45 @@ def purge_empty_albums_argparse(args):
    if args.autoyes or interactive.getpermission('Commit?'):
        photodb.commit()

+def reload_metadata_argparse(args):
+    photodb = find_photodb()
+
+    if args.photo_id_args or args.photo_search_args:
+        photos = get_photos_from_args(args)
+    else:
+        photos = search_in_cwd(yield_photos=True, yield_albums=False)
+
+    hash_kwargs = {
+        'bytes_per_second': args.hash_bytes_per_second,
+        'callback_progress': spinal.callback_progress_v1,
+    }
+
+    need_commit = False
+    try:
+        for photo in photos:
+            if not photo.real_path.is_file:
+                continue
+
+            need_reload = (
+                args.force or
+                photo.mtime != photo.real_path.stat.st_mtime or
+                photo.bytes != photo.real_path.stat.st_size
+            )
+
+            if not need_reload:
+                continue
+            photo.reload_metadata(hash_kwargs=hash_kwargs)
+            need_commit = True
+            photodb.commit()
+    except KeyboardInterrupt:
+        pass
+
+    if not need_commit:
+        return
+
+    if args.autoyes or interactive.getpermission('Commit?'):
+        photodb.commit()
+
 def relocate_argparse(args):
    photodb = find_photodb()

@ -420,6 +476,8 @@ Etiquette CLI

 {purge_empty_albums}

+{reload_metadata}
+
 {relocate}

 {search}
@ -480,9 +538,27 @@ digest:
    > etiquette_cli.py digest directory <flags>

    flags:
+    --exclude_directories A B C:
+        Any directories matching any pattern of A, B, C... will be skipped.
+        These patterns may be absolute paths like 'D:\temp', plain names like
+        'thumbnails' or glob patterns like 'build_*'.
+
+    --exclude_filenames A B C:
+        Any filenames matching any pattern of A, B, C... will be skipped.
+        These patterns may be absolute paths like 'D:\somewhere\config.json',
+        plain names like 'thumbs.db' or glob patterns like '*.temp'.
+
+    --glob_directories A B C:
+        Only directories matching any pattern of A, B, C... will be digested.
+        These patterns may be plain names or glob patterns like '2021*'.
+
+    --glob_filenames A B C:
+        Only filenames matching any pattern of A, B, C... will be digested.
+        These patterns may be plain names or glob patterns like '*.jpg'.
+
    --no_albums:
-        Do not create any albums the directories. By default, albums are created
-        and nested to match the directory structure.
+        Do not create any albums. By default, albums are created and nested to
+        match the directory structure.

    --ratelimit X:
        Limit the ingest of new Photos to only one per X seconds. This can be
@ -496,6 +572,7 @@ digest:
    Examples:
    > etiquette_cli.py digest media --ratelimit 1
    > etiquette_cli.py digest photos --no-recurse --no-albums --ratelimit 0.25
+    > etiquette_cli.py digest . --glob-filenames *.jpg --exclude-filenames thumb*
 '''.strip(),

 easybake='''
@ -546,6 +623,8 @@ purge_deleted_files:
    Delete any Photo objects whose file no longer exists on disk.

    > etiquette_cli.py purge_deleted_files
+    > etiquette_cli.py purge_deleted_files id id id
+    > etiquette_cli.py purge_deleted_files searchargs
 '''.strip(),

 purge_empty_albums='''
@ -555,7 +634,34 @@ purge_empty_albums:
    Consider running purge_deleted_files first, so that albums containing
    deleted files will get cleared out and then caught by this function.

+    With no args, all albums will be checked.
+    Or you can pass specific album ids. (searchargs is not available since
+    albums only appear in search results when a matching photo is found, and
+    we're looking for albums with no photos!)
+
    > etiquette_cli.py purge_empty_albums
+    > etiquette_cli.py purge_empty_albums id id id
+'''.strip(),
+
+reload_metadata='''
+reload_metadata:
+    Reload photos' metadata by reading the files from disk.
+
+    With no args, all files under the cwd will be reloaded.
+    Or, you can pass specific photo ids or searchargs.
+
+    > etiquette_cli.py reload_metadata
+    > etiquette_cli.py reload_metadata id id id
+    > etiquette_cli.py reload_metadata searchargs
+
+    flags:
+    --force:
+        By default, we will skip any files that have the same mtime and byte
+        size as before. You can pass --force to always reload.
+
+    --hash_bytes_per_second:
+        A string like "10mb" to limit the speed of file hashing for the purpose
+        of reducing system load.
 '''.strip(),

 relocate='''
@ -619,6 +725,9 @@ search:
    --mimetype A,B,C:
        Photo with any mimetype of A, B, C...

+    --sha256 A,B,C:
+        Photo with any sha256 of A, B, C...
+
    --tag_musts A,B,C:
        Photo must have all tags A and B and C...

@ -728,9 +837,14 @@ def main(argv):

    p_digest = subparsers.add_parser('digest', aliases=['digest_directory', 'digest-directory'])
    p_digest.add_argument('directory')
+    p_digest.add_argument('--exclude_directories', '--exclude-directories', nargs='+', default=None)
+    p_digest.add_argument('--exclude_filenames', '--exclude-filenames', nargs='+', default=None)
+    p_digest.add_argument('--glob_directories', '--glob-directories', nargs='+', default=None)
+    p_digest.add_argument('--glob_filenames', '--glob-filenames', nargs='+', default=None)
    p_digest.add_argument('--no_albums', '--no-albums', dest='make_albums', action='store_false', default=True)
    p_digest.add_argument('--ratelimit', dest='ratelimit', type=float, default=0.2)
    p_digest.add_argument('--no_recurse', '--no-recurse', dest='recurse', action='store_false', default=True)
+    p_digest.add_argument('--hash_bytes_per_second', '--hash-bytes-per-second', default=None)
    p_digest.add_argument('--yes', dest='autoyes', action='store_true')
    p_digest.set_defaults(func=digest_directory_argparse)

@ -756,6 +870,12 @@ def main(argv):
    p_purge_empty_albums.add_argument('--yes', dest='autoyes', action='store_true')
    p_purge_empty_albums.set_defaults(func=purge_empty_albums_argparse)

+    p_reload_metadata = subparsers.add_parser('reload_metadata', aliases=['reload-metadata'])
+    p_reload_metadata.add_argument('--hash_bytes_per_second', '--hash-bytes-per-second', default=None)
+    p_reload_metadata.add_argument('--force', action='store_true')
+    p_reload_metadata.add_argument('--yes', dest='autoyes', action='store_true')
+    p_reload_metadata.set_defaults(func=reload_metadata_argparse)
+
    p_relocate = subparsers.add_parser('relocate')
    p_relocate.add_argument('photo_id')
    p_relocate.add_argument('filepath')
@ -777,6 +897,7 @@ def main(argv):
    p_search.add_argument('--has_tags', '--has-tags', dest='has_tags', default=None)
    p_search.add_argument('--has_thumbnail', '--has-thumbnail', dest='has_thumbnail', default=None)
    p_search.add_argument('--is_searchhidden', '--is-searchhidden', dest='is_searchhidden', default=False)
+    p_search.add_argument('--sha256', default=None)
    p_search.add_argument('--mimetype', dest='mimetype', default=None)
    p_search.add_argument('--tag_musts', '--tag-musts', dest='tag_musts', default=None)
    p_search.add_argument('--tag_mays', '--tag-mays', dest='tag_mays', default=None)
--- a/frontends/etiquette_flask/etiquette_flask_dev.py
+++ b/frontends/etiquette_flask/etiquette_flask_dev.py
@ -64,7 +64,7 @@ def etiquette_flask_launch(
        pipeable.stderr('Try `etiquette_cli.py init` to create the database.')
        return 1

-    message = f'Starting server on port {port}, pid={os.getpid()}'
+    message = f'Starting server on port {port}, pid={os.getpid()}.'
    if use_https:
        message += ' (https)'
    print(message)
--- a/utilities/database_upgrader.py
+++ b/utilities/database_upgrader.py
@ -602,6 +602,57 @@ def upgrade_17_to_18(photodb):

    m.go()

+def upgrade_18_to_19(photodb):
+    m = Migrator(photodb)
+
+    m.tables['photos']['create'] = '''
+    CREATE TABLE photos(
+        id TEXT PRIMARY KEY NOT NULL,
+        filepath TEXT COLLATE NOCASE,
+        basename TEXT COLLATE NOCASE,
+        override_filename TEXT COLLATE NOCASE,
+        extension TEXT COLLATE NOCASE,
+        mtime INT,
+        sha256 TEXT,
+        width INT,
+        height INT,
+        ratio REAL,
+        area INT,
+        duration INT,
+        bytes INT,
+        created INT,
+        thumbnail TEXT,
+        tagged_at INT,
+        author_id TEXT,
+        searchhidden INT,
+        FOREIGN KEY(author_id) REFERENCES users(id)
+    );
+    '''
+    m.tables['photos']['transfer'] = '''
+    INSERT INTO photos SELECT
+        id,
+        filepath,
+        basename,
+        override_filename,
+        extension,
+        NULL,
+        NULL,
+        width,
+        height,
+        ratio,
+        area,
+        duration,
+        bytes,
+        created,
+        thumbnail,
+        tagged_at,
+        author_id,
+        searchhidden
+    FROM photos_old;
+    '''
+
+    m.go()
+
 def upgrade_all(data_directory):
    '''
    Given the directory containing a phototagger database, apply all of the