Replace dev_ino with mtime, sha256.

This change was prompted by my discovery that under DrivePool, two
files can have the same (dev, ino) pair. It's understandable, but the
fact of the matter is I don't want to rely on inodes any more.
Hashing has the downside of speed, but considering the time investment
of tagging files in the first place, I think it will be worthwhile.
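To illustrate the tradeoff in this commit (a hypothetical sketch, not Etiquette's code; `sha256_of` is a made-up name): a content hash keeps identifying a file across renames, which is what the rename detection below relies on, whereas a (dev, ino) pair cannot be trusted under DrivePool.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path, chunk_size=2**20):
    # Stream in chunks so large media files don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

directory = tempfile.mkdtemp()
old_path = os.path.join(directory, 'photo.jpg')
with open(old_path, 'wb') as handle:
    handle.write(b'pretend image bytes')

before = sha256_of(old_path)
new_path = os.path.join(directory, 'renamed.jpg')
os.rename(old_path, new_path)
after = sha256_of(new_path)

# Content identity survives the rename; the filepath and inode may not.
assert before == after
shutil.rmtree(directory)
```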
voussoir 2021-02-03 12:12:47 -08:00
parent f8efc9d569
commit 4bf5b6d824
No known key found for this signature in database
GPG key ID: 5F7554F8C26DACCB
8 changed files with 358 additions and 65 deletions

View file

@@ -1,7 +1,7 @@
 Etiquette
 =========
 
-I am currently running a demonstration copy of Etiquette at http://etiquette.voussoir.net where you can browse around. This is not yet permanent.
+I am currently running a read-only demonstration copy of Etiquette at http://etiquette.voussoir.net where you can browse around.
 
 ### What am I looking at
@@ -126,6 +126,28 @@ In order to prevent the accidental creation of Etiquette databases, you must use
 </details>
 
+### Basic usage
+
+Let's say you store your photos in `D:\Documents\Photos`, and you want to tag the files with Etiquette. You can get started with these steps:
+
+1. Open a Command Prompt / Terminal. Decide where your Etiquette database will be stored, and `cd` to that location. `cd D:\Documents\Photos` is probably fine.
+2. Run `etiquette_cli.py init` to create the database. A folder called `_etiquette` will appear.
+3. Run `etiquette_cli.py digest . --ratelimit 1 --glob-filenames *.jpg` to add the files into the database. You can use `etiquette_cli.py digest --help` to learn about this command.
+4. Run `etiquette_flask_dev.py 5000` to start the webserver on port 5000.
+5. Open your web browser to `localhost:5000` and begin browsing.
+
+### Why does Etiquette hash files?
+
+When adding new files to the database or reloading their metadata, Etiquette will create SHA256 hashes of the files. If you are using Etiquette to organize large media files, this may take a while. I was hesitant to add hashing and incur this slowdown, but the hashes greatly improve Etiquette's ability to detect when a file has been renamed or moved, which is important when you have invested your valuable time into adding tags to them. I hope that the hash time is perceived as a worthwhile tradeoff.
+
+### Maintaining your database with Etiquette CLI
+
+I highly recommend storing batch/bash scripts of your favorite `etiquette_cli` invocations, so that you can quickly sync the database with the state of the disk in the future. Here are some suggestions for what you might like to include in such a script:
+
+- `digest`: Storing all your digest invocations in a single file makes ingesting new files very easy. For your digests, I recommend including `--ratelimit` to stop Photos from having the exact same created timestamp, and `--hash-bytes-per-second` to reduce IO load. In addition, you don't want to forget your favorite `--glob-filenames` patterns.
+- `reload-metadata`: In order for Etiquette's hash-based rename detection to work properly, the file hashes need to be up to date. If you're using Etiquette to track files which are being modified, you may want to get in the habit of reloading metadata regularly. By default, this will only reload metadata for files whose mtime and/or byte size have changed, so it should not be very expensive. You may add `--hash-bytes-per-second` to reduce IO load.
+- `purge-deleted-files` & `purge-empty-albums`: You should only do this after a `digest`, because if a file has been moved / renamed, you want the digest to pick up on that before purging it as a dead filepath. The Photo purge should come first, so that an album containing entirely deleted photos will be empty when it comes time for the Album purge.
+
 ### Project stability
 
 You may notice that Etiquette doesn't have a version number anywhere. That's because I don't think it's ready for one. I am using this project to learn and practice, and breaking changes are very common.
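The `--hash-bytes-per-second` throttle mentioned in the maintenance suggestions can be sketched like this (illustrative only: Etiquette actually delegates to `spinal.hash_file`, and `hash_file_throttled` is a made-up name):

```python
import hashlib
import time

def hash_file_throttled(path, bytes_per_second=None, chunk_size=2**20):
    # Hypothetical stand-in for spinal.hash_file(..., bytes_per_second=...):
    # hash the file in chunks, sleeping after each chunk so the average
    # read rate stays at or below the cap, reducing IO load.
    h = hashlib.sha256()
    with open(path, 'rb') as handle:
        while True:
            start = time.monotonic()
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
            if bytes_per_second:
                target = len(chunk) / bytes_per_second
                elapsed = time.monotonic() - start
                if target > elapsed:
                    time.sleep(target - elapsed)
    return h.hexdigest()
```

With no `bytes_per_second`, this degrades to plain chunked hashing, which is why the flag is optional in the CLI.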

View file

@@ -41,7 +41,7 @@ ffmpeg = _load_ffmpeg()
 # Database #########################################################################################
 
-DATABASE_VERSION = 18
+DATABASE_VERSION = 19
 DB_VERSION_PRAGMA = f'''
 PRAGMA user_version = {DATABASE_VERSION};
 '''
@@ -84,10 +84,11 @@ CREATE INDEX IF NOT EXISTS index_bookmarks_author_id on bookmarks(author_id);
 CREATE TABLE IF NOT EXISTS photos(
     id TEXT PRIMARY KEY NOT NULL,
     filepath TEXT COLLATE NOCASE,
-    dev_ino TEXT,
     basename TEXT COLLATE NOCASE,
     override_filename TEXT COLLATE NOCASE,
     extension TEXT COLLATE NOCASE,
+    mtime INT,
+    sha256 TEXT,
     width INT,
     height INT,
     ratio REAL,

View file

@@ -4,6 +4,7 @@ but are returned by the PDB accesses.
 '''
 import abc
 import bcrypt
+import hashlib
 import os
 import PIL.Image
 import re
@@ -786,6 +787,8 @@ class Photo(ObjectBase):
         self.author_id = self.normalize_author_id(db_row['author_id'])
         self.override_filename = db_row['override_filename']
         self.extension = self.real_path.extension.no_dot
+        self.mtime = db_row['mtime']
+        self.sha256 = db_row['sha256']
 
         if self.extension == '':
             self.dot_extension = ''
@@ -1144,14 +1147,15 @@ class Photo(ObjectBase):
     @decorators.required_feature('photo.reload_metadata')
     @decorators.transaction
-    def reload_metadata(self):
+    def reload_metadata(self, hash_kwargs=None):
         '''
         Load the file's height, width, etc as appropriate for this type of file.
         '''
         self.photodb.log.info('Reloading metadata for %s.', self)
 
+        self.mtime = None
+        self.sha256 = None
         self.bytes = None
-        self.dev_ino = None
         self.width = None
         self.height = None
         self.area = None
@@ -1160,10 +1164,8 @@ class Photo(ObjectBase):
         if self.real_path.is_file:
             stat = self.real_path.stat
+            self.mtime = stat.st_mtime
             self.bytes = stat.st_size
-            (dev, ino) = (stat.st_dev, stat.st_ino)
-            if dev and ino:
-                self.dev_ino = f'{dev},{ino}'
 
         if self.bytes is None:
             pass
@@ -1181,15 +1183,20 @@ class Photo(ObjectBase):
             self.area = self.width * self.height
             self.ratio = round(self.width / self.height, 2)
 
+        hash_kwargs = hash_kwargs or {}
+        sha256 = spinal.hash_file(self.real_path, hash_class=hashlib.sha256, **hash_kwargs)
+        self.sha256 = sha256.hexdigest()
+
         data = {
             'id': self.id,
+            'mtime': self.mtime,
+            'sha256': self.sha256,
             'width': self.width,
             'height': self.height,
             'area': self.area,
             'ratio': self.ratio,
             'duration': self.duration,
             'bytes': self.bytes,
-            'dev_ino': self.dev_ino,
         }
         self.photodb.sql_update(table='photos', pairs=data, where_key='id')

View file

@@ -1,4 +1,5 @@
 import bcrypt
+import hashlib
 import json
 import os
 import random
@@ -439,16 +440,6 @@ class PDBPhotoMixin:
     def get_photo(self, id):
         return self.get_thing_by_id('photo', id)
 
-    def get_photo_by_inode(self, dev, ino):
-        dev_ino = f'{dev},{ino}'
-        query = 'SELECT * FROM photos WHERE dev_ino == ?'
-        bindings = [dev_ino]
-        photo_row = self.sql_select_one(query, bindings)
-        if photo_row is None:
-            raise exceptions.NoSuchPhoto(dev_ino)
-        photo = self.get_cached_instance('photo', photo_row)
-        return photo
-
     def get_photo_by_path(self, filepath):
         filepath = pathclass.Path(filepath)
         query = 'SELECT * FROM photos WHERE filepath == ?'
@@ -484,6 +475,14 @@ class PDBPhotoMixin:
             if count <= 0:
                 break
 
+    def get_photos_by_hash(self, sha256):
+        if not isinstance(sha256, str) or len(sha256) != 64:
+            raise TypeError('sha256 should be the 64-character hexdigest string.')
+        query = 'SELECT * FROM photos WHERE sha256 == ?'
+        bindings = [sha256]
+        yield from self.get_photos_by_sql(query, bindings)
+
     def get_photos_by_sql(self, query, bindings=None):
         return self.get_things_by_sql('photo', query, bindings)
@@ -496,6 +495,8 @@ class PDBPhotoMixin:
             author=None,
             do_metadata=True,
             do_thumbnail=True,
+            hash_kwargs=None,
+            known_hash=None,
             searchhidden=False,
             tags=None,
             ):
@@ -503,6 +504,16 @@ class PDBPhotoMixin:
         Given a filepath, determine its attributes and create a new Photo object
         in the database. Tags may be applied now or later.
 
+        hash_kwargs:
+            Additional kwargs passed into spinal.hash_file. Notably, you may
+            wish to set bytes_per_second to keep system load low.
+
+        known_hash:
+            If the sha256 of the file is already known, you may provide it here
+            so it does not need to be recalculated. This is primarily intended
+            for digest_directory, since it looks for hash matches first before
+            creating new photos and thus can provide the known hash.
+
         Returns the Photo object.
         '''
         # These might raise exceptions
@@ -514,6 +525,11 @@ class PDBPhotoMixin:
 
         author_id = self.get_user_id_or_none(author)
 
+        if known_hash is None:
+            pass
+        elif not isinstance(known_hash, str) or len(known_hash) != 64:
+            raise TypeError('known_hash should be the 64-character sha256 hexdigest string.')
+
         # Ok.
         photo_id = self.generate_id(table='photos')
         self.log.info('New Photo: %s %s.', photo_id, filepath.absolute_path)
@@ -529,7 +545,8 @@ class PDBPhotoMixin:
             'author_id': author_id,
             'searchhidden': searchhidden,
             # These will be filled in during the metadata stage.
-            'dev_ino': None,
+            'mtime': None,
+            'sha256': known_hash,
             'bytes': None,
             'width': None,
             'height': None,
@@ -543,7 +560,8 @@ class PDBPhotoMixin:
         photo = self.get_cached_instance('photo', data)
 
         if do_metadata:
-            photo.reload_metadata()
+            hash_kwargs = hash_kwargs or {}
+            photo.reload_metadata(hash_kwargs=hash_kwargs)
 
         if do_thumbnail:
             photo.generate_thumbnail()
@@ -594,6 +612,7 @@ class PDBPhotoMixin:
             has_thumbnail=None,
             is_searchhidden=False,
             mimetype=None,
+            sha256=None,
             tag_musts=None,
             tag_mays=None,
             tag_forbids=None,
@@ -729,6 +748,7 @@ class PDBPhotoMixin:
         has_tags = searchhelpers.normalize_has_tags(has_tags)
         has_thumbnail = searchhelpers.normalize_has_thumbnail(has_thumbnail)
         is_searchhidden = searchhelpers.normalize_is_searchhidden(is_searchhidden)
+        sha256 = searchhelpers.normalize_sha256(sha256)
         mimetype = searchhelpers.normalize_extension(mimetype)
         within_directory = searchhelpers.normalize_within_directory(within_directory, warning_bag=warning_bag)
         yield_albums = searchhelpers.normalize_yield_albums(yield_albums)
@@ -908,6 +928,9 @@ class PDBPhotoMixin:
         elif is_searchhidden is False:
             wheres.append('searchhidden == 0')
 
+        if sha256:
+            wheres.append(f'sha256 IN {sqlhelpers.listify(sha256)}')
+
         for column in notnulls:
             wheres.append(column + ' IS NOT NULL')
         for column in yesnulls:
@@ -1497,9 +1520,12 @@ class PDBUtilMixin:
             *,
             exclude_directories=None,
             exclude_filenames=None,
+            glob_directories=None,
+            glob_filenames=None,
+            hash_kwargs=None,
             make_albums=True,
             natural_sort=True,
-            new_photo_kwargs={},
+            new_photo_kwargs=None,
             new_photo_ratelimit=None,
             recurse=True,
             yield_albums=True,
@@ -1521,6 +1547,10 @@ class PDBUtilMixin:
             This list works in addition to, not instead of, the
             digest_exclude_files config value.
 
+        hash_kwargs:
+            Additional kwargs passed into spinal.hash_file. Notably, you may
+            wish to set bytes_per_second to keep system load low.
+
         make_albums:
             If True, every directory that is digested will be turned into an
             Album, and the directory path will be added to the Album's
@@ -1582,9 +1612,14 @@ class PDBUtilMixin:
             return exclude_filenames
 
         def _normalize_new_photo_kwargs(new_photo_kwargs):
+            if new_photo_kwargs is None:
+                new_photo_kwargs = {}
+            else:
                 new_photo_kwargs = new_photo_kwargs.copy()
                 new_photo_kwargs.pop('commit', None)
                 new_photo_kwargs.pop('filepath', None)
+            new_photo_kwargs.setdefault('hash_kwargs', hash_kwargs)
             return new_photo_kwargs
 
         def _normalize_new_photo_ratelimit(new_photo_ratelimit):
@@ -1598,41 +1633,61 @@ class PDBUtilMixin:
                 raise TypeError(new_photo_ratelimit)
             return new_photo_ratelimit
 
-        def check_renamed_inode(filepath):
-            stat = filepath.stat
-            (dev, ino) = (stat.st_dev, stat.st_ino)
-            if dev == 0 or ino == 0:
-                return
-
-            try:
-                photo = self.get_photo_by_inode(dev, ino)
-            except exceptions.NoSuchPhoto:
-                return
-
-            if photo.real_path.is_file:
-                # Don't relocate the path if this is actually a hardlink, and
-                # both paths are current.
-                return
-
-            if photo.bytes != stat.st_size:
-                return
-
-            photo.relocate(filepath.absolute_path)
-            return photo
+        def check_renamed(filepath):
+            '''
+            We'll do our best to determine if this file is actually a rename of
+            a file that's already in the database.
+            '''
+            same_meta = self.get_photos_by_sql(
+                'SELECT * FROM photos WHERE mtime == ? AND bytes == ?',
+                [filepath.stat.st_mtime, filepath.stat.st_size]
+            )
+            same_meta = [photo for photo in same_meta if not photo.real_path.is_file]
+            if len(same_meta) == 1:
+                photo = same_meta[0]
+                self.log.debug('Found mtime+bytesize match %s.', photo)
+                return photo
+
+            self.log.loud('Hashing file %s to check for rename.', filepath)
+            sha256 = spinal.hash_file(
+                filepath,
+                hash_class=hashlib.sha256, **hash_kwargs,
+            ).hexdigest()
+
+            same_hash = self.get_photos_by_hash(sha256)
+            same_hash = [photo for photo in same_hash if not photo.real_path.is_file]
+            # fwiw, I'm not checking byte size since it's a hash match.
+            if len(same_hash) > 1:
+                same_hash = [photo for photo in same_hash if photo.mtime == filepath.stat.st_mtime]
+            if len(same_hash) == 1:
+                return same_hash[0]
+
+            # Although we did not find a match, we can still benefit from our
+            # hash work by passing this as the known_hash to new_photo.
+            return {'sha256': sha256}
 
-        def create_or_fetch_photo(filepath, new_photo_kwargs):
+        def create_or_fetch_photo(filepath):
             '''
             Given a filepath, find the corresponding Photo object if it exists,
             otherwise create it and then return it.
             '''
             try:
                 photo = self.get_photo_by_path(filepath)
+                return photo
             except exceptions.NoSuchPhoto:
-                photo = None
-            if not photo:
-                photo = check_renamed_inode(filepath)
-            if not photo:
-                photo = self.new_photo(filepath.absolute_path, **new_photo_kwargs)
+                pass
+
+            result = check_renamed(filepath)
+            if isinstance(result, objects.Photo):
+                result.relocate(filepath.absolute_path)
+                return result
+            elif isinstance(result, dict) and 'sha256' in result:
+                sha256 = result['sha256']
+            else:
+                sha256 = None
+
+            photo = self.new_photo(filepath, known_hash=sha256, **new_photo_kwargs)
 
             if new_photo_ratelimit is not None:
                 new_photo_ratelimit.limit()
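The new lookup order in `create_or_fetch_photo` can be condensed into a toy model (a sketch with a plain list of dicts standing in for the database; none of these names are Etiquette's real API):

```python
import hashlib

def create_or_fetch(db, path, content, missing_paths):
    # 1. Exact filepath match: the photo is already known.
    for record in db:
        if record['path'] == path:
            return record

    # 2. Hash match against a record whose old file is gone: treat it as a
    # rename and relocate the record instead of creating a duplicate.
    sha256 = hashlib.sha256(content).hexdigest()
    for record in db:
        if record['sha256'] == sha256 and record['path'] in missing_paths:
            record['path'] = path
            return record

    # 3. No match: create a new record, reusing the hash we just computed
    # (this mirrors passing known_hash to new_photo).
    record = {'path': path, 'sha256': sha256}
    db.append(record)
    return record
```

The same hash does double duty: it either proves a rename or seeds the new record, so the file is never hashed twice in one digest.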
@@ -1672,6 +1727,7 @@ class PDBUtilMixin:
         directory = _normalize_directory(directory)
         exclude_directories = _normalize_exclude_directories(exclude_directories)
         exclude_filenames = _normalize_exclude_filenames(exclude_filenames)
+        hash_kwargs = hash_kwargs or {}
         new_photo_kwargs = _normalize_new_photo_kwargs(new_photo_kwargs)
         new_photo_ratelimit = _normalize_new_photo_ratelimit(new_photo_ratelimit)
@@ -1682,6 +1738,8 @@ class PDBUtilMixin:
             directory,
             exclude_directories=exclude_directories,
             exclude_filenames=exclude_filenames,
+            glob_directories=glob_directories,
+            glob_filenames=glob_filenames,
             recurse=recurse,
             yield_style='nested',
         )
@@ -1690,7 +1748,15 @@ class PDBUtilMixin:
             if natural_sort:
                 files = sorted(files, key=lambda f: helpers.natural_sorter(f.basename))
 
-            photos = [create_or_fetch_photo(file, new_photo_kwargs=new_photo_kwargs) for file in files]
+            photos = [create_or_fetch_photo(file) for file in files]
+
+            # Note, this means that empty folders will not get an Album.
+            # At this time this behavior is intentional. Furthermore, due to
+            # the glob/exclude rules, we don't want albums being created if
+            # they don't contain any files of interest, even if they do contain
+            # other files.
+            if not photos:
+                continue
 
             if yield_photos:
                 yield from photos

View file

@@ -376,6 +376,31 @@ def normalize_positive_integer(number):
     return number
 
+def normalize_sha256(sha256, warning_bag=None):
+    if sha256 is None:
+        return None
+
+    if isinstance(sha256, (tuple, list, set)):
+        pass
+    elif isinstance(sha256, str):
+        sha256 = stringtools.comma_space_split(sha256)
+    else:
+        raise TypeError('sha256 should be the 64-character hexdigest string or a set of them.')
+
+    shas = set(sha256)
+    goodshas = set()
+    for sha in shas:
+        if isinstance(sha, str) and len(sha) == 64:
+            goodshas.add(sha)
+        else:
+            exc = TypeError('sha256 should be the 64-character hexdigest string.')
+            if warning_bag is not None:
+                warning_bag.add(exc)
+            else:
+                raise exc
+
+    return goodshas
+
 def normalize_tag_expression(expression):
     if not expression:
         return None
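A self-contained sketch of what the new `normalize_sha256` helper accepts and returns (simplified: `split_comma_space` is a stand-in for `stringtools.comma_space_split`, and the warning_bag path is omitted):

```python
import re

def split_comma_space(text):
    # Stand-in for stringtools.comma_space_split: split on commas and
    # whitespace, dropping empty pieces.
    return [part for part in re.split(r'[,\s]+', text) if part]

def normalize_sha256(sha256):
    # Accept one hexdigest string (possibly comma-separated) or any
    # collection of them; return the set of valid 64-character digests.
    if sha256 is None:
        return None
    if isinstance(sha256, str):
        sha256 = split_comma_space(sha256)
    return {sha for sha in sha256 if isinstance(sha, str) and len(sha) == 64}
```

This is what lets the CLI's `--sha256 A,B,C` flag pass straight through to the search wheres.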

View file

@@ -145,6 +145,7 @@ def search_by_argparse(args, yield_albums=False, yield_photos=False):
         has_tags=args.has_tags,
         has_thumbnail=args.has_thumbnail,
         is_searchhidden=args.is_searchhidden,
+        sha256=args.sha256,
         mimetype=args.mimetype,
         tag_musts=args.tag_musts,
         tag_mays=args.tag_mays,
@@ -202,18 +203,34 @@ def delete_argparse(args):
     photodb.commit()
 
 def digest_directory_argparse(args):
-    directory = pathclass.Path(args.directory)
+    directories = pipeable.input(args.directory, strip=True, skip_blank=True)
+    directories = [pathclass.Path(d) for d in directories]
+    for directory in directories:
+        directory.assert_is_directory()
+
     photodb = find_photodb()
+
+    need_commit = False
+
+    for directory in directories:
         digest = photodb.digest_directory(
             directory,
+            exclude_directories=args.exclude_directories,
+            exclude_filenames=args.exclude_filenames,
+            glob_directories=args.glob_directories,
+            glob_filenames=args.glob_filenames,
+            hash_kwargs={'bytes_per_second': args.hash_bytes_per_second},
             make_albums=args.make_albums,
-            recurse=args.recurse,
             new_photo_ratelimit=args.ratelimit,
+            recurse=args.recurse,
             yield_albums=True,
             yield_photos=True,
         )
         for result in digest:
-            print(result)
+            # print(result)
+            need_commit = True
+
+    if not need_commit:
+        return
 
     if args.autoyes or interactive.getpermission('Commit?'):
         photodb.commit()
@@ -309,6 +326,45 @@ def purge_empty_albums_argparse(args):
     if args.autoyes or interactive.getpermission('Commit?'):
         photodb.commit()
 
+def reload_metadata_argparse(args):
+    photodb = find_photodb()
+
+    if args.photo_id_args or args.photo_search_args:
+        photos = get_photos_from_args(args)
+    else:
+        photos = search_in_cwd(yield_photos=True, yield_albums=False)
+
+    hash_kwargs = {
+        'bytes_per_second': args.hash_bytes_per_second,
+        'callback_progress': spinal.callback_progress_v1,
+    }
+
+    need_commit = False
+
+    try:
+        for photo in photos:
+            if not photo.real_path.is_file:
+                continue
+
+            need_reload = (
+                args.force or
+                photo.mtime != photo.real_path.stat.st_mtime or
+                photo.bytes != photo.real_path.stat.st_size
+            )
+            if not need_reload:
+                continue
+            photo.reload_metadata(hash_kwargs=hash_kwargs)
+            need_commit = True
+            photodb.commit()
+    except KeyboardInterrupt:
+        pass
+
+    if not need_commit:
+        return
+
+    if args.autoyes or interactive.getpermission('Commit?'):
+        photodb.commit()
+
 def relocate_argparse(args):
     photodb = find_photodb()
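The mtime/bytesize short-circuit used by `reload_metadata` above can be isolated into a tiny predicate (a sketch; `PhotoRecord` is a hypothetical stand-in for a database row, not Etiquette's Photo class):

```python
import os
from dataclasses import dataclass

@dataclass
class PhotoRecord:
    # Hypothetical stand-in for the stored metadata of one photo.
    filepath: str
    mtime: float
    bytes: int

def needs_reload(photo, force=False):
    # Reload only when the file's mtime or byte size differs from the
    # stored values, or when the caller forces it (like the --force flag).
    stat = os.stat(photo.filepath)
    return force or photo.mtime != stat.st_mtime or photo.bytes != stat.st_size
```

Two cheap stat fields gate the expensive re-hash, which is why routine `reload-metadata` runs stay fast.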
@@ -420,6 +476,8 @@ Etiquette CLI
 {purge_empty_albums}
 
+{reload_metadata}
+
 {relocate}
 
 {search}
@@ -480,9 +538,27 @@ digest:
     > etiquette_cli.py digest directory <flags>
 
     flags:
+    --exclude_directories A B C:
+        Any directories matching any pattern of A, B, C... will be skipped.
+        These patterns may be absolute paths like 'D:\temp', plain names like
+        'thumbnails' or glob patterns like 'build_*'.
+
+    --exclude_filenames A B C:
+        Any filenames matching any pattern of A, B, C... will be skipped.
+        These patterns may be absolute paths like 'D:\somewhere\config.json',
+        plain names like 'thumbs.db' or glob patterns like '*.temp'.
+
+    --glob_directories A B C:
+        Only directories matching any pattern of A, B, C... will be digested.
+        These patterns may be plain names or glob patterns like '2021*'.
+
+    --glob_filenames A B C:
+        Only filenames matching any pattern of A, B, C... will be digested.
+        These patterns may be plain names or glob patterns like '*.jpg'.
+
     --no_albums:
-        Do not create any albums the directories. By default, albums are created
-        and nested to match the directory structure.
+        Do not create any albums. By default, albums are created and nested to
+        match the directory structure.
 
     --ratelimit X:
         Limit the ingest of new Photos to only one per X seconds. This can be
@@ -496,6 +572,7 @@ digest:
     Examples:
     > etiquette_cli.py digest media --ratelimit 1
     > etiquette_cli.py digest photos --no-recurse --no-albums --ratelimit 0.25
+    > etiquette_cli.py digest . --glob-filenames *.jpg --exclude-filenames thumb*
 '''.strip(),
 
 easybake='''
@@ -546,6 +623,8 @@ purge_deleted_files:
     Delete any Photo objects whose file no longer exists on disk.
 
     > etiquette_cli.py purge_deleted_files
+    > etiquette_cli.py purge_deleted_files id id id
+    > etiquette_cli.py purge_deleted_files searchargs
 '''.strip(),
 
 purge_empty_albums='''
@@ -555,7 +634,34 @@ purge_empty_albums:
     Consider running purge_deleted_files first, so that albums containing
     deleted files will get cleared out and then caught by this function.
 
+    With no args, all albums will be checked.
+    Or you can pass specific album ids. (searchargs is not available since
+    albums only appear in search results when a matching photo is found, and
+    we're looking for albums with no photos!)
+
     > etiquette_cli.py purge_empty_albums
+    > etiquette_cli.py purge_empty_albums id id id
+'''.strip(),
+
+reload_metadata='''
+reload_metadata:
+    Reload photos' metadata by reading the files from disk.
+
+    With no args, all files under the cwd will be reloaded.
+    Or, you can pass specific photo ids or searchargs.
+
+    > etiquette_cli.py reload_metadata
+    > etiquette_cli.py reload_metadata id id id
+    > etiquette_cli.py reload_metadata searchargs
+
+    flags:
+    --force:
+        By default, we will skip any files that have the same mtime and byte
+        size as before. You can pass --force to always reload.
+
+    --hash_bytes_per_second:
+        A string like "10mb" to limit the speed of file hashing for the purpose
+        of reducing system load.
 '''.strip(),
 
 relocate='''
@@ -619,6 +725,9 @@ search:
     --mimetype A,B,C:
         Photo with any mimetype of A, B, C...
 
+    --sha256 A,B,C:
+        Photo with any sha256 of A, B, C...
+
     --tag_musts A,B,C:
         Photo must have all tags A and B and C...
@@ -728,9 +837,14 @@ def main(argv):
     p_digest = subparsers.add_parser('digest', aliases=['digest_directory', 'digest-directory'])
     p_digest.add_argument('directory')
+    p_digest.add_argument('--exclude_directories', '--exclude-directories', nargs='+', default=None)
+    p_digest.add_argument('--exclude_filenames', '--exclude-filenames', nargs='+', default=None)
+    p_digest.add_argument('--glob_directories', '--glob-directories', nargs='+', default=None)
+    p_digest.add_argument('--glob_filenames', '--glob-filenames', nargs='+', default=None)
     p_digest.add_argument('--no_albums', '--no-albums', dest='make_albums', action='store_false', default=True)
     p_digest.add_argument('--ratelimit', dest='ratelimit', type=float, default=0.2)
     p_digest.add_argument('--no_recurse', '--no-recurse', dest='recurse', action='store_false', default=True)
+    p_digest.add_argument('--hash_bytes_per_second', '--hash-bytes-per-second', default=None)
     p_digest.add_argument('--yes', dest='autoyes', action='store_true')
     p_digest.set_defaults(func=digest_directory_argparse)
@@ -756,6 +870,12 @@ def main(argv):
     p_purge_empty_albums.add_argument('--yes', dest='autoyes', action='store_true')
     p_purge_empty_albums.set_defaults(func=purge_empty_albums_argparse)
 
+    p_reload_metadata = subparsers.add_parser('reload_metadata', aliases=['reload-metadata'])
+    p_reload_metadata.add_argument('--hash_bytes_per_second', '--hash-bytes-per-second', default=None)
+    p_reload_metadata.add_argument('--force', action='store_true')
+    p_reload_metadata.add_argument('--yes', dest='autoyes', action='store_true')
+    p_reload_metadata.set_defaults(func=reload_metadata_argparse)
+
     p_relocate = subparsers.add_parser('relocate')
     p_relocate.add_argument('photo_id')
     p_relocate.add_argument('filepath')
@@ -777,6 +897,7 @@ def main(argv):
     p_search.add_argument('--has_tags', '--has-tags', dest='has_tags', default=None)
     p_search.add_argument('--has_thumbnail', '--has-thumbnail', dest='has_thumbnail', default=None)
     p_search.add_argument('--is_searchhidden', '--is-searchhidden', dest='is_searchhidden', default=False)
+    p_search.add_argument('--sha256', default=None)
     p_search.add_argument('--mimetype', dest='mimetype', default=None)
     p_search.add_argument('--tag_musts', '--tag-musts', dest='tag_musts', default=None)
     p_search.add_argument('--tag_mays', '--tag-mays', dest='tag_mays', default=None)

View file

@@ -64,7 +64,7 @@ def etiquette_flask_launch(
         pipeable.stderr('Try `etiquette_cli.py init` to create the database.')
         return 1
 
-    message = f'Starting server on port {port}, pid={os.getpid()}'
+    message = f'Starting server on port {port}, pid={os.getpid()}.'
     if use_https:
         message += ' (https)'
     print(message)

View file

@@ -602,6 +602,57 @@ def upgrade_17_to_18(photodb):
     m.go()
 
+def upgrade_18_to_19(photodb):
+    m = Migrator(photodb)
+
+    m.tables['photos']['create'] = '''
+    CREATE TABLE photos(
+        id TEXT PRIMARY KEY NOT NULL,
+        filepath TEXT COLLATE NOCASE,
+        basename TEXT COLLATE NOCASE,
+        override_filename TEXT COLLATE NOCASE,
+        extension TEXT COLLATE NOCASE,
+        mtime INT,
+        sha256 TEXT,
+        width INT,
+        height INT,
+        ratio REAL,
+        area INT,
+        duration INT,
+        bytes INT,
+        created INT,
+        thumbnail TEXT,
+        tagged_at INT,
+        author_id TEXT,
+        searchhidden INT,
+        FOREIGN KEY(author_id) REFERENCES users(id)
+    );
+    '''
+
+    m.tables['photos']['transfer'] = '''
+    INSERT INTO photos SELECT
+        id,
+        filepath,
+        basename,
+        override_filename,
+        extension,
+        NULL,
+        NULL,
+        width,
+        height,
+        ratio,
+        area,
+        duration,
+        bytes,
+        created,
+        thumbnail,
+        tagged_at,
+        author_id,
+        searchhidden
+    FROM photos_old;
+    '''
+
+    m.go()
+
 def upgrade_all(data_directory):
     '''
     Given the directory containing a phototagger database, apply all of the