Replace dev_ino with mtime, sha256.

This change was prompted by my discovery that under DrivePool, two files can have the same (dev, ino) pair. It's understandable, but I don't want to rely on inodes any more. Hashing has the downside of speed, but considering the time already invested in tagging the files in the first place, I think it will be worthwhile.
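To make the tradeoff concrete, here is a minimal sketch — not code from this commit, and the helper names are invented for illustration:

```python
import hashlib
import os

def dev_ino_key(path):
    # The old identity: device + inode from stat(). Cheap to compute, but
    # under DrivePool two distinct files can report the same pair.
    stat = os.stat(path)
    return f'{stat.st_dev},{stat.st_ino}'

def sha256_key(path, chunk_size=2**20):
    # The new identity: a content hash. Requires reading the whole file,
    # but is independent of filesystem quirks and survives renames/moves.
    sha256 = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            sha256.update(chunk)
    return sha256.hexdigest()
```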

parent f8efc9d569
commit 4bf5b6d824

8 changed files with 358 additions and 65 deletions

README.md (24 changes)
@@ -1,7 +1,7 @@
 Etiquette
 =========
 
-I am currently running a demonstration copy of Etiquette at http://etiquette.voussoir.net where you can browse around. This is not yet permanent.
+I am currently running a read-only demonstration copy of Etiquette at http://etiquette.voussoir.net where you can browse around.
 
 ### What am I looking at
 
@@ -126,6 +126,28 @@ In order to prevent the accidental creation of Etiquette databases, you must use
 </details>
 
+### Basic usage
+
+Let's say you store your photos in `D:\Documents\Photos`, and you want to tag the files with Etiquette. You can get started with these steps:
+
+1. Open a Command Prompt / Terminal. Decide where your Etiquette database will be stored, and `cd` to that location. `cd D:\Documents\Photos` is probably fine.
+2. Run `etiquette_cli.py init` to create the database. A folder called `_etiquette` will appear.
+3. Run `etiquette_cli.py digest . --ratelimit 1 --glob-filenames *.jpg` to add the files into the database. You can use `etiquette_cli.py digest --help` to learn about this command.
+4. Run `etiquette_flask_dev.py 5000` to start the webserver on port 5000.
+5. Open your web browser to `localhost:5000` and begin browsing.
+
+### Why does Etiquette hash files?
+
+When adding new files to the database or reloading their metadata, Etiquette will create SHA256 hashes of the files. If you are using Etiquette to organize large media files, this may take a while. I was hesitant to add hashing and incur this slowdown, but the hashes greatly improve Etiquette's ability to detect when a file has been renamed or moved, which is important when you have invested your valuable time into adding tags to them. I hope that the hash time is perceived as a worthwhile tradeoff.
+
+### Maintaining your database with Etiquette CLI
+
+I highly recommend storing batch/bash scripts of your favorite `etiquette_cli` invocations, so that you can quickly sync the database with the state of the disk in the future. Here are some suggestions for what you might like to include in such a script:
+
+- `digest`: Storing all your digest invocations in a single file makes ingesting new files very easy. For your digests, I recommend including `--ratelimit` to stop Photos from having the exact same created timestamp, and `--hash-bytes-per-second` to reduce IO load. In addition, you don't want to forget your favorite `--glob-filenames` patterns.
+- `reload-metadata`: In order for Etiquette's hash-based rename detection to work properly, the file hashes need to be up to date. If you're using Etiquette to track files which are being modified, you may want to get in the habit of reloading metadata regularly. By default, this will only reload metadata for files whose mtime and/or byte size have changed, so it should not be very expensive. You may add `--hash-bytes-per-second` to reduce IO load.
+- `purge-deleted-files` & `purge-empty-albums`: You should only do this after a `digest`, because if a file has been moved / renamed you want the digest to pick up on that before purging it as a dead filepath. The Photo purge should come first, so that an album containing entirely deleted photos will be empty when it comes time for the Album purge.
+
 ### Project stability
 
 You may notice that Etiquette doesn't have a version number anywhere. That's because I don't think it's ready for one. I am using this project to learn and practice, and breaking changes are very common.
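
As a concrete illustration of the maintenance advice above, a sync script might look like this — a sketch only; the path, rate strings, and patterns are examples, and only flags documented in this commit are used:

```
etiquette_cli.py digest D:\Documents\Photos --ratelimit 1 --glob-filenames *.jpg --hash-bytes-per-second 10mb
etiquette_cli.py reload-metadata --hash-bytes-per-second 10mb
etiquette_cli.py purge-deleted-files
etiquette_cli.py purge-empty-albums
```

The digest runs first so that renames are recognized before any purging, and the Photo purge precedes the Album purge, matching the ordering advice above.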

@@ -41,7 +41,7 @@ ffmpeg = _load_ffmpeg()
 
 # Database #########################################################################################
 
-DATABASE_VERSION = 18
+DATABASE_VERSION = 19
 DB_VERSION_PRAGMA = f'''
 PRAGMA user_version = {DATABASE_VERSION};
 '''
@@ -84,10 +84,11 @@ CREATE INDEX IF NOT EXISTS index_bookmarks_author_id on bookmarks(author_id);
 CREATE TABLE IF NOT EXISTS photos(
     id TEXT PRIMARY KEY NOT NULL,
     filepath TEXT COLLATE NOCASE,
-    dev_ino TEXT,
     basename TEXT COLLATE NOCASE,
     override_filename TEXT COLLATE NOCASE,
     extension TEXT COLLATE NOCASE,
+    mtime INT,
+    sha256 TEXT,
     width INT,
     height INT,
     ratio REAL,
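
One nice consequence of the new sha256 column — an editor's sketch, not part of this commit: exact-duplicate files become a one-query question. The database path here is an assumption for illustration.

```python
import sqlite3

# Sketch: find groups of Photos whose files have identical contents.
# '_etiquette/phototagger.db' is an assumed path, for illustration only.
db = sqlite3.connect('_etiquette/phototagger.db')
query = '''
SELECT sha256, COUNT(*) AS copies
FROM photos
WHERE sha256 IS NOT NULL
GROUP BY sha256
HAVING copies > 1;
'''
for (sha256, copies) in db.execute(query):
    print(copies, sha256)
```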

@@ -4,6 +4,7 @@ but are returned by the PDB accesses.
 '''
 import abc
 import bcrypt
+import hashlib
 import os
 import PIL.Image
 import re
@@ -786,6 +787,8 @@ class Photo(ObjectBase):
         self.author_id = self.normalize_author_id(db_row['author_id'])
         self.override_filename = db_row['override_filename']
         self.extension = self.real_path.extension.no_dot
+        self.mtime = db_row['mtime']
+        self.sha256 = db_row['sha256']
 
         if self.extension == '':
             self.dot_extension = ''
@@ -1144,14 +1147,15 @@ class Photo(ObjectBase):
 
     @decorators.required_feature('photo.reload_metadata')
     @decorators.transaction
-    def reload_metadata(self):
+    def reload_metadata(self, hash_kwargs=None):
         '''
         Load the file's height, width, etc as appropriate for this type of file.
         '''
         self.photodb.log.info('Reloading metadata for %s.', self)
 
+        self.mtime = None
+        self.sha256 = None
         self.bytes = None
-        self.dev_ino = None
         self.width = None
         self.height = None
         self.area = None
@@ -1160,10 +1164,8 @@ class Photo(ObjectBase):
 
         if self.real_path.is_file:
             stat = self.real_path.stat
+            self.mtime = stat.st_mtime
             self.bytes = stat.st_size
-            (dev, ino) = (stat.st_dev, stat.st_ino)
-            if dev and ino:
-                self.dev_ino = f'{dev},{ino}'
 
         if self.bytes is None:
             pass
@@ -1181,15 +1183,20 @@ class Photo(ObjectBase):
             self.area = self.width * self.height
             self.ratio = round(self.width / self.height, 2)
 
+        hash_kwargs = hash_kwargs or {}
+        sha256 = spinal.hash_file(self.real_path, hash_class=hashlib.sha256, **hash_kwargs)
+        self.sha256 = sha256.hexdigest()
+
         data = {
             'id': self.id,
+            'mtime': self.mtime,
+            'sha256': self.sha256,
             'width': self.width,
             'height': self.height,
             'area': self.area,
             'ratio': self.ratio,
             'duration': self.duration,
             'bytes': self.bytes,
-            'dev_ino': self.dev_ino,
         }
         self.photodb.sql_update(table='photos', pairs=data, where_key='id')
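
The hash_kwargs parameter added to reload_metadata above is forwarded verbatim into spinal.hash_file, so callers can throttle the hashing step. A usage sketch, given an existing Photo instance; the '10mb' rate string follows the `--hash-bytes-per-second` convention from the CLI changes below:

```python
# Rehash this photo's file while limiting disk reads, so a background
# maintenance job doesn't saturate IO.
photo.reload_metadata(hash_kwargs={'bytes_per_second': '10mb'})
```

The same dictionary travels from digest_directory through new_photo down to reload_metadata, which is why all three grew the parameter in this commit.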

@@ -1,4 +1,5 @@
 import bcrypt
+import hashlib
 import json
 import os
 import random
@@ -439,16 +440,6 @@ class PDBPhotoMixin:
     def get_photo(self, id):
         return self.get_thing_by_id('photo', id)
 
-    def get_photo_by_inode(self, dev, ino):
-        dev_ino = f'{dev},{ino}'
-        query = 'SELECT * FROM photos WHERE dev_ino == ?'
-        bindings = [dev_ino]
-        photo_row = self.sql_select_one(query, bindings)
-        if photo_row is None:
-            raise exceptions.NoSuchPhoto(dev_ino)
-        photo = self.get_cached_instance('photo', photo_row)
-        return photo
-
     def get_photo_by_path(self, filepath):
         filepath = pathclass.Path(filepath)
         query = 'SELECT * FROM photos WHERE filepath == ?'
@@ -484,6 +475,14 @@ class PDBPhotoMixin:
             if count <= 0:
                 break
 
+    def get_photos_by_hash(self, sha256):
+        if not isinstance(sha256, str) or len(sha256) != 64:
+            raise TypeError(f'sha256 should be the 64-character hexdigest string.')
+
+        query = 'SELECT * FROM photos WHERE sha256 == ?'
+        bindings = [sha256]
+        yield from self.get_photos_by_sql(query, bindings)
+
     def get_photos_by_sql(self, query, bindings=None):
         return self.get_things_by_sql('photo', query, bindings)
 
@@ -496,6 +495,8 @@ class PDBPhotoMixin:
         author=None,
         do_metadata=True,
         do_thumbnail=True,
+        hash_kwargs=None,
+        known_hash=None,
         searchhidden=False,
         tags=None,
         ):
@@ -503,6 +504,16 @@ class PDBPhotoMixin:
         Given a filepath, determine its attributes and create a new Photo object
         in the database. Tags may be applied now or later.
 
+        hash_kwargs:
+            Additional kwargs passed into spinal.hash_file. Notably, you may
+            wish to set bytes_per_second to keep system load low.
+
+        known_hash:
+            If the sha256 of the file is already known, you may provide it here
+            so it does not need to be recalculated. This is primarily intended
+            for digest_directory since it will look for hash matches first
+            before creating new photos and thus can provide the known hash.
+
         Returns the Photo object.
         '''
         # These might raise exceptions
@@ -514,6 +525,11 @@ class PDBPhotoMixin:
 
         author_id = self.get_user_id_or_none(author)
 
+        if known_hash is None:
+            pass
+        elif not isinstance(known_hash, str) or len(known_hash) != 64:
+            raise TypeError(f'known_hash should be the 64-character sha256 hexdigest string.')
+
         # Ok.
         photo_id = self.generate_id(table='photos')
         self.log.info('New Photo: %s %s.', photo_id, filepath.absolute_path)
@@ -529,7 +545,8 @@ class PDBPhotoMixin:
             'author_id': author_id,
             'searchhidden': searchhidden,
             # These will be filled in during the metadata stage.
-            'dev_ino': None,
+            'mtime': None,
+            'sha256': known_hash,
             'bytes': None,
             'width': None,
             'height': None,
@@ -543,7 +560,8 @@ class PDBPhotoMixin:
         photo = self.get_cached_instance('photo', data)
 
         if do_metadata:
-            photo.reload_metadata()
+            hash_kwargs = hash_kwargs or {}
+            photo.reload_metadata(hash_kwargs=hash_kwargs)
         if do_thumbnail:
             photo.generate_thumbnail()
 
@@ -594,6 +612,7 @@ class PDBPhotoMixin:
         has_thumbnail=None,
         is_searchhidden=False,
         mimetype=None,
+        sha256=None,
         tag_musts=None,
         tag_mays=None,
         tag_forbids=None,
@@ -729,6 +748,7 @@ class PDBPhotoMixin:
         has_tags = searchhelpers.normalize_has_tags(has_tags)
         has_thumbnail = searchhelpers.normalize_has_thumbnail(has_thumbnail)
        is_searchhidden = searchhelpers.normalize_is_searchhidden(is_searchhidden)
+        sha256 = searchhelpers.normalize_sha256(sha256)
         mimetype = searchhelpers.normalize_extension(mimetype)
         within_directory = searchhelpers.normalize_within_directory(within_directory, warning_bag=warning_bag)
         yield_albums = searchhelpers.normalize_yield_albums(yield_albums)
@@ -908,6 +928,9 @@ class PDBPhotoMixin:
         elif is_searchhidden is False:
             wheres.append('searchhidden == 0')
 
+        if sha256:
+            wheres.append(f'sha256 IN {sqlhelpers.listify(sha256)}')
+
         for column in notnulls:
             wheres.append(column + ' IS NOT NULL')
         for column in yesnulls:
@@ -1497,9 +1520,12 @@ class PDBUtilMixin:
         *,
         exclude_directories=None,
         exclude_filenames=None,
+        glob_directories=None,
+        glob_filenames=None,
+        hash_kwargs=None,
         make_albums=True,
         natural_sort=True,
-        new_photo_kwargs={},
+        new_photo_kwargs=None,
         new_photo_ratelimit=None,
         recurse=True,
         yield_albums=True,
@@ -1521,6 +1547,10 @@ class PDBUtilMixin:
             This list works in addition to, not instead of, the
             digest_exclude_files config value.
 
+        hash_kwargs:
+            Additional kwargs passed into spinal.hash_file. Notably, you may
+            wish to set bytes_per_second to keep system load low.
+
         make_albums:
             If True, every directory that is digested will be turned into an
             Album, and the directory path will be added to the Album's
@@ -1582,9 +1612,14 @@ class PDBUtilMixin:
             return exclude_filenames
 
         def _normalize_new_photo_kwargs(new_photo_kwargs):
-            new_photo_kwargs = new_photo_kwargs.copy()
-            new_photo_kwargs.pop('commit', None)
-            new_photo_kwargs.pop('filepath', None)
+            if new_photo_kwargs is None:
+                new_photo_kwargs = {}
+            else:
+                new_photo_kwargs = new_photo_kwargs.copy()
+                new_photo_kwargs.pop('commit', None)
+                new_photo_kwargs.pop('filepath', None)
+
+            new_photo_kwargs.setdefault('hash_kwargs', hash_kwargs)
             return new_photo_kwargs
 
         def _normalize_new_photo_ratelimit(new_photo_ratelimit):
@@ -1598,43 +1633,63 @@ class PDBUtilMixin:
                 raise TypeError(new_photo_ratelimit)
             return new_photo_ratelimit
 
-        def check_renamed_inode(filepath):
-            stat = filepath.stat
-            (dev, ino) = (stat.st_dev, stat.st_ino)
-            if dev == 0 or ino == 0:
-                return
-            try:
-                photo = self.get_photo_by_inode(dev, ino)
-            except exceptions.NoSuchPhoto:
-                return
-
-            if photo.real_path.is_file:
-                # Don't relocate the path if this is actually a hardlink, and
-                # both paths are current.
-                return
-
-            if photo.bytes != stat.st_size:
-                return
-
-            photo.relocate(filepath.absolute_path)
-            return photo
+        def check_renamed(filepath):
+            '''
+            We'll do our best to determine if this file is actually a rename of
+            a file that's already in the database.
+            '''
+            same_meta = self.get_photos_by_sql(
+                'SELECT * FROM photos WHERE mtime == ? AND bytes == ?',
+                [filepath.stat.st_mtime, filepath.stat.st_size]
+            )
+            same_meta = [photo for photo in same_meta if not photo.real_path.is_file]
+            if len(same_meta) == 1:
+                photo = same_meta[0]
+                self.log.debug('Found mtime+bytesize match %s.', photo)
+                return photo
+
+            self.log.loud('Hashing file %s to check for rename.', filepath)
+            sha256 = spinal.hash_file(
+                filepath,
+                hash_class=hashlib.sha256, **hash_kwargs,
+            ).hexdigest()
+
+            same_hash = self.get_photos_by_hash(sha256)
+            same_hash = [photo for photo in same_hash if not photo.real_path.is_file]
+
+            # fwiw, I'm not checking byte size since it's a hash match.
+            if len(same_hash) > 1:
+                same_hash = [photo for photo in same_hash if photo.mtime == filepath.stat.st_mtime]
+            if len(same_hash) == 1:
+                return same_hash[0]
+
+            # Although we did not find a match, we can still benefit from our
+            # hash work by passing this as the known_hash to new_photo.
+            return {'sha256': sha256}
 
-        def create_or_fetch_photo(filepath, new_photo_kwargs):
+        def create_or_fetch_photo(filepath):
             '''
             Given a filepath, find the corresponding Photo object if it exists,
             otherwise create it and then return it.
             '''
             try:
                 photo = self.get_photo_by_path(filepath)
+                return photo
             except exceptions.NoSuchPhoto:
-                photo = None
-            if not photo:
-                photo = check_renamed_inode(filepath)
-            if not photo:
-                photo = self.new_photo(filepath.absolute_path, **new_photo_kwargs)
-                if new_photo_ratelimit is not None:
-                    new_photo_ratelimit.limit()
+                pass
+
+            result = check_renamed(filepath)
+            if isinstance(result, objects.Photo):
+                result.relocate(filepath.absolute_path)
+                return result
+            elif isinstance(result, dict) and 'sha256' in result:
+                sha256 = result['sha256']
+            else:
+                sha256 = None
+
+            photo = self.new_photo(filepath, known_hash=sha256, **new_photo_kwargs)
+            if new_photo_ratelimit is not None:
+                new_photo_ratelimit.limit()
 
             return photo
@@ -1672,6 +1727,7 @@ class PDBUtilMixin:
         directory = _normalize_directory(directory)
         exclude_directories = _normalize_exclude_directories(exclude_directories)
         exclude_filenames = _normalize_exclude_filenames(exclude_filenames)
+        hash_kwargs = hash_kwargs or {}
         new_photo_kwargs = _normalize_new_photo_kwargs(new_photo_kwargs)
         new_photo_ratelimit = _normalize_new_photo_ratelimit(new_photo_ratelimit)
@@ -1682,6 +1738,8 @@ class PDBUtilMixin:
             directory,
             exclude_directories=exclude_directories,
             exclude_filenames=exclude_filenames,
+            glob_directories=glob_directories,
+            glob_filenames=glob_filenames,
             recurse=recurse,
             yield_style='nested',
         )
@@ -1690,7 +1748,15 @@ class PDBUtilMixin:
             if natural_sort:
                 files = sorted(files, key=lambda f: helpers.natural_sorter(f.basename))
 
-            photos = [create_or_fetch_photo(file, new_photo_kwargs=new_photo_kwargs) for file in files]
+            photos = [create_or_fetch_photo(file) for file in files]
+
+            # Note, this means that empty folders will not get an Album.
+            # At this time this behavior is intentional. Furthermore, due to
+            # the glob/exclude rules, we don't want albums being created if
+            # they don't contain any files of interest, even if they do contain
+            # other files.
+            if not photos:
+                continue
 
             if yield_photos:
                 yield from photos
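
Putting the new digest parameters together, a caller holding a PhotoDB instance might do the following — a sketch only; construction of `photodb` is omitted, and the path, pattern, and rate strings are examples. Note how check_renamed means an unchanged-but-moved file gets relocated rather than re-ingested: first by an mtime+bytesize match, then by hash.

```python
# Sketch: ingest a directory with the parameters added in this commit.
digest = photodb.digest_directory(
    'D:\\Documents\\Photos',
    glob_filenames=['*.jpg'],                    # only ingest jpgs
    hash_kwargs={'bytes_per_second': '10mb'},    # throttle the hashing IO
    new_photo_ratelimit=1.0,                     # one new Photo per second
    yield_albums=True,
    yield_photos=True,
)
for result in digest:
    print(result)
photodb.commit()
```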

@@ -376,6 +376,31 @@ def normalize_positive_integer(number):
 
     return number
 
+def normalize_sha256(sha256, warning_bag=None):
+    if sha256 is None:
+        return None
+
+    if isinstance(sha256, (tuple, list, set)):
+        pass
+    elif isinstance(sha256, str):
+        sha256 = stringtools.comma_space_split(sha256)
+    else:
+        raise TypeError('sha256 should be the 64 character hexdigest string or a set of them.')
+
+    shas = set(sha256)
+    goodshas = set()
+    for sha in shas:
+        if isinstance(sha, str) and len(sha) == 64:
+            goodshas.add(sha)
+        else:
+            exc = TypeError(f'sha256 should be the 64-character hexdigest string.')
+            if warning_bag is not None:
+                warning_bag.add(exc)
+            else:
+                raise exc
+
+    return goodshas
+
 def normalize_tag_expression(expression):
     if not expression:
         return None
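
For reference, the normalizer above accepts a single comma/space separated string or a collection, and returns the set of well-formed 64-character digests. A sketch of the expected behavior, assuming stringtools.comma_space_split splits on commas and spaces:

```python
sha_a = 'a' * 64
sha_b = 'b' * 64

normalize_sha256(None)                # -> None: the search ignores sha256.
normalize_sha256(f'{sha_a},{sha_b}')  # -> {sha_a, sha_b}
normalize_sha256([sha_a])             # -> {sha_a}
normalize_sha256('tooshort')          # raises TypeError (no warning_bag given)
```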

@@ -145,6 +145,7 @@ def search_by_argparse(args, yield_albums=False, yield_photos=False):
         has_tags=args.has_tags,
         has_thumbnail=args.has_thumbnail,
         is_searchhidden=args.is_searchhidden,
+        sha256=args.sha256,
         mimetype=args.mimetype,
         tag_musts=args.tag_musts,
         tag_mays=args.tag_mays,
@@ -202,18 +203,34 @@ def delete_argparse(args):
     photodb.commit()
 
 def digest_directory_argparse(args):
-    directory = pathclass.Path(args.directory)
+    directories = pipeable.input(args.directory, strip=True, skip_blank=True)
+    directories = [pathclass.Path(d) for d in directories]
+    for directory in directories:
+        directory.assert_is_directory()
+
     photodb = find_photodb()
-    digest = photodb.digest_directory(
-        directory,
-        make_albums=args.make_albums,
-        recurse=args.recurse,
-        new_photo_ratelimit=args.ratelimit,
-        yield_albums=True,
-        yield_photos=True,
-    )
-    for result in digest:
-        print(result)
+    need_commit = False
+
+    for directory in directories:
+        digest = photodb.digest_directory(
+            directory,
+            exclude_directories=args.exclude_directories,
+            exclude_filenames=args.exclude_filenames,
+            glob_directories=args.glob_directories,
+            glob_filenames=args.glob_filenames,
+            hash_kwargs={'bytes_per_second': args.hash_bytes_per_second},
+            make_albums=args.make_albums,
+            new_photo_ratelimit=args.ratelimit,
+            recurse=args.recurse,
+            yield_albums=True,
+            yield_photos=True,
+        )
+        for result in digest:
+            # print(result)
+            need_commit = True
+
+    if not need_commit:
+        return
 
     if args.autoyes or interactive.getpermission('Commit?'):
         photodb.commit()
@@ -309,6 +326,45 @@ def purge_empty_albums_argparse(args):
     if args.autoyes or interactive.getpermission('Commit?'):
         photodb.commit()
 
+def reload_metadata_argparse(args):
+    photodb = find_photodb()
+
+    if args.photo_id_args or args.photo_search_args:
+        photos = get_photos_from_args(args)
+    else:
+        photos = search_in_cwd(yield_photos=True, yield_albums=False)
+
+    hash_kwargs = {
+        'bytes_per_second': args.hash_bytes_per_second,
+        'callback_progress': spinal.callback_progress_v1,
+    }
+
+    need_commit = False
+    try:
+        for photo in photos:
+            if not photo.real_path.is_file:
+                continue
+
+            need_reload = (
+                args.force or
+                photo.mtime != photo.real_path.stat.st_mtime or
+                photo.bytes != photo.real_path.stat.st_size
+            )
+
+            if not need_reload:
+                continue
+            photo.reload_metadata(hash_kwargs=hash_kwargs)
+            need_commit = True
+            photodb.commit()
+    except KeyboardInterrupt:
+        pass
+
+    if not need_commit:
+        return
+
+    if args.autoyes or interactive.getpermission('Commit?'):
+        photodb.commit()
+
 def relocate_argparse(args):
     photodb = find_photodb()
 
@@ -420,6 +476,8 @@ Etiquette CLI
 
 {purge_empty_albums}
 
+{reload_metadata}
+
 {relocate}
 
 {search}
@@ -480,9 +538,27 @@ digest:
     > etiquette_cli.py digest directory <flags>
 
     flags:
+    --exclude_directories A B C:
+        Any directories matching any pattern of A, B, C... will be skipped.
+        These patterns may be absolute paths like 'D:\temp', plain names like
+        'thumbnails' or glob patterns like 'build_*'.
+
+    --exclude_filenames A B C:
+        Any filenames matching any pattern of A, B, C... will be skipped.
+        These patterns may be absolute paths like 'D:\somewhere\config.json',
+        plain names like 'thumbs.db' or glob patterns like '*.temp'.
+
+    --glob_directories A B C:
+        Only directories matching any pattern of A, B, C... will be digested.
+        These patterns may be plain names or glob patterns like '2021*'
+
+    --glob_filenames A B C:
+        Only filenames matching any pattern of A, B, C... will be digested.
+        These patterns may be plain names or glob patterns like '*.jpg'
+
     --no_albums:
-        Do not create any albums the directories. By default, albums are created
-        and nested to match the directory structure.
+        Do not create any albums. By default, albums are created and nested to
+        match the directory structure.
 
     --ratelimit X:
         Limit the ingest of new Photos to only one per X seconds. This can be
@@ -496,6 +572,7 @@ digest:
     Examples:
     > etiquette_cli.py digest media --ratelimit 1
    > etiquette_cli.py digest photos --no-recurse --no-albums --ratelimit 0.25
+    > etiquette_cli.py digest . --glob-filenames *.jpg --exclude-filenames thumb*
     '''.strip(),
 
 easybake='''
@@ -546,6 +623,8 @@ purge_deleted_files:
     Delete any Photo objects whose file no longer exists on disk.
 
     > etiquette_cli.py purge_deleted_files
+    > etiquette_cli.py purge_deleted_files id id id
+    > etiquette_cli.py purge_deleted_files searchargs
     '''.strip(),
 
 purge_empty_albums='''
@@ -555,7 +634,34 @@ purge_empty_albums:
     Consider running purge_deleted_files first, so that albums containing
     deleted files will get cleared out and then caught by this function.
 
+    With no args, all albums will be checked.
+    Or you can pass specific album ids. (searchargs is not available since
+    albums only appear in search results when a matching photo is found, and
+    we're looking for albums with no photos!)
+
     > etiquette_cli.py purge_empty_albums
+    > etiquette_cli.py purge_empty_albums id id id
+    '''.strip(),
+
+reload_metadata='''
+reload_metadata:
+    Reload photos' metadata by reading the files from disk.
+
+    With no args, all files under the cwd will be reloaded.
+    Or, you can pass specific photo ids or searchargs.
+
+    > etiquette_cli.py reload_metadata
+    > etiquette_cli.py reload_metadata id id id
+    > etiquette_cli.py reload_metadata searchargs
+
+    flags:
+    --force:
+        By default, we will skip any files that have the same mtime and byte
+        size as before. You can pass --force to always reload.
+
+    --hash_bytes_per_second:
+        A string like "10mb" to limit the speed of file hashing for the purpose
+        of reducing system load.
     '''.strip(),
 
 relocate='''
@@ -619,6 +725,9 @@ search:
     --mimetype A,B,C:
         Photo with any mimetype of A, B, C...
 
+    --sha256 A,B,C:
+        Photo with any sha256 of A, B, C...
+
     --tag_musts A,B,C:
         Photo must have all tags A and B and C...
 
@@ -728,9 +837,14 @@ def main(argv):
 
     p_digest = subparsers.add_parser('digest', aliases=['digest_directory', 'digest-directory'])
     p_digest.add_argument('directory')
+    p_digest.add_argument('--exclude_directories', '--exclude-directories', nargs='+', default=None)
+    p_digest.add_argument('--exclude_filenames', '--exclude-filenames', nargs='+', default=None)
+    p_digest.add_argument('--glob_directories', '--glob-directories', nargs='+', default=None)
+    p_digest.add_argument('--glob_filenames', '--glob-filenames', nargs='+', default=None)
     p_digest.add_argument('--no_albums', '--no-albums', dest='make_albums', action='store_false', default=True)
     p_digest.add_argument('--ratelimit', dest='ratelimit', type=float, default=0.2)
     p_digest.add_argument('--no_recurse', '--no-recurse', dest='recurse', action='store_false', default=True)
+    p_digest.add_argument('--hash_bytes_per_second', '--hash-bytes-per-second', default=None)
     p_digest.add_argument('--yes', dest='autoyes', action='store_true')
     p_digest.set_defaults(func=digest_directory_argparse)
 
@@ -756,6 +870,12 @@ def main(argv):
     p_purge_empty_albums.add_argument('--yes', dest='autoyes', action='store_true')
     p_purge_empty_albums.set_defaults(func=purge_empty_albums_argparse)
 
+    p_reload_metadata = subparsers.add_parser('reload_metadata', aliases=['reload-metadata'])
+    p_reload_metadata.add_argument('--hash_bytes_per_second', '--hash-bytes-per-second', default=None)
+    p_reload_metadata.add_argument('--force', action='store_true')
+    p_reload_metadata.add_argument('--yes', dest='autoyes', action='store_true')
+    p_reload_metadata.set_defaults(func=reload_metadata_argparse)
+
     p_relocate = subparsers.add_parser('relocate')
     p_relocate.add_argument('photo_id')
     p_relocate.add_argument('filepath')
@@ -777,6 +897,7 @@ def main(argv):
     p_search.add_argument('--has_tags', '--has-tags', dest='has_tags', default=None)
     p_search.add_argument('--has_thumbnail', '--has-thumbnail', dest='has_thumbnail', default=None)
     p_search.add_argument('--is_searchhidden', '--is-searchhidden', dest='is_searchhidden', default=False)
+    p_search.add_argument('--sha256', default=None)
     p_search.add_argument('--mimetype', dest='mimetype', default=None)
     p_search.add_argument('--tag_musts', '--tag-musts', dest='tag_musts', default=None)
     p_search.add_argument('--tag_mays', '--tag-mays', dest='tag_mays', default=None)

@@ -64,7 +64,7 @@ def etiquette_flask_launch(
         pipeable.stderr('Try `etiquette_cli.py init` to create the database.')
         return 1
 
-    message = f'Starting server on port {port}, pid={os.getpid()}'
+    message = f'Starting server on port {port}, pid={os.getpid()}.'
     if use_https:
         message += ' (https)'
     print(message)

@@ -602,6 +602,57 @@ def upgrade_17_to_18(photodb):
 
     m.go()
 
+def upgrade_18_to_19(photodb):
+    m = Migrator(photodb)
+
+    m.tables['photos']['create'] = '''
+    CREATE TABLE photos(
+        id TEXT PRIMARY KEY NOT NULL,
+        filepath TEXT COLLATE NOCASE,
+        basename TEXT COLLATE NOCASE,
+        override_filename TEXT COLLATE NOCASE,
+        extension TEXT COLLATE NOCASE,
+        mtime INT,
+        sha256 TEXT,
+        width INT,
+        height INT,
+        ratio REAL,
+        area INT,
+        duration INT,
+        bytes INT,
+        created INT,
+        thumbnail TEXT,
+        tagged_at INT,
+        author_id TEXT,
+        searchhidden INT,
+        FOREIGN KEY(author_id) REFERENCES users(id)
+    );
+    '''
+    m.tables['photos']['transfer'] = '''
+    INSERT INTO photos SELECT
+        id,
+        filepath,
+        basename,
+        override_filename,
+        extension,
+        NULL,
+        NULL,
+        width,
+        height,
+        ratio,
+        area,
+        duration,
+        bytes,
+        created,
+        thumbnail,
+        tagged_at,
+        author_id,
+        searchhidden
+    FROM photos_old;
+    '''
+
+    m.go()
+
 def upgrade_all(data_directory):
     '''
     Given the directory containing a phototagger database, apply all of the
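
The Migrator pattern above boils down to SQLite's standard table-rebuild dance. A toy reduction in plain sqlite3 — an editor's sketch with a simplified schema; Migrator itself is Etiquette's helper and its internals are assumed here:

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('PRAGMA user_version = 18')
db.execute('CREATE TABLE photos(id TEXT PRIMARY KEY, filepath TEXT, dev_ino TEXT)')
db.execute("INSERT INTO photos VALUES ('p1', 'a.jpg', '64,12345')")

# The rebuild: rename the old table, recreate it with the new columns,
# transfer rows with NULLs for mtime/sha256 (exactly like the transfer
# SQL in upgrade_18_to_19), drop the old table, and bump the version.
db.execute('ALTER TABLE photos RENAME TO photos_old')
db.execute('CREATE TABLE photos(id TEXT PRIMARY KEY, filepath TEXT, mtime INT, sha256 TEXT)')
db.execute('INSERT INTO photos SELECT id, filepath, NULL, NULL FROM photos_old')
db.execute('DROP TABLE photos_old')
db.execute('PRAGMA user_version = 19')
db.commit()
```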