Initial publication.
commit
6d40170c78
|
@ -0,0 +1,2 @@
|
|||
_bringrss/*
|
||||
myscripts/*
|
|
@ -0,0 +1,12 @@
|
|||
Contact
|
||||
=======
|
||||
|
||||
Please do not open pull requests without talking to me first. For serious issues and bugs, open a GitHub issue. If you just have a question, please send an email to `contact@voussoir.net`. For other contact options, see [voussoir.net/#contact](https://voussoir.net/#contact).
|
||||
|
||||
I also mirror my work to other git services:
|
||||
|
||||
- https://github.com/voussoir
|
||||
|
||||
- https://gitlab.com/voussoir
|
||||
|
||||
- https://codeberg.org/voussoir
|
|
@ -0,0 +1,29 @@
|
|||
BSD 3-Clause License
|
||||
|
||||
Copyright (c) 2022, Ethan Dalool aka voussoir
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
1. Redistributions of source code must retain the above copyright notice, this
|
||||
list of conditions and the following disclaimer.
|
||||
|
||||
2. Redistributions in binary form must reproduce the above copyright notice,
|
||||
this list of conditions and the following disclaimer in the documentation
|
||||
and/or other materials provided with the distribution.
|
||||
|
||||
3. Neither the name of the copyright holder nor the names of its
|
||||
contributors may be used to endorse or promote products derived from
|
||||
this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
|
||||
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
|
||||
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
|
||||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
@ -0,0 +1,148 @@
|
|||
BringRSS
|
||||
========
|
||||
|
||||
It brings you the news.
|
||||
|
||||
Live demo: https://bringrss.voussoir.net
|
||||
|
||||
## What am I looking at
|
||||
|
||||
BringRSS is an RSS client / newsreader made with Python, SQLite3, and Flask. Its main features are:
|
||||
|
||||
- Automatic feed refresh with separate intervals per feed.
|
||||
- Feeds arranged in hierarchical folders.
|
||||
- Filters for categorizing or removing news based on your criteria.
|
||||
- Sends news objects to your own Python scripts for arbitrary post-processing, emailing, downloading, etc.
|
||||
- Embeds videos from YouTube feeds.
|
||||
- News text is filtered by [DOMPurify](https://github.com/cure53/DOMPurify) before display.
|
||||
- Supports multiple enclosures.
|
||||
|
||||
Because BringRSS runs a webserver, you can access it from every device in your house via your computer's LAN IP. BringRSS provides no login or authentication, but if you have a reverse proxy handle that for you, you could run BringRSS on an internet-connected machine and access your feeds anywhere.
|
||||
|
||||
## Screenshots
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224740-734f0517-1b03-48e1-80a6-b96129d7d7fc.png)
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224741-398da5aa-af92-42e3-a921-118cc6e54a68.png)
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224742-f482b8dd-59cd-4a8a-b70f-67c1e9250e83.png)
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224743-9f287446-2f1f-4465-8c23-f8d2591fe5e0.png)
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224744-e43d8838-74a4-4a06-b304-10e5102614fc.png)
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224745-479bd9c5-9c42-4514-8a49-42df972ec978.png)
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224748-f73e7db7-1664-47ce-a391-86f947fd6c84.png)
|
||||
|
||||
![](https://user-images.githubusercontent.com/7299570/160224750-10aa322e-8036-4410-8415-9fdbb9e8da99.png)
|
||||
|
||||
## Setting up
|
||||
|
||||
As you'll see below, BringRSS has a core backend package and separate frontends that use it. These frontend applications will use `import bringrss` to access the backend code. Therefore, the `bringrss` package needs to be somewhere Python can find it at import time.
|
||||
|
||||
1. Run `pip install -r requirements.txt --upgrade` after reading the file and deciding you are ok with the dependencies.
|
||||
|
||||
2. Make a new folder somewhere on your computer, and add this folder to your `PYTHONPATH` environment variable. For example, I might use `D:\pythonpath` or `~/pythonpath`. Close and re-open your Command Prompt / Terminal so it reloads the environment variables. An example of setting the variable is shown after this list.
|
||||
|
||||
3. Add a symlink to the `bringrss` package inside that new folder:
|
||||
|
||||
For example, if this repository is at `D:\Git\BringRSS` or `~/Git/BringRSS`, the package is the folder inside it called `bringrss`.
|
||||
|
||||
Windows: `mklink /d fakepath realpath`
|
||||
for example `mklink /d "D:\pythonpath\bringrss" "D:\Git\BringRSS\bringrss"`
|
||||
|
||||
Linux: `ln --symbolic realpath fakepath`
|
||||
for example `ln --symbolic ~/Git/BringRSS/bringrss ~/pythonpath/bringrss` (leave `~` unquoted so the shell expands it)
|
||||
|
||||
4. Run `python -c "import bringrss; print(bringrss)"`. You should see the module print successfully.
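Setting the `PYTHONPATH` variable in step 2 might look like this (the paths are just examples):

Linux: add `export PYTHONPATH="$HOME/pythonpath"` to `~/.bashrc` or your shell profile.

Windows: run `setx PYTHONPATH "D:\pythonpath"` once, then open a new Command Prompt.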
|
||||
|
||||
## Running BringRSS CLI
|
||||
|
||||
BringRSS offers a commandline interface so you can use cronjobs to refresh your feeds. More commands may be added in the future.
|
||||
|
||||
1. `cd` to the folder where you'd like to create the BringRSS database.
|
||||
|
||||
2. Run `python frontends/bringrss_cli.py --help` to learn about the available commands.
|
||||
|
||||
3. Run `python frontends/bringrss_cli.py init` to create a database in the current directory.
|
||||
|
||||
Note: Do not `cd` into the frontends folder. Stay in the folder that contains your `_bringrss` database and specify the full path of the frontend launcher. For example:
|
||||
|
||||
Windows:
|
||||
D:\somewhere> python D:\Git\BringRSS\frontends\bringrss_cli.py
|
||||
|
||||
Linux:
|
||||
/somewhere $ python ~/Git/BringRSS/frontends/bringrss_cli.py
|
||||
|
||||
It is expected that you create a shortcut file or launch script so you don't have to type the whole filepath every time. For example, I have a `bringcli.lnk` on my PATH with `target=D:\Git\BringRSS\frontends\bringrss_cli.py`.
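Since the CLI is what a cronjob would call, a crontab entry might look like the following. The `refresh_all` subcommand name here is hypothetical, check `--help` for the real command names, and the paths are examples:

    # Refresh feeds every 30 minutes, using the folder that contains the _bringrss database.
    */30 * * * * cd /home/you/rssdata && python /home/you/Git/BringRSS/frontends/bringrss_cli.py refresh_all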
|
||||
|
||||
## Running BringRSS Flask locally
|
||||
|
||||
1. Run `python frontends/bringrss_flask/bringrss_flask_dev.py --help` to learn the available options.
|
||||
|
||||
2. Run `python frontends/bringrss_flask/bringrss_flask_dev.py [port]` to launch the Flask server. If this is your first time running it, you can add `--init` to create a new database in the current directory. The port defaults to 27464 if not provided.
|
||||
|
||||
3. Open your web browser to `localhost:<port>`.
|
||||
|
||||
Note: Do not `cd` into the frontends folder. Stay in the folder that contains your `_bringrss` database and specify the full path of the frontend launcher. For example:
|
||||
|
||||
Windows:
|
||||
D:\somewhere> python D:\Git\BringRSS\frontends\bringrss_flask\bringrss_flask_dev.py 5001
|
||||
|
||||
Linux:
|
||||
/somewhere $ python ~/Git/BringRSS/frontends/bringrss_flask/bringrss_flask_dev.py 5001
|
||||
|
||||
Add `--help` to learn the arguments.
|
||||
|
||||
It is expected that you create a shortcut file or launch script so you don't have to type the whole filepath every time. For example, I have a `bringflask.lnk` on my PATH with `target=D:\Git\BringRSS\frontends\bringrss_flask\bringrss_flask_dev.py`.
|
||||
|
||||
## Running BringRSS Flask with Gunicorn
|
||||
|
||||
BringRSS provides no authentication whatsoever, so you probably shouldn't deploy it publicly unless your proxy server does authentication for you. However, I will tell you that for the purposes of running the demo site, I am using a script like this:
|
||||
|
||||
export BRINGRSS_DEMO_MODE=1
|
||||
~/cmd/python ~/cmd/gunicorn_py bringrss_flask_prod:site --bind "0.0.0.0:PORTNUMBER" --worker-class gevent --access-logfile "-" --access-logformat "%(h)s | %(t)s | %(r)s | %(s)s %(b)s"
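If you do expose it to the internet, the authentication can live in the proxy. A minimal nginx sketch, where the htpasswd file, paths, and port are assumptions rather than anything BringRSS ships:

    location / {
        auth_basic "BringRSS";
        auth_basic_user_file /etc/nginx/bringrss.htpasswd;
        proxy_pass http://127.0.0.1:PORTNUMBER;
    }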
|
||||
|
||||
## Running BringRSS REPL
|
||||
|
||||
The REPL is a great way to test a quick idea and learn the data model.
|
||||
|
||||
1. Use `bringrss_cli init` to create the database in the desired directory.
|
||||
|
||||
2. Run `python frontends/bringrss_repl.py` to launch the Python interpreter with the BringDB pre-loaded into a variable called `B`. Try things like `B.get_feed` or `B.get_newss`.
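A quick session might look like this. This is only a sketch: `get_feeds`, `get_newss`, and `get_bulk_unread_counts` are real BringDB methods, but the attribute access and output shown are illustrative.

    >>> for feed in B.get_feeds():
    ...     print(feed.id, feed.title)
    >>> unread = B.get_bulk_unread_counts()
    >>> sum(unread.values())
    42
    >>> [news.title for news in B.get_newss(read=False)]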
|
||||
|
||||
Note: Do not `cd` into the frontends folder. Stay in the folder that contains your `_bringrss` database and specify the full path of the frontend launcher. For example:
|
||||
|
||||
Windows:
|
||||
D:\somewhere> python D:\Git\BringRSS\frontends\bringrss_repl.py
|
||||
|
||||
Linux:
|
||||
/somewhere $ python ~/Git/BringRSS/frontends/bringrss_repl.py
|
||||
|
||||
It is expected that you create a shortcut file or launch script so you don't have to type the whole filepath every time. For example, I have a `bringrepl.lnk` on my PATH with `target=D:\Git\BringRSS\frontends\bringrss_repl.py`.
|
||||
|
||||
## Help wanted: javascript perf & layout thrashing
|
||||
|
||||
I think there is room for improvement in [root.html](https://github.com/voussoir/bringrss/blob/master/frontends/bringrss_flask/templates/root.html)'s javascript. When reading a feed with a few thousand news items, the UI starts to get slow at every interaction:
|
||||
|
||||
- After clicking on a news, it takes a few ms before it turns selected.
|
||||
- The newsreader takes a few ms to populate with the title even though it's pulled from the news's DOM, not the network.
|
||||
- After receiving the news list from the server, news items are inserted into the DOM in batches, and each batch causes the UI to stutter if you are also trying to scroll or click on things.
|
||||
|
||||
If you have any tips for improving the performance and responsiveness of the UI click handlers and reducing the amount of reflow / layout caused by the loading of news items or changing their class (selecting, reading, recycling), I would appreciate you getting in touch at contact@voussoir.net or opening an issue. Please don't open a pull request without talking to me first.
|
||||
|
||||
I am aware of virtual scrolling techniques where DOM rows don't actually exist until you scroll to where they would be, but this has the drawback of breaking ctrl+f, and it is hard to precompute the scroll height since news have variable-length titles. I would prefer simple fixes like adding CSS rules that help the layout engine make better reflow decisions.
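One direction that fits that preference, sketched here rather than something root.html already does, is CSS containment on the news rows so that class changes on one row don't force layout of the whole list (the selector name is illustrative):

    .news_item {
        contain: content;                 /* isolate each row's layout and paint */
        content-visibility: auto;         /* skip rendering work for offscreen rows */
        contain-intrinsic-size: auto 2em; /* estimated row height so the scrollbar stays stable */
    }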
|
||||
|
||||
## To do list
|
||||
|
||||
- Maybe we could add a very basic password system to facilitate running an internet-connected instance. No user profiles, just a single password to access the whole system. I did this with [simpleserver](https://github.com/voussoir/else/blob/master/SimpleServer/simpleserver.py).
|
||||
- "Fill in the gaps" feature. Many websites have feeds that don't reach back all the way to their first post. When discovering a new blog or podcast, catching up on their prior work requires manual bookmarking outside of your newsreader. It would be nice to dump a list of article URLs into BringRSS and have it generate news objects as if they really came from the feed. Basic information like url, title, and fetched page text would be good enough; auto-detecting media as enclosures would be better. Other attributes don't need to be comprehensive. Then you could have everything in your newsreader.
|
||||
|
||||
## Mirrors
|
||||
|
||||
https://github.com/voussoir/bringrss
|
||||
|
||||
https://gitlab.com/voussoir/bringrss
|
||||
|
||||
https://codeberg.org/voussoir/bringrss
|
|
@ -0,0 +1,5 @@
|
|||
from . import bringdb
|
||||
from . import constants
|
||||
from . import exceptions
|
||||
from . import helpers
|
||||
from . import objects
|
|
@ -0,0 +1,722 @@
|
|||
import bs4
|
||||
import random
|
||||
import sqlite3
|
||||
import typing
|
||||
|
||||
from . import constants
|
||||
from . import exceptions
|
||||
from . import helpers
|
||||
from . import objects
|
||||
|
||||
from voussoirkit import cacheclass
|
||||
from voussoirkit import pathclass
|
||||
from voussoirkit import sentinel
|
||||
from voussoirkit import sqlhelpers
|
||||
from voussoirkit import vlogging
|
||||
from voussoirkit import worms
|
||||
|
||||
log = vlogging.get_logger(__name__)
|
||||
|
||||
RNG = random.SystemRandom()
|
||||
|
||||
####################################################################################################
|
||||
|
||||
class BDBFeedMixin:
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
||||
@worms.transaction
|
||||
def add_feed(
|
||||
self,
|
||||
*,
|
||||
autorefresh_interval=86400,
|
||||
description=None,
|
||||
icon=None,
|
||||
isolate_guids=False,
|
||||
parent=None,
|
||||
refresh_with_others=True,
|
||||
rss_url=None,
|
||||
title=None,
|
||||
web_url=None,
|
||||
ui_order_rank=None,
|
||||
):
|
||||
if parent is None:
|
||||
parent_id = None
|
||||
else:
|
||||
if not isinstance(parent, objects.Feed):
|
||||
raise TypeError(parent)
|
||||
parent.assert_not_deleted()
|
||||
parent_id = parent.id
|
||||
|
||||
autorefresh_interval = objects.Feed.normalize_autorefresh_interval(autorefresh_interval)
|
||||
refresh_with_others = objects.Feed.normalize_refresh_with_others(refresh_with_others)
|
||||
rss_url = objects.Feed.normalize_rss_url(rss_url)
|
||||
web_url = objects.Feed.normalize_web_url(web_url)
|
||||
title = objects.Feed.normalize_title(title)
|
||||
description = objects.Feed.normalize_description(description)
|
||||
icon = objects.Feed.normalize_icon(icon)
|
||||
isolate_guids = objects.Feed.normalize_isolate_guids(isolate_guids)
|
||||
if ui_order_rank is None:
|
||||
ui_order_rank = self.get_last_ui_order_rank() + 1
|
||||
else:
|
||||
ui_order_rank = objects.Feed.normalize_ui_order_rank(ui_order_rank)
|
||||
|
||||
data = {
|
||||
'id': self.generate_id(objects.Feed),
|
||||
'parent_id': parent_id,
|
||||
'rss_url': rss_url,
|
||||
'web_url': web_url,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'created': helpers.now(),
|
||||
'refresh_with_others': refresh_with_others,
|
||||
'last_refresh': 0,
|
||||
'last_refresh_attempt': 0,
|
||||
'last_refresh_error': None,
|
||||
'autorefresh_interval': autorefresh_interval,
|
||||
'http_headers': None,
|
||||
'isolate_guids': isolate_guids,
|
||||
'icon': icon,
|
||||
'ui_order_rank': ui_order_rank,
|
||||
}
|
||||
self.insert(table=objects.Feed, data=data)
|
||||
feed = self.get_cached_instance(objects.Feed, data)
|
||||
return feed
|
||||
|
||||
def get_bulk_unread_counts(self):
|
||||
'''
|
||||
Instead of calling feed.get_unread_count() on many separate feed objects
|
||||
and performing lots of duplicate work, you can call here and get them
|
||||
all at once with much less database access. I brought my /feeds.json
|
||||
down from 160ms to 6ms by using this.
|
||||
|
||||
Missing keys means 0 unread.
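The return value maps Feed objects to unread counts, so for example
sum(bringdb.get_bulk_unread_counts().values()) would be the total number
of unread news across all feeds.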
|
||||
'''
|
||||
# Even though we have api functions for all of this, I want to squeeze
|
||||
# out the perf. This function is meant to be used in situations where
|
||||
# speed matters more than code beauty.
|
||||
feeds = {feed.id: feed for feed in self.get_feeds()}
|
||||
childs = {}
|
||||
for feed in feeds.values():
|
||||
if feed.parent_id:
|
||||
childs.setdefault(feed.parent_id, []).append(feed)
|
||||
roots = [feed for feed in feeds.values() if not feed.parent_id]
|
||||
|
||||
query = '''
|
||||
SELECT feed_id, COUNT(rowid)
|
||||
FROM news
|
||||
WHERE recycled == 0 AND read == 0
|
||||
GROUP BY feed_id
|
||||
'''
|
||||
counts = {feeds[feed_id]: count for (feed_id, count) in self.select(query)}
|
||||
|
||||
def recursive_update(feed):
|
||||
counts.setdefault(feed, 0)
|
||||
children = childs.get(feed.id, None)
|
||||
if children:
|
||||
counts[feed] += sum(recursive_update(child) for child in children)
|
||||
return counts[feed]
|
||||
|
||||
for root in roots:
|
||||
recursive_update(root)
|
||||
|
||||
return counts
|
||||
|
||||
def get_feed(self, id) -> objects.Feed:
|
||||
return self.get_object_by_id(objects.Feed, id)
|
||||
|
||||
def get_feed_count(self) -> int:
|
||||
return self.select_one_value('SELECT COUNT(id) FROM feeds')
|
||||
|
||||
def get_feeds(self) -> typing.Iterable[objects.Feed]:
|
||||
query = 'SELECT * FROM feeds ORDER BY ui_order_rank ASC'
|
||||
return self.get_objects_by_sql(objects.Feed, query)
|
||||
|
||||
def get_feeds_by_id(self, ids) -> typing.Iterable[objects.Feed]:
|
||||
return self.get_objects_by_id(objects.Feed, ids)
|
||||
|
||||
def get_feeds_by_sql(self, query, bindings=None) -> typing.Iterable[objects.Feed]:
|
||||
return self.get_objects_by_sql(objects.Feed, query, bindings)
|
||||
|
||||
def get_last_ui_order_rank(self) -> int:
|
||||
query = 'SELECT ui_order_rank FROM feeds ORDER BY ui_order_rank DESC LIMIT 1'
|
||||
rank = self.select_one_value(query)
|
||||
if rank is None:
|
||||
return 0
|
||||
return rank
|
||||
|
||||
def get_root_feeds(self) -> typing.Iterable[objects.Feed]:
|
||||
query = 'SELECT * FROM feeds WHERE parent_id IS NULL ORDER BY ui_order_rank ASC'
|
||||
return self.get_objects_by_sql(objects.Feed, query)
|
||||
|
||||
@worms.transaction
|
||||
def reassign_ui_order_ranks(self):
|
||||
feeds = list(self.get_root_feeds())
|
||||
rank = 1
|
||||
for feed in feeds:
|
||||
for descendant in feed.walk_children():
|
||||
descendant.set_ui_order_rank(rank)
|
||||
rank += 1
|
||||
|
||||
####################################################################################################
|
||||
|
||||
class BDBFilterMixin:
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
||||
@worms.transaction
|
||||
def add_filter(self, name, conditions, actions):
|
||||
name = objects.Filter.normalize_name(name)
|
||||
conditions = objects.Filter.normalize_conditions(conditions)
|
||||
actions = objects.Filter.normalize_actions(actions)
|
||||
objects.Filter._jank_validate_move_to_feed(bringdb=self, actions=actions)
|
||||
|
||||
data = {
|
||||
'id': self.generate_id(objects.Filter),
|
||||
'name': name,
|
||||
'created': helpers.now(),
|
||||
'conditions': conditions,
|
||||
'actions': actions,
|
||||
}
|
||||
self.insert(table=objects.Filter, data=data)
|
||||
filt = self.get_cached_instance(objects.Filter, data)
|
||||
return filt
|
||||
|
||||
def get_filter(self, id) -> objects.Filter:
|
||||
return self.get_object_by_id(objects.Filter, id)
|
||||
|
||||
def get_filter_count(self) -> int:
|
||||
return self.select_one_value('SELECT COUNT(id) FROM filters')
|
||||
|
||||
def get_filters(self) -> typing.Iterable[objects.Filter]:
|
||||
return self.get_objects(objects.Filter)
|
||||
|
||||
def get_filters_by_id(self, ids) -> typing.Iterable[objects.Filter]:
|
||||
return self.get_objects_by_id(objects.Filter, ids)
|
||||
|
||||
def get_filters_by_sql(self, query, bindings=None) -> typing.Iterable[objects.Filter]:
|
||||
return self.get_objects_by_sql(objects.Filter, query, bindings)
|
||||
|
||||
@worms.transaction
|
||||
def process_news_through_filters(self, news):
|
||||
def prepare_filters(feed):
|
||||
filters = []
|
||||
for ancestor in feed.walk_parents(yield_self=True):
|
||||
filters.extend(ancestor.get_filters())
|
||||
return filters
|
||||
|
||||
feed = news.feed
|
||||
original_feed = feed
|
||||
filters = prepare_filters(feed)
|
||||
status = objects.Filter.THEN_CONTINUE_FILTERS
|
||||
too_many_switches = 20
|
||||
|
||||
while feed and filters and status is objects.Filter.THEN_CONTINUE_FILTERS:
|
||||
filt = filters.pop(0)
|
||||
status = filt.process_news(news)
|
||||
|
||||
switched_feed = news.feed
|
||||
if switched_feed == feed:
|
||||
continue
|
||||
|
||||
feed = switched_feed
|
||||
filters = prepare_filters(feed)
|
||||
|
||||
too_many_switches -= 1
|
||||
if too_many_switches > 0:
|
||||
continue
|
||||
raise Exception(f'{news} from {original_feed} got moved too many times. Something wrong?')
|
||||
|
||||
####################################################################################################
|
||||
|
||||
class BDBNewsMixin:
|
||||
DUPLICATE_BAIL = sentinel.Sentinel('duplicate bail')
|
||||
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
||||
@worms.transaction
|
||||
def add_news(
|
||||
self,
|
||||
*,
|
||||
authors,
|
||||
comments_url,
|
||||
enclosures,
|
||||
feed,
|
||||
published,
|
||||
rss_guid,
|
||||
text,
|
||||
title,
|
||||
updated,
|
||||
web_url,
|
||||
):
|
||||
if not isinstance(feed, objects.Feed):
|
||||
raise TypeError(feed)
|
||||
feed.assert_not_deleted()
|
||||
|
||||
rss_guid = objects.News.normalize_rss_guid(rss_guid)
|
||||
if feed.isolate_guids:
|
||||
rss_guid = f'_isolate_{feed.id}_{rss_guid}'
|
||||
|
||||
published = objects.News.normalize_published(published)
|
||||
updated = objects.News.normalize_updated(updated)
|
||||
title = objects.News.normalize_title(title)
|
||||
text = objects.News.normalize_text(text)
|
||||
web_url = objects.News.normalize_web_url(web_url)
|
||||
comments_url = objects.News.normalize_comments_url(comments_url)
|
||||
authors = objects.News.normalize_authors_json(authors)
|
||||
enclosures = objects.News.normalize_enclosures_json(enclosures)
|
||||
|
||||
data = {
|
||||
'id': self.generate_id(objects.News),
|
||||
'feed_id': feed.id,
|
||||
'original_feed_id': feed.id,
|
||||
'rss_guid': rss_guid,
|
||||
'published': published,
|
||||
'updated': updated,
|
||||
'title': title,
|
||||
'text': text,
|
||||
'web_url': web_url,
|
||||
'comments_url': comments_url,
|
||||
'created': helpers.now(),
|
||||
'read': False,
|
||||
'recycled': False,
|
||||
'authors': authors,
|
||||
'enclosures': enclosures,
|
||||
}
|
||||
self.insert(table=objects.News, data=data)
|
||||
news = self.get_cached_instance(objects.News, data)
|
||||
return news
|
||||
|
||||
def get_news(self, id) -> objects.News:
|
||||
return self.get_object_by_id(objects.News, id)
|
||||
|
||||
def get_news_count(self) -> int:
|
||||
return self.select_one_value('SELECT COUNT(id) FROM news')
|
||||
|
||||
def get_newss(
|
||||
self,
|
||||
*,
|
||||
read=False,
|
||||
recycled=False,
|
||||
feed=None,
|
||||
) -> typing.Iterable[objects.News]:
|
||||
|
||||
if feed is not None and not isinstance(feed, objects.Feed):
|
||||
feed = self.get_feed(feed)
|
||||
|
||||
wheres = []
|
||||
bindings = []
|
||||
|
||||
if feed:
|
||||
feed_ids = [descendant.id for descendant in feed.walk_children()]
|
||||
wheres.append(f'feed_id IN {sqlhelpers.listify(feed_ids)}')
|
||||
|
||||
if recycled is True:
|
||||
wheres.append('recycled == 1')
|
||||
elif recycled is False:
|
||||
wheres.append('recycled == 0')
|
||||
|
||||
if read is True:
|
||||
wheres.append('read == 1')
|
||||
elif read is False:
|
||||
wheres.append('read == 0')
|
||||
if wheres:
|
||||
wheres = ' AND '.join(wheres)
|
||||
wheres = ' WHERE ' + wheres
|
||||
else:
|
||||
wheres = ''
|
||||
query = 'SELECT * FROM news' + wheres + ' ORDER BY published DESC'
|
||||
|
||||
rows = self.select(query, bindings)
|
||||
for row in rows:
|
||||
yield self.get_cached_instance(objects.News, row)
|
||||
|
||||
def get_newss_by_id(self, ids) -> typing.Iterable[objects.News]:
|
||||
return self.get_objects_by_id(objects.News, ids)
|
||||
|
||||
def get_newss_by_sql(self, query, bindings=None) -> typing.Iterable[objects.News]:
|
||||
return self.get_objects_by_sql(objects.News, query, bindings)
|
||||
|
||||
def _get_duplicate_news(self, feed, guid):
|
||||
if feed.isolate_guids:
|
||||
guid = f'_isolate_{feed.id}_{guid}'
|
||||
|
||||
match = self.select_one('SELECT * FROM news WHERE rss_guid == ?', [guid])
|
||||
if match is None:
|
||||
return None
|
||||
|
||||
return self.get_cached_instance(objects.News, match)
|
||||
|
||||
def _ingest_one_news_atom(self, entry, feed):
|
||||
rss_guid = entry.id
|
||||
|
||||
web_url = helpers.pick_web_url_atom(entry)
|
||||
|
||||
updated = entry.updated
|
||||
if updated is not None:
|
||||
updated = updated.text
|
||||
updated = helpers.dateutil_parse(updated)
|
||||
updated = updated.timestamp()
|
||||
|
||||
published = entry.published
|
||||
if published is not None:
|
||||
published = published.text
|
||||
published = helpers.dateutil_parse(published)
|
||||
published = published.timestamp()
|
||||
elif updated is not None:
|
||||
published = updated
|
||||
|
||||
title = entry.find('title')
|
||||
if title:
|
||||
title = title.text.strip()
|
||||
|
||||
if rss_guid:
|
||||
rss_guid = rss_guid.text.strip()
|
||||
elif web_url:
|
||||
rss_guid = web_url
|
||||
elif title:
|
||||
rss_guid = title
|
||||
elif published:
|
||||
rss_guid = published
|
||||
|
||||
if not rss_guid:
|
||||
raise exceptions.NoGUID(entry)
|
||||
|
||||
duplicate = self._get_duplicate_news(feed=feed, guid=rss_guid)
|
||||
if duplicate:
|
||||
log.loud('Skipping duplicate feed=%s, guid=%s', feed.id, rss_guid)
|
||||
return BDBNewsMixin.DUPLICATE_BAIL
|
||||
|
||||
text = entry.find('content')
|
||||
if text:
|
||||
text = text.text.strip()
|
||||
|
||||
comments_url = None
|
||||
|
||||
raw_authors = entry.find_all('author')
|
||||
authors = []
|
||||
for raw_author in raw_authors:
|
||||
author = {
|
||||
'name': raw_author.find('name'),
|
||||
'email': raw_author.find('email'),
|
||||
'uri': raw_author.find('uri'),
|
||||
}
|
||||
author = {key:(value.text if value else None) for (key, value) in author.items()}
|
||||
authors.append(author)
|
||||
|
||||
raw_enclosures = entry.find_all('link', {'rel': 'enclosure'})
|
||||
enclosures = []
|
||||
for raw_enclosure in raw_enclosures:
|
||||
enclosure = {
|
||||
'type': raw_enclosure.get('type', None),
|
||||
'url': raw_enclosure.get('href', None),
|
||||
'size': raw_enclosure.get('length', None),
|
||||
}
|
||||
if enclosure.get('size') is not None:
|
||||
enclosure['size'] = int(enclosure['size'])
|
||||
|
||||
enclosures.append(enclosure)
|
||||
|
||||
news = self.add_news(
|
||||
authors=authors,
|
||||
comments_url=comments_url,
|
||||
enclosures=enclosures,
|
||||
feed=feed,
|
||||
published=published,
|
||||
rss_guid=rss_guid,
|
||||
text=text,
|
||||
title=title,
|
||||
updated=updated,
|
||||
web_url=web_url,
|
||||
)
|
||||
return news
|
||||
|
||||
def _ingest_one_news_rss(self, item, feed):
|
||||
rss_guid = item.find('guid')
|
||||
|
||||
title = item.find('title')
|
||||
if title:
|
||||
title = title.text.strip()
|
||||
|
||||
text = item.find('description')
|
||||
if text:
|
||||
text = text.text.strip()
|
||||
|
||||
web_url = item.find('link')
|
||||
if web_url:
|
||||
web_url = web_url.text.strip()
|
||||
elif rss_guid and rss_guid.get('isPermaLink'):  # the RSS attribute is spelled isPermaLink
|
||||
web_url = rss_guid.text
|
||||
|
||||
if web_url and '://' not in web_url:
|
||||
web_url = None
|
||||
|
||||
published = item.find('pubDate')
|
||||
if published:
|
||||
published = published.text
|
||||
published = helpers.dateutil_parse(published)
|
||||
published = published.timestamp()
|
||||
else:
|
||||
published = 0
|
||||
|
||||
if rss_guid:
|
||||
rss_guid = rss_guid.text.strip()
|
||||
elif web_url:
|
||||
rss_guid = web_url
|
||||
elif title:
|
||||
rss_guid = f'{feed.id}_{title}'
|
||||
elif published:
|
||||
rss_guid = f'{feed.id}_{published}'
|
||||
|
||||
if not rss_guid:
|
||||
raise exceptions.NoGUID(item)
|
||||
|
||||
duplicate = self._get_duplicate_news(feed=feed, guid=rss_guid)
|
||||
if duplicate:
|
||||
log.loud('Skipping duplicate news, feed=%s, guid=%s', feed.id, rss_guid)
|
||||
return BDBNewsMixin.DUPLICATE_BAIL
|
||||
|
||||
comments_url = item.find('comments')
|
||||
if comments_url is not None:
|
||||
comments_url = comments_url.text
|
||||
|
||||
raw_authors = item.find_all('author')
|
||||
authors = []
|
||||
for raw_author in raw_authors:
|
||||
author = raw_author.text.strip()
|
||||
if author:
|
||||
author = {
|
||||
'name': author,
|
||||
}
|
||||
authors.append(author)
|
||||
|
||||
raw_enclosures = item.find_all('enclosure')
|
||||
enclosures = []
|
||||
for raw_enclosure in raw_enclosures:
|
||||
enclosure = {
|
||||
'type': raw_enclosure.get('type', None),
|
||||
'url': raw_enclosure.get('url', None),
|
||||
'size': raw_enclosure.get('length', None),
|
||||
}
|
||||
|
||||
if enclosure.get('size') is not None:
|
||||
enclosure['size'] = int(enclosure['size'])
|
||||
|
||||
enclosures.append(enclosure)
|
||||
|
||||
news = self.add_news(
|
||||
authors=authors,
|
||||
comments_url=comments_url,
|
||||
enclosures=enclosures,
|
||||
feed=feed,
|
||||
published=published,
|
||||
rss_guid=rss_guid,
|
||||
text=text,
|
||||
title=title,
|
||||
updated=published,
|
||||
web_url=web_url,
|
||||
)
|
||||
return news
|
||||
|
||||
def _ingest_news_atom(self, soup, feed):
|
||||
atom_feed = soup.find('feed')
|
||||
|
||||
if not atom_feed:
|
||||
raise exceptions.BadXML('No feed element.')
|
||||
|
||||
for entry in atom_feed.find_all('entry'):
|
||||
news = self._ingest_one_news_atom(entry, feed)
|
||||
if news is not BDBNewsMixin.DUPLICATE_BAIL:
|
||||
yield news
|
||||
|
||||
def _ingest_news_rss(self, soup, feed):
|
||||
rss = soup.find('rss')
|
||||
|
||||
# This won't happen under normal circumstances since Feed.refresh would
|
||||
# have raised already. But we include these checks here in case the user
|
||||
# calls this method directly.
|
||||
if not rss:
|
||||
raise exceptions.BadXML('No rss element.')
|
||||
|
||||
channel = rss.find('channel')
|
||||
|
||||
if not channel:
|
||||
raise exceptions.BadXML('No channel element.')
|
||||
|
||||
for item in channel.find_all('item'):
|
||||
news = self._ingest_one_news_rss(item, feed)
|
||||
if news is not BDBNewsMixin.DUPLICATE_BAIL:
|
||||
yield news
|
||||
|
||||
@worms.transaction
|
||||
def ingest_news_xml(self, soup:bs4.BeautifulSoup, feed):
|
||||
if soup.rss:
|
||||
newss = self._ingest_news_rss(soup, feed)
|
||||
elif soup.feed:
|
||||
newss = self._ingest_news_atom(soup, feed)
|
||||
else:
|
||||
raise exceptions.NeitherAtomNorRSS(soup)
|
||||
|
||||
for news in newss:
|
||||
self.process_news_through_filters(news)
|
||||
|
||||
####################################################################################################
|
||||
|
||||
class BringDB(
|
||||
BDBFeedMixin,
|
||||
BDBFilterMixin,
|
||||
BDBNewsMixin,
|
||||
worms.DatabaseWithCaching,
|
||||
):
|
||||
def __init__(
|
||||
self,
|
||||
data_directory=None,
|
||||
*,
|
||||
create=False,
|
||||
skip_version_check=False,
|
||||
):
|
||||
'''
|
||||
data_directory:
|
||||
This directory will contain the sql file and anything else needed by
|
||||
the process. The directory is the database for all intents
|
||||
and purposes.
|
||||
|
||||
create:
|
||||
If True, the data_directory will be created if it does not exist.
|
||||
If False, we expect that data_directory and the sql file exist.
|
||||
|
||||
skip_version_check:
|
||||
Skip the version check so that you don't get DatabaseOutOfDate.
|
||||
Beware of modifying any data in this state.
|
||||
'''
|
||||
super().__init__()
|
||||
|
||||
# DATA DIR PREP
|
||||
if data_directory is None:
|
||||
data_directory = pathclass.cwd().with_child(constants.DEFAULT_DATADIR)
|
||||
|
||||
if isinstance(data_directory, str):
|
||||
data_directory = helpers.remove_path_badchars(data_directory, allowed=':/\\')
|
||||
self.data_directory = pathclass.Path(data_directory)
|
||||
|
||||
if self.data_directory.exists and not self.data_directory.is_dir:
|
||||
raise exceptions.BadDataDirectory(self.data_directory.absolute_path)
|
||||
|
||||
# DATABASE / WORMS
|
||||
self._init_sql(create=create, skip_version_check=skip_version_check)
|
||||
|
||||
# WORMS
|
||||
self.id_type = int
|
||||
self._init_column_index()
|
||||
self._init_caches()
|
||||
|
||||
def _check_version(self):
|
||||
'''
|
||||
Compare database's user_version against constants.DATABASE_VERSION,
|
||||
raising exceptions.DatabaseOutOfDate if not correct.
|
||||
'''
|
||||
existing = self.execute('PRAGMA user_version').fetchone()[0]
|
||||
if existing != constants.DATABASE_VERSION:
|
||||
raise exceptions.DatabaseOutOfDate(
|
||||
existing=existing,
|
||||
new=constants.DATABASE_VERSION,
|
||||
filepath=self.data_directory,
|
||||
)
|
||||
|
||||
def _first_time_setup(self):
|
||||
log.info('Running first-time database setup.')
|
||||
self.executescript(constants.DB_INIT)
|
||||
self.commit()
|
||||
|
||||
def _init_caches(self):
|
||||
self.caches = {
|
||||
objects.Feed: cacheclass.Cache(maxlen=2000),
|
||||
objects.Filter: cacheclass.Cache(maxlen=1000),
|
||||
objects.News: cacheclass.Cache(maxlen=20000),
|
||||
}
|
||||
|
||||
def _init_column_index(self):
|
||||
self.COLUMNS = constants.SQL_COLUMNS
|
||||
self.COLUMN_INDEX = constants.SQL_INDEX
|
||||
|
||||
def _init_sql(self, create, skip_version_check):
|
||||
self.database_filepath = self.data_directory.with_child(constants.DEFAULT_DBNAME)
|
||||
existing_database = self.database_filepath.exists
|
||||
|
||||
if not existing_database and not create:
|
||||
msg = f'"{self.database_filepath.absolute_path}" does not exist and create is off.'
|
||||
raise FileNotFoundError(msg)
|
||||
|
||||
self.data_directory.makedirs(exist_ok=True)
|
||||
log.debug('Connecting to sqlite file "%s".', self.database_filepath.absolute_path)
|
||||
self.sql = sqlite3.connect(self.database_filepath.absolute_path)
|
||||
self.sql.row_factory = sqlite3.Row
|
||||
|
||||
if existing_database:
|
||||
if not skip_version_check:
|
||||
self._check_version()
|
||||
self._load_pragmas()
|
||||
else:
|
||||
self._first_time_setup()
|
||||
|
||||
def _load_pragmas(self):
|
||||
log.debug('Reloading pragmas.')
|
||||
self.executescript(constants.DB_PRAGMAS)
|
||||
self.commit()
|
||||
|
||||
@classmethod
|
||||
def closest_bringdb(cls, path='.', *args, **kwargs):
|
||||
'''
|
||||
Starting from the given path and climbing upwards towards the filesystem
|
||||
root, look for an existing BringRSS data directory and return the
|
||||
BringDB object. If none exists, raise exceptions.NoClosestBringDB.
|
||||
'''
|
||||
path = pathclass.Path(path)
|
||||
starting = path
|
||||
|
||||
while True:
|
||||
possible = path.with_child(constants.DEFAULT_DATADIR)
|
||||
if possible.is_dir:
|
||||
break
|
||||
parent = path.parent
|
||||
if path == parent:
|
||||
raise exceptions.NoClosestBringDB(starting.absolute_path)
|
||||
path = parent
|
||||
|
||||
path = possible
|
||||
log.debug('Found closest BringDB at "%s".', path.absolute_path)
|
||||
bringdb = cls(
|
||||
data_directory=path,
|
||||
create=False,
|
||||
*args,
|
||||
**kwargs,
|
||||
)
|
||||
return bringdb
|
||||
|
||||
def __del__(self):
|
||||
self.close()
|
||||
|
||||
def __repr__(self):
|
||||
return f'BringDB(data_directory={self.data_directory})'
|
||||
|
||||
def close(self) -> None:
|
||||
super().close()
|
||||
|
||||
def generate_id(self, thing_class) -> int:
|
||||
'''
|
||||
Create a new ID number that is unique to the given table.
|
||||
'''
|
||||
if not issubclass(thing_class, objects.ObjectBase):
|
||||
raise TypeError(thing_class)
|
||||
|
||||
table = thing_class.table
|
||||
|
||||
while True:
|
||||
id = RNG.getrandbits(32)
|
||||
exists = self.select_one(f'SELECT 1 FROM {table} WHERE id == ?', [id])
|
||||
if not exists:
|
||||
return id
|
|
@ -0,0 +1,349 @@
|
|||
import requests
|
||||
|
||||
from voussoirkit import sqlhelpers
|
||||
|
||||
DATABASE_VERSION = 1
|
||||
DB_VERSION_PRAGMA = f'''
|
||||
PRAGMA user_version = {DATABASE_VERSION};
|
||||
'''
|
||||
|
||||
DB_PRAGMAS = f'''
|
||||
-- 50 MB cache
|
||||
PRAGMA cache_size = -50000;
|
||||
PRAGMA foreign_keys = ON;
|
||||
'''
|
||||
|
||||
DB_INIT = f'''
|
||||
BEGIN;
|
||||
{DB_PRAGMAS}
|
||||
{DB_VERSION_PRAGMA}
|
||||
----------------------------------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS feeds(
|
||||
id INT PRIMARY KEY NOT NULL,
|
||||
parent_id INT,
|
||||
rss_url TEXT,
|
||||
web_url TEXT,
|
||||
title TEXT,
|
||||
description TEXT,
|
||||
created INT,
|
||||
refresh_with_others INT NOT NULL,
|
||||
last_refresh INT NOT NULL,
|
||||
last_refresh_attempt INT NOT NULL,
|
||||
last_refresh_error TEXT,
|
||||
autorefresh_interval INT NOT NULL,
|
||||
http_headers TEXT,
|
||||
isolate_guids INT NOT NULL,
|
||||
icon BLOB,
|
||||
ui_order_rank INT
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS index_feeds_id on feeds(id);
|
||||
----------------------------------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS filters(
|
||||
id INT PRIMARY KEY NOT NULL,
|
||||
name TEXT,
|
||||
created INT,
|
||||
conditions TEXT NOT NULL,
|
||||
actions TEXT NOT NULL
|
||||
);
|
||||
----------------------------------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS news(
|
||||
id INT PRIMARY KEY NOT NULL,
|
||||
feed_id INT NOT NULL,
|
||||
original_feed_id INT NOT NULL,
|
||||
rss_guid TEXT NOT NULL,
|
||||
published INT,
|
||||
updated INT,
|
||||
title TEXT,
|
||||
text TEXT,
|
||||
web_url TEXT,
|
||||
comments_url TEXT,
|
||||
created INT,
|
||||
read INT NOT NULL,
|
||||
recycled INT NOT NULL,
|
||||
-- The authors and enclosures are stored as a JSON list of dicts. Normally I
|
||||
-- don't like to store JSON in my databases, but I'm really not interested
|
||||
-- in breaking this out in a many-to-many table to achieve proper normal
|
||||
-- form. The quantity of enclosures is probably going to be low enough, disk
|
||||
-- space is cheap enough, and for the time being we have no SQL queries
|
||||
-- against the enclosure fields to justify a perf difference.
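-- For reference, an enclosures value is a JSON list of objects with the keys
-- "type", "url", and "size"; an authors value is a JSON list of objects with
-- the keys "name", "email", and "uri".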
|
||||
authors TEXT,
|
||||
enclosures TEXT,
|
||||
FOREIGN KEY(feed_id) REFERENCES feeds(id)
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS index_news_id on news(id);
|
||||
CREATE INDEX IF NOT EXISTS index_news_feed_id on news(feed_id);
|
||||
|
||||
-- Not used very often, but when you switch a feed's isolate_guids setting on
|
||||
-- and off, we need to rewrite the rss_guid for all news items from that feed,
|
||||
-- so having an index there really helps.
|
||||
CREATE INDEX IF NOT EXISTS index_news_original_feed_id on news(original_feed_id);
|
||||
|
||||
-- This will be the most commonly used search index. We search for news that is
|
||||
-- not read or recycled, ordered by published desc, and belongs to one of
|
||||
-- several feeds (feed or folder of feeds).
|
||||
CREATE INDEX IF NOT EXISTS index_news_recycled_read_published_feed_id on news(recycled, read, published, feed_id);
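-- For example, this is the shape of the query that BDBNewsMixin.get_newss builds:
-- SELECT * FROM news WHERE feed_id IN (...) AND recycled == 0 AND read == 0 ORDER BY published DESC;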
|
||||
|
||||
-- Less common but same idea. Finding read + unread news that's not recycled,
|
||||
-- published desc, from your feed or folder.
|
||||
CREATE INDEX IF NOT EXISTS index_news_recycled_published_feed_id on news(recycled, published, feed_id);
|
||||
|
||||
-- Used to figure out which incoming news is new and which already exist.
|
||||
CREATE INDEX IF NOT EXISTS index_news_guid on news(rss_guid);
|
||||
----------------------------------------------------------------------------------------------------
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
CREATE TABLE IF NOT EXISTS feed_filter_rel(
|
||||
feed_id INT NOT NULL,
|
||||
filter_id INT NOT NULL,
|
||||
order_rank INT NOT NULL,
|
||||
FOREIGN KEY(feed_id) REFERENCES feeds(id),
|
||||
FOREIGN KEY(filter_id) REFERENCES filters(id),
|
||||
PRIMARY KEY(feed_id, filter_id)
|
||||
);
|
||||
----------------------------------------------------------------------------------------------------
|
||||
COMMIT;
|
||||
'''
|
||||
SQL_COLUMNS = sqlhelpers.extract_table_column_map(DB_INIT)
|
||||
SQL_INDEX = sqlhelpers.reverse_table_column_map(SQL_COLUMNS)
|
||||
|
||||
DEFAULT_DATADIR = '_bringrss'
|
||||
DEFAULT_DBNAME = 'bringrss.db'
|
||||
|
||||
# Normally I don't even put version numbers on my projects, but since we're
|
||||
# making requests to third parties, it's fair for them to know in case our HTTP
|
||||
# behavior changes.
|
||||
VERSION = '0.0.1'
|
||||
http_session = requests.Session()
|
||||
http_session.headers['User-Agent'] = f'BringRSS v{VERSION} github.com/voussoir/bringrss'
|
||||
|
||||
# Thank you h-j-13
|
||||
# https://stackoverflow.com/a/54629675/5430534
|
||||
DATEUTIL_TZINFOS = {
|
||||
'A': 1 * 3600,
|
||||
'ACDT': 10.5 * 3600,
|
||||
'ACST': 9.5 * 3600,
|
||||
'ACT': -5 * 3600,
|
||||
'ACWST': 8.75 * 3600,
|
||||
'ADT': 4 * 3600,
|
||||
'AEDT': 11 * 3600,
|
||||
'AEST': 10 * 3600,
|
||||
'AET': 10 * 3600,
|
||||
'AFT': 4.5 * 3600,
|
||||
'AKDT': -8 * 3600,
|
||||
'AKST': -9 * 3600,
|
||||
'ALMT': 6 * 3600,
|
||||
'AMST': -3 * 3600,
|
||||
'AMT': -4 * 3600,
|
||||
'ANAST': 12 * 3600,
|
||||
'ANAT': 12 * 3600,
|
||||
'AQTT': 5 * 3600,
|
||||
'ART': -3 * 3600,
|
||||
'AST': 3 * 3600,
|
||||
'AT': -4 * 3600,
|
||||
'AWDT': 9 * 3600,
|
||||
'AWST': 8 * 3600,
|
||||
'AZOST': 0 * 3600,
|
||||
'AZOT': -1 * 3600,
|
||||
'AZST': 5 * 3600,
|
||||
'AZT': 4 * 3600,
|
||||
'AoE': -12 * 3600,
|
||||
'B': 2 * 3600,
|
||||
'BNT': 8 * 3600,
|
||||
'BOT': -4 * 3600,
|
||||
'BRST': -2 * 3600,
|
||||
'BRT': -3 * 3600,
|
||||
'BST': 6 * 3600,
|
||||
'BTT': 6 * 3600,
|
||||
'C': 3 * 3600,
|
||||
'CAST': 8 * 3600,
|
||||
'CAT': 2 * 3600,
|
||||
'CCT': 6.5 * 3600,
|
||||
'CDT': -5 * 3600,
|
||||
'CEST': 2 * 3600,
|
||||
'CET': 1 * 3600,
|
||||
'CHADT': 13.75 * 3600,
|
||||
'CHAST': 12.75 * 3600,
|
||||
'CHOST': 9 * 3600,
|
||||
'CHOT': 8 * 3600,
|
||||
'CHUT': 10 * 3600,
|
||||
'CIDST': -4 * 3600,
|
||||
'CIST': -5 * 3600,
|
||||
'CKT': -10 * 3600,
|
||||
'CLST': -3 * 3600,
|
||||
'CLT': -4 * 3600,
|
||||
'COT': -5 * 3600,
|
||||
'CST': -6 * 3600,
|
||||
'CT': -6 * 3600,
|
||||
'CVT': -1 * 3600,
|
||||
'CXT': 7 * 3600,
|
||||
'ChST': 10 * 3600,
|
||||
'D': 4 * 3600,
|
||||
'DAVT': 7 * 3600,
|
||||
'DDUT': 10 * 3600,
|
||||
'E': 5 * 3600,
|
||||
'EASST': -5 * 3600,
|
||||
'EAST': -6 * 3600,
|
||||
'EAT': 3 * 3600,
|
||||
'ECT': -5 * 3600,
|
||||
'EDT': -4 * 3600,
|
||||
'EEST': 3 * 3600,
|
||||
'EET': 2 * 3600,
|
||||
'EGST': 0 * 3600,
|
||||
'EGT': -1 * 3600,
|
||||
'EST': -5 * 3600,
|
||||
'ET': -5 * 3600,
|
||||
'F': 6 * 3600,
|
||||
'FET': 3 * 3600,
|
||||
'FJST': 13 * 3600,
|
||||
'FJT': 12 * 3600,
|
||||
'FKST': -3 * 3600,
|
||||
'FKT': -4 * 3600,
|
||||
'FNT': -2 * 3600,
|
||||
'G': 7 * 3600,
|
||||
'GALT': -6 * 3600,
|
||||
'GAMT': -9 * 3600,
|
||||
'GET': 4 * 3600,
|
||||
'GFT': -3 * 3600,
|
||||
'GILT': 12 * 3600,
|
||||
'GMT': 0 * 3600,
|
||||
'GST': 4 * 3600,
|
||||
'GYT': -4 * 3600,
|
||||
'H': 8 * 3600,
|
||||
'HDT': -9 * 3600,
|
||||
'HKT': 8 * 3600,
|
||||
'HOVST': 8 * 3600,
|
||||
'HOVT': 7 * 3600,
|
||||
'HST': -10 * 3600,
|
||||
'I': 9 * 3600,
|
||||
'ICT': 7 * 3600,
|
||||
'IDT': 3 * 3600,
|
||||
'IOT': 6 * 3600,
|
||||
'IRDT': 4.5 * 3600,
|
||||
'IRKST': 9 * 3600,
|
||||
'IRKT': 8 * 3600,
|
||||
'IRST': 3.5 * 3600,
|
||||
'IST': 5.5 * 3600,
|
||||
'JST': 9 * 3600,
|
||||
'K': 10 * 3600,
|
||||
'KGT': 6 * 3600,
|
||||
'KOST': 11 * 3600,
|
||||
'KRAST': 8 * 3600,
|
||||
'KRAT': 7 * 3600,
|
||||
'KST': 9 * 3600,
|
||||
'KUYT': 4 * 3600,
|
||||
'L': 11 * 3600,
|
||||
'LHDT': 11 * 3600,
|
||||
'LHST': 10.5 * 3600,
|
||||
'LINT': 14 * 3600,
|
||||
'M': 12 * 3600,
|
||||
'MAGST': 12 * 3600,
|
||||
'MAGT': 11 * 3600,
|
||||
'MART': 9.5 * 3600,
|
||||
'MAWT': 5 * 3600,
|
||||
'MDT': -6 * 3600,
|
||||
'MHT': 12 * 3600,
|
||||
'MMT': 6.5 * 3600,
|
||||
'MSD': 4 * 3600,
|
||||
'MSK': 3 * 3600,
|
||||
'MST': -7 * 3600,
|
||||
'MT': -7 * 3600,
|
||||
'MUT': 4 * 3600,
|
||||
'MVT': 5 * 3600,
|
||||
'MYT': 8 * 3600,
|
||||
'N': -1 * 3600,
|
||||
'NCT': 11 * 3600,
|
||||
'NDT': 2.5 * 3600,
|
||||
'NFT': 11 * 3600,
|
||||
'NOVST': 7 * 3600,
|
||||
'NOVT': 7 * 3600,
|
||||
'NPT': 5.5 * 3600,
|
||||
'NRT': 12 * 3600,
|
||||
'NST': 3.5 * 3600,
|
||||
'NUT': -11 * 3600,
|
||||
'NZDT': 13 * 3600,
|
||||
'NZST': 12 * 3600,
|
||||
'O': -2 * 3600,
|
||||
'OMSST': 7 * 3600,
|
||||
'OMST': 6 * 3600,
|
||||
'ORAT': 5 * 3600,
|
||||
'P': -3 * 3600,
|
||||
'PDT': -7 * 3600,
|
||||
'PET': -5 * 3600,
|
||||
'PETST': 12 * 3600,
|
||||
'PETT': 12 * 3600,
|
||||
'PGT': 10 * 3600,
|
||||
'PHOT': 13 * 3600,
|
||||
'PHT': 8 * 3600,
|
||||
'PKT': 5 * 3600,
|
||||
'PMDT': -2 * 3600,
|
||||
'PMST': -3 * 3600,
|
||||
'PONT': 11 * 3600,
|
||||
'PST': -8 * 3600,
|
||||
'PT': -8 * 3600,
|
||||
'PWT': 9 * 3600,
|
||||
'PYST': -3 * 3600,
|
||||
'PYT': -4 * 3600,
|
||||
'Q': -4 * 3600,
|
||||
'QYZT': 6 * 3600,
|
||||
'R': -5 * 3600,
|
||||
'RET': 4 * 3600,
|
||||
'ROTT': -3 * 3600,
|
||||
'S': -6 * 3600,
|
||||
'SAKT': 11 * 3600,
|
||||
'SAMT': 4 * 3600,
|
||||
'SAST': 2 * 3600,
|
||||
'SBT': 11 * 3600,
|
||||
'SCT': 4 * 3600,
|
||||
'SGT': 8 * 3600,
|
||||
'SRET': 11 * 3600,
|
||||
'SRT': -3 * 3600,
|
||||
'SST': -11 * 3600,
|
||||
'SYOT': 3 * 3600,
|
||||
'T': -7 * 3600,
|
||||
'TAHT': -10 * 3600,
|
||||
'TFT': 5 * 3600,
|
||||
'TJT': 5 * 3600,
|
||||
'TKT': 13 * 3600,
|
||||
'TLT': 9 * 3600,
|
||||
'TMT': 5 * 3600,
|
||||
'TOST': 14 * 3600,
|
||||
'TOT': 13 * 3600,
|
||||
'TRT': 3 * 3600,
|
||||
'TVT': 12 * 3600,
|
||||
'U': -8 * 3600,
|
||||
'ULAST': 9 * 3600,
|
||||
'ULAT': 8 * 3600,
|
||||
'UTC': 0 * 3600,
|
||||
'UYST': -2 * 3600,
|
||||
'UYT': -3 * 3600,
|
||||
'UZT': 5 * 3600,
|
||||
'V': -9 * 3600,
|
||||
'VET': -4 * 3600,
|
||||
'VLAST': 11 * 3600,
|
||||
'VLAT': 10 * 3600,
|
||||
'VOST': 6 * 3600,
|
||||
'VUT': 11 * 3600,
|
||||
'W': -10 * 3600,
|
||||
'WAKT': 12 * 3600,
|
||||
'WARST': -3 * 3600,
|
||||
'WAST': 2 * 3600,
|
||||
'WAT': 1 * 3600,
|
||||
'WEST': 1 * 3600,
|
||||
'WET': 0 * 3600,
|
||||
'WFT': 12 * 3600,
|
||||
'WGST': -2 * 3600,
|
||||
'WGT': -3 * 3600,
|
||||
'WIB': 7 * 3600,
|
||||
'WIT': 9 * 3600,
|
||||
'WITA': 8 * 3600,
|
||||
'WST': 14 * 3600,
|
||||
'WT': 0 * 3600,
|
||||
'X': -11 * 3600,
|
||||
'Y': -12 * 3600,
|
||||
'YAKST': 10 * 3600,
|
||||
'YAKT': 9 * 3600,
|
||||
'YAPT': 10 * 3600,
|
||||
'YEKST': 6 * 3600,
|
||||
'YEKT': 5 * 3600,
|
||||
'Z': 0 * 3600,
|
||||
}
|
|
@ -0,0 +1,148 @@
|
|||
from voussoirkit import stringtools
|
||||
|
||||
class ErrorTypeAdder(type):
|
||||
'''
|
||||
During definition, the Exception class will automatically receive a class
|
||||
attribute called `error_type` which is just the class's name as a string
|
||||
in the loudsnake casing style. NoSuchFeed -> NO_SUCH_FEED.
|
||||
|
||||
This is used for serialization of the exception object and should
|
||||
basically act as a status code when displaying the error to the user.
|
||||
|
||||
Thanks Unutbu
|
||||
http://stackoverflow.com/a/18126678
|
||||
'''
|
||||
def __init__(cls, name, bases, clsdict):
|
||||
type.__init__(cls, name, bases, clsdict)
|
||||
cls.error_type = stringtools.pascal_to_loudsnakes(name)
|
||||
|
||||
class BringException(Exception, metaclass=ErrorTypeAdder):
|
||||
'''
|
||||
Base type for all of the BringRSS exceptions.
|
||||
Subtypes should have a class attribute `error_message`. The error message
|
||||
may contain {format} strings which will be formatted using the
|
||||
Exception's constructor arguments.
|
||||
'''
|
||||
error_message = ''
|
||||
|
||||
def __init__(self, *args, **kwargs):
|
||||
super().__init__()
|
||||
self.given_args = args
|
||||
self.given_kwargs = kwargs
|
||||
self.error_message = self.error_message.format(*args, **kwargs)
|
||||
self.args = (self.error_message, args, kwargs)
|
||||
|
||||
def __str__(self):
|
||||
return f'{self.error_type}: {self.error_message}'
|
||||
|
||||
def jsonify(self):
|
||||
j = {
|
||||
'type': 'error',
|
||||
'error_type': self.error_type,
|
||||
'error_message': self.error_message,
|
||||
}
|
||||
return j
|
||||
|
||||
# NO SUCH ##########################################################################################
|
||||
|
||||
class NoSuch(BringException):
|
||||
pass
|
||||
|
||||
class NoSuchFeed(NoSuch):
|
||||
error_message = 'Feed "{}" does not exist.'
|
||||
|
||||
class NoSuchFilter(NoSuch):
|
||||
error_message = 'Filter "{}" does not exist.'
|
||||
|
||||
class NoSuchNews(NoSuch):
|
||||
error_message = 'News "{}" does not exist.'
|
||||
|
||||
# XML PARSING ERRORS ###############################################################################
|
||||
|
||||
class BadXML(BringException):
|
||||
error_message = '{}'
|
||||
|
||||
class NeitherAtomNorRSS(BadXML):
|
||||
error_message = '{}'
|
||||
|
||||
class NoGUID(BadXML):
|
||||
error_message = '{}'
|
||||
|
||||
# FEED ERRORS ######################################################################################
|
||||
|
||||
class HTTPError(BringException):
|
||||
error_message = '{}'
|
||||
|
||||
class InvalidHTTPHeaders(BringException):
|
||||
error_message = '{}'
|
||||
|
||||
# FILTER ERRORS ####################################################################################
|
||||
|
||||
class FeedStillInUse(BringException):
|
||||
error_message = 'Cannot delete {feed} because it is used by {filters}.'
|
||||
|
||||
class FilterStillInUse(BringException):
|
||||
error_message = 'Cannot delete {filter} because it is used by feeds {feeds}.'
|
||||
|
||||
class InvalidFilter(BringException):
|
||||
error_message = '{}'
|
||||
|
||||
class InvalidFilterAction(InvalidFilter):
|
||||
error_message = '{}'
|
||||
|
||||
class InvalidFilterCondition(InvalidFilter):
|
||||
error_message = '{}'
|
||||
|
||||
# GENERAL ERRORS ###################################################################################
|
||||
|
||||
class BadDataDirectory(BringException):
|
||||
'''
|
||||
Raised by BringDB __init__ if the requested data_directory is invalid.
|
||||
'''
|
||||
error_message = 'Bad data directory "{}"'
|
||||
|
||||
OUTOFDATE = '''
|
||||
Database is out of date. {existing} should be {new}.
|
||||
Please run utilities\\database_upgrader.py "{filepath.absolute_path}"
|
||||
'''.strip()
|
||||
class DatabaseOutOfDate(BringException):
|
||||
'''
|
||||
Raised by BringDB __init__ if the user's database is behind.
|
||||
'''
|
||||
error_message = OUTOFDATE
|
||||
|
||||
class NoClosestBringDB(BringException):
|
||||
'''
|
||||
For calls to BringDB.closest_bringdb where none exists between cwd and
|
||||
drive root.
|
||||
'''
|
||||
error_message = 'There is no BringDB in "{}" or its parents.'
|
||||
|
||||
class NotExclusive(BringException):
|
||||
'''
|
||||
For when two or more mutually exclusive actions have been requested.
|
||||
'''
|
||||
error_message = 'One and only one of {} must be passed.'
|
||||
|
||||
class OrderByBadColumn(BringException):
|
||||
'''
|
||||
For when the user tries to orderby a column that does not exist or is
|
||||
not allowed.
|
||||
'''
|
||||
error_message = '"{column}" is not a sortable column.'
|
||||
|
||||
class OrderByBadDirection(BringException):
|
||||
'''
|
||||
For when the user tries to orderby a direction that is not asc or desc.
|
||||
'''
|
||||
error_message = 'You can\'t order "{column}" by "{direction}". Should be asc or desc.'
|
||||
|
||||
class OrderByInvalid(BringException):
|
||||
'''
|
||||
For when the orderby request cannot be parsed into column and direction.
|
||||
For example, it contains too many hyphens like a-b-c.
|
||||
|
||||
If the column and direction can be parsed but are invalid, use
|
||||
OrderByBadColumn or OrderByBadDirection
|
||||
'''
|
||||
error_message = 'Invalid orderby request "{request}".'
|
|
@ -0,0 +1,163 @@
|
|||
import bs4
|
||||
import datetime
|
||||
import dateutil.parser
|
||||
import importlib
|
||||
import sys
|
||||
|
||||
from . import constants
|
||||
|
||||
from voussoirkit import cacheclass
|
||||
from voussoirkit import httperrors
|
||||
from voussoirkit import pathclass
|
||||
from voussoirkit import vlogging
|
||||
|
||||
log = vlogging.get_logger(__name__)
|
||||
|
||||
_xml_etag_cache = cacheclass.Cache(maxlen=100)
|
||||
|
||||
def dateutil_parse(string):
|
||||
return dateutil.parser.parse(string, tzinfos=constants.DATEUTIL_TZINFOS)
|
||||
|
||||
def fetch_xml(url, headers={}) -> bs4.BeautifulSoup:
|
||||
log.debug('Fetching %s.', url)
|
||||
response = constants.http_session.get(url, headers=headers)
|
||||
httperrors.raise_for_status(response)
|
||||
soup = bs4.BeautifulSoup(response.text, 'xml')
|
||||
return soup
|
||||
|
||||
def fetch_xml_cached(url, headers={}) -> bs4.BeautifulSoup:
|
||||
'''
|
||||
Fetch the RSS / Atom feed, using a local cache to take advantage of HTTP304
|
||||
responses.
|
||||
'''
|
||||
cached = _xml_etag_cache.get(url)
|
||||