diff --git a/README.md b/README.md
index bdb30c5..45c788a 100644
--- a/README.md
+++ b/README.md
@@ -26,38 +26,38 @@ Timesearch is a collection of utilities for archiving subreddits.
 ### This package consists of:

 - **get_submissions**: If you try to page through `/new` on a subreddit, you'll hit a limit at or before 1,000 posts. Timesearch uses the pushshift.io dataset to get information about very old posts, and then queries the reddit api to update their information. Previously, we used the `timestamp` cloudsearch query parameter on reddit's own API, but reddit has removed that feature and pushshift is now the only viable source for initial data.
-    `> timesearch.py get_submissions -r subredditname `
-    `> timesearch.py get_submissions -u username `
+    `python timesearch.py get_submissions -r subredditname `
+    `python timesearch.py get_submissions -u username `

 - **get_comments**: Similar to `get_submissions`, this tool queries pushshift for comment data and updates it from reddit.
-    `> timesearch.py get_comments -r subredditname `
-    `> timesearch.py get_comments -u username `
+    `python timesearch.py get_comments -r subredditname `
+    `python timesearch.py get_comments -u username `

 - **livestream**: get_submissions+get_comments is great for starting your database and getting the historical posts, but it's not the best for staying up-to-date. Instead, livestream monitors `/new` and `/comments` to continuously ingest data.
-    `> timesearch.py livestream -r subredditname `
-    `> timesearch.py livestream -u username `
+    `python timesearch.py livestream -r subredditname `
+    `python timesearch.py livestream -u username `

 - **get_styles**: Downloads the stylesheet and CSS images.
-    `> timesearch.py get_styles -r subredditname`
+    `python timesearch.py get_styles -r subredditname`

 - **get_wiki**: Downloads the wiki pages, sidebar, etc. from /wiki/pages.
-    `> timesearch.py get_wiki -r subredditname`
+    `python timesearch.py get_wiki -r subredditname`

 - **offline_reading**: Renders comment threads into HTML via markdown. Note: I'm currently using the [markdown library from pypi](https://pypi.python.org/pypi/Markdown), and it doesn't do reddit's custom markdown like `/r/` or `/u/`, obviously. So far I don't think anybody really uses o_r so I haven't invested much time into improving it.
-    `> timesearch.py offline_reading -r subredditname `
-    `> timesearch.py offline_reading -u username `
+    `python timesearch.py offline_reading -r subredditname `
+    `python timesearch.py offline_reading -u username `

 - **index**: Generates plaintext or HTML lists of submissions, sorted by a property of your choosing. You can order by date, author, flair, etc. With the `--offline` parameter, you can make all the links point to the files you generated with `offline_reading`.
-    `> timesearch.py index -r subredditname `
-    `> timesearch.py index -u username `
+    `python timesearch.py index -r subredditname `
+    `python timesearch.py index -u username `

 - **breakdown**: Produces a JSON file indicating which users make the most posts in a subreddit, or which subreddits a user posts in.
-    `> timesearch.py breakdown -r subredditname`
-    `> timesearch.py breakdown -u username`
+    `python timesearch.py breakdown -r subredditname`
+    `python timesearch.py breakdown -u username`

 - **merge_db**: Copy all new data from one timesearch database into another. Useful for syncing or merging two scans of the same subreddit.
-    `> timesearch.py merge_db --from filepath/database1.db --to filepath/database2.db`
+    `python timesearch.py merge_db --from filepath/database1.db --to filepath/database2.db`

 ### To use it

diff --git a/timesearch_modules/__init__.py b/timesearch_modules/__init__.py
index a7be833..de13b96 100644
--- a/timesearch_modules/__init__.py
+++ b/timesearch_modules/__init__.py
@@ -16,13 +16,13 @@ The subreddit archiver

 The basics:
 1. Collect a subreddit's submissions
-    > timesearch.py get_submissions -r subredditname
+    python timesearch.py get_submissions -r subredditname

 2. Collect the comments for those submissions
-    > timesearch.py get_comments -r subredditname
+    python timesearch.py get_comments -r subredditname

 3. Stay up-to-date
-    > timesearch.py livestream -r subredditname
+    python timesearch.py livestream -r subredditname

 Commands for collecting:
@@ -47,7 +47,7 @@ Commands for processing:
     {offline_reading}

 TO SEE DETAILS ON EACH COMMAND, RUN
-> timesearch.py
+python timesearch.py
 '''.lstrip()

 MODULE_DOCSTRINGS = dict(
@@ -59,8 +59,8 @@ breakdown:
     Automatically dumps into a _breakdown.json file in the same directory as the database.

-    > timesearch.py breakdown -r subredditname
-    > timesearch.py breakdown -u username
+    python timesearch.py breakdown -r subredditname
+    python timesearch.py breakdown -u username

 flags:
     -r "test" | --subreddit "test":
@@ -77,8 +77,8 @@ get_comments='''
 get_comments:
     Collect comments on a subreddit or comments made by a user.

-    > timesearch.py get_comments -r subredditname
-    > timesearch.py get_comments -u username
+    python timesearch.py get_comments -r subredditname
+    python timesearch.py get_comments -u username

 flags:
     -s "t3_xxxxxx" | --specific "t3_xxxxxx":
@@ -110,7 +110,7 @@ get_styles='''
 get_styles:
     Collect the stylesheet, and css images.

-    > timesearch.py get_styles -r subredditname
+    python timesearch.py get_styles -r subredditname
 '''.strip(),

 get_submissions='''
@@ -118,8 +118,8 @@ get_submissions:
     Collect submissions from the subreddit across all of history, or
     Collect submissions by a user (as many as possible).

-    > timesearch.py get_submissions -r subredditname
-    > timesearch.py get_submissions -u username
+    python timesearch.py get_submissions -r subredditname
+    python timesearch.py get_submissions -u username

     -r "test" | --subreddit "test":
         The subreddit to scan. Mutually exclusive with username.
@@ -149,15 +149,15 @@ get_wiki='''
 get_wiki:
     Collect all available wiki pages.

-    > timesearch.py get_wiki -r subredditname
+    python timesearch.py get_wiki -r subredditname
 '''.strip(),

 index='''
 index:
     Dump submission listings to a plaintext or HTML file.

-    > timesearch.py index -r subredditname
-    > timesearch.py index -u username
+    python timesearch.py index -r subredditname
+    python timesearch.py index -u username

 flags:
     -r "test" | --subreddit "test":
@@ -220,8 +220,8 @@ livestream='''
 livestream:
     Continously collect submissions and/or comments.

-    > timesearch.py livestream -r subredditname
-    > timesearch.py livestream -u username
+    python timesearch.py livestream -r subredditname
+    python timesearch.py livestream -u username

 flags:
     -r "test" | --subreddit "test":
@@ -253,7 +253,7 @@ merge_db='''
 merge_db:
     Copy all new posts from one timesearch database into another.

-    > timesearch merge_db --from redditdev1.db --to redditdev2.db
+    python timesearch.py merge_db --from redditdev1.db --to redditdev2.db

 flags:
     --from:
@@ -269,8 +269,8 @@ offline_reading='''
 offline_reading:
     Render submissions and comment threads to HTML via Markdown.

-    > timesearch.py offline_reading -r subredditname
-    > timesearch.py offline_reading -u username
+    python timesearch.py offline_reading -r subredditname
+    python timesearch.py offline_reading -u username

 flags:
     -s "t3_xxxxxx" | --specific "t3_xxxxxx":
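The docstrings touched by this patch document one command-line shape: `python timesearch.py` followed by a subcommand and flags such as `-r`/`--subreddit` and `-u`, where the subreddit is described as mutually exclusive with the username. A minimal sketch of how a parser with that shape could be built with the standard library's `argparse` is below. It is hypothetical: the `--username` long form, the `dest` names, and the `build_parser` helper are assumptions for illustration, not timesearch's actual parser, and the sketch does not try to decide whether `-r` or `-u` is required.

```python
import argparse

# Hypothetical sketch of a parser with the command-line shape documented in
# this patch. The subcommand names and the -r/--subreddit, -u, -s/--specific,
# --from and --to flags appear in the docstrings; the --username long form,
# the dest names, and build_parser() itself are assumptions for illustration.

# Commands documented with both a -r and a -u form.
SCAN_COMMANDS = (
    "get_submissions",
    "get_comments",
    "livestream",
    "offline_reading",
    "index",
    "breakdown",
)

# Commands documented only with a -r form.
SUBREDDIT_ONLY_COMMANDS = ("get_styles", "get_wiki")


def build_parser():
    parser = argparse.ArgumentParser(prog="timesearch.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    for name in SCAN_COMMANDS:
        sub = subparsers.add_parser(name)
        # The docstrings say the subreddit is mutually exclusive with the
        # username; argparse rejects a command line that passes both.
        target = sub.add_mutually_exclusive_group()
        target.add_argument("-r", "--subreddit", dest="subreddit")
        target.add_argument("-u", "--username", dest="username")
        if name in ("get_comments", "offline_reading"):
            # Documented as -s "t3_xxxxxx" | --specific "t3_xxxxxx".
            sub.add_argument("-s", "--specific", dest="specific")

    for name in SUBREDDIT_ONLY_COMMANDS:
        sub = subparsers.add_parser(name)
        sub.add_argument("-r", "--subreddit", dest="subreddit")

    merge = subparsers.add_parser("merge_db")
    # "from" is a Python keyword, so the values are stored under other names.
    merge.add_argument("--from", dest="from_db", required=True)
    merge.add_argument("--to", dest="to_db", required=True)

    return parser
```

With this sketch, `build_parser().parse_args(["get_submissions", "-r", "subredditname"])` yields a namespace whose `command` is `"get_submissions"` and whose `subreddit` is `"subredditname"`, while passing both `-r` and `-u` to the same subcommand exits with a usage error.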