Merge pull request 'main' (#1) from rimu/pyfedi:main into main

Reviewed-on: https://codeberg.org/hono4kami/pyfedi/pulls/1
This commit is contained in:
hono4kami 2024-12-17 10:37:56 +00:00
commit e97ac5f6bb
13 changed files with 254 additions and 188 deletions

View file

@ -4,7 +4,7 @@ FROM --platform=$BUILDPLATFORM python:3-alpine AS builder
RUN apk update
RUN apk add pkgconfig
RUN apk add --virtual build-deps gcc python3-dev musl-dev tesseract-ocr tesseract-ocr-data-eng ffmpeg
RUN apk add --virtual build-deps gcc python3-dev musl-dev tesseract-ocr tesseract-ocr-data-eng
WORKDIR /app
COPY . /app

View file

@ -1,7 +1,8 @@
# Contents
* [Setup Database](#setup-database)
* [Install Python Libraries](#install-python-libraries)
* [Choose your path - easy way or hard way](#choose-path)
* [Setup Database](#setup-database)
* [Install Python Libraries](#install-python-libraries)
* [Install additional requirements](#install-additional-requirements)
* [Setup pyfedi](#setup-pyfedi)
* [Setup .env file](#setup-env-file)
@ -14,6 +15,110 @@
* [Notes for Windows (WSL2)](#notes-for-windows-wsl2)
* [Notes for Pip Package Management](#notes-for-pip-package-management)
<div id="choose-path"></div>
## Do you want this the easy way or the hard way?
### Easy way: docker
Docker can be used to create an isolated environment that is separate from the host server and starts from a consistent
configuration. While it is quicker and easier, it's not to everyone's taste.
* Clone PieFed into a new directory
```bash
git clone https://codeberg.org/rimu/pyfedi.git
```
* Copy suggested docker config
```bash
cd pyfedi
cp env.docker.sample .env.docker
```
* Edit docker environment file
Open .env.docker in your text editor, set SECRET_KEY to something random and set SERVER_NAME to your domain name,
WITHOUT the https:// at the front. The database login details doesn't really need to be changed because postgres will be
locked away inside it's own docker network that only PieFed can access but if you want to change POSTGRES_PASSWORD go ahead
just be sure to update DATABASE_URL accordingly.
Check out compose.yaml and see if it is to your liking. Note the port (8030) and volume definitions - they might need to be
tweaked.
* First startup
This will take a few minutes.
```bash
export DOCKER_BUILDKIT=1
docker-compose up --build
```
After a while the gibberish will stop scrolling past. If you see errors let us know at [https://piefed.social/c/piefed_help](https://piefed.social/c/piefed_help).
* Networking
You need to somehow to allow client connections from outside to access port 8030 on your server. The details of this is outside the scope
of this article. You could use a nginx reverse proxy, a cloudflare zero trust tunnel, tailscale, whatever. Just make sure it has SSL on
it as PieFed assumes you're making requests that start with https://your-domain.
Once you have the networking set up, go to https://your-domain in your browser and see if the docker output in your terminal
shows signs of reacting to the request. There will be an error showing up in the console because we haven't done the next step yet.
* Database initialization
This must be done once and once only. Doing this will wipe all existing data in your instance so do not do it unless you have a
brand new instance.
Open a shell inside the PieFed docker container:
`docker exec -it piefed_app1 sh`
Inside the container, run the initialization command:
```
export FLASK_APP=pyfedi.py
flask init-db
```
Among other things this process will get you set up with a username and password. Don't use 'admin' as the user name, script kiddies love that one.
* The moment of truth
Go to https://your-domain in your web browser and PieFed should appear. Log in with the username and password from the previous step.
At this point docker is pretty much Ok so you don't need to see the terminal output as readily. Hit Ctrl + C to close down docker and then run
```bash
docker-compose up -d
```
to have PieFed run in the background.
* But wait there's more
Until you set the right environment variables, PieFed won't be able to send email. Check out env.sample for some hints.
When you have a new value to set, add it to .env.docker and then restart docker with:
```
docker-compose down && docker-compose up -d
```
There are also regular cron jobs that need to be run. Set up cron on the host to run those scripts inside the container - see the Cron
section of this document for details.
You probably want a Captcha on the registration form - more environment variables.
CDN, CloudFlare. More environment variables.
All this is explained in the bare metal guide, below.
### Hard way: bare metal
Read on
<div id="setup-database"></div>
## Setup Database
@ -77,7 +182,7 @@ sudo apt install tesseract-ocr
## Setup PyFedi
* Clone PyFedi
* Clone PieFed
```bash
git clone https://codeberg.org/rimu/pyfedi.git

View file

@ -10,7 +10,6 @@ import httpx
import redis
from flask import current_app, request, g, url_for, json
from flask_babel import _
from requests import JSONDecodeError
from sqlalchemy import text, func, desc
from sqlalchemy.exc import IntegrityError
@ -29,7 +28,7 @@ import pytesseract
from app.utils import get_request, allowlist_html, get_setting, ap_datetime, markdown_to_html, \
is_image_url, domain_from_url, gibberish, ensure_directory_exists, head_request, \
shorten_string, remove_tracking_from_link, \
microblog_content_to_title, generate_image_from_video_url, is_video_url, \
microblog_content_to_title, is_video_url, \
notification_subscribers, communities_banned_from, actor_contains_blocked_words, \
html_to_text, add_to_modlog_activitypub, joined_communities, \
moderating_communities, get_task_session, is_video_hosting_site, opengraph_parse
@ -1009,148 +1008,106 @@ def make_image_sizes_async(file_id, thumbnail_width, medium_width, directory, to
session = get_task_session()
file: File = session.query(File).get(file_id)
if file and file.source_url:
# Videos (old code. not invoked because file.source_url won't end .mp4 or .webm)
if file.source_url.endswith('.mp4') or file.source_url.endswith('.webm'):
new_filename = gibberish(15)
# set up the storage directory
directory = f'app/static/media/{directory}/' + new_filename[0:2] + '/' + new_filename[2:4]
ensure_directory_exists(directory)
# file path and names to store the resized images on disk
final_place = os.path.join(directory, new_filename + '.jpg')
final_place_thumbnail = os.path.join(directory, new_filename + '_thumbnail.webp')
try:
generate_image_from_video_url(file.source_url, final_place)
except Exception as e:
return
if final_place:
image = Image.open(final_place)
img_width = image.width
# Resize the image to medium
if medium_width:
if img_width > medium_width:
image.thumbnail((medium_width, medium_width))
image.save(final_place)
file.file_path = final_place
file.width = image.width
file.height = image.height
# Resize the image to a thumbnail (webp)
if thumbnail_width:
if img_width > thumbnail_width:
image.thumbnail((thumbnail_width, thumbnail_width))
image.save(final_place_thumbnail, format="WebP", quality=93)
file.thumbnail_path = final_place_thumbnail
file.thumbnail_width = image.width
file.thumbnail_height = image.height
session.commit()
# Images
try:
source_image_response = get_request(file.source_url)
except:
pass
else:
try:
source_image_response = get_request(file.source_url)
except:
pass
else:
if source_image_response.status_code == 404 and '/api/v3/image_proxy' in file.source_url:
source_image_response.close()
# Lemmy failed to retrieve the image but we might have better luck. Example source_url: https://slrpnk.net/api/v3/image_proxy?url=https%3A%2F%2Fi.guim.co.uk%2Fimg%2Fmedia%2F24e87cb4d730141848c339b3b862691ca536fb26%2F0_164_3385_2031%2Fmaster%2F3385.jpg%3Fwidth%3D1200%26height%3D630%26quality%3D85%26auto%3Dformat%26fit%3Dcrop%26overlay-align%3Dbottom%252Cleft%26overlay-width%3D100p%26overlay-base64%3DL2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc%26enable%3Dupscale%26s%3D0ec9d25a8cb5db9420471054e26cfa63
# The un-proxied image url is the query parameter called 'url'
parsed_url = urlparse(file.source_url)
query_params = parse_qs(parsed_url.query)
if 'url' in query_params:
url_value = query_params['url'][0]
source_image_response = get_request(url_value)
else:
source_image_response = None
if source_image_response and source_image_response.status_code == 200:
content_type = source_image_response.headers.get('content-type')
if content_type:
if content_type.startswith('image') or (content_type == 'application/octet-stream' and file.source_url.endswith('.avif')):
source_image = source_image_response.content
source_image_response.close()
if source_image_response.status_code == 404 and '/api/v3/image_proxy' in file.source_url:
source_image_response.close()
# Lemmy failed to retrieve the image but we might have better luck. Example source_url: https://slrpnk.net/api/v3/image_proxy?url=https%3A%2F%2Fi.guim.co.uk%2Fimg%2Fmedia%2F24e87cb4d730141848c339b3b862691ca536fb26%2F0_164_3385_2031%2Fmaster%2F3385.jpg%3Fwidth%3D1200%26height%3D630%26quality%3D85%26auto%3Dformat%26fit%3Dcrop%26overlay-align%3Dbottom%252Cleft%26overlay-width%3D100p%26overlay-base64%3DL2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc%26enable%3Dupscale%26s%3D0ec9d25a8cb5db9420471054e26cfa63
# The un-proxied image url is the query parameter called 'url'
parsed_url = urlparse(file.source_url)
query_params = parse_qs(parsed_url.query)
if 'url' in query_params:
url_value = query_params['url'][0]
source_image_response = get_request(url_value)
else:
source_image_response = None
if source_image_response and source_image_response.status_code == 200:
content_type = source_image_response.headers.get('content-type')
if content_type:
if content_type.startswith('image') or (content_type == 'application/octet-stream' and file.source_url.endswith('.avif')):
source_image = source_image_response.content
source_image_response.close()
content_type_parts = content_type.split('/')
if content_type_parts:
# content type headers often are just 'image/jpeg' but sometimes 'image/jpeg;charset=utf8'
content_type_parts = content_type.split('/')
if content_type_parts:
# content type headers often are just 'image/jpeg' but sometimes 'image/jpeg;charset=utf8'
# Remove ;charset=whatever
main_part = content_type.split(';')[0]
# Remove ;charset=whatever
main_part = content_type.split(';')[0]
# Split the main part on the '/' character and take the second part
file_ext = '.' + main_part.split('/')[1]
file_ext = file_ext.strip() # just to be sure
# Split the main part on the '/' character and take the second part
file_ext = '.' + main_part.split('/')[1]
file_ext = file_ext.strip() # just to be sure
if file_ext == '.jpeg':
file_ext = '.jpg'
elif file_ext == '.svg+xml':
return # no need to resize SVG images
elif file_ext == '.octet-stream':
file_ext = '.avif'
else:
file_ext = os.path.splitext(file.source_url)[1]
file_ext = file_ext.replace('%3f', '?') # sometimes urls are not decoded properly
if '?' in file_ext:
file_ext = file_ext.split('?')[0]
if file_ext == '.jpeg':
file_ext = '.jpg'
elif file_ext == '.svg+xml':
return # no need to resize SVG images
elif file_ext == '.octet-stream':
file_ext = '.avif'
else:
file_ext = os.path.splitext(file.source_url)[1]
file_ext = file_ext.replace('%3f', '?') # sometimes urls are not decoded properly
if '?' in file_ext:
file_ext = file_ext.split('?')[0]
new_filename = gibberish(15)
new_filename = gibberish(15)
# set up the storage directory
directory = f'app/static/media/{directory}/' + new_filename[0:2] + '/' + new_filename[2:4]
ensure_directory_exists(directory)
# set up the storage directory
directory = f'app/static/media/{directory}/' + new_filename[0:2] + '/' + new_filename[2:4]
ensure_directory_exists(directory)
# file path and names to store the resized images on disk
final_place = os.path.join(directory, new_filename + file_ext)
final_place_thumbnail = os.path.join(directory, new_filename + '_thumbnail.webp')
# file path and names to store the resized images on disk
final_place = os.path.join(directory, new_filename + file_ext)
final_place_thumbnail = os.path.join(directory, new_filename + '_thumbnail.webp')
if file_ext == '.avif': # this is quite a big plugin so we'll only load it if necessary
import pillow_avif
if file_ext == '.avif': # this is quite a big plugin so we'll only load it if necessary
import pillow_avif
# Load image data into Pillow
Image.MAX_IMAGE_PIXELS = 89478485
image = Image.open(BytesIO(source_image))
image = ImageOps.exif_transpose(image)
img_width = image.width
img_height = image.height
# Load image data into Pillow
Image.MAX_IMAGE_PIXELS = 89478485
image = Image.open(BytesIO(source_image))
image = ImageOps.exif_transpose(image)
img_width = image.width
img_height = image.height
# Resize the image to medium
if medium_width:
if img_width > medium_width:
image.thumbnail((medium_width, medium_width))
image.save(final_place)
file.file_path = final_place
file.width = image.width
file.height = image.height
# Resize the image to medium
if medium_width:
if img_width > medium_width:
image.thumbnail((medium_width, medium_width))
image.save(final_place)
file.file_path = final_place
file.width = image.width
file.height = image.height
# Resize the image to a thumbnail (webp)
if thumbnail_width:
if img_width > thumbnail_width:
image.thumbnail((thumbnail_width, thumbnail_width))
image.save(final_place_thumbnail, format="WebP", quality=93)
file.thumbnail_path = final_place_thumbnail
file.thumbnail_width = image.width
file.thumbnail_height = image.height
# Resize the image to a thumbnail (webp)
if thumbnail_width:
if img_width > thumbnail_width:
image.thumbnail((thumbnail_width, thumbnail_width))
image.save(final_place_thumbnail, format="WebP", quality=93)
file.thumbnail_path = final_place_thumbnail
file.thumbnail_width = image.width
file.thumbnail_height = image.height
session.commit()
session.commit()
# Alert regarding fascist meme content
if toxic_community and img_width < 2000: # images > 2000px tend to be real photos instead of 4chan screenshots.
try:
image_text = pytesseract.image_to_string(Image.open(BytesIO(source_image)).convert('L'), timeout=30)
except Exception as e:
image_text = ''
if 'Anonymous' in image_text and ('No.' in image_text or ' N0' in image_text): # chan posts usually contain the text 'Anonymous' and ' No.12345'
post = Post.query.filter_by(image_id=file.id).first()
notification = Notification(title='Review this',
user_id=1,
author_id=post.user_id,
url=url_for('activitypub.post_ap', post_id=post.id))
session.add(notification)
session.commit()
# Alert regarding fascist meme content
if toxic_community and img_width < 2000: # images > 2000px tend to be real photos instead of 4chan screenshots.
try:
image_text = pytesseract.image_to_string(Image.open(BytesIO(source_image)).convert('L'), timeout=30)
except Exception as e:
image_text = ''
if 'Anonymous' in image_text and ('No.' in image_text or ' N0' in image_text): # chan posts usually contain the text 'Anonymous' and ' No.12345'
post = Post.query.filter_by(image_id=file.id).first()
notification = Notification(title='Review this',
user_id=1,
author_id=post.user_id,
url=url_for('activitypub.post_ap', post_id=post.id))
session.add(notification)
session.commit()
def find_reply_parent(in_reply_to: str) -> Tuple[int, int, int]:

View file

@ -1364,7 +1364,7 @@ class Post(db.Model):
i += 1
db.session.commit()
if post.image_id:
if post.image_id and not post.type == constants.POST_TYPE_VIDEO:
make_image_sizes(post.image_id, 170, 512, 'posts',
community.low_quality) # the 512 sized image is for masonry view

View file

@ -15,6 +15,7 @@
<a href="{{ url_for('admin.newsletter') }}">{{ _('Newsletter') }}</a> |
<a href="{{ url_for('admin.admin_permissions') }}">{{ _('Permissions') }}</a> |
<a href="{{ url_for('admin.admin_activities') }}">{{ _('Activities') }}</a>
<a href="{{ url_for('main.modlog') }}">{{ _('Modlog') }}</a>
{% if debug_mode %}
| <a href="{{ url_for('dev.tools') }}">{{ _('Dev Tools') }}</a>
{% endif%}

View file

@ -228,8 +228,9 @@
<li><a class="dropdown-item{{ ' active' if active_child == 'admin_federation' }}" href="{{ url_for('admin.admin_federation') }}">{{ _('Federation') }}</a></li>
<li><a class="dropdown-item{{ ' active' if active_child == 'admin_instances' }}" href="{{ url_for('admin.admin_instances') }}">{{ _('Instances') }}</a></li>
<li><a class="dropdown-item{{ ' active' if active_child == 'admin_newsletter' }}" href="{{ url_for('admin.newsletter') }}">{{ _('Newsletter') }}</a></li>
<li><a class="dropdown-item{{ ' active' if active_child == 'admin_activities' }}" href="{{ url_for('admin.admin_activities') }}">{{ _('Activities') }}</a></li>
<li><a class="dropdown-item{{ ' active' if active_child == 'admin_permissions' }}" href="{{ url_for('admin.admin_permissions') }}">{{ _('Permissions') }}</a></li>
<li><a class="dropdown-item{{ ' active' if active_child == 'admin_activities' }}" href="{{ url_for('admin.admin_activities') }}">{{ _('Activities') }}</a></li>
<li><a class="dropdown-item{{ ' active' if active_child == 'modlog' }}" href="{{ url_for('main.modlog') }}">{{ _('Modlog') }}</a></li>
{% if debug_mode %}
<li><a class="dropdown-item{{ ' active' if active_child == 'dev_tools' }}" href="{{ url_for('dev.tools') }}">{{ _('Dev Tools') }}</a></li>
{% endif %}

View file

@ -4,7 +4,7 @@
{% extends "base.html" -%}
{% endif -%} -%}
{% from 'bootstrap5/form.html' import render_form -%}
{% set active_child = 'list_communities' -%}
{% set active_child = 'modlog' -%}
{% block app_content -%}
<h1>{{ _('Moderation log') }}</h1>

View file

@ -4,7 +4,6 @@ import bisect
import hashlib
import mimetypes
import random
import tempfile
import urllib
from collections import defaultdict
from datetime import datetime, timedelta, date
@ -13,11 +12,9 @@ from typing import List, Literal, Union
import httpx
import markdown2
import math
from urllib.parse import urlparse, parse_qs, urlencode
from functools import wraps
import flask
import requests
from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning
import warnings
import jwt
@ -34,7 +31,6 @@ from wtforms.fields import SelectField, SelectMultipleField
from wtforms.widgets import Select, html_params, ListWidget, CheckboxInput
from app import db, cache, httpx_client
import re
from moviepy.editor import VideoFileClip
from PIL import Image, ImageOps
from app.models import Settings, Domain, Instance, BannedInstances, User, Community, DomainBlock, ActivityPubLog, IpBan, \
@ -1109,49 +1105,6 @@ def in_sorted_list(arr, target):
return index < len(arr) and arr[index] == target
# Makes a still image from a video url, without downloading the whole video file
def generate_image_from_video_url(video_url, output_path, length=2):
response = requests.get(video_url, stream=True, timeout=5,
headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0'}) # Imgur requires a user agent
content_type = response.headers.get('Content-Type')
if content_type:
if 'video/mp4' in content_type:
temp_file_extension = '.mp4'
elif 'video/webm' in content_type:
temp_file_extension = '.webm'
else:
raise ValueError("Unsupported video format")
else:
raise ValueError("Content-Type not found in response headers")
# Generate a random temporary file name
temp_file_name = gibberish(15) + temp_file_extension
temp_file_path = os.path.join(tempfile.gettempdir(), temp_file_name)
# Write the downloaded data to a temporary file
with open(temp_file_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=4096):
f.write(chunk)
if os.path.getsize(temp_file_path) >= length * 1024 * 1024:
break
# Generate thumbnail from the temporary file
try:
clip = VideoFileClip(temp_file_path)
except Exception as e:
os.unlink(temp_file_path)
raise e
thumbnail = clip.get_frame(0)
clip.close()
# Save the image
thumbnail_image = Image.fromarray(thumbnail)
thumbnail_image.save(output_path)
os.remove(temp_file_path)
@cache.memoize(timeout=600)
def recently_upvoted_posts(user_id) -> List[int]:
post_ids = db.session.execute(text('SELECT post_id FROM "post_vote" WHERE user_id = :user_id AND effect > 0 ORDER BY id DESC LIMIT 1000'),

View file

@ -1,4 +1,5 @@
services:
db:
shm_size: 128mb
image: postgres
@ -6,37 +7,62 @@ services:
- ./.env.docker
volumes:
- ./pgdata:/var/lib/postgresql/data
networks:
- pf_network
redis:
image: redis
env_file:
- ./.env.docker
networks:
- pf_network
celery:
build:
context: .
target: builder
container_name: piefed_celery1
depends_on:
- db
- redis
env_file:
- ./.env.docker
entrypoint: ./entrypoint_celery.sh
volumes:
- ./media/:/app/app/static/media/
web:
networks:
- pf_network
web:
build:
context: .
target: builder
container_name: piefed_app1
depends_on:
- db
- redis
env_file:
- ./.env.docker
volumes:
- ./.gunicorn.conf.py:/app/gunicorn.conf.py
- ./gunicorn.conf.py:/app/gunicorn.conf.py
- ./media/:/app/app/static/media/
ports:
- '8080:5000'
- '8030:5000'
networks:
- pf_network
adminer:
image: adminer
restart: always
ports:
- 8888:8080
depends_on:
- db
networks:
- pf_network
networks:
pf_network:
name: pf_network
external: false

0
entrypoint.sh Normal file → Executable file
View file

0
entrypoint_celery.sh Normal file → Executable file
View file

24
env.docker.sample Normal file
View file

@ -0,0 +1,24 @@
SECRET_KEY='change this to random characters'
SERVER_NAME='your_domain_here'
POSTGRES_USER='piefed'
POSTGRES_PASSWORD='piefed'
POSTGRES_DB='piefed'
DATABASE_URL=postgresql+psycopg2://piefed:piefed@db/piefed
CACHE_TYPE='RedisCache'
CACHE_REDIS_DB=1
CACHE_REDIS_URL='redis://redis:6379/0'
CELERY_BROKER_URL='redis://redis:6379/1'
MODE='production'
FULL_AP_CONTEXT=0
SPICY_UNDER_10 = 2.5
SPICY_UNDER_30 = 1.85
SPICY_UNDER_60 = 1.25

View file

@ -32,4 +32,3 @@ Werkzeug==2.3.3
pytesseract==0.3.10
sentry-sdk==1.40.6
python-slugify==8.0.4
moviepy==1.0.3