Commit 87dd100

Add per-user audit logging for S3 CRUD actions
- New flask_s3_viewer.audit module emits one line per
  list/download/upload/delete/presign call with action, namespace, key,
  user, result, status_code, client_ip, user_agent, and a length-capped
  error field. Denied operations land at WARNING, exceptions at ERROR,
  success at INFO; control-byte sanitisation blocks log-line injection.
  emit() is a documented public helper for host integrations.
- Wire emit() into the blueprint: _enforce_auth fires before the 401/403
  abort, the @require decorator covers download/delete, and
  files/files_presign emit via try/finally. A request-scoped sentinel
  prevents duplicate or missing records, and the Google OAuth redirect
  path is intentionally silent (covered by a regression test).
- Move every direct logging.* call to module loggers and fix the
  long-standing PURGE / MKDIR / UP_OBJECT debug statements that were
  silently dropped because the object name was passed as a positional
  arg against an unparameterised message.
- Document the logger name, fields, level mapping, ProxyFix
  recommendation, RedactFilter / KeyErrorRedactFilter examples, and
  host-side emit() usage. Bump CHANGELOG with the next-release section.
1 parent 6dc9d15 commit 87dd100

18 files changed

Lines changed: 1620 additions & 197 deletions

CHANGELOG.md

Lines changed: 53 additions & 0 deletions
@@ -4,6 +4,59 @@ All notable changes to this project will be documented in this file. Dates are d

Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog).

#### [1.1.0] - unreleased

> Audit logging & logging surface clean-up. The public API (`FlaskS3Viewer`
> constructor, `add_new_one`, `init_app`, `get_instance`,
> `get_boto_client`, `get_boto_session`) is unchanged — the changes
> below affect log output only. The version line is provisional;
> the final release tag is decided at release time.

**Added**

- New audit logger `flask_s3_viewer.audit`. Every S3 CRUD action that
  flows through the blueprint (`list` / `download` / `upload` /
  `delete` / `presign`) emits exactly one structured record carrying
  `action`, `namespace`, `key`, `user`, `result`, `status_code`,
  `client_ip`, `user_agent`, and (on failure) `error`. Level mapping:
  `ok` → INFO, `denied` → WARNING, `error` → ERROR. Activation is
  pure-`logging`: attach a handler to `flask_s3_viewer.audit`, set its
  level, optionally pin `propagate = False`. No constructor flag.
  `flask_s3_viewer.audit.emit(...)` is a public helper host code may
  call to record extra non-CRUD operations on the same logger. Log
  injection is defended at this layer — ASCII control bytes in any
  user-controllable field (key / email / User-Agent / exception text)
  are escaped to `\xNN` before the record is built. UA is capped at
  256 bytes; free-form fields at 1024 bytes. See *Audit logging* in
  `docs/source/usage/configuration.rst` for ProxyFix guidance, JSON
  handler setup, and PII / ARN / bucket-name redaction filter
  examples.

**Changed**

- Internal `logging.info(...)` / `logging.error(...)` /
  `logging.debug(...)` calls in `flask_s3_viewer/__init__.py`,
  `aws/s3.py`, `aws/session.py`, and `aws/cache.py` have been routed
  through per-module `logging.getLogger(__name__)` instances. Record
  `name` now reads `flask_s3_viewer.<module>` instead of `root`. Host
  pipelines that filter by record `name` (rather than by handler
  attachment) need to update their allow-lists; host pipelines that
  attach handlers to the root logger (or to `flask_s3_viewer`) are
  unaffected — the module loggers default to `propagate=True`.
- `app.url_map` registration dump was demoted from INFO to DEBUG so
  registering many namespaces (or large blueprints) no longer floods
  INFO logs at app startup.

**Fixed**

- `aws/s3.py`'s `PURGE:` / `MKDIR:` / `UP_OBJECT:` DEBUG lines used
  the broken `logging.debug('PURGE:', name)` form (the second
  positional argument was treated as a logging-args tuple, against an
  unparameterised message string, which the stdlib silently drops).
  They now use `logger.debug('PURGE: %s', name)` etc., so a host that
  pins the logger at DEBUG actually sees the keys involved in each
  cache invalidation / placeholder create / upload event.
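
The difference between the two call forms in the **Fixed** entry can be reproduced with the standard library alone. The sketch below is illustrative, not the library's code: the logger name `demo.s3` and the sample key are assumptions. It shows that the unparameterised form fails during `%`-formatting inside the handler, so the line never reaches the output, while the `%s` form is written as expected.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("demo.s3")  # hypothetical name, not the library's
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

# With the default logging.raiseExceptions = True the stdlib prints a
# "--- Logging error ---" traceback to stderr for the broken call; it is
# switched off here only to keep the demo output clean.
previous = logging.raiseExceptions
logging.raiseExceptions = False
try:
    logger.debug("PURGE:", "docs/report.pdf")     # %-formatting fails, line is lost
    logger.debug("PURGE: %s", "docs/report.pdf")  # formatted lazily, line is written
finally:
    logging.raiseExceptions = previous

captured = stream.getvalue()
print(repr(captured))
```

After the run, `captured` holds only the line produced by the second call.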

#### [1.0.1](https://github.com/hidekuma/flask-s3-viewer/compare/1.0.0...1.0.1)

> 15 May 2026

docs/doctrees/changelog.doctree

16.6 KB
Binary file not shown.

docs/doctrees/environment.pickle

767 Bytes
Binary file not shown.
25.6 KB
Binary file not shown.

docs/html/_sources/usage/configuration.rst.txt

Lines changed: 185 additions & 0 deletions
@@ -417,6 +417,191 @@ allow-list case — internally they wire up the
``permission_callback``. Pass your own ``permission_callback`` for
fine-grained per-action policy.

Audit logging
-------------

Every S3 CRUD action that flows through the blueprint emits a single
structured record on the ``flask_s3_viewer.audit`` logger. The logger
is always present — host applications opt in by attaching a handler
and/or adjusting its level via the standard ``logging`` API. No
constructor flag toggles audit on or off; the v1.0 public API is
unchanged.

**Logger name:** ``flask_s3_viewer.audit``

**Default level:** unset (records propagate to root and are filtered
by the host's effective level). Successful actions emit at
``INFO``; permission denials emit at ``WARNING``; unexpected
exceptions emit at ``ERROR``.

**Record fields** (attached as ``LogRecord`` attributes via ``extra=``):

- ``action`` — one of ``list``, ``download``, ``upload``, ``delete``,
  ``presign``
- ``namespace`` — viewer namespace the request landed on
- ``key`` — canonical S3 key / prefix (post-``base_path``)
- ``user`` — authenticated email or the literal string ``anonymous``
- ``result`` — ``ok`` / ``denied`` / ``error``
- ``status_code`` — HTTP status emitted to the client
- ``client_ip`` — ``request.remote_addr``
- ``user_agent`` — capped at 256 bytes; sanitised
- ``error`` — present only when an exception was attached

The human-readable message is a single space-separated key=value line:

.. code-block:: text

    action=download namespace=fsv-test key=docs/report.pdf
    user=alice@example.com result=ok status=200

Newlines, carriage returns, and other ASCII control bytes inside
attacker-controllable fields (key, email, User-Agent, exception
message) are escaped as ``\xNN`` before the record is built, so a
crafted request cannot smuggle a fake row into the log stream.
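
The escaping step can be pictured with a short sketch. This is not the library's implementation: the function name ``sanitise_field``, the exact control-byte range, and the choice to truncate after escaping are assumptions made for illustration; only the ``\xNN`` form and the 1024-byte cap come from the text of this commit.

```python
import re

# Cap quoted in the changelog for free-form fields.
MAX_FIELD_BYTES = 1024

# Assumed range: C0 control bytes plus DEL.
_CONTROL_RE = re.compile(r"[\x00-\x1f\x7f]")


def sanitise_field(value: str, limit: int = MAX_FIELD_BYTES) -> str:
    """Escape ASCII control bytes as \\xNN, then cap the UTF-8 length."""
    escaped = _CONTROL_RE.sub(lambda m: f"\\x{ord(m.group()):02x}", value)
    # Truncate the encoded form; a trailing partial character is dropped.
    return escaped.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


# A key that tries to forge a second audit row with an embedded newline.
forged = "docs/a.pdf\nuser=admin result=ok"
safe = sanitise_field(forged)
print(safe)
```

The forged newline comes out as the four visible characters ``\x0a``, so the value stays on one log line.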

**Plain file handler example:**

.. code-block:: python
    :linenos:

    import logging

    handler = logging.FileHandler('/var/log/flask_s3_viewer/audit.log')
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logging.getLogger('flask_s3_viewer.audit').addHandler(handler)
    logging.getLogger('flask_s3_viewer.audit').setLevel(logging.INFO)

**Structured JSON handler example** (uses
`python-json-logger <https://github.com/madzak/python-json-logger>`_):

.. code-block:: python
    :linenos:

    import logging
    from pythonjsonlogger import jsonlogger

    audit_handler = logging.FileHandler('/var/log/flask_s3_viewer/audit.jsonl')
    audit_handler.setFormatter(jsonlogger.JsonFormatter(
        '%(asctime)s %(levelname)s %(action)s %(namespace)s '
        '%(key)s %(user)s %(result)s %(status_code)s '
        '%(client_ip)s %(user_agent)s'
    ))
    audit = logging.getLogger('flask_s3_viewer.audit')
    audit.addHandler(audit_handler)
    audit.setLevel(logging.INFO)
    # The library leaves propagate=True by default — disable it here
    # if you do NOT also want these records flowing to root handlers.
    audit.propagate = False
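
If adding a dependency is not an option, roughly the same output can be produced with the standard library. The sketch below is an assumption-laden alternative, not part of the library: ``AuditJsonFormatter`` and the ``demo.audit.json`` logger are hypothetical names, and the field list is taken from the record fields documented above. It skips attributes that are absent, which matters because ``error`` is only attached on failures.

```python
import json
import logging

AUDIT_FIELDS = (
    "action", "namespace", "key", "user", "result",
    "status_code", "client_ip", "user_agent", "error",
)


class AuditJsonFormatter(logging.Formatter):
    """Serialise the audit attributes of a record as one JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
        }
        # Copy only the attributes that are actually set on this record.
        for name in AUDIT_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload, ensure_ascii=False)


# Demonstration on a hand-built record; real records come from the library.
demo = logging.getLogger("demo.audit.json")
record = demo.makeRecord(
    demo.name, logging.INFO, "demo.py", 0, "demo", (), None,
    extra={"action": "download", "key": "docs/report.pdf", "result": "ok"},
)
line = AuditJsonFormatter().format(record)
print(line)
```

Pass an instance to ``setFormatter`` on the audit handler in place of ``jsonlogger.JsonFormatter``.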

**PII / secret redaction.** Emails and S3 keys are written verbatim,
which may be sensitive depending on deployment policy. Attach a
``logging.Filter`` if you need to mask, hash, or drop fields before
they hit disk — for example to GDPR-truncate the user field, or to
strip ARNs/bucket names from ``error`` messages produced by boto3
``ClientError`` stringification.

.. code-block:: python
    :linenos:

    import logging

    class RedactFilter(logging.Filter):
        """Mask the local part of ``user``, keeping its first two characters."""
        def filter(self, record):
            user = getattr(record, 'user', None)
            # ``user`` may be the literal ``anonymous``; only mask real emails.
            if user and '@' in user:
                local, _, domain = user.partition('@')
                record.user = f'{local[:2]}***@{domain}'
            return True

    audit.addFilter(RedactFilter())

For ``key`` and ``error`` — which can carry full S3 paths and boto3
``ClientError`` text containing bucket names / ARNs / request IDs —
attach a second filter that keeps just enough breadcrumb to trace the
incident without leaking the rest of the path or the AWS account
topology:

.. code-block:: python
    :linenos:

    import re

    _ARN_RE = re.compile(r'arn:aws:[^\s"\']+')
    _BUCKET_RE = re.compile(r'(?i)\bbucket[\s:=]+[^\s"\',]+')

    class KeyErrorRedactFilter(logging.Filter):
        """Redact prefix tails on ``key`` and AWS identifiers on ``error``."""
        def filter(self, record):
            key = getattr(record, 'key', None)
            if key:
                # Keep only the first path segment ("docs/...") so the
                # audit trail still distinguishes top-level folders but
                # the leaf filename / nested path is masked.
                head, sep, _tail = key.partition('/')
                record.key = f'{head}{sep}***' if sep else '***'
            err = getattr(record, 'error', None)
            if err:
                err = _ARN_RE.sub('arn:aws:***', err)
                err = _BUCKET_RE.sub('bucket=***', err)
                record.error = err
            return True

    audit.addFilter(KeyErrorRedactFilter())

The two filters compose — install both if you want the user, key, and
error fields all masked. Tune the regex set to your environment;
``ClientError`` text varies by API call.
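
Where the text above mentions hashing instead of masking, a filter along the following lines may be a starting point. It is a sketch under stated assumptions: ``HashUserFilter``, the salt value, and the ``u:`` prefix are all hypothetical. A salted digest keeps rows from the same user correlatable without storing the address. Note that a filter of this kind rewrites record attributes only; whether the pre-built key=value message text is affected depends on how the library assembles it, which this excerpt does not show.

```python
import hashlib
import logging


class HashUserFilter(logging.Filter):
    """Replace ``user`` with a short salted SHA-256 digest."""

    def __init__(self, salt: bytes):
        super().__init__()
        self._salt = salt

    def filter(self, record):
        user = getattr(record, "user", None)
        # Leave the literal "anonymous" readable; hash real identities.
        if user and user != "anonymous":
            digest = hashlib.sha256(self._salt + user.encode("utf-8")).hexdigest()
            record.user = f"u:{digest[:16]}"
        return True


# Check the filter on a hand-built record, without touching S3 or Flask.
record = logging.makeLogRecord({"user": "alice@example.com", "key": "docs/a.pdf"})
HashUserFilter(salt=b"rotate-me").filter(record)
print(record.user)
```

``logging.makeLogRecord`` is a convenient way to unit-test any of these filters in isolation.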

**Capturing the real client IP behind a reverse proxy.** ``client_ip``
is sourced from ``request.remote_addr``, which Werkzeug fills from the
*last hop* on the TCP connection. When the app sits behind a load
balancer or reverse proxy (ALB / ELB / nginx / Cloudflare), that hop is
the proxy and every audit row records the proxy IP, not the originating
client. For the audit trail to identify clients you must install
Werkzeug's ``ProxyFix`` middleware (or an equivalent) so the proxy's
``X-Forwarded-For`` header is honored. ``ProxyFix`` reads the
``X-Forwarded-*`` family only; a proxy that sends the RFC 7239
``Forwarded`` header instead needs a different middleware:

.. code-block:: python
    :linenos:

    from werkzeug.middleware.proxy_fix import ProxyFix

    # ``x_for=1`` trusts exactly one X-Forwarded-For hop (your edge LB).
    # If the request transits N reverse proxies you control end-to-end,
    # raise this to N. Trusting too many hops lets clients spoof the IP.
    app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)

Without ProxyFix (or a host-supplied equivalent), every ``client_ip``
field in the audit stream is the LB's address and the audit trail
loses most of its forensic value. The value of ``x_for`` is
deployment-specific — adjust it for nested LB / CDN topologies, and
**only** trust hops you operate.

**Calling :func:`emit` from host code.** The ``emit`` function is part
of the public surface: host integrations may import it and emit
extra audit lines for non-CRUD operations they layer on top of the
viewer (e.g. a custom admin route that bulk-tags objects). Usage:

.. code-block:: python
    :linenos:

    from flask_s3_viewer.audit import emit as audit_emit
    from flask_s3_viewer.auth import ACTION_LIST  # or ACTION_DOWNLOAD/...

    # Call from inside a Flask request context so client_ip / user_agent
    # are populated automatically; outside a request both fields emit as
    # empty strings.
    audit_emit(
        action=ACTION_LIST,
        namespace='my-bucket',
        key='reports/2026/',       # caller pre-normalises (post-base_path)
        user=current_user_email,   # supplied by the host's own auth layer
        result='ok',
        status_code=200,
    )

Prefer the ``flask_s3_viewer.auth.ACTION_*`` constants over raw
strings. Both ``action`` and ``result`` are sanitised, but the level
mapping (``ok``→INFO, ``denied``→WARNING, ``error``→ERROR) depends on
``result``, so pass one of those three values. The signature is part of
v1.x stability — additions will be backwards-compatible.
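
Because the record level follows from ``result``, host code that wraps ``emit`` may want to validate the value first. The sketch below is a host-side guard, not the library's code: ``level_for`` is a hypothetical helper, and rejecting unknown values is a design choice made here because this excerpt does not say how the library treats them.

```python
import logging

# Mapping documented above: result string -> logging level.
_LEVELS = {
    "ok": logging.INFO,
    "denied": logging.WARNING,
    "error": logging.ERROR,
}


def level_for(result: str) -> int:
    """Return the level for an audit ``result``, rejecting unknown values."""
    try:
        return _LEVELS[result]
    except KeyError:
        raise ValueError(f"unknown audit result: {result!r}") from None


for outcome in ("ok", "denied", "error"):
    print(outcome, logging.getLevelName(level_for(outcome)))
```

Calling ``level_for`` before ``emit`` surfaces a typo such as ``'OK'`` as an exception in the host route instead of leaving the outcome to the library.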

Use Caching
-----------
S3 is charged per call. Therefore, Flask S3Viewer supports caching (currently only supports file caching, in-memory database will be supported later).

docs/html/changelog.html

Lines changed: 56 additions & 0 deletions
@@ -53,6 +53,7 @@
<li class="toctree-l1"><a class="reference internal" href="usage/templates.html"> Templates</a></li>
<li class="toctree-l1"><a class="reference internal" href="license.html"> License</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#"> Changelog</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#unreleased">[1.1.0] - unreleased</a></li>
<li class="toctree-l2"><a class="reference internal" href="#id2">1.0.1</a></li>
<li class="toctree-l2"><a class="reference internal" href="#id4">1.0.0</a></li>
<li class="toctree-l2"><a class="reference internal" href="#id5">1.0.0a1</a></li>
@@ -108,6 +109,61 @@
<h1>Changelog<a class="headerlink" href="#changelog" title="Link to this heading"></a></h1>
<p>All notable changes to this project will be documented in this file. Dates are displayed in UTC.</p>
<p>Generated by <cite>``auto-changelog`</cite> &lt;<a class="reference external" href="https://github.com/CookPete/auto-changelog">https://github.com/CookPete/auto-changelog</a>&gt;`_.</p>
<section id="unreleased">
<h2>[1.1.0] - unreleased<a class="headerlink" href="#unreleased" title="Link to this heading"></a></h2>
<blockquote>
<div><p>Audit logging &amp; logging surface clean-up. The public API (<code class="docutils literal notranslate"><span class="pre">FlaskS3Viewer</span></code>
constructor, <code class="docutils literal notranslate"><span class="pre">add_new_one</span></code>, <code class="docutils literal notranslate"><span class="pre">init_app</span></code>, <code class="docutils literal notranslate"><span class="pre">get_instance</span></code>,
<code class="docutils literal notranslate"><span class="pre">get_boto_client</span></code>, <code class="docutils literal notranslate"><span class="pre">get_boto_session</span></code>) is unchanged — the changes
below affect log output only. The version line is provisional;
the final release tag is decided at release time.</p>
</div></blockquote>
<p><strong>Added</strong></p>
<ul class="simple">
<li><p>New audit logger <code class="docutils literal notranslate"><span class="pre">flask_s3_viewer.audit</span></code>. Every S3 CRUD action that
flows through the blueprint (<code class="docutils literal notranslate"><span class="pre">list</span></code> / <code class="docutils literal notranslate"><span class="pre">download</span></code> / <code class="docutils literal notranslate"><span class="pre">upload</span></code> /
<code class="docutils literal notranslate"><span class="pre">delete</span></code> / <code class="docutils literal notranslate"><span class="pre">presign</span></code>) emits exactly one structured record carrying
<code class="docutils literal notranslate"><span class="pre">action</span></code>, <code class="docutils literal notranslate"><span class="pre">namespace</span></code>, <code class="docutils literal notranslate"><span class="pre">key</span></code>, <code class="docutils literal notranslate"><span class="pre">user</span></code>, <code class="docutils literal notranslate"><span class="pre">result</span></code>, <code class="docutils literal notranslate"><span class="pre">status_code</span></code>,
<code class="docutils literal notranslate"><span class="pre">client_ip</span></code>, <code class="docutils literal notranslate"><span class="pre">user_agent</span></code>, and (on failure) <code class="docutils literal notranslate"><span class="pre">error</span></code>. Level mapping:
<code class="docutils literal notranslate"><span class="pre">ok</span></code> → INFO, <code class="docutils literal notranslate"><span class="pre">denied</span></code> → WARNING, <code class="docutils literal notranslate"><span class="pre">error</span></code> → ERROR. Activation is
pure-<code class="docutils literal notranslate"><span class="pre">logging</span></code>: attach a handler to <code class="docutils literal notranslate"><span class="pre">flask_s3_viewer.audit</span></code>, set its
level, optionally pin <code class="docutils literal notranslate"><span class="pre">propagate</span> <span class="pre">=</span> <span class="pre">False</span></code>. No constructor flag.
<code class="docutils literal notranslate"><span class="pre">flask_s3_viewer.audit.emit(...)</span></code> is a public helper host code may
call to record extra non-CRUD operations on the same logger. Log
injection is defended at this layer — ASCII control bytes in any
user-controllable field (key / email / User-Agent / exception text)
are escaped to <code class="docutils literal notranslate"><span class="pre">\xNN</span></code> before the record is built. UA is capped at
256 bytes; free-form fields at 1024 bytes. See <em>Audit logging</em> in
<code class="docutils literal notranslate"><span class="pre">docs/source/usage/configuration.rst</span></code> for ProxyFix guidance, JSON
handler setup, and PII / ARN / bucket-name redaction filter
examples.</p></li>
</ul>
<p><strong>Changed</strong></p>
<ul class="simple">
<li><p>Internal <code class="docutils literal notranslate"><span class="pre">logging.info(...)</span></code> / <code class="docutils literal notranslate"><span class="pre">logging.error(...)</span></code> /
<code class="docutils literal notranslate"><span class="pre">logging.debug(...)</span></code> calls in <code class="docutils literal notranslate"><span class="pre">flask_s3_viewer/__init__.py</span></code>,
<code class="docutils literal notranslate"><span class="pre">aws/s3.py</span></code>, <code class="docutils literal notranslate"><span class="pre">aws/session.py</span></code>, and <code class="docutils literal notranslate"><span class="pre">aws/cache.py</span></code> have been routed
through per-module <code class="docutils literal notranslate"><span class="pre">logging.getLogger(__name__)</span></code> instances. Record
<code class="docutils literal notranslate"><span class="pre">name</span></code> now reads <code class="docutils literal notranslate"><span class="pre">flask_s3_viewer.&lt;module&gt;</span></code> instead of <code class="docutils literal notranslate"><span class="pre">root</span></code>. Host
pipelines that filter by record <code class="docutils literal notranslate"><span class="pre">name</span></code> (rather than by handler
attachment) need to update their allow-lists; host pipelines that
attach handlers to the root logger (or to <code class="docutils literal notranslate"><span class="pre">flask_s3_viewer</span></code>) are
unaffected — the module loggers default to <code class="docutils literal notranslate"><span class="pre">propagate=True</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">app.url_map</span></code> registration dump was demoted from INFO to DEBUG so
registering many namespaces (or large blueprints) no longer floods
INFO logs at app startup.</p></li>
</ul>
<p><strong>Fixed</strong></p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">aws/s3.py</span></code>‘s <code class="docutils literal notranslate"><span class="pre">PURGE:</span></code> / <code class="docutils literal notranslate"><span class="pre">MKDIR:</span></code> / <code class="docutils literal notranslate"><span class="pre">UP_OBJECT:</span></code> DEBUG lines used
the broken <code class="docutils literal notranslate"><span class="pre">logging.debug('PURGE:',</span> <span class="pre">name)</span></code> form (the second
positional argument was treated as a logging-args tuple, against an
unparameterised message string, which the stdlib silently drops).
They now use <code class="docutils literal notranslate"><span class="pre">logger.debug('PURGE:</span> <span class="pre">%s',</span> <span class="pre">name)</span></code> etc., so a host that
pins the logger at DEBUG actually sees the keys involved in each
cache invalidation / placeholder create / upload event.</p></li>
</ul>
</section>
<section id="id2">
<h2><a class="reference external" href="https://github.com/hidekuma/flask-s3-viewer/compare/1.0.0...1.0.1">1.0.1</a><a class="headerlink" href="#id2" title="Link to this heading"></a></h2>
<blockquote>
