Case study · RAGFlow · AI infrastructure

A pickle RCE footgun in RAGFlow's deserialiser

RAGFlow's deserialise_b64 helper fell back to bare pickle.loads because its safety flag was never set, so decoded database values were unpickled with no restrictions. Reading one crafted column could execute code inside the RAGFlow process.

Surface: Database read path
Class: CWE-502
Severity: Medium
Type: Research

The system

RAGFlow is a widely used open-source retrieval-augmented generation engine. Its deserialise_b64 helper decodes base64 database payloads back into Python objects, sitting behind Peewee's SerializedField.python_value, which runs whenever a serialised column is read from MySQL.

The finding

The helper picked between a restricted loader and raw pickle.loads based on a use_deserialize_safe_module flag that defaulted to false and was set nowhere in the repository. The default path was bare pickle.loads, which Python's own documentation warns must never touch untrusted data.

Anyone who could influence a pickled column, whether through a separate injection flaw, stolen database credentials or an untrusted backup restore, could turn a routine database read into code execution in the RAGFlow process. This was a latent footgun rather than a live endpoint, but the helper was insecure by default in exactly the place where a future field would reuse it.
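
The pattern described above can be sketched roughly as follows. This is an illustrative reconstruction, not RAGFlow's actual source: the function body, the module-level flag and the Crafted class are assumptions made for the demo, and the payload calls eval on harmless arithmetic where a real attack would use something like posix.system.

```python
import base64
import pickle

# Assumed stand-in for the flag described above: it defaults to False
# and nothing in this sketch (or, per the finding, the repository) sets it.
use_deserialize_safe_module = False


def restricted_loads(raw):
    # Placeholder for the restricted loader; never reached with the default flag.
    raise NotImplementedError("restricted loader not shown in this sketch")


def deserialise_b64(src):
    """Illustrative reconstruction of the flag-controlled branch."""
    raw = base64.b64decode(src)
    if use_deserialize_safe_module:
        return restricted_loads(raw)
    return pickle.loads(raw)  # default path: any importable callable can run


class Crafted:
    """Hypothetical payload: its pickle form tells the loader to call eval."""

    def __reduce__(self):
        # Harmless arithmetic stands in for an attacker-chosen command.
        return (eval, ("6 * 7",))


# Simulate a tampered column value being read back from the database.
column_value = base64.b64encode(pickle.dumps(Crafted()))
result = deserialise_b64(column_value)
print(result)  # the value eval returned while the column was being unpickled
```

The caller never asked for eval to run; the pickle stream chose the callable and its arguments, which is why an unrestricted loader on a database read path amounts to code execution for whoever controls the column.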

The fix

The fix removes the flag-controlled branch so that every decoded payload routes through the existing RestrictedUnpickler, which limits class resolution to an allow-list. A malicious __reduce__ payload that ran posix.system before the change was rejected after it, while a benign numpy array still round-tripped. The fix deleted more code than it added.
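
A minimal sketch of the allow-list approach, assuming a find_class override in the style shown in Python's pickle documentation. The allow-list contents, the restricted_loads name and the Crafted payload are hypothetical; RAGFlow's real RestrictedUnpickler and its allow-list are not reproduced here, and a standard-library OrderedDict stands in for the numpy array so the example runs without extra dependencies.

```python
import base64
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list of (module, name) pairs; the real list differs.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Resolve only globals that appear on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(raw):
    return RestrictedUnpickler(io.BytesIO(raw)).load()


def deserialise_b64(src):
    # After the fix there is no flag: every payload takes the restricted path.
    return restricted_loads(base64.b64decode(src))


class Crafted:
    """Hypothetical payload that asks the loader to call eval."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


benign = base64.b64encode(pickle.dumps(OrderedDict(a=1)))
hostile = base64.b64encode(pickle.dumps(Crafted()))

round_tripped = deserialise_b64(benign)  # allow-listed class still loads

try:
    deserialise_b64(hostile)
    rejected = False
except pickle.UnpicklingError:
    rejected = True  # builtins.eval is not on the allow-list
```

Keying the allow-list on exact (module, name) pairs is a deliberately strict choice for this sketch; an allow-list keyed on whole top-level modules is simpler to maintain but admits every callable those modules export.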
