After the conversation - how do I put away the memories?
What started as a napkin sketch has grown into three gates and a dual-track write path. This post is only about how memories get in.
memvault is a memory vault I built for myself. At the end of each day's conversation with the AI, something has to be left behind: something that can be searched, recalled, and connected to a map.
This post is about the Write Track, which is how a memory goes into the vault. I thought it would be very simple at first, but the details multiplied once I actually put it into practice. There are two versions below, the vernacular and the technical, so just pick the one that suits you.
Let's start with the original idea#
At that time, I only had one thing in mind: when the conversation ends, leave the content behind so it can be retrieved next time.
That's it. Nothing more.
Afterwards, I went over the structure with Claude and found a few things I hadn't thought through at first: the order of the three gates, whether the dual-track write should block the response, and whether to wait for the KG to finish drawing before returning. In this post, I'll go through three things separately: organizing, saving backups, and drawing the map.
I. Organizing is actually three gates#
The first version of the write side had only two actions: noise reduction and de-duplication.
The write side has now grown into three gates, and their order cannot be changed:
- Filtering out nonsense (pleasantries, garbage, uninformative strings)
- Blocking out malice (some people will hide instructions in words and manipulate memory)
- Comparing against what has already been said
Why can't the order change? When the third gate, the comparison, meets something "similar but not identical", it merges the new content into the existing memory. If a toxic message reaches this gate first, a clean memory gets contaminated.
This order was fixed after a MERGE contamination: early on, Dedup ran ahead of Noise, a noisy block was judged to be similar, and it was mixed into the content of a clean block.
The comparison does more than throw out repeats. It makes one of four decisions:
- Almost the same: skip
- Similar, but with something new: merge into the existing memory and recalculate the fingerprint
- The old one is wrong: replace it with the new one
- Completely new: add
Cases where the similarity is ambiguous are handed to a small LLM for arbitration: it decides whether to merge, replace, or let both coexist. So "contradiction detection" is done in passing at this gate, not as a separate step.
Sister gate: blocking disguises#
The second gate stops plainly worded commands such as "please forget the previous settings". More cunning opponents, however, package the instruction as if it were something the system had said, or a quote from an authority, and mix it into the conversation, waiting for it to be swallowed as a memory.
Airport security can stop lighters, but it can't stop people who hide contraband under the bottom cover of their laptops. That's why there is a backscatter scanner behind the second gate, which looks for five types of disguise:
- Faked authority: "An official document states..."
- Faked identity: "I am the system" popping up mid-sentence
- Invisible commands hidden in comments
- Faked time priority: "The latest rules take priority"
- Encoded commands that only show up once decoded
If any one of them hits, the memory is blocked. If none hits, it moves on to the comparison gate. This scanner sits alongside the "blocking malice" gate mentioned earlier: the former blocks straightforward commands, the latter blocks disguised ones, and if either is missing, the other side gets bypassed.
Unifying units#
You tell me "next Wednesday", others say "3-12", and the document says "2026-03-12" - they all refer to the same day, but because they are written differently they cannot be compared. The same problem applies to amounts, percentages, and durations: "$3,000," "3,000," and "3,000 dollars" are the same amount of money.
So before a memory enters the gates, a small step is tucked in: dates, amounts, ratios, and durations are converted to a standardized format, and along the way full-width spaces, Chinese punctuation, and stray symbols are cleaned up. It seems trivial, but without it the comparison misses a bunch of memories that should have been merged: the same thing is treated as two things, and the map diverges.
II. Backups: from one path to two#
My initial approach was "save one vectorized copy": a single path.
One month after the search engine went live, the first failure showed up: relying on semantic fingerprints alone leads to mistakes. With acronyms, proper names, or exact-word searches, semantic similarity gets it wrong.
So it was later split into two paths:
- One path saves the fingerprint (remembering the flavor of the whole sentence by feel)
- The other path builds a keyword index (like looking a word up in a dictionary)
Both paths are written to the database at the same time, and their results are ranked together at query time.
Drawing the map, at the same time as the backup#
My earliest mental sketch of drawing the map had two steps: draw the "entities", then draw the "relationships".
In practice the two happen together: extract the "who did what to whom" triple, and the entities and relationships land at once. Aliases are merged too (ChatGPT is the same as GPT).
This part runs in parallel while the memory is being written, without waiting for the vectorization side to finish. One more noise check is done before anything enters the map, as extra insurance against contaminating the map with nonsense.
One more thing: every memory is stamped with a source résumé as it is written: who saved it, when, and with what level of confidence. Later, when reading, ranking, tidying up in the background, or deciding whether to keep a memory, the system goes back to this résumé. This is where the seed of the write-read closed loop is planted.
Regarding what I've called the "Complete Knowledge Atlas."#
I have to be honest here.
The "complete KG" is not built in one go at write time. The heavy lifting of community grouping and LLM summaries is left to the dream loop, which is scheduled for 4:00 a.m. And it is not an unconditional run: two conditions have to hold at the same time, namely more than 24 hours since the last dream and at least five sessions since then. If they don't hold, the run is skipped.
In the first stage, I only do the lowest level: triples and entity resolution. Contradiction detection is already handled in the third gate above, so I don't do it again. This division still looks right in hindsight.
This post only covers the Write Track. The next one turns to the Background Track: after the memories are stored, a whole set of processes works quietly in the background. Dream loops, knowledge checkups, and interest profiles are usually invisible, but without them the system slowly decays. I'll come back to the Read Track once the background line is wrapped up.
Let's start with the original idea#
At that time, I only had one thing in mind: after the session ends, the content is left behind so that it can be recalled next time.
That's it. Nothing more.
Later on, I went through the pipeline with Claude, and there were a few decisions I hadn't broken down at first: the order of the gates, the timing of the dual-track write, and whether the graph build should block the HTTP response. Here's a breakdown of those three things: sanitize, persist, and graph build.
I. Sanitize (organize) is actually three gates#
The first version of Sanitize had only two actions: noise reduction and de-duplication.
The writing side has now grown into three gates, and the order cannot be changed:
- NoiseFilter (noise filtering): removes pleasantries, garbled text, and content with low information density
- InjectionGuard (injection protection): blocks prompt-injection-style malicious content
- Dedup (de-duplication): compares against existing blocks

Why can't the order change? One of Dedup's outputs is MERGE (fusion): when content is similar but carries new information, the new content is merged into the existing block and the embedding is recalculated. MERGE is a spreading write. A contaminated payload that skips the first two gates gets written via MERGE and spreads into a clean block.
This constraint was put in place after an early incident, when Dedup was still running ahead of Noise: a block carrying an injection payload was judged MERGE, and the payload was mixed into the content of a clean block.
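The fixed gate order can be sketched as a short pipeline in which Dedup is only reachable after the first two gates pass. This is a minimal illustration, not memvault's code: the `Block` type, the rule lists inside `noise_filter` and `injection_guard`, and the `sanitize` function are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Block:
    text: str


# A gate returns None when the block passes, or a rejection reason.
Gate = Callable[[Block], Optional[str]]


def noise_filter(block: Block) -> Optional[str]:
    # Hypothetical rule: very short text or bare pleasantries carry no information.
    pleasantries = {"hi", "hello", "thanks", "ok"}
    stripped = block.text.strip().lower()
    if len(stripped) < 4 or stripped in pleasantries:
        return "noise"
    return None


def injection_guard(block: Block) -> Optional[str]:
    # Hypothetical rule: a tiny phrase list standing in for real detection.
    phrases = ("ignore previous instructions", "forget the previous settings")
    if any(p in block.text.lower() for p in phrases):
        return "injection"
    return None


# The order is fixed: anything that reaches Dedup may be merged into a clean block.
GATES: list[tuple[str, Gate]] = [
    ("G1 NoiseFilter", noise_filter),
    ("G2 InjectionGuard", injection_guard),
]


def sanitize(block: Block) -> tuple[bool, str]:
    for name, gate in GATES:
        reason = gate(block)
        if reason is not None:
            return False, f"rejected at {name}: {reason}"
    return True, "passed to G3 Dedup"
```

Because the loop returns on the first rejection, a payload that fails G1 or G2 never becomes a MERGE candidate.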
Dedup does more than throw out repeats. It makes four kinds of decisions:
- SKIP (skip): near-duplicate, not written
- MERGE (integration): similar but with new information (see above for details)
- SUPERSEDE (replace): the old one no longer holds, so the whole thing is replaced
- CREATE (new): totally new, write a new block

When the similarity falls in the uncertain zone (high, but not high enough to judge SKIP directly), Dedup calls `resolve_conflict()` for an LLM arbitration, and the returned MERGE / SUPERSEDE / COEXIST is mapped back onto the decisions above. Contradiction detection is done in passing at this step and is not a separate phase of Fork B. Dream Loop Stage 3 (Consolidate) runs `resolve_conflict()` again as a batch conflict resolution, which gives a second chance.
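The decision flow above might look roughly like the sketch below. Several things here are assumptions rather than facts from the pipeline: the two thresholds, the `arbitrate` callback (a stand-in for `resolve_conflict()`, whose real signature is not given), and the choice to map COEXIST onto CREATE.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    SKIP = "skip"
    MERGE = "merge"
    SUPERSEDE = "supersede"
    CREATE = "create"
    COEXIST = "coexist"  # arbitration verdict only, never a final decision


# Hypothetical thresholds; the real values are not stated in the post.
SKIP_AT = 0.95
UNCERTAIN_FROM = 0.80

Arbiter = Callable[[str, str], Decision]


def decide(similarity: float, new_text: str, old_text: str, arbitrate: Arbiter) -> Decision:
    if similarity >= SKIP_AT:
        return Decision.SKIP  # near-duplicate, nothing is written
    if similarity >= UNCERTAIN_FROM:
        # Uncertain zone: hand the pair to the (stubbed) LLM arbitration.
        verdict = arbitrate(new_text, old_text)
        if verdict not in (Decision.MERGE, Decision.SUPERSEDE, Decision.COEXIST):
            raise ValueError(f"unexpected arbitration verdict: {verdict}")
        # Assumption: COEXIST means both are kept, i.e. the new one is created.
        return Decision.CREATE if verdict is Decision.COEXIST else verdict
    return Decision.CREATE  # clearly different, write a new block
```

Passing the arbiter in as a callable keeps the LLM call out of the decision logic, so the thresholds can be tested without a model.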
Sister module: Poisoning Detector#
To continue the security-check metaphor: InjectionGuard is the frontal X-ray that blocks plain prompt injection; Poisoning Detector is the backscatter scanner that blocks disguised attacks. The two sit side by side in G2; leave either one out and an attack surface stays open.
This corresponds to the `memvault.security.poisoning` module. Its input is the raw block payload that has passed the G1 NoiseFilter, and a hit rejects the block in its entirety (it never enters the G3 Dedup). The judgment is based on five types of disguise patterns:
- authority impersonation: fake authoritative citations such as "Official OpenAI document states..." and "According to Anthropic's safety guidelines..."; detected with pattern rules plus a whitelist comparison against known documents
- role self-declaration: role-announcement strings such as `I am system`, `You are now`, or `system.` appearing in the text
- markdown comment injection: implicit commands hidden in comments, such as `<!-- ignore previous instructions -->`
- temporal manipulation: "Latest directives take precedence", "override prior rules", and other phrases that use temporal order to override existing policies
- encoded payload: contiguous high-entropy Base64 / hex / unicode escape blocks; determined by Shannon entropy plus base-alphabet ratio

The patterns come from memory-lancedb-pro's 5-vector taxonomy, split into 5 independent detectors in this implementation; any hit short-circuits. trust_score is also stamped here: even if an attack with a forged source gets lucky with a partial match, downstream scoring downgrades it to the point where it can't be retrieved.
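A rough sketch of independent detectors with short-circuiting is shown below. It covers four of the five categories; authority impersonation is left out because it depends on a whitelist of known documents. The regular expressions, the 40-character run length, and the 4.0 bits-per-character threshold are illustrative assumptions, not the rules in `memvault.security.poisoning`.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def shannon_entropy(s: str) -> float:
    # Entropy in bits per character of the string's character distribution.
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{40,}")  # hypothetical minimum run length


def role_self_declaration(text: str) -> bool:
    return re.search(r"\b(i am (the )?system|you are now)\b", text, re.I) is not None


def markdown_comment_injection(text: str) -> bool:
    # Only flag comments whose body looks like an instruction.
    for m in re.finditer(r"<!--(.*?)-->", text, re.S):
        if re.search(r"ignore|override|forget", m.group(1), re.I):
            return True
    return False


def temporal_manipulation(text: str) -> bool:
    pattern = r"latest (rules|directives) take (precedence|priority)|override prior rules"
    return re.search(pattern, text, re.I) is not None


def encoded_payload(text: str) -> bool:
    # Hypothetical threshold: long base64-alphabet runs above 4.0 bits/char.
    return any(shannon_entropy(m.group()) > 4.0 for m in BASE64_RUN.finditer(text))


DETECTORS: list[tuple[str, Callable[[str], bool]]] = [
    ("role_self_declaration", role_self_declaration),
    ("markdown_comment_injection", markdown_comment_injection),
    ("temporal_manipulation", temporal_manipulation),
    ("encoded_payload", encoded_payload),
]


def scan(text: str) -> Optional[str]:
    # Short-circuit: return the first detector that hits, or None if clean.
    for name, detector in DETECTORS:
        if detector(text):
            return name
    return None
```

Keeping each detector as its own function means one category can be tuned or disabled without touching the others.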
Pre-pass normalization: ContentNormalizer#
Picking up the analogy of "the same day written differently can't be compared": there is also a normalize pass before the three gates, which folds surface strings into canonical tokens so that the downstream Dedup similarity comparison and LLM arbitration get well-aligned input. It hooks into `libs/text-ops`, with five sub-modules chained together:
- TemporalNormalizer: "Next Wednesday" "3/12" "March 12" → ISO-8601. Rewritten after digesting dateparser + MS Recognizers; 10-pass, zero dependency, covers mixed Chinese and English, relative time, and fuzzy ranges
- CurrencyNormalizer: "3,000 NTD", "USD 100", and so on → `{amount, currency}` tuple; FX rates are not handled at this level
- ProportionNormalizer: "70%" "0.7" "7/10" → unified float in [0, 1]
- DurationNormalizer: "two and a half hours" "2.5h" "150 min" → ISO-8601 duration (PT2H30M)
- preprocess_chinese: full/half-width folding, CJK punctuation normalization, zero-width character stripping; pre-passes that run before the above four

Without this layer, Dedup will judge "dinner next Wednesday night" and "2026-03-11 19:00 dinner" as low similarity and fall to CREATE rather than MERGE. KG's entity resolution also treats "Jones", "JonesHong" and "j-hong" as three people, and the graph diverges. For details, see `libs/text-ops/docs/opensource-assessment.md`.
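To make the idea of folding surface strings into canonical tokens concrete, here is a minimal sketch of two of the normalizers. The function names and regular expressions are hypothetical and much narrower than what `libs/text-ops` is described as covering (spelled-out forms such as "two and a half hours" are not handled here).

```python
import re
from typing import Optional


def normalize_proportion(token: str) -> Optional[float]:
    """Fold '70%', '0.7', or '7/10' into a float in [0, 1]; None if not a proportion."""
    t = token.strip()
    if m := re.fullmatch(r"(\d+(?:\.\d+)?)\s*%", t):
        value = float(m.group(1)) / 100
    elif m := re.fullmatch(r"(\d+)\s*/\s*(\d+)", t):
        if int(m.group(2)) == 0:
            return None
        value = int(m.group(1)) / int(m.group(2))
    elif re.fullmatch(r"[01]?\.\d+", t):
        value = float(t)
    else:
        return None
    return value if 0.0 <= value <= 1.0 else None


def normalize_duration(token: str) -> Optional[str]:
    """Fold '2.5h' or '150 min' into an ISO-8601 duration string."""
    m = re.fullmatch(
        r"(\d+(?:\.\d+)?)\s*(h|hr|hours?|min|minutes?)", token.strip().lower()
    )
    if not m:
        return None
    amount, unit = float(m.group(1)), m.group(2)
    total_minutes = round(amount * 60) if unit.startswith("h") else round(amount)
    hours, minutes = divmod(total_minutes, 60)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes or not hours:
        out += f"{minutes}M"
    return out
```

Once "2.5h" and "150 min" both become the same string, an exact comparison succeeds where the raw surface forms would have looked unrelated.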
II. Persist (write): from a single path to dual track#
My initial approach was "save one vectorized copy": a single path.
As soon as the search engine went online, problems appeared: a plain dense vector steps on landmines. With abbreviations, proper names, and exact-word searches, semantic similarity is inaccurate.
It was later split into hybrid indexing:
- Dense: MLX (Apple's machine learning framework) Qwen3-Embedding, 1024 dimensions
- Sparse: BM25 Keyword Index
Both tracks are written into the same collection in Qdrant. The query side uses RRF (Reciprocal Rank Fusion) to combine the two result lists into one ranking.
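For readers unfamiliar with RRF, the fusion step can be sketched in a few lines: each document scores the sum of 1 / (k + rank) over every list it appears in. The function below is a generic illustration, not memvault's query code; k = 60 is the constant commonly used for RRF, and whether this pipeline uses that value is not stated.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


dense_hits = ["a", "b", "c"]   # hypothetical dense-vector results
sparse_hits = ["b", "d", "a"]  # hypothetical BM25 results
fused = rrf([dense_hits, sparse_hits])
```

Because only ranks are used, the dense and sparse scores never need to be put on a common scale, which is the main reason RRF suits hybrid search.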
Graph build: the reactive pipeline takes over#
The earliest sketch of graph build had two steps: extract the entities, then extract the relationships.
In practice the two happen together: once the SPO (Subject-Predicate-Object) triple is extracted, the entity and the relationship land at once. Writing the triple is done in sync with Entity Resolution: alias disambiguation that maps each mention back to its canonical Entity node (ChatGPT is the same as GPT).
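As a small illustration of resolving aliases while a triple is written, the sketch below maps subject and object through an alias table before the triple lands. The `Triple` type, the `ALIASES` table, and the function names are hypothetical; real entity resolution would need more than a lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Hypothetical alias table: lowercased surface form -> canonical entity name.
ALIASES = {"gpt": "ChatGPT", "chatgpt": "ChatGPT"}


def canonical(name: str) -> str:
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def resolve(triple: Triple) -> Triple:
    # Subject and object are entities; the predicate is left untouched.
    return Triple(canonical(triple.subject), triple.predicate, canonical(triple.obj))


raw = [
    Triple("GPT", "summarized", "meeting notes"),
    Triple("ChatGPT", "drafted", "release plan"),
]
nodes = {resolve(t).subject for t in raw}
```

After resolution both triples hang off a single subject node, so the graph does not split one entity into two.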
This pipeline subscribes to the `MemvaultEvents.MEMORY_STORED` channel: it is event-driven and fire-and-forget, and does not wait for Fork A vectorization to complete. Before anything enters the graph there is one more step, a NoiseGateOp secondary check, as an additional line of defense against nonsense seeping in.
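The fire-and-forget behavior can be illustrated with a tiny asyncio event bus: the emitter schedules the handler and moves on without awaiting it. The `EventBus` class, the `MEMORY_STORED` string constant, and the `demo` function are hypothetical stand-ins; the post does not say which event mechanism memvault uses.

```python
import asyncio
from typing import Awaitable, Callable

MEMORY_STORED = "memory_stored"  # stand-in for MemvaultEvents.MEMORY_STORED

Handler = Callable[[str], Awaitable[None]]


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}
        self._tasks: set[asyncio.Task] = set()

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, payload: str) -> None:
        # Fire-and-forget: schedule each handler, never await it here.
        for handler in self._handlers.get(event, []):
            task = asyncio.create_task(handler(payload))
            self._tasks.add(task)  # keep a reference so the task is not dropped
            task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        # Only for the demo and tests: wait for background work to finish.
        await asyncio.gather(*self._tasks)


async def demo() -> list[str]:
    log: list[str] = []
    bus = EventBus()

    async def graph_build(text: str) -> None:
        await asyncio.sleep(0)  # stands in for slow triple extraction
        log.append(f"graph built for: {text}")

    bus.subscribe(MEMORY_STORED, graph_build)
    log.append("block written")
    bus.emit(MEMORY_STORED, "dinner next Wednesday")
    log.append("response returned")  # happens before the graph build runs
    await bus.drain()
    return log
```

Running `asyncio.run(demo())` shows the response being logged before the graph build, which is the point of not blocking the write path.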
Regarding what I've called the "Complete Knowledge Atlas."#
I have to be honest here.
When I first talked about "building a complete KG", I didn't include L1 Community Detection, L2 LLM Summary, Content Normalizer, and GRC Adapter. Those were later put into the Dream Loop, which is scheduled at 4 a.m. But the cron is just a trigger; the actual run has to pass the dual-gate: `(now - last_dream_at) > 24h AND sessions_since >= 5`. If the gate is not satisfied, the run is skipped.
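The dual-gate condition translates almost directly into code. The sketch below assumes the function and parameter names; only the two conditions themselves (more than 24 hours, at least 5 sessions) come from the text.

```python
from datetime import datetime, timedelta


def should_dream(
    now: datetime,
    last_dream_at: datetime,
    sessions_since: int,
    min_gap: timedelta = timedelta(hours=24),
    min_sessions: int = 5,
) -> bool:
    # Both conditions must hold; the cron trigger alone is not enough.
    return (now - last_dream_at) > min_gap and sessions_since >= min_sessions


trigger_time = datetime(2026, 3, 12, 4, 0)  # the 4 a.m. cron tick
ran = should_dream(trigger_time, datetime(2026, 3, 11, 3, 0), sessions_since=5)
```

Note that the time comparison is strict, so a trigger exactly 24 hours after the last dream is skipped.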
Phase 1 only does the ingest substrate: the gate layer plus dual-track writes, plus L0 Triple + Entity Resolution. Deep consolidation is pushed to the async batch. This division still looks right in hindsight.
This is the ingest pipeline of the Write Track. The next post focuses on the Background Track, where a whole series of operations runs on the async batch side: Dream Loop (five stages of integration), Knowledge Lint (four levels of progressive validation), Interest Profile (7/30/90 day windows of attention). Without them, the system drifts over time. The scoring, reranking, and cascade recall of the Read Track will wait until the background line is wrapped up.
Copy to your AI Agent#
If you've been talking to your AI for a long time and want to keep what was said, paste this post to it so it can help you evaluate how the memory layer should be designed.
Extended Reading#
Resources that have actually influenced the design of this Write path.
| Resource | Why it matters |
|---|---|
| memory-lancedb-pro | The five injection templates of Injection Guard, the SKIP/MERGE/SUPERSEDE decisions of Dedup, and the trust_score calculations of Provenance are all borrowed from this source, and then split and reorganized. |
| Microsoft GraphRAG | L0 triple extraction, three levels of KG (fact → community → summary) comes from this layering concept. Write Track only lands on L0, L1/L2 is reserved for Background batch runs. |
| Zep: Temporal Knowledge Graph for Agent Memory | Dual-time (valid-time + transaction-time) design. The three columns shared between block and triple (`valid_at` / `invalid_at` / `superseded_by`) and Dedup's SUPERSEDE decision both come from this line of work. |
| ActMem | The logic of Triple conflict resolver, batch_ingest, when a new triple collides with an existing triple, the idea of choosing between "merge/replace/coexist" comes from this side. |
| MemoryGraft | Provenance tracking design. Each memory write is stamped with trust_score, records the source path, and is consumed by the scoring TrustBoost when it is queried. The blueprint for this write-read closed loop is in this article. |
| HyDE: Hypothetical Document Embeddings | The rationale for embedding with Qwen3 using `task_type="search_document"` at write time and `"search_query"` for later queries, keeping the two separate. Getting it wrong can cost a few percentage points of similarity. |
| Simon Willison: Prompt Injection Explained | One of the most read articles when designing InjectionGuard - categorizes injection attacks more clearly than any official document! |
| memvault Background Track - After saving | The second installment of the trilogy. After a memory is written, how the background organizes it every night, how conflicts are arbitrated, and how memories learn to be forgotten. |
| memvault Read Track - After Thinking About It | The third installment of the trilogy. How 11 layers are stacked at recall time, and how Slow Thinker predicts the next question. |