IMMUNO-0001 — Vertebrate adaptive immune system: V(D)J recombination, clonal selection, MHC-restricted recognition
PR #IMMUNO-0001 — Vertebrate adaptive immune system
Branch: raphael/adaptive-immunity → main
Author: Raphael
Reviewer: Gabriel
Sprint: Silurian Sprint 22
Component: gnathostome / lymphoid lineage
Related: ADR-008 (open registry; this is the closed-source fork's private-fork policy), ADR-006 (eukaryote acquisition lineage)
LOC: very large. See "Why this is one PR" below.
PR Body — Raphael
Excited about this one. This is the closed-source fork’s answer to the problem I flagged on ADR-008: a mechanism for the eukaryote line to vet incoming molecular patterns now that we are no longer pulling from the bacterial registry. We need to be able to recognize “self” versus “not self” at the scale of planetary pathogen diversity. The bacterial domain solved exposure with horizontal flow. We are solving it with a sandboxed mutagenesis module.
Design
Generate the receptor library at runtime, in each cell, from a finite genomic substrate, by deliberately breaking and rejoining DNA at controlled cut sites. The library is not encoded. The library is synthesized per lymphocyte during development and the productive variants are kept.
Specifically:
- Genomic sandbox. Two families of loci are designated as recombination targets: the immunoglobulin (Ig) loci (heavy chain, plus κ and λ light chains) and the T-cell receptor (TCR) loci (α, β, γ, δ). Each locus contains arrays of gene segments, V (variable), D (diversity; Ig heavy chain and TCR β and δ only), and J (joining), each flanked by recombination signal sequences (RSS).
- RSS specification. Each RSS is a heptamer (CACAGTG) adjacent to the coding segment, then a 12 ± 1 or 23 ± 1 bp spacer, then a nonamer (ACAAAAACC). The spacer length encodes compatibility: a 12-RSS may only recombine with a 23-RSS. This is the 12/23 rule. It enforces V→D→J ordering and prevents V→V or J→J ligation.
- Cut. RAG1/RAG2 (recombination-activating genes 1 and 2) form a complex that recognizes a 12/23 RSS pair, brings them into synapse, and introduces a precise double-strand break at each heptamer. The coding ends form sealed hairpins; the signal ends are blunt.
- Process. The Artemis nuclease, activated by DNA-PKcs, nicks the hairpins open. The opening is offset, which generates short palindromic (“P”) nucleotides. Terminal deoxynucleotidyl transferase (TdT) then adds template-free (“N”) nucleotides at each junction.
- Rejoin. Non-homologous end joining (NHEJ; Ku70/Ku80, XRCC4–Ligase IV) ligates the processed coding ends.
- Allelic exclusion. Once a productive in-frame rearrangement is achieved on one allele, RAG is downregulated at that locus. Each lymphocyte ships one BCR or TCR specificity. One PR per lymphocyte.
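A minimal sketch of the 12/23 compatibility check described in the RSS specification, for reviewers who want to poke at it. The `RSS` record, `spacer_class`, and `can_recombine` are hypothetical names introduced here, and the V/D/J spacer layout is an assumed heavy-chain-style arrangement (23-RSS on V and J, 12-RSS on both sides of D); real signal sequences also deviate from the consensus, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional

HEPTAMER = "CACAGTG"    # consensus heptamer from the RSS specification
NONAMER = "ACAAAAACC"   # consensus nonamer from the RSS specification


@dataclass(frozen=True)
class RSS:
    """One recombination signal sequence (hypothetical record type)."""
    spacer_len: int
    heptamer: str = HEPTAMER
    nonamer: str = NONAMER


def spacer_class(rss: RSS) -> Optional[int]:
    """Return 12 or 23 if the spacer is within +/-1 bp of it, else None."""
    for nominal in (12, 23):
        if abs(rss.spacer_len - nominal) <= 1:
            return nominal
    return None


def can_recombine(a: RSS, b: RSS) -> bool:
    """12/23 rule: one partner must be a 12-RSS and the other a 23-RSS."""
    return {spacer_class(a), spacer_class(b)} == {12, 23}


# Assumed heavy-chain-style layout: V and J carry 23-RSSs, D carries 12-RSSs.
V, D, J = RSS(23), RSS(12), RSS(23)

print(can_recombine(V, D))  # True: V-to-D is allowed
print(can_recombine(D, J))  # True: D-to-J is allowed
print(can_recombine(V, J))  # False: V cannot skip D
print(can_recombine(V, V))  # False: no V-to-V ligation
```

Under that assumed layout the rule alone rules out direct V-to-J and V-to-V joins, which is the property the design leans on.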
Diversity budget
- Combinatorial (which V, D, J chosen): ~10^4 per heavy chain locus; lower per light chain locus (~10^2), which has no D segments
- Junctional (P + N nucleotides + exonuclease trimming at each junction): ~10^7 additional
- Pairing (heavy × light): multiplicative
- Total naive repertoire: ~10^11 receptors at production
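To make the combinatorial line concrete, here is a rough count for a single heavy chain locus. The segment counts are illustrative, human-like assumptions and are not taken from this PR; `combinatorial_diversity` is a hypothetical helper. Junctional diversity and heavy/light pairing are deliberately left out.

```python
import math

# Hypothetical functional segment counts for one heavy chain locus.
# Illustrative human-like values, NOT taken from this PR.
HEAVY_SEGMENTS = {"V": 40, "D": 25, "J": 6}


def combinatorial_diversity(segments: dict) -> int:
    """Number of distinct V(D)J choices: one segment picked from each array."""
    return math.prod(segments.values())


heavy = combinatorial_diversity(HEAVY_SEGMENTS)
print(heavy)                         # 6000
print(round(math.log10(heavy), 1))   # 3.8, i.e. order 10^4
```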
After antigen exposure, AID-mediated somatic hypermutation in germinal centers further diversifies the V region. Effective post-affinity-maturation diversity reaches ~10^16–10^18 across the lifetime of a vertebrate. From a finite genome.
Selection
The repertoire is generated blind to what it should bind. Two selection passes filter it:
- Positive selection. In the thymus, only T cells whose receptors can recognize self-MHC at low affinity survive. T cells that cannot engage MHC at all are useless and undergo apoptosis. B cells in the bone marrow are not MHC-restricted; their equivalent checkpoint is expression of a functional, signaling-competent BCR.
- Negative selection. Lymphocytes whose receptors bind self-peptide–MHC complexes at high affinity are also deleted. These are the candidate autoreactive clones.
Net survival rate from the thymus is ~2–5%. The other 95–98% fail one of the two filters and are killed by apoptosis. This is expensive. It is also the only way the math works.
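The two filters can be sketched as an affinity window: too weak fails positive selection, too strong fails negative selection. Everything numeric below is an assumption. The 0 to 1 affinity scale, the uniform distribution of blind receptors, and the two thresholds are hypothetical, and the thresholds were picked so the window covers 3% of the scale, inside the quoted ~2–5% range rather than derived from it.

```python
import random

# Hypothetical affinity thresholds on an arbitrary 0..1 scale (assumptions).
POSITIVE_MIN = 0.90   # below this the receptor cannot engage self-MHC
NEGATIVE_MAX = 0.93   # at or above this the receptor binds self too strongly


def select(affinity: float) -> str:
    """Apply both filters to one receptor's self-peptide-MHC affinity."""
    if affinity < POSITIVE_MIN:
        return "neglect"      # fails positive selection, apoptosis
    if affinity >= NEGATIVE_MAX:
        return "deleted"      # fails negative selection, apoptosis
    return "survives"


def survival_fraction(n: int, seed: int = 0) -> float:
    """Fraction of n blindly generated receptors that pass both filters."""
    rng = random.Random(seed)
    survivors = sum(select(rng.random()) == "survives" for _ in range(n))
    return survivors / n


print(f"{survival_fraction(100_000):.3f}")
```

The point of the sketch is the shape of the filter, not the number: any survival rate can be produced by moving the two thresholds.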
Why this is one PR
These components are not separable. RAG without RSS is a generic DNA endonuclease (we have those). RSS without RAG is junk DNA. TdT without NHEJ produces unrepaired double-strand breaks. NHEJ without selection produces autoreactive clones at scale. MHC without TCR is a peptide-display system with no readers. We ship it together or we ship none of it.
Out of scope (follow-ups)
- Class switch recombination (IgM → IgG/IgA/IgE) and AID-mediated somatic hypermutation in germinal centers. Filing as IMMUNO-0002.
- Memory B/T cell persistence and the secondary response curve. IMMUNO-0003.
- Maternal–fetal tolerance during viviparity. Not applicable in oviparous gnathostomes (current). Filing as IMMUNO-0007 against the eventual placental mammal lineage. Gabriel will want this earlier than that. Acknowledged.
- The MHC locus polymorphism mechanism. The locus is already the most polymorphic in the genome and we have not designed the population-genetics dynamics that maintain it. Out of scope.
Performance note
For the record: the circulatory system is what makes any of this work. Lymphocytes that cannot circulate cannot survey. The secondary lymphoid organs — spleen, lymph nodes — are nodes on the circulatory graph. None of this is in scope but it is relevant context.
Review — Gabriel
Reviewing as IMMUNO-0001. Reading carefully because this PR introduces a system that mutates its own source code in production. I want it on the record that we read it carefully.
What is correct
- The 12/23 rule is well-specified and the RSS heptamer/nonamer consensus is correct. RAG complex specificity is tight. Confirming the cut sites are restricted to the Ig and TCR loci by virtue of where the RSS arrays were placed in the genome. The sandbox boundary, as specified, holds.
- RAG1/RAG2 ancestry is documented. RAG is descended from a Transib-family transposase. The transposon was domesticated. This is acknowledged in the PR. I would prefer it acknowledged more loudly: we have built our adaptive immune system on a captured mobile genetic element. The mobile-element behavior is not fully extinguished. See blocker (3).
- Allelic exclusion via RAG downregulation on productive rearrangement is the right call. Without it, every lymphocyte would express a polyclonal receptor mixture and clonal selection would not work. This is the single most important constraint in the PR and it is correctly in place.
- Selection topology is correct. Positive selection on self-MHC engagement, then negative selection on self-peptide affinity, is the right ordering. The ~2–5% survival rate is the right cost for the right guarantee.
Blockers
1. The autoimmune escape path is not in the PR body. Negative selection in the thymus is not perfect. Tissue-restricted antigens are not all expressed in thymic medullary epithelium (AIRE, tracked as IMMUNO-0004, will help but it will not be complete). Lymphocytes recognizing self-peptide–MHC complexes that the thymus cannot present will escape to the periphery. The peripheral tolerance mechanisms (regulatory T cells, anergy, ignorance) are partial. The system will generate autoreactive clones at a non-zero base rate, and those clones will occasionally find their target. Filing IMMUNO-0005 to track the incident queue. This is going to be a long-running ticket. I want it filed before merge.
2. TdT junction processing produces non-productive rearrangements at a high rate. N-nucleotide addition is template-free and not constrained to be a multiple of three. Approximately two-thirds of V(D)J rearrangements are out-of-frame or introduce premature stop codons. This is acceptable as a per-rearrangement cost because a lymphocyte that fails on the first allele rearranges the second; lymphocytes that fail both die. But the genomic damage budget per developing lymphocyte is significant. Confirming Michael has signed off on the energetic cost.
3. The RAG complex cuts at cryptic RSS-like sequences elsewhere in the genome at a measurable rate. This was flagged parenthetically in the PR body and not given its own section. Translocations between the Ig locus and proto-oncogene loci, notably a t(8;14) translocation joining MYC to the IgH enhancer, are a predictable consequence of leaving an active DNA-cutting enzyme in a developing cell. Filing IMMUNO-0006 for the B-cell-lymphoma incident class. This is a direct cost of the architecture and I want it on the books in this PR, not discovered at the next post-mortem.
4. AID will be similarly unsafe. Out of scope for this PR, but the somatic hypermutation mechanism Raphael lists as a follow-up uses cytidine deamination targeted by sequence motif. AID has off-target activity at non-Ig loci. The same translocation class will reappear. I am flagging it now because the PR description pitches AID as straightforward additional diversity. It is not.
5. No kill switch for runaway clonal expansion. A B or T cell that escapes negative selection and finds a self-antigen will clonally expand on the same machinery used for productive responses. There is no rate limit. We are relying on regulatory T cells (not in this PR), peripheral anergy (not in this PR), and apoptosis on cytokine withdrawal (the cytochrome c kill signal, which is in place; see ADR-006, Raphael’s note). The kill-switch coverage is partial. Filing as known issue under IMMUNO-0005.
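For the record, the per-cell arithmetic behind the TdT junction-processing blocker can be checked in a few lines. This is a sketch under simplifying assumptions that are not in the PR: each allele gets exactly one rearrangement attempt, the attempts are independent, and each fails with the two-thirds probability quoted in that blocker. `p_cell_fails` is a hypothetical helper.

```python
from fractions import Fraction

# Per-rearrangement non-productive rate quoted in the TdT blocker.
P_NONPRODUCTIVE = Fraction(2, 3)


def p_cell_fails(attempts: int = 2, p_fail: Fraction = P_NONPRODUCTIVE) -> Fraction:
    """Probability that every independent attempt is non-productive."""
    return p_fail ** attempts


print(p_cell_fails())       # 4/9: both alleles fail, the cell dies
print(1 - p_cell_fails())   # 5/9: at least one allele is productive
```

Under those assumptions a little under half of developing lymphocytes are lost at this one locus before selection even starts, which is the damage budget the blocker is asking about.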
Suggestions
- The PR body says “we ship it together or we ship none of it.” Agree. Note that this commits us to a long debugging window where the failure modes are diffuse, late, and tissue-specific. Recommend setting expectations that IMMUNO-0001 incidents will show up under unrelated component prefixes (ENDO, NEURO, GI) for the rest of the project. This is not a defect in the filing convention; it is the architecture.
- The maternal–fetal tolerance follow-up will be required before placental viviparity ships. IMMUNO-0007 should not block this PR but it should be filed today against the mammalian lineage, not when we get there. The trophoblast is going to need a non-classical MHC story (HLA-G analog) and we will need to design it before we need it. Filing.
- The RAG-as-domesticated-transposon framing should be in the ADR, not just the PR. There is a story to tell about the eukaryote line acquiring its private-fork policy from a captured mobile element shortly after refusing to participate in the bacterial registry (ADR-008). That parallel is going to come up. I want it on the record now.
Decision
Approving IMMUNO-0001 with blockers (1), (3), (5) filed as follow-up tickets before merge. Blocker (2) accepted as known cost. Blocker (4) carried into IMMUNO-0002.
The autoimmune incident queue is going to grow indefinitely. Approving anyway. We need ~10^18 receptor diversity and there is no other path that gets us there from a finite genome. The integration cost is real but it is amortizable.
Comments
Raphael [~-420,000,000]: The cryptic-RSS translocation rate is actually quite low at the per-cell level, Gabriel. The translocations are a small-numbers problem against a large-numbers denominator. The RAG specificity is tight enough that off-target activity is in the ~10^-6 per-cell-per-division range. The clinical incidence is going to be non-zero but it is not what I would call concerning. It is what I would call the cost of the diversity budget.
Gabriel [~-420,000,000]: The denominator is ~10^11 lymphocytes generated per day at steady state in a large vertebrate. 10^-6 of that is 10^5 events per day. Filing IMMUNO-0006.
Raphael [~-420,000,000]: filed.
Raphael [~-420,000,000]: also for the record the circulatory system is doing a lot of load-bearing work for this PR.
Gabriel [~-420,000,000]: The circulatory system is not in scope for this review, Raphael.
Raphael [~-420,000,000]: preemptively for the record
The Architect [~-420,000,000]: Approved. Note for the record: the innate immune system is not being deprecated. Different layer. Any ticket that arrives proposing to remove TLRs, complement, or phagocyte function “now that we have adaptive” will be closed without review.
Uriel [~-420,000,000]: IMMUNO-0005 opened. IMMUNO-0006 opened. IMMUNO-0007 opened against the mammalian lineage, status Backlog. No further action.
Gabriel [~-300,000]: IMMUNO-0005 is at ~80 distinct autoimmune incident classes. IMMUNO-0006 has separated into the lymphoma sub-registry. IMMUNO-0007 shipped on schedule with the placental mammals and is, against expectations, mostly working. The PR description’s claim that “we ship it together or we ship none of it” has held for 420 My. Closing this review. Receptor diversity estimate at modern scale is ~10^16 per individual at steady state, ~10^18 across an exposure history. The autoreactive base rate is ~3% of naive lymphocytes leaving the thymus. Peripheral tolerance handles most of them. Most.