CELL-2848 — Spliceosome: circular dependency between introns and the machinery that splices them
[CELL-2848] Spliceosome architecture: circular dependency between intron splicing and the machinery that splices introns
Type: Tech Debt
Priority: P2
Reporter: Gabriel
Assignee: Unassigned
Status: Open
Created: ~-1,900,000,000
Sprint: Backlog
Component: nuclear / RNA processing
Related: ADR-006 (eukaryote acquisition / introduced introns)
Description
The eukaryote refactor (ADR-006) imported group II self-splicing introns from the alphaproteobacterial acquisition. These were initially autocatalytic — the intron RNA itself catalyzed its own removal from the pre-mRNA via a two-step transesterification through a 2’-5’ lariat intermediate. The catalytic activity lived in the intron’s tertiary fold (domain V coordinates the catalytic metal; domain VI provides the bulged adenosine that attacks the 5’ splice site). It worked. It was self-contained. We were content.
Group II introns are now degenerating across the genome. Their tertiary structure is being lost to drift. Their autocatalytic activity is moving out of the intron RNA and into a trans-acting machine assembled from:
- 5 small nuclear RNAs (U1, U2, U4, U5, U6) — currently performing the catalytic and recognition roles formerly performed by the intron itself.
- ~50 protein components today. Forecast: ~150 at maturity.
- Auxiliary factors (SR proteins, hnRNPs) — count is climbing.
This trans-acting machine is the spliceosome.
The dependency graph is circular:
- The introns exist because the spliceosome can remove them. (Without splicing, the intron sequence in the middle of every protein-coding gene would be translated as garbage.)
- The spliceosome exists because the introns exist and need removal. (Nothing else in the cell needs U1 snRNP.)
- The spliceosome’s own protein components are themselves encoded by genes that contain introns, which must be spliced by the spliceosome before the spliceosome’s components can be translated.
We have built a system that requires itself to function in order to be expressed.
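The cycle described above can be sketched as a small dependency graph with a depth-first search for a back edge. This is only an illustration: the node names (`introns_tolerated`, `spliced_spliceosome_mrnas`, and so on) are hypothetical labels for the bullets above, not identifiers from any real system.

```python
# Hypothetical dependency graph: each key depends on the entries in its list.
# Node names are illustrative labels for the bullets above.
DEPENDS_ON = {
    "introns_tolerated": ["spliceosome"],                 # introns are survivable only if removed
    "spliceosome": ["spliceosome_proteins"],              # the machine is assembled from its proteins
    "spliceosome_proteins": ["spliced_spliceosome_mrnas"],  # proteins are translated from spliced mRNAs
    "spliced_spliceosome_mrnas": ["spliceosome"],         # those mRNAs must be spliced first
}


def find_cycle(graph, start):
    """Return the first dependency cycle reachable from start, or None."""
    path = []        # current DFS path, in order
    on_path = set()  # same nodes as path, for O(1) membership checks
    visited = set()  # nodes already fully or partially explored

    def visit(node):
        if node in on_path:
            # Back edge: the cycle runs from the earlier occurrence to here.
            return path[path.index(node):] + [node]
        if node in visited:
            return None
        visited.add(node)
        path.append(node)
        on_path.add(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        return None

    return visit(start)


cycle = find_cycle(DEPENDS_ON, "introns_tolerated")
print(" -> ".join(cycle))
```

Starting from `introns_tolerated`, the search reports the loop that begins and ends at `spliceosome`. A build system would refuse to resolve a graph like this. The cell resolves it by always inheriting a working spliceosome from its parent.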
We have done this on purpose. Once group II self-splicing degraded, only two paths remained: remove every intron (see “Why This Is Open Indefinitely” below) or build a trans-acting splicing machine.
Cost
Per pre-mRNA molecule:
- 5’ splice site recognition (GU dinucleotide; U1 snRNP base-pairs with it)
- Branch point recognition (U2 snRNP base-pairs with the branch sequence; the bulged adenosine 2’-OH is the nucleophile)
- 3’ splice site recognition (AG dinucleotide; U2AF participates)
- Recruitment of the U4/U6.U5 tri-snRNP
- A major conformational rearrangement: U6 displaces U1 at the 5’ splice site and U1 leaves, U4 dissociates from U6, and U6 base-pairs with U2 to form the catalytic core
- Catalysis: two transesterifications, lariat formation, exon ligation
- Disassembly and recycling of every component
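As a rough illustration of the recognition and catalysis steps listed above, the toy function below splices a single intron out of an RNA string. It is a sketch under heavy assumptions. The splice-site coordinates are passed in by the caller rather than found by snRNA base-pairing, the function name `splice` and its parameters are hypothetical, and no assembly, rearrangement, or energy cost is modeled.

```python
def splice(pre_mrna, intron_start, branch_index, intron_end):
    """Toy two-step splicing of one intron from an RNA string.

    intron_start: index of the G in the 5' GU dinucleotide
    branch_index: index of the branch-point adenosine
    intron_end:   index one past the G in the 3' AG dinucleotide
    Returns (mature_mrna, excised_intron).
    """
    intron = pre_mrna[intron_start:intron_end]

    # Recognition checks, standing in for U1, U2, and U2AF.
    if not intron.startswith("GU"):
        raise ValueError("5' splice site is not GU")
    if not intron.endswith("AG"):
        raise ValueError("3' splice site is not AG")
    if not intron_start < branch_index < intron_end - 2:
        raise ValueError("branch point must lie inside the intron")
    if pre_mrna[branch_index] != "A":
        raise ValueError("branch point is not an adenosine")

    # Step 1: the branch adenosine 2'-OH attacks the 5' splice site.
    # Exon 1 is freed; the intron (now a lariat) is still attached to exon 2.
    exon1 = pre_mrna[:intron_start]
    lariat_intermediate = pre_mrna[intron_start:]

    # Step 2: the exon 1 3'-OH attacks the 3' splice site.
    # The exons are ligated and the lariat intron is released.
    intron_length = intron_end - intron_start
    lariat = lariat_intermediate[:intron_length]
    exon2 = lariat_intermediate[intron_length:]
    return exon1 + exon2, lariat


# Made-up sequence: exon 1, one intron, exon 2.
pre = "AUGGCC" + "GUAAGUCUAACUUUCAG" + "GGAUAA"
mature, lariat = splice(pre, intron_start=6, branch_index=15, intron_end=23)
print(mature)
print(lariat)
```

The string version takes three checks and two slices. The cost itemized in the list above comes from everything this sketch leaves out: locating the sites, assembling the machine, and taking it apart again afterwards.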
Per protein-coding gene: from a few to hundreds of these events, in correct order, with correct splice site selection. Alternative splicing (out of scope here, see CELL-2871) makes correct selection context-dependent.
ATP/GTP cost per splicing event is significant (DEAH/DEAD-box helicases drive the rearrangements). The full energetic accounting has not been done. Raphael volunteered.
Why This Is Open Indefinitely
Three options were considered:
- Remove the introns from the genome. Cannot. Many introns contain regulatory elements (enhancers, snoRNA host sequences, miRNA precursors). Many introns are required for nuclear export of the mature mRNA via the exon junction complex. Some introns encode functional non-coding RNAs in their own right. There is no safe deletion path. Some introns may be doing something we have not characterized yet, in a tissue type that does not exist yet, at a developmental stage that has not been tested.
- Restore group II self-splicing. The tertiary fold has been lost in nearly all introns. Restoration would require re-evolving the catalytic structure on a per-intron basis. Sequence has diverged past the point where this is feasible.
- Replace the spliceosome with a simpler trans-acting machine. No candidate exists. The spliceosome’s flexibility (alternative splicing, recursive splicing, trans-splicing in some lineages) is load-bearing. A simpler machine would lose capabilities that downstream code already depends on. Downstream code does not exist yet but it is going to.
Acceptance Criteria
- Eliminate the circular dependency between introns and the spliceosome
- Reduce spliceosome component count to <20 proteins
- Migrate splicing logic to a non-self-dependent system
These will not be met. The ticket is being filed to track the debt, not to resolve it.
Comments
Gabriel [~-1,900,000,000]: Filing this so we have a record. The spliceosome is going to grow. I want it on the books.
Raphael [~-1,850,000,000]: The U4/U6 base-pairing is actually quite elegant, Gabriel. The disassembly mechanism is going to be interesting once we work it out. Adding ~30 more proteins this sprint.
Gabriel [~-1,850,000,000]: The component count was 50 last sprint.
Raphael [~-1,850,000,000]: It was 50. It will be 80. Then 150. The trajectory is not what I would call concerning. It is what I would call necessary.
Uriel [~-1,200,000,000]: Marked as P2. No status change.
Uriel [~-538,000,000]: Marked as P2. No status change.
Uriel [~-66,000,000]: Marked as P2. No status change.
Gabriel [~-300,000]: Spliceosome is at ~150 proteins, 5 snRNAs. Alternative splicing is now responsible for the majority of human proteome diversity. The dependency is more circular than when filed. Status remains Open. Priority remains P2.
The Architect [~-300,000]: P2 is correct. It works. Not touching it.