Tiramisù Calculator · Experimental Validation Plan · v1

Validation Test Path

A staged experimental program to confirm the model's predictions about shelf-life-limiting compartments, hygiene leverage, and quantitative dose-response — using a VHP cup sterilizer (or γ-irradiated materials), controlled-bioburden ingredients, sterile filling, and a controlled-environment fill room.

§ 01 Purpose and approach

The strategy report makes specific predictions that are not yet confirmed by published tiramisù-specific data. This document specifies an experimental test path that validates those predictions using the equipment that a competent dairy R&D lab typically has access to. The program is staged so that each tier can be executed independently and the most informative tests come first.

Three tiers, in execution order:

Pass/fail logic for the model as a whole: The model is confirmed for screening-tool use if the realistic-mode predictions fall within the experimental confidence band of the corresponding tests, for the majority of Tier 1 tests. Individual deviations on Tier 2 or Tier 3 tests should be used to calibrate specific parameters (e.g. nutrient_factor) rather than to reject the model. The model is invalidated if multiple Tier 1 tests show the wrong limiter or the wrong relative ordering of failure times.
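The whole-model rule can be written as a small decision function, which helps keep the call mechanical once the Tier 1 results are in. This is a minimal sketch. The `Tier1Result` fields are hypothetical names. "Multiple" is read as two or more, and "majority" is read as a strict majority. The third "inconclusive" outcome is an assumption for cases the rule above does not cover.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tier1Result:
    """Hypothetical summary of one Tier 1 test for the model-level call."""
    test_id: str
    within_band: bool       # realistic-mode prediction inside the experimental confidence band
    limiter_correct: bool   # observed first-failing compartment matches the predicted limiter
    ordering_correct: bool  # observed ranking of arm failure times matches the prediction


def model_verdict(results: list[Tier1Result]) -> str:
    """Apply the whole-model pass/fail rule to the Tier 1 outcomes."""
    # Two or more Tier 1 tests with the wrong limiter or wrong ordering invalidate the model.
    wrong = sum(1 for r in results if not (r.limiter_correct and r.ordering_correct))
    if wrong >= 2:
        return "invalidated"
    # Confirmation needs a strict majority of Tier 1 tests inside the confidence band.
    inside = sum(1 for r in results if r.within_band)
    if inside > len(results) / 2:
        return "confirmed for screening use"
    # Assumed third outcome: neither rule fires, so recalibrate and repeat.
    return "inconclusive: recalibrate and repeat"
```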

§ 02 Materials, equipment, organisms

Cup and lid preparation

Three sterility states for cup interior and lid product-facing surfaces, applied independently:

State | Method | Expected residual CFU/cm² | Use for
Sterile | VHP cycle (35 % H₂O₂ vapour, 30 min, 30 °C dwell) or γ-irradiation (25 kGy from accredited supplier) | < 10⁻³ (effectively zero) | All tests requiring controlled surface bioburden
Inoculated (controlled) | VHP-sterilised cup spray-coated with calibrated suspension of P. roqueforti spores in 0.1 % Tween-80, dried under laminar flow | 10⁻¹ to 10² (target value) | Dose-response tests (T2A), nutrient-factor tests (T2B)
Standard industrial | Cups as received from supplier, no decontamination | 0.05–0.5 mold spores/cm² (measure by swab to confirm) | Baseline / control arm in Tier 1 tests

Bulk ingredients — controlled bioburden

Two ingredient sterility states:

State | Method | Use for
Sterile | UHT mascarpone reconstituted just before fill; pasteurised egg yolk autoclaved 121 °C 15 min; coffee infusion sterile-filtered (0.22 µm); savoiardi γ-irradiated at 10 kGy; cocoa γ-irradiated at 25 kGy | All Tier 1, 2, 3 tests except where ingredient bioburden is the variable
Inoculated | Sterile base + calibrated suspension of Z. bailii (yeast) at 10² CFU/g and/or P. roqueforti spores (mold) at target dose | Bulk-vs-surface tests (T1A), ingredient bioburden tests

Filling environment

Element | Specification
Cleanroom class | ISO 7 (cleanroom) for Tier 1 sterile-arm tests; ISO 8 acceptable for Tier 2/3 if measured airborne < 5 CFU/m³
Filling equipment | Pre-sterilised piston filler (autoclave or VHP cycle) inside laminar-flow hood
Operator garbing | Full sterile suit + gloves + face shield
Settle-plate monitoring | One Sabouraud-Dextrose Agar plate at fill point per session, exposed for 1 min during fill, incubated at 25 °C for 7 days

Challenge organisms

Organism | Strain reference | Cultivation | Inoculum prep
Penicillium roqueforti (worst-case dairy mold) | ATCC 10110 or CECT 2904 (national equivalents acceptable) | PDA, 25 °C, 7 days until heavy sporulation | Harvest conidia in 0.1 % Tween-80 saline, count with haemocytometer, dilute to target dose
Zygosaccharomyces bailii (worst-case dairy yeast) | ATCC 60483 or CECT 1131 | Sabouraud Dextrose Broth, 25 °C, 48 h to stationary phase | Centrifuge, wash, resuspend in sterile saline; plate-count to confirm target dose
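The haemocytometer step reduces to two conversions: counts to a stock concentration, then stock to the dilution that places the target dose in one aliquot. This sketch assumes an improved Neubauer chamber. Each large 1 mm × 1 mm square is 0.1 mm deep, so it holds 0.1 µL. The function names are illustrative.

```python
from statistics import mean

# One large (1 mm x 1 mm) square of an improved Neubauer chamber, 0.1 mm deep,
# holds 0.1 uL = 1e-4 mL.
SQUARE_VOLUME_ML = 1e-4


def conidia_per_ml(square_counts, predilution=1.0):
    """Stock concentration (conidia/mL) from counts in large haemocytometer squares."""
    return mean(square_counts) / SQUARE_VOLUME_ML * predilution


def dilution_for_dose(stock_per_ml, target_per_aliquot, aliquot_ul):
    """Fold-dilution of the stock so that one aliquot carries the target number of conidia."""
    target_per_ml = target_per_aliquot / (aliquot_ul * 1e-3)
    if target_per_ml > stock_per_ml:
        raise ValueError("stock too dilute for this dose; concentrate by centrifugation")
    return stock_per_ml / target_per_ml
```

For example, counts of 52, 48, 55 and 45 conidia per square on a 1:10 predilution give 5 × 10⁶ conidia/mL. Delivering 100 conidia in 5 µL (the T2B inoculum) then needs a 250-fold dilution.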

Endpoint scoring

Each cup is photographed daily at 10× magnification, through the lid if it is transparent, or with the lid briefly removed for the photograph. Endpoint criteria:

All scoring is blinded: cups carry codes that are decoded only at analysis.
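One way to generate the blinding codes is to map every (arm, replicate) label to a random code and keep the key with someone who does not score. This is a minimal sketch. The 4-character format and the seeded generator are choices made here, not part of the plan. A seed makes the key reproducible for audit, but it must then be kept as confidential as the key itself.

```python
import random

# Alphabet without I, O, 0 and 1 to avoid misreading codes on cup labels.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def make_blinding_key(cup_labels, seed=None, length=4):
    """Assign a unique random code to each cup label.

    Returns (codes_in_random_order, key), where key maps code -> original label.
    Scorers only ever see the codes.
    """
    rng = random.Random(seed)
    key = {}
    for label in cup_labels:
        while True:
            code = "".join(rng.choice(ALPHABET) for _ in range(length))
            if code not in key:  # regenerate on the rare collision
                break
        key[code] = label
    order = list(key)
    rng.shuffle(order)  # fill/scoring order independent of arm
    return order, key
```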

§ 03 Test inventory — summary table

Test | Validates | Replicates | Duration | Tier
T1A — Bulk vs surface failure | Limiter is surface, not bulk | 4 × 20 cups | 60 d | Tier 1
T1B — Cocoa-quality saturation | P13/P14/P15 prediction | 3 × 6 cups | 45 d | Tier 1
T1C — Hygiene composite | All 3 hygiene variables matter (P03 prediction) | 4 × 5 cups | 30 d | Tier 1
T2A — Airborne dose-response | t_visible(N) ≈ t_single − σ_germ × ln(N) | 5 × 6 cups | 30 d | Tier 2
T2B — Nutrient-factor calibration | Lid < rim < bulk growth rates | 3 × 6 cups | 30 d | Tier 2
T2C — Temperature γ_T check | Cardinal-parameter model for mold | 3 × 6 cups | 30 d | Tier 2
T3A — Biscuit pore air | Vacuum-impregnation >> standard soak under N₂ flush | 2 × 6 cups | 60 d | Tier 3
T3B — Marsala vapour effect | Vapour reaches surfaces, not just direct contact | 3 × 6 cups | 45 d | Tier 3
T3C — Rim → cocoa drip-flux | 5 % of rim spores transfer at t = 0 | 3 × 8 cups | 30 d | Tier 3

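Total consumption: ~290 cups. Run strictly one after another, the test durations alone add up to roughly a year of incubation. Run fully in parallel with adequate incubator capacity, the program is bounded by the two 60-day tests (T1A, T3A), i.e. about 9 weeks plus preparation. Each Tier 1 test requires roughly 30-40 hours of analyst time including prep, fill, scoring, and final enumeration.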
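The timing and cup figures can be tallied directly from the per-test arm counts and durations. This sketch uses the 80 cups stated in the T1A protocol and counts the 18 T2B chambers as cups. It covers incubation time only; prep and final enumeration add to it. The arm-by-arm sum comes to 238. The ~290 figure presumably also covers sacrificial, baseline-swab and spare cups, which the plan does not itemise.

```python
# Cups (or chambers) and incubation days per test, taken from the test descriptions.
TESTS = {
    "T1A": (80, 60), "T1B": (18, 45), "T1C": (20, 30),
    "T2A": (30, 30), "T2B": (18, 30), "T2C": (18, 30),
    "T3A": (12, 60), "T3B": (18, 45), "T3C": (24, 30),
}


def programme_summary(tests):
    """Return (total cups, serial incubation days, fully parallel incubation days)."""
    cups = sum(n for n, _ in tests.values())
    serial_days = sum(d for _, d in tests.values())    # one test after another
    parallel_days = max(d for _, d in tests.values())  # all tests started together
    return cups, serial_days, parallel_days
```

`programme_summary(TESTS)` returns `(238, 360, 60)`. The staged schedule in § 07 falls between the fully serial and fully parallel extremes.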

§ 04 Tier 1 — Core model assumptions

T1A
Limiter identification — bulk vs surface failure
Tier 1 · Foundational

What this tests

The model's most important claim: that for a hygienically-made tiramisù the limiter is a surface compartment (cup_rim, lid, or cocoa_surface) and not the bulk. If this claim is wrong, the entire intervention strategy is wrong.

Design — 2 × 2 factorial

Arm | Bulk ingredients | Surfaces (cup, lid, cocoa, rim) | Predicted limiter | Predicted t_fail (realistic)
A1 · Both sterile | Sterile | VHP-sterilised | None within 60 d | > 60 d
A2 · Surfaces dirty, bulk sterile | Sterile | Standard industrial (untreated) | cup_rim or lid | ~17 d (≈ P00b)
A3 · Surfaces sterile, bulk dirty | Inoculated with Z. bailii at 10² CFU/g | VHP-sterilised | top_cream or bottom_cream (yeast) | ~25-30 d
A4 · Both dirty | Inoculated as A3 | Standard industrial | cup_rim or lid (surface still faster) | ~17 d

Protocol

Prepare 20 cups per arm (4 arms, 80 cups in total). Cups are 100 g, transparent PP for visual scoring through the side wall.
Apply VHP cycle (35 % H₂O₂, 30 min, 30 °C dwell) to half the cups + lids; verify by swab → DRBC (target < 1 CFU per cup).
For dirty surface arm: leave cups as received from supplier; measure baseline by swabbing 3 cups → expect 0.05–0.5 mold/cm².
Prepare sterile and inoculated bulk ingredient sets per § 02.
Fill all 80 cups in ISO 7 cleanroom under laminar flow, single fill session, single batch per arm.
Seal lids (no flush; ambient air) and store at 4 °C ± 0.5 °C.
Photograph daily through transparent walls; score first-visible mold and first-yeast indication (pH, off-odour through a small sniff-port if equipped, or sacrificial cup at days 7, 14, 21, 28).
At each cup's failure point, enumerate yeast and mold separately on the cream surface, on the cocoa, and on the cup rim (sterile swab + DRBC).

Pass criteria

The model is CONFIRMED for limiter identification if:

  • Arm A1 shows no growth within 60 d (validates sterile-baseline; absence of contamination during fill).
  • Arm A2 fails at 14-22 d with mold on cup_rim or lid as the first visible signal.
  • Arm A3 fails with yeast in the cream (log CFU/g ≥ 5) before visible mold on any surface.
  • Arm A4 fails at a time within 20 % of A2 (surface still dominates even with dirty bulk).

The model is REJECTED if A1 fails on its own (process contamination dominates) or A3 fails on surfaces despite sterile cups and lids (which would mean the surface-bioburden model is wrong).
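The T1A criteria can be applied to per-arm summaries. This is a minimal sketch. It assumes each arm is summarised by its median fail day and by the site of the first failure. The site labels are hypothetical strings: `"cream_yeast"` stands for yeast reaching 5 log CFU/g in the cream before any surface mold.

```python
SURFACES = {"cup_rim", "lid", "cocoa_surface"}  # hypothetical site labels


def t1a_verdict(a1_any_growth, a2_median_day, a2_first_site,
                a3_first_site, a4_median_day):
    """Apply the T1A confirm/reject criteria to per-arm summaries."""
    # Rejection conditions first: contamination in the all-sterile arm, or
    # surface failure in A3 despite sterile cups and lids.
    if a1_any_growth or a3_first_site in SURFACES:
        return "rejected"
    a2_ok = 14 <= a2_median_day <= 22 and a2_first_site in {"cup_rim", "lid"}
    a3_ok = a3_first_site == "cream_yeast"
    a4_ok = abs(a4_median_day - a2_median_day) <= 0.20 * a2_median_day
    return "confirmed" if (a2_ok and a3_ok and a4_ok) else "not confirmed: check each arm"
```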

Calibration outputs

Even if the model passes, the exact A2 fail time provides the calibration value for the realistic-mode nutrient_factor on the rim. Reset that parameter so the model predicts A2's measured value; rerun the strategy report.
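Resetting the rim nutrient_factor to match A2 is a one-dimensional root-finding problem, and bisection handles it without any knowledge of the model's internals. This sketch assumes a hypothetical wrapper, `predict_t_fail(nf)`, that runs the calculator for the A2 scenario with the rim nutrient_factor set to `nf` and returns the predicted failure day. The plan does not show such a function. The sketch also assumes the predicted fail time falls monotonically as nutrient_factor rises.

```python
from typing import Callable


def calibrate_nutrient_factor(predict_t_fail: Callable[[float], float],
                              measured_days: float,
                              lo: float = 0.01, hi: float = 1.0,
                              tol: float = 1e-4) -> float:
    """Bisection for the nutrient_factor at which the model reproduces measured_days."""
    f_lo = predict_t_fail(lo) - measured_days
    f_hi = predict_t_fail(hi) - measured_days
    if f_lo * f_hi > 0:
        raise ValueError("measured fail time is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = predict_t_fail(mid) - measured_days
        if f_lo * f_mid <= 0:   # sign change in [lo, mid]: keep the lower half
            hi = mid
        else:                   # otherwise the root is in [mid, hi]
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

With a toy stand-in model `lambda nf: 8.5 / nf` (17 d at the default 0.5), a measured A2 fail time of 20 d calibrates the rim factor to about 0.425.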

T1B
Cocoa-quality binary saturation
Tier 1 · Foundational

What this tests

The strategy report's strongest practical claim: cocoa quality is a binary lever. Moving from untreated to steam-treated cocoa extends shelf life; moving from steam-treated to sterilised cocoa does not. If this is wrong, the extra cost of sterilised cocoa could be justified.

Design — three arms (replicates of P13, P14, P15)

Arm | Cocoa (CFU/g) | Source | Predicted limiter | Predicted t_fail (realistic)
B1 — untreated | ~3000 yeast / 300 mold | As supplied by typical cocoa wholesaler | cocoa_surface (mold) | ~14 d (≈ P13)
B2 — steam-treated | ~100 yeast / 30 mold | NPC, Granocacao, or similar steam-treated grade | cup_rim (mold) | ~21 d (≈ P14)
B3 — γ-irradiated (model "sterilised") | < 10 / < 1 | Sample of B2 cocoa irradiated at 25 kGy | cup_rim (mold) | ~21 d (≈ P15)

Protocol

All other variables held at "clean baseline" (P04): VHP-sterilised cups and lids, sterile bulk ingredients except cocoa, ISO 7 fill. Six cups per arm.

Pass criteria

The model is CONFIRMED if:

  • B1 fail time is significantly shorter (Δ ≥ 5 d) than B2 — confirming untreated cocoa is the limiter when present.
  • B2 fail time is within 3 d of B3 — confirming saturation (further cocoa cleaning buys nothing).
  • B1 limiter is identified as cocoa-surface mold; B2 and B3 limiters are rim or lid.

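Sample-size note: with ~25-30 % CV typical in challenge tests, the pooled SD for B1 (~14 d) vs B2 (~21 d) is roughly 4.5-5.5 d, so a 5-day difference is an effect of about 1 SD. At n = 6 per arm, a two-sided two-sample t-test at the 0.05 level then has only about 30-40 % power. Reaching ~80 % power needs roughly 14-19 cups per arm, or, with n = 6, a true difference of roughly 8-10 d.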
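The power of the B1 vs B2 comparison can be checked by simulation without a t-table. This sketch assumes normally distributed failure times with a common SD in both arms. That is an assumption; challenge-test data are often right-skewed. The critical value is taken from the simulated null distribution of |t|, so the only approximation is Monte Carlo error, roughly ±0.01 at 20 000 runs.

```python
import math
import random


def t_stat(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss / (nx + ny - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def simulated_power(n, diff_days, sd_days, alpha=0.05, sims=20000, seed=0):
    """Monte Carlo power of a two-sided pooled t-test with n cups per arm."""
    rng = random.Random(seed)

    def abs_t(shift):
        x = [rng.gauss(shift, sd_days) for _ in range(n)]
        y = [rng.gauss(0.0, sd_days) for _ in range(n)]
        return abs(t_stat(x, y))

    # Two-sided critical value: the (1 - alpha) quantile of |t| under no difference.
    null = sorted(abs_t(0.0) for _ in range(sims))
    crit = null[int((1 - alpha) * sims)]
    return sum(abs_t(diff_days) > crit for _ in range(sims)) / sims
```

With a 5 d difference and SD = 5 d, `simulated_power(6, 5.0, 5.0)` comes out near one third, while 18 cups per arm gives roughly 80 %.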

T1C
Hygiene composite — single-variable improvement insufficient
Tier 1 · Foundational

What this tests

The "fix all three hygiene variables together" claim (P03 vs P02 in the strategy report). If controlling only airborne deposition, or only the cups and lids, gives little benefit, the recommendation is justified; if either alone helps substantially, the model's area-weighted distribution of airborne deposition is wrong.

Design — four arms

Arm | Airborne | Cups | Lids | Predicted t_fail (realistic)
C1 · All dirty | Open lab (~100 spores/cup) | As supplied | As supplied | ~12 d (≈ P02)
C2 · Air only clean | ISO 7 cleanroom | As supplied | As supplied | ~14 d (≈ P03; only modest improvement)
C3 · Cups + lids only clean | Open lab | VHP-sterilised | VHP-sterilised | ~14 d (similar to C2 by symmetry)
C4 · All three clean | ISO 7 cleanroom | VHP-sterilised | VHP-sterilised | ~21 d (≈ P04)

Protocol

All four arms use sterile bulk ingredients, steam-treated cocoa (to remove cocoa as a confounding variable), no Marsala.
"Open lab" airborne = standard lab area, not cleanroom; measure baseline by 1-min settle plate next to fill point during each session.
5 cups per arm (n = 20 total).
Score first-visible mold daily for 30 days at 4 °C.

Pass criteria

The model is CONFIRMED if:

  • C1 fails at the predicted time (~10-14 d). Establishes baseline.
  • C2 and C3 both improve over C1 by only 1-3 d (single-variable improvement is small).
  • C4 improves over C1 by 5-10 d (composite hygiene matters).
  • C2 ≈ C3 within experimental error (no large asymmetry between the two hygiene routes).

The model is REJECTED if C2 or C3 alone matches the C4 improvement. That would mean the model's area-weighted deposition assumption is wrong and that, in reality, either the airborne or the cup pathway dominates the other.

§ 05 Tier 2 — Quantitative parameter checks

T2A
Airborne dose-response curve
Tier 2 · Quantitative

What this tests

The model's quantitative dose-response claim (§ 09 of the strategy report): t_visible(N) ≈ t_single − σ_germ × ln(N) with σ_germ ≈ 0.13 × t_single in realistic mode. Five dose levels confirm both the shape and the slope.

Design — five doses, n = 6 each

Arm | Mold spores per cup (deposited) | Predicted realistic t_fail (d)
D1 | 1 | ~22
D2 | 10 | ~19
D3 | 100 | ~14
D4 | 1 000 | ~9
D5 | 10 000 | ~5

Protocol

All 30 cups use VHP-sterilised packaging, sterile bulk ingredients, steam-treated cocoa, no Marsala. Filled in ISO 7 cleanroom.
Before sealing each cup, spray-deposit a calibrated suspension of P. roqueforti conidia (in 50 µL of 0.1 % Tween-80) onto the cocoa surface using a micro-atomiser. Calibrate the spore concentration by direct count with plate verification, then prepare a 10-fold dilution series covering the five dose levels.
Photograph daily, score first visible colony.
Fit measured t_fail against ln(N) by linear regression. The expected slope is approximately −σ_germ, i.e. about −2.5 d per unit of ln(N) (≈ −5.8 d per 10-fold increase in dose) in realistic mode.
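The working concentrations for the five doses follow directly from the 50 µL deposit volume. At the lowest doses, Poisson sampling also matters. This sketch assumes a well-mixed suspension, so the number of conidia in an aliquot is Poisson-distributed around the nominal dose. Under that assumption, about 37 % of nominal 1-spore (D1) cups would receive no conidium at all. It may be worth accounting for this when interpreting D1.

```python
import math

DOSES = [1, 10, 100, 1_000, 10_000]  # nominal conidia per cup, arms D1-D5
ALIQUOT_UL = 50.0                    # deposited volume per cup


def dilution_series(doses, aliquot_ul):
    """Working-suspension concentration (conidia/mL) needed for each dose."""
    return [d / (aliquot_ul * 1e-3) for d in doses]


def p_zero_spores(mean_dose):
    """Poisson probability that an aliquot with this mean carries no conidium."""
    return math.exp(-mean_dose)
```

`dilution_series(DOSES, ALIQUOT_UL)` gives 20, 200, 2 000, 20 000 and 200 000 conidia/mL, each a 10-fold step from the previous one.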

Pass criteria

The model is CONFIRMED if:

  • Mean t_fail decreases monotonically with increasing dose.
  • Regression slope of t_fail vs ln(N) is within ±50 % of −2.5 d per unit ln(N) (i.e. between −1.3 and −3.8).
  • D5 t_fail is not less than 2 d (the model's 30 % floor).
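The regression and the three pass criteria fit in a few lines. This sketch reads the −2.5 target as days per unit ln(N), because a regression on ln(N) returns a slope in those units. It also takes `sigma_germ_frac` to be σ_germ / t_single, inferred from "σ_germ ≈ 0.13 × t_single". The usage example feeds in the predicted means from the dose table as placeholder input, not as measured data.

```python
import math


def fit_dose_response(doses, t_fail_days):
    """Least-squares fit of t_fail = t_single - sigma_germ * ln(N)."""
    x = [math.log(n) for n in doses]
    xm = sum(x) / len(x)
    ym = sum(t_fail_days) / len(t_fail_days)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, t_fail_days))
    slope = sxy / sxx            # days per unit ln(N); estimates -sigma_germ
    intercept = ym - slope * xm  # fitted t_fail at N = 1, i.e. t_single
    return slope, intercept


def t2a_checks(doses, t_fail_days, target_slope=-2.5, tol=0.5, floor_days=2.0):
    """Apply the T2A criteria; doses must be in ascending order."""
    slope, t_single = fit_dose_response(doses, t_fail_days)
    monotonic = all(b < a for a, b in zip(t_fail_days, t_fail_days[1:]))
    slope_ok = abs(slope - target_slope) <= tol * abs(target_slope)
    floor_ok = t_fail_days[-1] >= floor_days
    return {"slope": slope, "t_single": t_single,
            "sigma_germ_frac": -slope / t_single,
            "pass": monotonic and slope_ok and floor_ok}
```

For the predicted means (22, 19, 14, 9, 5 d), the fitted slope is about −1.91 d per unit ln(N) with an intercept of 22.6 d. That slope is inside the ±50 % window.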

Calibration outputs

The negative of the fitted slope estimates σ_germ (in days) for your strain and matrix. Divide it by the fitted intercept (t_fail at N = 1, i.e. t_single) to obtain the value for the model's sigma_germ_frac in realistic mode.

T2B
Nutrient-factor calibration on package surfaces
Tier 2 · Quantitative

What this tests

The most uncertain parameter set in the model: nutrient_factor for lid (0.3), rim (0.5), and bulk (1.0). These weren't validated by tiramisù-specific data when the model was built (strategy report § 10 acknowledges this).

Design — three substrate types in identical environment

Arm | Substrate (sterile, identical area = 4 cm²) | Predicted relative growth rate
N1 | Sterile mascarpone cream (bulk dairy proxy) | 1.0 (reference)
N2 | Sterile PP coupon coated with sterile condensate from headspace above N1 sample, refreshed daily (lid proxy) | ~0.3
N3 | Sterile PP coupon mounted vertically above N1 sample with periodic agitation to simulate splatter (rim proxy) | ~0.5

Protocol

Prepare 18 small Petri-style chambers: 4 cm² substrate in a sealed 50 mL container with ~10 mL headspace.
Inoculate each substrate centrally with 100 P. roqueforti conidia in 5 µL 0.1 % Tween.
Incubate at 25 °C (accelerated, to get faster signal in 14 d).
Daily photographs at 10×. Measure colony diameter at days 5, 7, 10, 14.
For each arm, fit colony radius vs time to extract Kr (mm/day).
Compute ratios Kr(N2)/Kr(N1) and Kr(N3)/Kr(N1). These are the true nutrient_factor values for your matrix.

Pass criteria

This is a calibration test, not pass/fail. Output: numerical nutrient_factor for lid and rim (each as a ratio relative to the bulk cream). Compare to model defaults (0.3 and 0.5) and update accordingly.

The model is broadly correct if the measured values fall in [0.1, 0.6] for lid and [0.3, 0.8] for rim. Outside these ranges, re-examine the assumed deposition physics.
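Extracting Kr and the two ratios is a straight-line fit of radius against time. This sketch assumes all four measurement days (5, 7, 10, 14) fall in the linear, post-lag phase of colony expansion. If early points still show lag, drop them before fitting. The usage example uses invented diameters.

```python
def radial_growth_rate(days, diameters_mm):
    """Kr (mm/day): least-squares slope of colony radius against time."""
    radii = [d / 2 for d in diameters_mm]
    tm = sum(days) / len(days)
    rm = sum(radii) / len(radii)
    sxx = sum((t - tm) ** 2 for t in days)
    sxy = sum((t - tm) * (r - rm) for t, r in zip(days, radii))
    return sxy / sxx


def nutrient_factors(kr_bulk, kr_lid, kr_rim):
    """Lid and rim nutrient_factor as ratios to the bulk cream, with range checks."""
    lid, rim = kr_lid / kr_bulk, kr_rim / kr_bulk
    return {"lid": lid, "rim": rim,
            "lid_in_range": 0.1 <= lid <= 0.6,
            "rim_in_range": 0.3 <= rim <= 0.8}
```

For instance, bulk-cream diameters of 4, 8, 14 and 22 mm on days 5, 7, 10 and 14 give Kr = 1.0 mm/day. Lid and rim colonies growing at 0.3 and 0.5 mm/day would then reproduce the model defaults exactly.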

T2C
Temperature cardinal-parameter check
Tier 2 · Quantitative

What this tests

The model uses the Rosso/Zwietering cardinal-parameter γ_T with Tmin = −2 °C and Topt = 25 °C for P. roqueforti. Check that the predicted ratios of growth rate at 4, 8 and 12 °C match measurements on your specific strain.

Design — same matrix (sterile cream cups, fixed inoculum 100 spores), three temperatures

Arm | Temperature | Predicted γ_T | Predicted relative t_fail vs T1 (4 °C)
T1 | 4 °C ± 0.5 | 0.052 | 1.0×
T2 | 8 °C ± 0.5 | 0.139 | 0.37× (faster)
T3 | 12 °C ± 0.5 | 0.281 | 0.18× (much faster)

Protocol

Six cups per temperature, all otherwise identical (sterile packaging, sterile cream, 100-spore inoculum on cocoa surface). Three calibrated incubators with continuous logging. Score daily for 30 days.

Pass criteria

The model is CONFIRMED if measured t_fail ratios are within ±30 % of predicted ratios.
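The relative-t_fail column follows from the γ_T column if failure time scales as 1/γ_T. This sketch uses that scaling, and it reproduces the 0.37× and 0.18× entries. It then applies the ±30 % criterion to measured median fail days. The measured values in the usage example are invented.

```python
# Predicted gamma_T from the T2C design table, keyed by temperature (deg C).
GAMMA_T = {4: 0.052, 8: 0.139, 12: 0.281}


def predicted_ratios(gamma, ref=4):
    """t_fail(T) / t_fail(ref), assuming t_fail scales as 1 / gamma_T."""
    return {t: gamma[ref] / g for t, g in gamma.items()}


def t2c_pass(measured_median_days, gamma=GAMMA_T, ref=4, tol=0.30):
    """True if every measured ratio is within +/- tol of its predicted ratio."""
    pred = predicted_ratios(gamma, ref)
    ref_days = measured_median_days[ref]
    return all(abs(measured_median_days[t] / ref_days - pred[t]) <= tol * pred[t]
               for t in pred if t != ref)
```

Medians of 25, 9.8 and 4.8 d at 4, 8 and 12 °C would pass. A median of 18 d at 8 °C (ratio 0.72 against a predicted 0.37) would not.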

This is a sensitive check because the cardinal model is among the best-validated components of predictive microbiology; if it fails here, the cause is more likely something specific to your strain or matrix than the general formalism.

§ 06 Tier 3 — Specific mechanisms

T3A
Biscuit pore air defeats MAP — vacuum impregnation vs standard soak
Tier 3 · Mechanism

What this tests

The strategy report's most economically consequential mechanism claim: an Al-foil lid + pure N₂ flush gives ~15 d if the biscuit is standard-soaked but >60 d if vacuum-impregnated. The trapped pore air carries enough O₂ to support surface mold for weeks.

Design — two arms × six cups

Arm | Biscuit treatment | Predicted realistic t_fail
P1 | Standard 5 s espresso dip (residual pore-air fraction ~50 %) | ~25-30 d
P2 | Vacuum impregnation: biscuit + sterile coffee in vacuum chamber, 15 min at 20 mbar, vent slowly (pore air ~5 %) | > 60 d

Protocol

All other variables identical: VHP-sterile cups + Al-foil lids, sterile cream, ISO 7 fill, pure N₂ flush at sealing.
Inoculate cocoa surface with 100 P. roqueforti spores (controlled challenge).
Optional: measure headspace O₂ at days 0, 7, 14, 30 by needle sampling through a self-sealing septum patch.
Score first-visible mold daily.

Pass criteria

The model is CONFIRMED if:

  • P1 fails between 18 and 40 d.
  • P2 shows no visible mold within 60 d.
  • Headspace O₂ in P1 starts at ≥ 0.5 % at day 0; P2 starts at < 0.1 %.

If P2 also fails within 30 d, either the biscuit didn't impregnate as expected (measure residual pore air directly by mass uptake) or there is a continuous biscuit-headspace flux that the model doesn't capture (strategy report § 10 limitation).
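Residual pore air can be estimated from mass uptake by comparing a test biscuit's liquid uptake with that of a reference biscuit taken to complete impregnation. This sketch assumes the reference comes from the same lot and has been put through repeated vacuum cycles until its mass stops rising. It also assumes negligible solids loss during soaking and the same liquid for both. Uptake is normalised per gram of dry biscuit so that biscuits of different sizes can be compared.

```python
def residual_pore_air(dry_g, soaked_g, ref_dry_g, ref_saturated_g):
    """Fraction of biscuit pore volume still air-filled after soaking."""
    uptake = (soaked_g - dry_g) / dry_g                 # g liquid per g dry biscuit
    full = (ref_saturated_g - ref_dry_g) / ref_dry_g    # same, at complete impregnation
    if full <= 0:
        raise ValueError("reference biscuit shows no uptake")
    return 1.0 - uptake / full
```

With a reference uptake of 1.5 g/g, a dipped biscuit taking up 0.75 g/g would still hold ~50 % pore air (the P1 assumption). One taking up 1.425 g/g would hold ~5 % (the P2 assumption).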

T3B
Marsala vapour effect on sterile surfaces
Tier 3 · Mechanism

What this tests

Marsala's mechanism in the model is vapour-phase ethanol partition into the headspace (Henry's law K_LV ≈ 0.05 at 4 °C, Dantigny 2005 inhibition with MIC = 5 % v/v). At 2 % aqueous biscuit ethanol the predicted γ_EtOH ≈ 0.97 — small but measurable on a sufficiently sensitive test.

Design — three arms × six cups

Arm | Biscuit | Predicted realistic t_fail
M1 — no alcohol | Sterile biscuit + sterile coffee only | ~21 d (≈ P04)
M2 — standard Marsala | Sterile biscuit + sterile coffee + 2 % v/v ethanol | ~24 d (small Marsala benefit)
M3 — heavy Marsala | Sterile biscuit + sterile coffee + 5 % v/v ethanol | ~32 d (strong vapour effect)

Protocol

All cups VHP-sterile, all bulk ingredients sterile except for the biscuit ethanol content. Cocoa surface inoculated with 100 P. roqueforti spores. Filled in ISO 7. Stored at 4 °C, scored daily for 45 d.

Pass criteria

The model is CONFIRMED if M1 < M2 < M3, with M3 − M1 ≥ 7 d and M2 − M1 in the range 1-5 d.

If M2 = M1 within noise (no Marsala benefit at standard concentration), the Henry's-law partition coefficient in the model is wrong or the Dantigny MIC should be lowered.

T3C
Rim → cocoa drip-flux at t = 0
Tier 3 · Mechanism

What this tests

The model assumes 5 % of cup-rim spores transfer to the cocoa at t = 0 via gravity, vibration during transport, and condensation runoff. This is one of the more speculative model assumptions. If wrong by 10×, the limiter ranking between rim and cocoa shifts.

Design — three arms × eight cups

Arm | Rim contamination | Cocoa contamination | Predicted first-visible site
R1 — rim only | 500 P. roqueforti spores on rim (sterile cocoa) | 0 | Rim first, then cocoa later from drip
R2 — cocoa only | 0 (sterile rim) | 25 spores on cocoa (= 5 % of rim count) | Cocoa only, at same time as R1's cocoa colony
R3 — control | 0 | 0 | None within 30 d

Protocol

VHP-sterile cups and lids; sterile bulk ingredients; ISO 7 fill.
Apply spore suspensions to specific sites with a micro-syringe: rim spray before bulk fill (R1), a droplet at the centre of the cocoa layer after dusting (R2), neither (R3).
Simulate transport vibration: 30 s on orbital shaker (150 rpm) immediately after sealing.
Score daily for 30 d. Record location of first visible colony separately from later colonies.

Pass criteria

The model is CONFIRMED if:

  • R1 shows rim colonies first; cocoa colonies appear 3-10 d later (consistent with 5 % drip).
  • R2 shows cocoa colonies at the same time as R1's first cocoa colony (matched dose hypothesis).
  • R3 shows no growth (sterile-fill validation).

The drip fraction can be back-calculated from the time gap between R1's rim-colony and R1's cocoa-colony if the model's t_visible(N) curve is independently calibrated (via T2A). Update rim_drip_fraction accordingly.
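The back-calculation inverts the dose-response relation for the cocoa site. The time-gap version described above also needs a rim-specific t_single, because the rim has a different nutrient_factor. This sketch therefore swaps in a simpler route: it inverts t_visible(N) = t_single − σ_germ × ln(N) using only R1's first cocoa-colony day. It assumes t_single and σ_germ have been calibrated on cocoa in T2A, and that the result lies above the model's floor with N ≥ 1.

```python
import math


def drip_fraction(t_cocoa_days, t_single_days, sigma_germ_days, rim_spores=500):
    """Estimate the rim -> cocoa transfer fraction from R1's first cocoa colony.

    Solves t_cocoa = t_single - sigma_germ * ln(N_cocoa) for N_cocoa and
    divides by the number of spores applied to the rim.
    """
    n_cocoa = math.exp((t_single_days - t_cocoa_days) / sigma_germ_days)
    return n_cocoa / rim_spores
```

As a round-trip check, with t_single = 22 d and σ_germ = 2.5 d, a 5 % drip (25 spores) predicts a first cocoa colony at about 14.0 d. Feeding that day back in returns 0.05.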

§ 07 Execution sequence and decision tree

Execute in three stages. Each stage gates the next: a failed Tier 1 test means the model needs rebuilding before further tests have meaning.

Stage 1 — Foundational (weeks 1-6)

Run T1A, T1B, T1C in parallel if you have the cleanroom capacity; otherwise serial. These collectively use ~120 cups and ~120 analyst-hours. After Stage 1:

Stage 2 — Quantitative calibration (weeks 5-10, overlapping)

Run T2A, T2B, T2C. These produce numerical calibration values that should be fed back into the model's defaults. After Stage 2 the model is locked in for your specific strain + matrix.

Stage 3 — Mechanism confirmation (weeks 9-15, overlapping)

Run T3A, T3B, T3C. These are the most product-specific tests and inform recipe / packaging decisions. If T3A confirms biscuit-pore-air dominance, the recommended capex is vacuum impregnation; if T3B confirms Marsala vapour effect, alcohol-free formulations need a stronger surface-protection compensation strategy.

After full validation: A successful pass of Stage 1 + 2 makes the calculator a defensible screening tool for your products. Stage 3 sharpens the recommendations. Total program: ~15 weeks, ~290 cups, ~400 analyst-hours, estimated direct cost €15-25k excluding equipment depreciation.

§ 08 Statistical and reporting requirements

Sample size and power

Sample sizes specified per test (n = 5-8 per arm) are based on:

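For survival-curve comparison (Kaplan-Meier with log-rank test, the appropriate analysis for first-visibility data), power depends on the number of observed failures rather than on n directly. With n = 6 per arm there are at most 12 events. By Schoenfeld's approximation, that gives only about 35 % power to detect a hazard ratio of 2.5, which needs about 37 events for ~80 % power. With 12 events, ~80 % power is reached only for hazard ratios of about 5. Reduce or expand based on your prior data.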
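Schoenfeld's approximation gives a quick check on log-rank power from the number of events, using only the standard library. This sketch assumes equal allocation between the two arms and a two-sided test, and it ignores the negligible opposite tail. Censored cups (no growth by the end of the test) contribute no events.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def events_needed(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Schoenfeld approximation: total events for a two-sided log-rank test."""
    z = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z ** 2 / (allocation * (1 - allocation) * math.log(hazard_ratio) ** 2)


def logrank_power(events, hazard_ratio, alpha=0.05, allocation=0.5):
    """Approximate power of a two-sided log-rank test given the total events."""
    drift = math.sqrt(events * allocation * (1 - allocation)) * abs(math.log(hazard_ratio))
    return Z.cdf(drift - Z.inv_cdf(1 - alpha / 2))
```

`events_needed(2.5)` is about 37.4, and `logrank_power(12, 2.5)` is about 0.35.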

Reporting template per test

Each completed test should produce a one-page summary including:

  1. Test ID, dates, operator initials, lot numbers of all materials
  2. Settle-plate / swab baseline measurements for each arm (verifying that the intended bioburden was achieved)
  3. Kaplan-Meier curves of time-to-first-visible-mold per arm, with 95 % confidence bands
  4. Final yeast and mold enumeration per compartment (cream, cocoa, rim swab, lid swab)
  5. Photographs of representative endpoints
  6. Pass/fail call against the model's predictions per the test-specific criteria above
  7. Updated parameter recommendations for the model (if any)
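The Kaplan-Meier curves in item 3 can be computed with a short product-limit routine. This sketch returns point estimates only. The 95 % bands (e.g. Greenwood's variance) are not included. Cups still clean at the end of the test are entered as censored at their last scoring day. On tied days, failures are counted before censored cups leave the risk set.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) for time to first visible mold.

    times:  failure day, or last day observed for cups that never failed.
    events: 1 if the cup failed on that day, 0 if censored.
    Returns [(day, S(day))] at each day with at least one failure.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        day = data[i][0]
        failures = removed = 0
        while i < len(data) and data[i][0] == day:  # group tied days
            failures += data[i][1]
            removed += 1
            i += 1
        if failures:
            surv *= 1 - failures / at_risk
            curve.append((day, surv))
        at_risk -= removed  # failures and censored cups leave the risk set
    return curve
```

For six cups failing on days 10, 12, 12 and 15, with two still clean at day 30, the estimate steps to 0.83, 0.50 and 0.33. The median time to first visible mold is therefore 12 d.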

Cross-test consistency checks

Beyond individual test outcomes, check consistency:

Inconsistencies between these points indicate either lot-to-lot variation in materials, undocumented protocol drift, or strain instability — investigate before drawing larger conclusions.

§ 09 What this program does NOT validate

Honest scoping — these are explicitly out of scope:

A validated screening tool plus the items above plus a sensory panel collectively constitute the basis for setting a defensible commercial shelf life. This program addresses only the first.