Scientific plan

The full strategy doc, kept in the repo so it evolves with the work. The status banner below is auto-generated from BUILD-PROGRESS.md on every build.

Build status

17 done · 19 open · 47% complete

Foundations (new — gates AWS deploy + local-dev verification)

  • F.1 ~~Local-dev Docker Compose stack~~ **Reverted.** AWS-only chosen; Docker artifacts removed. Was in cc87ef9, reverted in
  • F.2 `infra/` AWS CDK app skeleton (TypeScript) — empty `Mud2dustCore-dev` stack, `cdk.json` config, `aws-cdk-lib` + `constructs` deps, `cdk:synth` clean, `cdk:bootstrap` script for us-west-2 + us-east-1. _Done in
  • F.3 ~~`docs/local-dev.md`~~ **Reverted.** Replaced by F.4 (AWS-direct dev guide). Was in cb00d70, removed in the dev-mode-sw
  • F.5 `infra/` NEW `Mud2dustDev` stack — Aurora Serverless v2 cluster (Postgres 15, `min_capacity=0` auto-pause, `max_capacity=2`, publicly accessible, SG locked to dev IP), master credentials in Secrets Manager `mu
  • F.4 `docs/dev-setup.md` AWS-direct developer setup: prereqs, AWS + Tiger CLI auth, connecting to Aurora + Timescale Cloud, running sensor-bridge locally, common operations, troubleshooting. _Done in d19bf79._
  • F.6 `docs/deployment.md` step-by-step prod deploy: account split, region rationale, stack catalog, `cdk deploy` per stack, secret rotation, rollback, teardown, cost levers, Phase 4 cutover checklist. _Done in e97e391; updated

Phase 0 scaffolding

  • P0.1 Top-level monorepo scaffolding: `pnpm-workspace.yaml`, `.nvmrc`, `.editorconfig`, `tsconfig.base.json`, root `package.json` workspace config, sub-dir READMEs (`sensor-bridge/`, `pipeline/`, `titiler-deploy/`, `site/`, `infra/`). _Do
  • P0.2 `sensor-bridge` TypeScript + Hono scaffold: `package.json`, `tsconfig.json`, eslint, vitest. Health endpoint + base router. _Done in d55bc47._
  • P0.3 `sensor-bridge` OpenAPI 3.1 spec for v1 endpoints + types via `openapi-typescript` + `/v1/openapi.json` route serving the spec. _Spec in bdc7a5e (`sensor-bridge/openapi/v1.yaml`); type gen + route wiring + tests in 5
  • P0.4 `sensor-bridge` SQL migrations split across Aurora (relational + spatial) and Timescale Cloud (hypertables). Aurora: contributors, stations, boundaries, annotations, samples, collections, events_meta. Timescale: obse
  • P0.5 `sensor-bridge` domain types (re-exported from generated OpenAPI types), trust-model utility (`sensor_class × operator_class × installation_quality → training_weight`), coordinate fuzzing (deterministic seeded jitter
  • P0.6 `sensor-bridge` vendor-adapter framework + registry + generic-json webhook adapter + YDOC adapter (parses VWC/TEMP/EC channel names → soil_vwc/soil_temp/soil_ec at depth) + HMAC verify/sign helper. _Done in 84d6d48._

Phase 1 — Anchor + one satellite layer

  • P1.1 `infra` core resources substantially covered by F.5: Aurora SS v2 + dev S3 buckets (`mud2dust-cogs-dev`, `mud2dust-contributions-dev`) + Secrets Manager namespace (`mud2dust/dev/*` for service creds, `contributor/{id}/*` fo
  • P1.2 `pipeline` Sentinel-1 ingest **as a CLI** — `mud2dust_pipeline` Python package, `CmrClient` against NASA CMR, OPERA RTC-S1 source, EDL auth (`9f576ca`), mosaic + COG writer (`5b5dfb1`), HTTPS downloader + end-to
  • P1.6 (NEW, formerly P1.2b iter 4) `pipeline` Lambda packaging — Dockerfile based on AWS Lambda Python 3.12 with GDAL/rasterio, lambda_handler wrapping the CLI ingest logic, CDK `PipelineStack` with the DockerImageFunction + EventBridg
  • P1.3 `titiler-deploy` container Lambda (Python 3.12 ARM_64) running titiler.core + Mangum, CDK `Mud2dustTitilerStack` with HTTP API + CloudFront + tile-friendly cache policy. Real OPERA σ⁰ COG served as PNG tiles end-to-en
  • P1.4 `site` Next.js 15 App Router scaffold, public landing page at `/`, Mapbox GL JS map (basemap = Mapbox Satellite Streets), **layer-registry pattern** at `site/src/layers/registry.ts` with the σ⁰ entry (raster · **iter 1 done in 215399a:** local Next.js + Mapbox + layer-registry + credits-registry + LayerPanel with Sources block per card and "Built on open data — credits & thanks" footer rendering all six
  • P1.5 `sensor-bridge` `/v1/observations` and `/v1/stations` POST handlers wired to Aurora (stations) + Timescale Cloud (observations); 74/74 tests pass; smoke against real DBs verified — POST station returns 201 with jitte
  • P1.7 `dev-site` Next.js 15 static export at `mud2dust.dev`, plus brand mark + OG card on **both** site + dev-site so iMessage / WhatsApp / Slack / Twitter / Android+iOS share previews + home-screen banners all render · **OpenAPI explorer (Scalar) for `/api/`** is queued as a follow-up — the static page lists endpoints today; the interactive renderer lands when sensor-bridge serves the OpenAPI spec at `https://api
  • P1.8 API versioning hardening: `src/versions.ts` is the single source of truth (id, label, status, sunsetDate, successor, openapi URL). Sensor-bridge serves `GET /v1/version` returning `{current, all[]}`. `src/middleware/deprecatio
  • P1.9 AWS SES for `mud2dust.com`: CDK `SesStack` provisioning (a) SES domain identity for `mud2dust.com`, (b) DKIM key auto-rotated, (c) configuration set with bounce/complaint event tracking, (d) IAM role granting `ses:SendEmail` to
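P1.8's registry can be sketched as plain data plus a lookup. A hypothetical Python rendering of the shape `src/versions.ts` describes (the real implementation is TypeScript; field defaults here are illustrative):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ApiVersion:
    """One registry row: id, label, status, sunsetDate, successor, openapi URL."""
    id: str
    label: str
    status: str                       # "current" | "deprecated" | "sunset"
    sunset_date: Optional[str] = None
    successor: Optional[str] = None
    openapi_url: str = "/v1/openapi.json"

VERSIONS = [ApiVersion(id="v1", label="v1", status="current")]

def version_payload() -> dict:
    """Response shape served by GET /v1/version: {current, all[]}."""
    current = next(v.id for v in VERSIONS if v.status == "current")
    return {"current": current, "all": [asdict(v) for v in VERSIONS]}
```

Keeping the registry as data means the deprecation middleware and the `/v1/version` handler read the same rows, so a sunset date never has to be updated in two places.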

Phase 2 — Multi-source fusion

  • P2.1 `pipeline` Earthdata Login → STS Lambda layer (token exchange).
  • P2.2 `pipeline` HLS Sentinel-2 ingest Lambda — NDVI/NDWI/NDRE COGs. **Adds layer cards** to `site/src/layers/registry.ts` for each (NDVI = greenness/canopy density, used to gate σ⁰ sensitivity to soil; NDWI = canopy
  • P2.3 `pipeline` HLS Landsat ingest Lambda — thermal LST + reflectance. **Adds LST layer card** (surface temperature; combines with σ⁰ in the fusion model — drier soils heat faster mid-day).
  • P2.4 `pipeline` HRRR ingest Lambda (us-east-1) — 7-day rolling precipitation-accumulation COG. **Adds precip layer card** (precipitation history is the strongest single predictor of VWC at the surface; the fusion mod
  • P2.5 `pipeline` static-priors one-shots — POLARIS (UC Davis), Copernicus DEM. **Adds prior layer cards** (soil texture from POLARIS sets retention capacity; topographic wetness index from DEM modifies drainage assump
  • P2.6 `pipeline` fusion-model training scaffold (scikit-learn RF) + serving Lambda (per-pixel inference). **Adds OPERA RTC-STATIC local incidence angle (LIA) as a model input** + **adds the calibrated VWC layer card**

Phase 3 — Open public API + dashboard

  • P3.1 `pipeline` WAF web ACL stack, attribution headers via CloudFront response-headers policy.
  • P3.2 `pipeline` pgstac on Aurora — STAC catalog populated for `mud2dust-sigma0`, `mud2dust-moisture-rootzone`, `mud2dust-ndvi`, etc.
  • P3.3 `site` public dashboard at `/` — calibrated map browse via Mapbox GL JS + tile API, date picker, expanded LayerPanel (now showing all Phase-2 layers), attribution.
  • P3.4 `site` STAC catalog browse page (`/stac`) — collection list + per-item view.
  • P3.5 `site` AOI extract UI — paste GeoJSON or draw bbox, submit to `/v1/extract`, poll for result.
  • P3.6 `site` Auth.js (NextAuth v5) wiring with email magic link + ORCID OAuth providers. User signup → `pk_view`.
  • P3.7 `site` contributor dashboard skeleton at `/dashboard` — empty-state guidance, "connect a sensor" CTA.

Cross-cutting

  • X.1 `.gitlab-ci.yml` lint + test pipelines per subdir.
  • X.2 Root README updated with monorepo structure + per-subdir how-to-run.
  • X.3 `docs/deployment.md` _moved earlier as F.6._
  • X.4 `docs/local-dev.md` _moved earlier as F.3._

mud2dust — strategy plan (rev. 2026-05-06)

Status: plan stage. Empty repo. Domains secured. Hardware in hand. METER meeting scheduled 2026-05-07. Phase 0 implementation begins after METER meeting and AWS/repo scaffolding.

Brand: mud2dust is locked. Domains owned: mud2dust and mudtodust across .com, .net, .org, .io, .dev, .ag, .farm, .ai, .earth, .co (20 domains total). Primary mud2dust.com. Defensive coverage is comprehensive — no further TLD acquisition needed.

Hardware in hand: 6× Sentek 36" drill-and-drop probes (multi-depth VWC + temp + EC) on YDOC ML-417ADS data loggers. Plus Tempest weather stations and Vaisala WXT520. Campbell Scientific in conversation. METER hardware to be purchased; logger choice (YDOC vs ZENTRA Cloud) deferred per deployment.


Context

mud2dust is two products built on one platform, deliberately:

  1. An open, calibrated, high-resolution soil-moisture map of US agricultural land — daily 5–30m VWC raster fused from satellite + soil-texture priors + a contributor probe network + (when available) airborne L-Band SAR cal/val campaigns.
  2. A multi-vendor contribution platform — farmers, researchers, hobbyists, and partner applications connect not just soil moisture and weather time-series but seven first-class data shapes: observations, profiles, samples, events, collections, annotations, and boundaries. mud2dust normalizes, stores, visualizes, and exposes all of it via API. High-trust contributions train the calibration model. Everyone benefits from the calibrated output.

The platform is what makes the map possible; the platform is also the immediate carrot. Users can browse the public map for free without signing up; contributors sign up to connect their hardware, lab samples, drone flights, agronomist annotations, or field boundaries; partner applications (Farming Game first) integrate via OAuth.

Two existing planning documents in the sibling farminggame project describe overlapping pieces. mud2dust unifies them; Farming Game becomes the first OAuth partner application. The 6× Sentek deployment originally scoped under farminggame Phase 13a becomes mud2dust's anchor training stations.

This plan supersedes both source documents on the soil-map question. Farming Game's Phase 13 should be retitled to "Integrate with mud2dust": deploy Sentek + Vaisala + Tempest sensors as anchor stations, register them via mud2dust's OAuth API, replace AGROMONITORING_API_KEY with mud2dust API calls, drop the in-product SAR pipeline.


1. Vision

An open, calibrated, high-resolution soil-moisture map of US agricultural land — free for research and small operators, paid only at the bulk-extraction tier — built on a multi-shape contribution platform that any farmer, researcher, drone operator, or partner app can connect to.

1a. The map

Daily 5–30m volumetric water content (VWC) raster, fused from Sentinel-1 SAR backscatter + Sentinel-2 vegetation indices + soil-texture priors + the contributor station network. Around it, the same infrastructure serves NDVI/NDWI/NDRE, thermal LST + ET, precipitation, static priors. Pitch the soil moisture map; ship the rest because the pipeline already passes it.

1b. The platform

Multi-shape, multi-vendor contribution surface — contributors connect ZENTRA Cloud, WeatherFlow, WeatherLink, FieldClimate accounts; push from raw loggers (YDOC, Campbell, Davis); upload drone or aircraft scenes; submit lab sample results; mark field boundaries; add agronomist annotations. Cross-source unified dashboard. Per-contribution provenance and trust scoring. Free for everyone. Partner applications integrate via OAuth.

1c. The cal/val story

Distributed L-Band drone SAR campaigns over instrumented fields create triple-validated calibration anchors (in-situ probes + airborne L-Band + Sentinel-1 C-Band). This is SMAPVEX-class cal/val data, distributed across contributors instead of NASA campaigns only. Distinct, fundable research narrative for NASA-CSDA / NSF.

Pitch — to map users

Stop paying $300–1500/mo for someone else's wrapper around free public data. Use the same data, calibrated against ground truth, with an open governance model and an attribution-only license.

Pitch — to platform contributors

One place to see all your soil and weather sensors regardless of vendor — and your drone flights, lab results, field boundaries, and agronomist notes alongside them. We handle the protocols. If you have well-installed research-grade hardware or calibrated airborne instruments, your data improves the public moisture model and you get higher API tiers. If you don't, you still get the dashboard, the cross-source exports, and the calibrated companion data — free.

Pitch — to partner-app developers

Integrate once via OAuth, get sensor data, drone uploads, and calibrated outputs for any user who connects to mud2dust. No per-vendor adapter code in your app. Same tokens give your users access to their own data and to the public model.

Why "open" matters strategically (not ideologically)

  1. Network effects on calibration. Every contribution makes the model better for everyone in that soil/canopy class. Closed competitors can't accept arbitrary contributor data because their license forbids redistributing improvements.
  2. Default citation status. Academic papers cite open infrastructure (OpenStreetMap, Zenodo, OpenET). Once cited 100 times, you're permanent.
  3. Grant and consortium funding. NSF, USDA NIFA, NRCS Conservation Innovation Grants, NASA Western Water Applications all fund open-data projects. None fund closed SaaS.
  4. Vendor-agnostic platform attracts users locked into single-vendor portals. Onset HOBOlink, ZENTRA, WeatherFlow each lock data in. mud2dust attacks the lock-in.

Competitive landscape

  • Wrappers around free satellite data — Agromonitoring, EOS Crop Monitoring, OneSoil. Thin convenience layers.
  • Single-vendor sensor platforms — Onset HOBOlink, METER ZENTRA Cloud, WeatherFlow Tempest, Davis WeatherLink. Each is a closed silo per vendor.
  • Drone data platforms — DroneDeploy, Pix4D, OpenDroneMap (open). Imagery processing and storage; not a calibration target, not a multi-shape platform.
  • Real proprietary products — Climate FieldView, Granular. These hold genuinely private data and bundle it. Not the target.

2. Why this is feasible now (and not in 2018)

2a. AWS Open Data Sponsorship (matured ~2020). AWS pays storage + intra-region egress.

2b. COG / STAC tooling (matured ~2023). GDAL 3.x, rasterio 1.3+, titiler, rio-tiler, pgstac, pystac-client. STAC is the lingua franca for episodic raster, including drone data — the same catalog handles satellite scenes and contributor-uploaded drone flights.

2c. Indigo Ag's RTC bucket. Indigo invested ~$2M/year preprocessing Sentinel-1 to terrain-corrected γ⁰ COGs in AWS Open Data. ~80% of the satellite engineering work is already done.

2d. Ground-truth network is bootstrappable. Sentek anchor stations at JMR + per-contributor data + (when available) METER research network + episodic L-Band drone campaigns. Multiple sources of training data, not one vendor.

2e. Drone L-Band SAR is now commercially available. ImSAR, AeroVironment, others. Was $5M+ aircraft-only in 2018; now sub-$100K drone-mounted. Distributed cal/val is feasible.


3. Architecture

3a. System diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                    SATELLITE INGEST (Fargate / Lambda)                  │
│  Sentinel-1, Sentinel-2/HLS, Landsat, HRRR, NEXRAD, GOES, SMAP,         │
│  ECOSTRESS, MODIS, TROPOMI, POLARIS, DEM, SSURGO, OpenET, etc.          │
└──────────────────────────────────┬──────────────────────────────────────┘
                                   ▼
                 PROCESSING: σ⁰ → VWC, NDVI/NDWI, fusion model, etc.
                                   ▼
                      S3 (us-west-2) — output COGs
                                   ▼
                CloudFront + titiler + STAC catalog (pgstac/RDS)
                                   ▼
        ┌──────────────────┬──────────┴──────────┬──────────────────┐
        ▼                  ▼                     ▼                  ▼
   Public web        Python/R/QGIS         Partner apps      Contributor
   (free map browse) via STAC              via OAuth         dashboards

┌──────────────────────────────────────────────────────────────────────────┐
│                CONTRIBUTION BRIDGE (mud2dust/sensor-bridge)              │
│                                                                          │
│  Seven first-class object types — one auth/trust/privacy stack          │
│                                                                          │
│  Observation  Profile  Sample   Event    Collection  Annotation Boundary │
│  (time-       (multi-  (lab/    (drone/  (multi-     (geotag    (field   │
│   series      depth)   discrete  flight)  flight       note)    bounds)  │
│   point)               sample)            survey)                        │
│      │         │         │        │         │           │         │      │
│      ▼         ▼         ▼        ▼         ▼           ▼         ▼      │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │         Adapter registry (per vendor × per shape)               │   │
│  │  YDOC, Campbell, ZENTRA Cloud, WeatherFlow, WeatherLink,        │   │
│  │  FieldClimate, Onset, generic JSON, generic CSV/SFTP,           │   │
│  │  drone-COG-upload, soil-lab-CSV, shapefile/GeoJSON,             │   │
│  │  CRNP, AmeriFlux pull, ...                                      │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                          ▼                                              │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  Intake gateways:                                                │   │
│  │   • HTTPS webhook  • SFTP server   • MQTT broker                │   │
│  │   • Presigned-multipart S3 upload (for Events)                  │   │
│  │   • OAuth-pull workers (per contributor token)                  │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                          ▼                                              │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  Per-shape storage:                                              │   │
│  │   Observation/Profile → TimescaleDB                             │   │
│  │   Sample             → Postgres (relational)                    │   │
│  │   Event/Collection   → S3 + pgstac                              │   │
│  │   Annotation         → Postgres + PostGIS                       │   │
│  │   Boundary           → Postgres + PostGIS                       │   │
│  │  + Secrets Manager (per-contributor vendor tokens)              │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────────┘
                                ▲                ▲
                                │                │
              Nightly retrain ──┘                └── Calibration consumer
              (training-tier contributions)         (everyone gets corrections)

3b. Region rationale (two AWS regions, two worker pools)

  • us-west-2 — most Tier 1 + Tier 2 satellite buckets (~70%); also primary for sensor-bridge, dashboards, and APIs.
  • us-east-1 — entire NOAA Big Data Program (~25%); workers only.
  • Tier 3 (non-AWS) — POLARIS at UC Davis, SoilGrids at ISRIC, USDA CropScape FTP, OpenET via Earth Engine, TROPOMI via Copernicus.

Intra-region S3 reads are free; cross-region reads cost $0.02/GB.

3c. Worker types

| Type | Use case | Limits | Cost shape |
| --- | --- | --- | --- |
| Lambda | Per-scene index calc, COG mosaicking, STAC, intake webhooks, sensor pull workers | 15 min, ~10 GB RAM, 5 GB ephemeral | Pay-per-invoke |
| Fargate | Cross-region pulls, large mosaicking, GDAL native deps, drone preprocessing, retraining, historical backfill | Any duration, any RAM | Per-second billing |
| EC2 spot | Only if Fargate gets expensive | Spot interruption | Cheaper at sustained load |

Failure isolation rule: each satellite source and each adapter (vendor × shape) gets its own Lambda or Fargate task, triggered independently by EventBridge. SMAP outage doesn't block Sentinel-1; ZENTRA outage doesn't block YDOC push; a stuck drone-preprocessing job doesn't block Observation ingest.

3d. Processing layer

  1. σ⁰ → VWC index per pixel. Empirical model trained against high-trust contributions (Sentek anchor + verified researchers + L-Band drone campaigns + METER if/when accessible).
    • v1: linear index = `(sigma0 - sigma0_dry) / (sigma0_wet - sigma0_dry)`.
    • v2: scikit-learn random forest trained only on training-tier contributions.
    • v3+: XGBoost or small NN; Fargate or SageMaker.
  2. NDVI / NDWI / NDRE / EVI. Trivial pixel math from HLS reflectance.
  3. ET ensemble. Consume OpenET for v1.
  4. Per-pixel calibration. Conditions on POLARIS soil texture, HRRR precipitation, DEM topographic wetness index. The research contribution that justifies the project's existence.
  5. Correction surfaces. For every contributor station, every nightly retrain produces (satellite_estimate, ground_truth, delta) series. Surfaced on the contributor's dashboard regardless of trust tier.
  6. Drone L-Band cal/val anchors. When a drone Event is tagged as L-Band SAR with proper radiometric cal, the preprocessor extracts pixel-level σ⁰ over the flight extent and timestamps it; nightly retrain treats those as high-weight anchor scenes, similar to in-situ profiles.
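The v1 index in step 1 is simple pixel math over the σ⁰ raster. A sketch with illustrative dry/wet endmember values — in practice the endmembers would be fit per pixel or per soil class, not hard-coded:

```python
import numpy as np

def vwc_index_v1(sigma0_db, sigma0_dry=-18.0, sigma0_wet=-6.0):
    """v1 linear soil-moisture index from σ⁰ backscatter (dB).

    Maps σ⁰ linearly between dry and wet reference endmembers onto [0, 1].
    The default endmembers here are illustrative placeholders only.
    """
    idx = (np.asarray(sigma0_db, dtype=float) - sigma0_dry) / (sigma0_wet - sigma0_dry)
    return np.clip(idx, 0.0, 1.0)   # clamp out-of-range scenes
```

Because it is pure ufunc math, the same function applies to a scalar, a tile, or a full mosaic array unchanged, which is what makes the v1 model a cheap Lambda workload.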

3e. Output bucket structure

s3://mud2dust-cogs/
  ├── moisture-surface/   moisture-rootzone/
  ├── ndvi/  ndwi/  ndre/  lst/  et-daily/  precip-7d/  ...
  ├── sigma0/                          (raw, 90-day retention)
  ├── priors/
  │   ├── polaris/ soilgrids/ ssurgo/ dem-30m/ (static, indefinite)
  └── ...

s3://mud2dust-contributions/
  ├── events/{contributor_id}/{flight_id}/scene.tif    (drone, aircraft)
  ├── collections/{contributor_id}/{survey_id}/
  ├── samples-attachments/{contributor_id}/{sample_id}/lab-report.pdf
  └── boundaries/{contributor_id}/{boundary_id}.geojson

Lifecycle: raw satellite layers 90d, fused/calibrated indefinite, priors indefinite. Contributor raster Standard 90d → IA → Glacier IR (policy details deferred per §12l).
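The contributor-raster tiering above maps directly onto an S3 lifecycle configuration. A sketch only — §12l defers the real policy, so the prefix, day counts, and rule ID below are all placeholders (the dict shape matches the S3 `PutBucketLifecycleConfiguration` API):

```python
# Hypothetical lifecycle rules for s3://mud2dust-contributions.
# Values are placeholders; the actual policy is deferred per §12l.
CONTRIB_LIFECYCLE = {
    "Rules": [
        {
            "ID": "contrib-raster-tiering",
            "Filter": {"Prefix": "events/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},   # Standard 90d → IA
                {"Days": 365, "StorageClass": "GLACIER_IR"},   # → Glacier Instant Retrieval
            ],
        }
    ]
}
```

Expressing the policy as data keeps it reviewable in the repo and deployable via CDK or `put_bucket_lifecycle_configuration` without hand-editing the console.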

3f. Tile serving — titiler on Lambda + CloudFront

Standard pattern. Cold start ~500 ms, warm ~30 ms per tile. Two-tier caching (CloudFront edge + titiler render). The same titiler also serves contributor-uploaded drone COGs that opted into public visibility — `/tiles/contrib/{event_id}/{z}/{x}/{y}.png`.

3g. STAC catalog — pgstac on Aurora Serverless v2

Sub-100ms search. Two collection types:

  • mud2dust-{layer} — official calibrated layers (anonymous public)
  • contrib/{contributor_id}/{collection_id} — contributor-uploaded Events and Collections (public if opted; private otherwise)

The same pgstac instance handles both — STAC was designed for this.

3h. Contribution bridge — seven first-class object types

One repo (mud2dust/sensor-bridge), MIT-licensed. All seven shapes share auth, multi-tenancy, trust scoring, privacy controls, and adapter registry. Per-shape modules differ only in schema and validators.

| Object type | Covers | Schema basis | Storage | Ingest |
| --- | --- | --- | --- | --- |
| Observation | Time-series point readings (soil moisture, weather, CRNP, GNSS-IR, sap flow, eddy-cov fluxes, stream gauge) | OGC SensorThings (Observation) | TimescaleDB | HTTPS push, REST pull, SFTP, MQTT |
| Profile | Multi-depth time-series (Sentek 9-depth, soil temp profiles, lysimeters) | SensorThings extended with depth_cm[] | TimescaleDB | HTTPS push, REST pull |
| Sample | Discrete lab results (gravimetric VWC, bulk density, texture, OM, plant tissue, LAI) | Custom (sample location/depth/method/lab) | Postgres + S3 attachments | HTTPS POST + optional PDF/CSV upload |
| Event | Single drone flight, single aircraft pass, single PhenoCam capture, irrigation event, planting event | STAC Item | S3 (raster) + pgstac | Presigned S3 multipart + STAC item POST |
| Collection | Multi-flight drone survey, multi-pass aircraft campaign, weekly PhenoCam series | STAC Collection | pgstac | POST /v1/collections then add Events |
| Annotation | Geotagged agronomist note, citizen-science observation, photo with timestamp + location + free text | GeoJSON Feature (structured properties) | Postgres + PostGIS | HTTPS POST |
| Boundary | Field boundary, management zone, EC-mapped zone, irrigation prescription extent | GeoJSON Feature (Polygon/MultiPolygon) | Postgres + PostGIS | HTTPS POST or shapefile/GeoJSON upload |

Common cross-cutting concerns (single implementation, applied to all shapes):

  1. Per-contributor credential vault — AWS Secrets Manager, namespaced contributor/{id}/{vendor}. Encrypted, scoped IAM, rotation supported. Contributor revocation deletes the token; in-flight pulls drain.
  2. Trust model — see §3i. Every contribution lands with sensor_class, operator_class, and computed trust_weight.
  3. Privacy / coordinate fuzzing — every shape has geom_internal (exact, model-internal) and geom_public (jittered ±N km or aggregated). Default fuzzed; opt-in to exact precision per object. Boundaries respect a visibility flag (private / aggregate-only / public).
  4. Provenance — every contribution carries source_adapter, raw_ref (pointer to the original payload in cold storage), submitter_user_id, submitting_app_id (for OAuth-submitted), creation timestamp, last-edit timestamp.
  5. QC — per-shape validators. Observation: range checks, constant-value sensor failure. Profile: depth-monotonicity sanity. Sample: unit + range. Event: COG conformance + radiometric cal flag (for SAR/multispec). Annotation: schema validation. Boundary: topology validity (no self-intersection, valid CRS).
  6. Adapter registry — adapters declare which shapes they can produce. Some adapters output multiple shapes (a Campbell logger emits Observations + Profiles; a drone upload pipeline emits Event + auto-derived Annotations like NDVI summary).
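Concern 3's deterministic fuzzing can be sketched as a seeded, hash-derived offset so the same object always maps to the same public location. A hypothetical Python rendering — the actual P0.5 utility lives in sensor-bridge and is TypeScript; the salt name and radius are illustrative:

```python
import hashlib
import math

def fuzz_coords(lat, lon, object_id, secret_salt, radius_km=2.0):
    """Deterministic jitter: same (object_id, salt) → same offset, within radius_km."""
    digest = hashlib.sha256(f"{secret_salt}:{object_id}".encode()).digest()
    # Two pseudo-uniform draws in [0, 1) from independent halves of the digest.
    u1 = int.from_bytes(digest[:8], "big") / 2**64
    u2 = int.from_bytes(digest[8:16], "big") / 2**64
    r = radius_km * math.sqrt(u1)            # sqrt → uniform over the disc
    theta = 2 * math.pi * u2
    dlat = (r * math.cos(theta)) / 110.574   # ~km per degree of latitude
    dlon = (r * math.sin(theta)) / (111.320 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Deriving the offset from a hash rather than a stored random number means `geom_public` never drifts between re-ingests, and no per-object jitter state needs to be persisted.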

Vendor adapter set (initial, see §5b for full table):

  • Sentek + YDOC, Campbell HTTP push, Vaisala-on-logger, METER on YDOC or ZENTRA Cloud
  • Tempest WeatherFlow, Davis WeatherLink, FieldClimate, Onset HOBOlink
  • Drone uploads: generic COG, OpenDroneMap-processed output, common drone vendor exports (DJI Terra, Pix4D, DroneDeploy)
  • Lab CSV / NRCS report PDF (Sample)
  • Shapefile / GeoJSON (Boundary)
  • AmeriFlux pull (Observation, eddy-cov fluxes — at training-contributor request)
  • USDA SCAN/SNOTEL pull (Observation)

3i. Trust model — applies to every shape

Two independent dimensions combine into a per-contribution trust_weight:

sensor_class (intrinsic to the source):

| Class | Examples |
| --- | --- |
| research | METER Teros 12 with cal docs; Campbell CS655 fresh-cal; L-Band drone with corner reflectors and radiometric calibration; lab analysis from accredited lab; AmeriFlux eddy-cov tower |
| professional | Sentek drill-and-drop, Vaisala WXT520, agronomist-grade EC mapping, multispectral drone with calibration panel |
| consumer | Tempest, Ambient Weather, basic capacitance probes, hand-flown DJI RGB without panels |
| diy | Arduino sensors, citizen-science photo annotations |

operator_class (who installed/collected/submitted):

| Class | Verification |
| --- | --- |
| researcher | Institutional email + ORCID/ROR |
| agronomist_supported | Self-declared, optional pro reference |
| farmer | Self-declared, address geocodes to ag land |
| hobbyist | Self-declared homeowner |
| unknown | Anonymous |

training_weight = sensor_class × operator_class × installation_quality_factor (the third factor is earned over time — contributions that track neighbors and pass QC accumulate quality; outliers lose it).
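A minimal sketch of that product, assuming illustrative per-class multipliers (the actual factors are not specified in this plan) and the training-tier cutoff from below:

```python
# Placeholder class factors — the real multipliers are a modeling decision.
SENSOR_FACTOR = {"research": 1.0, "professional": 0.8, "consumer": 0.5, "diy": 0.3}
OPERATOR_FACTOR = {"researcher": 1.0, "agronomist_supported": 0.9, "farmer": 0.8,
                   "hobbyist": 0.6, "unknown": 0.4}

def training_weight(sensor_class, operator_class, installation_quality=1.0):
    """trust_weight = sensor_class × operator_class × installation_quality.

    installation_quality in [0, 1] is earned over time via QC history.
    """
    w = SENSOR_FACTOR[sensor_class] * OPERATOR_FACTOR[operator_class] * installation_quality
    return round(w, 3)

def is_training_tier(weight):
    """Training contributions enter the nightly retrain at weight ≥ 0.5."""
    return weight >= 0.5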

Two paths through the system:

  1. Training contributions (training_weight ≥ 0.5): used in nightly model retrain. Few hundred training stations + a few dozen calibrated drone campaigns per year at maturity.
  2. Correction-takers (everyone else): full feature parity, dashboard, exports, calibrated companion data. Not used in retrain, but bulk anomaly signal informs diagnostics.

UX rule: never show users their trust tier as a number or rank. What we surface instead is contribution health — drift detection, cross-source consistency, sensor-failure or scene-quality flags. Helpful, not pejorative.

3j. Identity, auth, and partner apps

Three identity types:

  • User — a person (email + password, optional MFA via Cognito or Auth.js).
  • Organization — farm, lab, agency. Owns contributions collectively.
  • Partner application — third-party app integrating via OAuth 2.0 + PKCE.

OAuth scopes:

| Scope | Allows |
| --- | --- |
| `stations:read` `stations:write` | Manage stations |
| `observations:read` `observations:write` | Time-series I/O |
| `profiles:read` `profiles:write` | Multi-depth time-series I/O |
| `samples:read` `samples:write` | Lab samples |
| `events:read` `events:write` | Drone/aircraft scene I/O (write requires presigned upload flow) |
| `collections:read` `collections:write` | Survey grouping |
| `annotations:read` `annotations:write` | Geotagged notes |
| `boundaries:read` `boundaries:write` | Field/zone vector I/O |
| `corrections:read` | Calibrated companion data |
| `alerts:read` | Frost/anomaly events |
| `public:read` | Public tile/STAC API (no user context) |
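Each endpoint gates on the token's granted scopes. A minimal check, assuming the standard OAuth 2.0 convention of a space-delimited scope string (function names here are illustrative, not the sensor-bridge API):

```python
def has_scope(granted: str, required: str) -> bool:
    """True if the space-delimited granted-scope string covers `required`."""
    return required in granted.split()

def require_scopes(granted: str, *required: str) -> None:
    """Raise if any required scope is missing (surfaced as HTTP 403 in the API layer)."""
    missing = [s for s in required if not has_scope(granted, s)]
    if missing:
        raise PermissionError(f"missing scopes: {' '.join(missing)}")
```

For example, the presigned-upload flow for Events would call `require_scopes(token_scopes, "events:write")` before issuing upload URLs.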

Verification mechanisms:

  • operator_class = researcher: ORCID iD with affiliation; manual review for borderline cases.
  • operator_class = farmer: station coords geocode to USDA-mapped agricultural land (CDL/CSB).
  • operator_class = hobbyist: default.

3k. Where the existing hardware fits

The 6× Sentek 36" drill-and-drop probes on YDOC ML-417ADS go in at JMR as the anchor training station set — multi-depth profiles ideal for calibrating both surface and rootzone VWC outputs. Tempest + Vaisala WXT520 round out on-farm meteorology. METER hardware (planned purchase) lands via either YDOC or ZENTRA Cloud per deployment.

The Sentek drill-and-drop is better training data than the originally-planned single-depth Teros 12 — multi-depth profiles let the calibration model learn rootzone integration directly.

3l. L-Band drone cal/val — the headline research story

When a contributor flies calibrated L-Band SAR over a field that has live in-situ probes during a Sentinel-1 overpass window, three sources of truth are collocated:

  1. In-situ point truth — Sentek profile, METER probes, etc.
  2. Airborne high-res truth — drone L-Band σ⁰ over the flight extent (typically 5–50 cm resolution).
  3. The C-Band layer being calibrated — Sentinel-1 σ⁰ at 10 m.

This is SMAPVEX-class cal/val data, distributed across contributors instead of NASA-campaign-only. The platform's drone Event ingest path treats radiometrically-calibrated L-Band uploads as high-weight anchor scenes in the nightly retrain. NASA-CSDA, NSF-NRT, and DOE Atmospheric Radiation Measurement programs all fund distributed cal/val — this is a fundable research narrative independent of the data-platform story.

Operationally: an L-Band drone Event uploads as a STAC Item with metadata flags `sensor=l_band_sar`, `radiometric_cal_method=corner_reflector`, `cal_target_in_scene=true`, `flight_window_overlaps_s1_overpass=true` (auto-computed from timestamp). The preprocessor verifies the calibration metadata; the contribution lands as `sensor_class=research` if intact, demoted otherwise.
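That metadata gate can be sketched as a simple check over the STAC Item properties. Field names follow the flags listed above; the demotion target (`professional`) is an assumption here, not specified by the plan:

```python
# Calibration flags the preprocessor requires for research-grade L-Band Events.
REQUIRED_CAL_FLAGS = {
    "sensor": "l_band_sar",
    "radiometric_cal_method": "corner_reflector",
    "cal_target_in_scene": True,
}

def classify_lband_event(properties: dict) -> str:
    """Return the sensor_class for an L-Band drone Event's STAC properties.

    'research' only if all calibration metadata is intact; demoted otherwise
    (the demotion target 'professional' is illustrative).
    """
    intact = all(properties.get(k) == v for k, v in REQUIRED_CAL_FLAGS.items())
    return "research" if intact else "professional"
```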

3m. Where Farming Game's role fits

Farming Game is the first OAuth partner application. JMR's stations register through /v1/stations on behalf of the JMR farm user. Field boundaries, irrigation events, and any drone flights JMR runs over its blocks all flow through the same OAuth scopes. Calibrated corrections come back through /v1/stations/{id}/corrections and the AOI extract endpoints. See §11.


4. AWS account structure

Phase 0–1: single account

One AWS account, root locked, MFA on, IAM Identity Center.

Phase 4+: multi-account AWS Organizations

| Account | Role | Notes |
| --- | --- | --- |
| mud2dust-billing | Org root, no workloads | Consolidated billing only |
| mud2dust-prod | Pipeline + tiles + API + bridge | Default region us-west-2 |
| mud2dust-dev | Sandbox, breakable | Same regions as prod |
| mud2dust-data | Output S3 + contributor S3 + Secrets Manager | Defense in depth — compromised compute can't delete archive |

SCPs:

  • Prevent disabling CloudTrail
  • Prevent deleting backup snapshots
  • Restrict deployable regions to us-west-2 and us-east-1
  • Restrict access to mud2dust-data Secrets Manager namespaces by IAM role only

Networking

  • VPC with private subnets per region.
  • Lambda outside VPC for simplicity.
  • Fargate in private subnets with VPC interface endpoints for S3, STS, Secrets Manager, CloudWatch Logs.

Cost guards from day one

  • AWS Budget at $200/mo with alarm at 80%.
  • AWS Cost Anomaly Detection enabled.
  • Tag every resource: project=mud2dust, env=prod|dev, layer=ingest|process|serve|bridge.

5. Ingest job catalog

5a. Satellite + raster ingest

The "first six" — phase 1–2 — get 80% of value:

  1. Sentinel-1 RTC (moisture primary signal)
  2. HLS Sentinel-2 (NDVI, NDWI, NDRE)
  3. HLS Landsat (thermal LST, longer-archive optical)
  4. HRRR (precipitation accumulation, soil temp)
  5. POLARIS (static soil prior)
  6. Copernicus DEM (static topography prior)

Second wave — phase 5: ECOSTRESS, SMAP L4, OpenET, GOES-18 thermal, VIIRS Active Fire. Third wave — phase 6+: TROPOMI, GEDI, MODIS LST, GPM IMERG, USDA CDL/CSB, SSURGO/gNATSGO.

| Source | Tier | Cadence | Worker | Output | Retention |
| --- | --- | --- | --- | --- | --- |
| Sentinel-1 RTC | 1 (anonymous S3) | 6-day; nightly STAC poll | Lambda | σ⁰ + VWC COG | 90d raw, ∞ fused |
| Sentinel-2 (HLS) | 2 (Earthdata) | 2–3 day; daily | Lambda + auth | NDVI/NDWI/NDRE COG | 90d raw |
| Landsat 8/9 (HLS-L30) | 2 | 8-day; daily | Lambda + auth | thermal LST + reflectance | 90d |
| HRRR | 1 | hourly | Lambda | precip-7d + soil temp COG | 30d |
| NEXRAD MRMS | 1 | 5-min (sample hourly) | Lambda | precip-hr COG | 30d |
| GOES-18 thermal | 1 | 10-min (sample 30-min) | Lambda | LST COG | 14d |
| SMAP L4 | 2 | daily | Lambda + auth | regional VWC reference | 30d |
| ECOSTRESS | 2 | irregular ISS revisit | Lambda + auth | high-res LST scenes | 90d |
| MODIS LST + Snow | 2 | daily | Lambda + auth | continuity gap-fill | 30d |
| Sentinel-5P TROPOMI | 3 (Copernicus) | daily | Fargate (cross-region) | air quality / methane | 30d |
| Copernicus DEM | 1 | one-shot | Lambda | static prior | indefinite |
| POLARIS | 3 (UC Davis HTTP) | one-shot, refresh annually | Fargate | static soil prior | indefinite |
| SSURGO / gNATSGO | 3 (USDA NRCS) | annual | Fargate | static soil prior | indefinite |
| USDA CDL / CSB | 3 (USDA FTP) | annual | Fargate | crop classification | indefinite |
| OpenET | 3 (GEE/REST) | monthly | Lambda + GEE auth | ET CONUS | 12 months |
| GPM IMERG | 2 | 30-min (sample hourly) | Lambda + auth | global precip | 14d |

5b. Contribution bridge — adapters by shape

| Adapter / source | Shapes produced | Direction | Protocol |
| --- | --- | --- | --- |
| Sentek + YDOC ML-417 | Observation, Profile | Push | HTTPS POST (HMAC-signed) |
| Campbell Scientific | Observation, Profile | Push | HTTPS POST or SFTP (CRBasic template) |
| Vaisala WXT520 | Observation | Push | via host logger (Campbell adapter) |
| METER on YDOC | Observation, Profile | Push | YDOC HTTP POST |
| METER on ZENTRA Cloud | Observation, Profile | Pull | REST (per-contributor token) |
| Tempest WeatherFlow | Observation | Pull | REST + UDP local |
| Davis WeatherLink | Observation | Pull | REST (per-contributor token) |
| FieldClimate (METOS) | Observation, Profile | Pull | REST (per-contributor token) |
| Onset HOBOlink | Observation | Pull | REST (per-contributor token) |
| WeatherBug | Observation (regional) | Pull | REST (regional context, not on-farm) |
| Generic JSON webhook | Observation | Push | HTTPS POST |
| Generic CSV/SFTP | Observation, Sample | Push | SFTP |
| AmeriFlux pull | Observation (eddy-cov) | Pull | REST (training-contributor request) |
| USDA SCAN/SNOTEL pull | Observation, Profile | Pull | NRCS Awdb API |
| CRNP / COSMOS-USA | Observation | Pull | REST (where exposed) |
| Soil-lab CSV / PDF | Sample | Push | HTTPS POST + S3 multipart for attachment |
| Drone COG upload | Event | Push | Presigned S3 multipart + STAC POST |
| OpenDroneMap output | Event | Push | Same as above; auto-detect outputs |
| DJI Terra / Pix4D / DroneDeploy export | Event | Push | Presigned S3 multipart + STAC POST |
| Aircraft campaign upload | Event, Collection | Push | Presigned S3 multipart + STAC POST |
| PhenoCam-style series | Collection (auto-grouped Events) | Push | Periodic HTTPS POST |
| Shapefile / GeoJSON | Boundary | Push | HTTPS POST or multipart upload |
| Agronomist note (mobile) | Annotation | Push | HTTPS POST (mobile/web) |
| Citizen-science photo | Annotation | Push | HTTPS POST (multipart with EXIF) |

Failure handling: every adapter emits CloudWatch metric IngestSuccess{adapter=X, contributor=Y, shape=Z}. Alarm on >2 consecutive failures. Failed contributions DLQ'd to SQS for retry. Contributor sees a status indicator per source on their dashboard.

Earthdata auth wrapper (Tier 2 satellite): built once as a shared Lambda layer.
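Several push adapters above sign their HTTPS POSTs with HMAC. A minimal verification sketch in TypeScript (Node), assuming HMAC-SHA256 over the raw body and a hex-encoded signature — the actual header name and encoding are not yet specified in this plan:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an HMAC-SHA256 signature over a raw request body using the
// contributor's shared secret. Constant-time comparison avoids leaking
// how many signature bytes matched.
function verifyHmac(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  if (provided.length !== expected.length) return false;
  return timingSafeEqual(provided, expected);
}
```

In the real bridge the secret would come from the per-contributor Secrets Manager namespace (§10), and verification must run against the raw bytes, before any JSON parsing re-serializes the body.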


6. Output products / API surface

6a. Public raster tile API (anonymous tier)

GET /tiles/{layer}/{date}/{z}/{x}/{y}.{png|webp}?colormap=...

Standard XYZ tiles. Default PNG; WebP optional; raw GeoTIFF via ?format=tif. CloudFront long TTL on (layer, date). WAF rate limit 600 req/hr per IP. Attribution headers on every response.
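Since the tile endpoint is a plain XYZ template, a client-side URL helper is trivial. The `api.mud2dust.com` host matches the DNS plan in §14; the helper itself is a sketch, not a published SDK:

```typescript
// Build a public tile URL from the XYZ template above.
function tileUrl(
  layer: string,
  date: string, // ISO date, e.g. "2026-04-28"
  z: number, x: number, y: number,
  opts: { format?: "png" | "webp"; colormap?: string } = {},
): string {
  const ext = opts.format ?? "png"; // PNG is the documented default
  const qs = opts.colormap ? `?colormap=${encodeURIComponent(opts.colormap)}` : "";
  return `https://api.mud2dust.com/tiles/${layer}/${date}/${z}/${x}/${y}.${ext}${qs}`;
}
```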

6b. Public STAC catalog (anonymous tier)

GET /stac/collections                          → list official + public-opted contrib collections
GET /stac/collections/{id}/items?bbox=...      → search
GET /stac/collections/{id}/items/{id}          → single item (links presigned COG, 1-hour TTL)

6c. AOI extraction API (authenticated free tier)

POST /v1/extract
{ "geom": <GeoJSON or boundary_id>, "layer": "moisture-rootzone",
  "from": "2026-04-01", "to": "2026-04-28", "format": "parquet" }
→ 202 with job_id; poll /v1/jobs/{job_id}

Async via Step Functions. The geom field accepts a contributor's saved Boundary by ID — natural ergonomics for "extract over my field."
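The async flow can be sketched client-side: build the request body, POST it, then poll `/v1/jobs/{job_id}` with backoff. The field names mirror the example request above; the backoff schedule is an assumption:

```typescript
// Request body for POST /v1/extract. geom accepts inline GeoJSON or a
// saved boundary_id string, per the spec above.
interface ExtractRequest {
  geom: object | string;
  layer: string;
  from: string;
  to: string;
  format: "parquet" | "csv";
}

// Build an extract request over a saved Boundary ("extract over my field").
function buildExtractRequest(
  boundaryId: string, layer: string, from: string, to: string,
): ExtractRequest {
  return { geom: boundaryId, layer, from, to, format: "parquet" };
}

// Hypothetical poll schedule for /v1/jobs/{job_id}: 1s, 2s, 4s, ... capped at 30s.
function backoffMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```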

6d. Public station / event browse (anonymous tier)

GET /v1/public/stations?bbox=...               → public-opted stations (jittered coords)
GET /v1/public/stations/{id}/observations?...
GET /v1/public/events?bbox=...&type=drone       → public-opted Events

6e. Contributor dashboard API (authenticated user tier)

GET /v1/me/stations                             → all my stations across vendors
GET /v1/me/stations/{id}/observations?...
GET /v1/me/stations/{id}/profiles?...
GET /v1/me/samples?...
GET /v1/me/events?...
GET /v1/me/collections?...
GET /v1/me/annotations?...
GET /v1/me/boundaries?...
GET /v1/me/corrections?stations=...&from=...    → satellite vs station companion
GET /v1/me/exports?shapes=...&format=parquet    → cross-shape unified export
GET /v1/me/alerts                                → frost / anomaly / contribution-health

6f. Partner-app API (OAuth scoped tokens)

Endpoints for each shape, all gated by OAuth scopes from §3j:

POST /v1/stations            → register a station         (scope: stations:write)
POST /v1/observations        → push time-series readings  (scope: observations:write)
POST /v1/profiles            → push multi-depth readings  (scope: profiles:write)
POST /v1/samples             → submit a lab sample        (scope: samples:write)
POST /v1/uploads/initiate    → start a presigned upload   (scope: events:write)
POST /v1/uploads/complete    → finalize, create STAC item (scope: events:write)
POST /v1/collections         → group flights/passes       (scope: collections:write)
POST /v1/annotations         → geotagged note             (scope: annotations:write)
POST /v1/boundaries          → field/zone vector          (scope: boundaries:write)
GET  /v1/stations/{id}/corrections    → calibrated companion (scope: corrections:read)
GET  /v1/alerts              → frost / anomaly events     (scope: alerts:read)

The in-product onboarding wizard is a thin shell over these endpoints — the API is the product surface, not an afterthought.
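Since the partner-app flow is standard OAuth 2.0 + PKCE (§10), the verifier/challenge pair can be sketched with nothing mud2dust-specific — this is the RFC 7636 S256 method:

```typescript
import { createHash, randomBytes } from "node:crypto";

// PKCE (RFC 7636): the client generates a random code_verifier, sends
// code_challenge = BASE64URL(SHA256(verifier)) with the authorize request,
// and presents the verifier when exchanging the code for a token.
function makeVerifier(): string {
  return randomBytes(32).toString("base64url"); // 43-char URL-safe string
}

function challengeFor(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}
```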

6g. Bulk download (paid tier — defer to phase 6)


7. Access tiers

| Audience | Auth | Rate limit | Can do | Can't do |
| --- | --- | --- | --- | --- |
| Anonymous public | none | 600 req/hr per IP | Browse calibrated map, public STAC, public-opted stations + events, AOI extract (small) | Connect contributions; access raw contributor data |
| Signed-in viewer (pk_view) | email + password | 5,000 req/hr | All anonymous + saved AOIs, alerts, multi-source dashboard for any public contributions | Connect contributions |
| Connected contributor (pk_contrib_pending) | as above + ≥1 contribution | 10,000 req/hr | All viewer + their own dashboard across all seven shapes, raw API for their contributions, cross-shape export | Train the public model |
| Validated contributor (pk_contrib) | as above + 30 days validated data | 50,000 req/hr | All contributor + recognition badge | |
| Training contributor (pk_contrib_train) | as above + sensor_class ≥ professional + verified install | 50,000 req/hr | All validated + their data trains the public model + "training contributor" badge | |
| Partner app (client_id + user OAuth token) | OAuth 2.0 PKCE | per-scope, per-user | Scoped on user's behalf | Anything outside granted scope |
| Commercial bulk | API key + signed agreement | unlimited | Bulk Parquet, no rate limit | (defer to phase 6) |

Tier transitions: sign up → pk_view → connect first contribution → pk_contrib_pending → 30 days validated → pk_contrib → research-grade hardware + verified ORCID + verified install → pk_contrib_train (manual review for first cohort).

Validated = passes per-shape QC + non-degenerate values + plausible location. Stops fake contributions from harvesting tier bumps.
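The tier ladder above can be sketched as a pure function over contributor facts. Field names here are hypothetical, and the real flow adds manual review for the first training cohort:

```typescript
type Tier = "pk_view" | "pk_contrib_pending" | "pk_contrib" | "pk_contrib_train";

// Assumed shape of the facts the tier check would consume.
interface ContributorFacts {
  contributions: number;            // connected contributions
  validatedDays: number;            // days of data passing per-shape QC
  sensorClassAtLeastProfessional: boolean;
  installVerified: boolean;
  orcidVerified: boolean;
}

function tierFor(f: ContributorFacts): Tier {
  if (f.contributions < 1) return "pk_view";
  if (f.validatedDays < 30) return "pk_contrib_pending";
  if (f.sensorClassAtLeastProfessional && f.installVerified && f.orcidVerified) {
    return "pk_contrib_train"; // first cohort still gets manual review
  }
  return "pk_contrib";
}
```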


8. Cost estimate

Phase 1 — regional satellite + JMR anchor stations

| Item | Low | High | Driver |
| --- | --- | --- | --- |
| S3 storage (COGs) | $5 | $15 | 200–600 GB rolling |
| S3 PUT/GET | $1 | $3 | |
| Lambda compute | $5 | $15 | Satellite ingest + bridge intake + pull workers |
| Fargate compute | $50 | $100 | Cross-region NOAA, tier-3, drone preprocessing |
| CloudFront egress | $20 | $200 | Variable |
| Cross-region transfer | $5 | $15 | NOAA us-east-1 → us-west-2 |
| API Gateway | $3 | $10 | |
| RDS Aurora Serverless (pgstac + TimescaleDB + relational + PostGIS) | $80 | $130 | Combined or split |
| Secrets Manager | $2 | $10 | Per-contributor vendor tokens |
| Cognito (or equivalent) | $0 | $5 | Free tier covers <50k MAU |
| CloudWatch logs/metrics | $5 | $15 | |
| EventBridge + SQS + DLQ | $1 | $5 | |
| WAF | $5 | $10 | |
| **Total (Phase 1)** | **$182** | **$533** | Doesn't include contributor-uploaded Event volume (§12l) |

Plus contributor probe ops (separate budget line):

  • YDOC cellular SIMs, 6 anchor loggers: ~$30–60/mo
  • Tempest data plan: $0 (WeatherFlow API free for personal use)
  • Vaisala / Campbell logger comms: TBD
  • Sentek hardware (sunk): already purchased
  • METER hardware: TBD pending purchase

Phase 4+ adders — contribution-driven

When the bridge opens to outside contributors:

  • Drone Event storage: 100 contributors × 10 flights/yr × 1 GB avg = 1 TB/yr. Standard tier $23/mo first year; tiering to IA/Glacier per §12l.
  • Drone preprocessing compute: Fargate spikes per upload. Budget $50–200/mo at moderate volume.
  • Contributor pull workers: Lambda invocations scale with contributor count × poll frequency. ~$0.05/contributor/mo at hourly poll; trivial.
  • Total Phase 4 platform overhead: $80–300/mo on top of Phase 1 baseline.

CONUS scaling (phase 6)

Storage ~5–8×. Compute ~3–5×. Realistic CONUS-with-traction: $1,500–4,000/mo.


9. Phased build plan

Phase 0 — Foundation (1 week)

Done: brand locked, domains registered, hardware purchased.

Day 1–2: AWS account. MFA on root, IAM Identity Center, budget alarms, CloudTrail, tag policy.

Day 3–4: Repository scaffolding under GitHub org mud2dust/:

  • mud2dust/sensor-bridge — multi-shape contribution bridge
  • mud2dust/pipeline — satellite workers, CDK or Terraform infra
  • mud2dust/titiler — tile server, customized titiler config
  • mud2dust/site — Next.js dashboard + public landing + onboarding wizard

MIT license on all four.

Day 5: Auth scaffolding placeholder in mud2dust/site — Cognito user pool created (or Auth.js setup), OAuth app-registration table stubbed in DB, no UI yet.

Day 6: Vercel project for the landing site. mud2dust.com placeholder.

Day 7: Buffer / METER follow-up / JMR conversation per §14.

Phase 1 — Anchor station + one satellite layer end-to-end (3 weeks)

Week 1: Sentinel-1 RTC ingest. Lambda on EventBridge daily cron @ 03:00 UTC. Mosaic clipped to Eastern WA bbox, write s3://mud2dust-cogs/sigma0/yyyy/mm/dd/eastern-wa.tif.

Week 2: Bridge MVP — Observation + Profile shape only, YDOC adapter only. HTTPS endpoint with HMAC validation. JMR's YDOC ML-417ADS configured to POST. Land observations in TimescaleDB. Bare-bones internal dashboard rendering JMR Sentek depths.

Week 3: Tile route + station overlay. Titiler on Lambda + CloudFront. /tiles/sigma0/{date}/{z}/{x}/{y}.png. Demo page with Mapbox raster + JMR Sentek anchor station overlay.

Phase 1 deliverable: internal URL renders Sentinel-1 backscatter over Franklin County with the JMR Sentek anchor station live-overlaid. σ⁰ change correlates with rain events from Tempest readings.

Phase 2 — Multi-source fusion (4 weeks)

Week 1: HLS ingest with Earthdata auth. NDVI/NDWI COGs. Week 2: HRRR ingest in us-east-1. 7-day rolling precip. Week 3: Static priors. POLARIS from UC Davis, Copernicus DEM from S3. Week 4: First fusion model. RF on JMR Sentek + Tempest + HLS NDVI + Sentinel-1 σ⁰. Pickle, ship to Lambda. /tiles/moisture-rootzone/{date}/... runs the model per pixel. Holdout-validate against held-back Sentek depths.

Phase 2 deliverable: calibrated moisture map. Farming Game can switch off AGROMONITORING_API_KEY.

Phase 3 — Open the public API + dashboard (3 weeks)

Week 1: WAF, attribution headers, STAC catalog populated. Week 2: Public dashboard at mud2dust.com — calibrated map browse, public-opted contributions visible, no signup. Week 3: Soft launch — blog, social, HN, Awesome-Geospatial.

Phase 3 deliverable: strangers using the public map.

Phase 4 — Full contribution platform (~15 weeks, split into 4a–4e)

Phase 4a — Bridge expansion to all time-series sources (3 weeks)

Week 1: ZENTRA Cloud pull adapter (per-contributor token + Secrets Manager), tested against METER hardware once purchased and against any METER research-network access from §14. Week 2: WeatherFlow Tempest pull, Davis WeatherLink pull, FieldClimate pull, Onset HOBOlink pull. Week 3: Campbell HTTP-push adapter + CRBasic template publication. Generic JSON webhook adapter. SFTP intake gateway. AmeriFlux + USDA SCAN pull.

Phase 4b — Event / Collection (drone + aircraft) (4 weeks)

Week 1: Presigned S3 multipart upload flow. Storage layout under s3://mud2dust-contributions/events/. Week 2: Drone preprocessing pipeline (Fargate). COG-conformance validation, metadata extraction, STAC item generation. Generic COG, OpenDroneMap output, DJI Terra / Pix4D / DroneDeploy exports. Week 3: L-Band SAR Event handling. Radiometric-cal metadata schema. Cal-target detection flag. Auto-cross-reference with Sentinel-1 overpass windows. Week 4: Collection grouping. Public-opt-in for STAC catalog. Tile route for public Events.

Phase 4c — Sample, Annotation, Boundary (2 weeks)

Week 1: Sample shape — schema, lab-CSV/PDF intake, attachment storage. Annotation shape — geotagged notes, citizen-science photo intake. Week 2: Boundary shape — shapefile/GeoJSON upload, PostGIS storage, integration with AOI extract endpoint.

Phase 4d — Onboarding UX + cross-shape dashboard (3 weeks)

Week 1: User signup flow. Onboarding wizard scaffolding ("what do you have?" branching across all seven shapes). Week 2: Vendor-specific onboarding paths. Per-station/event/etc registration UI with sensor_class / operator_class / privacy controls. Week 3: Cross-shape unified dashboard — stations + flights + samples + boundaries on one map. Anomaly / drift / contribution-health surfacing. Cross-shape Parquet/CSV export.

Phase 4e — Trust model + retrain + partner-app OAuth (3 weeks)

Week 1: Trust model fully wired across all shapes. ORCID verification flow. Tier transitions automated. Week 2: Nightly retraining job. Pulls last-30-days training-tier contributions across Observation/Profile/Event (L-Band drone scenes weighted as anchors). A/B test, promote if better. Week 3: OAuth 2.0 partner-app flow. App registration UI. Farming Game integrated as first partner app — registers JMR's stations + boundaries through the API, pushes Tempest + Vaisala observations, reads calibrated corrections.

Phase 4 deliverable: any farmer, researcher, or drone operator can connect; partner apps integrate via OAuth; calibration model improves visibly per month with multi-shape contributions.

Phase 5 — Add layers + grant momentum (6 weeks)

ECOSTRESS, OpenET, TROPOMI, GOES frost, VIIRS Active Fire. Each ~1 week. Sequence by user demand.

Phase 6 — National + paid tier (8 weeks)

Weeks 1–4: scale satellite ingest to CONUS. Weeks 5–6: Stripe paid tier. Weeks 7–8: first grant application (NSF Pathways, USDA NIFA, NASA CSDA).


10. Key architecture decisions

| Decision | Choice | Why |
| --- | --- | --- |
| Brand | mud2dust | Locked. Domains owned across major TLDs. |
| License | CC-BY for tiles + STAC outputs, MIT for code | Lowest-friction with attribution required. |
| Bridge object types | Seven first-class shapes (Observation, Profile, Sample, Event, Collection, Annotation, Boundary) | Most platforms pick one or two; designing for all seven is the differentiator. Shared auth/trust/privacy/storage layer keeps cost manageable. |
| Drone L-Band as anchor | Treat radiometrically calibrated L-Band drone Events as training-tier anchor scenes | Distributed SMAPVEX-class cal/val. Distinct, fundable research narrative. |
| Bridge layout | One repo, modular by shape | Shapes share 80% of plumbing; fork modules within the repo, not the codebase. |
| Trust model | sensor_class × operator_class → training_weight, invisible to users | Honest weighting without alienating low-trust contributors. |
| Training/correction split | Few hundred training contributions; everyone else consumes corrections | Calibration needs quality, not quantity. |
| Partner-app integration | OAuth 2.0 + PKCE with per-shape scopes | Standard pattern; lets Farming Game and any future app integrate. |
| Vendor token storage | AWS Secrets Manager, namespaced per contributor | Encrypted at rest; per-contributor IAM; rotation; revocation. |
| Coordinate privacy | Two coord fields per object — internal (exact) and public (jittered) | Default fuzzed for STAC/public; opt-in per object. |
| Event upload mechanism | Presigned S3 multipart, not API POST | Drone scenes are too big for API POST; presigned multipart is the AWS-native pattern. |
| STAC backend | pgstac on Aurora Serverless v2 (one instance, official + contrib collections) | Sub-100ms search; STAC is the lingua franca for both satellite and contributor raster. |
| Sensor-data backend | TimescaleDB on RDS (multi-tenant by contributor_id) | Easier model-training queries, easier contributor SQL. |
| Tile renderer | titiler on Lambda + CloudFront | Idiomatic; copies OpenET / Planetary Computer. |
| Failure isolation | One Lambda/Fargate per source/adapter, EventBridge-triggered | A SMAP outage doesn't block Sentinel-1; a ZENTRA outage doesn't block YDOC; drone preprocessing doesn't block Observation ingest. |
| Account split | Single account through phase 3, multi-account org from phase 4 | Defense in depth once data exists. |
| Showcase customer | Farming Game as first OAuth partner app | Validates partner-app API; useful for grants. |
| Anchor stations | 6× Sentek 36" drill-and-drop on YDOC ML-417ADS (already purchased) | Multi-depth profile sensors better than single-depth. |
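The coordinate-privacy decision above keeps an exact internal field and a jittered public field. One sketch, assuming deterministic per-object jitter (random per-request jitter could be averaged away by repeated calls) and a roughly 1 km radius — neither is locked:

```typescript
import { createHash } from "node:crypto";

// Deterministic jitter for the public coordinate field: hash the object id
// so the fuzzed location is stable across requests. radiusDeg ≈ 0.01° is
// roughly 1 km of latitude; an assumed default, not a spec.
function jitterCoords(
  id: string, lat: number, lon: number, radiusDeg = 0.01,
): { lat: number; lon: number } {
  const h = createHash("sha256").update(id).digest();
  // Map two 16-bit slices of the hash onto [-radiusDeg, +radiusDeg].
  const dLat = ((h.readUInt16BE(0) / 0xffff) * 2 - 1) * radiusDeg;
  const dLon = ((h.readUInt16BE(2) / 0xffff) * 2 - 1) * radiusDeg;
  return { lat: lat + dLat, lon: lon + dLon };
}
```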

11. What this means for Farming Game

Six concrete changes in the farminggame repo, sequenced after mud2dust phase 4:

  1. Become the first OAuth partner app on mud2dust. Register JMR's Sentek + Tempest + Vaisala stations through /v1/stations. Stream observations through /v1/observations. Read calibrated corrections through /v1/stations/{id}/corrections.
  2. Push field boundaries. JMR's block boundaries become Boundary contributions on mud2dust via /v1/boundaries. AOI extracts and per-block calibrated outputs become trivial.
  3. Push irrigation events as Annotations or domain-specific Events. Closes the loop between "what was applied" and "what the satellite + sensors see."
  4. Drop Agromonitoring. Replace AGROMONITORING_API_KEY with mud2dust API calls via OAuth user token. Saves $30–300/mo per farm.
  5. Retitle Phase 13. Rename "Direct Satellite Pipeline & Soil Calibration" to "Integrate with mud2dust." Drop subphases 13b–13d. Keep 13a (sensor deployment) but reframe as "deploy stations into mud2dust as the first OAuth partner app." Subphases 13e–13g stay in farminggame.
  6. Become the showcase customer. mud2dust's website links to Farming Game as the working partner-app example. Farming Game's website credits mud2dust as the data layer.

The two projects share an AWS organization but separate accounts/billing.


12. Open questions

12a. Brand name + domain — Resolved. mud2dust locked; both mud2dust and mudtodust owned across .com, .net, .org, .io, .dev, .ag, .farm, .ai, .earth, .co (20 domains). Primary mud2dust.com. Defensive coverage complete.

12b. Legal entity

LLC for phase 0–3, hybrid LLC-owned-by-501(c)(3) when contributor revenue exceeds $20k/yr. LLC keeps you nimble; 501(c)(3) unlocks grants; hybrid (OpenStreetMap Foundation pattern) adds ~$3K/yr legal/accounting overhead.

12c. METER co-brand terms

See §14 for the three-tier ask. Standard structure for the biggest ask: logo + advisory seat + 30-day model first-look + joint paper opportunity. No exclusivity.

12d. Anchor station funding split

6× Sentek + YDOC + Tempest + Vaisala (hardware sunk). Ongoing comms ~$30–60/mo cellular + ZENTRA Cloud if used. Options:

  • mud2dust absorbs fully.
  • JMR co-funds as founding contributor (gets pk_contrib_train in perpetuity).
  • Hybrid: mud2dust pays cellular + bridge ops, JMR pays any ZENTRA Cloud subscription.

12e. Initial advisors (recruit before phase 3 launch)

  1. Northwest credibility — USDA-ARS Pendleton or WSU Prosser.
  2. Federal-process knowhow — OpenET / NASA open-data ag programs alum.
  3. Commercial validation — Bayer / Climate / Granular alum.

12f. Funding strategy (sequenced)

| Phase | Source | Amount | Why |
| --- | --- | --- | --- |
| 0–3 | self-fund | ~$3–5K | AWS + domains (sunk) + marginal time |
| 4 | METER + 2–3 universities at $5K/yr | $20K | Consortium fee covers ops |
| 5 | NIFA SBIR Phase I + NASA-CSDA L-Band cal/val | $175–500K | Two distinct narratives |
| 6 | NRCS CIG or Climate Smart Commodities | $500K+ | National rollout |
| 6+ | Paid bulk-tier customers | $50–500K/yr | Recurring revenue |

12g. Data-use agreement language

Contributors own their raw data. mud2dust gets a license to (a) render their dashboard, (b) use their data in retrain if they're at training tier, (c) produce aggregate/derived public outputs under CC-BY. Revocable. One-paragraph plain-English summary at signup + longer version reviewed by counsel. Tied to §12b legal entity timing.

12h. Researcher verification mechanism

ORCID iD with affiliation at v1; manual review for borderline cases (institutional email without ORCID; non-academic researchers with publication record).

12i. Hardware in hand — reflects updated plan

6× Sentek 36" drill-and-drop on YDOC ML-417ADS, plus Tempest + Vaisala WXT520 + Campbell-in-conversation + METER-to-purchase. Multi-depth profile sensors are better training data than single-depth Teros 12.

12j. Contributor freemium / dashboard-only path

Contributors who only want the dashboard are supported at pk_contrib with all stations/events marked geom_public_mode = "private". Their data is excluded from the public model and public browse. Costs almost nothing to support. Keeps the funnel wide.

12k. METER hardware logger choice

For each METER probe purchase, choose YDOC (existing fleet, push) vs ZENTRA Cloud (vendor portal, pull) per deployment.

12l. Contributor raster storage policy — deferred

At what volume should contributor-uploaded Events trigger storage policy (per-tier quotas, aggressive Glacier lifecycle, COG-only at upload, etc.)? Defer until phase 5 when real volume signal is available. Track contributor-storage as a separate budget line from day one so the trigger point is visible.

12m. L-Band drone cal/val — recruitment strategy

Who flies L-Band SAR over instrumented fields and would partner? Candidates: NASA AirMOSS alumni, USDA-ARS Beltsville, university radar labs (CSU, OU, Univ. Michigan), commercial L-Band drone vendors (ImSAR pilot programs). At least one before phase 5 makes the cal/val story concrete enough to write into a NASA-CSDA proposal.

12n. Cross-shape exports — what does "unified" mean?

A contributor with stations + flights + samples + boundaries on one farm wants a single coherent export. What's the format? Options:

  • One Parquet per shape, zipped together with a manifest.
  • A single STAC catalog where stations are STAC items with assets pointing to Parquet.
  • A custom mud2dust archive format.

Recommendation: Parquet-per-shape + STAC manifest; defer custom format unless users ask. Decide during phase 4d.


13. Things deliberately deferred

| Item | Defer until | Why |
| --- | --- | --- |
| Disaster recovery (cross-region replication, RDS backups, full IaC) | Phase 4 | Single account is recoverable enough during build |
| Observability beyond CloudWatch | Phase 3 launch | CloudWatch is fine until you have users |
| GDPR compliance | Phase 3 | Trivial — only collecting emails for API keys |
| SOC 2 | Phase 6 | Only matters for paid-tier enterprise customers |
| Versioning (model_version on tiles, reproducible old outputs) | Phase 4 | Once retraining starts, old tiles need to be reproducible |
| Internationalization | Phase 6+ | Algorithms generalize globally; only POLARIS + SSURGO are US-only |
| MFA enforcement on all users | Phase 4e | Required at the training-contributor tier; encouraged but optional elsewhere until then |
| Mobile app | Phase 6+ | Web dashboard responsive enough; native app waits for product/market fit |
| Water-quality / nitrate runoff samples | Phase 6+ | Adjacent industry; could fragment focus |
| Stream gauges / hydrology beyond awareness | Phase 6+ | Not on the soil-moisture critical path |
| In-cab / equipment telemetry (John Deere, AgLeader, Trimble) | Phase 6+ | Valuable but a different integration class; revisit once partner-app pattern is proven |
| Climate model output as contribution | Phase 6+ | Drought projection is a separate product; defer |
| Contributor raster storage policy | Phase 5 | See §12l |

14. Next concrete steps

METER meeting (2026-05-07) — three asks ranked by ease

  1. (Easy / must-have) Confirm ZENTRA Cloud REST API can be used as a per-contributor pull adapter — any METER customer who authorizes mud2dust with their own ZENTRA token grants us read access to their devices and observations. Standard third-party API use; should not require formal agreement, but worth confirming there's no ToS restriction.
  2. (Medium) Co-publish the ZENTRA Cloud adapter under MIT in mud2dust/sensor-bridge. Gives METER a co-contributor citation; gives any open-data project a reusable adapter.
  3. (Big) METER's research network contributes as a Training Contributor on the platform, with co-branding (logo + tile attribution; advisory seat; 30-day model first-look; joint paper opportunity).

Their developer-relations / API team is likely the right counterpart for asks 1 and 2, and research/sales for ask 3; try to bring both groups into the meeting.

Other Phase 0 prerequisites

  1. Stand up an AWS account. 15 minutes if not already done — root locked, MFA, IAM Identity Center.
  2. Conversation with Jackass Mountain Ranch to confirm willingness to host the anchor stations and decide §12d funding split.
  3. Legal-entity decision (LLC formation) — gate for any contributor-data-use agreement at scale.
  4. Repo scaffolding: create mud2dust/sensor-bridge, mud2dust/pipeline, mud2dust/titiler, mud2dust/site under the GitHub org.
  5. Domain DNS: point mud2dust.com at Vercel (landing) and api.mud2dust.com at API Gateway (placeholder).
  6. L-Band drone cal/val partnership scouting — first conversation per §12m, before phase 5 grant-writing.

Once those are done, phase 0 closes and phase 1 starts.


Verification

| Phase | Verification gate |
| --- | --- |
| 0 | Domains registered (done). AWS account exists with budget alarm + tags. Four repos scaffolded with MIT license. Auth scaffolding placeholder in site. Placeholder landing live at mud2dust.com. |
| 1 | Internal URL renders Sentinel-1 backscatter tiles over Franklin County, WA. JMR Sentek station registered via API; observations flowing into TimescaleDB; visible on internal dashboard. σ⁰ change correlates with rain events from Tempest readings. |
| 2 | /tiles/moisture-rootzone/{date} returns calibrated VWC. Holdout JMR Sentek depths validated within ±3% VWC RMSE. Farming Game can switch off AGROMONITORING_API_KEY and the field-detail panel still renders moisture. |
| 3 | Anonymous tile + STAC API live behind WAF. Public dashboard browseable without signup. Five external users have made pystac-client requests. Attribution headers present on all responses. |
| 4a | Bridge supports ≥6 vendor adapters end-to-end across Observation + Profile shapes (YDOC push, ZENTRA pull, WeatherFlow pull, WeatherLink pull, Campbell push, AmeriFlux pull). Per-contributor credential vault working. |
| 4b | Drone Event upload works end-to-end (presigned multipart → preprocessing → STAC item → tile route). At least one L-Band SAR upload validated with calibration metadata. Public tile route for opted Events live. |
| 4c | Sample, Annotation, and Boundary shapes all support upload + retrieval + dashboard rendering. Boundary integrates with AOI extract. |
| 4d | Onboarding wizard live. Farmer can connect their hardware in <10 min from signup. Cross-shape dashboard renders mixed sources correctly. Cross-shape Parquet/STAC export produces a coherent archive. |
| 4e | OAuth partner-app flow live. Farming Game integrated as first partner app — registers stations + boundaries, pushes observations + irrigation events, reads corrections. Trust model weighting confirmed in retrain logs. Nightly retrain promotes a new model with measurable accuracy gain. |
| 5+ | Each new layer ships with a STAC collection, a tile route, and a demo notebook. |
| 6 | Stripe-billed paid customer pulls bulk Parquet. Grant application submitted (preferably both NASA-CSDA L-Band cal/val and NIFA open ag data infra). |

Critical files / paths

This plan currently lives at /Users/willmachugh/.claude/plans/we-were-working-on-lovely-widget.md and the prior revision at /Users/willmachugh/.claude/plans/glittery-stirring-origami.md. After approval, copy into the new umbrella repo as mud2dust/PLAN.md (or mud2dust/.claude/plans/mud2dust-plan.md to mirror farminggame convention) and mark farminggame/.claude/plans/openagdata-architecture.md superseded with a pointer here.

Files to be created during Phase 0:

  • mud2dust/PLAN.md (copy of this file)
  • mud2dust/README.md
  • mud2dust/LICENSE (MIT)
  • Four sub-repos under the GitHub org: sensor-bridge, pipeline, titiler, site.

No existing functions or utilities to reuse — mud2dust/ is empty. Phase 13 references in farminggame/.claude/plans/farminggame-plan.md need to be retitled per §11 once this plan is approved.