🧭 Spatial Web Foundation

Spatial Web Features: World-to-Vec Realized

A workflow for identifying features in AI hyperspace — uniting Goodchild's general theory of geographic representation, the IEEE P2874 Universal Domain Graph requirements, and the convergent-feature evidence from modern mechanistic-interpretability research.


Artificial intelligence is changing the definition of space. The same mathematical machinery that embeds words, pixels, molecules, and locations in high-dimensional vector spaces is now used to embed the world itself — a development the IEEE P2874 Spatial Web community calls World-to-Vec.

This page synthesizes three threads. First, the foundations of spatial embedding in modern AI. Second, the IEEE / Spatial Web Foundation Universal Domain Graph (UDG) requirements for Hyperspace Reference Systems and entity embeddings. Third, Goodchild, Yuan and Cova's general theory of geographic representation — which we propose as the appropriate ontological lens for features in AI hyperspace.

Recent AI research — the Platonic Representation Hypothesis, the Linear Representation Hypothesis, and the scaling of sparse autoencoders to frontier models — demonstrates that independently trained models converge on common features. These features, accessible as directions or sparse latents in hyperspace, are the AI analog of geo-objects as defined by Goodchild. Treating them as first-class Spatial Web entities is what makes World-to-Vec engineerable.

Core insight
Features exist in AI hyperspace, and they are stable across independent models. The mechanistic-interpretability literature now has multiple lines of evidence that the same features recur in independent models trained on different data. That is the same property that makes Earth-surface features — coasts, rivers, urban cores — stable enough to catalog in a GIS.
The bridge
Goodchild's geo-atom <location, property, value> is general enough to cover both Earth-surface and hyperspace. In GIS, the location is a coordinate on Earth. In AI, the location is an activation vector. The aggregation rules, phase-space construction, and indeterminacy machinery transfer one-for-one.
What this page proposes
A nine-stage Feature Identification Workflow — scope, ingest, embed, define a Hyperspace Reference System, discover candidates with SAEs and probes, validate them across models, cartograph them for human review, mint Hyperspace Entity Locators, and operate under Spatial Web governance.

Part 1 · Theoretical Basis

From Word2Vec to World2Vec

Five strands of theory converge to give us the basis for feature identification in AI hyperspace.

Foundation — 2013

Spatial embedding methods from AI

Word2Vec showed that meaning has geometric structure: high-dimensional vectors whose offsets capture syntactic and semantic regularities. The textbook example, King − Man + Woman ≈ Queen, remains the canonical demonstration. Every modern foundation model rests on this primitive.
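The arithmetic can be sketched with toy vectors. The values below are illustrative, not trained weights; real word2vec embeddings have hundreds of dimensions, but the offset structure is the same:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: the standard metric for comparing embeddings.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d embeddings (illustrative values only):
# dimension 0 ~ "royalty", dimension 1 ~ "gender", the rest are noise.
emb = {
    "king":  np.array([0.9,  0.8, 0.1, 0.0]),
    "man":   np.array([0.1,  0.8, 0.2, 0.1]),
    "woman": np.array([0.1, -0.8, 0.2, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1, 0.0]),
}

# King - Man + Woman should land nearest to Queen.
target = emb["king"] - emb["man"] + emb["woman"]
nearest = max((w for w in emb if w != "king"), key=lambda w: cosine(target, emb[w]))
print(nearest)  # → queen
```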

UDG — §6.2

World2Vec in the Spatial Web

PyTorch-BigGraph — internally named World2Vec — extended the idea to entities of any kind. The UDG Functional Requirements treat hyperspace as the vector-space version of the world: every Spatial Web ENTITY (actor, activity, domain, place, norm) has an embedding. Reasoning, retrieval and governance operate on those embeddings.

Presentation — Percivall

AI is changing the definition of space

Computation now routinely occurs in spaces of extraordinarily high dimension and non-coordinate structure. Video and agency bring time into the same hyperspace, with actors and activities as first-class objects. Tobler's First Law generalizes to telecoupling — relatedness in hyperspace independent of Earth-surface distance.

Convergence — 2024

Common features across models

Independently trained models discover the same features. The Platonic Representation Hypothesis (Huh et al. ICML 2024) shows representational alignment grows with scale and task diversity. The Linear Representation Hypothesis (Park, Choe, Veitch ICML 2024) shows concepts live as linear directions. Scaling Monosemanticity (Anthropic 2024) extracts tens of millions of features from Claude 3 Sonnet.

Bridge — Goodchild 2007

Features as geo-objects in hyperspace

Goodchild's atomic form <location, property, value> — the geo-atom — produces geo-fields (continuous properties) and geo-objects (aggregated points satisfying rules). Phase space, c(x) = f(z₁,…,zₘ), induces bona fide objects. This is exactly the construction AI feature identification performs: an embedding is a phase space; coherent regions are features.
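A minimal sketch of the geo-atom and the phase-space construction, with hypothetical names and an illustrative "upland" rule standing in for f:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GeoAtom:
    # Goodchild's atomic form <location, property, value>. In hyperspace the
    # location is an activation vector rather than an Earth coordinate.
    location: np.ndarray
    prop: str
    value: float

# Phase space: c(x) = f(z1, ..., zm). Here f is a simple rule over one
# property field, inducing a bona fide object from the atoms satisfying it.
def phase_space_object(atoms, rule):
    return [a for a in atoms if rule(a)]

atoms = [
    GeoAtom(np.array([0.1, 0.9]), "elevation", 12.0),
    GeoAtom(np.array([0.8, 0.2]), "elevation", 310.0),
    GeoAtom(np.array([0.7, 0.3]), "elevation", 295.0),
]
# Illustrative rule: "upland" = elevation above 200.
upland = phase_space_object(atoms, lambda a: a.value > 200.0)
print(len(upland))  # → 2
```

The same two-line construction works unchanged when `location` is an activation vector and `rule` is a threshold on a sparse latent.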

UDG — §6.2.3

Hyperspace Reference Systems

An HRS is the AI analog of a Coordinate Reference System: a Coordinate System (CS — dimensions, units) plus a Datum that anchors the abstract CS to real-world semantics. Once an HRS is defined, Hyperspace Entity Locators (HELs) address features the way URLs address documents.
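A sketch of what a working HRS and HEL might look like in code. The field names, the example model string, and the `hel://` syntax are assumptions for illustration; IEEE P2874 has not standardized a concrete HEL grammar:

```python
from dataclasses import dataclass, field

@dataclass
class HRS:
    # Coordinate System: dimensionality and normalization convention.
    name: str
    dims: int
    norm: str                                  # e.g. "unit-l2"
    # Semantic datum: named probe concepts anchoring the CS to meaning.
    datum: dict = field(default_factory=dict)  # concept -> probe direction id

def mint_hel(hrs: HRS, feature_id: str) -> str:
    # Hypothetical locator syntax, loosely modeled on URIs.
    return f"hel://{hrs.name}/v{hrs.dims}d/{feature_id}"

hrs = HRS(name="gemma2-9b.layer20.sae16k", dims=16384, norm="unit-l2",
          datum={"is_a_river": "probe-0041"})
print(mint_hel(hrs, "feat-000123"))
# → hel://gemma2-9b.layer20.sae16k/v16384d/feat-000123
```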

"Representations in AI models, particularly deep networks, are converging — driving toward a shared statistical model of reality."
Huh, Cheung, Wang, Isola — The Platonic Representation Hypothesis, ICML 2024

Side by side

GIS Features and AI Hyperspace Features

Mapped concept-by-concept against Goodchild's general theory and the UDG requirements.

Concept | Geographic Information Systems | AI / Spatial Web hyperspace
Atomic form | Geo-atom: <x, property, value> on Earth surface | Activation tuple: <v, property, value> in hyperspace
Reference frame | Coordinate Reference System (CS + Datum, e.g. WGS-84) | Hyperspace Reference System (CS + semantic datum)
Locator | URL, URI, geo-URI | Hyperspace Entity Locator (HEL)
Continuous representation | Geo-field (six standard discretizations) | Embedding manifold; activation field over data
Discrete representation | Geo-object / Feature (OGC Simple Features) | Sparse-autoencoder feature; linear concept direction
Aggregation principle | Tobler's First Law (proximity → similarity) | Platonic / distributional hypothesis (training proximity → representational proximity)
Phase-space construction | c(x) = f(z₁, …, zₘ) over m fields | Concept = partition of activation space along learned directions
Boundary indeterminacy | Membership function m(x); fuzzy classes | Polysemantic neurons; sparse-feature activation strength
Catalog standard | ISO 19110 Feature Catalog; OGC API – Features | (Emerging) SAE feature catalogs, Neuronpedia, Embedding Atlas

Part 2 · Tools, Datasets, Models

An Ecosystem for Feature Identification

Feature identification draws on two complementary toolchains: mechanistic interpretability from AI safety, and geospatial foundation models from GeoAI.

Mechanistic interpretability and embedding analysis

Tool / method | What it does | Theory fit
Sparse Autoencoders (SAE) | Decompose model activations into sparse, monosemantic latents. Foundational to Anthropic's Scaling Monosemanticity. | Strong. Directly produces feature candidates — the AI analog of geo-objects.
TransformerLens | Library for accessing internal activations; standard substrate for SAE and circuit work. | Strong. Provides the activation tuples that SAEs and probes operate on.
Neuronpedia | Open platform hosting >50M SAE latents across many models, with autointerp explanations, search, steering, circuit-tracing demos. | Strong. A working analog of an ISO 19110 feature catalog for hyperspace.
Goodfire (Ember API) | Commercial platform turning SAE features into steering / analysis for production models. Backed by Anthropic. | Strong as a production realization; closed parts limit fully open verification.
Embedding Atlas (Apple, 2025) | Open-source interactive viewer for million-point embeddings with density clustering, automated labels, metadata filtering. | Strong for human-in-the-loop cartographic visualization (UDG §6.2.2).
UMAP / t-SNE / PCA | Dimensionality reduction for visual exploration and preprocessing. | Moderate. Useful but lossy; should never be the sole basis for feature definition.
Linear probes | Train a linear classifier from activations to a labeled concept. | Strong validator under the Linear Representation Hypothesis.
CKA / SVCCA / Model stitching | Quantify representational similarity between models; test whether one model's layer substitutes for another's. | Strong. Empirical evidence for or against convergence on each candidate feature.
Natural Language Autoencoders | Map activations to natural language and back; verbalize internal states. | Useful for automated labeling of discovered features.
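The SAE row above is compact enough to show as code. A minimal forward pass in numpy captures the shape of the technique; the weights here are random stand-ins, where a real SAE learns them by minimizing reconstruction loss plus an L1 sparsity penalty, and the dimensions are toy-sized:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32      # 4x expansion; production SAEs are far larger

# Untrained weights for illustration; W_enc / W_dec are learned in practice.
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x, l1_coeff=1e-3):
    latents = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU -> sparse codes
    recon = latents @ W_dec + b_dec
    loss = np.mean((recon - x) ** 2) + l1_coeff * np.abs(latents).sum()
    return latents, recon, loss

x = rng.normal(size=(d_model,))
latents, recon, loss = sae_forward(x)
print(latents.shape, recon.shape)  # (32,) (8,)
```

Each nonzero entry of `latents` is a candidate feature firing on the input: the object the rest of the workflow validates, names, and catalogs.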

GeoAI foundation models and libraries

Model / library | What it provides | Theory fit
Prithvi (NASA / IBM) | Earth Observation foundation model. Variants Prithvi-EO (6-band multispectral) and Prithvi-WxC (weather/climate). 100M – 2B parameters. | Strong. Embeddings of Earth tiles are simultaneously geo-objects and hyperspace features.
SatMAE / SatMAE++ / SpectralGPT / CROMA / SkySense | Self-supervised vision transformers for satellite imagery; spectral and temporal awareness. | Strong for raw embedding generation; less interpretable out-of-the-box.
SatCLIP (Microsoft Research) | Global, general-purpose location encoder. Contrastively aligns satellite imagery with coordinates using spherical-harmonics encoding. | Strong. A direct World-to-Vec primitive — raw location into a vector usable downstream.
GeoCLIP | CLIP-based location-image alignment for geo-localization. Random Fourier feature encoder. | Moderate. Tuned for geo-localization; less general than SatCLIP.
MOSAIKS / CSP / S2Vec | Alternative location/image embedding methods. S2Vec (2025) is self-supervised geospatial. | Moderate. Useful baselines and ablation comparators.
TorchGeo | PyTorch domain library: geospatial datasets, samplers, transforms, pretrained models. | Strong foundation for any geospatial embedding pipeline.
TerraTorch (IBM, 2025) | Fine-tuning toolkit for geospatial foundation models on TorchGeo + PyTorch Lightning. HPO, benchmarking, full workflows. | Strong for operationalizing GeoFMs.
GeoAI Python package | Higher-level wrapper that generates geospatial embeddings via TorchGeo foundation models for similarity search, clustering, change detection. | Strong as an applied-side runtime.
H3 (Uber) / DGGS | Hexagonal hierarchical spatial index; basis for the OGC AI-DGGS Disaster Pilot (2025) exposing DGGS-indexed data via OGC APIs for AI agents. | Strong as a cellular indexing layer beneath any HRS — directly cited in UDG §6.1.
Datasets
Earth Observation archives (Sentinel-2, Landsat, MODIS, NAIP, GEO-Bench); location-image corpora (S2-100K, MP-16, OpenStreetMap derivatives); activation datasets for SAE training from open-weights models (Llama-3, Mistral, Gemma, Pythia, GPT-2 Small, with Gemma Scope as canonical cross-layer SAE reference); cross-model alignment benchmarks (ImageNet, COCO, LAION subsets) used in Platonic Representation experiments; UDG-referenced ontologies (Common Core, SUMO, IEEE Ethics) as candidate semantic datums.
Gaps against the framework
Three notable gaps. First, no widely adopted HRS standard — each model defines its own coordinate basis with no agreed semantic datum. Second, feature catalogs are model-specific — Neuronpedia indexes features per model, but no equivalent of ISO 19110 lets features be shared and cited across models. Third, datums for transformation between HRSs are open research; recent CKA / stitching / Platonic work provides the empirical raw material, but a normative procedure is missing.

Part 3 · Recommended Workflow

Feature Identification via Spatial Embedding

Nine stages that operationalize World-to-Vec. Each stage produces an output that maps to a concept in either GIS practice (after Goodchild) or the Spatial Web UDG.

The Nine-Stage Workflow
1 Scope
2 Ingest
3 Embed
4 Define HRS
5 Discover
6 Validate
7 Visualize
8 Catalog
9 Operate
Stage 1

Scope and define the world to embed

Define the entities, properties, and relations of interest. State the use case and the questions discovered features must answer. In UDG terms, choose which ENTITIES will be embedded; in Goodchild terms, choose which geo-atom properties matter.

Deliverable: scope document listing entity types, relations, source datasets, downstream queries.
Stage 2

Ingest and prepare the source data

Assemble data: text corpora, knowledge graphs, satellite imagery, sensor streams, OSM. Normalize, deduplicate, and index. For geographic data, index with H3 or another DGGS so the cellular structure of UDG §6.1 is preserved end-to-end.

Tools: TorchGeo for geospatial; LlamaIndex/LangChain for text; PyTorch-BigGraph for very large graphs.
Stage 3

Generate or load embeddings

Embed prepared data with a domain-appropriate foundation model — Prithvi-EO or SatCLIP for Earth Observation, SatCLIP or GeoCLIP for raw location, a Llama- or Mistral-class model for text, PyTorch-BigGraph for graphs. Persist activations or final embeddings in a queryable store (FAISS, ScaNN, vector DB).

Output: labeled tensor of vectors. Each row is an activation tuple — the AI analog of a geo-atom.
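The store-and-query path can be sketched with exact cosine search over synthetic vectors. The item ids and dimensions are placeholders, and in practice the rows come from Prithvi, SatCLIP, or an LLM's hidden states, with FAISS or ScaNN replacing the brute-force search at scale:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for foundation-model output: 1000 items embedded in 64 dims.
vectors = rng.normal(size=(1000, 64))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
ids = [f"item-{i}" for i in range(len(vectors))]

def nearest(query, k=5):
    # Exact cosine search; FAISS / ScaNN / a vector DB replace this at scale.
    q = query / np.linalg.norm(query)
    scores = vectors @ q
    top = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in top]

hits = nearest(vectors[42])
print(hits[0][0])  # → item-42
```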
Stage 4

Define a working Hyperspace Reference System

Specify the CS (dimensions, units, normalization) and a tentative semantic datum. Early-stage: a fixed set of probe concepts (is_a_country, is_a_river, is_a_road) with measured directions. Over time, align datums to an ontology such as Common Core or the Spatial Web HSML.

Deliverable: HRS specification; vector of probe directions; versioned datum manifest.
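One minimal way to measure a probe direction for the datum is the difference of class means over labeled activations; a trained linear probe is the sturdier choice, and all names and data below are synthetic illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)

def probe_direction(pos, neg):
    # Difference of class means, unit-normalized: the simplest measured
    # direction for a datum concept such as is_a_river.
    d = pos.mean(axis=0) - neg.mean(axis=0)
    return d / np.linalg.norm(d)

# Synthetic activations: "river" items shifted along one hidden axis.
hidden = np.zeros(32); hidden[3] = 1.0
river = rng.normal(size=(200, 32)) + 2.0 * hidden
other = rng.normal(size=(200, 32))

direction = probe_direction(river, other)

# Versioned datum manifest: the HRS deliverable this stage produces.
datum_manifest = {"hrs": "demo-hrs-v1", "version": 1,
                  "probes": {"is_a_river": direction.tolist()}}
print(float(direction[3]) > 0.8)  # → True: recovers the planted axis
```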
Stage 5

Discover candidate features

Train an SAE on activations from a chosen layer (or on final embeddings if no layered model is used), recover monosemantic latents, rank by activation density and reconstruction loss. Complement with linear probes for predefined concepts and density-based clustering for emergent groupings.

Output: candidate-feature catalog — each with description, activation direction, top examples, tentative name.
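A sketch of one common ranking heuristic, activation density, over stand-in latents; the thresholds are illustrative, and in practice the matrix comes from the trained SAE of this stage:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sparse latent activations: rows = examples, cols = SAE latents (stand-in).
acts = np.maximum(rng.normal(-1.5, 1.0, size=(5000, 128)), 0.0)

def rank_candidates(acts, lo=0.001, hi=0.3):
    # Activation density = fraction of examples on which a latent fires.
    # Latents firing almost never (dead) or almost always (dense, typically
    # uninterpretable) are filtered out before human review.
    density = (acts > 0).mean(axis=0)
    keep = np.where((density > lo) & (density < hi))[0]
    return keep[np.argsort(-density[keep])]

candidates = rank_candidates(acts)
print(len(candidates) > 0)  # → True
```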
Stage 6

Validate features across models

Test whether each candidate is a property of the world rather than a quirk of one model. Apply Centered Kernel Alignment or canonical correlation against features from at least one other model trained on related data. Use model stitching for stronger evidence.

Theory: operationalizes the Platonic Representation Hypothesis. Features that recur across independently trained models are evidence of structure in the world rather than idiosyncrasies of one model.
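Linear CKA is compact enough to show in full. This follows the standard formulation (Kornblith et al. 2019) and demonstrates its invariance to rotation, which is what lets it compare representations that live in different models' coordinate bases:

```python
import numpy as np

def linear_cka(X, Y):
    # Linear Centered Kernel Alignment between two activation matrices
    # (rows = the same inputs, cols = each model's own dimensions).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    den = np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro")
    return float(num / den)

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 64))             # model A activations
R, _ = np.linalg.qr(rng.normal(size=(64, 64)))
Y_aligned = X @ R                          # rotated copy: same geometry
Y_random = rng.normal(size=(500, 64))      # unrelated representation

print(round(linear_cka(X, Y_aligned), 2))  # → 1.0 (rotation-invariant)
print(linear_cka(X, Y_random) < 0.5)       # → True
```

A candidate feature whose neighborhood scores high CKA against a second model's activations on the same inputs passes this stage; model stitching then provides the stronger, causal check.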
Stage 7

Cartographically visualize and label

Project validated features to 2D / 3D using UMAP or related methods, render in Embedding Atlas or a similar viewer, apply automated labels (Natural Language Autoencoders, or an LLM autointerp pipeline as used by Neuronpedia). Subject-matter experts review and rename. Implements UDG §6.2.2.

Output: human-readable atlas of features, each linked to its activation evidence.
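A dependency-free stand-in for the projection step: plain PCA via SVD. The workflow above would call umap-learn here, but any 2-D projection feeds the same atlas-rendering and labeling steps:

```python
import numpy as np

def pca_2d(X):
    # PCA via SVD: project onto the top two principal components.
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * S[:2]

rng = np.random.default_rng(5)
features = rng.normal(size=(300, 48))   # stand-in for validated features
coords = pca_2d(features)               # one (x, y) point per feature
print(coords.shape)  # (300, 2)
```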
Stage 8

Assign Hyperspace Entity Locators and catalog

For each validated, named feature, mint a HEL referencing the HRS and a stable identifier. Publish using a schema modeled on ISO 19110, bound to the Spatial Web (HSML / HSTP), so other systems can resolve and reuse features. Record cross-model alignment evidence and HRS-to-HRS transformations.

Deliverable: published, citable feature catalog. Each entry: HEL, name, definition, evidence, alignment record.
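What a catalog entry might look like in code. The record layout is a hypothetical sketch loosely modeled on ISO 19110 feature-type descriptions; no normative hyperspace-catalog schema exists yet, and every field name and value below is illustrative:

```python
import json

def catalog_entry(hel, name, definition, evidence):
    # Hypothetical entry: HEL, name, definition, and alignment evidence,
    # mirroring the deliverable fields listed in Stage 8.
    return {
        "hel": hel,
        "name": name,
        "definition": definition,
        "alignment_evidence": evidence,   # cross-model CKA / stitching scores
        "schema": "iso19110-hyperspace-draft",
    }

entry = catalog_entry(
    hel="hel://demo-hrs-v1/feat-000123",
    name="river_mention",
    definition="Activates on references to rivers and fluvial features.",
    evidence={"model_pair": "gemma-2-9b vs llama-3-8b", "cka": 0.81},
)
print(entry["name"])  # → river_mention
```

Serializing such records as JSON (`json.dumps(entry)`) is what makes the catalog resolvable and citable by other systems.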
Stage 9

Operate, monitor, and govern

Use cataloged features downstream: as steering vectors for generative models, as variables for spatial analysis, as queryable entities for AI agents. Monitor feature drift as models are retrained. Apply IEEE P2874 governance — norms, contracts, ratings — to consequential features (e.g. those covering deception, bias, dangerous content).

Closure: features become first-class Spatial Web entities under the same governance as any other ENTITY.
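The steering-vector use named above reduces to adding a scaled, normalized feature direction to a model's residual stream. This numpy sketch stands in for a forward-pass hook on a real model; `alpha` and the layer choice are empirically tuned assumptions:

```python
import numpy as np

def steer(activations, direction, alpha=4.0):
    # Shift every residual-stream vector by alpha along the unit feature
    # direction, strengthening that feature's expression downstream.
    d = direction / np.linalg.norm(direction)
    return activations + alpha * d

rng = np.random.default_rng(6)
direction = rng.normal(size=64)           # a cataloged feature's direction
acts = rng.normal(size=(10, 64))          # batch of residual-stream vectors
steered = steer(acts, direction)

# The projection onto the feature direction rises by alpha for every row.
d = direction / np.linalg.norm(direction)
print(np.allclose((steered - acts) @ d, 4.0))  # → True
```

Monitoring feature drift is the inverse operation: re-measure each cataloged direction after retraining and compare it (e.g. by cosine similarity) to the version recorded in the catalog.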

Workflow at a glance

Mapped to GIS and to the UDG

Stage | Output | GIS analog | UDG concept
1 Scope | Scope document | Geographic area of interest | ENTITY type selection
2 Ingest | Cleaned, indexed data | Spatial database | Source DOMAINS
3 Embed | Vector store | Coordinate transformation | Spatial embedding (§6.2)
4 HRS | Reference-frame spec | CRS = CS + Datum | HRS = CS + semantic datum
5 Discover | Candidate features | Object extraction | Feature candidates in hyperspace
6 Validate | Cross-model evidence | Ground truthing | Datum-to-datum transformation
7 Visualize | Atlas + labels | Cartographic map | Cartographic visualization (§6.2.2)
8 Catalog | Published features + HELs | ISO 19110 catalog | HEL-addressable ENTITIES
9 Operate | Used, monitored, governed | GIS in production | Spatial Web governance (P2874)

Why this matters

Engineering Implications

Treating hyperspace features as first-class entities closes the long-standing gap between "embedding" and "feature".

🧭

Reference frames for AI

Just as WGS-84 turned latitude/longitude into a globally interoperable system, a semantic-datum HRS turns embedding coordinates into something models can share. Without it, every embedding is provincial.

🗂️

Catalogable features

Features that recur across independent models can be named, cited, and reused. ISO 19110-style feature catalogs for hyperspace would let one team's "fraud-detection direction" be the same direction another team trusts.

🛰️

Geo and AI unified

Prithvi and SatCLIP embed Earth-surface places into hyperspace. SAEs extract concepts. The same workflow handles both because they share Goodchild's atomic form.

🛡️

Governance with traction

Once safety-relevant features (deception, bias, dangerous content) are catalogued and HEL-addressable, IEEE P2874 governance can attach norms, contracts, and ratings to them — instead of to opaque whole models.

Key References

Source Material

The survey draws on UDG functional requirements, Percivall's presentation, the Goodchild paper, and recent literature on representational convergence and mechanistic interpretability.

Related Work

Explore Further

GeoRoundtable work connecting AI hyperspace, Spatial Web standards, and geographic theory.

Make hyperspace features first-class

GeoRoundtable brings together expertise in geospatial standards, agentic AI, mechanistic interpretability, and philosophy of engineering. We help organizations build the bridges between embeddings and features that the next generation of AI infrastructure depends on.

Get in touch