A workflow for identifying features in AI hyperspace — uniting Goodchild's general theory of geographic representation, the IEEE P2874 Universal Domain Graph requirements, and the convergent-features evidence from modern mechanistic interpretability research.
Artificial intelligence is changing the definition of space. The same mathematical machinery that embeds words, pixels, molecules, and locations in high-dimensional vector spaces is now used to embed the world itself — a development the IEEE P2874 Spatial Web community calls World-to-Vec.
This page synthesizes three threads. First, the foundations of spatial embedding in modern AI. Second, the IEEE / Spatial Web Foundation Universal Domain Graph (UDG) requirements for Hyperspace Reference Systems and entity embeddings. Third, Goodchild, Yuan and Cova's general theory of geographic representation — which we propose as the appropriate ontological lens for features in AI hyperspace.
Recent AI research — the Platonic Representation Hypothesis, the Linear Representation Hypothesis, and the scaling of sparse autoencoders to frontier models — demonstrates that independently trained models converge on common features. These features, accessible as directions or sparse latents in hyperspace, are the AI analog of geo-objects as defined by Goodchild. Treating them as first-class Spatial Web entities is what makes World-to-Vec engineerable.
Part 1 · Theoretical Basis
Five strands of theory converge to give us the basis for feature identification in AI hyperspace.
Word2Vec showed that meaning has geometric structure: high-dimensional vectors whose offsets capture syntactic and semantic regularities. The textbook example, King − Man + Woman ≈ Queen, remains the canonical demonstration. Every modern foundation model rests on this primitive.
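The offset arithmetic can be reproduced in a few lines. A minimal sketch, assuming nothing beyond numpy: the 4-dimensional "embeddings" below are hand-picked so the regularity holds, standing in for real learned word2vec vectors (typically 100–300 dimensions).

```python
import numpy as np

# Toy embeddings chosen so that the offset king - man ≈ queen - woman holds.
# Real word2vec vectors are learned from corpus co-occurrence, not hand-set.
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.6]),
    "man":   np.array([0.1, 0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
    "queen": np.array([0.9, 0.1, 0.9, 0.6]),
    "river": np.array([0.0, 0.2, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land nearest queen.
target = emb["king"] - emb["man"] + emb["woman"]
nearest = max((w for w in emb if w not in ("king", "man", "woman")),
              key=lambda w: cosine(target, emb[w]))
print(nearest)  # queen
```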
PyTorch-BigGraph — internally named World2Vec — extended the idea to entities of any kind. The UDG Functional Requirements treat hyperspace as the vector-space version of the world: every Spatial Web ENTITY (actor, activity, domain, place, norm) has an embedding. Reasoning, retrieval and governance operate on those embeddings.
Computation now routinely occurs in spaces of extraordinarily high dimension whose structure is not tied to any physical coordinate frame. Video and agency bring time into the same hyperspace, with actors and activities as first-class objects. Tobler's First Law generalizes to telecoupling — relatedness in hyperspace independent of Earth-surface distance.
Independently trained models discover the same features. The Platonic Representation Hypothesis (Huh et al. ICML 2024) shows representational alignment grows with scale and task diversity. The Linear Representation Hypothesis (Park, Choe, Veitch ICML 2024) shows concepts live as linear directions. Scaling Monosemanticity (Anthropic 2024) extracts tens of millions of features from Claude 3 Sonnet.
Goodchild's atomic form, the geo-atom <location, property, value>, produces geo-fields (continuous properties) and geo-objects (aggregations of points satisfying membership rules). The phase-space construction c(x) = f(z₁,…,zₘ) over m fields induces bona fide objects. This is exactly the construction AI feature identification performs: an embedding space is a phase space; its coherent regions are features.
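The field-to-object move is mechanical enough to sketch directly. Below, two synthetic geo-fields (the elevation and moisture grids are illustrative values, not real data) are classified through a phase-space rule and the classified cells aggregated into discrete geo-objects by 4-connectivity:

```python
import numpy as np
from collections import deque

# Two synthetic geo-fields on a 6x6 grid (values are illustrative).
elevation = np.array([
    [80, 80, 40, 40, 80, 80],
    [80, 80, 40, 40, 80, 80],
    [80, 80, 80, 80, 80, 80],
    [80, 30, 30, 80, 80, 80],
    [80, 30, 30, 80, 80, 80],
    [80, 80, 80, 80, 80, 80],
])
moisture = np.full((6, 6), 0.8)

# Phase-space classification c(x) = f(z1, z2): a cell is "wetland"
# where elevation is low and moisture is high.
wetland = (elevation < 50) & (moisture > 0.6)

def connected_components(mask):
    """Aggregate adjacent classified cells into discrete geo-objects."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                count += 1
                queue = deque([(i, j)])
                labels[i, j] = count
                while queue:  # flood fill one geo-object
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

labels, n_objects = connected_components(wetland)
print(n_objects)  # 2 — two distinct wetland geo-objects emerge from the fields
```

Replace the grid with an activation matrix and the threshold rule with a learned direction, and the same pipeline is SAE feature discovery.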
An HRS is the AI analog of a Coordinate Reference System: a Coordinate System (CS — dimensions, units) plus a Datum that anchors the abstract CS to real-world semantics. Once an HRS is defined, Hyperspace Entity Locators (HELs) address features the way URLs address documents.
Side by side
GIS concepts and their AI-hyperspace counterparts, mapped one-to-one following Goodchild's general theory and the UDG requirements.
| Concept | Geographic Information Systems | AI / Spatial Web hyperspace |
|---|---|---|
| Atomic form | Geo-atom: <x, property, value> on Earth surface | Activation tuple: <v, property, value> in hyperspace |
| Reference frame | Coordinate Reference System (CS + Datum, e.g. WGS-84) | Hyperspace Reference System (CS + semantic datum) |
| Locator | URL, URI, geo-URI | Hyperspace Entity Locator (HEL) |
| Continuous representation | Geo-field (six standard discretizations) | Embedding manifold; activation field over data |
| Discrete representation | Geo-object / Feature (OGC Simple Features) | Sparse-autoencoder feature; linear concept direction |
| Aggregation principle | Tobler's First Law (proximity → similarity) | Platonic / distributional hypothesis (training proximity → representational proximity) |
| Phase-space construction | c(x) = f(z₁, …, zₘ) over m fields | Concept = partition of activation space along learned directions |
| Boundary indeterminacy | Membership function m(x); fuzzy classes | Polysemantic neurons; sparse-feature activation strength |
| Catalog standard | ISO 19110 Feature Catalog; OGC API – Features | (Emerging) SAE feature catalogs, Neuronpedia, Embedding Atlas |
Part 2 · Tools, Datasets, Models
Feature identification draws on two complementary toolchains: mechanistic interpretability from AI safety, and geospatial foundation models from GeoAI.
| Tool / method | What it does | Theory fit |
|---|---|---|
| Sparse Autoencoders (SAE) | Decompose model activations into sparse, monosemantic latents. Foundational to Anthropic's Scaling Monosemanticity. | Strong. Directly produces feature candidates — the AI analog of geo-objects. |
| TransformerLens | Library for accessing internal activations; standard substrate for SAE and circuit work. | Strong. Provides the activation tuples that SAEs and probes operate on. |
| Neuronpedia | Open platform hosting >50M SAE latents across many models, with autointerp explanations, search, steering, circuit-tracing demos. | Strong. A working analog of an ISO 19110 feature catalog for hyperspace. |
| Goodfire (Ember API) | Commercial platform turning SAE features into steering / analysis for production models. Backed by Anthropic. | Strong as a production realization; closed parts limit fully open verification. |
| Embedding Atlas (Apple, 2025) | Open-source interactive viewer for million-point embeddings with density clustering, automated labels, metadata filtering. | Strong for human-in-the-loop cartographic visualization (UDG §6.2.2). |
| UMAP / t-SNE / PCA | Dimensionality reduction for visual exploration and preprocessing. | Moderate. Useful but lossy; should never be the sole basis for feature definition. |
| Linear probes | Train a linear classifier from activations to a labeled concept. | Strong validator under the Linear Representation Hypothesis. |
| CKA / SVCCA / Model stitching | Quantify representational similarity between models; test whether one model's layer substitutes for another's. | Strong. Empirical evidence for or against convergence on each candidate feature. |
| Natural Language Autoencoders | Map activations to natural language and back; verbalize internal states. | Useful for automated labeling of discovered features. |
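As a concrete instance of the linear-probe row above, a sketch with synthetic activations and a planted concept direction (all data here is simulated; real probes run on activations captured via TransformerLens or similar):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n = 32, 2000

# Synthetic "activations" with a planted linear concept direction,
# as the Linear Representation Hypothesis posits for real models.
concept = rng.normal(size=d_model)
concept /= np.linalg.norm(concept)
acts = rng.normal(size=(n, d_model))
labels = acts @ concept > 0                 # ground-truth concept labels

# Linear probe: least-squares weights from activations to +/-0.5 targets.
w, *_ = np.linalg.lstsq(acts, labels.astype(float) - 0.5, rcond=None)
acc = float(((acts @ w > 0) == labels).mean())

# The recovered probe direction should align with the planted concept.
alignment = float((w / np.linalg.norm(w)) @ concept)
print(round(acc, 2), round(alignment, 2))
```

High probe accuracy plus high alignment is exactly the validator role the table assigns: the probe recovers the planted direction from activations alone.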
| Model / library | What it provides | Theory fit |
|---|---|---|
| Prithvi (NASA / IBM) | Earth Observation foundation model. Variants Prithvi-EO (6-band multispectral) and Prithvi-WxC (weather/climate). 100M – 2B parameters. | Strong. Embeddings of Earth tiles are simultaneously geo-objects and hyperspace features. |
| SatMAE / SatMAE++ / SpectralGPT / CROMA / SkySense | Self-supervised vision transformers for satellite imagery; spectral and temporal awareness. | Strong for raw embedding generation; less interpretable out-of-the-box. |
| SatCLIP (Microsoft Research) | Global, general-purpose location encoder. Contrastively aligns satellite imagery with coordinates using spherical-harmonics encoding. | Strong. A direct World-to-Vec primitive — raw location into a vector usable downstream. |
| GeoCLIP | CLIP-based location-image alignment for geo-localization. Random Fourier feature encoder. | Moderate. Tuned for geo-localization; less general than SatCLIP. |
| MOSAIKS / CSP / S2Vec | Alternative location/image embedding methods. S2Vec (2025) is self-supervised geospatial. | Moderate. Useful baselines and ablation comparators. |
| TorchGeo | PyTorch domain library: geospatial datasets, samplers, transforms, pretrained models. | Strong foundation for any geospatial embedding pipeline. |
| TerraTorch (IBM, 2025) | Fine-tuning toolkit for geospatial foundation models on TorchGeo + PyTorch Lightning. HPO, benchmarking, full workflows. | Strong for operationalizing GeoFMs. |
| GeoAI Python package | Higher-level wrapper that generates geospatial embeddings via TorchGeo foundation models for similarity search, clustering, change detection. | Strong as an applied-side runtime. |
| H3 (Uber) / DGGS | Hexagonal hierarchical spatial index; basis for the OGC AI-DGGS Disaster Pilot (2025) exposing DGGS-indexed data via OGC APIs for AI agents. | Strong as a cellular indexing layer beneath any HRS — directly cited in UDG §6.1. |
Part 3 · Recommended Workflow
Nine stages that operationalize World-to-Vec. Each stage produces an output that maps to a concept in either GIS practice (after Goodchild) or the Spatial Web UDG.
Stage 1 · Scope. Define the entities, properties, and relations of interest. State the use case and the questions discovered features must answer. In UDG terms, choose which ENTITIES will be embedded; in Goodchild terms, choose which geo-atom properties matter.
Stage 2 · Ingest. Assemble data: text corpora, knowledge graphs, satellite imagery, sensor streams, OSM. Normalize, deduplicate, and index. For geographic data, index with H3 or another DGGS so the cellular structure of UDG §6.1 is preserved end-to-end.
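In production this step would call Uber's H3 directly (h3-py v4's `latlng_to_cell`). To keep the sketch dependency-free, the stand-in below implements the same prefix-hierarchical idea with Bing-style quadkeys: nearby points share a cell-id prefix, distant points do not, which is the property the end-to-end indexing relies on.

```python
import math

def latlng_to_quadkey(lat, lng, level):
    """Map a WGS-84 point to a hierarchical cell id (Bing-style quadkey).

    A stand-in for a real DGGS such as Uber's H3; same principle:
    shorter prefixes name coarser cells, so prefix matching groups
    spatial neighbors at every resolution."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web-Mercator clamp
    x = (lng + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    n = 1 << level
    tx = min(n - 1, max(0, int(x * n)))
    ty = min(n - 1, max(0, int(y * n)))
    key = ""
    for i in range(level, 0, -1):  # interleave x/y bits, high to low
        digit = 0
        mask = 1 << (i - 1)
        if tx & mask:
            digit += 1
        if ty & mask:
            digit += 2
        key += str(digit)
    return key

# Nearby points share a long prefix; distant ones diverge early.
a = latlng_to_quadkey(48.8584, 2.2945, 12)     # Eiffel Tower
b = latlng_to_quadkey(48.8606, 2.3376, 12)     # Louvre, ~3 km away
c = latlng_to_quadkey(-33.8568, 151.2153, 12)  # Sydney Opera House
print(a[:8] == b[:8], a[:4] == c[:4])
```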
Stage 3 · Embed. Pass prepared data through a domain-appropriate foundation model: Prithvi-EO or SatCLIP for Earth Observation, SatCLIP or GeoCLIP for raw location, a Llama- or Mistral-class model for text, PyTorch-BigGraph for graphs. Persist activations or final embeddings in a queryable store (FAISS, ScaNN, or a vector database).
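A minimal in-memory stand-in for the queryable store, with an add/search interface loosely mirroring FAISS-style inner-product search over unit-normalized vectors (a production pipeline would swap in FAISS or ScaNN; the class and names here are illustrative):

```python
import numpy as np

class VectorStore:
    """Tiny cosine k-NN store; a sketch, not a FAISS replacement."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.vecs = np.empty((0, dim), dtype=np.float32)

    def add(self, ids, vectors):
        v = np.asarray(vectors, dtype=np.float32)
        v = v / np.linalg.norm(v, axis=1, keepdims=True)  # unit-normalize
        self.vecs = np.vstack([self.vecs, v])
        self.ids.extend(ids)

    def search(self, query, k=3):
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        sims = self.vecs @ q                  # cosine similarity
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]

rng = np.random.default_rng(7)
store = VectorStore(dim=16)
base = rng.normal(size=16)
# Three near-duplicate embeddings of one "tile", plus unrelated vectors.
store.add(["tile_a", "tile_b", "tile_c"],
          [base + 0.05 * rng.normal(size=16) for _ in range(3)])
store.add([f"noise_{i}" for i in range(20)], rng.normal(size=(20, 16)))
hits = store.search(base, k=3)
print([h[0] for h in hits])  # the three perturbed copies rank first
```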
Stage 4 · HRS. Specify the CS (dimensions, units, normalization) and a tentative semantic datum. Early on, the datum can be a fixed set of probe concepts (is_a_country, is_a_river, is_a_road) with measured directions; over time, align datums to an ontology such as Common Core or the Spatial Web HSML.
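One way to make the HRS spec concrete, sketched as plain dataclasses. The schema and field names below are our illustration, not a published UDG serialization; they only encode the CRS analogy the text draws: HRS = CS + semantic datum.

```python
from dataclasses import dataclass

@dataclass
class CoordinateSystem:
    dimensions: int       # e.g. the model's residual-stream width
    units: str            # e.g. "float32 activations"
    normalization: str    # e.g. "l2"

@dataclass
class SemanticDatum:
    """Anchors the abstract CS to real-world meaning via probe concepts,
    each paired with its measured direction in the space."""
    probes: dict          # concept name -> direction (list of floats)

@dataclass
class HyperspaceReferenceSystem:
    name: str
    model: str            # which model/layer produced the space
    cs: CoordinateSystem
    datum: SemanticDatum

hrs = HyperspaceReferenceSystem(
    name="demo-hrs-v0",
    model="example-encoder/layer-12",   # hypothetical model identifier
    cs=CoordinateSystem(dimensions=4096, units="float32 activations",
                        normalization="l2"),
    datum=SemanticDatum(probes={"is_a_river": [0.0] * 4096}),
)
print(hrs.name, hrs.cs.dimensions)
```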
Stage 5 · Discover. Train an SAE on activations from a chosen layer (or on final embeddings if the model exposes no layered internals), recover monosemantic latents, and rank them by activation density and reconstruction loss. Complement with linear probes for predefined concepts and density-based clustering for emergent groupings.
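The SAE objective itself is compact. A forward-pass sketch with random, untrained parameters, showing the reconstruction and L1 sparsity terms that training would optimize (the training loop, typically Adam, is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, batch = 64, 256, 128      # overcomplete: d_sae >> d_model

# Randomly initialized SAE parameters; in practice these are learned
# by minimizing the loss computed below over many activation batches.
W_enc = rng.normal(0.0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

acts = rng.normal(size=(batch, d_model))  # stand-in for model activations

def sae_forward(x):
    latents = np.maximum(0.0, x @ W_enc + b_enc)  # ReLU encoder -> codes
    recon = latents @ W_dec + b_dec               # linear decoder
    return latents, recon

latents, recon = sae_forward(acts)
l2 = ((acts - recon) ** 2).mean()   # reconstruction term
l1 = np.abs(latents).mean()         # sparsity term
loss = l2 + 5.0 * l1                # the L1 coefficient is a hyperparameter

# At random init roughly half the latents fire; optimizing the L1 term
# is what drives density down into the sparse, monosemantic regime.
sparsity = float((latents > 0).mean())
```

Each surviving latent's decoder row W_dec[i] is a candidate feature direction — the geo-object analog the theory section describes.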
Stage 6 · Validate. Test whether each candidate is a property of the world rather than a quirk of one model. Apply Centered Kernel Alignment or canonical correlation against features from at least one other model trained on related data. Use model stitching for stronger evidence.
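Linear CKA is short enough to inline. The sketch below checks the two properties this validation step relies on: invariance to a change of basis (the same geometry in a rotated basis scores 1.0) and low alignment for unrelated representations.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices (n_samples x dim). 1.0 = identical up to rotation/scale."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

rng = np.random.default_rng(1)
n, d = 200, 32
A = rng.normal(size=(n, d))                   # "model 1" representations
R, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
B_same = A @ R                                # same geometry, rotated basis
B_diff = rng.normal(size=(n, d))              # unrelated representation

print(round(linear_cka(A, B_same), 2), linear_cka(A, B_diff) < 0.5)
```

A candidate feature whose geometry survives this test across independently trained models is evidence for the convergence the Platonic Representation Hypothesis predicts.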
Stage 7 · Visualize. Project validated features to 2D / 3D using UMAP or related methods, render in Embedding Atlas or a similar viewer, apply automated labels (Natural Language Autoencoders, or an LLM autointerp pipeline as used by Neuronpedia). Subject-matter experts review and rename. Implements UDG §6.2.2.
Stage 8 · Catalog. For each validated, named feature, mint a HEL referencing the HRS and a stable identifier. Publish using a schema modeled on ISO 19110, bound to the Spatial Web (HSML / HSTP), so other systems can resolve and reuse features. Record cross-model alignment evidence and HRS-to-HRS transformations.
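No HEL syntax has been standardized yet, so the `hel://` scheme below is purely illustrative. It shows only the binding this stage requires: an HRS name, a stable feature identifier, and a content digest pinning the locator to the feature's measured direction.

```python
import hashlib
import json

def mint_hel(hrs_name, feature_id, direction):
    """Mint a Hyperspace Entity Locator (HEL) string.

    Illustrative scheme only: the UDG requirements name the concept,
    but a concrete HEL syntax is still emerging. Modeled loosely on
    how a geo-URI binds a CRS to coordinates."""
    payload = json.dumps({"hrs": hrs_name, "dir": direction}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()[:12]
    return f"hel://{hrs_name}/{feature_id}#{digest}"

# Hypothetical feature: a direction in a demo HRS (3-d for brevity).
hel = mint_hel("demo-hrs-v0", "feat-golden-gate", [0.12, -0.98, 0.05])
print(hel)
```

Because the digest covers the direction, a retrained model that shifts the feature produces a different HEL, which is one way to surface the feature drift Stage 9 monitors.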
Stage 9 · Operate. Use cataloged features downstream: as steering vectors for generative models, as variables for spatial analysis, as queryable entities for AI agents. Monitor feature drift as models are retrained. Apply IEEE P2874 governance (norms, contracts, ratings) to consequential features, e.g. those covering deception, bias, or dangerous content.
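Steering reduces to adding a scaled feature direction to a layer's activations. A dependency-free sketch of that single operation (a real pipeline would hook a specific layer via TransformerLens or a platform API; the data here is simulated):

```python
import numpy as np

def steer(activations, feature_direction, strength=4.0):
    """Add a scaled, unit-norm feature direction to activation vectors —
    the basic steering move applied to SAE features."""
    d = feature_direction / np.linalg.norm(feature_direction)
    return activations + strength * d

rng = np.random.default_rng(3)
acts = rng.normal(size=(8, 64))    # stand-in batch of residual-stream vectors
direction = rng.normal(size=64)    # stand-in cataloged feature direction
steered = steer(acts, direction, strength=4.0)

# Projection onto the feature direction rises by exactly `strength`.
d = direction / np.linalg.norm(direction)
shift = (steered - acts) @ d
print(np.allclose(shift, 4.0))  # True
```

Cataloged, HEL-addressable features make `direction` a resolvable, governed artifact rather than an ad hoc vector.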
Workflow at a glance
| Stage | Output | GIS analog | UDG concept |
|---|---|---|---|
| 1 Scope | Scope document | Geographic area of interest | ENTITY type selection |
| 2 Ingest | Cleaned, indexed data | Spatial database | Source DOMAINS |
| 3 Embed | Vector store | Coordinate transformation | Spatial embedding (§6.2) |
| 4 HRS | Reference-frame spec | CRS = CS + Datum | HRS = CS + semantic datum |
| 5 Discover | Candidate features | Object extraction | Feature candidates in hyperspace |
| 6 Validate | Cross-model evidence | Ground truthing | Datum-to-datum transformation |
| 7 Visualize | Atlas + labels | Cartographic map | Cartographic visualization (§6.2.2) |
| 8 Catalog | Published features + HELs | ISO 19110 catalog | HEL-addressable ENTITIES |
| 9 Operate | Used, monitored, governed | GIS in production | Spatial Web governance (P2874) |
Why this matters
Treating hyperspace features as first-class entities closes the long-standing gap between "embedding" and "feature".
Just as WGS-84 turned latitude/longitude into a globally interoperable system, a semantic-datum HRS turns embedding coordinates into something models can share. Without it, every embedding is provincial.
Features that recur across independent models can be named, cited, and reused. ISO 19110-style feature catalogs for hyperspace would let one team's "fraud-detection direction" be the same direction another team trusts.
Prithvi and SatCLIP embed Earth-surface places into hyperspace. SAEs extract concepts. The same workflow handles both because they share Goodchild's atomic form.
Once safety-relevant features (deception, bias, dangerous content) are catalogued and HEL-addressable, IEEE P2874 governance can attach norms, contracts, and ratings to them — instead of to opaque whole models.
Key References
The survey draws on UDG functional requirements, Percivall's presentation, the Goodchild paper, and recent literature on representational convergence and mechanistic interpretability.
Related Work
GeoRoundtable work connecting AI hyperspace, Spatial Web standards, and geographic theory.
GeoRoundtable brings together expertise in geospatial standards, agentic AI, mechanistic interpretability, and philosophy of engineering. We help organizations build the bridges between embeddings and features that the next generation of AI infrastructure depends on.
Get in touch