Eosin II: Towards a Unified Standard

2026-02-21

Digital pathology has quietly fallen decades behind every other data-intensive field. Despite the enormous clinical and research value locked inside whole slide imaging, the surrounding software ecosystem looks more like a collection of isolated experiments than a coherent platform. Every lab, institution, and vendor ends up reinventing the same infrastructure—storage, serving, annotation, search, and analysis—because there is no shared substrate capable of supporting them.

What exists today is a patchwork: desktop-first tools that were never meant to run in the cloud, university prototypes that don't survive beyond the papers they support, and institutional deployments that can't be adopted elsewhere. These tools are valuable, often impressive, and sometimes groundbreaking—but they don't compose into a real infrastructure layer. They were not designed for multi-user concurrency, automation, GPU inference, real-time APIs, or global accessibility. They assume a single machine, a single user, and a single institution.

Meanwhile, the broader software world has moved on. We now take for granted things like horizontally scalable services, stateless workers, container orchestration, fast object stores, GPU-accelerated inference, streaming logs, and cloud-native APIs. None of these ideas are new. None are experimental. Yet they are almost entirely absent from pathology.

This gap matters. Without a common foundation, the field cannot standardize workflows, build shared tooling, or meaningfully collaborate across institutions. Every new project starts from zero; every promising idea stalls at the infrastructure layer. Pathology needs a unifying architecture—something neutral, modular, permissively licensed, and engineered for real-world scale.

This is the problem Eosin is designed to address.

Eosin isn't trying to replace existing tools; it is trying to give them a foundation. A horizontally scalable, cloud-native substrate for tiles, metadata, models, viewers, and analysis pipelines. A system where hot paths run in non-GC languages, where components can be swapped like CNCF projects, and where institutions can deploy infrastructure without legal or licensing friction. A platform that treats whole slide imaging as a first-class, programmable medium rather than a desktop artifact.

To understand the gap Eosin is stepping into, it's worth taking an honest look at the existing landscape—and why none of it can serve as the basis for a shared, modern computational pathology stack.

A Brief Review of Related Projects (non-exhaustive)

  • QuPath is an actively maintained Java desktop application widely used for research workflows. Its GPL licensing and desktop-oriented architecture make it well suited for individual analysis but not intended as a backend service component or distributed infrastructure layer.
  • CellProfiler is a mature and actively maintained Python desktop application with an excellent license. Its design centers on batch-oriented local analysis pipelines rather than long-running service deployments or horizontally scalable data backends.
  • ilastik provides interactive machine-learning workflows in a desktop environment. It is LGPL-licensed and optimized for user-driven, session-based analysis rather than distributed, service-oriented processing.
  • Trident focuses on patch-level slide analysis. Its license restricts derivative works and integration into broader systems, reflecting research-focused distribution rather than infrastructural deployment.
  • PathML is a Python toolkit under GPL, oriented toward research pipelines and experiment iteration. It is not positioned as a multi-tenant or horizontally scalable backend service, and its architecture reflects typical research-tool assumptions.
  • Digital Slide Archive from Kitware provides a permissively licensed web-based environment with worker components. Its tile-serving stack (Girder with large_image) can be replicated, but coordinated distribution, sharding, and large-scale concurrency typically rely on shared storage layers (NFS/S3), which become the dominant scaling boundary.
  • OMERO combines Python, Java, and C++ services to provide a comprehensive data-management platform for microscopy. Its architecture assumes a single globally consistent metadata database and a shared repository filesystem, which is well-aligned with institutional data stewardship but not designed around horizontal sharding or cloud-native distribution.
  • Cytomine is a Java/Python-based platform under Apache 2.0, offering a modular service layout and a Helm chart for Kubernetes deployment. Its core components (PostGIS, PIMS, metadata services) operate as a unified logical instance with shared state, prioritizing collaborative annotation and reproducibility. Horizontal scaling of the data plane would require architectural changes beyond the current design goals.

These tools weren't designed as multi-user, concurrent, globally accessible, cloud-native services. They were built as local power-user tools, research sandboxes, or institution-bound deployments. None of them assumed a world where GPU inference is cheap, global collaboration is normal, Kubernetes is boring, and developers want real-time APIs—not desktop binaries.

As a developer who's relatively new to this space, where exactly am I supposed to build? The ecosystem is deeply fragmented. Each project solves a subset of the problem, but none fills the infrastructure-shaped hole. Those that come closest fail to exploit available performance, likely because backend engineering has since been transformed by containerization, orchestration, and cheap compute. Vertically scaling something like Digital Slide Archive is a viable strategy, even for larger institutions, but the system chokes under usage at scale. The result is an ecosystem optimized for experimentation, not for robust deployment.

Towards a Unified Standard

What this space needs is standardization: computational pathology infrastructure that gets the fundamentals right.

If this field had the equivalent of a CNCF project — neutral, modular, permissively licensed, and built for scale — the entire ecosystem could align on shared protocols, APIs, and expectations. That standard is what Eosin aims to become.

  1. True Vertical and Horizontal Scaling

Every component of the platform that gets hot under typical usage needs to scale both horizontally and vertically. Horizontal scaling only matters if nothing collapses into a single I/O chokepoint (e.g., shared object storage or a single metadata DB). Fault tolerance, predictable performance, and HPC applications can only be realized via horizontal scale-out. The maximum system size cannot be limited by the amount of hardware that can fit in a single box; we need industrial scale.

Moreover, professional applications warrant high availability. HA requires either stateless services that scale horizontally or sophisticated primary/replica orchestration. From an operations perspective, stateless services are always preferable.
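One common way to keep tile routing from collapsing into a single chokepoint is consistent hashing: each tile key maps to a server on a hash ring, and adding a node remaps only a small fraction of keys. The sketch below is illustrative only; the node names, virtual-node count, and key format are assumptions, not Eosin's design.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// A minimal consistent-hash ring: tile keys map to server nodes,
/// and adding a node only remaps a small fraction of keys.
struct Ring {
    // virtual-node hash -> server name
    ring: BTreeMap<u64, String>,
}

impl Ring {
    fn new() -> Self {
        Ring { ring: BTreeMap::new() }
    }

    fn hash<T: Hash + ?Sized>(value: &T) -> u64 {
        let mut h = DefaultHasher::new();
        value.hash(&mut h);
        h.finish()
    }

    /// Add a server with several virtual nodes for smoother balance.
    fn add_node(&mut self, name: &str, vnodes: u32) {
        for i in 0..vnodes {
            let key = Self::hash(format!("{name}#{i}").as_str());
            self.ring.insert(key, name.to_string());
        }
    }

    /// Route a tile key (e.g. "slide42/level3/x12_y7") to a server:
    /// first virtual node clockwise from the key's hash.
    fn route(&self, tile_key: &str) -> Option<&String> {
        let h = Self::hash(tile_key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, name)| name)
    }
}

fn main() {
    let mut ring = Ring::new();
    ring.add_node("tileserver-a", 64);
    ring.add_node("tileserver-b", 64);
    ring.add_node("tileserver-c", 64);

    for key in ["slide42/level3/x12_y7", "slide42/level3/x13_y7", "slide99/level0/x0_y0"] {
        println!("{key} -> {}", ring.route(key).unwrap());
    }
}
```

Because routing is a pure function of the ring state, any stateless frontend replica can compute it locally; no central lookup service sits in the hot path.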

  2. Hot Paths in Non-GC Languages

It's not that Python, Java, or Go are unusable. It's that your hot code paths (tile decode, compression/decompression, on-the-fly analysis, viewport-aware tile selection) must remain predictable under load. Garbage collection makes worst-case latency unpredictable, precisely when load peaks and predictability matters most. Hot paths belong in languages like Rust, C++, or Zig.
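To make "hot path" concrete, here is a sketch of viewport-aware tile selection: given a viewport in level-0 pixel coordinates, compute which pyramid tiles intersect it. The 256 px tile size and power-of-two level scaling are common conventions in slide pyramids, but they are assumptions here, not a description of any particular format.

```rust
/// Given a viewport in level-0 pixel coordinates and a pyramid level
/// (each level halves resolution), return the (col, row) indices of
/// tiles that intersect it. Tile size is an illustrative 256 px.
fn visible_tiles(
    vx: u32, vy: u32, vw: u32, vh: u32, // viewport in level-0 pixels
    level: u32,
) -> Vec<(u32, u32)> {
    const TILE: u32 = 256;
    let scale = 1u32 << level; // downsample factor at this level
    let x0 = (vx / scale) / TILE;
    let y0 = (vy / scale) / TILE;
    let x1 = ((vx + vw - 1) / scale) / TILE;
    let y1 = ((vy + vh - 1) / scale) / TILE;

    let mut tiles = Vec::new();
    for row in y0..=y1 {
        for col in x0..=x1 {
            tiles.push((col, row));
        }
    }
    tiles
}

fn main() {
    // A 1024x1024 viewport at level 2 (4x downsample) covers a 256x256
    // region at that level: exactly one aligned 256 px tile.
    println!("{:?}", visible_tiles(0, 0, 1024, 1024, 2)); // [(0, 0)]
}
```

This is pure integer arithmetic that runs on every pan and zoom event for every connected viewer; it is exactly the kind of code where a GC pause turns into visible viewport stutter.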

On the operations side, the memory overhead of Python and Java can be substantial. You shouldn't need multiple gigabytes of memory just to run and use a minimal stack as a single user. Desktop-era memory footprints don't translate into scalable multi-user services.

  3. Minimal Abstraction Tax

The idea of "zero-cost abstraction" needs to be applied at the architectural level. If your deployment is slow, you should know with confidence it's not due to slow code. End-to-end features (e.g. tile delivery, slide analysis) should achieve near-theoretical maximum speeds in benchmarks. The system should be fully attributable in end-to-end profiling: no mystery waits or pauses.
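As a sanity check on what "near-theoretical maximum" means, here is a back-of-envelope ceiling for tile delivery. Every number below (link speed, tile size) is an illustrative assumption, not a measurement of any system.

```rust
/// Back-of-envelope: the wire-limited ceiling for tile delivery.
/// All numbers are illustrative assumptions, not benchmarks.
fn main() {
    let link_gbps = 10.0_f64;         // assumed NIC bandwidth
    let tile_bytes = 30.0 * 1024.0;   // ~30 KiB compressed 256x256 tile
    let bytes_per_sec = link_gbps * 1e9 / 8.0;
    let tiles_per_sec = bytes_per_sec / tile_bytes;
    // On these assumptions, a single 10 Gbps link caps out around
    // forty thousand tiles per second. A benchmark that delivers far
    // less should show exactly where the gap goes in a profile.
    println!("wire-limited ceiling: ~{:.0} tiles/s", tiles_per_sec);
}
```

If the stack can't get close to a ceiling like this, the abstraction tax is measurable, and end-to-end profiling should attribute every lost microsecond.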

  4. Permissive Licensing

This is a big one, because restrictive licensing constrains how derivative or integrated works can exist outside the original project's ecosystem. Corporations, institutions, hospitals, startups, and individual entrepreneurs have to be careful when depending on GPL or LGPL code; for many (possibly most?) it's a non-starter. Creative Commons licenses with an ND (no-derivatives) clause, such as CC BY-NC-ND, explicitly rule out derivative works of virtually any kind, yet they still pop up. These choices are often dictated by institutional policy rather than developer preference. In any case, the contributions such projects make are effectively siloed.

Modern AI/ML pipelines and pathology tooling need to exist in mixed commercial + academic ecosystems. GPL/LGPL/ND licenses break composability. CNCF projects succeed because permissive licensing invites universal adoption. Pathology desperately needs a similar foundation; restrictive licensing guarantees permanent fragmentation.

  5. Trivial Extensibility & Interoperability

The public API comes first, and it can't be a monolith that forcibly links every module at build time. Components need to be independently deployable and maintainable. Developers need obvious decoupling and clean APIs to conclude that an ecosystem offers a low-friction experience.
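As a sketch of what obvious decoupling can look like, a single small trait can serve as the component boundary, so backends swap without rebuilding the rest of the stack. The trait and names below are hypothetical illustrations, not Eosin's actual API.

```rust
/// A hypothetical component boundary: anything that can serve tiles
/// implements one small trait, so storage backends are swappable.
trait TileSource {
    fn tile(&self, level: u32, col: u32, row: u32) -> Vec<u8>;
}

/// In-memory stub backend, e.g. for tests or local development.
struct StubSource;

impl TileSource for StubSource {
    fn tile(&self, level: u32, col: u32, row: u32) -> Vec<u8> {
        format!("tile:{level}/{col}/{row}").into_bytes()
    }
}

/// Consumers depend only on the trait object, never on a concrete
/// backend, so swapping implementations requires no rebuild here.
fn serve(source: &dyn TileSource, level: u32, col: u32, row: u32) -> usize {
    source.tile(level, col, row).len()
}

fn main() {
    let src = StubSource;
    println!("{} bytes", serve(&src, 3, 12, 7));
}
```

The same shape works across process boundaries: replace the trait with a small wire protocol and each component becomes independently deployable, the way CNCF projects compose.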

If digital pathology is ever going to mature into a proper ecosystem — the way cloud-native infrastructure did under CNCF stewardship — it needs a foundational project that sets the bar. This is the motivation behind Eosin.

-Tom