Why a locate hook is unnecessary

While creating the upgrade path in SystemJS to permit URLs as module identifiers, it has turned out best to deprecate the locate hook. This may be over-explaining the obvious or dwelling on decisions already made, but reaching this conclusion took me a surprising amount of consideration, so I'd like to describe the reasoning here, both as a reference for the decision and as an attempt at reasoned feedback.

The question that started this was whether we should re-introduce the locate hook into the specification. Re-introducing the locate hook would let normalize map module names into a custom schema defined by the loader implementation, and those schema names would be the strings stored in the module registry. Locate would then handle the final resolution into URLs that can be fetched.

The justification for considering this was to retain compatibility with AMD-style module loading, where we have a baseURL-schema. In this schema, module names are always stored in the registry as plain names relative to some baseURL. jspm also uses its own schema in the registry to refer to modules such as `npm:module@x.y.z`.

There is a draw to storing these universal schema names in the module registry as a portable naming system, but I'd argue this lure is mostly one of elegance as opposed to practicality.

I've implemented the baseURL-schema normalization in the current SystemJS, and have been experimenting recently with at least three different complete implementations of normalization of a custom schema alongside URLs (the new requirements of the spec, which completely make sense).

_In the end, trying to make a custom schema work alongside URLs in the same registry space ends up causing more issues, for no practical gain._
#### Dot Normalization

As soon as we allow AMD-style module IDs relative to some baseURL alongside URLs, the first issue we hit is the need to define "dot normalization". This basically means that relative normalization (resolving `./` and `../` segments) needs to be defined for both URLs and non-URLs.

It's not a lot of code, but it is the first sign here that we're duplicating work.
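
To make the duplication concrete, here is a minimal sketch. The `normalizePlainName` helper and the example module names are hypothetical and not taken from SystemJS; the URL case uses only the standard `URL` constructor, which already removes dot segments.

```javascript
// Hypothetical helper: resolve "." and ".." segments of a plain (non-URL)
// module name against the plain name of the importing module.
function normalizePlainName(name, parentName) {
  // Names without a leading "./" or "../" are already absolute in the schema.
  if (!name.startsWith('./') && !name.startsWith('../')) return name;
  // Start from the parent's "directory" segments.
  const segments = parentName.split('/').slice(0, -1);
  for (const part of name.split('/')) {
    if (part === '.' || part === '') continue;
    if (part === '..') segments.pop(); // silently stops at the schema root
    else segments.push(part);
  }
  return segments.join('/');
}

// The URL case needs no custom code: the URL parser already does this work.
function normalizeURL(name, parentURL) {
  return new URL(name, parentURL).href;
}

const plain = normalizePlainName('../util/dom', 'app/views/list');
const url = normalizeURL('../util/dom.js', 'http://site.com/app/views/list.js');
```

Both calls do the same job on the same relative specifier, which is the duplicated work in question.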
#### Non-uniqueness

The next issue is the non-uniqueness of our schema. With `baseURL='/'`, `import '/local/path.js'` is now distinct from `import 'local/path.js'` (the first resolves as a URL and the second as a name), even though both refer to the same file. This will cause confusion, since we are allowing the same module to be referred to by two different names, breaking the key principle that entries in the registry are unique.

Having two ways to refer to the same module is a bug waiting to happen, causing problems for configuration (which variation do we configure?), creating the possibility of a module being executed twice, and interfering with bundling workflows.

Expecting the user to know that they should write `import('x')` instead of `import('./x')` arbitrarily is a hard ask.
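
The collision can be sketched as follows. The `normalize` and `locate` functions here are simplified, hypothetical stand-ins for the hooks (not the SystemJS implementations), and the `http://site.com` origin is an assumption for the example.

```javascript
// Assumed environment: page origin http://site.com and baseURL = '/'.
const origin = 'http://site.com';
const baseURL = new URL('/', origin).href; // 'http://site.com/'

// Hypothetical normalize: URL-like specifiers become URLs, anything else
// stays a plain name in the baseURL schema.
function normalize(name) {
  const isURLLike = /^([a-z][a-z0-9+.-]*:|\/|\.\.?\/)/i.test(name);
  return isURLLike ? new URL(name, baseURL).href : name;
}

// Hypothetical locate: plain names resolve against baseURL, URLs pass through.
function locate(key) {
  return new URL(key, baseURL).href;
}

const keyA = normalize('/local/path.js'); // stored as a URL
const keyB = normalize('local/path.js');  // stored as a plain name

// Two registry entries, although both keys locate to the same file.
const registry = new Map([[keyA, {}], [keyB, {}]]);
```

Here `registry.size` is 2 while `locate(keyA)` and `locate(keyB)` are identical, so the same file could be configured or executed under either key.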

This leads down a road of trying to catch these uniqueness issues in the normalization pipeline itself, which then ends up becoming URL normalization, followed by a reverse normalization into the schema.
#### All schemas have the non-uniqueness problem

Schema non-uniqueness with URLs applies to any custom schema chosen that maps to URLs, not just the baseURL system. Even if we come up with the perfect custom naming schema, as soon as we want that schema to co-exist alongside URL requests we hit these issues.

In order to retain unique identification and configuration of modules, one ends up normalizing from schema space into URL space, and then reverse-normalizing back into schema space at the end of normalize, before resolving back into URLs from the schema in locate, just in order to have our perfect schema names stored in the registry.

Add to this the idea of a configuration space consisting of both schema and URL identifiers as well, and this compounds the problem even further.

One ends up swapping between spaces in such a way that URLs become the primary space anyway, and we're just pretending that the schema is the primary space.
#### Beyond the baseURL-schema

Another common issue with baseURLs is that when back-tracking below the baseURL, we end up with "normalized" paths looking like "../../module.js", which is really not acceptable for a naming system either.
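
A small sketch of how such names arise when mapping a URL back into the baseURL schema. The `urlToPlainName` helper is hypothetical, the URLs are made up for the example, and both are assumed to share the same origin.

```javascript
// Hypothetical reverse mapping from a URL to a baseURL-relative plain name.
// Assumes url and baseURL share an origin and baseURL ends with "/".
function urlToPlainName(url, baseURL) {
  const target = new URL(url).pathname.split('/').slice(1);
  const base = new URL(baseURL).pathname.split('/').slice(1, -1);
  // Count the leading directory segments the two paths have in common.
  let i = 0;
  while (i < base.length && i < target.length - 1 && base[i] === target[i]) i++;
  // Every remaining base segment must be climbed out of with "..".
  const up = base.slice(i).map(() => '..');
  return [...up, ...target.slice(i)].join('/');
}

const inside = urlToPlainName('http://site.com/app/lib/x/y.js', 'http://site.com/app/lib/');
const below = urlToPlainName('http://site.com/module.js', 'http://site.com/app/lib/');
```

A module under the baseURL gets a clean name (`x/y.js`), but one below it comes out as `../../module.js`.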

If we return to the question of what AMD's baseURL schema is really trying to accomplish, the core principle is one of portability of modules, which is completely in agreement with what we should be aiming for. URLs are obviously not a portable naming system for modules (modules can move between environments and hence change URL), so the question is simply how to maintain portability of modules in spite of using URLs?
#### URLs are the schema

It turns out to be very simple to do this - normalization is seen as the process of converting a "portable module name" into an "environment-specific name". And the most environment-specific name is the URL which we store in the module registry.

The concept that we need to have a registry based on our perfect portable schema is flawed. We can still keep our schema if we like, and bundle into it just the same:

``` javascript
System.register('custom:portable/schema', ...);
```

Here the name above is normalized by the loader into the resolved name `http://www.site.com/packages/custom/portable/schema.js` when it is processed and stored in the registry (bundle names are now treated as unnormalized).

There is no big loss that the registry now contains this value under an environment-specific URL instead of the schema. One can just accept that any lookup into the registry must pass through a normalization phase first:

``` javascript
// lookup a module by its schema name by passing through a simple URL-normalization first
Reflect.loader.lookup(Reflect.loader.schemaToURL('custom:portable/schema'));
```
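
As a rough sketch of what such a normalization step could look like: `schemaToURL` is not a real loader API, so the version below is a hypothetical stand-alone function, the `packagesURL` root is assumed from the example URL above, and a plain `Map` stands in for the module registry.

```javascript
// Assumed package root, chosen to match the example URL in the text.
const packagesURL = 'http://www.site.com/packages/';

// Hypothetical schemaToURL: 'custom:portable/schema' becomes a URL under packagesURL.
function schemaToURL(name) {
  const match = /^([a-z][a-z0-9]*):(.+)$/i.exec(name);
  if (!match) throw new TypeError(`Not a schema name: ${name}`);
  const [, scheme, path] = match;
  return new URL(`${scheme}/${path}.js`, packagesURL).href;
}

// Stand-in for the module registry, keyed by URL only.
const registry = new Map();
registry.set(schemaToURL('custom:portable/schema'), { name: 'module instance' });

// Lookup by schema name goes through the same normalization first.
const found = registry.get(schemaToURL('custom:portable/schema'));
```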

If an implementor really wants to use a custom schema, make the schema URL-based and add the implementation to the fetch hook so everything works out well anyway:

``` javascript
import 'custom:///portable/schema'
```
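
One way a fetch hook might handle such a protocol is to rewrite the URL just before the request is made. This is only a sketch: `toFetchableURL` and `customRoot` are hypothetical names, and the mapping from `custom:` URLs to a server path is an assumption.

```javascript
// Assumed location of the files served for the hypothetical "custom:" protocol.
const customRoot = 'http://www.site.com/packages/custom/';

// Hypothetical helper a fetch hook could call before issuing the request.
// The registry key stays 'custom:///portable/schema'; only the fetch changes.
function toFetchableURL(key) {
  const url = new URL(key);
  if (url.protocol !== 'custom:') return url.href;
  // Drop the leading "/" so the path resolves under customRoot.
  return new URL(url.pathname.slice(1) + '.js', customRoot).href;
}

const fetchable = toFetchableURL('custom:///portable/schema');
const untouched = toFetchableURL('http://site.com/app/main.js');
```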

The other consequence of using URLs is that configuration then always goes through a normalization phase itself:

``` javascript
Reflect.loader.configure({
  module: {
    './some/local/module.js': {
      moduleFormat: 'CommonJS'
    }
  }
});
```

The loader would normalize the configuration key above into `http://site.com/local/path/some/local/module.js`.
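
A minimal sketch of that key normalization, assuming the page lives at `http://site.com/local/path/`. The `normalizeModuleConfig` function is hypothetical and only illustrates the idea; it is not how any loader is known to implement `configure`.

```javascript
// Assumed page URL that relative configuration keys resolve against.
const pageURL = 'http://site.com/local/path/';

// Hypothetical: rewrite every key of a `module` config map to a full URL,
// so configuration and the registry share the same key space.
function normalizeModuleConfig(moduleConfig, parentURL) {
  const normalized = {};
  for (const [key, value] of Object.entries(moduleConfig)) {
    normalized[new URL(key, parentURL).href] = value;
  }
  return normalized;
}

const config = normalizeModuleConfig(
  { './some/local/module.js': { moduleFormat: 'CommonJS' } },
  pageURL
);
```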

The benefit of this is that users don't need to understand the special naming schema - they can just reference modules as URLs exactly as they expect and correctly configure things without needing to have studied the system in detail.

One implication here for implementors is that build systems wanting to use portable naming system schemas need to reverse-map the schemas at build time from URLs in the registry, but that is a very minimal cost and a straightforward 1-1 mapping. 
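
The reverse mapping could look something like the sketch below. The `urlToSchema` function and the `packagesURL` root are hypothetical, with the layout assumed from the `custom:portable/schema` example earlier.

```javascript
// Assumed package root, matching the earlier example URL.
const packagesURL = 'http://www.site.com/packages/';

// Hypothetical build-time reverse mapping from a registry URL to a schema name.
function urlToSchema(url) {
  // URLs outside the package root have no schema name; keep them as they are.
  if (!url.startsWith(packagesURL)) return url;
  const [scheme, ...rest] = url
    .slice(packagesURL.length)
    .replace(/\.js$/, '')
    .split('/');
  return `${scheme}:${rest.join('/')}`;
}

const schemaName = urlToSchema('http://www.site.com/packages/custom/portable/schema.js');
```

A bundler could then emit `System.register` calls under `schemaName` instead of the environment-specific URL.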

I've yet to hear a single use case that is lost by enforcing that the registry is only to store URLs - the justifications for allowing the registry to store a custom schema seem to cling to dated models due to history, while there are many benefits as described to both implementors and users in enforcing URLs as the schema and keeping the locate hook deprecated.

