ULID is Awesome for Object Identifiers

ULID is a method for generating large numbers: 128 bits each. And this is the cool part: any system can generate these numbers itself, and they will effectively never conflict [0]. This removes the dependency on the State for these silly numbers.

Secondly, large, unique values (like these ULIDs) are the kinds of things used by most inventory control, management, and reporting systems. This allows one number to remain on the Lot forever (you would still sub-lot out for each sale, which is a viable inventory control method we should embrace).

Thirdly, ULID removes the confusion with our current numbers, that is, the 1/I/L or 0/O or profanity problem: its alphabet (Crockford's base32) leaves out I, L, O, and U.

Fourthly, with large unique numbers, finding one is pretty easy, and typos are unlikely to match other numbers, which prevents misidentification.

And these large numbers can still be managed/masked by a "friendly-reference-id" -- like what Danielle is advocating for. But that is a solution that is implementation specific, i.e. for Danielle and GF customers. ULID is a step in the right direction, industry wide.

How the software in front (e.g. Growflow, WeedTraQR, Kraken, Traceweed, LeafOps) implements some management in front of these numbers is the vendors' own issue. Using an incrementing integer is a proven anti-pattern of asset management, so really we should be looking at better patterns for management. Perhaps we should separate the management of assets from the identification of assets -- two distinct problems, yet not independent of each other.

[0] There is a 1 in 2^80 [1] chance that two ULIDs generated in the same millisecond are duplicates; this is, effectively, not possible.

[1] 2^80 is about 1.21e+24 -- or 1,210,000,000,000,000,000,000,000 -- one point twenty one, giga-giga watts baby!!!
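
The 2^80 in footnote [0] falls out of the ULID layout: a 48-bit millisecond timestamp followed by 80 random bits, written as 26 base32 characters (10 for time, 16 for randomness). Here is a minimal Python sketch of a generator, assuming that layout and the Crockford alphabet. The names encode_base32 and new_ulid are mine, not from any library, and a real library would likely also take care of things like ordering within the same millisecond.

```python
import os
import time

# Crockford's base32 alphabet: digits plus letters, without I, L, O or U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_base32(value, length):
    """Encode a non-negative integer as `length` Crockford base32 characters."""
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD[value & 0x1F])  # lowest 5 bits -> one character
        value >>= 5
    return "".join(reversed(chars))  # most significant character first


def new_ulid(timestamp_ms=None):
    """Return a 26-character ULID-style string: 10 time chars + 16 random chars."""
    if timestamp_ms is None:
        # Milliseconds since the Unix epoch.
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return encode_base32(timestamp_ms, 10) + encode_base32(randomness, 16)
```

Calling new_ulid() on any machine gives a fresh identifier with no coordination, and because the time characters come first, the strings sort by creation time.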

Organized Thoughts

  • super easy to make and use, huge number-space for the data-identifier, embedded time-stamp.
  • in a centralized method, the LCB might host a system that generates these numbers. For example, a single-file, no-dependencies PHP script could be hosted on their existing web infrastructure (powered by PHP/Drupal).
  • in a decentralized method, any integrator could use a pre-existing library, available in most languages. Additionally, the method for making these numbers is very simple since it's basically base32(time() + rand()), where + means joining the 48-bit millisecond timestamp to 80 random bits.
  • the ability to identify an object type from simply inspecting the numbers was pointed out to be valuable; I agree and think there was consensus for this. In a ULID, the last 16 characters are the random portion of the data, and some libraries already do some "tricks" with the high-order bits -- we could adopt those as well. Simply replace the first two characters of the random segment with the identifiers currently in use by LeafData (or fix to a better mapping, TBD; the strict ULID alphabet has no I, L, O or U, so a code like PL would need that mapping or a relaxed alphabet). An identifier could then be represented like: '0123456789-PL-ABCDEFGHJKMNPQ'. This format leaves the time-value fully intact and still provides 70 bits of entropy (14 characters at 5 bits each), which is really, really a lot. For anyone implementing our "special" type-system the algo would not be strictly necessary -- just handy. But the implementation would be simple for all of us (I think) -- it's just replacing two chars:

      #include <stdio.h>

      int main(void) {
          char s[] = "0123456789XYABCDEFGHJKMNPQ"; /* a 26-character ULID */
          s[10] = 'P'; /* overwrite the first two random characters */
          s[11] = 'L'; /* with the object-type code */
          printf("%s\n", s); /* prints 0123456789PLABCDEFGHJKMNPQ */
          return 0;
      }
  • this brings us to tagging and scanning identifiers. Because the space for these items is so huge, we can take a small segment of it to use for the human-readable portion, which should be mostly unique across this whole space with only a few characters of the full identifier used (like how git commit hashes are displayed). Humans see a 10 or 15 character identifier -- computers would store and use the full 26 character string. The identifier above could be represented by simply putting PLABCDEFGH on the label. This will fit nicely into C128 for the legacy scanners and software. 2D codes such as QR or 2DDM could be used and easily store the full character string. C128 could also support the full 26 character string -- but the barcode would need to be wider than most de-facto standard labels. When generating 1000 IDs per second, a 10 character random string reaches a collision probability of 1% in about an hour. At 15 characters it takes about 319 days to reach that 1%. Of course, we don't make 1000 IDs per second, we're like 10,000 per hour maybe? Which, at 10 chars, takes days (roughly 20) to get to that small collision probability, and more than 300 years at 15 chars. These figures assume every displayed character is random, so a fixed type prefix like PL does not count toward them. See this page for the maths: https://zelark.github.io/nano-id-cc/
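
The collision figures in the last bullet can be roughly reproduced with the usual birthday approximation: for a space of N possible labels, about sqrt(2 * N * ln(1 / (1 - p))) random labels give a probability p of at least one collision. The sketch below assumes a 32-character alphabet with every label character random. The function names are made up for illustration, and whether the linked calculator uses exactly this formula is my assumption.

```python
import math


def ids_until_collision(chars, probability=0.01, alphabet_size=32):
    """Approximate count of random IDs at which a collision has `probability`."""
    space = alphabet_size ** chars  # number of distinct labels of this length
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


def hours_until_collision(chars, ids_per_hour, probability=0.01):
    """Hours of generating `ids_per_hour` before reaching `probability`."""
    return ids_until_collision(chars, probability) / ids_per_hour


for chars in (10, 15):
    fast = hours_until_collision(chars, ids_per_hour=1000 * 3600)  # 1000 per second
    slow = hours_until_collision(chars, ids_per_hour=10_000)
    print(f"{chars} chars: {fast:,.1f} hours at 1000/s, "
          f"{slow / 24:,.0f} days at 10,000/hour")
```

Shortening the random part of the label (for example by spending two characters on a type code) shrinks N by a factor of 32 per character, which is worth keeping in mind when picking the label length.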

We also get an advantage against collision because not all identifiers are sent around from license to license -- there are millions of records for trees that would never collide at any processor or retailer since, at any given point, any licensee only has a few tens of thousands of active records that would need to match the scan, which further shrinks the already small probability. And a true collision is effectively eliminated because the full number would still be available; we are only talking about collision on the short display segment. So, maybe a user has to add one more digit, or the application's search lookup would have to remember to sort matched results with the newest record on top -- which is easy because of the built-in time sortability of ULID.
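
A rough sketch of that lookup in Python, assuming the label is the run of characters right after the 10 time characters and that the application keeps its active identifiers in a simple list. The helper names and the sample identifiers are hypothetical, not taken from any of the systems mentioned.

```python
def short_label(ulid, length=10):
    """Return the label segment that follows the 10 time characters."""
    return ulid[10:10 + length]


def find_by_label(label, active_ulids):
    """Return active ULIDs whose random segment starts with `label`, newest first."""
    matches = [u for u in active_ulids if u[10:].startswith(label)]
    # The leading 10 characters encode creation time, so a reverse string
    # sort puts the most recently created record on top.
    return sorted(matches, reverse=True)


# Hypothetical inventory: two records share a 10-character label.
inventory = [
    "0123456789PLABCDEFGHJKMNPQ",
    "0123456799PLABCDEFGHZZZZZZ",
    "0123456780PLZZZZZZZZZZZZZZ",
]
hits = find_by_label("PLABCDEFGH", inventory)      # two matches, newest first
exact = find_by_label("PLABCDEFGHJ", inventory)    # one more character disambiguates
```

When a scan returns more than one hit, the user can either pick the top (newest) record or key in one more character, and the full 26 characters are always there as the final tie-breaker.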

Originally Posted on 502 Cannabis Group