About writing numbers…
(Written by a human fully aware of how austere this topic is)
You know how it is: everything is standardized. And if there’s one topic that should be a standard-bearer for this curious human habit, it’s probably the way we write numbers.
In fact, this subject really is overseen by the very serious International Bureau of Weights and Measures.
In 2019, this organization published the 9th edition of the “SI Brochure”, which describes the conventions used for writing various units and also numbers themselves (SI Brochure - BIPM), as evidenced by the following passage that I’m daring to paste here at the risk of losing you:
5.4.4 Writing numbers and the decimal separator
The symbol used to separate the integer part of a number from its decimal part is called the “decimal separator”. In accordance with the decision of the CGPM at its 22nd meeting (2003, Resolution 10), “the symbol for the decimal separator may be either the point on the line or the comma on the line.” The chosen decimal separator will be the one commonly used in the relevant language and context.
If the number lies between +1 and −1, the decimal separator is always preceded by a zero.
In accordance with the decisions of the CGPM at its 9th meeting (1948, Resolution 7) and its 22nd meeting (2003, Resolution 10), numbers with many digits may be divided into groups of three digits, separated by a space, in order to facilitate reading. These groups are never separated by dots or commas.
However, when there are only four digits before or after the decimal separator, it is customary not to isolate a single digit with a space. The habit of grouping digits in this way is a matter of personal choice; it is not always followed in certain specialized fields such as engineering drawings, financial documents, and scripts that must be read by computers.
The format used to write numbers in a table must remain consistent within a single column.
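For the curious, the grouping rule quoted above fits in a few lines of JavaScript. This is only a sketch: the function name groupDigitsSI is made up for the occasion, it expects a plain input such as "1234567.89", and it uses an ordinary space where a typographer would probably prefer a thin one.

```javascript
// Hypothetical helper: groups the digits of a plain decimal string in threes,
// separated by a space, following the rule quoted from section 5.4.4.
function groupDigitsSI(plain, decimalSeparator = ".") {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(plain);
  if (!match) throw new Error(`Not a plain decimal number: ${plain}`);
  const [, sign, intPart, fracPart = ""] = match;

  // Integer part: groups of three counted from the right.
  // With only four digits, no single digit is isolated.
  const groupedInt = intPart.length > 4
    ? intPart.replace(/\B(?=(\d{3})+$)/g, " ")
    : intPart;

  // Fractional part: groups of three counted from the left.
  const groupedFrac = fracPart.length > 4
    ? fracPart.replace(/(\d{3})(?=\d)/g, "$1 ")
    : fracPart;

  return sign + groupedInt + (fracPart ? decimalSeparator + groupedFrac : "");
}

console.log(groupDigitsSI("1234567.14159265")); // 1 234 567.141 592 65
console.log(groupDigitsSI("1234.5"));           // 1234.5
console.log(groupDigitsSI("0.12345", ","));     // 0,123 45
```

The decimal separator is a parameter because, as the brochure says, both the point and the comma are acceptable.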
Still here?
Bravo.
Use a space to group digits, not dots or commas. Beautiful. I dream of diving for an hour into that immaculate world where people say “hello” and “please” and write numbers according to the standard.
But beyond an hour, perfection becomes unbearable.
Any return to reality reminds us that the masses don’t care about standards—worse—have no interest at all in the publications of the International Bureau of Weights and Measures (ooohhh!!); the result being that some of us end up having to normalize unbearable bursts of creative number-writing if we want to make progress at work.
One tool to format them all…
All of them? No. At best… only a few. Because even with a will to be tolerant about inputs, it’s hard—if not impossible—to truly take “all formats” into account.
First, because the word “format” is a slippery little devil, hiding many meanings: writing style, language, choice of base, symbol system, writing direction, number of characters… Each of these has its own well-furnished apartment in the building called “format”.
Next, it’s also complicated because some rules make normalization steps mutually exclusive. We’re forced to draw a hard line on what will be recognized as a sequence corresponding to a number.
What we’ve kept here are base-10 formats commonly encountered in Europe, Switzerland, and the English-speaking world. Left aside, despite their usage, are a convention found in India where only the last three digits form a group of three and the digits before them are grouped in pairs (the lakh/crore system, as in 12,34,567), and also a decimal notation that omits the leading zero, where .65 is meant to be read as 0.65.
Maybe that will come later.
Also left out: numbers written in Roman numerals; numbers written top-to-bottom or bottom-to-top; numbers in kanji, Eastern Arabic, Ancient Greek, and an infinite array of other things. I can sense your disappointment; don’t be: the tool stays simple.
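To make the “hard line” a little more concrete, here is a minimal sketch of what such a normalization could look like. It is not the tool’s actual code: the function name normalizeNumber and every rule in it are assumptions chosen for illustration. It covers spaces and apostrophes as grouping symbols, and dots and commas in either role.

```javascript
// Hypothetical normalizer, not the tool's actual code: turns a number written
// in a few European, Swiss, or English-style conventions into a plain string
// such as "1234567.89".
function normalizeNumber(raw) {
  // Spaces and apostrophes only ever group digits, so they can go.
  // In JavaScript, \s also covers no-break spaces.
  let s = raw.trim().replace(/[\s'’]/g, "");

  const lastDot = s.lastIndexOf(".");
  const lastComma = s.lastIndexOf(",");

  if (lastDot !== -1 && lastComma !== -1) {
    // Both symbols present: the one that appears last is the decimal separator.
    const decimal = lastDot > lastComma ? "." : ",";
    const grouping = decimal === "." ? "," : ".";
    s = s.split(grouping).join("").replace(decimal, ".");
  } else if (lastDot !== -1 || lastComma !== -1) {
    const symbol = lastDot !== -1 ? "." : ",";
    const parts = s.split(symbol);
    if (parts.length > 2) {
      // Repeated symbol ("1.234.567"): it can only be a grouping separator.
      s = parts.join("");
    } else {
      // Single symbol ("1,234" or "1.234"): genuinely ambiguous.
      // This sketch arbitrarily reads it as a decimal separator.
      s = parts.join(".");
    }
  }

  if (!/^-?\d+(\.\d+)?$/.test(s)) {
    throw new Error(`Unrecognized number format: ${raw}`);
  }
  return s;
}

console.log(normalizeNumber("1 234 567,89")); // 1234567.89
console.log(normalizeNumber("1,234,567.89")); // 1234567.89
console.log(normalizeNumber("1'234'567.89")); // 1234567.89
console.log(normalizeNumber("1.234"));        // 1.234
```

The last call is the uncomfortable one: "1.234" is one thousand two hundred thirty-four to some readers and a little over one to others. No rule satisfies both, so a choice has to be made somewhere.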
So how does it work under the hood?
It’s ChatGPT that finds and converts the numbers for every visitor.
It costs a fortune, so please donate.
LOL. No.
At the core, there’s a somewhat special syntax called a “regular expression”.
It looks like this:
/^[\d-]abc{1,3}/
In this example, we’re trying to match any string that begins (^) with a digit (\d) or ([…]) a hyphen (-), followed by the sequence “abc” where the character “c” can repeat between one and three times (because it’s followed by {1,3}).
Don’t look for a use case for this one: there isn’t any.
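Useless as it is, the pattern can still be tried out. Here is a small sketch in JavaScript, with sample strings invented for the occasion:

```javascript
// The pattern from the text: start of string, then one digit or one hyphen,
// then "ab", then one to three "c".
const pattern = /^[\d-]abc{1,3}/;

const samples = [
  "7abc",    // matches: digit, then "abc"
  "-abccc",  // matches: hyphen, then "ab" and three "c"
  "7ab",     // no match: at least one "c" is required
  "xabc",    // no match: "x" is neither a digit nor a hyphen
  "7abcccc", // matches: nothing anchors the end, so "7abccc" is enough
];

const results = samples.map((s) => pattern.test(s));
samples.forEach((s, i) => console.log(s, results[i]));
```

The last sample is worth a second look: the pattern only says how the string must begin, so a fourth “c” does not prevent the match.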
This syntax is used to describe patterns we want to search for in a character sequence. It’s a powerful tool, and many of the validation rules found in web forms are implemented with it.
In other words, it’s the mechanism that makes us grumble when a form throws an error because the email or phone number format isn’t recognized—even when we didn’t want to disclose our email or phone number…
How does the tool handle decimal operations?
It’s a perfectly valid question. On computers, decimal numbers are usually stored in binary floating point, a representation in which most decimal fractions, such as 0.1, have no exact equivalent. When performing operations such as addition, this introduces a very, very slight imprecision.
A well-documented example is 0.1 + 0.2, whose result in many programming languages is 0.30000000000000004.
This problem can accumulate if operations between non-rounded decimals are repeated.
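Both effects are easy to reproduce in JavaScript, which stores its ordinary numbers as binary floating point:

```javascript
// Neither 0.1 nor 0.2 has an exact binary representation.
const sum = 0.1 + 0.2;
console.log(sum);         // 0.30000000000000004
console.log(sum === 0.3); // false

// The error accumulates when the operation is repeated.
let total = 0;
for (let i = 0; i < 10; i++) {
  total += 0.1;
}
console.log(total);       // 0.9999999999999999
```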
That’s why our tool uses a well-known open-source library called decimal.js, an arbitrary-precision decimal type for JavaScript, to handle rounding for each operation. It’s designed to offer reliable handling of numbers in scientific and financial applications.
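To give an idea of what “exact” decimal addition means, here is a deliberately naive sketch that uses no library at all. It is not decimal.js and not the tool’s code: the function name addDecimalStrings is hypothetical, and the technique (scaling every value to a whole number with BigInt, adding, then putting the separator back) only covers addition of plain strings such as "12.5" or "-0.75".

```javascript
// Hypothetical helper, not decimal.js: adds plain decimal strings exactly
// by scaling them to integers (BigInt) with a common number of decimals.
function addDecimalStrings(values) {
  // Use as many decimals as the most precise input has.
  const scale = Math.max(0, ...values.map((v) => (v.split(".")[1] || "").length));

  let total = 0n;
  for (const v of values) {
    const negative = v.startsWith("-");
    const [intPart, fracPart = ""] = v.replace("-", "").split(".");
    // "1.5" at scale 2 becomes the integer 150.
    const scaled = BigInt(intPart + fracPart.padEnd(scale, "0"));
    total += negative ? -scaled : scaled;
  }

  // Put the decimal separator back.
  const negative = total < 0n;
  const digits = (negative ? -total : total).toString().padStart(scale + 1, "0");
  const intDigits = digits.slice(0, digits.length - scale);
  const fracDigits = digits.slice(digits.length - scale);
  return (negative ? "-" : "") + intDigits + (scale > 0 ? "." + fracDigits : "");
}

console.log(addDecimalStrings(["0.1", "0.2"]));        // 0.3
console.log(addDecimalStrings(Array(10).fill("0.1"))); // 1.0
console.log(addDecimalStrings(["1.5", "-2.25"]));      // -0.75
```

A dedicated library goes much further (multiplication, division, configurable rounding), which is exactly why one is used instead of a homemade helper like this.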
Ultimately, why build a tool to format numbers and compute their sum?
To meet a need.
Some people work with heterogeneous documents in which numbers aren’t consistently formatted. They also need to perform operations on them. The most common is addition.
Traditional tools are designed to strictly process the expected input format to perform operations. As soon as things are inconsistent, simple copy-and-paste is no longer enough, and a cleanup step becomes necessary.
For lack of anything better, this cleanup is done with “find/replace,” and the expected result is often achieved in several passes—with just as many re-reads to ensure compliance. We quickly realize it’s a process that’s both heavy and anxiety-inducing.
By automating the formatting of different inputs while preserving the ability to control the accuracy of the output, this tool aims to be a quick, practical solution to lighten the cognitive load of those who deal with these operations.