Archives: 2023-03-28

Phase Behaviour of the Duffing Oscillator

The Duffing oscillator is one of the simplest models for studying the response of a damped non-linear oscillator to an external driving force. Besides its relevance to structural mechanics, it has enjoyed much popularity among physicists due to its rich dynamical behaviour, including dynamical symmetry breaking and chaotic solutions.

For an introduction, see C.L. Olson, M.G. Olsson, Dynamical Symmetry Breaking and Chaos in Duffing’s Equation, Am. J. Phys. 59, 907-911 (1991). Following the assumptions and conventions of this paper, the Duffing equation – after appropriate rescaling – can be written in its normal form

\(\ddot{x} + 2\gamma\dot{x} + x + x^3 = f \cos(\Omega t)\)

This equation models a damped driven oscillator equipped with a hard spring (the restoring force grows more than proportionally to the displacement for large displacements). Starting at arbitrary initial values \(x_0,\dot{x}_0\), the damping \(\gamma\) ensures that any transient behaviour will die down and the solution will converge to a stationary or chaotic attractor. Different initial values may lead to different attractors; the set of initial conditions leading to the same attractor is called its basin.

A common method to study the behaviour of the attractor as a function of the driving frequency \(\Omega\) is to look at the projection of its Poincaré section onto the \(x\) axis at a given phase \(\varphi\) of the driving force, i.e. \(x(t_i)\) for \(t_i = \frac{2\pi i}{\Omega}+\varphi, i \in \mathbb{N}\), essentially graphically representing histograms of how frequently a certain sub-interval of the \(x\)-axis is hit by the solution. In the resulting graph, each pixel represents a neighborhood of a given \(x\)-value and a given frequency \(\Omega\), coded by color (black = frequent hits, white = no hits).
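To make the procedure concrete, here is a minimal Python sketch of the stroboscopic sampling (the actual computation used a Fortran RK4 solver, see the implementation notes below; parameter names and defaults here are illustrative only):

import numpy as np

def duffing_poincare(omega, f=25.0, gamma=0.1, x0=0.0, v0=0.0,
                     n_transient=200, n_keep=400, steps_per_period=1000):
  """Poincare-section samples x(t_i), t_i = 2*pi*i/omega (phase phi = 0), for
  x'' + 2*gamma*x' + x + x**3 = f*cos(omega*t), using a fixed-step RK4 solver.
  A phase offset phi can be handled by first integrating from 0 to phi."""
  def rhs(t, y):
    x, v = y
    return np.array([v, -2.0*gamma*v - x - x**3 + f*np.cos(omega*t)])
  h = 2*np.pi / omega / steps_per_period
  t, y = 0.0, np.array([x0, v0], dtype=float)
  samples = []
  for period in range(n_transient + n_keep):
    for _ in range(steps_per_period):
      k1 = rhs(t, y)
      k2 = rhs(t + h/2, y + h/2*k1)
      k3 = rhs(t + h/2, y + h/2*k2)
      k4 = rhs(t + h, y + h*k3)
      y = y + (h/6)*(k1 + 2*k2 + 2*k3 + k4)
      t += h
    if period >= n_transient:      # keep x only after transients have decayed
      samples.append(y[0])
  return np.array(samples)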

Commonly the phase is taken to be \(\varphi=0\). As a typical case, for \(\gamma=0.1, f=25\) and \(\Omega \in [1.25,1.46]\), the Poincaré section for a set of runs with different initial conditions (\(x_0\in[-5.1,5.1], \dot{x}_0=0\)) will look as follows (\(x\)-axis horizontal, \(\Omega\)-axis vertical):

Here one can see cascades of bifurcations leading to regions of chaotic behaviour, partially overlaid with periodic solutions in the same frequency range.

But what happens when we consider \(\varphi\neq 0\)? Qualitatively, there are no substantial changes, but look how nicely the attractors respond to a full sweep of the driving phase \(\varphi=0\ldots 2\pi\)!

Implementation notes:

The Duffing equation is solved numerically using a classical fixed-step 4th order Runge-Kutta solver written in Fortran. The output is processed and the graphics created using Python’s Pandas and Matplotlib packages. After downsizing the images using ImageMagick, the animation was created with FFmpeg.

The parameter space was investigated using a wide range of model (\(\gamma, f, \Omega\), initial conditions) and numerical (step width, decay of transients, …) parameters. It was no surprise that the interesting behaviour occurs at the set of parameters used in most of the literature.


Regular Expressions and Prime Factors

How do you explain divisibility of numbers to a six-year-old? Cuisenaire Rods provide a haptic, pre-numeracy means of conceptualizing division (rods of identical length \(m\) are added \(k\) times to equal a rod of given length \(n\)) and divisibility (for which \(n\) and \(m\) can this task be solved?).

So, put a rod of length \(n\) on the table, then begin adding rods of length \(m=2\). If these do not fit, continue with \(m=3\), and so on.

The number \(9\) is not divisible by \(2\), but by \(3\).

The number \(7\) cannot be divided by any number \(1<m<7\) and is thus prime.

Can we automate this activity using an equivalent process on a computer, say, regular expressions? Consider the following Python function:

from collections import Counter
import re
def f_re(n):
  f = Counter()
  while (m := re.fullmatch(r'(11+?)\1*', '1'*n)):
    l = len(m.group(1))
    f[l] += 1
    n //= l
  return dict(f)

Guess what it is doing…

How it began: the following curiosity showed up in my feed: the regular expression
^1?$|^(11+?)\1+$
can be used to test if a number \(n\) is prime. The explanation is simple and equivalent to the Cuisenaire rods primality test presented above: after converting a number \(n\) into its unary string representation \(s_n={\tt 111}\ldots {\tt 1}\) (\(n\) times), you check if a substring \(s_m\) of length \(1<m<n\) exists that evenly divides into \(s_n\). If so, \(n\) is composite, else it is prime. This is what the right-hand alternative of this regex tests. The left-hand alternative treats the special cases \(n=0,1\), which are non-prime by definition.

The labor of searching for \(m\) is left to the regex engine. The ? in (11+?) makes this pattern non-greedy, so the search starts at \(m=2,3\ldots\). The backreference \1+ requires that whatever (11+?) matched must repeat \(k\) times for some \(k\ge 1\) (this corresponds to using rods of equal length), and the anchors ^ and $ ensure that the pattern matches only the full string \(s_n\). In Python, re.fullmatch() implies these anchors, so this can be further simplified into

import re
ip = re.compile(r'1?|(11+?)\1+')
def isprime_re(n):
  return not ip.fullmatch(n*'1')
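A quick sanity check:

print([n for n in range(20) if isprime_re(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]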

Have you ever seen a more minimalist primality test? Obviously, we can expect this to be inefficient to the point of silliness. In particular for primes, the regex engine will supposedly¹ iterate over all integers \(m=2,3,4\ldots n\), where a search \(m=2,3,5,7\ldots\lfloor\sqrt{n}\rfloor,\; m\in\mathbb{P}\) would suffice.

How does this compare in practice? In what follows, we compare the execution times for non-greedy r'1?|(11+?)\1+' and greedy r'1?|(11+)\1+' regex variants in Python. Given that Perl’s regex has become a model for many other regex engines and is reputedly quite efficient, we also look at the Perl version

sub isprime_re {
  my ( $n ) = @_ ;
  ( "1" x $n ) !~ /^1?$|^(11+?)\1+$/ + 0 ;
}

(and analogously for the greedy variant), further a moderately simple primality test in Python that skips over multiples of 2, 3, and 5, based on an algorithm I originally found in a collection of HP calculator programs (HP 67/97 Math Pac page L01-02 “Factors and Primes”), also known as Wheel Factorization (here simplified for primality test),

from itertools import chain, cycle
def isprime(n):
  if n < 2:
    return False
  ni = chain([1,2,2],cycle([4,2,4,2,4,6,2,6]))
  i = 2
  while i**2 <= n:
    if n % i == 0:
      return False
    else:
      i += next(ni)
  return True

and finally Sympy’s built-in primality test sympy.ntheory.primetest.isprime(n). The following double-logarithmic graph shows execution times on an old Intel Xeon server running at 2.6 GHz for all these tests, along with auxiliary black lines to indicate \(O(n)\) and \(O(n^2)\) complexity.

As expected, the times for the wheel and sympy algorithms remain at least an order of magnitude below the regex variants. We will ignore those for the rest of this article. For each of the four regex variants (Perl and Python, each with the greedy and the non-greedy pattern) we see distinct clusters of points, showing clear differences in complexity. For Perl, greedy (orange) and non-greedy (blue) times are similar, and the non-greedy Python variant (green) outperforms Perl. All these appear to be roughly of complexity \(O(n)\) as indicated by the black lines (maybe with an additional factor \(\log(n)\), which would become obvious only at much larger values of \(n\)). Python’s greedy execution times (red) are much higher and appear to have quadratic complexity \(O(n^2)\).

The discontinuities in some of the quasi-curves are due to fluctuations in the execution of the program and are not reproducible. They could be eliminated by repeated runs taking the minimum execution time for each case.
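Such a minimum-over-repetitions timing could be collected, for example, like this (a sketch; timeit and the repetition count are assumptions, not necessarily what was used for the graphs):

import timeit

def best_time(fn, n, repeat=5):
  """Minimum wall-clock time (seconds) of fn(n) over 'repeat' runs."""
  return min(timeit.repeat(lambda: fn(n), number=1, repeat=repeat))

# e.g. compare the non-greedy regex test with the wheel test at one value of n
# print(best_time(isprime_re, 10001), best_time(isprime, 10001))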

The fact that for each method the data points do not form smooth curves but are aggregated in families of curve-like clusters reflects how the prime factor composition of these numbers affects the matching algorithm, as illustrated by the graph below. It animates² run times for composite numbers \(n\) with increasing smallest prime factors, i.e. \(n\in k\times\mathbb{P},\,k=2, 3, 5, 7, 11, 13, 17\), and for prime numbers \(n\in\mathbb{P}\).

This shows that the regex runtime depends on the smallest prime factor of the composite numbers, and that prime numbers take the most time to test, with – unsurprisingly – no difference between greedy and non-greedy patterns. The fact that for composite numbers Perl’s run times differ little between greedy and non-greedy, whereas the greedy Python regex starts off with \(O(n^2)\), shows that Python’s regex engine, while maybe modeled after Perl’s, clearly is an independent implementation.

  1. … or will it? Let’s keep this question open for another exploration…
  2. The animation does not work on all browsers.

Microcomputer Trigonometrics Archaeology (Part 4)

In Part 3, we (sort of) confirmed that MS-Basic most probably used an 11th order minimax polynomial from Hart’s book Computer Approximations for computing the sine function. While I failed to dig up any information on the implementation of math libraries of early mainframes such as IBM’s and CDC’s (purportedly, they used CORDIC), the ROM of Sinclair Basic has been published.

Sinclair approximates the sine function as a linear combination \(S(t)=t \sum_{i=0}^n c_i T_i(2t^2-1),\; t:=\frac{2}{\pi}x\) of Chebyshev polynomials.

Chebyshev polynomials are orthogonal with respect to a weight function \(\frac{1}{\sqrt{1-x^2}}\), and they individually share a desirable property with the error of a Minimax approximation: they are equioscillating between \(-1\) and \(1\). If a Chebyshev series expansion of a function converges quickly, the error of an \(n\)th order truncation will be dominated by the first omitted term of order \(n+1\), and thus will be very close to the Minimax polynomial of \(n\)th order.

By a rationale similar to that described in Part 2, factoring out \(x\) and expanding in terms of \(T_i(x^2)\) ensures the symmetry of the approximation and optimizes for relative error. Consistent with the accuracy required for 40-bit floats, Sinclair used a \(5\)th order Chebyshev polynomial, effectively leading to an \(11\)th order approximation of \(\sin(x)\). To obtain the numerical values for the coefficients, we note that, aside from some limiting cases, Sinclair’s floating point format is the same 40-bit (8-bit biased exponent, 32-bit mantissa with implicit leading 1) representation as MS-Basic, so we can extract the coefficients from the source code.

Here we compare the relative errors for Sinclair Basic with Hart’s 9th and 11th order SIN 3341 and SIN 3342 polynomials (which are identical to minimax polynomials obtained by Remez optimization), and with the results of truncated Chebyshev expansions. As expected, we see that the Hart and Chebyshev approximations are very close to each other, but Sinclair’s result is worse than Hart’s 11th order approximation by more than an order of magnitude. Adding Hart’s 3342 and Sinclair’s sine approximation evaluated with 40-bit floats to the comparison, one sees that the Sinclair approximation suffers from errors induced by inaccurate coefficients, worse than roundoff alone would dictate.

Finally, it is worth noting that on page 115 of Hart’s book we find the identity \( \sin x = \sin \frac{x}{3} \left( 3 - 4 \sin^2 \frac{x}{3}\right) \), originally intended for range reduction, which can also be used to iteratively compute \(\sin x\) from arguments sufficiently small that \(\sin x \approx x\) to machine precision.

def sini(n, x):
  s = x / 3**n               # for sufficiently large n, sin(s) ~ s to machine precision
  for _ in range(n):
    s *= 3 - 4*s**2          # triple-angle identity: sin(3u) = sin(u)*(3 - 4*sin(u)**2)
  return s

In the range \(0\le x \le \frac{\pi}{2}\), 18 iterations are sufficient to produce results accurate to double precision.
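A quick check (the grid of test points is an arbitrary choice):

import math
xs = [i * math.pi / 200 for i in range(1, 101)]             # test points in (0, pi/2]
max_err = max(abs(sini(18, x) - math.sin(x)) for x in xs)
print(max_err)   # should be of the order of double-precision roundoff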

Implementation notes (see also Part 2): Chebyshev polynomials were implemented using the well-known recurrence relation invented by Clenshaw. The Chebyshev coefficients for \(\sin(\pi x / 2) \approx x \sum c_n T_n(x^2)\) were approximated using the discrete transformation described in Example 1 of the Wikipedia article.

The Remez coefficients were computed using Sam Hocevar’s LolRemez program ( lolremez -d -r '-1:1' "sin(x*pi/2)" "sin(x*pi/2)" ) and verified to agree with Hart’s approximations.

The Sinclair coefficients in text representation were originally found in a web page that seems to be no longer published. The only place they can still be found is on the Agena programming language web site. They were cross-checked by inserting them into the above-described Chebyshev routine and by reverse-engineering the computation from the source code. Their origin is most likely Clenshaw’s article Polynomial Approximations to Elementary Functions (note the different normalization of \(T_n\) for \(n\ge 1\)). What follows is the Python code used for the present post.

import numpy as np
def chebp(c, x, a, b):
  """chebyshev polynomial(x) from given coefficients c in [a,b]"""
  u = ( 2*x - (a+b) ) / ( b-a)
  t0 = np.ones_like(u)
  t1 = u
  f = c[0]*t0 + c[1]*t1
  for ci in c[2:]:
    t0, t1 = t1, 2 * u*t1 - t0
    f += ci*t1
  return f
def sin2cheb(x):
  # Sinclair's published coefficients; the values for n >= 1 are doubled to
  # account for the different normalization of T_n noted above
  c = [ 1.276278962, ] + [
          ci * 2 for ci in  ( -.142630785, 0.004559008,
                 -.000068294, 0.000000592, -.000000003, ) ]
  y = 2/np.pi*x * chebp(c, (2/np.pi*x)**2, 0, 1)   # t = 2x/pi, argument 2t^2-1 via the [0,1] mapping
  return y
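As an aside, the discrete transformation mentioned in the notes above fits into a few lines. A sketch (reusing the NumPy import above; the node count N and coefficient count m are arbitrary choices), whose output should come out close to the coefficient list used in sin2cheb():

def cheb_coeffs_sin(m=6, N=64):
  """Approximate the coefficients c_n of sin(pi*x/2) ~ x * sum_n c_n T_n(2*x**2-1)
  for 0 <= x <= 1 via the discrete Chebyshev transform."""
  k = np.arange(N)
  theta = np.pi * (k + 0.5) / N            # Chebyshev nodes u_k = cos(theta_k)
  v = 0.5 * (np.cos(theta) + 1.0)          # map u in [-1,1] to v = x**2 in [0,1]
  x = np.sqrt(v)
  g = np.sin(np.pi * x / 2) / x            # the even factor to be expanded in T_n
  c = [2.0 / N * np.sum(g * np.cos(n * theta)) for n in range(m)]
  c[0] /= 2                                # chebp() above weighs c[0]*T_0 directly
  return c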

Further reading: There is a nice discussion about the history of early Microcomputer floating point implementations on the Stack Exchange Retrocomputing web site.


Microcomputer Trigonometrics Archaeology (Part 3)

In Part 2, we delved into the details of how and why truncated Taylor series are not the best choice for approximating elementary functions and found out that MS-Basic used better polynomials than Taylor polynomials. But typical annotations in source code listings seldom noted the obvious deviations from the Taylor coefficients, and lacked any hint of the source of these deviations. This is what had sparked my curiosity.

An extensive search eventually led to the original source code for Microsoft’s GW-BASIC interpreter, as of 1983 (according to the web site). The file GW-BASIC/MATH1.ASM contains the following coefficients (lines merged) for the sine function:

$SINCN:
;       COEFFICIENTS FOR SINE FUNCTION
;       SEE HART'S #3341
;       RELATIVE ERROR 8.27
203 111 017 333     #   6.283185272
206 245 135 341     #  -41.34167747
207 043 064 130     #   81.60223119
207 231 046 145     #  -76.57498378
206 036 327 373     #   39.71091766

These are coefficients (called SINCN here) for a 9th degree polynomial coded as 32-bit MBF floats (vs. 11th degree, 40-bit; the byte representation varies between octal and hexadecimal in various sources). The binary values are identical to the 1978 BASIC M6502 8K VER 1.1 BY MICRO-SOFT [SIC] source, which contains both 32-bit MBF coefficients SINCON4 for a 9th degree polynomial and 40-bit SINCON5 for degree 11. The values in the GW-Basic comments above hold a little surprise – see below.

But the actual gem is the reference to “HART'S #3341”. The book Computer Approximations by John F. Hart et al. (1968) is a comprehensive textbook on the theory and practice of approximations of elementary and special functions (square and cube roots; exponential, hyperbolic and trigonometric functions and their inverses; gamma, error, and Bessel functions; and elliptic integrals), including the mathematical and algorithmic foundations for arriving at optimal (i.e. minimum effort at a given precision) approximations ab initio, and extensive tables of coefficients for polynomial and rational approximations. Paraphrasing from the preface, this book takes another step (the first being Hastings’ 1955 book Approximations for Digital Computers) toward turning the art of computer approximation into a science. I am lucky that my library has Hart’s book for loan.

Chapter 3 describes the theory of least maximum approximations and the Remez algorithm. The idea: if you want to approximate a function \(f(x)\) by a polynomial \(p_n(x)\) with coefficients \(c_0, c_1,\ldots,c_n\) on an interval \([a,b]\), how do you choose the coefficients \(c_i\) such that the worst-case error \(\|f-p_n\|_\infty := \max_{x\in[a,b]}|f(x)-p_n(x)|\) (i.e. the maximum absolute value of the error) is minimized? (This is sometimes also called a minimax approximation.) Variants for rational approximations and for minimizing the relative error are also given.

Almost half of the book consists of tables. Here we find SIN 3341 and SIN 3342, which are effectively (up to a scale transformation) coefficients for 9th and 11th degree approximation polynomials for \(\sin(x)\) that minimize the relative error for \(x\in[0,\frac{\pi}{2}]\). The counterparts for minimal absolute error are SIN 3301 and SIN 3302.

First let us compare the error graphs of the sine polynomial approximations using Hart’s coefficients vs. SINCON of MS Basic. First, Hart’s 11th order polynomials vs SINCON5:

We see that the Hart polynomials are perfectly balanced; as expected the oscillations of SIN 3342 are smaller close to the origin to keep the relative error within consistent bounds. SINCON5 appears to be systematically off; we’ll return to this later.

Due to the symmetry, we can limit our attention to \(0\le x\le\frac{\pi}{2}\) and for the next graph, which compares the relative errors of the 9th and 11th degree approximations, we return to a semilogarithmic graph and additionally evaluate Hart’s polynomials at the respective floating point precisions for which they were used in the code (3341 for 32-bit, and 3342 for 40-bit).

While both the SINCON4 and SINCON5 polynomials systematically differ from Hart 3341 and 3342, they lie within the numerical limits of the respective floating point representations MBF32 and MBF40.

Note that the cusps correspond to zeros of the error. The equioscillation theorem states that for sufficiently well-behaved functions, the error \(p_n(x)-f(x)\) will be equal in magnitude with opposite signs at \(n+2\) points \(x\in[a,b]\) (including the ends of the interval), so the error as a function of \(x\) will have \(n+1\) zeros. This is exactly what we expect to see with absolute error of the related Hart 3302 approximation, which has been optimized for absolute error:

Astute readers will notice, though, that there is one zero too many – this is an effect of \(\sin(x)\) actually being approximated by a 5th order polynomial in \(x^2\), i.e. \(S(x)=x P_5(x^2)\), which guarantees that \(S(x)\) is an odd polynomial, including its zero at the origin. Indeed, the error of \(P_5\) has 6 zeros in \([0,\frac{\pi}{2}]\), and the factored-out \(x\) contributes the additional zero at the origin.

For Hart 3342 and relative error, the zero at the origin is turned into a maximum (\(\frac{\mathrm{err}(x)}{\sin(x)} \to \frac{0}{0}\) – L’Hôpital’s rule), so the relative error graph will have one less zero:

Let us return to the quality of SINCON5. Can its deviations from the Hart coefficients be explained by the limited precision of the coefficients represented by MBF? We compare Hart’s SIN 3342 with versions that have coefficients \(c_i\) randomly modified by 5 parts in \(10^{10}\) (i.e. \(c_{i,r}=c_i\times(1+\mathcal{N})\) where \(\mathcal{N}\) is a Gaussian random variable with \(3\sigma=5\times10^{-10}\)), which corresponds roughly to the expected roundoff error of MBF40.
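The perturbation itself is a one-liner; a minimal sketch (hart_3342 stands in for Hart’s SIN 3342 coefficient list, which is not reproduced here):

import numpy as np
rng = np.random.default_rng()

def perturb(coeffs, sigma=5e-10/3, n=10):
  """n copies of coeffs with each c_i multiplied by (1 + N(0, sigma)), i.e. 3*sigma = 5e-10."""
  c = np.asarray(coeffs, dtype=float)
  return [c * (1 + rng.normal(0.0, sigma, c.size)) for _ in range(n)]

# variants = perturb(hart_3342)   # hart_3342: Hart's SIN 3342 coefficients (not reproduced here)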

The orange graphs are 10 random variants of Hart 3342. SINCON5 is qualitatively and quantitatively well within the range of these random variations.

According to various sources (e.g. the Pagetable article), Monte Davidoff wrote the math package, and as far as I can tell (aside from byte order and octal or hexadecimal notation), this has never been touched again, because all published versions of the MS-Basic source code implement identical coefficients SINCON4 and SINCON5.

The numbers given in the comments, though, vary greatly, as illustrated by the following graphs:

Curiously, the GW-Basic comments have accurate values which agree with Hart’s SIN 3341 although the “true” coefficients are less accurate, which raises the question whether even these comments might be a later addition.

For the 11th degree approximations, the original coefficients in the comments are way off, in both cases as a result of typos.

t_ = TRUE coefficients converted from binary representation
c_ = coefficients from COMMENTS

c_mspetkim_5
power   t_sincon5        c_mspetkim_5   error
  1     6.28318530694      6.28318531   -3.0600002531855353e-09
  3   -41.3417021036     -41.3147021    -0.02700000360000132
           ^^                 ^^
  5    81.6052236855      81.6052237    -1.4500002976092219e-08
  7   -76.7041702569     -76.7041703     4.310000178975315e-08
  9    42.007797122       42.0077971     2.2000001820288162e-08
 11   -14.3813906722     -14.3813907     2.7800000523825474e-08

c_m6502mac_sincon5
power   t_sincon5        c_m6502mac_sincon5   error
  1     6.28318530694      6.283185307        -6.000000496442226e-11
  3   -41.3417021036     -41.34170209         -1.3600001125269046e-08
  5    81.6052236855      81.60522369         -4.5000092541158665e-09
  7   -76.7041702569     -76.704133676        -3.658090000158154e-05
  9    42.007797122       42.07777095         -0.06997382800000196
           ^^                 ^^
 11   -14.3813906722     -14.381383816        -6.8562000006267e-06

This concludes our musings on the MS-Basic implementation of the sine function and its background. Given the material at hand, the question on the true origin of MS-Basic’s SINCON4 and SINCON5 cannot be answered conclusively. In Part 4, we’ll take a look at the competition.

Additional material: the C64-Wiki identifies Hart’s book as the likely source of MS-Basic’s elementary functions and gives additional insight. Wikipedia’s article on approximation theory gives a good first overview of the topic.

Implementation notes: all graphs were created using Python 3.10, NumPy, Mpmath, and Matplotlib. Coefficients were taken from Hart’s book and published source code. For demonstration of MBF32, MBF40 and FLOATnn effects, no emulation was available, instead the Mpmath precision was set to the number of significant bits of the respective formats.


Microcomputer Trigonometrics Archaeology (Part 2)

In Part 1 we discovered that in the early 80s microcomputers using Microsoft Basic had a significantly better implementation of the sine function than one would guess from the comments in various published versions of the source code, which typically claim that a (possibly “modified” – whatever this means) truncated Taylor series was used. While we know that the Taylor series \(\sin(x)=\sum_{i=0}^{\infty}(-1)^{i}\frac{x^{2i+1}}{(2i+1)!}\) converges everywhere, it is poorly suited for practical implementations of the sine function.

Why is this?

First, for small arguments \(x < 1\) the series converges quickly. For larger arguments \(x \gg 1\), the individual numerators \(x^{2i+1}\) explode exponentially as functions of \(i\), only to be compensated later by the denominators \((2i+1)!\), which grow superexponentially (Stirling’s formula), yielding a value between \(-1\) and \(+1\) despite individual terms of enormous magnitude. The absolute values of the individual Taylor terms can easily be graphed for various selected values of \(x\):

This graph ignores the alternating signs of the terms.

Given the finite precision of floating point numbers, these subtractions of large numbers lead to gradual loss of precision due to cancellation as \(x\) increases. At a certain point (determined by the floating point precision), all significant digits are lost, and the results are erratic.

Second, for practical calculations we want to carry the summation only to a finite number of terms, truncating the series to an \(n\)th degree polynomial \(T_n(x)\). Again, for small arguments \(x\), a Taylor series truncated at a fixed order \(n\) will approximate \(\sin(x)\) reasonably well. However, even ignoring machine precision, the truncation error (i.e. the difference between \(\sin(x)\) and \(T_n(x)\)) increases rapidly with increasing \(x\) because the polynomial is dominated by its leading term \(x^n\).

Both effects can be nicely demonstrated by comparing the Taylor polynomials \(T_n\) of various degrees \(n\), in this case using a 40-bit floating point representation trying to mimic the precision of MS-Basic’s MBF40 format (which has a mantissa of 32 bits, corresponding to 9.6 significant decimal digits) for \(0\le x\le 8\pi\):

We see as we increase the order \(n\), more and more oscillations of \(\sin(x)\) are modeled correctly by \(T_n(x)\), but around \(x\approx 20\) the noise due to roundoff becomes clearly visible, and quickly the results become meaningless no matter how many terms you add to the Taylor series. The onset of roundoff error is not sudden, as a semilogarithmic graph of the absolute error \(|T_n(x)-\sin(x)|\) shows (values of \(n\) as above):

Here we see our truncated \(n\)th degree Taylor polynomials veer off later and later as \(n\) increases. Before this happens, the error is dominated by roundoff, which increases by an order of magnitude for every increase of \(x\) by about \(2.3\approx\ln 10\) (visual estimate – proof needed). The slope of the roundoff error curve results from the envelope of the Taylor polynomials, offset by a factor \(\varepsilon=2^{-32}\approx 2.3\times 10^{-10}\) (Machine epsilon for MBF40) (black lines).
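For reproducing such graphs, the reduced-precision Taylor sums can be produced with Mpmath by lowering the working precision, as in the implementation notes below; a minimal sketch (this mimics only the 32-bit mantissa, not the MBF format itself):

from mpmath import mp, mpf, fac

def sin_taylor(x, n, prec=32):
  """Degree-n Taylor polynomial of sin, summed at 'prec' bits of mantissa
  (a stand-in for MBF40's 32-bit mantissa; precision only, not an MBF emulation)."""
  mp.prec = prec
  x, s = mpf(x), mpf(0)
  for i in range((n + 1) // 2):
    s += (-1)**i * x**(2*i + 1) / fac(2*i + 1)
  return s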

How the roundoff depends on the floating point representation is demonstrated in the next graph:

Here we sum up to degree \(n=121\) which guarantees that the neglected part of the series is smaller than roundoff for all selected representations in the interval \(0\le x\le 8\pi\). We see (ignoring noise) the exponential growth of the roundoff error for each floating point precision (straight graph in a semilogarithmic plot) right from the origin.

Let us apply this to the specific situation at hand: thanks to the periodicity and symmetry properties of the sine function, we can reduce the range to \(0\le x\le\frac{\pi}{2}\).
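One straightforward way to perform this reduction, as a sketch using only periodicity and symmetry (not MS-Basic’s actual code):

import math

def range_reduce(x):
  """Return (sign, t) with 0 <= t <= pi/2 such that sin(x) = sign * sin(t)."""
  t = math.fmod(x, 2*math.pi)
  if t < 0:
    t += 2*math.pi
  sign = 1.0
  if t > math.pi:            # third and fourth quadrant: sin(x) = -sin(x - pi)
    t -= math.pi
    sign = -1.0
  if t > math.pi/2:          # second quadrant: sin(x) = sin(pi - x)
    t = math.pi - t
  return sign, t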

Here the relative error of the approximation polynomial computed with MS Basic’s SINCON5 coefficients is compared to that of Taylor polynomials up to degree 19, using standard IEEE 64 bit double precision floating point arithmetic. We see that the 11th degree SINCON5 polynomial is balanced far better across the interval and that we need a 17th degree Taylor polynomial to keep the maximum error below SINCON5 level.
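For reference, the SINCON5 polynomial itself is easy to evaluate from the decoded coefficients (cf. the table in Part 3); a sketch, with Horner evaluation in \(t^2\) as an assumed, not necessarily original, ordering:

import numpy as np

SINCON5 = [6.28318530694, -41.3417021036, 81.6052236855,    # true coefficients
           -76.7041702569, 42.007797122, -14.3813906722]    # as decoded in Part 3

def sin_sincon5(x):
  """Evaluate sum_i c_i * t**(2*i+1) with t = x/(2*pi), by Horner's scheme in t**2."""
  t = np.asarray(x) / (2*np.pi)
  t2 = t*t
  s = SINCON5[-1]
  for c in SINCON5[-2::-1]:
    s = s*t2 + c
  return s*t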

Taking into account the actually relevant 40-bit floating point representation with its 32-bit mantissa, we find that the SINCON5 polynomial is dominated by roundoff error, which is consistently larger than the truncation error across the entire interval:

So how did they arrive at this approximation that spreads the relative error almost evenly across the entire interval and has coefficients close enough to the Taylor coefficients to be confused with these by many? This will be clarified in Part 3.

Implementation notes: all graphs were created using Python 3.10, NumPy, Mpmath, and Matplotlib. For demonstration of MBF32, MBF40 and FLOATnn effects, no emulation was available, instead the Mpmath precision was set to the number of significant bits of the respective formats.


Microcomputer Trigonometrics Archaeology (Part 1)

So I stumbled upon Michael Steil’s article on Bill Gates’ Easter Egg in Microsoft Basic. While the story of burying an authorship notice among the numeric coefficients of the sine function is quite amusing in itself, my numeric eye’s attention was immediately drawn to the comments for these coefficients, first used in Microsoft’s 40-bit (5-byte) floating point Basic for the KIM and PET computers.

.;E063 05                        6 coefficients for SIN()
.;E064 84 E6 1A 2D 1B            -((2*PI)**11)/11! = -14.3813907
.;E069 86 28 07 FB F8             ((2*PI)**9)/9!   =  42.0077971
.;E06E 87 99 68 89 01            -((2*PI)**7)/7!   = -76.7041703
.;E073 87 23 35 DF E1             ((2*PI)**5)/5!   =  81.6052237
.;E078 86 A5 5D E7 28            -((2*PI)**3)/3!   = -41.3147021
.;E07D 83 49 0F DA A2               2*PI           =  6.28318531
.;E082 A1 54 46 8F 13            "SOFT!" | backwards and with
.;E087 8F 52 43 89 CD            "MICRO" | random upper bits

This was in 1979. Even with range reduction to the first quadrant \(0\le x\le\frac{\pi}{2}\), using an \(n\)th order truncated Taylor series

\[\sin(2\pi t)\approx\sum_{i=0}^{(n-1)/2}(-1)^{i}\frac{\;\;(2\pi t)^{2i+1}}{(2i+1)!};\;\; x\equiv 2\pi t\]

with its rapidly increasing truncation error for increasing arguments doesn’t exactly look plausible (more details on why in Part 2). Besides, back then, every self-respecting developer of math functions for calculators and computers implemented trigonometrics using CORDIC, didn’t they?

On the other hand, four years later still, the 1983 edition of Dr. Dobb’s Journal (Vol 8, pp 15+16/810 in the PDF – 400MB download) recommended using the (4,3) Padé approximant

\[\sin(x)\approx S(x) := \frac{x-\frac{14}{120}x^3}{1+\frac{1}{20}x^2}\]

as being “accurate to about 1 part in 10,000 over the interval \(0\le x\le\frac{\pi}{4}\), a degree of precision which is adequate for typical microcomputer applications“. For \(\frac{\pi}{4}\le x\le\frac{\pi}{2}\), they would compute \(\small\sqrt{1-S(\frac{\pi}{2}-x)^2}\); for more details see the article. Note that this looks like a (3,2) approximant, but the even coefficients in the numerator and the odd coefficients in the denominator are 0, so the actual order is indeed (4,3).
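Written out as code, the quoted recipe reads (a direct transcription of the formulas above, nothing more):

import math

def sin_dobbs(x):
  """Dr. Dobb's recipe as quoted above: S(x) on [0, pi/4],
  sqrt(1 - S(pi/2 - x)**2) on [pi/4, pi/2]."""
  def S(x):
    return (x - 14/120 * x**3) / (1 + x**2/20)
  if x <= math.pi/4:
    return S(x)
  return math.sqrt(1 - S(math.pi/2 - x)**2)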

Assuming that this notion of “adequacy” was common sense in the early 80s, no surprise colleagues dissuaded me from buying a microcomputer for doing physics calculations around 1980.

But are these numbers really the Taylor coefficients?

  Taylor term         Taylor value   Coefficient
    2*PI              6.283185307    6.28318531
 -((2*PI)**3)/3!     -41.34170224   -41.3147021
  ((2*PI)**5)/5!      81.60524928    81.6052237
 -((2*PI)**7)/7!     -76.70585975   -76.7041703
  ((2*PI)**9)/9!      42.05869394    42.0077971
 -((2*PI)**11)/11!   -15.09464258   -14.3813907

These deviations do not look fortuitous. Let us compare the absolute errors \(\left|f(x)-\sin(x)\right|\) of Microsoft’s polynomial with that of the truncated Taylor series (both 11th degree), and – just for fun – of Dr. Dobb’s (4,3) Padé approximant. Semilogarithmic plot \(0\le x \le\pi\) – reference is the NumPy IEEE double precision implementation of \(\sin(x)\):

We see that in the first octant \(0\le x \le\frac{\pi}{4}\approx 0.79\) Dr. Dobb’s claim to accuracy \(10^{-4}\) is satisfied. For Microsoft’s power series (approximation domain \(0\le x \le\frac{\pi}{2}\)), the error is way too high for an 11th order approximation. Were my colleagues right about avoiding microcomputers? But back then, Microsoft Basic was ubiquitous even in scientific applications – someone must have noticed if the results were off so badly.

So maybe it’s the comments that were off. Microsoft’s 40-bit encoding of floating point numbers is well-documented, so it is easy to decode and compare the actual coefficients:

power  true (decoded)      comment     error
1      6.28318530694      6.28318531   -3.06e-09
3     -41.3417021036     -41.3147021   -0.027  <<<<<<
           ^^                 ^^
5      81.6052236855      81.6052237   -1.45e-08
7     -76.7041702569     -76.7041703    4.31e-08
9      42.007797122       42.0077971    2.20e-08
11    -14.3813906722     -14.3813907    2.78e-08

Indeed we find two transposed digits in the \(x^3\) coefficient.
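For reference, a minimal sketch of decoding such a 5-byte value (exponent byte biased by 128; the top bit of the first mantissa byte holds the sign and is replaced by the implicit leading 1; an exponent byte of 0 encodes zero):

def mbf40(b):
  """Decode a 5-byte MBF float given as [exponent, m1, m2, m3, m4]."""
  e, m1, m2, m3, m4 = b
  if e == 0:                       # an exponent byte of 0 encodes zero
    return 0.0
  sign = -1.0 if m1 & 0x80 else 1.0
  mant = ((m1 | 0x80) << 24 | m2 << 16 | m3 << 8 | m4) / 2**32   # mantissa in [0.5, 1)
  return sign * mant * 2.0**(e - 128)

print(mbf40([0x83, 0x49, 0x0F, 0xDA, 0xA2]))   # 2*PI entry from the listing: 6.2831853...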

What follows is the corrected plot (c_ and t_ referring to coefficients as written in the comments and their true values, and cc_ to the corrected comment value after fixing the transposition). For reference, the results of a 45-step CORDIC algorithm have been added; note that the right half of the graph \(\frac{\pi}{2}\le x\le\pi\) is outside the domain of all of these approximations:

Note also that due to IEEE double precision numbers being used here, roundoff error becomes visible around \(10^{-15}\) instead of \(10^{-9}\).

Finally, given the requirement that a function approximation should consistently yield a specified number of correct significant digits even close to its zeros, what really matters is the relative error:

We see that across the entire first quadrant \(0\le x\le\frac{\pi}{2}\), the relative error of Microsoft’s 11th order polynomial as implemented (t_mspetkim_5_11) is well balanced and less than \(10^{-10}\), thus consistently within the limits of the 9.6-digit numerical accuracy of 40-bit floats with their 32-bit mantissa \((32 \lg 2\approx 9.6)\), and it is substantially better than the 11th order Taylor error in the right half of this interval. We also see that CORDIC’s relative error is inadequate when implemented naively. Somewhat surprising is that after correcting the transposed digits (cc_mspetkim_5_11), the values obtained with the coefficients in the comments show a practically constant relative error over the entire interval \([0,\frac{\pi}{2}]\). However, this error is significantly worse than that of t_mspetkim_5_11.

Conclusion: someone had thoroughly done their homework devising the MS-Basic coefficients. We leave it to the reader’s discretion if the transposition in the comments was a bona fide mistake or a deliberate act of concealment of trade secrets. Or, what is probably closest to the truth as we shall see later, the comments referring to Taylor coefficients were added later, not by the original authors, who obviously knew their approximation theory.

In Part 2, we’ll start to delve into the effects of truncation and roundoff error, more MS-Basic implementations, and some theoretical background of these findings.

Notes

The Easter Egg article contains links to several versions of the source code. The first release of Microsoft Basic for the Altair used 32-bit (4-byte) floats (6 significant decimal digits); a cursory analysis of some aspects of the algorithms (briefly noting that the coefficients are modified with respect to Taylor, but not digging deeper) can be found on the Altairbasic web site.