Last update: 1996-1-29. My intention is to update this file as and when I acquire additional details and suggestions. JN
This set of notes is a response to the quiz question posed on the CBC radio program "Quirks and Quarks" on January 27, 1996. From the CBC Web site (http://www.cbc.ca) we have downloaded the following statement of the question:
Question: Hello, I'm Professor Maria Klawe of the Computer Science department at the University of British Columbia. My question is about the result of a calculation done on a personal computer. Suppose I write a program that adds 1 plus 1/2 plus 1/3 plus 1/4 plus 1/5 plus 1/6 and so on, and stops when the sum no longer changes. Will the final sum be (A) less than 10, (B) between 10 and 100, or (C) greater than 100 and why?
(The series in the question should not be confused with the alternating series

1 - 1/2 + 1/3 - 1/4 + ....

for which the general term is

(-1)^(k-1) * 1/k

and which does converge, to ln(2).)
The underlying series here, that is,
x(i) = 1/i for i=1,2,....
is called the HARMONIC SERIES. Its sum is "well known" to be divergent, that is, to have no finite sum. I say it is "well known" since I learned how to show the divergence in First Year Calculus in 1964-65 (Protter M H & Morrey C B Jr., 1964, College Calculus with Analytic Geometry, Addison-Wesley, Reading MA, page 614). Earlier still, Mellor (Mellor J W, 1902, Higher Mathematics for Students of Chemistry and Physics, republication of the 4th edition of 1912 by Dover Publications Inc., New York, 1954) states on p. 271 that the series
" 1 + 1/2^m + 1/3^m + 1/4^m + ....
which is known to be convergent when m is greater than unity; and divergent, if m is equal to or less than unity." [We use the notation x^y to mean "x to the power y" to avoid difficulty in emailing text files.] Nevertheless, modern courses in mathematics may leave out the harmonic series as an interesting but nonessential detail, so I will present two ways to show the divergence that are useful in elucidating the problem.
The first proof groups the terms of the series. Set the leading 1 aside, and for k = 1, 2, 3, .... let group k consist of the terms from 1/(2^(k-1) + 1) through 1/2^k, that is,

( 1/2 ), ( 1/3 + 1/4 ), ( 1/5 + 1/6 + 1/7 + 1/8 ), ....

Note that group k contains ( 2^k - 2^(k-1) ) terms and
their SMALLEST value is 1/2^k. Thus each group contributes at least

( 2^k - 2^(k-1) ) / 2^k = 1 - 1/2 = 1/2

to the sum. Since we can go on adding groups forever, the series is divergent.
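This argument is easy to check numerically. Here is a minimal C sketch (my own illustration, not one of the original programs) that sums the series in double precision and compares the partial sum at each power of 2 with the guaranteed minimum 1 + k/2:

#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    long j, next = 1;
    int k = 0;
    for (j = 1; j <= (1L << 20); j++) {
        sum += 1.0 / (double)j;
        if (j == next) {          /* j = 2^k, the end of group k */
            printf("k=%2d  sum(2^k)=%10.6f  1 + k/2=%8.3f\n",
                   k, sum, 1.0 + 0.5 * (double)k);
            k = k + 1;
            next = next * 2;
        }
    }
    return 0;
}

The partial sums do stay above the bound, but they grow painfully slowly, which is the heart of the matter.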
Another proof comes from looking at the sum as a form of integral. To avoid a minor problem, we will set the first term (1/1) to one side and simply add the 1 to our total separately. The "function" f(x) = 1/int(x) where int(x) = the integer part of x is then a series of steps 1/2, 1/3 etc. each 1 unit wide. Therefore the summation of the harmonic series is
1 + integral(1/int(x), x = 2..infinity) = hsumI
Note that 1/int(x) >= 1/x for all x in the range of interest. Note also that 1/(x-1) >= 1/int(x) for all x in the range of interest. This second observation will be of use later.
We know that the sum hsumI, which is equal to the true sum we desire, is made up of positive contributions, each at least as large as the corresponding piece of the "sum" found by the integral
LL = 1 + integral( 1/x, x = 2..infinity )
Defining
LL(m) = 1 + integral( 1/x, x = 2..m ) = 1 + ln(m) - ln(2) = 1 + ln(m/2)
where ln() is the natural or Napierian logarithm function. Clearly this grows with m without bound, so this sum and the original are both divergent.
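We can see the bounds at work numerically. The following C sketch (again my own illustration) computes the partial sum hsum(m) = sum(1/k, k=1..m-1) in double precision, alongside the lower bound 1 + ln(m/2) and the upper bound 1 + ln(m-1) that follows from the second observation above:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double hsum = 0.0;
    long k, m = 10;
    for (k = 1; k <= 999999; k++) {
        hsum += 1.0 / (double)k;      /* hsum = sum(1/j, j=1..k) */
        if (k + 1 == m) {             /* so hsum = sum(1/j, j=1..m-1) */
            printf("m=%8ld  LL=%9.5f  hsum=%9.5f  UU=%9.5f\n",
                   m, 1.0 + log((double)m / 2.0), hsum,
                   1.0 + log((double)(m - 1)));
            m = m * 10;
        }
    }
    return 0;
}

The sum stays trapped between bounds that grow like ln(m), underlining both the divergence and its slowness.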
[In the formatted version of this note there is a picture of the bounding functions and the harmonic series terms.]
The problem with carrying out the summation on a computer, whether personal or not, is that we have to instruct the computer what to do. Depending on how we instruct the computer, we can get lots of different answers. Usually we call such sets of instructions programs, but many of the key elements of the summation we want to do are buried deep inside the "built in" instructions of the machine, and some are even part of the hardware. To get some idea of the possibilities, consider the following program in BASIC:
10 PRINT "harmsum.bas -- Quirks and Quarks harmonic sum "
20 PRINT "J C Nash 1996-1-29"
30 LET C=0!
40 LET X=0!
50 LET C=C+1!
60 LET X0=X
70 LET T=1!/C
80 LET X=X0+T
90 IF (100000!*INT(C/100000!)=C) THEN PRINT C, X0
100 IF X<>X0 THEN GOTO 50
110 PRINT X0, C
120 STOP
This program uses the counter C to keep track of the term, which is computed in T as 1/C. We add this to X0, a variable to hold the current sum, and save the result in X. If X is different from X0, we continue, first saving X into X0 for the next step. Every so often, in fact every 100000 terms, we display the partial sum so we can track what is going on. This summation takes a few minutes on a modern PC. The exclamation marks in the program are there to ensure single precision arithmetic.
For the record, here is a Turbo Pascal (version 6.0) form of the program that we force (with the {$N+} directive) to use IEEE floating point arithmetic.
program hsum;
{$N+}
uses crt;
{Quirks and quarks 1996-1-27}
var
   x, t, xold: single;
   ii: longint;
begin
   x:=0.0; xold:=-1.0; ii:=0;
   while x<>xold do
   begin
      xold:=x;
      ii:=ii+1;
      t:=1.0/(1.0*ii);
      x:=xold+t;
      if ii = 100*(ii div 100) then
      begin
         gotoXY(1,10); clreol;
         write(ii,' ',xold);
      end;
   end;
   writeln('Terminated with ii=',ii);
   writeln('Sum (SINGLE) = ',xold);
end.
Here is the same program in FORTRAN
c     harmsum.for -- Quirks and Quarks harmonic sum
      implicit real (a-h, o-z)
      count=0.0
      x=0.0
 10   count=count+1.0
      xold=x
      temp=1.0/count
      x=xold+temp
      if (100000.0*aint(count/100000.0).eq.count)
     #   write(*,900) count, xold
      if (x.ne.xold) goto 10
      write(*,901) xold, count
      stop
 900  format(1h ,f20.2,2x,1pe20.12)
 901  format('0done sum = ',f20.12,' at ',f20.2)
      end
It is very easy to convert the FORTRAN program to double precision. When this was done, the program ran overnight on a fast Intel 486 based PC and was nowhere near stopping! The counter was in the tens of billions and the sum between 20 and 30 when we needed the machine for more serious work.
I decided not to bother putting the program in C, but will gladly add such programs and credit the author(s) if they are sent to me.
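For readers who would like a starting point, here is a rough C sketch of the same computation (my own, offered in the spirit of that invitation rather than as a definitive version; note that the compiler must genuinely round stored values to single precision, or the loop may not terminate as described):

/* harmsum.c -- Quirks and Quarks harmonic sum (a C sketch) */
#include <stdio.h>

int main(void)
{
    float x = 0.0f, xold = -1.0f, t;
    long c = 0;
    while (x != xold) {
        xold = x;            /* save the current sum */
        c = c + 1;
        t = 1.0f / (float)c; /* the next term, in single precision */
        x = xold + t;
        if (c % 100000L == 0)
            printf("%10ld  %f\n", c, xold);
    }
    printf("done sum = %f at %ld\n", xold, c);
    return 0;
}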
Perhaps we really should not do things this way. Why not use software that lets us put in the series directly? Such a system is DERIVE (The Soft Warehouse, Honolulu, URL=http://www.derive.com/swh.htm). The instruction (.MTH) file looks like this:
HSUM(m):=SUM(1/k,k,1,m)
LIM(HSUM(m),m,inf)
;Simp(#2)
ZETA(1)
In the formatted version of this note, we have a screen image showing that the first line defines the partial sum of the harmonic series up to m. The second line is the limit of this partial sum as m is allowed to go to infinity, and the third is the "simplification" of this limit, which DERIVE converts to the ZETA function of 1, which is undefined according to the DERIVE manual. However, a graph of ZETA(x) shows the function increasing very rapidly as x approaches 1 from above and decreasing rapidly as it approaches 1 from below.
A similar computer algebra system is Maple. In Maple V, the same sum was tried (I am not an expert with this software), giving the result of "sum(1/k, k=1..infinity)" as "infinity". [Note -- 960212: I originally used "inf" rather than "infinity" and got the result "Psi(inf + 1) + gamma". "inf" is used by DERIVE. Thanks to David Hart and Doug Meade for pointers.]
Clearly neither DERIVE nor Maple gives an answer that lets us respond to Prof. Klawe's question. Yet they come closer to telling us that the underlying harmonic series is divergent.
I do not have Mathematica easily available to me, but will gladly add Mathematica files to this note and credit the author(s) if they are sent.
The key to the termination of the harmonic series sum in our BASIC, Pascal and FORTRAN programs is the finite nature of the floating-point arithmetic that is generally used for computations on REAL numbers. I have capitalized REAL to indicate that I am talking about the numbers that are stored inside a computer. These are not always the same as the numbers mathematicians (and sometimes the rest of us) think about.
Most of us have no trouble with the idea of integers -- the whole numbers -- such as 1, 2, 3, 3456, or whatever. In a computer, however, the type INTEGER (in whatever computer language) does not allow for the "whatever" because the storage space for such numbers has a finite size. To get some idea of the issue, think of an older style adding machine or calculator where we fix the decimal point at the right hand side of the display. Suppose the display handles only 6 decimal digits. Then the largest integer we can put into this machine is
999999
Moreover, if we try to add any other (positive) integer to this, we will get an error, called "overflow".
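Binary computers have the same finiteness, though the symptom is often silent wraparound rather than an error message. A small C illustration (my own; it assumes the 16-bit unsigned short common on PCs of this era):

#include <stdio.h>

int main(void)
{
    unsigned short biggest = 65535;  /* largest value in 16 unsigned bits */
    unsigned short next;
    /* unsigned arithmetic wraps around modulo 2^16 rather than
       signalling overflow */
    next = (unsigned short)(biggest + 1);
    printf("%u + 1 stored in 16 bits gives %u\n",
           (unsigned)biggest, (unsigned)next);
    return 0;
}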
To do the harmonic series sum, we need to handle fractions. Common fractions 1/2, 1/3, etc. are not generally supported in computer programming languages (though algebra systems such as DERIVE may be quite happy to store and manipulate them). On a calculator, we find that when we divide 1 by 3 we get a decimal fraction
.333333
Of course, this leaves out a small part of the value of the common fraction. We have made a representation error. That is, in representing the common fraction 1/3 as a decimal fraction .333333 we do not have enough precision to store the exact value. In fact we can never have enough decimal digits.
In our calculator, if we only have 6 decimal digits, then we could put the decimal point in the middle, so our numbers look like
xxx.xxx
We will assume the sign of numbers is separately displayed, but in fact we do not need the sign to do our harmonic series sum.
It is clear that in this FIXED-POINT calculator, the sum will never change after we get a term that is smaller than .001, since that is the smallest representable number. However, we should not assume that the sum is then equal to the representation of the partial sum
sum( 1/k, k = 1..1000 )
since we have to worry about the details of the intermediate calculations. Also, if our calculator rounds intermediate numbers, then we will only stop when the exact value of a term is smaller than .0005, that is, at approximately term 1/.0005 = 2000. We also did not check to see if the sum by this point is too big to represent, that is, greater than 999.999. (It will not be, but the possibility of overflow cannot be ignored.)
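The behaviour of this fictitious calculator is easy to simulate. Here is a C sketch (my own) that rounds every term and every partial sum to the nearest .001, mimicking the xxx.xxx display:

#include <stdio.h>
#include <math.h>

/* round a value to the nearest 0.001, as the xxx.xxx calculator
   would store it */
static double fix3(double v)
{
    return floor(v * 1000.0 + 0.5) / 1000.0;
}

int main(void)
{
    double sum = 0.0, t;
    long k;
    for (k = 1; ; k++) {
        t = fix3(1.0 / (double)k);  /* the term as the calculator sees it */
        if (t == 0.0)
            break;                  /* term too small to represent */
        sum = fix3(sum + t);
    }
    printf("first unrepresentable term is 1/%ld; sum = %.3f\n", k, sum);
    return 0;
}

On this simulated machine the loop stops at term 1/2001, the first term that rounds to zero, in line with the estimate above.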
To overcome the problem of small and large numbers in fixed-point representations, most computing systems now use FLOATING-POINT arithmetic. This is best illustrated by the scientific notation of numbers, that is, the representation of numbers as a fractional part multiplied by a power of 10. Traditionally, the scientific notation representation of the number .00123 is 1.23 * 10^(-3), or more succinctly 1.23E-3. (We will use the succinct form from here on.) However, to get a true fraction, let us use .123E-2.
Inside the computer, we need to store the fraction, or mantissa, as well as the exponent and the sign of both the number and the exponent. The mantissa will be stored in a certain number of digits, here decimal digits. Let us use 6 for illustration. The largest fraction allowed is then .999999. The smallest is usually .100000, because we would multiply .010000 by 10 and decrease the exponent by 1. This is called NORMALIZATION of the mantissa.
The sign of the number is generally kept as a single "bit" of information, since it has only two values. The exponent and its sign are, however, usually stored together. If we allow two decimal digits for the exponent, then the biggest value this field can hold is 99 and the smallest is 00. If we add 49 to all exponents, then the largest exponent allowable is 50, and the largest representable magnitude is .999999E+50, which would be stored as .999999 99. The smallest magnitude we can represent is stored as .100000 00, which is .100000E-49.
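As a worked example of this bias-49 scheme (my own illustration): the number .123E-5 has normalized mantissa .123000 and exponent -5, so the stored exponent is -5 + 49 = 44, and the number would be kept as .123000 44.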
Note that we could actually store .000001 00 which is .000001E-49. This is a DENORMALIZED number. Some workers have proposed ways to exploit denormalized numbers in some schemes for computer arithmetic.
Let us consider what happens when we do arithmetic inside a floating-point computer, and use as our example machine the 6 digit mantissa and 2 digit exponent we have been working with. The main operation is addition, and all the numbers are positive, so signs will not be a problem. Let us add a small number to 1. First, 1 will be represented as .100000E01 or .100000 50. Let us add .123E-5 to this. First we need to get both numbers to the same scale
   .100000    E+01
   .000000123 E+01
The sum is
.100000123 E+01
which gets represented as .100000E01. This is the way in which our harmonic sum will terminate. What we have NOT specified is how the intermediate operations are carried out. "Good" arithmetic should have working storage (called registers) that are wider than the places where we store numbers so we can compute the intermediate sum precisely. We also would like the result rounded to the nearest representable number. Unfortunately, some machines use only working precision or they truncate or "chop" rather than round the results.
Nevertheless, we are beginning to see how the divergent sum will be computed as a finite value from the termination of the process of addition. A key concept is that of the MACHINE PRECISION. This is the smallest positive representable number that, when added to 1, gives a number bigger than 1. If we use the notation fl(x) to denote the stored representable number closest to x, then we want to find
eps = minimum x>0 such that fl(1+x) > fl(1)
[Note: some workers prefer
eps' = maximum x>0 such that fl(1+x) = fl(1)
which is clearly very similar to the eps defined above.]
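On a binary machine eps can be found by repeated halving. Here is a minimal C sketch (my own; the volatile variables matter, because machines with extended precision registers must be forced to round results to the stored format). On IEEE arithmetic one expects roughly 1.2E-7 in single precision and 2.2E-16 in double precision:

#include <stdio.h>

int main(void)
{
    float  feps = 1.0f;
    double deps = 1.0;
    volatile float  ftest;   /* volatile forces values through memory, */
    volatile double dtest;   /* defeating extended precision registers */
    for (;;) {
        ftest = 1.0f + feps / 2.0f;
        if (ftest == 1.0f) break;    /* eps/2 no longer changes the sum */
        feps = feps / 2.0f;
    }
    for (;;) {
        dtest = 1.0 + deps / 2.0;
        if (dtest == 1.0) break;
        deps = deps / 2.0;
    }
    printf("single precision eps is about %g\n", (double)feps);
    printf("double precision eps is about %g\n", deps);
    return 0;
}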
The summation programs in floating-point arithmetic will carry out the following operations:
setup:    x:=fl(0)
          c:=fl(0)
looptop:  c:=fl(c+1)
          x0:=x
          t:=fl( fl(1)/c )
          x:=fl(x0+t)
          if x <> x0 then goto looptop
[Some workers would use
if x > x0 ...]
Clearly, when x0 + t is the same as x0 in the floating-point form, we must have t/x0 approximately equal to eps, apart from some rounding details. This lets us approximate the results from programs rather than actually wait a long time for the actual termination.
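As a rough check (using the single precision results reported later in this note): the run stops at term c = 2097152 with the sum near 15.4, so that

t/x0 = (1/2097152)/15.4 = approximately 3.1E-8

which is close to 2^(-25), the value seps used for the single precision solution below.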
Many people who work with computer programs will say that we continue the summation until the program "converges" to the answer. Clearly this is a dangerous word when we are dealing with the harmonic series which is truly divergent. In fact, it is a bad word to use in general when talking about computer programs. We really mean that the process "terminates". There are many other ways the program terminates or stops. We could overflow a fixed-point register. We could run out of space to store an integer counter or the exponent could overflow its space.
Having worked on iterative programs for many years, I try to reserve "convergence" for the underlying sum or process or algorithm, and use "termination" for all the ways that the program may stop. When we really have the right combination of algorithm and program we can write "the process converges to .... and the program terminates accordingly." More detail is given in J C Nash (1987) Termination strategies for nonlinear parameter determination, Proceedings of the Australian Society for Operations Research Annual Conference (ed. Santosh Kumar), Melbourne, October 1987, pp. 322-334. This is republished in Nash J C and Walker-Smith M (1995) Nonlinear Parameter Estimation Examples and Extensions, Nash Information Services Inc., Ottawa.
Our search to understand the termination of the sum has so far dealt with a largely fictitious decimal computer. Actually I have owned a number of systems that perform decimal arithmetic, among them a Radio Shack RS-100 portable (of the kind used by Marc Garneau on the NASA Space Shuttle), two North Star Horizon microcomputers that had separate decimal floating point hardware, and a Sharp hand-held computer that was really a glorified calculator. Even on PCs, I still have an old but workable version of Turbo Pascal 3.01 that runs decimal arithmetic.
Anyone who has worked for any length of time with floating-point computation is painfully aware that reading computer manuals will usually give only the faintest idea of how the underlying calculations are actually done. To avoid getting hurt, numerical analysts have devised a number of tools to actually try out the arithmetic and report such information as the machine precision. Some early references are:
J C Nash (1981d) Binary Machines That Round, v.6, n.11, November 1981, pp.30,33.
J C Nash (1981c) Infinite Series: Knowing When to Stop, v.6, n.8, August 1981, pp.36,37,38.
J C Nash (1981b) Operations on Floating-point Numbers, v.6, n.6, June 1981, pp.30,33,34.
J C Nash (1981a) Determining Floating-point Properties, v.6, n.5, May 1981, pp.30,31,33,34,36.
Malcolm M A (1972) Algorithms to reveal floating-point properties, Communications of the ACM, v. 15., n. 11, pp. 949-951, November.
Cody W J Jr and Waite W (1980) Software manual for the elementary functions, Prentice-Hall, Englewood Cliffs NJ.
The tour de force, however, is the suite of programs that go under the collective name PARANOIA by Prof. W Kahan of Berkeley and described by Richard Karpinski (1985) PARANOIA: a Floating-Point Benchmark, Byte, February 1985, v. 10, n. 2, pp. 223-235. The programs and some documentation are available from the Netlib collection (ftp://ftp.netlib.org).
The single precision results for the harmonic sum are roughly the same in all languages. Here are the results using GWBASIC (a rather ancient Microsoft BASIC interpreter).
harmsum.bas -- Quirks and Quarks harmonic sum
J C Nash 1996-1-29
  100000       12.09084
  200000       12.78275
  300000       13.19532
  400000       13.48142
  500000       13.69069
  600000       13.88143
  700000       14.07126
  800000       14.16662
  900000       14.26199
 1000000       14.35736
 1100000       14.45273
 1200000       14.54809
 1300000       14.64346
 1400000       14.73883
 1500000       14.83419
 1600000       14.92956
 1700000       15.02493
 1800000       15.1203
 1900000       15.21566
 2000000       15.31103
 15.40368      2097152
The result for the counter is, in fact, the same for all programs, though the sum was reported slightly differently.
Can we predict both the sum and the term at which we will get termination? The answer is "yes, approximately".
Let us recall some quantities already defined:
hsum(m) = sum(1/k, k=1..m-1 ) = 1 + integral( 1/int(x), x=2..m )
[Note: we must use m-1 in the partial sum to correspond with the integrals for upper and lower bounds below. ]
LL(m) = 1 + integral( 1/x, x = 2..m ) = 1 + ln(m/2)
UU(m) = 1 + integral( 1/(x-1), x = 2..m ) = 1 + ln(m-1)
DERIVE was used to simplify these last integrals to an analytic form, though the integral is easy to do by hand. DERIVE can also actually solve the equation for the term counter that arises by dividing the term by the approximate harmonic sum so far and setting the result to the machine precision, that is,
(1/m) / approx_sum(m) = eps
In the DERIVE file are some hints on ways to do this sensibly. I have mainly avoided "large" errors by
1) computing the actual sum for the first ss terms to avoid the larger differences between the upper and lower bounds early in the series, that is, we shift the start of the integrals;
2) averaging the upper and lower bounds to approximate the sum, since the graph of the step function almost evenly divides the difference between the bounding functions.
"QQ9601i.mth -- attempt to compute approx. limit of sum"
U(x):=1/(x-1)
L(x):=1/x
HS(m):=SUM(1/k,k,1,m-1)
LL(m,ss):=HS(ss)+INT(L(x),x,ss,m)
UU(m,ss):=HS(ss)+INT(U(x),x,ss,m)
DUL(mm,ss):=UU(mm,ss)-LL(mm,ss)
;Simp(#7)
LN(mm-1)-LN(mm)-LN(ss-1)+LN(ss)
AA(mm,ss):=(UU(mm,ss)+LL(mm,ss))/2
;Simp(#9)
0.5*LN(mm-1)+0.5*LN(mm)-0.5*LN(ss-1)-0.5*LN(ss)+SUM(1/k,k,1,ss-1)
precision:=approximate
seps:=2^(-25)
ss:=50
1/(m*AA(m,ss))=seps
"now solve approximately, putting limits 10^5, 10^9"
;Solve(#14)
m=2.20962*10^6
deps:=2^(-53)
1/(md*AA(md,ss))=deps
" Cannot solve properly because of large nos."
We see that the solution is not easy to find. After all, we are mixing big and small numbers. Let us try the rootfinder from my book J C Nash (1979, 1990) Compact numerical methods for computers: linear algebra and function minimisation, Adam Hilger, Bristol. The Pascal code for this is available from Netlib, but without commentary. The function used was
val:=x*ln(x-1.0)+x-1/meps;
That is, the equation uses the bounding function 1 + ln(x-1) (the upper bound UU above) without any shift ss.
While I have every reason to believe it is a "good" code, it still has some trouble near the termination. Here is some output for single precision:
dr1618.pas -- One dimensional root-finding -- Quirks & Quarks
1996/01/29 09:54:15
File for input of control data ([cr] for keyboard) con
File for console image ([cr] = nul) qq9601a.r01
Bits of precision = 25
Machine precision to use =  2.9802322386E-08
Enter lower bound for search  1.0000000000E+05
Enter upper bound for search  1.0000000000E+09
Enter a tolerance for root search interval width  0.0000000000E+00
Enter the number of intervals for grid search (0 for none)  10
alg16.pas -- one-dimensional grid search
In gridsrch lbound= 1.0000000000E+05  ubound= 1.0000000000E+09
lb  f( 1.0000000000E+05)=-3.2303140456E+07
    f( 1.0009000000E+08)= 1.9103515431E+09  *** sign change ***
    f( 2.0008000000E+08)= 3.9909002741E+09
    f( 3.0007000000E+08)= 6.1237398355E+09
    f( 4.0006000000E+08)= 8.2905440320E+09
    f( 5.0005000000E+08)= 1.0482606404E+10
    f( 6.0004000000E+08)= 1.2694798194E+10
    f( 7.0003000000E+08)= 1.4923730190E+10
    f( 8.0002000000E+08)= 1.7166993398E+10
    f( 9.0001000000E+08)= 1.9422786535E+10
    f( 1.0000000000E+09)= 2.1689711404E+10
Minimum so far is f( 1.0000000000E+05)=-3.2303140456E+07
Sign change observed last in interval [ 1.0000000000E+05, 1.0009000000E+08]
Now try rootfinder
alg18.pas -- root of a function of one variable
f(100000.00000)=-3.2303140456E+07
f(100090000.00000)= 1.9103515431E+09
Bisect   3 evalns: f( 5.009500000E+07)= 9.047E+08  width interval= 9.999E+07
False P  4 evalns: f( 1.823581869E+06)=-5.442E+06  width interval= 4.999E+07
False P  5 evalns: f( 2.112186539E+06)=-6.820E+05  width interval= 4.827E+07
False P  6 evalns: f( 2.148329743E+06)=-8.302E+04  width interval= 4.798E+07
False P  7 evalns: f( 2.152729368E+06)=-1.007E+04  width interval= 4.795E+07
Bisect   8 evalns: f( 2.612386468E+07)= 4.387E+08  width interval= 4.794E+07
False P  9 evalns: f( 2.153279681E+06)=-9.466E+02  width interval= 2.397E+07
False P 10 evalns: f( 2.153331402E+06)=-8.896E+01  width interval= 2.397E+07
False P 11 evalns: f( 2.153336262E+06)=-8.360E+00  width interval= 2.397E+07
False P 12 evalns: f( 2.153336719E+06)=-7.857E-01  width interval= 2.397E+07
Bisect  13 evalns: f( 1.413860070E+07)= 2.134E+08  width interval= 2.397E+07
False P 14 evalns: f( 2.153336763E+06)=-5.383E-02  width interval= 1.199E+07
False P 15 evalns: f( 2.153336766E+06)=-3.632E-03  width interval= 1.199E+07
False P 16 evalns: f( 2.153336767E+06)=-3.052E-04  width interval= 1.199E+07
False P 17 evalns: f( 2.153336767E+06)= 0.000E+00  width interval= 1.199E+07
Bisect  18 evalns: f( 8.145968734E+06)= 1.042E+08  width interval= 1.199E+07
False P 19 evalns: f( 8.145968734E+06)= 1.042E+08  width interval= 0.000E+00
Converged to f( 8.1459687340E+06)= 1.0421861095E+08
Final interval width =  0.0000000000E+00
Here we have indeed "converged" to a counter a little bigger than 8 million. However, note that around the 16th and 17th evaluations the function value was essentially zero near 2153337, which compares favourably with our observed termination at 2097152. [The rootfinder averages upper and lower bounds for the root. The scaling here is so bad that this strategy is not helpful.]
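For readers without the book's codes, plain bisection in double precision copes with the bad scaling well enough for this purpose. Here is a minimal C sketch (my own) using the same function:

#include <stdio.h>
#include <math.h>

/* the function used above: x*ln(x-1) + x - 1/meps */
static double f(double x, double meps)
{
    return x * log(x - 1.0) + x - 1.0 / meps;
}

int main(void)
{
    double meps = pow(2.0, -25.0);  /* single precision value used above */
    double lo = 1.0e5, hi = 1.0e9, mid = 0.0;
    int i;
    /* f(lo) < 0 < f(hi), so simply halve the interval 60 times */
    for (i = 0; i < 60; i++) {
        mid = 0.5 * (lo + hi);
        if (f(mid, meps) < 0.0)
            lo = mid;
        else
            hi = mid;
    }
    printf("root near m = %.0f\n", mid);
    return 0;
}

This reproduces the root near 2.1533E+6 seen in the output above.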
In double precision, a similar calculation (53 bits precision) gives a count of md = 2.633341738E+14. With a number this large DERIVE complains that it may lose precision in its approximation of the logarithm. Using an HP15C calculator, where I have reason to believe the function routines are quite well implemented, we get
LL(md) = 1 + ln(md/2) = 33.51129778
UU(md) = 1 + ln(md-1) = 34.20444496
To estimate the time the double precision computation would need on our PC, let us use a timing program (Nash J C and Nash M M (1994) Scientific computing with PCs, Nash Information Services, Ottawa) to see how long the single precision computation took.
TIMER (C) J.C.Nash 1991 -- to execute and time programs in DOS environments
about to execute: HARMSUM.EXE with parameters:
           100000.00    0.000000000000E+00
           200000.00    0.000000000000E+00
           300000.00    0.000000000000E+00
           400000.00    0.000000000000E+00
           500000.00    0.000000000000E+00
           600000.00    0.000000000000E+00
           700000.00    0.000000000000E+00
           800000.00    0.000000000000E+00
           900000.00    0.000000000000E+00
          1000000.00    0.000000000000E+00
          1100000.00    0.000000000000E+00
          1200000.00    0.000000000000E+00
          1300000.00    0.000000000000E+00
          1400000.00    0.000000000000E+00
          1500000.00    0.000000000000E+00
          1600000.00    0.000000000000E+00
          1700000.00    0.000000000000E+00
          1800000.00    0.000000000000E+00
          1900000.00    0.000000000000E+00
          2000000.00    0.000000000000E+00
done sum =     15.403682700000 at          2097152.00
TIMER (C) J.C.Nash 1991 == start at 14: 8:53.72
DosExitCode =0
TIMER (C) J.C.Nash 1991 == started 14: 8:53.72
TIMER (C) J.C.Nash 1991 == finish at 14: 9: 7.23
Elapsed seconds = 13.51
-- end of TIMER run --
We took 13.5 seconds to do about 2 million steps of the sum on an Intel 486 DX2 (66 MHz) computer using the Lahey F77L Fortran compiler (version 5.01). Let us approximate this as 6 seconds per million operations. We need to do about 2.6E+14 operations, so to get our sum to terminate, we will need to wait
2.6E+14 * 6 s / 1E+6 = 1.56E+9 s = 433,333 h = 18056 days
or almost 50 years!
This material is Copyright (C) John C Nash, 1996.