Users guide for STABLE 4.0.xx (for Windows, 12 July 2006)
John P. Nolan
Math/Stat Department
American University
Washington, DC 20016 USA
e-mail: jpnolan@american.edu
voice: 202.885.3140
The program calculates the density (pdf), cumulative distribution
function (cdf), and quantiles for a general stable distribution.
These routines are based on the formulas in "Numerical calculation
of stable densities and distribution functions", J. P. Nolan,
Commun. Statist.-Stochastic Models, 13(4), 759-774 (1997).
Also included is a version of Chambers, Mallows and Stuck's
algorithm to generate stable random variates.
It also performs maximum likelihood estimation of
stable parameters and some exploratory data analysis techniques
for assessing the fit of a data set. This work is described in
the paper "Maximum likelihood estimation of stable parameters",
J. P. Nolan, in the book Levy Processes, Ed. by Barndorff-Nielsen,
Mikosch and Resnick, Birkhauser, 2001 (currently on my webpage).
Version 3.03 incorporates all previous changes, in particular the
significantly increased accuracy from version 2.11 of the
calculations. Computations are now accurate for 0.2 <= alpha <= 2.0.
(Below alpha = 0.2, the granularitiy of double precision
numbers becomes a problem.) Version 3.03 includes numerous
internal changes, the main visible ones are:
- improved output formatting
- 95% confidence intervals are given for maximum likelihood
estimates of stable parameters
- improved error handling/reporting. The emphasis is on
detecting and reporting errors, not error correction or
detailed explanations. If something doesn't make sense,
please contact me by e-mail.
- numerical computation of g(x)=-f'(x)/f(x), the score
function for the location parameter, also called the "nonlinear"
function in signal processing.
Included in the program is code to quickly approximate stable
densities. This routine is much faster than the
regular density calculations: approximately 1 million density
evaluations/second can be performed on a 1 GHz Pentium. This allows
maximum likelihood estimation of all 4 stable parameters in less than
a second for data sets having thousands of points.
Error detection and handling has been improved in the core routines,
but the driver routine does not do careful checking - you can
still crash the program with bad input.
Note that STABLE uses exact formulas whenever they are available,
e.g. the Gaussian (alpha=2), Cauchy (alpha=1, beta=0) and Levy
(alpha=1/2,beta=1 or -1) cases. This version of STABLE also has a
loop in it to slow it down and discourage commercial use of this
free software.
Parameter meanings
------------------
Stable distributions are described by 4 parameters, which the
program calls:
alpha = index of stability, in the range (0,2]
beta = skewness, in the range [-1,1]
gamma = scale, in the range (0,infinity)
delta = location, in the range (-infinity,+infinity)
In previous versions of this program, gamma was called sigma and
delta was called mu. Non-mathematical users were invariably
confused between these parameters and the mean and standard
deviation (the standard deviation exists only when alpha=2,
in which case it is NOT gamma, the mean exists only when
alpha >1, in which case it will agree with delta only in
the 1-parameterization, see below). Giving them different
names may minimize such problems.
There are different meanings for these parameters, depending on
what notation you are using. The program STABLE has 4 parameterizations
built into the program:
0. "S0" parameterization: based on the (M) representation
of Zolotarev for an alpha stable distribution with skewness
beta. Unlike the Zolotarev (M) parameterization, gamma and delta
are straightforward scale and shift parameters. This
representation is continuous in all 4 parameters, and gives an
intuitive meaning to gamma and delta that is lacking in other
parameterizations.
1. "S" or "S1" parameterization: the parameterization used by Samorodnitsky
and Taqqu in the book Stable Non-Gaussian Random Processes.
It is a slight modification of Zolotarev's (A) parameterization.
The S0 and S1 parameterization agree in the symmetric case (beta=0),
but differ by a shift in the nonsymmetric case.
2. "S*" or "S2" parameterization: a modification of the S0 parameterization
which is defined so that
- the scale gamma agrees with the Gaussian scale (standard dev.)
when alpha=2 and the Cauchy scale when alpha=1.
- the mode is exactly at delta.
3. "S3" parameterization: an internal parameterization.
The scale is the same as the S2 parameterization, the shift is
-beta*g(alpha), where g(alpha) is defined in the paper just below.
The paper "Parameterizations and modes of stable distributions",
Statistics and Probability Letters, 38, 187-195 (1998) describes
the S0, S1 and S2 parameterizations.
All computations are done in the S0 parameterization, which is better
suited to numerical calculations and modeling data than the
standard representations. Unless you have a specific reason to
use a different parameterization, I suggest that you use the S0
one. Conversions are done if you request a different parameterization.
What do you need?
-----------------
STABLE runs on a PC type computer under Windows 95/98/NT/2000/XP.
The program is written in Digital/Compaq Visual Fortran.
To run the program, you need the file STABLE.EXE
While you can run the program from a floppy disk, you will get faster
results if you copy the program to a hard disk and execute from there.
Start the program from "Run" on the "Start" menu by using "Browse" to
locate STABLE.EXE in the appropriate directory. Or from a DOS prompt,
cd to the directory where the program is and type STABLE. You will
be asked to select what you want to do: calculate densities, cumulative
distribution functions, etc. After that, you will be prompted for
various values, e.g. range of parameters alpha, beta, gamma and delta,
and the range of x values. Most prompts are straightforward.
All output is written to the file STABLE.OUT in the current directory.
The program allows you to specify a range of parameter values, say
vary alpha from 0.5 to 1.5 through steps of 0.25, beta through
-1 to 1 through steps of 0.5 and x from -5 to 5 through steps of
0.1.
When graphing output, the program does not pause after each graph
(unlike earlier versions of STABLE). When all graphs are presented,
you will hear a beep. Press the enter key once, and you will return
to the program prompt. The graphics were developed on a SVGA monitor
and may not work on other hardware. The graphing routines are not
intended to do high quality graphics, only to visualize densities
and distribution functions quickly. If you want better quality
graphics, write the output to a file and run your preferred graphics
program on that data file. The vertical scaling on the graph is done
automatically in a way that guarantees there will be no overflow,
but may not show your results clearly if you are far from the mode.
SAMPLE RUNS
-----------
1. Computing stable densities.
The following shows how to compute the density for a stable
distribution with alpha=1.2 and beta ranging through -1, -0.5, 0,
0.5, and 1 and the output is graphed on the screen.
Select
1 for pdf calculations
2 for cdf calculations
3 for quantile calculations
4 for generating stable random variates
5 for setting internal tolerances for the calculations
6 for comparison of quick pdf and pdf calculations
7 for fitting a sample with stable parameters
8 for computing -f'/f
Enter choice: 1
Enter parameterization: 0 for S0, 1 for S, 2 for S* : 0
Enter first alpha, last alpha and alpha stepsize: 1.2 1.2 0
Enter first beta, last beta and beta stepsize: -1 1 0.5
Enter first x, last x and x stepsize: -5 5 0.1
Enter gamma and delta: 1 0
Output destination (0=graph,1=screen,2=write to file): 0
(graphs appear on the screen, pausing for you to press the enter
key at the end.)
The program loops back to the first question. Enter end-of-file
(CTRL-Z on a PC) to stop gracefully.
Note that you can choose a sequence of alpha and beta values, e.g.
Enter first alpha, last alpha and alpha stepsize: 1.2 1.6 .2
will use alpha=1.2, alpha=1.4 and alpha=1.6. If you only want a
single value, do as in the example above.
If you want to produce a good graph for inclusion in a paper,
you will have to write the output to a file (choice 2 to the last
question/prompt), and read that output into a program that produces
graphs - Excel, R, matlab, Mathematica, etc.
2. Computing stable cumulative d.f. is similar to 1.
3. Computing stable quantiles is similar to 1.
Quantile computations are slow because a numerical search
for each quantile is performed by repeated evaluation of the cdf.
4. Generating stable random variables
The first prompt asks for the parameterization you want to use.
Then you are prompted for
n = # of values to generate
iseed = starting seed for the random number generator. A value of
-1 uses a random seed, any other value will generate a sequence
that can be regenerated later by specifying the same seed
alpha = index of stability
beta = skewness
gamma = scale parameter (meaning depends on parameterization)
delta = location parameter (meaning depends on parameterization)
For example, entering
1000 -1 1.4 0 1 0
will generate 1000 standardized symmetric alpha=1.4 stable variates.
Values are generated using the Chambers, Mallows and Stuck algorithm
(corrected) and transformed to the desired parameterization.
As always, the output goes to the file STABLE.OUT, where it can
be edited for other use (e.g. to test parameter estimation).
5. Setting internal tolerances.
Experienced users may want to change tolerances used by STABLE.
This choice shows the internal values used by STABLE and allows you to
change them. You can speed up the programs by altering these
values, but you will generally lose accuracy. The program starts with
tolerance values that attempt to get close to full double precision
accuracy in the numerical evaluations of the integrals involved in
the program. This means calculations will be slower, but highly accurate.
The exact meanings of each tolerance are given in the technical
documentation.
The main use of this feature was to speed up likelihood calculations.
Now that there is an interpolation routine to compute densities
and likelihoods quickly, there is little reason for an average user
to change these values.
WARNING: the tolerances control only the accuracy of the integral
evaluation in the program. In certain cases of cdf evaluation,
the value of the computed integral is added or subtracted from
a constant and subtractive cancelation can lower the accuracy
of the final result. See Theorem 1 in the "Numerical calculation
of stable..." paper cited above for when this occurs.
6. Comparing pdf and quick pdf calculations
This option allows you to compare the density values computed by
the integral representation to the one computed by the
interpolation. At the cost of a large array of coefficients,
the interpolation is highly accurate over most values of the
parameter space. This part of the program lets you compare the
two calculations to detect any problems in the interpolation
scheme. Note that the interpolation program is limited to
alpha > 0.4. Near the boundary of the parameter space
(beta +1/-1, alpha very close to 2, x on the tail of a
highly skewed distribution), the interpolation can
be inaccurate. See note 6 at bottom.
7. Fitting a sample with a stable distribution
You will first be asked to supply the file name of the input file.
(If the file is not in the working directory, you will need to
specify its complete filename, including directory.)
The program reads one value per line from the input file in
plain ascii text format. Because of the way Visual
Fortran reads free form input, the last data value should be
followed by a carriage return. Without this return,
STABLE will miss the last data value. NOTE: the data
should be one column of values, containing only the numeric
values you want to analyze. Coded files from Excel, SAS, etc.
will not work - use only plain ascii.
All output is displayed on both the screen and the output
file, so you do not have to copy results from the screen.
Some descriptive statistics will be displayed and you will
be prompted for what kind of fit you want. The options are
maximum likelihood fit, quantile fit of McCulloch, sample
characteristic method of Koutrovelis-Kogon-Williams, user
supplied parameters, and normal fit. We will focus on the
maximum likelihood fit, the others are straightforward.
After choosing maximum likelihood fit, you will be asked
if you want to take all defaults and do a fast search.
Normally, you would type "Y" and the parameters are
estimated for you automatically. If you do not type
"Y", a series of questions are asked. First, you are asked what
method you want to use. Normally you should choose option
3: approximate gradient search with QKSPDF. This is the
fastest method and simulation studies (technical report in
preparation) show it is highly accurate.
If you end up with a fit that is near a boundary of the parameter
space, you may want to recompute the estimates using the slower
direct search or possibly even the direct search with
(slow) pdf calculations. See note 6 at bottom for more
information.
After selecting the method, the quantile estimates are computed
to use as an initial estimate of the parameters. You will then
be prompted for the bounds of the four parameters (alpha, beta,
gamma and delta) in the search. General limits are given in
parenthesis at each prompt. The safest thing is to use these
bounds, but if you know a priori that you want to restrict the
bounds, you may do so. If you do restrict the range and the
resulting parameter estimates are near the boundary you specify,
you should rerun your search with wider bounds to see if the new
search goes outside your original bounds. If you want to restrict
some parameter to be a specified value, say beta=0 because you
believe you have a symmetric stable, you cannot just enter upper
and lower bounds of 0. The search routine will not work correctly
with any parameter interval having zero width. You can trick the
search routine by using distinct values for the bounds like -.001
and +0.001, which will make little practical difference.
Note: if alpha < 0.4, the quick pdf routine will not work and the
(regular/slow) pdf routine becomes unreliable.
The program will then numerically maximize the likelihood using the
method you selected. For default (method 3), this takes a few seconds
for a data set of size 1,000. (Method 1 takes several times as long,
methods 2 and 3 may take hours.) The resulting parameter values
are then displayed on the screen in all 3 parameterizations.
The program prints out the half width of 95% confidence intervals,
e. g. the term printed is the amount to add and subtract from
each parameter estimate to obtain the confidence interval.
Then you will be asked if you want to run some exploratory data
analysis to assess the stable fit. Answer "Y" for yes and you
will be prompted for values. Again, results are written on the
output file STABLE.OUT for later reference. The diagnostics
are:
- standard and variance-stabilized (thinned) PP plots
- standard (thinned) QQ plots with pointwise confidence bounds
- smoothed data density vs. fitted density
Thinning the PP and QQ plots uses fewer plot points than data points
if the data set is large. The program allows a maximum of 1000 plot
points. PP plots take a few seconds, QQ plots can take a while
because of the way stable quantiles are computed (see 3. above). Very
large data sets will also take a while to compute smoothed data
densities.
8. Computing g(x) = -f'(x)/f(x). This function is called "the
nonlinear function" in signal processing. It has an explicit form
for the Gaussian case (g(x)=x), Cauchy case (g(x)=2x/(1+x^2)
and in the Levy case (alpha=1/2,beta=1, g(x)= (1-3x)exp(-1/x)/(2x^5)),
where the standardized densities are used (gamma=1,delta=0) and
the S1 parameterization is used in the Levy case. In all other
cases, the function is computed numerically.
The program will prompt you for a range for alpha, a range for beta,
and then a grid of x-values. It assumes scale gamma=1 and location delta=0,
and uses the S0 parameterization. It actually computes g
numerically in all cases, so the three cases mentioned above give
a check on the numerical accuracy of the calculations.
STABLEC.EXE
This is a console version of STABLE that has no graphics in it. It also
runs under Windows 95, 98, 2000. It reads from stdin and writes
prompts and input error messages to stdout, with normal program output
going to the file stable.out. The easiest way to run it is to put
your commands/choices in an ascii file, say test.dat. Then run
STABLEC from a DOS prompt with input coming from that file:
stablec < test.dat
Note that the program reads exactly as STABLE does. In particular,
if STABLE expects input on separate lines, then you must enter the
values onto separate lines in your input file. If you are confused,
run STABLE once and record exactly what values go on what input lines.
If you want to hide the prompts, redirect stdout also:
stablec < test.dat > test.junk
Note that input error messages are written to stdout, so if you
redirect it as above, you will have to look at the file test.junk
to see error messages.
BUGS AND LIMITATIONS:
1. When 0 < |alpha-1| < 0.005, the program has numerical problems
evaluating the pdf and cdf. The current version of the program sets
alpha=1 in these cases. This approximation is not bad in the S0
parameterization.
2. When alpha=1 and |beta| < 0.005, the program has numerical
problems. The current version sets beta=0.
3. When alpha < 0.2, the program does not work well. The
densities and cdf vary very quickly in this case. For example, when alpha=0.1
and beta=1, the density is zero for x < tan(pi*0.1/2)=-0.15838; climbs
abruptly to reach a maximum of approximately 1.0E6 at around
x=-0.158285, and then drops very quickly to very small values.
4. The current driver program allows 10001 x-values, and 20 values of
alpha and beta. If you need to, break your range into pieces, run
the program on each subrange, write output to several files, and then
concatenate your files.
5. The user interface is simple and little error checking is
done. Typing errors may lead to the program exiting. Misspecified
ranges may lead to infinite loops.
6. Problems near boundaries of the parameter space. While alpha
can be any positive number, small values of alpha specify distributions
that are very unusual. For alpha < 0.4, you may have to tinker with
some of the internal tolerances to get reliable results. For alpha < .1,
you cannot get reliable results from the current code. Also, when
beta is near 1 or -1, there are some problems. The approximation
to the density is inaccurate for |beta| > 0.99. The current version of
the program interpolates for 0.99 < |beta| <= 1. A side effect of this
is that the maximum likelihood fitting routines will always yield a
value of alpha and beta at a grid point. Specifically,
alpha will be in the set {0.4,0.5,0.6,0.7,0.8,0.9,1,1.1,1.2,1.3,1.4,
1.5,1.6,1.7,1.8,1.9,1.95,2} and beta will be in {0.99,1} (or {-1,-0.99}).
If you run into this problem, you should rerun your search with the
(much slower) method using PDF, not the approximation QKSPDF.
7. On the tails, there are sometimes problems. For example, when
alpha=.9, beta=1, gamma=1, delta=0, x=2, the pdf calculations fail
to give accurate results. While positive, the value of the density
at that point is less than 1.0e-14, which is around machine precision
for double precision. Both the pdf and cdf calculations can be
unreliable in these cases. To the best of my knowledge, this only
happens when the result is less than 1.0e-14.
8. The program works on standardized z=(x-delta)/gamma in the S0
parameterization. When |z-beta*tan(pi*alpha/2)| is small, the
computations of the density and cumulative have numerical problems.
(See the formulas for computations in the paper referenced above
for details.) The program works around this by setting
z = beta*tan(pi*alpha/2) when
|z-beta*tan(pi*alpha/2)| < tol(5)*alpha**(1/alpha). (The bound
on the right is ad hoc, to get reasonable behavior when alpha
is small). You can set tol(5) to any number by changing tolerances
as described above, though you may get unreliable results or
program errors.
................
Please report problems and bugs to me at the address given at the
top of this document.