Linear Algebra is an incredibly powerful tool. A joke among mathematicians is that the only way we can solve a hard problem is to boil it down to Linear Algebra1. Harnessing the numerical power of Linear Algebra has been done via LAPACK for the last 40+ years.
In my first semester of graduate school, I took UC Berkeley's Math 221, i.e.
Numerical Linear Algebra. So in my second semester, I was ready to put my
new skills to use. However, when I tried to use LAPACK from C and C++ code for
an assignment, I gave up before getting anything working. I was frustrated
by my deadline, by the LAPACK documentation (I didn't know where to look
or how to read what I did find), by actually forming the arguments correctly
and by getting the compiler flags right. I just gave up and turned to my trusty
friend Python; I used a combination of NumPy and the low-level
scipy.linalg.lapack package when necessary.
I wished then that there was a blog post exactly like this one. Our goal here
is to call a single LAPACK routine (dgetrf) from a few different contexts
to get a feel for the library structure, argument specification, reading
documentation and linking / building code.
Contents
- LU Factorization
- High-Level Python
- Finding LAPACK
- Low-Level Python
- C
- Compiling on OS X
- C++
- Conclusion
LU Factorization
As mentioned, we'll focus on the dgetrf method for performing
LU Factorization. The names themselves can be confusing to newcomers.
Luckily the Wikipedia page does a great job describing what is happening:
- d: the matrix entries are of type- double(i.e. a 64-bit IEEE-754 floating point number)
- ge: the matrix is in general form, as opposed to triangular, banded, symmetric, or many other types of matrix where information need not be repeated
- trf: triangular factorization
We'll focus on factoring a particular matrix :
The method actually allows row pivoting, so in this case we should have where the pivot matrix is just the identity matrix.
To actually call this method, we'll need to study the documentation.
I learned software in the web development world, so I was used to "modern"
documentation like readthedocs.org. When I first saw the Doxygen-based
LAPACK docs, I was intimidated. It's a bit of a different style and many
URLs are not human readable (for example, the URL for dgetrf).
In addition, the arguments often have short or abbreviated names that can be
hard to parse and each argument has an intent of in, out or inout.
The concept of an argument's intent is a Fortran feature and at first I
didn't understand the purpose of it.
Let's break down the arguments to dgetrf:
Constant Inputs:
- Mis an- integer; the number of rows in the matrix- A.
- Nis an- integer; the number of columns in the matrix- A.
- LDAis an- integer; the leading dimension (LD) of- A. This would initially seem to be redundant since the number of rows- Mshould be the leading dimension. However, by allowing- LDA(the stride) to be different from- M,- Acan be taken as a submatrix of a larger array without having to copy the data.
Outputs:
- IPIVis a vector of- integervalues of dimension- min(M, N); it describes the row pivots used and is a more compact representation of the matrix- P.
- INFOis an- integer; this returns- 0if the method succeeded, uses negative return values to indicate invalid inputs (e.g. a matrix can't have- N = -2rows) and positive return values to indicate if a division by zero occurred during Gaussian elimination.
Mutated Inputs (i.e. intent inout):
- Ais a 2D array of- doubleprecision values of dimension- LDA x N; on input this is the matrix that is being factored. When the routine exits, the lower triangle of- Awill contain the below diagonal elements of- L(the diagonal of- Lis already known to be all ones) and the upper triangle of- Awill contain- U.
Since the original LAPACK implementation (and many current implementations) are written in Fortran, all arguments are passed by reference.
High-Level Python
There are two functions in SciPy for performing an LU decomposition:
scipy.linalg.lapack.dgetrf and scipy.linalg.lu. The former is a "direct"
wrapper but the wrapper handles allocating the outputs and allows a flag
overwrite_a which will determine if a copy of the input matrix is used
when computing the factorization. The other function scipy.linalg.lu varies
slightly, but not enough to warrant a separate discussion here.
Now, we'll call dgetrf with our sample matrix. To avoid any overhead
from copies we make sure A is in Fortran order ("F") and set the
overwrite_a flag to True:
>>> import numpy as np
>>> import scipy.linalg.lapack
>>> A = np.array([
...     [4., 4., -3.],
...     [0., 4., -1.],
...     [1., 1.,  1.],
... ], order="F")
>>> lu_mat, ipiv, info = scipy.linalg.lapack.dgetrf(A, overwrite_a=True)
The method succeeded and the pivots are "trivial", i.e. the identity matrix2
>>> info
0
>>> ipiv
array([0, 1, 2], dtype=int32)
Our returned lu_mat is really just our input A, which has been mutated
to contain L and U in the lower and upper triangles:
>>> lu_mat is A
True
>>> A
array([[ 4.  ,  4.  , -3.  ],
       [ 0.  ,  4.  , -1.  ],
       [ 0.25,  0.  ,  1.75]])
Finding LAPACK
This section can be freely skipped. TL;DR: the
liblapackshared library contains the routinedgetrf_. In other words, the name in the symbol table has an underscore appended to it.
Before moving on to C, C++ and other low-level languages we need to
understand the exported interface of LAPACK. To do this, we'll get
a little help from the ctypes library:
>>> import ctypes.util
>>> ctypes.util.find_library("lapack")
'liblapack.so.3'
Viewing the source in the Python standard library, we see that (on
"most" posix systems) this is essentially equivalent to a direct usage
of ldconfig:
$ LC_ALL=C LANG=C ldconfig -p | grep 'liblapack\.'
    liblapack.so.3 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/liblapack.so.3
    liblapack.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/liblapack.so
If that fails, including -llapack when compiling an empty file
with gcc is used:
$ LC_ALL=C LANG=C gcc -Wl,-t -o /dev/null -llapack 2> /dev/null | grep lapack
-llapack (/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/liblapack.so)
$ objdump -p -j .dynamic \
>   /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/liblapack.so \
>   2> /dev/null | grep SONAME
  SONAME               liblapack.so.3
If that also fails, ld (possibly extended by
the LD_LIBRARY_PATH environment variable) is used:
$ ld -t -llapack 2> /dev/null
ld: mode elf_x86_64
-llapack (//usr/lib/x86_64-linux-gnu/liblapack.so)
$ objdump -p -j .dynamic \
>   //usr/lib/x86_64-linux-gnu/liblapack.so \
>   2> /dev/null | grep SONAME
  SONAME               liblapack.so.3
Once we've found the shared library, we need to look in the symbol table
(-T for table) for dgetrf:
$ objdump -T /usr/lib/x86_64-linux-gnu/liblapack.so | grep -i dgetrf
00000000001273d0 g    DF .text  0000000000000158  Base        clapack_dgetrf
000000000002c520 g    DF .text  0000000000000025  Base        ATL_dgetrf
0000000000036630 g    DF .text  00000000000000d4  Base        atl_f77wrap_dgetrf_
000000000002c550 g    DF .text  0000000000000778  Base        ATL_dgetrfC
000000000002ccd0 g    DF .text  00000000000002e0  Base        ATL_dgetrfR
0000000000215590 g    DF .text  00000000000000b3  Base        dgetrf_
0000000000215650 g    DF .text  0000000000000490  Base        dgetrf2_
We see that there are quite a few dgetrf routines, but the one we'll use
is dgetrf_. The dgetrf2_ routine is a recursive version of dgetrf_
and the other routines are implementation details (specifically CLAPACK
and ATLAS).
This symbol name was a bit confusing to me when using LAPACK, but I took it on faith. The reason for it is historical:
The world's first Fortran 77 compiler ... appended an underscore on Fortran external names. Thus, Fortran
SUBROUTINE foois known asfoo_()in C
This convention has remained even as compilers have changed. For example CLAPACK (i.e. an implementation of LAPACK in C) follows this convention even though no Fortran compiler is used at all:
f2chas added this underscore to all the names in CLAPACK. So, a call that in Fortran would look like:call dgetrf(...)becomes in C:dgetrf_(...);
Now that we understand the shared library, let's actually figure out where it came from. Following the symbolic links we see
/usr/lib/x86_64-linux-gnu/liblapack.so.3
  -> /etc/alternatives/liblapack.so.3-x86_64-linux-gnu
    -> /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3
      -> /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
Once the true location is given, we can find3 the package that owns the shared library
$ dpkg-query -S /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
libatlas3-base:amd64: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
$ # Find ATLAS reverse dependencies of libatlas3-base package
$ apt rdepends libatlas3-base 2> /dev/null | grep Depends | grep atlas
  Depends: libatlas-base-dev (= 3.10.3-5)
Low-Level Python (ctypes)
Now that we know which shared library to use, we can dynamically load
the library into Python and verify that dgetrf_ is in the symbol table:
>>> import ctypes
>>> liblapack = ctypes.cdll.LoadLibrary("liblapack.so.3")
>>> liblapack.dgetrf_
<_FuncPtr object at 0x7f3fd33ba688>
We can also verify that the dgetrf symbol is not present (and the
traceback will show us the actual path to the shared library):
>>> liblapack.dgetrf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.7/ctypes/__init__.py", line 369, in __getattr__
    func = self.__getitem__(name)
  File ".../lib/python3.7/ctypes/__init__.py", line 374, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: dgetrf
To actually call this routine, we first specify the input and output types:
>>> dgetrf = liblapack.dgetrf_
>>> int_ptr = ctypes.POINTER(ctypes.c_int)
>>> double_ptr = ctypes.POINTER(ctypes.c_double)
>>> dgetrf.argtypes = [int_ptr, int_ptr, double_ptr, int_ptr, int_ptr, int_ptr]
>>> dgetrf.restype = None
To call it, we use ctypes and NumPy to allocate all of our arguments:
>>> import numpy as np
>>> M = ctypes.c_int(3)
>>> N = ctypes.c_int(3)
>>> A = np.array([
...     [4., 4., -3.],
...     [0., 4., -1.],
...     [1., 1.,  1.],
... ], order="F")
>>> LDA = ctypes.c_int(3)
>>> IPIV = np.empty(3, dtype=np.intc)
>>> INFO = ctypes.c_int()
>>> # Check Uninitialized
>>> IPIV
array([47155696,        0,        2], dtype=int32)
>>> INFO
c_int(50221552)
Since LAPACK follows the pass by reference convention, we pass in a pointer to each of these arguments
>>> return_value = dgetrf(
...     ctypes.pointer(M),
...     ctypes.pointer(N),
...     A.ctypes.data_as(double_ptr),
...     ctypes.pointer(LDA),
...     IPIV.ctypes.data_as(int_ptr),
...     ctypes.pointer(INFO),
... )
and the results match those from scipy.linalg.lapack.dgetrf:
>>> return_value is None
True
>>> INFO
c_int(0)
>>> IPIV
array([1, 2, 3], dtype=int32)
>>> A
array([[ 4.  ,  4.  , -3.  ],
       [ 0.  ,  4.  , -1.  ],
       [ 0.25,  0.  ,  1.75]])
C
Unfortunately, LAPACK doesn't come with a standard header file, so
the dgetrf_ symbol must be declared via extern:
extern void dgetrf_(const int* M, const int* N, double* A, const int* LDA,
    int* IPIV, int* INFO);
Notice that all arguments are pointers. We've marked the input only arguments
as const to help the C compiler out. We've also used int (rather than
long) for the integer arguments.
When calling dgetrf_ in the C example, we take
care to lay out A in Fortran order:
int M = 3;
int N = 3;
double A[9] = { 4, 0, 1, 4, 4, 1, -3, -1, 1 };
int LDA = 3;
int IPIV[3];
int INFO;
Actually building code that uses liblapack confused me. On a Debian-based
Linux system, I could install LAPACK via
$ [sudo] apt install libatlas-base-dev
and be on my merry way with the -llapack flag:
$ gcc -o main dgetrf_example.c -llapack
$ ./main
Inputs:
  M = 3
  N = 3
  A = [ 4.00,  4.00, -3.00]
      [ 0.00,  4.00, -1.00]
      [ 1.00,  1.00,  1.00]
  LDA = 3
Outputs:
  A = [ 4.00,  4.00, -3.00]
      [ 0.00,  4.00, -1.00]
      [ 0.25,  0.00,  1.75]
  IPIV = [1, 2, 3]
  INFO = 0
Compiling on OS X
TL;DR: As of macOS High Sierra (10.13), your code can link to LAPACK via
-llapackor-framework Accelerate. Also, when I started graduate school I didn't know much about compiling code.
I was using OS X Mountain Lion (10.8) at the time. Without
using Homebrew to install LAPACK or OpenBLAS, compiling4 with -llapack failed with something
along the lines of:
$ clang -o main dgetrf_example.c -llapack
ld: library not found for -llapack
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Just leaving out the flag entirely was also not an option:
$ clang -o main dgetrf_example.c
Undefined symbols for architecture x86_64:
  "_dgetrf_", referenced from:
      _main in dgetrf_example-fcd533.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I would eventually find out that Apple provides a custom framework
called vecLib that contained a LAPACK implementation / interface among other
things. So the correct magical incantation (as of OS X 10.8) was
$ clang -o main dgetrf_example.c -framework vecLib
At some point5,
things got better (thanks Apple). On a current6 version, liblapack comes with the OS:
>>> import ctypes.util
>>> ctypes.util.find_library("lapack")
'/usr/lib/liblapack.dylib'
Following symlinks, we see that this shared library is part of the
Accelerate framework:
>>> import os
>>> os.path.realpath("/usr/lib/liblapack.dylib")
'/System/Library/Frameworks/Accelerate.framework/.../A/libLAPACK.dylib'
Now vecLib is contained in Accelerate and can no longer be
included directly:
$ readlink /System/Library/Frameworks/vecLib.framework
Accelerate.framework//Versions/A/Frameworks/vecLib.framework
$ clang -o main dgetrf_example.c -framework vecLib
ld: cannot link directly with /System/Library/Frameworks//vecLib.framework/vecLib for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
However, this means that LAPACK can be included the "Linux" way or via a framework:
$ clang -o main dgetrf_example.c -llapack
$ clang -o main dgetrf_example.c -framework Accelerate
C++
TL;DR: Use
extern "C"instead of justexternwhen declaring LAPACK routines in C++ code.
When I started graduate school I didn't know anything about shared libraries, symbol tables, name mangling or ABIs. It also didn't even occur to me that extra underscore in names of LAPACK routines in the shared library had anything to do with those topics.
Luckily, after an hour or two7
of Googling I learned about extern "C" and everything "made sense". Once
you know about extern "C", declaring the LAPACK routine in C++ is essentially
the same as in C:
extern "C" void dgetrf_(const int* M, const int* N, double* A, const int* LDA,
    int* IPIV, int* INFO);
From there, dgetrf_ can be used as it would in C. But I'd like to explain
a bit further why we say extern "C" rather than extern. To explain,
consider the simple program that is valid in C or C++:
int together(int m, int n) {
    return m + n;
}
If we compile this into an object file in C, we see the name together
in the symbol table
$ gcc -c together.c
$ nm together.o
0000000000000000 T together
but if we compile the same code in C++, the name is mangled (to
_Z8togetherii) before it is put in the symbol table:
$ g++ -c together.cpp
$ nm together.o
0000000000000000 T _Z8togetherii
This name mangling allows many features of C++. For example, we could
provide a second implementation using the same name but with type
double:
double together(double m, double n) {
    return m + n;
}
and the two functions can both be in the symbol table without "colliding" on their shared name:
$ g++ -c together.cpp
$ nm together.o
0000000000000014 T _Z8togetherdd
0000000000000000 T _Z8togetherii
For an example in the wild consider the libjsoncpp package (chosen at
random since it is installed on my current system). The valueToString method
in the Json namespace has six different implementations:
$ objdump    -T /usr/lib/x86_64-linux-gnu/libjsoncpp.so.1.7.4 | grep valueToString
0000000000020c30 ... _ZN4Json13valueToStringB5cxx11Eb
0000000000020be0 ... _ZN4Json13valueToStringB5cxx11Ed
0000000000020b60 ... _ZN4Json13valueToStringB5cxx11Ei
0000000000020ba0 ... _ZN4Json13valueToStringB5cxx11Ej
00000000000208c0 ... _ZN4Json13valueToStringB5cxx11Ex
0000000000020a80 ... _ZN4Json13valueToStringB5cxx11Ey
$ objdump -C -T /usr/lib/x86_64-linux-gnu/libjsoncpp.so.1.7.4 | grep valueToString
0000000000020c30 ... Json::valueToString[abi:cxx11](bool)
0000000000020be0 ... Json::valueToString[abi:cxx11](double)
0000000000020b60 ... Json::valueToString[abi:cxx11](int)
0000000000020ba0 ... Json::valueToString[abi:cxx11](unsigned int)
00000000000208c0 ... Json::valueToString[abi:cxx11](long long)
0000000000020a80 ... Json::valueToString[abi:cxx11](unsigned long long)
Why did I just tell you all this? To make it clear why we say extern "C"
instead of just extern. This indicates to the compiler that the symbol
we want to refer to is literally dgetrf_. By using extern, that would
tell the compiler to look for a mangled external name. For example with g++,
it would look for the name _Z7dgetrf_PKiS0_PdS0_PiS2_.
Conclusion
I hope this post has been informational. Now you should have a better idea how to
- Read LAPACK documentation to interpret and write the correct signature for a LAPACK routine
- Reference the correct symbol when calling a LAPACK routine as a foreign function
- Understand argument intent and pass by reference calling convention for LAPACK's Fortran-like interface
- Use Python's ctypesmodule (and NumPy'sctypes.data_asmethod) to invoke LAPACK routines for fast prototyping and debugging
Thanks for reading!
- This actually happened to me. My undergraduate research project (REU) eventually boiled down to showing that a particularly special matrix had a nonzero determinant. ↩
- Though SciPy has returned 0-based pivots rather than the 1-based indices returned from LAPACK) ↩
- On a Debian Linux. For example, I am running Ubuntu 18.04 ↩
- Note that gccis just an alias forclangon OS X ↩
- It seems it was in OS X Yosemite (10.10). ↩
- macOS High Sierra (10.13) ↩
- I'm not joking, I was really stumped ↩