How to overcome the 2GB filesize limit under Linux


Latest update:
If you have glibc-2.2.3 and have tried to apply the gcc-3.0.3.LFS-glibc2.2.3.diff.gz patch without success, please read the following update from Salnikov.


Update:
It has been pointed out to us that the upcoming g77-3.1 will be LFS capable directly. See http://gcc.gnu.org/onlinedocs/g77_news.html for the latest news on the g77 compiler.


Disclaimer:
The Dalton team has not tested this fix. We give absolutely no warranty of its correctness. We will not be held responsible in any way for any consequences arising from the application and use of this procedure.

Description submitted by Georgy Salnikov

The following are excerpts from my notes on how to make a Fortran program LFS capable, taken from my email to Dr. Mike Schmidt, the US-GAMESS maintainer. The patches themselves are in the attachments. Some later comments have also been added.

I have carried out the same procedure with both Dalton-1.1 and Dalton-1.2.1, and both versions were made LFS capable.

In short, you need the following:

[The patches are located at the end of this document] NOTE: Apply only one patch, selected according to your versions of gcc and glibc.

gcc has to be recompiled with special flags; see the descriptions below.

I have recently found that gcc-3.0.3 has problems with loop unrolling: do not use -funroll-loops or -funroll-all-loops when optimizing with this gcc.

Changes to Dalton

For Dalton, some extra steps are needed when recompiling it. Right after running the Dalton configure script, edit Makefile.config and make the following changes:

The general LFS instructions

In order to work with large (>2GB) files under Linux, any program needs the following:
  1. The Linux kernel 2.4.x. I tested Linux-2.4.5 and 2.4.6 only, but expect that any 2.4.x version should be OK. LFS patches for 2.2.x kernels may exist, but I do not know whether they do or where to find them.
    020202: I have recently tested LFS support under Linux-2.4.17 as well, with no problems.
  2. The C system library glibc 2.x. I have used glibc-2.1.3. This version works fine with low-level I/O but still has several quirks in stdio (how to work around them is described below). Newer glibc versions may be better still, but I have not tested them. I have no idea whether glibc versions earlier than 2.1.3 can work. LFS patches for older libc may also exist, but I know nothing about them.
    020202: I have recently experimented with LFS under glibc-2.2.3. This version is really much better than 2.1.3. I found none of the problems that led me to write the so-called 'real' gcc patch; every function used in libI77 worked fine with the so-called 'theor' patch.
In principle, having the proper glibc and kernel is sufficient for C programs to work with LFS. It does not even necessarily require modifying existing non-LFS code. Before recompiling the C modules the programmer must make sure that:

For Fortran programs, however, LFS support in libc is not sufficient; support in the Fortran runtime (libg2c) is additionally required. My patches in the attachments deal with this. There are two patches. First I created gcc-3.0.LFS-theor.diff.gz. 'theor' means that this patch should theoretically work, but it did not, because glibc-2.1.3 was found to have several problems related to LFS. However, this patch may work with newer glibc versions. As I was not able to switch quickly to a newer glibc (and was not sure whether it would help), I made another patch, gcc-3.0.LFS-real.diff.gz, which works around some glibc bugs so that the Fortran LFS support does work.

So, if you have a glibc newer than 2.1.3, you can first try the 'theor' patch and check whether it is sufficient. Otherwise, apply the 'real' patch, which has been tested and definitely works.
020202: I have recently verified that the 'theor' patch works fine under glibc-2.2.3.

To apply the patch, go into the unpacked gcc-3.0 source directory and run

zcat gcc-3.0.LFS-real.diff.gz | patch -p1
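
Since only one patch should be applied, it can be useful to check that it applies cleanly before touching the source tree. The following is a minimal sketch, assuming GNU patch (whose --dry-run option reports what would happen without changing any file). The function name apply_lfs_patch is hypothetical and not part of the original instructions.

```sh
# apply_lfs_patch PATCH_GZ SRC_DIR   (hypothetical helper, assumes GNU patch)
# PATCH_GZ: gzip-compressed diff, SRC_DIR: unpacked source tree.
apply_lfs_patch() {
    patch_gz=$1
    src_dir=$2
    # First pass changes nothing; it only checks that every hunk would apply.
    if zcat "$patch_gz" | (cd "$src_dir" && patch -p1 --dry-run) >/dev/null; then
        # Second pass really applies the patch.
        zcat "$patch_gz" | (cd "$src_dir" && patch -p1)
    else
        echo "patch does not apply cleanly to $src_dir; nothing was changed" >&2
        return 1
    fi
}

# Example call, run from the directory that holds both the patch and the tree:
#   apply_lfs_patch gcc-3.0.LFS-real.diff.gz gcc-3.0
```

If the dry run fails, the tree is left untouched, so the other patch can still be tried on the same sources.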

Then gcc has to be compiled with a procedure slightly different from the one described in the INSTALL docs. I did the following.

CFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64" \
CXXFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64" LDFLAGS=-s \
../gcc-3.0/configure --prefix=/usr/local/LFS --disable-shared \
--enable-threads --disable-libgcj

Here --prefix=/usr/local/LFS is used in order not to mix the LFS capable compiler with a possible standard compiler (used to recompile kernels or other important system components). I am not sure whether the two are fully compatible with each other, so this switch may be important.

--disable-shared is used so that Fortran programs compiled with this LFS capable compiler get a statically linked libg2c runtime; otherwise they may pick up the standard non-LFS runtime from the wrong shared library. This switch may therefore also be important.

--enable-threads is a standard switch necessary for any Linux based on glibc-2.x.

--disable-libgcj is used because gcc-3.0 under Linux cannot compile libgcj anyway.

I am not sure whether -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 inside C[XX]FLAGS are necessary. I used them and it worked, but I have not tested without them.

LDFLAGS is not required but does no harm.

After configuring I bootstrapped the compiler with the command

make CFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64" \
LIBCFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64" \
CXXFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -fno-implicit-templates" \
LIBCXXFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -fno-implicit-templates" \
BOOT_CFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64" \
LDFLAGS=-s bootstrap

[A user reported that he could only get this to work by specifying: CFLAGS=... make bootstrap]

Of the various flags above, the switches -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 inside BOOT_CFLAGS are absolutely necessary; otherwise you will get the usual non-LFS libg2c. I can say nothing about [LIB]C[XX]FLAGS except that I simply used them and it worked; I have not tried without them. LDFLAGS is purely cosmetic.

Then run make install with the same flags as for make bootstrap.
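
Because make bootstrap and make install have to receive identical flags, one way to avoid retyping them is to keep them in shell variables. The following is only a sketch: the variable names LFS_CFLAGS and LFS_CXXFLAGS and the wrapper lfs_make are hypothetical, while the flag values are the ones used in the bootstrap command above.

```sh
# Flag values copied from the bootstrap command; the names are hypothetical.
LFS_CFLAGS="-O2 -march=i686 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64"
LFS_CXXFLAGS="$LFS_CFLAGS -fno-implicit-templates"

# Runs make with the LFS flags, followed by whatever targets are given.
lfs_make() {
    make CFLAGS="$LFS_CFLAGS" \
         LIBCFLAGS="$LFS_CFLAGS" \
         CXXFLAGS="$LFS_CXXFLAGS" \
         LIBCXXFLAGS="$LFS_CXXFLAGS" \
         BOOT_CFLAGS="$LFS_CFLAGS" \
         LDFLAGS=-s "$@"
}

# In the gcc build directory one would then run:
#   lfs_make bootstrap && lfs_make install
```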

Fortran programs are then compiled with the newly installed compiler, of course using its explicit pathname, /usr/local/LFS/bin/g77, so that you can be sure that the proper compiler is used. No modification of the Fortran program and no specific compiler switches are necessary. To make sure that you have really got LFS compatible code, you can run nm on your program binary and check that it references fopen64, fseeko64, fgetpos64 and ftruncate64, and none of fopen, fseek, ftell, fgetpos or ftruncate without the '64' suffix.
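
The nm check can be automated along the following lines. This is a sketch under some assumptions: the function name check_lfs_symbols is hypothetical, the symbol name is taken to be the last field of each nm output line, and a glibc version suffix such as @@GLIBC_2.1 is stripped before comparing. Only the five non-64 names listed above are looked for.

```sh
# Reads nm output on stdin and reports any of the non-LFS names
# fopen, fseek, ftell, fgetpos, ftruncate.
# Exit status 0 means that none of them was seen.
check_lfs_symbols() {
    awk '
        {
            sym = $NF              # symbol name is the last field on the line
            sub(/@.*/, "", sym)    # drop a version suffix, if any
            if (sym ~ /^(fopen|fseek|ftell|fgetpos|ftruncate)$/) {
                print "non-LFS symbol: " sym
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (the binary name is a placeholder):
#   nm ./your_program | check_lfs_symbols
```

An empty result with exit status 0 only says that the five names are absent; looking at the full nm output is still the way to confirm that the '64' variants are really referenced.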

Linking the Fortran and C modules together into the complete program may or may not require the -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 switches for the C compiler, depending on which file operations are done in the C code.

This implementation of the Fortran LFS support should allow SEQUENTIAL files of practically unlimited size, while the size limit of DIRECT files should be raised from 2 gigabytes to 2 gigarecords (the size in bytes depending on the RECL parameter used when opening the file). That is, the RECL and REC parameters are still 32 bits long, but the arithmetic on them in libg2c is now done in 64-bit off_t, allowing their product to exceed 2 giga.
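
To illustrate the arithmetic, the sketch below computes a record's byte offset and the resulting size limit for a DIRECT file. It assumes that RECL is given in bytes, that records are numbered from 1 so that record REC starts at byte (REC - 1) * RECL, and that the shell evaluates $(( )) in 64 bits (as bash does on common Linux systems). The function names are hypothetical.

```sh
# Byte offset at which direct-access record REC starts, for record length RECL.
direct_offset() {
    recl=$1
    rec=$2
    echo $(( (rec - 1) * recl ))
}

# Largest DIRECT file in bytes if REC may go up to 2^31 - 1 records.
max_direct_size() {
    recl=$1
    echo $(( 2147483647 * recl ))
}

direct_offset 4096 1000000   # prints 4095995904, beyond the 32-bit limit 2147483647
max_direct_size 4096         # prints 8796093018112
```

With 32-bit arithmetic the first product would already overflow, which is why both operands being 32 bits long is not a problem only as long as the multiplication itself is carried out in 64 bits.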

Several problems related to the LFS capable GAMESS usage (may or may not be relevant to Dalton)

  1. While debugging the LFS-GAMESS code I found that the functions ftello64 and freopen64 from glibc-2.1.3 do not work as expected. I replaced them with fgetpos64 and fclose;fopen64 respectively and produced the 'real' patch, which did work. This problem may already be solved in newer glibc versions; I do not know.
    020202: Everything works fine under glibc-2.2.3.
  2. After experimenting with GAMESS, I found that the ls and rm commands fail on files >2GB. Recompiling their binaries did not help, because none of the functions stat64, lstat64 and fstat64 from glibc-2.1.3 worked. For ls, the only real solution was to live without ls -lF $SCR/$JOB.* (files shorter than 2GB still appear in the listing, while the large files do not). As for rm, the ability to delete large files remains rather important. For this a small program has to be written (an example, remove.c, is in the attachment) and put, say, into the root of the gamess directory, and the rungms script has to be extended with the command sh -c "$GMSPATH/remove $SCR/$JOB.F*" for the master node (csh does not work, as it tries to do a stat, which fails) and rsh $host -l $USER -n "$GMSPATH/remove $SCR/$JOB.F*" for the other nodes. The temporary files will then be deleted after the GAMESS job finishes (they can also be deleted manually with this remove program).
    020202: Everything works fine under glibc-2.2.3.
  3. While the LFS capable GAMESS runs fine from a shell which does not set any limits with the ulimit command, the system does not allow it to create large files if it is started from the NQS queueing system (qsub). I expect that this could also be caused by incomplete support for LFS limits in glibc-2.1.3 and may be better in newer glibc versions. I also expect similar problems if the user's shell sets some ulimits by default; the user may then be unable to run GAMESS even from the command line.
Therefore, with glibc-2.1.3 one additionally has to:
020202: The 64-bit limits, and hence running programs with qsub, work fine under glibc-2.2.3.

If one runs the LFS capable GAMESS under some kernel without LFS support (a 2.2.x kernel), then GAMESS still works, but cannot create large files.

That's all for now...

With best wishes,
Georgy Salnikov


The patches

Update: If you have glibc 2.2.3, you should use one of the following patches (see Update for details):


Last updated: Jul 22nd 2002, by Trygve Helgaker