--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA
Linux Release Notes
Version 2.1
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

On some Linux releases, because of a GRUB bug in the handling of upper
memory and a default vmalloc area that is too small on 32-bit systems,
it may be necessary to pass the following settings at boot time:

  kernel parameter:  vmalloc=256MB
  GRUB command:      uppermem 524288   (value in kilobytes, i.e. 512 MB)

Example grub.conf entry:

title Red Hat Desktop (2.6.9-42.ELsmp)
root (hd0,0)
uppermem 524288
kernel /vmlinuz-2.6.9-42.ELsmp ro root=LABEL=/1 rhgb quiet vmalloc=256MB pci=nommconf
initrd /initrd-2.6.9-42.ELsmp.img
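
After rebooting, one way to confirm that the running kernel actually
received the vmalloc setting is to read /proc/cmdline, which holds the
command line the kernel was booted with (uppermem is a GRUB command, not
a kernel parameter, so it does not appear there). The sketch below
assumes simple space-separated name=value parameters without quoting;
get_boot_param is a hypothetical helper name, not part of CUDA or GRUB.

```shell
#!/bin/sh

# get_boot_param NAME [CMDLINE]
# Hypothetical helper: prints the value of kernel parameter NAME (the text
# after "NAME=") found in CMDLINE, or in /proc/cmdline if CMDLINE is not
# given. Exits with status 1 if NAME=... is not present.
# Assumes space-separated tokens with no quoting.
get_boot_param() (
    name=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    set -f                      # no filename globbing while splitting tokens
    for tok in $cmdline; do
        case $tok in
            "$name"=*)
                printf '%s\n' "${tok#"$name"=}"
                exit 0          # leaves only the function's subshell
                ;;
        esac
    done
    exit 1
)

# Check the running kernel.
if value=$(get_boot_param vmalloc); then
    echo "kernel booted with vmalloc=$value"
else
    echo "vmalloc is not set on the kernel command line" >&2
fi
```

Running the function in a subshell keeps "set -f" and its variables from
leaking into the calling script.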

--------------------------------------------------------------------------------
New Features
--------------------------------------------------------------------------------

  Hardware Support
  o  See http://www.nvidia.com/object/cuda_learn_products.html

  Platform Support
  o  Additional OS support
     - Red Hat Enterprise Linux 4.7
     - Red Hat Enterprise Linux 5.2
     - SUSE Linux 11.0
     - Fedora 9
     - Ubuntu 8.04
  o  Eliminated OS support
     - SUSE Linux 10.2
     - Ubuntu 7.04

  API Features
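  o  PTX JIT API
     - cuModuleLoadDataEx (loads a module from a memory image, with JIT
       compilation options)
  o  New device attribute query
     - CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT (whether kernels on the
       device are subject to a run time limit)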

--------------------------------------------------------------------------------
Major Bug Fixes
--------------------------------------------------------------------------------

  o OpenGL interoperability will now only copy shared buffers through
    host memory when CUDA and OpenGL are running on different GPUs.

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

o GPU enumeration order on multi-GPU systems is non-deterministic and
  may change with this or future releases. Applications should not
  assume a fixed device ordering; instead, enumerate all CUDA-capable
  GPUs in the system and select the most appropriate one(s) to use.

o Individual GPU kernel launches are limited to a run time of less
  than 5 seconds on a GPU with a display attached. Exceeding this
  limit causes a launch failure, reported through the CUDA driver
  API or the CUDA runtime. GPUs without a display attached are not
  subject to the 5-second limit, so it is recommended that CUDA
  applications run on a GPU that is NOT attached to an X display.
  The CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT attribute reports
  whether a given device is subject to this limit.

o In order to run CUDA applications, the NVIDIA kernel module must be
  loaded and the device files in /dev must exist. This can be done
  either by starting the X Window System or by running a script at
  boot that loads the kernel module and creates the entries.

  An example script (to be run at boot time):

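  #!/bin/bash

  # Load the NVIDIA kernel module; stop if it cannot be loaded.
  modprobe nvidia || exit 1

  # Count the number of NVIDIA controllers found.
  N3D=$(/sbin/lspci | grep -i NVIDIA | grep -c "3D controller")
  NVGA=$(/sbin/lspci | grep -i NVIDIA | grep -c "VGA compatible controller")
  N=$((N3D + NVGA))

  # Create one node per GPU: major 195, minor 0 .. N-1.
  # Existing nodes are left alone so the script can be rerun safely.
  for ((i = 0; i < N; i++)); do
    [ -e /dev/nvidia$i ] || mknod -m 666 /dev/nvidia$i c 195 $i
  done

  # Create the control device: major 195, minor 255.
  [ -e /dev/nvidiactl ] || mknod -m 666 /dev/nvidiactl c 195 255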
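
  The numbering the script relies on (one /dev/nvidiaN node per GPU with
  major number 195 and minor number N, plus /dev/nvidiactl at minor 255)
  can be previewed without root privileges by printing the mknod
  commands instead of running them. The dry-run sketch below is
  hypothetical: count_nvidia_gpus and print_nvidia_mknods are
  illustrative names, and the counting function reads lspci-style text
  on standard input so it can be checked against saved output.

```shell
#!/bin/sh

# count_nvidia_gpus: reads lspci-style lines on stdin and prints how many
# NVIDIA "3D controller" or "VGA compatible controller" entries they contain
# (the same devices the boot script counts).
count_nvidia_gpus() {
    grep -i NVIDIA | grep -c -e "3D controller" -e "VGA compatible controller"
}

# print_nvidia_mknods N: prints, without executing, the mknod commands the
# boot script would run for N GPUs.
print_nvidia_mknods() (
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
)
```

  On a real system, print_nvidia_mknods "$(/sbin/lspci | count_nvidia_gpus)"
  shows what the boot script would create on that machine.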

o When compiling with GCC, special care must be taken for structs and
  unions that contain 64-bit integers. On 32-bit x86, GCC aligns long
  long members to a 4-byte boundary by default, while NVCC aligns them
  to an 8-byte boundary. Therefore, when GCC compiles a file that
  defines such a struct or union, pass the -malign-double option to
  GCC so that both compilers agree on the layout. When compiling
  through NVCC, this option is passed to GCC automatically.

o Cross-compilation with the --machine option is not supported.

o The default compilation mode for host code is now C++. To restore the
  old behavior (C), pass the option --host-compilation=c to nvcc.

o For maximum performance when the same data is accessed with different
  access sizes, coalesce adjacent loads and stores into single wider
  accesses where possible rather than using a union or individual byte
  accesses. Accessing the data through a union may cause the compiler to
  reserve extra memory for the object, and accessing it as individual
  bytes may result in non-coalesced accesses. This will be improved in a
  future compiler release.


--------------------------------------------------------------------------------
Open64 Sources
--------------------------------------------------------------------------------

The Open64 source files are distributed under the terms of the GPL.
Current and previously released versions are available via anonymous
FTP from download.nvidia.com, in the CUDAOpen64 directory.


--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

  11/2008 - Version 2.1 Beta
  06/2008 - Version 2.0
  11/2007 - Version 1.1
  06/2007 - Version 1.0
  06/2007 - Version 0.9
  02/2007 - Version 0.8 - Initial public Beta


--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

  For more information and help with CUDA, please visit
  http://www.nvidia.com/cuda
