Man Pages



System Calls

accept

NAME

accept - accept a connection on a socket  

SYNOPSIS

#include <sys/types.h>
#include <sys/socket.h>

int accept(int s, struct sockaddr *addr, socklen_t *addrlen);  

DESCRIPTION

The accept function is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET and SOCK_RDM). It extracts the first connection request on the queue of pending connections, creates a new connected socket with mostly the same properties as s, and allocates a new file descriptor for the socket, which is returned. The newly created socket is no longer in the listening state. The original socket s is unaffected by this call. Note that any per file descriptor flags (everything that can be set with the F_SETFL fcntl, like non blocking or async state) are not inherited across an accept.

The argument s is a socket that has been created with socket(2), bound to a local address with bind(2), and is listening for connections after a listen(2).

The argument addr is a pointer to a sockaddr structure. This structure is filled in with the address of the connecting entity, as known to the communications layer. The exact format of the address passed in the addr parameter is determined by the socket's family (see socket(2) and the respective protocol man pages). The addrlen argument is a value-result parameter: it should initially contain the size of the structure pointed to by addr; on return it will contain the actual length (in bytes) of the address returned. When addr is NULL nothing is filled in.

If no pending connections are present on the queue, and the socket is not marked as non-blocking, accept blocks the caller until a connection is present. If the socket is marked non-blocking and no pending connections are present on the queue, accept returns EAGAIN.

In order to be notified of incoming connections on a socket, you can use select(2) or poll(2). A readable event will be delivered when a new connection is attempted and you may then call accept to get a socket for that connection. Alternatively, you can set the socket to deliver SIGIO when activity occurs on a socket; see socket(7) for details.

For certain protocols which require an explicit confirmation, such as DECNet, accept can be thought of as merely dequeuing the next connection request and not implying confirmation. Confirmation can be implied by a normal read or write on the new file descriptor, and rejection can be implied by closing the new socket. Currently only DECNet has these semantics on Linux.  

NOTES

There may not always be a connection waiting after a SIGIO is delivered or select(2) or poll(2) return a readability event because the connection might have been removed by an asynchronous network error or another thread before accept is called. If this happens then the call will block waiting for the next connection to arrive. To ensure that accept never blocks, the passed socket s needs to have the O_NONBLOCK flag set (see socket(7)).  

RETURN VALUE

The call returns -1 on error. If it succeeds, it returns a non-negative integer that is a descriptor for the accepted socket.  

ERROR HANDLING

Linux accept passes already-pending network errors on the new socket as an error code from accept. This behaviour differs from other BSD socket implementations. For reliable operation the application should detect the network errors defined for the protocol after accept and treat them like EAGAIN by retrying. In case of TCP/IP these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.  
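The retry advice above can be sketched as a small wrapper. This is a minimal illustration, not a reference implementation: the names is_pending_network_error and accept_retry, and the max_tries bound, are assumptions introduced here. The bound exists because EOPNOTSUPP can also mean the socket is of the wrong type, in which case retrying forever would never succeed. Retrying on EINTR is an additional choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if err is one of the TCP/IP errors the text says to treat
 * like EAGAIN after accept, 0 otherwise. */
int is_pending_network_error(int err)
{
    switch (err) {
    case ENETDOWN:
    case EPROTO:
    case ENOPROTOOPT:
    case EHOSTDOWN:
    case ENONET:
    case EHOSTUNREACH:
    case EOPNOTSUPP:
    case ENETUNREACH:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: call accept up to max_tries times, retrying only
 * when the failure is EINTR or one of the errors listed above.
 * Returns the new descriptor, or -1 with errno set by the last accept. */
int accept_retry(int s, struct sockaddr *addr, socklen_t *addrlen,
                 int max_tries)
{
    assert(max_tries > 0);
    socklen_t initial_len = (addrlen != NULL) ? *addrlen : 0;

    for (int i = 0; i < max_tries; i++) {
        /* addrlen is value-result, so restore it before every call. */
        if (addrlen != NULL)
            *addrlen = initial_len;

        int fd = accept(s, addr, addrlen);
        if (fd >= 0)
            return fd;
        if (errno != EINTR && !is_pending_network_error(errno))
            return -1;          /* a real error such as EBADF or EMFILE */
    }
    return -1;                  /* gave up; errno holds the last error */
}
```

EAGAIN itself is deliberately not retried here: on a non-blocking socket the caller would normally go back to select(2) or poll(2) rather than spin.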

ERRORS

accept shall fail if:
EAGAIN or EWOULDBLOCK
The socket is marked non-blocking and no connections are present to be accepted.
EBADF
The descriptor is invalid.
ENOTSOCK
The descriptor references a file, not a socket.
EOPNOTSUPP
The referenced socket is not of type SOCK_STREAM.
EINTR
The system call was interrupted by a signal that was caught before a valid connection arrived.
ECONNABORTED
A connection has been aborted.
EINVAL
Socket is not listening for connections.
EMFILE
The per-process limit of open file descriptors has been reached.
ENFILE
The system maximum for file descriptors has been reached.

accept may fail if:

EFAULT
The addr parameter is not in a writable part of the user address space.
ENOBUFS, ENOMEM
Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory.
EPROTO
Protocol error.

Linux accept may fail if:

EPERM
Firewall rules forbid connection.

In addition, network errors for the new socket and as defined for the protocol may be returned. Various Linux kernels can return other errors such as ENOSR, ESOCKTNOSUPPORT, EPROTONOSUPPORT, ETIMEDOUT. The value ERESTARTSYS may be seen during a trace.  

CONFORMING TO

SVr4, 4.4BSD (the accept function first appeared in BSD 4.2). The BSD man page documents five possible error returns (EBADF, ENOTSOCK, EOPNOTSUPP, EWOULDBLOCK, EFAULT). SUSv3 documents errors EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE, ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK. In addition, SUSv2 documents EFAULT and ENOSR.

Linux accept does _not_ inherit socket flags like O_NONBLOCK. This behaviour differs from other BSD socket implementations. Portable programs should not rely on this behaviour and always set all required flags on the socket returned from accept.  
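Setting the flags explicitly on the accepted socket might look like the sketch below. The helper names set_nonblocking and accept_nonblocking are assumptions made for illustration; only fcntl, accept and close are real calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Add O_NONBLOCK to the file status flags of fd.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_nonblocking(int fd)
{
    assert(fd >= 0);            /* a negative descriptor is a caller bug */

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (flags & O_NONBLOCK)
        return 0;               /* already set, nothing to do */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Accept one connection and make the new socket non-blocking,
 * whatever flags the listening socket s carries. */
int accept_nonblocking(int s, struct sockaddr *addr, socklen_t *addrlen)
{
    int fd = accept(s, addr, addrlen);
    if (fd == -1)
        return -1;

    if (set_nonblocking(fd) == -1) {
        int saved = errno;      /* keep the fcntl error across close */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Because the flag is set unconditionally after accept, the same code behaves identically on systems that do inherit O_NONBLOCK and on those that do not.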

NOTE

The third argument of accept was originally declared as an `int *` (and is that under libc4 and libc5 and on many other systems like BSD 4.*, SunOS 4, SGI); a POSIX 1003.1g draft standard wanted to change it into a `size_t *`, and that is what it is for SunOS 5. Later POSIX drafts have `socklen_t *`, and so do the Single Unix Specification and glibc2. Quoting Linus Torvalds: _Any_ sane library _must_ have "socklen_t" be the same size as int. Anything else breaks any BSD socket layer stuff. POSIX initially _did_ make it a size_t, and I (and hopefully others, but obviously not too many) complained to them very loudly indeed. Making it a size_t is completely broken, exactly because size_t very seldom is the same size as "int" on 64-bit architectures, for example. And it _has_ to be the same size as "int" because that's what the BSD socket interface is. Anyway, the POSIX people eventually got a clue, and created "socklen_t". They shouldn't have touched it in the first place, but once they did they felt it had to have a named type for some unfathomable reason (probably somebody didn't like losing face over having done the original stupid thing, so they silently just renamed their blunder).  

SEE ALSO

bind(2), connect(2), listen(2), select(2), socket(2)

acct

NAME

acct - switch process accounting on or off  

SYNOPSIS

#include <unistd.h>

int acct(const char *filename);

DESCRIPTION

When called with the name of an existing file as argument, accounting is turned on, and a record for each terminating process is appended to filename as the process terminates. An argument of NULL causes accounting to be turned off.  

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EACCES
Write permission is denied for the specified file.
EACCES
The argument filename is not a regular file.
EFAULT
filename points outside your accessible address space.
EIO
Error writing to the file filename.
EISDIR
filename is a directory.
ELOOP
Too many symbolic links were encountered in resolving filename.
ENAMETOOLONG
filename was too long.
ENOENT
The specified filename does not exist.
ENOMEM
Out of memory.
ENOSYS
BSD process accounting was not enabled when the operating system kernel was compiled. The kernel configuration parameter controlling this feature is CONFIG_BSD_PROCESS_ACCT.
ENOTDIR
A component used as a directory in filename is not in fact a directory.
EPERM
The calling process has no permission to enable process accounting.
EROFS
filename refers to a file on a read-only file system.
EUSERS
There are no more free file structures or we ran out of memory.
 

CONFORMING TO

SVr4 (but not POSIX). SVr4 documents an EBUSY error condition, but no EISDIR or ENOSYS. Also AIX and HPUX document EBUSY (attempt is made to enable accounting when it is already enabled), as does Solaris (attempt is made to enable accounting using the same file that is currently being used).  

NOTES

No accounting is produced for programs running when a crash occurs. In particular, nonterminating processes are never accounted for.

afs_syscall

NAME

afs_syscall, break, ftime, getpmsg, gtty, lock, mpx, prof, profil, putpmsg, security, stty, ulimit - unimplemented system calls  

SYNOPSIS

Unimplemented system calls.  

DESCRIPTION

These system calls are not implemented in the Linux 2.4 kernel.  

RETURN VALUE

These system calls always return -1 and set errno to ENOSYS.  

NOTES

Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.

Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.

Some system calls, like ipc(2) and {create,init,delete}_module(2) only exist when the Linux kernel was built with support for them.

 

SEE ALSO

obsolete(2)

alloc_hugepages

NAME

alloc_hugepages, free_hugepages - allocate or free huge pages  

SYNOPSIS

void *alloc_hugepages(int key, void *addr, size_t len, int prot, int flag);

int free_hugepages(void *addr);  

DESCRIPTION

The system calls alloc_hugepages and free_hugepages were introduced in Linux 2.5.36 and removed again in 2.5.54. They existed only on i386 and ia64 (when built with CONFIG_HUGETLB_PAGE). In Linux 2.4.20 the syscall numbers exist, but the calls return ENOSYS.

On i386 the memory management hardware knows about ordinary pages (4 KiB) and huge pages (2 or 4 MiB). Similarly ia64 knows about huge pages of several sizes. These system calls serve to map huge pages into the process's memory or to free them again. Huge pages are locked into memory, and are not swapped.

The key parameter is an identifier. When zero the pages are private, and not inherited by children. When positive the pages are shared with other applications using the same key, and inherited by child processes.

The addr parameter of free_hugepages() tells which page is being freed: it was the return value of a call to alloc_hugepages(). (The memory is actually freed only when all users have released it.) The addr parameter of alloc_hugepages() is a hint that the kernel may or may not follow. Addresses must be properly aligned.

The len parameter is the length of the required segment. It must be a multiple of the huge page size.

The prot parameter specifies the memory protection of the segment. It is one of PROT_READ, PROT_WRITE, PROT_EXEC.

The flag parameter is ignored, unless key is positive. In that case, if flag is IPC_CREAT, then a new huge page segment is created when none with the given key existed. If this flag is not set, then ENOENT is returned when no segment with the given key exists.

RETURN VALUE

On success, alloc_hugepages returns the allocated virtual address, and free_hugepages returns zero. On error, -1 is returned, and errno is set appropriately.  

ERRORS

ENOSYS
The system call is not supported on this kernel.
 

CONFORMING TO

These calls existed only in Linux 2.5.36 - 2.5.54. These calls are specific to Linux on Intel processors, and should not be used in programs intended to be portable. Indeed, the system call numbers are marked for reuse, so programs using these may do something random on a future kernel.  

FILES

/proc/sys/vm/nr_hugepages Number of configured hugetlb pages. This can be read and written.

/proc/meminfo Gives info on the number of configured hugetlb pages and on their size in the three variables HugePages_Total, HugePages_Free, Hugepagesize.  
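As an illustration of reading the three /proc/meminfo variables named above, here is a minimal sketch. The struct hugepage_info and the function read_hugepage_info are hypothetical names; the parser assumes the usual "Name: value" line layout and takes an already opened stream so that it can be pointed at /proc/meminfo or at any file with the same layout.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the three values named in the text. */
struct hugepage_info {
    long total;       /* HugePages_Total */
    long free_pages;  /* HugePages_Free */
    long size_kb;     /* Hugepagesize, in kB */
};

/* Scan meminfo-style lines from an open stream. Fields that are not
 * found stay at -1. Returns the number of fields found (0 to 3). */
int read_hugepage_info(FILE *in, struct hugepage_info *out)
{
    char line[256];
    int found = 0;

    assert(in != NULL && out != NULL);
    out->total = out->free_pages = out->size_kb = -1;

    while (fgets(line, sizeof line, in) != NULL) {
        long v;

        if (sscanf(line, "HugePages_Total: %ld", &v) == 1) {
            out->total = v;
            found++;
        } else if (sscanf(line, "HugePages_Free: %ld", &v) == 1) {
            out->free_pages = v;
            found++;
        } else if (sscanf(line, "Hugepagesize: %ld kB", &v) == 1) {
            out->size_kb = v;
            found++;
        }
    }
    return found;
}
```

A caller would typically pass the result of fopen("/proc/meminfo", "r") and treat a return value below 3 as "huge page support not reported by this kernel".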

NOTES

The system calls are gone. Now the hugetlbfs filesystem can be used instead. Memory backed by huge pages (if the CPU supports them) is obtained by mmap'ing files in this virtual filesystem.

The maximal number of huge pages can be specified using the hugepages= boot parameter.

bdflush

NAME

bdflush - start, flush, or tune buffer-dirty-flush daemon  

SYNOPSIS

int bdflush(int func, long *address);
int bdflush(int func, long data);

DESCRIPTION

bdflush starts, flushes, or tunes the buffer-dirty-flush daemon. Only the super-user may call bdflush.

If func is negative or 0, and no daemon has been started, then bdflush enters the daemon code and never returns.

If func is 1, some dirty buffers are written to disk.

If func is 2 or more and is even (low bit is 0), then address is the address of a long word, and the tuning parameter numbered (func-2)/2 is returned to the caller in that address.

If func is 3 or more and is odd (low bit is 1), then data is a long word, and the kernel sets tuning parameter numbered (func-3)/2 to that value.

The set of parameters, their values, and their legal ranges are defined in the kernel source file fs/buffer.c.  
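The mapping between func and the tuning parameter number is plain arithmetic, which the following sketch spells out. The helper names are assumptions made for illustration, and the code does not call bdflush itself; it only computes the func value a caller would pass.

```c
#include <assert.h>

/* func value that reads tuning parameter n:
 * even, at least 2, and (func-2)/2 == n. */
int bdflush_get_func(int n)
{
    assert(n >= 0);
    return 2 * n + 2;
}

/* func value that sets tuning parameter n:
 * odd, at least 3, and (func-3)/2 == n. */
int bdflush_set_func(int n)
{
    assert(n >= 0);
    return 2 * n + 3;
}

/* Inverse mapping: the parameter number addressed by func,
 * or -1 if func is below 2 and so addresses no parameter. */
int bdflush_param_number(int func)
{
    if (func < 2)
        return -1;
    return (func % 2 == 0) ? (func - 2) / 2 : (func - 3) / 2;
}
```

For example, parameter 4 is read with func 10 and written with func 11.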

RETURN VALUE

If func is negative or 0 and the daemon successfully starts, bdflush never returns. Otherwise, the return value is 0 on success and -1 on failure, with errno set to indicate the error.  

ERRORS

EPERM
Caller is not super-user.
EFAULT
address points outside your accessible address space.
EBUSY
An attempt was made to enter the daemon code after another process has already entered.
EINVAL
An attempt was made to read or write an invalid parameter number, or to write an invalid value to a parameter.
 

CONFORMING TO

bdflush is Linux specific and should not be used in programs intended to be portable.  

SEE ALSO

fsync(2), sync(2), update(8), sync(8)

break

See afs_syscall; break is documented on the same manual page.

cacheflush

NAME

cacheflush - flush contents of instruction and/or data cache  

SYNOPSIS

#include <asm/cachectl.h>

int cacheflush(char *addr, int nbytes, int cache);

DESCRIPTION

cacheflush flushes contents of indicated cache(s) for user addresses in the range addr to (addr+nbytes-1). Cache may be one of:
ICACHE
Flush the instruction cache.
DCACHE
Write back to memory and invalidate the affected valid cache lines.
BCACHE
Same as (ICACHE|DCACHE).

 

RETURN VALUE

cacheflush returns 0 on success or -1 on error. If errors are detected, errno will indicate the error.  

ERRORS

EINVAL
cache parameter is not one of ICACHE, DCACHE, or BCACHE.
EFAULT
Some or all of the address range addr to (addr+nbytes-1) is not accessible.

 

BUGS

The current implementation ignores the addr and nbytes parameters. Therefore the whole cache is always flushed.  

NOTE

This system call is only available on MIPS based systems. It should not be used in programs intended to be portable.

chmod

NAME

chmod, fchmod - change permissions of a file  

SYNOPSIS

#include <sys/types.h>
#include <sys/stat.h>

int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);  

DESCRIPTION

The mode of the file given by path or referenced by fildes is changed.

Modes are specified by or'ing the following:

S_ISUID
04000 set user ID on execution
S_ISGID
02000 set group ID on execution
S_ISVTX
01000 sticky bit
S_IRUSR (S_IREAD)
00400 read by owner
S_IWUSR (S_IWRITE)
00200 write by owner
S_IXUSR (S_IEXEC)
00100 execute/search by owner
S_IRGRP
00040 read by group
S_IWGRP
00020 write by group
S_IXGRP
00010 execute/search by group
S_IROTH
00004 read by others
S_IWOTH
00002 write by others
S_IXOTH
00001 execute/search by others

The effective UID of the process must be zero or must match the owner of the file.

If the effective UID of the process is not zero and the group of the file does not match the effective group ID of the process or one of its supplementary group IDs, the S_ISGID bit will be turned off, but this will not cause an error to be returned.

Depending on the file system, set user ID and set group ID execution bits may be turned off if a file is written. On some file systems, only the super-user can set the sticky bit, which may have a special meaning. For the sticky bit, and for set user ID and set group ID bits on directories, see stat(2).

On NFS file systems, restricting the permissions will immediately influence already open files, because the access control is done on the server, but open files are maintained by the client. Widening the permissions may be delayed for other clients if attribute caching is enabled on them.  
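The mode bits listed above are combined with bitwise OR. The sketch below uses fchmod on a freshly opened descriptor to force an exact mode, and stat(2) to read the bits back. The function names create_with_exact_mode and permission_bits are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or reuse) the file at path and set its mode to exactly
 * `mode` with fchmod, so the result does not depend on the umask.
 * Returns 0 on success, -1 on error with errno set. */
int create_with_exact_mode(const char *path, mode_t mode)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    if (fchmod(fd, mode) == -1) {
        int saved = errno;      /* keep the fchmod error across close */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* Return the permission, set-ID and sticky bits of path, or -1 if
 * stat fails. */
long permission_bits(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return (long)(st.st_mode & 07777);
}
```

Calling create_with_exact_mode(path, S_IRUSR | S_IWUSR | S_IRGRP) gives the file mode 0640; a later chmod(path, S_IRUSR) narrows it to 0400.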

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors for chmod are listed below:

EPERM
The effective UID does not match the owner of the file, and is not zero.
EROFS
The named file resides on a read-only file system.
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of the path prefix is not a directory.
EACCES
Search permission is denied on a component of the path prefix.
ELOOP
Too many symbolic links were encountered in resolving path.
EIO
An I/O error occurred.

The general errors for fchmod are listed below:

EBADF
The file descriptor fildes is not valid.
EROFS
See above.
EPERM
See above.
EIO
See above.
 

CONFORMING TO

The chmod call conforms to SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document EFAULT, ENOMEM, ELOOP or EIO error conditions, or the macros S_IREAD, S_IWRITE and S_IEXEC.

The fchmod call conforms to 4.4BSD and SVr4. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX requires the fchmod function if at least one of _POSIX_MAPPED_FILES and _POSIX_SHARED_MEMORY_OBJECTS is defined, and documents additional ENOSYS and EINVAL error conditions, but does not document EIO.

POSIX and X/OPEN do not document the sticky bit.  

SEE ALSO

open(2), chown(2), execve(2), stat(2)

chroot

NAME

chroot - change root directory  

SYNOPSIS

#include <unistd.h>

int chroot(const char *path);  

DESCRIPTION

chroot changes the root directory to that specified in path. This directory will be used for path names beginning with /. The root directory is inherited by all children of the current process.

Only the super-user may change the root directory.

Note that this call does not change the current working directory, so that `.` can be outside the tree rooted at `/`. In particular, the super-user can escape from a `chroot jail` by doing `mkdir foo; chroot foo; cd ..`.  
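Because chroot leaves the working directory untouched, callers commonly follow it with chdir("/") so that `.` is inside the new root. A minimal sketch, assuming a hypothetical helper named enter_chroot; it closes the working-directory gap but does nothing about the process still being super-user:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* asks glibc to declare chroot() */
#endif
#include <assert.h>
#include <unistd.h>

/* Change the root directory to dir and move the working directory
 * inside the new root. Returns 0 on success, -1 on error (errno is
 * set by chroot or chdir). */
int enter_chroot(const char *dir)
{
    assert(dir != NULL);

    if (chroot(dir) == -1)
        return -1;              /* nothing has been changed yet */

    /* Without this step the old working directory would still be
     * reachable as `.` from outside the new tree. */
    return chdir("/");
}
```

A process that no longer needs root privileges would usually give them up after this call, since the text notes that the super-user can escape again.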

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors are listed below:

EPERM
The effective UID is not zero.
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of path is not a directory.
EACCES
Search permission is denied on a component of the path prefix.
ELOOP
Too many symbolic links were encountered in resolving path.
EIO
An I/O error occurred.
 

CONFORMING TO

SVr4, SVID, 4.4BSD, X/OPEN. This function is not part of POSIX.1. SVr4 documents additional EINTR, ENOLINK and EMULTIHOP error conditions. X/OPEN does not document EIO, ENOMEM or EFAULT error conditions. This interface is marked as legacy by X/OPEN.  

NOTES

FreeBSD has a stronger jail() system call.  

SEE ALSO

chdir(2)

close

NAME

close - close a file descriptor  

SYNOPSIS

#include <unistd.h>

int close(int fd);

DESCRIPTION

close closes a file descriptor, so that it no longer refers to any file and may be reused. Any locks held on the file it was associated with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock).

If fd is the last copy of a particular file descriptor the resources associated with it are freed; if the descriptor was the last reference to a file which has been removed using unlink(2) the file is deleted.  

RETURN VALUE

close returns zero on success, or -1 if an error occurred.  

ERRORS

EBADF
fd isn't a valid open file descriptor.
EINTR
The close() call was interrupted by a signal.
EIO
An I/O error occurred.
 

CONFORMING TO

SVr4, SVID, POSIX, X/OPEN, BSD 4.3. SVr4 documents an additional ENOLINK error condition.  

NOTES

Not checking the return value of close is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close. Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and disk quotas.

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)  
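The two notes above suggest a pattern: check write, call fsync when the data must reach the disk, and check close as well. The following is a sketch under those assumptions; save_file is a hypothetical name, and the choice not to retry close after a failure is a decision made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes from buf to path, then fsync and close, checking
 * every return value. Returns 0 on success, -1 on error (errno set). */
int save_file(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd, saved;

    assert(path != NULL && (buf != NULL || len == 0));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            goto fail;
        }
        p += n;                 /* short write: continue with the rest */
        len -= (size_t)n;
    }

    if (fsync(fd) == -1)        /* ask the kernel to flush to storage */
        goto fail;

    /* Errors from earlier writes may only show up here. */
    return close(fd);

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

A caller that sees -1 should treat the file contents as unreliable rather than assume the data was saved.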

SEE ALSO

open(2), fcntl(2), shutdown(2), unlink(2), fclose(3), fsync(2)

creat

NAME

open, creat - open and possibly create a file or device  

SYNOPSIS

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);

DESCRIPTION

The open() system call is used to convert a pathname into a file descriptor (a small, non-negative integer for use in subsequent I/O as with read, write, etc.). When the call is successful, the file descriptor returned will be the lowest file descriptor not currently open for the process. This call creates a new open file, not shared with any other process. (But shared open files may arise via the fork(2) system call.) The new file descriptor is set to remain open across exec functions (see fcntl(2)). The file offset is set to the beginning of the file.

The parameter flags is one of O_RDONLY, O_WRONLY or O_RDWR which request opening the file read-only, write-only or read/write, respectively, bitwise-or'd with zero or more of the following:

O_CREAT
If the file does not exist it will be created. The owner (user ID) of the file is set to the effective user ID of the process. The group ownership (group ID) is set either to the effective group ID of the process or to the group ID of the parent directory (depending on filesystem type and mount options, and the mode of the parent directory, see, e.g., the mount options bsdgroups and sysvgroups of the ext2 filesystem, as described in mount(8)).
O_EXCL
When used with O_CREAT, if the file already exists it is an error and the open will fail. In this context, a symbolic link exists, regardless of where it points to. O_EXCL is broken on NFS file systems; programs which rely on it for performing locking tasks will contain a race condition. The solution for performing atomic file locking using a lockfile is to create a unique file on the same fs (e.g., incorporating hostname and pid), then use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
O_NOCTTY
If pathname refers to a terminal device --- see tty(4) --- it will not become the process's controlling terminal even if the process does not have one.
O_TRUNC
If the file already exists and is a regular file and the open mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0. If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is unspecified.
O_APPEND
The file is opened in append mode. Before each write, the file pointer is positioned at the end of the file, as if with lseek. O_APPEND may lead to corrupted files on NFS file systems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition.
O_NONBLOCK or O_NDELAY
When possible, the file is opened in non-blocking mode. Neither the open nor any subsequent operations on the file descriptor which is returned will cause the calling process to wait. For the handling of FIFOs (named pipes), see also fifo(4). This mode need not have any effect on files other than FIFOs.
O_SYNC
The file is opened for synchronous I/O. Any writes on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware. See RESTRICTIONS below, though.
O_NOFOLLOW
If pathname is a symbolic link, then the open fails. This is a FreeBSD extension, which was added to Linux in version 2.1.126. Symbolic links in earlier components of the pathname will still be followed. The headers from glibc 2.0.100 and later include a definition of this flag; kernels before 2.1.126 will ignore it if used.
O_DIRECTORY
If pathname is not a directory, cause the open to fail. This flag is Linux-specific, and was added in kernel version 2.1.126, to avoid denial-of-service problems if opendir(3) is called on a FIFO or tape device, but should not be used outside of the implementation of opendir.
O_DIRECT
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of the read(2) or write(2) system call, data is guaranteed to have been transferred. Under Linux 2.4 transfer sizes, and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. Under Linux 2.6 alignment to 512-byte boundaries suffices.
A semantically similar interface for block devices is described in raw(8).
O_ASYNC
Generate a signal (SIGIO by default, but this can be changed via fcntl(2)) when input or output becomes possible on this file descriptor. This feature is only available for terminals, pseudo-terminals, and sockets. See fcntl(2) for further details.
O_LARGEFILE
On 32-bit systems that support the Large Files System, allow files whose sizes cannot be represented in 31 bits to be opened.

Some of these optional flags can be altered using fcntl after the file has been opened.
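The link-based locking recipe described under O_EXCL can be sketched as follows. The function names acquire_lockfile and release_lockfile are assumptions for illustration, and choosing a unique_path that really is unique (for example one built from hostname and pid, on the same file system as the lock) is left to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to take the lock named lock_path, using unique_path as the
 * per-process file from the recipe.
 * Returns 0 if the lock was acquired, -1 otherwise. */
int acquire_lockfile(const char *lock_path, const char *unique_path)
{
    struct stat st;
    int fd, locked;

    assert(lock_path != NULL && unique_path != NULL);

    /* Step 1: create the unique file. O_EXCL is not needed because
     * the name is assumed not to collide with any other process. */
    fd = open(unique_path, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    close(fd);

    /* Step 2: link it to the lock name. If link() reports failure,
     * fall back to checking whether the link count reached 2. */
    if (link(unique_path, lock_path) == 0)
        locked = 1;
    else
        locked = (stat(unique_path, &st) == 0 && st.st_nlink == 2);

    /* The unique name is no longer needed; if the lock is held,
     * lock_path still refers to the file. */
    unlink(unique_path);
    return locked ? 0 : -1;
}

/* Release the lock by removing the lock name. */
int release_lockfile(const char *lock_path)
{
    return unlink(lock_path);
}
```

A second process that calls acquire_lockfile while the lock name exists gets -1 and can retry later; this sketch does not detect stale locks left by a crashed holder.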

The argument mode specifies the permissions to use in case a new file is created. It is modified by the process's umask in the usual way: the permissions of the created file are (mode & ~umask). Note that this mode only applies to future accesses of the newly created file; the open call that creates a read-only file may well return a read/write file descriptor.

The following symbolic constants are provided for mode:

S_IRWXU
00700 user (file owner) has read, write and execute permission
S_IRUSR (S_IREAD)
00400 user has read permission
S_IWUSR (S_IWRITE)
00200 user has write permission
S_IXUSR (S_IEXEC)
00100 user has execute permission
S_IRWXG
00070 group has read, write and execute permission
S_IRGRP
00040 group has read permission
S_IWGRP
00020 group has write permission
S_IXGRP
00010 group has execute permission
S_IRWXO
00007 others have read, write and execute permission
S_IROTH
00004 others have read permission
S_IWOTH
00002 others have write permission
S_IXOTH
00001 others have execute permission

mode must be specified when O_CREAT is in the flags, and is ignored otherwise.

creat is equivalent to open with flags equal to O_CREAT|O_WRONLY|O_TRUNC.  
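Both the creat equivalence and the (mode & ~umask) rule can be shown in a few lines. In this sketch my_creat and effective_mode are hypothetical names; note that umask(2) has no read-only form, so the mask is read by setting it and immediately restoring it, which is not safe if other threads create files at the same moment.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Same effect as creat(path, mode), written with open as the text
 * describes. Returns the new descriptor or -1. */
int my_creat(const char *path, mode_t mode)
{
    assert(path != NULL);
    return open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
}

/* Permission bits a newly created file would get for the requested
 * mode under the current umask: (mode & ~umask). */
mode_t effective_mode(mode_t mode)
{
    mode_t mask = umask(0);     /* read the mask by replacing it ... */
    umask(mask);                /* ... and put it straight back */
    return mode & ~mask & 07777;
}
```

With a umask of 022, a file created with mode 0666 ends up with mode 0644.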

RETURN VALUE

open and creat return the new file descriptor, or -1 if an error occurred (in which case, errno is set appropriately). Note that open can open device special files, but creat cannot create them - use mknod(2) instead.

On NFS file systems with UID mapping enabled, open may return a file descriptor but e.g. read(2) requests are denied with EACCES. This is because the client performs open by checking the permissions, but UID mapping is performed by the server upon read and write requests.

If the file is newly created, its atime, ctime, mtime fields are set to the current time, and so are the ctime and mtime fields of the parent directory. Otherwise, if the file is modified because of the O_TRUNC flag, its ctime and mtime fields are set to the current time.

 

ERRORS

EEXIST
pathname already exists and O_CREAT and O_EXCL were used.
EISDIR
pathname refers to a directory and the access requested involved writing (that is, O_WRONLY or O_RDWR is set).
EACCES
The requested access to the file is not allowed, or one of the directories in pathname did not allow search (execute) permission, or the file did not exist yet and write access to the parent directory is not allowed.
ENAMETOOLONG
pathname was too long.
ENOENT
O_CREAT is not set and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link.
ENOTDIR
A component used as a directory in pathname is not, in fact, a directory, or O_DIRECTORY was specified and pathname was not a directory.
ENXIO
O_NONBLOCK | O_WRONLY is set, the named file is a FIFO and no process has the file open for reading. Or, the file is a device special file and no corresponding device exists.
ENODEV
pathname refers to a device special file and no corresponding device exists. (This is a Linux kernel bug - in this situation ENXIO must be returned.)
EROFS
pathname refers to a file on a read-only filesystem and write access was requested.
ETXTBSY
pathname refers to an executable image which is currently being executed and write access was requested.
EFAULT
pathname points outside your accessible address space.
ELOOP
Too many symbolic links were encountered in resolving pathname, or O_NOFOLLOW was specified but pathname was a symbolic link.
ENOSPC
pathname was to be created but the device containing pathname has no room for the new file.
ENOMEM
Insufficient kernel memory was available.
EMFILE
The process already has the maximum number of files open.
ENFILE
The limit on the total number of files open on the system has been reached.
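
The error codes above are usually distinguished by inspecting errno after open returns -1. The following sketch, with the hypothetical helper name create_exclusive, separates the EEXIST case from all other failures.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to create path, failing if it already exists.
 * Returns 1 if this call created the file, 0 if the file already
 * existed (EEXIST), and -1 for any other error, with errno left as
 * set by open. The helper name is illustrative only. */
int create_exclusive(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return (errno == EEXIST) ? 0 : -1;
}
```

A caller that gets -1 can then examine errno for ENOENT, EACCES and so on, as described in the list above.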
 

NOTE

Under Linux, the O_NONBLOCK flag indicates that one wants to open but does not necessarily have the intention to read or write. This is typically used to open devices in order to get a file descriptor for use with ioctl(2).  

CONFORMING TO

SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The O_NOFOLLOW and O_DIRECTORY flags are Linux-specific. One may have to define the _GNU_SOURCE macro to get their definitions.

The (undefined) effect of O_RDONLY | O_TRUNC varies among implementations. On many systems the file is actually truncated.

The O_DIRECT flag was introduced in SGI IRIX, where it has alignment restrictions similar to those of Linux 2.4. IRIX also has an fcntl(2) call to query appropriate alignments and sizes. FreeBSD 4.x introduced a flag of the same name, but without alignment restrictions. Support was added under Linux in kernel version 2.4.10. Older Linux kernels simply ignore this flag.

BUGS

"The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances." -- Linus  

RESTRICTIONS

There are many infelicities in the protocol underlying NFS, affecting amongst others O_SYNC and O_NDELAY.

POSIX provides for three different variants of synchronised I/O, corresponding to the flags O_SYNC, O_DSYNC and O_RSYNC. Currently (2.1.130) these are all synonymous under Linux.  

SEE ALSO

read(2), write(2), fcntl(2), close(2), link(2), mknod(2), mount(2), stat(2), umask(2), unlink(2), socket(2), fopen(3), fifo(4)

DC_CTX_new

NAME

DC_CTX_new, DC_CTX_free, DC_CTX_add_session, DC_CTX_remove_session, DC_CTX_get_session, DC_CTX_reget_session, DC_CTX_has_session - distcache blocking client API  

SYNOPSIS

#include <distcache/dc_client.h>

DC_CTX *DC_CTX_new(const char *target, unsigned int flags);
void DC_CTX_free(DC_CTX *ctx);
int DC_CTX_add_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len, const unsigned char *sess_data, unsigned int sess_len, unsigned long timeout_msecs);
int DC_CTX_remove_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len);
int DC_CTX_get_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len, unsigned char *result_storage, unsigned int result_size, unsigned int *result_used);
int DC_CTX_reget_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len, unsigned char *result_storage, unsigned int result_size, unsigned int *result_used);
int DC_CTX_has_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len);

 

DESCRIPTION

DC_CTX_new() allocates and initialises a DC_CTX structure with an address for sending session caching operation requests to, and flags controlling the behaviour of the DC_CTX object. The address specified by target should be compatible with the syntax defined by the libnal API, see the "NOTES" section below. The flags parameter can be zero to indicate that each cache operation should create and destroy a temporary connection, otherwise a bitmask combining one or more of the following flags:

#define DC_CTX_FLAG_PERSISTENT (unsigned int)0x0001
#define DC_CTX_FLAG_PERSISTENT_PIDCHECK (unsigned int)0x0002
#define DC_CTX_FLAG_PERSISTENT_RETRY (unsigned int)0x0004
#define DC_CTX_FLAG_PERSISTENT_LATE (unsigned int)0x0008

DC_CTX_free() frees the ctx object.

DC_CTX_add_session() attempts to add session data to the cache. id_data and id_len define the unique session ID corresponding to the session data - this is the ID used in DC_CTX_get_session() or DC_CTX_remove_session() to refer to the session being added, and the "add" operation will fail if there is already a session with a matching ID in the cache. sess_data and sess_len define the session data itself to be stored in the cache. timeout_msecs specifies the expiry period for the session - if this period of time passes without the corresponding session being explicitly removed or scrolled out of the cache because of over-filling, then the cache server will remove the session from the cache anyway.

DC_CTX_remove_session() provides a session ID with id_data and id_len and requests that the corresponding session be removed from the cache.

DC_CTX_get_session() provides a session ID with id_data and id_len and requests that the corresponding session data be retrieved from the cache. result_storage and result_size specify a storage area for the retrieved session data, and result_used points to a variable that will be set to the length of the retrieved session data. Even if DC_CTX_get_session() returns successfully, the caller should check the value of result_used - if it is larger than result_size then the requested session data was too big for the provided storage area and only partial data will have been returned. In this case, the caller should immediately call DC_CTX_reget_session().

DC_CTX_reget_session() is similar to DC_CTX_get_session() except that it does not perform any network operations at all. It is designed to return session data that had previously been retrieved by DC_CTX_get_session(), so that a larger storage area can be provided if the one first provided to DC_CTX_get_session() was too small. This function will fail if the last operation on ctx was not DC_CTX_get_session() with an exact match for id_data and id_len.

DC_CTX_has_session() is similar to DC_CTX_get_session() except that it does not ask for session data to be returned, merely to know whether the session is in the cache or not. This should be used by any application that already has a copy of the required session but merely wishes to verify that it hasn't already been explicitly invalidated. As distcache allows parallel use of a single cache from multiple clients across potentially multiple machines, it is a security flaw for any client (thread, process, or machine) to implement local session caching and use its sessions whenever there is a cache-hit. If the session was used and for any reason required invalidation (e.g. renegotiation, data corruption detected, etc.) then another client should not use a locally cached copy of the session without first verifying with the shared cache that the session is still OK. This function should be used in such cases as it provides the same check as DC_CTX_get_session() but with less network overhead.

RETURN VALUES

DC_CTX_new() returns a valid DC_CTX object on success, otherwise NULL for failure.

DC_CTX_free() has no return value.

All other DC_CTX functions return zero on failure, otherwise non-zero.

NOTES

The following code snippet attempts to create a session cache context that uses a temporary connection for each operation to a local dc_client agent running on a unix domain socket at /tmp/dc_client;

DC_CTX *ctx = DC_CTX_new("UNIX:/tmp/dc_client", 0);

The following code snippet attempts to create a session cache context to communicate with a remote server listening on TCP/IPv4 port 9001. It will attempt to use a persistent connection for all cache operations (DC_CTX_FLAG_PERSISTENT), retry once for any cache operation that suffers a network I/O error (DC_CTX_FLAG_PERSISTENT_RETRY), will wait until the first cache operation before trying to connect (DC_CTX_FLAG_PERSISTENT_LATE), and will verify before any cache operation whether it is running in a different process than it used to be and if so will close then re-open a new connection (DC_CTX_FLAG_PERSISTENT_PIDCHECK).

DC_CTX *ctx = DC_CTX_new("IP:cacheserver.localnet", DC_CTX_FLAG_PERSISTENT | DC_CTX_FLAG_PERSISTENT_PIDCHECK | DC_CTX_FLAG_PERSISTENT_RETRY | DC_CTX_FLAG_PERSISTENT_LATE);

The DC_CTX_FLAG_PERSISTENT_RETRY flag exists because of the -idle command-line switch in the dc_client(1) tool. This switch allows dc_client to automatically close client connections that have been idle for some configurable length of time. However, this creates the possibility for race conditions if a persistent DC_CTX is used by an application to request a cache operation at the same time or following a decision by dc_client to close the connection. The most robust way to address this is to have DC_CTX regard any first network error during the operation as an idle-timeout from the peer and to immediately re-connect and retry the operation. Any subsequent error (or initial error that cannot be timeout-related, such as connection failure) is considered a failure and will not result in any retry.

The DC_CTX_FLAG_PERSISTENT_PIDCHECK flag exists for software like Apache or Stunnel that use fork(2) or clone(2) to create child processes that inherit file-descriptors from the parent process. In such circumstances, attempts by the parent and child processes to communicate over the same file-descriptor can have unpredictable results and are, generally speaking, never useful. This flag will force a check before each operation that the process ID is "what it used to be" and if not, will close any persistent connection, reconnect with a new file-descriptor, and reset the process ID in the DC_CTX. If a parent process has a DC_CTX that has a connection open, this flag will ensure that any subsequent child processes that attempt to perform cache operations will transparently reconnect with their own connections.

SEE ALSO

DC_PLUG_new(2), DC_PLUG_read(2) - Lower-level asynchronous implementation of the distcache protocol, useful for client and server operation. This DC_CTX implementation is built on top of the DC_PLUG functionality.

distcache(8) - Overview of the distcache architecture.

http://www.distcache.org/ - Distcache home page.  

AUTHOR

This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at geoff@geoffthorpe.net.

Home Page: http://www.distcache.org

DC_PLUG_read

NAME

DC_PLUG_read, DC_PLUG_consume, DC_PLUG_write, DC_PLUG_write_more, DC_PLUG_commit, DC_PLUG_rollback - DC_PLUG read/write functions  

SYNOPSIS

#include <distcache/dc_plug.h>

int DC_PLUG_read(DC_PLUG *plug, int resume, unsigned long *request_uid, DC_CMD *cmd, const unsigned char **payload_data, unsigned int *payload_len);
int DC_PLUG_consume(DC_PLUG *plug);
int DC_PLUG_write(DC_PLUG *plug, int resume, unsigned long request_uid, DC_CMD cmd, const unsigned char *payload_data, unsigned int payload_len);
int DC_PLUG_write_more(DC_PLUG *plug, const unsigned char *data, unsigned int data_len);
int DC_PLUG_commit(DC_PLUG *plug);
int DC_PLUG_rollback(DC_PLUG *plug);

typedef enum {
    DC_CMD_ERROR,
    DC_CMD_ADD,
    DC_CMD_GET,
    DC_CMD_REMOVE,
    DC_CMD_HAVE
} DC_CMD;

 

DESCRIPTION

DC_PLUG_read() will attempt to open the next distcache message received by plug for reading. This message will block the reading of any other received messages until DC_PLUG_consume() is called. If a message has already been opened for reading inside plug, then DC_PLUG_read() will fail unless resume is set to non-zero, in which case it will simply re-open the message that was already being read. If DC_PLUG_read() succeeds, request_uid, cmd, payload_data and payload_len are populated with the message's data. Note that payload_data points to the original data stored inside plug and this pointer is only valid until the next call to DC_PLUG_consume().

DC_PLUG_consume() will close the message currently opened for reading in plug, and will allow a future call to DC_PLUG_read() to succeed if there are any subsequent (complete) messages received from the plug object's connection.

DC_PLUG_write() will attempt to open a distcache message for writing in plug. If successful, this message will block the writing of any other messages until the message is committed with DC_PLUG_commit() or discarded with DC_PLUG_rollback(). If a message has already been opened for writing, DC_PLUG_write() will fail unless resume is non-zero, in which case the message will be re-opened and will overwrite the settings from the previous DC_PLUG_write() call. This is equivalent to DC_PLUG_rollback() followed immediately by DC_PLUG_write() with a zero resume value. Note that payload_len can be zero (and thus payload_data can be NULL) even if the message will eventually have payload data - this can be supplemented afterwards using the DC_PLUG_write_more() function. request_uid and cmd, on the other hand, must be specified at once in DC_PLUG_write().

DC_PLUG_write_more() will attempt to add more payload data to the message currently opened for writing in plug. This data will be concatenated to the end of any payload data already provided in prior calls to DC_PLUG_write() or DC_PLUG_write_more().

DC_PLUG_commit() will close the message currently opened for writing, and queue it for serialisation out on the plug object's connection.

DC_PLUG_rollback() will discard the message currently opened for writing.  

RETURN VALUES

All these DC_PLUG functions return zero on failure, otherwise non-zero.

SEE ALSO

DC_PLUG_new(2) - Basic DC_PLUG functions.

distcache(8) - Overview of the distcache architecture.

http://www.distcache.org/ - Distcache home page.  

AUTHOR

This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at geoff@geoffthorpe.net.

Home Page: http://www.distcache.org

delete_module

NAME

delete_module - delete a loadable module entry  

SYNOPSIS

#include <linux/module.h>

int delete_module(const char *name);

 

DESCRIPTION

delete_module attempts to remove an unused loadable module entry. If name is NULL, all unused modules marked auto-clean will be removed. This system call is only open to the superuser.  

RETURN VALUE

On success, zero is returned. On error, -1 is returned and errno is set appropriately.  

ERRORS

EPERM
The user is not the superuser.
ENOENT
No module by that name exists.
EINVAL
name was the empty string.
EBUSY
The module is in use.
EFAULT
name is outside the program's accessible address space.
 

SEE ALSO

create_module(2), init_module(2), query_module(2).

dup2

NAME

dup, dup2 - duplicate a file descriptor  

SYNOPSIS

#include <unistd.h>

int dup(int oldfd);
int dup2(int oldfd, int newfd);

 

DESCRIPTION

dup and dup2 create a copy of the file descriptor oldfd.

After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.

The two descriptors do not share the close-on-exec flag, however.

dup uses the lowest-numbered unused descriptor for the new descriptor.

dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.  
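
The sharing of the file position described above can be observed with a short experiment. The sketch below uses a hypothetical helper name, dup_shares_offset, and a scratch file supplied by the caller: it writes through the original descriptor and then queries the position through the copy.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a write through fd moves the file position seen by a
 * dup() of fd, 0 if it does not, and -1 if a system call failed.
 * The helper name and the scratch file are illustrative only. */
int dup_shares_offset(const char *path)
{
    int fd, copy, result = -1;
    off_t pos;

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    copy = dup(fd);                      /* lowest-numbered unused descriptor */
    if (copy >= 0) {
        if (write(fd, "hello", 5) == 5) {
            /* Ask for the current position via the copy, not via fd. */
            pos = lseek(copy, 0, SEEK_CUR);
            if (pos != (off_t)-1)
                result = (pos == 5);
        }
        close(copy);
    }
    close(fd);
    return result;
}
```

Since both descriptors refer to the same open file, the helper is expected to return 1.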

RETURN VALUE

dup and dup2 return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).  

ERRORS

EBADF
oldfd isn't an open file descriptor, or newfd is out of the allowed range for file descriptors.
EMFILE
The process already has the maximum number of file descriptors open and tried to open a new one.
EINTR
The dup2 call was interrupted by a signal.
EBUSY
(Linux only) This may be returned by dup2 during a race condition with open() and dup().
 

WARNING

The error returned by dup2 is different from that returned by fcntl(..., F_DUPFD, ...) when newfd is out of range. On some systems dup2 also sometimes returns EINVAL like F_DUPFD.  

BUGS

If newfd was open, any errors that would have been reported at close() time, are lost. A careful programmer will not use dup2 without closing newfd first.  

CONFORMING TO

SVr4, SVID, POSIX, X/OPEN, BSD 4.3. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX.1 adds EINTR. The EBUSY return is Linux-specific.  

SEE ALSO

fcntl(2), open(2), close(2)

epoll_ctl

NAME

epoll_ctl - control interface for an epoll descriptor  

SYNOPSIS

#include <sys/epoll.h>

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event)  

DESCRIPTION

Control an epoll descriptor, epfd, by requesting the operation op be performed on the target file descriptor, fd. The event describes the object linked to the file descriptor fd. The struct epoll_event is defined as:

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;  /* Epoll events */
    epoll_data_t data;  /* User data variable */
};

The events member is a bit set composed using the following available event types:

EPOLLIN
The associated file is available for read(2) operations.
EPOLLOUT
The associated file is available for write(2) operations.
EPOLLPRI
There is urgent data available for read(2) operations.
EPOLLERR
Error condition happened on the associated file descriptor.
EPOLLHUP
Hang up happened on the associated file descriptor.
EPOLLET
Sets the Edge Triggered behaviour for the associated file descriptor. The default behaviour for epoll is Level Triggered. See epoll(4) for more detailed information about Edge and Level Triggered event distribution architectures.
EPOLLONESHOT
Sets the One-Shot behaviour for the associated file descriptor. It means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl(2) with EPOLL_CTL_MOD to re-enable the file descriptor with a new event mask.

The epoll interface supports all file descriptors that support poll(2). Valid values for the op parameter are:

EPOLL_CTL_ADD
Add the target file descriptor fd to the epoll descriptor epfd and associate the event event with the internal file linked to fd.
EPOLL_CTL_MOD
Change the event event associated to the target file descriptor fd.
EPOLL_CTL_DEL
Remove the target file descriptor fd from the epoll file descriptor, epfd.
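
A minimal sketch of EPOLL_CTL_ADD in use follows. It registers the read end of a pipe for EPOLLIN, writes one byte to the pipe, and checks that epoll_wait(2) reports the descriptor. The helper names watch_for_input and epoll_pipe_demo are hypothetical, and epoll_create(2) and epoll_wait(2) are assumed to be available as listed under SEE ALSO.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd on epfd for read readiness, storing fd as the user data.
 * Returns the result of epoll_ctl: 0 on success, -1 on error. */
int watch_for_input(int epfd, int fd)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Returns 1 if a pipe with pending data is reported as readable,
 * 0 if it is not, and -1 if setup failed. Illustrative only. */
int epoll_pipe_demo(void)
{
    int pipefd[2], epfd, n, ok = 0;
    struct epoll_event out;

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (watch_for_input(epfd, pipefd[0]) == 0 &&
        write(pipefd[1], "x", 1) == 1) {
        /* Wait up to one second for a single event. */
        n = epoll_wait(epfd, &out, 1, 1000);
        ok = (n == 1 && out.data.fd == pipefd[0] &&
              (out.events & EPOLLIN) != 0);
    }

    /* Unregister explicitly before closing the descriptors. */
    epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], &out);
    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return ok;
}
```

Storing the descriptor in data.fd is one common choice; data.ptr can carry a pointer to per-connection state instead, since the member is a union.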
 

RETURN VALUE

When successful, epoll_ctl(2) returns zero. When an error occurs, epoll_ctl(2) returns -1 and errno is set appropriately.  

ERRORS

EBADF
The epfd file descriptor is not a valid file descriptor.
EPERM
The target file fd is not supported by epoll.
EINVAL
The supplied file descriptor, epfd, is not an epoll file descriptor, or the requested operation op is not supported by this interface.
ENOMEM
There was insufficient memory to handle the requested op control operation.
 

CONFORMING TO

epoll_ctl(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.  

SEE ALSO

epoll_create(2), epoll_wait(2), epoll(4)

execve

NAME

execve - execute program  

SYNOPSIS

#include <unistd.h>

int execve(const char *filename, char *const argv [], char *const envp[]);  

DESCRIPTION

execve() executes the program pointed to by filename. filename must be either a binary executable, or a script starting with a line of the form "#! interpreter [arg]". In the latter case, the interpreter must be a valid pathname for an executable which is not itself a script, which will be invoked as interpreter [arg] filename.

argv is an array of argument strings passed to the new program. envp is an array of strings, conventionally of the form key=value, which are passed as environment to the new program. Both argv and envp must be terminated by a null pointer. The argument vector and environment can be accessed by the called program's main function, when it is defined as int main(int argc, char *argv[], char *envp[]).

execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of the program loaded. The program invoked inherits the calling process's PID, and any open file descriptors that are not set to close on exec. Signals pending on the calling process are cleared. Any signals set to be caught by the calling process are reset to their default behaviour. The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL.

If the current program is being ptraced, a SIGTRAP is sent to it after a successful execve().

If the set-uid bit is set on the program file pointed to by filename the effective user ID of the calling process is changed to that of the owner of the program file. Similarly, when the set-gid bit of the program file is set the effective group ID of the calling process is set to the group of the program file.

If the executable is an a.out dynamically-linked binary executable containing shared-library stubs, the Linux dynamic linker ld.so(8) is called at the start of execution to bring needed shared libraries into core and link the executable with them.

If the executable is a dynamically-linked ELF executable, the interpreter named in the PT_INTERP segment is used to load the needed shared libraries. This interpreter is typically /lib/ld-linux.so.1 for binaries linked with the Linux libc version 5, or /lib/ld-linux.so.2 for binaries linked with the GNU libc version 2.  

RETURN VALUE

On success, execve() does not return. On error, -1 is returned and errno is set appropriately.
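
Because a successful execve() never returns, it is commonly paired with fork(2) so that the parent can wait for the new program. The sketch below assumes that /bin/sh exists and accepts -c; the helper name run_shell_command and the minimal PATH value are illustrative choices, not part of this interface.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c command" in a child process with a minimal
 * environment. Returns the child's exit status, or -1 if the child
 * could not be created or did not exit normally. */
int run_shell_command(const char *command)
{
    /* Both vectors must end with a null pointer. */
    char *const argv[] = { "sh", "-c", (char *)command, NULL };
    char *const envp[] = { "PATH=/bin:/usr/bin", NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);
        _exit(127);             /* reached only if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The code after execve() in the child runs only on failure, which is why it exits immediately with a distinctive status instead of returning into the parent's logic.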

ERRORS

EACCES
The file or a script interpreter is not a regular file.
EACCES
Execute permission is denied for the file or a script or ELF interpreter.
EACCES
The file system is mounted noexec.
EPERM
The file system is mounted nosuid, the user is not the superuser, and the file has an SUID or SGID bit set.
EPERM
The process is being traced, the user is not the superuser and the file has an SUID or SGID bit set.
E2BIG
The argument list is too big.
ENOEXEC
An executable is not in a recognised format, is for the wrong architecture, or has some other format error that means it cannot be executed.
EFAULT
filename points outside your accessible address space.
ENAMETOOLONG
filename is too long.
ENOENT
The file filename or a script or ELF interpreter does not exist, or a shared library needed for file or interpreter cannot be found.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of the path prefix of filename or a script or ELF interpreter is not a directory.
EACCES
Search permission is denied on a component of the path prefix of filename or the name of a script interpreter.
ELOOP
Too many symbolic links were encountered in resolving filename or the name of a script or ELF interpreter.
ETXTBSY
Executable was open for writing by one or more processes.
EIO
An I/O error occurred.
ENFILE
The limit on the total number of files open on the system has been reached.
EMFILE
The process has the maximum number of files open.
EINVAL
An ELF executable had more than one PT_INTERP segment (i.e., tried to name more than one interpreter).
EISDIR
An ELF interpreter was a directory.
ELIBBAD
An ELF interpreter was not in a recognised format.
 

CONFORMING TO

SVr4, SVID, X/OPEN, BSD 4.3. POSIX does not document the #! behavior but is otherwise compatible. SVr4 documents additional error conditions EAGAIN, EINTR, ELIBACC, ENOLINK, EMULTIHOP; POSIX does not document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL, EISDIR or ELIBBAD error conditions.  

NOTES

SUID and SGID processes can not be ptrace()d.

Linux ignores the SUID and SGID bits on scripts.

The result of mounting a filesystem nosuid varies between Linux kernel versions: some will refuse execution of SUID/SGID executables when this would give the user powers she did not have already (and return EPERM), some will just ignore the SUID/SGID bits and exec successfully.

A maximum line length of 127 characters is allowed for the first line in a #! executable shell script.

 

HISTORICAL

With Unix V6 the argument list of an exec call was ended by 0, while the argument list of main was ended by -1. Thus, this argument list was not directly usable in a further exec call. Since Unix V7 both are NULL.

 

SEE ALSO

chmod(2), fork(2), execl(3), environ(5), ld.so(8)

fchdir

NAME

chdir, fchdir - change working directory  

SYNOPSIS

#include <unistd.h>

int chdir(const char *path);
int fchdir(int fd);  

DESCRIPTION

chdir changes the current directory to that specified in path.

fchdir is identical to chdir, only that the directory is given as an open file descriptor.  
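
A common use of fchdir is to return to a starting directory after a temporary chdir. The sketch below assumes that the current directory can be opened read-only; the helper name in_directory is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Change to dir, record the resulting working directory in buf,
 * then return to the directory that was current on entry.
 * Returns 0 on success and -1 on any failure. Illustrative only. */
int in_directory(const char *dir, char *buf, size_t size)
{
    int rc = -1;
    int saved = open(".", O_RDONLY);    /* remember where we started */

    if (saved < 0)
        return -1;
    if (chdir(dir) == 0) {
        if (getcwd(buf, size) != NULL)
            rc = 0;
        if (fchdir(saved) < 0)          /* go back via the descriptor */
            rc = -1;
    }
    close(saved);
    return rc;
}
```

Holding a descriptor is more robust than saving the path string, because the descriptor still refers to the same directory even if the directory is renamed in the meantime.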

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors for chdir are listed below:
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of path is not a directory.
EACCES
Search permission is denied on a component of path.
ELOOP
Too many symbolic links were encountered in resolving path.
EIO
An I/O error occurred.

The general errors for fchdir are listed below:

EBADF
fd is not a valid file descriptor.
EACCES
Search permission was denied on the directory open on fd.
 

NOTES

The prototype for fchdir is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).  

CONFORMING TO

The chdir call is compatible with SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents additional EINTR, ENOLINK, and EMULTIHOP error conditions but has no ENOMEM. POSIX.1 does not have ENOMEM or ELOOP error conditions. X/OPEN does not have EFAULT, ENOMEM or EIO error conditions.

The fchdir call is compatible with SVr4, 4.4BSD and X/OPEN. SVr4 documents additional EIO, EINTR, and ENOLINK error conditions. X/OPEN documents additional EINTR and EIO error conditions.  

SEE ALSO

getcwd(3), chroot(2)

fchown

NAME

chown, fchown, lchown - change ownership of a file  

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>

int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);  

DESCRIPTION

The owner of the file specified by path or by fd is changed. Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. The super-user may change the group arbitrarily.

If the owner or group is specified as -1, then that ID is not changed.

When the owner or group of an executable file are changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown; the Linux behaviour depends on the kernel version. In case of a non-group-executable file (with clear S_IXGRP bit) the S_ISGID bit indicates mandatory locking, and is not cleared by a chown.

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors for chown are listed below:

EPERM
The effective UID does not match the owner of the file, and is not zero; or the owner or group were specified incorrectly.
EROFS
The named file resides on a read-only file system.
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of the path prefix is not a directory.
EACCES
Search permission is denied on a component of the path prefix.
ELOOP
Too many symbolic links were encountered in resolving path.

The general errors for fchown are listed below:

EBADF
The descriptor is not valid.
ENOENT
See above.
EPERM
See above.
EROFS
See above.
EIO
A low-level I/O error occurred while modifying the inode.
 

NOTES

In versions of Linux prior to 2.1.81 (and distinct from 2.1.46), chown did not follow symbolic links. Since Linux 2.1.81, chown does follow symbolic links, and there is a new system call lchown that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old chown) has got the same syscall number, and chown got the newly introduced number.

The prototype for fchown is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).  

CONFORMING TO

The chown call conforms to SVr4, SVID, POSIX, X/OPEN. The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files). SVr4 documents EINVAL, EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document ENOMEM or ELOOP error conditions.

The fchown call conforms to 4.4BSD and SVr4. SVr4 documents additional EINVAL, EIO, EINTR, and ENOLINK error conditions.  

RESTRICTIONS

The chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because chown() may cause immediate access revocation on already open files. Client side caching may lead to a delay between the time where ownership has been changed to allow access for a user and the time where the file can actually be accessed by the user on other clients.

SEE ALSO

chmod(2), flock(2)

fdatasync

NAME

fdatasync - synchronize a file's in-core data with that on disk

SYNOPSIS

#include <unistd.h>

int fdatasync(int fd);  

DESCRIPTION

fdatasync flushes all data buffers of a file to disk (before the system call returns). It resembles fsync but is not required to update the metadata such as access time.

Applications that access databases or log files often write a tiny data fragment (e.g., one line in a log file) and then call fsync immediately in order to ensure that the written data is physically stored on the hard disk. Unfortunately, fsync will always initiate two write operations: one for the newly written data and another one in order to update the modification time stored in the inode. If the modification time is not a part of the transaction concept, fdatasync can be used to avoid unnecessary inode disk write operations.
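
The log-file pattern described above might look like the following sketch. The helper name append_log_line is hypothetical, and O_APPEND is used here so that each line lands at the end of the file.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Append one line to the log at path and flush its data to disk
 * before returning. Returns 0 on success, -1 on any failure. */
int append_log_line(const char *path, const char *line)
{
    size_t len = strlen(line);
    int rc = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);

    if (fd < 0)
        return -1;
    /* Flush the data only; inode timestamps need not be written. */
    if (write(fd, line, len) == (ssize_t)len && fdatasync(fd) == 0)
        rc = 0;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A long-running program would normally keep the descriptor open between lines; it is opened and closed per call here only to keep the example self-contained.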

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
fd is not a valid file descriptor open for writing.
EROFS, EINVAL
fd is bound to a special file which does not support synchronization.
EIO
An error occurred during synchronization.
 

BUGS

Currently (Linux 2.2) fdatasync is equivalent to fsync.  

AVAILABILITY

On POSIX systems on which fdatasync is available, _POSIX_SYNCHRONIZED_IO is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)  

CONFORMING TO

POSIX1b (formerly POSIX.4)  

SEE ALSO

fsync(2), B.O. Gallmeister, POSIX.4, O'Reilly, pp. 220-223 and 343.

flistxattr

NAME

listxattr, llistxattr, flistxattr - list extended attribute names  

SYNOPSIS

#include <sys/types.h>
#include <attr/xattr.h>

ssize_t listxattr (const char *path, char *list, size_t size);
ssize_t llistxattr (const char *path, char *list, size_t size);
ssize_t flistxattr (int filedes, char *list, size_t size);

 

DESCRIPTION

Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).

listxattr retrieves the list of extended attribute names associated with the given path in the filesystem. The list is the set of NUL-terminated names, one after the other. Names of extended attributes to which the calling process does not have access may be omitted from the list. The length of the attribute name list is returned.

llistxattr is identical to listxattr, except in the case of a symbolic link, where the list of names of extended attributes associated with the link itself is retrieved, not the file that it refers to.

flistxattr is identical to listxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.

A single extended attribute name is a simple NUL-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode.

An empty buffer of size zero can be passed into these calls to return the current size of the list of extended attribute names, which can be used to estimate the size of a buffer which is sufficiently large to hold the list of names.  
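
The size query and the walk over the returned names can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the glibc header <sys/xattr.h> rather than the <attr/xattr.h> shown in the synopsis, and count_xattr_names and print_xattr_names are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>  /* assumption: glibc location of listxattr */

/* Count the names in a buffer laid out as name1\0name2\0...;
 * len is the byte count returned by listxattr. */
size_t count_xattr_names(const char *list, size_t len)
{
    size_t count = 0, off = 0;
    while (off < len) {
        const char *end = memchr(list + off, '\0', len - off);
        if (end == NULL)
            break;              /* malformed: last name not terminated */
        count++;
        off = (size_t)(end - list) + 1;
    }
    return count;
}

/* Print every attribute name of path; returns the number of names
 * printed, or -1 on error. */
long print_xattr_names(const char *path)
{
    ssize_t need = listxattr(path, NULL, 0);    /* size query only */
    if (need == -1)
        return -1;
    if (need == 0)
        return 0;

    char *list = malloc((size_t)need);
    if (list == NULL)
        return -1;

    /* The list may have grown since the query; that shows up as ERANGE. */
    ssize_t got = listxattr(path, list, (size_t)need);
    if (got == -1) {
        free(list);
        return -1;
    }

    for (ssize_t off = 0; off < got; off += (ssize_t)strlen(list + off) + 1)
        puts(list + off);

    long n = (long)count_xattr_names(list, (size_t)got);
    free(list);
    return n;
}
```

Because the size from the first call is only an estimate, a robust caller would retry with a larger buffer when the second call fails with ERANGE.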

EXAMPLES

The list of names is returned as an unordered array of NUL-terminated character strings (attribute names are separated by NUL characters), like this:

user.name1\0system.name1\0user.name2\0

Filesystems like ext2, ext3 and XFS which implement POSIX ACLs using extended attributes, might return a list like this:

system.posix_acl_access\0system.posix_acl_default\0

 

RETURN VALUE

On success, a positive number is returned indicating the size of the extended attribute name list. On failure, -1 is returned and errno is set appropriately.

If the size of the list buffer is too small to hold the result, errno is set to ERANGE.

If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.

The errors documented for the stat(2) system call are also applicable here.  

AUTHORS

Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.  

SEE ALSO

getfattr(1), setfattr(1), open(2), stat(2), getxattr(2), setxattr(2), removexattr(2), and attr(5).

fork

NAME

fork - create a child process  

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>

pid_t fork(void);  

DESCRIPTION

fork creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.

Under Linux, fork is implemented using copy-on-write pages, so the only penalty incurred by fork is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.

RETURN VALUE

On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.
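
The three possible return values are usually handled as in the sketch below. run_child is a hypothetical helper; waitpid(2) is used here only to collect the child's exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the exit status the
 * parent collects, or -1 on failure. run_child is a hypothetical helper. */
int run_child(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;              /* failure: no child was created */
    if (pid == 0)
        _exit(code);            /* child: fork returned 0 */

    /* parent: fork returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit rather than exit so that it does not flush stdio buffers inherited from the parent a second time.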

ERRORS

EAGAIN
fork cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
ENOMEM
fork failed to allocate the necessary kernel structures because memory is tight.
 

CONFORMING TO

The fork call conforms to SVr4, SVID, POSIX, X/OPEN, BSD 4.3.  

SEE ALSO

clone(2), execve(2), vfork(2), wait(2)

fremovexattr

NAME

removexattr, lremovexattr, fremovexattr - remove an extended attribute  

SYNOPSIS

#include <sys/types.h>
#include <attr/xattr.h>

int removexattr (const char *path, const char *name);
int lremovexattr (const char *path, const char *name);
int fremovexattr (int filedes, const char *name);

 

DESCRIPTION

Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).

removexattr removes the extended attribute identified by name and associated with the given path in the filesystem.

lremovexattr is identical to removexattr, except in the case of a symbolic link, where the extended attribute is removed from the link itself, not the file that it refers to.

fremovexattr is identical to removexattr, only the extended attribute is removed from the open file pointed to by filedes (as returned by open(2)) in place of path.

An extended attribute name is a simple NUL-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode.

RETURN VALUE

On success, zero is returned. On failure, -1 is returned and errno is set appropriately.

If the named attribute does not exist, errno is set to ENOATTR.

If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.

The errors documented for the stat(2) system call are also applicable here.  

AUTHORS

Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.  

SEE ALSO

getfattr(1), setfattr(1), open(2), stat(2), setxattr(2), getxattr(2), listxattr(2), and attr(5).

fstat

NAME

stat, fstat, lstat - get file status  

SYNOPSIS

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int stat(const char *file_name, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);  

DESCRIPTION

These functions return information about the specified file. You do not need any access rights to the file to get this information but you need search rights to all directories named in the path leading to the file.

stat stats the file pointed to by file_name and fills in buf.

lstat is identical to stat, except in the case of a symbolic link, where the link itself is stat-ed, not the file that it refers to.

fstat is identical to stat, only the open file pointed to by filedes (as returned by open(2)) is stat-ed in place of file_name.

They all return a stat structure, which contains the following fields:

struct stat {
    dev_t     st_dev;     /* device */
    ino_t     st_ino;     /* inode */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device type (if inode device) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;  /* number of blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};

The value st_size gives the size of the file (if it is a regular file or a symlink) in bytes. The size of a symlink is the length of the pathname it contains, without trailing NUL.

The value st_blocks gives the size of the file in 512-byte blocks. (This may be smaller than st_size/512 e.g. when the file has holes.) The value st_blksize gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)

Not all of the Linux filesystems implement all of the time fields. Some file system types allow mounting in such a way that file accesses do not cause an update of the st_atime field. (See "noatime" in mount(8).)

The field st_atime is changed by file accesses, e.g. by execve(2), mknod(2), pipe(2), utime(2) and read(2) (of more than zero bytes). Other routines, like mmap(2), may or may not update st_atime.

The field st_mtime is changed by file modifications, e.g. by mknod(2), truncate(2), utime(2) and write(2) (of more than zero bytes). Moreover, st_mtime of a directory is changed by the creation or deletion of files in that directory. The st_mtime field is not changed for changes in owner, group, hard link count, or mode.

The field st_ctime is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.).

The following POSIX macros are defined to check the file type:

S_ISREG(m)
is it a regular file?
S_ISDIR(m)
directory?
S_ISCHR(m)
character device?
S_ISBLK(m)
block device?
S_ISFIFO(m)
fifo?
S_ISLNK(m)
symbolic link? (Not in POSIX.1-1996.)
S_ISSOCK(m)
socket? (Not in POSIX.1-1996.)

The following flags are defined for the st_mode field:

S_IFMT     0170000   bitmask for the file type bitfields
S_IFSOCK   0140000   socket
S_IFLNK    0120000   symbolic link
S_IFREG    0100000   regular file
S_IFBLK    0060000   block device
S_IFDIR    0040000   directory
S_IFCHR    0020000   character device
S_IFIFO    0010000   fifo
S_ISUID    0004000   set UID bit
S_ISGID    0002000   set GID bit (see below)
S_ISVTX    0001000   sticky bit (see below)
S_IRWXU    00700     mask for file owner permissions
S_IRUSR    00400     owner has read permission
S_IWUSR    00200     owner has write permission
S_IXUSR    00100     owner has execute permission
S_IRWXG    00070     mask for group permissions
S_IRGRP    00040     group has read permission
S_IWGRP    00020     group has write permission
S_IXGRP    00010     group has execute permission
S_IRWXO    00007     mask for permissions for others (not in group)
S_IROTH    00004     others have read permission
S_IWOTH    00002     others have write permission
S_IXOTH    00001     others have execute permission

The set GID bit (S_ISGID) has several special uses: For a directory it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective gid of the creating process, and directories created there will also get the S_ISGID bit set. For a file that does not have the group execution bit (S_IXGRP) set, it indicates mandatory file/record locking. The "sticky" bit (S_ISVTX) on a directory means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, and by root.
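
The nine permission flags can be tested individually against st_mode. The following sketch renders them in the familiar "rwxr-xr--" form; perm_string is a hypothetical helper, and it returns a static buffer, so it is not safe to call from several threads at once.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Render the owner/group/other permission bits of mode as a
 * nine-character string such as "rwxr-xr--". The result lives in a
 * static buffer that the next call overwrites.
 * perm_string is a hypothetical helper. */
const char *perm_string(mode_t mode)
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static char out[10];

    for (int i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? "rwx"[i % 3] : '-';
    out[9] = '\0';
    return out;
}
```

Passing sb.st_mode from a successful stat call gives the permission column that ls -l would show, without the set-ID and sticky decorations.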

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
filedes is bad.
ENOENT
A component of the path file_name does not exist, or the path is an empty string.
ENOTDIR
A component of the path is not a directory.
ELOOP
Too many symbolic links encountered while traversing the path.
EFAULT
Bad address.
EACCES
Permission denied.
ENOMEM
Out of memory (i.e. kernel memory).
ENAMETOOLONG
File name too long.
 

CONFORMING TO

The stat and fstat calls conform to SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The lstat call conforms to 4.3BSD and SVr4. SVr4 documents additional fstat error conditions EINTR, ENOLINK, and EOVERFLOW. SVr4 documents additional stat and lstat error conditions EACCES, EINTR, EMULTIHOP, ENOLINK, and EOVERFLOW. Use of the st_blocks and st_blksize fields may be less portable. (They were introduced in BSD and are not specified by POSIX. Their interpretation differs between systems, and possibly on a single system when NFS mounts are involved.)

POSIX does not describe the S_IFMT, S_IFSOCK, S_IFLNK, S_IFREG, S_IFBLK, S_IFDIR, S_IFCHR, S_IFIFO, S_ISVTX bits, but instead demands the use of the macros S_ISDIR(), etc. The S_ISLNK and S_ISSOCK macros are not in POSIX.1-1996, but both will be in the next POSIX standard; the former is from SVID 4v2, the latter from SUSv2.

Unix V7 (and later systems) had S_IREAD, S_IWRITE, S_IEXEC, where POSIX prescribes the synonyms S_IRUSR, S_IWUSR, S_IXUSR.  

OTHER SYSTEMS

Values that have been (or are) in use on various systems:
hex    name       ls   octal    description
f000   S_IFMT          170000   mask for file type
0000                   000000   SCO out-of-service inode, BSD unknown type
                                SVID-v2 and XPG2 have both 0 and 0100000 for ordinary file
1000   S_IFIFO    p|   010000   fifo (named pipe)
2000   S_IFCHR    c    020000   character special (V7)
3000   S_IFMPC         030000   multiplexed character special (V7)
4000   S_IFDIR    d/   040000   directory (V7)
5000   S_IFNAM         050000   XENIX named special file
                                with two subtypes, distinguished by st_rdev values 1, 2:
0001   S_INSEM    s    000001   XENIX semaphore subtype of IFNAM
0002   S_INSHD    m    000002   XENIX shared data subtype of IFNAM
6000   S_IFBLK    b    060000   block special (V7)
7000   S_IFMPB         070000   multiplexed block special (V7)
8000   S_IFREG    -    100000   regular (V7)
9000   S_IFCMP         110000   VxFS compressed
9000   S_IFNWK    n    110000   network special (HP-UX)
a000   S_IFLNK    l@   120000   symbolic link (BSD)
b000   S_IFSHAD        130000   Solaris shadow inode for ACL (not seen by userspace)
c000   S_IFSOCK   s=   140000   socket (BSD; also "S_IFSOC" on VxFS)
d000   S_IFDOOR   D>   150000   Solaris door
e000   S_IFWHT    w%   160000   BSD whiteout (not used for inode)

0200   S_ISVTX         001000   "sticky bit": save swapped text even after use (V7)
                                reserved (SVID-v2)
                                On non-directories: don't cache this file (SunOS)
                                On directories: restricted deletion flag (SVID-v4.2)
0400   S_ISGID         002000   set group ID on execution (V7)
                                for directories: use BSD semantics for propagation of gid
0400   S_ENFMT         002000   SysV file locking enforcement (shared w/ S_ISGID)
0800   S_ISUID         004000   set user ID on execution (V7)
0800   S_CDF           004000   directory is a context dependent file (HP-UX)

A sticky command appeared in Version 32V AT&T UNIX.

 

SEE ALSO

chmod(2), chown(2), readlink(2), utime(2)

fstatvfs

NAME

statvfs, fstatvfs - get file system statistics  

SYNOPSIS

#include <sys/statvfs.h>

int statvfs(const char *path, struct statvfs *buf);
int fstatvfs(int fd, struct statvfs *buf);  

DESCRIPTION

The function statvfs returns information about a mounted file system. path is the path name of any file within the mounted filesystem. buf is a pointer to a statvfs structure defined approximately as follows:

struct statvfs {
    unsigned long f_bsize;   /* file system block size */
    unsigned long f_frsize;  /* fragment size */
    fsblkcnt_t    f_blocks;  /* size of fs in f_frsize units */
    fsblkcnt_t    f_bfree;   /* # free blocks */
    fsblkcnt_t    f_bavail;  /* # free blocks for non-root */
    fsfilcnt_t    f_files;   /* # inodes */
    fsfilcnt_t    f_ffree;   /* # free inodes */
    fsfilcnt_t    f_favail;  /* # free inodes for non-root */
    unsigned long f_fsid;    /* file system id */
    unsigned long f_flag;    /* mount flags */
    unsigned long f_namemax; /* maximum filename length */
};

Here the types fsblkcnt_t and fsfilcnt_t are defined in <sys/types.h>. Both used to be unsigned long.

The field f_flag is a bit mask (of mount flags, see mount(8)). Bits defined by POSIX are

ST_RDONLY
Read-only file system.
ST_NOSUID
Setuid/setgid bits are ignored by exec(2).

It is unspecified whether all members of the returned struct have meaningful values on all filesystems.

fstatvfs returns the same information about an open file referenced by descriptor fd.  
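
A common use of these fields is to estimate the free space available to an ordinary user. The sketch below assumes that f_bavail is counted in the same f_frsize units as f_blocks, which the structure comments above do not state explicitly; fs_avail_bytes is a hypothetical helper.

```c
#include <sys/statvfs.h>

/* Bytes available to unprivileged users on the file system that
 * holds path, or -1 on error.
 * fs_avail_bytes is a hypothetical helper. */
long long fs_avail_bytes(const char *path)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) == -1)
        return -1;

    /* Assumption: f_bavail uses f_frsize units, like f_blocks. */
    return (long long)vfs.f_bavail * (long long)vfs.f_frsize;
}
```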

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
(fstatvfs) fd is not a valid open file descriptor.
EACCES
(statvfs) Search permission is denied for a component of the path prefix of path.
ELOOP
(statvfs) Too many symbolic links were encountered in translating path.
ENAMETOOLONG
(statvfs) path is too long.
ENOENT
(statvfs) The file referred to by path does not exist.
ENOTDIR
(statvfs) A component of the path prefix of path is not a directory.
EFAULT
buf or path points to an invalid address.
EINTR
This call was interrupted by a signal.
EIO
An I/O error occurred while reading from the file system.
ENOMEM
Insufficient kernel memory was available.
ENOSYS
The file system does not support this call.
EOVERFLOW
Some values were too large to be represented in the returned struct.

 

CONFORMING TO

Solaris, Irix, POSIX 1003.1-2001  

NOTES

The Linux kernel has system calls statfs, fstatfs to support this library call.

The current glibc implementation of

pathconf(path, _PC_REC_XFER_ALIGN);
pathconf(path, _PC_ALLOC_SIZE_MIN);
pathconf(path, _PC_REC_MIN_XFER_SIZE);

uses the f_frsize, f_frsize, and f_bsize fields, respectively, of the return value of statvfs(path,buf).

SEE ALSO

statfs(2)

ftruncate

NAME

truncate, ftruncate - truncate a file to a specified length  

SYNOPSIS

#include <unistd.h>
#include <sys/types.h>

int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);  

DESCRIPTION

The truncate and ftruncate functions cause the regular file named by path or referenced by fd to be truncated to a size of precisely length bytes.

If the file previously was larger than this size, the extra data is lost. If the file previously was shorter, it is extended, and the extended part reads as zero bytes.

The file pointer is not changed.

If the size changed, then the ctime and mtime fields for the file are updated, and suid and sgid mode bits may be cleared.

With ftruncate, the file must be open for writing; with truncate, the file must be writable.  
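
Both the shrinking and the extending behaviour can be observed with a sketch like the one below, which resizes a file and reads the new size back with fstat(2). resize_file is a hypothetical helper and the file mode is an arbitrary choice.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Set the file at path (created if missing) to exactly length bytes
 * and return the size reported by fstat afterwards, or -1 on error.
 * resize_file is a hypothetical helper. */
off_t resize_file(const char *path, off_t length)
{
    /* ftruncate needs a descriptor that is open for writing. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    struct stat sb;
    off_t result = -1;
    if (ftruncate(fd, length) == 0 && fstat(fd, &sb) == 0)
        result = sb.st_size;

    close(fd);
    return result;
}
```

Growing a file this way and then reading the new tail should yield zero bytes, as the description states for XSI-compliant systems.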

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

For truncate:
EACCES
Search permission is denied for a component of the path prefix, or the named file is not writable by the user.
EFAULT
Path points outside the process's allocated address space.
EFBIG
The argument length is larger than the maximum file size. (XSI)
EINTR
A signal was caught during execution.
EINVAL
The argument length is negative or larger than the maximum file size.
EIO
An I/O error occurred updating the inode.
EISDIR
The named file is a directory.
ELOOP
Too many symbolic links were encountered in translating the pathname.
ENAMETOOLONG
A component of a pathname exceeded 255 characters, or an entire path name exceeded 1023 characters.
ENOENT
The named file does not exist.
ENOTDIR
A component of the path prefix is not a directory.
EROFS
The named file resides on a read-only file system.
ETXTBSY
The file is a pure procedure (shared text) file that is being executed.

For ftruncate the same errors apply, but instead of things that can be wrong with path, we now have things that can be wrong with fd:

EBADF
The fd is not a valid descriptor.
EBADF or EINVAL
The fd is not open for writing.
EINVAL
The fd does not reference a regular file.
 

CONFORMING TO

4.4BSD, SVr4 (these function calls first appeared in BSD 4.2). POSIX 1003.1-1996 has ftruncate. POSIX 1003.1-2001 also has truncate, as an XSI extension.

SVr4 documents additional truncate error conditions EMFILE, EMULTIHOP, ENFILE, ENOLINK. SVr4 documents for ftruncate an additional EAGAIN error condition.

NOTES

The above description is for XSI-compliant systems. For non-XSI-compliant systems, the POSIX standard allows two behaviours for ftruncate when length exceeds the file length (note that truncate is not specified at all in such an environment): either returning an error, or extending the file. (Most Unices follow the XSI requirement.)  

SEE ALSO

open(2)

getcontext

NAME

getcontext, setcontext - get or set the user context  

SYNOPSIS

#include <ucontext.h>

int getcontext(ucontext_t *ucp);
int setcontext(const ucontext_t *ucp);

where:

ucp
points to a structure defined in <ucontext.h> containing the signal mask, execution stack, and machine registers.
 

DESCRIPTION

getcontext(2) gets the current context of the calling process, storing it in the ucontext struct pointed to by ucp.

setcontext(2) sets the context of the calling process to the state stored in the ucontext struct pointed to by ucp. The struct must either have been created by getcontext(2) or have been passed as the third parameter of the sigaction(2) signal handler.

The ucontext struct created by getcontext(2) is defined in <ucontext.h> as follows:

typedef struct ucontext {
    unsigned long int uc_flags;
    struct ucontext  *uc_link;
    stack_t           uc_stack;
    mcontext_t        uc_mcontext;
    __sigset_t        uc_sigmask;
    struct _fpstate   __fpregs_mem;
} ucontext_t;

 

RETURN VALUES

getcontext(2) returns 0 on success and -1 on failure. setcontext(2) does not return a value on success and returns -1 on failure.  
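
The following sketch shows the pair working together: setcontext resumes execution just after the getcontext call that saved the context, so the code after it runs again. count_passes is a hypothetical helper, and the example assumes a C library that provides <ucontext.h>, such as glibc.

```c
#include <ucontext.h>

/* Re-enter the point saved by getcontext until the counter reaches
 * `times`. Returns the number of passes, or -1 if getcontext fails.
 * count_passes is a hypothetical helper. */
int count_passes(int times)
{
    volatile int passes = 0;    /* volatile: kept in memory, so it
                                   survives the register restore */
    ucontext_t ctx;

    if (getcontext(&ctx) == -1)
        return -1;

    /* Execution continues here after every successful setcontext. */
    passes++;
    if (passes < times)
        setcontext(&ctx);       /* does not return on success */

    return passes;
}
```

The counter is declared volatile because setcontext restores the machine registers saved earlier; a counter held only in a register would be reset on each pass.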

STANDARDS

These functions conform to: XPG4-UNIX.

NOTES

When a signal handler executes, the current user context is saved and a new context is created by the kernel. If the calling process leaves the signal handler using longjmp(3), the original context cannot be restored, and the results of future calls to getcontext(2) are unpredictable. To avoid this problem, use siglongjmp(3) or setcontext(2) in signal handlers instead of longjmp(3).

SEE ALSO

sigaction(2), sigaltstack(2), sigprocmask(2), sigsetjmp(3), setjmp(3).

getdomainname

NAME

getdomainname, setdomainname - get/set domain name  

SYNOPSIS

#include <unistd.h>

int getdomainname(char *name, size_t len);
int setdomainname(const char *name, size_t len);  

DESCRIPTION

These functions are used to access or to change the domain name of the current processor. If the NUL-terminated domain name requires more than len bytes, getdomainname returns the first len bytes (glibc) or returns an error (libc).  

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EINVAL
For getdomainname under libc: name is NULL or name is longer than len bytes.
EINVAL
For setdomainname: len was negative or too large.
EPERM
For setdomainname: the caller was not the superuser.
EFAULT
For setdomainname: name pointed outside of user address space.
 

CONFORMING TO

POSIX does not specify these calls.  

SEE ALSO

gethostname(2), sethostname(2), uname(2)

getegid

NAME

getgid, getegid - get group identity  

SYNOPSIS

#include <unistd.h>
#include <sys/types.h>

gid_t getgid(void);
gid_t getegid(void);  

DESCRIPTION

getgid returns the real group ID of the current process.

getegid returns the effective group ID of the current process.

The real ID corresponds to the ID of the calling process. The effective ID corresponds to the set ID bit on the file being executed.  

ERRORS

These functions are always successful.  

CONFORMING TO

POSIX, BSD 4.3  

SEE ALSO

setregid(2), setgid(2)

getgid

NAME

getgid, getegid - get group identity  

SYNOPSIS

#include <unistd.h>
#include <sys/types.h>

gid_t getgid(void);
gid_t getegid(void);  

DESCRIPTION

getgid returns the real group ID of the current process.

getegid returns the effective group ID of the current process.

The real ID corresponds to the ID of the calling process. The effective ID corresponds to the set ID bit on the file being executed.  

ERRORS

These functions are always successful.  

CONFORMING TO

POSIX, BSD 4.3  

SEE ALSO

setregid(2), setgid(2)

gethostid

NAME

gethostid, sethostid - get or set the unique identifier of the current host  

SYNOPSIS

#include <unistd.h>

long gethostid(void);
int sethostid(long hostid);  

DESCRIPTION

Get or set a unique 32-bit identifier for the current machine. The 32-bit identifier is intended to be unique among all UNIX systems in existence. This normally resembles the Internet address for the local machine, as returned by gethostbyname(3), and thus usually never needs to be set.

The sethostid call is restricted to the superuser.

The hostid argument is stored in the file /etc/hostid.  

RETURN VALUE

gethostid returns the 32-bit identifier for the current host as set by sethostid(2).  

CONFORMING TO

4.2BSD. These functions were dropped in 4.4BSD. POSIX.1 does not define these functions, but ISO/IEC 9945-1:1990 mentions them in B.4.4.1. SVr4 includes gethostid but not sethostid.  

FILES

/etc/hostid  

SEE ALSO

hostid(1), gethostbyname(3)

getitimer

NAME

getitimer, setitimer - get or set value of an interval timer  

SYNOPSIS

#include <sys/time.h>

int getitimer(int which, struct itimerval *value);
int setitimer(int which, const struct itimerval *value, struct itimerval *ovalue);
 

DESCRIPTION

The system provides each process with three interval timers, each decrementing in a distinct time domain. When any timer expires, a signal is sent to the process, and the timer (potentially) restarts.
ITIMER_REAL
decrements in real time, and delivers SIGALRM upon expiration.
ITIMER_VIRTUAL
decrements only when the process is executing, and delivers SIGVTALRM upon expiration.
ITIMER_PROF
decrements both when the process executes and when the system is executing on behalf of the process. Coupled with ITIMER_VIRTUAL, this timer is usually used to profile the time spent by the application in user and kernel space. SIGPROF is delivered upon expiration.

Timer values are defined by the following structures:

struct itimerval {
    struct timeval it_interval; /* next value */
    struct timeval it_value;    /* current value */
};

struct timeval {
    long tv_sec;                /* seconds */
    long tv_usec;               /* microseconds */
};

The function getitimer fills the structure indicated by value with the current setting for the timer indicated by which (one of ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF). The element it_value is set to the amount of time remaining on the timer, or zero if the timer is disabled. Similarly, it_interval is set to the reset value. The function setitimer sets the indicated timer to the value in value. If ovalue is nonzero, the old value of the timer is stored there.

Timers decrement from it_value to zero, generate a signal, and reset to it_interval. A timer which is set to zero (it_value is zero or the timer expires and it_interval is zero) stops.

Both tv_sec and tv_usec are significant in determining the duration of a timer.

Timers will never expire before the requested time, instead expiring some short, constant time afterwards, dependent on the system timer resolution (currently 10ms). Upon expiration, a signal will be generated and the timer reset. If the timer expires while the process is active (always true for ITIMER_VIRTUAL) the signal will be delivered immediately when generated. Otherwise the delivery will be offset by a small time dependent on the system loading.

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EFAULT
value or ovalue are not valid pointers.
EINVAL
which is not one of ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF.
 

CONFORMING TO

SVr4, 4.4BSD (This call first appeared in 4.2BSD).  

SEE ALSO

gettimeofday(2), sigaction(2), signal(2)  

BUGS

Under Linux, the generation and delivery of a signal are distinct, and each signal is permitted only one outstanding event. It's therefore conceivable that under pathologically heavy loading, ITIMER_REAL will expire before the signal from a previous expiration has been delivered. The second signal in such an event will be lost.

getpeername

NAME

getpeername - get name of connected peer socket  

SYNOPSIS

#include <sys/socket.h>

int getpeername(int s, struct sockaddr *name, socklen_t *namelen);  

DESCRIPTION

Getpeername returns the name of the peer connected to socket s. The namelen parameter should be initialized to indicate the amount of space pointed to by name. On return it contains the actual size of the name returned (in bytes). The name is truncated if the buffer provided is too small.  
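
The in/out use of namelen is shown in the sketch below, which queries one end of a connected AF_UNIX socket pair. socketpair_peer_family is a hypothetical helper; struct sockaddr_storage is used only as a buffer large enough for any address family.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a connected AF_UNIX socket pair and return the address
 * family that getpeername reports for one end, or -1 on error.
 * socketpair_peer_family is a hypothetical helper. */
int socketpair_peer_family(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    struct sockaddr_storage addr;
    socklen_t len = sizeof addr;    /* in: buffer size, out: name size */
    int family = -1;

    if (getpeername(sv[0], (struct sockaddr *)&addr, &len) == 0)
        family = addr.ss_family;

    close(sv[0]);
    close(sv[1]);
    return family;
}
```

After the call, len holds the size of the returned name; comparing it with the buffer size is how a caller detects truncation.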

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
The argument s is not a valid descriptor.
ENOTSOCK
The argument s is a file, not a socket.
ENOTCONN
The socket is not connected.
ENOBUFS
Insufficient resources were available in the system to perform the operation.
EFAULT
The name parameter points to memory not in a valid part of the process address space.
 

CONFORMING TO

SVr4, 4.4BSD (the getpeername function call first appeared in 4.2BSD).  

NOTE

The third argument of getpeername is in reality an "int *" (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t. See also accept(2).

SEE ALSO

accept(2), bind(2), getsockname(2)

getpgrp

NAME

setpgid, getpgid, setpgrp, getpgrp - set/get process group  

SYNOPSIS

#include <unistd.h>

int setpgid(pid_t pid, pid_t pgid);
pid_t getpgid(pid_t pid);
int setpgrp(void);
pid_t getpgrp(void);  

DESCRIPTION

setpgid sets the process group ID of the process specified by pid to pgid. If pid is zero, the process ID of the current process is used. If pgid is zero, the process ID of the process specified by pid is used. If setpgid is used to move a process from one process group to another (as is done by some shells when creating pipelines), both process groups must be part of the same session. In this case, the pgid specifies an existing process group to be joined and the session ID of that group must match the session ID of the joining process.

getpgid returns the process group ID of the process specified by pid. If pid is zero, the process ID of the current process is used.

The call setpgrp() is equivalent to setpgid(0,0).

Similarly, getpgrp() is equivalent to getpgid(0). Each process group is a member of a session and each process is a member of the session of which its process group is a member.
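
The effect of setpgid(0, 0) can be checked in a child process, as in the sketch below. child_leads_new_group is a hypothetical helper, and the feature-test macro follows the glibc advice for obtaining the getpgid prototype.

```c
#define _XOPEN_SOURCE 700   /* exposes the getpgid prototype under glibc */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that moves itself into a new process group with
 * setpgid(0, 0). Returns 1 if the child then leads its own group,
 * 0 if it does not, and -1 on error.
 * child_leads_new_group is a hypothetical helper. */
int child_leads_new_group(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        if (setpgid(0, 0) == -1)
            _exit(2);
        /* With pid and pgid both zero, the new group ID is our own PID. */
        _exit(getpgid(0) == getpid() && getpgrp() == getpid() ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```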

Process groups are used for distribution of signals, and by terminals to arbitrate requests for their input: Processes that have the same process group as the terminal are foreground and may read, while others will block with a signal if they attempt to read. These calls are thus used by programs such as csh(1) to create process groups in implementing job control. The TIOCGPGRP and TIOCSPGRP calls described in termios(3) are used to get/set the process group of the control terminal.

If a session has a controlling terminal, CLOCAL is not set and a hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, the SIGHUP signal will be sent to each process in the foreground process group of the controlling terminal.

If the exit of the process causes a process group to become orphaned, and if any member of the newly-orphaned process group is stopped, then a SIGHUP signal followed by a SIGCONT signal will be sent to each process in the newly-orphaned process group.

 

RETURN VALUE

On success, setpgid and setpgrp return zero. On error, -1 is returned, and errno is set appropriately.

getpgid returns a process group on success. On error, -1 is returned, and errno is set appropriately.

getpgrp always returns the current process group.  

ERRORS

EINVAL
pgid is less than 0 (setpgid, setpgrp).
EACCES
An attempt was made to change the process group ID of one of the children of the calling process and the child had already performed an execve (setpgid, setpgrp).
EPERM
An attempt was made to move a process into a process group in a different session, or to change the process group ID of one of the children of the calling process and the child was in a different session, or to change the process group ID of a session leader (setpgid, setpgrp).
ESRCH
For getpgid: pid does not match any process. For setpgid: pid is not the current process and not a child of the current process.
 

CONFORMING TO

The functions setpgid and getpgrp conform to POSIX.1. The function setpgrp is from BSD 4.2. The function getpgid conforms to SVr4.  

NOTES

POSIX took setpgid from the BSD function setpgrp. Also SysV has a function with the same name, but it is identical to setsid(2).

To get the prototypes under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.  

SEE ALSO

getuid(2), setsid(2), tcgetpgrp(3), tcsetpgrp(3), termios(3)

getpmsg

NAME

afs_syscall, break, ftime, getpmsg, gtty, lock, mpx, prof, profil, putpmsg, security, stty, ulimit - unimplemented system calls  

SYNOPSIS

Unimplemented system calls.  

DESCRIPTION

These system calls are not implemented in the Linux 2.4 kernel.  

RETURN VALUE

These system calls always return -1 and set errno to ENOSYS.  

NOTES

Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.

Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.

Some system calls, like ipc(2) and {create,init,delete}_module(2) only exist when the Linux kernel was built with support for them.

 

SEE ALSO

obsolete(2)

getpriority

NAME

getpriority, setpriority - get/set program scheduling priority  

SYNOPSIS

#include <sys/time.h>
#include <sys/resource.h>

int getpriority(int which, int who);
int setpriority(int which, int who, int prio);  

DESCRIPTION

The scheduling priority of the process, process group, or user, as indicated by which and who is obtained with the getpriority call and set with the setpriority call. Which is one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER, and who is interpreted relative to which (a process identifier for PRIO_PROCESS, process group identifier for PRIO_PGRP, and a user ID for PRIO_USER). A zero value for who denotes (respectively) the calling process, the process group of the calling process, or the real user ID of the calling process. Prio is a value in the range -20 to 20 (but see the Notes below). The default priority is 0; lower priorities cause more favorable scheduling.

The getpriority call returns the highest priority (lowest numerical value) enjoyed by any of the specified processes. The setpriority call sets the priorities of all of the specified processes to the specified value. Only the super-user may lower priorities.  

RETURN VALUE

Since getpriority can legitimately return the value -1, it is necessary to clear the external variable errno prior to the call, then check it afterwards to determine if a -1 is an error or a legitimate value. The setpriority call returns 0 if there is no error, or -1 if there is.  
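
The errno convention above might be wrapped as in the following sketch. read_priority is a hypothetical helper, not part of any library; it only separates the "legitimate -1" case from a real failure.

```c
#include <errno.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: store the priority in *prio and return 0,
 * or return -1 if getpriority really failed (errno is then set). */
int read_priority(int which, int who, int *prio)
{
    int value;

    errno = 0;                        /* clear before the call */
    value = getpriority(which, who);
    if (value == -1 && errno != 0)    /* -1 with errno set is an error */
        return -1;

    *prio = value;                    /* otherwise -1 is a real priority */
    return 0;
}
```

For example, read_priority(PRIO_PROCESS, 0, &p) asks for the priority of the calling process.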

ERRORS

ESRCH
No process was located using the which and who values specified.
EINVAL
Which was not one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER.

In addition to the errors indicated above, setpriority may fail if:

EPERM
A process was located, but neither the effective nor the real user ID of the caller matches its effective user ID.
EACCES
A non super-user attempted to lower a process priority.
 

NOTES

The details on the condition for EPERM depend on the system. The above description is what SUSv3 says, and seems to be followed on all SYSV-like systems. Linux requires the real or effective user ID of the caller to match the real user of the process who (instead of its effective user ID). All BSD-like systems (SunOS 4.1.3, Ultrix 4.2, BSD 4.3, FreeBSD 4.3, OpenBSD-2.5, ...) require the effective user ID of the caller to match the real or effective user ID of the process who.

The actual priority range varies between kernel versions. Linux before 1.3.36 had -infinity..15. Linux since 1.3.43 has -20..19, and the system call getpriority returns 40..1 for these values (since negative numbers are error codes). The library call converts N into 20-N.
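
As a worked illustration of the 20-N conversion just described (kernel_to_nice is a hypothetical name used only here; the real library code may differ in detail):

```c
/* Hypothetical helper mirroring the 20-N conversion: the system call
 * reports 40..1, which corresponds to priorities -20..19. */
int kernel_to_nice(int raw)
{
    return 20 - raw;
}
```

So a raw value of 40 maps to -20, 20 maps to the default priority 0, and 1 maps to 19.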

Including <sys/time.h> is not required these days, but increases portability. (Indeed, <sys/resource.h> defines the rusage structure with fields of type struct timeval defined in <sys/time.h>.)  

CONFORMING TO

SVr4, 4.4BSD (these function calls first appeared in 4.2BSD).  

SEE ALSO

nice(1), fork(2), renice(8)

getresuid

NAME

getresuid, getresgid - get real, effective and saved user or group ID  

SYNOPSIS

#define _GNU_SOURCE
#include <unistd.h>

int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid);
int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid);  

DESCRIPTION

getresuid and getresgid (both introduced in Linux 2.1.44) get the real, effective and saved user IDs (resp. group IDs) of the current process.

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EFAULT
One of the arguments specified an address outside the calling program's address space.
 

CONFORMING TO

This call is Linux-specific. The prototype is given by glibc since version 2.3.2 provided _GNU_SOURCE is defined.  

SEE ALSO

getuid(2), setuid(2), setreuid(2), setresuid(2)

getrusage

NAME

getrlimit, getrusage, setrlimit - get/set resource limits and usage  

SYNOPSIS

#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

int getrlimit(int resource, struct rlimit *rlim);
int getrusage(int who, struct rusage *usage);
int setrlimit(int resource, const struct rlimit *rlim);  

DESCRIPTION

getrlimit and setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the rlimit structure (the rlim argument to both getrlimit() and setrlimit()):

struct rlimit {
    rlim_t rlim_cur;   /* Soft limit */
    rlim_t rlim_max;   /* Hard limit (ceiling for rlim_cur) */
};

The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. A privileged process may make arbitrary changes to either limit value.

The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()).

resource must be one of:

RLIMIT_AS
The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process when no alternate stack has been made available). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
RLIMIT_CORE
Maximum size of core file. When 0 no core dump files are created. When nonzero, larger dumps are truncated to this size.
RLIMIT_CPU
CPU time limit in seconds. When the process reaches the soft limit, it is sent a SIGXCPU signal. The default action for this signal is to terminate the process. However, the signal can be caught, and the handler can return control to the main program. If the process continues to consume CPU time, it will be sent SIGXCPU once per second until the hard limit is reached, at which time it is sent SIGKILL. (This latter point describes Linux 2.2 and 2.4 behaviour. Implementations vary in how they treat processes which continue to consume CPU time after reaching the soft limit. Portable applications that need to catch this signal should perform an orderly termination upon first receipt of SIGXCPU.)
RLIMIT_DATA
The maximum size of the process's data segment (initialized data, uninitialized data, and heap). This limit affects calls to brk() and sbrk(), which fail with the error ENOMEM upon encountering the soft limit of this resource.
RLIMIT_FSIZE
The maximum size of files that the process may create. Attempts to extend a file beyond this limit result in delivery of a SIGXFSZ signal. By default, this signal terminates a process, but a process can catch this signal instead, in which case the relevant system call (e.g., write(), truncate()) fails with the error EFBIG.
RLIMIT_LOCKS
A limit on the combined number of flock() locks and fcntl() leases that this process may establish. (Early Linux 2.4 only.)
RLIMIT_MEMLOCK
The maximum number of bytes of virtual memory that may be locked into RAM using mlock() and mlockall().
RLIMIT_NOFILE
Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE.
RLIMIT_NPROC
The maximum number of processes that can be created for the real user ID of the calling process. Upon encountering this limit, fork() fails with the error EAGAIN.
RLIMIT_RSS
Specifies the limit (in pages) of the process's resident set (the number of virtual pages resident in RAM). This limit only has effect in Linux 2.4 onwards, and there only affects calls to madvise() specifying MADV_WILLNEED.
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon reaching this limit, a SIGSEGV signal is generated. To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)).

RLIMIT_OFILE is the BSD name for RLIMIT_NOFILE.
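
A minimal sketch of adjusting a soft limit, assuming the caller only wants to lower it: the hypothetical helper disable_core_dumps sets the RLIMIT_CORE soft limit to 0 and leaves the hard limit alone, which an unprivileged process is allowed to do.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: set the RLIMIT_CORE soft limit to 0 while
 * leaving the hard limit untouched. Returns 0 on success, -1 on error. */
int disable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) < 0)
        return -1;

    lim.rlim_cur = 0;    /* soft limit: always within 0..rlim_max */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Reading the current values first keeps rlim_max unchanged, so the soft limit can be raised again later up to the hard limit.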

getrusage returns the current resource usages, for a who of either RUSAGE_SELF or RUSAGE_CHILDREN. The former asks for resources used by the current process, the latter for resources used by those of its children that have terminated and have been waited for.

struct rusage {
    struct timeval ru_utime; /* user time used */
    struct timeval ru_stime; /* system time used */
    long ru_maxrss;          /* maximum resident set size */
    long ru_ixrss;           /* integral shared memory size */
    long ru_idrss;           /* integral unshared data size */
    long ru_isrss;           /* integral unshared stack size */
    long ru_minflt;          /* page reclaims */
    long ru_majflt;          /* page faults */
    long ru_nswap;           /* swaps */
    long ru_inblock;         /* block input operations */
    long ru_oublock;         /* block output operations */
    long ru_msgsnd;          /* messages sent */
    long ru_msgrcv;          /* messages received */
    long ru_nsignals;        /* signals received */
    long ru_nvcsw;           /* voluntary context switches */
    long ru_nivcsw;          /* involuntary context switches */
};
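
As an illustration of reading this structure, the sketch below adds up the user and system time of the calling process. cpu_seconds_used is a hypothetical helper name; only ru_utime and ru_stime are used.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: user plus system CPU time of the calling
 * process in seconds, or -1.0 if getrusage fails. */
double cpu_seconds_used(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) < 0)
        return -1.0;

    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}
```

Calling it before and after a piece of work and subtracting gives a rough CPU cost for that work.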

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EFAULT
rlim or usage points outside the accessible address space.
EINVAL
getrlimit or setrlimit is called with a bad resource, or getrusage is called with a bad who.
EPERM
A non-superuser tries to use setrlimit() to increase the soft or hard limit above the current hard limit, or a superuser tries to increase RLIMIT_NOFILE above the current kernel maximum.
 

CONFORMING TO

SVr4, BSD 4.3  

NOTE

Including <sys/time.h> is not required these days, but increases portability. (Indeed, struct timeval is defined in <sys/time.h>.)

On Linux, if the disposition of SIGCHLD is set to SIG_IGN then the resource usages of child processes are automatically included in the value returned by RUSAGE_CHILDREN, although POSIX 1003.1-2001 explicitly prohibits this.

The above struct was taken from BSD 4.3 Reno. Not all fields are meaningful under Linux. Right now (Linux 2.4, 2.6) only the fields ru_utime, ru_stime, ru_minflt, ru_majflt, and ru_nswap are maintained.  

SEE ALSO

dup(2), fcntl(2), fork(2), mlock(2), mlockall(2), mmap(2), open(2), quotactl(2), sbrk(2), wait3(2), wait4(2), malloc(3), ulimit(3), signal(7)

getsockname

NAME

getsockname - get socket name  

SYNOPSIS

#include <sys/socket.h>

int getsockname(int s, struct sockaddr *name, socklen_t *namelen);

 

DESCRIPTION

Getsockname returns the current name for the specified socket. The namelen parameter should be initialized to indicate the amount of space pointed to by name. On return it contains the actual size of the name returned (in bytes).  
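
One common use, sketched here under the assumption of an IPv4 TCP socket, is to find out which port the kernel chose after binding to port 0. bind_ephemeral is a hypothetical helper name.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to an ephemeral port and
 * report which port was chosen. Returns the socket, or -1 on error. */
int bind_ephemeral(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;      /* in: space available in addr */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;                /* 0 asks the kernel to pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *port = ntohs(addr.sin_port);     /* out: len now holds the actual size */
    return fd;
}
```

Note that namelen is initialized to the buffer size before the call, as the description requires.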

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
The argument s is not a valid descriptor.
ENOTSOCK
The argument s is a file, not a socket.
ENOBUFS
Insufficient resources were available in the system to perform the operation.
EFAULT
The name parameter points to memory not in a valid part of the process address space.
 

CONFORMING TO

SVr4, 4.4BSD (the getsockname function call appeared in 4.2BSD). SVr4 documents additional ENOMEM and ENOSR error codes.  

NOTE

The third argument of getsockname is in reality an int * (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t. See also accept(2).

SEE ALSO

bind(2), socket(2)

gettid

NAME

gettid - get thread identification  

SYNOPSIS

#include <sys/types.h>
#include <linux/unistd.h>

_syscall0(pid_t,gettid)

pid_t gettid(void);  

DESCRIPTION

gettid returns the thread ID of the current process. This is equal to the process ID (as returned by getpid(2)), unless the process is part of a thread group (created by specifying the CLONE_THREAD flag to the clone(2) system call). All processes in the same thread group have the same PID, but each one has a unique TID.  
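
A minimal sketch that reaches gettid through syscall(2) rather than the _syscall0 macro shown in the synopsis. current_tid is a hypothetical name, and SYS_gettid is assumed to be provided by <sys/syscall.h>.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: issue the gettid system call through syscall(2). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In a process that is not part of a thread group, the value equals getpid(), as described above.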

RETURN VALUE

On success, returns the thread ID of the current process.  

ERRORS

This call is always successful.  

CONFORMING TO

gettid is Linux specific and should not be used in programs that are intended to be portable.  

SEE ALSO

getpid(2), clone(2), fork(2)

getuid

NAME

getuid, geteuid - get user identity  

SYNOPSIS

#include <unistd.h>
#include <sys/types.h>

uid_t getuid(void);
uid_t geteuid(void);  

DESCRIPTION

getuid returns the real user ID of the current process.

geteuid returns the effective user ID of the current process.

The real ID corresponds to the ID of the calling process. The effective ID corresponds to the set ID bit on the file being executed.  
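
A small sketch of comparing the two IDs, for instance to notice that a program is running with a set-user-ID bit in effect. ids_differ is a hypothetical helper name.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: nonzero when the real and effective user IDs
 * of the calling process differ. */
int ids_differ(void)
{
    return getuid() != geteuid();
}
```

For an ordinary program started by a user from a shell, both IDs are normally the same and the helper returns 0.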

ERRORS

These functions are always successful.  

CONFORMING TO

POSIX, BSD 4.3.  

SEE ALSO

setreuid(2), setuid(2)

get_kernel_syms

NAME

get_kernel_syms - retrieve exported kernel and module symbols  

SYNOPSIS

#include <linux/module.h>

int get_kernel_syms(struct kernel_sym *table);

 

DESCRIPTION

If table is NULL, get_kernel_syms returns the number of symbols available for query. Otherwise it fills in a table of structures:

struct kernel_sym {
    unsigned long value;
    char name[60];
};

The symbols are interspersed with magic symbols of the form #module-name with the kernel having an empty name. The value associated with a symbol of this form is the address at which the module is loaded.

The symbols exported from each module follow their magic module tag and the modules are returned in the reverse order they were loaded.  

RETURN VALUE

Returns the number of symbols returned. There is no possible error return.  

SEE ALSO

create_module(2), init_module(2), delete_module(2), query_module(2).  

BUGS

There is no way to indicate the size of the buffer allocated for table. If symbols have been added to the kernel since the program queried for the symbol table size, memory will be corrupted.

The length of exported symbol names is limited to 59.

Because of these limitations, this system call is deprecated in favor of query_module.

inb

NAME

outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O

 

DESCRIPTION

This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.

They are primarily designed for internal kernel use, but can be used from user space.

You must compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.

You must use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.

 

CONFORMING TO

outb and friends are hardware specific. The port and value arguments are in the opposite order from most DOS implementations.  

SEE ALSO

ioperm(2), iopl(2)

init_module

NAME

init_module - initialize a loadable module entry  

SYNOPSIS

#include <linux/module.h>

int init_module(const char *name, struct module *image);

 

DESCRIPTION

init_module loads the relocated module image into kernel space and runs the module's init function.

The module image begins with a module structure and is followed by code and data as appropriate. The module structure is defined as follows:

struct module {
    unsigned long size_of_struct;
    struct module *next;
    const char *name;
    unsigned long size;
    long usecount;
    unsigned long flags;
    unsigned int nsyms;
    unsigned int ndeps;
    struct module_symbol *syms;
    struct module_ref *deps;
    struct module_ref *refs;
    int (*init)(void);
    void (*cleanup)(void);
    const struct exception_table_entry *ex_table_start;
    const struct exception_table_entry *ex_table_end;
#ifdef __alpha__
    unsigned long gp;
#endif
};

All of the pointer fields, with the exception of next and refs, are expected to point within the module body and be initialized as appropriate for kernel space, i.e. relocated with the rest of the module.

This system call is only open to the superuser.  

RETURN VALUE

On success, zero is returned. On error, -1 is returned and errno is set appropriately.  

ERRORS

EPERM
The user is not the superuser.
ENOENT
No module by that name exists.
EINVAL
Some image slot filled in incorrectly, image->name does not correspond to the original module name, some image->deps entry does not correspond to a loaded module, or some other similar inconsistency.
EBUSY
The module's initialization routine failed.
EFAULT
name or image is outside the program's accessible address space.
 

SEE ALSO

create_module(2), delete_module(2), query_module(2).

intro

NAME

intro, _syscall - Introduction to system calls  

DESCRIPTION

This chapter describes the Linux system calls. For a list of the 164 syscalls present in Linux 2.0, see syscalls(2).  

Calling Directly

In most cases, it is unnecessary to invoke a system call directly, but there are times when the Standard C library does not implement a nice function call for you.  

Synopsis

#include <linux/unistd.h>

A _syscall macro

desired system call

 

Setup

The important thing to know about a system call is its prototype. You need to know how many arguments, their types, and the function return type. There are six macros that make the actual call into the system easier. They have the form:

_syscallX(type,name,type1,arg1,type2,arg2,...)
where X is 0-5, which are the number of arguments taken by the system call
type is the return type of the system call
name is the name of the system call
typeN is the Nth argument's type
argN is the name of the Nth argument

These macros create a function called name with the arguments you specify. Once you include the _syscall() in your source file, you call the system call by name.  

EXAMPLE

#include <stdio.h>
#include <errno.h>
#include <linux/unistd.h>   /* for _syscallX macros/related stuff */
#include <linux/kernel.h>   /* for struct sysinfo */

_syscall1(int, sysinfo, struct sysinfo *, info);

/* Note: if you copy directly from the nroff source, remember to
   REMOVE the extra backslashes in the printf statement. */

int main(void)
{
    struct sysinfo s_info;
    int error;

    error = sysinfo(&s_info);
    printf("code error = %d\n", error);
    printf("Uptime = %lds\nLoad: 1 min %lu / 5 min %lu / 15 min %lu\n"
           "RAM: total %lu / free %lu / shared %lu\n"
           "Memory in buffers = %lu\nSwap: total %lu / free %lu\n"
           "Number of processes = %d\n",
           s_info.uptime, s_info.loads[0],
           s_info.loads[1], s_info.loads[2],
           s_info.totalram, s_info.freeram,
           s_info.sharedram, s_info.bufferram,
           s_info.totalswap, s_info.freeswap,
           s_info.procs);
    return(0);
}

 

Sample Output

code error = 0
Uptime = 502034s
Load: 1 min 13376 / 5 min 5504 / 15 min 1152
RAM: total 15343616 / free 827392 / shared 8237056
Memory in buffers = 5066752
Swap: total 27881472 / free 24698880
Number of processes = 40

 

NOTES

The _syscall() macros DO NOT produce a prototype. You may have to create one, especially for C++ users.

System calls are not required to return only positive or negative error codes. You need to read the source to be sure how it will return errors. Usually, it is the negative of a standard error code, e.g., -EPERM. The _syscall() macros will return the result r of the system call when r is nonnegative, but will return -1 and set the variable errno to -r when r is negative. For the error codes, see errno(3).
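
The return convention just described can be written out as a small sketch. wrap_syscall_result is a hypothetical helper that follows the rule exactly as stated here; the real macros may treat only a limited range of negative values as errors.

```c
#include <errno.h>

/* Hypothetical helper: apply the wrapper convention to a raw kernel
 * result r. Nonnegative results pass through; a negative result
 * becomes -1 with errno set to -r. */
long wrap_syscall_result(long r)
{
    if (r < 0) {
        errno = (int)-r;
        return -1;
    }
    return r;
}
```

For instance, a raw result of -EPERM yields -1 with errno set to EPERM, while a raw result of 42 is returned unchanged.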

Some system calls, such as mmap, require more than five arguments. These are handled by pushing the arguments on the stack and passing a pointer to the block of arguments.

When defining a system call, the argument types MUST be passed by-value or by-pointer (for aggregates like structs).

The preferred way to invoke system calls that glibc does not know about yet, is via syscall(2).  

CONFORMING TO

Certain codes are used to indicate Unix variants and standards to which calls in the section conform. These are:
SVr4
System V Release 4 Unix, as described in the "Programmer's Reference Manual: Operating System API (Intel processors)" (Prentice-Hall 1992, ISBN 0-13-951294-2)
SVID
System V Interface Definition, as described in "The System V Interface Definition, Fourth Edition".
POSIX.1
IEEE 1003.1-1990 part 1, aka ISO/IEC 9945-1:1990, aka "IEEE Portable Operating System Interface for Computing Environments", as elucidated in Donald Lewine's "POSIX Programmer's Guide" (O'Reilly & Associates, Inc., 1991, ISBN 0-937175-73-0).
POSIX.1b
IEEE Std 1003.1b-1993 (POSIX.1b standard) describing real-time facilities for portable operating systems, aka ISO/IEC 9945-1:1996, as elucidated in "Programming for the real world - POSIX.4" by Bill O. Gallmeister (O'Reilly & Associates, Inc. ISBN 1-56592-074-0).
SUS, SUSv2
Single Unix Specification. (Developed by X/Open and The Open Group. See also http://www.UNIX-systems.org/version2/ .)
4.3BSD/4.4BSD
The 4.3 and 4.4 distributions of Berkeley Unix. 4.4BSD was upward-compatible from 4.3.
V7
Version 7, the ancestral Unix from Bell Labs.
 

FILES

/usr/include/linux/unistd.h  

SEE ALSO

syscall(2), errno(3)

ioperm

NAME

ioperm - set port input/output permissions  

SYNOPSIS

#include <unistd.h> /* for libc5 */
#include <sys/io.h> /* for glibc */

int ioperm(unsigned long from, unsigned long num, int turn_on);  

DESCRIPTION

Ioperm sets the port access permission bits for the process for num bytes starting from port address from to the value turn_on. The use of ioperm requires root privileges.

Only the first 0x3ff I/O ports can be specified in this manner. For more ports, the iopl function must be used. Permissions are not inherited on fork, but on exec they are. This is useful for giving port access permissions to non-privileged tasks.

This call is mostly for the i386 architecture. On many other architectures it does not exist or will always return an error.  

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EINVAL
Invalid values for from or num.
EPERM
Caller does not have the CAP_SYS_RAWIO privileges.
EIO
(on ppc) This call is not supported.
 

CONFORMING TO

ioperm is Linux specific and should not be used in programs intended to be portable.  

NOTES

Libc5 treats it as a system call and has a prototype in <unistd.h>. Glibc1 does not have a prototype. Glibc2 has a prototype both in <sys/io.h> and in <sys/perm.h>. Avoid the latter; it is available on i386 only.

SEE ALSO

iopl(2)

io_cancel

NAME

io_cancel - Cancel an outstanding asynchronous I/O operation  

SYNOPSIS

#include <linux/aio.h>

long io_cancel (aio_context_t ctx_id, struct iocb *iocb, struct io_event *result);

 

DESCRIPTION

io_cancel attempts to cancel an asynchronous I/O operation previously submitted with the io_submit system call. ctx_id is the AIO context ID of the operation to be cancelled. If the AIO context is found, the event will be cancelled and then copied into the memory pointed to by result without being placed into the completion queue.

 

RETURN VALUE

io_cancel returns 0 on success; otherwise, it returns one of the errors listed in the "Errors" section.

 

ERRORS

EINVAL
The AIO context specified by ctx_id is invalid.

EFAULT
One of the data structures points to invalid data.

EAGAIN
The iocb specified was not cancelled.

ENOSYS
io_cancel is not implemented on this architecture.

 

VERSIONS

The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.

 

CONFORMING TO

io_cancel is Linux specific and should not be used in programs that are intended to be portable.

 

SEE ALSO

io_setup(2), io_destroy(2), io_getevents(2), io_submit(2).

 

NOTES

The asynchronous I/O system calls were written by Benjamin LaHaise.

 

AUTHOR

Kent Yoder.

io_getevents

NAME

io_getevents - Read asynchronous I/O events from the completion queue  

SYNOPSIS

#include <linux/time.h>

#include <linux/aio.h>

long io_getevents (aio_context_t ctx_id, long min_nr, long nr, struct io_event *events, struct timespec *timeout);

 

DESCRIPTION

io_getevents attempts to read at least min_nr events and up to nr events from the completion queue of the AIO context specified by ctx_id. timeout specifies the amount of time to wait for events, where a NULL timeout waits until at least min_nr events have been seen. Note that timeout is relative and will be updated if not NULL and the operation blocks.

 

RETURN VALUE

io_getevents returns the number of events read: 0 if no events are available, or fewer than min_nr if the timeout has elapsed.

 

ERRORS

EINVAL
ctx_id is invalid. min_nr is out of range or nr is out of range.

EFAULT
Either events or timeout is an invalid pointer.

ENOSYS
io_getevents is not implemented on this architecture.

 

CONFORMING TO

io_getevents is Linux specific and should not be used in programs that are intended to be portable.

 

VERSIONS

The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.

 

SEE ALSO

io_setup(2), io_submit(2), io_cancel(2), io_destroy(2).

 

NOTES

The asynchronous I/O system calls were written by Benjamin LaHaise.

 

AUTHOR

Kent Yoder.

io_submit

NAME

io_submit - Submit asynchronous I/O blocks for processing  

SYNOPSIS

#include <linux/aio.h>

long io_submit (aio_context_t ctx_id, long nr, struct iocb **iocbpp);

 

DESCRIPTION

io_submit queues nr I/O request blocks for processing in the AIO context ctx_id. iocbpp should be an array of nr pointers to AIO request blocks, which will be submitted to context ctx_id.
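
The following is a rough sketch of one complete request, under several assumptions: the calls are made through syscall(2), the structures come from <linux/aio_abi.h> rather than the <linux/aio.h> named in the synopsis, and aio_read_start is a hypothetical helper name. It sets up a context, submits one read, waits for its completion with io_getevents, and tears the context down.

```c
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from the start of path with
 * a single AIO request. Returns the byte count, or -1 on failure. */
long aio_read_start(const char *path, void *buf, size_t len)
{
    aio_context_t ctx = 0;            /* must be zero before io_setup */
    struct iocb cb;
    struct iocb *list[1] = { &cb };   /* io_submit takes an array of pointers */
    struct io_event ev;
    long result = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        close(fd);
        return -1;
    }

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    /* Queue one request, then block (NULL timeout) until it completes. */
    if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) == 1 &&
        ev.res >= 0)
        result = (long)ev.res;

    syscall(SYS_io_destroy, ctx);
    close(fd);
    return result;
}
```

The res field of the returned event carries the result of the read, which is why it is checked before being handed back to the caller.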

 

RETURN VALUE

io_submit returns the number of iocbs submitted, or 0 if nr is zero.

 

ERRORS

EINVAL
The aio_context specified by ctx_id is invalid. nr is less than 0. The iocb at *iocbpp[0] is not properly initialized, or the operation specified is invalid for the file descriptor in the iocb.

EFAULT
One of the data structures points to invalid data.

EBADF
The file descriptor specified in the first iocb is invalid.

EAGAIN
Insufficient resources are available to queue any iocbs.

ENOSYS
io_submit is not implemented on this architecture.

 

CONFORMING TO

io_submit is Linux specific and should not be used in programs that are intended to be portable.

 

VERSIONS

The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.

 

SEE ALSO

io_setup(2), io_destroy(2), io_getevents(2), io_cancel(2).

 

NOTES

The asynchronous I/O system calls were written by Benjamin LaHaise.

 

AUTHOR

Kent Yoder.

kill

NAME

kill - send signal to a process  

SYNOPSIS

#include <sys/types.h>
#include <signal.h>

int kill(pid_t pid, int sig);

 

DESCRIPTION

The kill system call can be used to send any signal to any process group or process.

If pid is positive, then signal sig is sent to pid.

If pid equals 0, then sig is sent to every process in the process group of the current process.

If pid equals -1, then sig is sent to every process except for process 1 (init), but see below.

If pid is less than -1, then sig is sent to every process in the process group -pid.

If sig is 0, then no signal is sent, but error checking is still performed.  
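The sig 0 case is commonly used to probe whether a process exists. The following is a minimal sketch, assuming a hypothetical helper name process_exists that is not part of any library. It treats EPERM as "exists", since that error means the target was found but the caller may not signal it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a process with this pid exists,
 * 0 if it does not, and -1 on an unexpected error.
 * pid should be positive; 0 and negative values address process groups. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;               /* exists and we may signal it */
    if (errno == EPERM)
        return 1;               /* exists, but we lack permission */
    if (errno == ESRCH)
        return 0;               /* no such process */
    return -1;
}
```

Calling process_exists(getpid()) should report 1. Note that a zombie still counts as existing, and the answer can be stale by the time the caller acts on it.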

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EINVAL
An invalid signal was specified.
ESRCH
The pid or process group does not exist. Note that an existing process might be a zombie: a process that has already terminated but has not yet been waited for with wait().
EPERM
The process does not have permission to send the signal to any of the receiving processes. For a process to have permission to send a signal to process pid it must either have root privileges, or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the receiving process. In the case of SIGCONT it suffices when the sending and receiving processes belong to the same session.
 

NOTES

It is impossible to send a signal to task number one, the init process, for which it has not installed a signal handler. This is done to assure the system is not brought down accidentally.

POSIX 1003.1-2001 requires that kill(-1,sig) send sig to all processes that the current process may send signals to, except possibly for some implementation-defined system processes. Linux allows a process to signal itself, but on Linux the call kill(-1,sig) does not signal the current process.  

LINUX HISTORY

Across different kernel versions, Linux has enforced different rules for the permissions required for an unprivileged process to send a signal to another process. In kernels 1.0 to 1.2.2, a signal could be sent if the effective user ID of the sender matched that of the receiver, or the real user ID of the sender matched that of the receiver. From kernel 1.2.3 until 1.3.77, a signal could be sent if the effective user ID of the sender matched either the real or effective user ID of the receiver. The current rules, which conform to POSIX 1003.1-2001, were adopted in kernel 1.3.78.  

CONFORMING TO

SVr4, SVID, POSIX.1, X/OPEN, BSD 4.3, POSIX 1003.1-2001  

SEE ALSO

_exit(2), killpg(2), signal(2), tkill(2), exit(3), signal(7)

lchown

NAME

chown, fchown, lchown - change ownership of a file  

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>

int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);  

DESCRIPTION

The owner of the file specified by path or by fd is changed. Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. The super-user may change the group arbitrarily.

If the owner or group is specified as -1, then that ID is not changed.

When the owner or group of an executable file are changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown; the Linux behaviour depends on the kernel version. In case of a non-group-executable file (with clear S_IXGRP bit) the S_ISGID bit indicates mandatory locking, and is not cleared by a chown.
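As an illustration of the -1 convention described above, the sketch below changes only the group of a file and leaves the owner alone. The helper names change_group_only and demo_change_group, and the scratch path under /tmp, are assumptions made for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Change only the group of path; (uid_t)-1 leaves the owner untouched.
 * Returns 0 on success, -1 with errno set. */
int change_group_only(const char *path, gid_t group)
{
    return chown(path, (uid_t)-1, group);
}

/* Scratch-file demo: creates a file, sets its group to the caller's
 * effective gid (a group the owner belongs to), and removes the file.
 * Returns 0 if the group change succeeded. */
int demo_change_group(void)
{
    char path[] = "/tmp/chown_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int rc = change_group_only(path, getegid());
    close(fd);
    unlink(path);
    return rc;
}
```

The same pattern works with fchown on an open descriptor, or with lchown when the path may be a symbolic link that should not be followed.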

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors for chown are listed below:

EPERM
The effective UID does not match the owner of the file, and is not zero; or the owner or group were specified incorrectly.
EROFS
The named file resides on a read-only file system.
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of the path prefix is not a directory.
EACCES
Search permission is denied on a component of the path prefix.
ELOOP
Too many symbolic links were encountered in resolving path.

The general errors for fchown are listed below:

EBADF
The descriptor is not valid.
ENOENT
See above.
EPERM
See above.
EROFS
See above.
EIO
A low-level I/O error occurred while modifying the inode.
 

NOTES

In versions of Linux prior to 2.1.81 (and distinct from 2.1.46), chown did not follow symbolic links. Since Linux 2.1.81, chown does follow symbolic links, and there is a new system call lchown that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old chown) has got the same syscall number, and chown got the newly introduced number.

The prototype for fchown is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).  

CONFORMING TO

The chown call conforms to SVr4, SVID, POSIX, X/OPEN. The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files). SVr4 documents EINVAL, EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document ENOMEM or ELOOP error conditions.

The fchown call conforms to 4.4BSD and SVr4. SVr4 documents additional EINVAL, EIO, EINTR, and ENOLINK error conditions.  

RESTRICTIONS

The chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because chown() may cause immediate access revocation on already open files. Client side caching may lead to a delay between the time where ownership has been changed to allow access for a user and the time where the file can actually be accessed by the user on other clients.

SEE ALSO

chmod(2), flock(2)

link

NAME

link - make a new name for a file  

SYNOPSIS

#include <unistd.h>

int link(const char *oldpath, const char *newpath);  

DESCRIPTION

link creates a new link (also known as a hard link) to an existing file.

If newpath exists it will not be overwritten.

This new name may be used exactly as the old one for any operation; both names refer to the same file (and so have the same permissions and ownership) and it is impossible to tell which name was the "original".
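The claim that both names refer to the same file can be checked by comparing device and inode numbers with stat(2). This is a sketch only; same_file and demo_hard_link are hypothetical helper names, and the scratch paths under /tmp are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if both paths name the same file (same device and inode),
 * 0 if they differ, -1 if either stat call fails. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Creates a scratch file, hard-links it, and checks the result.
 * Returns the link count seen through the new name, or -1 on error. */
int demo_hard_link(void)
{
    char oldpath[] = "/tmp/link_demo_XXXXXX";
    char newpath[sizeof oldpath + 8];
    struct stat st;
    int result = -1;

    int fd = mkstemp(oldpath);
    if (fd < 0)
        return -1;
    close(fd);
    snprintf(newpath, sizeof newpath, "%s.alias", oldpath);

    if (link(oldpath, newpath) == 0) {
        if (same_file(oldpath, newpath) == 1 && stat(newpath, &st) == 0)
            result = (int)st.st_nlink;   /* two names, so expect 2 */
        unlink(newpath);
    }
    unlink(oldpath);
    return result;
}
```

Because link never overwrites newpath, calling it a second time with the same arguments fails with EEXIST.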

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EXDEV
oldpath and newpath are not on the same filesystem.
EPERM
The filesystem containing oldpath and newpath does not support the creation of hard links.
EFAULT
oldpath or newpath points outside your accessible address space.
EACCES
Write access to the directory containing newpath is not allowed for the process's effective uid, or one of the directories in oldpath or newpath did not allow search (execute) permission.
ENAMETOOLONG
oldpath or newpath was too long.
ENOENT
A directory component in oldpath or newpath does not exist or is a dangling symbolic link.
ENOTDIR
A component used as a directory in oldpath or newpath is not, in fact, a directory.
ENOMEM
Insufficient kernel memory was available.
EROFS
The file is on a read-only filesystem.
EEXIST
newpath already exists.
EMLINK
The file referred to by oldpath already has the maximum number of links to it.
ELOOP
Too many symbolic links were encountered in resolving oldpath or newpath.
ENOSPC
The device containing the file has no room for the new directory entry.
EPERM
oldpath is a directory.
EIO
An I/O error occurred.
 

NOTES

Hard links, as created by link, cannot span filesystems. Use symlink if this is required.  

CONFORMING TO

SVr4, SVID, POSIX, BSD 4.3, X/OPEN. SVr4 documents additional ENOLINK and EMULTIHOP error conditions; POSIX.1 does not document ELOOP. X/OPEN does not document EFAULT, ENOMEM or EIO.  

BUGS

On NFS file systems, the return code may be wrong in case the NFS server performs the link creation and dies before it can say so. Use stat(2) to find out if the link got created.  

SEE ALSO

symlink(2), unlink(2), rename(2), open(2), stat(2), ln(1)

listxattr

NAME

listxattr, llistxattr, flistxattr - list extended attribute names  

SYNOPSIS

#include <sys/types.h>
#include <attr/xattr.h>

ssize_t listxattr (const char *path, char *list, size_t size);
ssize_t llistxattr (const char *path, char *list, size_t size);
ssize_t flistxattr (int filedes, char *list, size_t size);

 

DESCRIPTION

Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).

listxattr retrieves the list of extended attribute names associated with the given path in the filesystem. The list is the set of (NULL-terminated) names, one after the other. Names of extended attributes to which the calling process does not have access may be omitted from the list. The length of the attribute name list is returned.

llistxattr is identical to listxattr, except in the case of a symbolic link, where the list of names of extended attributes associated with the link itself is retrieved, not the file that it refers to.

flistxattr is identical to listxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.

A single extended attribute name is a simple NULL-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode.

An empty buffer of size zero can be passed into these calls to return the current size of the list of extended attribute names, which can be used to estimate the size of a buffer which is sufficiently large to hold the list of names.  

EXAMPLES

The list of names is returned as an unordered array of NULL-terminated character strings (attribute names are separated by NULL characters), like this:

user.name1\0system.name1\0user.name2\0

Filesystems like ext2, ext3 and XFS which implement POSIX ACLs using extended attributes, might return a list like this:

system.posix_acl_access\0system.posix_acl_default\0
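The size probe and the packed name list can be handled as in the sketch below. It assumes the declarations come from <sys/xattr.h>, as on current glibc systems, rather than the <attr/xattr.h> header named in the synopsis. count_names and print_xattr_names are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Counts the names packed into list[0..len), each ended by a zero byte. */
size_t count_names(const char *list, size_t len)
{
    size_t n = 0, off = 0;
    while (off < len) {
        const char *end = memchr(list + off, '\0', len - off);
        n++;
        if (end == NULL)
            break;              /* unterminated tail: count it and stop */
        off = (size_t)(end - list) + 1;
    }
    return n;
}

/* Prints every attribute name of path. Returns the number of names,
 * or -1 with errno set (for example ENOTSUP). */
long print_xattr_names(const char *path)
{
    ssize_t size = listxattr(path, NULL, 0);    /* size probe */
    if (size <= 0)
        return size;                            /* no names, or an error */

    char *list = malloc((size_t)size);
    if (list == NULL)
        return -1;

    size = listxattr(path, list, (size_t)size); /* ERANGE if the list grew */
    long n = -1;
    if (size >= 0) {
        for (ssize_t off = 0; off < size; off += (ssize_t)strlen(list + off) + 1)
            puts(list + off);
        n = (long)count_names(list, (size_t)size);
    }
    free(list);
    return n;
}
```

The probed size is only an estimate: another process can add attributes between the two calls, so a caller that needs to be robust may retry with a larger buffer when errno is ERANGE.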

 

RETURN VALUE

On success, a positive number is returned indicating the size of the extended attribute name list. On failure, -1 is returned and errno is set appropriately.

If the size of the list buffer is too small to hold the result, errno is set to ERANGE.

If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.

The errors documented for the stat(2) system call are also applicable here.  

AUTHORS

Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.  

SEE ALSO

getfattr(1), setfattr(1), open(2), stat(2), getxattr(2), setxattr(2), removexattr(2), and attr(5).

llseek

NAME

_llseek - reposition read/write file offset  

SYNOPSIS

#include <unistd.h>

#include <linux/unistd.h>

_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res, uint, wh);

int _llseek(unsigned int fd, unsigned long offset_high, unsigned long offset_low, loff_t *result, unsigned int whence);  

DESCRIPTION

The _llseek function repositions the offset of the file descriptor fd to (offset_high<<32) | offset_low bytes relative to the beginning of the file, the current position in the file, or the end of the file, depending on whether whence is SEEK_SET, SEEK_CUR, or SEEK_END, respectively. It returns the resulting file position in the argument result.
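The split of a 64-bit offset into offset_high and offset_low follows directly from the formula above. A small sketch of the arithmetic, with hypothetical helper names split_offset and join_offset (it does not invoke the system call itself):

```c
#include <stdint.h>

/* Splits a 64-bit offset into the two 32-bit halves _llseek expects. */
void split_offset(uint64_t offset, unsigned long *high, unsigned long *low)
{
    *high = (unsigned long)(offset >> 32);
    *low  = (unsigned long)(offset & 0xffffffffu);
}

/* Recombines the halves as (high << 32) | low. */
uint64_t join_offset(unsigned long high, unsigned long low)
{
    return ((uint64_t)high << 32) | (uint64_t)(low & 0xffffffffu);
}
```

For example, the offset 0x123456789 splits into a high half of 0x1 and a low half of 0x23456789.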

 

RETURN VALUE

Upon successful completion, _llseek returns 0. Otherwise, a value of -1 is returned and errno is set to indicate the error.  

ERRORS

EBADF
fd is not an open file descriptor.
EINVAL
whence is invalid.
EFAULT
Problem with copying results to user space.
 

CONFORMING TO

This function is Linux-specific, and should not be used in programs intended to be portable.  

SEE ALSO

lseek(2)

lookup_dcookie

NAME

lookup_dcookie - return a directory entry's path

SYNOPSIS

int lookup_dcookie(u64 cookie, char * buffer, size_t len);  

DESCRIPTION

Look up the full path of the directory entry specified by the value cookie. The cookie is an opaque identifier uniquely identifying a particular directory entry. The buffer given is filled in with the full path of the directory entry.

For lookup_dcookie to return successfully, the kernel must still hold a cookie reference to the directory entry.

 

NOTES

lookup_dcookie is a special-purpose system call, currently used only by the oprofile profiler. It relies on a kernel driver to register cookies for directory entries.

The path returned may be suffixed by the string " (deleted)" if the directory entry has been removed.

 

RETURN VALUE

On success, lookup_dcookie returns the length of the path string copied into the buffer. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EPERM
The process does not have the capability to look up cookie values.
EINVAL
The kernel has no registered cookie/directory entry mappings at the time of lookup, or the cookie does not refer to a valid directory entry.
ENOMEM
The kernel could not allocate memory for the temporary buffer holding the path.
ERANGE
The buffer was not large enough to hold the path of the directory entry.
ENAMETOOLONG
The name could not fit in the buffer.
EFAULT
The buffer was not valid.

 

CONFORMING TO

lookup_dcookie is Linux-specific.  

AVAILABILITY

Since Linux 2.5.43. The ENAMETOOLONG error return was added in 2.5.70.

lseek

NAME

lseek - reposition read/write file offset  

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>

off_t lseek(int fildes, off_t offset, int whence);  

DESCRIPTION

The lseek function repositions the offset of the file descriptor fildes to the argument offset according to the directive whence as follows:
SEEK_SET
The offset is set to offset bytes.
SEEK_CUR
The offset is set to its current location plus offset bytes.
SEEK_END
The offset is set to the size of the file plus offset bytes.

The lseek function allows the file offset to be set beyond the end of the existing end-of-file of the file (but this does not change the size of the file). If data is later written at this point, subsequent reads of the data in the gap return bytes of zeros (until data is actually written into the gap).  
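The gap behaviour can be observed with a short experiment: seek past the end of an empty file, write one byte, then read from inside the gap. This is a sketch under the assumption that /tmp is writable; demo_seek_past_end is a hypothetical name.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes one byte at offset gap in a fresh scratch file, then reads a
 * byte from the middle of the gap. Returns that byte (expected 0),
 * or -1 on any error. On success *size_out holds the file size. */
int demo_seek_past_end(off_t gap, off_t *size_out)
{
    char path[] = "/tmp/lseek_demo_XXXXXX";
    unsigned char byte = 0xff;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives until close() */

    if (lseek(fd, gap, SEEK_SET) == gap &&              /* move past EOF */
        write(fd, "x", 1) == 1 &&                       /* size becomes gap + 1 */
        (*size_out = lseek(fd, 0, SEEK_END)) >= 0 &&    /* query the size */
        lseek(fd, gap / 2, SEEK_SET) == gap / 2 &&      /* back into the gap */
        read(fd, &byte, 1) == 1)
        result = byte;

    close(fd);
    return result;
}
```

With a gap of 4096 the file ends up 4097 bytes long and the byte read from the gap is zero. The seek alone does not extend the file; only the write does.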

RETURN VALUE

Upon successful completion, lseek returns the resulting offset location as measured in bytes from the beginning of the file. Otherwise, a value of (off_t)-1 is returned and errno is set to indicate the error.  

ERRORS

EBADF
fildes is not an open file descriptor.
ESPIPE
fildes is associated with a pipe, socket, or FIFO.
EINVAL
whence is not one of SEEK_SET, SEEK_CUR, SEEK_END, or the resulting file offset would be negative.
EOVERFLOW
The resulting file offset cannot be represented in an off_t.
 

CONFORMING TO

SVr4, POSIX, BSD 4.3  

RESTRICTIONS

Some devices are incapable of seeking and POSIX does not specify which devices must support it.

Linux specific restrictions: using lseek on a tty device returns ESPIPE.  

NOTES

This document's use of whence is incorrect English, but maintained for historical reasons.

When converting old code, substitute values for whence with the following macros:

old       new
0         SEEK_SET
1         SEEK_CUR
2         SEEK_END
L_SET     SEEK_SET
L_INCR    SEEK_CUR
L_XTND    SEEK_END

SVR1-3 returns long instead of off_t, BSD returns int.

Note that file descriptors created by dup(2) or fork(2) share the current file position pointer, so seeking on such files may be subject to race conditions.  

SEE ALSO

dup(2), fork(2), open(2), fseek(3)

lstat

NAME

stat, fstat, lstat - get file status  

SYNOPSIS

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int stat(const char *file_name, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);  

DESCRIPTION

These functions return information about the specified file. You do not need any access rights to the file to get this information but you need search rights to all directories named in the path leading to the file.

stat stats the file pointed to by file_name and fills in buf.

lstat is identical to stat, except in the case of a symbolic link, where the link itself is stat-ed, not the file that it refers to.

fstat is identical to stat, only the open file pointed to by filedes (as returned by open(2)) is stat-ed in place of file_name.

They all return a stat structure, which contains the following fields:

struct stat {
    dev_t     st_dev;     /* device */
    ino_t     st_ino;     /* inode */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device type (if inode device) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;  /* number of blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};

The value st_size gives the size of the file (if it is a regular file or a symlink) in bytes. The size of a symlink is the length of the pathname it contains, without trailing NUL.
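The difference between stat and lstat, and the st_size rule for symlinks, can be illustrated as follows. This is a sketch; symlink_size is a hypothetical helper and the scratch directory under /tmp is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a symlink pointing at target (which need not exist) and
 * returns the st_size that lstat reports for the link, or -1 on error. */
long symlink_size(const char *target)
{
    char dir[] = "/tmp/lstat_demo_XXXXXX";
    char linkpath[sizeof dir + 8];
    struct stat st;
    long size = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(linkpath, sizeof linkpath, "%s/link", dir);

    if (symlink(target, linkpath) == 0) {
        /* lstat examines the link itself; stat would follow it. */
        if (lstat(linkpath, &st) == 0 && S_ISLNK(st.st_mode))
            size = (long)st.st_size;
        unlink(linkpath);
    }
    rmdir(dir);
    return size;
}
```

For the target "some/target" the reported size is 11, the length of the stored pathname without a trailing NUL. Calling stat on the same link would fail with ENOENT here, because the target does not exist.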

The value st_blocks gives the size of the file in 512-byte blocks. (This may be smaller than st_size/512 e.g. when the file has holes.) The value st_blksize gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)

Not all of the Linux filesystems implement all of the time fields. Some file system types allow mounting in such a way that file accesses do not cause an update of the st_atime field. (See 'noatime' in mount(8).)

The field st_atime is changed by file accesses, e.g. by execve(2), mknod(2), pipe(2), utime(2) and read(2) (of more than zero bytes). Other routines, like mmap(2), may or may not update st_atime.

The field st_mtime is changed by file modifications, e.g. by mknod(2), truncate(2), utime(2) and write(2) (of more than zero bytes). Moreover, st_mtime of a directory is changed by the creation or deletion of files in that directory. The st_mtime field is not changed for changes in owner, group, hard link count, or mode.

The field st_ctime is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.).

The following POSIX macros are defined to check the file type:

S_ISREG(m)
is it a regular file?
S_ISDIR(m)
directory?
S_ISCHR(m)
character device?
S_ISBLK(m)
block device?
S_ISFIFO(m)
fifo?
S_ISLNK(m)
symbolic link? (Not in POSIX.1-1996.)
S_ISSOCK(m)
socket? (Not in POSIX.1-1996.)

The following flags are defined for the st_mode field:

S_IFMT     0170000   bitmask for the file type bitfields
S_IFSOCK   0140000   socket
S_IFLNK    0120000   symbolic link
S_IFREG    0100000   regular file
S_IFBLK    0060000   block device
S_IFDIR    0040000   directory
S_IFCHR    0020000   character device
S_IFIFO    0010000   fifo
S_ISUID    0004000   set UID bit
S_ISGID    0002000   set GID bit (see below)
S_ISVTX    0001000   sticky bit (see below)
S_IRWXU    00700     mask for file owner permissions
S_IRUSR    00400     owner has read permission
S_IWUSR    00200     owner has write permission
S_IXUSR    00100     owner has execute permission
S_IRWXG    00070     mask for group permissions
S_IRGRP    00040     group has read permission
S_IWGRP    00020     group has write permission
S_IXGRP    00010     group has execute permission
S_IRWXO    00007     mask for permissions for others (not in group)
S_IROTH    00004     others have read permission
S_IWOTH    00002     others have write permission
S_IXOTH    00001     others have execute permission

The set GID bit (S_ISGID) has several special uses: For a directory it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective gid of the creating process, and directories created there will also get the S_ISGID bit set. For a file that does not have the group execution bit (S_IXGRP) set, it indicates mandatory file/record locking.

The 'sticky' bit (S_ISVTX) on a directory means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, and by root.

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
filedes is bad.
ENOENT
A component of the path file_name does not exist, or the path is an empty string.
ENOTDIR
A component of the path is not a directory.
ELOOP
Too many symbolic links encountered while traversing the path.
EFAULT
Bad address.
EACCES
Permission denied.
ENOMEM
Out of memory (i.e. kernel memory).
ENAMETOOLONG
File name too long.
 

CONFORMING TO

The stat and fstat calls conform to SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The lstat call conforms to 4.3BSD and SVr4. SVr4 documents additional fstat error conditions EINTR, ENOLINK, and EOVERFLOW. SVr4 documents additional stat and lstat error conditions EACCES, EINTR, EMULTIHOP, ENOLINK, and EOVERFLOW. Use of the st_blocks and st_blksize fields may be less portable. (They were introduced in BSD. Are not specified by POSIX. The interpretation differs between systems, and possibly on a single system when NFS mounts are involved.)

POSIX does not describe the S_IFMT, S_IFSOCK, S_IFLNK, S_IFREG, S_IFBLK, S_IFDIR, S_IFCHR, S_IFIFO, S_ISVTX bits, but instead demands the use of the macros S_ISDIR(), etc. The S_ISLNK and S_ISSOCK macros are not in POSIX.1-1996, but both will be in the next POSIX standard; the former is from SVID 4v2, the latter from SUSv2.

Unix V7 (and later systems) had S_IREAD, S_IWRITE, S_IEXEC, where POSIX prescribes the synonyms S_IRUSR, S_IWUSR, S_IXUSR.  

OTHER SYSTEMS

Values that have been (or are) in use on various systems:
hex    name       ls   octal    description
f000   S_IFMT          170000   mask for file type
0000                   000000   SCO out-of-service inode, BSD unknown type
                                SVID-v2 and XPG2 have both 0 and 0100000 for ordinary file
1000   S_IFIFO    p|   010000   fifo (named pipe)
2000   S_IFCHR    c    020000   character special (V7)
3000   S_IFMPC         030000   multiplexed character special (V7)
4000   S_IFDIR    d/   040000   directory (V7)
5000   S_IFNAM         050000   XENIX named special file
                                with two subtypes, distinguished by st_rdev values 1, 2:
0001   S_INSEM    s    000001   XENIX semaphore subtype of IFNAM
0002   S_INSHD    m    000002   XENIX shared data subtype of IFNAM
6000   S_IFBLK    b    060000   block special (V7)
7000   S_IFMPB         070000   multiplexed block special (V7)
8000   S_IFREG    -    100000   regular (V7)
9000   S_IFCMP         110000   VxFS compressed
9000   S_IFNWK    n    110000   network special (HP-UX)
a000   S_IFLNK    l@   120000   symbolic link (BSD)
b000   S_IFSHAD        130000   Solaris shadow inode for ACL (not seen by userspace)
c000   S_IFSOCK   s=   140000   socket (BSD; also "S_IFSOC" on VxFS)
d000   S_IFDOOR   D>   150000   Solaris door
e000   S_IFWHT    w%   160000   BSD whiteout (not used for inode)

0200   S_ISVTX         001000   'sticky bit': save swapped text even after use (V7)
                                reserved (SVID-v2)
                                On non-directories: don't cache this file (SunOS)
                                On directories: restricted deletion flag (SVID-v4.2)
0400   S_ISGID         002000   set group ID on execution (V7)
                                for directories: use BSD semantics for propagation of gid
0400   S_ENFMT         002000   SysV file locking enforcement (shared w/ S_ISGID)
0800   S_ISUID         004000   set user ID on execution (V7)
0800   S_CDF           004000   directory is a context dependent file (HP-UX)

A sticky command appeared in Version 32V AT&T UNIX.

 

SEE ALSO

chmod(2), chown(2), readlink(2), utime(2)

mbind

NAME

mbind - Set memory policy for a memory range

SYNOPSIS

#include <numaif.h>
int mbind(void *start, unsigned long len, int policy, unsigned long *nodemask, unsigned long maxnode, unsigned flags);
 

DESCRIPTION

mbind sets the NUMA memory policy for the memory range starting at start with length len. The memory of a NUMA machine is divided into multiple nodes. The memory policy defines the node on which memory is allocated. mbind only affects new allocations; pages inside the range that were already touched before the policy was set are not affected.

Available policies are MPOL_DEFAULT, MPOL_BIND, MPOL_INTERLEAVE, and MPOL_PREFERRED. All policies except MPOL_DEFAULT require the caller to specify the nodes they apply to in the nodemask parameter. nodemask is a bit field of nodes that contains up to maxnode bits. The node mask bit field size is rounded up to the next multiple of sizeof(unsigned long), but the kernel will only use bits up to maxnode.

When MPOL_MF_STRICT is passed in the flags parameter, EIO is returned if the existing pages in the mapping do not follow the policy.

The MPOL_DEFAULT policy is the default and means to use the underlying process policy (which can be modified with set_mempolicy(2) ). Unless the process policy has been changed this means to allocate memory on the node of the CPU that triggered the allocation. nodemask should be passed as NULL.

The MPOL_BIND policy is a strict policy that restricts memory allocation to the nodes specified in nodemask. No allocations are made on other nodes.

MPOL_INTERLEAVE interleaves allocations to the nodes specified in nodemask. This optimizes for bandwidth instead of latency. To be effective the memory area should be fairly large, at least 1MB or bigger.

MPOL_PREFERRED sets the preferred node for allocation. The kernel will try to allocate on this node first and fall back to other nodes when the preferred node is low on free memory. Only the first node in the nodemask is used. When no node is set in the mask, the current node is used for allocation.
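Building the nodemask bit field is plain bit arithmetic on an array of unsigned long. The sketch below only prepares the mask and does not call mbind. The names BITS_PER_WORD, nodemask_words and nodemask_set are hypothetical, and the treatment of maxnode as an exclusive upper bound is an assumption based on the description above.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold maxnode bits. */
size_t nodemask_words(unsigned long maxnode)
{
    return (maxnode + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Sets the bit for node in mask; returns -1 if node >= maxnode. */
int nodemask_set(unsigned long *mask, unsigned long maxnode, unsigned long node)
{
    if (node >= maxnode)
        return -1;
    mask[node / BITS_PER_WORD] |= 1UL << (node % BITS_PER_WORD);
    return 0;
}
```

A caller would zero an array of nodemask_words(maxnode) words, set the wanted nodes, and then pass the array and maxnode to mbind together with a policy such as MPOL_BIND.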

 

RETURN VALUE

mbind returns -1 when an error occurred, otherwise 0.

 

ERRORS

EFAULT
There was an unmapped hole in the specified memory range, or a passed pointer was not valid.
EINVAL
An illegal parameter was passed.
ENOMEM
System out of memory
EIO
MPOL_MF_STRICT was specified and an existing page was already on a wrong node.

 

NOTES

For a higher level interface it is recommended to use the functions in numa(3).

Until glibc supports these system calls you can link with -lnuma to get system call definitions.

MPOL_MF_STRICT is ignored on huge page mappings right now. For preferred and interleave mappings it will only accept the first choice node.

For MPOL_INTERLEAVE mode the interleaving is changed at fault time. The final layout of the pages depends on the order they were faulted in first.

 

SEE ALSO

numa(3), numactl(8), set_mempolicy(2), get_mempolicy(2), mmap(2)

mkdir

NAME

mkdir - create a directory  

SYNOPSIS

#include <sys/stat.h>
#include <sys/types.h>

int mkdir(const char *pathname, mode_t mode);

 

DESCRIPTION

mkdir attempts to create a directory named pathname.

The parameter mode specifies the permissions to use. It is modified by the process's umask in the usual way: the permissions of the created directory are (mode & ~umask & 0777). Other mode bits of the created directory depend on the operating system. For Linux, see below.

The newly created directory will be owned by the effective uid of the process. If the directory containing the file has the set group id bit set, or if the filesystem is mounted with BSD group semantics, the new directory will inherit the group ownership from its parent; otherwise it will be owned by the effective gid of the process.

If the parent directory has the set group id bit set then so will the newly created directory.
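The (mode & ~umask & 0777) rule can be checked by creating a directory under a known umask and reading the result back with stat(2). This is a sketch; mkdir_effective_mode is a hypothetical helper and the scratch location under /tmp is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a directory with the requested mode under the given umask and
 * returns the permission bits it actually received, or -1 on error. */
int mkdir_effective_mode(mode_t mode, mode_t mask)
{
    char parent[] = "/tmp/mkdir_demo_XXXXXX";
    char path[sizeof parent + 8];
    struct stat st;
    int result = -1;

    if (mkdtemp(parent) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/child", parent);

    mode_t saved = umask(mask);         /* install the umask under test */
    if (mkdir(path, mode) == 0) {
        if (stat(path, &st) == 0)
            result = (int)(st.st_mode & 0777);
        rmdir(path);
    }
    umask(saved);                       /* restore the caller's umask */
    rmdir(parent);
    return result;
}
```

Requesting 0777 under a umask of 022 yields 0755. Note that umask is process-wide, so changing it temporarily like this is not safe in a multithreaded program.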

 

RETURN VALUE

mkdir returns zero on success, or -1 if an error occurred (in which case, errno is set appropriately).  

ERRORS

EPERM
The filesystem containing pathname does not support the creation of directories.
EEXIST
pathname already exists (not necessarily as a directory). This includes the case where pathname is a symbolic link, dangling or not.
EFAULT
pathname points outside your accessible address space.
EACCES
The parent directory does not allow write permission to the process, or one of the directories in pathname did not allow search (execute) permission.
ENAMETOOLONG
pathname was too long.
ENOENT
A directory component in pathname does not exist or is a dangling symbolic link.
ENOTDIR
A component used as a directory in pathname is not, in fact, a directory.
ENOMEM
Insufficient kernel memory was available.
EROFS
pathname refers to a file on a read-only filesystem.
ELOOP
Too many symbolic links were encountered in resolving pathname.
ENOSPC
The device containing pathname has no room for the new directory.
ENOSPC
The new directory cannot be created because the user's disk quota is exhausted.
 

CONFORMING TO

SVr4, POSIX, BSD, SYSV, X/OPEN. SVr4 documents additional EIO, EMULTIHOP and ENOLINK error conditions; POSIX.1 omits ELOOP.  

NOTES

Under Linux apart from the permission bits, only the S_ISVTX mode bit is honored. That is, under Linux the created directory actually gets mode (mode & ~umask & 01777). See also stat(2).

There are many infelicities in the protocol underlying NFS. Some of these affect mkdir.  

SEE ALSO

mkdir(1), chmod(2), mknod(2), mount(2), rmdir(2), stat(2), umask(2), unlink(2)

mlock

NAME

mlock - disable paging for some parts of memory  

SYNOPSIS

#include <sys/mman.h>

int mlock(const void *addr, size_t len);

 

DESCRIPTION

mlock disables paging for the memory in the range starting at addr with length len bytes. All pages which contain a part of the specified memory range are guaranteed to be resident in RAM when the mlock system call returns successfully and they are guaranteed to stay in RAM until the pages are unlocked by munlock or munlockall, until the pages are unmapped via munmap, or until the process terminates or starts another program with exec. Child processes do not inherit page locks across a fork.

Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler. Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. (But be aware that the suspend mode on laptops and some desktop computers will save a copy of the system's RAM to disk, regardless of memory locks.)

Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock or mlockall will be unlocked by a single call to munlock for the corresponding range or by munlockall. Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.

On POSIX systems on which mlock and munlock are available, _POSIX_MEMLOCK_RANGE is defined in <unistd.h> and the value PAGESIZE from <limits.h> indicates the number of bytes per page.  

NOTES

With the Linux system call, addr is automatically rounded down to the nearest page boundary. However, POSIX 1003.1-2001 allows an implementation to require that addr is page aligned, so portable applications should ensure this.  
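A portable caller can do the rounding itself before calling mlock. A minimal sketch, assuming the page size is a power of two and is obtained with sysconf(_SC_PAGESIZE); page_floor and lock_range are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Rounds addr down to a multiple of page (page must be a power of two). */
uintptr_t page_floor(uintptr_t addr, uintptr_t page)
{
    return addr & ~(page - 1);
}

/* Locks every page touched by [buf, buf + len) using a page-aligned
 * start address. Returns 0 on success, -1 with errno set. */
int lock_range(const void *buf, size_t len)
{
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = page_floor((uintptr_t)buf, page);
    size_t span     = (size_t)((uintptr_t)buf - start) + len;
    return mlock((const void *)start, span);
}
```

With a 4096-byte page, an address of 0x12345 rounds down to 0x12000. lock_range can still fail with ENOMEM or EPERM when the RLIMIT_MEMLOCK limit is too low, so the return value should be checked.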

RETURN VALUE

On success, mlock returns zero. On error, -1 is returned, errno is set appropriately, and no changes are made to any locks in the address space of the process.  

ERRORS

ENOMEM
Some of the specified address range does not correspond to mapped pages in the address space of the process or the process tried to exceed the maximum number of allowed locked pages. Non-root processes are allowed to lock up to their current RLIMIT_MEMLOCK resource limit.
EPERM
The calling process does not have appropriate privileges. Processes are permitted to lock pages if they are running with the CAP_IPC_LOCK capability (normally only true for root) or if their current RLIMIT_MEMLOCK resource limit is non-zero.
EINVAL
(Not on Linux) addr was not a multiple of the page size.

Linux adds

EINVAL
len was negative.
 

CONFORMING TO

POSIX.1b, SVr4. SVr4 documents an additional EAGAIN error code.  

SEE ALSO

mlockall(2), munlock(2), munlockall(2), munmap(2), setrlimit(2)

mlockall

NAME

mlockall - disable paging for calling process  

SYNOPSIS

#include <sys/mman.h>

int mlockall(int flags);

 

DESCRIPTION

mlockall disables paging for all pages mapped into the address space of the calling process. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory and memory mapped files. All mapped pages are guaranteed to be resident in RAM when the mlockall system call returns successfully and they are guaranteed to stay in RAM until the pages are unlocked again by munlock or munlockall or until the process terminates or starts another program with exec. Child processes do not inherit page locks across a fork.

Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler. Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. For security applications, only small parts of memory have to be locked, for which mlock is available.

The flags parameter can be constructed from the bitwise OR of the following constants:

MCL_CURRENT
Lock all pages which are currently mapped into the address space of the process.
MCL_FUTURE
Lock all pages which will become mapped into the address space of the process in the future. These could be for instance new pages required by a growing heap and stack as well as new memory mapped files or shared memory regions.

If MCL_FUTURE has been specified and the number of locked pages exceeds the upper limit of allowed locked pages, then the system call which caused the new mapping will fail with ENOMEM. If these new pages have been mapped by the growing stack, then the kernel will deny stack expansion and send a SIGSEGV.

Real-time processes should reserve enough locked stack pages before entering the time-critical section, so that no page fault can be caused by function calls. This can be achieved by calling a function which has a sufficiently large automatic variable and which writes to the memory occupied by this large array in order to touch these stack pages. This way, enough pages will be mapped for the stack and can be locked into RAM. The dummy writes ensure that not even copy-on-write page faults can occur in the critical section.
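The stack reservation described above might look like the following sketch. The 64 KiB figure (STACK_RESERVE) and both function names are assumptions made for the example; a real program would size the reserve from its own worst-case call depth. The stack is touched first so that the pages are mapped, and mlockall is called afterwards so that they are locked along with everything else.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define STACK_RESERVE (64 * 1024)   /* assumed worst-case stack use, in bytes */

/* Write into a large automatic array so that the stack pages it occupies
 * are mapped. Returns the number of dummy writes made in the loop. */
size_t prefault_stack(void)
{
    volatile unsigned char dummy[STACK_RESERVE];
    size_t touched = 0;

    /* A 1024-byte stride is no larger than a page, so every page is hit. */
    for (size_t i = 0; i < sizeof dummy; i += 1024) {
        dummy[i] = 0;
        touched++;
    }
    dummy[sizeof dummy - 1] = 0;
    return touched;
}

/* Map the stack reserve, then lock all current and future pages.
 * Returns 0 on success, -1 if mlockall failed (errno is set). */
int enter_realtime_section(void)
{
    size_t touched = prefault_stack();

    assert(touched > 0);
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

After a successful return the caller would typically switch scheduler and run the time-critical code, then call munlockall when it is done.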

Memory locks do not stack, i.e., pages which have been locked several times by calls to mlockall or mlock will be unlocked by a single call to munlockall. Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.  

RETURN VALUE

On success, mlockall returns zero. On error, -1 is returned, and errno is set appropriately.  

ERRORS

ENOMEM
The process tried to exceed the maximum number of allowed locked pages. Non-root processes are allowed to lock up to their current RLIMIT_MEMLOCK resource limit.
EPERM
The calling process does not have appropriate privileges. Processes are permitted to lock pages if they are running with the CAP_IPC_LOCK capability (normally only true for root) or if their current RLIMIT_MEMLOCK resource limit is non-zero.
EINVAL
Unknown flags were specified.
 

AVAILABILITY

On POSIX systems on which mlockall and munlockall are available, _POSIX_MEMLOCK is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)  

CONFORMING TO

POSIX.1b, SVr4. SVr4 documents an additional EAGAIN error code.  

SEE ALSO

munlockall(2), mlock(2), munlock(2), sysconf(3)

mmap

NAME

mmap, munmap - map or unmap files or devices into memory  

SYNOPSIS

#include <sys/mman.h>

void * mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

int munmap(void *start, size_t length);  

DESCRIPTION

The mmap function asks to map length bytes starting at offset offset from the file (or other object) specified by the file descriptor fd into memory, preferably at address start. This latter address is a hint only, and is usually specified as 0. The actual place where the object is mapped is returned by mmap, and is never 0.

The prot argument describes the desired memory protection (and must not conflict with the open mode of the file). It is either PROT_NONE or is the bitwise OR of one or more of the other PROT_* flags.

PROT_EXEC
Pages may be executed.
PROT_READ
Pages may be read.
PROT_WRITE
Pages may be written.
PROT_NONE
Pages may not be accessed.

The flags parameter specifies the type of the mapped object, mapping options and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. It has bits

MAP_FIXED
Do not select a different address than the one specified. If the specified address cannot be used, mmap will fail. If MAP_FIXED is specified, start must be a multiple of the pagesize. Use of this option is discouraged.
MAP_SHARED
Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file. The file may not actually be updated until msync(2) or munmap(2) are called.
MAP_PRIVATE
Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap call are visible in the mapped region.

You must specify exactly one of MAP_SHARED and MAP_PRIVATE.

The above three flags are described in POSIX.1b (formerly POSIX.4) and SUSv2. Linux also knows about the following non-standard flags:

MAP_DENYWRITE
This flag is ignored. (Long ago, it signalled that attempts to write to the underlying file should fail with ETXTBSY. But this was a source of denial-of-service attacks.)
MAP_EXECUTABLE
This flag is ignored.
MAP_NORESERVE
(Used together with MAP_PRIVATE.) Do not reserve swap space pages for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify this private copy-on-write region. When it is not reserved one might get SIGSEGV upon a write when no memory is available.
MAP_LOCKED
(Linux 2.5.37 and later) Lock the pages of the mapped region into memory in the manner of mlock(). This flag is ignored in older kernels.
MAP_GROWSDOWN
Used for stacks. Indicates to the kernel VM system that the mapping should extend downwards in memory.
MAP_ANONYMOUS
The mapping is not backed by any file; the fd and offset arguments are ignored. This flag in conjunction with MAP_SHARED is implemented since Linux 2.4.
MAP_ANON
Alias for MAP_ANONYMOUS. Deprecated.
MAP_FILE
Compatibility flag. Ignored.
MAP_32BIT
Put the mapping into the first 2GB of the process address space. Ignored when MAP_FIXED is set. This flag is currently only supported on x86-64 for 64bit programs.

Some systems document the additional flags MAP_AUTOGROW, MAP_AUTORESRV, MAP_COPY, and MAP_LOCAL.

fd should be a valid file descriptor, unless MAP_ANONYMOUS is set, in which case the argument is ignored.

offset should be a multiple of the page size as returned by getpagesize(2).

Memory mapped by mmap is preserved across fork(2), with the same attributes.

A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified.

The munmap system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references. The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region.

The address start must be a multiple of the page size. All pages containing a part of the indicated range are unmapped, and subsequent references to these pages will generate SIGSEGV. It is not an error if the indicated range does not contain any mapped pages.

For file-backed mappings, the st_atime field for the mapped file may be updated at any time between the mmap() and the corresponding unmapping; the first reference to a mapped page will update the field if it has not been already.

The st_ctime and st_mtime field for a file mapped with PROT_WRITE and MAP_SHARED will be updated after a write to the mapped region, and before a subsequent msync() with the MS_SYNC or MS_ASYNC flag, if one occurs.  

RETURN VALUE

On success, mmap returns a pointer to the mapped area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately. On success, munmap returns 0, on failure -1, and errno is set (probably to EINVAL).  
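As an illustration of the calling sequence and of the MAP_FAILED check, here is a small sketch that maps a whole file read-only and scans it. The function name count_byte_in_file is invented for the example, and the empty-file shortcut is an assumption about how a caller would want to handle a zero length, which mmap does not accept.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count occurrences of byte c in the file at path by mapping it read-only.
 * Returns the count, or -1 on error (after reporting it with perror). */
long count_byte_in_file(const char *path, char c)
{
    struct stat st;
    long count = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1) { perror("open"); return -1; }
    if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   /* nothing to map */

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* closing fd does not unmap the region */
    if (p == MAP_FAILED) { perror("mmap"); return -1; }

    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == c)
            count++;

    int rc = munmap(p, (size_t)st.st_size);
    assert(rc == 0);
    (void)rc;
    return count;
}
```

Note that the result of mmap is compared against MAP_FAILED, not against NULL.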

NOTES

It is architecture dependent whether PROT_READ includes PROT_EXEC or not. Portable programs should always set PROT_EXEC if they intend to execute code in the new mapping.  

ERRORS

EBADF
fd is not a valid file descriptor (and MAP_ANONYMOUS was not set).
EACCES
A file descriptor refers to a non-regular file. Or MAP_PRIVATE was requested, but fd is not open for reading. Or MAP_SHARED was requested and PROT_WRITE is set, but fd is not open in read/write (O_RDWR) mode. Or PROT_WRITE is set, but the file is append-only.
EINVAL
We don't like start or length or offset. (E.g., they are too large, or not aligned on a PAGESIZE boundary.)
ETXTBSY
MAP_DENYWRITE was set but the object specified by fd is open for writing.
EAGAIN
The file has been locked, or too much memory has been locked.
ENOMEM
No memory is available, or the process's maximum number of mappings would have been exceeded.
ENODEV
The underlying filesystem of the specified file does not support memory mapping.

Use of a mapped region can result in these signals:

SIGSEGV
Attempted write into a region specified to mmap as read-only.
SIGBUS
Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).
 

AVAILABILITY

On POSIX systems on which mmap, msync and munmap are available, _POSIX_MAPPED_FILES is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)  

CONFORMING TO

SVr4, POSIX.1b (formerly POSIX.4), 4.4BSD, SUSv2. SVr4 documents additional error codes ENXIO and ENODEV. SUSv2 documents additional error codes EMFILE and EOVERFLOW.

MAP_32BIT is a Linux extension.  

SEE ALSO

getpagesize(2), mlock(2), mmap2(2), mremap(2), msync(2), shm_open(2), B.O. Gallmeister, POSIX.4, O'Reilly, pp. 128-129 and 389-391.

modify_ldt

NAME

modify_ldt - get or set ldt  

SYNOPSIS

#include <linux/ldt.h>
#include <linux/unistd.h>

_syscall3(int, modify_ldt, int, func, void *, ptr, unsigned long, bytecount)

int modify_ldt(int func, void *ptr, unsigned long bytecount);  

DESCRIPTION

modify_ldt reads or writes the local descriptor table (ldt) for a process. The ldt is a per-process memory management table used by the i386 processor. For more information on this table, see an Intel 386 processor handbook.

When func is 0, modify_ldt reads the ldt into the memory pointed to by ptr. The number of bytes read is the smaller of bytecount and the actual size of the ldt.

When func is 1, modify_ldt modifies one ldt entry. ptr points to a modify_ldt_ldt_s structure and bytecount must equal the size of this structure.  

RETURN VALUE

On success, modify_ldt returns either the actual number of bytes read (for reading) or 0 (for writing). On failure, modify_ldt returns -1 and sets errno.  

ERRORS

ENOSYS
func is neither 0 nor 1.
EINVAL
ptr is 0, or func is 1 and bytecount is not equal to the size of the structure modify_ldt_ldt_s, or func is 1 and the new ldt entry has invalid values.
EFAULT
ptr points outside the address space.
 

CONFORMING TO

This call is Linux-specific and should not be used in programs intended to be portable.

SEE ALSO

vm86(2)

mprotect

NAME

mprotect - control allowable accesses to a region of memory  

SYNOPSIS

#include <sys/mman.h>

int mprotect(const void *addr, size_t len, int prot);

 

DESCRIPTION

The function mprotect specifies the desired protection for the memory page(s) containing part or all of the interval [addr,addr+len-1]. If an access is disallowed by the protection given it, the program receives a SIGSEGV.

prot is a bitwise-or of the following values:

PROT_NONE
The memory cannot be accessed at all.
PROT_READ
The memory can be read.
PROT_WRITE
The memory can be written to.
PROT_EXEC
The memory can contain executing code.

The new protection replaces any existing protection. For example, if the memory had previously been marked PROT_READ, and mprotect is then called with prot PROT_WRITE, it will no longer be readable.  

RETURN VALUE

On success, mprotect returns zero. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EINVAL
addr is not a valid pointer, or not a multiple of PAGESIZE.
EFAULT
The memory cannot be accessed.
EACCES
The memory cannot be given the specified access. This can happen, for example, if you mmap(2) a file to which you have read-only access, then ask mprotect to mark it PROT_WRITE.
ENOMEM
Internal kernel structures could not be allocated.
 

EXAMPLE

       #include <stdio.h>
       #include <stdlib.h>
       #include <stdint.h>     /* for uintptr_t */
       #include <errno.h>
       #include <sys/mman.h>
       #include <limits.h>     /* for PAGESIZE */

       #ifndef PAGESIZE
       #define PAGESIZE 4096
       #endif

       int
       main(void)
       {
           char *p;
           char c;

           /* Allocate a buffer; it will have the default
              protection of PROT_READ|PROT_WRITE. */
           p = malloc(1024+PAGESIZE-1);
           if (!p) {
               perror("Couldn't malloc(1024)");
               exit(errno);
           }

           /* Align to a multiple of PAGESIZE, assumed to be a power of two.
              The cast goes through uintptr_t so that the pointer is not
              truncated on 64-bit systems. */
           p = (char *)(((uintptr_t) p + PAGESIZE-1) & ~((uintptr_t) PAGESIZE-1));

           c = p[666];         /* Read; ok */
           p[666] = 42;        /* Write; ok */

           /* Mark the buffer read-only. */
           if (mprotect(p, 1024, PROT_READ)) {
               perror("Couldn't mprotect");
               exit(errno);
           }

           c = p[666];         /* Read; ok */
           p[666] = 42;        /* Write; program dies on SIGSEGV */

           exit(0);
       }

 

CONFORMING TO

SVr4, POSIX.1b (formerly POSIX.4). SVr4 defines an additional error code EAGAIN. The SVr4 error conditions don't map neatly onto Linux's. POSIX.1b says that mprotect can be used only on regions of memory obtained from mmap(2).

SEE ALSO

mmap(2)

mremap

NAME

mremap - re-map a virtual memory address  

SYNOPSIS

#include <unistd.h>
#include <sys/mman.h>

void * mremap(void *old_address, size_t old_size, size_t new_size, unsigned long flags);

DESCRIPTION

mremap expands (or shrinks) an existing memory mapping, potentially moving it at the same time (controlled by the flags argument and the available virtual address space).

old_address is the old address of the virtual memory block that you want to expand (or shrink). Note that old_address has to be page aligned. old_size is the old size of the virtual memory block. new_size is the requested size of the virtual memory block after the resize.

The flags argument is a bitmap of flags.

In Linux the memory is divided into pages. A user process has (one or) several linear virtual memory segments. Each virtual memory segment has one or more mappings to real memory pages (in the page table). Each virtual memory segment has its own protection (access rights), which may cause a segmentation violation if the memory is accessed incorrectly (e.g., writing to a read-only segment). Accessing virtual memory outside of the segments will also cause a segmentation violation.

mremap uses the Linux page table scheme. mremap changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc.

 

FLAGS

MREMAP_MAYMOVE
Permits the kernel to relocate the mapping to a new virtual address if it cannot be resized at its current address. Without this flag, mremap fails in that case.

 

RETURN VALUE

On success mremap returns a pointer to the new virtual memory area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately.

 

ERRORS

EINVAL
An invalid argument was given. Most likely old_address was not page aligned.
EFAULT
"Segmentation fault." Some address in the range old_address to old_address+old_size is an invalid virtual memory address for this process. You can also get EFAULT even if there exist mappings that cover the whole address space requested, but those mappings are of different types.
EAGAIN
The memory segment is locked and cannot be re-mapped.
ENOMEM
The memory area cannot be expanded at the current virtual address, and the MREMAP_MAYMOVE flag is not set in flags. Or, there is not enough (virtual) memory available.
 

NOTES

With current glibc includes, in order to get the definition of MREMAP_MAYMOVE, you need to define _GNU_SOURCE before including <sys/mman.h>.  
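A realloc-like wrapper around mremap might be sketched as follows. The names map_alloc and map_realloc are invented for the example, and sizes are assumed to be multiples of the page size. Note the _GNU_SOURCE definition ahead of every include, as the note above requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* must come before the first #include */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Allocate len bytes of zero-filled anonymous memory, or return NULL. */
void *map_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Resize a mapping obtained from map_alloc. The contents are preserved
 * and the block may move. Returns NULL on failure, in which case the old
 * mapping is left untouched. */
void *map_realloc(void *old, size_t old_len, size_t new_len)
{
    assert(old != NULL && new_len > 0);
    void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

As with realloc, the caller must use the returned pointer afterwards, since the old address may no longer be mapped.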

CONFORMING TO

This call is Linux-specific, and should not be used in programs intended to be portable. 4.2BSD had a (never actually implemented) mremap(2) call with completely different semantics.  

SEE ALSO

getpagesize(2), realloc(3), malloc(3), brk(2), sbrk(2), mmap(2)

Your favorite OS text book for more information on paged memory. (Modern Operating Systems by Andrew S. Tanenbaum, Inside Linux by Randolf Bentson, The Design of the UNIX Operating System by Maurice J. Bach.)

msgget

NAME

msgget - get a message queue identifier  

SYNOPSIS

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgget(key_t key, int msgflg);  

DESCRIPTION

The function returns the message queue identifier associated with the value of the key argument. A new message queue is created if key has the value IPC_PRIVATE, or if key isn't IPC_PRIVATE, no message queue with the given key exists, and IPC_CREAT is asserted in msgflg (i.e., msgflg&IPC_CREAT is nonzero). The presence in msgflg of the fields IPC_CREAT and IPC_EXCL plays the same role, with respect to the existence of the message queue, as the presence of O_CREAT and O_EXCL in the flags argument of the open(2) system call: i.e., the msgget function fails if msgflg asserts both IPC_CREAT and IPC_EXCL and a message queue already exists for key.

Upon creation, the lower 9 bits of the argument msgflg define the access permissions of the message queue. These permission bits have the same format and semantics as the access permissions parameter in open(2) or creat(2) system calls. (The execute permissions are not used.)

If a new message queue is created, the system call initializes the system message queue data structure msqid_ds as follows:

msg_perm.cuid and msg_perm.uid are set to the effective user-ID of the calling process.
msg_perm.cgid and msg_perm.gid are set to the effective group-ID of the calling process.
The lowest order 9 bits of msg_perm.mode are set to the lowest order 9 bits of msgflg.
msg_qnum, msg_lspid, msg_lrpid, msg_stime and msg_rtime are set to 0.
msg_ctime is set to the current time.
msg_qbytes is set to the system limit MSGMNB.

If the message queue already exists the access permissions are verified, and a check is made to see if it is marked for destruction.  
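The two creation paths described above can be sketched as below. The wrapper names are invented for the example and the permission value 0600 is an arbitrary choice; msgctl with IPC_RMID is used for cleanup.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Create a brand new queue, readable and writable by its owner only.
 * Returns the queue identifier, or -1 with errno set. */
int queue_create_private(void)
{
    return msgget(IPC_PRIVATE, 0600);
}

/* Create the queue for key only if none exists yet.
 * Returns the identifier, or -1 with errno set (EEXIST if it exists). */
int queue_create_exclusive(key_t key)
{
    assert(key != IPC_PRIVATE);
    return msgget(key, IPC_CREAT | IPC_EXCL | 0600);
}

/* Remove a queue. Returns 0 on success, -1 with errno set. */
int queue_remove(int msqid)
{
    return msgctl(msqid, IPC_RMID, NULL);
}
```

Queues are not removed automatically when the creating process exits, so a matching queue_remove call (or ipcrm from the shell) is needed.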

RETURN VALUE

If successful, the return value will be the message queue identifier (a nonnegative integer), otherwise -1 with errno indicating the error.  

ERRORS

On failure, errno is set to one of the following values:
EACCES
A message queue exists for key, but the calling process has no access permissions to the queue.
EEXIST
A message queue exists for key and msgflg was asserting both IPC_CREAT and IPC_EXCL.
ENOENT
No message queue exists for key and msgflg wasn't asserting IPC_CREAT.
ENOMEM
A message queue has to be created but the system has not enough memory for the new data structure.
ENOSPC
A message queue has to be created but the system limit for the maximum number of message queues (MSGMNI) would be exceeded.
 

NOTES

IPC_PRIVATE isn't a flag field but a key_t type. If this special value is used for key, the system call ignores everything but the lowest order 9 bits of msgflg and creates a new message queue (on success).

The following is a system limit on message queue resources affecting a msgget call:

MSGMNI
System wide maximum number of message queues: policy dependent.
 

BUGS

The name choice IPC_PRIVATE was perhaps unfortunate; IPC_NEW would more clearly show its function.

CONFORMING TO

SVr4, SVID. Until version 2.3.20 Linux would return EIDRM for a msgget on a message queue scheduled for deletion.  

SEE ALSO

ftok(3), ipc(5), msgctl(2), msgsnd(2), msgrcv(2)

msgrcv

NAME

msgop - message operations  

SYNOPSIS

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgsnd(int msqid, struct msgbuf *msgp, size_t msgsz, int msgflg);

ssize_t msgrcv(int msqid, struct msgbuf *msgp, size_t msgsz, long msgtyp, int msgflg);  

DESCRIPTION

To send or receive a message, the calling process allocates a structure of the following general form:

       struct msgbuf {
               long    mtype;     /* message type, must be > 0 */
               char    mtext[1];  /* message data */
       };

The mtext field is an array (or other structure) whose size is specified by msgsz, a non-negative integer value. Messages of zero length (i.e., no mtext field) are permitted. The mtype field must have a strictly positive integer value that can be used by the receiving process for message selection (see the section about msgrcv).

The calling process must have write permission to send and read permission to receive a message on the queue.

The msgsnd system call appends a copy of the message pointed to by msgp to the message queue whose identifier is specified by msqid.

If sufficient space is available on the queue, msgsnd succeeds immediately. (The queue capacity is defined by the msg_qbytes field in the associated data structure for the message queue. During queue creation this field is initialised to MSGMNB bytes, but this limit can be modified using msgctl.) If insufficient space is available on the queue, then the default behaviour of msgsnd is to block until space becomes available. If IPC_NOWAIT is asserted in msgflg then the call instead fails with the error EAGAIN.

A blocked msgsnd call may also fail if the queue is removed (in which case the system call fails with errno set to EIDRM), or a signal is caught (in which case the system call fails with errno set to EINTR). (msgsnd and msgrcv are never automatically restarted after being interrupted by a signal handler, regardless of the setting of the SA_RESTART flag when establishing a signal handler.)

Upon successful completion the message queue data structure is updated as follows:

msg_lspid is set to the process ID of the calling process.
msg_qnum is incremented by 1.
msg_stime is set to the current time.

The system call msgrcv reads a message from the message queue specified by msqid into the msgbuf pointed to by the msgp argument, removing the read message from the queue.

The argument msgsz specifies the maximum size in bytes for the member mtext of the structure pointed to by the msgp argument. If the message text has length greater than msgsz, then if the msgflg argument asserts MSG_NOERROR, the message text will be truncated (and the truncated part will be lost), otherwise the message isn't removed from the queue and the system call fails returning with errno set to E2BIG.

The argument msgtyp specifies the type of message requested as follows:

If msgtyp is 0, then the first message in the queue is read.
If msgtyp is greater than 0, then the first message on the queue of type msgtyp is read, unless MSG_EXCEPT was asserted in msgflg, in which case the first message on the queue of type not equal to msgtyp will be read.
If msgtyp is less than 0, then the first message on the queue with the lowest type less than or equal to the absolute value of msgtyp will be read.

The msgflg argument is the bitwise OR of zero or more of the following flags:

IPC_NOWAIT
Return immediately if no message of the requested type is on the queue. The system call fails with errno set to ENOMSG.
MSG_EXCEPT
Used with msgtyp greater than 0 to read the first message on the queue with message type that differs from msgtyp.
MSG_NOERROR
Truncate the message text if longer than msgsz bytes.

If no message of the requested type is available and IPC_NOWAIT isn't asserted in msgflg, the calling process is blocked until one of the following conditions occurs:

A message of the desired type is placed on the queue.
The message queue is removed from the system. In this case the system call fails with errno set to EIDRM.
The calling process catches a signal. In this case the system call fails with errno set to EINTR.

Upon successful completion the message queue data structure is updated as follows:

msg_lrpid is set to the process ID of the calling process.
msg_qnum is decremented by 1.
msg_rtime is set to the current time.
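The send and receive rules above, including selection by msgtyp, can be tried out with a sketch like the following. The struct text_msg layout, the TEXT_MAX limit and the function names are assumptions made for the example; both calls use IPC_NOWAIT so that they never block.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define TEXT_MAX 64                 /* assumed payload limit for this sketch */

struct text_msg {
    long mtype;                     /* message type, must be > 0 */
    char mtext[TEXT_MAX];           /* message data */
};

/* Send a NUL-terminated string as a message of the given type.
 * Returns 0 on success, -1 with errno set (EAGAIN if the queue is full). */
int send_text(int msqid, long type, const char *text)
{
    struct text_msg m;
    size_t n = strlen(text) + 1;    /* include the terminating NUL */

    assert(type > 0 && n <= TEXT_MAX);
    m.mtype = type;
    memcpy(m.mtext, text, n);
    return msgsnd(msqid, &m, n, IPC_NOWAIT);
}

/* Receive the first message selected by msgtyp (0: any type, > 0: that
 * type, < 0: lowest type not above |msgtyp|).
 * Returns the number of bytes copied into out, or -1 with errno set
 * (ENOMSG if no matching message is queued). */
ssize_t recv_text(int msqid, long msgtyp, char out[TEXT_MAX])
{
    struct text_msg m;
    ssize_t n = msgrcv(msqid, &m, TEXT_MAX, msgtyp, IPC_NOWAIT);

    if (n >= 0)
        memcpy(out, m.mtext, (size_t)n);
    return n;
}
```

Sending a type 2 message followed by a type 1 message and then calling recv_text with msgtyp 1 returns the second message first, which is the selection behaviour the list above describes.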
 

RETURN VALUE

On a failure both functions return -1 with errno indicating the error, otherwise msgsnd returns 0 and msgrcv returns the number of bytes actually copied into the mtext array.

ERRORS

When msgsnd fails, at return errno will be set to one among the following values:
EAGAIN
The message can't be sent due to the msg_qbytes limit for the queue and IPC_NOWAIT was asserted in msgflg.
EACCES
The calling process has no write permission on the message queue.
EFAULT
The address pointed to by msgp isn't accessible.
EIDRM
The message queue was removed.
EINTR
Sleeping on a full message queue condition, the process caught a signal.
EINVAL
Invalid msqid value, or nonpositive mtype value, or invalid msgsz value (less than 0 or greater than the system value MSGMAX).
ENOMEM
The system has not enough memory to make a copy of the supplied msgbuf.

When msgrcv fails, at return errno will be set to one among the following values:

E2BIG
The message text length is greater than msgsz and MSG_NOERROR isn't asserted in msgflg.
EACCES
The calling process does not have read permission on the message queue.
EFAULT
The address pointed to by msgp isn't accessible.
EIDRM
While the process was sleeping to receive a message, the message queue was removed.
EINTR
While the process was sleeping to receive a message, the process received a signal that had to be caught.
EINVAL
Illegal msqid value, or msgsz less than 0.
ENOMSG
IPC_NOWAIT was asserted in msgflg and no message of the requested type existed on the message queue.
 

NOTES

The following are system limits affecting a msgsnd system call:
MSGMAX
Maximum size for a message text: the implementation sets this value to 8192 bytes.
MSGMNB
Default maximum size in bytes of a message queue: 16384 bytes. The super-user can increase the size of a message queue beyond MSGMNB by a msgctl system call.

The implementation has no intrinsic limits for the system wide maximum number of message headers (MSGTQL) and for the system wide maximum size in bytes of the message pool (MSGPOOL).  

CONFORMING TO

SVr4, SVID.  

NOTE

The pointer argument is declared as struct msgbuf * with libc4, libc5, glibc 2.0, glibc 2.1. It is declared as void * (const void * for msgsnd()) with glibc 2.2, following the SUSv2.  

SEE ALSO

ipc(5), msgctl(2), msgget(2), msgrcv(2), msgsnd(2)

msync

NAME

msync - synchronize a file with a memory map  

SYNOPSIS

#include <sys/mman.h>

int msync(void *start, size_t length, int flags);  

DESCRIPTION

msync flushes changes made to the in-core copy of a file that was mapped into memory using mmap(2) back to disk. Without use of this call there is no guarantee that changes are written back before munmap(2) is called. To be more precise, the part of the file that corresponds to the memory area starting at start and having length length is updated. The flags argument may have the bits MS_ASYNC, MS_SYNC and MS_INVALIDATE set, but not both MS_ASYNC and MS_SYNC. MS_ASYNC specifies that an update be scheduled, but the call returns immediately. MS_SYNC asks for an update and waits for it to complete. MS_INVALIDATE asks to invalidate other mappings of the same file (so that they can be updated with the fresh values just written).  
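A typical use of MS_SYNC is sketched below: a file is replaced through a shared mapping and the function returns only after the update has been written back. The function name write_via_mapping and the choice to size the file with ftruncate are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Replace the contents of the file at path with len bytes from data,
 * writing through a shared mapping and waiting for the write-back.
 * Returns 0 on success, -1 on error. */
int write_via_mapping(const char *path, const char *data, size_t len)
{
    assert(data != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {   /* the file must cover the mapping */
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, data, len);
    int rc = msync(p, len, MS_SYNC);         /* returns once the update is done */
    munmap(p, len);
    return rc;
}
```

Replacing MS_SYNC with MS_ASYNC would only schedule the update and return at once, as described above.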

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EINVAL
start is not a multiple of PAGESIZE, or any bit other than MS_ASYNC | MS_INVALIDATE | MS_SYNC is set in flags.
ENOMEM
The indicated memory (or part of it) was not mapped.
 

AVAILABILITY

On POSIX systems on which msync is available, both _POSIX_MAPPED_FILES and _POSIX_SYNCHRONIZED_IO are defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)  

CONFORMING TO

POSIX.1b (formerly POSIX.4)

This call was introduced in Linux 1.3.21, and then used EFAULT instead of ENOMEM. In Linux 2.4.19 this was changed to the POSIX value ENOMEM.  

SEE ALSO

mmap(2), B.O. Gallmeister, POSIX.4, O'Reilly, pp. 128-129 and 389-391.

munlockall

NAME

munlockall - reenable paging for calling process  

SYNOPSIS

#include <sys/mman.h>

int munlockall(void);

 

DESCRIPTION

munlockall reenables paging for all pages mapped into the address space of the calling process.

Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock or mlockall will be unlocked by a single call to munlockall. Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.

On POSIX systems on which mlockall and munlockall are available, _POSIX_MEMLOCK is defined in <unistd.h>.

RETURN VALUE

On success, munlockall returns zero. On error, -1 is returned and errno is set appropriately.  

CONFORMING TO

POSIX.1b, SVr4  

SEE ALSO

mlockall(2), mlock(2), munlock(2)

NAL_ADDRESS_new

NAME

NAL_ADDRESS_new, NAL_ADDRESS_free, NAL_ADDRESS_create, NAL_ADDRESS_set_def_buffer_size, NAL_ADDRESS_can_connect, NAL_ADDRESS_can_listen - libnal addressing functions  

SYNOPSIS

#include <libnal/nal.h>

NAL_ADDRESS *NAL_ADDRESS_new(void);
void NAL_ADDRESS_free(NAL_ADDRESS *addr);
void NAL_ADDRESS_reset(NAL_ADDRESS *addr);
int NAL_ADDRESS_create(NAL_ADDRESS *addr, const char *addr_string, unsigned int def_buffer_size);
unsigned int NAL_ADDRESS_get_def_buffer_size(const NAL_ADDRESS *addr);
int NAL_ADDRESS_set_def_buffer_size(NAL_ADDRESS *addr, unsigned int def_buffer_size);
int NAL_ADDRESS_can_connect(const NAL_ADDRESS *addr);
int NAL_ADDRESS_can_listen(const NAL_ADDRESS *addr);

 

DESCRIPTION

NAL_ADDRESS_new() allocates and initialises a new NAL_ADDRESS object.

NAL_ADDRESS_free() destroys a NAL_ADDRESS object.

NAL_ADDRESS_reset() will, if necessary, clean up any prior state in addr so that it can be reused in NAL_ADDRESS_create(). Internally, there are other optimisations and benefits to using NAL_ADDRESS_reset() instead of NAL_ADDRESS_free() and NAL_ADDRESS_new(): the implementation can try to avoid repeated reallocation and reinitialisation of state, only doing full cleanup and reinitialisation when necessary.

NAL_ADDRESS_create() will attempt to parse a network address from the string constant provided in addr_string. If this succeeds, then addr will represent the given network address for use in other libnal functions. The significance of def_buffer_size is that any NAL_CONNECTION object created with addr will inherit def_buffer_size as the default size for its read and write buffers (see NAL_CONNECTION_set_size(2)). If addr is used to create a NAL_LISTENER object, then any NAL_CONNECTION objects that are assigned connections from the listener will likewise have the given default size for their buffers. See the NOTES section for information on the syntax of addr.

NAL_ADDRESS_set_def_buffer_size() sets def_buffer_size as the default buffer size in addr. This operation is built into NAL_ADDRESS_create() already. NAL_ADDRESS_get_def_buffer_size() returns the current default buffer size of addr.

NAL_ADDRESS_can_connect() will indicate whether the address represented by addr is of an appropriate form for creating a NAL_CONNECTION object. NAL_ADDRESS_can_listen() likewise indicates if addr is appropriate for creating a NAL_LISTENER object. In other words, these functions determine whether the address can be "connected to" or "listened on". Depending on the type of transport and the string from which addr was parsed, some addresses are only good for connecting or listening whereas others can be good for both. See NOTES.

RETURN VALUES

NAL_ADDRESS_new() returns a valid NAL_ADDRESS object on success, NULL otherwise.

NAL_ADDRESS_free() and NAL_ADDRESS_reset() have no return value.

NAL_ADDRESS_get_def_buffer_size() returns the current default buffer size in a NAL_ADDRESS object.

All other NAL_ADDRESS functions return zero for failure or false, and non-zero for success or true.

NOTES

The string syntax implemented by libnal is used by all the distcache libraries and tools. At the time of writing, only TCP/IPv4 and unix domain sockets were supported as underlying transports and so likewise the implemented syntax handling only supported these two forms.
TCP/IPv4 addresses
The syntax for TCP/IPv4 addresses has two forms, depending on whether you specify a hostname (or alternatively a dotted-numeric IP address) with the port number or just the port number on its own. E.g. to represent port 9001, one uses:

IP:9001

whereas to specify a hostname or IP address with it, the syntax is:

IP:machinename.domain:9001
IP:192.168.0.1:9001

Either form of TCP/IPv4 address is generally valid for creating a NAL_LISTENER object, although it will depend at run-time on the situation in the system - i.e. whether privileges exist to listen on the port, whether the port is already in use, whether the specified hostname or IP address is bound to a running network interface that can be listened on, etc. For creating a NAL_CONNECTION object, a hostname or IP address must be specified. This is why the NAL_ADDRESS_can_connect() and NAL_ADDRESS_can_listen() helper functions exist - to detect whether the syntax used is logical for the intended use. Failures to set up network resources afterwards will in turn say whether the given address data is possible within the host system.

unix domain addresses
There is only one syntax for unix domain addresses, and so any correctly parsed address string is in theory valid for connecting to or listening on. The form is;

UNIX:/path/to/socket

This represents the path to the socket in the file system.
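
The two address forms and their connect/listen consequences can be sketched in plain C. This is only an illustration of the syntax described above and is not libnal's parser; the names addr_kind, classify_address(), syntax_can_connect() and syntax_can_listen() are hypothetical, and the accepted port range of 1 to 65535 is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of an address string; not a libnal type. */
enum addr_kind { ADDR_INVALID, ADDR_IP_PORT_ONLY, ADDR_IP_HOST_PORT, ADDR_UNIX };

/* Returns 1 if s is a decimal port number in the assumed range 1..65535. */
static int valid_port(const char *s)
{
    char *end;
    long port;

    if (*s < '0' || *s > '9')
        return 0;
    port = strtol(s, &end, 10);
    return *end == '\0' && port >= 1 && port <= 65535;
}

/* Classifies a string written in the "IP:..." or "UNIX:..." syntax. */
enum addr_kind classify_address(const char *str)
{
    const char *rest, *colon;

    assert(str != NULL);
    if (strncmp(str, "UNIX:", 5) == 0)
        return str[5] != '\0' ? ADDR_UNIX : ADDR_INVALID;
    if (strncmp(str, "IP:", 3) != 0)
        return ADDR_INVALID;
    rest = str + 3;
    colon = strrchr(rest, ':');
    if (colon == NULL)              /* "IP:9001": port on its own */
        return valid_port(rest) ? ADDR_IP_PORT_ONLY : ADDR_INVALID;
    if (colon == rest)              /* "IP::9001": empty host part */
        return ADDR_INVALID;
    return valid_port(colon + 1) ? ADDR_IP_HOST_PORT : ADDR_INVALID;
}

/* A port-only address names no peer, so it only makes sense for listening. */
int syntax_can_connect(enum addr_kind k)
{
    return k == ADDR_IP_HOST_PORT || k == ADDR_UNIX;
}

/* Any correctly parsed address is at least syntactically fit for listening. */
int syntax_can_listen(enum addr_kind k)
{
    return k != ADDR_INVALID;
}
```

For example, classify_address("IP:9001") yields ADDR_IP_PORT_ONLY, for which syntax_can_listen() is true and syntax_can_connect() is false. Whether a listener can actually be set up is still decided at run-time, as noted above.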

 

SEE ALSO

NAL_CONNECTION_new(2) - Functions for the NAL_CONNECTION type.

NAL_LISTENER_new(2) - Functions for the NAL_LISTENER type.

NAL_SELECTOR_new(2) - Functions for the NAL_SELECTOR type.

NAL_BUFFER_new(2) - Functions for the NAL_BUFFER type.

distcache(8) - Overview of the distcache architecture.

http://www.distcache.org/ - Distcache home page.  

AUTHOR

This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at geoff@geoffthorpe.net.

Home Page: http://www.distcache.org

NAL_CONNECTION_new

NAME

NAL_CONNECTION_new, NAL_CONNECTION_free, NAL_CONNECTION_create, NAL_CONNECTION_create_pair, NAL_CONNECTION_create_dummy, NAL_CONNECTION_set_size, NAL_CONNECTION_get_read, NAL_CONNECTION_get_send, NAL_CONNECTION_io, NAL_CONNECTION_io_cap, NAL_CONNECTION_is_established, NAL_CONNECTION_add_to_selector, NAL_CONNECTION_del_from_selector - libnal connection functions  

SYNOPSIS

#include <libnal/nal.h>

#define NAL_SELECT_FLAG_READ (unsigned int)0x0001
#define NAL_SELECT_FLAG_SEND (unsigned int)0x0002
#define NAL_SELECT_FLAG_RW (NAL_SELECT_FLAG_READ | NAL_SELECT_FLAG_SEND)

NAL_CONNECTION *NAL_CONNECTION_new(void);
void NAL_CONNECTION_free(NAL_CONNECTION *conn);
void NAL_CONNECTION_reset(NAL_CONNECTION *conn);
int NAL_CONNECTION_create(NAL_CONNECTION *conn, const NAL_ADDRESS *addr);
int NAL_CONNECTION_accept(NAL_CONNECTION *conn, NAL_LISTENER *list, NAL_SELECTOR *sel);
int NAL_CONNECTION_create_pair(NAL_CONNECTION *conn1, NAL_CONNECTION *conn2, unsigned int def_buffer_size);
#if 0
int NAL_CONNECTION_create_dummy(NAL_CONNECTION *conn, unsigned int def_buffer_size);
#endif
int NAL_CONNECTION_set_size(NAL_CONNECTION *conn, unsigned int size);
NAL_BUFFER *NAL_CONNECTION_get_read(NAL_CONNECTION *conn);
NAL_BUFFER *NAL_CONNECTION_get_send(NAL_CONNECTION *conn);
const NAL_BUFFER *NAL_CONNECTION_get_read_c(const NAL_CONNECTION *conn);
const NAL_BUFFER *NAL_CONNECTION_get_send_c(const NAL_CONNECTION *conn);
int NAL_CONNECTION_io(NAL_CONNECTION *conn, NAL_SELECTOR *sel);
int NAL_CONNECTION_io_cap(NAL_CONNECTION *conn, NAL_SELECTOR *sel, unsigned int max_read, unsigned int max_send);
int NAL_CONNECTION_is_established(const NAL_CONNECTION *conn);
void NAL_CONNECTION_add_to_selector(const NAL_CONNECTION *conn, NAL_SELECTOR *sel);
void NAL_CONNECTION_add_to_selector_ex(const NAL_CONNECTION *conn, NAL_SELECTOR *sel, unsigned int flags);
void NAL_CONNECTION_del_from_selector(const NAL_CONNECTION *conn, NAL_SELECTOR *sel);

 

DESCRIPTION

NAL_CONNECTION_new() allocates and initialises a new NAL_CONNECTION object.

NAL_CONNECTION_free() destroys a NAL_CONNECTION object.

NAL_CONNECTION_reset() will, if necessary, clean up any prior state in conn so that it can be reused in NAL_CONNECTION_create(). Internally, there are other optimisations and benefits to using NAL_CONNECTION_reset() instead of NAL_CONNECTION_free() and NAL_CONNECTION_new() - the implementation can try to avoid repeated reallocation and reinitialisation of state, only doing full cleanup and reinitialisation when necessary.

NAL_CONNECTION_create() will attempt to connect to the address represented by addr. If this succeeds, it means either that the underlying connection of conn is established, or that a non-blocking connect was successfully initiated but has not yet completed (it may still be rejected by the peer eventually). Typically, unix domain sockets connect or fail immediately, and usually TCP/IPv4 connects are non-blocking, though this may not be true for some interfaces such as 'localhost'. NAL_CONNECTION_is_established() can be used to distinguish the two cases. The size of the connection's underlying read and send NAL_BUFFERs is initialised to the default that was set in addr. See the "NOTES" section for more discussion of connection semantics.

NAL_CONNECTION_accept() will not block waiting for incoming connection requests on list, but will accept any pending connection request that had already been identified by a previous call to NAL_SELECTOR_select(2) on sel. See "NOTES".

NAL_CONNECTION_create_pair() will initialise conn1 and conn2 to be end-points of a single connection. This is typically implemented using the socketpair(2) function, and is designed to allow for an IPC mechanism that integrates with libnal. def_buffer_size will control the size of the read and send buffers of both connections if the function succeeds. See the EXAMPLES section for some uses of "pairs".

NAL_CONNECTION_create_dummy() will implement a virtual FIFO that has no underlying network resource associated with it. Writing data to the connection amounts to pushing data onto the front of the FIFO, and reading data from the connection amounts to popping data off the end of the FIFO. The size of the FIFO is specified by def_buffer_size. See the "BUGS" section for a note on using these connection types with NAL_SELECTOR.

NAL_CONNECTION_set_size() will resize the read and send buffers of conn to size. The default size of those buffers is inherited from the setting created in the NAL_ADDRESS that initialised conn, or if conn was accepted from a NAL_LISTENER object, then from the address that created the listener. The individual buffers can be resized independently by using the following two functions to obtain the buffers and using NAL_BUFFER functions directly.

NAL_CONNECTION_get_read() and NAL_CONNECTION_get_send() return the read and send buffers of conn. This is how reading and writing are performed on conn, as NAL_BUFFER functions may be used on these buffers directly. NAL_CONNECTION_get_read_c() and NAL_CONNECTION_get_send_c() perform the same function but take a constant conn parameter and return constant pointers to the corresponding buffers.

NAL_CONNECTION_io() will perform any network input/output that is possible given the state in sel. Unless conn had been added to sel via NAL_CONNECTION_add_to_selector() (or its '_ex' variant) and a resulting call to NAL_SELECTOR_select() had revealed readability and/or writability on conn, this function will silently succeed. Otherwise it will attempt to perform whatever reading or writing was required. If this function fails, that indicates that the connection is no longer valid - this represents a disconnection by the peer, the result of a non-blocking connect that had been initiated but was unable to connect, or some network error that makes conn unusable. See the "NOTES" section.

NAL_CONNECTION_io_cap() is a version of NAL_CONNECTION_io() that allows the caller to specify a limit on the maximum amount conn should read from, or send to, the network. Whether this amount is read or sent (or even whether reading or sending takes place at all) depends on: the data (and space) available in the connection's buffers, what the results of the last select on sel were, and how much data the host system's networking support will accept or provide to conn.

NAL_CONNECTION_is_established() is useful for determining when a non-blocking connect has completed. See the "NOTES" section.

NAL_CONNECTION_add_to_selector() registers conn with the selector sel for any events relevant to it. NAL_CONNECTION_del_from_selector() can be used to reverse this if called before any subsequent call to NAL_SELECTOR_select(). NAL_CONNECTION_add_to_selector_ex() extends NAL_CONNECTION_add_to_selector() by allowing a bit-mask to be supplied to control what events the connection can be selected on; these flags are listed in the SYNOPSIS above with the prefix NAL_SELECT_FLAG_.

RETURN VALUES

NAL_CONNECTION_new() returns a valid NAL_CONNECTION object on success, NULL otherwise.

NAL_CONNECTION_free(), NAL_CONNECTION_reset(), NAL_CONNECTION_add_to_selector(), NAL_CONNECTION_add_to_selector_ex(), and NAL_CONNECTION_del_from_selector() have no return value.

NAL_CONNECTION_get_read(), NAL_CONNECTION_get_send(), NAL_CONNECTION_get_read_c(), and NAL_CONNECTION_get_send_c() return pointers to the connection's buffer objects or NULL for failure.

NAL_CONNECTION_accept() returns non-zero if a connection was accepted and is represented by the provided NAL_CONNECTION object, or zero if no connection attempt was pending (or if there was but an error prevented the accept operation).

All other NAL_CONNECTION functions return zero for failure or false, and non-zero for success or true.

NOTES

A NAL_CONNECTION object encapsulates two NAL_BUFFER objects and a non-blocking socket. Any data that has been read from the socket is placed in the read buffer, and applications write data into the send buffer for it to be (eventually) written out to the socket. The NAL_SELECTOR type provides the ability to poll for any requested network events and then allow connections and listeners to perform their network input/output based on the results.

NAL_CONNECTION_add_to_selector() uses the following logic: the connection is always selected for exception events, and will be selected for readability if its read buffer is not full and writability if its send buffer is not empty.

NAL_CONNECTION_io() is used after calling NAL_CONNECTION_add_to_selector() and a subsequent call to NAL_SELECTOR_select(). It observes the following logic: if an exception event has occurred it returns failure, if readability is indicated it will read incoming data up to the limit of the available space in the read buffer, and if writability is indicated it will send as much of the send buffer's data as possible. If NAL_CONNECTION_io() returns failure, the connection is considered broken for some reason and no further I/O operations should be attempted (the behaviour is undefined). NB: The connection object is not automatically cleaned up so as to allow the caller to continue reading any data in the read buffer and/or examine any unsent data in the send buffer.

The above is almost true, BTW :-) The special case is that of non-blocking connects. If NAL_CONNECTION_create() cannot immediately connect without blocking, it will return success but subsequent calls to NAL_CONNECTION_is_established() will reveal that the connection is not yet complete. Any connection that is not complete will request selection for sendability inside NAL_CONNECTION_add_to_selector(), whether the application has provided data to send or not. The completion (or failure) of the non-blocking connect will thus cause any subsequent NAL_SELECTOR_select() operation to break. As with all other semantics, it is the follow up call to NAL_CONNECTION_io() that changes the state of the connection object - if it returns failure, the non-blocking connect failed. If it returns success, you should still call NAL_CONNECTION_is_established() to determine if the connection is complete, as the selector could have broken because of signals or network events on other objects.
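
The selection rules described so far, including the non-blocking connect case, can be written down as a small pure function. This is a sketch of the documented logic only, not libnal source; struct conn_state, SEL_READ, SEL_SEND and wanted_events() are hypothetical names, and exception events are left out because they are always selected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags mirroring the READ/SEND idea from the SYNOPSIS. */
#define SEL_READ 0x0001u
#define SEL_SEND 0x0002u

/* Hypothetical stand-in for the state a connection consults; this is not
 * libnal's NAL_CONNECTION. */
struct conn_state {
    size_t read_used;   /* bytes currently held in the read buffer */
    size_t read_size;   /* capacity of the read buffer */
    size_t send_used;   /* bytes waiting in the send buffer */
    int established;    /* 0 while a non-blocking connect is pending */
};

/* Returns the events a connection would ask to be selected on. */
unsigned int wanted_events(const struct conn_state *c)
{
    unsigned int flags = 0;

    assert(c != NULL);
    if (c->read_used < c->read_size)
        flags |= SEL_READ;   /* there is room to read into */
    if (c->send_used > 0 || !c->established)
        flags |= SEL_SEND;   /* data to send, or connect still pending */
    return flags;
}
```

A connection with an empty send buffer that is still connecting therefore asks for sendability, which is what lets the selector notice the completion or failure of the connect.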

NAL_CONNECTION_accept() will return immediately, and will only succeed if the NAL_LISTENER object had already been added to the selector using NAL_LISTENER_add_to_selector(), the selector had been subsequently selected using NAL_SELECTOR_select(2), and this indicated an incoming connection request waiting on the listener.

It should be noted that the actual transport in use is virtualised to allow for multiple transports and, because of this, multiple semantics for how the network functionality behaves. TCP/IPv4 and unix domain socket based connections, as well as connection pairs from NAL_CONNECTION_create_pair(), operate very much as described here. The FIFO connection type, created by NAL_CONNECTION_create_dummy(), is not yet consistent with this and is described in the "BUGS" section.

BUGS

Dummy FIFO connections created using NAL_CONNECTION_create_dummy() should be trivially selectable if anyone's daft enough to try. I.e. if you add a dummy connection to a selector, the NAL_SELECTOR_select() should break instantly if the FIFO is non-empty, otherwise the FIFO should have no influence at all on the real select(2). Right now, NAL_CONNECTION_add_to_selector() silently ignores dummy connections completely.

EXAMPLES

A typical state-machine implementation using a single connection is illustrated here (without error-checking);

NAL_BUFFER *c_read, *c_send;
NAL_SELECTOR *sel = NAL_SELECTOR_new();
NAL_CONNECTION *conn = NAL_CONNECTION_new();
NAL_ADDRESS *addr = retrieve_the_desired_address();

/* Setup */
NAL_CONNECTION_create(conn, addr);
c_read = NAL_CONNECTION_get_read(conn);
c_send = NAL_CONNECTION_get_send(conn);

/* Loop */
do {
    /* This is where the state-machine code should process as much data as
     * possible from 'c_read' and/or produce as much output to 'c_send' as
     * it can. */
    ...
    ... user code ...
    /* block on (relevant) network events for 'conn' */
    NAL_CONNECTION_add_to_selector(conn, sel);
    NAL_SELECTOR_select(sel, 0, 0);
    /* Do network I/O after the above blocking select and continue looping
     * only if the connection is still alive. */
} while(NAL_CONNECTION_io(conn, sel));

An example of using a connection pair (with 2 Kb read and send buffers for each connection) to create IPC between a parent process and its child (again, no error checking):

NAL_CONNECTION *ipc_to_parent = NAL_CONNECTION_new();
NAL_CONNECTION *ipc_to_child = NAL_CONNECTION_new();

/* Setup */
NAL_CONNECTION_create_pair(ipc_to_parent, ipc_to_child, 2048);

/* Create child process */
switch(fork()) {
case 0:
    /* Inside the child process, close our copy of the parent's side */
    NAL_CONNECTION_free(ipc_to_child);
    /* Do child process things, and use 'ipc_to_parent' to communicate
     * with the parent. */
    do_child_logic(ipc_to_parent);
    exit(0);
default:
    /* Inside the parent process, close our copy of the child's side */
    NAL_CONNECTION_free(ipc_to_parent);
    break;
}
/* Continue in the parent process, and use 'ipc_to_child' to communicate
 * with the child. */
do_parent_logic(ipc_to_child);

Note that these connection pairs can also be a useful way of handling process termination that allows you to bypass signal handling altogether. If a child process terminates, the connection between the pair will be broken and so this will be noticed in the parent process by any selector selecting on the ipc_to_child connection - the subsequent NAL_CONNECTION_io() operation will fail indicating that the child process is dead (or in the process of dying) and so the parent could immediately call wait(2) or waitpid(2). Whether the SIGCHLD signal arrives before the NAL_CONNECTION_io() call or not is not too important, at worst it might prematurely interrupt NAL_SELECTOR_select() (causing it to return zero) so that a redundant loop of the state-machine runs before the next select operation will notice the disconnection. If you already need IPC between the parent and child for exchange of data anyway, this mechanism could be useful in avoiding global variables, signal handlers, and the associated difficulties.
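
The same idea can be tried without libnal, using socketpair(2) directly, which is how connection pairs are said to be typically implemented. The following is a minimal sketch under that assumption; detect_child_exit() is a hypothetical helper, and it blocks in read(2) where a libnal program would use a selector.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately. Returns 1 if the parent observed the
 * child's death as end-of-file on its end of the pair, 0 if data arrived
 * instead, and -1 on error. */
int detect_child_exit(void)
{
    int sv[2];
    char byte;
    ssize_t n;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);   /* the child keeps only sv[1] */
        _exit(0);       /* exiting closes sv[1] */
    }
    /* The parent keeps only sv[0]. Without this close the parent would hold
     * the peer end open itself and never see end-of-file. */
    close(sv[1]);
    n = read(sv[0], &byte, 1);   /* blocks until data or the peer closes */
    close(sv[0]);
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    if (n < 0)
        return -1;
    return n == 0 ? 1 : 0;
}
```

Closing the unused end in each process matters here just as freeing the other side's connection does in the libnal example: the disconnection is only visible once every copy of the peer end has been closed.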

SEE ALSO

NAL_CONNECTION_new(2) - Functions for the NAL_CONNECTION type.

NAL_LISTENER_new(2) - Functions for the NAL_LISTENER type.

NAL_SELECTOR_new(2) - Functions for the NAL_SELECTOR type.

NAL_BUFFER_new(2) - Functions for the NAL_BUFFER type.

distcache(8) - Overview of the distcache architecture.

http://www.distcache.org/ - Distcache home page.  

AUTHOR

This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at geoff@geoffthorpe.net.

Home Page: http://www.distcache.org

access

NAME

access - check user's permissions for a file

SYNOPSIS

#include <unistd.h>

int access(const char *pathname, int mode);

 

DESCRIPTION

access checks whether the process would be allowed to read, write or test for existence of the file (or other file system object) whose name is pathname. If pathname is a symbolic link, the permissions of the file referred to by this symbolic link are tested.

mode is a mask consisting of one or more of R_OK, W_OK, X_OK and F_OK.

R_OK, W_OK and X_OK request checking whether the file exists and has read, write and execute permissions, respectively. F_OK just requests checking for the existence of the file.

The tests depend on the permissions of the directories occurring in the path to the file, as given in pathname, and on the permissions of directories and files referred to by symbolic links encountered on the way.

The check is done with the process's real uid and gid, rather than with the effective ids as is done when actually attempting an operation. This is to allow set-UID programs to easily determine the invoking user's authority.

Only access bits are checked, not the file type or contents. Therefore, if a directory is found to be "writable," it probably means that files can be created in the directory, and not that the directory can be written as a file. Similarly, a DOS file may be found to be "executable," but the execve(2) call will still fail.

If the process has appropriate privileges, an implementation may indicate success for X_OK even if none of the execute file permission bits are set.  

RETURN VALUE

On success (all requested permissions granted), zero is returned. On error (at least one bit in mode asked for a permission that is denied, or some other error occurred), -1 is returned, and errno is set appropriately.  
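
A short sketch of how the mode mask and the return value fit together; path_exists() and can_read_write() are hypothetical helper names, and both treat every error as "no".

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 1 if path exists, 0 otherwise (including on lookup errors). */
int path_exists(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}

/* Returns 1 only if the real user may both read and write path.
 * Either permission being denied makes the whole call fail. */
int can_read_write(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK | W_OK) == 0;
}
```

A caller that needs to know why the check failed would inspect errno after access returns -1 instead of collapsing the result to 0.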

ERRORS

access shall fail if:
EACCES
The requested access would be denied to the file or search permission is denied to one of the directories in pathname.
ELOOP
Too many symbolic links were encountered in resolving pathname.
ENAMETOOLONG
pathname is too long.
ENOENT
A directory component in pathname would have been accessible but does not exist or was a dangling symbolic link.
ENOTDIR
A component used as a directory in pathname is not, in fact, a directory.
EROFS
Write permission was requested for a file on a read-only filesystem.

access may fail if:

EFAULT
pathname points outside your accessible address space.
EINVAL
mode was incorrectly specified.
EIO
An I/O error occurred.
ENOMEM
Insufficient kernel memory was available.
ETXTBSY
Write access was requested to an executable which is being executed.
 

RESTRICTIONS

access returns an error if any of the access types in the requested call fails, even if other types might be successful.

access may not work correctly on NFS file systems with UID mapping enabled, because UID mapping is done on the server and hidden from the client, which checks permissions.

Using access to check if a user is authorized to e.g. open a file before actually doing so using open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it.  
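
One common way to avoid that race is to skip the separate check, attempt the operation itself, and handle the error. A minimal sketch, with open_checked() as a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Opens path read-only and returns the descriptor, or -1 with errno set.
 * The permission check and the open happen in one step inside the kernel,
 * so there is no interval between them for anyone to exploit. */
int open_checked(const char *path)
{
    int fd;

    assert(path != NULL);
    do {
        fd = open(path, O_RDONLY);
    } while (fd < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    return fd;
}
```

Note that open(2) is checked against the effective ids, not the real ones, so a set-UID program that wants the invoking user's authority would have to arrange for that separately, for instance by dropping privileges first; that part is not shown here.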

CONFORMING TO

SVID, AT&T, POSIX, X/OPEN, BSD 4.3  

SEE ALSO

stat(2), open(2), chmod(2), chown(2), setuid(2), setgid(2)

adjtimex

NAME

adjtimex - tune kernel clock  

SYNOPSIS

#include <sys/timex.h>

int adjtimex(struct timex *buf);  

DESCRIPTION

Linux uses David L. Mills' clock adjustment algorithm (see RFC 1305). The system call adjtimex reads and optionally sets adjustment parameters for this algorithm. It takes a pointer to a timex structure, updates kernel parameters from field values, and returns the same structure with current kernel values. This structure is declared as follows:

struct timex {
    int modes;           /* mode selector */
    long offset;         /* time offset (usec) */
    long freq;           /* frequency offset (scaled ppm) */
    long maxerror;       /* maximum error (usec) */
    long esterror;       /* estimated error (usec) */
    int status;          /* clock command/status */
    long constant;       /* pll time constant */
    long precision;      /* clock precision (usec) (read only) */
    long tolerance;      /* clock frequency tolerance (ppm) (read only) */
    struct timeval time; /* current time (read only) */
    long tick;           /* usecs between clock ticks */
};

The modes field determines which parameters, if any, to set. It may contain a bitwise-or combination of zero or more of the following bits:

#define ADJ_OFFSET            0x0001 /* time offset */
#define ADJ_FREQUENCY         0x0002 /* frequency offset */
#define ADJ_MAXERROR          0x0004 /* maximum time error */
#define ADJ_ESTERROR          0x0008 /* estimated time error */
#define ADJ_STATUS            0x0010 /* clock status */
#define ADJ_TIMECONST         0x0020 /* pll time constant */
#define ADJ_TICK              0x4000 /* tick value */
#define ADJ_OFFSET_SINGLESHOT 0x8001 /* old-fashioned adjtime */

Ordinary users are restricted to a zero value for modes. Only the superuser may set any parameters.
 

RETURN VALUE

On success, adjtimex returns the clock state:

#define TIME_OK   0 /* clock synchronized */
#define TIME_INS  1 /* insert leap second */
#define TIME_DEL  2 /* delete leap second */
#define TIME_OOP  3 /* leap second in progress */
#define TIME_WAIT 4 /* leap second has occurred */
#define TIME_BAD  5 /* clock not synchronized */

On failure, adjtimex returns -1 and sets errno.  

ERRORS

EFAULT
buf does not point to writable memory.
EPERM
buf.modes is non-zero and the user is not super-user.
EINVAL
An attempt is made to set buf.offset to a value outside the range -131071 to +131071, or to set buf.status to a value other than those listed above, or to set buf.tick to a value outside the range 900000/HZ to 1100000/HZ, where HZ is the system timer interrupt frequency.
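
The numeric ranges in the EINVAL description can be checked before calling adjtimex. The sketch below simply restates those ranges in C; it is not kernel code, offset_in_range() and tick_in_range() are hypothetical helpers, and hz is assumed to be supplied by the caller.

```c
#include <assert.h>

/* Largest accepted magnitude for buf.offset, as quoted above (usec). */
#define MAX_OFFSET_USEC 131071L

/* Returns 1 if offset lies within -131071..+131071. */
int offset_in_range(long offset)
{
    return offset >= -MAX_OFFSET_USEC && offset <= MAX_OFFSET_USEC;
}

/* Returns 1 if tick lies within 900000/hz..1100000/hz (integer division).
 * hz is the timer interrupt frequency and must be positive. */
int tick_in_range(long tick, long hz)
{
    assert(hz > 0);
    return tick >= 900000L / hz && tick <= 1100000L / hz;
}
```

With hz equal to 100, for instance, the accepted tick values run from 9000 to 11000 microseconds inclusive.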
 

CONFORMING TO

adjtimex is Linux specific and should not be used in programs intended to be portable. There is a similar but less general call adjtime in SVr4.  

SEE ALSO

settimeofday(2)

alarm

NAME

alarm - set an alarm clock for delivery of a signal  

SYNOPSIS

#include <unistd.h>

unsigned int alarm(unsigned int seconds);

 

DESCRIPTION

alarm arranges for a SIGALRM signal to be delivered to the process in seconds seconds.

If seconds is zero, no new alarm is scheduled.

In any event any previously set alarm is cancelled.  

RETURN VALUE

alarm returns the number of seconds remaining until any previously scheduled alarm was due to be delivered, or zero if there was no previously scheduled alarm.  
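
The return value can be observed by arming an alarm and cancelling it straight away. A small sketch, with alarm_demo() as a hypothetical helper; the 10-second value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Arms a 10-second alarm, then cancels it before it can fire.
 * Stores both return values of alarm() so the caller can inspect them. */
void alarm_demo(unsigned int *prev_on_set, unsigned int *left_on_cancel)
{
    assert(prev_on_set != NULL && left_on_cancel != NULL);
    *prev_on_set = alarm(10);     /* time left on any earlier alarm, else 0 */
    *left_on_cancel = alarm(0);   /* cancels ours and reports its time left */
}
```

If no alarm was pending beforehand, the first value is zero and the second is the time that was left on the 10-second alarm, rounded to whole seconds by the system.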

NOTES

alarm and setitimer share the same timer; calls to one will interfere with use of the other.

sleep() may be implemented using SIGALRM; mixing calls to alarm() and sleep() is a bad idea.

Scheduling delays can, as ever, cause the execution of the process to be delayed by an arbitrary amount of time.  

CONFORMING TO

SVr4, SVID, POSIX, X/OPEN, BSD 4.3  

SEE ALSO

setitimer(2), signal(2), sigaction(2), gettimeofday(2), select(2), pause(2), sleep(3)

arch_prctl

NAME

arch_prctl - set architecture specific thread state  

SYNOPSIS

#include <asm/prctl.h>

#include <sys/prctl.h>

int arch_prctl(int code, unsigned long addr);

DESCRIPTION

The arch_prctl function sets architecture specific process or thread state. code selects a subfunction and passes argument addr to it.

Sub functions for x86-64 are:

ARCH_SET_FS
Set the 64bit base for the FS register to addr.
ARCH_GET_FS
Return the 64bit base value for the FS register of the current thread in the unsigned long pointed to by the address parameter.
ARCH_SET_GS
Set the 64bit base for the GS register to addr.
ARCH_GET_GS
Return the 64bit base value for the GS register of the current thread in the unsigned long pointed to by the address parameter.
 

NOTES

arch_prctl is only supported on Linux/x86-64 for 64bit programs currently.

The 64bit base changes when a new 32bit segment selector is loaded.

ARCH_SET_GS is disabled in some kernels.

Context switches for 64bit segment bases are rather expensive. It may be a faster alternative to set a 32bit base using a segment selector by setting up an LDT with modify_ldt(2) or using the set_thread_area(2) system call in a 2.5 kernel. arch_prctl is only needed when you want to set bases that are larger than 4GB. Memory in the first 2GB of address space can be allocated by using mmap(2) with the MAP_32BIT flag.

glibc 2.2 provides no prototype for arch_prctl. You have to declare it yourself for now. This will be fixed in future glibc versions.

FS may be already used by the threading library.  
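
Since a C library wrapper may be missing, one way to reach the call is through syscall(2). The sketch below reads the FS base with ARCH_GET_FS; get_fs_base() is a hypothetical helper, it assumes the kernel headers provide asm/prctl.h and SYS_arch_prctl, and on anything other than Linux/x86-64 it just reports failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes syscall() in glibc's <unistd.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>       /* ARCH_GET_FS */
#include <sys/syscall.h>     /* SYS_arch_prctl */
#endif

/* Stores the FS base of the calling thread in *base.
 * Returns 0 on success, -1 on failure or on unsupported platforms. */
int get_fs_base(unsigned long *base)
{
    assert(base != NULL);
#if defined(__linux__) && defined(__x86_64__)
    return (int)syscall(SYS_arch_prctl, ARCH_GET_FS, base);
#else
    (void)base;
    return -1;
#endif
}
```

Only the read-only subfunction is shown on purpose: changing the FS base from application code can break a threading library that already uses it.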

ERRORS

EINVAL
code is not a valid subcommand.
EPERM
addr is outside the process address space.
EFAULT
addr points to an unmapped address or is outside the process address space.
 

AUTHOR

Man page written by Andi Kleen.  

CONFORMANCE

arch_prctl is a Linux/x86-64 extension and should not be used in programs intended to be portable.  

SEE ALSO

mmap(2), modify_ldt(2), prctl(2), set_thread_area(2)

AMD X86-64 Programmer's manual

bind

NAME

bind - bind a name to a socket  

SYNOPSIS

#include <sys/types.h>
#include <sys/socket.h>

int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);  

DESCRIPTION

bind gives the socket sockfd the local address my_addr. my_addr is addrlen bytes long. Traditionally, this is called "assigning a name to a socket." When a socket is created with socket(2), it exists in a name space (address family) but has no name assigned.

It is normally necessary to assign a local address using bind before a SOCK_STREAM socket may receive connections (see accept(2)).

The rules used in name binding vary between address families. Consult the manual entries in Section 7 for detailed information. For AF_INET see ip(7), for AF_UNIX see unix(7), for AF_APPLETALK see ddp(7), for AF_PACKET see packet(7), for AF_X25 see x25(7) and for AF_NETLINK see netlink(7).

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  
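
As an illustration for the AF_INET case, the sketch below binds a TCP socket to the loopback address and lets the kernel choose the port. bind_loopback() is a hypothetical helper; the choice of 127.0.0.1 and port 0 is only for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a TCP socket bound to 127.0.0.1 on a kernel-chosen port.
 * Returns the descriptor and stores the port (host byte order) in *port,
 * or returns -1 on failure. */
int bind_loopback(unsigned short *port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int fd;

    assert(port != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0);   /* 0 asks the kernel to pick a free port */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) != 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(sa.sin_port);
    return fd;
}
```

The getsockname(2) call is what reveals the port that was actually assigned; a server would normally follow this with listen(2) and accept(2).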

ERRORS

EBADF
sockfd is not a valid descriptor.
EINVAL
The socket is already bound to an address. This may change in the future: see linux/unix/sock.c for details.
EACCES
The address is protected, and the user is not the super-user.
ENOTSOCK
Argument is a descriptor for a file, not a socket.

The following errors are specific to UNIX domain (AF_UNIX) sockets:

EINVAL
The addrlen is wrong, or the socket was not in the AF_UNIX family.
EROFS
The socket inode would reside on a read-only file system.
EFAULT
my_addr points outside the user's accessible address space.
ENAMETOOLONG
my_addr is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of the path prefix is not a directory.
EACCES
Search permission is denied on a component of the path prefix.
ELOOP
Too many symbolic links were encountered in resolving my_addr.
 

BUGS

The transparent proxy options are not described.  

CONFORMING TO

SVr4, 4.4BSD (the bind function first appeared in BSD 4.2). SVr4 documents additional EADDRNOTAVAIL, EADDRINUSE, and ENOSR general error conditions, and additional EIO and EISDIR Unix-domain error conditions.  

NOTE

The third argument of bind is in reality an int (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. See also accept(2).  

SEE ALSO

accept(2), connect(2), listen(2), socket(2), getsockname(2), ip(7), socket(7)

brk

NAME

brk, sbrk - change data segment size  

SYNOPSIS

#include <unistd.h>

int brk(void *end_data_segment);

void *sbrk(intptr_t increment);  

DESCRIPTION

brk sets the end of the data segment to the value specified by end_data_segment, when that value is reasonable, the system does have enough memory and the process does not exceed its max data size (see setrlimit(2)).

sbrk increments the program's data space by increment bytes. sbrk isn't a system call; it is just a C library wrapper. Calling sbrk with an increment of 0 can be used to find the current location of the program break.
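
Both uses of sbrk can be sketched briefly. current_break() and grow_data_segment() are hypothetical helpers; the feature-test macro at the top is an assumption about glibc, where sbrk is not declared in strict standard modes.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc's <unistd.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the current program break, or (void *)-1 on failure. */
void *current_break(void)
{
    return sbrk(0);
}

/* Grows the data segment by nbytes and returns the start of the new area,
 * or NULL on failure. */
void *grow_data_segment(intptr_t nbytes)
{
    void *start;

    assert(nbytes >= 0);
    start = sbrk(nbytes);
    return start == (void *)-1 ? NULL : start;
}
```

Programs that also use malloc(3) should be careful with this, since the allocator may move the break itself; the sketch is meant for experimentation rather than as an allocation strategy.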

RETURN VALUE

On success, brk returns zero, and sbrk returns a pointer to the start of the new area. On error, -1 is returned, and errno is set to ENOMEM.  

CONFORMING TO

BSD 4.3

brk and sbrk are not defined in the C Standard and are deliberately excluded from the POSIX.1 standard (see paragraphs B.1.1.1.3 and B.8.3.3).  

NOTES

Various systems use various types for the parameter of sbrk(). Common are int, ssize_t, ptrdiff_t, intptr_t. XPGv6 obsoletes this function.  

SEE ALSO

execve(2), getrlimit(2), malloc(3)

chdir

NAME

chdir, fchdir - change working directory  

SYNOPSIS

#include <unistd.h>

int chdir(const char *path);
int fchdir(int fd);  

DESCRIPTION

chdir changes the current directory to that specified in path.

fchdir is identical to chdir, except that the directory is given as an open file descriptor.  
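
A common way to combine the two calls is to keep a descriptor for the current directory, move elsewhere with chdir, and come back with fchdir. The sketch below shows that pattern; the helper name visit_and_return is hypothetical and the work done in the target directory is left as a comment.

```c
#define _DEFAULT_SOURCE            /* ask glibc to declare fchdir() */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: change to `path`, then return to the directory
 * that was current on entry. Returns 0 on success, -1 on error.
 */
int visit_and_return(const char *path)
{
    int saved, rc;

    saved = open(".", O_RDONLY);   /* descriptor for the current directory */
    if (saved == -1)
        return -1;

    if (chdir(path) == -1) {       /* e.g. ENOENT, ENOTDIR, EACCES */
        close(saved);
        return -1;
    }

    /* ... work relative to `path` would go here ... */

    rc = fchdir(saved);            /* go back, even if "." was renamed */
    close(saved);
    return rc;
}
```

Using a descriptor rather than a saved path name means the return trip still works if a component of the original path is renamed in the meantime.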

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors for chdir are listed below:
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of path is not a directory.
EACCES
Search permission is denied on a component of path.
ELOOP
Too many symbolic links were encountered in resolving path.
EIO
An I/O error occurred.

The general errors for fchdir are listed below:

EBADF
fd is not a valid file descriptor.
EACCES
Search permission was denied on the directory open on fd.
 

NOTES

The prototype for fchdir is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).  

CONFORMING TO

The chdir call is compatible with SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents additional EINTR, ENOLINK, and EMULTIHOP error conditions but has no ENOMEM. POSIX.1 does not have ENOMEM or ELOOP error conditions. X/OPEN does not have EFAULT, ENOMEM or EIO error conditions.

The fchdir call is compatible with SVr4, 4.4BSD and X/OPEN. SVr4 documents additional EIO, EINTR, and ENOLINK error conditions. X/OPEN documents additional EINTR and EIO error conditions.  

SEE ALSO

getcwd(3), chroot(2)

chown

NAME

chown, fchown, lchown - change ownership of a file  

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>

int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);  

DESCRIPTION

The owner of the file specified by path or by fd is changed. Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. The super-user may change the group arbitrarily.

If the owner or group is specified as -1, then that ID is not changed.

When the owner or group of an executable file is changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown; the Linux behaviour depends on the kernel version. In the case of a non-group-executable file (with the S_IXGRP bit clear), the S_ISGID bit indicates mandatory locking, and is not cleared by a chown.

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors for chown are listed below:

EPERM
The effective UID does not match the owner of the file, and is not zero; or the owner or group were specified incorrectly.
EROFS
The named file resides on a read-only file system.
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of the path prefix is not a directory.
EACCES
Search permission is denied on a component of the path prefix.
ELOOP
Too many symbolic links were encountered in resolving path.

The general errors for fchown are listed below:

EBADF
The descriptor is not valid.
ENOENT
See above.
EPERM
See above.
EROFS
See above.
EIO
A low-level I/O error occurred while modifying the inode.
 

NOTES

In versions of Linux prior to 2.1.81 (and distinct from 2.1.46), chown did not follow symbolic links. Since Linux 2.1.81, chown does follow symbolic links, and there is a new system call lchown that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old chown) has got the same syscall number, and chown got the newly introduced number.

The prototype for fchown is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).  

CONFORMING TO

The chown call conforms to SVr4, SVID, POSIX, X/OPEN. The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files). SVr4 documents EINVAL, EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document ENOMEM or ELOOP error conditions.

The fchown call conforms to 4.4BSD and SVr4. SVr4 documents additional EINVAL, EIO, EINTR, and ENOLINK error conditions.  

RESTRICTIONS

The chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because chown() may cause immediate access revocation on already open files. Client-side caching may lead to a delay between the time when ownership has been changed to allow access for a user and the time when the file can actually be accessed by the user on other clients.  

SEE ALSO

chmod(2), flock(2)

clone

NAME

clone - create a child process  

SYNOPSIS

#include <sched.h>

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg);

_syscall2(int, clone, int, flags, void *, child_stack)

 

DESCRIPTION

clone creates a new process, just like fork(2). clone is a library function layered on top of the underlying clone system call, hereinafter referred to as sys_clone. A description of sys_clone is given towards the end of this page.

Unlike fork(2), these calls allow the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers. (Note that on this manual page, "calling process" normally corresponds to "parent process". But see the description of CLONE_PARENT below.)

The main use of clone is to implement threads: multiple threads of control in a program that run concurrently in a shared memory space.

When the child process is created with clone, it executes the function application fn(arg). (This differs from fork(2), where execution continues in the child from the point of the fork(2) call.) The fn argument is a pointer to a function that is called by the child process at the beginning of its execution. The arg argument is passed to the fn function.

When the fn(arg) function application returns, the child process terminates. The integer returned by fn is the exit code for the child process. The child process may also terminate explicitly by calling exit(2) or after receiving a fatal signal.

The child_stack argument specifies the location of the stack used by the child process. Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process. The calling process must therefore set up memory space for the child stack and pass a pointer to this space to clone. Stacks grow downwards on all processors that run Linux (except the HP PA processors), so child_stack usually points to the topmost address of the memory space set up for the child stack.

The low byte of flags contains the number of the signal sent to the parent when the child dies. If this signal is specified as anything other than SIGCHLD, then the parent process must specify the __WALL or __WCLONE options when waiting for the child with wait(2). If no signal is specified, then the parent process is not signaled when the child terminates.
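
The pieces described so far (fn, arg, child_stack, and the termination signal in the low byte of flags) can be put together in a minimal sketch. It assumes the glibc wrapper, which requires _GNU_SOURCE; the names child_fn, run_child and CHILD_STACK_SIZE are hypothetical, and the 64 KiB stack size is an arbitrary choice for illustration.

```c
#define _GNU_SOURCE                /* glibc declares clone() only with this */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

/* Runs in the child; its return value becomes the child's exit code. */
static int child_fn(void *arg)
{
    return *(int *)arg;
}

/*
 * Hypothetical helper: start a child with clone(), wait for it, and
 * return its exit code (or -1 on failure).
 */
int run_child(int exit_code)
{
    char *stack;
    pid_t pid;
    int status;

    stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Stacks grow downwards, so pass the topmost address of the area.
       SIGCHLD in the low byte of flags lets plain waitpid() work. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE, SIGCHLD, &exit_code);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        free(stack);
        return -1;
    }
    free(stack);
    return WEXITSTATUS(status);
}
```

Because no sharing flags are set here, the child works on a copy of the parent's memory, much as after fork(2).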

flags may also be bitwise-or'ed with one or several of the following constants, in order to specify what is shared between the calling process and the child process:

CLONE_PARENT
(Linux 2.4 onwards) If CLONE_PARENT is set, then the parent of the new child (as returned by getppid(2)) will be the same as that of the calling process.

If CLONE_PARENT is not set, then (as with fork(2)) the child's parent is the calling process.

Note that it is the parent process, as returned by getppid(2), which is signaled when the child terminates, so that if CLONE_PARENT is set, then the parent of the calling process, rather than the calling process itself, will be signaled.

CLONE_FS
If CLONE_FS is set, the caller and the child processes share the same file system information. This includes the root of the file system, the current working directory, and the umask. Any call to chroot(2), chdir(2), or umask(2) performed by the calling process or the child process also takes effect in the other process.

If CLONE_FS is not set, the child process works on a copy of the file system information of the calling process at the time of the clone call. Calls to chroot(2), chdir(2), umask(2) performed later by one of the processes do not affect the other process.

CLONE_FILES
If CLONE_FILES is set, the calling process and the child processes share the same file descriptor table. File descriptors always refer to the same files in the calling process and in the child process. Any file descriptor created by the calling process or by the child process is also valid in the other process. Similarly, if one of the processes closes a file descriptor, or changes its associated flags, the other process is also affected.

If CLONE_FILES is not set, the child process inherits a copy of all file descriptors opened in the calling process at the time of clone. Operations on file descriptors performed later by either the calling process or the child process do not affect the other process.

CLONE_NEWNS
(Linux 2.4.19 onwards) Start the child in a new namespace.

Every process lives in a namespace. The namespace of a process is the data (the set of mounts) describing the file hierarchy as seen by that process. After a fork(2) or clone(2) where the CLONE_NEWNS flag is not set, the child lives in the same namespace as the parent. The system calls mount(2) and umount(2) change the namespace of the calling process, and hence affect all processes that live in the same namespace, but do not affect processes in a different namespace.

After a clone(2) where the CLONE_NEWNS flag is set, the cloned child is started in a new namespace, initialized with a copy of the namespace of the parent.

Only a privileged process may specify the CLONE_NEWNS flag. It is not permitted to specify both CLONE_NEWNS and CLONE_FS in the same clone call.

CLONE_SIGHAND
If CLONE_SIGHAND is set, the calling process and the child processes share the same table of signal handlers. If the calling process or child process calls sigaction(2) to change the behavior associated with a signal, the behavior is changed in the other process as well. However, the calling process and child processes still have distinct signal masks and sets of pending signals. So, one of them may block or unblock some signals using sigprocmask(2) without affecting the other process.

If CLONE_SIGHAND is not set, the child process inherits a copy of the signal handlers of the calling process at the time clone is called. Calls to sigaction(2) performed later by one of the processes have no effect on the other process.

CLONE_PTRACE
If CLONE_PTRACE is specified, and the calling process is being traced, then trace the child also (see ptrace(2)).

CLONE_VFORK
If CLONE_VFORK is set, the execution of the calling process is suspended until the child releases its virtual memory resources via a call to execve(2) or _exit(2) (as with vfork(2)).

If CLONE_VFORK is not set then both the calling process and the child are schedulable after the call, and an application should not rely on execution occurring in any particular order.

CLONE_VM
If CLONE_VM is set, the calling process and the child processes run in the same memory space. In particular, memory writes performed by the calling process or by the child process are also visible in the other process. Moreover, any memory mapping or unmapping performed with mmap(2) or munmap(2) by the child or calling process also affects the other process.

If CLONE_VM is not set, the child process runs in a separate copy of the memory space of the calling process at the time of clone. Memory writes or file mappings/unmappings performed by one of the processes do not affect the other, as with fork(2).

CLONE_PID
(Obsolete) If CLONE_PID is set, the child process is created with the same process ID as the calling process. This is good for hacking the system, but otherwise of not much use. Since 2.3.21 this flag can be specified only by the system boot process (PID 0). It disappeared in Linux 2.5.16.

CLONE_THREAD
(Linux 2.4 onwards) If CLONE_THREAD is set, the child is placed in the same thread group as the calling process.

If CLONE_THREAD is not set, then the child is placed in its own (new) thread group, whose ID is the same as the process ID.

(Thread groups are a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads sharing a single PID. In Linux 2.4, calls to getpid(2) return the thread group ID of the caller.)
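
The effect of a sharing flag can be seen with CLONE_VM. In the sketch below the child stores a value through a pointer supplied by the caller; with CLONE_VM the store is visible to the caller, and without it the child only modifies its own copy. The names set_value and child_write_visible, and the stack size, are hypothetical; the glibc wrapper and _GNU_SOURCE are assumed.

```c
#define _GNU_SOURCE                /* glibc declares clone() only with this */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (64 * 1024)    /* arbitrary size for this sketch */

/* Runs in the child: store 42 through the pointer it was given. */
static int set_value(void *arg)
{
    *(int *)arg = 42;
    return 0;
}

/*
 * Hypothetical helper: returns the value the caller sees after the
 * child has finished, or -1 on failure.
 */
int child_write_visible(int share_vm)
{
    int value = 0;
    int flags = SIGCHLD | (share_vm ? CLONE_VM : 0);
    char *stack;
    pid_t pid;

    stack = malloc(DEMO_STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid = clone(set_value, stack + DEMO_STACK_SIZE, flags, &value);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return value;                  /* 42 if memory was shared, else 0 */
}
```

The child runs on its own stack in both cases; only the visibility of the store differs.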

 

sys_clone

The sys_clone system call corresponds more closely to fork(2) in that execution in the child continues from the point of the call. Thus, sys_clone only requires the flags and child_stack arguments, which have the same meaning as for clone. (Note that the order of these arguments differs from clone.)

Another difference for sys_clone is that the child_stack argument may be zero, in which case copy-on-write semantics ensure that the child gets separate copies of stack pages when either process modifies the stack. In this case, for correct operation, the CLONE_VM option should not be specified.

 

RETURN VALUE

On success, the PID of the child process is returned in the caller's thread of execution. On failure, -1 is returned in the caller's context, no child process is created, and errno is set appropriately.

 

ERRORS

EAGAIN
Too many processes are already running.
ENOMEM
Cannot allocate sufficient memory to allocate a task structure for the child, or to copy those parts of the caller's context that need to be copied.
EINVAL
Returned by clone when a zero value is specified for child_stack.
EINVAL
Both CLONE_FS and CLONE_NEWNS were specified in flags.
EINVAL
CLONE_THREAD was specified, but CLONE_SIGHAND was not. (Since Linux 2.5.35.)
EINVAL
Precisely one of CLONE_DETACHED and CLONE_THREAD was specified. (Since Linux 2.6.0-test6.)
EINVAL
CLONE_SIGHAND was specified, but CLONE_VM was not. (Since Linux 2.6.0-test6.)
EPERM
CLONE_NEWNS was specified by a non-root process (process without CAP_SYS_ADMIN).
EPERM
CLONE_PID was specified by a process other than process 0.

 

BUGS

There is no entry for clone in libc version 5. libc 6 (a.k.a. glibc 2) provides clone as described in this manual page.

 

NOTES

For kernel versions 2.4.7-2.4.18 the CLONE_THREAD flag implied the CLONE_PARENT flag.

 

CONFORMING TO

The clone and sys_clone calls are Linux-specific and should not be used in programs intended to be portable. For programming threaded applications (multiple threads of control in the same memory space), it is better to use a library implementing the POSIX 1003.1c thread API, such as the LinuxThreads library (included in glibc2). See pthread_create(3).

 

SEE ALSO

fork(2), wait(2), pthread_create(3)

connect

NAME

connect - initiate a connection on a socket  

SYNOPSIS

#include <sys/types.h>
#include <sys/socket.h>

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);  

DESCRIPTION

The file descriptor sockfd must refer to a socket. If the socket is of type SOCK_DGRAM then the serv_addr address is the address to which datagrams are sent by default, and the only address from which datagrams are received. If the socket is of type SOCK_STREAM or SOCK_SEQPACKET, this call attempts to make a connection to another socket. The other socket is specified by serv_addr, which is an address (of length addrlen) in the communications space of the socket. Each communications space interprets the serv_addr parameter in its own way.

Generally, connection-based protocol sockets may successfully connect only once; connectionless protocol sockets may use connect multiple times to change their association. Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC.  

RETURN VALUE

If the connection or binding succeeds, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

The following are general socket errors only. There may be other domain-specific error codes.
EBADF
The file descriptor is not a valid index in the descriptor table.
EFAULT
The socket structure address is outside the user's address space.
ENOTSOCK
The file descriptor is not associated with a socket.
EISCONN
The socket is already connected.
ECONNREFUSED
No one listening on the remote address.
ETIMEDOUT
Timeout while attempting connection. The server may be too busy to accept new connections. Note that for IP sockets the timeout may be very long when syncookies are enabled on the server.
ENETUNREACH
Network is unreachable.
EADDRINUSE
Local address is already in use.
EINPROGRESS
The socket is non-blocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).
EALREADY
The socket is non-blocking and a previous connection attempt has not yet been completed.
EAGAIN
No more free local ports or insufficient entries in the routing cache. For PF_INET see the net.ipv4.ip_local_port_range sysctl in ip(7) on how to increase the number of local ports.
EAFNOSUPPORT
The passed address didn't have the correct address family in its sa_family field.
EACCES, EPERM
The user tried to connect to a broadcast address without having the socket broadcast flag enabled or the connection request failed because of a local firewall rule.
 

CONFORMING TO

SVr4, 4.4BSD (the connect function first appeared in BSD 4.2). SVr4 documents the additional general error codes EADDRNOTAVAIL, EINVAL, EAFNOSUPPORT, EALREADY, EINTR, EPROTOTYPE, and ENOSR. It also documents many additional error conditions not described here.  

NOTE

The third argument of connect is in reality an int (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t. See also accept(2).  

BUGS

Unconnecting a socket by calling connect with an AF_UNSPEC address is not yet implemented.  

SEE ALSO

accept(2), bind(2), listen(2), socket(2), getsockname(2)

create_module

NAME

create_module - create a loadable module entry  

SYNOPSIS

#include <linux/module.h>

caddr_t create_module(const char *name, size_t size);

 

DESCRIPTION

create_module attempts to create a loadable module entry and reserve the kernel memory that will be needed to hold the module. This system call is only open to the superuser.  

RETURN VALUE

On success, returns the kernel address at which the module will reside. On error -1 is returned and errno is set appropriately.  

ERRORS

EPERM
The user is not the superuser.
EEXIST
A module by that name already exists.
EINVAL
The requested size is too small even for the module header information.
ENOMEM
The kernel could not allocate a contiguous block of memory large enough for the module.
EFAULT
name is outside the program's accessible address space.
 

SEE ALSO

init_module(2), delete_module(2), query_module(2).

DC_PLUG_new

NAME

DC_PLUG_new, DC_PLUG_free, DC_PLUG_to_select, DC_PLUG_io - basic DC_PLUG functions  

SYNOPSIS

#include <distcache/dc_plug.h>

DC_PLUG *DC_PLUG_new(NAL_CONNECTION *conn, unsigned int flags);
int DC_PLUG_free(DC_PLUG *plug);
void DC_PLUG_to_select(DC_PLUG *plug, NAL_SELECTOR *sel);
int DC_PLUG_io(DC_PLUG *plug, NAL_SELECTOR *sel);

 

DESCRIPTION

DC_PLUG_new() allocates and initialises a DC_PLUG structure encapsulating the specified connection. The flags parameter is zero or a bitmask combining one or more of the following flags:

#define DC_PLUG_FLAG_TO_SERVER   (unsigned int)0x0001
#define DC_PLUG_FLAG_NOFREE_CONN (unsigned int)0x0002

If the DC_PLUG_FLAG_TO_SERVER flag is specified, the plug object will expect to be sending "request" messages and receiving "response" messages; otherwise it will default to the opposite sense.

DC_PLUG_free() frees the DC_PLUG structure and, unless it had been created with the DC_PLUG_FLAG_NOFREE_CONN flag, will also destroy the connection object it encapsulates.

DC_PLUG_to_select() is used to add a plug object to the sel selector so that it can be tested for network events it is waiting on. This will automatically handle selection of flags depending on the plug object's state. That is, it will select for writability on its underlying connection only if there is data waiting to be sent, and likewise will select for readability only if it is ready to receive any data that may have arrived.

DC_PLUG_io() is used to allow network I/O to be performed on a plug object's underlying connection depending on the results of the last select operation on sel.  

RETURN VALUES

DC_PLUG_new() returns the new plug object on success, otherwise NULL for failure.

DC_PLUG_free() should never fail and should only return non-zero results.

DC_PLUG_to_select() has no return value.

DC_PLUG_io() returns zero on an error, otherwise non-zero.

None of the DC_PLUG functions sets (or clears) errno, because the API is implemented on top of the libnal library, which in turn is an abstraction layer for the system's networking interfaces. As such, any errno codes set by failure in system libraries will not be overwritten by these functions.  

SEE ALSO

DC_PLUG_read(2) - Provides documentation for other DC_PLUG functions also.

distcache(8) - Overview of the distcache architecture.

http://www.distcache.org/ - Distcache home page.  

AUTHOR

This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at geoff@geoffthorpe.net.

Home Page: http://www.distcache.org

DC_SERVER_new

NAME

DC_SERVER_set_default_cache, DC_SERVER_set_cache, DC_SERVER_new, DC_SERVER_free, DC_SERVER_items_stored, DC_SERVER_reset_operations, DC_SERVER_num_operations, DC_SERVER_new_client, DC_SERVER_del_client, DC_SERVER_process_client, DC_SERVER_clients_to_sel, DC_SERVER_clients_io - distcache server API  

SYNOPSIS

#include <distcache/dc_server.h>

DC_SERVER *DC_SERVER_new(unsigned int max_sessions);
void DC_SERVER_free(DC_SERVER *ctx);
int DC_SERVER_set_default_cache(void);
int DC_SERVER_set_cache(const DC_CACHE_cb *impl);
unsigned int DC_SERVER_items_stored(DC_SERVER *ctx, const struct timeval *now);
void DC_SERVER_reset_operations(DC_SERVER *ctx);
unsigned long DC_SERVER_num_operations(DC_SERVER *ctx);
DC_CLIENT *DC_SERVER_new_client(DC_SERVER *ctx, NAL_CONNECTION *conn, unsigned int flags);
int DC_SERVER_del_client(DC_CLIENT *clnt);
int DC_SERVER_process_client(DC_CLIENT *clnt, const struct timeval *now);
int DC_SERVER_clients_to_sel(DC_SERVER *ctx, NAL_SELECTOR *sel);
int DC_SERVER_clients_io(DC_SERVER *ctx, NAL_SELECTOR *sel, const struct timeval *now);

 

RETURN VALUES

DC_SERVER_new() returns an initialised DC_SERVER object, or NULL for failure.

DC_SERVER_free() and DC_SERVER_reset_operations() have no return value.

DC_SERVER_items_stored() returns the number of cached sessions in a cache (after any session expiry is performed).

DC_SERVER_num_operations() indicates how many operations the cache object has performed.

DC_SERVER_new_client() returns a new DC_CLIENT object, or NULL for failure.

The remaining functions return non-zero for success or zero for failure.  

DESCRIPTION and NOTES

Use of the dc_server.h header requires the "struct timeval" type to be defined. On many systems, this will require that you include the time.h header in advance, though details will vary from system to system. If in doubt, try consulting your system's gettimeofday(2) man page for information on how to have this system type defined.

These DC_SERVER functions facilitate the implementation of a session cache server that is compatible with the distcache protocol. The source code to dc_server(1) provides an example of using this API, and is probably the ideal reference (a single C file of 304 lines). The storage of the cache is provided by a table of handler functions defined by the DC_CACHE_cb structure:

typedef struct st_DC_CACHE_cb {
    DC_CACHE *(*cache_new)(unsigned int max_sessions);
    void (*cache_free)(DC_CACHE *cache);
    int (*cache_add)(DC_CACHE *cache, const struct timeval *now,
                     unsigned long timeout_msecs,
                     const unsigned char *session_id, unsigned int session_id_len,
                     const unsigned char *data, unsigned int data_len);
    unsigned int (*cache_get)(DC_CACHE *cache, const struct timeval *now,
                     const unsigned char *session_id, unsigned int session_id_len,
                     unsigned char *store, unsigned int store_size);
    int (*cache_remove)(DC_CACHE *cache, const struct timeval *now,
                     const unsigned char *session_id, unsigned int session_id_len);
    int (*cache_have)(DC_CACHE *cache, const struct timeval *now,
                     const unsigned char *session_id, unsigned int session_id_len);
    unsigned int (*cache_num_items)(DC_CACHE *cache, const struct timeval *now);
} DC_CACHE_cb;

libdistcacheserver provides a default implementation that can be enabled by calling DC_SERVER_set_default_cache() prior to DC_SERVER_new(). Alternatively, a customised cache implementation can be specified by DC_SERVER_set_cache(). The reason that one or the other must be specified is so that custom implementations will not need to have the default implementation linked in because they won't explicitly call DC_SERVER_set_default_cache().

The choice of DC_CACHE_cb implementation will control all manipulations and queries on the session cache. Each handler is passed a struct timeval value to allow it to implicitly handle expiry of old sessions without having to repeatedly query the time on each invocation.

Outside the actual cache implementation, the other subject covered by libdistcacheserver is that of managing client connections and processing their requests. It is assumed that the caller will use libnal to handle the network aspects of the cache server; otherwise the application would do better to use the lower-level DC_PLUG API (see DC_PLUG_new(2)), and the implementation of libdistcacheserver would provide a good reference for this.

New clients of the cache server are created by DC_SERVER_new_client() using the supplied connection object conn. The behaviour of the returned DC_CLIENT object depends on the flags parameter, which is zero or a bitwise combination of the following values:

#define DC_CLIENT_FLAG_NOFREE_CONN (unsigned int)0x0001
#define DC_CLIENT_FLAG_IN_SERVER   (unsigned int)0x0002

If DC_CLIENT_FLAG_NOFREE_CONN is set, then conn will not be destroyed when the DC_CLIENT object is destroyed by DC_SERVER_del_client(). Note that the DC_CLIENT object encapsulates the provided conn object and does not copy it.

If DC_CLIENT_FLAG_IN_SERVER is set, then network traffic and request processing for the client will be implicit in the DC_SERVER_clients_to_sel() and DC_SERVER_clients_io() functions. This includes destroying any clients that have disconnected at the network level or had corruption errors at the data level.

If DC_CLIENT_FLAG_IN_SERVER is not set, then selecting and performing network I/O should be handled by the caller directly using the original conn object, and checking for (and processing of) requests should be handled directly by DC_SERVER_process_client(). A zero return value from this function indicates an error in the client's processing, and would then require the caller to destroy the client object via DC_SERVER_del_client(). This allows network handling and logical cache handling to be explicitly separated by the implementation if required.

Note that the dc_server(1) implementation is greatly simplified by using DC_CLIENT_FLAG_IN_SERVER and not setting DC_CLIENT_FLAG_NOFREE_CONN. This allows it to forget about NAL_CONNECTION objects after they have been successfully converted into DC_CLIENT objects, and in fact can forget about the resulting DC_CLIENT objects too as they become completely controlled by the DC_SERVER object. If the client is closed, the underlying connection object is destroyed also. If the cache server itself is destroyed, then any remaining clients will likewise be properly cleaned up.

DC_SERVER_clients_to_sel() and DC_SERVER_clients_io() only operate on cache clients that are created with the DC_CLIENT_FLAG_IN_SERVER flag.  

SEE ALSO

DC_PLUG_new(2), DC_PLUG_read(2) - Lower-level asynchronous implementation of the distcache protocol, useful for client and server operation.

dc_server(1) - Runs a cache server listening on a configurable network address.

distcache(8) - Overview of the distcache architecture.

http://www.distcache.org/ - Distcache home page.  


dup

NAME

dup, dup2 - duplicate a file descriptor  

SYNOPSIS

#include <unistd.h>

int dup(int oldfd);
int dup2(int oldfd, int newfd);

 

DESCRIPTION

dup and dup2 create a copy of the file descriptor oldfd.

After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.

The two descriptors do not share the close-on-exec flag, however.

dup uses the lowest-numbered unused descriptor for the new descriptor.

dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.  
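
The shared file position described above can be checked with a short sketch. The helper names offset_seen_after_dup_seek and dup_offset_demo are hypothetical, and the demo assumes tmpfile() can create a temporary file on the system at hand.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: seek through a dup()ed copy of fd and return the
 * position then reported by the original descriptor ((off_t)-1 on error). */
off_t offset_seen_after_dup_seek(int fd, off_t target)
{
    assert(fd >= 0);

    int copy = dup(fd);               /* lowest-numbered unused descriptor */
    if (copy == -1)
        return (off_t)-1;

    off_t moved = lseek(copy, target, SEEK_SET);
    close(copy);                      /* closing the copy leaves fd open */
    if (moved == (off_t)-1)
        return (off_t)-1;

    /* Both descriptors share one file position, so fd reports target. */
    return lseek(fd, 0, SEEK_CUR);
}

/* Demo on an anonymous temporary file; returns 0 if the offset is shared. */
int dup_offset_demo(void)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;

    off_t seen = offset_seen_after_dup_seek(fileno(tmp), 42);
    fclose(tmp);
    return seen == 42 ? 0 : -1;
}
```

Calling dup_offset_demo() from a test program should return 0 on a system where the temporary file can be created.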

RETURN VALUE

dup and dup2 return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).  

ERRORS

EBADF
oldfd isn't an open file descriptor, or newfd is out of the allowed range for file descriptors.
EMFILE
The process already has the maximum number of file descriptors open and tried to open a new one.
EINTR
The dup2 call was interrupted by a signal.
EBUSY
(Linux only) This may be returned by dup2 during a race condition with open() and dup().
 

WARNING

The error returned by dup2 is different from that returned by fcntl(..., F_DUPFD, ...) when newfd is out of range. On some systems dup2 also sometimes returns EINVAL like F_DUPFD.  

BUGS

If newfd was open, any errors that would have been reported at close() time are lost. A careful programmer will not use dup2 without closing newfd first.

CONFORMING TO

SVr4, SVID, POSIX, X/OPEN, BSD 4.3. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX.1 adds EINTR. The EBUSY return is Linux-specific.  

SEE ALSO

fcntl(2), open(2), close(2)

epoll_create

NAME

epoll_create - open an epoll file descriptor  

SYNOPSIS

#include <sys/epoll.h>

int epoll_create(int size);

DESCRIPTION

Open an epoll file descriptor by requesting the kernel allocate an event backing store dimensioned for size descriptors. The size is not the maximum size of the backing store but just a hint to the kernel about how to dimension internal structures. The returned file descriptor will be used for all the subsequent calls to the epoll interface. The file descriptor returned by epoll_create(2) must be closed by using close(2).  

RETURN VALUE

When successful, epoll_create(2) returns a positive integer identifying the descriptor. When an error occurs, epoll_create(2) returns -1 and errno is set appropriately.  

ERRORS

ENOMEM
There was insufficient memory to create the kernel object.
 

CONFORMING TO

epoll_create(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.  

SEE ALSO

close(2), epoll_ctl(2), epoll_wait(2), epoll(4)

epoll_wait

NAME

epoll_wait - wait for an I/O event on an epoll file descriptor  

SYNOPSIS

#include <sys/epoll.h>

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

DESCRIPTION

Wait for events on the epoll file descriptor epfd for a maximum time of timeout milliseconds. The memory area pointed to by events will contain the events that are available to the caller. Up to maxevents events are returned by epoll_wait(2). The maxevents parameter must be greater than zero. Specifying a timeout of -1 makes epoll_wait(2) wait indefinitely, while specifying a timeout of zero makes epoll_wait(2) return immediately even if no events are available (return code equal to zero). The struct epoll_event is defined as:

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;   /* Epoll events */
    epoll_data_t data;   /* User data variable */
};

The data member of each returned structure will contain the same data the user set with an epoll_ctl(2) call (EPOLL_CTL_ADD, EPOLL_CTL_MOD), while the events member will contain the returned event bit field.
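
As a rough illustration of the calls described above, the following sketch registers the read end of a pipe and waits on it. The function name epoll_pipe_demo and its parameters are hypothetical; the sketch assumes a Linux system where epoll_create(2), epoll_ctl(2) and epoll_wait(2) are available.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: watch the read end of a pipe with epoll. If write_first
 * is non-zero, one byte is written so the pipe becomes readable. Returns what
 * epoll_wait() reports: the number of ready descriptors, 0 on timeout, or -1
 * on error. */
int epoll_pipe_demo(int write_first, int timeout_ms)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int ready = -1;
    int epfd = epoll_create(8);           /* size is only a hint */
    if (epfd != -1) {
        struct epoll_event ev = { .events = EPOLLIN };
        ev.data.fd = pipefd[0];           /* handed back unchanged by epoll_wait */

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            (!write_first || write(pipefd[1], "x", 1) == 1)) {
            struct epoll_event out[4];
            ready = epoll_wait(epfd, out, 4, timeout_ms);
            if (ready == 1)
                assert(out[0].data.fd == pipefd[0] &&
                       (out[0].events & EPOLLIN));
        }
        close(epfd);                      /* epoll descriptors need close(2) */
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return ready;
}
```

With data pending, epoll_pipe_demo(1, 1000) should report one ready descriptor; with an empty pipe and a zero timeout, epoll_pipe_demo(0, 0) should return zero immediately.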

RETURN VALUE

When successful, epoll_wait(2) returns the number of file descriptors ready for the requested I/O, or zero if no file descriptor became ready during the requested timeout milliseconds. When an error occurs, epoll_wait(2) returns -1 and errno is set appropriately.  

ERRORS

EBADF
epfd is not a valid file descriptor.
EINVAL
The supplied file descriptor, epfd, is not an epoll file descriptor, or the maxevents parameter is less than or equal to zero.
EFAULT
The memory area pointed to by events is not accessible with write permissions.
 

CONFORMING TO

epoll_wait(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.  

SEE ALSO

epoll_ctl(2), epoll_create(2), epoll(4)

exit

NAME

_exit, _Exit - terminate the current process  

SYNOPSIS

#include <unistd.h>

void _exit(int status);

#include <stdlib.h>

void _Exit(int status);  

DESCRIPTION

The function _exit terminates the calling process "immediately". Any open file descriptors belonging to the process are closed; any children of the process are inherited by process 1, init, and the process's parent is sent a SIGCHLD signal.

The value status is returned to the parent process as the process's exit status, and can be collected using one of the wait family of calls.

The function _Exit is equivalent to _exit.  
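
A minimal sketch of how the exit status travels from child to parent follows. The helper name collect_child_status is hypothetical, and the sketch assumes the caller has not arranged for SIGCHLD to be ignored.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates with _exit(status) and
 * return the exit status collected by the parent, or -1 on error. */
int collect_child_status(int status)
{
    assert(status >= 0 && status <= 255);  /* only the low 8 bits reach the parent */

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(status);     /* child: terminates without running atexit functions */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

For example, collect_child_status(7) should return 7 in the parent.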

RETURN VALUE

These functions do not return.  

CONFORMING TO

SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The function _Exit() was introduced by C99.  

NOTES

For a discussion on the effects of an exit, the transmission of exit status, zombie processes, signals sent, etc., see exit(3).

The function _exit is like exit(), but does not call any functions registered with the ANSI C atexit function, nor any registered signal handlers. Whether it flushes standard I/O buffers and removes temporary files created with tmpfile(3) is implementation-dependent. On the other hand, _exit does close open file descriptors, and this may cause an unknown delay, waiting for pending output to finish. If the delay is undesired, it may be useful to call functions like tcflush() before calling _exit(). Whether any pending I/O is cancelled, and which pending I/O may be cancelled upon _exit(), is implementation-dependent.  

SEE ALSO

fork(2), execve(2), waitpid(2), wait4(2), kill(2), wait(2), exit(3), termios(3)

fchmod

NAME

chmod, fchmod - change permissions of a file  

SYNOPSIS

#include <sys/types.h>
#include <sys/stat.h>

int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);  

DESCRIPTION

The mode of the file given by path or referenced by fildes is changed.

Modes are specified by ORing the following:

S_ISUID
04000 set user ID on execution
S_ISGID
02000 set group ID on execution
S_ISVTX
01000 sticky bit
S_IRUSR (S_IREAD)
00400 read by owner
S_IWUSR (S_IWRITE)
00200 write by owner
S_IXUSR (S_IEXEC)
00100 execute/search by owner
S_IRGRP
00040 read by group
S_IWGRP
00020 write by group
S_IXGRP
00010 execute/search by group
S_IROTH
00004 read by others
S_IWOTH
00002 write by others
S_IXOTH
00001 execute/search by others
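
To show how the mode bits above combine, here is a small sketch using fchmod and fstat(2). The helper names fchmod_and_read_back and fchmod_demo are hypothetical, and the demo assumes /tmp is a writable local directory.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply mode to fd with fchmod() and read the bits back
 * with fstat(). Returns the resulting bits, or (mode_t)-1 on error. */
mode_t fchmod_and_read_back(int fd, mode_t mode)
{
    assert(fd >= 0);

    struct stat st;
    if (fchmod(fd, mode) == -1 || fstat(fd, &st) == -1)
        return (mode_t)-1;
    return st.st_mode & 07777;    /* permission, set-ID and sticky bits */
}

/* Demo on a temporary file; returns 0 if the requested mode was applied. */
int fchmod_demo(void)
{
    char path[] = "/tmp/fchmod_demo_XXXXXX";    /* assumed writable location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    mode_t want = S_IRUSR | S_IWUSR | S_IRGRP;  /* 00400 | 00200 | 00040 = 0640 */
    mode_t got = fchmod_and_read_back(fd, want);

    close(fd);
    unlink(path);
    return (got == want && got == 0640) ? 0 : -1;
}
```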

The effective UID of the process must be zero or must match the owner of the file.

If the effective UID of the process is not zero and the group of the file does not match the effective group ID of the process or one of its supplementary group IDs, the S_ISGID bit will be turned off, but this will not cause an error to be returned.

Depending on the file system, set user ID and set group ID execution bits may be turned off if a file is written. On some file systems, only the super-user can set the sticky bit, which may have a special meaning. For the sticky bit, and for set user ID and set group ID bits on directories, see stat(2).

On NFS file systems, restricting the permissions will immediately influence already open files, because the access control is done on the server, but open files are maintained by the client. Widening the permissions may be delayed for other clients if attribute caching is enabled on them.  

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

Depending on the file system, other errors can be returned. The more general errors for chmod are listed below:

EPERM
The effective UID does not match the owner of the file, and is not zero.
EROFS
The named file resides on a read-only file system.
EFAULT
path points outside your accessible address space.
ENAMETOOLONG
path is too long.
ENOENT
The file does not exist.
ENOMEM
Insufficient kernel memory was available.
ENOTDIR
A component of the path prefix is not a directory.
EACCES
Search permission is denied on a component of the path prefix.
ELOOP
Too many symbolic links were encountered in resolving path.
EIO
An I/O error occurred.

The general errors for fchmod are listed below:

EBADF
The file descriptor fildes is not valid.
EROFS
See above.
EPERM
See above.
EIO
See above.
 

CONFORMING TO

The chmod call conforms to SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document EFAULT, ENOMEM, ELOOP or EIO error conditions, or the macros S_IREAD, S_IWRITE and S_IEXEC.

The fchmod call conforms to 4.4BSD and SVr4. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX requires the fchmod function if at least one of _POSIX_MAPPED_FILES and _POSIX_SHARED_MEMORY_OBJECTS is defined, and documents additional ENOSYS and EINVAL error conditions, but does not document EIO.

POSIX and X/OPEN do not document the sticky bit.  

SEE ALSO

open(2), chown(2), execve(2), stat(2)

fcntl

NAME

fcntl - manipulate file descriptor  

SYNOPSIS

#include <unistd.h>
#include <fcntl.h>

int fcntl(int fd, int cmd);
int fcntl(int fd, int cmd, long arg);
int fcntl(int fd, int cmd, struct flock *lock);

 

DESCRIPTION

fcntl performs one of various miscellaneous operations on fd. The operation in question is determined by cmd.  

Handling close-on-exec

F_DUPFD
Find the lowest numbered available file descriptor greater than or equal to arg and make it be a copy of fd. This is different from dup2(2), which uses exactly the descriptor specified.

The old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.

The two descriptors do not share the close-on-exec flag, however. The close-on-exec flag of the copy is off, meaning that it will not be closed on exec.

On success, the new descriptor is returned.

F_GETFD
Read the close-on-exec flag. If the FD_CLOEXEC bit is 0, the file will remain open across exec, otherwise it will be closed.
F_SETFD
Set the close-on-exec flag to the value specified by the FD_CLOEXEC bit of arg.
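
The three commands above can be exercised together in a short sketch. The helper names set_cloexec and dupfd_cloexec_demo are hypothetical, and the lower bound of 10 is an arbitrary choice that assumes the process may open at least eleven descriptors.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on close-on-exec while preserving any other
 * descriptor flags. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Demo: returns 0 if F_DUPFD honours the lower bound and the copy starts with
 * close-on-exec off even though the original descriptor has it on. */
int dupfd_cloexec_demo(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int ok = -1;
    if (set_cloexec(pipefd[0]) == 0) {
        int copy = fcntl(pipefd[0], F_DUPFD, 10);   /* lowest free fd >= 10 */
        if (copy != -1) {
            int orig_flags = fcntl(pipefd[0], F_GETFD);
            int copy_flags = fcntl(copy, F_GETFD);
            if (copy >= 10 && orig_flags != -1 && copy_flags != -1 &&
                (orig_flags & FD_CLOEXEC) && !(copy_flags & FD_CLOEXEC))
                ok = 0;
            close(copy);
        }
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return ok;
}
```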
 

The file status flags

A file descriptor has certain associated flags, initialized by open(2) and possibly modified by fcntl(2). The flags are shared between copies (made with dup(2), fork(2), etc.) of the same file descriptor.

The flags and their semantics are described in open(2).

F_GETFL
Read the file descriptor's flags.
F_SETFL
Set the file status flags part of the descriptor's flags to the value specified by arg. Remaining bits (access mode, file creation flags) in arg are ignored. On Linux this command can only change the O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags.
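
A common use of F_GETFL and F_SETFL is switching a descriptor to non-blocking mode. The sketch below is one way to do it; the names set_nonblocking and nonblocking_demo are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags, leaving the
 * other flags as they were. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Demo: a read() on an empty non-blocking pipe fails with EAGAIN instead of
 * blocking. Returns 0 when that behaviour is observed. */
int nonblocking_demo(void)
{
    int pipefd[2];
    char byte;
    if (pipe(pipefd) == -1)
        return -1;

    int ok = -1;
    if (set_nonblocking(pipefd[0]) == 0) {
        ssize_t n = read(pipefd[0], &byte, 1);
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            ok = 0;
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return ok;
}
```

Reading the flags first and ORing in the new bit avoids accidentally clearing flags such as O_APPEND that were set at open time.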
 

Advisory locking

F_GETLK, F_SETLK and F_SETLKW are used to acquire, release, and test for the existence of record locks (also known as file-segment or file-region locks). The third argument lock is a pointer to a structure that has at least the following fields (in unspecified order).

struct flock {
    ...
    short l_type;    /* Type of lock: F_RDLCK, F_WRLCK, F_UNLCK */
    short l_whence;  /* How to interpret l_start: SEEK_SET, SEEK_CUR, SEEK_END */
    off_t l_start;   /* Starting offset for lock */
    off_t l_len;     /* Number of bytes to lock */
    pid_t l_pid;     /* PID of process blocking our lock (F_GETLK only) */
    ...
};

The l_whence, l_start, and l_len fields of this structure specify the range of bytes we wish to lock. l_start is the starting offset for the lock, and is interpreted relative to either: the start of the file (if l_whence is SEEK_SET); the current file offset (if l_whence is SEEK_CUR); or the end of the file (if l_whence is SEEK_END). In the final two cases, l_start can be a negative number provided the offset does not lie before the start of the file. l_len is a non-negative integer (but see the NOTES below) specifying the number of bytes to be locked. Bytes past the end of the file may be locked, but not bytes before the start of the file. Specifying 0 for l_len has the special meaning: lock all bytes starting at the location specified by l_whence and l_start through to the end of file, no matter how large the file grows.

The l_type field can be used to place a read (F_RDLCK) or a write (F_WRLCK) lock on a file. Any number of processes may hold a read lock (shared lock) on a file region, but only one process may hold a write lock (exclusive lock). An exclusive lock excludes all other locks, both shared and exclusive. A single process can hold only one type of lock on a file region; if a new lock is applied to an already-locked region, then the existing lock is converted to the new lock type. (Such conversions may involve splitting, shrinking, or coalescing with an existing lock if the byte range specified by the new lock does not precisely coincide with the range of the existing lock.)
F_SETLK
Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a lock (when l_type is F_UNLCK) on the bytes specified by the l_whence, l_start, and l_len fields of lock. If a conflicting lock is held by another process, this call returns -1 and sets errno to EACCES or EAGAIN.
F_SETLKW
As for F_SETLK, but if a conflicting lock is held on the file, then wait for that lock to be released. If a signal is caught while waiting, then the call is interrupted and (after the signal handler has returned) returns immediately (with return value -1 and errno set to EINTR).
F_GETLK
On input to this call, lock describes a lock we would like to place on the file. If the lock could be placed, fcntl() does not actually place it, but returns F_UNLCK in the l_type field of lock and leaves the other fields of the structure unchanged. If one or more incompatible locks would prevent this lock being placed, then fcntl() returns details about one of these locks in the l_type, l_whence, l_start, and l_len fields of lock and sets l_pid to be the PID of the process holding that lock.

In order to place a read lock, fd must be open for reading. In order to place a write lock, fd must be open for writing. To place both types of lock, open a file read-write.

As well as being removed by an explicit F_UNLCK, record locks are automatically released when the process terminates or if it closes any file descriptor referring to a file on which locks are held. This is bad: it means that a process can lose the locks on a file like /etc/passwd or /etc/mtab when for some reason a library function decides to open, read and close it.

Record locks are not inherited by a child created via fork(2), but are preserved across an execve(2).

Because of the buffering performed by the stdio(3) library, the use of record locking with routines in that package should be avoided; use read(2) and write(2) instead.
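
The interplay of F_SETLK and F_GETLK can be sketched as follows. The names lock_whole_file, write_lock_blocker and record_lock_demo are hypothetical, and the demo assumes /tmp is a writable local file system (locking over NFS may behave differently). Because a forked child does not inherit record locks, it can be used to observe the parent's lock as a conflict.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: place (F_RDLCK, F_WRLCK) or release (F_UNLCK) a lock
 * covering the whole file, without blocking. Returns fcntl()'s result. */
int lock_whole_file(int fd, short type)
{
    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);

    struct flock fl = { 0 };
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0: through end of file, however large */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: ask whether a whole-file write lock could be placed.
 * Returns the PID holding a conflicting lock, 0 if none, -1 on error. */
pid_t write_lock_blocker(int fd)
{
    struct flock fl = { 0 };
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/* Demo: the parent takes a write lock, then a forked child queries it.
 * Returns 0 if the child saw the parent as the lock holder. */
int record_lock_demo(void)
{
    char path[] = "/tmp/record_lock_demo_XXXXXX";  /* assumed writable */
    int fd = mkstemp(path);                         /* opened read-write */
    if (fd == -1)
        return -1;

    int ok = -1;
    if (lock_whole_file(fd, F_WRLCK) == 0) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(write_lock_blocker(fd) == getppid() ? 0 : 1);

        int wstatus;
        if (pid > 0 && waitpid(pid, &wstatus, 0) == pid &&
            WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0)
            ok = 0;
        lock_whole_file(fd, F_UNLCK);
    }
    close(fd);
    unlink(path);
    return ok;
}
```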
 

Mandatory locking

(Non-POSIX.) The above record locks may be either advisory or mandatory, and are advisory by default. To make use of mandatory locks, mandatory locking must be enabled (using the "-o mand" option to mount(8)) for the file system containing the file to be locked and enabled on the file itself (by disabling group execute permission on the file and enabling the set-GID permission bit).

Advisory locks are not enforced and are useful only between cooperating processes. Mandatory locks are enforced for all processes.  

Managing signals

F_GETOWN, F_SETOWN, F_GETSIG and F_SETSIG are used to manage I/O availability signals:
F_GETOWN
Get the process ID or process group currently receiving SIGIO and SIGURG signals for events on file descriptor fd. Process groups are returned as negative values.
F_SETOWN
Set the process ID or process group that will receive SIGIO and SIGURG signals for events on file descriptor fd. Process groups are specified using negative values. (F_SETSIG can be used to specify a different signal instead of SIGIO).

If you set the O_ASYNC status flag on a file descriptor (either by providing this flag with the open(2) call, or by using the F_SETFL command of fcntl), a SIGIO signal is sent whenever input or output becomes possible on that file descriptor.

The process or process group to receive the signal can be selected by using the F_SETOWN command to the fcntl function. If the file descriptor is a socket, this also selects the recipient of SIGURG signals that are delivered when out-of-band data arrives on that socket. (SIGURG is sent in any situation where select(2) would report the socket as having an "exceptional condition".) If the file descriptor corresponds to a terminal device, then SIGIO signals are sent to the foreground process group of the terminal.

F_GETSIG
Get the signal sent when input or output becomes possible. A value of zero means SIGIO is sent. Any other value (including SIGIO) is the signal sent instead, and in this case additional info is available to the signal handler if installed with SA_SIGINFO.
F_SETSIG
Sets the signal sent when input or output becomes possible. A value of zero means to send the default SIGIO signal. Any other value (including SIGIO) is the signal to send instead, and in this case additional info is available to the signal handler if installed with SA_SIGINFO.

By using F_SETSIG with a non-zero value, and setting SA_SIGINFO for the signal handler (see sigaction(2)), extra information about I/O events is passed to the handler in a siginfo_t structure. If the si_code field indicates the source is SI_SIGIO, the si_fd field gives the file descriptor associated with the event. Otherwise, there is no indication which file descriptors are pending, and you should use the usual mechanisms (select(2), poll(2), read(2) with O_NONBLOCK set etc.) to determine which file descriptors are available for I/O.

By selecting a POSIX.1b real time signal (value >= SIGRTMIN), multiple I/O events may be queued using the same signal numbers. (Queuing is dependent on available memory). Extra information is available if SA_SIGINFO is set for the signal handler, as above.

Using these mechanisms, a program can implement fully asynchronous I/O without using select(2) or poll(2) most of the time.

The use of O_ASYNC, F_GETOWN, F_SETOWN is specific to BSD and Linux. F_GETSIG and F_SETSIG are Linux-specific. POSIX has asynchronous I/O and the aio_sigevent structure to achieve similar things; these are also available in Linux as part of the GNU C Library (Glibc).  

Leases

F_SETLEASE and F_GETLEASE (Linux 2.4 onwards) are used (respectively) to establish and retrieve the current setting of the calling process's lease on the file referred to by fd. A file lease provides a mechanism whereby the process holding the lease (the "lease holder") is notified (via delivery of a signal) when another process (the "lease breaker") tries to open(2) or truncate(2) that file.
F_SETLEASE
Set or remove a file lease according to which of the following values is specified in the integer arg:

F_RDLCK
Take out a read lease. This will cause us to be notified when another process opens the file for writing or truncates it.
F_WRLCK
Take out a write lease. This will cause us to be notified when another process opens the file (for reading or writing) or truncates it. A write lease may be placed on a file only if no other process currently has the file open.
F_UNLCK
Remove our lease from the file.
A process may hold only one type of lease on a file. Leases may only be taken out on regular files. An unprivileged process may only take out a lease on a file whose UID matches the file system UID of the process.
F_GETLEASE
Indicates what type of lease we hold on the file referred to by fd by returning either F_RDLCK, F_WRLCK, or F_UNLCK, indicating, respectively, that the calling process holds a read, a write, or no lease on the file. (The third argument to fcntl() is omitted.)

When a process (the "lease breaker") performs an open() or truncate() that conflicts with a lease established via F_SETLEASE, the system call is blocked by the kernel, unless the O_NONBLOCK flag was specified to open(), in which case the system call will return with the error EWOULDBLOCK. The kernel notifies the lease holder by sending it a signal (SIGIO by default). The lease holder should respond to receipt of this signal by doing whatever cleanup is required in preparation for the file to be accessed by another process (e.g., flushing cached buffers) and then either remove or downgrade its lease. A lease is removed by performing an F_SETLEASE command specifying arg as F_UNLCK. If we currently hold a write lease on the file, and the lease breaker is opening the file for reading, then it is sufficient to downgrade the lease to a read lease. This is done by performing an F_SETLEASE command specifying arg as F_RDLCK.

If the lease holder fails to downgrade or remove the lease within the number of seconds specified in /proc/sys/fs/lease-break-time then the kernel forcibly removes or downgrades the lease holder's lease.

Once the lease has been voluntarily or forcibly removed or downgraded, and assuming the lease breaker has not unblocked its system call, the kernel permits the lease breaker's system call to proceed.

The default signal used to notify the lease holder is SIGIO, but this can be changed using the F_SETSIG command to fcntl(). If an F_SETSIG command is performed (even one specifying SIGIO), and the signal handler is established using SA_SIGINFO, then the handler will receive a siginfo_t structure as its second argument, and the si_fd field of this argument will hold the descriptor of the leased file that has been accessed by another process. (This is useful if the caller holds leases against multiple files.)

File and directory change notification

F_NOTIFY
(Linux 2.4 onwards) Provide notification when the directory referred to by fd or any of the files that it contains is changed. The events to be notified are specified in arg, which is a bit mask specified by ORing together zero or more of the following bits:

Bit
Description (event in directory)
DN_ACCESS
A file was accessed (read, pread, readv)
DN_MODIFY
A file was modified (write, pwrite, writev, truncate, ftruncate)
DN_CREATE
A file was created (open, creat, mknod, mkdir, link, symlink, rename)
DN_DELETE
A file was unlinked (unlink, rename to another directory, rmdir)
DN_RENAME
A file was renamed within this directory (rename)
DN_ATTRIB
The attributes of a file were changed (chown, chmod, utime[s])

(In order to obtain these definitions, the _GNU_SOURCE macro must be defined before including <fcntl.h>.)

Directory notifications are normally "one-shot", and the application must re-register to receive further notifications. Alternatively, if DN_MULTISHOT is included in arg, then notification will remain in effect until explicitly removed.

A series of F_NOTIFY requests is cumulative, with the events in arg being added to the set already monitored. To disable notification of all events, make an F_NOTIFY call specifying arg as 0.

Notification occurs via delivery of a signal. The default signal is SIGIO, but this can be changed using the F_SETSIG command to fcntl(). In the latter case, the signal handler receives a siginfo_t structure as its second argument (if the handler was established using SA_SIGINFO) and the si_fd field of this structure contains the file descriptor which generated the notification (useful when establishing notification on multiple directories).

Especially when using DN_MULTISHOT, a POSIX.1b real time signal should be used for notification, so that multiple notifications can be queued.

 

RETURN VALUE

For a successful call, the return value depends on the operation:
F_DUPFD
The new descriptor.
F_GETFD
Value of flag.
F_GETFL
Value of flags.
F_GETOWN
Value of descriptor owner.
F_GETSIG
Value of signal sent when read or write becomes possible, or zero for traditional SIGIO behaviour.
All other commands
Zero.

On error, -1 is returned, and errno is set appropriately.  

ERRORS

EACCES or EAGAIN
Operation is prohibited by locks held by other processes. Or, operation is prohibited because the file has been memory-mapped by another process.
EBADF
fd is not an open file descriptor, or the command was F_SETLK or F_SETLKW and the file descriptor open mode doesn't match the type of lock requested.
EDEADLK
It was detected that the specified F_SETLKW command would cause a deadlock.
EFAULT
lock is outside your accessible address space.
EINTR
For F_SETLKW, the command was interrupted by a signal. For F_GETLK and F_SETLK, the command was interrupted by a signal before the lock was checked or acquired. Most likely when locking a remote file (e.g. locking over NFS), but can sometimes happen locally.
EINVAL
For F_DUPFD, arg is negative or is greater than the maximum allowable value. For F_SETSIG, arg is not an allowable signal number.
EMFILE
For F_DUPFD, the process already has the maximum number of file descriptors open.
ENOLCK
Too many segment locks open, lock table is full, or a remote locking protocol failed (e.g. locking over NFS).
EPERM
Attempted to clear the O_APPEND flag on a file that has the append-only attribute set.
 

NOTES

The errors returned by dup2 are different from those returned by F_DUPFD.

Since kernel 2.0, there is no interaction between the types of lock placed by flock(2) and fcntl(2).

POSIX 1003.1-2001 allows l_len to be negative. (And if it is, the interval described by the lock covers bytes l_start+l_len up to and including l_start-1.) This is supported by Linux since Linux 2.4.21 and 2.5.49.

Several systems have more fields in struct flock, such as l_sysid. Clearly, l_pid alone is not going to be very useful if the process holding the lock may live on a different machine.

 

CONFORMING TO

SVr4, SVID, POSIX, X/OPEN, BSD 4.3. Only the operations F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL, F_GETLK, F_SETLK and F_SETLKW are specified in POSIX.1. F_GETOWN and F_SETOWN are BSDisms not supported in SVr4; F_GETSIG and F_SETSIG are specific to Linux. F_NOTIFY, F_GETLEASE, and F_SETLEASE are Linux specific. (Define the _GNU_SOURCE macro before including <fcntl.h> to obtain these definitions.) The flags legal for F_GETFL/F_SETFL are those supported by open(2) and vary between these systems; O_APPEND, O_NONBLOCK, O_RDONLY, and O_RDWR are specified in POSIX.1. SVr4 supports several other options and flags not documented here.

SVr4 documents additional EIO, ENOLINK and EOVERFLOW error conditions.  

SEE ALSO

dup2(2), flock(2), lockf(3), open(2), socket(2). See also locks.txt, mandatory.txt, and dnotify.txt in /usr/src/linux/Documentation.

fgetxattr

NAME

getxattr, lgetxattr, fgetxattr - retrieve an extended attribute value  

SYNOPSIS

#include <sys/types.h>
#include <attr/xattr.h>

ssize_t getxattr (const char *path, const char *name, void *value, size_t size);
ssize_t lgetxattr (const char *path, const char *name, void *value, size_t size);
ssize_t fgetxattr (int filedes, const char *name, void *value, size_t size);

 

DESCRIPTION

Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).

getxattr retrieves the value of the extended attribute identified by name and associated with the given path in the filesystem. The length of the attribute value is returned.

lgetxattr is identical to getxattr, except in the case of a symbolic link, where the link itself is interrogated, not the file that it refers to.

fgetxattr is identical to getxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.

An extended attribute name is a simple null-terminated string. The name includes a namespace prefix; there may be several disjoint namespaces associated with an individual inode. The value of an extended attribute is a chunk of arbitrary textual or binary data of specified length.

An empty buffer of size zero can be passed into these calls to return the current size of the named extended attribute, which can be used to estimate the size of a buffer which is sufficiently large to hold the value associated with the extended attribute.

The interface is designed to allow guessing of initial buffer sizes, and to enlarge buffers when the return value indicates that the buffer provided was too small.  
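
The probe-then-read pattern described above might look like the following sketch. The helper names read_xattr and xattr_demo and the attribute name user.no_such_attribute are hypothetical. The sketch also assumes the header <sys/xattr.h> shipped with current GNU C Library versions, rather than the <attr/xattr.h> header named in the SYNOPSIS, and a writable /tmp directory.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* assumption: glibc header, not <attr/xattr.h> */
#include <unistd.h>

/* Hypothetical helper: read an extended attribute into a malloc()ed buffer.
 * On success stores the buffer in *out (caller frees) and returns its length.
 * On failure returns -1 with errno describing the problem. */
ssize_t read_xattr(const char *path, const char *name, char **out)
{
    assert(path != NULL && name != NULL && out != NULL);

    for (;;) {
        ssize_t need = getxattr(path, name, NULL, 0);    /* size probe */
        if (need == -1)
            return -1;

        char *buf = malloc(need > 0 ? (size_t)need : 1);
        if (buf == NULL)
            return -1;

        ssize_t got = getxattr(path, name, buf, (size_t)need);
        if (got >= 0) {
            *out = buf;
            return got;
        }

        int saved = errno;
        free(buf);
        errno = saved;
        if (saved != ERANGE)   /* ERANGE: value grew between the calls, retry */
            return -1;
    }
}

/* Demo: a freshly created file has no such attribute, so the lookup fails.
 * Returns 0 when the failure is reported as expected. */
int xattr_demo(void)
{
    char path[] = "/tmp/xattr_demo_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    close(fd);

    char *value = NULL;
    ssize_t n = read_xattr(path, "user.no_such_attribute", &value);
    unlink(path);
    return (n == -1 && value == NULL) ? 0 : -1;
}
```

The retry loop covers the case where the attribute grows between the size probe and the actual read.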

RETURN VALUE

On success, a positive number is returned indicating the size of the extended attribute value. On failure, -1 is returned and errno is set appropriately.

If the named attribute does not exist, or the process has no access to this attribute, errno is set to ENOATTR.

If the size of the value buffer is too small to hold the result, errno is set to ERANGE.

If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.

The errors documented for the stat(2) system call are also applicable here.  

AUTHORS

Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.  

SEE ALSO

getfattr(1), setfattr(1), open(2), stat(2), setxattr(2), listxattr(2), removexattr(2), and attr(5).

flock

NAME

flock - apply or remove an advisory lock on an open file  

SYNOPSIS

#include <sys/file.h>

int flock(int fd, int operation);  

DESCRIPTION

Apply or remove an advisory lock on the open file specified by fd. The parameter operation is one of the following:

LOCK_SH
Place a shared lock. More than one process may hold a shared lock for a given file at a given time.
LOCK_EX
Place an exclusive lock. Only one process may hold an exclusive lock for a given file at a given time.
LOCK_UN
Remove an existing lock held by this process.

A call to flock() may block if an incompatible lock is held by another process. To make a non-blocking request, include LOCK_NB (by ORing) with any of the above operations.

A single file may not simultaneously have both shared and exclusive locks.

Locks created by flock() are associated with a file, or, more precisely, an open file table entry. This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these descriptors. Furthermore, the lock is released either by an explicit LOCK_UN operation on any of these duplicate descriptors, or when all such descriptors have been closed.

A process may only hold one type of lock (shared or exclusive) on a file. Subsequent flock() calls on an already locked file will convert an existing lock to the new lock mode.

Locks created by flock() are preserved across an execve(2).

A shared or exclusive lock can be placed on a file regardless of the mode in which the file was opened.  
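
The association between flock() locks and open file table entries can be illustrated with a sketch like the one below. The names try_exclusive_lock and flock_demo are hypothetical, and the demo assumes /tmp is a writable local file system on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: try to take an exclusive lock without blocking.
 * Returns 0 if acquired, 1 if an incompatible lock is held through another
 * open file table entry, and -1 on any other error. */
int try_exclusive_lock(int fd)
{
    assert(fd >= 0);

    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return 0;
    return errno == EWOULDBLOCK ? 1 : -1;
}

/* Demo: returns 0 if a second open() of the file conflicts with the lock,
 * while a dup()ed descriptor shares it and can release it. */
int flock_demo(void)
{
    char path[] = "/tmp/flock_demo_XXXXXX";   /* assumed writable location */
    int first = mkstemp(path);
    if (first == -1)
        return -1;

    int second = open(path, O_RDONLY);   /* separate open file table entry */
    int copy = dup(first);               /* same entry as first */

    int ok = -1;
    if (second != -1 && copy != -1 &&
        try_exclusive_lock(first) == 0 &&
        try_exclusive_lock(second) == 1 &&   /* conflicts with first */
        try_exclusive_lock(copy) == 0 &&     /* same lock, no conflict */
        flock(copy, LOCK_UN) == 0 &&         /* released via the duplicate */
        try_exclusive_lock(second) == 0)     /* now free for the other entry */
        ok = 0;

    if (copy != -1)
        close(copy);
    if (second != -1)
        close(second);
    close(first);
    unlink(path);
    return ok;
}
```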

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EWOULDBLOCK
The file is locked and the LOCK_NB flag was selected.
EBADF
fd is not an open file descriptor.
EINTR
While waiting to acquire a lock, the call was interrupted by delivery of a signal caught by a handler.
EINVAL
operation is invalid.
ENOLCK
The kernel ran out of memory for allocating lock records.
 

CONFORMING TO

4.4BSD (the flock(2) call first appeared in 4.2BSD). A version of flock(2), possibly implemented in terms of fcntl(2), appears on most Unices.  

NOTES

flock(2) does not lock files over NFS. Use fcntl(2) instead: that does work over NFS, given a sufficiently recent version of Linux and a server which supports locking.

Since kernel 2.0, flock(2) is implemented as a system call in its own right rather than being emulated in the GNU C library as a call to fcntl(2). This yields true BSD semantics: there is no interaction between the types of lock placed by flock(2) and fcntl(2), and flock(2) does not detect deadlock.

flock(2) places advisory locks only; given suitable permissions on a file, a process is free to ignore the use of flock(2) and perform I/O on the file.

flock(2) and fcntl(2) locks have different semantics with respect to forked processes and dup(2).  

SEE ALSO

open(2), close(2), dup(2), execve(2), fcntl(2), fork(2), lockf(3)

There are also locks.txt and mandatory.txt in /usr/src/linux/Documentation.

free_hugepages

NAME

alloc_hugepages, free_hugepages - allocate or free huge pages  

SYNOPSIS

void *alloc_hugepages(int key, void *addr, size_t len, int prot, int flag);

int free_hugepages(void *addr);  

DESCRIPTION

The system calls alloc_hugepages and free_hugepages were introduced in Linux 2.5.36 and removed again in 2.5.54. They existed only on i386 and ia64 (when built with CONFIG_HUGETLB_PAGE). In Linux 2.4.20 the syscall numbers exist, but the calls return ENOSYS.

On i386 the memory management hardware knows about ordinary pages (4 KiB) and huge pages (2 or 4 MiB). Similarly ia64 knows about huge pages of several sizes. These system calls serve to map huge pages into the process's memory or to free them again. Huge pages are locked into memory, and are not swapped.

The key parameter is an identifier. When zero the pages are private, and not inherited by children. When positive the pages are shared with other applications using the same key, and inherited by child processes.

The addr parameter of free_hugepages() tells which page is being freed - it was the return value of a call to alloc_hugepages(). (The memory is first actually freed when all users have released it.) The addr parameter of alloc_hugepages() is a hint, that the kernel may or may not follow. Addresses must be properly aligned.

The len parameter is the length of the required segment. It must be a multiple of the huge page size.

The prot parameter specifies the memory protection of the segment. It is one of PROT_READ, PROT_WRITE, PROT_EXEC.

The flag parameter is ignored, unless key is positive. In that case, if flag is IPC_CREAT, then a new huge page segment is created when none with the given key existed. If this flag is not set, then ENOENT is returned when no segment with the given key exists.

RETURN VALUE

On success, alloc_hugepages returns the allocated virtual address, and free_hugepages returns zero. On error, -1 is returned, and errno is set appropriately.

ERRORS

ENOSYS
The system call is not supported on this kernel.
 

CONFORMING TO

These calls existed only in Linux 2.5.36 - 2.5.54. These calls are specific to Linux on Intel processors, and should not be used in programs intended to be portable. Indeed, the system call numbers are marked for reuse, so programs using these may do something random on a future kernel.  

FILES

/proc/sys/vm/nr_hugepages Number of configured hugetlb pages. This can be read and written.

/proc/meminfo Gives info on the number of configured hugetlb pages and on their size in the three variables HugePages_Total, HugePages_Free, Hugepagesize.  

NOTES

The system calls are gone. Now the hugetlbfs filesystem can be used instead. Memory backed by huge pages (if the CPU supports them) is obtained by mmap'ing files in this virtual filesystem.

The maximal number of huge pages can be specified using the hugepages= boot parameter.

fsetxattr

NAME

setxattr, lsetxattr, fsetxattr - set an extended attribute value  

SYNOPSIS

#include <sys/types.h>
#include <attr/xattr.h>

int setxattr (const char *path, const char *name, const void *value, size_t size, int flags);
int lsetxattr (const char *path, const char *name, const void *value, size_t size, int flags);
int fsetxattr (int filedes, const char *name, const void *value, size_t size, int flags);

 

DESCRIPTION

Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).

setxattr sets the value of the extended attribute identified by name and associated with the given path in the filesystem. The size of the value must be specified.

lsetxattr is identical to setxattr, except in the case of a symbolic link, where the extended attribute is set on the link itself, not the file that it refers to.

fsetxattr is identical to setxattr, only the extended attribute is set on the open file pointed to by filedes (as returned by open(2)) in place of path.

An extended attribute name is a simple null-terminated string. The name includes a namespace prefix; there may be several disjoint namespaces associated with an individual inode. The value of an extended attribute is a chunk of arbitrary textual or binary data of specified length.

The flags parameter can be used to refine the semantics of the operation. XATTR_CREATE specifies a pure create, which fails if the named attribute exists already. XATTR_REPLACE specifies a pure replace operation, which fails if the named attribute does not already exist. By default (no flags), the extended attribute will be created if need be, or will simply replace the value if the attribute exists.  

RETURN VALUE

On success, zero is returned. On failure, -1 is returned and errno is set appropriately.

If XATTR_CREATE is specified, and the attribute exists already, errno is set to EEXIST. If XATTR_REPLACE is specified, and the attribute does not exist, errno is set to ENOATTR.

If there is insufficient space remaining to store the extended attribute, errno is set to either ENOSPC, or EDQUOT if quota enforcement was the cause.

If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.

The errors documented for the stat(2) system call are also applicable here.  
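The XATTR_CREATE semantics can be sketched as below. This is only an illustration: the helper name set_attr_if_absent is hypothetical, and the sketch includes <sys/xattr.h> (the header shipped with current glibc) rather than the <attr/xattr.h> named in the SYNOPSIS, which is an assumption about the build environment.

```c
#include <errno.h>
#include <string.h>
#include <sys/xattr.h>   /* assumption: glibc's header, not <attr/xattr.h> */

/* Hypothetical helper: set a textual attribute only if it does not exist yet.
 * Returns 0 if the attribute was created, 1 if it already existed (and was
 * left untouched), and -1 on any other error, with errno set by setxattr(). */
int set_attr_if_absent(const char *path, const char *name, const char *text)
{
    /* XATTR_CREATE makes the call a pure create: it fails with EEXIST
     * rather than replacing an existing value. */
    if (setxattr(path, name, text, strlen(text), XATTR_CREATE) == 0)
        return 0;
    if (errno == EEXIST)
        return 1;
    return -1;
}
```

A call such as set_attr_if_absent("notes.txt", "user.comment", "draft") would use the user namespace prefix; whether that succeeds depends on the filesystem and its mount options.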

AUTHORS

Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.  

SEE ALSO

getfattr(1), setfattr(1), open(2), stat(2), getxattr(2), listxattr(2), removexattr(2), and attr(5).

fstatfs

NAME

statfs, fstatfs - get file system statistics  

SYNOPSIS

#include <sys/vfs.h> /* or <sys/statfs.h> */

int statfs(const char *path, struct statfs *buf);
int fstatfs(int fd, struct statfs *buf);  

DESCRIPTION

The function statfs returns information about a mounted file system. path is the path name of any file within the mounted filesystem. buf is a pointer to a statfs structure defined approximately as follows:

struct statfs {
    long    f_type;     /* type of filesystem (see below) */
    long    f_bsize;    /* optimal transfer block size */
    long    f_blocks;   /* total data blocks in file system */
    long    f_bfree;    /* free blocks in fs */
    long    f_bavail;   /* free blocks avail to non-superuser */
    long    f_files;    /* total file nodes in file system */
    long    f_ffree;    /* free file nodes in fs */
    fsid_t  f_fsid;     /* file system id */
    long    f_namelen;  /* maximum length of filenames */
};

File system types:

    ADFS_SUPER_MAGIC      0xadf5
    AFFS_SUPER_MAGIC      0xADFF
    BEFS_SUPER_MAGIC      0x42465331
    BFS_MAGIC             0x1BADFACE
    CIFS_MAGIC_NUMBER     0xFF534D42
    CODA_SUPER_MAGIC      0x73757245
    COH_SUPER_MAGIC       0x012FF7B7
    CRAMFS_MAGIC          0x28cd3d45
    DEVFS_SUPER_MAGIC     0x1373
    EFS_SUPER_MAGIC       0x00414A53
    EXT_SUPER_MAGIC       0x137D
    EXT2_OLD_SUPER_MAGIC  0xEF51
    EXT2_SUPER_MAGIC      0xEF53
    EXT3_SUPER_MAGIC      0xEF53
    HFS_SUPER_MAGIC       0x4244
    HPFS_SUPER_MAGIC      0xF995E849
    HUGETLBFS_MAGIC       0x958458f6
    ISOFS_SUPER_MAGIC     0x9660
    JFFS2_SUPER_MAGIC     0x72b6
    JFS_SUPER_MAGIC       0x3153464a
    MINIX_SUPER_MAGIC     0x137F      /* orig. minix */
    MINIX_SUPER_MAGIC2    0x138F      /* 30 char minix */
    MINIX2_SUPER_MAGIC    0x2468      /* minix V2 */
    MINIX2_SUPER_MAGIC2   0x2478      /* minix V2, 30 char names */
    MSDOS_SUPER_MAGIC     0x4d44
    NCP_SUPER_MAGIC       0x564c
    NFS_SUPER_MAGIC       0x6969
    NTFS_SB_MAGIC         0x5346544e
    OPENPROM_SUPER_MAGIC  0x9fa1
    PROC_SUPER_MAGIC      0x9fa0
    QNX4_SUPER_MAGIC      0x002f
    REISERFS_SUPER_MAGIC  0x52654973
    ROMFS_MAGIC           0x7275
    SMB_SUPER_MAGIC       0x517B
    SYSV2_SUPER_MAGIC     0x012FF7B6
    SYSV4_SUPER_MAGIC     0x012FF7B5
    TMPFS_MAGIC           0x01021994
    UDF_SUPER_MAGIC       0x15013346
    UFS_MAGIC             0x00011954
    USBDEVICE_SUPER_MAGIC 0x9fa2
    VXFS_SUPER_MAGIC      0xa501FCF5
    XENIX_SUPER_MAGIC     0x012FF7B4
    XFS_SUPER_MAGIC       0x58465342
    _XIAFS_SUPER_MAGIC    0x012FD16D

Nobody knows what f_fsid is supposed to contain (but see below).

Fields that are undefined for a particular file system are set to 0. fstatfs returns the same information about an open file referenced by descriptor fd.  
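As an illustration of how the fields are typically combined, the following sketch computes the space available to an unprivileged user and compares f_type against one of the magic numbers listed above. The helper names bytes_available and has_fs_type are hypothetical and not part of any library.

```c
#include <sys/vfs.h>

/* Hypothetical helper: bytes available to a non-superuser on the file
 * system containing `path`, or -1 if statfs() fails. */
long long bytes_available(const char *path)
{
    struct statfs fs;

    if (statfs(path, &fs) == -1)
        return -1;
    /* f_bavail counts blocks; f_bsize is the block size in bytes. */
    return (long long)fs.f_bavail * (long long)fs.f_bsize;
}

/* Hypothetical helper: returns 1 if the file system containing `path`
 * reports the given magic number in f_type, 0 if not, -1 on error. */
int has_fs_type(const char *path, unsigned long magic)
{
    struct statfs fs;

    if (statfs(path, &fs) == -1)
        return -1;
    return (unsigned long)fs.f_type == magic;
}
```

For example, has_fs_type("/proc", 0x9fa0) would test for PROC_SUPER_MAGIC, assuming /proc is mounted.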

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
(fstatfs) fd is not a valid open file descriptor.
EACCES
(statfs) Search permission is denied for a component of the path prefix of path.
ELOOP
(statfs) Too many symbolic links were encountered in translating path.
ENAMETOOLONG
(statfs) path is too long.
ENOENT
(statfs) The file referred to by path does not exist.
ENOTDIR
(statfs) A component of the path prefix of path is not a directory.
EFAULT
buf or path points to an invalid address.
EINTR
This call was interrupted by a signal.
EIO
An I/O error occurred while reading from the file system.
ENOMEM
Insufficient kernel memory was available.
ENOSYS
The file system does not support this call.
EOVERFLOW
Some values were too large to be represented in the returned struct.

 

CONFORMING TO

The Linux statfs was inspired by the 4.4BSD one (but they do not use the same structure).  

NOTES ON f_fsid

Solaris, Irix and POSIX have a system call statvfs(2) that returns a struct statvfs (defined in <sys/statvfs.h>) containing an unsigned long f_fsid. Linux, SunOS, HPUX, 4.4BSD have a system call statfs that returns a struct statfs (defined in <sys/vfs.h>) containing a fsid_t f_fsid, where fsid_t is defined as struct { int val[2]; }. The same holds for FreeBSD, except that it uses the include file <sys/mount.h>.

The general idea is that f_fsid contains some random stuff such that the pair (f_fsid,ino) uniquely determines a file. Some OSes use (a variation on) the device number, or the device number combined with the filesystem type. Several OSes restrict giving out the f_fsid field to the superuser only (and zero it for nonprivileged users), because this field is used in the filehandle of the filesystem when NFS-exported, and giving it out is a security concern.

Under some OSes the fsid can be used as second parameter to the sysfs() system call.  

NOTES

The kernel has system calls statfs, fstatfs, statfs64, fstatfs64 to support this library call.

Some systems only have <sys/vfs.h>, other systems also have <sys/statfs.h>, where the former includes the latter. So it seems including the former is the best choice.

LSB has deprecated the library calls [f]statfs() and tells us to use [f]statvfs() instead.  

SEE ALSO

stat(2), statvfs(2)

fsync

NAME

fsync, fdatasync - synchronize a file's complete in-core state with that on disk  

SYNOPSIS

#include <unistd.h>

int fsync(int fd);

int fdatasync(int fd);  

DESCRIPTION

fsync copies all in-core parts of a file to disk, and waits until the device reports that all parts are on stable storage. It also updates metadata stat information. It does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync on the file descriptor of the directory is also needed.

fdatasync does the same as fsync but only flushes user data, not metadata such as the mtime or atime.
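The point about directory entries can be illustrated with a short sketch that syncs the file and then the directory containing it. The helper name write_durably is hypothetical; the sketch assumes that the kernel accepts fsync() on a directory descriptor opened read-only, as Linux does, and it treats a short write as a failure for brevity.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create dir/name with the given contents, then sync
 * both the file and the directory entry. Returns 0 on success, -1 on error. */
int write_durably(const char *dir, const char *name,
                  const void *data, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    /* Flush the file's data and metadata. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    if (close(fd) == -1)
        return -1;

    /* Flush the directory so that the new entry itself reaches disk. */
    int dfd = open(dir, O_RDONLY);
    if (dfd == -1)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```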

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
fd is not a valid file descriptor open for writing.
EROFS, EINVAL
fd is bound to a special file which does not support synchronization.
EIO
An error occurred during synchronization.
 

NOTES

In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.

When an ext2 file system is mounted with the sync option, directory entries are also implicitly synced by fsync.

On kernels before 2.4, fsync on big files can be inefficient. An alternative might be to use the O_SYNC flag to open(2).  

CONFORMING TO

POSIX.1b (formerly POSIX.4)  

SEE ALSO

bdflush(2), open(2), sync(2), mount(8), update(8), sync(8)

futex

NAME

futex - Fast Userspace Locking system call  

SYNOPSIS

#include <linux/futex.h>

#include <sys/time.h>

int sys_futex (void *futex, int op, int val, const struct timespec *timeout);  

DESCRIPTION

The sys_futex system call provides a method for a program to wait for a value at a given address to change, and a method to wake up anyone waiting on a particular address (while the addresses for the same memory in separate processes may not be equal, the kernel maps them internally so the same memory mapped in different locations will correspond for sys_futex calls). It is typically used to implement the contended case of a lock in shared memory, as described in futex(4).

When a futex(4) operation did not finish uncontended in userspace, a call needs to be made to the kernel to arbitrate. Arbitration can either mean putting the calling process to sleep or, conversely, waking a waiting process.

Callers of this function are expected to adhere to the semantics as set out in futex(4). As these semantics involve writing non-portable assembly instructions, this in turn probably means that most users will in fact be library authors and not general application developers.

The futex argument needs to point to an aligned integer which stores the counter. The operation to execute is passed via the op parameter, along with a value val.

Three operations are currently defined:

FUTEX_WAIT
This operation atomically verifies that the futex address still contains the value given, and sleeps awaiting FUTEX_WAKE on this futex address. If the timeout argument is non-NULL, its contents describe the maximum duration of the wait, which is infinite otherwise. For futex(4), this call is executed if decrementing the count gave a negative value (indicating contention), and will sleep until another process releases the futex and executes the FUTEX_WAKE operation.
FUTEX_WAKE
This operation wakes at most val processes waiting on this futex address (i.e., inside FUTEX_WAIT). For futex(4), this is executed if incrementing the count showed that there were waiters, once the futex value has been set to 1 (indicating that it is available).
FUTEX_FD
To support asynchronous wakeups, this operation associates a file descriptor with a futex. If another process executes a FUTEX_WAKE, the process will receive the signal number that was passed in val. The calling process must close the returned file descriptor after use.

To prevent race conditions, the caller should test if the futex has been upped after FUTEX_FD returns.

 

RETURN VALUE

Depending on which operation was executed, the returned value can have differing meanings.

FUTEX_WAIT
Returns 0 if the process was woken by a FUTEX_WAKE call. In case of timeout, ETIMEDOUT is returned. If the futex was not equal to the expected value, the operation returns EWOULDBLOCK. Signals (or other spurious wakeups) cause FUTEX_WAIT to return EINTR.
FUTEX_WAKE
Returns the number of processes woken up.
FUTEX_FD
Returns the new file descriptor associated with the futex.
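The FUTEX_WAIT and FUTEX_WAKE operations can be exercised from C roughly as follows. This is a sketch under stated assumptions: it invokes the system call through syscall(2) with SYS_futex because the C library is assumed to provide no wrapper, so failures are reported as -1 with the error code in errno. The wrapper names futex_wait and futex_wake are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wrapper: sleep while *addr still equals `expected`.
 * A NULL timeout waits indefinitely. Returns 0 when woken, or -1 with
 * errno set to EWOULDBLOCK, ETIMEDOUT or EINTR. */
long futex_wait(uint32_t *addr, uint32_t expected,
                const struct timespec *timeout)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/* Hypothetical wrapper: wake at most `count` waiters on addr.
 * Returns the number of waiters actually woken. */
long futex_wake(uint32_t *addr, int count)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

Calling futex_wait() with an expected value that no longer matches the futex word fails immediately with EWOULDBLOCK instead of sleeping, which is the property a lock implementation relies on to avoid lost wakeups.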
 

ERRORS

EFAULT
Error in getting timeout information from userspace.
EINVAL
An operation was not defined or error in page alignment.
 

NOTES

To reiterate, bare futexes are not intended as an easy to use abstraction for end-users. Implementors are expected to be assembly literate and to have read the sources of the futex userspace library referenced below.  

AUTHORS

Futexes were designed and worked on by Hubertus Franke (IBM Thomas J. Watson Research Center), Matthew Kirkwood, Ingo Molnar (Red Hat) and Rusty Russell (IBM Linux Technology Center). This page written by bert hubert.  

VERSIONS

Initial futex support was merged in Linux 2.5.7 but with different semantics from those described above. Current semantics are available from Linux 2.5.40 onwards.  

SEE ALSO

futex(4), "Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux" (proceedings of the Ottawa Linux Symposium 2002), futex example library, futex-*.tar.bz2 <URL:ftp://ftp.nl.kernel.org:/pub/linux/kernel/people/rusty/>.

getdents

NAME

getdents - get directory entries  

SYNOPSIS

#include <unistd.h>
#include <linux/types.h>
#include <linux/dirent.h>
#include <linux/unistd.h>

_syscall3(int, getdents, uint, fd, struct dirent *, dirp, uint, count);

int getdents(unsigned int fd, struct dirent *dirp, unsigned int count);

 

DESCRIPTION

This is not the function you are interested in. Look at readdir(3) for the POSIX conforming C library interface. This page documents the bare kernel system call interface.

The system call getdents reads several dirent structures from the directory pointed at by fd into the memory area pointed to by dirp. The parameter count is the size of the memory area.

The dirent structure is declared as follows:

struct dirent {
    long            d_ino;                 /* inode number */
    off_t           d_off;                 /* offset to next dirent */
    unsigned short  d_reclen;              /* length of this dirent */
    char            d_name [NAME_MAX+1];   /* file name (null-terminated) */
};

d_ino is an inode number. d_off is the distance from the start of the directory to the start of the next dirent. d_reclen is the size of this entire dirent. d_name is a null-terminated file name.
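Walking the returned buffer by d_reclen can be sketched as below. Note the assumptions: this sketch swaps in the related getdents64 system call (invoked through syscall(2)), because plain getdents is not available on every architecture, and it declares the record layout locally under the hypothetical name struct dirent64_rec. It illustrates the technique only; it is not the interface documented above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed layout of one getdents64 record (declared locally). */
struct dirent64_rec {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to next record */
    unsigned short d_reclen;  /* length of this record */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* null-terminated file name */
};

/* Hypothetical helper: count the entries of `dir` (including "." and
 * ".." where the file system reports them). Returns -1 on error. */
long count_entries(const char *dir)
{
    char buf[4096];
    long total = 0;
    int fd = open(dir, O_RDONLY);
    if (fd == -1)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n == -1) {
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of directory */
            break;
        for (long pos = 0; pos < n; ) {
            unsigned short reclen;
            /* Copy d_reclen out to avoid unaligned access into buf. */
            memcpy(&reclen,
                   buf + pos + offsetof(struct dirent64_rec, d_reclen),
                   sizeof reclen);
            pos += reclen;      /* step to the next record */
            total++;
        }
    }
    close(fd);
    return total;
}
```

Application code would normally use readdir(3) instead, as the DESCRIPTION recommends.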

This call supersedes readdir(2).  

RETURN VALUE

On success, the number of bytes read is returned. On end of directory, 0 is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
Invalid file descriptor fd.
EFAULT
Argument points outside the calling process's address space.
EINVAL
Result buffer is too small.
ENOENT
No such directory.
ENOTDIR
File descriptor does not refer to a directory.
 

CONFORMING TO

SVr4, SVID. SVr4 documents additional ENOLINK, EIO error conditions.  

SEE ALSO

readdir(2), readdir(3)

getdtablesize

NAME

getdtablesize - get descriptor table size  

SYNOPSIS

#include <unistd.h>

int getdtablesize(void);  

DESCRIPTION

getdtablesize returns the maximum number of files a process can have open, one more than the largest possible value for a file descriptor.  

RETURN VALUE

The current limit on the number of open files per process.  

NOTES

getdtablesize is implemented as a libc library function. The glibc version calls getrlimit(2) and returns the current RLIMIT_NOFILE limit, or OPEN_MAX when that fails. The libc4 and libc5 versions return OPEN_MAX (set to 256 since Linux 0.98.4).  

CONFORMING TO

SVr4, 4.4BSD (the getdtablesize function first appeared in BSD 4.2).  

SEE ALSO

close(2), dup(2), getrlimit(2), open(2)

geteuid

NAME

getuid, geteuid - get user identity  

SYNOPSIS

#include <unistd.h>
#include <sys/types.h>

uid_t getuid(void);
uid_t geteuid(void);  

DESCRIPTION

getuid returns the real user ID of the current process.

geteuid returns the effective user ID of the current process.

The real ID corresponds to the ID of the calling process. The effective ID corresponds to the set ID bit on the file being executed.  
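One common use of the two calls is to detect whether a program is running with a user ID borrowed from a set-user-ID executable. A minimal sketch, with the hypothetical helper name ids_differ:

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the effective user ID differs from
 * the real user ID (as happens under a set-user-ID executable started
 * by another user), and 0 otherwise. */
int ids_differ(void)
{
    return getuid() != geteuid();
}
```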

ERRORS

These functions are always successful.  

CONFORMING TO

POSIX, BSD 4.3.  

SEE ALSO

setreuid(2), setuid(2)

getgroups

NAME

getgroups, setgroups - get/set list of supplementary group IDs  

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>

int getgroups(int size, gid_t list[]);

#include <grp.h>

int setgroups(size_t size, const gid_t *list);  

DESCRIPTION

getgroups
Up to size supplementary group IDs are returned in list. It is unspecified whether the effective group ID of the calling process is included in the returned list. (Thus, an application should also call getegid(2) and add or remove the resulting value.) If size is zero, list is not modified, but the total number of supplementary group IDs for the process is returned.
setgroups
Sets the supplementary group IDs for the process. Only the super-user may use this function.
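The size-zero query described for getgroups leads to a two-call idiom: ask for the count, allocate, then fetch. A minimal sketch follows; the helper name load_groups is hypothetical, and the sketch ignores the unlikely case that the group set changes between the two calls.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store a malloc()ed array of the supplementary
 * group IDs in *out and return their number, or -1 on error.
 * The caller frees *out. */
int load_groups(gid_t **out)
{
    int n = getgroups(0, NULL);          /* size 0: only report the count */
    if (n == -1)
        return -1;

    /* Allocate at least one element so malloc(0) is never requested. */
    gid_t *list = malloc((size_t)(n > 0 ? n : 1) * sizeof *list);
    if (list == NULL)
        return -1;

    n = getgroups(n, list);              /* fetch the actual IDs */
    if (n == -1) {
        free(list);
        return -1;
    }
    *out = list;
    return n;
}
```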
 

RETURN VALUE

getgroups
On success, the number of supplementary group IDs is returned. On error, -1 is returned, and errno is set appropriately.
setgroups
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
 

ERRORS

EFAULT
list has an invalid address.
EPERM
For setgroups, the user is not the super-user.
EINVAL
For setgroups, size is greater than NGROUPS (32 for Linux 2.0.32). For getgroups, size is less than the number of supplementary group IDs, but is not zero.
 

NOTES

A process can have up to at least NGROUPS_MAX supplementary group IDs in addition to the effective group ID. The set of supplementary group IDs is inherited from the parent process and may be changed using setgroups. The maximum number of supplementary group IDs can be found using sysconf(3):

long ngroups_max;
ngroups_max = sysconf(_SC_NGROUPS_MAX);

The maximal return value of getgroups cannot be larger than one more than the value obtained this way.

The prototype for setgroups is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).  

CONFORMING TO

SVr4, SVID (issue 4 only; these calls were not present in SVr3), X/OPEN, 4.3BSD. The getgroups function is in POSIX.1. Since setgroups requires privilege, it is not covered by POSIX.1.  

SEE ALSO

initgroups(3), getgid(2), setgid(2)

gethostname

NAME

gethostname, sethostname - get/set host name  

SYNOPSIS

#include <unistd.h>

int gethostname(char *name, size_t len);
int sethostname(const char *name, size_t len);  

DESCRIPTION

These functions are used to access or to change the host name of the current processor. The gethostname() function returns a NUL-terminated hostname (set earlier by sethostname()) in the array name that has a length of len bytes. In case the NUL-terminated hostname does not fit, no error is returned, but the hostname is truncated. It is unspecified whether the truncated hostname will be NUL-terminated.  
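Because termination is unspecified when the name is truncated, callers usually terminate the buffer themselves. A minimal sketch, with the hypothetical helper name hostname_into:

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fetch the host name into buf (len bytes) and
 * guarantee NUL termination even if the name was truncated.
 * Returns 0 on success, -1 on error or if len is zero. */
int hostname_into(char *buf, size_t len)
{
    if (len == 0)
        return -1;
    if (gethostname(buf, len) == -1)
        return -1;
    buf[len - 1] = '\0';    /* harmless if already terminated */
    return 0;
}
```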

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EINVAL
len is negative or, for sethostname, len is larger than the maximum allowed size, or, for gethostname on Linux/i386, len is smaller than the actual size. (In this last case glibc 2.1 uses ENAMETOOLONG.)
EPERM
For sethostname, the caller was not the superuser.
EFAULT
name is an invalid address.
 

CONFORMING TO

SVr4, 4.4BSD (this function first appeared in 4.2BSD). POSIX 1003.1-2001 specifies gethostname but not sethostname.  

BUGS

For many Linux kernel / libc combinations gethostname will return an error instead of returning a truncated hostname.  

NOTES

SUSv2 guarantees that "Host names are limited to 255 bytes". POSIX 1003.1-2001 guarantees that "Host names (not including the terminating NUL) are limited to HOST_NAME_MAX bytes".  

SEE ALSO

getdomainname(2), setdomainname(2), uname(2)

getpagesize

NAME

getpagesize - get memory page size  

SYNOPSIS

#include <unistd.h>

int getpagesize(void);  

DESCRIPTION

The function getpagesize() returns the number of bytes in a page, where a "page" is the unit meant in the description of mmap(2), which says that files are mapped in page-sized units.

The size of the pages that mmap uses can be found using

#include <unistd.h>
long sz = sysconf(_SC_PAGESIZE);

(where some systems also allow the synonym _SC_PAGE_SIZE for _SC_PAGESIZE), or

#include <unistd.h>
int sz = getpagesize();
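A typical use of the page size is rounding a length up to a whole number of pages before calling mmap(2). The following sketch uses the sysconf() form; the helper name round_up_to_page is hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round n up to the next multiple of the page size.
 * Returns 0 if the page size cannot be determined (or if n is 0). */
size_t round_up_to_page(size_t n)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return 0;

    size_t p = (size_t)page;
    return (n + p - 1) / p * p;
}
```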

 

HISTORY

This call first appeared in 4.2BSD.  

CONFORMING TO

SVr4, 4.4BSD, SUSv2. In SUSv2 the getpagesize() call is labeled "legacy", and in POSIX 1003.1-2001 it has been dropped. HPUX does not have this call.  

NOTES

Whether getpagesize() is present as a Linux system call depends on the architecture. If it is, it returns the kernel symbol PAGE_SIZE, which is architecture and machine model dependent. Generally, one uses binaries that are architecture but not machine model dependent, in order to have a single binary distribution per architecture. This means that a user program should not find PAGE_SIZE at compile time from a header file, but use an actual system call, at least for those architectures (like sun4) where this dependency exists. Here libc4, libc5, glibc 2.0 fail because their getpagesize() returns a statically derived value, and does not use a system call. Things are OK in glibc 2.1.  

SEE ALSO

mmap(2), sysconf(3)

getpgid

NAME

setpgid, getpgid, setpgrp, getpgrp - set/get process group  

SYNOPSIS

#include <unistd.h>

int setpgid(pid_t pid, pid_t pgid);
pid_t getpgid(pid_t pid);
int setpgrp(void);
pid_t getpgrp(void);  

DESCRIPTION

setpgid sets the process group ID of the process specified by pid to pgid. If pid is zero, the process ID of the current process is used. If pgid is zero, the process ID of the process specified by pid is used. If setpgid is used to move a process from one process group to another (as is done by some shells when creating pipelines), both process groups must be part of the same session. In this case, the pgid specifies an existing process group to be joined and the session ID of that group must match the session ID of the joining process.

getpgid returns the process group ID of the process specified by pid. If pid is zero, the process ID of the current process is used.

The call setpgrp() is equivalent to setpgid(0,0).

Similarly, getpgrp() is equivalent to getpgid(0). Each process group is a member of a session and each process is a member of the session of which its process group is a member.
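The way a shell places a child in its own process group can be sketched as follows. The helper name child_leads_own_group is hypothetical; the sketch forks a child, has it call setpgid(0, 0), and reports whether the child's process group ID then equals its process ID.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child became the leader of
 * a new process group via setpgid(0, 0), 0 if not, -1 on error. */
int child_leads_own_group(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: pid 0 means "this process", pgid 0 means "use my pid". */
        if (setpgid(0, 0) == -1)
            _exit(2);
        _exit(getpgrp() == getpid() ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```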

Process groups are used for distribution of signals, and by terminals to arbitrate requests for their input: Processes that have the same process group as the terminal are foreground and may read, while others will block with a signal if they attempt to read. These calls are thus used by programs such as csh(1) to create process groups in implementing job control. The TIOCGPGRP and TIOCSPGRP calls described in termios(3) are used to get/set the process group of the control terminal.

If a session has a controlling terminal, CLOCAL is not set and a hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, the SIGHUP signal will be sent to each process in the foreground process group of the controlling terminal.

If the exit of the process causes a process group to become orphaned, and if any member of the newly-orphaned process group is stopped, then a SIGHUP signal followed by a SIGCONT signal will be sent to each process in the newly-orphaned process group.

 

RETURN VALUE

On success, setpgid and setpgrp return zero. On error, -1 is returned, and errno is set appropriately.

getpgid returns a process group on success. On error, -1 is returned, and errno is set appropriately.

getpgrp always returns the current process group.  

ERRORS

EINVAL
pgid is less than 0 (setpgid, setpgrp).
EACCES
An attempt was made to change the process group ID of one of the children of the calling process and the child had already performed an execve (setpgid, setpgrp).
EPERM
An attempt was made to move a process into a process group in a different session, or to change the process group ID of one of the children of the calling process and the child was in a different session, or to change the process group ID of a session leader (setpgid, setpgrp).
ESRCH
For getpgid: pid does not match any process. For setpgid: pid is not the current process and not a child of the current process.
 

CONFORMING TO

The functions setpgid and getpgrp conform to POSIX.1. The function setpgrp is from BSD 4.2. The function getpgid conforms to SVr4.  

NOTES

POSIX took setpgid from the BSD function setpgrp. Also SysV has a function with the same name, but it is identical to setsid(2).

To get the prototypes under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.  

SEE ALSO

getuid(2), setsid(2), tcgetpgrp(3), tcsetpgrp(3), termios(3)

getpid

NAME

getpid, getppid - get process identification  

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>

pid_t getpid(void);
pid_t getppid(void);  

DESCRIPTION

getpid returns the process ID of the current process. (This is often used by routines that generate unique temporary file names.)

getppid returns the process ID of the parent of the current process.  

CONFORMING TO

POSIX, BSD 4.3, SVID  

SEE ALSO

exec(3), fork(2), kill(2), mkstemp(3), tmpnam(3), tempnam(3), tmpfile(3)

getresgid

NAME

getresuid, getresgid - get real, effective and saved user or group ID  

SYNOPSIS

#define _GNU_SOURCE
#include <unistd.h>

int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid);
int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid);  

DESCRIPTION

getresuid and getresgid (both introduced in Linux 2.1.44) get the real, effective and saved user IDs (respectively, group IDs) of the current process.
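A short sketch of a call, checking whether the saved user ID still differs from the effective one (for instance after a set-user-ID program has temporarily dropped privileges). The helper name saved_uid_differs is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the saved user ID differs from the
 * effective user ID, 0 if they match, and -1 if getresuid() fails. */
int saved_uid_differs(void)
{
    uid_t ruid, euid, suid;

    if (getresuid(&ruid, &euid, &suid) == -1)
        return -1;
    return suid != euid;
}
```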

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EFAULT
One of the arguments specified an address outside the calling program's address space.
 

CONFORMING TO

This call is Linux-specific. The prototype is given by glibc since version 2.3.2 provided _GNU_SOURCE is defined.  

SEE ALSO

getuid(2), setuid(2), setreuid(2), setresuid(2)

getrlimit

NAME

getrlimit, getrusage, setrlimit - get/set resource limits and usage  

SYNOPSIS

#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

int getrlimit(int resource, struct rlimit *rlim);
int getrusage(int who, struct rusage *usage);
int setrlimit(int resource, const struct rlimit *rlim);  

DESCRIPTION

getrlimit and setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the rlimit structure (the rlim argument to both getrlimit() and setrlimit()):

struct rlimit {
    rlim_t rlim_cur;   /* Soft limit */
    rlim_t rlim_max;   /* Hard limit (ceiling for rlim_cur) */
};

The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. A privileged process may make arbitrary changes to either limit value.

The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()).
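The relationship between the two limits can be sketched as a helper that changes only the soft limit and clamps the request to the hard limit, which is what an unprivileged process is permitted to do. The name set_soft_limit is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: set the soft limit of `resource` to `wanted`,
 * clamped to the current hard limit. The hard limit is left unchanged.
 * Returns 0 on success, -1 on error. */
int set_soft_limit(int resource, rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) == -1)
        return -1;

    /* An unprivileged process may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY &&
        (wanted == RLIM_INFINITY || wanted > rl.rlim_max))
        wanted = rl.rlim_max;

    rl.rlim_cur = wanted;
    return setrlimit(resource, &rl);
}
```

For example, set_soft_limit(RLIMIT_NOFILE, 64) would restrict the process to file descriptors 0 through 63, assuming the hard limit is at least 64.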

resource must be one of:

RLIMIT_AS
The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process when no alternate stack has been made available). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
RLIMIT_CORE
Maximum size of core file. When 0 no core dump files are created. When nonzero, larger dumps are truncated to this size.
RLIMIT_CPU
CPU time limit in seconds. When the process reaches the soft limit, it is sent a SIGXCPU signal. The default action for this signal is to terminate the process. However, the signal can be caught, and the handler can return control to the main program. If the process continues to consume CPU time, it will be sent SIGXCPU once per second until the hard limit is reached, at which time it is sent SIGKILL. (This latter point describes Linux 2.2 and 2.4 behaviour. Implementations vary in how they treat processes which continue to consume CPU time after reaching the soft limit. Portable applications that need to catch this signal should perform an orderly termination upon first receipt of SIGXCPU.)
RLIMIT_DATA
The maximum size of the process's data segment (initialized data, uninitialized data, and heap). This limit affects calls to brk() and sbrk(), which fail with the error ENOMEM upon encountering the soft limit of this resource.
RLIMIT_FSIZE
The maximum size of files that the process may create. Attempts to extend a file beyond this limit result in delivery of a SIGXFSZ signal. By default, this signal terminates a process, but a process can catch this signal instead, in which case the relevant system call (e.g., write(), truncate()) fails with the error EFBIG.
RLIMIT_LOCKS
A limit on the combined number of flock() locks and fcntl() leases that this process may establish. (Early Linux 2.4 only.)
RLIMIT_MEMLOCK
The maximum number of bytes of virtual memory that may be locked into RAM using mlock() and mlockall().
RLIMIT_NOFILE
Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE.
RLIMIT_NPROC
The maximum number of processes that can be created for the real user ID of the calling process. Upon encountering this limit, fork() fails with the error EAGAIN.
RLIMIT_RSS
Specifies the limit (in pages) of the process's resident set (the number of virtual pages resident in RAM). This limit only has effect in Linux 2.4 onwards, and there only affects calls to madvise() specifying MADV_WILLNEED.
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon reaching this limit, a SIGSEGV signal is generated. To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)).

RLIMIT_OFILE is the BSD name for RLIMIT_NOFILE.

getrusage returns the current resource usages, for a who of either RUSAGE_SELF or RUSAGE_CHILDREN. The former asks for resources used by the current process, the latter for resources used by those of its children that have terminated and have been waited for.

struct rusage {
    struct timeval ru_utime;  /* user time used */
    struct timeval ru_stime;  /* system time used */
    long ru_maxrss;           /* maximum resident set size */
    long ru_ixrss;            /* integral shared memory size */
    long ru_idrss;            /* integral unshared data size */
    long ru_isrss;            /* integral unshared stack size */
    long ru_minflt;           /* page reclaims */
    long ru_majflt;           /* page faults */
    long ru_nswap;            /* swaps */
    long ru_inblock;          /* block input operations */
    long ru_oublock;          /* block output operations */
    long ru_msgsnd;           /* messages sent */
    long ru_msgrcv;           /* messages received */
    long ru_nsignals;         /* signals received */
    long ru_nvcsw;            /* voluntary context switches */
    long ru_nivcsw;           /* involuntary context switches */
};

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EFAULT
rlim or usage points outside the accessible address space.
EINVAL
getrlimit or setrlimit is called with a bad resource, or getrusage is called with a bad who.
EPERM
A non-superuser tries to use setrlimit() to increase the soft or hard limit above the current hard limit, or a superuser tries to increase RLIMIT_NOFILE above the current kernel maximum.
 

CONFORMING TO

SVr4, BSD 4.3  

NOTE

Including <sys/time.h> is not required these days, but increases portability. (Indeed, struct timeval is defined in <sys/time.h>.)

On Linux, if the disposition of SIGCHLD is set to SIG_IGN then the resource usages of child processes are automatically included in the value returned by RUSAGE_CHILDREN, although POSIX 1003.1-2001 explicitly prohibits this.

The above struct was taken from BSD 4.3 Reno. Not all fields are meaningful under Linux. Right now (Linux 2.4, 2.6) only the fields ru_utime, ru_stime, ru_minflt, ru_majflt, and ru_nswap are maintained.  

SEE ALSO

dup(2), fcntl(2), fork(2), mlock(2), mlockall(2), mmap(2), open(2), quotactl(2), sbrk(2), wait3(2), wait4(2), malloc(3), ulimit(3), signal(7)

getsid

NAME

getsid - get session ID  

SYNOPSIS

#include <unistd.h>

pid_t getsid(pid_t pid);  

DESCRIPTION

getsid(0) returns the session ID of the calling process. getsid(p) returns the session ID of the process with process ID p. (The session ID of a process is the process group ID of the session leader.) On error, (pid_t) -1 will be returned, and errno is set appropriately.  
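
Since the session ID is the process group ID of the session leader, comparing getsid() with the process ID tells whether a process leads its session. The following is a small sketch under that reading. The helper name is_session_leader is hypothetical, and _XOPEN_SOURCE is defined as 500 as the NOTES section suggests for glibc.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report whether process `pid` (0 means the
 * caller) is the leader of its session.
 * Returns 1 if it is, 0 if it is not, and -1 if getsid() fails
 * (errno set). On success *sid_out receives the session ID. */
int is_session_leader(pid_t pid, pid_t *sid_out)
{
    pid_t sid;

    assert(sid_out != NULL);

    sid = getsid(pid);
    if (sid == (pid_t) -1)
        return -1;

    *sid_out = sid;
    if (pid == 0)
        pid = getpid();     /* getsid(0) referred to the caller */

    return sid == pid;
}
```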

ERRORS

EPERM
A process with process ID p exists, but it is not in the same session as the current process, and the implementation considers this an error.
ESRCH
No process with process ID p was found.
 

CONFORMING TO

SVr4, POSIX 1003.1-2001.  

NOTES

Linux does not return EPERM.

Linux has this system call since Linux 1.3.44. There is libc support since libc 5.2.19.

To get the prototype under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.  

SEE ALSO

getpgid(2), setsid(2)

getsockopt

NAME

getsockopt, setsockopt - get and set options on sockets  

SYNOPSIS

#include <sys/types.h>
#include <sys/socket.h>

int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);

int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);  

DESCRIPTION

Getsockopt and setsockopt manipulate the options associated with a socket. Options may exist at multiple protocol levels; they are always present at the uppermost socket level.

When manipulating socket options the level at which the option resides and the name of the option must be specified. To manipulate options at the socket level, level is specified as SOL_SOCKET. To manipulate options at any other level the protocol number of the appropriate protocol controlling the option is supplied. For example, to indicate that an option is to be interpreted by the TCP protocol, level should be set to the protocol number of TCP; see getprotoent(3).

The parameters optval and optlen are used to access option values for setsockopt. For getsockopt they identify a buffer in which the value for the requested option(s) is to be returned. For getsockopt, optlen is a value-result parameter, initially containing the size of the buffer pointed to by optval, and modified on return to indicate the actual size of the value returned. If no option value is to be supplied or returned, optval may be NULL.

Optname and any specified options are passed uninterpreted to the appropriate protocol module for interpretation. The include file <sys/socket.h> contains definitions for socket level options, described below. Options at other protocol levels vary in format and name; consult the appropriate entries in section 4 of the manual.

Most socket-level options utilize an int parameter for optval. For setsockopt, the parameter should be non-zero to enable a boolean option, or zero if the option is to be disabled.

For a description of the available socket options see socket(7) and the appropriate protocol man pages.

 

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is set appropriately.  

ERRORS

EBADF
The argument s is not a valid descriptor.
ENOTSOCK
The argument s is a file, not a socket.
ENOPROTOOPT
The option is unknown at the level indicated.
EFAULT
The address pointed to by optval is not in a valid part of the process address space. For getsockopt, this error may also be returned if optlen is not in a valid part of the process address space.
EINVAL
optlen is invalid in setsockopt.
 

CONFORMING TO

SVr4, 4.4BSD (these system calls first appeared in 4.2BSD). SVr4 documents additional ENOMEM and ENOSR error codes, but does not document the SO_SNDLOWAT, SO_RCVLOWAT, SO_SNDTIMEO, SO_RCVTIMEO options.

NOTE

The fifth argument of getsockopt and setsockopt is in reality an int [*] (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t [*]. See also accept(2).  

BUGS

Several of the socket options should be handled at lower levels of the system.  

SEE ALSO

ioctl(2), socket(2), getprotoent(3), protocols(5), socket(7), unix(7), tcp(7)

gettimeofday

NAME

gettimeofday, settimeofday - get / set time  

SYNOPSIS

#include <sys/time.h>

int gettimeofday(struct timeval *tv, struct timezone *tz);
int settimeofday(const struct timeval *tv , const struct timezone *tz);  

DESCRIPTION

The functions gettimeofday and settimeofday can get and set the time as well as a timezone. The tv argument is a timeval struct, as specified in <sys/time.h>:

struct timeval {
    time_t      tv_sec;   /* seconds */
    suseconds_t tv_usec;  /* microseconds */
};

and gives the number of seconds and microseconds since the Epoch (see time(2)). The tz argument is a timezone :

struct timezone {
    int tz_minuteswest;  /* minutes W of Greenwich */
    int tz_dsttime;      /* type of dst correction */
};

The use of the timezone struct is obsolete; the tz_dsttime field has never been used under Linux - it has not been and will not be supported by libc or glibc. Each and every occurrence of this field in the kernel source (other than the declaration) is a bug. Thus, the following is purely of historic interest.

The field tz_dsttime contains a symbolic constant (values are given below) that indicates in which part of the year Daylight Saving Time is in force. (Note: its value is constant throughout the year - it does not indicate that DST is in force, it just selects an algorithm.) The daylight saving time algorithms defined are as follows :

DST_NONE     /* not on dst */

DST_USA     /* USA style dst */

DST_AUST    /* Australian style dst */

DST_WET     /* Western European dst */

DST_MET     /* Middle European dst */

DST_EET     /* Eastern European dst */

DST_CAN     /* Canada */

DST_GB      /* Great Britain and Eire */

DST_RUM     /* Rumania */

DST_TUR     /* Turkey */

DST_AUSTALT /* Australian style with shift in 1986 */

Of course it turned out that the period in which Daylight Saving Time is in force cannot be given by a simple algorithm, one per country; indeed, this period is determined by unpredictable political decisions. So this method of representing time zones has been abandoned. Under Linux, in a call to settimeofday the tz_dsttime field should be zero.

Under Linux there are some peculiar "warp clock" semantics associated with the settimeofday system call if on the very first call (after booting) that has a non-NULL tz argument, the tv argument is NULL and the tz_minuteswest field is nonzero. In such a case it is assumed that the CMOS clock is on local time, and that it has to be incremented by this amount to get UTC system time. No doubt it is a bad idea to use this feature.

The following macros are defined to operate on a struct timeval :

#define timerisset(tvp) \
        ((tvp)->tv_sec || (tvp)->tv_usec)
#define timercmp(tvp, uvp, cmp) \
        ((tvp)->tv_sec cmp (uvp)->tv_sec || \
         (tvp)->tv_sec == (uvp)->tv_sec && \
         (tvp)->tv_usec cmp (uvp)->tv_usec)
#define timerclear(tvp) \
        ((tvp)->tv_sec = (tvp)->tv_usec = 0)

If either tv or tz is null, the corresponding structure is not set or returned.

Only the super user may use settimeofday.  
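
A common use of gettimeofday is measuring elapsed wall-clock time between two readings. The sketch below is one way to do that, with hypothetical helper names current_time and elapsed_usec. It passes NULL for tz, since the timezone structure is obsolete.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: fill *now with the current time. The obsolete
 * tz argument is passed as NULL. Returns 0 on success, -1 on error. */
int current_time(struct timeval *now)
{
    assert(now != NULL);
    return gettimeofday(now, NULL);
}

/* Hypothetical helper: microseconds from *start to *end. The result
 * is negative when *end is earlier than *start. */
long long elapsed_usec(const struct timeval *start,
                       const struct timeval *end)
{
    assert(start != NULL && end != NULL);

    /* Subtract seconds and microseconds separately; a negative
     * microsecond difference is absorbed by the seconds term. */
    return ((long long)end->tv_sec - (long long)start->tv_sec) * 1000000LL
         + ((long long)end->tv_usec - (long long)start->tv_usec);
}
```

Because the system time can be changed with settimeofday, differences computed this way can jump or even be negative if the clock is reset between the two readings.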

RETURN VALUE

gettimeofday and settimeofday return 0 for success, or -1 for failure (in which case errno is set appropriately).  

ERRORS

EPERM
settimeofday is called by someone other than the superuser.
EINVAL
Timezone (or something else) is invalid.
EFAULT
One of tv or tz pointed outside your accessible address space.
 

NOTE

The prototype for settimeofday and the defines for timercmp, timerisset, timerclear, timeradd, timersub are (since glibc2.2.2) only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).

Traditionally, the fields of struct timeval were longs.  

CONFORMING TO

SVr4, BSD 4.3. POSIX 1003.1-2001 describes gettimeofday() but not settimeofday().  

SEE ALSO

date(1), adjtimex(2), time(2), ctime(3), ftime(3)

getxattr

NAME

getxattr, lgetxattr, fgetxattr - retrieve an extended attribute value  

SYNOPSIS

#include <sys/types.h>
#include <attr/xattr.h>

ssize_t getxattr (const char *path, const char *name, void *value, size_t size);
ssize_t lgetxattr (const char *path, const char *name, void *value, size_t size);
ssize_t fgetxattr (int filedes, const char *name, void *value, size_t size);

 

DESCRIPTION

Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).

getxattr retrieves the value of the extended attribute identified by name and associated with the given path in the filesystem. The length of the attribute value is returned.

lgetxattr is identical to getxattr, except in the case of a symbolic link, where the link itself is interrogated, not the file that it refers to.

fgetxattr is identical to getxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.

An extended attribute name is a simple null-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode. The value of an extended attribute is a chunk of arbitrary textual or binary data of specified length.

An empty buffer of size zero can be passed into these calls to return the current size of the named extended attribute, which can be used to estimate the size of a buffer which is sufficiently large to hold the value associated with the extended attribute.

The interface is designed to allow guessing of initial buffer sizes, and to enlarge buffers when the return value indicates that the buffer provided was too small.  
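
The size-query-then-read pattern described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name read_xattr is hypothetical, and the sketch assumes the glibc header <sys/xattr.h>, which declares the same functions as the <attr/xattr.h> header shown in the SYNOPSIS.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* assumption: glibc's header, not <attr/xattr.h> */

/* Hypothetical helper: read the extended attribute `name` of `path`
 * into a freshly allocated buffer. On success returns the buffer
 * (to be released with free()) and stores the value length in
 * *len_out. On failure returns NULL with errno set. */
void *read_xattr(const char *path, const char *name, size_t *len_out)
{
    assert(path != NULL && name != NULL && len_out != NULL);

    for (;;) {
        /* A zero-size buffer asks only for the current value size. */
        ssize_t need = getxattr(path, name, NULL, 0);
        if (need == -1)
            return NULL;

        /* Allocate at least one byte so an empty value is not NULL. */
        void *buf = malloc(need > 0 ? (size_t)need : 1);
        if (buf == NULL)
            return NULL;

        ssize_t got = getxattr(path, name, buf, (size_t)need);
        if (got >= 0) {
            *len_out = (size_t)got;
            return buf;
        }

        int saved = errno;
        free(buf);
        if (saved != ERANGE) {
            errno = saved;
            return NULL;
        }
        /* ERANGE: the value grew between the two calls; try again. */
    }
}
```

The retry on ERANGE covers the case where another process enlarges the attribute between the size query and the read.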

RETURN VALUE

On success, a nonnegative number is returned indicating the size of the extended attribute value. On failure, -1 is returned and errno is set appropriately.

If the named attribute does not exist, or the process has no access to this attribute, errno is set to ENOATTR.

If the size of the value buffer is too small to hold the result, errno is set to ERANGE.

If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.

The errors documented for the stat(2) system call are also applicable here.  

AUTHORS

Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.  

SEE ALSO

getfattr(1), setfattr(1), open(2), stat(2), setxattr(2), listxattr(2), removexattr(2), and attr(5).

get_thread_area

NAME

get_thread_area - Get a Thread Local Storage (TLS) area  

SYNOPSIS

#include <linux/unistd.h>
#include <asm/ldt.h>

int get_thread_area (struct user_desc *u_info);

 

DESCRIPTION

get_thread_area returns an entry in the current thread's Thread Local Storage (TLS) array. The index of the entry corresponds to the value of u_info->entry_number, passed in by the user. If the value is in bounds, get_thread_area copies the corresponding TLS entry into the area pointed to by u_info.

 

RETURN VALUE

get_thread_area returns 0 on success. Otherwise, it returns -1 and sets errno appropriately.

 

ERRORS

EINVAL
u_info->entry_number is out of bounds.
EFAULT
u_info is an invalid pointer.

 

CONFORMING TO

get_thread_area is Linux specific and should not be used in programs that are intended to be portable.

 

AVAILABILITY

A version of get_thread_area first appeared in Linux 2.5.32.

 

SEE ALSO

set_thread_area(2), modify_ldt(2)

idle

NAME

idle - make process 0 idle  

SYNOPSIS

#include <unistd.h>

int idle(void);  

DESCRIPTION

idle is an internal system call used during bootstrap. It marks the process's pages as swappable, lowers its priority, and enters the main scheduling loop. idle never returns.

Only process 0 may call idle. Any user process, even a process with super-user permission, will receive EPERM.  

RETURN VALUE

idle never returns for process 0, and always returns -1 for a user process.  

ERRORS

EPERM
Always, for a user process.
 

CONFORMING TO

This function is Linux-specific, and should not be used in programs intended to be portable.  

NOTES

Since 2.3.13 this system call does not exist anymore.

inb_p

NAME

outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O

 

DESCRIPTION

This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.

They are primarily designed for internal kernel use, but can be used from user space.

You must compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.

You must use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.

 

CONFORMING TO

outb and friends are hardware specific. The port and value arguments are in the opposite order from most DOS implementations.  

SEE ALSO

ioperm(2), iopl(2)

inl

NAME

outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O

 

DESCRIPTION

This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.

They are primarily designed for internal kernel use, but can be used from user space.

You must compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.

You must use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.

 

CONFORMING TO

outb and friends are hardware specific. The port and value arguments are in the opposite order from most DOS implementations.  

SEE ALSO

ioperm(2), iopl(2)

insb

NAME

outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O

 

DESCRIPTION

This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.

They are primarily designed for internal kernel use, but can be used from user space.

You must compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.

You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space