NAME
accept - accept a connection on a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int accept(int s, struct sockaddr *addr, socklen_t *addrlen);
DESCRIPTION
The accept function is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET and SOCK_RDM). It extracts the first connection request on the queue of pending connections, creates a new connected socket with mostly the same properties as s, and allocates a new file descriptor for the socket, which is returned. The newly created socket is no longer in the listening state. The original socket s is unaffected by this call. Note that any per file descriptor flags (everything that can be set with the F_SETFL fcntl, like non blocking or async state) are not inherited across an accept.
The argument s is a socket that has been created with socket(2), bound to a local address with bind(2), and is listening for connections after a listen(2).
The argument addr is a pointer to a sockaddr structure. This structure is filled in with the address of the connecting entity, as known to the communications layer. The exact format of the address passed in the addr parameter is determined by the socket's family (see socket(2) and the respective protocol man pages). The addrlen argument is a value-result parameter: it should initially contain the size of the structure pointed to by addr; on return it will contain the actual length (in bytes) of the address returned. When addr is NULL nothing is filled in.
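The value-result use of addrlen can be sketched as follows. The helper name accept_with_peer is hypothetical, and using struct sockaddr_storage as the buffer is an assumption made here because it is large enough for any address family; the text above only requires that addrlen start out as the size of the buffer.

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: accept one connection on the listening socket s
 * and record the peer address in *peer.
 */
static int accept_with_peer(int s, struct sockaddr_storage *peer,
                            socklen_t *peer_len)
{
    /* Value-result parameter: on entry, the size of the buffer. */
    *peer_len = sizeof(*peer);

    /* On success, *peer_len holds the length of the address stored. */
    return accept(s, (struct sockaddr *)peer, peer_len);
}
```

A caller would typically inspect peer->ss_family after a successful return to decide how to interpret the address.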
If no pending connections are present on the queue, and the socket is not marked as non-blocking, accept blocks the caller until a connection is present. If the socket is marked non-blocking and no pending connections are present on the queue, accept fails with the error EAGAIN.
In order to be notified of incoming connections on a socket, you can use select(2) or poll(2). A readable event will be delivered when a new connection is attempted and you may then call accept to get a socket for that connection. Alternatively, you can set the socket to deliver SIGIO when activity occurs on a socket; see socket(7) for details.
For certain protocols which require an explicit confirmation, such as DECNet, accept can be thought of as merely dequeuing the next connection request and not implying confirmation. Confirmation can be implied by a normal read or write on the new file descriptor, and rejection can be implied by closing the new socket. Currently only DECNet has these semantics on Linux.
NOTES
There may not always be a connection waiting after a SIGIO is delivered or select(2) or poll(2) return a readability event, because the connection might have been removed by an asynchronous network error or another thread before accept is called. If this happens then the call will block waiting for the next connection to arrive. To ensure that accept never blocks, the passed socket s needs to have the O_NONBLOCK flag set (see socket(7)).
RETURN VALUE
The call returns -1 on error. If it succeeds, it returns a non-negative integer that is a descriptor for the accepted socket.
ERROR HANDLING
Linux accept passes already-pending network errors on the new socket as an error code from accept. This behaviour differs from other BSD socket implementations. For reliable operation the application should detect the network errors defined for the protocol after accept and treat them like EAGAIN by retrying. In case of TCP/IP these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.
ERRORS
accept shall fail if:
- EAGAIN or EWOULDBLOCK
- The socket is marked non-blocking and no connections are present to be accepted.
- EBADF
- The descriptor is invalid.
- ENOTSOCK
- The descriptor references a file, not a socket.
- EOPNOTSUPP
- The referenced socket is not of type SOCK_STREAM.
- EINTR
- The system call was interrupted by a signal that was caught before a valid connection arrived.
- ECONNABORTED
- A connection has been aborted.
- EINVAL
- Socket is not listening for connections.
- EMFILE
- The per-process limit of open file descriptors has been reached.
- ENFILE
- The system maximum for file descriptors has been reached.
accept may fail if:
- EFAULT
- The addr parameter is not in a writable part of the user address space.
- ENOBUFS, ENOMEM
- Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory.
- EPROTO
- Protocol error.
Linux accept may fail if:
- EPERM
- Firewall rules forbid connection.
In addition, network errors for the new socket and as defined for the protocol may be returned. Various Linux kernels can return other errors such as ENOSR, ESOCKTNOSUPPORT, EPROTONOSUPPORT, ETIMEDOUT. The value ERESTARTSYS may be seen during a trace.
CONFORMING TO
SVr4, 4.4BSD (the accept function first appeared in BSD 4.2). The BSD man page documents five possible error returns (EBADF, ENOTSOCK, EOPNOTSUPP, EWOULDBLOCK, EFAULT). SUSv3 documents errors EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE, ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK. In addition, SUSv2 documents EFAULT and ENOSR.
Linux accept does _not_ inherit socket flags like O_NONBLOCK. This behaviour differs from other BSD socket implementations. Portable programs should not rely on this behaviour and always set all required flags on the socket returned from accept.
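A minimal sketch of setting the flag explicitly on the descriptor returned by accept, using fcntl(2). The helper name set_nonblocking is hypothetical.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: turn on O_NONBLOCK for fd while preserving
 * the other file status flags. Returns 0 on success, -1 on error.
 */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

Calling this on every accepted socket gives the same result whether or not the platform inherits the flag from the listening socket.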
NOTE
The third argument of accept was originally declared as an `int *` (and is that under libc4 and libc5 and on many other systems like BSD 4.*, SunOS 4, SGI); a POSIX 1003.1g draft standard wanted to change it into a `size_t *`, and that is what it is for SunOS 5. Later POSIX drafts have `socklen_t *`, and so do the Single Unix Specification and glibc2. Quoting Linus Torvalds:
_Any_ sane library _must_ have "socklen_t" be the same size as int. Anything else breaks any BSD socket layer stuff. POSIX initially _did_ make it a size_t, and I (and hopefully others, but obviously not too many) complained to them very loudly indeed. Making it a size_t is completely broken, exactly because size_t very seldom is the same size as "int" on 64-bit architectures, for example. And it _has_ to be the same size as "int" because that's what the BSD socket interface is. Anyway, the POSIX people eventually got a clue, and created "socklen_t". They shouldn't have touched it in the first place, but once they did they felt it had to have a named type for some unfathomable reason (probably somebody didn't like losing face over having done the original stupid thing, so they silently just renamed their blunder).
SEE ALSO
bind(2),
connect(2),
listen(2),
select(2),
socket(2)
NAME
acct - switch process accounting on or off
SYNOPSIS
#include <unistd.h>
int acct(const char *filename);
DESCRIPTION
When called with the name of an existing file as argument, accounting is turned on, and a record for each terminating process is appended to filename as it terminates. An argument of NULL causes accounting to be turned off.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EACCES
- Write permission is denied for the specified file.
- EACCES
- The argument filename is not a regular file.
- EFAULT
- filename points outside your accessible address space.
- EIO
- Error writing to the file filename.
- EISDIR
- filename is a directory.
- ELOOP
- Too many symbolic links were encountered in resolving filename.
- ENAMETOOLONG
- filename was too long.
- ENOENT
- The specified filename does not exist.
- ENOMEM
- Out of memory.
- ENOSYS
- BSD process accounting was not enabled when the operating system kernel was compiled. The kernel configuration parameter controlling this feature is CONFIG_BSD_PROCESS_ACCT.
- ENOTDIR
- A component used as a directory in filename is not in fact a directory.
- EPERM
- The calling process has no permission to enable process accounting.
- EROFS
- filename refers to a file on a read-only file system.
- EUSERS
- There are no more free file structures or we ran out of memory.
CONFORMING TO
SVr4 (but not POSIX). SVr4 documents an EBUSY error condition, but no EISDIR or ENOSYS. Also AIX and HPUX document EBUSY (attempt is made to enable accounting when it is already enabled), as does Solaris (attempt is made to enable accounting using the same file that is currently being used).
NOTES
No accounting is produced for programs running when a crash occurs. In particular, nonterminating processes are never accounted for.
NAME
afs_syscall, break, ftime, getpmsg, gtty, lock, mpx, prof, profil, putpmsg, security, stty, ulimit - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2) and {create,init,delete}_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
obsolete(2)
NAME
alloc_hugepages, free_hugepages - allocate or free huge pages
SYNOPSIS
void *alloc_hugepages(int key, void *addr, size_t len, int prot, int flag);
int free_hugepages(void *addr);
DESCRIPTION
The system calls alloc_hugepages and free_hugepages were introduced in Linux 2.5.36 and removed again in 2.5.54. They existed only on i386 and ia64 (when built with CONFIG_HUGETLB_PAGE). In Linux 2.4.20 the syscall numbers exist, but the calls return ENOSYS.
On i386 the memory management hardware knows about ordinary pages (4 KiB) and huge pages (2 or 4 MiB). Similarly ia64 knows about huge pages of several sizes. These system calls serve to map huge pages into the process's memory or to free them again. Huge pages are locked into memory, and are not swapped.
The key parameter is an identifier. When zero the pages are private, and not inherited by children. When positive the pages are shared with other applications using the same key, and inherited by child processes.
The addr parameter of free_hugepages() tells which page is being freed: it was the return value of a call to alloc_hugepages(). (The memory is actually freed only when all users have released it.) The addr parameter of alloc_hugepages() is a hint that the kernel may or may not follow. Addresses must be properly aligned.
The len parameter is the length of the required segment. It must be a multiple of the huge page size.
The prot parameter specifies the memory protection of the segment. It is one of PROT_READ, PROT_WRITE, PROT_EXEC.
The flag parameter is ignored, unless key is positive. In that case, if flag is IPC_CREAT, then a new huge page segment is created when none with the given key existed. If this flag is not set, then ENOENT is returned when no segment with the given key exists.
RETURN VALUE
On success, alloc_hugepages returns the allocated virtual address, and free_hugepages returns zero. On error, -1 is returned, and errno is set appropriately.
ERRORS
- ENOSYS
- The system call is not supported on this kernel.
CONFORMING TO
These calls existed only in Linux 2.5.36 - 2.5.54. These calls are specific to Linux on Intel processors, and should not be used in programs intended to be portable. Indeed, the system call numbers are marked for reuse, so programs using these may do something random on a future kernel.
FILES
/proc/sys/vm/nr_hugepages Number of configured hugetlb pages. This can be read and written.
/proc/meminfo Gives info on the number of configured hugetlb pages and on their size in the three variables HugePages_Total, HugePages_Free, Hugepagesize.
NOTES
The system calls are gone. Now the hugetlbfs filesystem can be used instead. Memory backed by huge pages (if the CPU supports them) is obtained by mmap'ing files in this virtual filesystem.
The maximal number of huge pages can be specified using the hugepages= boot parameter.
NAME
bdflush - start, flush, or tune buffer-dirty-flush daemon
SYNOPSIS
int bdflush(int func, long *address);
int bdflush(int func, long data);
DESCRIPTION
bdflush starts, flushes, or tunes the buffer-dirty-flush daemon. Only the super-user may call bdflush.
If func is negative or 0, and no daemon has been started, then bdflush enters the daemon code and never returns.
If func is 1, some dirty buffers are written to disk.
If func is 2 or more and is even (low bit is 0), then address is the address of a long word, and the tuning parameter numbered (func-2)/2 is returned to the caller in that address.
If func is 3 or more and is odd (low bit is 1), then data is a long word, and the kernel sets tuning parameter numbered (func-3)/2 to that value.
The set of parameters, their values, and their legal ranges are defined in the kernel source file fs/buffer.c.
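The way func selects an operation and a parameter number can be worked through in code. The sketch below only decodes func according to the rules above; it does not call bdflush, and the type and function names are hypothetical.

```c
/* Hypothetical names for the four operations described above. */
enum bdflush_action {
    BDFLUSH_START_DAEMON, /* func <= 0 */
    BDFLUSH_FLUSH,        /* func == 1 */
    BDFLUSH_GET_PARAM,    /* func even, >= 2 */
    BDFLUSH_SET_PARAM     /* func odd, >= 3 */
};

struct bdflush_request {
    enum bdflush_action action;
    int param; /* tuning parameter number, or -1 if not applicable */
};

static struct bdflush_request bdflush_decode(int func)
{
    struct bdflush_request r = { BDFLUSH_START_DAEMON, -1 };

    if (func <= 0) {
        r.action = BDFLUSH_START_DAEMON;
    } else if (func == 1) {
        r.action = BDFLUSH_FLUSH;
    } else if ((func & 1) == 0) {
        r.action = BDFLUSH_GET_PARAM;
        r.param = (func - 2) / 2;
    } else {
        r.action = BDFLUSH_SET_PARAM;
        r.param = (func - 3) / 2;
    }
    return r;
}
```

For example, func values 2 and 3 read and write parameter 0, while 6 and 7 read and write parameter 2.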
RETURN VALUE
If func is negative or 0 and the daemon successfully starts, bdflush never returns. Otherwise, the return value is 0 on success and -1 on failure, with errno set to indicate the error.
ERRORS
- EPERM
- Caller is not super-user.
- EFAULT
- address points outside your accessible address space.
- EBUSY
- An attempt was made to enter the daemon code after another process has already entered.
- EINVAL
- An attempt was made to read or write an invalid parameter number, or to write an invalid value to a parameter.
CONFORMING TO
bdflush is Linux specific and should not be used in programs intended to be portable.
SEE ALSO
fsync(2),
sync(2),
update(8),
sync(8)
NAME
cacheflush - flush contents of instruction and/or data cache
SYNOPSIS
#include <asm/cachectl.h>
int cacheflush(char *addr, int nbytes, int cache);
DESCRIPTION
cacheflush flushes contents of indicated cache(s) for user addresses in the range addr to (addr+nbytes-1). Cache may be one of:
- ICACHE
- Flush the instruction cache.
- DCACHE
- Write back to memory and invalidate the affected valid cache lines.
- BCACHE
- Same as (ICACHE|DCACHE).
RETURN VALUE
cacheflush returns 0 on success or -1 on error. If errors are detected, errno will indicate the error.
ERRORS
- EINVAL
- cache parameter is not one of ICACHE, DCACHE, or BCACHE.
- EFAULT
- Some or all of the address range addr to (addr+nbytes-1) is not accessible.
BUGS
The current implementation ignores the addr and nbytes parameters. Therefore the whole cache is always flushed.
NOTE
This system call is only available on MIPS based systems. It should not be used in programs intended to be portable.
NAME
chmod, fchmod - change permissions of a file
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);
DESCRIPTION
The mode of the file given by path or referenced by fildes is changed.
Modes are specified by or'ing the following:
- S_ISUID
- 04000 set user ID on execution
- S_ISGID
- 02000 set group ID on execution
- S_ISVTX
- 01000 sticky bit
- S_IRUSR (S_IREAD)
- 00400 read by owner
- S_IWUSR (S_IWRITE)
- 00200 write by owner
- S_IXUSR (S_IEXEC)
- 00100 execute/search by owner
- S_IRGRP
- 00040 read by group
- S_IWGRP
- 00020 write by group
- S_IXGRP
- 00010 execute/search by group
- S_IROTH
- 00004 read by others
- S_IWOTH
- 00002 write by others
- S_IXOTH
- 00001 execute/search by others
The effective UID of the process must be zero or must match the owner of the file.
If the effective UID of the process is not zero and the group of the file does not match the effective group ID of the process or one of its supplementary group IDs, the S_ISGID bit will be turned off, but this will not cause an error to be returned.
Depending on the file system, set user ID and set group ID execution bits may be turned off if a file is written. On some file systems, only the super-user can set the sticky bit, which may have a special meaning. For the sticky bit, and for set user ID and set group ID bits on directories, see stat(2).
On NFS file systems, restricting the permissions will immediately influence already open files, because the access control is done on the server, but open files are maintained by the client. Widening the permissions may be delayed for other clients if attribute caching is enabled on them.
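As an illustration of combining the mode constants listed above, the following sketch sets a file to owner read/write plus group read (octal 0640) and reads the bits back with stat(2). Both helper names are hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: set path to rw-r----- (octal 0640). */
static int make_owner_rw_group_r(const char *path)
{
    mode_t mode = S_IRUSR | S_IWUSR /* 00400 | 00200: owner read/write */
                | S_IRGRP;          /* 00040: group read */

    return chmod(path, mode);
}

/*
 * Hypothetical helper: return the permission bits of path, including
 * the set-user-ID, set-group-ID and sticky bits, or -1 on error.
 */
static int permission_bits(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return (int)(st.st_mode & 07777);
}
```

Unlike the mode argument of open(2), the mode passed to chmod is not modified by the process umask.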
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for chmod are listed below:
- EPERM
- The effective UID does not match the owner of the file, and is not zero.
- EROFS
- The named file resides on a read-only file system.
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of the path prefix is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix.
- ELOOP
- Too many symbolic links were encountered in resolving path.
- EIO
- An I/O error occurred.
The general errors for fchmod are listed below:
- EBADF
- The file descriptor fildes is not valid.
- EROFS
- See above.
- EPERM
- See above.
- EIO
- See above.
CONFORMING TO
The chmod call conforms to SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document EFAULT, ENOMEM, ELOOP or EIO error conditions, or the macros S_IREAD, S_IWRITE and S_IEXEC.
The fchmod call conforms to 4.4BSD and SVr4. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX requires the fchmod function if at least one of _POSIX_MAPPED_FILES and _POSIX_SHARED_MEMORY_OBJECTS is defined, and documents additional ENOSYS and EINVAL error conditions, but does not document EIO.
POSIX and X/OPEN do not document the sticky bit.
SEE ALSO
open(2),
chown(2),
execve(2),
stat(2)
NAME
chroot - change root directory
SYNOPSIS
#include <unistd.h>
int chroot(const char *path);
DESCRIPTION
chroot changes the root directory to that specified in path. This directory will be used for path names beginning with /. The root directory is inherited by all children of the current process.
Only the super-user may change the root directory.
Note that this call does not change the current working directory, so that '.' can be outside the tree rooted at '/'. In particular, the super-user can escape from a 'chroot jail' by doing 'mkdir foo; chroot foo; cd ..'.
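Because chroot leaves the current working directory alone, callers commonly follow it with chdir(2). A minimal sketch, with the hypothetical helper name enter_chroot; the feature-test macro is an assumption about glibc, where it exposes the chroot declaration when a strict standard mode is selected.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* assumption: glibc needs this under strict -std modes */
#endif
#include <unistd.h>

/*
 * Hypothetical helper: change the root directory to new_root and move
 * the working directory inside it. Returns 0 on success, -1 on error
 * with errno set by the failing call.
 */
static int enter_chroot(const char *new_root)
{
    if (chroot(new_root) == -1)
        return -1;

    /* Without this, "." could still name a directory outside the new root. */
    return chdir("/");
}
```

This does not by itself confine a process that keeps super-user privileges, as the escape described above shows.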
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors are listed below:
- EPERM
- The effective UID is not zero.
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of path is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix.
- ELOOP
- Too many symbolic links were encountered in resolving path.
- EIO
- An I/O error occurred.
CONFORMING TO
SVr4, SVID, 4.4BSD, X/OPEN. This function is not part of POSIX.1. SVr4 documents additional EINTR, ENOLINK and EMULTIHOP error conditions. X/OPEN does not document EIO, ENOMEM or EFAULT error conditions. This interface is marked as legacy by X/OPEN.
NOTES
FreeBSD has a stronger jail() system call.
SEE ALSO
chdir(2)
NAME
close - close a file descriptor
SYNOPSIS
#include <unistd.h>
int close(int fd);
DESCRIPTION
close closes a file descriptor, so that it no longer refers to any file and may be reused. Any locks held on the file it was associated with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock).
If fd is the last copy of a particular file descriptor the resources associated with it are freed; if the descriptor was the last reference to a file which has been removed using unlink(2) the file is deleted.
RETURN VALUE
close returns zero on success, or -1 if an error occurred.
ERRORS
- EBADF
- fd isn't a valid open file descriptor.
- EINTR
- The close() call was interrupted by a signal.
- EIO
- An I/O error occurred.
CONFORMING TO
SVr4, SVID, POSIX, X/OPEN, BSD 4.3. SVr4 documents an additional ENOLINK error condition.
NOTES
Not checking the return value of close is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close. Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and disk quotas.
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
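The advice in these notes can be sketched as a helper that writes a buffer, calls fsync(2), and reports a failure from any step, including close. The name write_sync_close is hypothetical, and retrying write on EINTR is a choice made for this example rather than something the notes prescribe.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to fd, flush them with
 * fsync, then close fd. Returns 0 only if every step succeeded; on
 * failure returns -1 with errno from the first failing call. fd is
 * closed in either case.
 */
static int write_sync_close(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int saved;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n == -1) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) == -1)
        goto fail;

    /* Errors from earlier writes may first be reported here. */
    return close(fd);

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```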
SEE ALSO
open(2),
fcntl(2),
shutdown(2),
unlink(2),
fclose(3),
fsync(2)
NAME
open, creat - open and possibly create a file or device
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);
DESCRIPTION
The open() system call is used to convert a pathname into a file descriptor (a small, non-negative integer for use in subsequent I/O as with read, write, etc.). When the call is successful, the file descriptor returned will be the lowest file descriptor not currently open for the process. This call creates a new open file, not shared with any other process. (But shared open files may arise via the fork(2) system call.) The new file descriptor is set to remain open across exec functions (see fcntl(2)). The file offset is set to the beginning of the file.
The parameter flags is one of O_RDONLY, O_WRONLY or O_RDWR which request opening the file read-only, write-only or read/write, respectively, bitwise-or'd with zero or more of the following:
- O_CREAT
- If the file does not exist it will be created. The owner (user ID) of the file is set to the effective user ID of the process. The group ownership (group ID) is set either to the effective group ID of the process or to the group ID of the parent directory (depending on filesystem type and mount options, and the mode of the parent directory, see, e.g., the mount options bsdgroups and sysvgroups of the ext2 filesystem, as described in mount(8)).
- O_EXCL
- When used with O_CREAT, if the file already exists it is an error and the open will fail. In this context, a symbolic link exists, regardless of where it points to. O_EXCL is broken on NFS file systems; programs which rely on it for performing locking tasks will contain a race condition. The solution for performing atomic file locking using a lockfile is to create a unique file on the same fs (e.g., incorporating hostname and pid), then use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
- O_NOCTTY
- If pathname refers to a terminal device (see tty(4)) it will not become the process's controlling terminal even if the process does not have one.
- O_TRUNC
- If the file already exists and is a regular file and the open mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0. If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is unspecified.
- O_APPEND
- The file is opened in append mode. Before each write, the file pointer is positioned at the end of the file, as if with lseek. O_APPEND may lead to corrupted files on NFS file systems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition.
- O_NONBLOCK or O_NDELAY
- When possible, the file is opened in non-blocking mode. Neither the open nor any subsequent operations on the file descriptor which is returned will cause the calling process to wait. For the handling of FIFOs (named pipes), see also fifo(4). This mode need not have any effect on files other than FIFOs.
- O_SYNC
- The file is opened for synchronous I/O. Any writes on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware. See RESTRICTIONS below, though.
- O_NOFOLLOW
- If pathname is a symbolic link, then the open fails. This is a FreeBSD extension, which was added to Linux in version 2.1.126. Symbolic links in earlier components of the pathname will still be followed. The headers from glibc 2.0.100 and later include a definition of this flag; kernels before 2.1.126 will ignore it if used.
- O_DIRECTORY
- If pathname is not a directory, cause the open to fail. This flag is Linux-specific, and was added in kernel version 2.1.126, to avoid denial-of-service problems if opendir(3) is called on a FIFO or tape device, but should not be used outside of the implementation of opendir.
- O_DIRECT
- Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of the read(2) or write(2) system call, data is guaranteed to have been transferred. Under Linux 2.4 transfer sizes, and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. Under Linux 2.6 alignment to 512-byte boundaries suffices.
A semantically similar interface for block devices is described in raw(8).
- O_ASYNC
- Generate a signal (SIGIO by default, but this can be changed via fcntl(2)) when input or output becomes possible on this file descriptor. This feature is only available for terminals, pseudo-terminals, and sockets. See fcntl(2) for further details.
- O_LARGEFILE
- On 32-bit systems that support the Large Files System, allow files whose sizes cannot be represented in 31 bits to be opened.
Some of these optional flags can be altered using fcntl after the file has been opened.
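The lockfile procedure described under O_EXCL (a unique file, then link(2), then a link-count check) can be sketched as follows. The function name acquire_lock, the buffer sizes, and the unique-name format lockfile.hostname.pid are assumptions made for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take the lock named by lockfile.
 * Returns 0 if the lock was obtained, -1 otherwise. The lock is
 * released by unlinking lockfile.
 */
static int acquire_lock(const char *lockfile)
{
    char host[256];
    char unique[1024];
    struct stat st;
    int fd, n, locked = 0;

    if (gethostname(host, sizeof(host)) == -1)
        return -1;
    host[sizeof(host) - 1] = '\0';

    /* Unique name on the same file system as the lockfile. */
    n = snprintf(unique, sizeof(unique), "%s.%s.%ld",
                 lockfile, host, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(unique))
        return -1;

    fd = open(unique, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    if (link(unique, lockfile) == 0)
        locked = 1; /* link() succeeded: we hold the lock */
    else if (stat(unique, &st) == 0 && st.st_nlink == 2)
        locked = 1; /* link count rose to 2: the link was made after all */

    unlink(unique); /* the lockfile itself stays in place */
    return locked ? 0 : -1;
}
```

A second caller finds that link() fails and the link count of its own unique file is still 1, so it does not get the lock.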
The argument mode specifies the permissions to use in case a new file is created. It is modified by the process's umask in the usual way: the permissions of the created file are (mode & ~umask). Note that this mode only applies to future accesses of the newly created file; the open call that creates a read-only file may well return a read/write file descriptor.
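The (mode & ~umask) rule can be checked with a short experiment. The helper created_mode is hypothetical; it temporarily sets the umask, creates the file, and returns the permission bits the file actually received. It assumes a file system without default ACLs, which can change how the umask is applied.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create path with the requested mode while the
 * process umask is set to mask, and return the resulting permission
 * bits, or -1 on error. The previous umask is restored.
 */
static int created_mode(const char *path, mode_t mode, mode_t mask)
{
    struct stat st;
    mode_t old = umask(mask);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);

    umask(old);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    return (int)(st.st_mode & 0777);
}
```

With mode 0666 and a umask of 022, the expected result is 0666 & ~022 = 0644.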
The following symbolic constants are provided for mode:
- S_IRWXU
- 00700 user (file owner) has read, write and execute permission
- S_IRUSR (S_IREAD)
- 00400 user has read permission
- S_IWUSR (S_IWRITE)
- 00200 user has write permission
- S_IXUSR (S_IEXEC)
- 00100 user has execute permission
- S_IRWXG
- 00070 group has read, write and execute permission
- S_IRGRP
- 00040 group has read permission
- S_IWGRP
- 00020 group has write permission
- S_IXGRP
- 00010 group has execute permission
- S_IRWXO
- 00007 others have read, write and execute permission
- S_IROTH
- 00004 others have read permission
- S_IWOTH
- 00002 others have write permission
- S_IXOTH
- 00001 others have execute permission
mode must be specified when O_CREAT is in the flags, and is ignored otherwise.
creat is equivalent to open with flags equal to O_CREAT|O_WRONLY|O_TRUNC.
RETURN VALUE
open and creat return the new file descriptor, or -1 if an error occurred (in which case, errno is set appropriately). Note that open can open device special files, but creat cannot create them - use mknod(2) instead.
On NFS file systems with UID mapping enabled, open may return a file descriptor but e.g. read(2) requests are denied with EACCES. This is because the client performs open by checking the permissions, but UID mapping is performed by the server upon read and write requests.
If the file is newly created, its atime, ctime, mtime fields are set to the current time, and so are the ctime and mtime fields of the parent directory. Otherwise, if the file is modified because of the O_TRUNC flag, its ctime and mtime fields are set to the current time.
ERRORS
- EEXIST
- pathname already exists and O_CREAT and O_EXCL were used.
- EISDIR
- pathname refers to a directory and the access requested involved writing (that is, O_WRONLY or O_RDWR is set).
- EACCES
- The requested access to the file is not allowed, or one of the directories in pathname did not allow search (execute) permission, or the file did not exist yet and write access to the parent directory is not allowed.
- ENAMETOOLONG
- pathname was too long.
- ENOENT
- O_CREAT is not set and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link.
- ENOTDIR
- A component used as a directory in pathname is not, in fact, a directory, or O_DIRECTORY was specified and pathname was not a directory.
- ENXIO
- O_NONBLOCK | O_WRONLY is set, the named file is a FIFO and no process has the file open for reading. Or, the file is a device special file and no corresponding device exists.
- ENODEV
- pathname refers to a device special file and no corresponding device exists. (This is a Linux kernel bug - in this situation ENXIO must be returned.)
- EROFS
- pathname refers to a file on a read-only filesystem and write access was requested.
- ETXTBSY
- pathname refers to an executable image which is currently being executed and write access was requested.
- EFAULT
- pathname points outside your accessible address space.
- ELOOP
- Too many symbolic links were encountered in resolving pathname, or O_NOFOLLOW was specified but pathname was a symbolic link.
- ENOSPC
- pathname was to be created but the device containing pathname has no room for the new file.
- ENOMEM
- Insufficient kernel memory was available.
- EMFILE
- The process already has the maximum number of files open.
- ENFILE
- The limit on the total number of files open on the system has been reached.
NOTE
Under Linux, the O_NONBLOCK flag indicates that one wants to open but does not necessarily have the intention to read or write. This is typically used to open devices in order to get a file descriptor for use with ioctl(2).
CONFORMING TO
SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The O_NOFOLLOW and O_DIRECTORY flags are Linux-specific. One may have to define the _GNU_SOURCE macro to get their definitions.
The (undefined) effect of O_RDONLY | O_TRUNC varies among implementations. On many systems the file is actually truncated.
The O_DIRECT flag was introduced in SGI IRIX, where it has alignment restrictions similar to those of Linux 2.4. IRIX also has an fcntl(2) call to query appropriate alignments and sizes. FreeBSD 4.x introduced a flag of the same name, but without alignment restrictions. Support was added under Linux in kernel version 2.4.10. Older Linux kernels simply ignore this flag.
BUGS
"The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances." -- Linus
RESTRICTIONS
There are many infelicities in the protocol underlying NFS, affecting amongst others
O_SYNC and
O_NDELAY.
POSIX provides for three different variants of synchronised I/O, corresponding to the flags O_SYNC, O_DSYNC and O_RSYNC. Currently (2.1.130) these are all synonymous under Linux.
SEE ALSO
read(2),
write(2),
fcntl(2),
close(2),
link(2),
mknod(2),
mount(2),
stat(2),
umask(2),
unlink(2),
socket(2),
fopen(3),
fifo(4)
NAME
DC_CTX_new, DC_CTX_free, DC_CTX_add_session, DC_CTX_remove_session, DC_CTX_get_session, DC_CTX_reget_session, DC_CTX_has_session - distcache blocking client API
SYNOPSIS
#include <distcache/dc_client.h>
DC_CTX *DC_CTX_new(const char *target, unsigned int flags); void DC_CTX_free(DC_CTX *ctx); int DC_CTX_add_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len, const unsigned char *sess_data, unsigned int sess_len, unsigned long timeout_msecs); int DC_CTX_remove_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len); int DC_CTX_get_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len, unsigned char *result_storage, unsigned int result_size, unsigned int *result_used); int DC_CTX_reget_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len, unsigned char *result_storage, unsigned int result_size, unsigned int *result_used); int DC_CTX_has_session(DC_CTX *ctx, const unsigned char *id_data, unsigned int id_len);
DESCRIPTION
DC_CTX_new() allocates and initialises a DC_CTX structure with an address for sending session caching operation requests to, and flags controlling the behaviour of the DC_CTX object. The address specified by target should be compatible with the syntax defined by the libnal API, see the ``NOTES`` section below. The flags parameter can be zero to indicate that each cache operation should create and destroy a temporary connection, otherwise a bitmask combining one or more of the following flags;
#define DC_CTX_FLAG_PERSISTENT (unsigned int)0x0001 #define DC_CTX_FLAG_PERSISTENT_PIDCHECK (unsigned int)0x0002 #define DC_CTX_FLAG_PERSISTENT_RETRY (unsigned int)0x0004 #define DC_CTX_FLAG_PERSISTENT_LATE (unsigned int)0x0008
DC_CTX_free() frees the ctx object.
DC_CTX_add_session() attempts to add session data to the cache. id_data and id_len define the unique session ID corresponding to the session data - this is the ID used in DC_CTX_get_session() or DC_CTX_remove_session() to refer to the session being added, and the ``add`` operation will fail if there is already a session with a matching ID in the cache. sess_data and sess_len define the session data itself to be stored in the cache. timeout_msecs specifies the expiry period for the session - if this period of time passes without the corresponding session being explicitly removed or scrolled out of the cache because of over-filling, then the cache server will remove the session from the cache anyway.
DC_CTX_remove_session() provides a session ID with id_data and id_len and requests that the corresponding session be removed from the cache.
DC_CTX_get_session() provides a session ID with id_data and id_len and requests that the corresponding session data be retrieved from the cache. result_storage and result_size specify a storage area for the retrieved session data, and result_used points to a variable that will be set to the length of the retrieved session data. Even if DC_CTX_get_session() returns successfully, the caller should check the value of result_used - if it is larger than result_size then the requested session data was too big for the provided storage area and only partial data will have been returned. In this case, the caller should immediately call DC_CTX_reget_session().
DC_CTX_reget_session() is similar to DC_CTX_get_session() except that it does not perform any network operations at all. It is designed to return session data that had previously been retrieved by DC_CTX_get_session(), so that a larger storage area can be provided if the one first provided to DC_CTX_get_session() was too small. This function will fail if the last operation on ctx was not DC_CTX_get_session() with an exact match for id_data and id_len.
DC_CTX_has_session() is similar to DC_CTX_get_session() except that it does not ask for session data to be returned, merely to know whether the session is in the cache or not. This should be used by any application that already has a copy of the required session but merely wishes to verify that it hasn't already been explicitly invalidated. As distcache allows parallel use of a single cache from multiple clients across potentially multiple machines, it is a security flaw for any client (thread, process, or machine) to implement local session caching and use its own sessions whenever there is a cache-hit. If the session was used and for any reason required invalidation (e.g. renegotiation, data corruption detected, etc.) then another client should not use a locally cached copy of the session without first verifying with the shared cache that the session is still OK. This function should be used in such cases as it provides the same check as DC_CTX_get_session() but with less network overhead.
RETURN VALUES
DC_CTX_new() returns a valid DC_CTX object on success, otherwise NULL for failure.
DC_CTX_free() has no return value.
All other DC_CTX functions return zero on failure, otherwise non-zero.
NOTES
The following code snippet attempts to create a session cache context that uses a temporary connection for each operation to a local
dc_client agent running on a unix domain socket at /tmp/dc_client;
DC_CTX *ctx = DC_CTX_new("UNIX:/tmp/dc_client", 0);
The following code snippet attempts to create a session cache context to communicate with a remote server listening on TCP/IPv4 port 9001. It will attempt to use a persistent connection for all cache operations (DC_CTX_FLAG_PERSISTENT), retry once for any cache operation that suffers a network I/O error (DC_CTX_FLAG_PERSISTENT_RETRY), will wait until the first cache operation before trying to connect (DC_CTX_FLAG_PERSISTENT_LATE), and will verify before any cache operation whether it is running in a different process than it used to be and if so will close then re-open a new connection (DC_CTX_FLAG_PERSISTENT_PIDCHECK).
DC_CTX *ctx = DC_CTX_new("IP:cacheserver.localnet", DC_CTX_FLAG_PERSISTENT | DC_CTX_FLAG_PERSISTENT_PIDCHECK | DC_CTX_FLAG_PERSISTENT_RETRY | DC_CTX_FLAG_PERSISTENT_LATE);
The DC_CTX_FLAG_PERSISTENT_RETRY flag exists because of the -idle command-line switch in the dc_client(1) tool. This switch allows dc_client to automatically close client connections that have been idle for some configurable length of time. However, this creates the possibility of race conditions if a persistent DC_CTX is used by an application to request a cache operation at the same time as, or following, a decision by dc_client to close the connection. The most robust way to address this is to have DC_CTX regard any first network error during the operation as an idle-timeout from the peer and to immediately re-connect and retry the operation. Any subsequent error (or initial error that cannot be timeout-related, such as connection failure) is considered a failure and will not result in any retry.
The DC_CTX_FLAG_PERSISTENT_PIDCHECK flag exists for software like Apache or Stunnel that use fork(2) or clone(2) to create child processes that inherit file-descriptors from the parent process. In such circumstances, attempts by the parent and child processes to communicate over the same file-descriptor can have unpredictable results and are, generally speaking, never useful. This flag will force a check before each operation that the process ID is ``what it used to be`` and if not, will close any persistent connection, reconnect with a new file-descriptor, and reset the process ID in the DC_CTX. If a parent process has a DC_CTX that has a connection open, this flag will ensure that any subsequent child processes that attempt to perform cache operations will transparently reconnect with their own connections.
SEE ALSO
DC_PLUG_new(2),
DC_PLUG_read(2) - Lower-level asynchronous implementation of the distcache protocol, useful for client and server operation. This DC_CTX implementation is built on top of the DC_PLUG functionality.
distcache(8) - Overview of the distcache architecture.
http://www.distcache.org/ - Distcache home page.
AUTHOR
This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at
geoff@geoffthorpe.net.
Home Page: http://www.distcache.org
NAME
DC_PLUG_read, DC_PLUG_consume, DC_PLUG_write, DC_PLUG_write_more, DC_PLUG_commit, DC_PLUG_rollback - DC_PLUG read/write functions
SYNOPSIS
#include <distcache/dc_plug.h>
int DC_PLUG_read(DC_PLUG *plug, int resume, unsigned long *request_uid, DC_CMD *cmd, const unsigned char **payload_data, unsigned int *payload_len); int DC_PLUG_consume(DC_PLUG *plug); int DC_PLUG_write(DC_PLUG *plug, int resume, unsigned long request_uid, DC_CMD cmd, const unsigned char *payload_data, unsigned int payload_len); int DC_PLUG_write_more(DC_PLUG *plug, const unsigned char *data, unsigned int data_len); int DC_PLUG_commit(DC_PLUG *plug); int DC_PLUG_rollback(DC_PLUG *plug);
typedef enum { DC_CMD_ERROR, DC_CMD_ADD, DC_CMD_GET, DC_CMD_REMOVE, DC_CMD_HAVE } DC_CMD;
DESCRIPTION
DC_PLUG_read() will attempt to open the next distcache message received by
plug for reading. This message will block the reading of any other received messages until
DC_PLUG_consume() is called. If a message has already been opened for reading inside
plug, then
DC_PLUG_read() will fail unless
resume is set to non-zero in which case it will simply re-open the message that was already being read. If
DC_PLUG_read() succeeds,
request_uid,
cmd,
payload_data and
payload_len are populated with the message's data. Note that
payload_data points to the original data stored inside
plug and this pointer is only valid until the next call to
DC_PLUG_consume().
DC_PLUG_consume() will close the message currently opened for reading in plug, and will allow a future call to DC_PLUG_read() to succeed if there are any subsequent (complete) messages received from the plug object's connection.
DC_PLUG_write() will attempt to open a distcache message for writing in plug. If successful, this message will block the writing of any other messages until the message is committed with DC_PLUG_commit() or discarded with DC_PLUG_rollback(). If a message has already been opened for writing, DC_PLUG_write() will fail unless resume is non-zero in which case the message will be re-opened and will overwrite the settings from the previous DC_PLUG_write() call. This is equivalent to DC_PLUG_rollback() followed immediately by DC_PLUG_write() with a zero resume value. Note that payload_len can be zero (and thus payload_data can be NULL) even if the message will eventually have payload data - this can be supplemented afterwards using the DC_PLUG_write_more() function. request_uid and cmd, on the other hand, must be specified at once in DC_PLUG_write().
DC_PLUG_write_more() will attempt to add more payload data to the message currently opened for writing in plug. This data will be concatenated to the end of any payload data already provided in prior calls to DC_PLUG_write() or DC_PLUG_write_more().
DC_PLUG_commit() will close the message currently opened for writing, and queue it for serialisation out on the plug object's connection.
DC_PLUG_rollback() will discard the message currently opened for writing.
RETURN VALUES
All these DC_PLUG functions return zero on failure, otherwise non-zero.
SEE ALSO
DC_PLUG_new(2) - Basic DC_PLUG functions.
distcache(8) - Overview of the distcache architecture.
http://www.distcache.org/ - Distcache home page.
AUTHOR
This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at
geoff@geoffthorpe.net.
Home Page: http://www.distcache.org
NAME
delete_module - delete a loadable module entry
SYNOPSIS
#include <linux/module.h> int delete_module(const char *name);
DESCRIPTION
delete_module attempts to remove an unused loadable module entry. If
name is
NULL, all unused modules marked auto-clean will be removed. This system call is only open to the superuser.
RETURN VALUE
On success, zero is returned. On error, -1 is returned and
errno is set appropriately.
ERRORS
- EPERM
- The user is not the superuser.
- ENOENT
- No module by that name exists.
- EINVAL
- name was the empty string.
- EBUSY
- The module is in use.
- EFAULT
- name is outside the program's accessible address space.
SEE ALSO
create_module(2),
init_module(2),
query_module(2).
NAME
dup, dup2 - duplicate a file descriptor
SYNOPSIS
#include <unistd.h> int dup(int oldfd); int dup2(int oldfd, int newfd);
DESCRIPTION
dup and
dup2 create a copy of the file descriptor
oldfd.
After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.
The two descriptors do not share the close-on-exec flag, however.
dup uses the lowest-numbered unused descriptor for the new descriptor.
dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.
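The sharing rules above can be checked with a small sketch: the file position follows both descriptors, while the close-on-exec flag does not. The helper names and the scratch-file path are hypothetical, used only for this illustration.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Open an anonymous scratch file holding ten bytes; -1 on failure. */
static int open_scratch(void)
{
    char path[] = "/tmp/dup-demo-XXXXXX";   /* hypothetical location */
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);                    /* file lives on until fd is closed */
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if a seek through the original descriptor is visible through
 * the duplicate, 0 if it is not, -1 on setup failure. */
int dup_shares_offset(void)
{
    int fd = open_scratch(), copy, shared = -1;

    if (fd == -1)
        return -1;
    copy = dup(fd);
    if (copy != -1) {
        if (lseek(fd, 4, SEEK_SET) != (off_t)-1)
            shared = (lseek(copy, 0, SEEK_CUR) == 4);
        close(copy);
    }
    close(fd);
    return shared;
}

/* Returns 1 if setting close-on-exec on the original also sets it on the
 * duplicate, 0 if it does not, -1 on setup failure. */
int dup_shares_cloexec(void)
{
    int fd = open_scratch(), copy, flags, shared = -1;

    if (fd == -1)
        return -1;
    copy = dup(fd);
    if (copy != -1) {
        if (fcntl(fd, F_SETFD, FD_CLOEXEC) != -1 &&
            (flags = fcntl(copy, F_GETFD)) != -1)
            shared = (flags & FD_CLOEXEC) != 0;
        close(copy);
    }
    close(fd);
    return shared;
}
```

Under the behaviour described here, dup_shares_offset() is expected to report 1 and dup_shares_cloexec() to report 0.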
RETURN VALUE
dup and
dup2 return the new descriptor, or -1 if an error occurred (in which case,
errno is set appropriately).
ERRORS
- EBADF
- oldfd isn't an open file descriptor, or newfd is out of the allowed range for file descriptors.
- EMFILE
- The process already has the maximum number of file descriptors open and tried to open a new one.
- EINTR
- The dup2 call was interrupted by a signal.
- EBUSY
- (Linux only) This may be returned by dup2 during a race condition with open() and dup().
WARNING
The error returned by
dup2 is different from that returned by
fcntl(...,
F_DUPFD, ...
) when
newfd is out of range. On some systems
dup2 also sometimes returns
EINVAL like
F_DUPFD.
BUGS
If
newfd was open, any errors that would have been reported at
close() time, are lost. A careful programmer will not use
dup2 without closing
newfd first.
CONFORMING TO
SVr4, SVID, POSIX, X/OPEN, BSD 4.3. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX.1 adds EINTR. The EBUSY return is Linux-specific.
SEE ALSO
fcntl(2),
open(2),
close(2)
NAME
epoll_ctl - control interface for an epoll descriptor
SYNOPSIS
#include <sys/epoll.h> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
DESCRIPTION
Control an
epoll descriptor,
epfd, by requesting the operation
op be performed on the target file descriptor,
fd. The
event describes the object linked to the file descriptor
fd. The
struct epoll_event is defined as :
typedef union epoll_data { void *ptr; int fd; __uint32_t u32; __uint64_t u64; } epoll_data_t; struct epoll_event { __uint32_t events; /* Epoll events */ epoll_data_t data; /* User data variable */ };
The events member is a bit set composed using the following available event types :
- EPOLLIN
- The associated file is available for read(2) operations.
- EPOLLOUT
- The associated file is available for write(2) operations.
- EPOLLPRI
- There is urgent data available for read(2) operations.
- EPOLLERR
- Error condition happened on the associated file descriptor.
- EPOLLHUP
- Hang up happened on the associated file descriptor.
- EPOLLET
- Sets the Edge Triggered behaviour for the associated file descriptor. The default behaviour for epoll is Level Triggered. See epoll(4) for more detailed information about Edge and Level Triggered event distribution architectures.
- EPOLLONESHOT
- Sets the One-Shot behaviour for the associated file descriptor. It means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl(2) with EPOLL_CTL_MOD to re-enable the file descriptor with a new event mask.
The epoll interface supports all file descriptors that support poll(2). Valid values for the op parameter are :
- EPOLL_CTL_ADD
- Add the target file descriptor fd to the epoll descriptor epfd and associate the event event with the internal file linked to fd.
- EPOLL_CTL_MOD
- Change the event event associated with the target file descriptor fd.
- EPOLL_CTL_DEL
- Remove the target file descriptor fd from the epoll file descriptor, epfd.
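As a minimal sketch of the operations above, the following registers the read end of a pipe with EPOLL_CTL_ADD, makes it readable, and collects the event. It relies on epoll_create(2) and epoll_wait(2) from the same interface; the function name epoll_pipe_demo() and the one-second timeout are choices made for this illustration only.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register a pipe's read end for EPOLLIN, write one byte, and wait.
 * Returns the number of events reported for that descriptor
 * (1 is expected), or -1 on any error. */
int epoll_pipe_demo(void)
{
    int pfd[2], epfd, n = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create(1);          /* size hint; must be positive */
    if (epfd == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;             /* level-triggered readability */
    ev.data.fd = pfd[0];             /* user data handed back by epoll_wait */

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        write(pfd[1], "x", 1) == 1) {
        n = epoll_wait(epfd, &out, 1, 1000);
        if (n == 1 && out.data.fd != pfd[0])
            n = -1;                  /* event for an unexpected descriptor */
        epoll_ctl(epfd, EPOLL_CTL_DEL, pfd[0], &ev);
    }

    close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

Storing the descriptor in data.fd is only one option; data.ptr can carry a pointer to per-connection state instead.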
RETURN VALUE
When successful,
epoll_ctl(2) returns zero. When an error occurs,
epoll_ctl(2) returns -1 and
errno is set appropriately.
ERRORS
- EBADF
- The epfd file descriptor is not a valid file descriptor.
- EPERM
- The target file fd is not supported by epoll.
- EINVAL
- The supplied file descriptor, epfd, is not an epoll file descriptor, or the requested operation op is not supported by this interface.
- ENOMEM
- There was insufficient memory to handle the requested op control operation.
CONFORMING TO
epoll_ctl(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
SEE ALSO
epoll_create(2),
epoll_wait(2),
epoll(4)
NAME
execve - execute program
SYNOPSIS
#include <unistd.h> int execve(const char *filename, char *const argv [], char *const envp[]);
DESCRIPTION
execve() executes the program pointed to by
filename.
filename must be either a binary executable, or a script starting with a line of the form "
#! interpreter [arg]". In the latter case, the interpreter must be a valid pathname for an executable which is not itself a script, which will be invoked as
interpreter [arg]
filename.
argv is an array of argument strings passed to the new program. envp is an array of strings, conventionally of the form key=value, which are passed as environment to the new program. Both argv and envp must be terminated by a null pointer. The argument vector and environment can be accessed by the called program's main function, when it is defined as int main(int argc, char *argv[], char *envp[]).
execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of the program loaded. The program invoked inherits the calling process's PID, and any open file descriptors that are not set to close on exec. Signals pending on the calling process are cleared. Any signals set to be caught by the calling process are reset to their default behaviour. The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL.
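A common way to use execve() without losing the calling program is to call it in a child created by fork(2). The sketch below builds null-terminated argv and envp arrays as described above; it assumes a shell is installed at /bin/sh, and the helper name run_shell() and the PATH value are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c <command>" in a child process and return its exit
 * status, or -1 if the child could not be created or did not exit
 * normally. */
int run_shell(const char *command)
{
    /* Both vectors end with a null pointer, as execve() requires. */
    char *const argv[] = { "sh", "-c", (char *)command, NULL };
    char *const envp[] = { "PATH=/bin:/usr/bin", NULL };
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);
        _exit(127);                  /* reached only if execve() failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, run_shell("exit 7") would be expected to return 7, since the shell replaces the child and its exit status is what the parent collects.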
If the current program is being ptraced, a SIGTRAP is sent to it after a successful execve().
If the set-uid bit is set on the program file pointed to by filename the effective user ID of the calling process is changed to that of the owner of the program file. Similarly, when the set-gid bit of the program file is set the effective group ID of the calling process is set to the group of the program file.
If the executable is an a.out dynamically-linked binary executable containing shared-library stubs, the Linux dynamic linker ld.so(8) is called at the start of execution to bring needed shared libraries into core and link the executable with them.
If the executable is a dynamically-linked ELF executable, the interpreter named in the PT_INTERP segment is used to load the needed shared libraries. This interpreter is typically /lib/ld-linux.so.1 for binaries linked with the Linux libc version 5, or /lib/ld-linux.so.2 for binaries linked with the GNU libc version 2.
RETURN VALUE
On success,
execve() does not return; on error, -1 is returned and
errno is set appropriately.
ERRORS
- EACCES
- The file or a script interpreter is not a regular file.
- EACCES
- Execute permission is denied for the file or a script or ELF interpreter.
- EACCES
- The file system is mounted noexec.
- EPERM
- The file system is mounted nosuid, the user is not the superuser, and the file has an SUID or SGID bit set.
- EPERM
- The process is being traced, the user is not the superuser and the file has an SUID or SGID bit set.
- E2BIG
- The argument list is too big.
- ENOEXEC
- An executable is not in a recognised format, is for the wrong architecture, or has some other format error that means it cannot be executed.
- EFAULT
- filename points outside your accessible address space.
- ENAMETOOLONG
- filename is too long.
- ENOENT
- The file filename or a script or ELF interpreter does not exist, or a shared library needed for file or interpreter cannot be found.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of the path prefix of filename or a script or ELF interpreter is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix of filename or the name of a script interpreter.
- ELOOP
- Too many symbolic links were encountered in resolving filename or the name of a script or ELF interpreter.
- ETXTBSY
- Executable was open for writing by one or more processes.
- EIO
- An I/O error occurred.
- ENFILE
- The limit on the total number of files open on the system has been reached.
- EMFILE
- The process has the maximum number of files open.
- EINVAL
- An ELF executable had more than one PT_INTERP segment (i.e., tried to name more than one interpreter).
- EISDIR
- An ELF interpreter was a directory.
- ELIBBAD
- An ELF interpreter was not in a recognised format.
CONFORMING TO
SVr4, SVID, X/OPEN, BSD 4.3. POSIX does not document the #! behavior but is otherwise compatible. SVr4 documents additional error conditions EAGAIN, EINTR, ELIBACC, ENOLINK, EMULTIHOP; POSIX does not document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL, EISDIR or ELIBBAD error conditions.
NOTES
SUID and SGID processes can not be
ptrace()d.
Linux ignores the SUID and SGID bits on scripts.
The result of mounting a filesystem nosuid varies between Linux kernel versions: some will refuse execution of SUID/SGID executables when this would give the user powers she did not have already (and return EPERM), some will just ignore the SUID/SGID bits and exec successfully.
A maximum line length of 127 characters is allowed for the first line in a #! executable shell script.
HISTORICAL
With Unix V6 the argument list of an exec call was ended by 0, while the argument list of
main was ended by -1. Thus, this argument list was not directly usable in a further exec call. Since Unix V7 both are NULL.
SEE ALSO
chmod(2),
fork(2),
execl(3),
environ(5),
ld.so(8)
NAME
chdir, fchdir - change working directory
SYNOPSIS
#include <unistd.h> int chdir(const char *path);
int fchdir(int fd);
DESCRIPTION
chdir changes the current directory to that specified in
path.
fchdir is identical to chdir, only that the directory is given as an open file descriptor.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for
chdir are listed below:
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of path is not a directory.
- EACCES
- Search permission is denied on a component of path.
- ELOOP
- Too many symbolic links were encountered in resolving path.
- EIO
- An I/O error occurred.
The general errors for fchdir are listed below:
- EBADF
- fd is not a valid file descriptor.
- EACCES
- Search permission was denied on the directory open on fd.
NOTES
The prototype for
fchdir is only available if
_BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).
CONFORMING TO
The
chdir call is compatible with SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents additional EINTR, ENOLINK, and EMULTIHOP error conditions but has no ENOMEM. POSIX.1 does not have ENOMEM or ELOOP error conditions. X/OPEN does not have EFAULT, ENOMEM or EIO error conditions.
The fchdir call is compatible with SVr4, 4.4BSD and X/OPEN. SVr4 documents additional EIO, EINTR, and ENOLINK error conditions. X/OPEN documents additional EINTR and EIO error conditions.
SEE ALSO
getcwd(3),
chroot(2)
NAME
chown, fchown, lchown - change ownership of a file
SYNOPSIS
#include <sys/types.h> #include <unistd.h> int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);
DESCRIPTION
The owner of the file specified by
path or by
fd is changed. Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. The super-user may change the group arbitrarily.
If the owner or group is specified as -1, then that ID is not changed.
When the owner or group of an executable file are changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown; the Linux behaviour depends on the kernel version. In case of a non-group-executable file (with clear S_IXGRP bit) the S_ISGID bit indicates mandatory locking, and is not cleared by a chown.
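The -1 convention can be sketched as follows: the group of a file is changed while the owner is left alone. The helper names and the scratch-file path are hypothetical, and the demo only changes the group to the caller's own effective group, which an unprivileged owner is permitted to do.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Change only the group of path; (uid_t)-1 leaves the owner unchanged. */
int set_group_only(const char *path, gid_t group)
{
    return chown(path, (uid_t)-1, group);
}

/* Create a scratch file, set its group to the caller's effective group,
 * and confirm that the owner did not change. Returns 0 on success. */
int chown_demo(void)
{
    char path[] = "/tmp/chown-demo-XXXXXX";  /* hypothetical location */
    struct stat before, after;
    int fd = mkstemp(path), rc = -1;

    if (fd == -1)
        return -1;
    if (fstat(fd, &before) == 0 &&
        set_group_only(path, getegid()) == 0 &&
        fstat(fd, &after) == 0 &&
        after.st_uid == before.st_uid &&
        after.st_gid == getegid())
        rc = 0;
    close(fd);
    unlink(path);
    return rc;
}
```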
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for
chown are listed below:
- EPERM
- The effective UID does not match the owner of the file, and is not zero; or the owner or group were specified incorrectly.
- EROFS
- The named file resides on a read-only file system.
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of the path prefix is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix.
- ELOOP
- Too many symbolic links were encountered in resolving path.
The general errors for fchown are listed below:
- EBADF
- The descriptor is not valid.
- ENOENT
- See above.
- EPERM
- See above.
- EROFS
- See above.
- EIO
- A low-level I/O error occurred while modifying the inode.
NOTES
In versions of Linux prior to 2.1.81 (and distinct from 2.1.46),
chown did not follow symbolic links. Since Linux 2.1.81,
chown does follow symbolic links, and there is a new system call
lchown that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old
chown) has got the same syscall number, and
chown got the newly introduced number.
The prototype for fchown is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).
CONFORMING TO
The
chown call conforms to SVr4, SVID, POSIX, X/OPEN. The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files). SVr4 documents EINVAL, EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document ENOMEM or ELOOP error conditions.
The fchown call conforms to 4.4BSD and SVr4. SVr4 documents additional EINVAL, EIO, EINTR, and ENOLINK error conditions.
RESTRICTIONS
The
chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because
chown() may cause immediate access revocation on already open files. Client side caching may lead to a delay between the time where ownership has been changed to allow access for a user and the time where the file can actually be accessed by the user on other clients.
SEE ALSO
chmod(2),
flock(2)
NAME
fdatasync - synchronize a file's in-core data with that on disk
SYNOPSIS
#include <unistd.h> int fdatasync(int fd);
DESCRIPTION
fdatasync flushes all data buffers of a file to disk (before the system call returns). It resembles
fsync but is not required to update the metadata such as access time.
Applications that access databases or log files often write a tiny data fragment (e.g., one line in a log file) and then call fsync immediately in order to ensure that the written data is physically stored on the harddisk. Unfortunately, fsync will always initiate two write operations: one for the newly written data and another one in order to update the modification time stored in the inode. If the modification time is not a part of the transaction concept fdatasync can be used to avoid unnecessary inode disk write operations.
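The log-file pattern described above might look like the following sketch, which appends one line and then flushes the file data with fdatasync rather than fsync. The helper names open_log() and append_durably() are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) a log file for appending; returns -1 on error. */
int open_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/* Append one line, then flush the file's data before returning.
 * Returns 0 on success, -1 on a short write or any error. */
int append_durably(int fd, const char *line)
{
    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);

    if (n < 0 || (size_t)n != len)
        return -1;
    return fdatasync(fd);            /* data only; mtime update may lag */
}
```

Whether this saves a disk write compared with fsync depends on the kernel and filesystem; the interface only permits the metadata update to be skipped.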
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EBADF
- fd is not a valid file descriptor open for writing.
- EROFS, EINVAL
- fd is bound to a special file which does not support synchronization.
- EIO
- An error occurred during synchronization.
BUGS
Currently (Linux 2.2)
fdatasync is equivalent to
fsync.
AVAILABILITY
On POSIX systems on which
fdatasync is available,
_POSIX_SYNCHRONIZED_IO is defined in <unistd.h> to a value greater than 0. (See also
sysconf(3).)
CONFORMING TO
POSIX.1b (formerly POSIX.4)
SEE ALSO
fsync(2), B.O. Gallmeister, POSIX.4, O'Reilly, pp. 220-223 and 343.
NAME
listxattr, llistxattr, flistxattr - list extended attribute names
SYNOPSIS
#include <sys/types.h> #include <attr/xattr.h> ssize_t listxattr (const char *path, char *list, size_t size); ssize_t llistxattr (const char *path, char *list, size_t size); ssize_t flistxattr (int filedes, char *list, size_t size);
DESCRIPTION
Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the
stat(2) data). A complete overview of extended attributes concepts can be found in
attr(5).
listxattr retrieves the list of extended attribute names associated with the given path in the filesystem. The list is the set of (NULL-terminated) names, one after the other. Names of extended attributes to which the calling process does not have access may be omitted from the list. The length of the attribute name list is returned.
llistxattr is identical to listxattr, except in the case of a symbolic link, where the list of names of extended attributes associated with the link itself is retrieved, not the file that it refers to.
flistxattr is identical to listxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.
A single extended attribute name is a simple NULL-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode.
An empty buffer of size zero can be passed into these calls to return the current size of the list of extended attribute names, which can be used to estimate the size of a buffer which is sufficiently large to hold the list of names.
EXAMPLES
The list of names is returned as an unordered array of NUL-terminated character strings (attribute names are separated by NUL characters), like this:
-
user.name1\0system.name1\0user.name2\0
Filesystems like ext2, ext3 and XFS, which implement POSIX ACLs using extended attributes, might return a list like this:
-
system.posix_acl_access\0system.posix_acl_default\0
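The size query and the walk over the NUL-separated list described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes a glibc system where the calls are declared in <sys/xattr.h> (this page names <attr/xattr.h>), and count_xattr_names and print_xattr_names are hypothetical helper names, not part of the API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* assumption: glibc header; this page names <attr/xattr.h> */

/* Count the names in a buffer of back-to-back NUL-terminated strings.
 * Assumes the last name in the buffer is NUL-terminated. */
size_t count_xattr_names(const char *list, size_t len)
{
    size_t count = 0;
    size_t off = 0;

    while (off < len) {
        off += strlen(list + off) + 1;   /* skip the name and its NUL */
        count++;
    }
    return count;
}

/* Print every attribute name of path; returns the number of names, or -1. */
long print_xattr_names(const char *path)
{
    ssize_t size = listxattr(path, NULL, 0);   /* size 0: query the length only */
    char *list;
    size_t off;
    long count;

    if (size == -1)
        return -1;          /* errno is set, e.g. ENOTSUP or ENOENT */
    if (size == 0)
        return 0;           /* no attributes to list */

    list = malloc((size_t)size);
    if (list == NULL)
        return -1;

    size = listxattr(path, list, (size_t)size);
    if (size == -1) {       /* ERANGE if the list grew between the two calls */
        free(list);
        return -1;
    }

    for (off = 0; off < (size_t)size; off += strlen(list + off) + 1)
        printf("%s\n", list + off);

    count = (long)count_xattr_names(list, (size_t)size);
    free(list);
    return count;
}
```

Because the list can change between the two calls, a caller that sees ERANGE may simply repeat the size query and try again.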
RETURN VALUE
On success, a non-negative number is returned indicating the size of the extended attribute name list. On failure, -1 is returned and errno is set appropriately.
If the size of the list buffer is too small to hold the result, errno is set to ERANGE.
If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.
The errors documented for the stat(2) system call are also applicable here.
AUTHORS
Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.
SEE ALSO
getfattr(1),
setfattr(1),
open(2),
stat(2),
getxattr(2),
setxattr(2),
removexattr(2), and
attr(5).
NAME
fork - create a child process
SYNOPSIS
#include <sys/types.h> #include <unistd.h> pid_t fork(void);
DESCRIPTION
fork creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.
Under Linux, fork is implemented using copy-on-write pages, so the only penalty incurred by fork is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.
RETURN VALUE
On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.
ERRORS
- EAGAIN
- fork cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
- ENOMEM
- fork failed to allocate the necessary kernel structures because memory is tight.
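The three possible outcomes of fork (error, child, parent) can be told apart by the return value alone. The following is a minimal sketch; run_child is a hypothetical helper name, and the use of waitpid(2) to collect the child is an addition for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the exit status the
 * parent collects, or -1 on any failure. */
int run_child(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;              /* failure: no child was created, errno is set */

    if (pid == 0)
        _exit(code);            /* child: fork returned 0 */

    /* parent: fork returned the PID of the child */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit rather than exit so that stdio buffers inherited from the parent are not flushed a second time.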
CONFORMING TO
The fork call conforms to SVr4, SVID, POSIX, X/OPEN, BSD 4.3.
SEE ALSO
clone(2),
execve(2),
vfork(2),
wait(2)
NAME
removexattr, lremovexattr, fremovexattr - remove an extended attribute
SYNOPSIS
#include <sys/types.h> #include <attr/xattr.h> int removexattr (const char *path, const char *name); int lremovexattr (const char *path, const char *name); int fremovexattr (int filedes, const char *name);
DESCRIPTION
Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).
removexattr removes the extended attribute identified by name and associated with the given path in the filesystem.
lremovexattr is identical to removexattr, except in the case of a symbolic link, where the extended attribute is removed from the link itself, not the file that it refers to.
fremovexattr is identical to removexattr, only the extended attribute is removed from the open file pointed to by filedes (as returned by open(2)) in place of path.
An extended attribute name is a simple NUL-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode.
RETURN VALUE
On success, zero is returned. On failure, -1 is returned and errno is set appropriately.
If the named attribute does not exist, errno is set to ENOATTR.
If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.
The errors documented for the stat(2) system call are also applicable here.
AUTHORS
Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.
SEE ALSO
getfattr(1),
setfattr(1),
open(2),
stat(2),
setxattr(2),
getxattr(2),
listxattr(2), and
attr(5).
NAME
stat, fstat, lstat - get file status
SYNOPSIS
#include <sys/types.h> #include <sys/stat.h> #include <unistd.h> int stat(const char *file_name, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);
DESCRIPTION
These functions return information about the specified file. You do not need any access rights to the file to get this information but you need search rights to all directories named in the path leading to the file.
stat stats the file pointed to by file_name and fills in buf.
lstat is identical to stat, except in the case of a symbolic link, where the link itself is stat-ed, not the file that it refers to.
fstat is identical to stat, only the open file pointed to by filedes (as returned by open(2)) is stat-ed in place of file_name.
They all return a stat structure, which contains the following fields:
-
struct stat {
    dev_t     st_dev;      /* device */
    ino_t     st_ino;      /* inode */
    mode_t    st_mode;     /* protection */
    nlink_t   st_nlink;    /* number of hard links */
    uid_t     st_uid;      /* user ID of owner */
    gid_t     st_gid;      /* group ID of owner */
    dev_t     st_rdev;     /* device type (if inode device) */
    off_t     st_size;     /* total size, in bytes */
    blksize_t st_blksize;  /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;   /* number of blocks allocated */
    time_t    st_atime;    /* time of last access */
    time_t    st_mtime;    /* time of last modification */
    time_t    st_ctime;    /* time of last status change */
};
The value st_size gives the size of the file (if it is a regular file or a symlink) in bytes. The size of a symlink is the length of the pathname it contains, without trailing NUL.
The value st_blocks gives the size of the file in 512-byte blocks. (This may be smaller than st_size/512 e.g. when the file has holes.) The value st_blksize gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)
Not all of the Linux filesystems implement all of the time fields. Some file system types allow mounting in such a way that file accesses do not cause an update of the st_atime field. (See `noatime` in mount(8).)
The field st_atime is changed by file accesses, e.g. by execve(2), mknod(2), pipe(2), utime(2) and read(2) (of more than zero bytes). Other routines, like mmap(2), may or may not update st_atime.
The field st_mtime is changed by file modifications, e.g. by mknod(2), truncate(2), utime(2) and write(2) (of more than zero bytes). Moreover, st_mtime of a directory is changed by the creation or deletion of files in that directory. The st_mtime field is not changed for changes in owner, group, hard link count, or mode.
The field st_ctime is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.).
The following POSIX macros are defined to check the file type:
-
- S_ISREG(m)
- is it a regular file?
- S_ISDIR(m)
- directory?
- S_ISCHR(m)
- character device?
- S_ISBLK(m)
- block device?
- S_ISFIFO(m)
- fifo?
- S_ISLNK(m)
- symbolic link? (Not in POSIX.1-1996.)
- S_ISSOCK(m)
- socket? (Not in POSIX.1-1996.)
The following flags are defined for the st_mode field:
| S_IFMT | 0170000 | bitmask for the file type bitfields |
| S_IFSOCK | 0140000 | socket |
| S_IFLNK | 0120000 | symbolic link |
| S_IFREG | 0100000 | regular file |
| S_IFBLK | 0060000 | block device |
| S_IFDIR | 0040000 | directory |
| S_IFCHR | 0020000 | character device |
| S_IFIFO | 0010000 | fifo |
| S_ISUID | 0004000 | set UID bit |
| S_ISGID | 0002000 | set GID bit (see below) |
| S_ISVTX | 0001000 | sticky bit (see below) |
| S_IRWXU | 00700 | mask for file owner permissions |
| S_IRUSR | 00400 | owner has read permission |
| S_IWUSR | 00200 | owner has write permission |
| S_IXUSR | 00100 | owner has execute permission |
| S_IRWXG | 00070 | mask for group permissions |
| S_IRGRP | 00040 | group has read permission |
| S_IWGRP | 00020 | group has write permission |
| S_IXGRP | 00010 | group has execute permission |
| S_IRWXO | 00007 | mask for permissions for others (not in group) |
| S_IROTH | 00004 | others have read permission |
| S_IWOTH | 00002 | others have write permission |
| S_IXOTH | 00001 | others have execute permission |
The set GID bit (S_ISGID) has several special uses: For a directory it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective gid of the creating process, and directories created there will also get the S_ISGID bit set. For a file that does not have the group execution bit (S_IXGRP) set, it indicates mandatory file/record locking. The `sticky` bit (S_ISVTX) on a directory means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, and by root.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EBADF
- filedes is bad.
- ENOENT
- A component of the path file_name does not exist, or the path is an empty string.
- ENOTDIR
- A component of the path is not a directory.
- ELOOP
- Too many symbolic links encountered while traversing the path.
- EFAULT
- Bad address.
- EACCES
- Permission denied.
- ENOMEM
- Out of memory (i.e. kernel memory).
- ENAMETOOLONG
- File name too long.
CONFORMING TO
The stat and fstat calls conform to SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The lstat call conforms to 4.3BSD and SVr4. SVr4 documents additional fstat error conditions EINTR, ENOLINK, and EOVERFLOW. SVr4 documents additional stat and lstat error conditions EACCES, EINTR, EMULTIHOP, ENOLINK, and EOVERFLOW. Use of the st_blocks and st_blksize fields may be less portable. (They were introduced in BSD and are not specified by POSIX. The interpretation differs between systems, and possibly on a single system when NFS mounts are involved.)
POSIX does not describe the S_IFMT, S_IFSOCK, S_IFLNK, S_IFREG, S_IFBLK, S_IFDIR, S_IFCHR, S_IFIFO, S_ISVTX bits, but instead demands the use of the macros S_ISDIR(), etc. The S_ISLNK and S_ISSOCK macros are not in POSIX.1-1996, but both will be in the next POSIX standard; the former is from SVID 4v2, the latter from SUSv2.
Unix V7 (and later systems) had S_IREAD, S_IWRITE, S_IEXEC, where POSIX prescribes the synonyms S_IRUSR, S_IWUSR, S_IXUSR.
OTHER SYSTEMS
Values that have been (or are) in use on various systems:
| hex | name | ls | octal | description |
| f000 | S_IFMT | | 170000 | mask for file type |
| 0000 | | | 000000 | SCO out-of-service inode, BSD unknown type |
| | | | SVID-v2 and XPG2 have both 0 and 0100000 for ordinary file |
| 1000 | S_IFIFO | p| | 010000 | fifo (named pipe) |
| 2000 | S_IFCHR | c | 020000 | character special (V7) |
| 3000 | S_IFMPC | | 030000 | multiplexed character special (V7) |
| 4000 | S_IFDIR | d/ | 040000 | directory (V7) |
| 5000 | S_IFNAM | | 050000 | XENIX named special file |
| | | | with two subtypes, distinguished by st_rdev values 1, 2: |
| 0001 | S_INSEM | s | 000001 | XENIX semaphore subtype of IFNAM |
| 0002 | S_INSHD | m | 000002 | XENIX shared data subtype of IFNAM |
| 6000 | S_IFBLK | b | 060000 | block special (V7) |
| 7000 | S_IFMPB | | 070000 | multiplexed block special (V7) |
| 8000 | S_IFREG | - | 100000 | regular (V7) |
| 9000 | S_IFCMP | | 110000 | VxFS compressed |
| 9000 | S_IFNWK | n | 110000 | network special (HP-UX) |
| a000 | S_IFLNK | l@ | 120000 | symbolic link (BSD) |
| b000 | S_IFSHAD | | 130000 | Solaris shadow inode for ACL (not seen by userspace) |
| c000 | S_IFSOCK | s= | 140000 | socket (BSD; also "S_IFSOC" on VxFS) |
| d000 | S_IFDOOR | D> | 150000 | Solaris door |
| e000 | S_IFWHT | w% | 160000 | BSD whiteout (not used for inode) |
| | | | |
| 0200 | S_ISVTX | | 001000 | 'sticky bit': save swapped text even after use (V7) |
| | | | reserved (SVID-v2) |
| | | | On non-directories: don't cache this file (SunOS) |
| | | | On directories: restricted deletion flag (SVID-v4.2) |
| 0400 | S_ISGID | | 002000 | set group ID on execution (V7) |
| | | | for directories: use BSD semantics for propagation of gid |
| 0400 | S_ENFMT | | 002000 | SysV file locking enforcement (shared w/ S_ISGID) |
| 0800 | S_ISUID | | 004000 | set user ID on execution (V7) |
| 0800 | S_CDF | | 004000 | directory is a context dependent file (HP-UX) |
A sticky command appeared in Version 32V AT&T UNIX.
SEE ALSO
chmod(2),
chown(2),
readlink(2),
utime(2)
NAME
statvfs, fstatvfs - get file system statistics
SYNOPSIS
#include <sys/statvfs.h> int statvfs(const char *path, struct statvfs *buf);
int fstatvfs(int fd, struct statvfs *buf);
DESCRIPTION
The function statvfs returns information about a mounted file system. path is the path name of any file within the mounted filesystem. buf is a pointer to a statvfs structure defined approximately as follows:
struct statvfs {
    unsigned long f_bsize;    /* file system block size */
    unsigned long f_frsize;   /* fragment size */
    fsblkcnt_t    f_blocks;   /* size of fs in f_frsize units */
    fsblkcnt_t    f_bfree;    /* # free blocks */
    fsblkcnt_t    f_bavail;   /* # free blocks for non-root */
    fsfilcnt_t    f_files;    /* # inodes */
    fsfilcnt_t    f_ffree;    /* # free inodes */
    fsfilcnt_t    f_favail;   /* # free inodes for non-root */
    unsigned long f_fsid;     /* file system id */
    unsigned long f_flag;     /* mount flags */
    unsigned long f_namemax;  /* maximum filename length */
};
Here the types fsblkcnt_t and fsfilcnt_t are defined in <sys/types.h>. Both used to be unsigned long.
The field f_flag is a bit mask (of mount flags, see mount(8)). Bits defined by POSIX are
- ST_RDONLY
- Read-only file system.
- ST_NOSUID
- Setuid/setgid bits are ignored by exec(2).
It is unspecified whether all members of the returned struct have meaningful values on all filesystems.
fstatvfs returns the same information about an open file referenced by descriptor fd.
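A common use of these fields is to compute the space available to unprivileged users. The following is a minimal sketch under one stated assumption: that f_bavail is counted in the same f_frsize units as f_blocks. available_bytes and path_available_bytes are hypothetical helper names.

```c
#include <sys/statvfs.h>

/* Bytes available to non-root users.
 * Assumption: f_bavail is expressed in f_frsize units, like f_blocks. */
unsigned long long available_bytes(const struct statvfs *vfs)
{
    return (unsigned long long)vfs->f_bavail * vfs->f_frsize;
}

/* Store the available bytes of the file system holding path in *out.
 * Returns 0 on success, -1 on error (errno is set by statvfs). */
int path_available_bytes(const char *path, unsigned long long *out)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) == -1)
        return -1;
    *out = available_bytes(&vfs);
    return 0;
}
```

The multiplication is done in unsigned long long so that large file systems do not overflow a 32-bit unsigned long.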
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EBADF
- (fstatvfs) fd is not a valid open file descriptor.
- EACCES
- (statvfs) Search permission is denied for a component of the path prefix of path.
- ELOOP
- (statvfs) Too many symbolic links were encountered in translating path.
- ENAMETOOLONG
- (statvfs) path is too long.
- ENOENT
- (statvfs) The file referred to by path does not exist.
- ENOTDIR
- (statvfs) A component of the path prefix of path is not a directory.
- EFAULT
- buf or path points to an invalid address.
- EINTR
- This call was interrupted by a signal.
- EIO
- An I/O error occurred while reading from the file system.
- ENOMEM
- Insufficient kernel memory was available.
- ENOSYS
- The file system does not support this call.
- EOVERFLOW
- Some values were too large to be represented in the returned struct.
CONFORMING TO
Solaris, Irix, POSIX 1003.1-2001
NOTES
The Linux kernel has system calls statfs, fstatfs to support this library call.
The current glibc implementation of
pathconf(path, _PC_REC_XFER_ALIGN); pathconf(path, _PC_ALLOC_SIZE_MIN); pathconf(path, _PC_REC_MIN_XFER_SIZE);
uses the f_frsize, f_frsize, and f_bsize fields of the return value of statvfs(path,buf).
SEE ALSO
statfs(2)
NAME
truncate, ftruncate - truncate a file to a specified length
SYNOPSIS
#include <unistd.h> #include <sys/types.h> int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);
DESCRIPTION
The truncate and ftruncate functions cause the regular file named by path or referenced by fd to be truncated to a size of precisely length bytes.
If the file previously was larger than this size, the extra data is lost. If the file previously was shorter, it is extended, and the extended part reads as zero bytes.
The file pointer is not changed.
If the size changed, then the ctime and mtime fields for the file are updated, and suid and sgid mode bits may be cleared.
With ftruncate, the file must be open for writing; with truncate, the file must be writable.
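The behaviour described above (extension reads as zero bytes, the file pointer stays put, shrinking discards data) can be checked with a short program. This is a minimal sketch for an XSI-compliant system; demo_ftruncate is a hypothetical helper name and the scratch file under /tmp is an assumption made for illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if ftruncate behaved as described above, -1 otherwise. */
int demo_ftruncate(void)
{
    char path[] = "/tmp/truncdemoXXXXXX";   /* hypothetical scratch file */
    char byte = 'x';
    struct stat sb;
    int ok = -1;
    int fd = mkstemp(path);                 /* opened read/write */

    if (fd == -1)
        return -1;
    unlink(path);                           /* file lives until fd is closed */

    if (write(fd, "abc", 3) == 3 &&
        ftruncate(fd, 8) == 0 &&                          /* extend 3 -> 8 bytes */
        fstat(fd, &sb) == 0 && sb.st_size == 8 &&
        pread(fd, &byte, 1, 5) == 1 && byte == '\0' &&    /* new part reads as zero */
        lseek(fd, 0, SEEK_CUR) == 3 &&                    /* file pointer unchanged */
        ftruncate(fd, 1) == 0 &&                          /* shrink: extra data lost */
        fstat(fd, &sb) == 0 && sb.st_size == 1)
        ok = 0;

    close(fd);
    return ok;
}
```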
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
For truncate:
- EACCES
- Search permission is denied for a component of the path prefix, or the named file is not writable by the user.
- EFAULT
- Path points outside the process's allocated address space.
- EFBIG
- The argument length is larger than the maximum file size. (XSI)
- EINTR
- A signal was caught during execution.
- EINVAL
- The argument length is negative or larger than the maximum file size.
- EIO
- An I/O error occurred updating the inode.
- EISDIR
- The named file is a directory.
- ELOOP
- Too many symbolic links were encountered in translating the pathname.
- ENAMETOOLONG
- A component of a pathname exceeded 255 characters, or an entire path name exceeded 1023 characters.
- ENOENT
- The named file does not exist.
- ENOTDIR
- A component of the path prefix is not a directory.
- EROFS
- The named file resides on a read-only file system.
- ETXTBSY
- The file is a pure procedure (shared text) file that is being executed.
For ftruncate the same errors apply, but instead of things that can be wrong with path, we now have things that can be wrong with fd:
- EBADF
- The fd is not a valid descriptor.
- EBADF or EINVAL
- The fd is not open for writing.
- EINVAL
- The fd does not reference a regular file.
CONFORMING TO
4.4BSD, SVr4 (these function calls first appeared in BSD 4.2). POSIX 1003.1-1996 has ftruncate. POSIX 1003.1-2001 also has truncate, as an XSI extension.
SVr4 documents additional truncate error conditions EMFILE, EMULTIHOP, ENFILE, ENOLINK. SVr4 documents for ftruncate an additional EAGAIN error condition.
NOTES
The above description is for XSI-compliant systems. For non-XSI-compliant systems, the POSIX standard allows two behaviours for ftruncate when length exceeds the file length (note that truncate is not specified at all in such an environment): either returning an error, or extending the file. (Most Unices follow the XSI requirement.)
SEE ALSO
open(2)
NAME
getcontext, setcontext - get or set the user context
SYNOPSIS
#include <ucontext.h> int getcontext(ucontext_t *ucp);
int setcontext(const ucontext_t *ucp);
where:
- ucp
- points to a structure defined in <ucontext.h> containing the signal mask, execution stack, and machine registers.
DESCRIPTION
getcontext(2) gets the current context of the calling process, storing it in the ucontext struct pointed to by ucp.
setcontext(2) sets the context of the calling process to the state stored in the ucontext struct pointed to by ucp. The struct must either have been created by getcontext(2) or have been passed as the third parameter of the sigaction(2) signal handler.
The ucontext struct created by getcontext(2) is defined in <ucontext.h> as follows:
-
typedef struct ucontext {
    unsigned long int uc_flags;
    struct ucontext  *uc_link;
    stack_t           uc_stack;
    mcontext_t        uc_mcontext;
    __sigset_t        uc_sigmask;
    struct _fpstate   __fpregs_mem;
} ucontext_t;
RETURN VALUES
getcontext(2) returns 0 on success and -1 on failure.
setcontext(2) does not return a value on success and returns -1 on failure.
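The way setcontext resumes execution at the point where getcontext saved the context can be illustrated with a small loop. This is a minimal sketch assuming a C library that provides <ucontext.h> (glibc does); count_with_context is a hypothetical helper name.

```c
#include <ucontext.h>

/* Count from 0 to n by repeatedly jumping back to a saved context.
 * Returns n on success, -1 if getcontext or setcontext fails. */
int count_with_context(int n)
{
    volatile int i = 0;     /* volatile: kept in memory across the jump */
    ucontext_t ctx;

    if (getcontext(&ctx) == -1)
        return -1;

    /* Execution resumes here each time setcontext(&ctx) succeeds. */
    if (i < n) {
        i++;
        setcontext(&ctx);   /* does not return on success */
        return -1;          /* reached only if setcontext failed */
    }
    return i;
}
```

The counter is declared volatile because setcontext restores the saved registers; a value cached in a register would otherwise be reset on every jump.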
STANDARDS
These functions conform to: XPG4-UNIX.
NOTES
When a signal handler executes, the current user context is saved and a new context is created by the kernel. If the calling process leaves the signal handler using longjmp(3), the original context cannot be restored, and the results of future calls to getcontext(2) are unpredictable. To avoid this problem, use siglongjmp(3) or setcontext(2) in signal handlers instead of longjmp(3).
SEE ALSO
sigaction(2),
sigaltstack(2),
sigprocmask(2),
sigsetjmp(3),
setjmp(3).
NAME
getdomainname, setdomainname - get/set domain name
SYNOPSIS
#include <unistd.h> int getdomainname(char *name, size_t len);
int setdomainname(const char *name, size_t len);
DESCRIPTION
These functions are used to access or to change the domain name of the current processor. If the NUL-terminated domain name requires more than len bytes, getdomainname returns the first len bytes (glibc) or returns an error (libc).
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EINVAL
- For getdomainname under libc: name is NULL or name is longer than len bytes.
- EINVAL
- For setdomainname: len was negative or too large.
- EPERM
- For setdomainname: the caller was not the superuser.
- EFAULT
- For setdomainname: name pointed outside of user address space.
CONFORMING TO
POSIX does not specify these calls.
SEE ALSO
gethostname(2),
sethostname(2),
uname(2)
NAME
getgid, getegid - get group identity
SYNOPSIS
#include <unistd.h> #include <sys/types.h> gid_t getgid(void);
gid_t getegid(void);
DESCRIPTION
getgid returns the real group ID of the current process.
getegid returns the effective group ID of the current process.
The real ID corresponds to the ID of the calling process. The effective ID corresponds to the set ID bit on the file being executed.
ERRORS
These functions are always successful.
CONFORMING TO
POSIX, BSD 4.3
SEE ALSO
setregid(2),
setgid(2)
NAME
gethostid, sethostid - get or set the unique identifier of the current host
SYNOPSIS
#include <unistd.h> long gethostid(void);
int sethostid(long hostid);
DESCRIPTION
Get or set a unique 32-bit identifier for the current machine. The 32-bit identifier is intended to be unique among all UNIX systems in existence. This normally resembles the Internet address for the local machine, as returned by gethostbyname(3), and thus usually never needs to be set.
The sethostid call is restricted to the superuser.
The hostid argument is stored in the file /etc/hostid.
RETURN VALUE
gethostid returns the 32-bit identifier for the current host as set by sethostid(2).
CONFORMING TO
4.2BSD. These functions were dropped in 4.4BSD. POSIX.1 does not define these functions, but ISO/IEC 9945-1:1990 mentions them in B.4.4.1. SVr4 includes gethostid but not sethostid.
FILES
/etc/hostid
SEE ALSO
hostid(1),
gethostbyname(3)
NAME
getitimer, setitimer - get or set value of an interval timer
SYNOPSIS
#include <sys/time.h>
int getitimer(int which, struct itimerval *value);
int setitimer(int which, const struct itimerval *value, struct itimerval *ovalue);
DESCRIPTION
The system provides each process with three interval timers, each decrementing in a distinct time domain. When any timer expires, a signal is sent to the process, and the timer (potentially) restarts.
- ITIMER_REAL
- decrements in real time, and delivers SIGALRM upon expiration.
- ITIMER_VIRTUAL
- decrements only when the process is executing, and delivers SIGVTALRM upon expiration.
- ITIMER_PROF
- decrements both when the process executes and when the system is executing on behalf of the process. Coupled with ITIMER_VIRTUAL, this timer is usually used to profile the time spent by the application in user and kernel space. SIGPROF is delivered upon expiration.
Timer values are defined by the following structures:
-
struct itimerval {
    struct timeval it_interval; /* next value */
    struct timeval it_value;    /* current value */
};
struct timeval {
    long tv_sec;                /* seconds */
    long tv_usec;               /* microseconds */
};
The function getitimer fills the structure indicated by value with the current setting for the timer indicated by which (one of ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF). The element it_value is set to the amount of time remaining on the timer, or zero if the timer is disabled. Similarly, it_interval is set to the reset value. The function setitimer sets the indicated timer to the value in value. If ovalue is nonzero, the old value of the timer is stored there.
Timers decrement from it_value to zero, generate a signal, and reset to it_interval. A timer which is set to zero (it_value is zero or the timer expires and it_interval is zero) stops.
Both tv_sec and tv_usec are significant in determining the duration of a timer.
Timers will never expire before the requested time, instead expiring some short, constant time afterwards, dependent on the system timer resolution (currently 10ms). Upon expiration, a signal will be generated and the timer reset. If the timer expires while the process is active (always true for ITIMER_VIRTUAL) the signal will be delivered immediately when generated. Otherwise the delivery will be offset by a small time dependent on the system loading.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EFAULT
- value or ovalue are not valid pointers.
- EINVAL
- which is not one of ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF.
CONFORMING TO
SVr4, 4.4BSD (This call first appeared in 4.2BSD).
SEE ALSO
gettimeofday(2), sigaction(2), signal(2)
BUGS
Under Linux, the generation and delivery of a signal are distinct, and each signal is permitted only one outstanding event. It's therefore conceivable that under pathologically heavy loading, ITIMER_REAL will expire before the signal from a previous expiration has been delivered. The second signal in such an event will be lost.
NAME
getpeername - get name of connected peer socket
SYNOPSIS
#include <sys/socket.h> int getpeername(int s, struct sockaddr *name, socklen_t *namelen);
DESCRIPTION
Getpeername returns the name of the peer connected to socket s. The namelen parameter should be initialized to indicate the amount of space pointed to by name. On return it contains the actual size of the name returned (in bytes). The name is truncated if the buffer provided is too small.
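The value-result use of namelen can be sketched as follows. This is a minimal sketch; peer_family is a hypothetical helper name, and struct sockaddr_storage is used here as a buffer large enough for any address family so that truncation does not occur.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return the address family of the peer connected to s, or -1 on
 * error (errno is set by getpeername). */
int peer_family(int s)
{
    struct sockaddr_storage addr;
    socklen_t len = sizeof addr;    /* in: space available in addr */

    memset(&addr, 0, sizeof addr);
    if (getpeername(s, (struct sockaddr *)&addr, &len) == -1)
        return -1;

    /* out: len now holds the actual size of the returned name */
    return addr.ss_family;
}
```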
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EBADF
- The argument s is not a valid descriptor.
- ENOTSOCK
- The argument s is a file, not a socket.
- ENOTCONN
- The socket is not connected.
- ENOBUFS
- Insufficient resources were available in the system to perform the operation.
- EFAULT
- The name parameter points to memory not in a valid part of the process address space.
CONFORMING TO
SVr4, 4.4BSD (the getpeername function call first appeared in 4.2BSD).
NOTE
The third argument of getpeername is in reality an 'int *' (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t. See also accept(2).
SEE ALSO
accept(2),
bind(2),
getsockname(2)
NAME
setpgid, getpgid, setpgrp, getpgrp - set/get process group
SYNOPSIS
#include <unistd.h> int setpgid(pid_t pid, pid_t pgid);
pid_t getpgid(pid_t pid);
int setpgrp(void);
pid_t getpgrp(void);
DESCRIPTION
setpgid sets the process group ID of the process specified by pid to pgid. If pid is zero, the process ID of the current process is used. If pgid is zero, the process ID of the process specified by pid is used. If setpgid is used to move a process from one process group to another (as is done by some shells when creating pipelines), both process groups must be part of the same session. In this case, the pgid specifies an existing process group to be joined and the session ID of that group must match the session ID of the joining process.
getpgid returns the process group ID of the process specified by pid. If pid is zero, the process ID of the current process is used.
The call setpgrp() is equivalent to setpgid(0,0).
Similarly, getpgrp() is equivalent to getpgid(0). Each process group is a member of a session and each process is a member of the session of which its process group is a member.
Process groups are used for distribution of signals, and by terminals to arbitrate requests for their input: Processes that have the same process group as the terminal are foreground and may read, while others will block with a signal if they attempt to read. These calls are thus used by programs such as csh(1) to create process groups in implementing job control. The TIOCGPGRP and TIOCSPGRP calls described in termios(3) are used to get/set the process group of the control terminal.
If a session has a controlling terminal, CLOCAL is not set and a hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, the SIGHUP signal will be sent to each process in the foreground process group of the controlling terminal.
If the exit of the process causes a process group to become orphaned, and if any member of the newly-orphaned process group is stopped, then a SIGHUP signal followed by a SIGCONT signal will be sent to each process in the newly-orphaned process group.
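The setpgid(0,0) form, in which a process becomes the leader of a new process group, can be sketched as follows. This is a minimal sketch; child_becomes_group_leader is a hypothetical helper name, and the call is made in a forked child so that the calling process keeps its own group.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that puts itself into a new process group.
 * Returns 1 if the child became its own group leader, 0 if not,
 * and -1 if fork or waitpid failed. */
int child_becomes_group_leader(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* pid 0 and pgid 0: use the caller's own PID for both */
        if (setpgid(0, 0) == -1)
            _exit(2);
        _exit(getpgrp() == getpid() ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```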
RETURN VALUE
On success, setpgid and setpgrp return zero. On error, -1 is returned, and errno is set appropriately.
getpgid returns a process group on success. On error, -1 is returned, and errno is set appropriately.
getpgrp always returns the current process group.
ERRORS
- EINVAL
- pgid is less than 0 (setpgid, setpgrp).
- EACCES
- An attempt was made to change the process group ID of one of the children of the calling process and the child had already performed an execve (setpgid, setpgrp).
- EPERM
- An attempt was made to move a process into a process group in a different session, or to change the process group ID of one of the children of the calling process and the child was in a different session, or to change the process group ID of a session leader (setpgid, setpgrp).
- ESRCH
- For getpgid: pid does not match any process. For setpgid: pid is not the current process and not a child of the current process.
CONFORMING TO
The functions setpgid and getpgrp conform to POSIX.1. The function setpgrp is from BSD 4.2. The function getpgid conforms to SVr4.
NOTES
POSIX took setpgid from the BSD function setpgrp. Also SysV has a function with the same name, but it is identical to setsid(2).
To get the prototypes under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.
SEE ALSO
getuid(2),
setsid(2),
tcgetpgrp(3),
tcsetpgrp(3),
termios(3)
NAME
afs_syscall, break, ftime, getpmsg, gtty, lock, mpx, prof, profil, putpmsg, security, stty, ulimit - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2) and {create,init,delete}_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
obsolete(2)
NAME
getpriority, setpriority - get/set program scheduling priority
SYNOPSIS
#include <sys/time.h> #include <sys/resource.h> int getpriority(int which, int who);
int setpriority(int which, int who, int prio);
DESCRIPTION
The scheduling priority of the process, process group, or user, as indicated by which and who, is obtained with the getpriority call and set with the setpriority call. Which is one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER, and who is interpreted relative to which (a process identifier for PRIO_PROCESS, process group identifier for PRIO_PGRP, and a user ID for PRIO_USER). A zero value for who denotes (respectively) the calling process, the process group of the calling process, or the real user ID of the calling process.
Prio is a value in the range -20 to 20 (but see the Notes below). The default priority is 0; lower priorities cause more favorable scheduling.
The getpriority call returns the highest priority (lowest numerical value) enjoyed by any of the specified processes. The setpriority call sets the priorities of all of the specified processes to the specified value. Only the super-user may lower the priority value (that is, give a process more favorable scheduling).
RETURN VALUE
Since
getpriority can legitimately return the value -1, it is necessary to clear the external variable
errno prior to the call, then check it afterwards to determine if a -1 is an error or a legitimate value. The
setpriority call returns 0 if there is no error, or -1 if there is.
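The errno convention described above can be sketched as follows. This is a minimal illustration, not part of the original page; the helper name read_priority is hypothetical.

```c
#include <errno.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the priority selected by (which, who).
 * Returns 0 and stores the value in *prio on success,
 * or -1 with errno set on failure. */
int read_priority(int which, int who, int *prio)
{
    int value;

    errno = 0;                        /* clear errno so a stale value is not misread */
    value = getpriority(which, who);
    if (value == -1 && errno != 0)
        return -1;                    /* a real error */
    *prio = value;                    /* -1 here would be a legitimate priority */
    return 0;
}
```

A caller would typically use read_priority(PRIO_PROCESS, 0, &p) to query the calling process.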
ERRORS
- ESRCH
- No process was located using the which and who values specified.
- EINVAL
- Which was not one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER.
In addition to the errors indicated above, setpriority may fail if:
- EPERM
- A process was located, but neither the effective nor the real user ID of the caller matches its effective user ID.
- EACCES
- A non super-user attempted to lower a process priority.
NOTES
The details on the condition for EPERM depend on the system. The above description is what SUSv3 says, and seems to be followed on all SYSV-like systems. Linux requires the real or effective user ID of the caller to match the real user of the process
who (instead of its effective user ID). All BSD-like systems (SunOS 4.1.3, Ultrix 4.2, BSD 4.3, FreeBSD 4.3, OpenBSD-2.5, ...) require the effective user ID of the caller to match the real or effective user ID of the process
who.
The actual priority range varies between kernel versions. Linux before 1.3.36 had -infinity..15. Linux since 1.3.43 has -20..19, and the system call getpriority returns 40..1 for these values (since negative numbers are error codes). The library call converts N into 20-N.
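The conversion mentioned above is simple arithmetic. The sketch below only restates the 20-N rule from this page; the function name raw_to_nice is hypothetical and is not a library interface.

```c
/* Hypothetical helper: map the raw value N returned by the kernel
 * (40..1) to the priority seen by the caller (-20..19), as 20 - N. */
int raw_to_nice(int raw)
{
    return 20 - raw;
}
```

With this rule a raw 40 corresponds to -20, a raw 20 to the default priority 0, and a raw 1 to 19.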
Including <sys/time.h> is not required these days, but increases portability. (Indeed, <sys/resource.h> defines the rusage structure with fields of type struct timeval defined in <sys/time.h>.)
CONFORMING TO
SVr4, 4.4BSD (these function calls first appeared in 4.2BSD).
SEE ALSO
nice(1),
fork(2),
renice(8)
NAME
getresuid, getresgid - get real, effective and saved user or group ID
SYNOPSIS
#define _GNU_SOURCE
#include <unistd.h>
int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid);
int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid);
DESCRIPTION
getresuid and
getresgid (both introduced in Linux 2.1.44) get the real, effective and saved user IDs (resp. group IDs) of the current process.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EFAULT
- One of the arguments specified an address outside the calling program's address space.
CONFORMING TO
This call is Linux-specific. The prototype is given by glibc since version 2.3.2 provided _GNU_SOURCE is defined.
SEE ALSO
getuid(2),
setuid(2),
setreuid(2),
setresuid(2)
NAME
getrlimit, getrusage, setrlimit - get/set resource limits and usage
SYNOPSIS
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
int getrlimit(int resource, struct rlimit *rlim);
int getrusage(int who, struct rusage *usage);
int setrlimit(int resource, const struct rlimit *rlim);
DESCRIPTION
getrlimit and
setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the
rlimit structure (the
rlim argument to both
getrlimit() and
setrlimit()):
struct rlimit {
    rlim_t rlim_cur;   /* Soft limit */
    rlim_t rlim_max;   /* Hard limit (ceiling for rlim_cur) */
};
The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. A privileged process may make arbitrary changes to either limit value.
The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()).
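As an illustration of the soft/hard distinction, the following sketch lowers only the soft limit of one resource and leaves the hard limit alone, which an unprivileged process is allowed to do. The helper name disable_core_dumps and the choice of RLIMIT_CORE are assumptions made for the example.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: set the soft RLIMIT_CORE limit to 0 so that no
 * core files are written, keeping the hard limit unchanged.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
int disable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;
    rl.rlim_cur = 0;                  /* 0 is always within 0..rlim_max */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Because rlim_max is passed back unchanged, the process can later raise the soft limit again, up to the hard limit.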
resource must be one of:
- RLIMIT_AS
- The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process when no alternate stack has been made available). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
- RLIMIT_CORE
- Maximum size of core file. When 0 no core dump files are created. When nonzero, larger dumps are truncated to this size.
- RLIMIT_CPU
- CPU time limit in seconds. When the process reaches the soft limit, it is sent a SIGXCPU signal. The default action for this signal is to terminate the process. However, the signal can be caught, and the handler can return control to the main program. If the process continues to consume CPU time, it will be sent SIGXCPU once per second until the hard limit is reached, at which time it is sent SIGKILL. (This latter point describes Linux 2.2 and 2.4 behaviour. Implementations vary in how they treat processes which continue to consume CPU time after reaching the soft limit. Portable applications that need to catch this signal should perform an orderly termination upon first receipt of SIGXCPU.)
- RLIMIT_DATA
- The maximum size of the process's data segment (initialized data, uninitialized data, and heap). This limit affects calls to brk() and sbrk(), which fail with the error ENOMEM upon encountering the soft limit of this resource.
- RLIMIT_FSIZE
- The maximum size of files that the process may create. Attempts to extend a file beyond this limit result in delivery of a SIGXFSZ signal. By default, this signal terminates a process, but a process can catch this signal instead, in which case the relevant system call (e.g., write(), truncate()) fails with the error EFBIG.
- RLIMIT_LOCKS
- A limit on the combined number of flock() locks and fcntl() leases that this process may establish. (Early Linux 2.4 only.)
- RLIMIT_MEMLOCK
- The maximum number of bytes of virtual memory that may be locked into RAM using mlock() and mlockall().
- RLIMIT_NOFILE
- Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE.
- RLIMIT_NPROC
- The maximum number of processes that can be created for the real user ID of the calling process. Upon encountering this limit, fork() fails with the error EAGAIN.
- RLIMIT_RSS
- Specifies the limit (in pages) of the process's resident set (the number of virtual pages resident in RAM). This limit only has effect in Linux 2.4 onwards, and there only affects calls to madvise() specifying MADV_WILLNEED.
- RLIMIT_STACK
- The maximum size of the process stack, in bytes. Upon reaching this limit, a SIGSEGV signal is generated. To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)).
RLIMIT_OFILE is the BSD name for RLIMIT_NOFILE.
getrusage returns the current resource usages, for a who of either RUSAGE_SELF or RUSAGE_CHILDREN. The former asks for resources used by the current process, the latter for resources used by those of its children that have terminated and have been waited for.
struct rusage {
    struct timeval ru_utime; /* user time used */
    struct timeval ru_stime; /* system time used */
    long ru_maxrss;          /* maximum resident set size */
    long ru_ixrss;           /* integral shared memory size */
    long ru_idrss;           /* integral unshared data size */
    long ru_isrss;           /* integral unshared stack size */
    long ru_minflt;          /* page reclaims */
    long ru_majflt;          /* page faults */
    long ru_nswap;           /* swaps */
    long ru_inblock;         /* block input operations */
    long ru_oublock;         /* block output operations */
    long ru_msgsnd;          /* messages sent */
    long ru_msgrcv;          /* messages received */
    long ru_nsignals;        /* signals received */
    long ru_nvcsw;           /* voluntary context switches */
    long ru_nivcsw;          /* involuntary context switches */
};
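A short sketch of reading two of these fields is given below. It assumes only the ru_utime and ru_stime members listed in the structure; the helper name cpu_seconds_used is hypothetical.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: total CPU time (user + system) consumed so far
 * by the calling process, in seconds, or -1.0 if getrusage fails. */
double cpu_seconds_used(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1.0;
    /* Each struct timeval holds whole seconds plus microseconds. */
    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}
```

Passing RUSAGE_CHILDREN instead would report the time of terminated, waited-for children.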
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EFAULT
- rlim or usage points outside the accessible address space.
- EINVAL
- getrlimit or setrlimit is called with a bad resource, or getrusage is called with a bad who.
- EPERM
- A non-superuser tries to use setrlimit() to increase the soft or hard limit above the current hard limit, or a superuser tries to increase RLIMIT_NOFILE above the current kernel maximum.
CONFORMING TO
SVr4, BSD 4.3
NOTE
Including
<sys/time.h> is not required these days, but increases portability. (Indeed,
struct timeval is defined in
<sys/time.h>.)
On Linux, if the disposition of SIGCHLD is set to SIG_IGN then the resource usages of child processes are automatically included in the value returned by RUSAGE_CHILDREN, although POSIX 1003.1-2001 explicitly prohibits this.
The above struct was taken from BSD 4.3 Reno. Not all fields are meaningful under Linux. Right now (Linux 2.4, 2.6) only the fields ru_utime, ru_stime, ru_minflt, ru_majflt, and ru_nswap are maintained.
SEE ALSO
dup(2),
fcntl(2),
fork(2),
mlock(2),
mlockall(2),
mmap(2),
open(2),
quotactl(2),
sbrk(2),
wait3(2),
wait4(2),
malloc(3),
ulimit(3),
signal(7)
NAME
getsockname - get socket name
SYNOPSIS
#include <sys/socket.h>
int getsockname(int s, struct sockaddr *name, socklen_t *namelen);
DESCRIPTION
Getsockname returns the current
name for the specified socket. The
namelen parameter should be initialized to indicate the amount of space pointed to by
name. On return it contains the actual size of the name returned (in bytes).
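One common use is finding out which port the system picked after binding to port 0. The sketch below assumes an IPv4 TCP socket; the helper name pick_ephemeral_port is hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: bind a TCP socket to port 0 so the system chooses
 * a port, then ask getsockname which one was assigned.
 * Returns the port in host byte order, or -1 on error. */
int pick_ephemeral_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);     /* in: buffer size, out: bytes written */
    int port = -1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);         /* 0 lets the system pick the port */
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(s, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);
    close(s);                         /* the sketch does not keep the socket */
    return port;
}
```

A real server would keep the socket open and go on to listen(2); the sketch closes it only to stay self-contained.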
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EBADF
- The argument s is not a valid descriptor.
- ENOTSOCK
- The argument s is a file, not a socket.
- ENOBUFS
- Insufficient resources were available in the system to perform the operation.
- EFAULT
- The name parameter points to memory not in a valid part of the process address space.
CONFORMING TO
SVr4, 4.4BSD (the
getsockname function call appeared in 4.2BSD). SVr4 documents additional ENOMEM and ENOSR error codes.
NOTE
The third argument of
getsockname is in reality an `int *` (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t. See also
accept(2).
SEE ALSO
bind(2),
socket(2)
NAME
gettid - get thread identification
SYNOPSIS
#include <sys/types.h>
#include <linux/unistd.h>
_syscall0(pid_t,gettid)
pid_t gettid(void);
DESCRIPTION
gettid returns the thread ID of the current process. This is equal to the process ID (as returned by
getpid(2)), unless the process is part of a thread group (created by specifying the CLONE_THREAD flag to the
clone(2) system call). All processes in the same thread group have the same PID, but each one has a unique TID.
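The sketch below obtains the thread ID through syscall(2) instead of the _syscall0 macro shown in the synopsis; that substitution is an assumption of this example, as is the helper name current_tid and the availability of SYS_gettid in <sys/syscall.h>.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

/* Hypothetical helper: return the thread ID of the caller by invoking
 * the gettid system call through syscall(2). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In a process that is not part of a thread group, current_tid() is expected to equal getpid().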
RETURN VALUE
On success, returns the thread ID of the current process.
ERRORS
This call is always successful.
CONFORMING TO
gettid is Linux specific and should not be used in programs that are intended to be portable.
SEE ALSO
getpid(2),
clone(2),
fork(2)
NAME
getuid, geteuid - get user identity
SYNOPSIS
#include <unistd.h>
#include <sys/types.h>
uid_t getuid(void);
uid_t geteuid(void);
DESCRIPTION
getuid returns the real user ID of the current process.
geteuid returns the effective user ID of the current process.
The real ID corresponds to the ID of the calling process. The effective ID corresponds to the set ID bit on the file being executed.
ERRORS
These functions are always successful.
CONFORMING TO
POSIX, BSD 4.3.
SEE ALSO
setreuid(2),
setuid(2)
NAME
get_kernel_syms - retrieve exported kernel and module symbols
SYNOPSIS
#include <linux/module.h>
int get_kernel_syms(struct kernel_sym *table);
DESCRIPTION
If
table is
NULL,
get_kernel_syms returns the number of symbols available for query. Otherwise it fills in a table of structures:
struct kernel_sym {
    unsigned long value;
    char name[60];
};
The symbols are interspersed with magic symbols of the form #module-name with the kernel having an empty name. The value associated with a symbol of this form is the address at which the module is loaded.
The symbols exported from each module follow their magic module tag and the modules are returned in the reverse order they were loaded.
RETURN VALUE
Returns the number of symbols returned. There is no possible error return.
SEE ALSO
create_module(2),
init_module(2),
delete_module(2),
query_module(2).
BUGS
There is no way to indicate the size of the buffer allocated for
table. If symbols have been added to the kernel since the program queried for the symbol table size, memory will be corrupted.
The length of exported symbol names is limited to 59.
Because of these limitations, this system call is deprecated in favor of query_module.
NAME
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb and friends are hardware specific. The
port and
value arguments are in the opposite order from most DOS implementations.
SEE ALSO
ioperm(2),
iopl(2)
NAME
init_module - initialize a loadable module entry
SYNOPSIS
#include <linux/module.h>
int init_module(const char *name, struct module *image);
DESCRIPTION
init_module loads the relocated module image into kernel space and runs the module's
init function.
The module image begins with a module structure and is followed by code and data as appropriate. The module structure is defined as follows:
struct module {
    unsigned long size_of_struct;
    struct module *next;
    const char *name;
    unsigned long size;
    long usecount;
    unsigned long flags;
    unsigned int nsyms;
    unsigned int ndeps;
    struct module_symbol *syms;
    struct module_ref *deps;
    struct module_ref *refs;
    int (*init)(void);
    void (*cleanup)(void);
    const struct exception_table_entry *ex_table_start;
    const struct exception_table_entry *ex_table_end;
#ifdef __alpha__
    unsigned long gp;
#endif
};
All of the pointer fields, with the exception of next and refs, are expected to point within the module body and be initialized as appropriate for kernel space, i.e. relocated with the rest of the module.
This system call is only open to the superuser.
RETURN VALUE
On success, zero is returned. On error, -1 is returned and
errno is set appropriately.
ERRORS
- EPERM
- The user is not the superuser.
- ENOENT
- No module by that name exists.
- EINVAL
- Some image slot filled in incorrectly, image->name does not correspond to the original module name, some image->deps entry does not correspond to a loaded module, or some other similar inconsistency.
- EBUSY
- The module's initialization routine failed.
- EFAULT
- name or image is outside the program's accessible address space.
SEE ALSO
create_module(2),
delete_module(2),
query_module(2).
NAME
intro, _syscall - Introduction to system calls
DESCRIPTION
This chapter describes the Linux system calls. For a list of the 164 syscalls present in Linux 2.0, see
syscalls(2).
Calling Directly
In most cases, it is unnecessary to invoke a system call directly, but there are times when the Standard C library does not implement a nice function call for you.
Synopsis
#include <linux/unistd.h> A _syscall macro
desired system call
Setup
The important thing to know about a system call is its prototype. You need to know how many arguments, their types, and the function return type. There are six macros that make the actual call into the system easier. They have the form:
_syscallX(type,name,type1,arg1,type2,arg2,...)
where:
- X is 0-5, the number of arguments taken by the system call
- type is the return type of the system call
- name is the name of the system call
- typeN is the Nth argument's type
- argN is the name of the Nth argument
These macros create a function called name with the arguments you specify. Once you include the _syscall() in your source file, you call the system call by name.
EXAMPLE
#include <stdio.h>
#include <errno.h>
#include <linux/unistd.h>   /* for _syscallX macros/related stuff */
#include <linux/kernel.h>   /* for struct sysinfo */

_syscall1(int, sysinfo, struct sysinfo *, info);

int main(void)
{
    struct sysinfo s_info;
    int error;

    error = sysinfo(&s_info);
    printf("code error = %d\n", error);
    printf("Uptime = %lds\nLoad: 1 min %lu / 5 min %lu / 15 min %lu\n"
           "RAM: total %lu / free %lu / shared %lu\n"
           "Memory in buffers = %lu\nSwap: total %lu / free %lu\n"
           "Number of processes = %d\n",
           s_info.uptime, s_info.loads[0],
           s_info.loads[1], s_info.loads[2],
           s_info.totalram, s_info.freeram,
           s_info.sharedram, s_info.bufferram,
           s_info.totalswap, s_info.freeswap,
           s_info.procs);
    return(0);
}
Sample Output
code error = 0
Uptime = 502034s
Load: 1 min 13376 / 5 min 5504 / 15 min 1152
RAM: total 15343616 / free 827392 / shared 8237056
Memory in buffers = 5066752
Swap: total 27881472 / free 24698880
Number of processes = 40
NOTES
The _syscall() macros DO NOT produce a prototype. You may have to create one, especially for C++ users. System calls are not required to return only positive or negative error codes. You need to read the source to be sure how it will return errors. Usually, it is the negative of a standard error code, e.g., -EPERM. The _syscall() macros will return the result r of the system call when r is nonnegative, but will return -1 and set the variable errno to -r when r is negative. For the error codes, see errno(3).
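The return convention described in the paragraph above can be written out as a small function. This only restates the rule as given here and is not the actual macro expansion; the name syscall_result is hypothetical.

```c
#include <errno.h>

/* Hypothetical helper: apply the convention described above to a raw
 * kernel result r. A nonnegative r is returned unchanged; a negative r
 * becomes -1 with errno set to -r. */
long syscall_result(long r)
{
    if (r < 0) {
        errno = (int)-r;              /* e.g. r == -EPERM gives errno == EPERM */
        return -1;
    }
    return r;
}
```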
Some system calls, such as mmap, require more than five arguments. These are handled by pushing the arguments on the stack and passing a pointer to the block of arguments.
When defining a system call, the argument types MUST be passed by-value or by-pointer (for aggregates like structs).
The preferred way to invoke system calls that glibc does not know about yet, is via syscall(2).
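As a rough illustration of that route, the sketch below fetches the same sysinfo data as the example program, but through syscall(2). It assumes that SYS_sysinfo is defined in <sys/syscall.h> and that struct sysinfo comes from <sys/sysinfo.h>; the helper name uptime_seconds is hypothetical.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/sysinfo.h>

/* Hypothetical helper: return the system uptime in seconds by invoking
 * the sysinfo system call through syscall(2), or -1 on error. */
long uptime_seconds(void)
{
    struct sysinfo info;

    if (syscall(SYS_sysinfo, &info) == -1)
        return -1;                    /* errno has been set by syscall() */
    return info.uptime;
}
```

Unlike the _syscall macros, syscall(2) needs no per-call macro instantiation, only the system call number.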
CONFORMING TO
Certain codes are used to indicate Unix variants and standards to which calls in the section conform. These are:
- SVr4
- System V Release 4 Unix, as described in the "Programmer's Reference Manual: Operating System API (Intel processors)" (Prentice-Hall 1992, ISBN 0-13-951294-2)
- SVID
- System V Interface Definition, as described in "The System V Interface Definition, Fourth Edition".
- POSIX.1
- IEEE 1003.1-1990 part 1, aka ISO/IEC 9945-1:1990, aka "IEEE Portable Operating System Interface for Computing Environments", as elucidated in Donald Lewine's "POSIX Programmer's Guide" (O'Reilly & Associates, Inc., 1991, ISBN 0-937175-73-0).
- POSIX.1b
- IEEE Std 1003.1b-1993 (POSIX.1b standard) describing real-time facilities for portable operating systems, aka ISO/IEC 9945-1:1996, as elucidated in "Programming for the real world - POSIX.4" by Bill O. Gallmeister (O'Reilly & Associates, Inc. ISBN 1-56592-074-0).
- SUS, SUSv2
- Single Unix Specification. (Developed by X/Open and The Open Group. See also http://www.UNIX-systems.org/version2/ .)
- 4.3BSD/4.4BSD
- The 4.3 and 4.4 distributions of Berkeley Unix. 4.4BSD was upward-compatible from 4.3.
- V7
- Version 7, the ancestral Unix from Bell Labs.
FILES
/usr/include/linux/unistd.h
SEE ALSO
syscall(2), errno(3)
NAME
ioperm - set port input/output permissions
SYNOPSIS
#include <unistd.h> /* for libc5 */
#include <sys/io.h> /* for glibc */
int ioperm(unsigned long from, unsigned long num, int turn_on);
DESCRIPTION
Ioperm sets the port access permission bits for the process for
num bytes starting from port address
from to the value
turn_on. The use of
ioperm requires root privileges.
Only the first 0x3ff I/O ports can be specified in this manner. For more ports, the iopl function must be used. Permissions are not inherited on fork, but on exec they are. This is useful for giving port access permissions to non-privileged tasks.
This call is mostly for the i386 architecture. On many other architectures it does not exist or will always return an error.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EINVAL
- Invalid values for from or num.
- EPERM
- Caller does not have the CAP_SYS_RAWIO capability.
- EIO
- (on ppc) This call is not supported.
CONFORMING TO
ioperm is Linux specific and should not be used in programs intended to be portable.
NOTES
Libc5 treats it as a system call and has a prototype in
<unistd.h>. Glibc1 does not have a prototype. Glibc2 has a prototype both in
<sys/io.h> and in
<sys/perm.h>. Avoid the latter, it is available on i386 only.
SEE ALSO
iopl(2)
NAME
io_cancel - Cancel an outstanding asynchronous I/O operation
SYNOPSIS
#include <linux/aio.h>
long io_cancel(aio_context_t ctx_id, struct iocb *iocb, struct io_event *result);
DESCRIPTION
io_cancel attempts to cancel an asynchronous I/O operation previously submitted with the io_submit system call. ctx_id is the AIO context ID of the operation to be cancelled. If the AIO context is found, the event will be cancelled and then copied into the memory pointed to by result without being placed into the completion queue.
RETURN VALUE
io_cancel returns 0 on success; otherwise, it returns one of the errors listed in the "Errors" section.
ERRORS
- EINVAL
- The AIO context specified by ctx_id is invalid.
- EFAULT
- One of the data structures points to invalid data.
- EAGAIN
- The iocb specified was not cancelled.
- ENOSYS
- io_cancel is not implemented on this architecture.
VERSIONS
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
CONFORMING TO
io_cancel is Linux specific and should not be used in programs that are intended to be portable.
SEE ALSO
io_setup(2), io_destroy(2), io_getevents(2), io_submit(2).
NOTES
The asynchronous I/O system calls were written by Benjamin LaHaise.
AUTHOR
Kent Yoder.
NAME
io_getevents - Read asynchronous I/O events from the completion queue
SYNOPSIS
#include <linux/time.h>
#include <linux/aio.h>
long io_getevents(aio_context_t ctx_id, long min_nr, long nr, struct io_event *events, struct timespec *timeout);
DESCRIPTION
io_getevents attempts to read at least min_nr events and up to nr events from the completion queue of the AIO context specified by ctx_id. timeout specifies the amount of time to wait for events, where a NULL timeout waits until at least min_nr events have been seen. Note that timeout is relative and will be updated if not NULL and the operation blocks.
RETURN VALUE
io_getevents returns the number of events read. This may be 0 if no events are available, or less than min_nr if the timeout has elapsed.
ERRORS
- EINVAL
- ctx_id is invalid. min_nr is out of range or nr is out of range.
- EFAULT
- Either events or timeout is an invalid pointer.
- ENOSYS
- io_getevents is not implemented on this architecture.
CONFORMING TO
io_getevents is Linux specific and should not be used in programs that are intended to be portable.
VERSIONS
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
SEE ALSO
io_setup(2), io_submit(2), io_cancel(2), io_destroy(2).
NOTES
The asynchronous I/O system calls were written by Benjamin LaHaise.
AUTHOR
Kent Yoder.
NAME
io_submit - Submit asynchronous I/O blocks for processing
SYNOPSIS
#include <linux/aio.h>
long io_submit(aio_context_t ctx_id, long nr, struct iocb **iocbpp);
DESCRIPTION
io_submit queues nr I/O request blocks for processing in the AIO context ctx_id. iocbpp should be an array of nr AIO request blocks, which will be submitted to context ctx_id.
RETURN VALUE
On success, io_submit returns the number of iocbs submitted, which is 0 if nr is zero.
ERRORS
- EINVAL
- The aio_context specified by ctx_id is invalid. nr is less than 0. The iocb at *iocbpp[0] is not properly initialized, or the operation specified is invalid for the file descriptor in the iocb.
- EFAULT
- One of the data structures points to invalid data.
- EBADF
- The file descriptor specified in the first iocb is invalid.
- EAGAIN
- Insufficient resources are available to queue any iocbs.
- ENOSYS
- io_submit is not implemented on this architecture.
CONFORMING TO
io_submit is Linux specific and should not be used in programs that are intended to be portable.
VERSIONS
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
SEE ALSO
io_setup(2), io_destroy(2), io_getevents(2), io_cancel(2).
NOTES
The asynchronous I/O system calls were written by Benjamin LaHaise.
AUTHOR
Kent Yoder.
NAME
kill - send signal to a process
SYNOPSIS
#include <sys/types.h>
#include <signal.h>
int kill(pid_t pid, int sig);
DESCRIPTION
The
kill system call can be used to send any signal to any process group or process.
If pid is positive, then signal sig is sent to pid.
If pid equals 0, then sig is sent to every process in the process group of the current process.
If pid equals -1, then sig is sent to every process except for process 1 (init), but see below.
If pid is less than -1, then sig is sent to every process in the process group -pid.
If sig is 0, then no signal is sent, but error checking is still performed.
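The sig 0 case is often used to probe whether a process exists. The sketch below is one possible way to do that; the helper name process_exists is hypothetical, and treating EPERM as "exists" is a design choice of the example.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: probe for a process without signalling it.
 * Returns 1 if pid exists, 0 if it does not, -1 on any other error. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;                     /* exists and we may signal it */
    if (errno == EPERM)
        return 1;                     /* exists, but we lack permission */
    if (errno == ESRCH)
        return 0;                     /* no such process or process group */
    return -1;
}
```

For example, process_exists(getpid()) reports 1 for the caller itself. Note that a zombie still counts as an existing process.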
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EINVAL
- An invalid signal was specified.
- ESRCH
- The pid or process group does not exist. Note that an existing process might be a zombie, a process which already committed termination, but has not yet been wait()ed for.
- EPERM
- The process does not have permission to send the signal to any of the receiving processes. For a process to have permission to send a signal to process pid it must either have root privileges, or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the receiving process. In the case of SIGCONT it suffices when the sending and receiving processes belong to the same session.
NOTES
It is impossible to send a signal to task number one, the init process, for which it has not installed a signal handler. This is done to assure the system is not brought down accidentally.
POSIX 1003.1-2001 requires that kill(-1,sig) send sig to all processes that the current process may send signals to, except possibly for some implementation-defined system processes. Linux allows a process to signal itself, but on Linux the call kill(-1,sig) does not signal the current process.
LINUX HISTORY
Across different kernel versions, Linux has enforced different rules for the permissions required for an unprivileged process to send a signal to another process. In kernels 1.0 to 1.2.2, a signal could be sent if the effective user ID of the sender matched that of the receiver, or the real user ID of the sender matched that of the receiver. From kernel 1.2.3 until 1.3.77, a signal could be sent if the effective user ID of the sender matched either the real or effective user ID of the receiver. The current rules, which conform to POSIX 1003.1-2001, were adopted in kernel 1.3.78.
CONFORMING TO
SVr4, SVID, POSIX.1, X/OPEN, BSD 4.3, POSIX 1003.1-2001
SEE ALSO
_exit(2),
killpg(2),
signal(2),
tkill(2),
exit(3),
signal(7)
NAME
chown, fchown, lchown - change ownership of a file
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);
DESCRIPTION
The owner of the file specified by
path or by
fd is changed. Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. The super-user may change the group arbitrarily.
If the owner or group is specified as -1, then that ID is not changed.
When the owner or group of an executable file are changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown; the Linux behaviour depends on the kernel version. In case of a non-group-executable file (with clear S_IXGRP bit) the S_ISGID bit indicates mandatory locking, and is not cleared by a chown.
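As an illustration of the -1 convention described above, the sketch below changes only the group of a file and leaves the owner alone. The helper name change_group_only is an assumption made for this example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: change only the group of path. Passing -1 as the
 * owner tells chown() to leave that ID unchanged.
 * Returns 0 on success, -1 with errno set on failure. */
int change_group_only(const char *path, gid_t group)
{
    return chown(path, (uid_t)-1, group);
}
```

An unprivileged caller can expect this to succeed only when it owns the file and is a member of the target group; otherwise EPERM is the likely result.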
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for
chown are listed below:
- EPERM
- The effective UID does not match the owner of the file, and is not zero; or the owner or group were specified incorrectly.
- EROFS
- The named file resides on a read-only file system.
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of the path prefix is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix.
- ELOOP
- Too many symbolic links were encountered in resolving path.
The general errors for fchown are listed below:
- EBADF
- The descriptor is not valid.
- ENOENT
- See above.
- EPERM
- See above.
- EROFS
- See above.
- EIO
- A low-level I/O error occurred while modifying the inode.
NOTES
In versions of Linux prior to 2.1.81 (and distinct from 2.1.46),
chown did not follow symbolic links. Since Linux 2.1.81,
chown does follow symbolic links, and there is a new system call
lchown that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old
chown) has got the same syscall number, and
chown got the newly introduced number.
The prototype for fchown is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).
CONFORMING TO
The
chown call conforms to SVr4, SVID, POSIX, X/OPEN. The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files). SVr4 documents EINVAL, EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document ENOMEM or ELOOP error conditions.
The fchown call conforms to 4.4BSD and SVr4. SVr4 documents additional EINVAL, EIO, EINTR, and ENOLINK error conditions.
RESTRICTIONS
The
chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because
chown() may cause immediate access revocation on already open files. Client side caching may lead to a delay between the time when ownership has been changed to allow access for a user and the time when the file can actually be accessed by the user on other clients.
SEE ALSO
chmod(2),
flock(2)
NAME
link - make a new name for a file
SYNOPSIS
#include <unistd.h>
int link(const char *oldpath, const char *newpath);
DESCRIPTION
link creates a new link (also known as a hard link) to an existing file.
If newpath exists it will not be overwritten.
This new name may be used exactly as the old one for any operation; both names refer to the same file (and so have the same permissions and ownership) and it is impossible to tell which name was the `original`.
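The claim that both names refer to the same file can be checked by comparing the device and inode numbers reported by stat(2). The sketch below does that; the helper names same_file and link_and_check are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both paths name the same file
 * (same device and inode), 0 if not, -1 if either stat() fails. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Create newpath as a hard link to oldpath and confirm the result.
 * Returns 1 when both names refer to one file, -1 on failure. */
int link_and_check(const char *oldpath, const char *newpath)
{
    if (link(oldpath, newpath) == -1)
        return -1;           /* e.g. EEXIST, EXDEV, EPERM */
    return same_file(oldpath, newpath);
}
```

After a successful call the st_nlink field of the file is one higher than before, and nothing in the stat data distinguishes the old name from the new one.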
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EXDEV
- oldpath and newpath are not on the same filesystem.
- EPERM
- The filesystem containing oldpath and newpath does not support the creation of hard links.
- EFAULT
- oldpath or newpath points outside your accessible address space.
- EACCES
- Write access to the directory containing newpath is not allowed for the process`s effective uid, or one of the directories in oldpath or newpath did not allow search (execute) permission.
- ENAMETOOLONG
- oldpath or newpath was too long.
- ENOENT
- A directory component in oldpath or newpath does not exist or is a dangling symbolic link.
- ENOTDIR
- A component used as a directory in oldpath or newpath is not, in fact, a directory.
- ENOMEM
- Insufficient kernel memory was available.
- EROFS
- The file is on a read-only filesystem.
- EEXIST
- newpath already exists.
- EMLINK
- The file referred to by oldpath already has the maximum number of links to it.
- ELOOP
- Too many symbolic links were encountered in resolving oldpath or newpath.
- ENOSPC
- The device containing the file has no room for the new directory entry.
- EPERM
- oldpath is a directory.
- EIO
- An I/O error occurred.
NOTES
Hard links, as created by
link, cannot span filesystems. Use
symlink if this is required.
CONFORMING TO
SVr4, SVID, POSIX, BSD 4.3, X/OPEN. SVr4 documents additional ENOLINK and EMULTIHOP error conditions; POSIX.1 does not document ELOOP. X/OPEN does not document EFAULT, ENOMEM or EIO.
BUGS
On NFS file systems, the return code may be wrong in case the NFS server performs the link creation and dies before it can say so. Use
stat(2) to find out if the link got created.
SEE ALSO
symlink(2),
unlink(2),
rename(2),
open(2),
stat(2),
ln(1)
NAME
listxattr, llistxattr, flistxattr - list extended attribute names
SYNOPSIS
#include <sys/types.h>
#include <attr/xattr.h>
ssize_t listxattr (const char *path, char *list, size_t size);
ssize_t llistxattr (const char *path, char *list, size_t size);
ssize_t flistxattr (int filedes, char *list, size_t size);
DESCRIPTION
Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the
stat(2) data). A complete overview of extended attributes concepts can be found in
attr(5).
listxattr retrieves the list of extended attribute names associated with the given path in the filesystem. The list is the set of (NULL-terminated) names, one after the other. Names of extended attributes to which the calling process does not have access may be omitted from the list. The length of the attribute name list is returned.
llistxattr is identical to listxattr, except in the case of a symbolic link, where the list of names of extended attributes associated with the link itself is retrieved, not the file that it refers to.
flistxattr is identical to listxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.
A single extended attribute name is a simple NULL-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode.
An empty buffer of size zero can be passed into these calls to return the current size of the list of extended attribute names, which can be used to estimate the size of a buffer which is sufficiently large to hold the list of names.
EXAMPLES
The
list of names is returned as an unordered array of NULL-terminated character strings (attribute names are separated by NULL characters), like this:
-
user.name1 system.name1 user.name2
Filesystems like ext2, ext3 and XFS which implement POSIX ACLs using extended attributes, might return a
list like this:
-
system.posix_acl_access system.posix_acl_default
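The size probe and the layout of the returned list can be sketched as follows. This is an illustration only: it includes <sys/xattr.h>, which is where glibc declares these calls, rather than the <attr/xattr.h> header named in the SYNOPSIS, and the helper names read_xattr_list and count_xattr_names are hypothetical.

```c
#include <sys/types.h>
#include <sys/xattr.h>   /* assumption: glibc header, not <attr/xattr.h> */
#include <stdlib.h>
#include <string.h>

/* Fetch the name list for path into a malloc'd buffer.
 * On success returns the list length and stores the buffer in *out
 * (NULL when the list is empty; the caller frees it). Returns -1 on error. */
ssize_t read_xattr_list(const char *path, char **out)
{
    *out = NULL;
    ssize_t need = listxattr(path, NULL, 0);    /* size probe */
    if (need <= 0)
        return need;
    char *buf = malloc((size_t)need);
    if (buf == NULL)
        return -1;
    ssize_t got = listxattr(path, buf, (size_t)need);
    if (got == -1) {         /* e.g. ERANGE if the list grew meanwhile */
        free(buf);
        return -1;
    }
    *out = buf;
    return got;
}

/* Count the names in a buffer of len bytes that holds terminated
 * strings one after the other. Stops early on a malformed tail. */
size_t count_xattr_names(const char *list, size_t len)
{
    size_t n = 0;
    const char *p = list;
    const char *end = list + len;
    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            break;           /* last name lacks its terminator */
        n++;
        p = nul + 1;         /* step past the terminator */
    }
    return n;
}
```

Because the size returned by the probe is only an estimate, a careful caller retries when the second call fails with ERANGE.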
RETURN VALUE
On success, a positive number is returned indicating the size of the extended attribute name list. On failure, -1 is returned and
errno is set appropriately.
If the size of the list buffer is too small to hold the result, errno is set to ERANGE.
If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.
The errors documented for the stat(2) system call are also applicable here.
AUTHORS
Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.
SEE ALSO
getfattr(1),
setfattr(1),
open(2),
stat(2),
getxattr(2),
setxattr(2),
removexattr(2), and
attr(5).
NAME
_llseek - reposition read/write file offset
SYNOPSIS
#include <unistd.h>
#include <linux/unistd.h>
_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res, uint, wh);
int _llseek(unsigned int fd, unsigned long offset_high, unsigned long offset_low, loff_t *result, unsigned int whence);
DESCRIPTION
The
_llseek function repositions the offset of the file descriptor
fd to
(offset_high<<32) | offset_low bytes relative to the beginning of the file, the current position in the file, or the end of the file, depending on whether
whence is
SEEK_SET,
SEEK_CUR, or
SEEK_END, respectively. It returns the resulting file position in the argument
result.
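The way a 64-bit offset maps onto the offset_high and offset_low arguments can be shown with plain arithmetic. The sketch below does not invoke the system call itself; the helper names split_offset and join_offset are hypothetical.

```c
#include <stdint.h>

/* Split a 64-bit offset into the two 32-bit halves _llseek expects. */
void split_offset(uint64_t offset, unsigned long *hi, unsigned long *lo)
{
    *hi = (unsigned long)(offset >> 32);
    *lo = (unsigned long)(offset & 0xffffffffu);
}

/* Recombine the halves as the description states: (hi << 32) | lo. */
uint64_t join_offset(unsigned long hi, unsigned long lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)(lo & 0xffffffffu);
}
```

For example, an offset of 4 GiB plus 2 bytes splits into offset_high 1 and offset_low 2.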
RETURN VALUE
Upon successful completion,
_llseek returns 0. Otherwise, a value of -1 is returned and
errno is set to indicate the error.
ERRORS
- EBADF
- fd is not an open file descriptor.
- EINVAL
- whence is invalid.
- EFAULT
- Problem with copying results to user space.
CONFORMING TO
This function is Linux-specific, and should not be used in programs intended to be portable.
SEE ALSO
lseek(2)
NAME
lookup_dcookie - return a directory entry`s path
SYNOPSIS
int lookup_dcookie(u64 cookie, char * buffer, size_t len);
DESCRIPTION
Look up the full path of the directory entry specified by the value
cookie. The cookie is an opaque identifier uniquely identifying a particular directory entry. The buffer given is filled in with the full path of the directory entry.
For lookup_dcookie to return successfully, the kernel must still hold a cookie reference to the directory entry.
NOTES
lookup_dcookie is a special-purpose system call, currently used only by the oprofile profiler. It relies on a kernel driver to register cookies for directory entries.
The path returned may be suffixed by the string " (deleted)" if the directory entry has been removed.
RETURN VALUE
On success,
lookup_dcookie returns the length of the path string copied into the buffer. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EPERM
- The process does not have the capability to look up cookie values.
- EINVAL
- The kernel has no registered cookie/directory entry mappings at the time of lookup, or the cookie does not refer to a valid directory entry.
- ENOMEM
- The kernel could not allocate memory for the temporary buffer holding the path.
- ERANGE
- The buffer was not large enough to hold the path of the directory entry.
- ENAMETOOLONG
- The name could not fit in the buffer.
- EFAULT
- The buffer was not valid.
CONFORMING TO
lookup_dcookie is Linux-specific.
AVAILABILITY
Since Linux 2.5.43. The ENAMETOOLONG error return was added in 2.5.70.
NAME
lseek - reposition read/write file offset
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fildes, off_t offset, int whence);
DESCRIPTION
The
lseek function repositions the offset of the file descriptor
fildes to the argument
offset according to the directive
whence as follows:
- SEEK_SET
- The offset is set to offset bytes.
- SEEK_CUR
- The offset is set to its current location plus offset bytes.
- SEEK_END
- The offset is set to the size of the file plus offset bytes.
The lseek function allows the file offset to be set beyond the end of the existing end-of-file of the file (but this does not change the size of the file). If data is later written at this point, subsequent reads of the data in the gap return bytes of zeros (until data is actually written into the gap).
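The behaviour of the gap can be observed with a short experiment: seek past the end of an empty file, write one byte, and read back the start of the hole. The sketch below assumes fd refers to an empty regular file opened for reading and writing; the helper name first_hole_byte is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Write one byte at offset gap (gap > 0) in an empty file, then read
 * back the first byte of the hole that precedes it.
 * Returns that byte (expected to be 0), or -1 on any failure. */
int first_hole_byte(int fd, off_t gap)
{
    unsigned char c = 0xff;
    if (lseek(fd, gap, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, "x", 1) != 1)          /* file size becomes gap + 1 */
        return -1;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(fd, &c, 1) != 1)
        return -1;
    return c;
}
```

Afterwards, lseek(fd, 0, SEEK_END) reports gap + 1, since seeking alone never changes the file size but the write does.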
RETURN VALUE
Upon successful completion,
lseek returns the resulting offset location as measured in bytes from the beginning of the file. Otherwise, a value of (off_t)-1 is returned and
errno is set to indicate the error.
ERRORS
- EBADF
- fildes is not an open file descriptor.
- ESPIPE
- fildes is associated with a pipe, socket, or FIFO.
- EINVAL
- whence is not one of SEEK_SET, SEEK_CUR, SEEK_END, or the resulting file offset would be negative.
- EOVERFLOW
- The resulting file offset cannot be represented in an off_t.
CONFORMING TO
SVr4, POSIX, BSD 4.3
RESTRICTIONS
Some devices are incapable of seeking and POSIX does not specify which devices must support it.
Linux specific restrictions: using lseek on a tty device returns ESPIPE.
NOTES
This document`s use of
whence is incorrect English, but maintained for historical reasons.
When converting old code, substitute values for whence with the following macros:
| old | new |
| 0 | SEEK_SET |
| 1 | SEEK_CUR |
| 2 | SEEK_END |
| L_SET | SEEK_SET |
| L_INCR | SEEK_CUR |
| L_XTND | SEEK_END |
SVR1-3 returns long instead of off_t, BSD returns int.
Note that file descriptors created by dup(2) or fork(2) share the current file position pointer, so seeking on such files may be subject to race conditions.
SEE ALSO
dup(2),
fork(2),
open(2),
fseek(3)
NAME
stat, fstat, lstat - get file status
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int stat(const char *file_name, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);
DESCRIPTION
These functions return information about the specified file. You do not need any access rights to the file to get this information but you need search rights to all directories named in the path leading to the file.
stat stats the file pointed to by file_name and fills in buf.
lstat is identical to stat, except in the case of a symbolic link, where the link itself is stat-ed, not the file that it refers to.
fstat is identical to stat, only the open file pointed to by filedes (as returned by open(2)) is stat-ed in place of file_name.
They all return a stat structure, which contains the following fields:
-
struct stat {
    dev_t     st_dev;     /* device */
    ino_t     st_ino;     /* inode */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device type (if inode device) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;  /* number of blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};
The value st_size gives the size of the file (if it is a regular file or a symlink) in bytes. The size of a symlink is the length of the pathname it contains, without trailing NUL.
The value st_blocks gives the size of the file in 512-byte blocks. (This may be smaller than st_size/512 e.g. when the file has holes.) The value st_blksize gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)
Not all of the Linux filesystems implement all of the time fields. Some file system types allow mounting in such a way that file accesses do not cause an update of the st_atime field. (See `noatime` in mount(8).)
The field st_atime is changed by file accesses, e.g. by execve(2), mknod(2), pipe(2), utime(2) and read(2) (of more than zero bytes). Other routines, like mmap(2), may or may not update st_atime.
The field st_mtime is changed by file modifications, e.g. by mknod(2), truncate(2), utime(2) and write(2) (of more than zero bytes). Moreover, st_mtime of a directory is changed by the creation or deletion of files in that directory. The st_mtime field is not changed for changes in owner, group, hard link count, or mode.
The field st_ctime is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.).
The following POSIX macros are defined to check the file type:
-
- S_ISREG(m)
- is it a regular file?
- S_ISDIR(m)
- directory?
- S_ISCHR(m)
- character device?
- S_ISBLK(m)
- block device?
- S_ISFIFO(m)
- fifo?
- S_ISLNK(m)
- symbolic link? (Not in POSIX.1-1996.)
- S_ISSOCK(m)
- socket? (Not in POSIX.1-1996.)
The following flags are defined for the st_mode field:
| S_IFMT | 0170000 | bitmask for the file type bitfields |
| S_IFSOCK | 0140000 | socket |
| S_IFLNK | 0120000 | symbolic link |
| S_IFREG | 0100000 | regular file |
| S_IFBLK | 0060000 | block device |
| S_IFDIR | 0040000 | directory |
| S_IFCHR | 0020000 | character device |
| S_IFIFO | 0010000 | fifo |
| S_ISUID | 0004000 | set UID bit |
| S_ISGID | 0002000 | set GID bit (see below) |
| S_ISVTX | 0001000 | sticky bit (see below) |
| S_IRWXU | 00700 | mask for file owner permissions |
| S_IRUSR | 00400 | owner has read permission |
| S_IWUSR | 00200 | owner has write permission |
| S_IXUSR | 00100 | owner has execute permission |
| S_IRWXG | 00070 | mask for group permissions |
| S_IRGRP | 00040 | group has read permission |
| S_IWGRP | 00020 | group has write permission |
| S_IXGRP | 00010 | group has execute permission |
| S_IRWXO | 00007 | mask for permissions for others (not in group) |
| S_IROTH | 00004 | others have read permission |
| S_IWOTH | 00002 | others have write permission |
| S_IXOTH | 00001 | others have execute permission |
The set GID bit (S_ISGID) has several special uses: For a directory it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective gid of the creating process, and directories created there will also get the S_ISGID bit set. For a file that does not have the group execution bit (S_IXGRP) set, it indicates mandatory file/record locking. The `sticky` bit (S_ISVTX) on a directory means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, and by root.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EBADF
- filedes is bad.
- ENOENT
- A component of the path file_name does not exist, or the path is an empty string.
- ENOTDIR
- A component of the path is not a directory.
- ELOOP
- Too many symbolic links encountered while traversing the path.
- EFAULT
- Bad address.
- EACCES
- Permission denied.
- ENOMEM
- Out of memory (i.e. kernel memory).
- ENAMETOOLONG
- File name too long.
CONFORMING TO
The
stat and
fstat calls conform to SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The
lstat call conforms to 4.3BSD and SVr4. SVr4 documents additional
fstat error conditions EINTR, ENOLINK, and EOVERFLOW. SVr4 documents additional
stat and
lstat error conditions EACCES, EINTR, EMULTIHOP, ENOLINK, and EOVERFLOW. Use of the
st_blocks and
st_blksize fields may be less portable. (They were introduced in BSD and are not specified by POSIX. The interpretation differs between systems, and possibly on a single system when NFS mounts are involved.)
POSIX does not describe the S_IFMT, S_IFSOCK, S_IFLNK, S_IFREG, S_IFBLK, S_IFDIR, S_IFCHR, S_IFIFO, S_ISVTX bits, but instead demands the use of the macros S_ISDIR(), etc. The S_ISLNK and S_ISSOCK macros are not in POSIX.1-1996, but both will be in the next POSIX standard; the former is from SVID 4v2, the latter from SUSv2.
Unix V7 (and later systems) had S_IREAD, S_IWRITE, S_IEXEC, where POSIX prescribes the synonyms S_IRUSR, S_IWUSR, S_IXUSR.
OTHER SYSTEMS
Values that have been (or are) in use on various systems:
| hex | name | ls | octal | description
|
| f000 | S_IFMT | | 170000 | mask for file type
|
| 0000 | | | 000000 | SCO out-of-service inode, BSD unknown type
|
| | | | SVID-v2 and XPG2 have both 0 and 0100000 for ordinary file
|
| 1000 | S_IFIFO | p| | 010000 | fifo (named pipe)
|
| 2000 | S_IFCHR | c | 020000 | character special (V7)
|
| 3000 | S_IFMPC | | 030000 | multiplexed character special (V7)
|
| 4000 | S_IFDIR | d/ | 040000 | directory (V7)
|
| 5000 | S_IFNAM | | 050000 | XENIX named special file
|
| | | | with two subtypes, distinguished by st_rdev values 1, 2:
|
| 0001 | S_INSEM | s | 000001 | XENIX semaphore subtype of IFNAM
|
| 0002 | S_INSHD | m | 000002 | XENIX shared data subtype of IFNAM
|
| 6000 | S_IFBLK | b | 060000 | block special (V7)
|
| 7000 | S_IFMPB | | 070000 | multiplexed block special (V7)
|
| 8000 | S_IFREG | - | 100000 | regular (V7)
|
| 9000 | S_IFCMP | | 110000 | VxFS compressed
|
| 9000 | S_IFNWK | n | 110000 | network special (HP-UX)
|
| a000 | S_IFLNK | l@ | 120000 | symbolic link (BSD)
|
| b000 | S_IFSHAD | | 130000 | Solaris shadow inode for ACL (not seen by userspace)
|
| c000 | S_IFSOCK | s= | 140000 | socket (BSD; also "S_IFSOC" on VxFS)
|
| d000 | S_IFDOOR | D> | 150000 | Solaris door
|
| e000 | S_IFWHT | w% | 160000 | BSD whiteout (not used for inode)
|
| | | |
|
| 0200 | S_ISVTX | | 001000 | `sticky bit`: save swapped text even after use (V7)
|
| | | | reserved (SVID-v2)
|
| | | | On non-directories: don`t cache this file (SunOS)
|
| | | | On directories: restricted deletion flag (SVID-v4.2)
|
| 0400 | S_ISGID | | 002000 | set group ID on execution (V7)
|
| | | | for directories: use BSD semantics for propagation of gid
|
| 0400 | S_ENFMT | | 002000 | SysV file locking enforcement (shared w/ S_ISGID)
|
| 0800 | S_ISUID | | 004000 | set user ID on execution (V7)
|
| 0800 | S_CDF | | 004000 | directory is a context dependent file (HP-UX)
|
A sticky command appeared in Version 32V AT&T UNIX.
SEE ALSO
chmod(2),
chown(2),
readlink(2),
utime(2)
NAME
mbind - Set memory policy for a memory range
SYNOPSIS
#include <numaif.h>
int mbind(void *start, unsigned long len, int policy, unsigned long *nodemask, unsigned long maxnode, unsigned flags);
DESCRIPTION
mbind sets the NUMA memory
policy for the memory range starting with
start and length
len. The memory of a NUMA machine is divided into multiple nodes. The memory policy defines in which node memory is allocated.
mbind affects only new allocations; if the pages inside the range have already been touched before the policy is set, the policy has no effect on them.
Available policies are MPOL_DEFAULT, MPOL_BIND, MPOL_INTERLEAVE, MPOL_PREFERRED. All policies except MPOL_DEFAULT require the caller to specify the nodes they apply to in the nodemask parameter. nodemask is a bit field of nodes that contains up to maxnode bits. The node mask bit field size is rounded to the next multiple of sizeof(unsigned long), but the kernel will only use bits up to maxnode.
When MPOL_MF_STRICT is passed in the flags parameter, EIO will be returned if the existing pages in the mapping do not follow the policy.
The MPOL_DEFAULT policy is the default and means to use the underlying process policy (which can be modified with set_mempolicy(2) ). Unless the process policy has been changed this means to allocate memory on the node of the CPU that triggered the allocation. nodemask should be passed as NULL.
The MPOL_BIND policy is a strict policy that restricts memory allocation to the nodes specified in nodemask. There won`t be allocations on other nodes.
MPOL_INTERLEAVE interleaves allocations to the nodes specified in nodemask. This optimizes for bandwidth instead of latency. To be effective the memory area should be fairly large, at least 1MB or bigger.
MPOL_PREFERRED sets the preferred node for allocation. The kernel will try to allocate in this node first and fall back to other nodes when the preferred node is low on free memory. Only the first node in the nodemask is used. When no node is set in the mask the current node is used for allocation.
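Building the nodemask bit field described above is ordinary bit manipulation on an array of unsigned long. The sketch below covers only that step and does not call mbind, which would additionally need <numaif.h> and linking with -lnuma; the names NODE_BITS_PER_LONG, nodemask_longs and nodemask_set are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define NODE_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold maxnode bits. */
size_t nodemask_longs(unsigned long maxnode)
{
    return (maxnode + NODE_BITS_PER_LONG - 1) / NODE_BITS_PER_LONG;
}

/* Mark node as a member of the mask.
 * The caller guarantees that the array is large enough for node. */
void nodemask_set(unsigned long *mask, unsigned long node)
{
    mask[node / NODE_BITS_PER_LONG] |= 1UL << (node % NODE_BITS_PER_LONG);
}
```

A zero-initialised array with the bits for nodes 0 and 1 set would then be passed as nodemask, together with the number of bits as maxnode, to bind a range to those two nodes under MPOL_BIND.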
RETURN VALUE
mbind returns -1 when an error occurred, otherwise 0.
ERRORS
- EFAULT
- There was an unmapped hole in the specified memory range, or a pointer passed to the call was not valid.
- EINVAL
- An illegal parameter was passed.
- ENOMEM
- System out of memory.
- EIO
- MPOL_MF_STRICT was specified and an existing page was already on a wrong node.
NOTES
For a higher level interface it is recommended to use the functions in
numa(3). Until glibc supports these system calls you can link with -lnuma to get system call definitions.
MPOL_MF_STRICT is ignored on huge page mappings right now. For preferred and interleave mappings it will only accept the first choice node.
For MPOL_INTERLEAVE mode the interleaving is changed at fault time. The final layout of the pages depends on the order they were faulted in first.
SEE ALSO
numa(3), numactl(8), set_mempolicy(2), get_mempolicy(2), mmap(2)
NAME
mkdir - create a directory
SYNOPSIS
#include <sys/stat.h>
#include <sys/types.h>
int mkdir(const char *pathname, mode_t mode);
DESCRIPTION
mkdir attempts to create a directory named
pathname.
The parameter mode specifies the permissions to use. It is modified by the process`s umask in the usual way: the permissions of the created directory are (mode & ~umask & 0777). Other mode bits of the created directory depend on the operating system. For Linux, see below.
The newly created directory will be owned by the effective uid of the process. If the directory containing the file has the set group id bit set, or if the filesystem is mounted with BSD group semantics, the new directory will inherit the group ownership from its parent; otherwise it will be owned by the effective gid of the process.
If the parent directory has the set group id bit set then so will the newly created directory.
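The effect of the umask on the requested mode can be worked through in code. The sketch below evaluates the formula from the description and, separately, creates a directory and reports the permission bits it received; the helper names effective_dir_mode and mkdir_report are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Permission bits a new directory receives for a requested mode and a
 * given umask, following the formula (mode & ~umask & 0777). */
mode_t effective_dir_mode(mode_t mode, mode_t mask)
{
    return mode & ~mask & 0777;
}

/* Create pathname and report the permission bits it actually received.
 * Returns 0 on success and stores the bits in *got; -1 on failure. */
int mkdir_report(const char *pathname, mode_t mode, mode_t *got)
{
    struct stat sb;
    if (mkdir(pathname, mode) == -1)
        return -1;           /* e.g. EEXIST, EACCES, ENOENT */
    if (stat(pathname, &sb) == -1)
        return -1;
    *got = sb.st_mode & 0777;
    return 0;
}
```

With a requested mode of 0777 and the common umask of 022, the formula gives 0755.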
RETURN VALUE
mkdir returns zero on success, or -1 if an error occurred (in which case,
errno is set appropriately).
ERRORS
- EPERM
- The filesystem containing pathname does not support the creation of directories.
- EEXIST
- pathname already exists (not necessarily as a directory). This includes the case where pathname is a symbolic link, dangling or not.
- EFAULT
- pathname points outside your accessible address space.
- EACCES
- The parent directory does not allow write permission to the process, or one of the directories in pathname did not allow search (execute) permission.
- ENAMETOOLONG
- pathname was too long.
- ENOENT
- A directory component in pathname does not exist or is a dangling symbolic link.
- ENOTDIR
- A component used as a directory in pathname is not, in fact, a directory.
- ENOMEM
- Insufficient kernel memory was available.
- EROFS
- pathname refers to a file on a read-only filesystem.
- ELOOP
- Too many symbolic links were encountered in resolving pathname.
- ENOSPC
- The device containing pathname has no room for the new directory.
- ENOSPC
- The new directory cannot be created because the user`s disk quota is exhausted.
CONFORMING TO
SVr4, POSIX, BSD, SYSV, X/OPEN. SVr4 documents additional EIO, EMULTIHOP and ENOLINK error conditions; POSIX.1 omits ELOOP.
NOTES
Under Linux apart from the permission bits, only the S_ISVTX mode bit is honored. That is, under Linux the created directory actually gets mode (
mode & ~
umask & 01777). See also
stat(2).
There are many infelicities in the protocol underlying NFS. Some of these affect mkdir.
SEE ALSO
mkdir(1),
chmod(2),
mknod(2),
mount(2),
rmdir(2),
stat(2),
umask(2),
unlink(2)
NAME
mlock - disable paging for some parts of memory
SYNOPSIS
#include <sys/mman.h>
int mlock(const void *addr, size_t len);
DESCRIPTION
mlock disables paging for the memory in the range starting at
addr with length
len bytes. All pages which contain a part of the specified memory range are guaranteed to be resident in RAM when the
mlock system call returns successfully and they are guaranteed to stay in RAM until the pages are unlocked by
munlock or
munlockall, until the pages are unmapped via
munmap, or until the process terminates or starts another program with
exec. Child processes do not inherit page locks across a
fork.
Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler. Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. (But be aware that the suspend mode on laptops and some desktop computers will save a copy of the system`s RAM to disk, regardless of memory locks.)
Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock or mlockall will be unlocked by a single call to munlock for the corresponding range or by munlockall. Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.
On POSIX systems on which mlock and munlock are available, _POSIX_MEMLOCK_RANGE is defined in <unistd.h> and the value PAGESIZE from <limits.h> indicates the number of bytes per page.
NOTES
With the Linux system call,
addr is automatically rounded down to the nearest page boundary. However, POSIX 1003.1-2001 allows an implementation to require that
addr is page aligned, so portable applications should ensure this.
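One way for a portable application to meet the alignment requirement is to round the range outward to whole pages before locking it. The sketch below is an illustration under assumptions: it obtains the page size with sysconf(_SC_PAGESIZE) rather than the PAGESIZE constant mentioned above, and the helper names page_align_range and mlock_aligned are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round the range [addr, addr + len) outward to whole pages.
 * page must be a power of two. */
void page_align_range(uintptr_t addr, size_t len, size_t page,
                      uintptr_t *start, size_t *span)
{
    uintptr_t mask  = ~((uintptr_t)page - 1);
    uintptr_t first = addr & mask;                      /* round down */
    uintptr_t last  = (addr + len + page - 1) & mask;   /* round up */
    *start = first;
    *span  = (size_t)(last - first);
}

/* Lock the pages covering buf[0..len) using a page-aligned address.
 * Returns what mlock() returns, or -1 if the page size is unknown. */
int mlock_aligned(const void *buf, size_t len)
{
    uintptr_t start;
    size_t span;
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    page_align_range((uintptr_t)buf, len, (size_t)page, &start, &span);
    return mlock((const void *)start, span);
}
```

Whether mlock_aligned succeeds still depends on the privileges and RLIMIT_MEMLOCK limit of the calling process.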
RETURN VALUE
On success,
mlock returns zero. On error, -1 is returned,
errno is set appropriately, and no changes are made to any locks in the address space of the process.
ERRORS
- ENOMEM
- Some of the specified address range does not correspond to mapped pages in the address space of the process or the process tried to exceed the maximum number of allowed locked pages. Non-root processes are allowed to lock up to their current RLIMIT_MEMLOCK resource limit.
- EPERM
- The calling process does not have appropriate privileges. Processes are permitted to lock pages if they are running with the CAP_IPC_LOCK capability (normally only true for root) or if their current RLIMIT_MEMLOCK resource limit is non-zero.
- EINVAL
- (Not on Linux) addr was not a multiple of the page size.
Linux adds
- EINVAL
- len was negative.
CONFORMING TO
POSIX.1b, SVr4. SVr4 documents an additional EAGAIN error code.
SEE ALSO
mlockall(2),
munlock(2),
munlockall(2),
munmap(2),
setrlimit(2)
NAME
mlockall - disable paging for calling process
SYNOPSIS
#include <sys/mman.h>
int mlockall(int flags);
DESCRIPTION
mlockall disables paging for all pages mapped into the address space of the calling process. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory and memory mapped files. All mapped pages are guaranteed to be resident in RAM when the
mlockall system call returns successfully and they are guaranteed to stay in RAM until the pages are unlocked again by
munlock or
munlockall or until the process terminates or starts another program with
exec. Child processes do not inherit page locks across a
fork.
Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler. Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. For security applications, only small parts of memory have to be locked, for which mlock is available.
The flags parameter can be constructed from the bitwise OR of the following constants:
- MCL_CURRENT
- Lock all pages which are currently mapped into the address space of the process.
- MCL_FUTURE
- Lock all pages which will become mapped into the address space of the process in the future. These could be for instance new pages required by a growing heap and stack as well as new memory mapped files or shared memory regions.
If MCL_FUTURE has been specified and the number of locked pages exceeds the upper limit of allowed locked pages, then the system call which caused the new mapping will fail with ENOMEM. If these new pages have been mapped by the growing stack, then the kernel will deny stack expansion and send a SIGSEGV.
Real-time processes should reserve enough locked stack pages before entering the time-critical section, so that no page fault can be caused by function calls. This can be achieved by calling a function which has a sufficiently large automatic variable and which writes to the memory occupied by this large array in order to touch these stack pages. This way, enough pages will be mapped for the stack and can be locked into RAM. The dummy writes ensure that not even copy-on-write page faults can occur in the critical section.
Memory locks do not stack, i.e., pages which have been locked several times by calls to mlockall or mlock will be unlocked by a single call to munlockall. Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.
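The stack-reservation technique described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: STACK_RESERVE and both function names are hypothetical, and the amount of stack a real program needs has to be measured for that program.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define STACK_RESERVE (64 * 1024)   /* hypothetical worst-case stack need */

/* Write one byte per page of a large automatic array so that the stack
   pages are mapped before the time-critical section starts.
   Returns the number of pages touched. */
size_t prefault_stack(void)
{
    volatile unsigned char dummy[STACK_RESERVE];
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    for (size_t i = 0; i < sizeof dummy; i += page) {
        dummy[i] = 0;               /* the dummy write maps the page */
        touched++;
    }
    return touched;
}

/* Lock current and future pages, then reserve the stack.
   Returns 0 on success, or -1 with errno set by mlockall. */
int lock_all_and_prefault(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -1;
    prefault_stack();
    return 0;
}
```

An unprivileged caller should expect lock_all_and_prefault() to fail with ENOMEM or EPERM when its RLIMIT_MEMLOCK limit is too small; munlockall() undoes the locking.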
RETURN VALUE
On success,
mlockall returns zero. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- ENOMEM
- The process tried to exceed the maximum number of allowed locked pages. Non-root processes are allowed to lock up to their current RLIMIT_MEMLOCK resource limit.
- EPERM
- The calling process does not have appropriate privileges. Processes are permitted to lock pages if they are running with the CAP_IPC_LOCK capability (normally only true for root) or if their current RLIMIT_MEMLOCK resource limit is non-zero.
- EINVAL
- Unknown flags were specified.
AVAILABILITY
On POSIX systems on which
mlockall and
munlockall are available,
_POSIX_MEMLOCK is defined in <
unistd.h> to a value greater than 0. (See also
sysconf(3).)
CONFORMING TO
POSIX.1b, SVr4. SVr4 documents an additional EAGAIN error code.
SEE ALSO
munlockall(2),
mlock(2),
munlock(2),
sysconf(3)
NAME
mmap, munmap - map or unmap files or devices into memory
SYNOPSIS
#include <sys/mman.h>
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *start, size_t length);
DESCRIPTION
The
mmap function asks to map
length bytes starting at offset
offset from the file (or other object) specified by the file descriptor
fd into memory, preferably at address
start. This latter address is a hint only, and is usually specified as 0. The actual place where the object is mapped is returned by
mmap, and is never 0.
The prot argument describes the desired memory protection (and must not conflict with the open mode of the file). It is either PROT_NONE or is the bitwise OR of one or more of the other PROT_* flags.
- PROT_EXEC
- Pages may be executed.
- PROT_READ
- Pages may be read.
- PROT_WRITE
- Pages may be written.
- PROT_NONE
- Pages may not be accessed.
The flags parameter specifies the type of the mapped object, mapping options and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. It has bits
- MAP_FIXED
- Do not select a different address than the one specified. If the specified address cannot be used, mmap will fail. If MAP_FIXED is specified, start must be a multiple of the pagesize. Use of this option is discouraged.
- MAP_SHARED
- Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file. The file may not actually be updated until msync(2) or munmap(2) is called.
- MAP_PRIVATE
- Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap call are visible in the mapped region.
You must specify exactly one of MAP_SHARED and MAP_PRIVATE.
The above three flags are described in POSIX.1b (formerly POSIX.4) and SUSv2. Linux also knows about the following non-standard flags:
- MAP_DENYWRITE
- This flag is ignored. (Long ago, it signalled that attempts to write to the underlying file should fail with ETXTBSY. But this was a source of denial-of-service attacks.)
- MAP_EXECUTABLE
- This flag is ignored.
- MAP_NORESERVE
- (Used together with MAP_PRIVATE.) Do not reserve swap space pages for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify this private copy-on-write region. When it is not reserved one might get SIGSEGV upon a write when no memory is available.
- MAP_LOCKED
- (Linux 2.5.37 and later) Lock the pages of the mapped region into memory in the manner of mlock(). This flag is ignored in older kernels.
- MAP_GROWSDOWN
- Used for stacks. Indicates to the kernel VM system that the mapping should extend downwards in memory.
- MAP_ANONYMOUS
- The mapping is not backed by any file; the fd and offset arguments are ignored. This flag in conjunction with MAP_SHARED is implemented since Linux 2.4.
- MAP_ANON
- Alias for MAP_ANONYMOUS. Deprecated.
- MAP_FILE
- Compatibility flag. Ignored.
- MAP_32BIT
- Put the mapping into the first 2GB of the process address space. Ignored when MAP_FIXED is set. This flag is currently only supported on x86-64 for 64bit programs.
Some systems document the additional flags MAP_AUTOGROW, MAP_AUTORESRV, MAP_COPY, and MAP_LOCAL.
fd should be a valid file descriptor, unless MAP_ANONYMOUS is set, in which case the argument is ignored.
offset should be a multiple of the page size as returned by getpagesize(2).
Memory mapped by mmap is preserved across fork(2), with the same attributes.
A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified.
The munmap system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references. The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region.
The address start must be a multiple of the page size. All pages containing a part of the indicated range are unmapped, and subsequent references to these pages will generate SIGSEGV. It is not an error if the indicated range does not contain any mapped pages.
For file-backed mappings, the st_atime field for the mapped file may be updated at any time between the mmap() and the corresponding unmapping; the first reference to a mapped page will update the field if it has not been already.
The st_ctime and st_mtime field for a file mapped with PROT_WRITE and MAP_SHARED will be updated after a write to the mapped region, and before a subsequent msync() with the MS_SYNC or MS_ASYNC flag, if one occurs.
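As a minimal sketch of the calls described above, the following function maps a whole file read-only and scans it. The function name count_newlines is hypothetical; the zero-size check assumes a kernel that rejects zero-length mappings, as current Linux kernels do.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Count '\n' bytes in a file through a private read-only mapping.
   Returns the count, or -1 if the file cannot be opened or mapped. */
long count_newlines(const char *path)
{
    struct stat st;
    char *p;
    long n = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* nothing to map */
        close(fd);
        return 0;
    }

    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* closing fd does not unmap the region */
    if (p == MAP_FAILED)
        return -1;

    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n')
            n++;

    munmap(p, (size_t) st.st_size);
    return n;
}
```

The result of mmap is compared against MAP_FAILED rather than NULL, and the descriptor is closed right after mapping because the mapping outlives it.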
RETURN VALUE
On success,
mmap returns a pointer to the mapped area. On error, the value
MAP_FAILED (that is, (void *) -1) is returned, and
errno is set appropriately. On success,
munmap returns 0, on failure -1, and
errno is set (probably to EINVAL).
NOTES
It is architecture dependent whether
PROT_READ includes
PROT_EXEC or not. Portable programs should always set
PROT_EXEC if they intend to execute code in the new mapping.
ERRORS
- EBADF
- fd is not a valid file descriptor (and MAP_ANONYMOUS was not set).
- EACCES
- A file descriptor refers to a non-regular file. Or MAP_PRIVATE was requested, but fd is not open for reading. Or MAP_SHARED was requested and PROT_WRITE is set, but fd is not open in read/write (O_RDWR) mode. Or PROT_WRITE is set, but the file is append-only.
- EINVAL
- start, length or offset is invalid (e.g., too large, or not aligned on a PAGESIZE boundary).
- ETXTBSY
- MAP_DENYWRITE was set but the object specified by fd is open for writing.
- EAGAIN
- The file has been locked, or too much memory has been locked.
- ENOMEM
- No memory is available, or the process's maximum number of mappings would have been exceeded.
- ENODEV
- The underlying filesystem of the specified file does not support memory mapping.
Use of a mapped region can result in these signals:
- SIGSEGV
- Attempted write into a region specified to mmap as read-only.
- SIGBUS
- Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).
AVAILABILITY
On POSIX systems on which
mmap,
msync and
munmap are available,
_POSIX_MAPPED_FILES is defined in <
unistd.h> to a value greater than 0. (See also
sysconf(3).)
CONFORMING TO
SVr4, POSIX.1b (formerly POSIX.4), 4.4BSD, SUSv2. SVr4 documents additional error codes ENXIO and ENODEV. SUSv2 documents additional error codes EMFILE and EOVERFLOW.
MAP_32BIT is a Linux extension.
SEE ALSO
getpagesize(2),
mlock(2),
mmap2(2),
mremap(2),
msync(2),
shm_open(2), B.O. Gallmeister, POSIX.4, O'Reilly, pp. 128-129 and 389-391.
NAME
modify_ldt - get or set ldt
SYNOPSIS
#include <linux/ldt.h> #include <linux/unistd.h> _syscall3(int, modify_ldt, int, func, void *, ptr, unsigned long, bytecount)
int modify_ldt(int func, void *ptr, unsigned long bytecount);
DESCRIPTION
modify_ldt reads or writes the local descriptor table (ldt) for a process. The ldt is a per-process memory management table used by the i386 processor. For more information on this table, see an Intel 386 processor handbook.
When func is 0, modify_ldt reads the ldt into the memory pointed to by ptr. The number of bytes read is the smaller of bytecount and the actual size of the ldt.
When func is 1, modify_ldt modifies one ldt entry. ptr points to a modify_ldt_ldt_s structure and bytecount must equal the size of this structure.
RETURN VALUE
On success,
modify_ldt returns either the actual number of bytes read (for reading) or 0 (for writing). On failure,
modify_ldt returns -1 and sets
errno.
ERRORS
- ENOSYS
- func is neither 0 nor 1.
- EINVAL
- ptr is 0, or func is 1 and bytecount is not equal to the size of the structure modify_ldt_ldt_s, or func is 1 and the new ldt entry has invalid values.
- EFAULT
- ptr points outside the address space.
CONFORMING TO
This call is Linux-specific and should not be used in programs intended to be portable.
SEE ALSO
vm86(2)
NAME
mprotect - control allowable accesses to a region of memory
SYNOPSIS
#include <sys/mman.h> int mprotect(const void *addr, size_t len, int prot);
DESCRIPTION
The function
mprotect specifies the desired protection for the memory page(s) containing part or all of the interval [
addr,
addr+
len-1]. If an access is disallowed by the protection given it, the program receives a
SIGSEGV.
prot is a bitwise-or of the following values:
- PROT_NONE
- The memory cannot be accessed at all.
- PROT_READ
- The memory can be read.
- PROT_WRITE
- The memory can be written to.
- PROT_EXEC
- The memory can contain executing code.
The new protection replaces any existing protection. For example, if the memory had previously been marked PROT_READ, and mprotect is then called with prot PROT_WRITE, it will no longer be readable.
RETURN VALUE
On success,
mprotect returns zero. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EINVAL
- addr is not a valid pointer, or not a multiple of PAGESIZE.
- EFAULT
- The memory cannot be accessed.
- EACCES
- The memory cannot be given the specified access. This can happen, for example, if you mmap(2) a file to which you have read-only access, then ask mprotect to mark it PROT_WRITE.
- ENOMEM
- Internal kernel structures could not be allocated.
EXAMPLE
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <stdint.h>     /* for uintptr_t */
#include <sys/mman.h>
#include <limits.h>     /* for PAGESIZE */

#ifndef PAGESIZE
#define PAGESIZE 4096
#endif

int
main(void)
{
    char *p;
    char c;

    /* Allocate a buffer; it will have the default
       protection of PROT_READ|PROT_WRITE. */
    p = malloc(1024+PAGESIZE-1);
    if (!p) {
        perror("Couldn't malloc(1024)");
        exit(errno);
    }

    /* Align to a multiple of PAGESIZE, assumed to be a power of two.
       The pointer is converted via uintptr_t, not int, so that the
       address is not truncated on 64-bit systems. */
    p = (char *)(((uintptr_t) p + PAGESIZE-1) & ~((uintptr_t) PAGESIZE-1));

    c = p[666];         /* Read; ok */
    p[666] = 42;        /* Write; ok */

    /* Mark the buffer read-only. */
    if (mprotect(p, 1024, PROT_READ)) {
        perror("Couldn't mprotect");
        exit(errno);
    }

    c = p[666];         /* Read; ok */
    p[666] = 42;        /* Write; program dies on SIGSEGV */

    exit(0);
}
CONFORMING TO
SVr4, POSIX.1b (formerly POSIX.4). SVr4 defines an additional error code EAGAIN. The SVr4 error conditions don't map neatly onto Linux's. POSIX.1b says that
mprotect can be used only on regions of memory obtained from
mmap(2).
SEE ALSO
mmap(2)
NAME
mremap - re-map a virtual memory address
SYNOPSIS
#include <unistd.h>
#include <sys/mman.h>
void *mremap(void *old_address, size_t old_size, size_t new_size, unsigned long flags);
DESCRIPTION
mremap expands (or shrinks) an existing memory mapping, potentially moving it at the same time (controlled by the
flags argument and the available virtual address space).
old_address is the old address of the virtual memory block that you want to expand (or shrink). Note that old_address has to be page aligned. old_size is the old size of the virtual memory block. new_size is the requested size of the virtual memory block after the resize.
The flags argument is a bitmap of flags.
In Linux the memory is divided into pages. A user process has (one or) several linear virtual memory segments. Each virtual memory segment has one or more mappings to real memory pages (in the page table). Each virtual memory segment has its own protection (access rights), which may cause a segmentation violation if the memory is accessed incorrectly (e.g., writing to a read-only segment). Accessing virtual memory outside of the segments will also cause a segmentation violation.
mremap uses the Linux page table scheme. mremap changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc.
FLAGS
- MREMAP_MAYMOVE
- indicates if the operation should fail, or change the virtual address if the resize cannot be done at the current virtual address.
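The realloc-style use of mremap can be sketched as below, assuming Linux with glibc, where _GNU_SOURCE must be defined before any header is included. The helper names new_mapping and resize_mapping are hypothetical.

```c
#define _GNU_SOURCE             /* needed for mremap and MREMAP_MAYMOVE */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a private anonymous read/write mapping of `size` bytes.
   Returns NULL on failure. */
void *new_mapping(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Resize a mapping, allowing the kernel to move it if it cannot be
   resized in place. Returns the (possibly different) address, or NULL
   on failure, in which case the old mapping is left untouched. */
void *resize_mapping(void *old, size_t old_size, size_t new_size)
{
    void *p = mremap(old, old_size, new_size, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

The contents of the old range are preserved across the call, but the caller must use the returned address afterwards, since the mapping may have moved.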
RETURN VALUE
On success
mremap returns a pointer to the new virtual memory area. On error, the value
MAP_FAILED (that is, (void *) -1) is returned, and
errno is set appropriately.
ERRORS
- EINVAL
- An invalid argument was given. Most likely old_address was not page aligned.
- EFAULT
- "Segmentation fault." Some address in the range old_address to old_address+old_size is an invalid virtual memory address for this process. You can also get EFAULT even if there exist mappings that cover the whole address space requested, but those mappings are of different types.
- EAGAIN
- The memory segment is locked and cannot be re-mapped.
- ENOMEM
- The memory area cannot be expanded at the current virtual address, and the MREMAP_MAYMOVE flag is not set in flags. Or, there is not enough (virtual) memory available.
NOTES
With current glibc includes, in order to get the definition of
MREMAP_MAYMOVE, you need to define _GNU_SOURCE before including <
sys/mman.h>.
CONFORMING TO
This call is Linux-specific, and should not be used in programs intended to be portable. 4.2BSD had a (never actually implemented)
mremap(2) call with completely different semantics.
SEE ALSO
getpagesize(2),
realloc(3),
malloc(3),
brk(2),
sbrk(2),
mmap(2)
Your favorite OS text book for more information on paged memory. (Modern Operating Systems by Andrew S. Tanenbaum, Inside Linux by Randolf Bentson, The Design of the UNIX Operating System by Maurice J. Bach.)
NAME
msgget - get a message queue identifier
SYNOPSIS
#include <sys/types.h> #include <sys/ipc.h> #include <sys/msg.h>
int msgget(key_t key, int msgflg);
DESCRIPTION
The function returns the message queue identifier associated with the value of the
key argument. A new message queue is created if
key has the value
IPC_PRIVATE or
key isn't
IPC_PRIVATE, no message queue with the given key
key exists, and
IPC_CREAT is asserted in
msgflg (i.e.,
msgflg&IPC_CREAT is nonzero). The presence in
msgflg of the fields
IPC_CREAT and
IPC_EXCL plays the same role, with respect to the existence of the message queue, as the presence of
O_CREAT and
O_EXCL in the mode argument of the
open(2) system call: i.e. the
msgget function fails if
msgflg asserts both
IPC_CREAT and
IPC_EXCL and a message queue already exists for
key.
Upon creation, the lower 9 bits of the argument msgflg define the access permissions of the message queue. These permission bits have the same format and semantics as the access permissions parameter in open(2) or creat(2) system calls. (The execute permissions are not used.)
If a new message queue is created, the system call initializes the system message queue data structure msqid_ds as follows:
- msg_perm.cuid and msg_perm.uid are set to the effective user-ID of the calling process.
- msg_perm.cgid and msg_perm.gid are set to the effective group-ID of the calling process.
- The lowest order 9 bits of msg_perm.mode are set to the lowest order 9 bits of msgflg.
- msg_qnum, msg_lspid, msg_lrpid, msg_stime and msg_rtime are set to 0.
- msg_ctime is set to the current time.
- msg_qbytes is set to the system limit MSGMNB.
If the message queue already exists the access permissions are verified, and a check is made to see if it is marked for destruction.
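The creation rules above can be sketched as follows. The helper names are hypothetical and the 0600 permission bits are only an example choice; msgctl(2) with IPC_RMID is used to remove a queue.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Always create a fresh queue, accessible to its owner only. */
int create_private_queue(void)
{
    return msgget(IPC_PRIVATE, 0600);
}

/* Return the queue for `key`, creating it if it does not exist.
   With must_be_new set, fail with EEXIST if it already exists. */
int open_queue(key_t key, int must_be_new)
{
    int flags = IPC_CREAT | 0600;

    if (must_be_new)
        flags |= IPC_EXCL;
    return msgget(key, flags);
}

/* Remove a queue from the system. Returns 0 on success, -1 on error. */
int remove_queue(int msqid)
{
    return msgctl(msqid, IPC_RMID, NULL);
}
```

Queues persist until they are removed explicitly, so a program that creates one with IPC_PRIVATE should call remove_queue() when it is done.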
RETURN VALUE
If successful, the return value will be the message queue identifier (a nonnegative integer), otherwise
-1 with
errno indicating the error.
ERRORS
On failure,
errno is set to one of the following values:
- EACCES
- A message queue exists for key, but the calling process has no access permissions to the queue.
- EEXIST
- A message queue exists for key and msgflg was asserting both IPC_CREAT and IPC_EXCL.
- ENOENT
- No message queue exists for key and msgflg wasn't asserting IPC_CREAT.
- ENOMEM
- A message queue has to be created but the system has not enough memory for the new data structure.
- ENOSPC
- A message queue has to be created but the system limit for the maximum number of message queues (MSGMNI) would be exceeded.
NOTES
IPC_PRIVATE isn't a flag field but a
key_t type. If this special value is used for
key, the system call ignores everything but the lowest order 9 bits of
msgflg and creates a new message queue (on success).
The following is a system limit on message queue resources affecting a msgget call:
- MSGMNI
- System wide maximum number of message queues: policy dependent.
BUGS
The name choice IPC_PRIVATE was perhaps unfortunate; IPC_NEW would more clearly show its function.
CONFORMING TO
SVr4, SVID. Until version 2.3.20 Linux would return EIDRM for a
msgget on a message queue scheduled for deletion.
SEE ALSO
ftok(3),
ipc(5),
msgctl(2),
msgsnd(2),
msgrcv(2)
NAME
msgop - message operations
SYNOPSIS
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
int msgsnd(int msqid, struct msgbuf *msgp, size_t msgsz, int msgflg);
ssize_t msgrcv(int msqid, struct msgbuf *msgp, size_t msgsz, long msgtyp, int msgflg);
DESCRIPTION
To send or receive a message, the calling process allocates a structure of the following general form:
struct msgbuf {
long mtype; /* message type, must be > 0 */
char mtext[1]; /* message data */
};
The
mtext field is an array (or other structure) whose size is specified by msgsz, a non-negative integer value. Messages of zero length (i.e., no mtext field) are permitted. The mtype field must have a strictly positive integer value that can be used by the receiving process for message selection (see the section about msgrcv).
The calling process must have write permission to send and read permission to receive a message on the queue.
The msgsnd system call appends a copy of the message pointed to by msgp to the message queue whose identifier is specified by msqid.
If sufficient space is available on the queue, msgsnd succeeds immediately. (The queue capacity is defined by the msg_qbytes field in the associated data structure for the message queue. During queue creation this field is initialised to MSGMNB bytes, but this limit can be modified using msgctl.) If insufficient space is available on the queue, then the default behaviour of msgsnd is to block until space becomes available. If IPC_NOWAIT is asserted in msgflg then the call instead fails with the error EAGAIN.
A blocked msgsnd call may also fail if the queue is removed (in which case the system call fails with errno set to EIDRM), or a signal is caught (in which case the system call fails with errno set to EINTR). (msgsnd and msgrcv are never automatically restarted after being interrupted by a signal handler, regardless of the setting of the SA_RESTART flag when establishing a signal handler.)
Upon successful completion the message queue data structure is updated as follows:
- msg_lspid is set to the process ID of the calling process.
- msg_qnum is incremented by 1.
- msg_stime is set to the current time.
The system call msgrcv reads a message from the message queue specified by msqid into the msgbuf pointed to by the msgp argument, removing the read message from the queue.
The argument msgsz specifies the maximum size in bytes for the member mtext of the structure pointed to by the msgp argument. If the message text has length greater than msgsz, then if the msgflg argument asserts MSG_NOERROR, the message text will be truncated (and the truncated part will be lost), otherwise the message isn't removed from the queue and the system call fails returning with errno set to E2BIG.
The argument msgtyp specifies the type of message requested as follows:
- If msgtyp is 0, then the first message in the queue is read.
- If msgtyp is greater than 0, then the first message on the queue of type msgtyp is read, unless MSG_EXCEPT was asserted in msgflg, in which case the first message on the queue of type not equal to msgtyp will be read.
- If msgtyp is less than 0, then the first message on the queue with the lowest type less than or equal to the absolute value of msgtyp will be read.
The msgflg argument asserts none, one or more (or-ing them) of the following flags:
- IPC_NOWAIT For immediate return if no message of the requested type is on the queue. The system call fails with errno set to ENOMSG.
- MSG_EXCEPT Used with msgtyp greater than 0 to read the first message on the queue with message type that differs from msgtyp.
- MSG_NOERROR To truncate the message text if longer than msgsz bytes.
If no message of the requested type is available and IPC_NOWAIT isn't asserted in msgflg, the calling process is blocked until one of the following conditions occurs:
- A message of the desired type is placed on the queue.
- The message queue is removed from the system. In this case the system call fails with errno set to EIDRM.
- The calling process catches a signal. In this case the system call fails with errno set to EINTR.
Upon successful completion the message queue data structure is updated as follows:
- msg_lrpid is set to the process ID of the calling process.
- msg_qnum is decremented by 1.
- msg_rtime is set to the current time.
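A minimal sketch of sending and receiving with a caller-defined message structure follows. TEXT_MAX, struct text_msg and both helper names are hypothetical; both helpers pass IPC_NOWAIT so that they never block.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define TEXT_MAX 64             /* hypothetical payload limit */

struct text_msg {
    long mtype;                 /* message type, must be > 0 */
    char mtext[TEXT_MAX];       /* message data */
};

/* Queue `text` (without its terminating NUL) under the given type.
   Returns 0 on success, -1 on error. */
int send_text(int msqid, long type, const char *text)
{
    struct text_msg m;
    size_t len = strlen(text);

    if (type <= 0 || len > TEXT_MAX)
        return -1;
    m.mtype = type;
    memcpy(m.mtext, text, len);
    return msgsnd(msqid, &m, len, IPC_NOWAIT);   /* msgsz excludes mtype */
}

/* Receive one message selected by msgtyp into buf, which must hold
   TEXT_MAX + 1 bytes, and NUL-terminate it.
   Returns the text length, or -1 on error or if no message matches. */
ssize_t recv_text(int msqid, long msgtyp, char *buf)
{
    struct text_msg m;
    ssize_t n = msgrcv(msqid, &m, TEXT_MAX, msgtyp, IPC_NOWAIT);

    if (n == -1)
        return -1;
    memcpy(buf, m.mtext, (size_t) n);
    buf[n] = '\0';
    return n;
}
```

If messages of type 2 and then type 1 are queued, recv_text(id, 1, buf) returns the type 1 message, while recv_text(id, 0, buf) returns whichever message is first on the queue.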
RETURN VALUE
On a failure both functions return
-1 with
errno indicating the error, otherwise
msgsnd returns
0 and
msgrcv returns the number of bytes actually copied into the
mtext array.
ERRORS
When
msgsnd fails, at return
errno will be set to one among the following values:
- EAGAIN
- The message can't be sent due to the msg_qbytes limit for the queue and IPC_NOWAIT was asserted in msgflg.
- EACCES
- The calling process has no write permission on the message queue.
- EFAULT
- The address pointed to by msgp isn't accessible.
- EIDRM
- The message queue was removed.
- EINTR
- Sleeping on a full message queue condition, the process caught a signal.
- EINVAL
- Invalid msqid value, or nonpositive mtype value, or invalid msgsz value (less than 0 or greater than the system value MSGMAX).
- ENOMEM
- The system has not enough memory to make a copy of the supplied msgbuf.
When msgrcv fails, at return errno will be set to one among the following values:
- E2BIG
- The message text length is greater than msgsz and MSG_NOERROR isn't asserted in msgflg.
- EACCES
- The calling process does not have read permission on the message queue.
- EFAULT
- The address pointed to by msgp isn't accessible.
- EIDRM
- While the process was sleeping to receive a message, the message queue was removed.
- EINTR
- While the process was sleeping to receive a message, the process received a signal that had to be caught.
- EINVAL
- Illegal msqid value, or msgsz less than 0.
- ENOMSG
- IPC_NOWAIT was asserted in msgflg and no message of the requested type existed on the message queue.
NOTES
The following are system limits affecting a
msgsnd system call:
- MSGMAX
- Maximum size for a message text: the implementation sets this value to 8192 bytes.
- MSGMNB
- Default maximum size in bytes of a message queue: 16384 bytes. The super-user can increase the size of a message queue beyond MSGMNB by a msgctl system call.
The implementation has no intrinsic limits for the system wide maximum number of message headers (MSGTQL) and for the system wide maximum size in bytes of the message pool (MSGPOOL).
CONFORMING TO
SVr4, SVID.
NOTE
The pointer argument is declared as
struct msgbuf * with libc4, libc5, glibc 2.0, glibc 2.1. It is declared as
void * (
const void * for
msgsnd()) with glibc 2.2, following the SUSv2.
SEE ALSO
ipc(5),
msgctl(2),
msgget(2),
msgrcv(2),
msgsnd(2)
NAME
msync - synchronize a file with a memory map
SYNOPSIS
#include <sys/mman.h> int msync(void *start, size_t length, int flags);
DESCRIPTION
msync flushes changes made to the in-core copy of a file that was mapped into memory using
mmap(2) back to disk. Without use of this call there is no guarantee that changes are written back before
munmap(2) is called. To be more precise, the part of the file that corresponds to the memory area starting at
start and having length
length is updated. The
flags argument may have the bits MS_ASYNC, MS_SYNC and MS_INVALIDATE set, but not both MS_ASYNC and MS_SYNC. MS_ASYNC specifies that an update be scheduled, but the call returns immediately. MS_SYNC asks for an update and waits for it to complete. MS_INVALIDATE asks to invalidate other mappings of the same file (so that they can be updated with the fresh values just written).
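As a minimal sketch, the function below changes one byte of a file through a shared mapping and uses MS_SYNC to wait for the write-back. The name set_first_byte is hypothetical, and the file is assumed to exist and to be non-empty.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite the first byte of an existing, non-empty file through a
   shared mapping and wait until the change has been written back.
   Returns 0 on success, -1 on error. */
int set_first_byte(const char *path, char value)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char *p;
    int rc;
    int fd = open(path, O_RDWR);

    if (fd == -1)
        return -1;
    p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    p[0] = value;               /* store to the in-core copy */
    rc = msync(p, page, MS_SYNC);   /* flush and wait for completion */

    munmap(p, page);
    return rc;
}
```

Replacing MS_SYNC with MS_ASYNC would only schedule the update and return immediately.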
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EINVAL
- start is not a multiple of PAGESIZE, or any bit other than MS_ASYNC | MS_INVALIDATE | MS_SYNC is set in flags.
- ENOMEM
- The indicated memory (or part of it) was not mapped.
AVAILABILITY
On POSIX systems on which
msync is available, both
_POSIX_MAPPED_FILES and
_POSIX_SYNCHRONIZED_IO are defined in <
unistd.h> to a value greater than 0. (See also
sysconf(3).)
CONFORMING TO
POSIX.1b (formerly POSIX.4)
This call was introduced in Linux 1.3.21, and then used EFAULT instead of ENOMEM. In Linux 2.4.19 this was changed to the POSIX value ENOMEM.
SEE ALSO
mmap(2), B.O. Gallmeister, POSIX.4, O'Reilly, pp. 128-129 and 389-391.
NAME
munlockall - reenable paging for calling process
SYNOPSIS
#include <sys/mman.h> int munlockall(void);
DESCRIPTION
munlockall reenables paging for all pages mapped into the address space of the calling process.
Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock or mlockall will be unlocked by a single call to munlockall. Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.
On POSIX systems on which mlockall and munlockall are available, _POSIX_MEMLOCK is defined in <unistd.h> .
RETURN VALUE
On success,
munlockall returns zero. On error, -1 is returned and
errno is set appropriately.
CONFORMING TO
POSIX.1b, SVr4
SEE ALSO
mlockall(2),
mlock(2),
munlock(2)
NAME
NAL_ADDRESS_new, NAL_ADDRESS_free, NAL_ADDRESS_create, NAL_ADDRESS_set_def_buffer_size, NAL_ADDRESS_can_connect, NAL_ADDRESS_can_listen - libnal addressing functions
SYNOPSIS
#include <libnal/nal.h>
NAL_ADDRESS *NAL_ADDRESS_new(void);
void NAL_ADDRESS_free(NAL_ADDRESS *addr);
void NAL_ADDRESS_reset(NAL_ADDRESS *addr);
int NAL_ADDRESS_create(NAL_ADDRESS *addr, const char *addr_string, unsigned int def_buffer_size);
unsigned int NAL_ADDRESS_get_def_buffer_size(const NAL_ADDRESS *addr);
int NAL_ADDRESS_set_def_buffer_size(NAL_ADDRESS *addr, unsigned int def_buffer_size);
int NAL_ADDRESS_can_connect(const NAL_ADDRESS *addr);
int NAL_ADDRESS_can_listen(const NAL_ADDRESS *addr);
DESCRIPTION
NAL_ADDRESS_new() allocates and initialises a new NAL_ADDRESS object.
NAL_ADDRESS_free() destroys a NAL_ADDRESS object.
NAL_ADDRESS_reset() will, if necessary, cleanup any prior state in addr so that it can be reused in NAL_ADDRESS_create(). Internally, there are other optimisations and benefits to using NAL_ADDRESS_reset() instead of NAL_ADDRESS_free() and NAL_ADDRESS_new() - the implementation can try to avoid repeated reallocation and reinitialisation of state, only doing full cleanup and reinitialisation when necessary.
NAL_ADDRESS_create() will attempt to parse a network address from the string constant provided in addr_string. If this succeeds, then addr will represent the given network address for use in other libnal functions. The significance of def_buffer_size is that any NAL_CONNECTION object created with addr will inherit def_buffer_size as the default size for its read and write buffers (see NAL_CONNECTION_set_size(2)). If addr is used to create a NAL_LISTENER object, then any NAL_CONNECTION objects that are assigned connections from the listener will likewise have the given default size for their buffers. See the "NOTES" section for information on the syntax of addr.
NAL_ADDRESS_set_def_buffer_size() sets def_buffer_size as the default buffer size in addr. This operation is built into NAL_ADDRESS_create() already. NAL_ADDRESS_get_def_buffer_size() returns the current default buffer size of addr.
NAL_ADDRESS_can_connect() will indicate whether the address represented by addr is of an appropriate form for creating a NAL_CONNECTION object. NAL_ADDRESS_can_listen() likewise indicates if addr is appropriate for creating a NAL_LISTENER object. In other words, these functions determine whether the address can be "connected to" or "listened on". Depending on the type of transport and the string from which addr was parsed, some addresses are only good for connecting or listening whereas others can be good for both. See "NOTES".
RETURN VALUES
NAL_ADDRESS_new() returns a valid NAL_ADDRESS object on success, NULL otherwise.
NAL_ADDRESS_free() and NAL_ADDRESS_reset() have no return value.
NAL_ADDRESS_get_def_buffer_size() returns the current default buffer size of a NAL_ADDRESS object.
All other NAL_ADDRESS functions return zero for failure or false, and non-zero for success or true.
NOTES
The string syntax implemented by
libnal is used by all the
distcache libraries and tools. At the time of writing, only TCP/IPv4 and unix domain sockets were supported as underlying transports and so likewise the implemented syntax handling only supported these two forms.
- TCP/IPv4 addresses
- The syntax for TCP/IPv4 addresses has two forms, depending on whether you specify a hostname (or alternatively a dotted-numeric IP address) with the port number or just the port number on its own. E.g. to represent port 9001, one uses;
IP:9001
whereas to specify a hostname or IP address with it, the syntax is;
IP:machinename.domain:9001
IP:192.168.0.1:9001
Either form of TCP/IPv4 address is generally valid for creating a NAL_LISTENER object, although it will depend at run-time on the situation in the system - i.e. whether privileges exist to listen on the port, whether the port is already in use, whether the specified hostname or IP address is bound to a running network interface that can be listened on, etc. For creating a NAL_CONNECTION object, a hostname or IP address must be specified. This is why the NAL_ADDRESS_can_connect() and NAL_ADDRESS_can_listen() helper functions exist - to detect whether the syntax used is logical for the intended use. Failures to set up network resources afterwards will in turn say whether the given address data is possible within the host system.
- unix domain addresses
- There is only one syntax for unix domain addresses, and so any correctly parsed address string is in theory valid for connecting to or listening on. The form is;
UNIX:/path/to/socket
This represents the path to the socket in the file system.
SEE ALSO
NAL_CONNECTION_new(2) - Functions for the NAL_CONNECTION type.
NAL_LISTENER_new(2) - Functions for the NAL_LISTENER type.
NAL_SELECTOR_new(2) - Functions for the NAL_SELECTOR type.
NAL_BUFFER_new(2) - Functions for the NAL_BUFFER type.
distcache(8) - Overview of the distcache architecture.
http://www.distcache.org/ - Distcache home page.
AUTHOR
This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at
geoff@geoffthorpe.net.
Home Page: http://www.distcache.org
NAME
NAL_CONNECTION_new, NAL_CONNECTION_free, NAL_CONNECTION_create, NAL_CONNECTION_create_pair, NAL_CONNECTION_create_dummy, NAL_CONNECTION_set_size, NAL_CONNECTION_get_read, NAL_CONNECTION_get_send, NAL_CONNECTION_io, NAL_CONNECTION_io_cap, NAL_CONNECTION_is_established, NAL_CONNECTION_add_to_selector, NAL_CONNECTION_del_from_selector - libnal connection functions
SYNOPSIS
#include <libnal/nal.h>
#define NAL_SELECT_FLAG_READ (unsigned int)0x0001
#define NAL_SELECT_FLAG_SEND (unsigned int)0x0002
#define NAL_SELECT_FLAG_RW (NAL_SELECT_FLAG_READ | NAL_SELECT_FLAG_SEND)

NAL_CONNECTION *NAL_CONNECTION_new(void);
void NAL_CONNECTION_free(NAL_CONNECTION *conn);
void NAL_CONNECTION_reset(NAL_CONNECTION *conn);
int NAL_CONNECTION_create(NAL_CONNECTION *conn, const NAL_ADDRESS *addr);
int NAL_CONNECTION_accept(NAL_CONNECTION *conn, NAL_LISTENER *list, NAL_SELECTOR *sel);
int NAL_CONNECTION_create_pair(NAL_CONNECTION *conn1, NAL_CONNECTION *conn2, unsigned int def_buffer_size);
#if 0
int NAL_CONNECTION_create_dummy(NAL_CONNECTION *conn, unsigned int def_buffer_size);
#endif
int NAL_CONNECTION_set_size(NAL_CONNECTION *conn, unsigned int size);
NAL_BUFFER *NAL_CONNECTION_get_read(NAL_CONNECTION *conn);
NAL_BUFFER *NAL_CONNECTION_get_send(NAL_CONNECTION *conn);
const NAL_BUFFER *NAL_CONNECTION_get_read_c(const NAL_CONNECTION *conn);
const NAL_BUFFER *NAL_CONNECTION_get_send_c(const NAL_CONNECTION *conn);
int NAL_CONNECTION_io(NAL_CONNECTION *conn, NAL_SELECTOR *sel);
int NAL_CONNECTION_io_cap(NAL_CONNECTION *conn, NAL_SELECTOR *sel, unsigned int max_read, unsigned int max_send);
int NAL_CONNECTION_is_established(const NAL_CONNECTION *conn);
void NAL_CONNECTION_add_to_selector(const NAL_CONNECTION *conn, NAL_SELECTOR *sel);
void NAL_CONNECTION_add_to_selector_ex(const NAL_CONNECTION *conn, NAL_SELECTOR *sel, unsigned int flags);
void NAL_CONNECTION_del_from_selector(const NAL_CONNECTION *conn, NAL_SELECTOR *sel);
DESCRIPTION
NAL_CONNECTION_new() allocates and initialises a new NAL_CONNECTION object.
NAL_CONNECTION_free() destroys a NAL_CONNECTION object.
NAL_CONNECTION_reset() will, if necessary, clean up any prior state in conn so that it can be reused in NAL_CONNECTION_create(). Internally, there are other optimisations and benefits to using NAL_CONNECTION_reset() instead of NAL_CONNECTION_free() and NAL_CONNECTION_new() - the implementation can try to avoid repeated reallocation and reinitialisation of state, only doing full cleanup and reinitialisation when necessary.
NAL_CONNECTION_create() will attempt to connect to the address represented by addr. If this succeeds, it means either that the underlying connection of conn is established, or that a non-blocking connect was successfully initiated but has not yet completed (it may still be rejected by the peer eventually). Typically, unix domain sockets connect or fail immediately, and TCP/IPv4 connects are usually non-blocking, though this may not be true for some interfaces such as 'localhost'. NAL_CONNECTION_is_established() can be used to distinguish the two cases. The size of the connection's underlying read and send NAL_BUFFERs is initialised to the default that was set in addr. See the "NOTES" section for more discussion of connection semantics.
NAL_CONNECTION_accept() will not block waiting for incoming connection requests on list, but will accept any pending connection request that had already been identified by a previous call to NAL_SELECTOR_select(2) on sel. See "NOTES".
NAL_CONNECTION_create_pair() will initialise conn1 and conn2 to be end-points of a single connection. This is typically implemented using the socketpair(2) function, and is designed to allow for an IPC mechanism that integrates with libnal. def_buffer_size will control the size of the read and send buffers of both connections if the function succeeds. See the EXAMPLES section for some uses of "pairs".
NAL_CONNECTION_create_dummy() will implement a virtual FIFO that has no underlying network resource associated with it. Writing data to the connection amounts to pushing data onto the front of the FIFO, and reading data from the connection amounts to popping data off the end of the FIFO. The size of the FIFO is specified by def_buffer_size. See the "BUGS" section for a note on using these connection types with NAL_SELECTOR.
NAL_CONNECTION_set_size() will resize the read and send buffers of conn to size. The default size of those buffers is inherited from the setting created in the NAL_ADDRESS that initialised conn, or if conn was accepted from a NAL_LISTENER object, then from the address that created the listener. The individual buffers can be resized independently by using the following two functions to obtain the buffers and using NAL_BUFFER functions directly.
NAL_CONNECTION_get_read() and NAL_CONNECTION_get_send() return the read and send buffers of conn. This is how reading and writing are performed on conn, as NAL_BUFFER functions may be used on these buffers directly. NAL_CONNECTION_get_read_c() and NAL_CONNECTION_get_send_c() perform the same function but on a constant conn parameter and returning constant pointers to the corresponding buffers.
NAL_CONNECTION_io() will perform any network input/output that is possible given the state in sel. Unless conn had been added to sel via NAL_SELECTOR_add_conn() (or its '_ex' variant) and a resulting call to NAL_SELECTOR_select() had revealed readability and/or writability on conn, this function will silently succeed. Otherwise it will attempt to perform whatever reading or writing was required. If this function fails, that indicates that the connection is no longer valid - this represents a disconnection by the peer, the result of a non-blocking connect that had been initiated but was unable to connect, or some network error that makes conn unusable. See the "NOTES" section.
NAL_CONNECTION_io_cap() is a version of NAL_CONNECTION_io() that allows the caller to specify a limit on the maximum amount conn should read from, or send to, the network. Whether this amount is read or sent (or even whether reading or sending takes place at all) depends on the data (and space) available in the connection's buffers, what the results of the last select on sel were, and how much data the host system's networking support will accept or provide to conn.
NAL_CONNECTION_is_established() is useful for determining when a non-blocking connect has completed. See the "NOTES" section.
NAL_CONNECTION_add_to_selector() registers conn with the selector sel for any events relevant to it. NAL_CONNECTION_del_from_selector() can be used to reverse this if called before any subsequent call to NAL_SELECTOR_select(). NAL_CONNECTION_add_to_selector_ex() extends NAL_CONNECTION_add_to_selector() by allowing a bit-mask to be supplied to control what events the connection can be selected on. These flags are listed in the SYNOPSIS with the prefix NAL_SELECT_FLAG_.
RETURN VALUES
NAL_CONNECTION_new() returns a valid NAL_CONNECTION object on success, NULL otherwise.
NAL_CONNECTION_free(), NAL_CONNECTION_reset(), NAL_CONNECTION_add_to_selector(), NAL_CONNECTION_add_to_selector_ex(), and NAL_CONNECTION_del_from_selector() have no return value.
NAL_CONNECTION_get_read(), NAL_CONNECTION_get_send(), NAL_CONNECTION_get_read_c(), and NAL_CONNECTION_get_send_c() return pointers to the connection's buffer objects or NULL for failure.
NAL_CONNECTION_accept() returns non-zero if a connection was accepted and is represented by the provided NAL_CONNECTION object, or zero if no connection attempt was pending (or if there was but an error prevented the accept operation).
All other NAL_CONNECTION functions return zero for failure or false, and non-zero for success or true.
NOTES
A NAL_CONNECTION object encapsulates two NAL_BUFFER objects and a non-blocking socket. Any data that has been read from the socket is placed in the read buffer, and applications write data into the send buffer for it to be (eventually) written out to the socket. The NAL_SELECTOR type provides the ability to poll for any requested network events and then allow connections and listeners to perform their network input/output based on the results.
NAL_CONNECTION_add_to_selector() uses the following logic; the connection is always selected for exception events, and will be selected for readability if its read buffer is not full and writability if its send buffer is not empty.
NAL_CONNECTION_io() is used after calling NAL_CONNECTION_add_to_selector() and a subsequent call to NAL_SELECTOR_select(). It observes the following logic; if an exception event has occurred it returns failure, if readability is indicated it will read incoming data up to the limit of the available space in the read buffer, and if writability is indicated it will send as much of the send buffer's data as possible. If NAL_CONNECTION_io() returns failure, the connection is considered broken for some reason and no further I/O operations should be attempted (the behaviour is undefined). NB: The connection object is not automatically cleaned up so as to allow the caller to continue reading any data in the read buffer and/or examine any unsent data in the send buffer.
The above is almost true, BTW :-) The special case is that of non-blocking connects. If NAL_CONNECTION_create() cannot immediately connect without blocking, it will return success but subsequent calls to NAL_CONNECTION_is_established() will reveal that the connection is not yet complete. Any connection that is not complete will request selection for sendability inside NAL_CONNECTION_add_to_selector(), whether the application has provided data to send or not. The completion (or failure) of the non-blocking connect will thus cause any subsequent NAL_SELECTOR_select() operation to break. As with all other semantics, it is the follow-up call to NAL_CONNECTION_io() that changes the state of the connection object - if it returns failure, the non-blocking connect failed. If it returns success, you should still call NAL_CONNECTION_is_established() to determine if the connection is complete, as the selector could have broken because of signals or network events on other objects.
NAL_CONNECTION_accept() will return immediately, and will only succeed if the NAL_LISTENER object had already been added to the selector using NAL_LISTENER_add_to_select(), the selector had been subsequently selected using NAL_SELECTOR_select(2), and this indicated an incoming connection request waiting on the listener.
It should be noted that the actual transport in use is virtualised to allow for multiple transports and, because of this, multiple semantics for how the network functionality behaves. TCP/IPv4 and unix domain socket based connections, as well as connection pairs from NAL_CONNECTION_create_pair(), operate very much as described here. The FIFO connection type, created by NAL_CONNECTION_create_dummy(), is not yet consistent with this and is described in the "BUGS" section.
BUGS
Dummy FIFO connections created using NAL_CONNECTION_create_dummy() should be trivially selectable if anyone's daft enough to try. I.e. if you add a dummy connection to a selector, the NAL_SELECTOR_select() should break instantly if the FIFO is non-empty, otherwise the FIFO should have no influence at all on the real select(2). Right now, NAL_CONNECTION_add_to_selector() silently ignores dummy connections completely.
EXAMPLES
A typical state-machine implementation using a single connection is illustrated here (without error-checking);
    NAL_BUFFER *c_read, *c_send;
    NAL_SELECTOR *sel = NAL_SELECTOR_new();
    NAL_CONNECTION *conn = NAL_CONNECTION_new();
    NAL_ADDRESS *addr = retrieve_the_desired_address();

    /* Setup */
    NAL_CONNECTION_create(conn, addr);
    c_read = NAL_CONNECTION_get_read(conn);
    c_send = NAL_CONNECTION_get_send(conn);

    /* Loop */
    do {
        /* This is where the state-machine code should process as much data as
         * possible from 'c_read' and/or produce as much output to 'c_send' as
         * it can. */
        ... user code ...
        /* block on (relevant) network events for 'conn' */
        NAL_CONNECTION_add_to_selector(conn, sel);
        NAL_SELECTOR_select(sel, 0, 0);
        /* Do network I/O after the above blocking select and continue looping
         * only if the connection is still alive. */
    } while(NAL_CONNECTION_io(conn, sel));
An example of using a connection pair (with 2 KB read and send buffers for each connection) to create IPC between a parent process and its child (again, no error checking);
    NAL_CONNECTION *ipc_to_parent = NAL_CONNECTION_new();
    NAL_CONNECTION *ipc_to_child = NAL_CONNECTION_new();

    /* Setup */
    NAL_CONNECTION_create_pair(ipc_to_parent, ipc_to_child, 2048);

    /* Create child process */
    switch(fork()) {
    case 0:
        /* Inside the child process, close our copy of the parent's side */
        NAL_CONNECTION_free(ipc_to_child);
        /* Do child process things, and use 'ipc_to_parent' to communicate
         * with the parent. */
        do_child_logic(ipc_to_parent);
        exit(0);
    default:
        /* Inside the parent process, close our copy of the child's side */
        NAL_CONNECTION_free(ipc_to_parent);
        break;
    }
    /* Continue in the parent process, and use 'ipc_to_child' to communicate
     * with the child. */
    do_parent_logic(ipc_to_child);
Note that these connection pairs can also be a useful way of handling process termination that allows you to bypass signal handling altogether. If a child process terminates, the connection between the pair will be broken and so this will be noticed in the parent process by any selector selecting on the ipc_to_child connection - the subsequent NAL_CONNECTION_io() operation will fail, indicating that the child process is dead (or in the process of dying), and so the parent could immediately call wait(2) or waitpid(2). Whether the SIGCHLD signal arrives before the NAL_CONNECTION_io() call or not is not too important; at worst it might prematurely interrupt NAL_SELECTOR_select() (causing it to return zero) so that a redundant loop of the state-machine runs before the next select operation notices the disconnection. If you already need IPC between the parent and child for exchange of data anyway, this mechanism could be useful in avoiding global variables, signal handlers, and the associated difficulties.
SEE ALSO
NAL_CONNECTION_new(2) - Functions for the NAL_CONNECTION type.
NAL_LISTENER_new(2) - Functions for the NAL_LISTENER type.
NAL_SELECTOR_new(2) - Functions for the NAL_SELECTOR type.
NAL_BUFFER_new(2) - Functions for the NAL_BUFFER type.
distcache(8) - Overview of the distcache architecture.
http://www.distcache.org/ - Distcache home page.
AUTHOR
This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at
geoff@geoffthorpe.net.
Home Page: http://www.distcache.org
NAME
access - check user's permissions for a file
SYNOPSIS
#include <unistd.h>
int access(const char *pathname, int mode);
DESCRIPTION
access checks whether the process would be allowed to read, write or test for existence of the file (or other file system object) whose name is pathname. If pathname is a symbolic link, the permissions of the file referred to by this symbolic link are tested.
mode is a mask consisting of one or more of R_OK, W_OK, X_OK and F_OK.
R_OK, W_OK and X_OK request checking whether the file exists and has read, write and execute permissions, respectively. F_OK just requests checking for the existence of the file.
The tests depend on the permissions of the directories occurring in the path to the file, as given in pathname, and on the permissions of directories and files referred to by symbolic links encountered on the way.
The check is done with the process's real uid and gid, rather than with the effective ids as is done when actually attempting an operation. This is to allow set-UID programs to easily determine the invoking user's authority.
Only access bits are checked, not the file type or contents. Therefore, if a directory is found to be "writable," it probably means that files can be created in the directory, and not that the directory can be written as a file. Similarly, a DOS file may be found to be "executable," but the execve(2) call will still fail.
If the process has appropriate privileges, an implementation may indicate success for X_OK even if none of the execute file permission bits are set.
RETURN VALUE
On success (all requested permissions granted), zero is returned. On error (at least one bit in mode asked for a permission that is denied, or some other error occurred), -1 is returned, and errno is set appropriately.
ERRORS
access shall fail if:
- EACCES
- The requested access would be denied to the file or search permission is denied to one of the directories in pathname.
- ELOOP
- Too many symbolic links were encountered in resolving pathname.
- ENAMETOOLONG
- pathname is too long.
- ENOENT
- A directory component in pathname would have been accessible but does not exist or was a dangling symbolic link.
- ENOTDIR
- A component used as a directory in pathname is not, in fact, a directory.
- EROFS
- Write permission was requested for a file on a read-only filesystem.
access may fail if:
- EFAULT
- pathname points outside your accessible address space.
- EINVAL
- mode was incorrectly specified.
- EIO
- An I/O error occurred.
- ENOMEM
- Insufficient kernel memory was available.
- ETXTBSY
- Write access was requested to an executable which is being executed.
RESTRICTIONS
access returns an error if any of the access types in the requested call fails, even if other types might be successful.
access may not work correctly on NFS file systems with UID mapping enabled, because UID mapping is done on the server and hidden from the client, which checks permissions.
Using access to check if a user is authorized to e.g. open a file before actually doing so using open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it.
CONFORMING TO
SVID, AT&T, POSIX, X/OPEN, BSD 4.3
SEE ALSO
stat(2),
open(2),
chmod(2),
chown(2),
setuid(2),
setgid(2)
NAME
adjtimex - tune kernel clock
SYNOPSIS
#include <sys/timex.h>
int adjtimex(struct timex *buf);
DESCRIPTION
Linux uses David L. Mills' clock adjustment algorithm (see RFC 1305). The system call adjtimex reads and optionally sets adjustment parameters for this algorithm. It takes a pointer to a timex structure, updates kernel parameters from field values, and returns the same structure with current kernel values. This structure is declared as follows:
    struct timex {
        int modes;            /* mode selector */
        long offset;          /* time offset (usec) */
        long freq;            /* frequency offset (scaled ppm) */
        long maxerror;        /* maximum error (usec) */
        long esterror;        /* estimated error (usec) */
        int status;           /* clock command/status */
        long constant;        /* pll time constant */
        long precision;       /* clock precision (usec) (read only) */
        long tolerance;       /* clock frequency tolerance (ppm) (read only) */
        struct timeval time;  /* current time (read only) */
        long tick;            /* usecs between clock ticks */
    };
The modes field determines which parameters, if any, to set. It may contain a bitwise-or combination of zero or more of the following bits:
    #define ADJ_OFFSET            0x0001 /* time offset */
    #define ADJ_FREQUENCY         0x0002 /* frequency offset */
    #define ADJ_MAXERROR          0x0004 /* maximum time error */
    #define ADJ_ESTERROR          0x0008 /* estimated time error */
    #define ADJ_STATUS            0x0010 /* clock status */
    #define ADJ_TIMECONST         0x0020 /* pll time constant */
    #define ADJ_TICK              0x4000 /* tick value */
    #define ADJ_OFFSET_SINGLESHOT 0x8001 /* old-fashioned adjtime */
Ordinary users are restricted to a zero value for modes. Only the superuser may set any parameters.
RETURN VALUE
On success, adjtimex returns the clock state:
    #define TIME_OK   0 /* clock synchronized */
    #define TIME_INS  1 /* insert leap second */
    #define TIME_DEL  2 /* delete leap second */
    #define TIME_OOP  3 /* leap second in progress */
    #define TIME_WAIT 4 /* leap second has occurred */
    #define TIME_BAD  5 /* clock not synchronized */
On failure, adjtimex returns -1 and sets errno.
ERRORS
- EFAULT
- buf does not point to writable memory.
- EPERM
- buf.modes is non-zero and the user is not super-user.
- EINVAL
- An attempt is made to set buf.offset to a value outside the range -131071 to +131071, or to set buf.status to a value other than those listed above, or to set buf.tick to a value outside the range 900000/HZ to 1100000/HZ, where HZ is the system timer interrupt frequency.
CONFORMING TO
adjtimex is Linux specific and should not be used in programs intended to be portable. There is a similar but less general call adjtime in SVr4.
SEE ALSO
settimeofday(2)
NAME
alarm - set an alarm clock for delivery of a signal
SYNOPSIS
#include <unistd.h>
unsigned int alarm(unsigned int seconds);
DESCRIPTION
alarm arranges for a SIGALRM signal to be delivered to the process in seconds seconds.
If seconds is zero, no new alarm is scheduled.
In any event any previously set alarm is cancelled.
RETURN VALUE
alarm returns the number of seconds remaining until any previously scheduled alarm was due to be delivered, or zero if there was no previously scheduled alarm.
NOTES
alarm and setitimer share the same timer; calls to one will interfere with use of the other.
sleep() may be implemented using SIGALRM; mixing calls to alarm() and sleep() is a bad idea.
Scheduling delays can, as ever, cause the execution of the process to be delayed by an arbitrary amount of time.
CONFORMING TO
SVr4, SVID, POSIX, X/OPEN, BSD 4.3
SEE ALSO
setitimer(2),
signal(2),
sigaction(2),
gettimeofday(2),
select(2),
pause(2),
sleep(3)
NAME
arch_prctl - set architecture specific thread state
SYNOPSIS
#include <asm/prctl.h> #include <sys/prctl.h>
int arch_prctl(int code, unsigned long addr);
DESCRIPTION
The arch_prctl function sets architecture specific process or thread state. code selects a subfunction and passes argument addr to it.
Sub functions for x86-64 are:
- ARCH_SET_FS
- Set the 64bit base for the FS register to addr.
- ARCH_GET_FS
- Return the 64bit base value for the FS register of the current thread in the unsigned long pointed to by the address parameter.
- ARCH_SET_GS
- Set the 64bit base for the GS register to addr.
- ARCH_GET_GS
- Return the 64bit base value for the GS register of the current thread in the unsigned long pointed to by the address parameter.
NOTES
arch_prctl is only supported on Linux/x86-64 for 64bit programs currently.
The 64bit base changes when a new 32bit segment selector is loaded.
ARCH_SET_GS is disabled in some kernels.
Context switches for 64bit segment bases are rather expensive. It may be a faster alternative to set a 32bit base using a segment selector by setting up an LDT with modify_ldt(2) or using the set_thread_area(2) system call in a 2.5 kernel. arch_prctl is only needed when you want to set bases that are larger than 4GB. Memory in the first 2GB of address space can be allocated by using mmap(2) with the MAP_32BIT flag.
There is no prototype for arch_prctl in glibc 2.2, so you have to declare it yourself for now. This will be fixed in future glibc versions.
FS may be already used by the threading library.
ERRORS
- EINVAL
- code is not a valid subcommand.
- EPERM
- addr is outside the process address space.
- EFAULT
- addr points to an unmapped address or is outside the process address space.
AUTHOR
Man page written by Andi Kleen.
CONFORMANCE
arch_prctl is a Linux/x86-64 extension and should not be used in programs intended to be portable.
SEE ALSO
mmap(2),
modify_ldt(2),
prctl(2),
set_thread_area(2)
AMD X86-64 Programmer's manual
NAME
bind - bind a name to a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);
DESCRIPTION
bind gives the socket sockfd the local address my_addr. my_addr is addrlen bytes long. Traditionally, this is called "assigning a name to a socket." When a socket is created with socket(2), it exists in a name space (address family) but has no name assigned.
It is normally necessary to assign a local address using bind before a SOCK_STREAM socket may receive connections (see accept(2)).
The rules used in name binding vary between address families. Consult the manual entries in Section 7 for detailed information. For AF_INET see ip(7), for AF_UNIX see unix(7), for AF_APPLETALK see ddp(7), for AF_PACKET see packet(7), for AF_X25 see x25(7) and for AF_NETLINK see netlink(7).
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EBADF
- sockfd is not a valid descriptor.
- EINVAL
- The socket is already bound to an address. This may change in the future: see linux/unix/sock.c for details.
- EACCES
- The address is protected, and the user is not the super-user.
- ENOTSOCK
- Argument is a descriptor for a file, not a socket.
The following errors are specific to UNIX domain (AF_UNIX) sockets:
- EINVAL
- The addrlen is wrong, or the socket was not in the AF_UNIX family.
- EROFS
- The socket inode would reside on a read-only file system.
- EFAULT
- my_addr points outside the user's accessible address space.
- ENAMETOOLONG
- my_addr is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of the path prefix is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix.
- ELOOP
- Too many symbolic links were encountered in resolving my_addr.
BUGS
The transparent proxy options are not described.
CONFORMING TO
SVr4, 4.4BSD (the bind function first appeared in BSD 4.2). SVr4 documents additional EADDRNOTAVAIL, EADDRINUSE, and ENOSR general error conditions, and additional EIO and EISDIR Unix-domain error conditions.
NOTE
The third argument of bind is in reality an int (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. See also accept(2).
SEE ALSO
accept(2),
connect(2),
listen(2),
socket(2),
getsockname(2),
ip(7),
socket(7)
NAME
brk, sbrk - change data segment size
SYNOPSIS
#include <unistd.h>
int brk(void *end_data_segment);
void *sbrk(intptr_t increment);
DESCRIPTION
brk sets the end of the data segment to the value specified by end_data_segment, when that value is reasonable, the system does have enough memory and the process does not exceed its max data size (see setrlimit(2)).
sbrk increments the program's data space by increment bytes. sbrk isn't a system call, it is just a C library wrapper. Calling sbrk with an increment of 0 can be used to find the current location of the program break.
RETURN VALUE
On success, brk returns zero, and sbrk returns a pointer to the start of the new area. On error, -1 is returned, and errno is set to ENOMEM.
CONFORMING TO
BSD 4.3
brk and sbrk are not defined in the C Standard and are deliberately excluded from the POSIX.1 standard (see paragraphs B.1.1.1.3 and B.8.3.3).
NOTES
Various systems use various types for the parameter of sbrk(). Common are int, ssize_t, ptrdiff_t, intptr_t. XPGv6 obsoletes this function.
SEE ALSO
execve(2),
getrlimit(2),
malloc(3)
NAME
chdir, fchdir - change working directory
SYNOPSIS
#include <unistd.h>
int chdir(const char *path);
int fchdir(int fd);
DESCRIPTION
chdir changes the current directory to that specified in path.
fchdir is identical to chdir, only that the directory is given as an open file descriptor.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for chdir are listed below:
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of path is not a directory.
- EACCES
- Search permission is denied on a component of path.
- ELOOP
- Too many symbolic links were encountered in resolving path.
- EIO
- An I/O error occurred.
The general errors for fchdir are listed below:
- EBADF
- fd is not a valid file descriptor.
- EACCES
- Search permission was denied on the directory open on fd.
NOTES
The prototype for fchdir is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).
CONFORMING TO
The chdir call is compatible with SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents additional EINTR, ENOLINK, and EMULTIHOP error conditions but has no ENOMEM. POSIX.1 does not have ENOMEM or ELOOP error conditions. X/OPEN does not have EFAULT, ENOMEM or EIO error conditions.
The fchdir call is compatible with SVr4, 4.4BSD and X/OPEN. SVr4 documents additional EIO, EINTR, and ENOLINK error conditions. X/OPEN documents additional EINTR and EIO error conditions.
SEE ALSO
getcwd(3),
chroot(2)
NAME
chown, fchown, lchown - change ownership of a file
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);
DESCRIPTION
The owner of the file specified by path or by fd is changed. Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. The super-user may change the group arbitrarily.
If the owner or group is specified as -1, then that ID is not changed.
When the owner or group of an executable file is changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this should also happen when root does the chown; the Linux behaviour depends on the kernel version. In the case of a non-group-executable file (with the S_IXGRP bit clear), the S_ISGID bit indicates mandatory locking and is not cleared by a chown.
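The -1 convention described above can be illustrated with a short sketch. The helper names change_group_only and change_group_demo are hypothetical, and the scratch file under /tmp is an assumption made only so that the example has a file it is allowed to modify.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: change only the group of `path`.
 * Passing -1 (cast to uid_t) as the owner leaves the owner unchanged.
 */
int change_group_only(const char *path, gid_t group)
{
    return chown(path, (uid_t)-1, group);
}

/*
 * Demo driver: create a scratch file and set its group to the caller's
 * effective group, which an unprivileged owner is permitted to do.
 * Returns 0 on success, -1 on error.
 */
int change_group_demo(void)
{
    char path[] = "/tmp/chown_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    int rc = change_group_only(path, getegid());

    close(fd);
    unlink(path);
    return rc;
}
```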
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for
chown are listed below:
- EPERM
- The effective UID does not match the owner of the file, and is not zero; or the owner or group were specified incorrectly.
- EROFS
- The named file resides on a read-only file system.
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of the path prefix is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix.
- ELOOP
- Too many symbolic links were encountered in resolving path.
The general errors for fchown are listed below:
- EBADF
- The descriptor is not valid.
- ENOENT
- See above.
- EPERM
- See above.
- EROFS
- See above.
- EIO
- A low-level I/O error occurred while modifying the inode.
NOTES
In versions of Linux prior to 2.1.81 (and distinct from 2.1.46), chown did not follow symbolic links. Since Linux 2.1.81, chown does follow symbolic links, and there is a new system call, lchown, that does not follow symbolic links. Since Linux 2.1.86, this new call (which has the same semantics as the old chown) has the old syscall number, and chown has the newly introduced number.
The prototype for fchown is only available if _BSD_SOURCE is defined (either explicitly, or implicitly by neither defining _POSIX_SOURCE nor compiling with the -ansi flag).
CONFORMING TO
The
chown call conforms to SVr4, SVID, POSIX, X/OPEN. The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files). SVr4 documents EINVAL, EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document ENOMEM or ELOOP error conditions.
The fchown call conforms to 4.4BSD and SVr4. SVr4 documents additional EINVAL, EIO, EINTR, and ENOLINK error conditions.
RESTRICTIONS
The chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because chown() may cause immediate access revocation on already open files. Client-side caching may lead to a delay between the time when ownership has been changed to allow access for a user and the time when the file can actually be accessed by that user on other clients.
SEE ALSO
chmod(2),
flock(2)
NAME
clone - create a child process
SYNOPSIS
#include <sched.h>
int clone(int (*fn)(void *), void *child_stack, int flags, void *arg);
_syscall2(int, clone, int, flags, void *, child_stack)
DESCRIPTION
clone creates a new process, just like fork(2). clone is a library function layered on top of the underlying clone system call, hereinafter referred to as sys_clone. A description of sys_clone is given towards the end of this page.
Unlike fork(2), these calls allow the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers. (Note that on this manual page, "calling process" normally corresponds to "parent process". But see the description of CLONE_PARENT below.)
The main use of clone is to implement threads: multiple threads of control in a program that run concurrently in a shared memory space.
When the child process is created with clone, it executes the function application fn(arg). (This differs from fork(2), where execution continues in the child from the point of the fork(2) call.) The fn argument is a pointer to a function that is called by the child process at the beginning of its execution. The arg argument is passed to the fn function.
When the fn(arg) function application returns, the child process terminates. The integer returned by fn is the exit code for the child process. The child process may also terminate explicitly by calling exit(2) or after receiving a fatal signal.
The child_stack argument specifies the location of the stack used by the child process. Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process. The calling process must therefore set up memory space for the child stack and pass a pointer to this space to clone. Stacks grow downwards on all processors that run Linux (except the HP PA processors), so child_stack usually points to the topmost address of the memory space set up for the child stack.
The low byte of flags contains the number of the signal sent to the parent when the child dies. If this signal is specified as anything other than SIGCHLD, then the parent process must specify the __WALL or __WCLONE options when waiting for the child with wait(2). If no signal is specified, then the parent process is not signaled when the child terminates.
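The roles of fn, arg, child_stack and the termination signal can be seen together in a small sketch. It assumes glibc (which declares clone only when _GNU_SOURCE is defined) and a downward-growing stack; the helper names and the 64 KB stack size are arbitrary choices for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc declares clone() only with this defined */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size chosen for the sketch */

/* Runs in the child; its return value becomes the child's exit code. */
static int child_fn(void *arg)
{
    return *(int *)arg;
}

/*
 * Hypothetical helper: start a child with clone(), wait for it, and
 * return its exit code, or -1 on error.
 */
int run_child(int exit_code)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downwards here, so pass the topmost address. */
    char *stack_top = stack + CHILD_STACK_SIZE;

    /* Low byte of flags is SIGCHLD, so a plain waitpid() can reap the child. */
    pid_t pid = clone(child_fn, stack_top, SIGCHLD, &exit_code);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);
    if (reaped == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because no sharing flags are given, the child works on its own copy of the parent's memory, so the pointer passed as arg refers to the child's copy of exit_code.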
flags may also be bitwise-or'ed with one or several of the following constants, in order to specify what is shared between the calling process and the child process:
- CLONE_PARENT
- (Linux 2.4 onwards) If CLONE_PARENT is set, then the parent of the new child (as returned by getppid(2)) will be the same as that of the calling process.
If CLONE_PARENT is not set, then (as with fork(2)) the child's parent is the calling process.
Note that it is the parent process, as returned by getppid(2), which is signaled when the child terminates, so that if CLONE_PARENT is set, then the parent of the calling process, rather than the calling process itself, will be signaled.
- CLONE_FS
- If CLONE_FS is set, the caller and the child processes share the same file system information. This includes the root of the file system, the current working directory, and the umask. Any call to chroot(2), chdir(2), or umask(2) performed by the calling process or the child process also takes effect in the other process.
If CLONE_FS is not set, the child process works on a copy of the file system information of the calling process at the time of the clone call. Calls to chroot(2), chdir(2), umask(2) performed later by one of the processes do not affect the other process.
- CLONE_FILES
- If CLONE_FILES is set, the calling process and the child processes share the same file descriptor table. File descriptors always refer to the same files in the calling process and in the child process. Any file descriptor created by the calling process or by the child process is also valid in the other process. Similarly, if one of the processes closes a file descriptor, or changes its associated flags, the other process is also affected.
If CLONE_FILES is not set, the child process inherits a copy of all file descriptors opened in the calling process at the time of clone. Operations on file descriptors performed later by either the calling process or the child process do not affect the other process.
- CLONE_NEWNS
- (Linux 2.4.19 onwards) Start the child in a new namespace.
Every process lives in a namespace. The namespace of a process is the data (the set of mounts) describing the file hierarchy as seen by that process. After a fork(2) or clone(2) where the CLONE_NEWNS flag is not set, the child lives in the same namespace as the parent. The system calls mount(2) and umount(2) change the namespace of the calling process, and hence affect all processes that live in the same namespace, but do not affect processes in a different namespace.
After a clone(2) where the CLONE_NEWNS flag is set, the cloned child is started in a new namespace, initialized with a copy of the namespace of the parent.
Only a privileged process may specify the CLONE_NEWNS flag. It is not permitted to specify both CLONE_NEWNS and CLONE_FS in the same clone call.
- CLONE_SIGHAND
- If CLONE_SIGHAND is set, the calling process and the child processes share the same table of signal handlers. If the calling process or child process calls sigaction(2) to change the behavior associated with a signal, the behavior is changed in the other process as well. However, the calling process and child processes still have distinct signal masks and sets of pending signals. So, one of them may block or unblock some signals using sigprocmask(2) without affecting the other process.
If CLONE_SIGHAND is not set, the child process inherits a copy of the signal handlers of the calling process at the time clone is called. Calls to sigaction(2) performed later by one of the processes have no effect on the other process.
- CLONE_PTRACE
- If CLONE_PTRACE is specified, and the calling process is being traced, then trace the child also (see ptrace(2)).
- CLONE_VFORK
- If CLONE_VFORK is set, the execution of the calling process is suspended until the child releases its virtual memory resources via a call to execve(2) or _exit(2) (as with vfork(2)).
If CLONE_VFORK is not set then both the calling process and the child are schedulable after the call, and an application should not rely on execution occurring in any particular order.
- CLONE_VM
- If CLONE_VM is set, the calling process and the child processes run in the same memory space. In particular, memory writes performed by the calling process or by the child process are also visible in the other process. Moreover, any memory mapping or unmapping performed with mmap(2) or munmap(2) by the child or calling process also affects the other process.
If CLONE_VM is not set, the child process runs in a separate copy of the memory space of the calling process at the time of clone. Memory writes or file mappings/unmappings performed by one of the processes do not affect the other, as with fork(2).
- CLONE_PID
- (Obsolete) If CLONE_PID is set, the child process is created with the same process ID as the calling process. This is good for hacking the system, but otherwise of not much use. Since 2.3.21 this flag can be specified only by the system boot process (PID 0). It disappeared in Linux 2.5.16.
- CLONE_THREAD
- (Linux 2.4 onwards) If CLONE_THREAD is set, the child is placed in the same thread group as the calling process.
If CLONE_THREAD is not set, then the child is placed in its own (new) thread group, whose ID is the same as the process ID.
(Thread groups are a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads sharing a single PID. In Linux 2.4, calls to getpid(2) return the thread group ID of the caller.)
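The effect of a sharing flag can be observed directly. The sketch below, which assumes glibc and a downward-growing stack, runs a child that increments a global counter and reports the value the parent sees afterwards; with CLONE_VM the write is visible to the parent, without it the child modifies only its own copy. The names counter_after_child and increment_counter are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc declares clone() and CLONE_* only with this */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (64 * 1024)    /* arbitrary size chosen for the sketch */

static volatile int counter;

/* Runs in the child: a single memory write, then exit with code 0. */
static int increment_counter(void *arg)
{
    (void)arg;
    counter = counter + 1;
    return 0;
}

/*
 * Hypothetical helper: run the child with the given sharing flags and
 * return the counter value seen by the parent once the child has exited,
 * or -1 on error.
 */
int counter_after_child(int sharing_flags)
{
    char *stack = malloc(DEMO_STACK_SIZE);
    if (stack == NULL)
        return -1;

    counter = 0;
    pid_t pid = clone(increment_counter, stack + DEMO_STACK_SIZE,
                      sharing_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);                /* safe: the child has terminated */
    if (reaped == -1)
        return -1;
    return counter;
}
```

Calling counter_after_child(CLONE_VM) yields 1, while counter_after_child(0) yields 0, matching the descriptions of CLONE_VM set and not set.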
sys_clone
The sys_clone system call corresponds more closely to fork(2) in that execution in the child continues from the point of the call. Thus, sys_clone only requires the flags and child_stack arguments, which have the same meaning as for clone. (Note that the order of these arguments differs from clone.)
Another difference for sys_clone is that the child_stack argument may be zero, in which case copy-on-write semantics ensure that the child gets separate copies of stack pages when either process modifies the stack. In this case, for correct operation, the CLONE_VM option should not be specified.
RETURN VALUE
On success, the PID of the child process is returned in the caller's thread of execution. On failure, -1 is returned in the caller's context, no child process is created, and errno is set appropriately.
ERRORS
- EAGAIN
- Too many processes are already running.
- ENOMEM
- Cannot allocate sufficient memory to allocate a task structure for the child, or to copy those parts of the caller's context that need to be copied.
- EINVAL
- Returned by clone when a zero value is specified for child_stack.
- EINVAL
- Both CLONE_FS and CLONE_NEWNS were specified in flags.
- EINVAL
- CLONE_THREAD was specified, but CLONE_SIGHAND was not. (Since Linux 2.5.35.)
- EINVAL
- Precisely one of CLONE_DETACHED and CLONE_THREAD was specified. (Since Linux 2.6.0-test6.)
- EINVAL
- CLONE_SIGHAND was specified, but CLONE_VM was not. (Since Linux 2.6.0-test6.)
- EPERM
- CLONE_NEWNS was specified by a non-root process (process without CAP_SYS_ADMIN).
- EPERM
- CLONE_PID was specified by a process other than process 0.
BUGS
There is no entry for
clone in libc version 5. libc 6 (a.k.a. glibc 2) provides
clone as described in this manual page.
NOTES
For kernel versions 2.4.7-2.4.18 the CLONE_THREAD flag implied the CLONE_PARENT flag.
CONFORMING TO
The
clone and
sys_clone calls are Linux-specific and should not be used in programs intended to be portable. For programming threaded applications (multiple threads of control in the same memory space), it is better to use a library implementing the POSIX 1003.1c thread API, such as the LinuxThreads library (included in glibc2). See
pthread_create(3).
SEE ALSO
fork(2),
wait(2),
pthread_create(3)
NAME
connect - initiate a connection on a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);
DESCRIPTION
The file descriptor sockfd must refer to a socket. If the socket is of type SOCK_DGRAM then the serv_addr address is the address to which datagrams are sent by default, and the only address from which datagrams are received. If the socket is of type SOCK_STREAM or SOCK_SEQPACKET, this call attempts to make a connection to another socket. The other socket is specified by serv_addr, which is an address (of length addrlen) in the communications space of the socket. Each communications space interprets the serv_addr parameter in its own way.
Generally, connection-based protocol sockets may successfully connect only once; connectionless protocol sockets may use connect multiple times to change their association. Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC.
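For a connectionless socket, the re-association described above can be sketched as follows. The example assumes an IPv4 loopback interface is available; the port numbers 40001 and 40002 and all helper names are arbitrary choices for illustration, and no datagram is actually sent.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Associate a datagram socket with 127.0.0.1:port (port in host byte order). */
static int set_default_peer(int fd, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return connect(fd, (const struct sockaddr *)&addr, sizeof addr);
}

/* Port of the peer the socket is currently associated with, or -1 on error. */
static int current_peer_port(int fd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &len) == -1)
        return -1;
    return ntohs(peer.sin_port);
}

/*
 * Demo driver: connect a SOCK_DGRAM socket twice and check that the
 * second call replaced the first association.
 * Returns 0 if the behaviour was observed, -1 otherwise.
 */
int udp_reconnect_demo(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    int ok = set_default_peer(fd, 40001) == 0
          && current_peer_port(fd) == 40001
          && set_default_peer(fd, 40002) == 0   /* changes the association */
          && current_peer_port(fd) == 40002;

    close(fd);
    return ok ? 0 : -1;
}
```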
RETURN VALUE
If the connection or binding succeeds, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
The following are general socket errors only. There may be other domain-specific error codes.
- EBADF
- The file descriptor is not a valid index in the descriptor table.
- EFAULT
- The socket structure address is outside the user's address space.
- ENOTSOCK
- The file descriptor is not associated with a socket.
- EISCONN
- The socket is already connected.
- ECONNREFUSED
- No one is listening on the remote address.
- ETIMEDOUT
- Timeout while attempting connection. The server may be too busy to accept new connections. Note that for IP sockets the timeout may be very long when syncookies are enabled on the server.
- ENETUNREACH
- Network is unreachable.
- EADDRINUSE
- Local address is already in use.
- EINPROGRESS
- The socket is non-blocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).
- EALREADY
- The socket is non-blocking and a previous connection attempt has not yet been completed.
- EAGAIN
- No more free local ports or insufficient entries in the routing cache. For PF_INET see the net.ipv4.ip_local_port_range sysctl in ip(7) on how to increase the number of local ports.
- EAFNOSUPPORT
- The passed address didn't have the correct address family in its sa_family field.
- EACCES, EPERM
- The user tried to connect to a broadcast address without having the socket broadcast flag enabled or the connection request failed because of a local firewall rule.
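The EINPROGRESS procedure listed above (select for writing, then read SO_ERROR) can be sketched as follows. This is one possible arrangement rather than a definitive implementation: the helper names, the one-entry listen backlog, the 5 second timeout, and the use of a loopback listener in the demo driver are all assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: start and complete connect() on a non-blocking
 * socket. Returns 0 once connected, -1 with errno set on failure or timeout.
 */
int finish_connect(int fd, const struct sockaddr *addr, socklen_t addrlen,
                   int timeout_sec)
{
    if (connect(fd, addr, addrlen) == 0)
        return 0;                           /* completed immediately */
    if (errno != EINPROGRESS)
        return -1;

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { timeout_sec, 0 };

    /* Writability signals that the connection attempt has finished. */
    int n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        return -1;

    /* SO_ERROR tells whether it finished successfully or not. */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Demo driver: connect a non-blocking socket to a local listener. */
int nonblocking_connect_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int flags, result = -1;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1 || cfd == -1)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) == -1)
        goto out;

    flags = fcntl(cfd, F_GETFL, 0);
    if (flags == -1 || fcntl(cfd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto out;

    result = finish_connect(cfd, (struct sockaddr *)&addr, sizeof addr, 5);
out:
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return result;
}
```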
CONFORMING TO
SVr4, 4.4BSD (the
connect function first appeared in BSD 4.2). SVr4 documents the additional general error codes
EADDRNOTAVAIL,
EINVAL,
EAFNOSUPPORT,
EALREADY,
EINTR,
EPROTOTYPE, and
ENOSR. It also documents many additional error conditions not described here.
NOTE
The third argument of
connect is in reality an int (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t. See also
accept(2).
BUGS
Unconnecting a socket by calling connect with an AF_UNSPEC address is not yet implemented.
SEE ALSO
accept(2),
bind(2),
listen(2),
socket(2),
getsockname(2)
NAME
create_module - create a loadable module entry
SYNOPSIS
#include <linux/module.h>
caddr_t create_module(const char *name, size_t size);
DESCRIPTION
create_module attempts to create a loadable module entry and reserve the kernel memory that will be needed to hold the module. This system call is only open to the superuser.
RETURN VALUE
On success, returns the kernel address at which the module will reside. On error -1 is returned and
errno is set appropriately.
ERRORS
- EPERM
- The user is not the superuser.
- EEXIST
- A module by that name already exists.
- EINVAL
- The requested size is too small even for the module header information.
- ENOMEM
- The kernel could not allocate a contiguous block of memory large enough for the module.
- EFAULT
- name is outside the program's accessible address space.
SEE ALSO
init_module(2),
delete_module(2),
query_module(2).
NAME
DC_PLUG_new, DC_PLUG_free, DC_PLUG_to_select, DC_PLUG_io - basic DC_PLUG functions
SYNOPSIS
#include <distcache/dc_plug.h>
DC_PLUG *DC_PLUG_new(NAL_CONNECTION *conn, unsigned int flags);
int DC_PLUG_free(DC_PLUG *plug);
void DC_PLUG_to_select(DC_PLUG *plug, NAL_SELECTOR *sel);
int DC_PLUG_io(DC_PLUG *plug, NAL_SELECTOR *sel);
DESCRIPTION
DC_PLUG_new() allocates and initialises a DC_PLUG structure encapsulating the specified connection. The flags parameter is zero or a bitmask combining one or more of the following flags:
#define DC_PLUG_FLAG_TO_SERVER (unsigned int)0x0001
#define DC_PLUG_FLAG_NOFREE_CONN (unsigned int)0x0002
If the DC_PLUG_FLAG_TO_SERVER flag is specified, the plug object will expect to send "request" messages and receive "response" messages; otherwise it defaults to the opposite sense.
DC_PLUG_free() frees the DC_PLUG structure and, unless it was created with the DC_PLUG_FLAG_NOFREE_CONN flag, also destroys the connection object it encapsulates.
DC_PLUG_to_select() adds a plug object to the sel selector so that it can be tested for the network events it is waiting on. The events to select on are chosen automatically from the plug object's state: it selects for writability on its underlying connection only if there is data waiting to be sent, and for readability only if it is ready to receive any data that may have arrived.
DC_PLUG_io() performs network I/O on a plug object's underlying connection, depending on the results of the last select operation on sel.
RETURN VALUES
DC_PLUG_new() returns the new plug object on success, or NULL on failure.
DC_PLUG_free() should never fail and should only return non-zero results.
DC_PLUG_to_select() has no return value.
DC_PLUG_io() returns zero on error, otherwise non-zero.
None of the DC_PLUG functions sets (or clears) errno, because they are implemented on top of the libnal library, which in turn is an abstraction layer for the system's networking interfaces. As such, any errno codes set by failures in system libraries will not be overwritten by these functions.
SEE ALSO
DC_PLUG_read(2) - Provides documentation for other DC_PLUG functions also.
distcache(8) - Overview of the distcache architecture.
http://www.distcache.org/ - Distcache home page.
AUTHOR
This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at
geoff@geoffthorpe.net.
Home Page: http://www.distcache.org
NAME
DC_SERVER_set_default_cache, DC_SERVER_set_cache, DC_SERVER_new, DC_SERVER_free, DC_SERVER_items_stored, DC_SERVER_reset_operations, DC_SERVER_num_operations, DC_SERVER_new_client, DC_SERVER_del_client, DC_SERVER_process_client, DC_SERVER_clients_to_sel, DC_SERVER_clients_io - distcache server API
SYNOPSIS
#include <distcache/dc_server.h>
DC_SERVER *DC_SERVER_new(unsigned int max_sessions);
void DC_SERVER_free(DC_SERVER *ctx);
int DC_SERVER_set_default_cache(void);
int DC_SERVER_set_cache(const DC_CACHE_cb *impl);
unsigned int DC_SERVER_items_stored(DC_SERVER *ctx, const struct timeval *now);
void DC_SERVER_reset_operations(DC_SERVER *ctx);
unsigned long DC_SERVER_num_operations(DC_SERVER *ctx);
DC_CLIENT *DC_SERVER_new_client(DC_SERVER *ctx, NAL_CONNECTION *conn, unsigned int flags);
int DC_SERVER_del_client(DC_CLIENT *clnt);
int DC_SERVER_process_client(DC_CLIENT *clnt, const struct timeval *now);
int DC_SERVER_clients_to_sel(DC_SERVER *ctx, NAL_SELECTOR *sel);
int DC_SERVER_clients_io(DC_SERVER *ctx, NAL_SELECTOR *sel, const struct timeval *now);
RETURN VALUES
DC_SERVER_new() returns an initialised DC_SERVER object, or NULL for failure.
DC_SERVER_free() and DC_SERVER_reset_operations() have no return value.
DC_SERVER_items_stored() returns the number of cached sessions in a cache (after any session expiry is performed).
DC_SERVER_num_operations() indicates how many operations the cache object has performed.
DC_SERVER_new_client() returns a new DC_CLIENT object, or NULL for failure.
The remaining functions return non-zero for success or zero for failure.
DESCRIPTION and NOTES
Use of the dc_server.h header requires the "struct timeval" type to be defined. On many systems, this will require that you include the time.h header in advance, though details will vary from system to system. If in doubt, consult your system's gettimeofday(2) man page for information on how to have this type defined.
These DC_SERVER functions facilitate the implementation of a session cache server that is compatible with the distcache protocol. The source code to dc_server(1) provides an example of using this API, and is probably the ideal reference (a single C file of 304 lines). The storage of the cache is provided by a table of handler functions defined by the DC_CACHE_cb structure:
typedef struct st_DC_CACHE_cb {
    DC_CACHE * (*cache_new)(unsigned int max_sessions);
    void (*cache_free)(DC_CACHE *cache);
    int (*cache_add)(DC_CACHE *cache, const struct timeval *now,
                     unsigned long timeout_msecs,
                     const unsigned char *session_id, unsigned int session_id_len,
                     const unsigned char *data, unsigned int data_len);
    unsigned int (*cache_get)(DC_CACHE *cache, const struct timeval *now,
                     const unsigned char *session_id, unsigned int session_id_len,
                     unsigned char *store, unsigned int store_size);
    int (*cache_remove)(DC_CACHE *cache, const struct timeval *now,
                     const unsigned char *session_id, unsigned int session_id_len);
    int (*cache_have)(DC_CACHE *cache, const struct timeval *now,
                     const unsigned char *session_id, unsigned int session_id_len);
    unsigned int (*cache_num_items)(DC_CACHE *cache, const struct timeval *now);
} DC_CACHE_cb;
libdistcacheserver provides a default implementation that can be enabled by calling DC_SERVER_set_default_cache() prior to DC_SERVER_new(). Alternatively, a customised cache implementation can be specified by DC_SERVER_set_cache(). The reason that one or the other must be specified is so that custom implementations will not need to have the default implementation linked in, because they won't explicitly call DC_SERVER_set_default_cache().
The choice of DC_CACHE_cb implementation will control all manipulations and queries on the session cache. Each handler is passed a struct timeval value to allow it to implicitly handle expiry of old sessions without having to repeatedly query the time on each invocation.
Outside the actual cache implementation, the other subject covered by libdistcacheserver is that of managing client connections and processing their requests. It is assumed that the caller will use libnal to handle the network aspects of the cache server; otherwise the application would do better to use the lower-level DC_PLUG API (see DC_PLUG_new(2)), for which the implementation of libdistcacheserver would provide a good reference.
New clients of the cache server are created by DC_SERVER_new_client() using the supplied connection object conn. The behaviour of the returned DC_CLIENT object depends on the flags parameter, which is zero or a bitwise combination of the following values:
#define DC_CLIENT_FLAG_NOFREE_CONN (unsigned int)0x0001
#define DC_CLIENT_FLAG_IN_SERVER (unsigned int)0x0002
If DC_CLIENT_FLAG_NOFREE_CONN is set, then conn will not be destroyed when the DC_CLIENT object is destroyed by DC_SERVER_del_client(). Note that the DC_CLIENT object encapsulates the provided conn object and does not copy it.
If DC_CLIENT_FLAG_IN_SERVER is set, then network traffic and request processing for the client will be implicit in the DC_SERVER_clients_to_sel() and DC_SERVER_clients_io() functions. This includes destroying any clients that have disconnected at the network level or had corruption errors at the data level.
If DC_CLIENT_FLAG_IN_SERVER is not set, then selecting and performing network I/O should be handled by the caller directly using the original conn object, and checking for (and processing of) requests should be handled directly by DC_SERVER_process_client(). A zero return value from this function indicates an error in the client's processing, and requires the caller to destroy the client object via DC_SERVER_del_client(). This allows network handling and logical cache handling to be explicitly separated by the implementation if required.
Note that the dc_server(1) implementation is greatly simplified by using DC_CLIENT_FLAG_IN_SERVER and not setting DC_CLIENT_FLAG_NOFREE_CONN. This allows it to forget about NAL_CONNECTION objects after they have been successfully converted into DC_CLIENT objects, and in fact it can forget about the resulting DC_CLIENT objects too, as they become completely controlled by the DC_SERVER object. If the client is closed, the underlying connection object is destroyed also. If the cache server itself is destroyed, then any remaining clients will likewise be properly cleaned up.
DC_SERVER_clients_to_sel() and DC_SERVER_clients_io() only operate on cache clients that are created with the DC_CLIENT_FLAG_IN_SERVER flag.
SEE ALSO
DC_PLUG_new(2),
DC_PLUG_read(2) - Lower-level asynchronous implementation of the distcache protocol, useful for client and server operation.
dc_server(1) - Runs a cache server listening on a configurable network address.
distcache(8) - Overview of the distcache architecture.
http://www.distcache.org/ - Distcache home page.
AUTHOR
This toolkit was designed and implemented by Geoff Thorpe for Cryptographic Appliances Incorporated. Since the project was released into open source, it has a home page and a project environment where development, mailing lists, and releases are organised. For problems with the software or this man page please check for new releases at the project web-site below, mail the users mailing list described there, or contact the author at
geoff@geoffthorpe.net.
Home Page: http://www.distcache.org
NAME
dup, dup2 - duplicate a file descriptor
SYNOPSIS
#include <unistd.h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
DESCRIPTION
dup and dup2 create a copy of the file descriptor oldfd.
After successful return of dup or dup2, the old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.
The two descriptors do not share the close-on-exec flag, however.
dup uses the lowest-numbered unused descriptor for the new descriptor.
dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.
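What the two descriptors do and do not share can be checked with a short sketch. The helper names are hypothetical, and the unlinked scratch file under /tmp is an assumption made only to have something to open.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a scratch file that disappears once all descriptors are closed. */
static int open_scratch(void)
{
    char path[] = "/tmp/dup_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd != -1)
        unlink(path);
    return fd;
}

/* Returns 1 if a descriptor and its dup() share one file offset, 0 if not, -1 on error. */
int dup_shares_offset(void)
{
    int fd = open_scratch();
    if (fd == -1)
        return -1;
    int copy = dup(fd);
    if (copy == -1) {
        close(fd);
        return -1;
    }

    off_t moved = lseek(fd, 100, SEEK_SET);   /* move via the original only */
    off_t seen = lseek(copy, 0, SEEK_CUR);    /* query via the copy */

    close(copy);
    close(fd);
    if (moved == -1 || seen == -1)
        return -1;
    return seen == moved;
}

/* Returns 1 if setting close-on-exec on the original also sets it on the copy, 0 if not, -1 on error. */
int dup_shares_cloexec(void)
{
    int fd = open_scratch();
    if (fd == -1)
        return -1;
    int copy = dup(fd);
    if (copy == -1) {
        close(fd);
        return -1;
    }

    int set = fcntl(fd, F_SETFD, FD_CLOEXEC);
    int copy_flags = fcntl(copy, F_GETFD);

    close(copy);
    close(fd);
    if (set == -1 || copy_flags == -1)
        return -1;
    return (copy_flags & FD_CLOEXEC) != 0;
}
```

dup_shares_offset() reports 1 and dup_shares_cloexec() reports 0, in line with the description above.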
RETURN VALUE
dup and dup2 return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
ERRORS
- EBADF
- oldfd isn't an open file descriptor, or newfd is out of the allowed range for file descriptors.
- EMFILE
- The process already has the maximum number of file descriptors open and tried to open a new one.
- EINTR
- The dup2 call was interrupted by a signal.
- EBUSY
- (Linux only) This may be returned by dup2 during a race condition with open() and dup().
WARNING
The error returned by dup2 is different from that returned by fcntl(..., F_DUPFD, ...) when newfd is out of range. On some systems dup2 also sometimes returns EINVAL like F_DUPFD.
BUGS
If newfd was open, any errors that would have been reported at close() time are lost. A careful programmer will not use dup2 without closing newfd first.
CONFORMING TO
SVr4, SVID, POSIX, X/OPEN, BSD 4.3. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX.1 adds EINTR. The EBUSY return is Linux-specific.
SEE ALSO
fcntl(2),
open(2),
close(2)
NAME
epoll_create - open an epoll file descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_create(int size);
DESCRIPTION
Open an epoll file descriptor by requesting that the kernel allocate an event backing store dimensioned for size descriptors. The size is not the maximum size of the backing store but just a hint to the kernel about how to dimension internal structures. The returned file descriptor will be used for all the subsequent calls to the epoll interface. The file descriptor returned by epoll_create(2) must be closed by using close(2).
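The life cycle of the descriptor can be sketched as follows; the helper name epoll_open_close is hypothetical and the hint value passed by the caller is arbitrary.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an epoll descriptor dimensioned for roughly
 * `hint` descriptors and close it again.
 * Returns 0 on success, -1 on error with errno set.
 */
int epoll_open_close(int hint)
{
    int epfd = epoll_create(hint);
    if (epfd == -1)
        return -1;

    /* ... epoll_ctl(2) and epoll_wait(2) calls would use epfd here ... */

    /* The descriptor is released with close(2), like any other. */
    return close(epfd);
}
```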
RETURN VALUE
When successful, epoll_create(2) returns a positive integer identifying the descriptor. When an error occurs, epoll_create(2) returns -1 and errno is set appropriately.
ERRORS
- ENOMEM
- There was insufficient memory to create the kernel object.
CONFORMING TO
epoll_create(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
SEE ALSO
close(2),
epoll_ctl(2),
epoll_wait(2),
epoll(4)
NAME
epoll_wait - wait for an I/O event on an epoll file descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
DESCRIPTION
Wait for events on the epoll file descriptor epfd for a maximum time of timeout milliseconds. The memory area pointed to by events will contain the events that will be available for the caller. Up to maxevents are returned by epoll_wait(2). The maxevents parameter must be greater than zero. Specifying a timeout of -1 makes epoll_wait(2) wait indefinitely, while specifying a timeout equal to zero makes epoll_wait(2) return immediately even if no events are available (return code equal to zero). The struct epoll_event is defined as:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;  /* Epoll events */
    epoll_data_t data;  /* User data variable */
};
The data member of each returned structure will contain the same data the user set with an epoll_ctl(2) call (EPOLL_CTL_ADD, EPOLL_CTL_MOD), while the events member will contain the returned event bit field.
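As a minimal sketch of how epoll_create(2) and epoll_wait(2) fit together, the hypothetical helper wait_readable below registers a single descriptor and waits for it to become readable. The EPOLLIN event bit and the epoll_ctl(2) call are described in epoll_ctl(2), not on this page; the helper name and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * become readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev, out;
    int epfd, n, saved;

    epfd = epoll_create(1);          /* size is only a hint to the kernel */
    if (epfd == -1)
        return -1;

    ev.events = EPOLLIN;             /* interested in "input available" */
    ev.data.fd = fd;                 /* handed back unchanged by epoll_wait */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        saved = errno;
        close(epfd);
        errno = saved;
        return -1;
    }

    n = epoll_wait(epfd, &out, 1, timeout_ms);   /* maxevents must be > 0 */
    saved = errno;
    close(epfd);                     /* the epoll descriptor must be closed */
    errno = saved;
    return n;
}
```

A real event loop would keep the epoll descriptor open and call epoll_wait(2) repeatedly instead of creating one per wait.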
RETURN VALUE
When successful,
epoll_wait(2) returns the number of file descriptors ready for the requested I/O, or zero if no file descriptor became ready during the requested
timeout milliseconds. When an error occurs,
epoll_wait(2) returns -1 and
errno is set appropriately.
ERRORS
- EBADF
- epfd is not a valid file descriptor.
- EINVAL
- The supplied file descriptor, epfd, is not an epoll file descriptor, or the maxevents parameter is less than or equal to zero.
- EFAULT
- The memory area pointed to by events is not accessible with write permissions.
CONFORMING TO
epoll_wait(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
SEE ALSO
epoll_ctl(2),
epoll_create(2),
epoll(4)
NAME
_exit, _Exit - terminate the current process
SYNOPSIS
#include <unistd.h> void _exit(int status);
#include <stdlib.h>
void _Exit(int status);
DESCRIPTION
The function _exit terminates the calling process "immediately". Any open file descriptors belonging to the process are closed; any children of the process are inherited by process 1, init, and the process's parent is sent a SIGCHLD signal.
The value status is returned to the parent process as the process's exit status, and can be collected using one of the wait family of calls.
The function _Exit is equivalent to _exit.
RETURN VALUE
These functions do not return.
CONFORMING TO
SVr4, SVID, POSIX, X/OPEN, BSD 4.3. The function
_Exit() was introduced by C99.
NOTES
For a discussion on the effects of an exit, the transmission of exit status, zombie processes, signals sent, etc., see
exit(3).
The function _exit is like exit(), but does not call any functions registered with the ANSI C atexit function, nor any registered signal handlers. Whether it flushes standard I/O buffers and removes temporary files created with tmpfile(3) is implementation-dependent. On the other hand, _exit does close open file descriptors, and this may cause an unknown delay, waiting for pending output to finish. If the delay is undesired, it may be useful to call functions like tcflush() before calling _exit(). Whether any pending I/O is cancelled, and which pending I/O may be cancelled upon _exit(), is implementation-dependent.
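The following sketch shows how the status passed to _exit reaches the parent through waitpid(2). The helper name child_exit_status is hypothetical. Calling _exit rather than exit() in a forked child is a common choice because functions registered with atexit in the parent are then not run a second time in the child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates with _exit(code) and
 * return the exit status collected by the parent, or -1 on error. */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: atexit functions are not called */

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```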
SEE ALSO
fork(2),
execve(2),
waitpid(2),
wait4(2),
kill(2),
wait(2),
exit(3),
termios(3)
NAME
chmod, fchmod - change permissions of a file
SYNOPSIS
#include <sys/types.h> #include <sys/stat.h> int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);
DESCRIPTION
The mode of the file given by
path or referenced by
fildes is changed.
Modes are specified by ORing the following:
- S_ISUID
- 04000 set user ID on execution
- S_ISGID
- 02000 set group ID on execution
- S_ISVTX
- 01000 sticky bit
- S_IRUSR (S_IREAD)
- 00400 read by owner
- S_IWUSR (S_IWRITE)
- 00200 write by owner
- S_IXUSR (S_IEXEC)
- 00100 execute/search by owner
- S_IRGRP
- 00040 read by group
- S_IWGRP
- 00020 write by group
- S_IXGRP
- 00010 execute/search by group
- S_IROTH
- 00004 read by others
- S_IWOTH
- 00002 write by others
- S_IXOTH
- 00001 execute/search by others
The effective UID of the process must be zero or must match the owner of the file.
If the effective UID of the process is not zero and the group of the file does not match the effective group ID of the process or one of its supplementary group IDs, the S_ISGID bit will be turned off, but this will not cause an error to be returned.
Depending on the file system, set user ID and set group ID execution bits may be turned off if a file is written. On some file systems, only the super-user can set the sticky bit, which may have a special meaning. For the sticky bit, and for set user ID and set group ID bits on directories, see stat(2).
On NFS file systems, restricting the permissions will immediately influence already open files, because the access control is done on the server, but open files are maintained by the client. Widening the permissions may be delayed for other clients if attribute caching is enabled on them.
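As an illustration of combining the mode bits listed above, the sketch below sets a file to rw-r----- and reads the result back with stat(2). Both function names are hypothetical, and the scratch file location under /tmp is an assumption.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: apply mode to path with chmod() and return the
 * permission bits that stat() then reports, or -1 on error. */
int chmod_and_check(const char *path, mode_t mode)
{
    struct stat st;

    if (chmod(path, mode) == -1)
        return -1;
    if (stat(path, &st) == -1)
        return -1;
    return (int)(st.st_mode & 07777);
}

/* Demonstration on a scratch file: owner read/write plus group read. */
int demo_chmod(void)
{
    char path[] = "/tmp/chmod-demo-XXXXXX";
    int fd = mkstemp(path);
    int bits;

    if (fd == -1)
        return -1;
    close(fd);
    bits = chmod_and_check(path, S_IRUSR | S_IWUSR | S_IRGRP);
    unlink(path);
    return bits;                     /* 0640 when everything succeeded */
}
```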
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for
chmod are listed below:
- EPERM
- The effective UID does not match the owner of the file, and is not zero.
- EROFS
- The named file resides on a read-only file system.
- EFAULT
- path points outside your accessible address space.
- ENAMETOOLONG
- path is too long.
- ENOENT
- The file does not exist.
- ENOMEM
- Insufficient kernel memory was available.
- ENOTDIR
- A component of the path prefix is not a directory.
- EACCES
- Search permission is denied on a component of the path prefix.
- ELOOP
- Too many symbolic links were encountered in resolving path.
- EIO
- An I/O error occurred.
The general errors for fchmod are listed below:
- EBADF
- The file descriptor fildes is not valid.
- EROFS
- See above.
- EPERM
- See above.
- EIO
- See above.
CONFORMING TO
The
chmod call conforms to SVr4, SVID, POSIX, X/OPEN, 4.4BSD. SVr4 documents EINTR, ENOLINK and EMULTIHOP returns, but no ENOMEM. POSIX.1 does not document EFAULT, ENOMEM, ELOOP or EIO error conditions, or the macros
S_IREAD,
S_IWRITE and
S_IEXEC.
The fchmod call conforms to 4.4BSD and SVr4. SVr4 documents additional EINTR and ENOLINK error conditions. POSIX requires the fchmod function if at least one of _POSIX_MAPPED_FILES and _POSIX_SHARED_MEMORY_OBJECTS is defined, and documents additional ENOSYS and EINVAL error conditions, but does not document EIO.
POSIX and X/OPEN do not document the sticky bit.
SEE ALSO
open(2),
chown(2),
execve(2),
stat(2)
NAME
fcntl - manipulate file descriptor
SYNOPSIS
#include <unistd.h> #include <fcntl.h> int fcntl(int fd, int cmd); int fcntl(int fd, int cmd, long arg); int fcntl(int fd, int cmd, struct flock *lock);
DESCRIPTION
fcntl performs one of various miscellaneous operations on
fd. The operation in question is determined by
cmd.
Handling close-on-exec
- F_DUPFD
- Find the lowest numbered available file descriptor greater than or equal to arg and make it be a copy of fd. This is different from dup2(2), which uses exactly the descriptor specified.
The old and new descriptors may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other.
The two descriptors do not share the close-on-exec flag, however. The close-on-exec flag of the copy is off, meaning that it will not be closed on exec.
On success, the new descriptor is returned.
- F_GETFD
- Read the close-on-exec flag. If the FD_CLOEXEC bit is 0, the file will remain open across exec, otherwise it will be closed.
- F_SETFD
- Set the close-on-exec flag to the value specified by the FD_CLOEXEC bit of arg.
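The three commands above can be sketched as follows. The helper names set_cloexec and dup_at_least are hypothetical. Reading the flags with F_GETFD before writing them back with F_SETFD is a defensive habit, in case a system defines descriptor flags other than FD_CLOEXEC.

```c
#include <fcntl.h>

/* Hypothetical helper: turn on the close-on-exec flag of fd while leaving
 * any other descriptor flags untouched. Returns 0, or -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: duplicate fd onto the lowest free descriptor that
 * is greater than or equal to minfd. Returns the new descriptor or -1. */
int dup_at_least(int fd, int minfd)
{
    return fcntl(fd, F_DUPFD, minfd);
}
```

The copy returned by dup_at_least has its close-on-exec flag off even when the flag is set on the original, as described under F_DUPFD.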
The file status flags
A file descriptor has certain associated flags, initialized by
open(2) and possibly modified by
fcntl(2). The flags are shared between copies (made with
dup(2),
fork(2), etc.) of the same file descriptor.
The flags and their semantics are described in open(2).
- F_GETFL
- Read the file descriptor's flags.
- F_SETFL
- Set the file status flags part of the descriptor's flags to the value specified by arg. Remaining bits (access mode, file creation flags) in arg are ignored. On Linux this command can only change the O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags.
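Because F_SETFL replaces the whole set of status flags, the usual pattern is to read the current value with F_GETFL, modify it, and write it back. A minimal sketch, with the hypothetical helper name set_nonblocking:

```c
#include <fcntl.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags of fd
 * without disturbing the flags that are already set.
 * Returns 0, or -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

The access mode reported by F_GETFL (masked with O_ACCMODE) is unchanged by this call, since those bits of arg are ignored.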
Advisory locking
F_GETLK,
F_SETLK and
F_SETLKW are used to acquire, release, and test for the existence of record locks (also known as file-segment or file-region locks). The third argument
lock is a pointer to a structure that has at least the following fields (in unspecified order).
struct flock {
    ...
    short l_type;    /* Type of lock: F_RDLCK, F_WRLCK, F_UNLCK */
    short l_whence;  /* How to interpret l_start: SEEK_SET, SEEK_CUR, SEEK_END */
    off_t l_start;   /* Starting offset for lock */
    off_t l_len;     /* Number of bytes to lock */
    pid_t l_pid;     /* PID of process blocking our lock (F_GETLK only) */
    ...
};
The
l_whence,
l_start, and
l_len fields of this structure specify the range of bytes we wish to lock.
l_start is the starting offset for the lock, and is interpreted relative to either: the start of the file (if
l_whence is
SEEK_SET); the current file offset (if
l_whence is
SEEK_CUR); or the end of the file (if
l_whence is
SEEK_END). In the final two cases,
l_start can be a negative number provided the offset does not lie before the start of the file.
l_len is a non-negative integer (but see the NOTES below) specifying the number of bytes to be locked. Bytes past the end of the file may be locked, but not bytes before the start of the file. Specifying 0 for
l_len has the special meaning: lock all bytes starting at the location specified by
l_whence and
l_start through to the end of file, no matter how large the file grows. The
l_type field can be used to place a read (F_RDLCK) or a write (F_WRLCK) lock on a file. Any number of processes may hold a read lock (shared lock) on a file region, but only one process may hold a write lock (exclusive lock). An exclusive lock excludes all other locks, both shared and exclusive. A single process can hold only one type of lock on a file region; if a new lock is applied to an already-locked region, then the existing lock is converted to the new lock type. (Such conversions may involve splitting, shrinking, or coalescing with an existing lock if the byte range specified by the new lock does not precisely coincide with the range of the existing lock.)
- F_SETLK
- Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a lock (when l_type is F_UNLCK) on the bytes specified by the l_whence, l_start, and l_len fields of lock. If a conflicting lock is held by another process, this call returns -1 and sets errno to EACCES or EAGAIN.
- F_SETLKW
- As for F_SETLK, but if a conflicting lock is held on the file, then wait for that lock to be released. If a signal is caught while waiting, then the call is interrupted and (after the signal handler has returned) returns immediately (with return value -1 and errno set to EINTR).
- F_GETLK
- On input to this call, lock describes a lock we would like to place on the file. If the lock could be placed, fcntl() does not actually place it, but returns F_UNLCK in the l_type field of lock and leaves the other fields of the structure unchanged. If one or more incompatible locks would prevent this lock being placed, then fcntl() returns details about one of these locks in the l_type, l_whence, l_start, and l_len fields of lock and sets l_pid to be the PID of the process holding that lock.
In order to place a read lock, fd must be open for reading. In order to place a write lock, fd must be open for writing. To place both types of lock, open a file read-write.
As well as being removed by an explicit F_UNLCK, record locks are automatically released when the process terminates or if it closes any file descriptor referring to a file on which locks are held. This is bad: it means that a process can lose the locks on a file like /etc/passwd or /etc/mtab when for some reason a library function decides to open, read and close it.
Record locks are not inherited by a child created via fork(2), but are preserved across an execve(2).
Because of the buffering performed by the stdio(3) library, the use of record locking with routines in that package should be avoided; use read(2) and write(2) instead.
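The record-locking commands can be sketched as follows. Both helper names are hypothetical. The first uses F_SETLK with l_len set to 0 to cover the whole file; the second uses F_GETLK to ask which process, if any, would block a write lock. A process never conflicts with its own record locks, so the second helper reports 0 for a file that only the caller has locked.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: place or remove a lock on the whole file without
 * blocking. type is F_RDLCK, F_WRLCK or F_UNLCK. Returns 0 on success,
 * or -1 with errno set (EACCES or EAGAIN if another process conflicts). */
int lock_whole_file(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;          /* l_start is relative to file start */
    fl.l_start = 0;
    fl.l_len = 0;                    /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: test whether a write lock on the whole file could
 * be placed. Returns 0 if it could, the PID of a process holding a
 * conflicting lock if not, or -1 on error. */
pid_t write_lock_holder(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}
```

Replacing F_SETLK with F_SETLKW in the first helper would make it wait for a conflicting lock to be released instead of failing.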
Mandatory locking
(Non-POSIX.) The above record locks may be either advisory or mandatory, and are advisory by default. To make use of mandatory locks, mandatory locking must be enabled (using the "-o mand" option to
mount(8)) for the file system containing the file to be locked and enabled on the file itself (by disabling group execute permission on the file and enabling the set-GID permission bit).
Advisory locks are not enforced and are useful only between cooperating processes. Mandatory locks are enforced for all processes.
Managing signals
F_GETOWN,
F_SETOWN,
F_GETSIG and
F_SETSIG are used to manage I/O availability signals:
- F_GETOWN
- Get the process ID or process group currently receiving SIGIO and SIGURG signals for events on file descriptor fd. Process groups are returned as negative values.
- F_SETOWN
- Set the process ID or process group that will receive SIGIO and SIGURG signals for events on file descriptor fd. Process groups are specified using negative values. (F_SETSIG can be used to specify a different signal instead of SIGIO).
If you set the O_ASYNC status flag on a file descriptor (either by providing this flag with the open(2) call, or by using the F_SETFL command of fcntl), a SIGIO signal is sent whenever input or output becomes possible on that file descriptor.
The process or process group to receive the signal can be selected by using the F_SETOWN command to the fcntl function. If the file descriptor is a socket, this also selects the recipient of SIGURG signals that are delivered when out-of-band data arrives on that socket. (SIGURG is sent in any situation where select(2) would report the socket as having an "exceptional condition".) If the file descriptor corresponds to a terminal device, then SIGIO signals are sent to the foreground process group of the terminal.
- F_GETSIG
- Get the signal sent when input or output becomes possible. A value of zero means SIGIO is sent. Any other value (including SIGIO) is the signal sent instead, and in this case additional info is available to the signal handler if installed with SA_SIGINFO.
- F_SETSIG
- Sets the signal sent when input or output becomes possible. A value of zero means to send the default SIGIO signal. Any other value (including SIGIO) is the signal to send instead, and in this case additional info is available to the signal handler if installed with SA_SIGINFO.
By using F_SETSIG with a non-zero value, and setting SA_SIGINFO for the signal handler (see sigaction(2)), extra information about I/O events is passed to the handler in a siginfo_t structure. If the si_code field indicates the source is SI_SIGIO, the si_fd field gives the file descriptor associated with the event. Otherwise, there is no indication which file descriptors are pending, and you should use the usual mechanisms (select(2), poll(2), read(2) with O_NONBLOCK set etc.) to determine which file descriptors are available for I/O.
By selecting a POSIX.1b real time signal (value >= SIGRTMIN), multiple I/O events may be queued using the same signal numbers. (Queuing is dependent on available memory). Extra information is available if SA_SIGINFO is set for the signal handler, as above.
Using these mechanisms, a program can implement fully asynchronous I/O without using select(2) or poll(2) most of the time.
The use of O_ASYNC, F_GETOWN, F_SETOWN is specific to BSD and Linux. F_GETSIG and F_SETSIG are Linux-specific. POSIX has asynchronous I/O and the aio_sigevent structure to achieve similar things; these are also available in Linux as part of the GNU C Library (Glibc).
Leases
F_SETLEASE and F_GETLEASE (Linux 2.4 onwards) are used (respectively) to establish and retrieve the current setting of the calling process's lease on the file referred to by fd. A file lease provides a mechanism whereby the process holding the lease (the "lease holder") is notified (via delivery of a signal) when another process (the "lease breaker") tries to open(2) or truncate(2) that file.
- F_SETLEASE
- Set or remove a file lease according to which of the following values is specified in the integer arg:
- F_RDLCK
- Take out a read lease. This will cause us to be notified when another process opens the file for writing or truncates it.
- F_WRLCK
- Take out a write lease. This will cause us to be notified when another process opens the file (for reading or writing) or truncates it. A write lease may be placed on a file only if no other process currently has the file open.
- F_UNLCK
- Remove our lease from the file.
A process may hold only one type of lease on a file. Leases may only be taken out on regular files. An unprivileged process may only take out a lease on a file whose UID matches the file system UID of the process.
- F_GETLEASE
- Indicates what type of lease we hold on the file referred to by fd by returning either F_RDLCK, F_WRLCK, or F_UNLCK, indicating, respectively, that the calling process holds a read, a write, or no lease on the file. (The third argument to fcntl() is omitted.)
When a process (the "lease breaker") performs an open() or truncate() that conflicts with a lease established via F_SETLEASE, the system call is blocked by the kernel, unless the O_NONBLOCK flag was specified to open(), in which case the system call will return with the error EWOULDBLOCK. The kernel notifies the lease holder by sending it a signal (SIGIO by default). The lease holder should respond to receipt of this signal by doing whatever cleanup is required in preparation for the file to be accessed by another process (e.g., flushing cached buffers) and then either remove or downgrade its lease. A lease is removed by performing an F_SETLEASE command specifying arg as F_UNLCK. If we currently hold a write lease on the file, and the lease breaker is opening the file for reading, then it is sufficient to downgrade the lease to a read lease. This is done by performing an F_SETLEASE command specifying arg as F_RDLCK.
If the lease holder fails to downgrade or remove the lease within the number of seconds specified in /proc/sys/fs/lease-break-time then the kernel forcibly removes or downgrades the lease holder's lease.
Once the lease has been voluntarily or forcibly removed or downgraded, and assuming the lease breaker has not unblocked its system call, the kernel permits the lease breaker's system call to proceed.
The default signal used to notify the lease holder is SIGIO, but this can be changed using the F_SETSIG command to fcntl(). If an F_SETSIG command is performed (even one specifying SIGIO), and the signal handler is established using SA_SIGINFO, then the handler will receive a siginfo_t structure as its second argument, and the si_fd field of this argument will hold the descriptor of the leased file that has been accessed by another process. (This is useful if the caller holds leases against multiple files.)
File and directory change notification
- F_NOTIFY
- (Linux 2.4 onwards) Provide notification when the directory referred to by fd or any of the files that it contains is changed. The events to be notified are specified in arg, which is a bit mask specified by ORing together zero or more of the following bits:
| Bit | Description (event in directory) |
| DN_ACCESS | A file was accessed (read, pread, readv) |
| DN_MODIFY | A file was modified (write, pwrite, writev, truncate, ftruncate) |
| DN_CREATE | A file was created (open, creat, mknod, mkdir, link, symlink, rename) |
| DN_DELETE | A file was unlinked (unlink, rename to another directory, rmdir) |
| DN_RENAME | A file was renamed within this directory (rename) |
| DN_ATTRIB | The attributes of a file were changed (chown, chmod, utime[s]) |
(In order to obtain these definitions, the _GNU_SOURCE macro must be defined before including <fcntl.h>.)
Directory notifications are normally "one-shot", and the application must re-register to receive further notifications. Alternatively, if DN_MULTISHOT is included in arg, then notification will remain in effect until explicitly removed.
A series of F_NOTIFY requests is cumulative, with the events in arg being added to the set already monitored. To disable notification of all events, make an F_NOTIFY call specifying arg as 0.
Notification occurs via delivery of a signal. The default signal is SIGIO, but this can be changed using the F_SETSIG command to fcntl(). In the latter case, the signal handler receives a siginfo_t structure as its second argument (if the handler was established using SA_SIGINFO) and the si_fd field of this structure contains the file descriptor which generated the notification (useful when establishing notification on multiple directories).
Especially when using DN_MULTISHOT, a POSIX.1b real time signal should be used for notification, so that multiple notifications can be queued.
RETURN VALUE
For a successful call, the return value depends on the operation:
- F_DUPFD
- The new descriptor.
- F_GETFD
- Value of flag.
- F_GETFL
- Value of flags.
- F_GETOWN
- Value of descriptor owner.
- F_GETSIG
- Value of signal sent when read or write becomes possible, or zero for traditional SIGIO behaviour.
- All other commands
- Zero.
On error, -1 is returned, and errno is set appropriately.
ERRORS
- EACCES or EAGAIN
- Operation is prohibited by locks held by other processes. Or, operation is prohibited because the file has been memory-mapped by another process.
- EBADF
- fd is not an open file descriptor, or the command was F_SETLK or F_SETLKW and the file descriptor open mode doesn't match the type of lock requested.
- EDEADLK
- It was detected that the specified F_SETLKW command would cause a deadlock.
- EFAULT
- lock is outside your accessible address space.
- EINTR
- For F_SETLKW, the command was interrupted by a signal. For F_GETLK and F_SETLK, the command was interrupted by a signal before the lock was checked or acquired. Most likely when locking a remote file (e.g. locking over NFS), but can sometimes happen locally.
- EINVAL
- For F_DUPFD, arg is negative or is greater than the maximum allowable value. For F_SETSIG, arg is not an allowable signal number.
- EMFILE
- For F_DUPFD, the process already has the maximum number of file descriptors open.
- ENOLCK
- Too many segment locks open, lock table is full, or a remote locking protocol failed (e.g. locking over NFS).
- EPERM
- Attempted to clear the O_APPEND flag on a file that has the append-only attribute set.
NOTES
The errors returned by
dup2 are different from those returned by
F_DUPFD.
Since kernel 2.0, there is no interaction between the types of lock placed by flock(2) and fcntl(2).
POSIX 1003.1-2001 allows l_len to be negative. (And if it is, the interval described by the lock covers bytes l_start+l_len up to and including l_start-1.) This is supported by Linux since Linux 2.4.21 and 2.5.49.
Several systems have more fields in struct flock such as e.g. l_sysid. Clearly, l_pid alone is not going to be very useful if the process holding the lock may live on a different machine.
CONFORMING TO
SVr4, SVID, POSIX, X/OPEN, BSD 4.3. Only the operations F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL, F_GETLK, F_SETLK and F_SETLKW are specified in POSIX.1. F_GETOWN and F_SETOWN are BSDisms not supported in SVr4; F_GETSIG and F_SETSIG are specific to Linux.
F_NOTIFY, F_GETLEASE, and F_SETLEASE are Linux specific. (Define the _GNU_SOURCE macro before including <fcntl.h> to obtain these definitions.)
The flags legal for F_GETFL/F_SETFL are those supported by open(2) and vary between these systems; O_APPEND, O_NONBLOCK, O_RDONLY, and O_RDWR are specified in POSIX.1. SVr4 supports several other options and flags not documented here.
SVr4 documents additional EIO, ENOLINK and EOVERFLOW error conditions.
SEE ALSO
dup2(2),
flock(2),
lockf(3),
open(2),
socket(2)
See also locks.txt, mandatory.txt, and dnotify.txt in /usr/src/linux/Documentation.
NAME
getxattr, lgetxattr, fgetxattr - retrieve an extended attribute value
SYNOPSIS
#include <sys/types.h> #include <attr/xattr.h> ssize_t getxattr (const char *path, const char *name, void *value, size_t size); ssize_t lgetxattr (const char *path, const char *name, void *value, size_t size); ssize_t fgetxattr (int filedes, const char *name, void *value, size_t size);
DESCRIPTION
Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).
getxattr retrieves the value of the extended attribute identified by name and associated with the given path in the filesystem. The length of the attribute value is returned.
lgetxattr is identical to getxattr, except in the case of a symbolic link, where the link itself is interrogated, not the file that it refers to.
fgetxattr is identical to getxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.
An extended attribute name is a simple null-terminated string. The name includes a namespace prefix; there may be several disjoint namespaces associated with an individual inode. The value of an extended attribute is a chunk of arbitrary textual or binary data of specified length.
An empty buffer of size zero can be passed into these calls to return the current size of the named extended attribute, which can be used to estimate the size of a buffer which is sufficiently large to hold the value associated with the extended attribute.
The interface is designed to allow guessing of initial buffer sizes, and to enlarge buffers when the return value indicates that the buffer provided was too small.
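The probe-then-read pattern described above can be sketched as follows. The helper name read_xattr is hypothetical. The sketch includes <sys/xattr.h>, which is where current versions of the GNU C library declare getxattr; that is an assumption about the build environment, since this page names <attr/xattr.h>. If the attribute grows between the size probe and the read, the second call fails with ERANGE and the loop tries again.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* assumption: glibc header; this page names <attr/xattr.h> */

/* Hypothetical helper: read attribute name of path into a freshly
 * allocated buffer. On success stores the value length in *len and
 * returns the buffer (the caller frees it). On failure returns NULL
 * with errno set. */
void *read_xattr(const char *path, const char *name, size_t *len)
{
    for (;;) {
        ssize_t need, got;
        void *buf;
        int saved;

        need = getxattr(path, name, NULL, 0);   /* size probe */
        if (need == -1)
            return NULL;

        buf = malloc(need > 0 ? (size_t)need : 1);
        if (buf == NULL)
            return NULL;

        got = getxattr(path, name, buf, (size_t)need);
        if (got >= 0) {
            *len = (size_t)got;
            return buf;
        }

        saved = errno;
        free(buf);
        if (saved != ERANGE) {       /* anything but "buffer too small" */
            errno = saved;
            return NULL;
        }
        /* The value grew after the probe: probe again and retry. */
    }
}
```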
RETURN VALUE
On success, a positive number is returned indicating the size of the extended attribute value. On failure, -1 is returned and
errno is set appropriately.
If the named attribute does not exist, or the process has no access to this attribute, errno is set to ENOATTR.
If the size of the value buffer is too small to hold the result, errno is set to ERANGE.
If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.
The errors documented for the stat(2) system call are also applicable here.
AUTHORS
Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.
SEE ALSO
getfattr(1),
setfattr(1),
open(2),
stat(2),
setxattr(2),
listxattr(2),
removexattr(2), and
attr(5).
NAME
flock - apply or remove an advisory lock on an open file
SYNOPSIS
#include <sys/file.h> int flock(int fd, int operation);
DESCRIPTION
Apply or remove an advisory lock on the open file specified by
fd. The parameter
operation is one of the following:
- LOCK_SH
- Place a shared lock. More than one process may hold a shared lock for a given file at a given time.
- LOCK_EX
- Place an exclusive lock. Only one process may hold an exclusive lock for a given file at a given time.
- LOCK_UN
- Remove an existing lock held by this process.
A call to flock() may block if an incompatible lock is held by another process. To make a non-blocking request, include LOCK_NB (by ORing) with any of the above operations.
A single file may not simultaneously have both shared and exclusive locks.
Locks created by flock() are associated with a file, or, more precisely, an open file table entry. This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these descriptors. Furthermore, the lock is released either by an explicit LOCK_UN operation on any of these duplicate descriptors, or when all such descriptors have been closed.
A process may only hold one type of lock (shared or exclusive) on a file. Subsequent flock() calls on an already locked file will convert an existing lock to the new lock mode.
Locks created by flock() are preserved across an execve(2).
A shared or exclusive lock can be placed on a file regardless of the mode in which the file was opened.
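A common use of flock() is a lock file that lets only one instance of a program run at a time. The sketch below uses LOCK_EX together with LOCK_NB so that the call fails instead of blocking; the helper name open_locked is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open path (creating it if needed) and take an
 * exclusive lock on it without blocking. Returns the locked descriptor,
 * or -1 with errno set (EWOULDBLOCK if the file is already locked
 * through another open file table entry). */
int open_locked(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Because the lock belongs to the open file table entry, a second open() of the same file, even by the same process, yields a descriptor whose lock attempt is refused until the first lock is released with LOCK_UN or by closing the descriptor.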
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EWOULDBLOCK
- The file is locked and the LOCK_NB flag was selected.
- EBADF
- fd is not an open file descriptor.
- EINTR
- While waiting to acquire a lock, the call was interrupted by delivery of a signal caught by a handler.
- EINVAL
- operation is invalid.
- ENOLCK
- The kernel ran out of memory for allocating lock records.
CONFORMING TO
4.4BSD (the
flock(2) call first appeared in 4.2BSD). A version of
flock(2), possibly implemented in terms of
fcntl(2), appears on most Unices.
NOTES
flock(2) does not lock files over NFS. Use
fcntl(2) instead: that does work over NFS, given a sufficiently recent version of Linux and a server which supports locking.
Since kernel 2.0, flock(2) is implemented as a system call in its own right rather than being emulated in the GNU C library as a call to fcntl(2). This yields true BSD semantics: there is no interaction between the types of lock placed by flock(2) and fcntl(2), and flock(2) does not detect deadlock.
flock(2) places advisory locks only; given suitable permissions on a file, a process is free to ignore the use of flock(2) and perform I/O on the file.
flock(2) and fcntl(2) locks have different semantics with respect to forked processes and dup(2).
SEE ALSO
open(2),
close(2),
dup(2),
execve(2),
fcntl(2),
fork(2),
lockf(3)
There are also locks.txt and mandatory.txt in /usr/src/linux/Documentation.
NAME
alloc_hugepages, free_hugepages - allocate or free huge pages
SYNOPSIS
void *alloc_hugepages(int key, void *addr, size_t len, int prot, int flag); int free_hugepages(void *addr);
DESCRIPTION
The system calls
alloc_hugepages and
free_hugepages were introduced in Linux 2.5.36 and removed again in 2.5.54. They existed only on i386 and ia64 (when built with CONFIG_HUGETLB_PAGE). In Linux 2.4.20 the syscall numbers exist, but the calls return ENOSYS.
On i386 the memory management hardware knows about ordinary pages (4 KiB) and huge pages (2 or 4 MiB). Similarly ia64 knows about huge pages of several sizes. These system calls serve to map huge pages into the process's memory or to free them again. Huge pages are locked into memory, and are not swapped.
The key parameter is an identifier. When zero the pages are private, and not inherited by children. When positive the pages are shared with other applications using the same key, and inherited by child processes.
The addr parameter of free_hugepages() tells which page is being freed - it was the return value of a call to alloc_hugepages(). (The memory is first actually freed when all users have released it.) The addr parameter of alloc_hugepages() is a hint, that the kernel may or may not follow. Addresses must be properly aligned.
The len parameter is the length of the required segment. It must be a multiple of the huge page size.
The prot parameter specifies the memory protection of the segment. It is one of PROT_READ, PROT_WRITE, PROT_EXEC.
The flag parameter is ignored, unless key is positive. In that case, if flag is IPC_CREAT, then a new huge page segment is created when none with the given key existed. If this flag is not set, then ENOENT is returned when no segment with the given key exists.
RETURN VALUE
On success, alloc_hugepages returns the allocated virtual address, and free_hugepages returns zero. On error, -1 is returned, and errno is set appropriately.
ERRORS
- ENOSYS
- The system call is not supported on this kernel.
CONFORMING TO
These calls existed only in Linux 2.5.36 - 2.5.54. These calls are specific to Linux on Intel processors, and should not be used in programs intended to be portable. Indeed, the system call numbers are marked for reuse, so programs using these may do something random on a future kernel.
FILES
/proc/sys/vm/nr_hugepages Number of configured hugetlb pages. This can be read and written.
/proc/meminfo Gives info on the number of configured hugetlb pages and on their size in the three variables HugePages_Total, HugePages_Free, Hugepagesize.
NOTES
The system calls are gone. Now the hugetlbfs filesystem can be used instead. Memory backed by huge pages (if the CPU supports them) is obtained by using mmap(2) to map files in this virtual filesystem.
The maximal number of huge pages can be specified using the hugepages= boot parameter.
NAME
setxattr, lsetxattr, fsetxattr - set an extended attribute value
SYNOPSIS
#include <sys/types.h>
#include <attr/xattr.h>
int setxattr (const char *path, const char *name, const void *value, size_t size, int flags);
int lsetxattr (const char *path, const char *name, const void *value, size_t size, int flags);
int fsetxattr (int filedes, const char *name, const void *value, size_t size, int flags);
DESCRIPTION
Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).
setxattr sets the value of the extended attribute identified by name and associated with the given path in the filesystem. The size of the value must be specified.
lsetxattr is identical to setxattr, except in the case of a symbolic link, where the extended attribute is set on the link itself, not the file that it refers to.
fsetxattr is identical to setxattr, only the extended attribute is set on the open file pointed to by filedes (as returned by open(2)) in place of path.
An extended attribute name is a simple null-terminated string. The name includes a namespace prefix; there may be several disjoint namespaces associated with an individual inode. The value of an extended attribute is a chunk of arbitrary textual or binary data of specified length.
The flags parameter can be used to refine the semantics of the operation. XATTR_CREATE specifies a pure create, which fails if the named attribute exists already. XATTR_REPLACE specifies a pure replace operation, which fails if the named attribute does not already exist. By default (no flags), the extended attribute will be created if need be, or will simply replace the value if the attribute exists.
RETURN VALUE
On success, zero is returned. On failure, -1 is returned and
errno is set appropriately.
If XATTR_CREATE is specified, and the attribute exists already, errno is set to EEXIST. If XATTR_REPLACE is specified, and the attribute does not exist, errno is set to ENOATTR.
If there is insufficient space remaining to store the extended attribute, errno is set to either ENOSPC, or EDQUOT if quota enforcement was the cause.
If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.
The errors documented for the stat(2) system call are also applicable here.
AUTHORS
Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.
SEE ALSO
getfattr(1),
setfattr(1),
open(2),
stat(2),
getxattr(2),
listxattr(2),
removexattr(2), and
attr(5).
NAME
statfs, fstatfs - get file system statistics
SYNOPSIS
#include <sys/vfs.h> /* or <sys/statfs.h> */
int statfs(const char *path, struct statfs *buf);
int fstatfs(int fd, struct statfs *buf);
DESCRIPTION
The function statfs returns information about a mounted file system. path is the path name of any file within the mounted filesystem. buf is a pointer to a statfs structure defined approximately as follows:

    struct statfs {
        long    f_type;     /* type of filesystem (see below) */
        long    f_bsize;    /* optimal transfer block size */
        long    f_blocks;   /* total data blocks in file system */
        long    f_bfree;    /* free blocks in fs */
        long    f_bavail;   /* free blocks avail to non-superuser */
        long    f_files;    /* total file nodes in file system */
        long    f_ffree;    /* free file nodes in fs */
        fsid_t  f_fsid;     /* file system id */
        long    f_namelen;  /* maximum length of filenames */
    };

File system types:

    ADFS_SUPER_MAGIC      0xadf5
    AFFS_SUPER_MAGIC      0xADFF
    BEFS_SUPER_MAGIC      0x42465331
    BFS_MAGIC             0x1BADFACE
    CIFS_MAGIC_NUMBER     0xFF534D42
    CODA_SUPER_MAGIC      0x73757245
    COH_SUPER_MAGIC       0x012FF7B7
    CRAMFS_MAGIC          0x28cd3d45
    DEVFS_SUPER_MAGIC     0x1373
    EFS_SUPER_MAGIC       0x00414A53
    EXT_SUPER_MAGIC       0x137D
    EXT2_OLD_SUPER_MAGIC  0xEF51
    EXT2_SUPER_MAGIC      0xEF53
    EXT3_SUPER_MAGIC      0xEF53
    HFS_SUPER_MAGIC       0x4244
    HPFS_SUPER_MAGIC      0xF995E849
    HUGETLBFS_MAGIC       0x958458f6
    ISOFS_SUPER_MAGIC     0x9660
    JFFS2_SUPER_MAGIC     0x72b6
    JFS_SUPER_MAGIC       0x3153464a
    MINIX_SUPER_MAGIC     0x137F      /* orig. minix */
    MINIX_SUPER_MAGIC2    0x138F      /* 30 char minix */
    MINIX2_SUPER_MAGIC    0x2468      /* minix V2 */
    MINIX2_SUPER_MAGIC2   0x2478      /* minix V2, 30 char names */
    MSDOS_SUPER_MAGIC     0x4d44
    NCP_SUPER_MAGIC       0x564c
    NFS_SUPER_MAGIC       0x6969
    NTFS_SB_MAGIC         0x5346544e
    OPENPROM_SUPER_MAGIC  0x9fa1
    PROC_SUPER_MAGIC      0x9fa0
    QNX4_SUPER_MAGIC      0x002f
    REISERFS_SUPER_MAGIC  0x52654973
    ROMFS_MAGIC           0x7275
    SMB_SUPER_MAGIC       0x517B
    SYSV2_SUPER_MAGIC     0x012FF7B6
    SYSV4_SUPER_MAGIC     0x012FF7B5
    TMPFS_MAGIC           0x01021994
    UDF_SUPER_MAGIC       0x15013346
    UFS_MAGIC             0x00011954
    USBDEVICE_SUPER_MAGIC 0x9fa2
    VXFS_SUPER_MAGIC      0xa501FCF5
    XENIX_SUPER_MAGIC     0x012FF7B4
    XFS_SUPER_MAGIC       0x58465342
    _XIAFS_SUPER_MAGIC    0x012FD16D

Nobody knows what f_fsid is supposed to contain (but see below).
Fields that are undefined for a particular file system are set to 0. fstatfs returns the same information about an open file referenced by descriptor fd.
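As a minimal sketch of how the fields above might be combined, the following helper estimates the space available to unprivileged users. The name free_bytes_for() is hypothetical, and the code assumes that f_bavail is counted in units of f_bsize, which is typical on Linux but not promised by the structure comments above.

```c
#include <stdio.h>
#include <sys/vfs.h>

/* Hypothetical helper: returns the number of bytes available to
 * unprivileged users on the file system containing `path`,
 * or -1 if statfs() fails. */
long long free_bytes_for(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) == -1) {
        perror("statfs");
        return -1;
    }
    /* Assumption: f_bavail is expressed in blocks of f_bsize bytes. */
    return (long long)sfs.f_bavail * (long long)sfs.f_bsize;
}
```

A caller would pass any path inside the file system of interest, for example free_bytes_for("/"), and should treat a negative result as an error.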
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EBADF
- (fstatfs) fd is not a valid open file descriptor.
- EACCES
- (statfs) Search permission is denied for a component of the path prefix of path.
- ELOOP
- (statfs) Too many symbolic links were encountered in translating path.
- ENAMETOOLONG
- (statfs) path is too long.
- ENOENT
- (statfs) The file referred to by path does not exist.
- ENOTDIR
- (statfs) A component of the path prefix of path is not a directory.
- EFAULT
- buf or path points to an invalid address.
- EINTR
- This call was interrupted by a signal.
- EIO
- An I/O error occurred while reading from the file system.
- ENOMEM
- Insufficient kernel memory was available.
- ENOSYS
- The file system does not support this call.
- EOVERFLOW
- Some values were too large to be represented in the returned struct.
CONFORMING TO
The Linux
statfs was inspired by the 4.4BSD one (but they do not use the same structure).
NOTES ON f_fsid
Solaris, Irix and POSIX have a system call statvfs(2) that returns a struct statvfs (defined in <sys/statvfs.h>) containing an unsigned long f_fsid. Linux, SunOS, HPUX, 4.4BSD have a system call statfs that returns a struct statfs (defined in <sys/vfs.h>) containing a fsid_t f_fsid, where fsid_t is defined as struct { int val[2]; }. The same holds for FreeBSD, except that it uses the include file <sys/mount.h>.
The general idea is that f_fsid contains some random stuff such that the pair (f_fsid,ino) uniquely determines a file. Some OSes use (a variation on) the device number, or the device number combined with the filesystem type. Several OSes restrict giving out the f_fsid field to the superuser only (and zero it for nonprivileged users), because this field is used in the filehandle of the filesystem when NFS-exported, and giving it out is a security concern.
Under some OSes the fsid can be used as second parameter to the sysfs() system call.
NOTES
The kernel has system calls statfs, fstatfs, statfs64, fstatfs64 to support this library call.
Some systems only have <sys/vfs.h>, other systems also have <sys/statfs.h>, where the former includes the latter. So it seems including the former is the best choice.
LSB has deprecated the library calls [f]statfs() and tells us to use [f]statvfs() instead.
SEE ALSO
stat(2),
statvfs(2)
NAME
fsync, fdatasync - synchronize a file's complete in-core state with that on disk
SYNOPSIS
#include <unistd.h>
int fsync(int fd);
int fdatasync(int fd);
DESCRIPTION
fsync copies all in-core parts of a file to disk, and waits until the device reports that all parts are on stable storage. It also updates metadata stat information. It does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit
fsync on the file descriptor of the directory is also needed.
fdatasync does the same as fsync but only flushes user data, not metadata such as the mtime or atime.
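The point about the directory entry can be illustrated with a short sketch: the file is written and synced, then the containing directory is opened and synced as well. The helper name durable_write() and its arguments are hypothetical, and whether the data truly reaches the platters still depends on the device.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes `text` to dir/name and returns 0 only if the
 * write, the fsync of the file, and the fsync of the directory all succeed. */
int durable_write(const char *dir, const char *name, const char *text)
{
    char path[4096];
    size_t len = strlen(text);
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    int dirfd = open(dir, O_RDONLY);        /* descriptor for the directory */
    if (dirfd == -1) {
        close(fd);
        return -1;
    }

    int ok = write(fd, text, len) == (ssize_t)len  /* short write counts as failure */
          && fsync(fd) == 0                        /* file data and metadata */
          && fsync(dirfd) == 0;                    /* the directory entry */

    close(fd);
    close(dirfd);
    return ok ? 0 : -1;
}
```

Replacing the first fsync() with fdatasync() would skip the timestamp update while still flushing the data.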
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EBADF
- fd is not a valid file descriptor open for writing.
- EROFS, EINVAL
- fd is bound to a special file which does not support synchronization.
- EIO
- An error occurred during synchronization.
NOTES
In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.
When an ext2 file system is mounted with the sync option, directory entries are also implicitly synced by fsync.
On kernels before 2.4, fsync on big files can be inefficient. An alternative might be to use the O_SYNC flag to open(2).
CONFORMING TO
POSIX.1b (formerly POSIX.4)
SEE ALSO
bdflush(2),
open(2),
sync(2),
mount(8),
update(8),
sync(8)
NAME
futex - Fast Userspace Locking system call
SYNOPSIS
#include <linux/futex.h>
#include <sys/time.h>
int sys_futex (void *futex, int op, int val, const struct timespec *timeout);
DESCRIPTION
The sys_futex system call provides a method for a program to wait for a value at a given address to change, and a method to wake up anyone waiting on a particular address (while the addresses for the same memory in separate processes may not be equal, the kernel maps them internally so the same memory mapped in different locations will correspond for sys_futex calls). It is typically used to implement the contended case of a lock in shared memory, as described in futex(4).
When a futex(4) operation did not finish uncontended in userspace, a call needs to be made to the kernel to arbitrate. Arbitration can either mean putting the calling process to sleep or, conversely, waking a waiting process.
Callers of this function are expected to adhere to the semantics as set out in futex(4). As these semantics involve writing non-portable assembly instructions, this in turn probably means that most users will in fact be library authors and not general application developers.
The futex argument needs to point to an aligned integer which stores the counter. The operation to execute is passed via the op parameter, along with a value val.
Three operations are currently defined:
- FUTEX_WAIT
- This operation atomically verifies that the futex address still contains the value given, and sleeps awaiting FUTEX_WAKE on this futex address. If the timeout argument is non-NULL, its contents describe the maximum duration of the wait, which is infinite otherwise. For futex(4), this call is executed if decrementing the count gave a negative value (indicating contention), and will sleep until another process releases the futex and executes the FUTEX_WAKE operation.
- FUTEX_WAKE
- This operation wakes at most val processes waiting on this futex address (i.e., inside FUTEX_WAIT). For futex(4), this is executed if incrementing the count showed that there were waiters, once the futex value has been set to 1 (indicating that it is available).
- FUTEX_FD
- To support asynchronous wakeups, this operation associates a file descriptor with a futex. If another process executes a FUTEX_WAKE, the process will receive the signal number that was passed in val. The calling process must close the returned file descriptor after use.
To prevent race conditions, the caller should test if the futex has been upped after FUTEX_FD returns.
RETURN VALUE
Depending on which operation was executed, the returned value can have differing meanings.
- FUTEX_WAIT
- Returns 0 if the process was woken by a FUTEX_WAKE call. In case of timeout, ETIMEDOUT is returned. If the futex was not equal to the expected value, the operation returns EWOULDBLOCK. Signals (or other spurious wakeups) cause FUTEX_WAIT to return EINTR.
- FUTEX_WAKE
- Returns the number of processes woken up.
- FUTEX_FD
- Returns the new file descriptor associated with the futex.
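The two basic operations can be tried out with a small sketch. It assumes that the C library offers no futex wrapper, so the call goes through syscall(2) with SYS_futex. Note that when invoked this way, failures are reported as -1 with the code in errno rather than as a returned error number. The helper names are hypothetical.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (no timeout, no second address). */
static long futex_call(int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Sleeps only if *uaddr still holds `expected`.
 * Returns 0 after a wake-up, or -1 with errno set (EWOULDBLOCK when the
 * value had already changed). */
long futex_wait_if_equal(int *uaddr, int expected)
{
    return futex_call(uaddr, FUTEX_WAIT, expected);
}

/* Wakes at most `count` waiters and returns how many were woken. */
long futex_wake_some(int *uaddr, int count)
{
    return futex_call(uaddr, FUTEX_WAKE, count);
}
```

Calling futex_wait_if_equal() with a value that does not match the word returns immediately, which is the behaviour that makes the check-then-sleep sequence free of lost wake-ups.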
ERRORS
- EFAULT
- Error in getting timeout information from userspace.
- EINVAL
- An operation was not defined or error in page alignment.
NOTES
To reiterate, bare futexes are not intended as an easy to use abstraction for end-users. Implementors are expected to be assembly literate and to have read the sources of the futex userspace library referenced below.
AUTHORS
Futexes were designed and worked on by Hubertus Franke (IBM Thomas J. Watson Research Center), Matthew Kirkwood, Ingo Molnar (Red Hat) and Rusty Russell (IBM Linux Technology Center). This page written by bert hubert.
VERSIONS
Initial futex support was merged in Linux 2.5.7 but with different semantics from those described above. Current semantics are available from Linux 2.5.40 onwards.
SEE ALSO
futex(4), `Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux` (proceedings of the Ottawa Linux Symposium 2002), futex example library, futex-*.tar.bz2 <URL:ftp://ftp.nl.kernel.org:/pub/linux/kernel/people/rusty/>.
NAME
getdents - get directory entries
SYNOPSIS
#include <unistd.h>
#include <linux/types.h>
#include <linux/dirent.h>
#include <linux/unistd.h>
_syscall3(int, getdents, uint, fd, struct dirent *, dirp, uint, count);
int getdents(unsigned int fd, struct dirent *dirp, unsigned int count);
DESCRIPTION
This is not the function you are interested in. Look at
readdir(3) for the POSIX conforming C library interface. This page documents the bare kernel system call interface.
The system call getdents reads several dirent structures from the directory pointed at by fd into the memory area pointed to by dirp. The parameter count is the size of the memory area.
The dirent structure is declared as follows:

    struct dirent {
        long            d_ino;                 /* inode number */
        off_t           d_off;                 /* offset to next dirent */
        unsigned short  d_reclen;              /* length of this dirent */
        char            d_name [NAME_MAX+1];   /* file name (null-terminated) */
    };

d_ino is an inode number. d_off is the distance from the start of the directory to the start of the next dirent. d_reclen is the size of this entire dirent. d_name is a null-terminated file name.
This call supersedes readdir(2).
RETURN VALUE
On success, the number of bytes read is returned. On end of directory, 0 is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EBADF
- Invalid file descriptor fd.
- EFAULT
- Argument points outside the calling process's address space.
- EINVAL
- Result buffer is too small.
- ENOENT
- No such directory.
- ENOTDIR
- File descriptor does not refer to a directory.
CONFORMING TO
SVr4, SVID. SVr4 documents additional ENOLINK, EIO error conditions.
SEE ALSO
readdir(2),
readdir(3)
NAME
getdtablesize - get descriptor table size
SYNOPSIS
#include <unistd.h>
int getdtablesize(void);
DESCRIPTION
getdtablesize returns the maximum number of files a process can have open, one more than the largest possible value for a file descriptor.
RETURN VALUE
The current limit on the number of open files per process.
NOTES
getdtablesize is implemented as a libc library function. The glibc version calls
getrlimit(2) and returns the current
RLIMIT_NOFILE limit, or
OPEN_MAX when that fails. The libc4 and libc5 versions return
OPEN_MAX (set to 256 since Linux 0.98.4).
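The relationship described for the glibc version can be checked with a brief sketch that compares the two values. The helper name is hypothetical, and the comparison assumes the soft limit fits in an int.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if getdtablesize() equals the soft
 * RLIMIT_NOFILE limit, 0 if the values differ, -1 if getrlimit() fails. */
int dtablesize_matches_rlimit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    return (rlim_t)getdtablesize() == rl.rlim_cur;
}
```

New code may prefer to call getrlimit(2) directly, since it also exposes the hard limit.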
CONFORMING TO
SVr4, 4.4BSD (the
getdtablesize function first appeared in BSD 4.2).
SEE ALSO
close(2),
dup(2),
getrlimit(2),
open(2)
NAME
getuid, geteuid - get user identity
SYNOPSIS
#include <unistd.h>
#include <sys/types.h>
uid_t getuid(void);
uid_t geteuid(void);
DESCRIPTION
getuid returns the real user ID of the current process.
geteuid returns the effective user ID of the current process.
The real user ID identifies the user who owns the calling process. The effective user ID is the one used for permission checks; it differs from the real ID when the file being executed has its set-user-ID bit set.
ERRORS
These functions are always successful.
CONFORMING TO
POSIX, BSD 4.3.
SEE ALSO
setreuid(2),
setuid(2)
NAME
getgroups, setgroups - get/set list of supplementary group IDs
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
int getgroups(int size, gid_t list[]);
#include <grp.h>
int setgroups(size_t size, const gid_t *list);
DESCRIPTION
- getgroups
- Up to size supplementary group IDs are returned in list. It is unspecified whether the effective group ID of the calling process is included in the returned list. (Thus, an application should also call getegid(2) and add or remove the resulting value.) If size is zero, list is not modified, but the total number of supplementary group IDs for the process is returned.
- setgroups
- Sets the supplementary group IDs for the process. Only the super-user may use this function.
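The size-zero form of getgroups is typically used to size a buffer before the real call. A minimal sketch of that two-step pattern follows; fetch_groups() is a hypothetical helper name.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores a malloc()ed array of supplementary group IDs
 * in *out and returns its length, or -1 on error. The caller frees *out. */
int fetch_groups(gid_t **out)
{
    int n = getgroups(0, NULL);        /* size 0: only report the count */
    if (n == -1)
        return -1;

    /* Allocate at least one slot so that malloc(0) is never relied on. */
    gid_t *list = malloc((size_t)(n > 0 ? n : 1) * sizeof *list);
    if (list == NULL)
        return -1;

    n = getgroups(n, list);            /* fails with EINVAL if the set grew */
    if (n == -1) {
        free(list);
        return -1;
    }
    *out = list;
    return n;
}
```

Because it is unspecified whether the effective group ID appears in the list, a caller that needs it should compare against getegid(2) separately.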
RETURN VALUE
- getgroups
- On success, the number of supplementary group IDs is returned. On error, -1 is returned, and errno is set appropriately.
- setgroups
- On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EFAULT
- list has an invalid address.
- EPERM
- For setgroups, the user is not the super-user.
- EINVAL
- For setgroups, size is greater than NGROUPS (32 for Linux 2.0.32). For getgroups, size is less than the number of supplementary group IDs, but is not zero.
NOTES
A process can have up to at least NGROUPS_MAX supplementary group IDs in addition to the effective group ID. The set of supplementary group IDs is inherited from the parent process and may be changed using
setgroups. The maximum number of supplementary group IDs can be found using
sysconf(3):
long ngroups_max; ngroups_max = sysconf(_SC_NGROUPS_MAX);
The maximal return value of
getgroups cannot be larger than one more than the value obtained this way.
The prototype for setgroups is only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).
CONFORMING TO
SVr4, SVID (issue 4 only; these calls were not present in SVr3), X/OPEN, 4.3BSD. The
getgroups function is in POSIX.1. Since
setgroups requires privilege, it is not covered by POSIX.1.
SEE ALSO
initgroups(3),
getgid(2),
setgid(2)
NAME
gethostname, sethostname - get/set host name
SYNOPSIS
#include <unistd.h>
int gethostname(char *name, size_t len);
int sethostname(const char *name, size_t len);
DESCRIPTION
These functions are used to access or to change the host name of the current processor. The
gethostname() function returns a NUL-terminated hostname (set earlier by
sethostname()) in the array
name that has a length of
len bytes. In case the NUL-terminated hostname does not fit, no error is returned, but the hostname is truncated. It is unspecified whether the truncated hostname will be NUL-terminated.
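Since a truncated name may come back without a terminator, callers often force one themselves. The following is a small sketch of that defensive pattern; hostname_terminated() is a hypothetical name.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copies the host name into buf (size bytes) and
 * guarantees NUL termination. Returns 0 on success, -1 on error. */
int hostname_terminated(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    if (gethostname(buf, size) == -1)
        return -1;
    buf[size - 1] = '\0';   /* a truncated name may lack its terminator */
    return 0;
}
```

A buffer of HOST_NAME_MAX + 1 bytes (or 256 bytes where that constant is unavailable) avoids truncation on conforming systems.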
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EINVAL
- len is negative or, for sethostname, len is larger than the maximum allowed size, or, for gethostname on Linux/i386, len is smaller than the actual size. (In this last case glibc 2.1 uses ENAMETOOLONG.)
- EPERM
- For sethostname, the caller was not the superuser.
- EFAULT
- name is an invalid address.
CONFORMING TO
SVr4, 4.4BSD (this function first appeared in 4.2BSD). POSIX 1003.1-2001 specifies
gethostname but not
sethostname.
BUGS
For many Linux kernel / libc combinations
gethostname will return an error instead of returning a truncated hostname.
NOTES
SUSv2 guarantees that `Host names are limited to 255 bytes`. POSIX 1003.1-2001 guarantees that `Host names (not including the terminating NUL) are limited to HOST_NAME_MAX bytes`.
SEE ALSO
getdomainname(2),
setdomainname(2),
uname(2)
NAME
getpagesize - get memory page size
SYNOPSIS
#include <unistd.h>
int getpagesize(void);
DESCRIPTION
The function getpagesize() returns the number of bytes in a page, where a "page" is the unit in which mmap(2) maps files.
The size of the kind of pages that mmap uses is found using

    #include <unistd.h>
    long sz = sysconf(_SC_PAGESIZE);

(where some systems also allow the synonym _SC_PAGE_SIZE for _SC_PAGESIZE), or

    #include <unistd.h>
    int sz = getpagesize();

HISTORY
This call first appeared in 4.2BSD.
CONFORMING TO
SVr4, 4.4BSD, SUSv2. In SUSv2 the
getpagesize() call is labeled "legacy", and in POSIX 1003.1-2001 it has been dropped. HPUX does not have this call.
NOTES
Whether
getpagesize() is present as a Linux system call depends on the architecture. If it is, it returns the kernel symbol PAGE_SIZE, which is architecture and machine model dependent. Generally, one uses binaries that are architecture but not machine model dependent, in order to have a single binary distribution per architecture. This means that a user program should not find PAGE_SIZE at compile time from a header file, but use an actual system call, at least for those architectures (like sun4) where this dependency exists. Here libc4, libc5, glibc 2.0 fail because their
getpagesize() returns a statically derived value, and does not use a system call. Things are OK in glibc 2.1.
SEE ALSO
mmap(2),
sysconf(3)
NAME
setpgid, getpgid, setpgrp, getpgrp - set/get process group
SYNOPSIS
#include <unistd.h>
int setpgid(pid_t pid, pid_t pgid);
pid_t getpgid(pid_t pid);
int setpgrp(void);
pid_t getpgrp(void);
DESCRIPTION
setpgid sets the process group ID of the process specified by
pid to
pgid. If
pid is zero, the process ID of the current process is used. If
pgid is zero, the process ID of the process specified by
pid is used. If
setpgid is used to move a process from one process group to another (as is done by some shells when creating pipelines), both process groups must be part of the same session. In this case, the
pgid specifies an existing process group to be joined and the session ID of that group must match the session ID of the joining process.
getpgid returns the process group ID of the process specified by pid. If pid is zero, the process ID of the current process is used.
The call setpgrp() is equivalent to setpgid(0,0).
Similarly, getpgrp() is equivalent to getpgid(0). Each process group is a member of a session and each process is a member of the session of which its process group is a member.
Process groups are used for distribution of signals, and by terminals to arbitrate requests for their input: Processes that have the same process group as the terminal are foreground and may read, while others will block with a signal if they attempt to read. These calls are thus used by programs such as csh(1) to create process groups in implementing job control. The TIOCGPGRP and TIOCSPGRP calls described in termios(3) are used to get/set the process group of the control terminal.
If a session has a controlling terminal, CLOCAL is not set and a hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, the SIGHUP signal will be sent to each process in the foreground process group of the controlling terminal.
If the exit of the process causes a process group to become orphaned, and if any member of the newly-orphaned process group is stopped, then a SIGHUP signal followed by a SIGCONT signal will be sent to each process in the newly-orphaned process group.
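The way a shell places a new job in its own process group can be sketched as below. Both the parent and the child call setpgid, so the group exists regardless of which process runs first; this double call is a common idiom rather than something this page requires. The helper name spawn_in_own_group() is hypothetical, and the child is killed before the function returns.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child, moves it into a new process group
 * whose ID equals the child's PID, stores that PID in *child_out, and
 * returns the group ID observed by the parent (-1 on error). */
pid_t spawn_in_own_group(pid_t *child_out)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        setpgid(0, 0);      /* child side: become leader of a new group */
        pause();            /* idle until the parent terminates us */
        _exit(0);
    }

    /* Parent side: the same request, harmless if the child got there first. */
    pid_t pgid = -1;
    if (setpgid(child, child) == 0)
        pgid = getpgid(child);

    kill(child, SIGKILL);
    waitpid(child, NULL, 0);

    *child_out = child;
    return pgid;
}
```

The returned group ID equals the child's PID, which corresponds to the pgid-is-zero case described above.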
RETURN VALUE
On success,
setpgid and
setpgrp return zero. On error, -1 is returned, and
errno is set appropriately.
getpgid returns a process group on success. On error, -1 is returned, and errno is set appropriately.
getpgrp always returns the current process group.
ERRORS
- EINVAL
- pgid is less than 0 (setpgid, setpgrp).
- EACCES
- An attempt was made to change the process group ID of one of the children of the calling process and the child had already performed an execve (setpgid, setpgrp).
- EPERM
- An attempt was made to move a process into a process group in a different session, or to change the process group ID of one of the children of the calling process and the child was in a different session, or to change the process group ID of a session leader (setpgid, setpgrp).
- ESRCH
- For getpgid: pid does not match any process. For setpgid: pid is not the current process and not a child of the current process.
CONFORMING TO
The functions
setpgid and
getpgrp conform to POSIX.1. The function
setpgrp is from BSD 4.2. The function
getpgid conforms to SVr4.
NOTES
POSIX took
setpgid from the BSD function
setpgrp. Also SysV has a function with the same name, but it is identical to
setsid(2).
To get the prototypes under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.
SEE ALSO
getuid(2),
setsid(2),
tcgetpgrp(3),
tcsetpgrp(3),
termios(3)
NAME
getpid, getppid - get process identification
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
pid_t getpid(void);
pid_t getppid(void);
DESCRIPTION
getpid returns the process ID of the current process. (This is often used by routines that generate unique temporary file names.)
getppid returns the process ID of the parent of the current process.
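The relationship between the two calls can be demonstrated with a short sketch in which a forked child reports its getppid() value back through a pipe. The helper name child_sees_parent() is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if the child's getppid() equals the
 * parent's getpid(), 0 if not, and -1 on error. */
int child_sees_parent(void)
{
    int fds[2];
    pid_t reported = -1;

    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        pid_t parent = getppid();
        close(fds[0]);
        if (write(fds[1], &parent, sizeof parent) != (ssize_t)sizeof parent)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    ssize_t got = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    waitpid(child, NULL, 0);

    if (got != (ssize_t)sizeof reported)
        return -1;
    return reported == getpid();
}
```

The check holds here because the parent is still alive, blocked in read(), when the child asks for its parent's ID.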
CONFORMING TO
POSIX, BSD 4.3, SVID
SEE ALSO
exec(3),
fork(2),
kill(2),
mkstemp(3),
tmpnam(3),
tempnam(3),
tmpfile(3)
NAME
getresuid, getresgid - get real, effective and saved user or group ID
SYNOPSIS
#define _GNU_SOURCE
#include <unistd.h>
int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid);
int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid);
DESCRIPTION
getresuid and
getresgid (both introduced in Linux 2.1.44) get the real, effective and saved user IDs (respectively, group IDs) of the current process.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EFAULT
- One of the arguments specified an address outside the calling program's address space.
CONFORMING TO
This call is Linux-specific. The prototype is given by glibc since version 2.3.2 provided _GNU_SOURCE is defined.
SEE ALSO
getuid(2),
setuid(2),
setreuid(2),
setresuid(2)
NAME
getrlimit, getrusage, setrlimit - get/set resource limits and usage
SYNOPSIS
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
int getrlimit(int resource, struct rlimit *rlim);
int getrusage(int who, struct rusage *usage);
int setrlimit(int resource, const struct rlimit *rlim);
DESCRIPTION
getrlimit and
setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the
rlimit structure (the
rlim argument to both
getrlimit() and
setrlimit()):

    struct rlimit {
        rlim_t rlim_cur;   /* Soft limit */
        rlim_t rlim_max;   /* Hard limit (ceiling for rlim_cur) */
    };

The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. A privileged process may make arbitrary changes to either limit value.
The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()).
resource must be one of:
- RLIMIT_AS
- The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process when no alternate stack has been made available). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
- RLIMIT_CORE
- Maximum size of core file. When 0 no core dump files are created. When nonzero, larger dumps are truncated to this size.
- RLIMIT_CPU
- CPU time limit in seconds. When the process reaches the soft limit, it is sent a SIGXCPU signal. The default action for this signal is to terminate the process. However, the signal can be caught, and the handler can return control to the main program. If the process continues to consume CPU time, it will be sent SIGXCPU once per second until the hard limit is reached, at which time it is sent SIGKILL. (This latter point describes Linux 2.2 and 2.4 behaviour. Implementations vary in how they treat processes which continue to consume CPU time after reaching the soft limit. Portable applications that need to catch this signal should perform an orderly termination upon first receipt of SIGXCPU.)
- RLIMIT_DATA
- The maximum size of the process's data segment (initialized data, uninitialized data, and heap). This limit affects calls to brk() and sbrk(), which fail with the error ENOMEM upon encountering the soft limit of this resource.
- RLIMIT_FSIZE
- The maximum size of files that the process may create. Attempts to extend a file beyond this limit result in delivery of a SIGXFSZ signal. By default, this signal terminates a process, but a process can catch this signal instead, in which case the relevant system call (e.g., write(), truncate()) fails with the error EFBIG.
- RLIMIT_LOCKS
- A limit on the combined number of flock() locks and fcntl() leases that this process may establish. (Early Linux 2.4 only.)
- RLIMIT_MEMLOCK
- The maximum number of bytes of virtual memory that may be locked into RAM using mlock() and mlockall().
- RLIMIT_NOFILE
- Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE.
- RLIMIT_NPROC
- The maximum number of processes that can be created for the real user ID of the calling process. Upon encountering this limit, fork() fails with the error EAGAIN.
- RLIMIT_RSS
- Specifies the limit (in pages) of the process's resident set (the number of virtual pages resident in RAM). This limit only has effect in Linux 2.4 onwards, and there only affects calls to madvise() specifying MADV_WILLNEED.
- RLIMIT_STACK
- The maximum size of the process stack, in bytes. Upon reaching this limit, a SIGSEGV signal is generated. To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)).
RLIMIT_OFILE is the BSD name for RLIMIT_NOFILE.
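Lowering a soft limit follows a read-modify-write pattern, so the hard limit is preserved. The sketch below applies it to RLIMIT_CORE to suppress core files; disable_core_dumps() is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Hypothetical helper: sets the soft RLIMIT_CORE limit to 0 so that no core
 * file is written, leaving the hard limit untouched.
 * Returns 0 on success, -1 on error. */
int disable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;
    rl.rlim_cur = 0;                 /* 0 always lies within [0, hard limit] */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Because only the soft limit changes, an unprivileged process can later raise it again up to the unchanged hard limit.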
getrusage returns the current resource usages, for a who of either RUSAGE_SELF or RUSAGE_CHILDREN. The former asks for resources used by the current process, the latter for resources used by those of its children that have terminated and have been waited for.

    struct rusage {
        struct timeval ru_utime;   /* user time used */
        struct timeval ru_stime;   /* system time used */
        long ru_maxrss;            /* maximum resident set size */
        long ru_ixrss;             /* integral shared memory size */
        long ru_idrss;             /* integral unshared data size */
        long ru_isrss;             /* integral unshared stack size */
        long ru_minflt;            /* page reclaims */
        long ru_majflt;            /* page faults */
        long ru_nswap;             /* swaps */
        long ru_inblock;           /* block input operations */
        long ru_oublock;           /* block output operations */
        long ru_msgsnd;            /* messages sent */
        long ru_msgrcv;            /* messages received */
        long ru_nsignals;          /* signals received */
        long ru_nvcsw;             /* voluntary context switches */
        long ru_nivcsw;            /* involuntary context switches */
    };

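As an illustration of reading the two timeval fields, the following sketch adds user and system time for the calling process. The helper name cpu_seconds_self() is hypothetical.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: returns user plus system CPU time consumed so far by
 * the calling process, in seconds, or -1.0 if getrusage() fails. */
double cpu_seconds_self(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1.0;
    return (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}
```

Passing RUSAGE_CHILDREN instead would report the time of children that have terminated and been waited for.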
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and
errno is set appropriately.
ERRORS
- EFAULT
- rlim or usage points outside the accessible address space.
- EINVAL
- getrlimit or setrlimit is called with a bad resource, or getrusage is called with a bad who.
- EPERM
- A non-superuser tries to use setrlimit() to increase the soft or hard limit above the current hard limit, or a superuser tries to increase RLIMIT_NOFILE above the current kernel maximum.
CONFORMING TO
SVr4, BSD 4.3
NOTE
Including <sys/time.h> is not required these days, but increases portability. (Indeed, struct timeval is defined in <sys/time.h>.)
On Linux, if the disposition of SIGCHLD is set to SIG_IGN then the resource usages of child processes are automatically included in the value returned by RUSAGE_CHILDREN, although POSIX 1003.1-2001 explicitly prohibits this.
The above struct was taken from BSD 4.3 Reno. Not all fields are meaningful under Linux. Right now (Linux 2.4, 2.6) only the fields ru_utime, ru_stime, ru_minflt, ru_majflt, and ru_nswap are maintained.
SEE ALSO
dup(2),
fcntl(2),
fork(2),
mlock(2),
mlockall(2),
mmap(2),
open(2),
quotactl(2),
sbrk(2),
wait3(2),
wait4(2),
malloc(3),
ulimit(3),
signal(7)
NAME
getsid - get session ID
SYNOPSIS
#include <unistd.h> pid_t getsid(pid_t pid);
DESCRIPTION
getsid(0) returns the session ID of the calling process.
getsid(p) returns the session ID of the process with process ID p. (The session ID of a process is the process group ID of the session leader.) On error, (pid_t) -1 is returned, and errno is set appropriately.
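The following sketch shows one way getsid(0) might be used: comparing the session ID with the caller's own process ID to decide whether the caller is a session leader. The helper name is_session_leader is hypothetical, and the feature test macro is defined only because the NOTES of this page say glibc needs it to expose the prototype.

```c
#define _XOPEN_SOURCE 500   /* expose the getsid prototype under glibc */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the calling process is the leader of
 * its session, 0 if it is not, and -1 if getsid fails. */
int is_session_leader(void)
{
    pid_t sid = getsid(0);

    if (sid == (pid_t) -1)
        return -1;

    /* The session ID is the process group ID of the session leader,
     * which equals the leader's own process ID. */
    return sid == getpid();
}
```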
ERRORS
- EPERM
- A process with process ID p exists, but it is not in the same session as the current process, and the implementation considers this an error.
- ESRCH
- No process with process ID p was found.
CONFORMING TO
SVr4, POSIX 1003.1-2001.
NOTES
Linux does not return EPERM.
Linux has this system call since Linux 1.3.44. There is libc support since libc 5.2.19.
To get the prototype under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.
SEE ALSO
getpgid(2),
setsid(2)
NAME
getsockopt, setsockopt - get and set options on sockets
SYNOPSIS
#include <sys/types.h> #include <sys/socket.h> int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);
DESCRIPTION
Getsockopt and setsockopt manipulate the options associated with a socket. Options may exist at multiple protocol levels; they are always present at the uppermost socket level.
When manipulating socket options the level at which the option resides and the name of the option must be specified. To manipulate options at the socket level, level is specified as SOL_SOCKET. To manipulate options at any other level the protocol number of the appropriate protocol controlling the option is supplied. For example, to indicate that an option is to be interpreted by the TCP protocol, level should be set to the protocol number of TCP; see getprotoent(3).
The parameters optval and optlen are used to access option values for setsockopt. For getsockopt they identify a buffer in which the value of the requested option is to be returned. For getsockopt, optlen is a value-result parameter, initially containing the size of the buffer pointed to by optval, and modified on return to indicate the actual size of the value returned. If no option value is to be supplied or returned, optval may be NULL.
Optname and any specified options are passed uninterpreted to the appropriate protocol module for interpretation. The include file <sys/socket.h> contains definitions for socket level options, described below. Options at other protocol levels vary in format and name; consult the appropriate entries in section 4 of the manual.
Most socket-level options utilize an int parameter for optval. For setsockopt, the parameter should be non-zero to enable a boolean option, or zero if the option is to be disabled.
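The calling convention described above can be sketched as follows. The example enables the boolean socket-level option SO_REUSEADDR and reads it back; the helper name enable_reuseaddr is hypothetical, and the choice of SO_REUSEADDR is only for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: enable SO_REUSEADDR on socket s, then read the
 * option back. Returns 1 if the option is reported as enabled, 0 if it
 * is reported as disabled, and -1 if either call fails. */
int enable_reuseaddr(int s)
{
    int on = 1;                   /* non-zero enables a boolean option */
    int val = 0;
    socklen_t len = sizeof(val);  /* value-result: in = buffer size */

    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1)
        return -1;

    if (getsockopt(s, SOL_SOCKET, SO_REUSEADDR, &val, &len) == -1)
        return -1;

    /* On return, len holds the size of the value actually stored. */
    return val != 0;
}
```

A caller would typically invoke this on a descriptor returned by socket(2) before calling bind(2).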
For a description of the available socket options see socket(7) and the appropriate protocol man pages.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
- EBADF
- The argument s is not a valid descriptor.
- ENOTSOCK
- The argument s is a file, not a socket.
- ENOPROTOOPT
- The option is unknown at the level indicated.
- EFAULT
- The address pointed to by optval is not in a valid part of the process address space. For getsockopt, this error may also be returned if optlen is not in a valid part of the process address space.
- EINVAL
- optlen is invalid in a call to setsockopt.
CONFORMING TO
SVr4, 4.4BSD (these system calls first appeared in 4.2BSD). SVr4 documents additional ENOMEM and ENOSR error codes, but does not document the SO_SNDLOWAT, SO_RCVLOWAT, SO_SNDTIMEO, SO_RCVTIMEO options.
NOTE
The fifth argument of getsockopt and setsockopt is in reality an int [*] (and this is what BSD 4.* and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t. The draft standard has not been adopted yet, but glibc2 already follows it and also has socklen_t [*]. See also accept(2).
BUGS
Several of the socket options should be handled at lower levels of the system.
SEE ALSO
ioctl(2),
socket(2),
getprotoent(3),
protocols(5),
socket(7),
unix(7),
tcp(7)
NAME
gettimeofday, settimeofday - get / set time
SYNOPSIS
#include <sys/time.h> int gettimeofday(struct timeval *tv, struct timezone *tz);
int settimeofday(const struct timeval *tv , const struct timezone *tz);
DESCRIPTION
The functions gettimeofday and settimeofday can get and set the time as well as a timezone. The tv argument is a timeval struct, as specified in <sys/time.h>:
struct timeval {
    time_t      tv_sec;  /* seconds */
    suseconds_t tv_usec; /* microseconds */
};
and gives the number of seconds and microseconds since the Epoch (see time(2)). The tz argument is a timezone:
struct timezone {
    int tz_minuteswest; /* minutes W of Greenwich */
    int tz_dsttime;     /* type of dst correction */
};
The use of the timezone struct is obsolete; the tz_dsttime field has never been used under Linux - it has not been and will not be supported by libc or glibc. Each and every occurrence of this field in the kernel source (other than the declaration) is a bug. Thus, the following is purely of historic interest.
The field tz_dsttime contains a symbolic constant (values are given below) that indicates in which part of the year Daylight Saving Time is in force. (Note: its value is constant throughout the year - it does not indicate that DST is in force, it just selects an algorithm.) The daylight saving time algorithms defined are as follows:
DST_NONE /* not on dst */
DST_USA /* USA style dst */
DST_AUST /* Australian style dst */
DST_WET /* Western European dst */
DST_MET /* Middle European dst */
DST_EET /* Eastern European dst */
DST_CAN /* Canada */
DST_GB /* Great Britain and Eire */
DST_RUM /* Rumania */
DST_TUR /* Turkey */
DST_AUSTALT /* Australian style with shift in 1986 */
Of course it turned out that the period in which Daylight Saving Time is in force cannot be given by a simple algorithm, one per country; indeed, this period is determined by unpredictable political decisions. So this method of representing time zones has been abandoned. Under Linux, in a call to settimeofday the tz_dsttime field should be zero.
Under Linux there are some peculiar "warp clock" semantics associated with the settimeofday system call: if, on the very first call (after booting) that has a non-NULL tz argument, the tv argument is NULL and the tz_minuteswest field is nonzero, it is assumed that the CMOS clock is on local time, and that it has to be incremented by this amount to get UTC system time. No doubt it is a bad idea to use this feature.
The following macros are defined to operate on a struct timeval:
#define timerisset(tvp) ((tvp)->tv_sec || (tvp)->tv_usec)
#define timercmp(tvp, uvp, cmp) ((tvp)->tv_sec cmp (uvp)->tv_sec || (tvp)->tv_sec == (uvp)->tv_sec && (tvp)->tv_usec cmp (uvp)->tv_usec)
#define timerclear(tvp) ((tvp)->tv_sec = (tvp)->tv_usec = 0)
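A brief sketch of how these macros might be combined is given here. The helper deadline_passed is hypothetical, and a cleared timeval is treated as "no deadline" purely as a convention of this example. Only the < operator is passed to timercmp: with the definition shown above, operators such as >= or <= would be satisfied by equal seconds alone, without comparing the microseconds.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes the timer macros on current
                             glibc; older versions used _BSD_SOURCE */
#include <sys/time.h>

/* Hypothetical helper: returns 1 if a deadline is set and 'now' is at or
 * past it, 0 otherwise. A cleared deadline (see timerclear) counts as
 * "no deadline" in this sketch. */
int deadline_passed(const struct timeval *now, const struct timeval *deadline)
{
    if (!timerisset(deadline))
        return 0;

    /* "not earlier than the deadline" is expressed with < only. */
    return !timercmp(now, deadline, <);
}
```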
If either tv or tz is null, the corresponding structure is not set or returned.
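Since the timezone argument is obsolete, a typical call passes NULL for tz. The following is a minimal sketch under that assumption; the helper name now_usec is hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: current time as microseconds since the Epoch,
 * or -1 if gettimeofday fails. */
long long now_usec(void)
{
    struct timeval tv;

    /* tz is NULL, so no timezone structure is returned. */
    if (gettimeofday(&tv, NULL) == -1)
        return -1;

    return (long long)tv.tv_sec * 1000000LL + (long long)tv.tv_usec;
}
```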
Only the super user may use settimeofday.
RETURN VALUE
gettimeofday and settimeofday return 0 for success, or -1 for failure (in which case errno is set appropriately).
ERRORS
- EPERM
- settimeofday is called by someone other than the superuser.
- EINVAL
- Timezone (or something else) is invalid.
- EFAULT
- One of tv or tz pointed outside your accessible address space.
NOTE
The prototype for settimeofday and the defines for timercmp, timerisset, timerclear, timeradd, timersub are (since glibc 2.2.2) only available if _BSD_SOURCE is defined (either explicitly, or implicitly, by not defining _POSIX_SOURCE or compiling with the -ansi flag).
Traditionally, the fields of struct timeval were longs.
CONFORMING TO
SVr4, BSD 4.3. POSIX 1003.1-2001 describes gettimeofday() but not settimeofday().
SEE ALSO
date(1),
adjtimex(2),
time(2),
ctime(3),
ftime(3)
NAME
getxattr, lgetxattr, fgetxattr - retrieve an extended attribute value
SYNOPSIS
#include <sys/types.h> #include <attr/xattr.h> ssize_t getxattr (const char *path, const char *name, void *value, size_t size); ssize_t lgetxattr (const char *path, const char *name, void *value, size_t size); ssize_t fgetxattr (int filedes, const char *name, void *value, size_t size);
DESCRIPTION
Extended attributes are name:value pairs associated with inodes (files, directories, symlinks, etc). They are extensions to the normal attributes which are associated with all inodes in the system (i.e. the stat(2) data). A complete overview of extended attributes concepts can be found in attr(5).
getxattr retrieves the value of the extended attribute identified by name and associated with the given path in the filesystem. The length of the attribute value is returned.
lgetxattr is identical to getxattr, except in the case of a symbolic link, where the link itself is interrogated, not the file that it refers to.
fgetxattr is identical to getxattr, only the open file pointed to by filedes (as returned by open(2)) is interrogated in place of path.
An extended attribute name is a simple null-terminated string. The name includes a namespace prefix - there may be several, disjoint namespaces associated with an individual inode. The value of an extended attribute is a chunk of arbitrary textual or binary data of specified length.
An empty buffer of size zero can be passed into these calls to return the current size of the named extended attribute, which can be used to estimate the size of a buffer which is sufficiently large to hold the value associated with the extended attribute.
The interface is designed to allow guessing of initial buffer sizes, and to enlarge buffers when the return value indicates that the buffer provided was too small.
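The size-probing pattern described above can be sketched as follows. The helper name read_xattr is hypothetical. The sketch also assumes the declarations are available from <sys/xattr.h>, as provided by current glibc, rather than the <attr/xattr.h> header named in the SYNOPSIS.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* assumption: glibc header; SYNOPSIS names <attr/xattr.h> */

/* Hypothetical helper: read the value of attribute 'name' on 'path' into
 * a malloc'd buffer. On success returns the buffer (caller frees it) and
 * stores the value length in *len_out; on failure returns NULL with errno
 * set by getxattr or malloc. */
void *read_xattr(const char *path, const char *name, size_t *len_out)
{
    for (;;) {
        /* A zero-sized buffer asks only for the current value size. */
        ssize_t size = getxattr(path, name, NULL, 0);
        if (size == -1)
            return NULL;

        void *buf = malloc(size > 0 ? (size_t)size : 1);
        if (buf == NULL)
            return NULL;

        if (size == 0) {             /* empty value: nothing to fetch */
            *len_out = 0;
            return buf;
        }

        ssize_t got = getxattr(path, name, buf, (size_t)size);
        if (got >= 0) {
            *len_out = (size_t)got;
            return buf;
        }

        free(buf);
        if (errno != ERANGE)         /* a real error: give up */
            return NULL;
        /* ERANGE: the value grew between the two calls; probe again. */
    }
}
```

The retry on ERANGE covers the case where another process enlarges the attribute between the probe and the read.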
RETURN VALUE
On success, a non-negative number is returned indicating the size of the extended attribute value. On failure, -1 is returned and errno is set appropriately.
If the named attribute does not exist, or the process has no access to this attribute, errno is set to ENOATTR.
If the size of the value buffer is too small to hold the result, errno is set to ERANGE.
If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.
The errors documented for the stat(2) system call are also applicable here.
AUTHORS
Andreas Gruenbacher, <a.gruenbacher@computer.org> and the SGI XFS development team, <linux-xfs@oss.sgi.com>. Please send any bug reports or comments to these addresses.
SEE ALSO
getfattr(1),
setfattr(1),
open(2),
stat(2),
setxattr(2),
listxattr(2),
removexattr(2), and
attr(5).
NAME
get_thread_area - Get a Thread Local Storage (TLS) area
SYNOPSIS
#include <linux/unistd.h> #include <asm/ldt.h> int get_thread_area (struct user_desc *u_info);
DESCRIPTION
get_thread_area returns an entry in the current thread's Thread Local Storage (TLS) array. The index of the entry corresponds to the value of u_info->entry_number, passed in by the user. If the value is in bounds, get_thread_area copies the corresponding TLS entry into the area pointed to by u_info.
RETURN VALUE
get_thread_area returns 0 on success. Otherwise, it returns -1 and sets errno appropriately.
ERRORS
- EINVAL
- u_info->entry_number is out of bounds.
- EFAULT
- u_info is an invalid pointer.
CONFORMING TO
get_thread_area is Linux specific and should not be used in programs that are intended to be portable.
AVAILABILITY
A version of get_thread_area first appeared in Linux 2.5.32.
SEE ALSO
set_thread_area(2),
modify_ldt(2)
NAME
idle - make process 0 idle
SYNOPSIS
#include <unistd.h> int idle(void);
DESCRIPTION
idle is an internal system call used during bootstrap. It marks the process's pages as swappable, lowers its priority, and enters the main scheduling loop.
idle never returns.
Only process 0 may call idle. Any user process, even a process with super-user permission, will receive EPERM.
RETURN VALUE
idle never returns for process 0, and always returns -1 for a user process.
ERRORS
- EPERM
- Always, for a user process.
CONFORMING TO
This function is Linux-specific, and should not be used in programs intended to be portable.
NOTES
Since 2.3.13 this system call does not exist anymore.
NAME
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You must compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb and friends are hardware specific. The port and value arguments are in the opposite order from most DOS implementations.
SEE ALSO
ioperm(2),
iopl(2)
NAME
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space