Commit graph

544 commits

Author SHA1 Message Date
Casey Bodley
19867b892f driver: made UNLOCK upcalls uninterruptible
connectathon locking tests trigger an interrupted UNLOCK upcall, which leads to the bugcheck in CloseSrvOpen() when freeing the security context

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-02-15 15:04:57 -05:00
Casey Bodley
c29710b47d readme: fixed typo date->data
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-02-03 15:25:23 -05:00
Casey Bodley
3f54cfd615 readme: added mount option and known issue for security
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-02-03 15:09:05 -05:00
Olga Kornievskaia
47b0ccda9c turning callback off for krb5p
sspi requires strict ordering of messages. we can't have more than 1 outstanding rpc; thus, hold the lock over send and receive and turn off callbacks.
2011-02-03 13:13:10 -05:00
Olga Kornievskaia
67ae1eddaf making all but CLOSE interruptible
leaving the CLOSE upcall non-interruptible, as making it interruptible leads to issues with the security context.

making all other upcalls interruptible so that when something goes wrong we can ctrl-c out of a user application. otherwise, the machine requires a reboot (i.e. because we made the wait non-interruptible, nothing can kill it).
2011-02-03 11:46:51 -05:00
Olga Kornievskaia
4411d3d807 first stab at integrity and privacy
note: privacy will not work when we have more than 1 outstanding rpc, because that generates out-of-order replies, which sspi does not allow when privacy is enabled.

adding auth_wrap() and auth_unwrap() for per-message gss token protection required adding these methods to auth_sys and auth_none.

the linux server doesn't support v2 kerberos tokens that have rotated data, and sspi always produces such tokens for aes. thus this code was only tested with v1 kerberos tokens (i.e. des).
2011-01-27 13:52:08 -05:00
Olga Kornievskaia
b6120b41fd setting error status in rpc_reconnect if send_null fails
we were checking for the error result of send_null but not setting
status; we then jumped to "out_unlock" and, since status was still
NO_ERROR, tried to send bind_conn_to_session
2011-01-17 11:55:36 -05:00
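The bug fixed in b6120b41fd is a common C error-path mistake: a failure is detected, but it is not recorded in the status variable before the goto, so the code at the label still sees NO_ERROR. A minimal sketch of the corrected flow follows; send_null_stub(), bind_conn_stub(), reconnect_sketch(), and SKETCH_EFAIL are hypothetical stand-ins, not the project's rpc_reconnect() code.

```c
#include <stdbool.h>

#define NO_ERROR       0
#define SKETCH_EFAIL   1   /* hypothetical error code for this sketch */

static int bind_calls = 0; /* counts attempts to send the (stub) bind */

/* hypothetical stand-in for send_null(): succeeds or fails on demand */
static int send_null_stub(bool succeed)
{
    return succeed ? NO_ERROR : SKETCH_EFAIL;
}

/* hypothetical stand-in for sending BIND_CONN_TO_SESSION */
static int bind_conn_stub(void)
{
    bind_calls++;
    return NO_ERROR;
}

/* control flow after the fix: the send_null failure is stored in
 * 'status' before the goto, so the code after out_unlock skips the bind */
static int reconnect_sketch(bool null_succeeds, bool need_back_channel)
{
    int status = NO_ERROR;

    /* (lock acquired and connection re-established here) */
    status = send_null_stub(null_succeeds);
    if (status != NO_ERROR)
        goto out_unlock;

    /* (remaining reconnect work) */

out_unlock:
    /* (lock released here) */
    if (status == NO_ERROR && need_back_channel)
        status = bind_conn_stub();
    return status;
}
```

Keeping a single status variable that every exit path updates is what lets one cleanup label make the right decision.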
Olga Kornievskaia
3a60f23c91 cosmetic changing printouts in check_execute_access
adding the filename to the printouts and changing eprintf back to dprintf, as it happens too often.
2011-01-13 11:41:49 -05:00
Olga Kornievskaia
06f40459df making upcall wait uninterruptible
switching the user's upcall wait from UserMode and TRUE (interruptible) to KernelMode and FALSE. the msdn documentation recommends this to keep drivers simple.

it seems to no longer generate interrupts on close irps but we are still able to ctrl-c running tests.
2011-01-12 12:44:42 -05:00
Olga Kornievskaia
4c07c25dbb saving security context in fobx
instead of getting the security context on every upcall, acquire it on open and save it in the fobx. the cache manager issues read and write calls in the system security context, not the user's, so we need to use the context of the open instead.
2011-01-12 12:44:42 -05:00
Casey Bodley
089a52906b rpc: don't malloc server_name for getnameinfo()
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-12 12:40:40 -05:00
Casey Bodley
9c960aa409 rpc: rebind back channel on reconnect
after reestablishing an rpc connection, send BIND_CONN_TO_SESSION if we need a back channel

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-12 12:40:40 -05:00
Casey Bodley
9c59af4da5 fixes for bind_conn_to_session()
fixes for xdr encoding of bind_conn_to_session, after testing against linux server

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-12 12:40:39 -05:00
Casey Bodley
eb60a1ee6d check_execute_access() prints errors with eprintf()
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-10 15:16:18 -05:00
Casey Bodley
238a8a7015 callback: handles xdr decode errors
instead of ignoring errors from proc_cb_compound_args(), return NFS4ERR_BADXDR.  note that we still need to allocate the cb_compound_res structure to return this error

added null checks to the end of handle_cb_compound(); if the cb_compound_res allocation fails, we'd crash trying to access res->status and res->resarray_count

also fixed some indenting

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-10 15:16:18 -05:00
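The change in 238a8a7015 implies an ordering: the result structure has to exist before a decode error can be reported in it, and the handler must cope with that allocation failing. A minimal sketch of that ordering, assuming a simplified result struct; cb_res_sketch, decode_args_stub(), and handle_cb_compound_sketch() are hypothetical names, and only the NFS4ERR_BADXDR value (10036) comes from RFC 5661.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NFS4_OK         0
#define NFS4ERR_BADXDR  10036   /* value from RFC 5661 */

/* simplified stand-in for cb_compound_res */
struct cb_res_sketch {
    int      status;
    unsigned resarray_count;
};

/* hypothetical decoder: returns false on an xdr decode failure */
static bool decode_args_stub(bool valid)
{
    return valid;
}

/* returns NULL only if allocation fails; otherwise the result carries
 * either NFS4_OK or NFS4ERR_BADXDR */
static struct cb_res_sketch *handle_cb_compound_sketch(bool args_valid)
{
    /* allocate first: the result is needed even to report BADXDR */
    struct cb_res_sketch *res = calloc(1, sizeof(*res));
    if (res == NULL)
        return NULL;   /* callers must check before touching res->status */

    if (!decode_args_stub(args_valid)) {
        res->status = NFS4ERR_BADXDR;   /* report instead of ignoring */
        return res;
    }
    res->status = NFS4_OK;
    return res;
}
```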
Casey Bodley
034b2b4ea2 nfs41_session_renew() error handling
on failure to renew a session, we don't need to free the session (this leads to crashes).  if we simply return the error to compound_encode_send_decode(), we'll fail any subsequent operations on the session, but still be able to unmount and remain stable

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-10 15:16:18 -05:00
Casey Bodley
757b637607 create_session uses compound_encode_send_decode()
send CREATE_SESSION with compound_encode_send_decode() instead of nfs41_send_compound() for its NFS4ERR_DELAY and NFS4ERR_STALE_CLIENTID handling

added 'try_recovery' argument to nfs41_create_session(), which is passed on to compound_encode_send_decode().  nfs41_session_renew() uses try_recovery=FALSE, because it handles the NFS4ERR_STALE_CLIENTID error on its own.  nfs41_session_create() uses try_recovery=TRUE to make use of the NFS4ERR_STALE_CLIENTID error handling.  modified the NFS4ERR_STALE_CLIENTID block to call nfs41_client_renew() and retry the operation (i.e. CREATE_SESSION), instead of falling through to session recovery

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-10 15:16:17 -05:00
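The try_recovery argument from 757b637607 decides who handles NFS4ERR_STALE_CLIENTID: with TRUE, the send path renews the client and resends the same operation; with FALSE, the error goes back to the caller, as nfs41_session_renew() wants. A minimal sketch of that branch, assuming a toy server model; server_stub, send_stub(), client_renew_stub(), and the retry bound are hypothetical, and only the error value comes from RFC 5661.

```c
#include <stdbool.h>

#define NFS4_OK                 0
#define NFS4ERR_STALE_CLIENTID  10022  /* value from RFC 5661 */
#define MAX_RENEW_RETRIES       3      /* hypothetical bound for the sketch */

/* hypothetical server: answers STALE_CLIENTID until the client is renewed */
struct server_stub {
    bool client_valid;
    int  renew_calls;
};

static int send_stub(struct server_stub *s)
{
    return s->client_valid ? NFS4_OK : NFS4ERR_STALE_CLIENTID;
}

/* hypothetical stand-in for nfs41_client_renew() */
static int client_renew_stub(struct server_stub *s)
{
    s->renew_calls++;
    s->client_valid = true;
    return NFS4_OK;
}

static int send_compound_sketch(struct server_stub *s, bool try_recovery)
{
    int retries = 0;
    int status;

retry:
    status = send_stub(s);
    if (status == NFS4ERR_STALE_CLIENTID && try_recovery
            && retries++ < MAX_RENEW_RETRIES) {
        if (client_renew_stub(s) == NFS4_OK)
            goto retry;   /* resend the same operation, e.g. CREATE_SESSION */
    }
    return status;        /* with try_recovery=FALSE the caller handles it */
}
```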
Casey Bodley
f2095915aa cosmetic changes to lookup.c
removed unused variable 'buffer_size' in lookup_rpc()
renamed map_lookup_error()'s parameter 'is_last_component' to 'last_component' to avoid conflicting with function is_last_component()

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-10 15:16:17 -05:00
Casey Bodley
7ccdf2ba47 mount: memory leak on path overflow
changed goto out -> out_err, so the root is freed on buffer overflow
updated error messages for nfs41_root_create() and nfs41_root_mount_addrs()
if the root lookup fails, return ERROR_BAD_NETPATH instead of ERROR_FILE_NOT_FOUND

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-10 15:16:17 -05:00
Casey Bodley
229ec94c5c propagate errors from nfs41_name_cache_create()
server_create() was ignoring the return value of nfs41_name_cache_create(), but it needs to be propagated all the way back through nfs41_server_find_or_create() to nfs41_client_create() and nfs41_client_renew()

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-10 15:16:17 -05:00
Casey Bodley
81051ddce1 recovery: revoke all layouts and device info on client recovery
12.7.4. Recovery from Metadata Server Restart
"The client MUST stop using layouts and delete the device ID to device address mappings it previously received from the metadata server."

during client state recovery, call pnfs_file_layout_recall() to revoke all layouts and devices held by the client

LAYOUTGET, LAYOUTRETURN, and GETDEVICEINFO are all sent under their respective locks, and pnfs_file_layout_recall() requires a lock on each layout and device it operates on, so this would cause a deadlock if one of those operations triggered the recovery.  to avoid this, LAYOUTGET, LAYOUTRETURN, and GETDEVICEINFO are all sent with try_recovery=FALSE.  this behavior is preferable for recovery, because errors in the pnfs path cause us to fall back to the metadata server

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-04 14:26:28 -05:00
Casey Bodley
9cd9744567 pnfs: revoke device info on bulk layout recall
20.3. CB_LAYOUTRECALL
"LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL specify that all the storage device ID to storage device address mappings in the affected file system(s) are also recalled."

pnfs_file_layout_recall() now takes an nfs41_client instead of just the pnfs_file_layout_list, because both the layout list and the device list are accessible from nfs41_client.  for bulk recalls, it calls the new function pnfs_file_device_list_invalidate().  each device with layout_count=0 is removed and freed, and devices in use are flagged as REVOKED and freed when layout_count->0

layout_recall_return() now takes a pnfs_file_layout instead of pnfs_layout for access to pnfs_file_layout.device.  pnfs_layout_io_start() and pnfs_layout_io_finish() do the same, because pnfs_layout_io_finish() calls layout_recall_return().  layout_recall_return() calls pnfs_file_device_put() to release its reference on the device

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-04 14:26:26 -05:00
Casey Bodley
e3119c281e pnfs: added status flags and ref count to struct pnfs_device
pnfs_device.status remembers whether a given device has been GRANTED/REVOKED

pnfs_device.layout_count tracks the number of layouts using the device, incremented by pnfs_file_device_get() and decremented by pnfs_file_device_put().  when pnfs_file_device_put() takes layout_count to 0, remove and free the device only if it's flagged as REVOKED

because pnfs_file_device_get() modifies pnfs_device.layout_count, we can no longer use a shared lock; changed pnfs_file_device.lock from SRWLOCK to CRITICAL_SECTION, and moved it to pnfs_device.lock to document the fact that it's used for pnfs_device.status and pnfs_device.layout_count

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-04 14:26:23 -05:00
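Together, e3119c281e and the bulk-recall change in 9cd9744567 describe a deferred-free pattern: a revoked device is freed immediately if no layout uses it, otherwise on the last put. A minimal sketch, assuming a pthread mutex in place of the CRITICAL_SECTION and a single device instead of the client's list; device_sketch, device_create(), device_revoke(), and the other names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

enum device_status { DEVICE_GRANTED, DEVICE_REVOKED };

/* simplified stand-in for struct pnfs_device */
struct device_sketch {
    pthread_mutex_t    lock;          /* protects status and layout_count */
    enum device_status status;
    unsigned           layout_count;  /* number of layouts using the device */
};

static struct device_sketch *device_create(void)
{
    struct device_sketch *d = calloc(1, sizeof(*d));
    if (d != NULL) {
        pthread_mutex_init(&d->lock, NULL);
        d->status = DEVICE_GRANTED;
    }
    return d;
}

static void device_free(struct device_sketch *d)
{
    /* the real code would also unlink the device from the client's list */
    pthread_mutex_destroy(&d->lock);
    free(d);
}

/* get modifies layout_count, so it needs an exclusive lock */
static void device_get(struct device_sketch *d)
{
    pthread_mutex_lock(&d->lock);
    d->layout_count++;
    pthread_mutex_unlock(&d->lock);
}

/* drop a reference; only a REVOKED device is freed when the count hits 0.
 * returns true if the device was freed */
static bool device_put(struct device_sketch *d)
{
    bool destroy;

    pthread_mutex_lock(&d->lock);
    d->layout_count--;
    destroy = (d->layout_count == 0 && d->status == DEVICE_REVOKED);
    pthread_mutex_unlock(&d->lock);
    if (destroy)
        device_free(d);
    return destroy;
}

/* bulk revoke: free an unused device now, otherwise flag it REVOKED and
 * let the last device_put() free it. returns true if freed here */
static bool device_revoke(struct device_sketch *d)
{
    bool destroy;

    pthread_mutex_lock(&d->lock);
    d->status = DEVICE_REVOKED;
    destroy = (d->layout_count == 0);
    pthread_mutex_unlock(&d->lock);
    if (destroy)
        device_free(d);
    return destroy;
}
```

The sketch assumes, as the commit implies, that a revoked device is no longer reachable for new device_get() calls, so nothing can revive it between the last put and the free.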
Casey Bodley
2286f7a1e3 build.vc10: added secur32.lib to all libtirpc configurations
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2011-01-04 13:44:02 -05:00
Casey Bodley
4ea730c881 fix for daemon version checking crash on close
upcall_cleanup() is called after every upcall regardless of errors.  if we get a CLOSE upcall after a daemon restart, we still call cleanup_close() and crash attempting to access the invalid open state pointer.  avoid calling upcall-specific cancel routines for these version mismatch errors

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-17 14:26:17 -05:00
Olga Kornievskaia
6331621924 turning unmap on
previously we noticed that calling MmUnmapLockedPages() causes kernel crashes (which is why that code is #if 0-ed). however, when we don't unmap memory, it keeps accumulating in the nfsd process's memory (and is never "freed").

in this patch
(a) calling unmap
(b) checking if MmMapLockedPagesSpecifyCache() returns us a NULL pointer which is a type of failure that doesn't throw an exception but still is a failure.
(c) cosmetic change to printf.

NOTE: this change still leads to failures in the general tests. running them in a loop (which previously produced kernel crashes) now just leads to test failures. the cause is unknown!
2010-12-17 13:31:23 -05:00
Olga Kornievskaia
89cd10a1f9 not allowing unmount if there are opened files
even though we might have the same server mounted under 2 drive letters, make it so that you can't unmount if any files are open in that netroot.

not checking for that allows us to unmount the driver while it is still in use. then there is no way to "unmount" from nfsd's perspective, and it'll keep that session and connection going forever.

passing "false" to RxFinalizeConnection makes it refuse the unmount while files are open; once the files are closed, the unmount succeeds, but RDBSS never calls the FinalizeNetRoot() function, so we never really unmount.

i noticed that FinalizeVNetRoot() is never called. return values from FinalizeNetRoot() are ignored, so we can't fail there if we have open files.
2010-12-15 16:15:29 -05:00
Casey Bodley
853dcc385e recovery: lock_owner to open_owner
if we're recovering a lock stateid for a LOCK operation, and the file has no outstanding locks, we won't be able to recover a lock stateid.  resend the LOCK with an open stateid instead

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-13 13:35:56 -05:00
Casey Bodley
1d9981e59e fix for retry on stateid recovery
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-10 14:51:46 -05:00
Casey Bodley
13f661add4 fix warnings for parse_cmdlineargs()
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-10 14:51:44 -05:00
Casey Bodley
c1b90c225a readme: instructions for visual studio and ldap configuration
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-10 14:50:46 -05:00
Casey Bodley
ee6bd9ce0e build.vc9: removed support for visual studio 2008
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-10 13:49:39 -05:00
Casey Bodley
f63528064c build.vc10: updated warning settings for visual studio projects
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-10 13:49:37 -05:00
unknown
21ee8ccaad fix for nfs41_rpc_clnt_create
2010-12-10 12:54:17 -05:00
Olga Kornievskaia
1196182a8e minor changes
cosmetic: renaming do_recovery to recover_stateid

removing client_state_remove() from setattr because we'll do it on close
2010-12-10 11:39:28 -05:00
unknown
168821c7fb removing daemon and libtirpc from ddk build
2010-12-10 11:26:05 -05:00
unknown
2ae743efe7 tracking open state in setattr for reboot recovery
2010-12-10 11:25:01 -05:00
unknown
a645f7030c fixing memory leaks in rpc client
2010-12-09 18:36:05 -05:00
Olga Kornievskaia
0d0b00a93b [cosmetic] moved reboot recovery code into separate function
2010-12-09 14:13:13 -05:00
Olga Kornievskaia
b0f1cff30e small fix for standalone nfsd version
2010-12-09 13:17:33 -05:00
Olga Kornievskaia
35d76cf593 fixing tirpc handle of auth_refresh
(a) auth_refresh recursively calls clnt_call(), which calls
clnt_vc_call(), which tries to acquire the socket lock that we have
already acquired. thus the fix checks whether the thread trying to
acquire the lock is the same thread that holds the lock.

(b) authsspi_fresh() needed to check whether we were called to refresh
the context due to an error (i.e. 2nd argument non-null) and if so,
destroy the old context and then reacquire a new sspi context.

it seems that InitializeSecurityContext() also requires new creds,
so after initially calling AcquireCreds() we don't need to worry about
refreshing credentials.
2010-12-08 18:24:53 -05:00
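Part (a) of 35d76cf593 amounts to a lock that remembers its owner, so a nested call on the same thread can tell it already holds the lock rather than deadlocking. A minimal sketch, assuming pthreads in place of the Windows primitives the port actually uses; owned_lock and its functions are hypothetical names. Only a caller that actually took the lock releases it.

```c
#include <pthread.h>

/* a socket lock that remembers its owner, so a nested call on the same
 * thread (auth_refresh -> clnt_call -> clnt_vc_call) can detect that it
 * already holds the lock instead of waiting on itself */
typedef struct {
    pthread_mutex_t guard;   /* protects held and owner */
    pthread_cond_t  cond;    /* signaled when the lock is released */
    int             held;
    pthread_t       owner;   /* valid only while held != 0 */
} owned_lock;

static void owned_lock_init(owned_lock *l)
{
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->held = 0;
}

/* returns 1 if this call took the lock (caller must release it), or 0 if
 * the calling thread already held it (caller must NOT release it) */
static int owned_lock_acquire(owned_lock *l)
{
    pthread_mutex_lock(&l->guard);
    if (l->held && pthread_equal(l->owner, pthread_self())) {
        pthread_mutex_unlock(&l->guard);
        return 0;
    }
    while (l->held)
        pthread_cond_wait(&l->cond, &l->guard);
    l->held = 1;
    l->owner = pthread_self();
    pthread_mutex_unlock(&l->guard);
    return 1;
}

static void owned_lock_release(owned_lock *l)
{
    pthread_mutex_lock(&l->guard);
    l->held = 0;
    pthread_cond_signal(&l->cond);
    pthread_mutex_unlock(&l->guard);
}
```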
Olga Kornievskaia
c596742659 fixing rbtree patch
the name cache parent entry was never initialized, which caused entries never to be removed from the name cache.
2010-12-07 16:50:45 -05:00
Casey Bodley
0a309c4350 recovery: use normal OPEN/LOCK on ERR_NO_GRACE
if we see NFS4ERR_NO_GRACE from recovery operations, it means we lost our state due to a lease expiration rather than a server reboot.  in this case, it's possible that conflicting locks were granted to other clients, so we have to try normal OPEN/LOCK operations to recover our state.  because they're sent during recovery, nfs41_open() and nfs41_lock() take a new 'bool_t try_recovery' argument so we can avoid recursion

if these operations fail due to conflicting locks, we have no choice but to return errors to the application.  using a stateid that was revoked due to lease expiration results in NFS4ERR_EXPIRED, and we map this error to ERROR_FILE_INVALID: The volume for a file has been externally altered so that the opened file is no longer valid.

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:29:32 -05:00
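The decision described in 0a309c4350 can be reduced to one branch: a reclaim OPEN that fails with NFS4ERR_NO_GRACE means the state was lost to lease expiration, so recovery falls back to an ordinary OPEN sent without further recovery. A minimal sketch under a toy server model; server_stub, the stub functions, and SKETCH_ERR_CONFLICT are hypothetical, and only the NO_GRACE value comes from RFC 5661.

```c
#include <stdbool.h>

#define NFS4_OK              0
#define NFS4ERR_NO_GRACE     10033   /* value from RFC 5661 */
#define SKETCH_ERR_CONFLICT  1       /* hypothetical "conflicting lock" error */

/* hypothetical server model for this sketch */
struct server_stub {
    bool in_grace;      /* true after a reboot, false after lease expiry */
    bool conflict;      /* another client was granted a conflicting lock */
    int  normal_opens;  /* count of ordinary OPENs sent */
};

/* OPEN with reclaim: only accepted during the grace period */
static int open_reclaim_stub(struct server_stub *s)
{
    return s->in_grace ? NFS4_OK : NFS4ERR_NO_GRACE;
}

/* ordinary OPEN; the real code passes try_recovery=FALSE to avoid recursion */
static int open_normal_stub(struct server_stub *s, bool try_recovery)
{
    (void)try_recovery;
    s->normal_opens++;
    return s->conflict ? SKETCH_ERR_CONFLICT : NFS4_OK;
}

static int recover_open_sketch(struct server_stub *s)
{
    int status = open_reclaim_stub(s);

    if (status == NFS4ERR_NO_GRACE) {
        /* state was lost to lease expiration rather than a reboot, so
         * fall back to a normal OPEN; a conflict is returned as-is */
        status = open_normal_stub(s, false);
    }
    return status;
}
```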
Casey Bodley
222c1bf020 recovery: remember byte-range locks and reclaim during recovery
nfs41_open_state maintains a list of outstanding byte-range locks by calling open_lock_add() and open_lock_remove() in lock.c

during client state recovery, after reclaiming each OPEN stateid, send LOCK requests with reclaim=TRUE for each lock it owns, and update the open's lock stateid with the result

added 'bool_t reclaim' argument to nfs41_lock(); when set, compound_encode_send_decode() is called with try_recovery=FALSE to avoid recursive recovery

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:29:25 -05:00
Casey Bodley
1906610544 cosmetic: moved client state recovery to separate function
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:29:10 -05:00
Casey Bodley
7c8f58b992 recovery: avoid recursive state recovery
avoid the recursive case where state recovery operations (OPEN for reclaim and RECLAIM_COMPLETE) return BADSESSION, which kicks off another round of recovery

added a 'bool_t try_recovery' argument to compound_encode_send_decode() in place of its unused 'bufsize_in' and 'bufsize_out'.  when try_recovery=FALSE, return BADSESSION/STALE_CLIENTID errors instead of attempting recovery.  nfs41_open_reclaim(), nfs41_reclaim_complete(), and nfs41_destroy_session() now pass try_recovery=FALSE

during state recovery, we can now check the return values of nfs41_open_reclaim() and nfs41_reclaim_complete() for BADSESSION, and use a goto to restart session recovery

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:29:01 -05:00
Olga Kornievskaia
80cb5b5f57 recovery updated handling of BADSESSION
moved recovery-related fields into struct nfs41_client.recovery.  now uses a combination of CRITICAL_SECTION and CONDITION_VARIABLE for use with SleepConditionVariableCS()

renamed check_renew_in_progress() to recovery_start_or_wait(), and fixed the locking so that we atomically check/set in_recovery

when recovery is finished (including error conditions), call recovery_finish() to reset the recovery status and wake any waiting threads

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:28:13 -05:00
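The locking fix in 80cb5b5f57 checks and sets in_recovery under one lock, so exactly one thread runs recovery while the others sleep until recovery_finish() wakes them. A minimal sketch, assuming pthread_mutex_t and pthread_cond_t as stand-ins for CRITICAL_SECTION and CONDITION_VARIABLE with SleepConditionVariableCS(); recovery_sketch and recovery_init() are hypothetical names, while recovery_start_or_wait() and recovery_finish() follow the commit's naming.

```c
#include <pthread.h>
#include <stdbool.h>

/* simplified stand-in for nfs41_client.recovery */
struct recovery_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            in_recovery;
};

static void recovery_init(struct recovery_sketch *r)
{
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->cond, NULL);
    r->in_recovery = false;
}

/* returns true if the caller should perform recovery itself, or false if
 * another thread was already recovering and has since finished */
static bool recovery_start_or_wait(struct recovery_sketch *r)
{
    bool start;

    pthread_mutex_lock(&r->lock);
    if (!r->in_recovery) {
        r->in_recovery = true;   /* check and set under the same lock */
        start = true;
    } else {
        while (r->in_recovery)   /* loop guards against spurious wakeups */
            pthread_cond_wait(&r->cond, &r->lock);
        start = false;
    }
    pthread_mutex_unlock(&r->lock);
    return start;
}

/* called on success and on every error path */
static void recovery_finish(struct recovery_sketch *r)
{
    pthread_mutex_lock(&r->lock);
    r->in_recovery = false;
    pthread_cond_broadcast(&r->cond);   /* wake every waiting thread */
    pthread_mutex_unlock(&r->lock);
}
```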
Casey Bodley
8616b03597 recovery: recover from STALE_STATEID errors
consider an operation that takes a stateid, and results in a BADSESSION error due to server reboot.  we'll recover the client and session, and send OPENs to reclaim all of the client's state.  but after recovery, we'll resend the original operation with the original stateid, and this will result in a STALE_STATEID error

we handle this by making use of the information in stateid_arg.  if we determine that stateid_arg.stateid is different from the nfs41_open_state's stateid, we copy the new stateid into stateid_arg.stateid and retry

note that if another thread is in recovery, it hasn't finished reclaiming its open state yet; so we wait on recovery to finish before comparing the stateids

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:21:34 -05:00
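The STALE_STATEID handling in 8616b03597 compares the stateid an operation was sent with against the one recovery stored in the open state, and retries only if they differ. A minimal sketch, with locking and the wait for in-progress recovery omitted; the stateid4 layout (32-bit seqid plus 12 opaque bytes) follows RFC 5661, while open_state_sketch, stateid_arg_sketch, and refresh_stale_stateid() are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* NFSv4.1 stateid4: a 32-bit seqid plus 12 opaque bytes */
typedef struct {
    uint32_t      seqid;
    unsigned char other[12];
} stateid4;

/* simplified stand-ins for nfs41_open_state and stateid_arg */
struct open_state_sketch {
    stateid4 stateid;   /* updated by recovery when the open is reclaimed */
};

struct stateid_arg_sketch {
    stateid4                  stateid;  /* what the operation was sent with */
    struct open_state_sketch *open;
};

static bool stateid_equal(const stateid4 *a, const stateid4 *b)
{
    return a->seqid == b->seqid
        && memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/* called on NFS4ERR_STALE_STATEID, after any recovery in progress has
 * finished: returns true if the caller should retry with a newer stateid */
static bool refresh_stale_stateid(struct stateid_arg_sketch *arg)
{
    const stateid4 *current = &arg->open->stateid;

    if (stateid_equal(&arg->stateid, current))
        return false;   /* nothing newer to try: report the error */
    arg->stateid = *current;
    return true;
}
```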
Casey Bodley
3ecd38e414 recovery: operations take stateid_arg instead of stateid4
operations that require a stateid now take stateid_arg for recovery information.  these operations include close, setattr, lock/unlock, layoutget, and read/write (including pnfs)

nfs41_open_stateid_arg() locks nfs41_open_state and copies its stateid into a stateid_arg
nfs41_lock_stateid_arg() locks nfs41_open_state.last_lock and copies its stateid into a stateid_arg; if there is no lock state, it falls back to nfs41_open_stateid_arg()

pnfs_read/write() now take nfs41_open_state so they can generate stateid_args

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:21:28 -05:00
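The fallback in nfs41_lock_stateid_arg() (3ecd38e414) uses the last lock stateid when one exists and otherwise the open stateid. A minimal sketch of that selection with the locking omitted; the structs, the 'granted' flag, and the stateid_type tag are hypothetical simplifications, not the project's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* NFSv4.1 stateid4: a 32-bit seqid plus 12 opaque bytes */
typedef struct {
    uint32_t      seqid;
    unsigned char other[12];
} stateid4;

enum stateid_type { STATEID_OPEN, STATEID_LOCK };

/* hypothetical simplifications of the open and lock state */
struct lock_state_sketch {
    bool     granted;   /* false until a LOCK has succeeded */
    stateid4 stateid;
};

struct open_state_sketch {
    stateid4                 stateid;    /* open stateid */
    struct lock_state_sketch last_lock;
};

struct stateid_arg_sketch {
    stateid4          stateid;
    enum stateid_type type;   /* tells recovery which state to compare */
};

static void open_stateid_arg(const struct open_state_sketch *s,
                             struct stateid_arg_sketch *arg)
{
    arg->stateid = s->stateid;
    arg->type = STATEID_OPEN;
}

/* prefer the lock stateid; with no lock state, fall back to the open */
static void lock_stateid_arg(const struct open_state_sketch *s,
                             struct stateid_arg_sketch *arg)
{
    if (!s->last_lock.granted) {
        open_stateid_arg(s, arg);
        return;
    }
    arg->stateid = s->last_lock.stateid;
    arg->type = STATEID_LOCK;
}
```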
Casey Bodley
d59d17c3b4 recovery: reclaim opens on client renewal
after the client and session have been recovered, loop through the client's list of open state, calling nfs41_open_reclaim() and updating the stateid on success

nfs41_open_state saves the share_access and share_deny fields from the initial open, for use with nfs41_open_reclaim()

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
2010-12-06 14:21:22 -05:00