Skip to content

Introducing AdminClient APIs

AdminClient is intended for administrative functionality that is useful and often needed, but should not be used at the application level. The key functionality of the APIs is to provide extraction/loading of entries (or of keys only) in batches for offline data management and manipulation and to provide a way to get/set current state and metadata on individual nodes for cluster management. The AdminClient was initially designed to facilitate the rebalancing (aka dynamic cluster membership) feature, which is still in development.Some of the uses of AdminClient include

  • Extraction of entries/keys for backups
  • Daily batch ETL (extraction, transformation, loading) to other systems for analysis (e.g. hadoop, search, etc…)
  • Bulk loading of entries
  • Migrating partitions
  • Getting/Updating the cluster state/metadata

AdminClient APIs

These are the supported AdminClient APIs and their brief descriptions; please refer to the javadocs for a full reference. The AdminClient can be constructed given a voldemort server bootstrap URL or given a cluster object with information about nodes in the cluster.

  • fetchEntries() : Provides a way to fetch all entries (key/value pairs) from a remote node belonging to any of the specified partitions.
    The call returns an iterator instantaneously which internally keep fetching new entries as new values are requested (aka streaming mode).
  • fetchKeys() : Same as fetchEntries() but only returns the keys. The server side can be more intelligent and do optimization at storage level to for better performance than fetchEntries()
  • updateEntries(): Provides a way to bulk load entries at a remote node, takes an iterator as parameter and start updating remote node with iterator values as they are streamed.
  • migratePartitions() : Provides a way to migrate partitions from a remote node to another node by a third party. This API is used by the rebalancing system to start rebalancing operations at different nodes. The operation is started as an asynchronous operation at the updating remote node. A fetchEntries() request is started on the remote node, updating the values on that node. The status the of operation can be checked using the getAsyncRequestStatus() API.
  • (Get/Set)RemoteMetadata(): These two APIs provide a way to get/set node metadata (cluster.xml, stores.xml, states) at individual remote nodes for cluster monitoring, status checks and upgrades.
  • getAsyncRequestStatus(): Gets the status for async operation at the (remote) node. This allows the progress of an async operation (e.g. migratePartitions) to be monitored.
  • waitForCompletion(): use exponential backoff to await the completion of an asynchronous request, simulating “blocking” behaviour.

Streaming support

Streaming is the efficient bulk transfer of entire partitions between machines. The intended uses are extraction of all keys (or all key/value pairs) for off-line manipulation (e.g. MapReduce processing) or backups, migration of partitions between machines and deletion of entries. Experimental support also exists for server-side filtering using a client-side supplied Java bytecode implementing the VoldemortFilter interface. The major application of the migration functionality is dynamic deletion and addition of nodes (“rebalancing”) which is presently in development.

For efficiency during streaming messages (containing key/value pairs) are sent one-after-anothere (over a buffered input stream), without waiting for acknowledgment (or forcing an early buffer flush); an “end-of-stream” message is sent signifying completion, after which is the buffer is flushed.

When the values are streamed from a store backed by the BerkeleyDB JE storage engine, the values are read from a cursor. At the present time, unfortunately, the cursors are only opened in key order. This presents an issue: when the entries aren’t present in BDB cache or the operating system page cache (this is guaranteed to occur when the dataset is larger than the machine’s physical memory), they have to be fetched from disk; if the cursor is opened in key order, unless the key order matches the disk order (i.e. ordered keys written sequentially) that means random seeks are made to access the values. Fortunately, BerkeleyDB, the operating system and the disk controller provide a way to schedule the seeks. In our experimental setup, on a machine with eight gigabytes of memory, a single 7200 RPM SATA disk and bdb.cache.size set to 3G were able to achieve the rate of ~1,200 entries a second: multiple times higher than would have been possible if random seeks were to be made for each entry. The distributed nature of Voldemort also means fetches of partitions can be performed from multiple nodes in parallel (e.g. fetch partition 1 from node a, partition 2 from b, etc…) distributing the random seeks across multiple nodes.

In addition, there is support for throttling the streaming operations. There are two settings stream.read.byte.per.sec and stream.write.byte.per.sec, responsible for throttling reads and writes respectively. The default settings for both are 10MB/second. This allows you to limit the amount of read and writes operations that are being done by the admin client API, as to prevent exhaustion of the seek capacity and starvation of the system of resources needed to handle routine requests.