petitviolet blog

    Use RocksDB from Rust

    2021-03-25

    Rust, RocksDB

    RocksDB is an embeddable key-value store developed by Facebook. Many database products use RocksDB as their low-level storage layer, such as MySQL (via MyRocks), MongoDB (via MongoRocks), and TiDB (via TiKV). This post describes the basic usage of RocksDB from Rust.

    The OSS community develops rust-rocksdb, a Rust binding for RocksDB.

    Initialize RocksDB instance

    To initialize a RocksDB database instance, rocksdb provides the rocksdb::DB::open function. However, if ColumnFamilies need to be opened as well, the following procedure is necessary.

    initialize_rocksdb.rs
    let mut options = rocksdb::Options::default();
    options.set_error_if_exists(false);
    options.create_if_missing(true);
    options.create_missing_column_families(true);
    
    let path: &str = "./tmp";
    
    // List the ColumnFamilies that already exist at the given path.
    // list_cf returns Err when no DB exists there, so fall back to an empty list.
    let cfs = rocksdb::DB::list_cf(&options, path).unwrap_or_default();
    let my_column_family_missing = !cfs.iter().any(|cf| cf == "my_column_family");
    
    // Open the DB together with every existing ColumnFamily.
    // `mut` is needed in versions where create_cf takes `&mut self`.
    let mut instance = rocksdb::DB::open_cf(&options, path, cfs).unwrap();
    
    if my_column_family_missing {
        // Create the ColumnFamily only when it does not exist yet.
        let cf_options = rocksdb::Options::default();
        instance.create_cf("my_column_family", &cf_options).unwrap();
    }
    
    instance // rocksdb::DB instance is available
    

    First of all, rocksdb::Options holds the options used for opening a DB. Reading the RocksDB Tuning Guide is recommended to understand what each option is for before starting any performance optimization.

    As I commented in the snippet, opening an existing DB with ColumnFamilies requires knowing in advance which ColumnFamilies exist within the DB. For that purpose, rocksdb::DB::list_cf lists all ColumnFamilies of an existing DB, and it returns Err when there is no DB. After opening, use rocksdb::DB::create_cf to create a new ColumnFamily in the DB if it is missing.
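
    The "list, then decide what to open" step can be sketched without touching RocksDB at all. The helper below is hypothetical (column_families_to_open is not part of rust-rocksdb): it takes the result of a list_cf-style call, treats Err as "no ColumnFamilies yet", and appends any wanted names that are missing. Assuming create_missing_column_families(true) behaves as its name suggests, passing the merged list to open_cf may make the separate create_cf call unnecessary, but please verify that against your rust-rocksdb version.

```rust
/// Merge the ColumnFamily names found on disk with the names the application
/// wants, keeping the listed order and skipping duplicates.
///
/// `listed` stands in for the result of `rocksdb::DB::list_cf`; an `Err`
/// (no DB at the path yet) is treated as an empty list.
fn column_families_to_open<E>(
    listed: Result<Vec<String>, E>,
    wanted: &[&str],
) -> Vec<String> {
    let mut names = listed.unwrap_or_default();
    for &name in wanted {
        // Append only the names that were not already listed.
        if !names.iter().any(|existing| existing.as_str() == name) {
            names.push(name.to_string());
        }
    }
    names
}
```

    For a fresh path, calling it with an Err and &["my_column_family"] yields just that one name; for an existing DB the listed names are kept in order and nothing is duplicated.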

    Get/Set on ColumnFamily

    Interacting with the opened rocksdb::DB instance is more straightforward than opening it. Basically, we can use get and put, or get_cf and put_cf when using a ColumnFamily.

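    // cf_name is the name of an existing ColumnFamily, e.g. "my_column_family".
    let cf = instance.cf_handle(cf_name).unwrap();
    
    // Nothing has been stored under "key-1" yet.
    let res1 = instance.get_cf(cf, "key-1");
    assert!(res1.unwrap().is_none());
    
    instance.put_cf(cf, "key-1", "value-1").unwrap();
    
    // The stored value comes back as bytes (Vec<u8>).
    let res2 = instance.get_cf(cf, "key-1");
    assert!(res2.unwrap().unwrap() == "value-1".as_bytes());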
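
    The last assertion compares against as_bytes() because values come back as raw bytes. As a minimal sketch of what converting those bytes by hand can look like, the helpers below use only the standard library. The function names and the big-endian encoding are my own choices for illustration, not something rust-rocksdb prescribes.

```rust
/// Encode a number as 8 big-endian bytes, usable as a stored value.
fn encode_u64(value: u64) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}

/// Decode 8 big-endian bytes back into a u64; any other length is an error.
fn decode_u64(bytes: &[u8]) -> Result<u64, String> {
    if bytes.len() != 8 {
        return Err(format!("expected 8 bytes, got {}", bytes.len()));
    }
    let mut array = [0u8; 8];
    array.copy_from_slice(bytes);
    Ok(u64::from_be_bytes(array))
}

/// Decode UTF-8 bytes into a String, reporting invalid data as an error.
fn decode_string(bytes: Vec<u8>) -> Result<String, String> {
    String::from_utf8(bytes).map_err(|err| format!("invalid UTF-8: {}", err))
}
```

    The Vec<u8> returned by get_cf could be passed to decode_string directly, or as a slice to decode_u64. Both report malformed data as Err instead of panicking.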
    

    As RocksDB stores data as byte arrays, anyone who uses RocksDB has to know how to convert Vec<u8> into whatever type is wanted. When storing JSON strings, I'd recommend using serde to serialize and deserialize them. With serde, functions that get/put values from RocksDB while de-/serializing JSON strings can be written like the following code:

    use rocksdb::ColumnFamily;
    use serde::{de::DeserializeOwned, Serialize};
    
    fn get_serialized<T: DeserializeOwned>(
        instance: &rocksdb::DB,
        cf: &ColumnFamily,
        key: &str,
    ) -> Result<Option<T>, String> {
        match instance.get_cf(cf, key) {
            Ok(opt) => match opt {
                Some(found) => match String::from_utf8(found) {
                    Ok(s) => match serde_json::from_str::<T>(&s) {
                        Ok(t) => Ok(Some(t)),
                        Err(err) => Err(format!("Failed to deserialize: {:?}", err)),
                    },
                    Err(err) => Err(format!("Failed to convert to String: {:?}", err)),
                },
                None => Ok(None),
            },
            Err(err) => Err(format!("Failed to get from ColumnFamily: {:?}", err)),
        }
    }
    
    fn put_serialized<T: Serialize + std::fmt::Debug>(
        instance: &rocksdb::DB, // put_cf takes &self, so a shared reference is enough
        cf: &ColumnFamily,
        key: &str,
        value: &T,
    ) -> Result<(), String> {
        match serde_json::to_string(&value) {
            Ok(serialized) => instance
                .put_cf(cf, &key, serialized.into_bytes())
                .map_err(|err| format!("Failed to put to ColumnFamily:{:?}", err)),
            Err(err) => Err(format!(
                "Failed to serialize to String. T: {:?}, err: {:?}",
                value, err
            )),
        }
    }
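
    The nested matches in get_serialized can be flattened with the standard combinators Result::map_err, Result::and_then and Option::transpose. The sketch below shows only the shape of that chain and is not a drop-in replacement: get_parsed is a hypothetical name, its lookup argument stands in for the result of instance.get_cf(cf, key), and str::parse::<u32> stands in for serde_json::from_str so that the example needs nothing beyond the standard library.

```rust
/// Flattened version of the "look up, convert to String, deserialize" steps.
/// `lookup` plays the role of the result returned by a get_cf call.
fn get_parsed(lookup: Result<Option<Vec<u8>>, String>) -> Result<Option<u32>, String> {
    lookup
        // Lookup failure: return early with a descriptive message.
        .map_err(|err| format!("Failed to get from ColumnFamily: {:?}", err))?
        // Key found: convert the bytes, then parse; each step may fail.
        .map(|found| {
            String::from_utf8(found)
                .map_err(|err| format!("Failed to convert to String: {:?}", err))
                .and_then(|s| {
                    s.parse::<u32>()
                        .map_err(|err| format!("Failed to deserialize: {:?}", err))
                })
        })
        // Option<Result<T, E>> -> Result<Option<T>, E>; a missing key stays Ok(None).
        .transpose()
}
```

    The three outcomes match the original function: a missing key is Ok(None), a stored value that parses is Ok(Some(..)), and every failure becomes an Err carrying the same kind of message.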
    

    Since serde is not the main subject, please see the official documentation if need be. By the way, Rust's equivalent of flat_map on Result is Result::and_then, which could flatten the nested matches above. With a struct that derives Serialize and Deserialize, we can get/put values from RocksDB with de-/serialization like:

    use serde::{Deserialize, Serialize};
    
    #[derive(Serialize, Deserialize, Debug, PartialEq, Eq)]
    struct User {
        pub name: String,
        pub age: u32,
    }
    
    {
        let res3 = get_serialized::<User>(&instance, cf, "key-2");
        assert!(res3.unwrap().is_none());
    
        let user = User { name: "Alice".to_string(), age: 20 };
        let res4 = put_serialized(&instance, cf, "key-2", &user);
        assert!(res4.is_ok());
    
        let res5 = get_serialized::<User>(&instance, cf, "key-2");
        assert!(res5.unwrap().unwrap() == user);
    }
    

    Summary

    This post described the very basic usage of RocksDB from Rust along with serde. Of course, RocksDB offers lots of functionality beyond what this post touched on.