blog.petitviolet.net

Use RocksDB from Rust

2021-03-25

Rust, RocksDB

RocksDB is an embeddable key-value store developed by Facebook. Many database products use RocksDB as their low-level storage layer, such as MySQL, MongoDB, and TiDB. This post describes the basic usage of RocksDB from Rust.

The OSS community develops rust-rocksdb, which is a Rust binding for RocksDB.

Initialize RocksDB instance

To initialize a RocksDB database instance, the rocksdb crate provides the rocksdb::DB::open function. However, if the DB also needs to open ColumnFamilies, the following procedure is necessary.

initialize_rocksdb.rs
let mut options = rocksdb::Options::default();
options.set_error_if_exists(false);
options.create_if_missing(true);
options.create_missing_column_families(true);

let path: &str = "./tmp";

// List the existing ColumnFamilies at the given path. Returns Err when no DB exists.
let cfs = rocksdb::DB::list_cf(&options, path).unwrap_or_default();
let my_column_family_missing = !cfs.iter().any(|cf| cf == "my_column_family");

// Open the DB, specifying the existing ColumnFamilies.
// `mut` is needed because create_cf takes &mut self in the default single-threaded mode.
let mut instance = rocksdb::DB::open_cf(&options, path, cfs).unwrap();

if my_column_family_missing {
    // Create the new ColumnFamily.
    let options = rocksdb::Options::default();
    instance.create_cf("my_column_family", &options).unwrap();
}

instance // rocksdb::DB instance is available

First of all, rocksdb::Options holds the options used for opening a DB. Reading the RocksDB Tuning Guide is recommended to understand what each option does before starting performance optimization.

As commented in the snippet, opening an existing DB with ColumnFamilies requires knowing in advance which ColumnFamilies exist within the DB. For that purpose, rocksdb::DB::list_cf lists all ColumnFamilies of an existing DB, and it returns Err when there is no DB. Then, use rocksdb::DB::create_cf to create a new ColumnFamily in the DB.
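
The snippet above checks for a single ColumnFamily. As a minimal sketch of how that check could generalize to several names, the helper below compares the names returned by rocksdb::DB::list_cf with the names an application expects. The function name missing_column_families is hypothetical and not part of rust-rocksdb; it uses only the standard library.

```rust
/// Returns the wanted ColumnFamily names that are absent from `existing`.
/// `existing` stands in for the Vec<String> returned by rocksdb::DB::list_cf
/// (or an empty Vec when no DB exists yet). Hypothetical helper for illustration.
fn missing_column_families(existing: &[String], wanted: &[&str]) -> Vec<String> {
    wanted
        .iter()
        // Keep only the names that list_cf did not report.
        .filter(|name| !existing.iter().any(|cf| cf.as_str() == **name))
        .map(|name| (*name).to_owned())
        .collect()
}
```

Each returned name would then be passed to create_cf after the DB has been opened with the existing ColumnFamilies.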

Get/Set on ColumnFamily

Interacting with the opened rocksdb::DB instance is more straightforward than opening it. Basically, we can use get and put, or get_cf and put_cf when using a ColumnFamily.

let cf = instance.cf_handle("my_column_family").unwrap();

let res1 = instance.get_cf(cf, "key-1");
assert!(res1.unwrap().is_none());

instance.put_cf(cf, "key-1", "value-1").unwrap();

let res2 = instance.get_cf(cf, "key-1");
assert!(res2.unwrap().unwrap() == "value-1".as_bytes());
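
Values come back from get_cf as bytes, which is why the assertion above compares against "value-1".as_bytes(). For values that are not strings, the conversion has to be written by hand. The following is a small sketch, using only the standard library, of storing a u64 as big-endian bytes and reading it back. The names encode_u64 and decode_u64 are made up for this example.

```rust
use std::convert::TryInto;

/// Encodes a u64 as 8 big-endian bytes, suitable as a value for put_cf.
fn encode_u64(value: u64) -> [u8; 8] {
    value.to_be_bytes()
}

/// Decodes bytes (e.g. the Vec<u8> returned by get_cf) back into a u64.
/// Fails when the input is not exactly 8 bytes long.
fn decode_u64(bytes: &[u8]) -> Result<u64, String> {
    let array: [u8; 8] = bytes
        .try_into()
        .map_err(|_| format!("expected 8 bytes, got {}", bytes.len()))?;
    Ok(u64::from_be_bytes(array))
}
```

Big-endian is chosen here because the byte order then matches the numeric order, which can be convenient if such bytes are ever used as keys.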

As RocksDB stores data as byte arrays, users of RocksDB need to know how to convert a Vec<u8> into whatever type they want. When storing JSON strings, I’d recommend using serde to serialize and deserialize them. With serde, functions that get/put values from/to RocksDB while de-/serializing JSON can be written like the following code:

use rocksdb::ColumnFamily;
use serde::{de::DeserializeOwned, Serialize};

fn get_serialized<T: DeserializeOwned>(
    instance: &rocksdb::DB,
    cf: &ColumnFamily,
    key: &str,
) -> Result<Option<T>, String> {
    match instance.get_cf(cf, key) {
        Ok(opt) => match opt {
            Some(found) => match String::from_utf8(found) {
                Ok(s) => match serde_json::from_str::<T>(&s) {
                    Ok(t) => Ok(Some(t)),
                    Err(err) => Err(format!("Failed to deserialize: {:?}", err)),
                },
                Err(err) => Err(format!("Failed to convert to String: {:?}", err)),
            },
            None => Ok(None),
        },
        Err(err) => Err(format!("Failed to get from ColumnFamily: {:?}", err)),
    }
}

fn put_serialized<T: Serialize + std::fmt::Debug>(
    instance: &rocksdb::DB, // put_cf takes &self, so a shared reference is enough
    cf: &ColumnFamily,
    key: &str,
    value: &T,
) -> Result<(), String> {
    match serde_json::to_string(&value) {
        Ok(serialized) => instance
            .put_cf(cf, &key, serialized.into_bytes())
            .map_err(|err| format!("Failed to put to ColumnFamily:{:?}", err)),
        Err(err) => Err(format!(
            "Failed to serialize to String. T: {:?}, err: {:?}",
            value, err
        )),
    }
}
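
The nested match expressions in get_serialized can be flattened with Result::and_then and Option::transpose. The sketch below shows the same shape using only the standard library, under two assumptions: str::parse stands in for serde_json::from_str, and the Option<Vec<u8>> argument stands in for the successful result of get_cf. The name decode_number is hypothetical.

```rust
/// Converts an optional byte value into an optional number.
/// None (key not found) stays Ok(None); a conversion failure becomes Err.
fn decode_number(found: Option<Vec<u8>>) -> Result<Option<u32>, String> {
    found
        .map(|bytes| {
            String::from_utf8(bytes)
                .map_err(|err| format!("Failed to convert to String: {:?}", err))
                // and_then chains the next fallible step, like flat_map.
                .and_then(|s| {
                    s.parse::<u32>()
                        .map_err(|err| format!("Failed to deserialize: {:?}", err))
                })
        })
        // Option<Result<T, E>> -> Result<Option<T>, E>
        .transpose()
}
```

The same structure should carry over to get_serialized by swapping the parse step for serde_json::from_str.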

Since serde is not the main subject, please see the official documentation if need be. By the way, Rust's Result::and_then plays the role of flat_map, so the nested match expressions above could be flattened with it. Define a struct that derives Serialize and Deserialize, and then we can get/put from RocksDB with de-/serializing like:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq, Eq)]
struct User {
    pub name: String,
    pub age: u32,
}

{
    let res3 = get_serialized::<User>(&instance, cf, "key-2");
    assert!(res3.unwrap().is_none());

    let user = User { name: "Alice".to_string(), age: 20 };
    let res4 = put_serialized(&instance, cf, "key-2", &user);
    assert!(res4.is_ok());

    let res5 = get_serialized::<User>(&instance, cf, "key-2");
    assert!(res5.unwrap().unwrap() == user);
}

Summary

This post described the very basic usage of RocksDB from Rust along with serde. Of course, RocksDB offers lots of functionality beyond what this post touched on.