[ripple/rippled] [questions] Understanding history sharding (#2625)

Great to hear and thanks for your answers! :-)

In that case, I wonder why the `earliest ledger sequence` design was chosen instead of simply skipping shards 1 and 2 by default. There is a nonzero chance that earlier ledgers will be recovered, and if the `earliest ledger sequence` then changes, every shard database in the network might need to be migrated or rebuilt. The alternative would only leave a few hundred of the earliest ledgers out of their shard, and it would make it much easier to calculate which shard a given ledger height falls into.
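To make the comparison concrete, here is a small Python sketch of two ways of numbering shards. The shard size (16384 ledgers) and the earliest ledger sequence (32570) are assumptions for illustration, and all function names are hypothetical, not rippled's API. With shards anchored at ledger 1, the earliest ledger sequence only decides which early shards are partial or empty; if boundaries were instead computed relative to the earliest ledger sequence, every boundary would move when that value changes.

```python
from typing import Optional, Tuple

LEDGERS_PER_SHARD = 16384    # assumed shard size, for illustration only
EARLIEST_LEDGER_SEQ = 32570  # assumed earliest available ledger, for illustration only


def shard_index(seq: int, per_shard: int = LEDGERS_PER_SHARD) -> int:
    """Shards anchored at ledger 1: the index never depends on the earliest ledger."""
    if seq < 1:
        raise ValueError("ledger sequences start at 1")
    return (seq - 1) // per_shard


def shard_bounds(index: int, per_shard: int = LEDGERS_PER_SHARD) -> Tuple[int, int]:
    """First and last ledger sequence covered by a shard anchored at ledger 1."""
    first = index * per_shard + 1
    return first, first + per_shard - 1


def stored_range(index: int, earliest_seq: int = EARLIEST_LEDGER_SEQ,
                 per_shard: int = LEDGERS_PER_SHARD) -> Optional[Tuple[int, int]]:
    """Ledgers actually available in a shard; early shards may be partial or empty."""
    first, last = shard_bounds(index, per_shard)
    first = max(first, earliest_seq)
    return (first, last) if first <= last else None


def shard_bounds_offset(index: int, earliest_seq: int,
                        per_shard: int = LEDGERS_PER_SHARD) -> Tuple[int, int]:
    """Alternative: shards anchored at earliest_seq, so every boundary shifts with it."""
    first = earliest_seq + index * per_shard
    return first, first + per_shard - 1
```

With these assumed numbers, `stored_range(0)` is `None` and `stored_range(1)` is `(32570, 32768)`, i.e. only 199 ledgers, which matches the "few hundred" ledgers mentioned above. Recovering older ledgers would only widen `stored_range(1)`, while `shard_bounds_offset` would return different bounds for every shard.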

> Since we only import into the shards database, the order does not matter.

Would a node that wants full history and already has all or most shards on its local disk then fill its node_db from the shard_db as fast as it can read and write the data, or would it still query the data from the network? I really hope it's the former.
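The behaviour hoped for here (read from local shards first, go to the network only for gaps) can be sketched as follows. The in-memory dicts and the `fetch_from_network` callback are hypothetical stand-ins, not rippled's node store or shard store interfaces; whether rippled actually behaves this way is the open question.

```python
from typing import Callable, Dict, Iterable, Optional


def backfill_node_store(
    node_store: Dict[bytes, bytes],
    shard_store: Dict[bytes, bytes],
    wanted_keys: Iterable[bytes],
    fetch_from_network: Callable[[bytes], Optional[bytes]],
) -> int:
    """Copy wanted objects into node_store, preferring the local shard store.

    Returns how many objects had to be requested from the network.
    """
    network_fetches = 0
    for key in wanted_keys:
        if key in node_store:
            continue  # already present, nothing to copy
        value = shard_store.get(key)  # fast path: local disk
        if value is None:
            # Only objects missing from the local shards use the slow network path.
            value = fetch_from_network(key)
            network_fetches += 1
        if value is not None:
            node_store[key] = value
    return network_fetches
```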

> I believe it is possible to create ‘deterministically generated’ shards as you described.

Write single-threaded to a NuDB database in a deterministic order (e.g. sort all nodes by key) and keep the database salt constant. I'm less sure about the spill records: since the data is written asynchronously, a disk that can't keep up might cause problems. That should be testable, though.
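The ordering-plus-constant-salt idea can be illustrated with a simplified model: serialize the nodes in sorted key order behind a fixed salt header and check that two nodes holding the same data produce byte-identical output. The record layout and `FIXED_SALT` are hypothetical; this does not model NuDB's real file format, bucket layout, or spill records, which is exactly the part that would need testing.

```python
import hashlib
from typing import Dict

FIXED_SALT = 0x5A17  # hypothetical constant salt shared by every node


def deterministic_shard_image(nodes: Dict[bytes, bytes], salt: int = FIXED_SALT) -> bytes:
    """Serialize key/value pairs in a canonical order into one byte string."""
    out = bytearray(salt.to_bytes(8, "big"))  # header: the constant salt
    for key in sorted(nodes):  # canonical order, independent of insertion order
        value = nodes[key]
        # Length-prefix each field so record boundaries are unambiguous.
        out += len(key).to_bytes(4, "big") + key
        out += len(value).to_bytes(4, "big") + value
    return bytes(out)


def shard_digest(nodes: Dict[bytes, bytes], salt: int = FIXED_SALT) -> str:
    """Hash of the canonical image; equal digests mean byte-identical shards."""
    return hashlib.sha256(deterministic_shard_image(nodes, salt)).hexdigest()
```

Two nodes that inserted the same objects in a different order get the same digest, so a downloaded shard could be checked against a published hash.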

A different option would be a dedicated import/export format for shards (it could be as simple as CSV, since these are only key/value pairs of hex strings) instead of sharing database files. This would also help alternative use cases: NuDB is not exactly widely used, and RocksDB database files are probably not easy to use in a shard import context either.
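A minimal sketch of such a format, assuming a two-column CSV with a `key,value` header and uppercase hex strings; the column names and function names are hypothetical, not an existing rippled format. Sorting by key on export also keeps the file itself deterministic.

```python
import csv
import io
from typing import Dict


def export_shard_csv(nodes: Dict[bytes, bytes]) -> str:
    """Write key/value pairs as hex strings, one row per node, sorted by key."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["key", "value"])
    for key in sorted(nodes):
        writer.writerow([key.hex().upper(), nodes[key].hex().upper()])
    return buf.getvalue()


def import_shard_csv(text: str) -> Dict[bytes, bytes]:
    """Parse the CSV back into raw key/value bytes, validating the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != ["key", "value"]:
        raise ValueError(f"unexpected header: {header}")
    return {bytes.fromhex(k): bytes.fromhex(v) for k, v in reader}
```

Because the format is plain text, any tool (or a non-NuDB backend) could consume it without linking against a particular database library.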
The downside is that the same data would be stored up to three times (node_db, shard_db, and the export/import file).
