Design choices

EdgeFS is designed with blockchain-like principles in mind, i.e. an architecture with immutable, self-validating, location-independent metadata referencing self-validating, location-independent data chunks.

[[images/decentralized-metadata-consistency.png]]

The diagram above shows the “Quarantine Index” list, which is designed to hold deleted versions of objects (NFS/SMB files, iSCSI LUNs, S3 objects, S3X NoSQL databases, etc.).

“Active Versions” are used by the S3 Versioning and File/LUN SnapView (snapshot) interfaces, while deleted versions are still kept for a configurable period of time.
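To make the quarantine behavior concrete, here is a minimal sketch of a retention-bounded quarantine list. It is not the EdgeFS implementation: the `QuarantineIndex` class, its methods, and the idea of keying entries by a version id with a deletion timestamp are all hypothetical, chosen only to show that a deleted version stays restorable until its configurable retention window elapses and is then reclaimed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QuarantineIndex:
    """Hypothetical model: deleted version id -> deletion time (seconds)."""

    retention_seconds: float
    entries: Dict[str, float] = field(default_factory=dict)

    def quarantine(self, version_id: str, deleted_at: float) -> None:
        # A deleted version is not erased immediately; it is parked here.
        self.entries[version_id] = deleted_at

    def restore(self, version_id: str, now: float) -> bool:
        """Undelete a version only if it is still inside the retention window."""
        deleted_at = self.entries.get(version_id)
        if deleted_at is None or now - deleted_at >= self.retention_seconds:
            return False
        del self.entries[version_id]
        return True

    def purge_expired(self, now: float) -> List[str]:
        """Reclaim versions whose retention window has elapsed."""
        expired = [v for v, t in self.entries.items()
                   if now - t >= self.retention_seconds]
        for v in expired:
            del self.entries[v]
        return expired
```

Passing `now` explicitly keeps the sketch deterministic; a real system would read a clock.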

Globally unique, ordered EdgeFS object modifications are eventually merge-sorted and reconciled at all connected segments. This ensures decentralized data consistency and provides enhanced security through the application-exposed Trusted API.
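Because every modification is globally unique and ordered, reconciliation can be modeled as a k-way merge of already-sorted per-segment logs followed by de-duplication. The sketch below assumes, hypothetically, that each modification is a `(timestamp, version_id)` tuple; the actual EdgeFS record layout and ordering key are not described here.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record: (timestamp, globally unique version id).
# Tuple comparison gives a total order; timestamp ties break on the id.
Modification = Tuple[int, str]


def reconcile(*segment_logs: Iterable[Modification]) -> List[Modification]:
    """Merge sorted per-segment logs into one ordered, de-duplicated log."""
    merged: List[Modification] = []
    for mod in heapq.merge(*segment_logs):
        # The same modification may be reported by several segments;
        # identical records sort next to each other, so drop repeats.
        if not merged or merged[-1] != mod:
            merged.append(mod)
    return merged
```

Since the result depends only on the set of modifications, every segment that merges the same logs (in any order) converges on the same sequence.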

An EdgeFS object can be represented as an N-ary tree of hashes, where the top cryptographic hash represents a globally unique, immutable version. Any modification to an object creates a new top hash, i.e. a new version:
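A generic N-ary hash tree illustrates why any change yields a new top hash. This is a Merkle-style sketch, not the EdgeFS on-disk manifest format: the fan-out `FANOUT = 4` and the use of SHA3-256 from Python's `hashlib` are assumptions made for illustration.

```python
import hashlib
from typing import List

FANOUT = 4  # hypothetical N; the real EdgeFS fan-out is not specified here


def _h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def top_hash(chunks: List[bytes], fanout: int = FANOUT) -> str:
    """Hash every chunk, then hash groups of `fanout` child hashes level by
    level until a single top hash (the version identifier) remains."""
    if not chunks:
        raise ValueError("object has no chunks")
    level = [_h(c) for c in chunks]
    while len(level) > 1:
        level = [_h(b"".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
    return level[0].hex()
```

Changing a single chunk alters every hash on its path to the root, so the new top hash names a new version while the old top hash still identifies the unchanged previous version.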

[[images/tree-of-hashes.png]]

Once a data chunk is placed into an EdgeFS object, it is self-identified and self-validated by a strong cryptographic hash (a SHA-3 variant), and that hash can be used to recall the chunk for retrieval from any connected EdgeFS segment:
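The following sketch shows content-addressed recall with self-validation. Each segment is modeled, hypothetically, as a plain dictionary from hex hash to chunk bytes, and SHA3-256 stands in for the unspecified SHA-3 variant; `put_chunk` and `recall_chunk` are illustrative names, not EdgeFS APIs.

```python
import hashlib
from typing import Dict, List, Optional

# Hypothetical segment model: hex hash -> chunk bytes.
Segment = Dict[str, bytes]


def put_chunk(segment: Segment, chunk: bytes) -> str:
    """Store a chunk under its own SHA3-256 hash and return that hash."""
    key = hashlib.sha3_256(chunk).hexdigest()
    segment[key] = chunk
    return key


def recall_chunk(key: str, segments: List[Segment]) -> Optional[bytes]:
    """Fetch a chunk by hash from any segment, verifying it before use."""
    for segment in segments:
        chunk = segment.get(key)
        if chunk is None:
            continue
        # Self-validation: the content must hash back to the key it was
        # stored under; a corrupted copy is skipped and the next segment tried.
        if hashlib.sha3_256(chunk).hexdigest() == key:
            return chunk
    return None
```

Because the hash both names and verifies the chunk, the caller does not need to trust which segment served it.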

[[images/chunk-recall.png]]

Transaction Log Controls

Every write I/O in EdgeFS is recorded in the so-called Transaction Log. The transaction log keeps an ordered sequence of all local segment modifications and therefore serves as a recoverable log that the EdgeFS Trusted API uses to ensure global modification consistency.

If an application is expected to perform active, continuous write I/O, it is recommended to adjust the space reclaim interval by setting the trlogKeepDays and trlogProcessingInterval controls, as described in the Rook EdgeFS Cluster CRD documentation.

trlogKeepDays: Controls how many days the cluster keeps transaction log interval batches with object version references. If you plan to have the cluster disconnected from ISGW downlinks for a longer period of time, consider increasing this value. Default is 3. This is a cluster-wide setting and cannot easily be changed after the cluster is created. The value can be fractional, e.g. 1.1 or 0.1. It is not recommended to set a value lower than 0.1 (i.e. 2.4 hours).
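A small helper makes the fractional-day arithmetic explicit. The function names are hypothetical, and `covers_outage` encodes an assumed rule of thumb (the retention window should outlast the longest expected ISGW downlink disconnection) rather than a documented EdgeFS check.

```python
import warnings

DEFAULT_KEEP_DAYS = 3.0
MIN_RECOMMENDED_KEEP_DAYS = 0.1  # 0.1 day = 2.4 hours


def keep_window_hours(trlog_keep_days: float = DEFAULT_KEEP_DAYS) -> float:
    """Convert a (possibly fractional) trlogKeepDays value into hours."""
    if trlog_keep_days <= 0:
        raise ValueError("trlogKeepDays must be positive")
    if trlog_keep_days < MIN_RECOMMENDED_KEEP_DAYS:
        warnings.warn("trlogKeepDays below 0.1 (2.4 hours) is not recommended")
    return trlog_keep_days * 24


def covers_outage(trlog_keep_days: float, expected_outage_hours: float) -> bool:
    """Assumed rule of thumb: keep the log longer than the expected
    ISGW downlink outage so no batches expire while disconnected."""
    return keep_window_hours(trlog_keep_days) > expected_outage_hours
```

With the default of 3 days the window is 72 hours, so a planned 48-hour disconnection fits, whereas 1.1 days (about 26.4 hours) would not cover a 30-hour one.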

trlogProcessingInterval: Controls how many seconds the cluster aggregates object modifications before they are processed by the accounting, bucket update, ISGW Link, and notification components. It is defined in seconds and must be a divisor of 60, i.e. one of 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30. Default is 10. Recommended range is 2 to 20. This is a cluster-wide setting and cannot easily be changed after the cluster is created. Any newly added node must use exactly the same setting.
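The sketch below validates the interval against the documented list and groups modifications into interval-aligned batches. The `(unix_time, version_id)` event shape and the batch key (batch start time) are hypothetical. Requiring a divisor of 60 presumably keeps batch boundaries aligned with every minute on every node, which would also explain why all nodes must share the same value.

```python
from typing import Dict, List, Tuple

# Divisors of 60 below 60: matches the allowed values listed above.
VALID_INTERVALS = [d for d in range(1, 60) if 60 % d == 0]
RECOMMENDED_RANGE = (2, 20)


def validate_interval(seconds: int) -> None:
    if seconds not in VALID_INTERVALS:
        raise ValueError(
            f"trlogProcessingInterval must be one of {VALID_INTERVALS}, got {seconds}")


def batch_modifications(events: List[Tuple[float, str]],
                        interval: int = 10) -> Dict[int, List[str]]:
    """Group hypothetical (unix_time, version_id) events into batches keyed
    by the start second of the interval each event falls into."""
    validate_interval(interval)
    batches: Dict[int, List[str]] = {}
    for ts, version_id in sorted(events):
        start = int(ts // interval) * interval
        batches.setdefault(start, []).append(version_id)
    return batches
```

For example, with the default interval of 10, events at 3.5 s and 9.9 s land in the batch starting at 0, and an event at 10.0 s starts the next batch.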