The power of interning: making a time series database 2000x smaller in Rust

chaospatterns@lemmy.world · 3 days ago

wewbull@feddit.uk · 24 hours ago

It all depends on the data's entropy. Formats like JSON compress very well anyway, and if the data is also very repetitive, then 2000x is very possible.
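
As a rough illustration of the entropy point, here is a minimal sketch using Python's standard `json` and `zlib` modules. The records are made up (the field names and values are assumptions, not taken from the article), and the ratio it prints depends entirely on how repetitive the input is.

```python
import json
import zlib

# Hypothetical time series samples: the same keys and label values repeat in
# every record, and only the timestamp changes from one record to the next.
records = [
    {
        "host": "server-01",
        "metric": "cpu_usage",
        "unit": "percent",
        "timestamp": 1_700_000_000 + 60 * i,
        "value": 42.0,
    }
    for i in range(10_000)
]

# Serialize to JSON, then compress with DEFLATE at the highest level.
raw = json.dumps(records).encode("utf-8")
compressed = zlib.compress(raw, level=9)
ratio = len(raw) / len(compressed)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, ratio: {ratio:.0f}x")
```

Less repetitive data (for example, values that change on every sample) will compress noticeably worse, which is the sense in which it all comes down to entropy.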

FizzyOrange@programming.dev · 23 hours ago

In my experience, taking an inefficient format and copping out by saying “we can just compress it” is always rubbish. Compression tends to be slow, rules out sparse reads, is awkward to deal with remotely, and you generally end up with the inefficient decompressed data anyway, whether in temporarily decompressed files or in memory.

I worked at a company that went against my recommendation not to use JSON for a memory profiler's output. We ended up with 10 GB JSON files, and even compressed they were super annoying.

We switched to SQLite in the end, which was far superior.

wewbull@feddit.uk · 23 hours ago

Of course compressing isn’t a good solution for this stuff. The point of the comment was to say how unremarkable the original claim was.

FizzyOrange@programming.dev · 20 hours ago

Yeah I agree.

The power of interning: making a time series database 2000x smaller in Rust | Blog | Guillaume Endignoux