Allow keys to be unpersisted #1705

Closed
8 changes: 8 additions & 0 deletions lightning-background-processor/src/lib.rs
@@ -717,6 +717,14 @@ mod tests {

self.filesystem_persister.persist(key, object)
}

fn unpersist(&self, key: &str) -> std::io::Result<bool> {
if key == "manager" || key == "network_graph" || key == "scorer" {
return Err(std::io::Error::new(std::io::ErrorKind::Other, "Refusing to unpersist this key"));
}

self.filesystem_persister.unpersist(key)
}
}

fn get_full_filepath(filepath: String, filename: String) -> String {
33 changes: 19 additions & 14 deletions lightning-persister/src/lib.rs
@@ -28,8 +28,8 @@ use std::io::Cursor;
use std::ops::Deref;
use std::path::{Path, PathBuf};

/// FilesystemPersister persists channel data on disk, where each channel's
/// data is stored in a file named after its funding outpoint.
/// `FilesystemPersister` persists a given `Writeable` object on disk. In the case of channel data,
/// it is stored in a file named after its funding outpoint.
///
/// Warning: this module does the best it can with calls to persist data, but it
/// can only guarantee that the data is passed to the drive. It is up to the
@@ -39,23 +39,22 @@ use std::path::{Path, PathBuf};
/// persistent.
/// Corollary: especially when dealing with larger amounts of money, it is best
/// practice to have multiple channel data backups and not rely only on one
/// FilesystemPersister.
/// `FilesystemPersister`.
pub struct FilesystemPersister {
path_to_channel_data: String,
path_to_data: String,
}

impl FilesystemPersister {
/// Initialize a new FilesystemPersister and set the path to the individual channels'
/// files.
pub fn new(path_to_channel_data: String) -> Self {
/// Initialize a new FilesystemPersister and set the parent path of the individual files.
pub fn new(path_to_data: String) -> Self {
return Self {
path_to_channel_data,
path_to_data,
}
}

/// Get the directory which was provided when this persister was initialized.
pub fn get_data_dir(&self) -> String {
self.path_to_channel_data.clone()
self.path_to_data.clone()
}

/// Read `ChannelMonitor`s from disk.
@@ -64,7 +63,7 @@ impl FilesystemPersister {
) -> Result<Vec<(BlockHash, ChannelMonitor<Signer>)>, std::io::Error>
where K::Target: KeysInterface<Signer=Signer> + Sized,
{
let mut path = PathBuf::from(&self.path_to_channel_data);
let mut path = PathBuf::from(&self.path_to_data);
path.push("monitors");
if !Path::new(&path).exists() {
return Ok(Vec::new());
@@ -124,10 +123,16 @@ impl FilesystemPersister {

impl KVStorePersister for FilesystemPersister {
fn persist<W: Writeable>(&self, key: &str, object: &W) -> std::io::Result<()> {
let mut dest_file = PathBuf::from(self.path_to_channel_data.clone());
let mut dest_file = PathBuf::from(self.path_to_data.clone());
dest_file.push(key);
util::write_to_file(dest_file, object)
}

fn unpersist(&self, key: &str) -> std::io::Result<bool> {
let mut dest_file = PathBuf::from(self.path_to_data.clone());
dest_file.push(key);
util::delete_file(dest_file)
}
}

#[cfg(test)]
@@ -158,7 +163,7 @@ mod tests {
fn drop(&mut self) {
// We test for invalid directory names, so it's OK if directory removal
// fails.
match fs::remove_dir_all(&self.path_to_channel_data) {
match fs::remove_dir_all(&self.path_to_data) {
Err(e) => println!("Failed to remove test persister directory: {}", e),
_ => {}
}
@@ -242,7 +247,7 @@ mod tests {
#[test]
fn test_readonly_dir_perm_failure() {
let persister = FilesystemPersister::new("test_readonly_dir_perm_failure".to_string());
fs::create_dir_all(&persister.path_to_channel_data).unwrap();
fs::create_dir_all(&persister.path_to_data).unwrap();

// Set up a dummy channel and force close. This will produce a monitor
// that we can then use to test persistence.
@@ -260,7 +265,7 @@
// Set the persister's directory to read-only, which should result in
// returning a permanent failure when we then attempt to persist a
// channel update.
let path = &persister.path_to_channel_data;
let path = &persister.path_to_data;
let mut perms = fs::metadata(path).unwrap().permissions();
perms.set_readonly(true);
fs::set_permissions(path, perms).unwrap();
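For illustration, here is a minimal usage sketch of the new API (not part of the diff): it persists a small blob under a key via FilesystemPersister and then removes it again via unpersist. The TestWriteable wrapper, the key name, and the storage path are made up for this example, and it assumes the unpersist implementation added in this PR and that write_to_file creates the parent directory (as the current implementation does).

use lightning::util::persist::KVStorePersister;
use lightning::util::ser::{Writeable, Writer};
use lightning_persister::FilesystemPersister;

// Hypothetical wrapper type, only here to satisfy the `Writeable` bound.
struct TestWriteable(Vec<u8>);

impl Writeable for TestWriteable {
	fn write<W: Writer>(&self, writer: &mut W) -> Result<(), std::io::Error> {
		// Just write the raw bytes; real objects use their own serialization.
		writer.write_all(&self.0)
	}
}

fn main() -> std::io::Result<()> {
	// Illustrative directory name.
	let persister = FilesystemPersister::new("persist_demo".to_string());

	// Store some bytes under a key...
	persister.persist("some_key", &TestWriteable(vec![1, 2, 3]))?;

	// ...and remove them again. `unpersist` reports whether the key existed.
	assert!(persister.unpersist("some_key")?);
	assert!(!persister.unpersist("some_key")?);
	Ok(())
}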
20 changes: 20 additions & 0 deletions lightning-persister/src/util.rs
@@ -77,6 +77,26 @@ pub(crate) fn write_to_file<W: Writeable>(dest_file: PathBuf, data: &W) -> std::
Ok(())
}

pub(crate) fn delete_file(dest_file: PathBuf) -> std::io::Result<bool> {
if !dest_file.is_file() {
return Ok(false)
}

fs::remove_file(&dest_file)?;
let parent_directory = dest_file.parent().unwrap();
let dir_file = fs::OpenOptions::new().read(true).open(parent_directory)?;
#[cfg(not(target_os = "windows"))]
{
unsafe { libc::fsync(dir_file.as_raw_fd()); }
}

if dest_file.is_file() {
return Err(std::io::Error::new(std::io::ErrorKind::Other, "Unpersisting key failed"));
}

return Ok(true)
}

#[cfg(test)]
mod tests {
use lightning::util::ser::{Writer, Writeable};
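As an aside, the delete_file helper above fsyncs the parent directory after removing the file so that the unlink itself survives a crash (on non-Windows platforms). Below is a hedged, standalone sketch of that same pattern; the function name and error message are illustrative, and it assumes the libc crate is available as a dependency (as it already is in lightning-persister).

use std::fs;
use std::io;
use std::path::Path;

pub fn remove_file_durably(path: &Path) -> io::Result<()> {
	fs::remove_file(path)?;

	// Sync the parent directory so the removal of the directory entry is
	// persisted as well; Windows has no direct equivalent of this.
	#[cfg(not(target_os = "windows"))]
	{
		use std::os::unix::io::AsRawFd;
		let parent = path.parent()
			.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "no parent directory"))?;
		let dir_file = fs::OpenOptions::new().read(true).open(parent)?;
		unsafe { libc::fsync(dir_file.as_raw_fd()); }
	}
	Ok(())
}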
6 changes: 5 additions & 1 deletion lightning/src/util/persist.rs
@@ -21,8 +21,12 @@ use super::{logger::Logger, ser::Writeable};
/// and [`Persist`] traits. It uses "manager", "network_graph",
/// and "monitors/{funding_txo_id}_{funding_txo_index}" for keys.
pub trait KVStorePersister {
/// Persist the given writeable using the provided key
/// Persist the given writeable using the provided key.
fn persist<W: Writeable>(&self, key: &str, object: &W) -> io::Result<()>;

/// Unpersist (i.e., remove) the writeable previously persisted under the provided key.
/// Returns `true` if the key was present, and `false` otherwise.
fn unpersist(&self, key: &str) -> io::Result<bool>;
Contributor

Could you add motivation for adding this and where it will or can be used?

  • It's one thing if it's for space optimization (deleting items that are no longer being used).
  • And another if we are going to rely on this function for immediate deletion of items.
    Depending on the use case, I would ideally like it to be just the former, to support those systems which only support async deletes. (In that case we might not care much if the delete doesn't succeed immediately or isn't strongly consistent.)

@tnull (Contributor Author), Sep 9, 2022

Do you mean adding the motivation in comments, in the commit message, or here on GitHub?

I generally think that it would be nice to have list and unpersist functionalities if the persister is going to be used for a wider array of objects. Moreover, the sample node and LdkLite currently just bypass the FilesystemPersister for initialization (i.e., they read the directory contents via std::fs methods), which is not an option if we'd ever want to have another persister backend.

However, my concrete motivation was the initial draft of a PaymentInfoStore for LdkLite (lightningdevkit/ldk-node#13), which uses the persister and where I didn't want to store one huge blob of data under a single key. I therefore now store each payment's info separately, using the payment hash as a key. It would, however, be nice to be able to prune that store eventually, hence the need for an unpersist method.

Contributor

I meant in the PR description (for future reference as well).

Got it: from your description, the intention is space optimization (i.e., pruning).
It would be best if we added to the documentation of this API that it will be used for pruning, and hence that implementations can utilize async or eventually consistent deletes.
This also means:

  1. Logic correctness shouldn't depend on immediate deletion.
  2. Even if we fail to delete an item, the worst that will happen is some wasted storage.

Contributor Author

  1. Logic correctness shouldn't depend on immediate deletion.
  2. Even if we fail to delete an item, the worst that will happen is some wasted storage.

I tend to disagree. Yes, the current motivation for this is pruning some data for which a delay is fine, but that might not be the case for other future use cases. It would also be pretty annoying to just silently fail to unpersist for some reason. But, more importantly, I think it would be very confusing to have an interface in which different but very similarly named methods behave differently or provide very different consistency guarantees. I'd therefore argue that while data removal is almost always not as critical as persistence, we should try to align the guarantees of the two methods as far as possible?

Contributor

Another way is to have a flag which allows for both (if you think there will be a future requirement for it and feel strongly about it). IMO we should keep it for pruning purposes to start with, or by default.
Not all systems provide the same guarantees when deleting, and we should think beyond filesystem persistence.

And in the LN and LDK use case, persist and delete can live with different consistency requirements. In all cases we need persist to be strongly consistent, whereas that's not true for delete.

The concern about "methods behaving differently" is valid, but that doesn't mean we should make API or consistency decisions based on it.
In which case do we expect an immediate delete?
We already have a big overhaul ongoing for async persist, and if we can be conservative while adding delete functionality, then why not?

@TheBlueMatt (Collaborator), Sep 13, 2022

However, my concrete motivation was the initial draft of a PaymentInfoStore for LdkLite (lightningdevkit/ldk-node#13), which uses the persister and where I didn't want to store one huge blob of data under a single key. I therefore now store each payment's info separately, using the payment hash as a key. It would, however, be nice to be able to prune that store eventually, hence the need for an unpersist method.

I don't think a separate key per payment is the right approach here, either? That is a lot of keys, and with very small values. Generally many filesystems are really bad at folders with lots of keys, as many operations expect to load the full set of keys in a folder at a time, so I don't think we can do that in general.

I agree that the motivation here needs to be scoped a bit better. Concretely, we want to move towards removing ChannelMonitors in the future, but I'm not actually sure that we ever really want to fully remove a monitor; we just want to "archive" them. I worry that if we have a bug, we may need a monitor to recover and claim some funds in the future, so I would prefer we just archive and not delete.

@tnull (Contributor Author), Sep 14, 2022

I don't think a separate key per payment is the right approach here, either? That is a lot of keys, and with very small values. Generally many filesystems are really bad at folders with lots of keys, as many operations expect to load the full set of keys in a folder at a time, so I don't think we can do that in general.

Yes, but I think the current PaymentInfoStore draft should really just be a starting point, since we want a database backend for these kinds of data anyway. So regarding that concrete use case, I'd argue we still want to add unpersist to the trait, but implement a KVStorePersister with a different backend in LDKLite (probably using the sled database, at least initially, since it's already in use for BDK persistence).

Collaborator

Right, but then presumably the unpersist function doesn't belong in the upstream trait but rather in your local trait for database logic, as it's not reasonable to expect all implementors to be used for that functionality?

@tnull (Contributor Author), Sep 15, 2022

Hm, but I don't see why we then have an upstream trait at all, if it doesn't capture functionalities that ~all implementations would need? Even more so than for unpersist this is true for list and get functionalities, which literally every user/implementation will need. Take, for example, ldk-sample, which of course also needs these functionalities for initialization, but solves the problem by simply bypassing the trait and even FilesystemPersister, just going through std::fs and reading the data stored at the specific hardcoded locations. However, this of course doesn't work for all backend implementations, so while the trait is supposed to enable different backends, it currently only works if we assume the data is stored in files.

In particular, if we want to make the persister configurable for the user, e.g., in LDKLite, the KVStorePersister trait needs to offer a way to list and retrieve the data stored for a specific key. Of course this could be solved by yet another augmenting downstream trait specific to LDKLite, but that runs counter to the idea of having a general KVStorePersister trait?

Collaborator

Hm, but I don't see why we then have an upstream trait at all, if it doesn't capture functionalities that ~all implementations would need?

The upstream trait should capture the things that upstream needs. If users want to add methods, they should totally do that, but it's not of upstream's concern?

Even more so than for unpersist this is true for list and get functionalities, which literally every user/implementation will need.

Yes, I think we should add list and get to the upstream trait, and then use them to implement initialization methods upstream, as utilities for the user. Thus actually using those methods :)

}

/// Trait that handles persisting a [`ChannelManager`], [`NetworkGraph`], and [`WriteableScore`] to disk.
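Finally, to make the backend discussion above a bit more concrete, here is a rough sketch of what a non-filesystem KVStorePersister could look like, along the lines of the sled idea mentioned by tnull. This is not part of the PR: the SledPersister type, the error mapping, the extra sled dependency, and the use of Writeable::encode() for serialization are assumptions for illustration only, and it relies on the unpersist method proposed here.

use std::io;

use lightning::util::persist::KVStorePersister;
use lightning::util::ser::Writeable;

struct SledPersister {
	db: sled::Db,
}

impl SledPersister {
	fn new(path: &str) -> io::Result<Self> {
		let db = sled::open(path).map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
		Ok(Self { db })
	}
}

impl KVStorePersister for SledPersister {
	fn persist<W: Writeable>(&self, key: &str, object: &W) -> io::Result<()> {
		// Serialize the object and store the bytes under the given key.
		self.db.insert(key.as_bytes(), object.encode())
			.map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
		// Flush so the write is durable before reporting success.
		self.db.flush().map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
		Ok(())
	}

	fn unpersist(&self, key: &str) -> io::Result<bool> {
		// `remove` returns the previous value, so we can report whether the
		// key was actually present, matching the trait documentation.
		let removed = self.db.remove(key.as_bytes())
			.map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
		self.db.flush().map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
		Ok(removed.is_some())
	}
}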