bugfix: Reuse the same file pointer for repeated find output#511
bugfix: Reuse the same file pointer for repeated find output#511tonakai-s wants to merge 8 commits intouutils:mainfrom
find output#511Conversation
find output (#439)find output
|
Hi, I see some tests failed, could you please check them? Thanks. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #511 +/- ##
==========================================
+ Coverage 87.49% 87.50% +0.01%
==========================================
Files 31 31
Lines 6815 6821 +6
Branches 321 321
==========================================
+ Hits 5963 5969 +6
Misses 621 621
Partials 231 231 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I executed the test suite on my windows machine and it failed, and I don't know why certainly. |
src/find/matchers/mod.rs
Outdated
| // Used on file output arguments. | ||
| // When the same file is specified multiple times, the same pointer is used. | ||
| struct FileMemoizer { | ||
| mem: HashMap<String, Arc<File>>, |
There was a problem hiding this comment.
Using a string as the hash key won't work. It needs to handle different spellings of the same path, as well as hard links:
$ touch file
$ ln file link
$ find -fprint file -fprint ./file -fprint linkThere was a problem hiding this comment.
True, newbie mistake of my part, will fix it.
src/find/matchers/ls.rs
Outdated
| fn matches(&self, file_info: &WalkEntry, matcher_io: &mut MatcherIO) -> bool { | ||
| if let Some(file) = &self.output_file { | ||
| self.print(file_info, file, true); | ||
| self.print(file_info, file.clone(), true); |
There was a problem hiding this comment.
| self.print(file_info, file.clone(), true); | |
| self.print(file_info, file.as_ref(), true); |
Same everywhere else, there's usually no need to clone the Arc
src/find/matchers/mod.rs
Outdated
| // Based on path inode or file_index check if the file already has been specified. | ||
| // If yes, use the same file pointer. | ||
| struct FileMemoizer { | ||
| mem: HashMap<u64, Arc<File>>, |
There was a problem hiding this comment.
A single u64 is not enough to uniquely identify a file. On UNIX-like systems, you need the pair (st_dev, st_ino) (AKA device ID + inode number) at least for a unique ID. Not sure what's necessary on Windows. But using just the inode number is wrong since inode numbers are only unique within a single file system.
There was a problem hiding this comment.
Sure, I read about it sometime ago but forgot, thanks for the precise explanation.
I will use the std::fs::metadata to take the st_dev and st_ino from the same source, but using std::os::unix::fs::MetadataExt and std::os::windows::fs::MetadataExt.
Will do a manual test to check the assurance of this implementation, but I don't think that has a way to do an automated test of it, what do you think?
There was a problem hiding this comment.
Maybe use uucore's FileInformation? Check samefile.rs for an example. (Hopefully it impls Hash)
There was a problem hiding this comment.
Oh, that makes sense, FileInformation impls PartialEq and Hash, it's more simple than what I was trying to do. Will implement, test, and send a new commit (again).
|
it needs to be rebased, sorry! |
…ot overwrite the previous path.
…Arc<File> clones. test:Add new tests to symbolic and hardlinks.
…ready imported unix symlink module.
|
Did you run some benchmarks in this? Thanks |
Not yet, I can run if needed, do you guys have a pattern? |
Fixes #439
This PR creates a kind of memoization to the file creation inside the findutils/find command.
Uses a hashmap to know which file paths already has been created and returns the Arc to prevent overwrite.
It also add a new testing function
find_using_same_out_multiple_timesto these more "complex" cases.But, the expected outputs are fixed, if the assert would be better by catching the expected value dynamically in case of the test_data/simple structure change, let me know!