@apuig (Contributor) commented Dec 16, 2025

Replace listFiles() with File.walk().take(1000) to prevent scanning excessive files in directories with tens of thousands of event files. Improves performance and reduces resource usage during flush cycles.

Calling listFiles() without a bound makes resource usage unbounded: it scales linearly with directory size. Limiting each call to 1,000 files gives:

  • Memory: at most ~150 KB per call.
  • I/O (critical for network file systems and cloud storage): at most 1,000 stat() operations per call, which keeps the network round-trip cost at ~10-20 ms and avoids the 500 ms-2 s delays seen when enumerating large directories on NFS/EFS/cloud storage.

Threshold rationale: with a 30-second flush interval and one file per flush, 1,000 files represent ~8 hours of backlog (1,000 × 30 s ≈ 8.3 h).
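
The backlog figure can be checked with a few lines of Kotlin. This is only a sketch of the arithmetic: `backlogHours` is a hypothetical helper, and it assumes each flush cycle leaves at most one event file behind, which the description implies but does not state.

```kotlin
// Hypothetical helper, not part of the SDK.
// Assumption: each flush cycle leaves at most one new event file behind.
fun backlogHours(fileLimit: Int, flushIntervalSeconds: Int): Double =
    fileLimit * flushIntervalSeconds / 3600.0
```

With `fileLimit = 1000` and `flushIntervalSeconds = 30` this gives 30,000 seconds, a little over 8 hours. A shorter flush interval shrinks the window proportionally, so the limit may need revisiting if the interval is configurable.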

@wenxi-zeng (Contributor)

@apuig pagination is a great idea. Now I understand why this was raised as a concern in #283. Yeah, in the server use case that 2s delay is pretty bad. Let me think about it some more and see if we have a better way to resolve this for servers (don't get me wrong, I like your idea, it's simple and neat). This solution could lead to another edge case: not all events would be flushed when the flush threshold is reached (time, count, or other criteria, depending on the flush policy).

In the meantime, I think the best workaround is to write a custom EventStream, something like:

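import java.io.File

// Returns at most 1,000 event file paths per read() instead of the whole directory.
class PaginatedFileEventStream(
    val directory: File
) : FileEventStream {
    override fun read(): List<String> = directory.walk()
        .maxDepth(1)            // stay in this directory; do not recurse
        .filter { it.isFile }   // also drops the directory itself, which walk() yields first
        .take(1000)             // page size
        .map { it.absolutePath }
        .toList()
}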

and then follow this as an example to write a custom StorageProvider that provides the PaginatedFileEventStream.
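
To illustrate the edge case mentioned above (one capped read may leave files behind), here is a minimal self-contained sketch that drains a backlog page by page. Everything in it is an assumption made for illustration: `PagedEventStream` is a local stand-in for the SDK's stream type, and `PaginatedDirectoryStream`, `drain`, `pageSize`, `maxPages`, and the `upload` callback are hypothetical names, not SDK API.

```kotlin
import java.io.File

// Hypothetical stand-in for the SDK's event stream type,
// declared here only so the sketch compiles on its own.
interface PagedEventStream {
    fun read(): List<String>
}

class PaginatedDirectoryStream(
    private val directory: File,
    private val pageSize: Int = 1000
) : PagedEventStream {
    // Returns at most pageSize regular-file paths from the directory (non-recursive).
    override fun read(): List<String> =
        directory.walk()
            .maxDepth(1)
            .filter { it.isFile }
            .take(pageSize)
            .map { it.absolutePath }
            .toList()
}

// Reads up to maxPages pages, deleting each file after a successful upload.
// `upload` is a hypothetical callback that returns true on success.
// Returns the number of files flushed.
fun drain(
    stream: PagedEventStream,
    maxPages: Int = 10,
    upload: (String) -> Boolean
): Int {
    var flushed = 0
    for (page in 0 until maxPages) {
        val paths = stream.read()
        if (paths.isEmpty()) break
        var progressed = false
        for (path in paths) {
            if (upload(path) && File(path).delete()) {
                flushed++
                progressed = true
            }
        }
        // Stop if nothing could be flushed, to avoid re-reading the same page forever.
        if (!progressed) break
    }
    return flushed
}
```

Capping `maxPages` keeps a single flush cycle bounded while still letting a large backlog shrink over successive cycles. Whether that trade-off fits depends on the flush policy in use.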
