Skip to content

storage: fix stale chunks handling on limits#11632

Open
edsiper wants to merge 4 commits intomasterfrom
storage-fix-stale-chunk
Open

storage: fix stale chunks handling on limits#11632
edsiper wants to merge 4 commits intomasterfrom
storage-fix-stale-chunk

Conversation

@edsiper
Copy link
Copy Markdown
Member

@edsiper edsiper commented Mar 30, 2026


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • Improved space projection and release logic to better estimate append growth and avoid unnecessary removals; introduced a two-pass local deletion strategy to meet required space sooner.
    • Adjusted behavior to prefer removing actually-deletable chunk files before shared/unlinked ones under storage pressure.
  • Tests

    • Added an automated test that verifies storage-management prefers deletable chunks when multiple outputs share limits.

edsiper added 2 commits March 30, 2026 13:06
…ction

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
…egressions

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

Adds helpers and projected write-size estimation, implements a two-pass local space-release strategy for input chunks, updates release/placement APIs to use projected sizes, and adds a test verifying deletable files are preferred when storage limits are hit.

Changes

Cohort / File(s) Summary
Chunk space management
src/flb_input_chunk.c
Introduced local mirror of chunkio FS view and helpers: flb_input_chunk_has_other_routes, flb_input_chunk_prefers_physical_delete, flb_input_chunk_get_projected_write_size. Refactored flb_input_chunk_release_space to support two-pass local deletion and early-exit when requirement met. Changed flb_input_chunk_release_space_compound signature to add size_t required_space. Propagated projected placement_size into placement and release flows.
Tests — deletion preference
tests/internal/input_chunk.c
Added count_chunk_files() helper and new test flb_test_input_chunk_prefers_deletable_files_on_limit() (registered in TEST_LIST) that sets up storage+outputs, appends chunks, adjusts limits, and asserts that physically deletable chunks are preferred when limits are enforced.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Input as Input logic
participant Router as Router/Outputs
participant FS as Filesystem (storage)
Input->>Input: compute projected write size
Input->>Router: inspect routes_mask / other routes
Router-->>Input: route membership info
Input->>FS: request/release required_space
alt local scope, pass 0
Input->>Input: decide prefers physical delete?
opt skip physical deletion
Input-->>FS: skip physical delete (unlink deferred)
end
else pass 1 or global
Input-->>FS: delete/unlink candidate chunk(s)
end
FS-->>Input: released_space
Input->>FS: place new chunk using projected placement_size
FS-->>Input: confirm write/placement

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

backport to v4.2.x

Poem

🐰 I nibble bytes beneath the moon,

Two-pass I hop, decide too soon? No—soon.
I drop the chunks that can be shed,
Keep shared friends safe in their bed,
Storage tidy, I hum a tune.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'storage: fix stale chunks handling on limits' directly relates to the main changes in the PR, which focus on improving how chunks are handled when storage limits are enforced, including better space projection and deletion logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch storage-fix-stale-chunk

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/flb_input_chunk.c (1)

694-733: Document the magic number 24 for chunk header overhead.

The value 24 corresponds to the chunk file header structure (2 header bytes + 4 CRC32 + 16 padding + 2 metadata length bytes), as documented in flb_input_chunk_get_real_size (lines 522-526). Consider either referencing that constant or adding a similar comment here for maintainability.

    meta_size = (size_t) cio_meta_size(ic->chunk);
    /* See https://github.com/edsiper/chunkio#file-layout for header overhead:
     * 2 (header) + 4 (CRC32) + 16 (padding) + 2 (metadata length) = 24 bytes */
    logical_size = (size_t) flb_input_chunk_get_size(ic) + meta_size + 24;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_input_chunk.c` around lines 694 - 733, The code in
flb_input_chunk_get_projected_write_size uses the magic number 24 for chunk
header overhead without explanation; update the function to document or
reference that constant by adding a brief comment explaining the 24 bytes
breakdown (2 header bytes + 4 CRC32 + 16 padding + 2 metadata length bytes) or
use the same named constant used by flb_input_chunk_get_real_size if one exists;
reference flb_input_chunk_get_projected_write_size and
flb_input_chunk_get_real_size so reviewers can locate and align the
header-overhead handling and ensure maintainability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/flb_input_chunk.c`:
- Around line 694-733: The code in flb_input_chunk_get_projected_write_size uses
the magic number 24 for chunk header overhead without explanation; update the
function to document or reference that constant by adding a brief comment
explaining the 24 bytes breakdown (2 header bytes + 4 CRC32 + 16 padding + 2
metadata length bytes) or use the same named constant used by
flb_input_chunk_get_real_size if one exists; reference
flb_input_chunk_get_projected_write_size and flb_input_chunk_get_real_size so
reviewers can locate and align the header-overhead handling and ensure
maintainability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c3e81c91-1a73-4a0f-8e0b-9685008fa15a

📥 Commits

Reviewing files that changed from the base of the PR and between 8647f02 and 0723c30.

📒 Files selected for processing (2)
  • src/flb_input_chunk.c
  • tests/internal/input_chunk.c

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0723c308b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/flb_input_chunk.c Outdated
…iction

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/flb_input_chunk.c (1)

75-91: Consider directly including <chunkio/cio_file.h> instead of mirroring the struct.

The struct flb_input_chunk_cio_file_view mirrors fields from cio_file, which is defined in the public header lib/chunkio/include/chunkio/cio_file.h. This header is not private—it's part of chunkio's public API and is directly included in chunkio's own tests and source files (e.g., lib/chunkio/tests/fs.c, lib/chunkio/tests/fs_windows.c). Fluent-bit could simplify this by including <chunkio/cio_file.h> and using struct cio_file directly, as chunkio itself does, rather than maintaining a duplicate struct definition. If there is a specific reason to avoid this include (e.g., build isolation), document it in the comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_input_chunk.c` around lines 75 - 91, Replace the mirrored struct
definition flb_input_chunk_cio_file_view with a direct use of the public struct
cio_file by including the public header <chunkio/cio_file.h> and update any
casts/usages that referenced flb_input_chunk_cio_file_view to use struct
cio_file instead; if including the header is intentionally avoided for
build/isolation reasons, add a clear comment above the mirrored struct
explaining why and referencing that decision so the duplicate layout is
documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/flb_input_chunk.c`:
- Around line 756-764: The loop that increments projected_size by realloc_size
can overflow for very large append_size; replace the iterative loop in
flb_input_chunk.c with a safe calculation: first check for overflow when
computing target = logical_size + append_size, then compute needed = (target >
projected_size) ? target - projected_size : 0; compute increments = (needed +
realloc_size - 1) / realloc_size and add projected_size += increments *
realloc_size while guarding the multiplication/addition for overflow; finally
round projected_size up to page_size with an overflow check — reference
variables projected_size, alloc_size, realloc_size, logical_size, append_size,
and page_size to locate and update the logic.

---

Nitpick comments:
In `@src/flb_input_chunk.c`:
- Around line 75-91: Replace the mirrored struct definition
flb_input_chunk_cio_file_view with a direct use of the public struct cio_file by
including the public header <chunkio/cio_file.h> and update any casts/usages
that referenced flb_input_chunk_cio_file_view to use struct cio_file instead; if
including the header is intentionally avoided for build/isolation reasons, add a
clear comment above the mirrored struct explaining why and referencing that
decision so the duplicate layout is documented.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 208da22d-b623-48b0-a6b5-d661bb3f37b5

📥 Commits

Reviewing files that changed from the base of the PR and between 0723c30 and ccd4de1.

📒 Files selected for processing (1)
  • src/flb_input_chunk.c

Comment thread src/flb_input_chunk.c Outdated
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/flb_input_chunk.c (1)

75-94: Fragile struct mirroring pattern - ensure synchronization with chunkio.

The mirror struct must exactly match the leading layout of struct cio_file in lib/chunkio/include/chunkio/cio_file.h. Currently, the field order appears correct based on the chunkio header. However, this pattern is inherently fragile:

  1. Any reordering or insertion of fields in the original struct will silently corrupt reads
  2. Platform-specific padding differences could cause misalignment

Consider adding a static assertion or compile-time check to validate the struct size/offsets match expectations, or document a version dependency on chunkio.

#!/bin/bash
# Verify the mirror struct matches the original cio_file struct layout
# by checking the field order in the chunkio header

echo "=== Checking cio_file struct definition in chunkio ==="
fd -t f 'cio_file.h' --exec cat {} \; 2>/dev/null | head -60

echo ""
echo "=== Checking if there are any other references to cio_file fields ==="
rg -n 'struct cio_file' --type c -A 20 | head -80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_input_chunk.c` around lines 75 - 94, The mirrored struct
flb_input_chunk_cio_file_view must be validated against chunkio's real struct
cio_file to avoid silent corruption; add compile-time assertions that sizeof(fl
b_input_chunk_cio_file_view) equals sizeof(struct cio_file) and that offsetof
for key fields (fd, flags, synced, allocate_strategy, fs_size, data_size,
page_size, alloc_size, realloc_size) match the corresponding offsetof(struct
cio_file, ...); implement these checks using static_assert/_Static_assert or
equivalent macros (and guard them with `#ifdef` that includes chunkio/cio_file.h
when available), or document a strict chunkio version requirement if the header
cannot be included. Ensure the checks are named/placed with references to
flb_input_chunk_cio_file_view and cio_file so failures point to the mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/flb_input_chunk.c`:
- Around line 798-805: flb_input_chunk_get_real_size can return -1 on error but
its result is cast to size_t into current_size causing unsigned wrap; in the
block that sets current_size when ic->fs_counted == FLB_TRUE, call
flb_input_chunk_get_real_size(ic) into a signed variable (e.g. ssize_t real =
flb_input_chunk_get_real_size(ic)), check for real == -1 and handle the error
path (either treat current_size as 0, log/propagate the error, or return an
error code) before casting to size_t; update the calculation using that
validated value so projected_size - current_size cannot underflow unexpectedly.
- Around line 389-400: flb_input_chunk_get_real_size() can return -1 on error;
before using its result in the FLB_INPUT_CHUNK_RELEASE_SCOPE_LOCAL branch,
validate chunk_size != -1 and handle the error path: when chunk_size == -1, log
or handle the failure and skip subtracting from output_plugin->fs_chunks_size
and skip adding to released_space (or treat chunk_size as 0) to avoid
underflow/incorrect accounting. Update the block that references chunk_size,
old_input_chunk, output_plugin->fs_chunks_size, and released_space to perform
this check and safe handling.

---

Nitpick comments:
In `@src/flb_input_chunk.c`:
- Around line 75-94: The mirrored struct flb_input_chunk_cio_file_view must be
validated against chunkio's real struct cio_file to avoid silent corruption; add
compile-time assertions that sizeof(fl b_input_chunk_cio_file_view) equals
sizeof(struct cio_file) and that offsetof for key fields (fd, flags, synced,
allocate_strategy, fs_size, data_size, page_size, alloc_size, realloc_size)
match the corresponding offsetof(struct cio_file, ...); implement these checks
using static_assert/_Static_assert or equivalent macros (and guard them with
`#ifdef` that includes chunkio/cio_file.h when available), or document a strict
chunkio version requirement if the header cannot be included. Ensure the checks
are named/placed with references to flb_input_chunk_cio_file_view and cio_file
so failures point to the mismatch.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a9a3464-e543-4c7c-9406-8605d69a6f75

📥 Commits

Reviewing files that changed from the base of the PR and between ccd4de1 and 732996f.

📒 Files selected for processing (1)
  • src/flb_input_chunk.c

Comment thread src/flb_input_chunk.c
Comment on lines +389 to 400
chunk_size = flb_input_chunk_get_real_size(old_input_chunk);
chunk_released = FLB_FALSE;
chunk_destroy_flag = FLB_FALSE;

chunk_destroy_flag = flb_routes_mask_is_empty(
old_input_chunk->routes_mask,
input_plugin->config->router);
if (release_scope == FLB_INPUT_CHUNK_RELEASE_SCOPE_LOCAL) {
flb_routes_mask_clear_bit(old_input_chunk->routes_mask,
output_plugin->id,
input_plugin->config->router);

chunk_released = FLB_TRUE;
}
else if (release_scope == FLB_INPUT_CHUNK_RELEASE_SCOPE_GLOBAL) {
chunk_destroy_flag = FLB_TRUE;
}
FS_CHUNK_SIZE_DEBUG_MOD(output_plugin, old_input_chunk, chunk_size);
output_plugin->fs_chunks_size -= chunk_size;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing validation for negative chunk_size return value.

flb_input_chunk_get_real_size() can return -1 on error (as seen in its implementation at lines 522-550). If it returns -1, the subsequent subtraction at line 399 would cause an underflow of fs_chunks_size, and the addition at line 480 would incorrectly adjust released_space.

🛡️ Proposed fix to validate chunk_size
             chunk_size = flb_input_chunk_get_real_size(old_input_chunk);
+            if (chunk_size < 0) {
+                continue;
+            }
             chunk_released = FLB_FALSE;
             chunk_destroy_flag = FLB_FALSE;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_input_chunk.c` around lines 389 - 400,
flb_input_chunk_get_real_size() can return -1 on error; before using its result
in the FLB_INPUT_CHUNK_RELEASE_SCOPE_LOCAL branch, validate chunk_size != -1 and
handle the error path: when chunk_size == -1, log or handle the failure and skip
subtracting from output_plugin->fs_chunks_size and skip adding to released_space
(or treat chunk_size as 0) to avoid underflow/incorrect accounting. Update the
block that references chunk_size, old_input_chunk,
output_plugin->fs_chunks_size, and released_space to perform this check and safe
handling.

Comment thread src/flb_input_chunk.c
Comment on lines +798 to +805
current_size = 0;
if (ic->fs_counted == FLB_TRUE) {
current_size = (size_t) flb_input_chunk_get_real_size(ic);
}

if (projected_size > current_size) {
return projected_size - current_size;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential issue with negative return from flb_input_chunk_get_real_size.

At line 800, flb_input_chunk_get_real_size(ic) can return -1 on error. Casting this to size_t would produce a very large value (due to unsigned conversion), causing the subtraction at line 804 to potentially return 0 incorrectly or cause unexpected behavior.

🛡️ Proposed fix to handle negative return
     current_size = 0;
     if (ic->fs_counted == FLB_TRUE) {
-        current_size = (size_t) flb_input_chunk_get_real_size(ic);
+        ssize_t real_size = flb_input_chunk_get_real_size(ic);
+        if (real_size > 0) {
+            current_size = (size_t) real_size;
+        }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_input_chunk.c` around lines 798 - 805, flb_input_chunk_get_real_size
can return -1 on error but its result is cast to size_t into current_size
causing unsigned wrap; in the block that sets current_size when ic->fs_counted
== FLB_TRUE, call flb_input_chunk_get_real_size(ic) into a signed variable (e.g.
ssize_t real = flb_input_chunk_get_real_size(ic)), check for real == -1 and
handle the error path (either treat current_size as 0, log/propagate the error,
or return an error code) before casting to size_t; update the calculation using
that validated value so projected_size - current_size cannot underflow
unexpectedly.

@lecaros
Copy link
Copy Markdown
Contributor

lecaros commented Apr 17, 2026

The issue is still present, the storage keeps growing and other error message is in the picture now:

[2026/04/17 10:10:34.204] [error] [storage] cannot mmap/read chunk './tmp/fluentbit_storage/dummy.0/15031-1776435034.203997000.flb'

This is reproducible with the same config previously shared.
@edsiper

@lecaros
Copy link
Copy Markdown
Contributor

lecaros commented Apr 17, 2026

@edsiper

When a chunk is successfully delivered, the routes_mask bit is not being removed, hence flb_input_chunk_has_other_routes will return false positives, thinking it still has to deliver data to an output it already delivered it.

I've added some debug options on top of this branch. If you run the shared config, you'll see this message among others:

[2026/04/17 16:52:20.890] [debug] [input chunk] chunk '77716-1776459139.415212000.flb': output 'file.1' (id=1) still holds a routes_mask bit — physical delete blocked for output 'non_working_endpoint' (note: this bit is set at chunk creation and cleared only when all outputs finish or on eviction, NOT on successful delivery)

In this case, the data was already delivered to file.1. When non_working_endpoint tries to delete the chunk to make some space, it's blocked because of the routes_mask bit.

Patch with annotations
diff --git a/include/fluent-bit/flb_task.h b/include/fluent-bit/flb_task.h
index de7720d19..5ce332eff 100644
--- a/include/fluent-bit/flb_task.h
+++ b/include/fluent-bit/flb_task.h
@@ -169,6 +169,13 @@ static inline void flb_task_users_release(struct flb_task *task)
     if (task->users == 0 && mk_list_size(&task->retries) == 0) {
         flb_task_destroy(task, FLB_TRUE);
     }
+    else {
+        flb_debug("[task] task id=%i NOT destroyed: users=%i, pending "
+                  "retries=%i — chunk file will stay on disk until both "
+                  "reach zero",
+                  task->id, task->users,
+                  (int) mk_list_size(&task->retries));
+    }
 }
 
 /* Increase the counter for users */
diff --git a/src/flb_engine.c b/src/flb_engine.c
index c2786a1a2..15562b925 100644
--- a/src/flb_engine.c
+++ b/src/flb_engine.c
@@ -439,6 +439,16 @@ static inline int handle_output_event(uint64_t ts,
 
         flb_task_retry_clean(task, ins);
         flb_task_users_dec(task, FLB_TRUE);
+
+        flb_debug("[engine] output '%s' OK for chunk '%s' (task_id=%i): "
+                  "task now has %i user(s) and %i retry/retries pending — "
+                  "routes_mask bits for ALL outputs remain set until task "
+                  "is fully destroyed (users==0 && retries==0)",
+                  out_name,
+                  flb_input_chunk_get_name(task->ic),
+                  task_id,
+                  task->users,
+                  mk_list_size(&task->retries));
     }
     else if (ret == FLB_RETRY) {
         if (ins->retry_limit == FLB_OUT_RETRY_NONE) {
diff --git a/src/flb_input_chunk.c b/src/flb_input_chunk.c
index a704c29bb..1945cbc51 100644
--- a/src/flb_input_chunk.c
+++ b/src/flb_input_chunk.c
@@ -354,7 +354,18 @@ static int flb_input_chunk_release_space(
         pass_limit = 2;
     }
 
+    flb_debug("[input chunk] release_space: output='%s' scope=%s "
+              "required=%zd bytes",
+              flb_output_name(output_plugin),
+              (release_scope == FLB_INPUT_CHUNK_RELEASE_SCOPE_LOCAL)
+                  ? "LOCAL" : "GLOBAL",
+              *required_space);
+
     for (pass = 0; pass < pass_limit; pass++) {
+        flb_debug("[input chunk] release_space: starting pass %d/%d for "
+                  "output='%s'", pass, pass_limit - 1,
+                  flb_output_name(output_plugin));
+
         mk_list_foreach_safe(input_chunk_iterator, input_chunk_iterator_tmp,
                              &input_plugin->chunks) {
             old_input_chunk = mk_list_entry(input_chunk_iterator,
@@ -376,6 +387,11 @@ static int flb_input_chunk_release_space(
                 flb_input_chunk_prefers_physical_delete(old_input_chunk,
                                                         output_plugin,
                                                         release_scope) == FLB_FALSE) {
+                flb_debug("[input chunk] release_space pass 0: skipping "
+                          "chunk '%s' for output='%s' — has other routes or "
+                          "task still active; will retry on pass 1",
+                          flb_input_chunk_get_name(old_input_chunk),
+                          flb_output_name(output_plugin));
                 continue;
             }
 
@@ -402,6 +418,14 @@ static int flb_input_chunk_release_space(
                                                     old_input_chunk->routes_mask,
                                                     input_plugin->config->router);
 
+                flb_debug("[input chunk] release_space LOCAL: cleared "
+                          "routes_mask bit for output='%s' on chunk '%s'; "
+                          "routes_mask now %s — physical file will %s",
+                          flb_output_name(output_plugin),
+                          flb_input_chunk_get_name(old_input_chunk),
+                          chunk_destroy_flag ? "empty" : "NOT empty (other outputs still hold bits)",
+                          chunk_destroy_flag ? "be deleted" : "remain on disk");
+
                 chunk_released = FLB_TRUE;
             }
             else if (release_scope == FLB_INPUT_CHUNK_RELEASE_SCOPE_GLOBAL) {
@@ -465,6 +489,15 @@ static int flb_input_chunk_release_space(
 
                         chunk_released = FLB_TRUE;
                     }
+                    else {
+                        flb_debug("[input chunk] chunk '%s' routes_mask is "
+                                  "empty but task id=%i still has %i user(s) "
+                                  "— cannot destroy task/chunk yet (a "
+                                  "coroutine is in-flight for this chunk)",
+                                  flb_input_chunk_get_name(old_input_chunk),
+                                  old_input_chunk->task->id,
+                                  old_input_chunk->task->users);
+                    }
                 }
                 else {
                     flb_debug("[input chunk] drop chunk %s with no output route from input plugin %s",
@@ -636,6 +669,9 @@ static int flb_input_chunk_is_task_safe_delete(struct flb_task *task)
     }
 
     if (task->users != 0) {
+        flb_debug("[input chunk] task id=%i cannot be safely deleted: "
+                  "%i user(s) still active (coroutines in-flight or retries "
+                  "re-acquiring the task)", task->id, task->users);
         return FLB_FALSE;
     }
 
@@ -681,6 +717,14 @@ static int flb_input_chunk_has_other_routes(struct flb_input_chunk *ic,
         if (flb_routes_mask_get_bit(ic->routes_mask,
                                     candidate->id,
                                     ic->in->config->router) != 0) {
+            flb_debug("[input chunk] chunk '%s': output '%s' (id=%i) still "
+                      "holds a routes_mask bit — physical delete blocked for "
+                      "output '%s' (note: this bit is set at chunk creation "
+                      "and cleared only when all outputs finish or on eviction, "
+                      "NOT on successful delivery)",
+                      flb_input_chunk_get_name(ic),
+                      flb_output_name(candidate), candidate->id,
+                      flb_output_name(o_ins));
             return FLB_TRUE;
         }
     }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants