Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent path explosion in file traversal with complex symlinks #354

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rezmoss
Copy link
Contributor

@rezmoss rezmoss commented Feb 7, 2025

Problem

When processing Node.js packages with complex symlink structures (particularly .pnpm or turf packages), the _pathsToNode function would enter an exponential path growth pattern:

  • Normal packages process in 0-1ms
  • .pnpm/turf packages cause path counts to explode thousands
  • This leads to severe performance degradation/stuck forever

Root Cause

The issue stems from how parent paths are processed. The combination of:

  1. Getting all constituent paths with p.ConstituentPaths()
  2. Recursively processing each parent path
  3. Returning nil for visited nodes (causing reprocessing)

creates a compounding effect where each level of symlink traversal multiplies the paths to process

Solution

The fix modifies _pathsToNode to:

  1. Process only immediate parent path instead of all constituents
  2. Return accumulated paths for visited nodes instead of nil
  3. Maintain proper path building with basename

This prevents the path explosion while preserving the expected file traversal behavior

@kzantow
Copy link
Contributor

kzantow commented Feb 7, 2025

Thanks for the contribution @rezmoss! Is there a way to add a test illustrating the problem on a small scale and how this change fixes it?

@rezmoss
Copy link
Contributor Author

rezmoss commented Feb 7, 2025

@kzantow Sure! I'll work on that to reproduce the issue with a new Docker image and share it here soon

@rezmoss
Copy link
Contributor Author

rezmoss commented Feb 10, 2025

@kzantow Building a similar structure to a real docker image was really hard, but I finally built a dockerfile that reproduces almost the same behavior, my real docker image would get stuck for hours, but this one took 26 mins and 35 secs on my machine

original-code

After the patch, the same process took just 1 min and 32 secs

patch-v

here’s the dockerfile you can build and test

FROM node:16 AS builder

WORKDIR /app/lib

RUN npm install -g pnpm@8

RUN for dir in metric-analyzer event-processor cache-manager data-validator stream-utils audit-tools metrics-collector token-handler queue-manager workflow-engine service-registry task-scheduler load-balancer trace-analyzer storage-engine pipeline-runner; do \
    mkdir -p $dir && \
    cd $dir && \
    echo "{\"name\":\"$dir\",\"version\":\"1.0.0\",\"dependencies\":{\
        \"@turf/along\":\"6.5.0\",\
        \"@turf/turf\":\"6.5.0\",\
        \"@babel/runtime\":\"7.23.7\",\
        \"tslib\":\"2.5.3\",\
        \"@turf/boolean-point-in-polygon\":\"6.5.0\",\
        \"@turf/center-of-mass\":\"6.5.0\",\
        \"@turf/center-mean\":\"6.5.0\"\
    }}" > package.json && \
    cd .. ; \
done

RUN for dir in metric-analyzer event-processor cache-manager data-validator stream-utils audit-tools metrics-collector token-handler queue-manager workflow-engine service-registry task-scheduler load-balancer trace-analyzer storage-engine pipeline-runner; do \
    cd $dir && \
    pnpm install && \
    cd ..; \
done

# create deeply nested struc
RUN for dir in metric-analyzer event-processor cache-manager data-validator stream-utils audit-tools metrics-collector token-handler queue-manager workflow-engine service-registry task-scheduler load-balancer trace-analyzer storage-engine pipeline-runner; do \
    mkdir -p $dir/deep/nested/path/node_modules/.pnpm/@turf/along/dist/es && \
    echo '{"nested": true}' > $dir/deep/nested/path/node_modules/.pnpm/@turf/along/dist/es/package.json && \
    for pkg in along turf boolean-point-in-polygon center-of-mass center-mean; do \
        pkg_path="$dir/deep/nested/path/node_modules/.pnpm/@turf/$pkg/dist/es" && \
        mkdir -p $pkg_path && \
        echo "{\"name\":\"@turf/$pkg\",\"nested\":true}" > $pkg_path/package.json; \
    done; \
done

# create cross-links between pkgs and dirs
RUN for dir1 in metric-analyzer event-processor cache-manager data-validator stream-utils audit-tools metrics-collector token-handler queue-manager workflow-engine service-registry task-scheduler load-balancer trace-analyzer storage-engine pipeline-runner; do \
    for dir2 in metric-analyzer event-processor cache-manager data-validator stream-utils audit-tools metrics-collector token-handler queue-manager workflow-engine service-registry task-scheduler load-balancer trace-analyzer storage-engine pipeline-runner; do \
        if [ "$dir1" != "$dir2" ]; then \
            mkdir -p $dir1/node_modules/.pnpm/@turf/nested/$dir2 && \
            for pkg in along turf boolean-point-in-polygon center-of-mass center-mean; do \
                ln -s ../../../../../$dir2/node_modules/.pnpm/@turf+${pkg}@6.5.0 $dir1/node_modules/.pnpm/@turf/nested/$dir2/$pkg-link; \
            done; \
        fi; \
    done; \
done

# create deep nested pkgs cross lnks
RUN for dir in metric-analyzer event-processor cache-manager; do \
    cd $dir && \
    for i in $(seq 1 20); do \
        for pkg in along turf boolean-point-in-polygon center-of-mass center-mean; do \
            pkg_dir="deep/level$i/node_modules/.pnpm/@turf/$pkg" && \
            mkdir -p $pkg_dir/dist/es && \
            echo "{\"name\":\"@turf/$pkg\",\"level\":$i}" > $pkg_dir/dist/es/package.json && \
            if [ $i -gt 1 ]; then \
                prev=$((i-1)) && \
                ln -s ../../../../../level$prev/node_modules/.pnpm/@turf/$pkg $pkg_dir/prev-link && \
                for other_pkg in along turf boolean-point-in-polygon center-of-mass center-mean; do \
                    if [ "$pkg" != "$other_pkg" ]; then \
                        ln -s ../../$other_pkg $pkg_dir/cross-link-$other_pkg; \
                    fi; \
                done; \
            fi; \
        done; \
    done && \
    cd ..; \
done

FROM node:16-slim
COPY --from=builder /app /app
WORKDIR /app

CMD ["tail", "-f", "/dev/null"]

@rezmoss
Copy link
Contributor Author

rezmoss commented Feb 18, 2025

Hey @kzantow just following up on this, have you had a chance to look at the Dockerfile I shared? Let me know if you need any more details

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants