Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: support three star to match all files recursively #4422

Merged
merged 5 commits into from
Feb 29, 2024
Merged

Conversation

davies
Copy link
Contributor

@davies davies commented Feb 23, 2024

No description provided.

@davies davies requested a review from zhijian-pro February 23, 2024 07:36
Copy link

codecov bot commented Feb 23, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 56.09%. Comparing base (a6907b0) to head (c8c63d9).
Report is 7 commits behind head on main.

Files Patch % Lines
pkg/sync/sync.go 95.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4422      +/-   ##
==========================================
+ Coverage   56.01%   56.09%   +0.08%     
==========================================
  Files         154      154              
  Lines       40076    40170      +94     
==========================================
+ Hits        22447    22532      +85     
- Misses      15120    15123       +3     
- Partials     2509     2515       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zhijian-pro
Copy link
Contributor

zhijian-pro commented Feb 27, 2024

*/subdir/*** This test case did not pass

$ tree mydir
mydir
├── file5.txt
├── file6.doc
├── subdir1
│   ├── file1.txt
│   ├── file2.doc
│   └── subdir
│       ├── file3.txt
│       └── file4.doc
└── subdir2
    ├── file3.txt
    ├── file4.doc
    └── subdir
        ├── file1.txt
        ├── file2.doc
        └── subdir1-2
            ├── file3.txt
            └── file4.doc

6 directories, 12 files

$ juicefs_main sync  /tmp/t1/mydir/ /tmp/t1/mydir2/  --exclude '*/subdir/***' -u -f --dirs
2024/02/27 14:17:14.086040 juicefs[54721] <INFO>: Prometheus metrics listening on 127.0.0.1:9567 [mount.go:158]
2024/02/27 14:17:14.086555 juicefs[54721] <INFO>: Syncing from file:///tmp/t1/mydir/ to file:///tmp/t1/mydir2/ [sync.go:1136]
Scanned objects: 18/18 [==============================================================]  7030.0/s used: 2.565334ms
Skipped objects: 0                 0.0/s
Pending objects: 0                 0.0/s
 Copied objects: 18                7498.7/s
   Copied bytes: 0.0 b (0 Bytes)   0.0 b/s
 Failed objects: 0                 0.0/s
2024/02/27 14:17:14.088982 juicefs[54721] <INFO>: Found: 18, skipped: 0, copied: 18 (0 B), failed: 0 [sync.go:1175]

$ rsync -av  /tmp/t1/mydir/ /tmp/t1/mydir3/  --exclude '*/subdir/***'
building file list ... done
created directory /tmp/t1/mydir3
./
file5.txt
file6.doc
subdir1/
subdir1/file1.txt
subdir1/file2.doc
subdir2/
subdir2/file3.txt
subdir2/file4.doc

sent 470 bytes  received 170 bytes  1280.00 bytes/sec
total size is 0  speedup is 0.00

$ tree mydir2
mydir2
├── file5.txt
├── file6.doc
├── subdir1
│   ├── file1.txt
│   ├── file2.doc
│   └── subdir
│       ├── file3.txt
│       └── file4.doc
└── subdir2
    ├── file3.txt
    ├── file4.doc
    └── subdir
        ├── file1.txt
        ├── file2.doc
        └── subdir1-2
            ├── file3.txt
            └── file4.doc

6 directories, 12 files

$ tree mydir3
mydir3
├── file5.txt
├── file6.doc
├── subdir1
│   ├── file1.txt
│   └── file2.doc
└── subdir2
    ├── file3.txt
    └── file4.doc

3 directories, 6 files

@zhijian-pro
Copy link
Contributor

How about this ?

// Consistent with rsync behavior, the matching order is adjusted according to the order of the "include" and "exclude" options
func matchKey(rules []rule, key string) bool {
	parts := strings.Split(key, "/")
	for i := range parts {
		if parts[i] == "" {
			continue
		}
		prefix := strings.Join(parts[:i+1], "/")
		for _, rule := range rules {
			var s string
			if i < len(parts)-1 && (strings.HasSuffix(rule.Pattern, "/") || strings.HasSuffix(rule.Pattern, "/***")) {
				s = "/"
			}
			// if the pattern is ***, it will match all and no need to check suffix.
			if rule.Pattern == "***" {
				if rule.Include {
					return true
				} else {
					return false
				}
			}
			flag := strings.HasSuffix(rule.Pattern, "/***")
			rule.Pattern = strings.TrimSuffix(rule.Pattern, "***")
			suffix := suffixForPattern(prefix+s, rule.Pattern)
			ok, err := path.Match(rule.Pattern, suffix)
			if err != nil {
				logger.Fatalf("match %s with %s: %v", rule.Pattern, suffix, err)
			}
			if ok {
				if rule.Include {
					// if the pattern ends with /***, it will match all and no need to check suffix.
					if flag {
						return true
					}
					break // try next level
				} else {
					return false
				}
			}
		}
	}
	return true
}

@zhijian-pro
Copy link
Contributor

zhijian-pro commented Feb 29, 2024

Our list return is ordered, so perhaps rules ending in *** and ** can be optimized by recording the last hit path of the rule to reduce a large amount of recursion.

  1. Convert the rule to a pointer passed into the function so that it can set its lastMatch when a match is successful.
  2. When a rule ending with *** is matched, set rule.lastMatch to xxxxx/.
  3. When the rule matched by ** is matched, set rule.lastMatch to xxxxx.
  4. If the rule cache is not empty, match based on prefix.
// Consistent with rsync behavior, the matching order is adjusted according to the order of the "include" and "exclude" options
func matchKey(rules []rule, key string) bool {
	parts := strings.Split(key, "/")
	for i := range parts {
		if parts[i] == "" {
			continue
		}
		for _, rule := range rules {
#------------------------------------
			// use the last matched path to optimize the matching process
			if rule.lastMatch!=""{
				if strings.HasSuffix(rule.pattern,"**") && strings.HasPrefix(key, rule.lastMatch) {
					if rule.include {
						break // try next level
					} else {
						return false
					}
				}
			}
#-------------------------------------
			ps := parts[:i+1]
			p := strings.Split(rule.pattern, "/")
			if i < len(parts)-1 && (p[len(p)-1] == "" || p[len(p)-1] == "***") {
				ps = append(append([]string{}, ps...), "") // don't overwrite parts
			}
			var ok bool
			if p[0] == "" {
				if ps[0] != "" {
					p = p[1:]
				}
				ok = matchPrefix(p, ps)
			} else {
				ok = matchSuffix(p, ps)
			}
			if ok {
				if rule.include {
					break // try next level
				} else {
					return false
				}
			}
		}
	}
	return true
}

@davies
Copy link
Contributor Author

davies commented Feb 29, 2024

Our list return is ordered, so perhaps rules ending in *** and ** can be optimized by recording the last hit path of the rule to reduce a large amount of recursion.

  1. Convert the rule to a pointer passed into the function so that it can set its lastMatch when a match is successful.
  2. When a rule ending with *** is matched, set rule.lastMatch to xxxxx.
  3. When the rule matched by ** is matched, set rule.lastMatch to xxxxx.
  4. If the rule cache is not empty, match based on prefix.
// Consistent with rsync behavior, the matching order is adjusted according to the order of the "include" and "exclude" options
func matchKey(rules []rule, key string) bool {
	parts := strings.Split(key, "/")
	for i := range parts {
		if parts[i] == "" {
			continue
		}
		for _, rule := range rules {
#------------------------------------
			// use the last matched path to optimize the matching process
			if rule.lastMatch!=""{
				if strings.HasSuffix(rule.pattern,"**") && strings.HasPrefix(key, rule.lastMatch) {
					if rule.include {
						break // try next level
					} else {
						return false
					}
				}
			}
#-------------------------------------
			ps := parts[:i+1]
			p := strings.Split(rule.pattern, "/")
			if i < len(parts)-1 && (p[len(p)-1] == "" || p[len(p)-1] == "***") {
				ps = append(append([]string{}, ps...), "") // don't overwrite parts
			}
			var ok bool
			if p[0] == "" {
				if ps[0] != "" {
					p = p[1:]
				}
				ok = matchPrefix(p, ps)
			} else {
				ok = matchSuffix(p, ps)
			}
			if ok {
				if rule.include {
					break // try next level
				} else {
					return false
				}
			}
		}
	}
	return true
}

this may work, we can defer that later.

@davies davies merged commit 58722dd into main Feb 29, 2024
26 checks passed
@davies davies deleted the sync_three_star branch February 29, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants