Improve split and merge #1612

Merged 9 commits on May 11, 2024

@pseudotensor (Collaborator) commented May 11, 2024

  • Use semantic splitting for input chunking and for summary/extraction chunking when possible
  • Interpret chunk_size as a token count when possible
  • Use sentence-level chunking when possible
  • Fix loss of text when the original input is large (e.g. for summarization): after splitting, all but one chunk was being chopped off
  • Fix token counting comparison (>= vs. >)
  • Fix document display for large text, where the file type was shown as the full text
  • Improve the text document view so that text wraps at sentence boundaries
@pseudotensor marked this pull request as ready for review May 11, 2024 08:48
@pseudotensor merged commit ae586c6 into main May 11, 2024
2 checks passed
@pseudotensor deleted the fix_split_merge branch May 11, 2024 08:49