Add --default-extension argument to force file format #1842

Merged: mre merged 10 commits into master from issue-1665 on Sep 18, 2025

@mre (Member) commented Sep 5, 2025

Summary

Fixes #1665 by implementing a --default-extension option that allows users to specify a default file extension for files without extensions.

Problem

Users needed a way to manually specify the file format for files without clear extensions. This is common with:

  • README files without extensions
  • Configuration files
  • Scripts without standard extensions
  • Files with ambiguous extensions

Solution

Added --default-extension CLI option that:

  • Accepts any valid file extension (e.g., md, html, txt)
  • Uses the extension to determine the file type for processing
  • Gracefully falls back to plaintext for unknown extensions
  • Follows the same pattern as the existing --extensions option
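
A minimal sketch of the extension lookup and plaintext fallback described above. The enum variants, the `Option` return type, the case-insensitive matching, and the `default_file_type` helper are assumptions made for illustration and are not taken from the actual diff; the extension list mirrors the `--extensions` default shown in the help text.

```rust
/// File types used by this sketch (hypothetical; the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Markdown,
    Html,
    Plaintext,
}

impl FileType {
    /// Maps an extension (without the leading dot) to a file type.
    /// Returns `None` for extensions this sketch does not know.
    #[must_use]
    pub fn from_extension(extension: &str) -> Option<Self> {
        // Assumption: matching ignores ASCII/Unicode case.
        match extension.to_lowercase().as_str() {
            "md" | "mkd" | "mdx" | "mdown" | "mdwn" | "mkdn" | "mkdown" | "markdown" => {
                Some(Self::Markdown)
            }
            "html" | "htm" => Some(Self::Html),
            "txt" => Some(Self::Plaintext),
            _ => None,
        }
    }
}

/// Hypothetical helper: turns the optional `--default-extension` value into a
/// file type, falling back to plaintext when it is absent or unknown.
#[must_use]
pub fn default_file_type(default_extension: Option<&str>) -> FileType {
    default_extension
        .and_then(FileType::from_extension)
        .unwrap_or(FileType::Plaintext)
}
```

In this sketch, `default_file_type(Some("md"))` yields `FileType::Markdown`, while `default_file_type(Some("xyz"))` and `default_file_type(None)` both yield `FileType::Plaintext`.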

Examples

# Treat README as markdown
lychee --default-extension md README

# Treat index as HTML
lychee --default-extension html index  

# Treat CHANGELOG as plaintext
lychee --default-extension txt CHANGELOG

# Works with any supported extension
lychee --default-extension htm file_without_ext
lychee --default-extension markdown docs

Implementation Details

  • CLI Option: Added a --default-extension argument to the Config struct
  • File Type Processing: Made FileType::from_extension() public
  • Input Processing: Updated the inputs() method to convert the extension into a FileType hint
  • Error Handling: Unknown extensions fall back to the default plaintext behavior
  • Testing: Added unit and integration tests
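
The input-processing step can be pictured roughly as follows. This is a sketch under stated assumptions, not the code from this PR: `Kind`, `classify`, and `kind_for_path` are hypothetical names, the extension table is abbreviated, and the precedence rule (a file's own extension wins, the default applies only when the file has none) is my reading of the description above.

```rust
use std::path::Path;

/// Hypothetical stand-in for the file type hint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Markdown,
    Html,
    Plaintext,
}

/// Abbreviated extension table, for illustration only.
fn classify(extension: &str) -> Option<Kind> {
    match extension {
        "md" | "markdown" => Some(Kind::Markdown),
        "html" | "htm" => Some(Kind::Html),
        "txt" => Some(Kind::Plaintext),
        _ => None,
    }
}

/// Picks a file type for `path`.
/// Assumption: the file's own extension takes precedence; the default
/// extension is consulted only when the path has no extension at all.
/// Anything unrecognized ends up as plaintext.
fn kind_for_path(path: &Path, default_extension: Option<&str>) -> Kind {
    let own_extension = path.extension().and_then(|ext| ext.to_str());
    own_extension
        .or(default_extension)
        .and_then(classify)
        .unwrap_or(Kind::Plaintext)
}
```

Here `kind_for_path(Path::new("README"), Some("md"))` gives `Kind::Markdown`, whereas `index.html` stays `Kind::Html` whatever default is passed.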

Design Decision: Extensions vs Types

I initially considered abstract types (--default-type markdown), but switched to extensions (--default-extension md) because this approach is:

  • More intuitive: Users work with familiar file extensions
  • More flexible: Easy to support any extension lychee recognizes
  • Consistent: Matches the existing --extensions option pattern
  • Extensible: Could easily support multiple extensions in the future

Test Plan

  • Unit tests for FileType::from_extension() with valid/invalid extensions
  • Integration tests for CLI option with different file types
  • Error handling test for unknown extensions
  • Manual testing with various file scenarios
  • All existing tests still pass

Verification

Before:
Files without extensions were always treated as plaintext, potentially missing links in markdown/HTML content.

After:

$ lychee --default-extension md README
# Now extracts markdown links from README file

$ lychee --default-extension html index  
# Now extracts HTML links from index file

The feature integrates seamlessly with existing functionality while providing users the control they need over file type detection.

[default: md,mkd,mdx,mdown,mdwn,mkdn,mkdown,markdown,html,htm,txt]

--default-extension <EXTENSION>
Default file extension to treat files without extensions as having.
Member

I'm really confused by this sentence 😅

Suggested change:
- Default file extension to treat files without extensions as having.
+ This is the default file extension that is applied to files without an extension.

Member Author

Probably just me editing the wording over and over again until I broke the grammar.

@thomas-zahner (Member) commented Oct 2, 2025

@mre I noticed now that you merged your version. Did you forget about changing it? I still think this original version is quite confusing 😄

Member Author

Oh, that was by mistake. I thought I updated the text but apparently not.

mre and others added 8 commits September 18, 2025 15:32
Resolves #1665 by implementing a --default-extension option that allows users
to specify a default file extension for files without extensions.

Key features:
- New --default-extension CLI option (e.g., --default-extension md)
- Accepts any valid file extension: md, html, txt, etc.
- Extension is used to determine file type for processing files without extensions
- Invalid extensions gracefully fall back to default plaintext handling
- Consistent with existing --extensions option pattern

Implementation details:
- Added default_extension field to Config struct
- Enhanced FileType::from_extension() to be public with proper documentation
- Updated Input processing to use default file type hint when provided
- Added comprehensive unit and integration tests

Examples:
  lychee --default-extension md README        # Treat README as markdown
  lychee --default-extension html index       # Treat index as HTML
  lychee --default-extension txt CHANGELOG    # Treat CHANGELOG as plaintext

This approach is more user-friendly than abstract file types, as users work
directly with familiar file extensions while maintaining full compatibility
with the existing file type system.

- Made from_extension() method public with must_use attribute
- Added comprehensive unit tests for extension parsing
- Tests cover valid extensions, case sensitivity, and invalid inputs

The SSL error message varies between environments (local vs CI).
Reverting to original expected message for consistency with CI.

Fixes test_readme_usage_up_to_date by adding the new --default-extension
option to the usage section in README.md

Co-authored-by: Thomas Zahner <thomas.zahner@protonmail.ch>

Co-authored-by: Thomas Zahner <thomas.zahner@protonmail.ch>

- Change "invalid" to "unknown" in test comments and variables
- Rename parameter from "ext" to "extension" in from_extension function
- Keep from_extension method instead of implementing From<&str> trait for better API clarity

mre added 2 commits September 18, 2025 15:35

- Add missing default_extension field to Config initialization
- Update README documentation to match current help text format
- Add more descriptive comments explaining the test behavior
- Include actual link content in the test file to verify link extraction
- Assert that the link is actually found and extracted as plaintext
- This makes the test more robust and demonstrates the fallback behavior

@mre merged commit 1264614 into master on Sep 18, 2025
6 checks passed
@mre deleted the issue-1665 branch on September 18, 2025 14:05
This was referenced Sep 15, 2025
@thomas-zahner mentioned this pull request on Oct 3, 2025
This was referenced Oct 21, 2025
Development

Successfully merging this pull request may close these issues.

Add argument to force file format

2 participants