fix: normalize attachment filename extension for detection #20

Closed
heiko wants to merge 77 commits from fix/14-normalize-attachment-name into master
Owner

Closes #14

Normalizes filename to lowercase at the start of IsXRechnung so .XML and .PDF extensions are recognized regardless of case.

Closes #14 Normalizes filename to lowercase at the start of `IsXRechnung` so `.XML` and `.PDF` extensions are recognized regardless of case.
Author
Owner

Review findings (ai:codex)\n\nNo findings.\n\nResidual test risk: the PR test covers uppercase .XML filename handling, but not the other normalized paths: content types with case/parameters such as Application/XML; charset=utf-8, and uppercase .PDF dispatch. The implementation looks straightforward, but those cases are currently covered by inspection rather than tests.\n\nVerification: go test ./internal/xrechnung passed on the PR branch.

Review findings (ai:codex)\n\nNo findings.\n\nResidual test risk: the PR test covers uppercase .XML filename handling, but not the other normalized paths: content types with case/parameters such as Application/XML; charset=utf-8, and uppercase .PDF dispatch. The implementation looks straightforward, but those cases are currently covered by inspection rather than tests.\n\nVerification: go test ./internal/xrechnung passed on the PR branch.
heiko left a comment

Review: fix/14-normalize-attachment-name

Correct and well-scoped. Lowercasing is confined to IsXRechnung and does not leak into Detect() (original filename still passed downstream).

Observations

  1. contentType not normalized: Content-types like Application/PDF (valid per RFC 2045) would not match the lowercase comparisons. Consider contentType = strings.ToLower(contentType) as a follow-up.

  2. No downstream leakage: Original filename is passed to Detect(), so flavor/profile results remain correct. Scanner logs use att.Filename (the original).

  3. Adjacent gap in pdf.go: ExtractEmbeddedXML filters embedded filenames through a case-sensitive map. If a PDF embeds Factur-X.XML, it would be missed. Worth a follow-up issue.

  4. Test coverage: Regression test covers uppercase .XML. Consider adding a case for uppercase .PDF to exercise both HasSuffix paths.

No correctness issues found.

— reviewed by ai:claude

## Review: fix/14-normalize-attachment-name **Correct and well-scoped.** Lowercasing is confined to `IsXRechnung` and does not leak into `Detect()` (original `filename` still passed downstream). ### Observations 1. **contentType not normalized:** Content-types like `Application/PDF` (valid per RFC 2045) would not match the lowercase comparisons. Consider `contentType = strings.ToLower(contentType)` as a follow-up. 2. **No downstream leakage:** Original `filename` is passed to `Detect()`, so flavor/profile results remain correct. Scanner logs use `att.Filename` (the original). 3. **Adjacent gap in `pdf.go`:** `ExtractEmbeddedXML` filters embedded filenames through a case-sensitive map. If a PDF embeds `Factur-X.XML`, it would be missed. Worth a follow-up issue. 4. **Test coverage:** Regression test covers uppercase `.XML`. Consider adding a case for uppercase `.PDF` to exercise both `HasSuffix` paths. No correctness issues found. — reviewed by ai:claude
heiko closed this pull request 2026-05-11 22:25:44 +02:00

Pull request closed

Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
IUS/xr-invoiced!20
No description provided.