yusyus
2ef6e59d06
fix: stop blindly appending /index.html.md to non-.md URLs (#277)
The previous fix (a82cf69) only addressed anchor fragment stripping but
left the fundamental problem: _convert_to_md_urls() blindly appended
/index.html.md to ALL non-.md URLs from llms.txt. This only works for
Docusaurus sites; for sites like Discord docs it generates mass 404s.
Changes:
- _convert_to_md_urls() now strips anchors and deduplicates only,
  preserving original URLs as-is instead of appending /index.html.md
- New _has_md_extension() helper uses urlparse().path.endswith(".md")
  instead of error-prone ".md" in url substring matching
- Fixed ".md" in url checks at 4 locations (lines 465, 554, 716, 775)
- Removed 24 lines of dead commented-out code
- Added real-world e2e test against docs.discord.com (no mocks)
- Updated unit tests for new behavior (32 tests)
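
For illustration, the two behaviors described above could look roughly like the sketch below. This is not the code from the commit: the function names and bodies are hypothetical, and only the urlparse().path.endswith(".md") check is taken from the message itself.

```python
from urllib.parse import urldefrag, urlparse


def has_md_extension(url: str) -> bool:
    """Return True only when the URL path ends in .md.

    Checking the parsed path ignores the host, query string and fragment,
    which a plain `".md" in url` substring test would also match.
    """
    return urlparse(url).path.endswith(".md")


def strip_and_dedupe(urls: list[str]) -> list[str]:
    """Hypothetical stand-in for the new _convert_to_md_urls() behavior.

    Drops #fragments and duplicates while keeping first-seen order.
    URLs are otherwise left untouched (nothing is appended).
    """
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        base = urldefrag(url).url  # URL without its #fragment
        if base and base not in seen:
            seen.add(base)
            result.append(base)
    return result
```

A substring check would wrongly accept a URL such as https://my.mdsite.com/page, since ".md" appears in the host name; the path-based check rejects it.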
Fixes #277
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 23:44:35 +03:00