fix: improved links (URLs) extraction for parse_node, resolves #822 #828

Levyathanus · 2024-11-24T13:53:13Z

The _extract_urls method of parse_node was not extracting well-formed URLs, which caused problems when the results were passed to urljoin from urllib.parse (ref. issue #822). These changes parse URLs more precisely and cover three cases: "absolute" URLs (e.g. "www.website.com/...", "http://www.website.com/...", "https://www.website.com/...", "website.com/...", "http://website.com/...", "https://website.com/...", etc.), image URLs, and "relative" URLs (e.g. "/test/page.html", "/?test=test", etc.), which are joined with the source URL later.
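For illustration only, here is a minimal sketch of the kind of normalization described above. It is not the code from this PR: the function name normalize_url, the _BARE_HOST pattern, and the choice of "https://" as the default scheme are all assumptions made for the example. It shows one way a scheme-less link such as "www.website.com/..." can go wrong, since urljoin treats it as a relative path under the source URL.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (not from the PR): a host such as "website.com" or
# "www.website.com", optionally followed by a path, query, or fragment.
_BARE_HOST = re.compile(
    r"^(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:[/?#].*)?$",
    re.IGNORECASE,
)


def normalize_url(raw: str, source_url: str) -> str:
    """Return an absolute URL for a link found on the page at source_url.

    Illustrative helper only; the name and rules are assumptions.
    """
    url = raw.strip()
    # Already absolute with an explicit scheme: keep unchanged.
    if url.startswith(("http://", "https://")):
        return url
    # Root-relative, query-only, or fragment-only links: let urljoin
    # resolve them against the source URL.
    if url.startswith(("/", "?", "#")):
        return urljoin(source_url, url)
    # Scheme-less host ("website.com/..."): assume https. Note that a bare
    # file name such as "page.html" also matches this pattern, so a real
    # implementation would need a stricter check.
    if _BARE_HOST.match(url):
        return "https://" + url
    # Anything else is treated as a path relative to the source URL.
    return urljoin(source_url, url)


source = "https://www.website.com/blog/index.html"
# Without normalization, urljoin treats a scheme-less host as a relative path.
naive = urljoin(source, "www.website.com/page")
fixed = normalize_url("www.website.com/page", source)
relative = normalize_url("/test/page.html", source)
query = normalize_url("/?test=test", source)
```

In this sketch, naive ends up nested under the source path, while fixed, relative, and query resolve to absolute URLs on the expected host.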

…#822

VinciGit00

Hi, thank you for the contribution

github-actions · 2024-11-26T07:26:43Z

🎉 This PR is included in version 1.32.0-beta.3 🎉

The release is available on:

v1.32.0-beta.3
GitHub release

Your semantic-release bot 📦🚀

github-actions · 2024-12-05T21:53:06Z

🎉 This PR is included in version 1.33.0-beta.1 🎉

The release is available on:

v1.33.0-beta.1
GitHub release

Your semantic-release bot 📦🚀

github-actions · 2024-12-05T21:53:27Z

🎉 This PR is included in version 1.33.0 🎉

The release is available on:

v1.33.0
GitHub release

Your semantic-release bot 📦🚀

fix: improved links extraction for parse_node, resolves ScrapeGraphAI…

7da7bfe

…#822

VinciGit00 approved these changes Nov 25, 2024

VinciGit00 merged commit adddd64 into ScrapeGraphAI:pre/beta Nov 26, 2024
1 check passed

github-actions bot added the released on @dev label Nov 26, 2024

github-actions bot added the released on @stable label Dec 5, 2024
