OmniScraperGraph Invalid IPv6 URL on certain links  #822

Closed
@Graphiee

Hi guys, first of all, great library (but I guess you're already aware of that). I've encountered a small issue in OmniScraperGraph:

Describe the bug
When given certain links, OmniScraperGraph raises an "Invalid IPv6 URL" error. The error is raised in parse.py, line 497.

Python 3.10.15
Scrapegraphai 1.31.1
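
For context, a minimal sketch of where this message can come from, assuming the parse.py in the traceback is the standard library's urllib/parse.py (the report does not confirm this). There, urlsplit raises ValueError("Invalid IPv6 URL") when the host part of a URL contains an unmatched "[" or "]". The helper name find_unparseable_links and the sample links are hypothetical and only illustrate the check; they are not part of scrapegraphai.

```python
from urllib.parse import urlsplit


def find_unparseable_links(links):
    """Return (link, error message) pairs for links that urlsplit rejects."""
    failures = []
    for link in links:
        try:
            urlsplit(link)
        except ValueError as exc:
            # urlsplit raises ValueError for hosts with unbalanced brackets
            failures.append((link, str(exc)))
    return failures


# Hypothetical sample links, not taken from the scraped page
sample_links = [
    "https://example.com/ok",
    "https://[example.com/broken",  # unmatched "[" in the host part
]

for link, message in find_unparseable_links(sample_links):
    print(f"{link}: {message}")
```

If the assumption holds, the failing URL may be one of the links found on the scraped page rather than the source URL itself, since the source URL in this report contains no brackets.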

Code to reproduce

from scrapegraphai.graphs import OmniScraperGraph
import os


graph_config = {
    "llm": {
        "api_key": os.getenv("OPENAI_API_KEY"),
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": True,
}

url = "https://justjoin.it/job-offer/panowie-programisci-timetable-optimization-specialist-warszawa-other"
prompt = "Get information about the job offer."
smart_scraper_graph = OmniScraperGraph(
    prompt=prompt,
    source=url,
    config=graph_config
)

result = smart_scraper_graph.run()

On the other hand, running it on https://nofluffjobs.com/pl/job/experienced-linux-engineer-comscore-via-cc-remote-wffuvhi5 works flawlessly. I assume this might be a domain-specific issue, but
