Can source part use selenium driver.page_source? #938

Open
@JamesGGGG

Description

Describe the bug
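Passing an HTML string (from Selenium's driver.page_source) as source fails with an error about a missing "content" variable.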

from scrapegraphai.graphs import SmartScraperGraph

smart_scraper_graph = SmartScraperGraph(
    prompt="summarize information in this page.",
    # also accepts a string with the already downloaded HTML code
    source=driver.page_source,
    config=graph_config
)
result = smart_scraper_graph.run()

error: 'Input to PromptTemplate is missing variables {'"content"'}. Expected: ['"content"', 'question'] Received: ['question']\nNote: if you intended {"content"} to be part of the string and not a variable, please escape it with double curly braces like: '{{"content"}}'.\nFor troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/INVALID_PROMPT_INPUT '
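The message suggests that some template text contains a literal {"content"} that is being read as a placeholder. Below is a minimal sketch of that parsing behaviour using only Python's standard library. It assumes the prompt uses f-string style formatting, which is parsed the same way as str.format. The template strings and the helper name template_variables are made up for illustration and are not taken from ScrapeGraphAI. Where the unescaped braces actually originate (the library's prompt, or text that ends up inside it) is not established here.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return the placeholder names that str.format-style parsing finds in a template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for chunks that contain no placeholder.
    return [field for _, field, _, _ in Formatter().parse(template) if field is not None]


# Hypothetical templates: single braces around "content" create a placeholder,
# doubled braces are treated as literal text.
unescaped = 'Answer {question} using JSON like {"content"}'
escaped = 'Answer {question} using JSON like {{"content"}}'

print(template_variables(unescaped))  # ['question', '"content"']
print(template_variables(escaped))    # ['question']
```

With the unescaped form, a variable literally named "content" (quotes included) is expected, which matches the Expected: ['"content"', 'question'] part of the error.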

Alternative: saving the page source to a file and reading it back:
html_file_path = self.save_page_source(headless_driver, "information")
with open(html_file_path, 'r', encoding='utf-8') as file:
    html_content = file.read()
smart_scraper_graph = SmartScraperGraph(
    prompt="summarize information in this page.",
    # also accepts a string with the already downloaded HTML code
    source=html_content,
    config=graph_config
)

result = smart_scraper_graph.run()

The error remains the same.

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Chrome
  • Version

Additional context
I created a Selenium driver that opens a URL, logs in, and clicks a few buttons to reach a page. I would like ScrapeGraphAI to scrape the information on that page.
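For the workflow described here, one way to rule out an empty or mistyped source before it reaches the scraper is to read the HTML through a small guard. This is only a sketch: rendered_html is a hypothetical helper name, and the SimpleNamespace object stands in for a real Selenium driver so the snippet runs without a browser. The only Selenium detail it relies on is that a WebDriver exposes the current HTML as the page_source attribute.

```python
from types import SimpleNamespace


def rendered_html(driver) -> str:
    """Return the driver's current page HTML, failing early if it is unusable."""
    # getattr with a default avoids an AttributeError if the object has no page_source.
    html = getattr(driver, "page_source", None)
    if not isinstance(html, str) or not html.strip():
        raise ValueError("driver.page_source did not return a non-empty HTML string")
    return html


# Stand-in for a real Selenium driver (assumption for illustration only).
fake_driver = SimpleNamespace(page_source="<html><body><h1>Information</h1></body></html>")

html = rendered_html(fake_driver)
print(html.startswith("<html>"))  # True
```

The returned string would then be passed as source to SmartScraperGraph. This guard does not address the PromptTemplate error itself; it only confirms that the input is a real HTML string.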
