Commit 043ae2d

Merge pull request #846 from SwapnilSonker/add/selenium-support
Add/selenium support
2 parents f97c45c + cbc75ad

23 files changed (+191 -63 lines)

CHANGELOG.md

Lines changed: 0 additions & 14 deletions
@@ -1,17 +1,3 @@
-## [1.34.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.33.2...v1.34.0-beta.1) (2024-12-08)
-
-
-### Features
-
-* add new model token ([2a032d6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/2a032d6d7cf18c435fba59764e7cb28707737f0c))
-* added scrolling method to chromium docloader ([1c8b910](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/1c8b910562112947a357277bca9dc81619b72e61))
-
-
-### CI
-
-* **release:** 1.33.0-beta.1 [skip ci] ([60e2fdf](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/60e2fdff78e405e127ba8b10daa454d634bccf46)), closes [#822](https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/822) [#822](https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/822)
-* **release:** 1.33.0-beta.2 [skip ci] ([09995cd](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/09995cd56c96cfa709a68bea73113ab5debfcb97))
-
 ## [1.33.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.33.1...v1.33.2) (2024-12-06)
README.md

Lines changed: 19 additions & 7 deletions
@@ -87,8 +87,8 @@ graph_config = {
 
 # Create the SmartScraperGraph instance
 smart_scraper_graph = SmartScraperGraph(
-    prompt="Find some information about what does the company do, the name and a contact email.",
-    source="https://scrapegraphai.com/",
+    prompt="Extract me all the news from the website",
+    source="https://www.wired.com",
     config=graph_config
 )
 
@@ -100,10 +100,20 @@ print(json.dumps(result, indent=4))
 The output will be a dictionary like the following:
 
 ```python
-{
-    "company": "ScrapeGraphAI",
-    "name": "ScrapeGraphAI Extracting content from websites and local documents using LLM",
-    "contact_email": "[email protected]"
+"result": {
+    "news": [
+        {
+            "title": "The New Jersey Drone Mystery May Not Actually Be That Mysterious",
+            "link": "https://www.wired.com/story/new-jersey-drone-mystery-maybe-not-drones/",
+            "author": "Lily Hay Newman"
+        },
+        {
+            "title": "Former ByteDance Intern Accused of Sabotage Among Winners of Prestigious AI Award",
+            "link": "https://www.wired.com/story/bytedance-intern-best-paper-neurips/",
+            "author": "Louise Matsakis"
+        },
+        ...
+    ]
 }
 ```
 There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
 
@@ -126,7 +136,7 @@ Remember to have [Ollama](https://ollama.com/) installed and download the models
 ## 🔍 Demo
 Official streamlit demo:
 
-[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-ai-web-dashboard.streamlit.app)
+[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-demo-demo.streamlit.app)
 
 Try it directly on the web using Google Colab:
 
@@ -203,3 +213,5 @@ ScrapeGraphAI is licensed under the MIT License. See the [LICENSE](https://githu
 
 - We would like to thank all the contributors to the project and the open-source community for their support.
 - ScrapeGraphAI is meant to be used for data exploration and research purposes only. We are not responsible for any misuse of the library.
+
+Made with ❤️ by [ScrapeGraph AI](https://scrapegraphai.com)
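
Assembled into a standalone script, the updated README example corresponds roughly to the sketch below. The `graph_config` shown here is an assumption borrowed from the config dict used in the new example file in this commit; it is not part of the README hunk itself, which defines `graph_config` earlier in the file.

```python
import json
import os

from scrapegraphai.graphs import SmartScraperGraph

# Assumed config, mirroring examples/extras/chromium_selenium.py below.
graph_config = {
    "llm": {
        "api_key": os.getenv("OPENAI_API_KEY"),
        "model": "openai/gpt-4o",
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract me all the news from the website",
    source="https://www.wired.com",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
```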

examples/extras/chromium_selenium.py

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
import asyncio
import os
import json
from dotenv import load_dotenv
from scrapegraphai.docloaders.chromium import ChromiumLoader  # Import your ChromiumLoader class
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info
from aiohttp import ClientError

# Load environment variables for API keys
load_dotenv()

# ************************************************
# Define function to analyze content with ScrapegraphAI
# ************************************************
async def analyze_content_with_scrapegraph(content: str):
    """
    Analyze scraped content using ScrapegraphAI.

    Args:
        content (str): The scraped HTML or text content.

    Returns:
        dict: The result from ScrapegraphAI analysis.
    """
    try:
        # Initialize ScrapegraphAI SmartScraperGraph
        smart_scraper = SmartScraperGraph(
            prompt="Summarize the main content of this webpage and extract any contact information.",
            source=content,  # Pass the content directly
            config={
                "llm": {
                    "api_key": os.getenv("OPENAI_API_KEY"),
                    "model": "openai/gpt-4o",
                },
                "verbose": True
            }
        )
        result = smart_scraper.run()
        return result
    except Exception as e:
        print(f"❌ ScrapegraphAI analysis failed: {e}")
        return {"error": str(e)}

# ************************************************
# Test scraper and ScrapegraphAI pipeline
# ************************************************
async def test_scraper_with_analysis(scraper: ChromiumLoader, urls: list):
    """
    Test scraper for the given backend and URLs, then analyze content with ScrapegraphAI.

    Args:
        scraper (ChromiumLoader): The ChromiumLoader instance.
        urls (list): A list of URLs to scrape.
    """
    for url in urls:
        try:
            print(f"\n🔎 Scraping: {url} using {scraper.backend}...")
            result = await scraper.scrape(url)

            if "Error" in result or not result.strip():
                print(f"❌ Failed to scrape {url}: {result}")
            else:
                print(f"✅ Successfully scraped {url}. Content (first 200 chars): {result[:200]}")

                # Pass scraped content to ScrapegraphAI for analysis
                print("🤖 Analyzing content with ScrapegraphAI...")
                analysis_result = await analyze_content_with_scrapegraph(result)
                print("📝 Analysis Result:")
                print(json.dumps(analysis_result, indent=4))

        except ClientError as ce:
            print(f"❌ Network error while scraping {url}: {ce}")
        except Exception as e:
            print(f"❌ Unexpected error while scraping {url}: {e}")

# ************************************************
# Main Execution
# ************************************************
async def main():
    urls_to_scrape = [
        "https://example.com",
        "https://www.python.org",
        "https://invalid-url.test"
    ]

    # Test with Playwright backend
    print("\n--- Testing Playwright Backend ---")
    try:
        scraper_playwright = ChromiumLoader(urls=urls_to_scrape, backend="playwright", headless=True)
        await test_scraper_with_analysis(scraper_playwright, urls_to_scrape)
    except ImportError as ie:
        print(f"❌ Playwright ImportError: {ie}")
    except Exception as e:
        print(f"❌ Error initializing Playwright ChromiumLoader: {e}")

    # Test with Selenium backend
    print("\n--- Testing Selenium Backend ---")
    try:
        scraper_selenium = ChromiumLoader(urls=urls_to_scrape, backend="selenium", headless=True)
        await test_scraper_with_analysis(scraper_selenium, urls_to_scrape)
    except ImportError as ie:
        print(f"❌ Selenium ImportError: {ie}")
    except Exception as e:
        print(f"❌ Error initializing Selenium ChromiumLoader: {e}")

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("❌ Program interrupted by user.")
    except Exception as e:
        print(f"❌ Program crashed: {e}")

pyproject.toml

Lines changed: 30 additions & 2 deletions
@@ -3,7 +3,8 @@ name = "scrapegraphai"
 
 
 
-version = "1.34.0b1"
+version = "1.33.2"
+
 
 
 
@@ -114,9 +115,36 @@ screenshot_scraper = [
 ]
 
 [build-system]
-requires = ["hatchling"]
+requires = ["hatchling>=1.0.0", "hatch-vcs"]
 build-backend = "hatchling.build"
 
+[tool.hatch.build]
+packages = ["scrapegraphai"]
+exclude = [
+    "tests/**",
+    "examples/**",
+]
+
+[tool.hatch.version]
+source = "vcs"
+
+[tool.hatch.build.hooks.vcs]
+version-file = "scrapegraphai/_version.py"
+
+[tool.hatch.build.targets.wheel]
+packages = ["scrapegraphai"]
+
+[tool.hatch.build.targets.sdist]
+include = [
+    "/scrapegraphai",
+    "pyproject.toml",
+    "README.md",
+    "LICENSE",
+]
+
+[tool.hatch.metadata]
+allow-direct-references = true
+
 [dependency-groups]
 dev = [
     "burr[start]==0.22.1",

scrapegraphai/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 """
 __init__.py file for scrapegraphai folder
 """
+__version__ = "1.33.7"

scrapegraphai/_version.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+"""Version information."""
+__version__ = "1.33.7"
+version = __version__
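
With the hatch-vcs hook configured in pyproject.toml above (`version-file = "scrapegraphai/_version.py"`), this file is normally regenerated at build time from the latest git tag, so the checked-in value acts as a fallback. A minimal inspection sketch, assuming the package is installed (the two values are not guaranteed to agree until the next tagged build):

```python
# Compare the installed distribution metadata with the generated module.
from importlib.metadata import version as dist_version

from scrapegraphai._version import __version__

print("distribution:", dist_version("scrapegraphai"))  # from package metadata
print("_version.py :", __version__)                    # from the generated file
```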

scrapegraphai/docloaders/chromium.py

Lines changed: 9 additions & 0 deletions
@@ -66,6 +66,15 @@ def __init__(
         self.load_state = load_state
         self.requires_js_support = requires_js_support
         self.storage_state = storage_state
+
+    async def scrape(self, url: str) -> str:
+        if self.backend == "playwright":
+            return await self.ascrape_playwright(url)
+        elif self.backend == "selenium":
+            return await self.ascrape_undetected_chromedriver(url)
+        else:
+            raise ValueError(f"Unsupported backend: {self.backend}")
+
 
     async def ascrape_undetected_chromedriver(self, url: str) -> str:
         """

scrapegraphai/graphs/base_graph.py

Lines changed: 13 additions & 20 deletions
@@ -56,13 +56,11 @@ def __init__(self, nodes: list, edges: list, entry_point: str,
         self.callback_manager = CustomLLMCallbackManager()
 
         if nodes[0].node_name != entry_point.node_name:
-            # raise a warning if the entry point is not the first node in the list
             warnings.warn(
                 "Careful! The entry point node is different from the first node in the graph.")
 
         self._set_conditional_node_edges()
 
-        # Burr configuration
         self.use_burr = use_burr
         self.burr_config = burr_config or {}
 
@@ -91,7 +89,8 @@ def _set_conditional_node_edges(self):
         if node.node_type == 'conditional_node':
             outgoing_edges = [(from_node, to_node) for from_node, to_node in self.raw_edges if from_node.node_name == node.node_name]
             if len(outgoing_edges) != 2:
-                raise ValueError(f"ConditionalNode '{node.node_name}' must have exactly two outgoing edges.")
+                raise ValueError(f"""ConditionalNode '{node.node_name}'
+                                 must have exactly two outgoing edges.""")
             node.true_node_name = outgoing_edges[0][1].node_name
             try:
                 node.false_node_name = outgoing_edges[1][1].node_name
@@ -151,14 +150,14 @@ def _get_schema(self, current_node):
         """Extracts schema information from the node configuration."""
         if not hasattr(current_node, "node_config"):
             return None
-
+
         if not isinstance(current_node.node_config, dict):
             return None
-
+
         schema_config = current_node.node_config.get("schema")
         if not schema_config or isinstance(schema_config, dict):
             return None
-
+
         try:
             return schema_config.schema()
         except Exception:
@@ -167,7 +166,7 @@ def _execute_node(self, current_node, state, llm_model, llm_model_name):
         """Executes a single node and returns execution information."""
         curr_time = time.time()
-
+
         with self.callback_manager.exclusive_get_callback(llm_model, llm_model_name) as cb:
             result = current_node.execute(state)
             node_exec_time = time.time() - curr_time
@@ -197,17 +196,17 @@ def _get_next_node(self, current_node, result):
             raise ValueError(
                 f"Conditional Node returned a node name '{result}' that does not exist in the graph"
             )
-
+
         return self.edges.get(current_node.node_name)
 
     def _execute_standard(self, initial_state: dict) -> Tuple[dict, list]:
         """
-        Executes the graph by traversing nodes starting from the entry point using the standard method.
+        Executes the graph by traversing nodes
+        starting from the entry point using the standard method.
         """
         current_node_name = self.entry_point
         state = initial_state
-
-        # Tracking variables
+
         total_exec_time = 0.0
         exec_info = []
         cb_total = {
@@ -230,16 +229,13 @@ def _execute_standard(self, initial_state: dict) -> Tuple[dict, list]:
 
         while current_node_name:
             current_node = self._get_node_by_name(current_node_name)
-
-            # Update source information if needed
+
             if source_type is None:
                 source_type, source, prompt = self._update_source_info(current_node, state)
-
-            # Get model information if needed
+
             if llm_model is None:
                 llm_model, llm_model_name, embedder_model = self._get_model_info(current_node)
-
-            # Get schema if needed
+
             if schema is None:
                 schema = self._get_schema(current_node)
 
@@ -273,7 +269,6 @@ def _execute_standard(self, initial_state: dict) -> Tuple[dict, list]:
             )
             raise e
 
-        # Add total results to execution info
        exec_info.append({
             "node_name": "TOTAL RESULT",
             "total_tokens": cb_total["total_tokens"],
@@ -284,7 +279,6 @@ def _execute_standard(self, initial_state: dict) -> Tuple[dict, list]:
             "exec_time": total_exec_time,
         })
 
-        # Log final execution results
         graph_execution_time = time.time() - start_time
         response = state.get("answer", None) if source_type == "url" else None
         content = state.get("parsed_doc", None) if response is not None else None
@@ -343,4 +337,3 @@ def append_node(self, node):
         self.raw_edges.append((last_node, node))
         self.nodes.append(node)
         self.edges = self._create_edges({e for e in self.raw_edges})
-
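
One side effect of the reformatted ValueError in `_set_conditional_node_edges` is worth noting: a triple-quoted f-string embeds the line break and the leading indentation of its second line in the exception message. Implicit string-literal concatenation keeps the source line short without changing the message text; a sketch, not part of the commit:

```python
# Same message on one logical line; adjacent string literals are concatenated.
raise ValueError(
    f"ConditionalNode '{node.node_name}' "
    "must have exactly two outgoing edges."
)
```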

scrapegraphai/graphs/code_generator_graph.py

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@
     GenerateCodeNode,
 )
 
-
 class CodeGeneratorGraph(AbstractGraph):
     """
     CodeGeneratorGraph is a script generator pipeline that generates

scrapegraphai/graphs/csv_scraper_graph.py

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ def _create_graph(self):
         """
         Creates the graph of nodes representing the workflow for web scraping.
         """
-
+
         fetch_node = FetchNode(
             input="csv | csv_dir",
             output=["doc"],

scrapegraphai/graphs/depth_search_graph.py

Lines changed: 0 additions & 1 deletion
@@ -15,7 +15,6 @@
     GenerateAnswerNodeKLevel,
 )
 
-
 class DepthSearchGraph(AbstractGraph):
     """
     CodeGeneratorGraph is a script generator pipeline that generates
