Commit fcbfe78

feat: ⛏️ enhanced contribution and precommit added

1 parent 21147c4

129 files changed: +3174 −1671 lines

.gitignore

+150
```diff
@@ -42,3 +42,153 @@ lib/
 # extras
 cache/
 run_smart_scraper.py
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+.ruff_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+Pipfile.lock
+
+# poetry
+poetry.lock
+
+# pdm
+pdm.lock
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+.idea/
+
+# VS Code
+.vscode/
+
+# macOS
+.DS_Store
+
+dev.ipynb
```
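The new ignore rules can be spot-checked with git's built-in matcher, which reports the pattern, source file, and line that match a given path. A minimal sketch; the example paths are illustrative, not taken from the commit:

```bash
# Ask git which .gitignore rule (file and line) matches each path.
# These paths are hypothetical; substitute real ones as needed.
git check-ignore -v __pycache__/module.cpython-312.pyc
git check-ignore -v .venv/bin/python
git check-ignore -v dev.ipynb
```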

.pre-commit-config.yaml

+23
```diff
@@ -0,0 +1,23 @@
+repos:
+  - repo: https://github.com/psf/black
+    rev: 24.8.0
+    hooks:
+      - id: black
+
+  - repo: https://github.com/charliermarsh/ruff-pre-commit
+    rev: v0.6.9
+    hooks:
+      - id: ruff
+
+  - repo: https://github.com/pycqa/isort
+    rev: 5.13.2
+    hooks:
+      - id: isort
+
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+        exclude: mkdocs.yml
```
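Once this config is in place, the hooks run on every `git commit`. A minimal sketch of activating and exercising them, assuming pre-commit is resolvable through the project's uv environment:

```bash
# Install the git hook scripts defined in .pre-commit-config.yaml
# (one-time setup per clone)
uv run pre-commit install

# Run every hook against the entire tree, e.g. before opening a PR
uv run pre-commit run --all-files
```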

CONTRIBUTING.md

+44 −83

````diff
@@ -1,83 +1,44 @@
-# Contributing to ScrapeGraphAI
-
-Thank you for your interest in contributing to **ScrapeGraphAI**! We welcome contributions from the community to help improve and grow the project. This document outlines the guidelines and steps for contributing.
-
-## Table of Contents
-
-- [Getting Started](#getting-started)
-- [Contributing Guidelines](#contributing-guidelines)
-- [Code Style](#code-style)
-- [Submitting a Pull Request](#submitting-a-pull-request)
-- [Reporting Issues](#reporting-issues)
-- [License](#license)
-
-## Getting Started
-
-To get started with contributing, follow these steps:
-
-1. Fork the repository on GitHub **(FROM pre/beta branch)**.
-2. Clone your forked repository to your local machine.
-3. Install the necessary dependencies from requirements.txt or via pyproject.toml as you prefere :).
-4. Make your changes or additions.
-5. Test your changes thoroughly.
-6. Commit your changes with descriptive commit messages.
-7. Push your changes to your forked repository.
-8. Submit a pull request to the pre/beta branch.
-
-N.B All the pull request to the main branch will be rejected!
-
-## Contributing Guidelines
-
-Please adhere to the following guidelines when contributing to ScrapeGraphAI:
-
-- Follow the code style and formatting guidelines specified in the [Code Style](#code-style) section.
-- Make sure your changes are well-documented and include any necessary updates to the project's documentation and requirements if needed.
-- Write clear and concise commit messages that describe the purpose of your changes and the last commit before the pull request has to follow the following format:
-  - `feat: Add new feature`
-  - `fix: Correct issue with existing feature`
-  - `docs: Update documentation`
-  - `style: Improve formatting and style`
-  - `refactor: Restructure code`
-  - `test: Add or update tests`
-  - `perf: Improve performance`
-- Be respectful and considerate towards other contributors and maintainers.
-
-## Code Style
-
-Please make sure to format your code accordingly before submitting a pull request.
-
-### Python
-
-- [Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
-- [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
-- [The Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/style/)
-- [Pylint style of code for the documentation](https://pylint.pycqa.org/en/1.6.0/tutorial.html)
-
-## Submitting a Pull Request
-
-To submit your changes for review, please follow these steps:
-
-1. Ensure that your changes are pushed to your forked repository.
-2. Go to the main repository on GitHub and navigate to the "Pull Requests" tab.
-3. Click on the "New Pull Request" button.
-4. Select your forked repository and the branch containing your changes.
-5. Provide a descriptive title and detailed description for your pull request.
-6. Reviewers will provide feedback and discuss any necessary changes.
-7. Once your pull request is approved, it will be merged into the pre/beta branch.
-
-## Reporting Issues
-
-If you encounter any issues or have suggestions for improvements, please open an issue on the GitHub repository. Provide a clear and detailed description of the problem or suggestion, along with any relevant information or steps to reproduce the issue.
-
-## License
-
-ScrapeGraphAI is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more information.
-By contributing to this project, you agree to license your contributions under the same license.
-
-ScrapeGraphAI uses code from the Langchain
-frameworks. You find their original licenses below.
-
-LANGCHAIN LICENSE
-https://github.com/langchain-ai/langchain/blob/master/LICENSE
-
-Can't wait to see your contributions! :smile:
+# Contributing to ScrapeGraphAI 🚀
+
+Hey there! Thanks for checking out **ScrapeGraphAI**! We're excited to have you here! 🎉
+
+## Quick Start Guide 🏃‍♂️
+
+1. Fork the repository from the **pre/beta branch** 🍴
+2. Clone your fork locally 💻
+3. Install uv (if you haven't):
+   ```bash
+   curl -LsSf https://astral.sh/uv/install.sh | sh
+   ```
+4. Run `uv sync` (creates virtual env & installs dependencies) ⚡
+5. Run `uv run pre-commit install` 🔧
+6. Make your awesome changes ✨
+7. Test thoroughly 🧪
+8. Push & open a PR to the pre/beta branch 🎯
+
+## Contribution Guidelines 📝
+
+Keep it clean and simple:
+- Follow our code style (PEP 8 & Google Python Style) 🎨
+- Document your changes clearly 📚
+- Use these commit prefixes for your final PR commit:
+  ```
+  feat: ✨ New feature
+  fix: 🐛 Bug fix
+  docs: 📚 Documentation
+  style: 💅 Code style
+  refactor: ♻️ Code changes
+  test: 🧪 Testing
+  perf: ⚡ Performance
+  ```
+- Be nice to others! 💝
+
+## Need Help? 🤔
+
+Found a bug or have a cool idea? Open an issue and let's chat! 💬
+
+## License 📜
+
+MIT Licensed. See [LICENSE](LICENSE) file for details.
+
+Let's build something amazing together! 🌟
````
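End to end, the quick-start steps above amount to a shell session like the following; the fork URL and branch name are placeholders, not part of the commit:

```bash
# Hypothetical contributor flow; <your-username> and the branch name are placeholders
git clone https://github.com/<your-username>/Scrapegraph-ai.git
cd Scrapegraph-ai
git checkout pre/beta                    # work is based on pre/beta, not main

curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv if missing
uv sync                                  # create the venv and install deps
uv run pre-commit install                # enable the hooks from this commit

git checkout -b feat/my-change           # hypothetical feature branch
# ...edit and test...
git commit -am "feat: ✨ Add my change"  # final commit follows the prefix format
git push origin feat/my-change           # then open a PR against pre/beta
```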

Makefile

+49
```diff
@@ -0,0 +1,49 @@
+# Makefile for Project Automation
+
+.PHONY: install lint type-check test build all clean
+
+# Variables
+PACKAGE_NAME = scrapegraphai
+TEST_DIR = tests
+
+# Default target
+all: lint type-check test
+
+# Install project dependencies
+install:
+	uv sync
+	uv run pre-commit install
+
+# Linting and Formatting Checks
+lint:
+	uv run ruff check $(PACKAGE_NAME) $(TEST_DIR)
+	uv run black --check $(PACKAGE_NAME) $(TEST_DIR)
+	uv run isort --check-only $(PACKAGE_NAME) $(TEST_DIR)
+
+# Type Checking with MyPy
+type-check:
+	uv run mypy $(PACKAGE_NAME) $(TEST_DIR)
+
+# Run Tests with Coverage
+test:
+	uv run pytest --cov=$(PACKAGE_NAME) --cov-report=xml $(TEST_DIR)/
+
+# Run Pre-Commit Hooks
+pre-commit:
+	uv run pre-commit run --all-files
+
+# Clean Up Generated Files
+clean:
+	rm -rf dist/
+	rm -rf build/
+	rm -rf *.egg-info
+	rm -rf htmlcov/
+	rm -rf .mypy_cache/
+	rm -rf .pytest_cache/
+	rm -rf .ruff_cache/
+	rm -rf .uv/
+	rm -rf .venv/
+
+# Build the Package
+build:
+	uv build --no-sources
```
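For reference, the targets above are driven like this; every target name comes from the Makefile itself:

```bash
make install      # uv sync + install the pre-commit hooks
make              # default target: lint, type-check, test
make lint         # ruff, black --check, isort --check-only
make pre-commit   # run all hooks across the repo
make build        # build the package with uv
make clean        # remove caches, build artifacts, and the virtual env
```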

examples/openai/smart_scraper_openai.py

+7 −4

```diff
@@ -1,9 +1,12 @@
-"""
+"""
 Basic example of scraping pipeline using SmartScraper
 """
-import os
+
 import json
+import os
+
 from dotenv import load_dotenv
+
 from scrapegraphai.graphs import SmartScraperGraph
 from scrapegraphai.utils import prettify_exec_info
 
@@ -17,7 +20,7 @@
 graph_config = {
     "llm": {
         "api_key": os.getenv("OPENAI_API_KEY"),
-        "model": "openai/gpt-4o",
+        "model": "openai/gpt-4o00",
     },
     "verbose": True,
     "headless": False,
@@ -30,7 +33,7 @@
 smart_scraper_graph = SmartScraperGraph(
     prompt="Extract me the first article",
     source="https://www.wired.com",
-    config=graph_config
+    config=graph_config,
 )
 
 result = smart_scraper_graph.run()
```
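To try the updated example, an invocation along these lines should work; the key value is a placeholder, and writing .env this way assumes there is no existing .env to preserve:

```bash
# Run the example through the uv-managed environment (from the repo root).
# <your-key> is a placeholder; load_dotenv() reads it from .env at startup.
echo 'OPENAI_API_KEY=<your-key>' > .env
uv run python examples/openai/smart_scraper_openai.py
```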
