Commit 5d9bc5b

Author: David de Hilster
Merge pull request #17 from VisualText/NLP-TUTORIALS-012
NLP-TUTORIALS-012 fixed python certificate problem
2 parents b9a2071 + 1216cc1 commit 5d9bc5b

2 files changed: +7 -4 lines changed

tutorial-13/tutorial-13-a/README.md

+5 -1
@@ -1,3 +1,7 @@
 # Tutorial 13-a
 
-This analyzer parses the URLs from this link: https://state.1keydata.com/ into a URL list. It then has a python script to fetch the webpages and save them in a folder. This folder then can easily be moved into the second analyzer where the pages will be processed.
+This analyzer parses the URLs from this link: https://state.1keydata.com/ into a URL list. It then has a python script to fetch the webpages and save them in a folder. This folder then can easily be moved into the second analyzer where the pages will be processed.
+
+## NOTE
+
+You will have to install BeautifulSoup and certifi before using the python script.
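
The note does not spell out the install step. On PyPI the two packages are published as `beautifulsoup4` (imported as `bs4`) and `certifi`, so `pip install beautifulsoup4 certifi` is the usual route. A small sketch for checking that both are importable before running the script; `missing_modules` is a hypothetical helper and is not part of the repository:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "bs4" is the import name of the beautifulsoup4 package.
    missing = missing_modules(["bs4", "certifi"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

An empty result means `urlfetch.py` should get past its import lines.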

tutorial-13/tutorial-13-a/input/urlfetch.py

+2 -3
@@ -5,12 +5,11 @@
 from bs4 import BeautifulSoup
 from pathlib import Path
 import re
+import certifi
 
 wordsfile = os.path.join(os.path.dirname(__file__), "urls.txt")
 file1 = codecs.open(wordsfile, "r", "utf-8")
 lines = file1.readlines()
-
-urlbase = "https://state.1keydata.com/"
 
 count = 0
 for url in lines:
@@ -31,7 +30,7 @@
     found = False
 
     try:
-        page = urllib.request.urlopen(url)
+        page = urllib.request.urlopen(url, cafile=certifi.where())
     except HTTPError as e:
         print(' Error code: ', e.code)
         file1 = open(os.path.join(os.path.dirname(__file__), "urlorphans.txt"), "a")
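
The `cafile` argument to `urllib.request.urlopen` has been deprecated since Python 3.6 and was removed in Python 3.13, so the call above may fail on newer interpreters. The same certifi bundle can be supplied through an `ssl.SSLContext` instead. The following is a minimal sketch of that alternative, not part of this commit; `build_context` and `fetch` are hypothetical helper names, and the fallback to the system trust store when certifi is absent is an assumption added for illustration:

```python
import ssl
import urllib.request

try:
    import certifi  # third-party package: pip install certifi
    CA_BUNDLE = certifi.where()
except ImportError:
    CA_BUNDLE = None  # assumption: fall back to the system trust store


def build_context(cafile=CA_BUNDLE):
    """Create a verifying TLS context, using the certifi bundle when available."""
    return ssl.create_default_context(cafile=cafile)


def fetch(url, context=None):
    """Fetch `url` over HTTPS and return the raw response body as bytes."""
    if context is None:
        context = build_context()
    with urllib.request.urlopen(url, context=context) as page:
        return page.read()
```

In the loop from the diff, `page = urllib.request.urlopen(url, context=build_context())` would be the equivalent call; building the context once before the loop avoids reloading the bundle for every URL.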
