
Commit 6245acd

add --urls flag so user doesn't have to manually edit list
1 parent 4d48302

2 files changed: +7, -17 lines

README.md (+2, -10)
@@ -29,28 +29,20 @@ A concurrent web scraper that extracts links from a list of URLs.
 
 1. Make sure you have Go installed on your system.
 2. Save the code as a `.go` file (e.g., `scraper.go`).
-3. edit the list of URLs to fetch
-4. Run the program using `go run scraper.go`.
+3. Run the program using `go run scraper.go --urls="https://example.com,https://example2.com, ...etc"`.
 
 **Dependencies:**
 
 * `golang.org/x/net/html`: For HTML parsing.
 
 **Example Usage:**
 
-The program currently scrapes the following URLs:
-
-* "https://google.com"
-* "https://old.reddit.com/"
-* "https://timevko.website"
-
-You can modify the `urls` slice in the `main` function to scrape different websites.
+`go run fue.go --urls="https://timevko.website,https://old.reddit.com"`
 
 **Potential Improvements:**
 
 * **Error Handling:** More robust error handling for network requests and HTML parsing.
 * **Politeness:** Implement delays between requests to avoid overloading the target servers.
 * **Data Storage:** Store the extracted links in a file or database.
-* **Command-line Arguments:** Allow users to specify the URLs and other options through command-line arguments.
 * **Deduplication:** Remove duplicate links from the output.
 * **Advanced Extraction:** Use CSS selectors (e.g., with the `goquery` library) for more specific link extraction.
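
The **Deduplication** idea from the list above is a small addition on top of the existing `results` channel. A minimal sketch, assuming the channel type declared in `main` (`results := make(chan map[string]string)`); the `collectUnique` helper name is hypothetical and not part of this commit:

```go
package main

// collectUnique is a hypothetical sketch of the "Deduplication" improvement:
// it drains the results channel (each value maps link URL -> anchor text,
// matching `results := make(chan map[string]string)` in main) and keeps
// only the first occurrence of every link.
func collectUnique(results <-chan map[string]string) map[string]string {
	seen := make(map[string]string)
	for page := range results {
		for link, text := range page {
			if _, ok := seen[link]; !ok {
				seen[link] = text
			}
		}
	}
	return seen
}
```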

fue.go (+5, -7)
@@ -1,9 +1,11 @@
 package main
 
 import (
+	"flag"
 	"fmt"
 	"net/http"
 	"net/url"
+	"strings"
 	"sync"
 
 	"golang.org/x/net/html"
@@ -61,20 +63,16 @@ func extractLinks(doc *html.Node, baseURL *url.URL) map[string]string {
 
 func main() {
 	var wg sync.WaitGroup
-	// URLs to scrape
-	urls := []string{
-		"https://google.com",
-		"https://old.reddit.com/",
-		"https://timevko.website",
-	}
+	urlList := flag.String("urls", "https://google.com", "Comma separated list of URL's to crawl")
+	flag.Parse()
 	urlChan := make(chan string)
 	results := make(chan map[string]string)
 
 	for i := 1; i <= 3; i++ {
 		wg.Add(1)
 		go worker(i, urlChan, results, &wg)
 	}
-	for _, url := range urls {
+	for _, url := range strings.Split(*urlList, ",") {
 		urlChan <- url
 	}
 	close(urlChan)
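
One edge the new flag handling leaves open: `strings.Split(*urlList, ",")` keeps any whitespace around the commas, so input with spaces after them (as in the README example's `"https://example2.com, ...etc"`) would reach the workers with leading spaces. A minimal sketch of a possible follow-up, not part of this commit; `splitURLs` is a hypothetical helper name:

```go
package main

import "strings"

// splitURLs is a hypothetical helper (not in the commit): it splits the
// --urls value on commas, trims surrounding whitespace, and drops empty
// entries, so "https://a.com, https://b.com" still yields clean URLs.
func splitURLs(list string) []string {
	var urls []string
	for _, u := range strings.Split(list, ",") {
		if u = strings.TrimSpace(u); u != "" {
			urls = append(urls, u)
		}
	}
	return urls
}
```

With that in place, the loop in `main` would range over `splitURLs(*urlList)` instead of `strings.Split(*urlList, ",")`.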
