PatrickSmith-GitHub/ScrapingDailyAgenda

Scraping Daily Agenda Project

This is a project where I built a basic Flask web app to learn more about Selenium, Docker, and Kubernetes.

Overview of Project

This project started with a scraping script I wrote that collects the assignments due the next day. I then created a front end that formats the data and provides a button to request a new scrape. Flask and HTTP requests handle the communication between the pieces. I then containerized the app and deployed it on Kubernetes. I used Kubernetes basic-auth Secrets to pass the username and password to the scraping script as environment variables, and exposed the app using a LoadBalancer Service.
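As a rough sketch of the credential hand-off described above: once a Secret's keys are mapped to container environment variables, the scraping script can read them at startup. The variable names `SCRAPER_USERNAME` and `SCRAPER_PASSWORD` and the helper `load_credentials` are assumptions made for illustration, not names taken from this repository.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Credentials(NamedTuple):
    username: str
    password: str


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Credentials:
    """Read login details from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    required = ("SCRAPER_USERNAME", "SCRAPER_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast so a misconfigured Secret shows up at startup, not mid-scrape.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Credentials(env["SCRAPER_USERNAME"], env["SCRAPER_PASSWORD"])
```

Accepting an optional mapping instead of always reading `os.environ` keeps the helper easy to exercise without touching the real environment.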

Scraping Script Design

In the scraping script, the main challenges were navigating login screens and traversing shadow DOMs to reach the data I needed for my web app. This pushed me to learn unfamiliar parts of HTML and far more about Selenium than I had needed in the past, such as the intricacies of designing flow control that balances speed against robustness.
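To illustrate the shadow DOM traversal mentioned above, here is a minimal sketch that assumes Selenium 4, where a `WebElement` exposes a `shadow_root` property. The helper name `find_through_shadow_roots` and any selectors passed to it are hypothetical; the actual script in this repository may take a different approach.

```python
from typing import Any, Sequence

# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR, written out so
# this sketch has no import-time dependency on Selenium.
CSS_SELECTOR = "css selector"


def find_through_shadow_roots(
    driver: Any, host_selectors: Sequence[str], target_selector: str
) -> Any:
    """Walk nested shadow hosts, then return the target inside the last one."""
    context = driver  # start the search from the document
    for selector in host_selectors:
        host = context.find_element(CSS_SELECTOR, selector)
        context = host.shadow_root  # step into this host's shadow tree
    return context.find_element(CSS_SELECTOR, target_selector)
```

In a real script each lookup would likely be wrapped in an explicit wait, since shadow hosts often render after the page load event.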

Webapp Design

The web app front end has a dropdown that lets you choose a school. The selected value is passed in an HTTP header on the request to the Flask backend, which uses it to decide which scraping script to run. I designed it this way so the application could later expand to other schools, and perhaps to other scripts, such as one that finds what you have to do today or builds an agenda for the next week or month, simply by passing a header that starts the right script. What I would really like to add is an authentication layer with a backend database that stores the school information and passes the environment variables to the scraping script in some way. I think this would be a far better design, because the current approach, in which a value from the HTML is passed to the backend and determines which file runs, is inherently insecure. I would like to change that in the future to learn more about website development. These changes would also help me learn more about microservices design.
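One way to reduce the risk of letting a client-supplied value choose what runs is an allowlist: the header value is only ever used as a lookup key into a fixed table of known scrapers. The sketch below is an illustration of that idea, not the repository's current code; `SCRAPERS`, `scrape_example_school`, and `run_scraper_for` are hypothetical names.

```python
from typing import Callable, Dict, List


def scrape_example_school() -> List[str]:
    """Placeholder standing in for a real scraping script (hypothetical)."""
    return ["example assignment"]


# Only scripts registered here can ever run, whatever the client sends.
SCRAPERS: Dict[str, Callable[[], List[str]]] = {
    "example-school": scrape_example_school,
}


def run_scraper_for(header_value: str) -> List[str]:
    """Map a client-supplied header value to a registered scraper."""
    key = header_value.strip().lower()
    scraper = SCRAPERS.get(key)
    if scraper is None:
        # Unknown values are rejected instead of being turned into a file path.
        raise ValueError(f"unknown school: {header_value!r}")
    return scraper()
```

Adding another school or another kind of agenda would then mean registering one more entry in the table, and a Flask route could translate the `ValueError` into a 400 response.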

Challenges I Faced

One particular challenge on this project was getting a Selenium app dockerized, because of the dependencies it requires. At one point I also got overexcited about Docker and Kubernetes and tried to create two separate images, one for the website and Flask and one for the scraping script, with the scraping pod being spun up and destroyed in a serverless fashion. However, I felt this was a case of being given a hammer and seeing everything as a nail, when in reality the problem was a screw and there was a much simpler solution: having the one container run the script.

Conclusion

This project was a great way to get quick exposure to several technologies that interest me. I had dabbled in them before, for example by running a k8s cluster with various apps and by scraping with Selenium, but combining them gave me a deeper understanding of both and taught me about containerizing apps. The biggest thing this project did for me was highlight gaps in my knowledge and expose areas I would like to learn more about that I hadn't thought of before. All in all, the project was a resounding success for learning purposes: I met my original goals by not only creating a working web app that scrapes the necessary information, formats it, and presents it to a user, but also launching it on Kubernetes.
