Scans the MAL site for data needed by the recommender system You Can (Not) Recomend.
To speed things up, scans run in parallel:
- Each scanner instance can perform several HTTP requests at a time (queued).
- Several scanner instances can safely run together across many processes and PCs.
- Can use different "data providers": parsing the MAL site directly or through proxies, or unofficial MAL API servers (see mal_api_server). See classes MalDataProvider -> MalParser, MalApiClient.
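The per-instance request queue described above can be pictured as a small concurrency limiter. This is only a minimal sketch of the idea, not the scanner's actual code: the class name RequestQueue and its methods are hypothetical.

```javascript
// Minimal sketch of a bounded job queue. All names here are hypothetical
// and are not taken from the scanner's source.
class RequestQueue {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent; // how many jobs may run at once
    this.running = 0;                   // jobs currently in flight
    this.pending = [];                  // jobs waiting for a free slot
  }

  // Schedule an async job; the returned promise settles with the job's result.
  add(job) {
    return new Promise((resolve, reject) => {
      this.pending.push({ job, resolve, reject });
      this._next();
    });
  }

  // Start waiting jobs while there are free slots.
  _next() {
    while (this.running < this.maxConcurrent && this.pending.length > 0) {
      const { job, resolve, reject } = this.pending.shift();
      this.running++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.running--; // free the slot, then try to start the next job
          this._next();
        });
    }
  }
}
```

With a limit of 3, for example, queue.add(() => fetch(url)) could be called for every page URL up front, and at most three requests would be in flight at any moment.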
Safe parallelization is implemented with the help of redis transactions (see class MalBaseScanner).
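How MalBaseScanner uses transactions is not shown here, so the following is only a sketch of one common pattern: claiming a task with WATCH / MULTI / EXEC so that two scanners never take the same one. It assumes an ioredis-style client passed in as redis; the key mal.runningTasks and the function name claimTask are hypothetical.

```javascript
// Sketch only: `redis` is assumed to be an ioredis-style client.
// 'mal.queuedTasks' appears in this README; 'mal.runningTasks' is a made-up key.
async function claimTask(
  redis,
  queueKey = 'mal.queuedTasks',
  runningKey = 'mal.runningTasks'
) {
  for (;;) {
    await redis.watch(queueKey); // EXEC will abort if the queue changes
    const task = await redis.lindex(queueKey, 0); // peek at the queue head
    if (task === null) {
      await redis.unwatch();
      return null; // queue is empty, nothing to claim
    }
    const result = await redis
      .multi()
      .lpop(queueKey)         // remove the task we just peeked at...
      .sadd(runningKey, task) // ...and record it as in progress
      .exec();
    if (result !== null) return task; // transaction committed
    // result === null: another scanner modified the queue first, so retry
  }
}
```

Since WATCH is tied to a connection, each scanner instance would need its own redis connection for this pattern to be safe.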
Scanned data is saved to a PostgreSQL db (see data/db-schema.sql, class MalDataProcesser).
Install the PostgreSQL db schema from data/db-schema.sql
Set options in config/config-scanner.js
Run node index.js
Manually add tasks to redis: rpush mal.queuedTasks <task>
Watch progress in the console logs
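The manual rpush step could also be scripted from Node. The sketch below assumes an ioredis-style client passed in as redis, and the helper name enqueueTasks is hypothetical. The task strings are passed through untouched, in whatever format the scanner expects for <task>.

```javascript
// Sketch: push task strings onto the queue used by the rpush step.
// `redis` is assumed to be an ioredis-style client. The format of each task
// string is not specified in this README, so it is treated as opaque.
async function enqueueTasks(redis, tasks, queueKey = 'mal.queuedTasks') {
  if (tasks.length === 0) return 0;       // RPUSH needs at least one value
  return redis.rpush(queueKey, ...tasks); // resolves to the new list length
}
```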
List of tasks to grab only new data:
- GenresOnce - grab genres, once
- Animes_New - grab new animes
- AnimesUserrecs_New - grab users' anime-to-anime recommendations
- UserLogins_New - grab user id <-> login pairs
- UserLists_New - grab user lists, only for users whose list has never been checked
- UserProfiles_New - grab user profile data, only for users whose profile has never been checked
List of tasks to check updates:
- UserListsUpdated_Active - check for updates of active user lists, run frequently
- UserListsUpdated_WithoutList - check for newly appearing user lists, run rarely
- UserListsUpdated_NonActive - check for updates of nonactive user lists, run rarely
- UserLists_Updated - grab updated user lists, run after UserListsUpdated_*
- AnimesUserrecs_All - regrab users' anime-to-anime recommendations, run rarely, e.g. once a week
- UserProfiles_All - just to update favs, run it very rarely!
- Animes_All - just to check for possible updates of genres and relations, run it very rarely!
Special tasks to fix possible problems with login swaps; these are added automatically:
SpUserLogins_Re
UserProfiles_Re
Adding tasks from timer