
Storage: Raw Storage

Ween Jiann edited this page Mar 12, 2019 · 3 revisions

**If you haven't grabbed a local copy of our Examples, click here to learn how.

Code

The following is based on

ai.preferred.crawler.stackoverflow.master

in our Examples package.

Let's learn

To start using FileManager, we need to take two basic steps:

  1. Create a Fetcher with FileManager (ListingCrawler.java)
  2. Initialise MysqlFileManager (ListingCrawler.java)

1. Create a Fetcher with FileManager (ListingCrawler.java)

Instead of using an empty builder, we have to set the FileManager we would like to use when building the Fetcher. Let's change createFetcher() a little to take in an additional parameter called fileManager, like this:

  private static Fetcher createFetcher(FileManager fileManager) {
    // Look in the builder to see the other options you can set
    return AsyncFetcher.builder()
        .setFileManager(fileManager)
        .build();
  }

2. Initialise MysqlFileManager (ListingCrawler.java)

Define all the configuration for MySQL and the storage directory.

    // Define config for MysqlFileManager
    final String host = "localhost";
    final int port = 3306;
    final String database = "filemanager";
    final String table = "responses";
    final String user = "root";
    final String password = "";
    final String dir = "C:\\data";
    final String mysqlLocation = "jdbc:mysql://" + host + ":" + port + "/" + database;

**It is important that both the database and the storage directory exist! The MySQL table will be created automatically by the FileManager.
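Since the storage directory has to exist before the crawler starts, one option is to create it up front. The following is a minimal sketch that uses only the JDK; the class `StorageSetup` and its methods are hypothetical helpers, not part of the Examples package. It does not create the MySQL database, which still has to be set up separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the Examples package.
final class StorageSetup {

  private StorageSetup() {
  }

  // Builds the JDBC URL the same way the configuration above does.
  static String jdbcUrl(String host, int port, String database) {
    return "jdbc:mysql://" + host + ":" + port + "/" + database;
  }

  // Creates the directory (and any missing parents) if it does not exist yet.
  // Does nothing when the directory is already there.
  static Path ensureDirectory(String dir) throws IOException {
    return Files.createDirectories(Paths.get(dir));
  }

  public static void main(String[] args) throws IOException {
    final Path storage = ensureDirectory("C:\\data");
    System.out.println("Storage directory: " + storage.toAbsolutePath());
    System.out.println("JDBC URL: " + jdbcUrl("localhost", 3306, "filemanager"));
  }
}
```

Calling `StorageSetup.ensureDirectory(dir)` at the start of the main method, before constructing MysqlFileManager, would be one place to use it.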

Now we need to initialise MysqlFileManager in our main method, right before the Crawler, then pass fileManager as a parameter to createFetcher().

      try (final FileManager fileManager = new MysqlFileManager(mysqlLocation, table, user, password, dir);
           final Crawler crawler = createCrawler(createFetcher(fileManager), session).start()) {
        ...
      }

That's all we need to do to get MysqlFileManager working.
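Declaring the FileManager before the Crawler in the try-with-resources statement matters because Java closes resources in the reverse order of their declaration, so the crawler is closed first and the file manager last. The sketch below shows that ordering with stand-in classes; `CloseOrderDemo` and `NamedResource` are hypothetical and only mimic the shape of the snippet above, not Venom's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in classes for illustration only; these are not Venom's FileManager or Crawler.
final class CloseOrderDemo {

  // A resource that records when it is opened and closed.
  static final class NamedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    NamedResource(String name, List<String> log) {
      this.name = name;
      this.log = log;
      log.add("open " + name);
    }

    @Override
    public void close() {
      log.add("close " + name);
    }
  }

  // Opens two resources in the same order as the snippet above and records each event.
  static List<String> run() {
    final List<String> log = new ArrayList<>();
    try (NamedResource fileManager = new NamedResource("fileManager", log);
         NamedResource crawler = new NamedResource("crawler", log)) {
      log.add("crawling");
    }
    return log;
  }

  public static void main(String[] args) {
    // Prints the events in order: the crawler closes before the file manager.
    run().forEach(System.out::println);
  }
}
```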

Congratulations

Now run the crawler and watch C:\data and MySQL get populated with HTML files and records respectively!
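To confirm that files are actually arriving, you could count what is in the storage directory after a run. This is a rough sketch using only the JDK; `StorageCheck` is a hypothetical helper, and it walks the directory recursively because it makes no assumption about how the FileManager lays out files on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of the Examples package.
final class StorageCheck {

  private StorageCheck() {
  }

  // Counts regular files under dir, including files in subdirectories.
  static long countFiles(Path dir) throws IOException {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths.filter(Files::isRegularFile).count();
    }
  }

  public static void main(String[] args) throws IOException {
    // Same storage directory as in the configuration above.
    final Path storage = Paths.get("C:\\data");
    System.out.println("Files stored: " + countFiles(storage));
  }
}
```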