FrogAPI: Crawls and provides an API for that froggy site...

...but only groups (for now?)

FrogAPI is intended to be a self-contained API server for disseminating group information from gab.com. Bill Tux already provides a tutorial site that links to an earlier corpus of data based on prior crawling efforts. This application is intended to provide either an easier means of updating his corpus or a replacement for the search mechanism that reduces the amount of traffic generated (currently around 24 MB across roughly 3,100 requests) and makes the overall data easier to consume for mobile users.

Installation

To install FrogAPI, you need to download and build the sources yourself as I have no intention of distributing pre-built binaries at this time.

Build Instructions

Building FrogAPI requires Git, make, and a fairly recent version of Go (v1.12 or higher recommended). You can fetch and build it with:

$ git clone https://git.destrealm.org/zancarius/frogapi
$ cd frogapi
$ make

This will create a build directory and generate a build/frogapi binary. This binary contains all the tools you need to replicate the groups data and serve your own search instance.

If you only have Go available, you can theoretically forgo using git and make by running the command:

$ go get git.destrealm.org/zancarius/frogapi/cmd/frogapi

Be aware that installing the binary directly with the go command (above) presumes that you have your $GOPATH properly configured and have added $GOPATH/bin to your $PATH. This method also skips some (minor) optimizations and version tagging, and it will not build PIE-enabled binaries unless your Go installation has that enabled by default. The generated binary will be available under $GOPATH/bin/frogapi.
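
If you haven't configured these already, a typical setup (assuming the default workspace location of $HOME/go; adjust to taste) looks like:

$ export GOPATH="$HOME/go"
$ export PATH="$PATH:$GOPATH/bin"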

Running the Installer

Once you've built FrogAPI, you will need to install it. To do so, run:

$ frogapi install

and answer the on-screen questions. The installer will guide you through the process, from configuring the data directory and the port to listen on to setting the Gab API key.

If you don't have a Gab API key, you will need to obtain one by logging in to your account (or creating a new one), going to your account preferences, clicking “development,” and clicking “new application.” From here, you can enter any value you wish for the application name and website. As a suggestion, you can enter “frogapi” for the name and https://git.destrealm.org/zancarius/frogapi for the website, but these exact values aren't necessary; anything reasonable will do.

You'll also need to uncheck every permission except “read.” Do not check any of the other read permissions, as doing so appears to cause the Gab API to deny access to group visibility, rendering the crawler impotent. The drawback with this method is that the access token generated in this step will have read access to everything on your account. If you don't want this on your main account, I would suggest creating another account specifically for use with the crawler.

Once you're finished, click “submit,” then copy and paste the value next to “your access token” into the installer. Neither the client key nor the client secret is needed.

Developing or Modifying FrogAPI

If you intend to modify FrogAPI for whatever reason, you will likely need additional tooling. First, and most important, you will need our embedder application to re-generate the embedded VFS assets (currently only the YAML configuration). This can be obtained via the command go get git.destrealm.org/go/embedder/cmd/embedder. VFS regeneration can be performed by running make vfs.
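
Concretely, the two commands mentioned above are:

$ go get git.destrealm.org/go/embedder/cmd/embedder
$ make vfs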

If you need to run embedder manually for whatever reason, please reference the Makefile for the command line flags used by this application.

Basic Usage

Once you have FrogAPI installed, using it is fairly straightforward, and you can examine a list of commands by running frogapi -h. You'll most likely wish to populate the database before running the server, either by crawling the Gab API or by importing crawler data from another source. FrogAPI provides two methods to do so:

frogapi crawl

The frogapi crawl command will start the crawler process and pull results directly from Gab's API. This is a tedious, slow process that can get interrupted. If it does, re-running it with frogapi crawl --resume will resume the crawler from wherever it left off.

At present, FrogAPI is intended to be relatively quiet about what it outputs. If you wish to see progress, you'll need to change the logging defaults. This can be done by editing the frogapi.yaml file and changing log-level to either debug or info or, preferably, by passing the --log-level flag with the appropriate level:

$ frogapi --log-level debug crawl --resume
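
If you'd rather edit frogapi.yaml instead, the change is a single line (a minimal sketch: only the log-level key name is taken from this README; consult the generated frogapi.yaml for the full layout):

log-level: debug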

frogapi import

For a faster import of data, frogapi import may be a better option than crawling Gab directly. Not only does this save bandwidth for everyone involved, it's also significantly faster, since importing only writes the data directly to disk (actually a Bolt database) rather than round-tripping through the upstream API.

Importing can be performed in one of two ways. The easiest of these is to use my reference server and curl directly by running:

$ curl https://research.destrealm.com/frogapi/api/v1/dump | frogapi --log-level debug import -

This skips intermediate save steps and passes the JSON dump directly into FrogAPI.

The second method is to save the dump to a file first, either by running curl -L -o groups.json against the reference server or by browsing to it and saving the file to a location you can use for later import.
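
For example, using the reference server URL from above:

$ curl -L -o groups.json https://research.destrealm.com/frogapi/api/v1/dump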

Then, once you've obtained the JSON dump, run:

$ frogapi --log-level debug import groups.json

This example assumes you've saved the dump to the file groups.json.

Running the Server

To run FrogAPI and to test that it is functioning correctly, enter the command:

$ frogapi --log-level debug run

Then connect to your new endpoint using curl (assuming it is listening on localhost:7990):

$ curl localhost:7990

If successful, this will output the contents of all groups currently in the database.

At present, FrogAPI does not support forking into the background as a daemon. It should be possible to do this with standard shell commands and/or tools, and FrogAPI ought to play nicely with any process supervisor you're familiar with.

We've included a sample systemd unit file, located in dist/frogapi.service. To use it, edit the ExecStart line to point to your binary's location, copy the file to /etc/systemd/system, and issue systemctl daemon-reload as root. Finally, you can start FrogAPI with systemctl start frogapi.

If the systemd unit fails, it may be due to some of the security features added to the configuration. Commenting these out or removing them may resolve the problem.
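
For orientation, a minimal unit of roughly this shape would work. This is only a sketch, not the shipped dist/frogapi.service; the binary path, user, and directives below are assumptions:

[Unit]
Description=FrogAPI group search server
After=network.target

[Service]
# Path is an assumption; point this at your actual binary.
ExecStart=/usr/local/bin/frogapi run
Restart=on-failure
User=frogapi

[Install]
WantedBy=multi-user.target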

API

At present, FrogAPI exposes the following endpoints:

Endpoint          Intent
/api/v1/browse    Group browser
/api/v1/dump      Group dump utility
/api/v1/search    Group search

We'll explore each one in this section.

The Browse API

Not currently implemented.

The Dump API

FrogAPI provides a means for clients to obtain a full dump of all groups available in either JSON or CSV format. The API is structured as:

Query variable   Values        Description
format           json or csv   Configures the returned format type. Defaults to json.
compression      gzip or zip   Returns the data compressed using either gzip or zip. If unspecified, no compression is performed.
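
For example, to fetch a gzip-compressed CSV dump from a local instance (assuming the default listen address of localhost:7990):

$ curl -o groups.csv.gz 'http://localhost:7990/api/v1/dump?format=csv&compression=gzip'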

The Search API

Not currently implemented.

Use with nginx

Although FrogAPI can be exposed to the Internet directly, it is presently using a pre-release framework that may exhibit some bugs. I recommend putting it behind a reverse proxy such as nginx or HAProxy. A sample configuration for nginx is as follows:

server {
    listen [::]:80;
    listen 80;
    server_name frogapi.example.org;

    location / {
        proxy_pass http://127.0.0.1:7990;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
    }
}

This example assumes that FrogAPI is listening on localhost port 7990 (the default) and is served under the domain frogapi.example.org. If you're unwilling to create a subdomain specifically for FrogAPI, you can run it under a path off your server root, such as:

server {
    # ... other configurations ...

    location /frogapi/ {
        # The trailing slashes here matter: they strip the /frogapi prefix
        # before requests are forwarded, since FrogAPI serves from its root.
        proxy_pass http://127.0.0.1:7990/;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
    }

    # ... other configurations ...
}

Cron

FrogAPI includes a built-in cron that periodically performs a handful of tasks, automating some of the tedium associated with crawling upstream sources. In particular, it provides cron jobs for:

  • Checking empty groups.
  • Checking upstream for new groups.
  • Re-crawling groups that have returned an error.
  • Re-crawling missing groups.
  • Periodically re-crawling all groups to re-synchronize data.

For a complete description of these tasks, please refer to the generated frogapi.yaml file under the section labeled cron. Wiki entries may be added to describe this in further detail.
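
Purely as a hypothetical illustration of where these settings live (the key names below are invented for this example; the generated frogapi.yaml is the authoritative reference):

cron:
  # Hypothetical keys for illustration only; see the generated config.
  recrawl-errors: true
  recrawl-all-interval: 168h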

Updating

Periodically, FrogAPI may be updated to add new features or fix some outstanding bugs. To update it, you will need to follow the build instructions under the installation section to generate a new binary. Then, you will need to do the following.

Step 1: Move the old binary out of the way and replace it with the new one

$ mv frogapi frogapi.old
$ mv build/frogapi frogapi

Step 2: Get the PID of the running FrogAPI process

$ ps aux | grep frogapi

Step 3: Send a HUP signal to the process to instruct it to gracefully restart

$ kill -HUP <PID>

(Replace the literal <PID> with the actual process ID.)
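
Alternatively, steps 2 and 3 can be combined with pgrep (assuming only one frogapi process is running):

$ kill -HUP "$(pgrep frogapi)"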

Some or all of these steps could be added into a systemd unit file.

About Development

As FrogAPI stabilizes over the coming days (or weeks, depending on how much time I can put into it), the master branch will become increasingly stable. Typically, my preferred development style is to push changes, features, and fixes into a feature branch first, slowly trickling commits into master as they become proven. Consequently, using master should always yield the most recent stable code.

If you notice other branches on the remote, approach them with care! They aren't guaranteed to build, or even work, and may cause all manner of grief.

License

FrogAPI is copyright © 2020 Benjamin Shelton and distributed under the terms of the NCSA license, as with most of the open source I write. This license is roughly equivalent to a 3-clause BSD license crossed with an MIT license that happens to also cover the documentation (such as this file) as part of the licensed material.

This software also relies on other libraries that are distributed under different licenses, which may impose requirements beyond those of this software.