In the previous post we saw how to build a script that scrapes the website of a local supermarket for beer promotions and returns the data as a clean JSON array.
Remember, the goal is to build a system that lets people set up a subscription so they can be notified about current promotions, allowing them to profit from cheap beer prices.
TODOs:
- Build a script that scrapes the supermarket’s website and returns all promotions as a JSON array (covered in the previous post)
- Deploy the script, schedule it to run in the cloud and persist the data (topic of this post)
- Build a frontend that lets users subscribe to promotions (we will look at this in the next post)
Preparing to use Firebase 🔥☁️
Alright, let’s get to it. To make our script available to the world, we could build a web server that hosts the script and executes it whenever a user visits the page. But hey, it’s 2019, we don’t necessarily need a web server for this simple use case. We can just deploy our script as a cloud function so that we don’t have to worry about any server infrastructure. I personally like to use Firebase for this, not only because of its simplicity, but also for its generous free tier. In addition, Firebase also offers a database called Firestore that we can use to persist the scraped data - perfect!
After creating a firebase project, we can set up our environment locally and initialize the project.
We need to install the firebase-tools using npm install -g firebase-tools and then log in to our account with firebase login.
Writing a cloud function λ
Now we are ready to set up the project locally: mkdir myAwesomeProject && cd myAwesomeProject && firebase init functions.
We can choose either JavaScript or TypeScript to write the function - I prefer TypeScript.
There are various kinds of functions. For now, we will just use a plain HTTPS function.
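If you are curious what such a function looks like: depending on your firebase-tools version, the generated index.ts ships with a commented-out sample roughly along these lines:
// sample generated by firebase init (contents may differ between versions)
// export const helloWorld = functions.https.onRequest((request, response) => {
//   response.send("Hello from Firebase!");
// });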
In the index.ts, which was generated by the firebase init command, we need to import firebase-functions and also firebase-admin to interact with the database:
import * as admin from 'firebase-admin';
import * as functions from 'firebase-functions';
admin.initializeApp(); // needed to initialize the admin sdk
Let’s also import the scrapePromotions function that we created in the previous post (assuming it is exported from a file called scrape.ts). Since our cloud function will also be named scrapePromotions, we alias the import to avoid a name clash:
import { scrapePromotions as scrape } from './scrape';
export const scrapePromotions = functions
  .runWith({ timeoutSeconds: 30, memory: "1GB" }) // ensure we have enough resources
  .region('europe-west1') // select a region that is close to your target audience
  .https.onRequest(async (req, res) => {
    const promotions = await scrape();
    // TODO persist the promotions
    res.json(promotions); // send the JSON array as the body of the HTTPS response
  });
Deployment 🚀
This is all the code we need for our function to work! Let’s deploy it using the command line:
firebase deploy --only functions
When we now navigate to https://europe-west1-myAwesomeProject.cloudfunctions.net/scrapePromotions, it takes a couple of seconds (because the scraping function needs to control a headless browser and wait until all content is loaded), but after that we receive our desired output:
[
{
"imageUrl": "//contentimages.coop.ch/aktionenimages/images/6.336.920_Anker_Lager_Bier_15x33cl_ZTGWPS_252943_XL_DE.png",
"oldPrice": "23.90",
"price": "11.95",
"title": "Anker Lagerbier, 2 x 15 x 33 cl (100 cl = 1.21)"
},
// ...
]
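As a side note, if we want type safety for the scraped data, we could describe its shape with an interface. The field names come straight from the output above; the string types are an assumption based on the scraped values:
// Hypothetical shape of one promotion, derived from the JSON output above
interface Promotion {
  imageUrl: string;
  oldPrice: string; // prices are scraped as strings, e.g. "23.90"
  price: string;
  title: string;
}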
Storing the promotions in Firestore 💾
Almost done 😍! The only part that’s left is persisting (or rather caching) the information in the Firestore database, so that we have a faster way of accessing the data.
You won’t believe how easy that is:
await admin.firestore().doc('shops/mySupermarket').set({
  updatedAt: new Date(),
  promotions
}, { merge: true });
Note that shops is used as the collection name and mySupermarket is the document ID we chose for our supermarket. Firestore is a schemaless, document-oriented DB, so we are free to use whatever names we want. We also store a field updatedAt with the current date, so that we always know when the last scrape took place.
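To illustrate why this cache is useful: a second, hypothetical getPromotions function (not part of the original setup) could serve the stored data directly from Firestore, without spinning up a headless browser:
// Hypothetical read-only endpoint: serves the cached promotions from Firestore
export const getPromotions = functions
  .region('europe-west1')
  .https.onRequest(async (req, res) => {
    const snapshot = await admin.firestore().doc('shops/mySupermarket').get();
    const data = snapshot.data();
    res.json(data ? data.promotions : []); // respond instantly, no scraping involved
  });
This responds in milliseconds instead of seconds, since no scraping happens on the request path.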
Scheduling the function with Cloud Pub/Sub
Since we already set up a function that stores the scraped data in a database, we might as well set up a job that regularly fetches the newest promotions. We can do that with Cloud Pub/Sub: we simply change our HTTPS function into a pubsub function with a schedule. Note that the schedule parameter accepts not only a cron string, we can even use plain English to declare the interval!
export const scrapePromotions = functions
  .runWith({ timeoutSeconds: 30, memory: "1GB" })
  .region('europe-west1')
  .pubsub
  .schedule('every 4 hours').onRun(async context => {
    // function body...
  });
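For reference, here is a sketch of the same schedule in cron syntax. The timeZone call is optional (it defaults to America/Los_Angeles), and Europe/Zurich is just an example:
export const scrapePromotions = functions
  .runWith({ timeoutSeconds: 30, memory: "1GB" })
  .region('europe-west1')
  .pubsub
  .schedule('0 */4 * * *') // cron equivalent of 'every 4 hours'
  .timeZone('Europe/Zurich') // optional: evaluate the schedule in a specific time zone
  .onRun(async context => {
    // function body...
  });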
Deploy again with firebase deploy --only functions …
Aaaand that’s a wrap! We made the scraping function available, and everyone in the world can now access the data. The last part will be to build a nice frontend that users can use to stay informed about current promotions. See you soon 👋😃!
Full code of the Firebase cloud function:
import * as admin from 'firebase-admin';
import * as functions from 'firebase-functions';
import { scrapePromotions as scrape } from './scrape'; // aliased to avoid a clash with the exported function name
admin.initializeApp(); // needed to initialize the admin sdk
export const scrapePromotions = functions
  .runWith({ timeoutSeconds: 30, memory: "1GB" }) // ensure we have enough resources
  .region('europe-west1') // select a region that is close to your target audience
  .pubsub
  .schedule('every 4 hours').onRun(async context => {
    const promotions = await scrape();
    // write the data to firestore
    await admin.firestore().doc('shops/mySupermarket').set({
      updatedAt: new Date(),
      promotions
    }, { merge: true });
    return null; // scheduled functions have no HTTPS response; just signal completion
  });