Detectors are live!

The “Detectors” feature, which helps Algiz gain more context about your data, is now live.

Aug 26, 2024

I would like to open this one with a personal note. I haven’t been very active lately, in either development or writing. This is exactly the opposite of what I want to do with this project, especially this early. To philosophize a little on the reasons without revealing too much (maybe a day for that will come, too):

Life sometimes throws things at you in unexpected ways at the worst times. It is OK to react at first. Our primal brain is simply scared of change and will release a bunch of chemicals to make us stressed and anxious, and to force us to plan our next move immediately. But it is important to reflect once the first phase is over. Reflect on what these things, these changes, mean for you. Take them in without emotion. Assess. Only then speak, act or make decisions. Unless those things are a hungry pack of wolves looking right at you. In that case, run.

Now back to the main topic:

A GIF announcing the detectors feature on Algiz

“Detectors” are live!

A detector is what Algiz uses to detect sensitive data in the form of e.g. email addresses, phone numbers etc. Previously, Algiz shipped with pre-defined detectors and these were always on, so users didn’t have much choice as to what Algiz can detect for them. But as of now, users can expand the detection engine with their custom detectors!

Let’s say you are an organization operating in a country with strict regulations on protecting its citizens’ ID numbers. You can now write a custom detector as a regular expression that detects any such ID number.

This will help Algiz have much more context about what you consider sensitive data in your organization. The feedback I often receive is that context is key for DLP solutions. If a DLP tool automatically prevents a marketing file from being shared, it is creating more work than value. That’s why I have a few initiatives in mind to give Algiz more context about the sensitivity of the data it scans, so that it provides real value instead of being yet another tool that needs managing and creates extra work.

Ways to write detectors

Currently, there are two ways to write detectors. You can either define a regular expression or a list of words. Algiz will then take that context into account whenever it scans a new file. I acknowledge that this is not enough to cover the whole ground but we are just getting started! Stay tuned for future improvements.
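To make the two detector types a bit more concrete, here is a minimal sketch of how a regex detector and a word-list detector could be matched against text. This is not Algiz’s actual code: the `Detector` class, its field names, the `"regex"`/`"words"` type names and the ID format in the example are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Detector:
    """Hypothetical detector model; not Algiz's real data model."""
    name: str
    kind: str        # "regex" or "words" (assumed type names)
    expression: str  # a regex pattern, or comma-separated words

    def find_matches(self, text: str) -> list[str]:
        if self.kind == "regex":
            # group(0) always returns the full match, even if the pattern has groups.
            return [m.group(0) for m in re.finditer(self.expression, text)]
        if self.kind == "words":
            words = [w.strip() for w in self.expression.split(",") if w.strip()]
            if not words:
                return []
            # Escape each word and match it as a whole word, ignoring case.
            pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
            return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
        raise ValueError(f"unknown detector kind: {self.kind}")


# A made-up ID format (3-2-4 digits), purely for the example.
national_id = Detector("National ID", "regex", r"\b\d{3}-\d{2}-\d{4}\b")
project_names = Detector("Project names", "words", "bluebird, nightowl")

sample = "Customer 123-45-6789 was moved to project Bluebird."
print(national_id.find_matches(sample))    # ['123-45-6789']
print(project_names.find_matches(sample))  # ['Bluebird']
```

Turning the word list into a single escaped, case-insensitive pattern keeps both detector types on the same matching path, which is one possible design and not necessarily the one Algiz uses.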

UX

Each tenant starts off with a set of pre-defined detectors. These detectors can neither be edited nor deleted. However, they can be “disabled” which essentially means removing them from Algiz’s detection engine. Since these cannot be edited or deleted, they will always exist in Algiz’s detectors database and can be enabled when needed. Detectors do not work retrospectively (at least not yet). Any time a detector is turned off, Algiz will stop detecting that specific type of sensitive information. This information will also be excluded from your Data Catalog. My thinking was that if someone turns off a detector, it is because they most likely don’t consider that type of data sensitive in their environment.

A user is expected to create their detectors before they enable any integrations. This, however, doesn’t mean that you can’t create detectors after you have enabled one or more integrations.

Creating a new detector is as easy as giving it a name, choosing its type and entering the expression, whether it be a deny list (entered as comma-separated values) or a regex.
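As a rough illustration of what that form input might go through before it is saved, here is a small validation sketch. The function names, the `"regex"`/`"deny_list"` type names and the returned dictionary shape are hypothetical; only the inputs themselves (a name, a type, and either a regex or comma-separated words) come from the feature as described above.

```python
import re


def parse_deny_list(raw: str) -> list[str]:
    """Split comma-separated input into trimmed, case-insensitively unique words."""
    seen = set()
    words = []
    for part in raw.split(","):
        word = part.strip()
        key = word.lower()
        if word and key not in seen:
            seen.add(key)
            words.append(word)
    return words


def validate_detector(name: str, kind: str, expression: str) -> dict:
    """Return a cleaned-up detector dict or raise ValueError (hypothetical shape)."""
    if not name.strip():
        raise ValueError("a detector needs a name")
    if not expression.strip():
        raise ValueError("a detector needs an expression")
    if kind == "regex":
        try:
            re.compile(expression)  # reject patterns that do not compile
        except re.error as exc:
            raise ValueError(f"invalid regex: {exc}") from exc
        return {"name": name.strip(), "type": "regex", "expression": expression}
    if kind == "deny_list":
        words = parse_deny_list(expression)
        if not words:
            raise ValueError("the deny list contains no words")
        return {"name": name.strip(), "type": "deny_list", "words": words}
    raise ValueError(f"unknown detector type: {kind}")


print(validate_detector("Codenames", "deny_list", "bluebird, nightowl,, Bluebird "))
```

Compiling the regex at creation time means a broken pattern is rejected in the form, not discovered later during a scan.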

A screencast of detectors feature on Algiz dashboard

Process

This took me a bit longer than I initially thought it would, both due to my inability to debug effectively and other life-related events. And it is still rough around the edges. Humans are notoriously bad at estimation, so I will cut myself some slack.

I don’t really like to plan everything about a feature right at the very beginning. Once I have a rough idea of the thing that I want to build, which I usually write down in a few sentences, I just start building it. To me, that initial drive is very important as it can die down very quickly, so I try to ride the wave of that initial dopamine release. That drive fading is another reason this took a bit too long to build.

I use MongoDB as the database for Algiz. It is a technology I am fairly familiar with, and I find it quite easy to build a database with it. Since detectors are tightly coupled with the detection engine, I decided to add another collection called “detectors” to the “analyzer” database, created the data model and set out to build the API. Apart from all the CRUD operations, I wanted users to be able to enable/disable detectors as well, which essentially adds the detector to or removes it from Algiz’s detection engine. According to my dev log, this is where I first started to run into issues. I was sending the wrong detector ID to the backend. Now, you might ask why there are multiple detector IDs in the first place. Well, remember what I said about MongoDB being easy to use? That was only partially true. If you are coupling MongoDB with FastAPI, you run into a few issues with IDs. But perhaps I will leave that story for another post. I ended up removing the wrong detector ID and everything started working correctly.

Building the UI for detectors (although it looks ugly as of now) was rather simple. I had already used a DataGrid component from Material UI for the Analyses and Alerts views as well as the Data Catalog view. So, I just replicated the same behavior and populated it with the correct data. Adding that little enable/disable switch took a bit of going back and forth, but I got that working pretty fast as well. I must give credit to GitHub Copilot though, as it has taught me a lot about UI development. It literally took me from a rotating “Hello world” written in red with HTML/CSS/JS to building a dashboard in React.

I realized pretty quickly that I had to add some guardrails around detectors, as they have a huge impact on how Algiz performs. I like to tinker with other web apps to find those small loopholes, so it is only fair that I do the same to my own web application. So, I drew up a list of what a user shouldn’t be able to do, e.g. delete a default detector. I tried to restrict those actions in both the frontend and the backend. I should really look into integrating a testing framework into my development flow.
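For a sense of what such backend guardrails can look like, here is an in-memory sketch: default detectors can be toggled but not edited or deleted, and only enabled detectors count as active. The class and method names are hypothetical, and the real thing sits behind an API and MongoDB, which this sketch deliberately leaves out.

```python
from dataclasses import dataclass


class GuardrailError(Exception):
    """Raised when a request tries something a user shouldn't be able to do."""


@dataclass
class StoredDetector:
    id: str
    name: str
    is_default: bool = False
    enabled: bool = True


class DetectorStore:
    """Hypothetical in-memory stand-in for the detectors collection."""

    def __init__(self, detectors):
        self._by_id = {d.id: d for d in detectors}

    def _get(self, detector_id: str) -> StoredDetector:
        if detector_id not in self._by_id:
            raise GuardrailError(f"unknown detector: {detector_id}")
        return self._by_id[detector_id]

    def set_enabled(self, detector_id: str, enabled: bool) -> None:
        # Toggling is allowed for every detector, default or custom.
        self._get(detector_id).enabled = enabled

    def rename(self, detector_id: str, new_name: str) -> None:
        detector = self._get(detector_id)
        if detector.is_default:
            raise GuardrailError("default detectors cannot be edited")
        detector.name = new_name

    def delete(self, detector_id: str) -> None:
        if self._get(detector_id).is_default:
            raise GuardrailError("default detectors cannot be deleted")
        del self._by_id[detector_id]

    def active(self) -> list[StoredDetector]:
        # Only enabled detectors would be handed to the detection engine.
        return [d for d in self._by_id.values() if d.enabled]


store = DetectorStore([
    StoredDetector("email", "Email address", is_default=True),
    StoredDetector("custom-1", "National ID"),
])
store.set_enabled("email", False)
print([d.name for d in store.active()])  # ['National ID']
```

Keeping the checks in one place on the server side means they still hold if someone bypasses the frontend and calls the API directly.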

Once everything seemed to be working on my machine, i.e. locally, it was time to test it on someone else’s computer, i.e. on the cloud. Aaand of course, I ran into issues. It turned out my server configuration didn’t allow PUT requests, which I was using for updating detectors. But luckily, that was an easy fix.

What’s next?

I realize that reading about a product and its development is much more boring than actually using and testing the product itself. This is why I am planning on making a demo website for Algiz that’s not behind an auth gate! Obviously, it will have to be in some sort of a “safe mode” meaning you won’t be able to add/remove anything on the platform e.g. you can’t mess with the integrations or detectors. But I think it would be fun to play around with the data catalog and see it in action (with test data).

For now, though, I am working on some very fun UI updates, trying to give Algiz, especially the data catalog, a more modern look with much more useful charts and an improved data table.

Until next time.

Building in public: Algiz DLP
