73% accuracy for redaction object detection

I made some progress on my redaction model.

Alex Strick van Linschoten


December 11, 2021

Last time I wrote about my redaction model training project, I explained how I used Prodigy to annotate and label a bunch of images. I subsequently spent a long evening going through the process, getting to know my data. I managed to make 663 annotations, though quite a few of those were negative annotations: I was stating that a certain document contained no redactions at all.

Once I had my redactions, I needed to convert the annotations from Prodigy's export format into the COCO annotation format. I am using IceVision, a really useful computer vision library, and it is easiest to pass it annotations in COCO format.
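A minimal sketch of what that conversion looks like in pure Python. The field names here (`image`, `spans`, `points`, `answer`) follow what Prodigy's image annotation recipes typically export via `db-out`, but they're assumptions — check your own export, since the exact shape varies by recipe and version:

```python
import json

def prodigy_to_coco(records, label_names):
    """Convert accepted Prodigy image annotations to a COCO-style dict.

    `records` is an iterable of dicts, one per exported Prodigy task.
    Polygon `points` are collapsed to their bounding box, since COCO
    object detection wants [x, y, width, height] boxes.
    """
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(label_names)]
    cat_ids = {name: i + 1 for i, name in enumerate(label_names)}
    images, annotations = [], []
    ann_id = 1
    for img_id, rec in enumerate(records, start=1):
        if rec.get("answer") != "accept":
            continue  # skip rejected/ignored tasks
        images.append({
            "id": img_id,
            "file_name": rec["image"],
            "width": rec["width"],
            "height": rec["height"],
        })
        # Accepted negatives simply have no spans: the image is kept,
        # with no annotation entries attached to it.
        for span in rec.get("spans", []):
            xs = [p[0] for p in span["points"]]
            ys = [p[1] for p in span["points"]]
            x, y = min(xs), min(ys)
            w, h = max(xs) - x, max(ys) - y
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[span["label"]],
                "bbox": [x, y, w, h],  # COCO convention: [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return {"images": images, "annotations": annotations, "categories": categories}
```

The resulting dict can then be dumped with `json.dump` and pointed at by IceVision's COCO parser.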

From that point, it was fairly easy to follow the steps of the object detection tutorial outlined in the IceVision documentation. I ran into some problems with Paperspace Gradient not easily installing and importing IceVision. For some reason the mmdetection config files don't get unzipped on Paperspace, but it's possible to just do this manually:

cd /root/.icevision/mmdetection_configs/
rm v2.16.0.zip
wget https://github.com/airctic/mmdetection_configs/archive/refs/tags/v2.16.0.zip
unzip v2.16.0.zip

Then run the import again as normal. Later on, another error will get raised. Fix it with this (again in the terminal):

jupyter nbextension enable --py widgetsnbextension

This enables ipywidgets in the notebook, I think.

Once through all of that, I was able to fine-tune a model based on the annotations I currently have. I selected VFNet as the pretrained model I wanted to use. After training for 40 epochs, I reached an accuracy of 73%:
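For reference, the fine-tuning step is a close variant of what the IceVision object detection tutorial shows. This is a sketch, not my exact notebook: `train_ds`/`valid_ds` are assumed to be IceVision `Dataset`s built from the parsed COCO annotations, and `class_map` comes from the parser; batch sizes and learning rate are illustrative.

```python
from icevision.all import *

# Pick VFNet with one of its pretrained backbones from the mmdet models
model_type = models.mmdet.vfnet
backbone = model_type.backbones.resnet50_fpn_mstrain_2x

# Dataloaders over the parsed train/valid Datasets (assumed to exist)
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)

model = model_type.model(backbone=backbone(pretrained=True),
                         num_classes=len(class_map))

# COCO-style mAP metric, then fine-tune with the fastai training loop
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl],
                                  model=model, metrics=metrics)
learn.fine_tune(40, 1e-4, freeze_epochs=1)

# Eyeball predictions against validation images
model_type.show_results(model, valid_ds)
```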

If we look at some of the results (using model_type.show_results()) we can get a sense of the parts it found easy and the parts it found hard. (All the boxes below are what it predicted, not the ground truth annotations.) Some of the box predictions went as you might expect:

I was surprised that something like this worked as well as it did:

It wasn’t perfect, but I don’t remember having annotated too many of this specific redaction type, so I’m fairly happy with how it worked out. You can see it still makes a number of mistakes and isn’t always precise about where the boxes should go. I hope that’ll improve as I add more examples of this type of redaction.

My next steps for this project include the following: