Project TACOD: Devlog #1

The first project to kick off THE SIX – Training AI Captcha Over Discord.

Hey again! It’s been a while. Slightly off topic for this post, but I’m 17 now! So I guess that’s my excuse for not posting in a bit ( ̄  ̄|||) . Anyways, this post kicks off the beginning of THE SIX and I can’t wait to get started!

But what is Project TACOD?

Unless you’ve been living under a rock for the past four years or so, you’d know that AI has had a huge impact on the technology space and the world as a whole. ChatGPT, Bard, C.AI – these generative language models have popped up in every corner of our lives.

But T.A.C.O.D. isn’t about LLMs, or even generating content. This project is more interested in a slightly simpler task – recognition. For a while I’ve wanted to make a project based on Captcha. If you’ve not heard of Captcha, it stands for Completely Automated Public Turing test to tell Computers and Humans Apart – it’s the test where users have to identify twisted letters or select all the images belonging to a certain group. Can you tell what is a crosswalk and what isn’t in the sample below?

The Captcha is intended to differentiate computers from humans when browsing the internet, and while it’s far from perfect at that, it has a secondary goal of actually training an AI based on the human responses. TACOD aims to home in on this secondary goal on a smaller scale.

The aims of Project TACOD are as follows:

  1. An investigation of how Captcha works, and building a skeleton version from the ground up.
  2. Testing and evaluating how the training data an AI receives affects its decision making, and benchmarking how effective humans are at providing training data for AI.
  3. An exploration into reinforcement learning and Python’s AI libraries.

The Goal

The end goal is for the AI to be able to identify numbers in an LED-display format, such as below.

In order to achieve this, members of my Discord server will be able to participate in the experiment by answering questions. They will respond by selecting a button sent by a custom-written Discord bot layer, and each response will then be recorded and acted upon by the backend.

To the right is a sample question: “Which display looks the most like a 7?” Respondents can then click one of the 8 buttons corresponding to one of the 8 options. All responses, as well as the user IDs identifying respondents, will be recorded in a database for research purposes.
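Here’s a rough idea of how that button row could be built – a minimal sketch assuming discord.py 2.x, where names like CaptchaView and the option labels are placeholders rather than the bot’s actual code:

Python
import discord

class CaptchaView(discord.ui.View):
    """One button per LED option for a single question."""

    def __init__(self, question_id: int, num_options: int = 8):
        super().__init__(timeout=None)
        self.question_id = question_id
        # custom_id encodes the question and option so the backend
        # can record exactly what was clicked
        for i in range(num_options):
            self.add_item(discord.ui.Button(
                label=f"Option {i + 1}",
                custom_id=f"captcha:{question_id}:{i}",
            ))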

Quantifying Progress

How we measure success is an important part of the program, because without it we would not be able to end the experiment!

For each number, 0 – 9, we define a model display – what we aim for as a perfect representation. The model for a 4 is shown to the left. This has pixels ON at the co-ordinates (1, 1), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3), (4, 3), (5, 3) and OFF everywhere else on the display. However, during the experiment, the pixel displays will not start out as perfect – far from it.
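As an illustration, the model could be built directly from those co-ordinates as a NumPy array – this is just a sketch, assuming (row, column) ordering, 1-indexed co-ordinates and the 5 x 7 display used elsewhere in the project; build_model is a placeholder name:

Python
import numpy as np

# The co-ordinates above, written as (row, column) and 1-indexed
MODEL_4_PIXELS = [(1, 1), (1, 3), (2, 1), (2, 3),
                  (3, 1), (3, 2), (3, 3), (4, 3), (5, 3)]

def build_model(pixels, shape=(5, 7)):
    model = np.zeros(shape, dtype=int)  # everything OFF by default
    for r, c in pixels:
        model[r - 1, c - 1] = 1  # convert to 0-indexed and switch the pixel ON
    return model

model_4 = build_model(MODEL_4_PIXELS)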

Somewhere during the experiment, what the program thinks a ‘4’ is might look something like the display on the right instead. This has some pixels in the correct place, some missing from the model ‘4’, and some extra pixels in the wrong place. So, we need a way to quantify how close this display is to the model display to represent our success at recognising ‘4’s.

The pixels in the correct place are labelled green, the pixels missing from the model are orange, and the extra pixels in the wrong place are red. We can then create a function that generates a value for how closely this display matches the model, based on the counts of each colour of pixel:

Here N is the number of pixels in the display and N_colour is the number of pixels of the given colour. The range of this function is |M(N)| < 1. This function is used to measure how close we are to the model LED.
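The exact formula lives in the image above, but as a hedged sketch of the idea – correct pixels push the score up, missing and extra pixels pull it down, everything normalised by N – one plausible version (my own guess, not necessarily the project’s formula) looks like this:

Python
import numpy as np

def match_score(display, model):
    # N: total number of pixels on the display
    n = display.size
    green = np.sum((display == 1) & (model == 1))   # pixels in the correct place
    orange = np.sum((display == 0) & (model == 1))  # pixels missing from the model
    red = np.sum((display == 1) & (model == 0))     # extra pixels in the wrong place
    # Correct pixels add to the score, missing and extra pixels subtract,
    # and dividing by n keeps the result inside the stated range
    return (green - orange - red) / n

This is only one way to satisfy the description – the real function may weight the colours differently.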

Timeline of Progress

Here’s a timeline of the progress made on Project TACOD so far.

5th April 2024 – Project Inception Date

21st July 2024 – Logo design for the project! The beta testers seemed to like it; let me know your thoughts in the comments below!

22nd July 2024 – v1.0.0 Commit. A simple /ping command which (later) measured the latency of a command from being sent in the Discord client to being received by the program, in milliseconds, using the datetime module. Also the first version of Display(), which uses a 5 x 7 2D NumPy array to represent ON and OFF pixels.
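As a very rough sketch of what that could look like (the real class almost certainly has more to it – the method names here are illustrative):

Python
import numpy as np

LED_DIMENSIONS = (5, 7)  # the 5 x 7 pixel grid described above

class Display:
    """An LED display backed by a 2D NumPy array of ON (1) / OFF (0) pixels."""

    def __init__(self, dim=LED_DIMENSIONS):
        self.matrix = np.zeros(dim, dtype=int)

    def set_pixel(self, iy, ix, on=True):
        self.matrix[iy, ix] = 1 if on else 0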

28th July 2024 – v1.0.1 Commit. Addition of Question(), which has an ID and a set of LED options. The user interface that the Discord bot layer uses to ask questions is designed, with the /captcha function.

30th July 2024 – v1.0.2 Commit. Addition of Response(), which has an ID, another ID for the question the response relates to, a Discord user ID, and a datetime object. Questions and Responses are now stored in a sqlite3 database that can be saved to and pulled from.

Python
def fetch_question(id):
    # Parameterised query – safer than interpolating the ID into the SQL string
    tup = cursor.execute("SELECT * FROM questions WHERE question_id = ?", (id,)).fetchone()
    # Rebuild the Question from the row: its ID, its second field, and the LED
    # options, which are stored as emoji text separated by blank lines
    return Question(tup[0], tup[1], [create_display_from_emojis(t) for t in tup[2].split('\n\n')])

The database structure is shown in the image below, with primary keys in gold and foreign keys in purple. Each of the fields also has a data type and a maximum length.

And here’s a look at the Response table in the data.db database file, which mirrors the image above. The only difference is the datetime field, which is stored as a timestamp.
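The image is the source of truth for the schema, but roughly speaking (the column names below are my own guesses based on the classes described above, not copied from the repo), the tables could be created like this:

Python
import sqlite3

connection = sqlite3.connect("data.db")
cursor = connection.cursor()

cursor.execute("""CREATE TABLE IF NOT EXISTS questions (
    question_id INTEGER PRIMARY KEY,
    target_number INTEGER,
    options TEXT
)""")
cursor.execute("""CREATE TABLE IF NOT EXISTS responses (
    response_id INTEGER PRIMARY KEY,
    question_id INTEGER REFERENCES questions(question_id),
    user_id INTEGER,
    timestamp REAL
)""")
connection.commit()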

1st August 2024 – Database functions are made asynchronous with asyncio.Lock() to prevent any concurrency issues, so that only one read or write can happen at once. This approach should be used with caution if there are thousands or millions of read/write requests per second, but for a project like this a lock is sufficient.
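A minimal sketch of that locking pattern (assuming module-level cursor and connection objects; the function name is illustrative):

Python
import asyncio

db_lock = asyncio.Lock()

async def save_response(response):
    # Only one coroutine can hold the lock at a time, so reads and
    # writes to the sqlite3 database never overlap
    async with db_lock:
        cursor.execute(
            "INSERT INTO responses VALUES (?, ?, ?, ?)",
            (response.id, response.question_id, response.user_id, response.timestamp),
        )
        connection.commit()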

The beginnings of the /help function…

What are the next steps?

For v1.0.3 and beyond, we have to begin working on the decision making that follows from these responses. As we work towards the model arrays, the program should store a separate array of confidence values – how strongly the program thinks each pixel should be ON or OFF. A value of 1.0 means absolute certainty that a pixel is ON and a value of 0.0 means absolute certainty that it is OFF. Here’s a sample:

There is also a constant in the program called the CONFIDENCE_THRESHOLD. If a pixel’s confidence value is above this threshold, we call that pixel ON. For the array above, a threshold of 0.15 would result in the LED display snippet shown to the right.
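In code, that thresholding step could be as simple as the sketch below (the constant exists in the program; the function around it is just an illustration):

Python
import numpy as np

CONFIDENCE_THRESHOLD = 0.15

def confidences_to_pixels(confidence_array):
    # Any pixel whose confidence value exceeds the threshold is called ON
    return (confidence_array > CONFIDENCE_THRESHOLD).astype(int)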

The threshold value is computed by some function f(x) of how many responses we have gathered, such that f(x) increases exponentially as x increases. So at the beginning (fewer than ~100 responses), the threshold will be low (something like 0.1 as an estimate), but as the number of responses grows, the threshold at which a confidence value counts as an ON pixel increases – meaning that, in theory, the accuracy should tighten and improve as more responses come in.

The graph above is what I imagine f(x) could look like: some kind of exponential function that ranges between 0 and 1 over a few thousand responses, plotted in Desmos.
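As a guess at the shape (the constants are placeholders – the real f(x) hasn’t been pinned down yet), a saturating exponential that starts around 0.1 and levels off below 1 would fit the description:

Python
import math

def threshold(num_responses, low=0.1, high=0.9, rate=1 / 1000):
    # Starts near `low` with few responses and climbs towards `high`
    # as more responses are gathered
    return high - (high - low) * math.exp(-rate * num_responses)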

In addition to this system, we also need to make ‘starter’ LEDs for when we have limited data to work from at the beginning of the experiment. This is essentially a complicated noise-generator style function, which is then pushed into a function called generate_using_weights().

Python
import random
import numpy as np

def generate_using_weights(weights_array, dim=LED_DIMENSIONS):
    d = Display(dim)  # start with a blank display
    for iy, ix in np.ndindex(weights_array.shape):
        # Turn this pixel ON with probability equal to its weight
        if random.random() < weights_array[iy, ix]:
            d.matrix[iy, ix] = 1
    return d

This function takes in a confidence value array and turns each pixel ON with probability equal to its value – you could say that a confidence value of 0.12 gives a 12% chance of being ON. Here’s a randomly filled weights_array and the LED that we got back:

Then, we can use a line generator to increase the confidence value for certain straight lines in the array to give the LED look a better chance to generate than random soup. As of the time of writing, the line generator is still in the works!

Essentially, a response will nudge the confidence values for a given number by a small amount, based on the question, the chosen option, and a handful of other variables. Over time this changes what the program actually thinks a given model LED looks like. With enough responses and confidence-value fine-tuning, the program can learn which pixels should be ON and which should not.
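As a hedged sketch of what that nudge could look like (the actual update rule, including those other variables, isn’t finalised – LEARNING_RATE and apply_response are placeholder names):

Python
LEARNING_RATE = 0.05  # how strongly one response moves the confidence values

def apply_response(confidence_array, chosen_display):
    # Both arguments are NumPy arrays of the same shape: the current
    # confidence values and the 0/1 pixels of the option the user picked.
    # Each confidence value is nudged a small step towards that option.
    return confidence_array + LEARNING_RATE * (chosen_display - confidence_array)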

You can view the source code for TACOD at its latest version on my GitHub repo.

Thanks so much for reading this mega blog post, and I’ll see you in the next one! \( ̄▽ ̄)/
