Strix is an open-source AI pentesting tool that utilises an LLM together with standard tools to pentest software, repos and servers.
You can use Strix with most LLMs, such as those served by OpenAI, Anthropic or Ollama. Note that on the Strix Discord channel, people report mixed results when using Ollama with models like Qwen; infinite loops are mentioned. Then again, it is free, so it might be worth getting it to work. I will cover this in a future article.
Install Strix on a VPS
To use Strix, you can install it on Linux or Mac; I don’t think Windows is supported. You can install Strix on a home server or, as I do, on a cloud VPS. I like the idea of having an init-script that I can quickly use to spin up an AI pentesting VPS!
This approach allows you to run professional pentests on your website or codebase for just a few dollars, without installing anything on your personal machine.
Steps
I’ll demonstrate the process using UpCloud, a Finnish cloud provider, but the same steps should work on any Ubuntu system and cloud provider.
Steps on UpCloud:
- Go to Servers / Cloud Servers and click Deploy Server
- Choose region (e.g. FI-HEL1)
- Pick plan with 8 GB Memory or more.
- I use the Developer plan. On this plan you pay for stopped servers, but I will delete the server after use
- If you pentest often, you can choose Cloud Native and stop the server after use, which allegedly turns off billing for stopped servers.
- Keep Storage as default and skip backups (server will only exist briefly)
- Choose Ubuntu Server 24.04 LTS (Noble Numbat) or similar as the Operating System (OS)
- Leave Network and Optionals as default (Metadata service must be enabled for the cloud-init script in the next section to work)
- Login Method: select SSH and add your public key
- or select previously saved key
- Use this init-script:
- Replace LLM_API_KEY and STRIX_LLM values with your own
- Click Add as saved script if you want to reuse it for future VPSs you spin up.
- Click Deploy
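For reference, the overall shape of such an init-script looks roughly like the cloud-init sketch below. The package names and the install command (pipx install strix-agent) are assumptions on my part; use the Gist shared at the end of this article for the authoritative version, and fill in your own LLM_API_KEY and STRIX_LLM values:

```yaml
#cloud-config
# Hedged sketch of a Strix provisioning script, not the real Gist.
# Install command and package names are assumptions.
package_update: true
packages:
  - pipx
  - docker.io
write_files:
  - path: /etc/profile.d/strix.sh
    content: |
      export STRIX_LLM="openai/gpt-4o-mini"   # replace with your model
      export LLM_API_KEY="sk-..."             # replace with your key
runcmd:
  - pipx install strix-agent
  - pipx ensurepath
```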
Wait a couple of minutes for the cloud-init script to finish running.
If Strix fails to install
You can check the init logs for errors:
# check that the strix command is available
which strix
# If not, view logs produced by init-script
cat /var/log/cloud-init-output.log
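The log can be long, so grepping for failures narrows it down quickly. The snippet below demonstrates the filter against a fabricated log file; on the VPS, point grep at /var/log/cloud-init-output.log instead:

```shell
# Demo of the filter against a fabricated log; on the VPS, run
# grep -iE 'error|fail|traceback' /var/log/cloud-init-output.log instead.
cat > /tmp/cloud-init-demo.log <<'EOF'
Installing strix...
ERROR: package install failed
Cloud-init finished
EOF
grep -iE 'error|fail|traceback' /tmp/cloud-init-demo.log
```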
When I was developing the init-script I could clearly see from the logs what I needed to fix.
Run pentest
Now we are ready to run a pentest. First, SSH into your newly-created VPS.
ssh root@IP_ADDRESS
After you have logged in, run Strix and watch it go to work pentesting your web application. You can also pentest code, but I will cover that in another article.
strix --target https://example.com
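Strix reads its model and key from the STRIX_LLM and LLM_API_KEY values baked into the init-script. If you want to switch models for a single session, you can export them inline before launching; the key below is a placeholder:

```shell
# Override model/key for this session (values are placeholders)
export STRIX_LLM="openai/gpt-4o-mini"
export LLM_API_KEY="sk-..."
# sanity-check before kicking off a paid run
[ -n "$STRIX_LLM" ] && [ -n "$LLM_API_KEY" ] && echo "LLM config set"
# strix --target https://example.com   # then launch on the VPS
```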
The process takes roughly 10-15 minutes and costs around $1 when using openai/gpt-4o-mini.
In the above, we used Strix’s default behaviour. Optionally, you can provide custom instructions to the agent using the --instructions or --instruction-file flag:
# Focused testing with custom instructions
strix --target https://example.com --instructions 'try SQL injection on the server'
# Provide detailed instructions through file (e.g., rules of engagement, scope, exclusions)
strix --target https://api.example.com --instruction-file ./instruction.md
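A hypothetical instruction.md might look like the following. The contents are purely illustrative, not a Strix template; write whatever rules of engagement fit your engagement:

```markdown
# Rules of engagement
- Scope: https://api.example.com only; do not touch other subdomains
- SQL injection: SELECT-based payloads only, never DROP/DELETE/UPDATE
- No denial-of-service or brute-force attacks
- Stop and report immediately if live credentials are discovered
```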
You can inspect results directly in Strix’s terminal UI. After the test is completed, Strix will write report files to the directory /root/strix_runs. You can download the report files using scp.
# on your own machine
# replace IP_ADDRESS and RUN_ID with actual values
scp -r root@IP_ADDRESS:/root/strix_runs/RUN_ID .
This downloads the following files to your computer:
.
├── events.jsonl
├── vulnerabilities
│   └── vuln-0001.md
└── vulnerabilities.csv
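Once downloaded, you can do a quick triage from the shell. The CSV columns below are invented for illustration (check the header of your own vulnerabilities.csv), and strix_run_demo stands in for your downloaded RUN_ID directory:

```shell
# Hypothetical triage; the CSV column layout is an assumption, not
# Strix's documented schema. Replace strix_run_demo with your RUN_ID dir.
mkdir -p strix_run_demo
cat > strix_run_demo/vulnerabilities.csv <<'EOF'
id,severity,title
vuln-0001,high,SQL injection in /login
EOF
# count findings, skipping the header row
tail -n +2 strix_run_demo/vulnerabilities.csv | wc -l
```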
Does it work?
I want it to work, but the truth is that I’m not super impressed with my first few runs.
Results can likely be improved by providing clear instructions to the agent with the --instructions or --instruction-file flags. For example, you can provide rules of engagement, so SQL injection sticks to SELECT queries and not DROP TABLE queries.
Another fair disclaimer is that I used openai/gpt-4o-mini to save money, which may just be too weak a model to work for pentesting. The Strix GitHub repo mentions that the best results are obtained with the following models:
- openai/gpt-5.4
- anthropic/claude-sonnet-4-6
- vertex_ai/gemini-3-pro-preview
Finally, I saw that agents tried to use tools that were not available. It was not super obvious how to fix this. Do you need to install tools like nmap on the host system? If the tools come in via Docker, why were the requested tools not available?
In a future article, I will deep dive into the results produced by Strix and also how to improve them.
Reusable init-script
I have shared an init-script via a GitHub Gist. Use it to spin up a pentesting server in minutes, run your tests, and tear down the VPS when you are done. Then repeat another day without manual setup.
I used YAML (supported by UpCloud), but the setup can be translated into a shell script for existing machines. It assumes a fresh server, so idempotency is intentionally omitted for simplicity.
Before use, update the config with your LLM API key. The example uses openai/gpt-5.4, but you can switch to another model. Just remember to match the API key. I will discuss Ollama in a future article and perhaps publish an init-script for that setup.