During the first week of March 2026, Metal Toad held its semiannual hackathon. The theme was Internal Agentic Application, which meant we had to use AI to create a solution for an internal business problem that impacted multiple teams. We had three days to ship something working.
I teamed up with five other people: Alex Rothier, Austin Verdi, Calli Vieira, David Dolan, and Talia Lippincott. Our team was diverse: three software engineers, a project manager, a marketing coordinator, and an account manager. We called ourselves the Handover Alliance.
After some brainstorming, we landed on a pain point that everyone at the company recognized: handoff documentation.
So what is handoff documentation? At Metal Toad, it's essentially our proof of performance, a document that summarizes what was built for a client, how it works, and how to maintain it. Think of it as the instruction manual you hand over when a project wraps up.
This documentation matters for several reasons:
So why is handover documentation a current problem for our business? Writing these documents is time-consuming, time-sensitive, and unstandardized, and it can get delayed or, less often, forgotten entirely. Someone has to sit down and write an N-page document explaining everything that was built, and that can take days, sometimes weeks.
But here's the thing: most of the information already exists. It's in the codebase, in the README files, in the infrastructure configs, in the SOW itself. It just needs to be extracted, organized, and written up in a client-friendly format.
The Handover Alliance believes that our Customer Handover Documentation should be Clear, Concise, and Easier for our teams to create.
We understood from the start that this tool wouldn't replace the developer writing the handoff. But it would give them a solid starting point instead of staring at a blank page. The goal was to go from "I need to write this entire document from scratch" to "I need to review and polish what was automatically captured from all of our hard work and generated up to this point."
The plan was straightforward: use a project's GitHub repositories and its SOW as the base for generating the handoff document. The tool would:
No manual writing. No copy-pasting from READMEs. Just AI doing the heavy lifting.
We split the work based on strengths. As a software engineer, I worked on the frontend, the API Gateway (the layer that connects the frontend to the backend), and the APIs. David and Alex, the other two engineers, tackled the backend pipeline: Alex worked on the step that analyzes GitHub repositories, David handled the step that parses the SOW document, and the rest was split between the two of them. Austin, Calli, and Talia helped design the flow and features on our Miro board, proposed features for both frontend and backend, gathered examples of previous handoff documents we could test with, created the presentation slides, and helped present at the end.
I built a React + TypeScript web application using Material UI for the interface components. We originally planned to use Next.js as the framework, but quickly realized it wouldn't work for our setup. Next.js is a full-stack framework that needs a server running to handle things like server-side rendering and API routes, which is great when you have a Node.js server, but we were hosting our frontend as static files in an S3 bucket behind CloudFront. S3 can only serve static files (HTML, CSS, JavaScript); it can't run a server. So we switched to Vite, which is a pure build tool that compiles everything into static files that S3 can serve directly. Same React code, just a different way of packaging it.
The design follows a minimalist, futuristic aesthetic with Metal Toad's signature orange as the primary color. It has three main pages:
The UI is fully responsive. On mobile, the sidebar collapses into a hamburger menu, page titles move to the top bar, and buttons stack vertically for better touch targets. We added light/dark mode support with a toggle in the sidebar (for the vampire devs such as myself). The app also has proper metadata and SEO tags, and the form page URLs are shareable; you can copy the link and send it to someone else if needed.
The main goal with the frontend was to make it as complete as possible so that future improvements would only require changes to the backend code.
For file uploads, we implemented something called a presigned URL flow to handle large files. Instead of sending the file through our API (which has a 10MB size limit), the frontend asks the backend for a temporary upload link, uploads the file directly to Amazon S3 (cloud storage), and then tells the backend to proceed. This lets users upload files of any reasonable size without hitting limits.
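As a sketch, the backend half of that flow is small. This is a hedged Python example assuming boto3; the bucket name and key layout here are illustrative, not our actual implementation:

```python
# Hypothetical bucket and key layout; both names are assumptions.
UPLOAD_BUCKET = "handoff-uploads"

def build_upload_key(job_id: str) -> str:
    """Where the frontend's ZIP will land for a given job."""
    return f"uploads/{job_id}/input.zip"

def presign_upload(job_id: str, expires_in: int = 900) -> dict:
    """Return a temporary PUT URL the browser can upload to directly.

    boto3 is imported lazily so the pure helper above stays importable
    without AWS dependencies.
    """
    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": build_upload_key(job_id)},
        ExpiresIn=expires_in,  # seconds; keep the link short-lived
    )
    return {"uploadUrl": url, "key": build_upload_key(job_id)}
```

The frontend then PUTs the ZIP straight to `uploadUrl` and calls a separate "complete" endpoint with the returned key, so the file never passes through API Gateway.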
And here's the fun part: the entire frontend was built using Kiro (an AI-powered IDE from AWS) with my detailed instructions. Now, Kiro is extremely helpful and I wouldn't have finished in such a short amount of time without it, but it was a two-person job (or one person, one robot job). You need to know how to guide it, what to ask for, what you want, and you need to know the technologies well enough to review the output and tell Kiro when something isn't right (as you would with any other AI).
I won't go too deep into the APIs, but I created an API Gateway with several endpoints to make all the frontend features work with AWS services: listing records, creating new generations, managing gold standards, downloading files, and checking job status.
This is where it gets interesting. The backend is a serverless pipeline, meaning there are no servers to manage, just code that runs on demand, orchestrated by AWS Step Functions (a service that coordinates multiple steps in a workflow). The heavy lifting is done by AI agents built on the Strands SDK (an open-source framework for building AI agents) and Amazon Bedrock (AWS's managed AI service).
The Step Function has four main steps. The first gathers the input data. The second and third run in parallel: one for analyzing GitHub repos, one for parsing the SOW. The fourth step takes the results from both and generates the final document.
Here's the detailed flow:
1. Upload & Trigger
When a user submits the form, the frontend packages everything into a ZIP file containing a github_urls.json file (with the repository URLs) and the SOW PDF. This ZIP gets uploaded to an S3 bucket. As soon as the file lands, an S3 event notification triggers a Lambda function (a small piece of code that runs in the cloud) that kicks off the Step Function pipeline. You might wonder, why not trigger the Step Function directly from S3? The reason is that we needed to control the execution ID. The frontend generates a unique job ID when the user submits the form, and that same ID is used as the Step Function execution name. This way, the frontend knows exactly which execution to poll for status updates. If S3 triggered the Step Function directly, it would get an auto-generated ID and the frontend would have no way to track it.
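A trigger Lambda in that spirit can be a few lines. This is a hedged sketch, assuming the object key embeds the frontend's job ID (an assumed `uploads/<job_id>/input.zip` layout) and a placeholder state machine ARN:

```python
import json
import urllib.parse

# Placeholder ARN; the real account and name are assumptions.
STATE_MACHINE_ARN = (
    "arn:aws:states:us-east-1:123456789012:stateMachine:handoff-pipeline"
)

def job_id_from_key(key: str) -> str:
    """Recover the frontend-generated job ID from the S3 object key.

    Assumes the key layout uploads/<job_id>/input.zip.
    """
    return urllib.parse.unquote_plus(key).split("/")[1]

def handler(event, context):
    """S3 ObjectCreated notification -> start the Step Function.

    The execution *name* is set to the job ID, so the frontend can poll
    that exact execution instead of an auto-generated one. Names must be
    unique per state machine, which also gives free duplicate protection.
    """
    import boto3  # lazy import keeps the pure helper testable without AWS

    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    job_id = job_id_from_key(key)

    boto3.client("stepfunctions").start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=job_id,
        input=json.dumps({"bucket": bucket, "key": key, "jobId": job_id}),
    )
    return {"jobId": job_id}
```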
2. Metadata Extraction
The first step opens the ZIP, extracts the GitHub URLs, and passes them to the next steps.
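A minimal version of that extraction step, using Python's standard zipfile module. The `github_urls.json` name comes from the upload format described above; the PDF's exact filename inside the ZIP is an assumption, so this sketch just takes the first `.pdf` member:

```python
import io
import json
import zipfile

def extract_metadata(zip_bytes: bytes) -> dict:
    """Read the uploaded bundle: github_urls.json plus the SOW PDF."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        urls = json.loads(zf.read("github_urls.json"))
        # The SOW's filename is not fixed, so grab the first PDF member.
        pdf_name = next(n for n in zf.namelist() if n.lower().endswith(".pdf"))
        return {"github_urls": urls, "sow_pdf": zf.read(pdf_name)}
```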
3. Parallel Processing
This is where the magic happens (Abracadabra!). Two processes run at the same time:
SOW Parser — This takes the SOW PDF and sends it to Amazon Bedrock using the Nova Pro AI model. It extracts the meaningful sections from the document: project overview, technical requirements, scope, milestones, deliverables, architecture, pricing, and more. The prompt (the instructions given to the AI) is carefully crafted to skip cover pages, signature blocks, and legal boilerplate, keeping just the substance of the project. For image-heavy PDFs, it converts pages to images and processes them in batches. It handles retries when the AI service is busy, and has robust parsing to deal with the unpredictable ways AI models format their output.
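To make this concrete, here is a hedged sketch of what such a call can look like with the Bedrock Converse API, which accepts PDF document blocks alongside text. The prompt wording is illustrative, not the one we shipped:

```python
SECTIONS = [
    "project overview", "technical requirements", "scope",
    "milestones", "deliverables", "architecture", "pricing",
]

def build_sow_request(pdf_bytes: bytes,
                      model_id: str = "amazon.nova-pro-v1:0") -> dict:
    """Build a Converse request asking for the substantive SOW sections."""
    prompt = (
        "Extract the following sections from this SOW as JSON: "
        + ", ".join(SECTIONS)
        + ". Skip cover pages, signature blocks, and legal boilerplate."
    )
    return {
        "modelId": model_id,
        "messages": [{
            "role": "user",
            "content": [
                {"document": {"format": "pdf", "name": "sow",
                              "source": {"bytes": pdf_bytes}}},
                {"text": prompt},
            ],
        }],
    }

def parse_sow(pdf_bytes: bytes) -> str:
    """Send the request and return the model's text reply."""
    import boto3  # lazy import so the request builder is testable offline

    client = boto3.client("bedrock-runtime")
    resp = client.converse(**build_sow_request(pdf_bytes))
    return resp["output"]["message"]["content"][0]["text"]
```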
Repository Parser — This is the most sophisticated piece (has Alex’s stamp of approval). It's a multi-phase pipeline that analyzes GitHub repositories using AI agents:
The triage agent is particularly clever. It looks at the directory map and the pre-read files, identifies what information is still missing, and strategically reads additional files to build a complete picture. The tool budget forces it to be selective rather than trying to read everything.
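One way a tool budget like that can be enforced is by wrapping the agent's file-reading tool in a counter. This is an illustrative sketch of the idea, not our actual implementation:

```python
class ToolBudget:
    """Wrap a file-reading tool so the agent can only call it N times.

    Once the budget runs out, the tool stops returning file contents and
    instead tells the agent to work with what it already has.
    """

    def __init__(self, read_file, budget: int):
        self.read_file = read_file
        self.remaining = budget

    def __call__(self, path: str) -> str:
        if self.remaining <= 0:
            return "BUDGET_EXHAUSTED: summarize with what you have."
        self.remaining -= 1
        return self.read_file(path)
```

Because the sentinel message comes back through the tool itself, the agent learns mid-conversation that it must stop exploring and start writing.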
4. Handoff Generation
Once both parallel processes finish, their outputs merge into the Handoff Generator. This uses another AI agent (Claude Sonnet, a model by Anthropic) with a carefully tuned system prompt that positions it as a technical writer. The agent receives the SOW data for project context and the repo knowledge for what was actually built, then generates structured sections:
The key insight in the prompt design: the SOW provides context, but the repo knowledge drives the document. This is a handoff of what was built, not a restatement of what was planned.
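That division of labor can be expressed directly in the prompts. This is a hedged sketch assuming the Strands SDK's `Agent` class; the model ID and prompt wording are assumptions, not what we deployed:

```python
SYSTEM_PROMPT = (
    "You are a technical writer producing a client handoff document. "
    "The SOW provides context, but the repository knowledge drives the "
    "document: describe what was built, not what was planned."
)

def compose_user_prompt(sow_data: str, repo_knowledge: str) -> str:
    """Merge the two parallel branches' outputs into one generation prompt."""
    return (
        f"SOW context:\n{sow_data}\n\n"
        f"Repository knowledge (authoritative):\n{repo_knowledge}\n\n"
        "Write the structured handoff sections."
    )

def generate_handoff(sow_data: str, repo_knowledge: str) -> str:
    """Run the writer agent over both inputs."""
    from strands import Agent  # lazy: keeps the helper importable without the SDK

    agent = Agent(
        model="anthropic.claude-sonnet-4-20250514-v1:0",  # ID is an assumption
        system_prompt=SYSTEM_PROMPT,
    )
    return str(agent(compose_user_prompt(sow_data, repo_knowledge)))
```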
5. DOCX Assembly & Completion
The structured content is assembled into a formatted DOCX document (Microsoft Word format), uploaded to S3, and an event automatically updates the record status so the user sees it as "completed" and can download it.
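A minimal sketch of that assembly step, assuming the python-docx package; the section names are illustrative placeholders:

```python
# Canonical section order for the generated document (illustrative names).
SECTION_ORDER = ["Project Overview", "Architecture", "Deployment", "Usage Guide"]

def ordered_sections(sections: dict) -> list:
    """Emit known sections in canonical order, then any extras as-is."""
    known = [(k, sections[k]) for k in SECTION_ORDER if k in sections]
    extras = [(k, v) for k, v in sections.items() if k not in SECTION_ORDER]
    return known + extras

def build_docx(sections: dict, path: str) -> None:
    """Assemble the handoff DOCX and write it to disk."""
    from docx import Document  # lazy import; not needed for the pure helper

    doc = Document()
    doc.add_heading("Handoff Document", level=0)
    for title, body in ordered_sections(sections):
        doc.add_heading(title, level=1)
        doc.add_paragraph(body)
    doc.save(path)
```

In the pipeline the resulting file goes to S3 rather than local disk, but the document-building part is the same.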
User fills form (GitHub URLs + SOW PDF)
        ↓
Frontend creates ZIP → uploads to S3 via presigned URL
        ↓
S3 notification triggers Lambda → starts Step Function
        ↓
Step 1: Extract metadata (GitHub URLs from ZIP)
        ↓
Step 2 & 3 (parallel):
├── SOW Parser (AI extracts project info from PDF)
└── Repo Parser (AI analyzes GitHub codebase)
        ↓
Step 4: Handoff Generator (AI writes the document)
        ↓
DOCX file saved to S3 → record status updated
        ↓
User downloads the handoff document
Everything is managed with Terraform (infrastructure as code — we define our cloud resources in config files instead of clicking around in the AWS console) and deployed automatically via GitHub Actions (CI/CD — every time we push code, it automatically builds and deploys). The infrastructure includes:
CI/CD is split into two workflows. Frontend changes trigger a build, sync to S3, and CloudFront cache invalidation. Backend changes package all Lambda functions and run Terraform to update the infrastructure. Both workflows support development and production environments based on the Git branch.
Three days is not a lot of time to build a full-stack application with an AI pipeline. We had to make tough calls about what to include and what to cut. Authentication was the first thing to go; we knew we wanted it, but it wasn't essential for the demo. The gold standards backend integration was another casualty. We have the frontend ready to upload and manage reference documents, but the backend doesn't use them yet as few-shot examples for the AI.
AI models are unpredictable in how they format their output. Sometimes they wrap JSON in markdown formatting. Sometimes they add a conversational introduction before the actual data. Sometimes they double-encode strings. We built robust parsing layers with multiple fallback strategies to handle all these edge cases reliably.
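A fallback parser in this spirit might look like the following; the exact strategies and their order are illustrative:

```python
import json
import re

def extract_json(raw: str):
    """Parse JSON from model output, tolerating common formatting quirks.

    Fallbacks, in order: direct parse -> strip markdown fences ->
    grab the first {...} span -> undo double-encoded strings.
    """
    candidates = [raw.strip()]

    # Strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Grab a {...} span, skipping any conversational preamble.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        candidates.append(brace.group(0))

    for text in candidates:
        try:
            parsed = json.loads(text)
            # Undo double-encoding: a JSON string that contains JSON.
            if isinstance(parsed, str):
                parsed = json.loads(parsed)
            return parsed
        except (json.JSONDecodeError, TypeError):
            continue
    raise ValueError("no parseable JSON found in model output")
```

Each strategy only runs if the earlier ones fail, so well-behaved output takes the fast path.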
We hit API Gateway's 10MB payload limit early on when users tried to upload SOW PDFs larger than 5-6MB. The fix was the presigned URL flow mentioned earlier: the frontend gets a temporary upload URL, uploads directly to S3, then signals completion. This required adding cross-origin configurations to the S3 buckets and restructuring the creation flow into multiple steps.
The repo parser and handoff generator are heavy: they load the AI framework, make multiple API calls to Bedrock, and process large amounts of text. We had to set generous timeouts (up to 15 minutes for the repo parser) and optimize the packaging. The SOW parser needed to be containerized (packaged as a Docker image) because its PDF processing dependencies were too large for a regular Lambda function.
By the end of the hackathon, we had a working end-to-end system. You paste in a GitHub URL, upload a SOW, and a few minutes later you get a DOCX document with a structured handoff covering the project overview, architecture, deployment instructions, and usage guide. The document is driven by what's actually in the codebase, not just what the SOW promised.
The biggest impact is time saving. What used to take days of manual writing now produces a solid first draft in minutes. It's not perfect; a developer still needs to review and refine it, but the starting point is dramatically better than a blank page.
The re-run feature turned out to be particularly useful: you can tweak the GitHub URLs or swap the SOW and regenerate without starting from scratch. This makes it easy to iterate on the output until you're happy with it.
We came out of the hackathon with a working product and a long list of things we'd love to add:
AI is not magic. It needs context. The quality of the handoff document is directly proportional to the quality of the context you feed the AI. The multi-phase repo analysis pipeline exists because just dumping a README into an AI model produces shallow results. You need to strategically gather information from multiple sources.
Prompt engineering is real engineering. The difference between a mediocre output and a good one often came down to a single sentence in the system prompt. Telling the agent "this is a handoff of what was built, not a restatement of the SOW" fundamentally changed the output quality.
Serverless is great for hackathons. The entire backend runs on Lambda and Step Functions. No servers to manage, no complex deployments. Terraform made it reproducible across environments. We spent our time on the actual product, not on infrastructure.
AI-assisted coding works… with guardrails. Using AI tools like Kiro to help write the code was a massive productivity multiplier. But you have to test constantly. The AI would generate plausible-looking code that had subtle bugs (wrong field names, inverted logic, missing error handling). It's a partnership: you bring the knowledge and direction, the AI brings the speed. Trust but verify.
Cross-functional teams make better products. Having non-engineers on the team wasn't just nice to have, it was essential. Austin, Calli, and Talia shaped the product direction, gathered real-world examples to test with, and made sure we were solving the right problem. The engineers could focus on building while the rest of the team handled design, testing, and presentation.
Built during the Metal Toad hackathon by the Handover Alliance team: Alex Rothier, Austin Verdi, Calli Vieira, David Dolan, Talia Lippincott, and myself.