During the first week of March 2026, Metal Toad held its semiannual hackathon. The theme was Internal Agentic Application, which meant we had to use AI to create a solution for an internal business problem that impacted multiple teams. We had three days to ship something working.
I teamed up with five other people: Alex Rothier, Austin Verdi, Calli Vieira, David Dolan, and Talia Lippincott. Our team was diverse: three software engineers, a project manager, a marketing coordinator, and an account manager. We called ourselves the Handover Alliance.
After some brainstorming, we landed on a pain point that everyone at the company recognized: handoff documentation.
So what is handoff documentation? At Metal Toad, it's essentially our proof of performance, a document that summarizes what was built for a client, how it works, and how to maintain it. Think of it as the instruction manual you hand over when a project wraps up.
This documentation matters for several reasons:
So why is handover documentation a current problem for our business? Writing these documents is time-consuming, time-sensitive, and unstandardized, and it can get delayed or, less often, forgotten entirely. Someone has to sit down and write an N-page document explaining everything that was built, and that can take days, sometimes weeks.
But here's the thing: most of the information already exists. It's in the codebase, in the README files, in the infrastructure configs, in the SOW itself. It just needs to be extracted, organized, and written up in a client-friendly format.
The Handover Alliance believes that our Customer Handover Documentation should be Clear, Concise, and Easier for our teams to create.
We understood from the start that this tool wouldn't replace the developer writing the handoff. But it would give them a solid starting point instead of staring at a blank page. The goal was to go from "I need to write this entire document from scratch" to "I need to review and polish what was automatically captured from all of our hard work and generated up to this point."
The plan was straightforward: use a project's GitHub repositories and its SOW as the base for generating the handoff document. The tool would:
No manual writing. No copy-pasting from READMEs. Just AI doing the heavy lifting.
We split the work based on strengths. As a software engineer, I worked on the frontend, the API Gateway (the layer that connects the frontend to the backend), and the APIs. David and Alex, the other two engineers, tackled the backend pipeline: Alex worked on the step that analyzes GitHub repositories, David handled the step that parses the SOW document, and the rest was split between the two of them. Austin, Calli, and Talia helped design the flow and features on our Miro board, proposed features for both frontend and backend, gathered examples of previous handoff documents we could test with, created the presentation slides, and helped present at the end.
I built a React + TypeScript web application using Material UI for the interface components. We originally planned to use Next.js as the framework, but quickly realized it wouldn't work for our setup. Next.js is a full-stack framework that needs a server running to handle things like server-side rendering and API routes, which is great when you have a Node.js server, but we were hosting our frontend as static files in an S3 bucket behind CloudFront. S3 can only serve static files (HTML, CSS, JavaScript); it can't run a server. So we switched to Vite, which is a pure build tool that compiles everything into static files that S3 can serve directly. Same React code, just a different way of packaging it.
The design follows a minimalist, futuristic aesthetic with Metal Toad's signature orange as the primary color. It has three main pages:
The UI is fully responsive. On mobile, the sidebar collapses into a hamburger menu, page titles move to the top bar, and buttons stack vertically for better touch targets. We added light/dark mode support with a toggle in the sidebar (for the vampire devs such as myself). The app also has proper metadata and SEO tags, and the form page URLs are shareable; you can copy the link and send it to someone else if needed.
The main goal with the frontend was to make it as complete as possible so that future improvements would only require changes to the backend code.
For file uploads, we implemented something called a presigned URL flow to handle large files. Instead of sending the file through our API (which has a 10MB size limit), the frontend asks the backend for a temporary upload link, uploads the file directly to Amazon S3 (cloud storage), and then tells the backend to proceed. This lets users upload files of any reasonable size without hitting limits.
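As a sketch, the backend half of that flow is small. This is a hedged Python example assuming boto3; the bucket name and key layout here are illustrative, not our actual implementation:

```python
# Hypothetical bucket and key layout; both names are assumptions.
UPLOAD_BUCKET = "handoff-uploads"

def build_upload_key(job_id: str) -> str:
    """Where the frontend's ZIP will land for a given job."""
    return f"uploads/{job_id}/input.zip"

def presign_upload(job_id: str, expires_in: int = 900) -> dict:
    """Return a temporary PUT URL the browser can upload to directly.

    boto3 is imported lazily so the pure helper above stays importable
    without AWS dependencies.
    """
    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": build_upload_key(job_id)},
        ExpiresIn=expires_in,  # seconds; keep the link short-lived
    )
    return {"uploadUrl": url, "key": build_upload_key(job_id)}
```

The frontend then PUTs the ZIP straight to `uploadUrl` and calls a separate "complete" endpoint with the returned key, so the file never passes through API Gateway.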
And here's the fun part: the entire frontend was built using Kiro (an AI-powered IDE from AWS) with my detailed instructions. Now, Kiro is extremely helpful and I wouldn't have finished in such a short amount of time without it, but it was a two-person job (or one person, one robot job). You need to know how to guide it, what to ask for, what you want, and you need to know the technologies well enough to review the output and tell Kiro when something isn't right (as you would with any other AI).
I won't go too deep into the APIs, but I created an API Gateway with several endpoints to make all the frontend features work with AWS services: listing records, creating new generations, managing gold standards, downloading files, and checking job status.
This is where it gets interesting. The backend is a serverless pipeline, meaning there are no servers to manage, just code that runs on demand, orchestrated by AWS Step Functions (a service that coordinates multiple steps in a workflow). The heavy lifting is done by AI agents built on the Strands SDK (an open-source framework for building AI agents) and Amazon Bedrock (AWS's managed AI service).
The Step Function has four main steps. The first gathers the input data. The second and third run in parallel: one for analyzing GitHub repos, one for parsing the SOW. The fourth step takes the results from both and generates the final document.
Here's the detailed flow:
1. Upload & Trigger
When a user submits the form, the frontend packages everything into a ZIP file containing a github_urls.json file (with the repository URLs) and the SOW PDF. This ZIP gets uploaded to an S3 bucket. As soon as the file lands, an S3 event notification triggers a Lambda function (a small piece of code that runs in the cloud) that kicks off the Step Function pipeline. You might wonder, why not trigger the Step Function directly from S3? The reason is that we needed to control the execution ID. The frontend generates a unique job ID when the user submits the form, and that same ID is used as the Step Function execution name. This way, the frontend knows exactly which execution to poll for status updates. If S3 triggered the Step Function directly, it would get an auto-generated ID and the frontend would have no way to track it.
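A trigger Lambda in that spirit can be a few lines. This is a hedged sketch, assuming the object key embeds the frontend's job ID (an assumed `uploads/<job_id>/input.zip` layout) and a placeholder state machine ARN:

```python
import json
import urllib.parse

# Placeholder ARN; the real account and name are assumptions.
STATE_MACHINE_ARN = (
    "arn:aws:states:us-east-1:123456789012:stateMachine:handoff-pipeline"
)

def job_id_from_key(key: str) -> str:
    """Recover the frontend-generated job ID from the S3 object key.

    Assumes the key layout uploads/<job_id>/input.zip.
    """
    return urllib.parse.unquote_plus(key).split("/")[1]

def handler(event, context):
    """S3 ObjectCreated notification -> start the Step Function.

    The execution *name* is set to the job ID, so the frontend can poll
    that exact execution instead of an auto-generated one. Names must be
    unique per state machine, which also gives free duplicate protection.
    """
    import boto3  # lazy import keeps the pure helper testable without AWS

    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    job_id = job_id_from_key(key)

    boto3.client("stepfunctions").start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=job_id,
        input=json.dumps({"bucket": bucket, "key": key, "jobId": job_id}),
    )
    return {"jobId": job_id}
```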
2. Metadata Extraction
The first step opens the ZIP, extracts the GitHub URLs, and passes them to the next steps.
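A minimal version of that extraction step, using Python's standard zipfile module. The `github_urls.json` name comes from the upload format described above; the PDF's exact filename inside the ZIP is an assumption, so this sketch just takes the first `.pdf` member:

```python
import io
import json
import zipfile

def extract_metadata(zip_bytes: bytes) -> dict:
    """Read the uploaded bundle: github_urls.json plus the SOW PDF."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        urls = json.loads(zf.read("github_urls.json"))
        # The SOW's filename is not fixed, so grab the first PDF member.
        pdf_name = next(n for n in zf.namelist() if n.lower().endswith(".pdf"))
        return {"github_urls": urls, "sow_pdf": zf.read(pdf_name)}
```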
3. Parallel Processing
This is where the magic happens (Abracadabra!). Two processes run at the same time:
SOW Parser — This takes the SOW PDF and sends it to Amazon Bedrock using the Nova Pro AI model. It extracts the meaningful sections from the document: project overview, technical requirements, scope, milestones, deliverables, architecture, pricing, and more. The prompt (the instructions given to the AI) is carefully crafted to skip cover pages, signature blocks, and legal boilerplate, keeping just the substance of the project. For image-heavy PDFs, it converts pages to images and processes them in batches. It handles retries when the AI service is busy, and has robust parsing to deal with the unpredictable ways AI models format their output.
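To make this concrete, here is a hedged sketch of what such a call can look like with the Bedrock Converse API, which accepts PDF document blocks alongside text. The prompt wording is illustrative, not the one we shipped:

```python
SECTIONS = [
    "project overview", "technical requirements", "scope",
    "milestones", "deliverables", "architecture", "pricing",
]

def build_sow_request(pdf_bytes: bytes,
                      model_id: str = "amazon.nova-pro-v1:0") -> dict:
    """Build a Converse request asking for the substantive SOW sections."""
    prompt = (
        "Extract the following sections from this SOW as JSON: "
        + ", ".join(SECTIONS)
        + ". Skip cover pages, signature blocks, and legal boilerplate."
    )
    return {
        "modelId": model_id,
        "messages": [{
            "role": "user",
            "content": [
                {"document": {"format": "pdf", "name": "sow",
                              "source": {"bytes": pdf_bytes}}},
                {"text": prompt},
            ],
        }],
    }

def parse_sow(pdf_bytes: bytes) -> str:
    """Send the request and return the model's text reply."""
    import boto3  # lazy import so the request builder is testable offline

    client = boto3.client("bedrock-runtime")
    resp = client.converse(**build_sow_request(pdf_bytes))
    return resp["output"]["message"]["content"][0]["text"]
```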
Repository Parser — This is the most sophisticated piece (has Alex’s stamp of approval). It's a multi-phase pipeline that analyzes GitHub repositories using AI agents:
The triage agent is particularly clever. It looks at the directory map and the pre-read files, identifies what information is still missing, and strategically reads additional files to build a complete picture. The tool budget forces it to be selective rather than trying to read everything.
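One way a tool budget like that can be enforced is by wrapping the agent's file-reading tool in a counter. This is an illustrative sketch of the idea, not our actual implementation:

```python
class ToolBudget:
    """Wrap a file-reading tool so the agent can only call it N times.

    Once the budget runs out, the tool stops returning file contents and
    instead tells the agent to work with what it already has.
    """

    def __init__(self, read_file, budget: int):
        self.read_file = read_file
        self.remaining = budget

    def __call__(self, path: str) -> str:
        if self.remaining <= 0:
            return "BUDGET_EXHAUSTED: summarize with what you have."
        self.remaining -= 1
        return self.read_file(path)
```

Because the sentinel message comes back through the tool itself, the agent learns mid-conversation that it must stop exploring and start writing.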
4. Handoff Generation
Once both parallel processes finish, their outputs merge into the Handoff Generator. This uses another AI agent (Claude Sonnet, a model by Anthropic) with a carefully tuned system prompt that positions it as a technical writer. The agent receives the SOW data for project context and the repo knowledge for what was actually built, then generates structured sections:
The key insight in the prompt design: the SOW provides context, but the repo knowledge drives the document. This is a handoff of what was built, not a restatement of what was planned.
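That division of labor can be expressed directly in the prompts. This is a hedged sketch assuming the Strands SDK's `Agent` class; the model ID and prompt wording are assumptions, not what we deployed:

```python
SYSTEM_PROMPT = (
    "You are a technical writer producing a client handoff document. "
    "The SOW provides context, but the repository knowledge drives the "
    "document: describe what was built, not what was planned."
)

def compose_user_prompt(sow_data: str, repo_knowledge: str) -> str:
    """Merge the two parallel branches' outputs into one generation prompt."""
    return (
        f"SOW context:\n{sow_data}\n\n"
        f"Repository knowledge (authoritative):\n{repo_knowledge}\n\n"
        "Write the structured handoff sections."
    )

def generate_handoff(sow_data: str, repo_knowledge: str) -> str:
    """Run the writer agent over both inputs."""
    from strands import Agent  # lazy: keeps the helper importable without the SDK

    agent = Agent(
        model="anthropic.claude-sonnet-4-20250514-v1:0",  # ID is an assumption
        system_prompt=SYSTEM_PROMPT,
    )
    return str(agent(compose_user_prompt(sow_data, repo_knowledge)))
```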
5. DOCX Assembly & Completion
The structured content is assembled into a formatted DOCX document (Microsoft Word format), uploaded to S3, and an event automatically updates the record status so the user sees it as "completed" and can download it.
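A minimal sketch of that assembly step, assuming the python-docx package; the section names are illustrative placeholders:

```python
# Canonical section order for the generated document (illustrative names).
SECTION_ORDER = ["Project Overview", "Architecture", "Deployment", "Usage Guide"]

def ordered_sections(sections: dict) -> list:
    """Emit known sections in canonical order, then any extras as-is."""
    known = [(k, sections[k]) for k in SECTION_ORDER if k in sections]
    extras = [(k, v) for k, v in sections.items() if k not in SECTION_ORDER]
    return known + extras

def build_docx(sections: dict, path: str) -> None:
    """Assemble the handoff DOCX and write it to disk."""
    from docx import Document  # lazy import; not needed for the pure helper

    doc = Document()
    doc.add_heading("Handoff Document", level=0)
    for title, body in ordered_sections(sections):
        doc.add_heading(title, level=1)
        doc.add_paragraph(body)
    doc.save(path)
```

In the pipeline the resulting file goes to S3 rather than local disk, but the document-building part is the same.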
User fills form (GitHub URLs + SOW PDF)
        ↓
Frontend creates ZIP → uploads to S3 via presigned URL
        ↓
S3 notification triggers Lambda → starts Step Function
        ↓
Step 1: Extract metadata (GitHub URLs from ZIP)
        ↓
Step 2 & 3 (parallel):
├── SOW Parser (AI extracts project info from PDF)
└── Repo Parser (AI analyzes GitHub codebase)
        ↓
Step 4: Handoff Generator (AI writes the document)
        ↓
DOCX file saved to S3 → record status updated
        ↓
User downloads the handoff document
Everything is managed with Terraform (infrastructure as code — we define our cloud resources in config files instead of clicking around in the AWS console) and deployed automatically via GitHub Actions (CI/CD — every time we push code, it automatically builds and deploys). The infrastructure includes:
CI/CD is split into two workflows. Frontend changes trigger a build, sync to S3, and CloudFront cache invalidation. Backend changes package all Lambda functions and run Terraform to update the infrastructure. Both workflows support development and production environments based on the Git branch.
Three days is not a lot of time to build a full-stack application with an AI pipeline. We had to make tough calls about what to include and what to cut. Authentication was the first thing to go; we knew we wanted it, but it wasn't essential for the demo. The gold standards backend integration was another casualty. We have the frontend ready to upload and manage reference documents, but the backend doesn't use them yet as few-shot examples for the AI.
AI models are unpredictable in how they format their output. Sometimes they wrap JSON in markdown formatting. Sometimes they add a conversational introduction before the actual data. Sometimes they double-encode strings. We built robust parsing layers with multiple fallback strategies to handle all these edge cases reliably.
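A fallback parser in this spirit might look like the following; the exact strategies and their order are illustrative:

```python
import json
import re

def extract_json(raw: str):
    """Parse JSON from model output, tolerating common formatting quirks.

    Fallbacks, in order: direct parse -> strip markdown fences ->
    grab the first {...} span -> undo double-encoded strings.
    """
    candidates = [raw.strip()]

    # Strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Grab a {...} span, skipping any conversational preamble.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        candidates.append(brace.group(0))

    for text in candidates:
        try:
            parsed = json.loads(text)
            # Undo double-encoding: a JSON string that contains JSON.
            if isinstance(parsed, str):
                parsed = json.loads(parsed)
            return parsed
        except (json.JSONDecodeError, TypeError):
            continue
    raise ValueError("no parseable JSON found in model output")
```

Each strategy only runs if the earlier ones fail, so well-behaved output takes the fast path.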
We hit API Gateway's 10MB payload limit early on when users tried to upload SOW PDFs larger than 5-6MB. The fix was the presigned URL flow mentioned earlier: the frontend gets a temporary upload URL, uploads directly to S3, then signals completion. This required adding cross-origin configurations to the S3 buckets and restructuring the creation flow into multiple steps.
The repo parser and handoff generator are heavy: they load the AI framework, make multiple API calls to Bedrock, and process large amounts of text. We had to set generous timeouts (up to 15 minutes for the repo parser) and optimize the packaging. The SOW parser needed to be containerized (packaged as a Docker image) because its PDF processing dependencies were too large for a regular Lambda function.
By the end of the hackathon, we had a working end-to-end system. You paste in a GitHub URL, upload a SOW, and a few minutes later you get a DOCX document with a structured handoff covering the project overview, architecture, deployment instructions, and usage guide. The document is driven by what's actually in the codebase, not just what the SOW promised.
The biggest impact is time saving. What used to take days of manual writing now produces a solid first draft in minutes. It's not perfect; a developer still needs to review and refine it, but the starting point is dramatically better than a blank page.
The re-run feature turned out to be particularly useful: you can tweak the GitHub URLs or swap the SOW and regenerate without starting from scratch. This makes it easy to iterate on the output until you're happy with it.
We came out of the hackathon with a working product and a long list of things we'd love to add:
AI is not magic. It needs context. The quality of the handoff document is directly proportional to the quality of the context you feed the AI. The multi-phase repo analysis pipeline exists because just dumping a README into an AI model produces shallow results. You need to strategically gather information from multiple sources.
Prompt engineering is real engineering. The difference between a mediocre output and a good one often came down to a single sentence in the system prompt. Telling the agent "this is a handoff of what was built, not a restatement of the SOW" fundamentally changed the output quality.
Serverless is great for hackathons. The entire backend runs on Lambda and Step Functions. No servers to manage, no complex deployments. Terraform made it reproducible across environments. We spent our time on the actual product, not on infrastructure.
AI-assisted coding works… with guardrails. Using AI tools like Kiro to help write the code was a massive productivity multiplier. But you have to test constantly. The AI would generate plausible-looking code that had subtle bugs (wrong field names, inverted logic, missing error handling). It's a partnership: you bring the knowledge and direction, the AI brings the speed. Trust but verify.
Cross-functional teams make better products. Having non-engineers on the team wasn't just nice to have, it was essential. Austin, Calli, and Talia shaped the product direction, gathered real-world examples to test with, and made sure we were solving the right problem. The engineers could focus on building while the rest of the team handled design, testing, and presentation.
Built during the Metal Toad hackathon by the Handover Alliance team: Alex Rothier, Austin Verdi, Calli Vieira, David Dolan, Talia Lippincott, and myself.