Building a bare-bones local orchestration
Categories: orchestration, ollama, python, ai, joplin
The End Goal
Build out a very simple, auditable, and local example of how to orchestrate a workflow to help me generate new blog post ideas from my Joplin markdown notes. I want to keep the workflow tightly designed so that I can audit and understand each step of the workflow. I also want this to be a local tool that does not require outside access. Eventually, I will want to start adding tools that my tool can use, but I want to limit the blast radius for any unintended consequences.
I want to keep the model grounded in the retrieved notes and will provide some simple steering prompts, like suggested topic areas or known “blog series” that I want to produce. The prompt should return ranked suggestions for blog titles, outlines, and the notes that were found to support each post.
Starting with the basics
At a high level, an agent orchestration is just a program that decides:
- what the model should see,
- what tools it is allowed to use,
- when to call those tools,
- how to store memory or state,
- and when to stop.
A workflow follows a known sequence: classify → retrieve documents → summarize → save result. An agent is more dynamic: the model itself decides which step comes next.
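That fixed sequence can be sketched as plain function composition. A minimal sketch, where every step body is a stub standing in for a real model call or lookup:

```python
# A fixed workflow: each step is an ordinary function, and the
# sequence is hard-coded. An agent would instead let the model
# choose which step to run next. All step bodies are stubs.

def classify(text: str) -> str:
    return "note"  # stub: a model call could go here

def retrieve(label: str) -> list[str]:
    return [f"doc about {label}"]  # stub: e.g. a TF-IDF lookup

def summarize(docs: list[str]) -> str:
    return " / ".join(docs)  # stub: a model call could go here

def save(summary: str) -> dict:
    return {"saved": summary}  # stub: e.g. write to outputs/

def run_workflow(text: str) -> dict:
    # The orchestration is just the hard-coded order of these calls.
    return save(summarize(retrieve(classify(text))))
```

The point is that the control flow lives entirely in code; swapping to an agent means moving that control flow into the model's hands.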
The mental model
Think of a local agent as five layers stacked on top of each other.
Layer 1: the model
This is the brain-shaped text engine, but not the whole system. Locally, that might be Ollama serving a model on your machine. Ollama now supports both tool calling and structured outputs, which matters because it lets the model either request a tool or produce JSON that your code can trust more than free-form prose.
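As a sketch of what a structured-output call looks like against a local Ollama server: the `/api/chat` endpoint accepts a JSON schema in its `format` field and constrains the reply to match it. The model name and schema below are examples, not requirements, and the server is assumed to be on the default port.

```python
import json
import urllib.request

# Example JSON schema the model's reply must conform to.
IDEA_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["title", "confidence"],
}

def build_chat_payload(prompt: str, model: str = "llama3.1") -> dict:
    """Assemble the request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": IDEA_SCHEMA,  # constrain the reply to this schema
        "stream": False,
    }

def chat(prompt: str, url: str = "http://localhost:11434/api/chat") -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The constrained reply arrives as a JSON string in message.content.
    return json.loads(body["message"]["content"])
```

Because the reply is forced into a known shape, the code downstream can parse it instead of scraping prose.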
Layer 2: the tools
A tool is just a function the model is allowed to invoke. It could read a file, search your notes, create a calendar event, run a shell command, or query a SQLite database. The model itself does not “have access” to anything unless you explicitly give it a tool.
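Concretely, "giving the model a tool" means three things: a plain function, a schema describing it to the model, and a dispatch table the orchestrator consults when the model requests a call. A minimal sketch (the tool name, search logic, and paths are illustrative):

```python
from pathlib import Path

def search_notes(query: str, notes_dir: str = "data/processed") -> list[str]:
    """Naive substring search over exported note files (illustrative)."""
    hits = []
    for path in Path(notes_dir).glob("*.jsonl"):
        if query.lower() in path.read_text(errors="ignore").lower():
            hits.append(str(path))
    return hits

# The schema the model sees, in the OpenAI-style shape Ollama accepts.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_notes",
        "description": "Search local note files for a query string.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

# Anything not in this table simply cannot run, no matter what the
# model asks for.
TOOL_REGISTRY = {"search_notes": search_notes}

def run_tool(name: str, args: dict):
    if name not in TOOL_REGISTRY:
        raise PermissionError(f"tool not allowed: {name}")
    return TOOL_REGISTRY[name](**args)
```

The registry is the enforcement point: the model can only request names, and the code decides which names mean anything.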
Layer 3: the orchestrator
This is the traffic cop. It decides what prompt goes to the model, whether tool calls are allowed, how results are fed back, how many retries are allowed, and when the process ends. LangGraph, for example, models this as a graph with shared state and nodes that read from and write to that state.
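The traffic-cop role can be sketched as a bounded loop: call the model, execute any tool it requests, feed the result back, and stop on a final answer or when the step budget runs out. The model function here is a stub standing in for a real chat call:

```python
MAX_STEPS = 5  # hard cap so a confused model cannot loop forever

def fake_model(messages: list[dict]) -> dict:
    # Stub: request a tool once, then answer using its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "lookup", "args": {"key": "topic"}}}
    return {"content": "final answer"}

def lookup(key: str) -> str:
    return f"value-for-{key}"  # stub tool

TOOLS = {"lookup": lookup}

def orchestrate(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(MAX_STEPS):
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # model is done
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")
```

Everything an orchestration framework adds (graphs, retries, persistence) is elaboration on this loop.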
Layer 4: memory/state
This is where prior messages, tool outputs, task variables, or scratch data live. In LangGraph, short-term memory is treated as part of graph state and can be persisted with checkpoints.
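For a small local tool, a checkpoint can be as simple as writing the state dict to disk between steps. A sketch assuming the state is JSON-serializable (the file path is arbitrary):

```python
import json
from pathlib import Path

# Persist orchestration state between steps so a run can resume
# instead of starting over after a crash or interruption.

def save_checkpoint(state: dict, path: str = "checkpoint.json") -> None:
    Path(path).write_text(json.dumps(state, indent=2))

def load_checkpoint(path: str = "checkpoint.json") -> dict:
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}
```

A missing checkpoint loads as an empty state, which doubles as the "fresh run" case.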
Layer 5: the data boundary
This is the part that I care about most. Which files can be read? Which folders are exposed? Which tools can write? Which logs are saved? Since I distrust cloud models (especially for security and privacy), this boundary matters more than the cleverness of the prompts.
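One way to enforce that boundary in code is an explicit allowlist of readable directories, checked after resolving the path so symlinks and `..` tricks cannot escape it. A minimal sketch (the directory names are examples):

```python
from pathlib import Path

# Only paths under these directories may be read by any tool.
ALLOWED_DIRS = [
    Path("data/joplin_export").resolve(),
    Path("data/processed").resolve(),
]

def safe_read(path_str: str) -> str:
    """Read a file only if it resolves inside an allowed directory."""
    path = Path(path_str).resolve()  # resolves symlinks and ".." tricks
    if not any(path.is_relative_to(d) for d in ALLOWED_DIRS):
        raise PermissionError(f"outside data boundary: {path}")
    return path.read_text()
```

Every file-touching tool goes through `safe_read`, so the boundary is enforced in one place instead of relying on the prompt.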
At this point, you might be thinking “why not just use OpenClaw if you want an assistant or LangChain to build your own tools?”
Simply put, I do not trust OpenClaw and I would rather start super small and deepen my own understanding of how these things work. I am an engineer after all.
Code vs Models
Writing my own orchestration will also let me dig in and let code do what code is good at and the model do what models are good at.
Code is good at:
- validation
- branching
- permissions
- retries
- schemas
- persistence
- deterministic transforms
Models are good at:
- interpretation
- summarization
- classification
- drafting
- ranking fuzzy options
- extracting meaning from messy text
The Project Structure
local_blog_agent/
├── data/
│   ├── joplin_export/
│   ├── processed/
│   │   ├── notes.jsonl
│   │   ├── chunks.jsonl
│   │   └── tfidf_index.pkl
│   └── outputs/
│       ├── blog_ideas.json
│       └── blog_ideas.md
├── src/
│   ├── config.py
│   ├── ingest.py
│   ├── chunking.py
│   ├── retrieve.py
│   ├── prompts.py
│   ├── llm.py
│   ├── schemas.py
│   ├── pipeline.py
│   └── cli.py
├── requirements.txt
└── README.md
All of my notes are written in markdown, and while not all of them include frontmatter, some do, which makes keeping my output structure consistent that much more important.
A simple schema like:
from pydantic import BaseModel, Field
from typing import List

class BlogIdea(BaseModel):
    rank: int
    title: str
    angle: str
    why_this_fits: str
    target_audience: str
    suggested_outline: List[str]
    source_chunks: List[str]
    confidence: float = Field(ge=0.0, le=1.0)

class BlogIdeasResponse(BaseModel):
    ideas: List[BlogIdea]
will force the response into a known shape, and I can validate against that shape to fail loudly if the output is junk.
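Validating then looks like this: a sketch where the raw string stands in for the model's JSON reply, using pydantic's `model_validate_json` (the schema is repeated so the sketch is self-contained):

```python
from typing import List
from pydantic import BaseModel, Field, ValidationError

class BlogIdea(BaseModel):
    rank: int
    title: str
    angle: str
    why_this_fits: str
    target_audience: str
    suggested_outline: List[str]
    source_chunks: List[str]
    confidence: float = Field(ge=0.0, le=1.0)

class BlogIdeasResponse(BaseModel):
    ideas: List[BlogIdea]

raw = '{"ideas": []}'  # stand-in for the model's JSON reply

try:
    parsed = BlogIdeasResponse.model_validate_json(raw)
except ValidationError as err:
    # Fail loudly: log, retry, or abort instead of trusting junk output.
    raise SystemExit(f"model output rejected: {err}")
```

A `confidence` of 1.5 or a missing `title` raises `ValidationError` here rather than quietly flowing downstream.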
Where to go next
Once I have validated that this tool is useful and generates predictable suggestions for blog ideas from a given steering prompt, I would like to generate multiple search queries from a given steering topic.