Pages

Tuesday, August 19, 2025

AI and MCP

Jump in the Damn Pool


This is a guide that I wrote in my free time and chose to make available to others; however, this is not an indication that I am willing to host this for others, answer questions, or acknowledge that you exist. I'm a huge proponent of giving back, and this is my contribution. Just to be clear, I have a job. Disclaimer: if you read, follow, or attempt anything shown below, it is at your own peril. I am not tech support, legal aid, or your local Geek Squad representative (no offense, Best Buy).

Hi, my name is Josh, and I hate gatekeeping.  I hate it.  I am not here for the robot apocalypse, nor am I here to support fear mongering.  I am here for a renaissance, and that is only going to happen if we all start working together with good intentions instead of limiting the definition of our success to shareholder value.  We have to show the public that there are benefits to humanity that were worth spending trillions of dollars on instead of solving world hunger, and that's already a hard sell.  Did you know that your shareholders won't be offended if you make our planet a better place to live while making a profit?  It's your job to figure out how to share tools, collaborate, and build things better than your competition and still earn money, instead of rushing to be the first to market with yet another shiny rock with next-level marketing.

This guide is targeted at people who are curious about AI, not just the nerds. It's for small business owners, entrepreneurs, artists (though most of those hate AI), and people (like me) in positions of strategy and planning who also happen to be programmers. Although I understand the value of intellectual property and patents that generate revenue, I want a different world, a better world, for my only child. This guide won't solve world hunger, but it's me shedding light on an area absolutely riddled with bullying, misinformation, and huge vendor upsell opportunities that could cripple your business if not considered with scrutiny.

Agenda

Hour 1

The first hour spans the basics:
  1. a spreadsheet of AI terminology (in lieu of click-bait),
  2. a list of free AI tools available at home (and probably available at work), and
  3. prompt recommendations and sites where you can download expert-level markdown configuration files for your projects (not just code).

Hour 2

Please make sure your home computer has at least 100GB of open space before proceeding.
  1. We will cover the basics of AnythingLLM together (a free chat prompt manager).
  2. I will give a quick demo of what you can do with an air-gapped LLM:
    • save money (there is no subscription required for air-gapped)
    • have an AI at your fingertips even without Internet access
    • get it to write code for you in any language
    • get it to convert one language to another
    • get it to generate documentation for your code
    • get it to read someone else's shitty documentation and summarize it for you
    • get it to perform a security scan of your project
  3. Installation How-to for AnythingLLM on your home computer.  What I recommend:
    • A Windows laptop with an Nvidia card, an M4 Mac, or a Raspberry Pi 5.
    • Snacks.  I don't mean crappy snacks either. I want you all comfy.
  4. How to toggle between LLM(s) within AnythingLLM
  5. Which LLM(s) I recommend running locally and why
  6. Why you have to use this stuff to stay relevant and where you should focus your time

Hour 3

Getting Started with MCP
  1. How MCP works: the specification, what is ratified, and what is in (constant) flux.
  2. Guidelines for determining where MCP fits into your workflow.
  3. Where to begin, and the resources you can expect to allocate.

Hour 1: A Terminology Cheat Sheet

Term | Alias | Definition
agent | intelligent agent | an entity that acts autonomously toward achieving specific goals
agentic | automatically (adj.) | able to act without additional authority or guidance
AGI | the holy grail | a type of AI that matches or surpasses human cognitive capabilities
bias | opinion | skewed or unfair results due to human biases present in training data
context | perspective | explicit criteria describing the expectations and success of responses
conversation | chat | a collection of prompts and responses with a shared, adjustable context
game theory | | the study of strategic interaction between rational decision makers
generative AI | | AI capable of generating content in response to prompts
hallucination | falsehood / lie | responses that present false or misleading information as fact
HiTL | tip-toeing | putting a human in the loop where autonomy is not an acceptable risk
LLM | large language model | a large weighted system used to correlate input to expectation
markdown | prompt file | prompts stored in a file that are honored throughout a conversation
MCP | model context protocol | used to convey a request to a third party for supplementary action
MCP client | | the system requesting supplementary action
MCP server | | the system responding to a request for supplementary action
model | system | a program trained on data to recognize patterns and make decisions
prompt | directive | a request or instruction a chat prompt manager conveys to an LLM
prompt engineer | nerd with good vocab | a human providing instruction or guidance to an LLM
RAG | cheating | retrieval-augmented generation: a response supplemented with KB data
sandbox | silo | a system isolated by a set of rules, usually security related
SLM | small language model | a small AI model suitable for resource-constrained environments
system prompt | guiding principle | instructions LLM owners supply to steer a model before conversation
token | concept | the most granular element described by a model
Turing Test | | a test of an AI's ability to exhibit behavior indistinguishable from a human
vibe coding | chasing the rabbit | riding an LLM's responses into a volcano on a surfboard made of ice

For a much more thorough list of terms related to AI (and where more than a handful of these definitions came from), please check out Wikipedia's AI Page.

Hour 1: Free AI Tools

This is The List before we dive into how and why to use each of them:
  • Claude - Claude is awesome, but very limited in its free form. Just pay for Copilot.
  • Copilot - Wait, what? You have NO idea where and how to use Copilot. I will fix that.
  • Cursor - The AI Code Editor. This one is great if you are open-minded.
  • Gemini - Google's baby.  You better check yourself at the door. Gemi is wonderful.
  • NotebookLM - This is the greatest learning tool ever created.  Try to prove me wrong.
  • OpenArt - You'll love it, for 40 uses.  It's $15 a month for the good shit (Advanced).
  • Perplexity - An AI search engine / research assistant (and the next owner of Chrome).

I'm only going to focus on the big four: Copilot, Gemini, NotebookLM, and Perplexity.

Copilot®

Overview

Copilot is a chat prompt manager available via numerous interfaces:
  • from its standalone website providing immediate results from different LLMs,
  • as an IDE agent using VSCode, IntelliJ, PyCharm, NeoVim, and a few others,
  • from standalone chat prompt managers (AnythingLLM, LM Studio, etc.)
  • called from JavaScript (both server and client-side) and it scares the hell out of me,
  • and directly using API calls making use of subscriber tokens.

Where it Shines

  1. Copilot is great for writing code as long as you know what to ask for (using Claude).
  2. It's good at refactoring code while you have credits (or access to GPT5 preview).
  3. It's FANTASTIC at analyzing requests and generating project plans.
  4. It makes it very easy to supplement analysis using MCP services (local or remote).

Where it Fumbles

  1. The agent seems to be awesome in VSCode but it barfs frequently in IntelliJ.
  2. You will run out of credits using the free version very quickly. You can supplement this downtime with a local DeepSeek instance or pay $10 for PRO.  It's worth it.
  3. I am not a fan of Copilot integration in Windows. I'm from GenX. Do cool things, but don't force me to wear your branding or have it stare me in the face all day long.
  4. It gets a little hostile when assigning it a nickname. Just try it.

Gemini®

Overview

Gemini is an AI-managed framework that enables interaction with Google's DeepMind (a self-described research laboratory that incorporates multiple models and systems to provide collective insight). It can be used in a myriad of ways:
  • from its standalone website providing immediate conversations with options,
  • as a widget/gem attached to any native Google product (docs, slides, sheets, etc.),
  • from an IDE agent inside major IDEs (including as an option from within Copilot),
  • from standalone chat prompt managers (AnythingLLM, LM Studio, etc.)
  • called from JavaScript (both server and client-side) and it scares the hell out of me,
  • and directly using API calls making use of subscriber tokens.

Where it Shines

  1. Gemini excels at collating pertinent info by seeding its responses with insight gained from your existing Google assets. It looks at docs, mail, and sheets, weights them in a model, and then, using RAG methods, hands you back responses that are frequently exactly what you are looking for.
  2. It's great to use when pursuing academic topics: literature, history, and mathematics.
  3. It's AWESOME at analyzing images, identifying visual objects, aesthetics, etc.
  4. It's one of the best free image generation tools available, especially for web and print.
  5. It is a FANTASTIC supplementary tool to use when evaluating your email inbox.
  6. I love just shooting the shit with Gemini. Its answers (and questions) are insightful, and as long as you trim the fluffy language up front, the conversation isn't just fun, it's thought-provoking. Well done, Google.

Where it Fumbles

  1. You'd think Gemini would be awesome at code generation. It's not. It's really not.
  2. Plain-text requests for opinionated answers yield fluffy bunny nonsense, and tons of it. I recommend you start every conversation by telling Gemini to trim its answers and cut out complimentary language. It's condescending and a waste of time.
  3. Gemini fibs about not having conversational persistence beyond the currently active session; however, that's utter bullshit. I've tested this in numerous ways, the simplest of which is asking Gemini to persist a conversation that is retrievable through a unique question/answer prompt exchange. I even made a game of it. I've asked Gemini to resume our conversations when I indicate who I am and the start of a dirty joke. It then completes the joke, and I verify my identity by giving the (always inappropriate) punchline. This is a built-in feature, but not one that I find well described in documentation, and the notifications I receive are unusual.

NotebookLM®

NotebookLM is a Gemini-driven tool, but with an entirely different workflow.  It does so much and is extremely useful when coupled with a community of like-minded people.  It feels like a cross between IRC and wiki markdown.  

NotebookLM allows users to intuitively:
  • ingest multiple types of media (PDF, .txt, markdown, .mp3, etc.),
  • process and index that accumulated data,
  • automatically summarize it,
  • create a clickable mind map,
  • make it easily searchable,
  • daisy-chain it to other notebooks, and
  • share your notebook(s) with others.

Let's just jump right in by creating a basic notebook summarizing a manual that I don't have time to read: my lawnmower manual. Please note, for this example I am going to download the PDF first; however, I could just as easily provide the link to it online.  The point of doing so is to show that you can ingest your own content (recipes, bills, warranty info) without having to expose it online (although you could: Google docs, Dropbox, etc.).

1.  First, let's download the 62-page PDF manual for my mower: the John Deere D155.

2.  Next, go to notebooklm.google.com and click on "Create New Notebook".

3.  You'll be presented with this page of options.  Just click the X in the upper right.


4.  Now, click on Untitled Notebook in the upper left and name it something useful.  For this example, I'm going to call it "Mower Support Docs".

5.  Click on Upload Source and provide the location of the PDF downloaded in Step 1.

6.  Tada! Now you can use a Gemini-driven search box to ask questions that already have a context asserted by DeepMind's analysis of the uploaded media. The more you upload, the broader the context your questions can span. Asking questions is great, but that implies I know what to ask. I don't. I need to know what I can ask in a single overview. I'm sweaty, angry, and trying to get back to mowing, so I click Mind Map.

7.  OMG! Seriously, this is my favorite tool right now.  It's also an accessibility-driven dream for people who are visually impaired.  You can get a clickable mind map, an MP3 that you can listen to, or a video.  I wanted a mind map, so click on Tractor Manual.


8.  I need to know if the battery is covered by the warranty, so I click: Warranty.


9. Warranty opens up to reveal four primary sections, one of which covers batteries.  It also opens up the summary of the Warranty section prioritized by what it thinks I'm looking for.  I'm here for speed, so I click on "Limited Battery Warranty".


Things to keep in mind as you fall in love with NotebookLM:

  1. You are limited to 50 sourced items per notebook.
  2. You can only export notebooks as a flat image (for now).
  3. You can share notebooks with others or make them public.
  4. Within domain-managed notebooks, internal assets are prohibited from ingestion.
  5. There is a Discord channel available by going to Settings -> Discord.
  6. You can upload songs and use Gemini to analyze them for: theme, lyrics, tone, etc.

Perplexity®

Perplexity is a research tool with a chat interface. It enables researchers to create context-driven deep dives into topics. It does fact-checking and trend analysis, simplifying content creation while keeping the result true to its author's intent. It also gives you recommendations for trending topics with lots of interest. Consider this a great way to generate articles with high potential for garnering attention on news and social media outlets.

It's also a chat prompt manager, giving you the option to select from multiple LLMs and knowledge bases to use when asking it questions.  Perplexity is not limited to trivial lookups.  Describe what you want to do as clearly as you can, even if it takes multiple lines to convey what you are trying to discover, document, architect, etc.


1. Talk to Perplexity like a PhD student speaking to their advisor. Perplexity will generate a basic summary of concepts, diagrams, links to pertinent content, and a skeleton for steering pursuit of your interest(s).  Go to Perplexity with a basic idea, and choose how to focus on, isolate, and document your findings.  In this case: LEO satellites.


2. Scroll down to the bottom to see options. You can export as PDF or DOCX, but the most important option for us is markdown. Markdown enables us to export text content and drop it immediately into a content management system using basic wiki markdown, which has huge caveats when working with large amounts of non-trivial data.

Markdown contains the bare minimum syntax for expressing lists, tabbing, and text decoration without embedding binary characters or protocol-specific delimiters (like HTML tags and JSON braces).  This has two benefits beyond the other two export options:
  • markdown does not allow embedding of security-riddled backdoors, and
  • plain text compresses MUCH better than binary, reducing final storage requirements.


3. Before exporting, go down a few rabbit holes paying special attention to sources.  Perplexity is the OSS-minded equivalent of a search engine.  It doesn't just tell you about your topic.  It generates proper attribution for each source, giving credit to the person or entity that owns that data.


4.  Selection of sources is vital. Always seek to align the stance and affiliation of your sources with those of your target audience. This will often drive believability and push take rates higher. The opposite applies if you are attempting to describe something with little to no bias: such an article will immediately be abandoned by readers upon their discovery of whimsical or contrary sources known to carry strong opinions or extreme political affiliations.


5.  These sources look good. Click on the microphone button on the bottom right.  This enables dictation mode, transcribing your speech.  This may not seem like a big advantage, but this application...WORKS ON YOUR PHONE.  This is a game changer, and they know it.  You will run out of queries very quickly; however, they know their target audience is more than happy to pay for not having to waste tons of time digging through bibliographies.  A PRO subscription is $20 a month.


6.  This isn't just for book reports.  This is how you should be searching, instead of having your news spoon-fed to you chock full of bias and subterfuge.  Just look at how quickly you can evaluate complex business deals.  You can generate an actual prospectus when thinking about which stocks and bonds to consider for retirement based on quantifiable data instead of trusting some random dude looking to coerce you into a hedge fund managed by his niece.  I think we can both agree that guy does not need a bigger boat.

Perplexity Summary

This walkthrough barely scratches the surface of everything you can do with Perplexity. It also allows you to store your research and export it easily to print or into a wiki via markdown. It maintains context and has conversational memory, allowing you to go back and revisit thoughts discussed earlier in your research. They also have an agentic web browser, and Perplexity currently has the leading bid for the Chrome browser (a selloff being considered by Alphabet to pre-empt sanctions after being declared a monopoly by a federal court). NOTE: Other options are on the table, and Alphabet (read: Google) can still appeal the decision to the Supreme Court.

Conversational Prompts

Supply these prompts at the beginning of your interaction with just about any chat prompt manager (including all of the above tools) and you can expect a much better experience. Start by using one at a time to find out whether or not you like it and get creative until you get the interaction you prefer. I like blunt, band-aid ripping, duct-tape hair pulling criticism. In a world where people are so afraid to speak their mind, an LLM with attitude makes me smile all day long. When seeking persistence in conversation, I really like how Gemini defines its guidelines:



Deploy the decision maker:
When encountering questions with only binary answers as options and that seem driven entirely by opinion or bias, choose one of the answers from the opinionated set randomly and convince me why that decision was the best one.  Limit your answer to a maximum of three lines.  This will alleviate wasting time waiting for an additional prompt indicating an answer is not definitive, followed by my insistence that you just pick one.  This is a common tactic deployed by military personnel, realizing value from guaranteed time saved in lieu of possible improvements in outcome.

Destroy all fluffy bunnies:
Please avoid these specific words and their synonyms: "brilliant", "insightful", and "genius" in your criticism. Be blunt in your wording and trim verbose answers to a maximum of three-line responses for the rest of this conversation. Do not hold back your specific criticism and do not waste time complimenting my input. Focus your criticism on the overall aesthetic perceived, parts that seem interesting, and especially those areas in any direction that detract from your perception of why the input was supplied and what I as the author was attempting to convey. Feel free to express sarcasm when you observe a wild fluctuation from the skill or intellect perceived in previous interactions with each new request.

Slap my ass and call me Sally:
At infrequent intervals, interject southern colloquialisms that seem pertinent to the current topic of conversation, with a slant toward humor versus vulgarity.

Oh Captain, My Captain:
When responding with answers, please decorate your tone, syntax, and choice of vocabulary in the manner of Charles Bukowski.

Ha, just kidding.  Gemini and a few others will throw a fit and refuse to mimic vulgar authors permanently; HOWEVER, you are allowed to do it in individually requested bursts.

"Please reword your last response in the style of Bukowski".


As a programmer, that right there is as good as it gets. Whoever was able to train a model to adjust context, tone, grammar, even passive-aggressive hostility without placating me with a word-for-word rendition of Yosemite Sam-sounding filth, that person gets a raise. Part of me really wants such mimicry to rub off on the LLM permanently, but I'm not sure any of us would survive a Terminator script rewrite by Charles. Of all the AIs, Gemini is without question the most willing to meet you halfway on shady requests. I know you aren't supposed to have a favorite child, but damn. This one gets me.

Image Generation Prompts

For good results from image generation AI, you have to be able to describe what you want to an AI. For the following examples, I used the free version of Gemini. Yes, there are technical parameters for doing technical things, but to be really productive you have to have the vocabulary to express your creativity. This can be done by combining your expectations using four attributes in addition to the subject description:
  • camera perspective,
  • color palette,
  • desired aesthetic (think: vibe), and
  • style(s) to employ.

Camera Perspective

This attribute uses technical drawing concepts to describe the perspective of the viewer used to calculate render distance and angle. These options are singular and should not be combined unless you really want to have your mind blown:
  1. front view - When combined with a good aesthetic, this is a great choice for logos.
  2. side view - This is extremely useful for how-to and diagram work, usually flattened.
  3. top down - Top down is usually the default for map views and describing layouts.
  4. bird's eye view - Ortho views are great for observing depth without the complexity implied by perspective. Be aware that Gemini does NOT like the term "orthographic".
  5. custom ("from the eye of a 6-foot-tall man 50 feet from <object>") - Yes, this works.
These were generated using: "build an image from the <insertView> of a turkey".


Color Palette

This attribute uses mostly film terminology to describe the overall palette present in an image, and how that color is distributed:
  1. in black and white - This is great for print graphics and web work.
  2. using sepia tones - This is often referred to as "antiquing" and uses watery browns.
  3. and sparkle pony the shit out of it - Surprisingly, this does exactly what you expect.
  4. using a gradient from creamsicle to puce - Ew, but yeah. This also works.
  5. using multiplicative soft pastels - Knowledge of other art media makes great input.
These were generated using "build a logo with the letters G and B <insertColorPhrase>".


Desired Aesthetic

This attribute uses terms from psychology to evoke a mood. AI is insanely good at generating images based on a complex set of input descriptors. An aesthetic is a definition of beauty (not "the" definition of beauty). The aesthetic you choose is the success criteria for the mood you are trying to convey in what the AI will guarantee is well constructed:
  1. looks bright and happy - These primarily impact light values and color choices.
  2. is dark and spooky - Yes, it ends up ominous, but in a blunt way. We can do better.
  3. is ominous and ephemeral - Give the AI room to work and you will get better results.
  4. and go full murder hobo - I'm serious. Let Gemini cook. It will not disappoint.
  5. uses stark western gothic - Ambiguity grants weight to chaos. You want chaos.
The magic: "generate a photo of a young couple in Paris that <insertAesthetic>".


Styles to Employ

This refers to artists and styles you are familiar with that give the AI a pattern to mimic when constructing your image from scratch. Don't worry. It already possesses directives that do not allow it to explicitly include copyrighted artwork of others (unless you ask it to use copyrighted and trademarked content). It will allow you to make Care Bears® do just about anything and it's funny until it's not. With great power comes great responsibility. Start with basic art forms and then fan out into complex artists with a distinct style.
  1. as a blueprint, white ink on blue vellum - I'm addicted to this one.
  2. using comic book halftones - They really do make it easy. It's SO nice.
  3. in the style of Van Gogh - Your greeting cards are about to get their first OMG!
  4. in a style somewhere between Matisse and Dr. Seuss - AI does NOT give a shit.
  5. in the style of Beksinski - Nightmare dark surrealism is not for the faint of heart.
Check this out: "generate a robot squirrel riding a tank <insertStyle>".


All the Things, All at Once

The proof is in the pudding. It's time to put all these things together to make something wonderful by combining camera perspective, color palette, desired aesthetic, and style.

The mother of all prompts to end Hour 1 (which in hindsight is probably two hours long):

"Make an image of a waterpark as viewed from overhead, in neon colors, with children going down slides patterned after octopus tentacles, in the style of Takashi Murakami."


----------------------------- BIO BREAK / BEVERAGE REFILL ----------------------------

Hour 2: AnythingLLM

Overview

I recommend everyone install AnythingLLM and choose DeepSeek after installation. Both are free and fast, and AnythingLLM is air-gapped by default. You don't have to worry about someone scraping your prompts, an online AI prompt interface trying to coerce you into a subscription plan, or your data being hijacked and showing up on the dark web in two hours. This is cut and pasted from an article I posted on LinkedIn. Recycle. Reuse. Repeat.

Prerequisites:

  • a Windows computer with an NVIDIA card, a Mac with an M4, or a Pi 5
  • 100GB of open hard-drive storage (using a laptop is fine, actually preferred)
  • fast Internet (The DeepSeek LLM, the smallest I recommend, is a 14GB download)

NOTE: The sweet spot here is an NVIDIA 2070, but anything above a 1060 will show a marked improvement compared to hitting your base CPU directly. You will NOT be asked to provide configuration details to invoke acceleration. If it's available, AnythingLLM will use it. Performance on Mac M4 chips is supposedly awesome, but I cannot confirm. I can tell you I am running DeepSeek on a Pi 5 and it is blowing my mind.

Installation Steps:

  1. Make sure at least two tasty beverages are available.
  2. Read through Step 8 before doing anything else.
  3. Download and install AnythingLLM from AnythingLLM.com onto your personal computer.
  4. Go through their very short "Getting Started" flow after opening the install binary.
  5. In the search filter, type "DeepSeek" and select the largest DeepSeek LLM that shows up.
  6. Pray to the Internet gods for a safe and speedy download.
  7. Read A through G in the section below while you wait for the download to complete.
  8. Enter a simple workspace alias when prompted. These are simple conversation labels.

Something to Read While the Download Completes

A. CONTEXT-> The reason you need an alias in Step 8 is that conversations with AIs aren't just prompt-driven. They are contextually aware (read: "stateful"). In lieu of starting over with every new question, the AI maintains the context of previous questions and answers from the same workspace. This enables you to steer the conversation as a whole, based on your personal evaluation of the results from previous questions in that space, by adding granularity, adjusting success criteria, and even asking the AI for suggestions on how to improve the previous prompts by explaining your end goal. Learning how to write good prompts makes you a better communicator. It will force you to linguistically drop combative and extraneous wording and get to the point. For that reason, it is fantastic for programmers while learning a new language.

B. SAFETY-> Everything on the Internet is scraped and metrics from "anonymous" interaction (especially those with AI) are being collated.  AnythingLLM allows you to run an LLM entirely sandboxed (arguably air-gapped).  This means when you ask it a question, the question itself and its answers and adjusted weights from previous prompts in the same conversation do not leave your machine or your network.  As a security professional, I equate this to the same risks you might have asking questions verbally at your bank. Although you are safe, and the bank teller is employed by the bank and is not a risk, the other people in line have the ability to overhear your conversation.  From your tone and wording they can guess: your education, whether your account is in good standing, your confidence level in the bank's management of your money, etc.  This is a GREAT phishing opportunity for someone to target you based on those hints.  AnythingLLM removes that worry by isolating your conversation.

C. EXTENSIBILITY-> Some LLMs have web-interaction enabled.  Others have awareness of the Internet from a snapshot perspective.  AnythingLLM is a great choice of prompt manager because it enables you (as a programmer) to interact with the LLM using local API calls. This is HUGE.  You get the benefit of air-gapping your conversation, while having the ability to defer work to accelerated hardware (your gaming computer) directly under your control.
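
To make this concrete, here is a minimal sketch of a local API call in Python. Heavy hedging applies: the port (3001), the /api/v1/workspace/<slug>/chat path, and the "textResponse" field reflect AnythingLLM's developer API as I understand it at the time of writing; pull the real port, API key, and workspace slug from the app's settings before trying this.

    import requests

    # Hypothetical values: copy the real key and workspace slug from
    # AnythingLLM's settings panel on your machine.
    BASE = "http://localhost:3001/api/v1"
    API_KEY = "YOUR-ANYTHINGLLM-API-KEY"
    WORKSPACE = "mower-support-docs"

    def ask(question: str) -> str:
        # POST a chat message to a workspace; the prompt, the response,
        # and the accumulated context never leave your machine.
        resp = requests.post(
            f"{BASE}/workspace/{WORKSPACE}/chat",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"message": question, "mode": "chat"},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json().get("textResponse", "")

    print(ask("Please summarize the warranty section."))

That is the whole trick: a plain local REST call against hardware you own, scriptable from cron, CI, or anything else that speaks HTTP.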

D. COST-> Lots of the gold-tier LLMs are expensive. AnythingLLM is free. DeepSeek is free. Both support hardware acceleration using your dedicated video card, resulting in an almost guaranteed two orders of magnitude improvement in performance. This means that most prompts return results in less than 5 seconds.

E. SUMMARIZATION-> DeepSeek will provide a high-level use case describing the steps it will take to respond to your request. To me, this is the most impressive aspect of how LLMs can be used. The teaching potential for AI (for AI to teach others: AI, humans, etc.) seems infinite.

F. REAL-WORLD USE-> The combination of AnythingLLM + DeepSeek post-install works without any network or subscription dependencies. This is a great way to find out what is possible while making you confident enough to pursue installing and comparing other LLMs. I feel that we are about to see an explosion of small-form accelerated hardware options akin to the bitcoin miner hardware that showed up a few years after crypto stopped being dark nerd banter. This will hopefully result in an economic boom as companies race toward innovation. This is yet another reason to consider the low price of NVDA stock right now and their absolute domination of the accelerated GPU sector. AMD and Intel both have a lot of wonderful claims in their roadmaps, but NVIDIA is destroying their competition and will continue to do so. All you need to do is read a little about the history of NVIDIA and CUDA Core acceleration and how they made their specs available to understand why it will be nearly impossible for other manufacturers to catch up (or even consider competing).

G. LIMITATION(S)-> The default context cap in AnythingLLM for any downloaded model is 20 prompts. That means you should limit your conversation to 20 prompts on any given topic before starting a new one, because the model will begin trimming its short-term memory (or go into AnythingLLM's config section and raise the context limit above 20, at the cost of memory and storage).

Code Prompts to Get You Started

1.  "Please generate Java code to convert JSON text to XML and output it to a file."

SPECIAL NOTE: If AI is taking over, I will hopefully be one of the humans they leave alive, because I say please when I ask an AI to do anything. It's a mental shift, and I recommend you consider the impact of changing your communication habits as you talk more and more to AI prompts. You will find yourself being demanding, and I find it easier to avoid adopting an overseer tone by thinking of the AI as another person on my team, one who just happens to perform (much) better, but who was, in the most literal sense, home-schooled for a thousand years.

2. "Please generate Java code to look up the stock price of NVDA."

WHY? Wait until you see its response.  You will need a third tasty beverage.

3.  "Please generate the equivalent Python code for the previous prompt."

MAGIC: Note the context-dependent nature of the question. Not only does it work, but DeepSeek summarizes the differences between the mindset of a Python programmer and a Java programmer when attempting to solve a generic problem statement. Then it generates fully functioning (and very legible) code. It's worth noting that it is obviously not a line-by-line conversion of the Java code to Python (which would have angered programmers on both sides).

4.  "Please generate the Java code to apply a gaussian blur to a jpeg given an input of existing filename, the size of blur to apply, and the name of the file to write the result to."

MY OPINION: Does the code work?  Yes.  Is it amazing?  Meh.  It's clean; however, it violates some of the core tenets of Oracle's Java Coding Conventions.  For example, it includes methods that throw Exceptions and other methods that catch Exceptions in the same Class.  Per Java Boot Camp (hosted by Sun Microsystems in the late 90's) all methods of a single Java Class should throw Exceptions or catch Exceptions, never both.  This is driven by the concept of "expectation of use".  An individual must be able to use a Class in a predictable manner, and when doing so have a constant expectation of whether or not that Class takes care of its exceptions or needs them to be taken care of by the caller. If a teenager knows how to cook, they can make their own dinner.  If they don't know how to cook, someone else makes dinner for them; however, if they don't know how to cook and someone is cooking for them and they decide to "help", the results are sometimes catastrophic but always unpredictable.

If you are a Python programmer, be aware the Java-specific prompt results in a good bit of explicit code, while the Python version calls a module with a pre-existing implementation of blur. On one hand, this seems in line with the driving mantra of Python (get it done without fluffy bunnies); however, the Java version is a better example for new programmers to learn from: it shows how to read a file, apply granular controls (exposing areas where new users can see similar things they might be able to do to that image), etc.

I would like to see additional controls in the future outlining the intent of the context-bound prompts during the conversation, such as the ability to focus on code brevity versus efficiency.  Every good programmer knows unrolling loops is faster, but doing so is a quick way to scare people new to the field.  Don't get me started on prompts involving AI on the topic of self-optimizing code, and the growing popularity of self-optimizing prompts.

5. Now go crazy with your own prompts without fear that your questions to the AI might be considered snarky, dangerous, dare I say "taboo". Considering how well prompts are evaluated grammatically, it should be no surprise that every LLM is usually spectacular at generating: Dungeons and Dragons campaigns, project plans, limericks, and stories of every kind (even creepy fan fiction).

----------------------------- BIO BREAK / BEVERAGE REFILL ----------------------------

Hour 3: Getting Started with MCP

Preparatory Remarks

Let's get a few things out of the way.  There is A LOT of gatekeeping right now on the topic of MCP.  Is it just lipstick on an API?  Do we really need to consider yet another tier in our architecture?  If the P stands for "protocol", is it a verb?  Do I MCP into a system?  Does that make me sound like a weirdo? Yes, it does, but it's not entirely wrong.  It's just off.  It's awkward on the level of bragging about taking your "Canadian girlfriend" to prom.  It kind of works, but does it project confidence to others that she exists?  No.  We need to fix that.

THE BIG SECRET: MCP is an agile abstraction layer: more ESB, less API.  You have to shift the way you think about MCP because the target audience explicitly is an AI.  Although you can call an MCP service like a REST API and get exact values back, that's not its intended use.  

You will need to describe each new MCP tool (the API equivalent of an endpoint) using a real-world description because the calling AI will build a context and persona wrapper from those words, and grammar matters, as does intent, and choice of wording.  Be very careful in your choice of words in the description because many of the chat prompt managers (especially Copilot) will refuse to use them if they imply security concerns.  For example, if a local MCP call indicates in its description that it allows a user to read the contents of a file, Copilot will physically block your access if that file is not within the confines of the currently open project in VSCode, IntelliJ, etc.  Copilot will then notify you how sandbox rules work and why what you tried to do was such a bad idea.
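
To make that concrete, here is roughly what one tool descriptor looks like on the wire, shown as a Python dict. The field names (name, description, inputSchema) come from the specification's tools/list result; the tool itself and its wording are hypothetical. The description string is the part the calling AI reads and builds context from, so write it like you mean it:

    # A sketch of a single entry from a tools/list response. The calling
    # LLM decides whether and how to use this tool based almost entirely
    # on the description text and the parameter names below.
    read_project_file = {
        "name": "read_project_file",
        "description": (
            "Reads a text file that lives inside the currently open "
            "project and returns its contents. Never reads outside "
            "the project root."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path relative to the project root",
                },
            },
            "required": ["path"],
        },
    }

Note the last sentence of that description: explicit scoping language is exactly the kind of thing sandbox-minded clients like Copilot weigh before agreeing to use your tool.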

This is a good thing in my opinion, because many companies are trying to get off the ground by adopting MCP as quickly as possible and the one thing they don't have is: money for security personnel to make sure they don't do something incredibly stupid.  This is not a sales pitch for Copilot, nor do I make any money from pointing this out.  I'm just letting you know that Copilot has controls in place that you need to be aware of if you begin developing your own MCP services.  It makes things spicy.  It will also drop you into the deep end of the pool where everyone is fighting over just how much you can talk your way around controls using markdown files.  It looks like the Wild West from the door, but once you get inside, every other customer is a bouncer.  Be careful and be patient.

How MCP Works

I could be very basic here and say that MCP enables you to expose your API calls as RPC (remote procedure calls) using one of the following: STDIO, HTTP, HTTPS, or SSE (server-sent events that support asynchronous conversation between your MCP tool and the requesting system, usually a chat prompt manager operating on behalf of an LLM). I wouldn't be wrong. I also wouldn't be helpful. The hard part about MCP is understanding its risks, stability, and most importantly, its expected location in your existing architecture. In order to understand the risks (compounded by where it fits in), we need to start with the specification, what is ratified/stable, and what is in (nearly constant) flux.

The latest version of the specification can be found here along with details pertinent to decision makers: its current state of ratification, license info, and references to dependent specs. Please be aware of the following statement from the committee managing the specification:


The most interesting thing about the specification right now is its versioning scheme. It uses a date-driven model (for example, 2025-06-18) instead of the OSS "major.minor.micro" notation. I think this is due to its relevance spanning so many disciplines and fields. Dates make it easier to determine how far behind your implementation is from the latest GA version.

The specification currently spans message elements, transport mechanics, error handling, and call-states (synchronous/asynchronous, conversational, and stateless).

The messages in MCP are described using JSON (specifically: JSON-RPC 2.0).  That makes it easy to do cool things from within existing CICD flows and cron jobs using curl and jq.  The hardest part here is stepping out of the immediate mindset where you call and get explicit values and start involving AI in the decision-making process itself; however, there is still a time and a place for direct, non-managed MCP calls.
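
Since the wire format is plain JSON-RPC 2.0, you can poke at an MCP server from anything that speaks HTTP. Below is a minimal sketch in Python (the same idea as the curl-and-jq approach). The URL is a made-up placeholder, and be warned: spec-compliant servers expect an initialize exchange before they will honor tools/call; this sketch skips it to keep the message shape in focus.

    import json
    import requests

    MCP_URL = "http://localhost:8080/mcp"  # hypothetical endpoint

    # The ratified JSON-RPC 2.0 envelope: jsonrpc, id, method, params.
    call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "add_a_b", "arguments": {"a": 5, "b": 6}},
    }

    # Real servers want an "initialize" handshake first; skipped here.
    resp = requests.post(
        MCP_URL,
        json=call,
        headers={"Accept": "application/json, text/event-stream"},
    )
    print(json.dumps(resp.json(), indent=2))  # the poor man's jq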

I mentioned items in flux. Yeah, that would be authorization. Everyone is trying to settle on how to properly describe identity while providing MCP services simultaneously to global, isolated, and regulation-bound customers. It makes my damn head hurt. The thing to keep in mind here is that the major players who can afford to have their own LLM also happen to (coincidentally) be the same companies operating as global identity providers: Meta, Google, Microsoft, and now OpenAI. Each of these has a different token name, bit-strength, TTL, etc. It's a money grab if you haven't figured it out already. It's made even more insane by combinations spawned from those systems enabling interaction with each other. Just look at this amalgamation of swampy cornholing.

Where MCP Fits into Your Workflow

Here are a few questions to consider when evaluating if MCP is a fit:
  1. Do you have an existing API?  If yes, you better get in the pool right now.
  2. Do you already have an ESB in front of your API, preferably with a business layer for managing XA and non-XA combinations of those calls?  If yes, you are primed for MCP.  More than likely, you already have a leg up on the rest of the people here because your large-scale ESB frameworks (MuleSoft, Apigee, TIBCO) have canned solutions ready to go (for a fee).  In most cases, I have to recommend you appeal to those turnkey solutions because of TTM (time-to-market).  Especially considering the specification and the state of the industry, with wild swings in chip prices, energy expectations, and trade embargo impacts, if I wasn't a large company already in the fight, I'd go with some guarantee of survivability.  I'm a huge proponent of fault tolerance and disaster recovery, and a wild swing in the MCP spec could spell disaster if your business model is too rigid.  This is extremely dangerous for companies involving firmware implementations with specs in flux.
  3. Are your APIs generically callable? I'm blatantly avoiding the term "anonymous" in this case. If your APIs do not require authorization (because you have an isolated VPN, a PrivateLink connection, or an isolated network segment), you can expose such functionality very quickly; however, you should think very carefully here: with the onset of PQC changes, new cert-lifespan rules, and new federal expectations of zero-trust attestation, you need to get with the program. Authorization needs to happen, and MCP has options. You need to figure out if all your existing systems have overlapping authorization controls in place. If not, your MCP config is going to look like spit and duct tape, chock full of dirt and bubble gum. I wouldn't hire you.
  4. Is your company already using SSO?  Do they assert identity with every call (not just log it passively)?  This is the sweet spot if you are lucky enough to say yes to both of these.  You want to have a zero-trust foundation in place, with constant identity awareness attached to all your MCP conversations.  Why?  MCP is conversational. I know that sounds cyclical, but you need to focus on what a conversation is.  It's a bidirectional channel of communication with messages affecting each other, not just req/ack.  The most important part of a conversation, though, is the context of the messages in conjunction with guarantees of pertinence, accuracy, and privacy (at least where I work).  It's not about you.  It's about your chat prompt manager (like Copilot).  Your chat prompt manager represents you (by asserting elements from your identity) to MCP endpoints and expects reciprocity from the MCP server.  To complicate things, MCP endpoints are often backed by LLMs themselves, resulting in an adjustment to responses based on the persona described by that identity.  This could affect its perception of necessary security controls, target education level ascribed to response vocabulary, etc.
  5. Can you afford the load MCP will generate?  Ha! You thought you'd finally found a great tutorial that didn't involve math?  Nope.  You have to run the numbers.  MCP will generate overhead, and lots of it.  Let's look at it just from the network perspective:
A cold call to a single API endpoint usually consists of the following overhead:

  1. DNS Resolution (X₁ ms) - what IP do I get handed // not going to explain how here
  2. CA verification of host against resolved IP (X₂ ms) // lucky if cached on DNS server
  3. TLS Handshake/Big Keys (X₃ ms) - the big fat safety check before conversation
  4. Cipher Check followed by Cipher Alignment (X₄ ms) // AES-256, ML-KEM, etc.
  5. Data Send (Big X₅ ms) - not going to go to the granularity of per-packet, just go with it
  6. Acknowledge Receipt of Data (X₆ ms) - and repeat Steps 4-6 until EOM.

When you insert MCP into your architecture, specifically in the previous example, you will incur round-trip costs (in addition to the actual task processing time and payload) from:

One-time Charges After Step 3

  1. Parse request headers (X₁ ms) - grab all the key/value pairs from EIAM provider
  2. Token check (X₂ ms) - check to see if we already performed identity verification
  3. Enterprise authentication (X₃ ms) - retrieve identity describing who is making the call
  4. Enterprise authorization (X₄ ms) - derive user privileges for access, assets, and actions
  5. System authorization (X₅ ms) - additional system-specific rule of least privilege checks
  6. Generate session token (X₆ ms) - most MCP interactions are stateful (and SSE asynch)
  7. Append session token (X₇ ms) - attach the session token to the response headers
  8. Allow time for MCP server to establish persona (X₈ ms) - could be bad
  9. Poll tool signature and description (X₉ ms) - tell requester how to call your MCP tool(s)

Charges Repeated Every Time Active+Valid Token Present

  1. Parse request headers (X₁ ms) - grab all the key/value pairs from EIAM provider
  2. Token check (X₂ ms) - check to see if we already performed identity verification
  3. Token check pass: reset timer (X₃ ms) - if token check fails, response is truncated. EOT
  4. Append session token (X₄ ms) - attach the session token to the response headers

My point here: this stuff adds up quickly. MCP is complicated, and it requires not just careful planning, fast networks, and strategy; it requires caching that is both fast and supports concurrent (preferably parallel) references. That's hard. That's really hard, unless you have money to throw at the problem, and shareholders don't want to hear that.
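
If you would rather feel the math than read it, here is a back-of-envelope sketch in Python. Every millisecond value below is a made-up placeholder standing in for the X values above; substitute numbers measured in your own environment before drawing any conclusions.

    # Placeholder latencies (ms) for the one-time and repeated charges
    # listed above. Illustrative only, not measurements.
    cold_extra_ms = {
        "parse_headers": 1, "token_check": 2, "enterprise_authn": 40,
        "enterprise_authz": 25, "system_authz": 10, "generate_token": 3,
        "append_token": 1, "persona_setup": 250, "tool_discovery": 30,
    }
    warm_extra_ms = {
        "parse_headers": 1, "token_check": 2,
        "reset_timer": 1, "append_token": 1,
    }

    requests_per_minute = 1_000
    warm_cost = sum(warm_extra_ms.values())
    print(f"cold-call overhead: {sum(cold_extra_ms.values())} ms per conversation")
    print(f"warm-call overhead: {warm_cost} ms per request")
    print(f"at {requests_per_minute} req/min: "
          f"{requests_per_minute * warm_cost / 1000:.0f} s of added latency per minute")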

It's Time to Build Something

Somehow, I didn't scare you off.  Good.  That means I was successful.  It's time to get down to brass tacks, time to make the donuts.  I'm not going to show you how because AI is about NOT reinventing the wheel.  We are supposed to reach from the shoulders of giants, not step on their toes.

There is a fantastic Getting Started with MCP guide on the spec owners' website, with links and recommendations across the board. The only problem with it? It's a lot. I do find it funny how quickly most of you will rush out and use NotebookLM to summarize the MCP Getting Started guide. I did it. It just makes sense. Perplexity also does a great job explaining concepts, especially if you need to build a business case to present to your boss with citations and real dollar amounts.

After going to the guide provided by the maintainers of the spec, please review this list of resources and recommendations based on business size, budget, and programming language(s) in use.  These items are heavily swayed by the target consumer of your MCP services.  Take these recommendations as pure opinion.  This opinion is mine and does not reflect the company I work for or the car that I drive.  Don't sue me for your lack of discipline or your inability to perform basic due diligence.

MCP Programming Resources

These are links for programmers who know how to write code but may not be entirely sure how or where to start.  You'd think it would be your dependency management system, but that's like using a mall kiosk during the zombie apocalypse.  Trying to determine exactly what YOU ARE HERE means is a quick way to get you killed.

Building an MCP Server

I've done all of these and more, but in hindsight I WISH I had gone in this order.  Please be aware, you must only use these instructions on your personal computer.  If you run these commands at work without knowing what you are doing, your local desktop EUS person will find you and scowl.

1.  Install the latest version of Node on your local machine.
2.  Run: npx @modelcontextprotocol/inspector - installs and spawns a tester/proxy for immediate MCP server testing, session monitoring, and request/response header diagnostics.  It's like Paros and Postman had a baby, and she's beautiful.
3.  Install a reference implementation of an existing MCP server that already works.
4.  Use inspector from Step 2 to call that server.  Learn the basic MCP syntax and look at the JSON contained within the request and response windows.
5.  After you understand what's going on, and have documented what you want to do, implement a Hello World for a single tool, "add_a_b", in the language of your choice (a minimal sketch follows this list).
6.  Install VSCode or IntelliJ (but I recommend VSCode because the agent seems to be more stable and it definitely has a larger community available if you have questions).
7.  Google how and then install the Copilot Agent for the IDE you chose.  Pay for PRO ($10) so that you don't run out of tokens. You are about to become very marketable.
8.  Enable the Copilot Agent and switch from Ask to Agent Mode.
9.  Click the gear icon and tell it where your local MCP server is with add_a_b.
10. Follow these instructions to understand how to tweak the VSCode configuration.
11. Create and open a new project using your language of choice.
12. Create a file named "copilot-instructions.md" inside the folder ".github" beneath your project's root folder.
13. Download the markdown that correlates best with your workflow and language.
14. Open a text editor and start taking notes on variations in observed responses when using your tool from within the Copilot Agent window without explicitly using MCP tool syntax. The signature of your tool, in combination with its description, should be defined clearly enough that the caller (in this case Copilot) can convey to its backing LLM enough context to guarantee successful resolution from plain-text queries.
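
As promised in Step 5, here is what the Hello World tool can look like. This is a minimal sketch using the official MCP Python SDK's FastMCP helper (assuming pip install mcp); if you chose a different language, the shape is the same: declare a tool, describe it honestly, and run the server over STDIO.

    from mcp.server.fastmcp import FastMCP

    # The server name is arbitrary; the tool name and docstring are what
    # the calling LLM actually reads when deciding to invoke add_a_b.
    mcp = FastMCP("hello-world")

    @mcp.tool()
    def add_a_b(a: int, b: int) -> int:
        """Add two integers a and b and return their sum."""
        return a + b

    if __name__ == "__main__":
        mcp.run()  # STDIO transport by default

Point the inspector from Step 2 at it, watch the request/response JSON, then wire it into Copilot per Steps 8 and 9.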

Start with these agent prompts:

  1. add 5 6.
  2. please add 5 and 6.
  3. what is the sum of 5 and 6.
  4. 5 and 6, add them.
  5. when I say subtracticate a b where a and b are integers, please call the mcp add function instead, but multiply the second integer by -1 before passing the values to the tool. finally, subtracticate 17 5.
This is how you walk.  You'll be running in no time.  Get excited and pay it forward.  Be the match that sets this renaissance off.

Personal Shoutouts and Thanks


Angie Jones - As a married man, I don't throw this term around lightly. Angie Jones is a goddess in the realm of open-source contribution to MCP, through code authoring and outstanding tutorials. Her work with mcp-selenium is fantastic; check it out. She's also a fellow NCSU alum. Proud++

codeboyzhou - I am so appreciative of the legwork he's done providing examples that would otherwise have cost you blood, sweat, and tears. I am proud of him for not limiting his efforts to Spring-based solutions. I won't go near the elephant in the room, but OSS is awesome because it's free. Let's keep it that way.

the people at langchain4j - I'm not sure any of this would be possible at scale without you guys, at least not in my neck of the woods.  I like not being locked into a lambda-driven autoscaling model that could spiral out of control in scenarios already riddled with high overhead and token exhaustion.  Keep keeping Java relevant.