Recent Posts

by John M Costa, III

GPT for better understanding my cognitive, emotional, and relational architecture

Overview

I recently came across a couple of posts on LinkedIn that got me thinking about how I can use GPT to better understand my cognitive, emotional, and relational architecture. The posts discuss how GPT can be used to create a mind map of your life, which can help you identify patterns and connections in your thoughts and feelings.

I decided to give it a try and see what I could come up with. The results were fascinating and provided me with a new perspective on myself.

The prompt I used was:

I want to better understand my cognitive, emotional, and relational architecture - how my mind works, how I process the world, and what makes me thrive. Based on our conversations, patterns in how I think, and the way I talk through problems, generate a structured table that maps out my internal wiring.

Categories should include:

Guiding Force
Core Personality
Superpowers
Growth Edges
Cognitive Architecture
Emotional Architecture
Relational Architecture

For each one, include a short description (how I experience it) and a ‘How It Shows Up’ column (real-life traits or behaviors that others might notice).

Make it sound like me. Use natural language, vivid but clear phrasing, and focus on insight, not fluff.

Results

mind-map.png

Curious what you think! I found this exercise to be incredibly insightful and a great way to reflect on my personal development journey. If you’re interested in trying it out, I encourage you to give it a go and see what insights you can uncover about yourself.

References

by John M Costa, III

Book Notes: Hands On Large Language Models: Language Understanding and Generation

Overview

This post contains my notes on the book Hands-On Large Language Models: Language Understanding and Generation.

You can find the book on Amazon

I’ll be adding my notes to this post as I read through the book. The notes will be organized by chapter and will include key concepts, code examples, and any additional insights I find useful.

Chapter 1: An Introduction to Large Language Models

Chapter 1 introduces the reader to the recent history of Large Language Models (LLMs). The diagrams in this chapter are particularly useful for understanding the evolution of LLMs and how they relate to other AI technologies. It’s a great segue from the machine learning and neural networks covered in the other book I’m reading in parallel.

Chapter 2: Tokens and Embeddings

Chapter 2 introduces the concept of tokens and embeddings.

Tokens are the basic units of text that LLMs use to process and generate language. The chapter covers a number of LLM tokenizers, including BERT (cased and uncased), GPT-2, FLAN-T5, StarCoder2, and a few others. It provides details on how each tokenizer works and how they differ from one another.
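To make the differences concrete, here's a minimal sketch (my own, not from the book) that runs the same sentence through a few tokenizers via the Hugging Face transformers library; the model names are illustrative choices.

```python
# Sketch: comparing how different tokenizers split the same text.
# Assumes the Hugging Face `transformers` package is installed; the model
# names are illustrative, not necessarily the book's exact examples.
from transformers import AutoTokenizer

text = "Tokenization determines what a model actually sees."

for name in ["bert-base-cased", "bert-base-uncased", "gpt2"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(f"{name}: {tokenizer.tokenize(text)}")
```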

Token embeddings are numerical representations of tokens that capture their semantic meaning. Embeddings can be used to represent sentences, paragraphs, or even entire documents. Further, embeddings can be used in Recommendation Systems. The chapter covers a song recommendation system that uses embeddings to recommend songs based on a song input by the user.
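The playlist-based recommender can be approximated with a word2vec-style model by treating playlists as sentences and song IDs as words. The sketch below assumes the gensim package and uses made-up playlist data, not the book's dataset.

```python
# Sketch of the playlist-as-sentence idea behind a song recommender.
# Assumes gensim is installed; the playlists are made-up placeholders.
from gensim.models import Word2Vec

playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_b", "song_c", "song_d"],
    ["song_a", "song_c", "song_e"],
]

# Treat each playlist like a sentence and each song ID like a word.
model = Word2Vec(sentences=playlists, vector_size=32, window=5, min_count=1)

# Songs that co-occur in playlists end up with similar embeddings.
print(model.wv.most_similar("song_c", topn=3))
```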

Chapter 3: Looking Inside Large Language Models

Note: This chapter contains a number of useful diagrams that I’ve described in my own representation. However, the diagrams are not reproduced in their entirety. Please refer to the book for the complete diagrams and explanations.

Chapter 3 takes a deeper dive into the architecture of LLMs. We start out with a view into the Inputs and Outputs of Trained Transformer LLMs. This might be an overly simplified view, but it helps to understand the basic flow of data through an LLM.

transformer.highlevel.drawio.png

The transformer generates a single output token at a time, using the previous tokens as context. This is known as an autoregressive model.

Diving a little deeper, we learn about the Transformer architecture. It’s composed of a Tokenizer, a stack of Transformer blocks, and an LM Head.

transformer.components.drawio.png

Going further, the tokenizer breaks the input text down into tokens drawn from a token vocabulary. The stack of transformer blocks operates on token embeddings built from that vocabulary. The LM head is a neural network layer that produces a probability for each token in the vocabulary.

transformer.forwardpass.drawio.png

Greedy decoding is when the model selects the token with the highest probability at each step.
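As a rough illustration of greedy, autoregressive decoding, the sketch below picks the highest-probability token at each step and feeds it back into the model. It assumes the transformers and torch packages; GPT-2 is used only because it's a small, convenient example model.

```python
# Sketch of greedy, autoregressive decoding: pick the highest-probability
# token at each step and append it to the context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The transformer generates", return_tensors="pt").input_ids

for _ in range(10):
    logits = model(input_ids).logits          # scores for every vocabulary token
    next_id = torch.argmax(logits[0, -1])     # greedy: take the most likely token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```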

Multiple input tokens can be processed in parallel, and the number of tokens that can be processed at once is referred to as the context size. Keep in mind that embeddings are not the same as tokens; rather, they are numerical representations of tokens that capture their semantic meaning.

transformer-Page-2.processing-stream.drawio.png

Keep in mind that only the last token in the sequence is used to generate the next token. Even so, the processing streams for the other tokens run in parallel and their results can be cached to improve efficiency.

Digging even deeper, we learn about the Transformer blocks. Each block consists of a self-attention mechanism and a feed-forward neural network.

transformer-Page-3.drawio.png

The feed-forward neural network is the source of learned information that enables the model to generate coherent text.

Attention is a key mechanism in LLMs that allows the model to focus on specific parts of the input sequence when generating text.

transformer-Page-3.simple-self-attention.drawio.png

Taking a look into attention more closely, we see that we are getting to the core of how LLMs work.

Note: The book mentions projection matrices which are shown in the diagram below. However, it doesn’t explain them in detail. If you’re interested in understanding projection matrices, a good resource that I found helpful is this article. The content appears to be a summarization of the original paper on self-attention, Attention Is All You Need. Another good resource is The Illustrated Transformer

transformer-Page-3.relevance-scoring.drawio.png

Using the queries and keys, the model calculates relevance scores for each token in the input sequence. The scores are then multiplied by the values to produce the output vectors.
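Here's a bare-bones sketch (my own simplification, not the book's code) of scaled dot-product self-attention, just to make the queries/keys/values flow concrete. Real transformer blocks also apply learned projection matrices and use multiple heads.

```python
# Minimal scaled dot-product self-attention with toy random values.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d = 4, 8
queries = np.random.randn(seq_len, d)
keys = np.random.randn(seq_len, d)
values = np.random.randn(seq_len, d)

# Relevance scores: how much each token should attend to every other token.
scores = softmax(queries @ keys.T / np.sqrt(d))

# Output vectors: relevance-weighted mix of the value vectors.
output = scores @ values
print(output.shape)  # (4, 8)
```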

Attention is a powerful mechanism that allows the model to weigh the importance of different tokens in the input sequence when generating text. Newer LLMs offer a more efficient attention mechanism called sparse attention, which can be strided or fixed. These mechanisms use fewer input tokens as context for self-attention.

Additionally, there are other attention mechanisms such as:

  • Grouped Query Attention (GQA)
  • Multi-Head Attention
  • Flash Attention

Chapter 4: Text Classification

The goal of text classification is to assign a label to a piece of text based on its content. Classification can be used for a variety of tasks, such as:

  • sentiment analysis
  • topic classification
  • spam detection
  • intent detection
  • detecting language

Techniques:

  • Text Classification with Representation Models
  • Text Classification with Generative Models

Text Classification with Representation Models

How it works:

  1. Base (foundation) models are fine-tuned for specific tasks, like classification or embeddings.

transformer-Page-4.fine-tuning.drawio.png

  2. The fine-tuned models are fed inputs and generate outputs specific to the task.


The chapter suggests several models that work well for text classification.

When looking to generate embeddings, the MTEB Leaderboard is a good resource for finding models.

To evaluate the performance of classification models that have labeled data, we can use metrics such as accuracy, precision, recall, and F1 score.
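As a quick sketch of how those metrics are computed in practice (assuming scikit-learn; the labels below are placeholders, not a real evaluation set):

```python
# Sketch: scoring a classifier against labeled data.
from sklearn.metrics import accuracy_score, classification_report

y_true = [1, 0, 1, 1, 0, 1]  # labeled data (placeholder values)
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions (placeholder values)

print("accuracy:", accuracy_score(y_true, y_pred))
# precision, recall, and F1 per class
print(classification_report(y_true, y_pred))
```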

Zero-shot classification is a technique that allows us to classify text without any labeled data.

Using the cosine similarity function, we can compare the embeddings of the input text to the embeddings of the labels.
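A minimal sketch of that embedding-based zero-shot approach is below. It assumes the sentence-transformers and scikit-learn packages; the all-MiniLM-L6-v2 model is just a common default, not necessarily what the book uses.

```python
# Sketch of embedding-based zero-shot classification: embed the input and the
# label descriptions, then pick the label whose embedding is closest.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

text = "The battery died after two days and the screen cracked."
labels = ["a positive review", "a negative review"]

text_emb = model.encode([text])
label_embs = model.encode(labels)

scores = cosine_similarity(text_emb, label_embs)[0]
print(labels[scores.argmax()], scores)
```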

Text Classification with Generative Models

Prompt engineering is the process of designing prompts that can effectively elicit the desired response from a generative model.

The T5 model is similar to the original Transformer architecture, using an encoder-decoder structure.

OpenAI’s GPT model training process is published here: https://openai.com/index/chatgpt/

Chapter 5: Text Clustering and Topic Modeling

Text clustering is the process of grouping similar pieces of text together based on their content, yielding clusters of semantically similar text.

Text clustering can be used for topic modeling, which is the process of identifying the main topics in a collection of text.

The book example uses ArXiv papers as the text corpus.

Common Pipeline for Text Clustering:

  1. Convert input documents into embeddings with an embedding model
  2. Reduce dimensionality with a dimensionality reduction model
  3. Find groups of documents with a cluster model (a code sketch follows below)
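Here's a minimal sketch of that embed -> reduce -> cluster pipeline, assuming the sentence-transformers, umap-learn, and hdbscan packages; docs is a placeholder for a larger corpus (the book uses ArXiv abstracts).

```python
# Sketch of the embed -> reduce -> cluster pipeline.
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

# Placeholder corpus; in practice this would be many documents.
docs = ["paper abstract one ...", "paper abstract two ...", "paper abstract three ..."]

# 1. Convert documents to embeddings.
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

# 2. Reduce dimensionality.
reduced = UMAP(n_components=5, metric="cosine").fit_transform(embeddings)

# 3. Find groups of documents.
labels = HDBSCAN(min_cluster_size=2).fit_predict(reduced)
print(labels)  # -1 marks outliers
```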

Dimensionality Reduction

There are well-known methods for dimensionality reduction, including:

  • Principal Component Analysis (PCA)
  • Uniform Manifold Approximation and Projection (UMAP)

Clustering Algorithms

Examples of clustering algorithms include:

  • Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)

Visualization of clusters can be done using tools like Matplotlib.

BERTopic: Modular Topic Modeling Framework

  1. Follow the same procedure in text clustering to generate clusters
  2. Model distribution over words, bag of words - use frequency of words in each cluster to identify topics
  3. Use class-based term frequency inverse document frequency (c-TF-IDF) to identify words that are unique to each cluster

A full pipeline for topic modeling using BERTopic:

  • Clustering: SBERT -> UMAP -> HDBSCAN (embed docs -> reduce dimensionality -> cluster docs)
  • Topic representation: count vectorization -> c-TF-IDF (tokenize words -> weight words)
  • Reranking: representation model (fine-tune the representation)

BERTopic components can be combined like Lego blocks to build custom pipelines.
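The sketch below wires those same components into BERTopic explicitly, which is the "Lego" idea in practice. It assumes the bertopic package and its dependencies are installed; the specific models and parameters are illustrative, and docs is a placeholder for your own corpus.

```python
# Sketch: composing BERTopic from swappable components.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer

topic_model = BERTopic(
    embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),  # embed docs
    umap_model=UMAP(n_components=5, metric="cosine"),         # reduce dimensionality
    hdbscan_model=HDBSCAN(min_cluster_size=10),               # cluster docs
    vectorizer_model=CountVectorizer(stop_words="english"),   # bag of words for c-TF-IDF
)

docs = ["..."]  # placeholder: replace with your own corpus of many documents
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head())
```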

Chapter 6: Prompt Engineering

Basics of using a text generation model:

  1. Choose a model, considering open source vs. proprietary and how much output control you need (suggestion: start with a small foundation model)
  2. Load the model
  3. Control the output: set do_sample=True to enable temperature and top_p
  4. Tune temperature and top_p for the use case (see the sketch below)
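As a quick illustration of those output controls (not the book's exact code), here's a sketch using the transformers pipeline; GPT-2 stands in for whichever small foundation model you start with.

```python
# Sketch: loading a small open model and controlling its output.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

output = generator(
    "Write a one-line summary of prompt engineering:",
    max_new_tokens=40,
    do_sample=True,     # enables temperature and top_p
    temperature=0.7,    # lower = more deterministic
    top_p=0.9,          # nucleus sampling threshold
)
print(output[0]["generated_text"])
```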

Intro to prompt engineering

Ingredients of a good prompt:

  • When no instructions are given, the model will try to predict the next word based on the input text.
  • Two components of basic instructions:
    1. Task description
    2. Input text (data)
  • Extending the prompt with output indicator allows for specific output

Use cases for instruction based prompts:

  • Supervised classification
  • Search
  • Summarization
  • Code generation
  • Named entity recognition

Techniques for improving prompts:

  • Specificity
  • Hallucination mitigation
  • Order

Complex prompt components:

  • Persona
  • Instruction
  • Context
  • Format
  • Audience
  • Tone
  • Data

In-context learning:

  • Zero-shot learning
  • One-shot learning
  • Few-shot learning
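To make zero-shot vs. few-shot concrete, here's a small, made-up example of the same classification task written both ways; only the prompt structure matters.

```python
# Made-up prompts illustrating zero-shot vs. few-shot in-context learning.
zero_shot = """Classify the sentiment of the review as positive or negative.
Review: The soup was cold and the service was slow.
Sentiment:"""

few_shot = """Classify the sentiment of the review as positive or negative.
Review: Absolutely loved the pasta. Sentiment: positive
Review: The waiter forgot our order twice. Sentiment: negative
Review: The soup was cold and the service was slow.
Sentiment:"""
```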

Chain prompting:

  • Break the task into smaller sub-tasks and use the output of one prompt as the input to the next prompt.
  • Useful for:
    • Response validation
    • Parallel prompts
    • Writing stories

Reasoning with Generative Models

Chain of thought:

  • Prompt the model to think step-by-step

Self-consistency:

  • using the same prompt multiple times to generate multiple responses
  • works best with temperature and top_p sampling

Tree of thought:

  • useful when needing to explore multiple paths to a solution
  • ask the model to mimic multiple agents working together to solve a problem
  • question each other until they reach a consensus

Output Verification

  • Useful for:
    • Structured output
    • Valid output
    • ethics
    • accuracy

Techniques:

  • Provide examples of valid output

Grammar: constrained sampling

  • use packages such as:
    • Guidance
    • Guardrails
    • LMQL

Taxonomy

  • Accuracy: A metric used to evaluate the performance of classification models, measuring the proportion of correct predictions.
  • Attention: A mechanism that allows models to focus on specific parts of the input sequence, improving context understanding.
  • Autoregressive Models: Models that generate text by predicting the next token in a sequence based on the previous tokens.
  • Bag of Words (BoW): A simple representation of text that ignores grammar and word order but keeps track of word frequency.
  • BERT (Bidirectional Encoder Representations from Transformers): A pre-trained model that uses a Transformer architecture to understand the context of words in a sentence.
  • Byte Tokens: A tokenization scheme that represents text as a sequence of bytes, allowing for a more compact representation.
  • Character Tokens: A tokenization scheme where each token represents a single character, useful for languages with complex morphology.
  • Context Size: The number of tokens the model can consider at once when generating text, affecting its ability to maintain coherence.
  • Embeddings: Numerical representations of words or tokens that capture their semantic meaning and relationships.
  • F1 Score: A metric that combines precision and recall to evaluate the performance of classification models.
  • Feed-Forward Neural Network: A type of neural network where connections between nodes do not form cycles, used in Transformer blocks.
  • Flash Attention: An efficient attention mechanism that reduces memory usage and speeds up computation in LLMs.
  • GPT (Generative Pre-trained Transformer): A type of LLM that is pre-trained on a large corpus of text and can generate coherent text based on a given prompt.
  • Greedy Decoding: A text generation strategy where the model selects the token with the highest probability at each step.
  • Grouped Query Attention (GQA): An attention mechanism that uses a single set of keys and values for multiple queries, improving efficiency.
  • Inverse Document Frequency (IDF): A measure of how important a word is to a document in a collection of documents.
  • Large Language Models (LLMs): A type of AI model that is trained on large datasets to understand and generate human language.
  • LM Head: The final layer of an LLM that generates the output tokens based on the processed input.
  • Multi-Head Attention: An attention mechanism that allows the model to focus on different parts of the input sequence simultaneously.
  • Output vectors: The numerical representations of the output tokens generated by the LLM.
  • Parallel Processing: The ability to process multiple input tokens simultaneously, improving efficiency.
  • Precision: The numerical accuracy of the computations performed by the model, affecting its performance and resource usage.
  • Recall: A metric used to evaluate the performance of classification models, measuring the ability to identify all relevant instances.
  • Self-Attention: A mechanism that allows the model to weigh the importance of different tokens in the input sequence when generating text.
  • Sparse Attention: An efficient attention mechanism that uses fewer input tokens as context for self-attention, reducing computational complexity.
  • Subword Tokens: A tokenization scheme where tokens can represent parts of words, allowing for better handling of rare or unknown words.
  • T5 model: Text-To-Text Transfer Transformer, a model that converts all NLP tasks into a text-to-text format.
  • Temperature: A parameter that controls the randomness of the model’s output, with higher values leading to more diverse text.
  • Tokenization: The process of breaking down text into smaller units (tokens) for processing by LLMs.
  • Token Embedding: The process of converting tokens into numerical vectors that capture their semantic meaning.
  • Token Probabilities: The likelihood of each token in the vocabulary being the next token in a sequence, used for text generation.
  • Top-p Sampling (Nucleus Sampling): A text generation strategy that selects tokens from the smallest set whose cumulative probability exceeds a threshold p, allowing for more diverse outputs.
  • Transformer: A neural network architecture that uses self-attention mechanisms to process sequences of data, widely used in LLMs.
  • Transformer Blocks: The building blocks of the Transformer architecture, consisting of layers of attention and feed-forward neural networks.
  • Trained Transformer LLMs: LLMs that have been trained on large datasets using the Transformer architecture, enabling them to understand and generate human language effectively.
  • Word Tokens: A tokenization scheme where each token represents a whole word.
  • word2vec: A technique that uses neural networks to learn word embeddings, capturing semantic relationships between words.

References:

by John M Costa, III

Book Notes: In this Economy? How Money & Markets Really Work

Overview

This post contains my notes on the book In This Economy? How Money & Markets Really Work.

You can find the book on Amazon

I’ll be adding my notes to this post as I read through the book. The notes will be organized by chapter and will include key concepts, code examples, and any additional insights I find useful.

Chapter 1: The Economic Kingdom

Chapter 1 introduces the concept of the “Economic Kingdom,” which is a framework for understanding how money and markets operate in our society.

The chapter discusses the following key concepts:

  • Monetary Policy Castle: Manages the entire kingdom and is owned by the Federal Reserve. Generally in charge of inflation and the labor market.
  • US Dollar Castle: Secret weapon of Monetary Policy Castle.
  • The Commodity Castle: Basic goods used by everyone.

Contractionary monetary policy: things slow down, people don’t spend, people lose jobs.

Expansionary monetary policy: things speed up, people spend, people get jobs.

Chapter 2: The Vibe Economy

Chapter 2 begins by explaining the relationship of households and businesses and the flow of resources.

Note: The book offers a diagram that illustrates the flow of money between households and businesses.

Households:
    Inputs:
        - Income
        - Buying Goods & Services
    Outputs:
        - Spending
        - Land, Labor, Capital

Businesses:
    Inputs:
        - Resources
        - Revenue
    Outputs:
        - Selling Goods & Services
        - Wages, Rent, Profit

Fuel Vibes:

Fuel: As gasoline prices rise, consumer sentiment tends to decrease.

Higher prices make us feel bad => if we feel bad, the economy feels bad

Oil market influenced by:

  • how much oil is being produced
  • how many people are consuming oil

Sentiment drives the economy:

  • If people feel good, they spend more money
  • If people feel bad, they spend less money

Sentiment is influenced by emotions. These emotions are summarized in various theories:

  • Prospect Theory
  • Framing Effect
  • Anchoring and Adjustment
  • Endowment Effect
  • Regret Theory
  • Intertemporal Choice Theory
  • Affective Forecasting
  • Social Identity Theory
  • Consumer Emotional Engagement

Reflexivity: The idea that the economy is influenced by people’s perceptions and beliefs about it, which can create a feedback loop.

  1. Initial Perception/Belief
  2. Market Action Based on Perception/Belief
  3. Reflexive Feedback Loop
  4. Bubble Bursts

Benefits of reflexivity to companies:

  1. attracts investments based on expectations of future growth
  2. gain critical edge in hiring top talent

What is sentiment?

Sentiment sits at the intersection of expectations, theory, and reality:

How we feel × how everyone else feels = Sentiment

The Consumer Confidence Survey is one example of how consumer sentiment is measured.

Chapter 3: The Weird World of Money

What is Money?

  • A store of value
  • A unit of account
  • A medium of exchange

Evolution of money:

  • Growth of societies and contact with other civilizations led to the need for a common medium of exchange.
  • Barter System: Direct exchange of goods and services without using money.
  • Example: Is a bag of grain worth 15 cows or 10 cows?
  • Coins solved the problem of barter by providing a standardized medium of exchange.

American Currency:

  • 1700s: American colonies depend on European currencies (pieces of eight)
  • 1775: Continental Congress issues Continental Currency to fund the Revolutionary War
  • 1785: Due to inflation, the Continental Currency is replaced by the US dollar

Hamilton’s Financial Plan:

  • a federal bank that provides credit to the government and businesses
  • issue a stable national currency
  • safe place to store money

  • 1791: First Bank of the United States is established
  • 1811: First Bank of the United States charter expires; states issue their own currencies
  • 1816: Second Bank of the United States is established
  • Wildcat Banking Era: state-chartered banks issue their own currencies, leading to instability and bank failures
  • 1863: National Banking Act creates a system of national banks and a national currency; only national banks can issue currency
  • 1971: US dollar is no longer backed by gold, leading to fiat currency

The Federal Reserve enforces the promise of, and collective trust in, the US dollar.

Chapter 4: The Mechanics of Modern Money

Banks:

  • Gatekeepers of money
  • Money business model

US Government:

  • Creator of money
  • Facilitator of what banks do

How is money created?

  1. Issuing coins and notes
  2. Through credit markets: issuance of government bonds

Banking Blueprint:

Built on:

  • trust
  • ability to borrow short (customer deposits)
  • lend long (loans)

Fractional Reserve Banking: Banks are allowed to keep a fraction of deposits as reserves and lend out the rest.

  • Banks use money, invest in securities, and make loans to earn interest.
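A toy illustration of fractional reserve lending (my own example, assuming a 10% reserve requirement and made-up numbers):

```python
# Toy fractional reserve example; the reserve ratio and deposit are made up.
reserve_ratio = 0.10
deposit = 1_000.00

reserves = deposit * reserve_ratio   # what the bank must hold back
loanable = deposit - reserves        # what it can lend out and earn interest on
print(reserves, loanable)            # 100.0 900.0
```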

Art of Lending: Banks hold a balance sheet of assets and liabilities.

  • Assets: loans, securities, reserves
  • Liabilities: customer deposits, borrowings
  • Net Worth: assets - liabilities

Excess reserves: reserves held by banks above the required minimum.

Hedging: Banks use various strategies to manage risks, like interest rate fluctuations. Example:

  • Interest rate swap: an agreement between two parties to exchange interest payments on a specified principal amount.

How banks fail:

  • Insolvency: liabilities exceed assets
  • Illiquidity: unable to meet short-term obligations

Examples:

  • Silicon Valley Bank (SVB): no hedges to protect against downside risk of rising interest rates
  • 2008 Financial Crisis: use of Collateralized Debt Obligations (CDOs) and mortgage-backed securities (MBS) led to widespread defaults and bank failures.

Dollar’s Reign: Countries worldwide use the US dollar as a reserve currency.

  • the dollar’s value goes up when there is uncertainty in the global economy as more people want to hold dollars.
  • the dollar can be an inflation hedge as it tends to hold its value better than other currencies during inflationary periods.

Structure:

  • least nasty alternative
  • surplus and deficit national economies
  • balance of payments

Chapter 5: Supply and Demand

Low demand: no one wants it, price goes down.

High demand: everyone wants it, price goes up.

Factors that influence demand:

  • supply chain (issues)
    • example: semiconductor shortage
    • example: used car prices during the pandemic
    • example: egg shortage during the pandemic

presumed infinity of resources

Chapter 6: GDP and the Economy

Measuring GDP:

  • GDP = C + I + G + (X - M)
    • C = Consumer Spending
    • I = Investment Spending
    • G = Government Spending
    • X = Exports
    • M = Imports
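Plugging made-up numbers into the expenditure formula makes the accounting concrete:

```python
# GDP = C + I + G + (X - M), with hypothetical values (in trillions).
c = 15.0  # consumer spending
i = 4.0   # investment spending
g = 5.0   # government spending
x = 3.0   # exports
m = 3.5   # imports

gdp = c + i + g + (x - m)
print(gdp)  # 23.5
```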

Consumption:

What people buy fueled by:

  • income
  • borrowing (credit cards)
  • Savings

Investment: spending $$ for economic benefit over time
Government Purchases: goods and services that the government buys
Net Exports: exports - imports

Nominal vs Real GDP: includes inflation vs adjusted for inflation

Increased borrowing can lead to increased nominal GDP.

Limitations of GDP:

  • does not account for health and happiness
  • does not account for environmental degradation

Real GDP per capita: total GDP / population

Productivity: ratio of output to input

Alternatives to GDP:

  • degrowth
  • ecological economics
  • postgrowth

Future of GDP:

  • Big Growth Policies probably won’t work forever
  • global challenges: climate change, income inequality, resource depletion
  • rethinking of success: GDP may not be a holistic measure of well-being or social progress

Chapter 7: Commodities

What are commodities?

Commodities are raw materials used to create the building blocks of the economy.

Globalization + Free Trade & Comparative Advantage - Shipping issues & Floods & Fires

Oil: crude price is an indicator of the economy’s health.
Gas: direct impact on consumers as oil prices rise and fall.

However, the relationship between oil prices and gas prices is not symmetric.

Gas Prices: Rocket and Feather Effect

  • The market power of gas stations
  • Fear of price variability

Metals: steel, aluminum, copper. Supply issues in one can affect the others.

Renewables:

  • We still need oil to build out EVs, solar panels, and nuclear reactors
  • renewable isn’t just about tech
  • problems to solve:
    • ease trade barriers
    • sharing tech in energy storage
    • commit to collective action

AI and Commodities:

  • AI can help manage crops
  • predict and locate resources

AI’s computational power is linked to raw materials, making the relationship between AI and commodities a symbiotic one.

Taxonomy

  • Fiat Money: currency that is not backed by a physical commodity, such as gold or silver, but rather by the government that issues it.
  • Inflation: rise in prices that create a decrease in purchasing power.
  • OPEC: Organization of the Petroleum Exporting Countries, a group of oil-producing countries that coordinate their oil production and pricing.
  • Sentiment: the overall attitude or feeling of consumers and businesses towards the economy, which can influence spending and investment decisions.
  • Prospect Theory: a behavioral economic theory that describes how people make decisions based on perceived gains and losses rather than absolute outcomes.
  • Framing Effect: a cognitive bias where people react differently to the same information depending on how it is presented or framed.
  • Anchoring and Adjustment: a cognitive bias where people rely too heavily on the first piece of information they receive (the “anchor”) when making decisions, and then make adjustments based on that anchor.
  • Endowment Effect: a cognitive bias where people value something more highly simply because they own it, leading to a reluctance to sell or trade it.
  • Regret Theory: a behavioral economic theory that suggests people anticipate regret when making decisions and may avoid choices that could lead to regret, even if those choices are rational.
  • Intertemporal Choice Theory: a behavioral economic theory that examines how people make decisions about trade-offs between immediate and delayed rewards, often leading to preferences for immediate gratification over long-term benefits.
  • Affective Forecasting: a cognitive bias where people predict their future emotional states based on current feelings, which can lead to inaccurate predictions about how they will feel in the future.
  • Social Identity Theory: a psychological theory that explains how individuals derive their self-concept and identity from their group memberships, influencing their behavior and decision-making.
  • Consumer Emotional Engagement: the emotional connection and involvement that consumers have with a brand or product, which can influence their purchasing decisions and loyalty.
  • Comparative Advantage: the ability of a country to produce a good or service at a lower opportunity cost than another country.

References:

by John M Costa, III

Book Notes: AI and Machine Learning for Coders - A Programmer's Guide to Artificial Intelligence

Overview

This post contains my notes on the book AI and Machine Learning for Coders - A Programmer’s Guide to Artificial Intelligence by Laurence Moroney. The book is a practical guide to building AI and machine learning applications using TensorFlow and Keras. It covers the basics of machine learning, deep learning, and neural networks, and provides hands-on examples of how to build and deploy AI applications.

You can find the book on Amazon

I’ll be adding my notes to this post as I read through the book. The notes will be organized by chapter and will include key concepts, code examples, and any additional insights I find useful.

Chapter 1: Introduction to tensorflow

In chapter 1 you’ll learn about the limitations of traditional programming and get some initial insight into Machine Learning. You’ll also learn how to install TensorFlow, in my case on an M2 Mac.

I searched around a little to see what common issues occurred for Mac installations and came across a handy blog that covered different TensorFlow installation options and included a handy script to verify TensorFlow was indeed using my GPU. After a brief hiccup, I added the following packages to my installation, re-ran the script, and was on my way.

pip install tensorflow
pip install tensorflow-macos tensorflow-metal

At the end of this chapter you’ll build and train your first model, a simple linear regression model that predicts the output of a linear equation.
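For reference, that first model is along these lines: a single-neuron network that learns y = 2x - 1 from six data points. This is my sketch of the idea; the hyperparameters are typical choices rather than the book's exact values.

```python
# Sketch of a "hello world" model: one Dense neuron learning y = 2x - 1.
import numpy as np
import tensorflow as tf

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer="sgd", loss="mean_squared_error")
model.fit(xs, ys, epochs=500, verbose=0)

print(model.predict(np.array([10.0])))  # should be close to 19
```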

Note: Using the coding samples located in GitHub will make following along really easy. https://github.com/lmoroney/tfbook

Chapter 2: Introduction to Computer Vision

Using the Fashion MNIST dataset, chapter 2 introduces the reader to Neural Network design. Using a real, but simple, dataset, you’ll learn how to build a neural network that can classify images of clothing.
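A sketch of a small Fashion MNIST classifier in that spirit (layer sizes and epoch count are my illustrative choices, not necessarily the book's):

```python
# Sketch: a simple dense classifier for Fashion MNIST.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per clothing class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))
```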

Chapter 3: Convolutional Neural Networks

In chapter three, you’ll explore Convolutional Neural Networks using images of humans and horses. You’ll use training and validation data to build up a model, as well as learn about image augmentation to broaden the data and reduce overspecialization. Additional concepts introduced include Transfer Learning, Multiclass Classification, and Dropout Regularization.

Chapter 4: Using Public Datasets with TensorFlow Datasets

Using the TensorFlow Datasets library, chapter 4 introduces the reader to ETL, which is a core pattern for training data. The chapter covers a practical example as well as how to parallelize the ETL process to speed it up.

Chapter 5: Natural Language Processing

Chapter 5 introduces the reader to tokenization: taking text and breaking it down into smaller units (tokens) for processing. It covers basics like turning sentences into tokens and padding sequences, as well as more advanced techniques like removing stop words and text cleaning.

The examples in this chapter use the IMDB, emotion sentiment, and sarcasm classification datasets as examples for building datasets from HTML-like data, CSV files, and JSON.
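A sketch of the tokenization and padding steps (the sentences are placeholders rather than rows from those datasets):

```python
# Sketch: tokenizing sentences and padding the resulting sequences.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = [
    "the movie was great",
    "the movie was terribly long and boring",
]

tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)

sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, maxlen=8, padding="post")

print(tokenizer.word_index)
print(padded)
```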

Chapter 6: Making Sentiment Programmable Using Embeddings

Chapter 6 uses a sarcasm dataset to introduce the reader to word embeddings. Words are given numerical representations, with positive values representing sarcasm and negative values representing realistic statements. Sentences can then be represented as a series of these numbers and evaluated for a sarcasm score.

The example in this chapter begins to analyze accuracy and loss against the training and validation datasets. This can help identify overfitting, which is when the model becomes overspecialized to the training data.

accuracy-vs-val_accuracy.png
loss-vs-val_loss.png

The example goes on to cover various techniques to improve the model. These include:

  • Adjusting the learning rate
  • Adjusting the vocabulary size
  • Adjusting the embedding dimension

When these techniques are applied, finer tuning of the model can be achieved by:

  • Using dropout
  • Using regularization

Chapter 7: Recurrent Neural Networks for Natural Language Processing

Chapter 7 introduces Recurrent Neural Networks (RNNs) and how they can be used for Natural Language Processing tasks.

They provide a diagram of a recurrent neuron which shows how a recurrent neuron is architected.

ai-ml.recurrent_neuron.drawio.png

Long Short-Term Memory (LSTM) is a type of RNN that is capable of learning long-term dependencies.

Bidirectional LSTMs are a type of LSTM that can process data in both directions, which can be useful for tasks like sentiment analysis where context from both the past and future can be important.

The exercise in this chapter reuses the exercise from chapter 6 and introduces stacked LSTMs.

Overtraining occurs in the example, so optimization techniques are applied to improve the model. These include:

  • Adjust the learning rate
  • Using dropout in the LSTM layers

A second example in this chapter uses pretrained embeddings, the GloVe set. In this second example, after the model is downloaded, an exercise to determine how many of the words in the corpus are in the GloVe vocabulary is performed.

Chapter 8: Using TensorFlow to Create Text

This chapter starts out with an example of tokenizing text and creating a word index. Then a model is built so that it can be trained.

With a trained model, we can predict the next word in a sequence. We can use seed text to generate a token and test the model, then repeat the process to generate more text with alternate seed text.

Using the same process, we can use a different, larger dataset to generate more complex text. The example adjusts the model slightly by adding an additional LSTM layer and increasing the number of epochs.

Chapter 9: Understanding Sequence and Time Series Data

Common attributes of time series data include:

  • Trend
  • Seasonality
  • Autocorrelation
  • Noise

Techniques for predicting time series data include:

  • Naive Prediction to create a baseline

Measuring Prediction accuracy:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)

Less Naive: Using Moving Averages
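As a quick sketch, here's a naive one-step-ahead forecast scored with MAE and MSE, plus a simple moving average; the series is made-up sample data.

```python
# Sketch: naive forecast, MAE/MSE, and a moving average on a toy series.
import numpy as np

series = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 13.5, 15.0])

naive_forecast = series[:-1]   # predict "same as the previous step"
actual = series[1:]

mae = np.mean(np.abs(actual - naive_forecast))
mse = np.mean((actual - naive_forecast) ** 2)
print(mae, mse)

window = 3
moving_avg = np.convolve(series, np.ones(window) / window, mode="valid")
print(moving_avg)
```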

Chapter 10: Creating ML Models to Predict Sequences

Windowed datasets can be used to emulate a time series dataset. The example in this chapter uses a window size of 5 to predict the next value in the sequence.

It goes through the process of creating a windowed version of time series data, then building and training a model.
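Here's a sketch of turning a series into (window, next value) training pairs with tf.data, similar in spirit to the chapter's windowed dataset helper; the series and window size are arbitrary examples.

```python
# Sketch: building windowed (features, label) pairs from a series with tf.data.
import numpy as np
import tensorflow as tf

series = np.arange(20, dtype=np.float32)
window_size = 5

ds = tf.data.Dataset.from_tensor_slices(series)
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window_size + 1))
ds = ds.map(lambda w: (w[:-1], w[-1]))  # features = window, label = next value

for features, label in ds.take(2):
    print(features.numpy(), "->", label.numpy())
```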

Tuning the model can be done through the use of Keras Tuner.

Taxonomy

  • Autocorrelation: The correlation of a signal with a delayed copy of itself as a function of delay.
  • Convolution: Mathematical filter that works on the pixels of an image.
  • Dropout: A regularization technique that randomly sets a fraction of input units to 0 at each update during training time, which helps prevent overfitting.
  • Embedding Dimension: The size of the vector representation of each word in the vocabulary.
  • Learning Rate: A hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.
  • Long Short-Term Memory (LSTM): A type of recurrent neural network (RNN) architecture that is capable of learning long-term dependencies.
  • Natural Language Processing (NLP): A field of AI that focuses on the interaction between computers and humans through natural language.
  • Noise: Random variation in data that does not contain useful information.
  • Out of Vocabulary (OOV): Words that are not present in the training vocabulary.
  • Overfitting: When the model becomes overspecialized to the training data.
  • Padding: Adding zeros to the beginning or end of a sequence to make it a fixed length.
  • Regularization: Techniques used to prevent overfitting by adding a penalty to the loss function based on the complexity of the model.
  • Seasonality: A pattern that repeats at regular intervals in the data.
  • Stop Words: Common words that are often removed from text data to reduce noise and improve model performance.
  • Tokenization: The process of breaking down text into smaller units (tokens) for processing.
  • Transfer Learning: Taking layers from another architecture.
  • Trend: A long-term increase or decrease in the data.
  • Vocabulary Size: The number of unique words in the training dataset.

References:

by John M Costa, III

Pragmatism for decision-making in Software Development

Overview

This post discusses pragmatism as a tool in Software Development. I consider myself to be pragmatic in my approach to software engineering, and wanted to explore the concept a little more.

What is pragmatism?

Pragmatism1 is a philosophical tradition that began in the United States around 1870. It is a way of thinking that focuses on the practical consequences of actions rather than on abstract principles. Pragmatists believe that the truth of an idea is determined by its usefulness in solving real-world problems. They are concerned with what works, rather than with what is theoretically correct.

Pragmatism in software development

Pragmatism can be a powerful tool in software development. Software projects can be complex, with lots of moving parts, and there’s lots of opportunity for things to go wrong, extending timelines and risking value delivery. Pragmatism can help you make decisions that are practical and effective, rather than getting bogged down in theoretical debates.

Some goals for pragmatism in software development can be distilled into the following:

  • deliver value to the customer (users and/or organization)
  • maximize stakeholder satisfaction
  • minimize stakeholder dissatisfaction

Looking around for inspiration

I looked around to see if there’s any existing work on pragmatism in software development. I found a few interesting papers, articles, and books that I wanted to share.

Optimization in Software Engineering - A Pragmatic Approach

Guenther Ruhe2 published a paper on Optimization in Software Engineering - A Pragmatic Approach3. The paper takes a process-based approach and includes a checklist for performing the optimization process. This process includes:

  • Scoping and Problem analysis
  • Modeling and Problem formulation
  • Solution Design
  • Data collection
  • Optimization
  • Validation
  • Implementation
  • Evaluation

The following describes each step in the process:

Scoping and Problem analysis

The first and most obvious step in the process is to ask if the problem can be solved easily. Easily solved problems help sidestep the need for additional time investment. When looking for an easy solution, consider alternatives as well.

Understand the stakeholders and decision makers around the problem. How important is this to them? How much time and effort can be invested in solving the problem? What’s the budget associated to solving the problem, considering both time and money?

Which approach best aligns with the business objectives and how would optimization benefit the problem context?

Modeling and Problem formulation

Depending on the complexity of the problem, work to break it down into smaller, more manageable parts. Identify key variables, constraints, and any dependencies.

Model project resourcing, budget and time for each phase of the project.

Identify technological constraints and dependencies.

Solution Design

Is there a solution already available that can be used as a baseline? If so, how does the proposed solution compare to the baseline?

What are the perceived expectations for the optimized approach?

Data collection

What data is available and what data is needed to solve the problem?

How reliable is the data? Is there a need for data cleaning?

Optimization

What settings are made and why? How do they vary?

Validation

What are the criteria for validation? How are they measured?

Do the stakeholders agree with the proposed solution?

Implementation

Is there anything that needs to be adjusted?

Evaluation

How much does the implemented solution solve the original problem and is acceptable by the stakeholders?

How much above the baseline does the implementation improve?

The Pragmatic Programmer

The Pragmatic Programmer is a book by Andrew Hunt and David Thomas. The book is a guide to software development that focuses on practical advice and best practices. The authors emphasize the importance of writing clean, maintainable code, and of being pragmatic in your approach to software development.

There are so many good nuggets in this book. If this isn’t already on your bookshelf, I highly recommend it.

Some of my favorites nuggets include:

  • Don’t live with broken windows
  • Good-Enough Software
  • Remember the bigger picture
  • Prototype to learn

Further reading on Pragmatism

Wrapping it up

I’m still exploring the concept of pragmatism in software development. Going beyond just pragmatism, I’m also interested in how to be a better product engineer4 while practicing pragmatism. I’m looking forward to sharing more on this topic in the future.

by John M Costa, III

Introducing RFCs to Share Ideas

Overview

There are a lot of positive benefits of being on a remote team. Finding ways to connect with your team and build relationships is important. One way to do this is to share your ideas and have discussions about various design topics. This is a great way to learn from your peers and to share your knowledge with them. It’s also a great way to build trust and a sense of community through the activity of writing and healthy discussion with your peers.

In addition to connecting with your team and building relationships, sharing your ideas is a great way towards building a writing culture. Writing is a great way to document your thought process and to share it with others. For me, writing is a way to become more concise with my thoughts and to be more clear in my communication.

One approach to sharing ideas is to write an RFC (Request for Comments). This is a document that outlines a problem and a proposed solution. It’s a great way to get feedback on your ideas and to build consensus around them.

What is an RFC?

Per Wikipedia1:

In 1969, Steve Crocker invented the RFC system. He did this to create a way to record unofficial notes on the development of ARPANET. RFCs have since become official documents of internet specifications, communications protocols, procedures, and events.

There are so many great resources on how to write an RFC. This post from the Pragmatic Engineer is a great place to start and lists a lot of great resources on the topic.

I’ve come to know RFCs as a templated format for sharing ideas and seeking consensus. They range from very formal to informal. They can be used for a variety of things, such as proposing a new feature, discussing a problem, or documenting a decision.

How to Write an RFC

There are plenty of resources on how to write an RFC as they’ve been around for a while. Here are a few different formats I’ve come across and am interested in learning more about and trying:

Keep Track of Your RFCs

Keeping track of the status of each RFC is important. This could be as simple as a spreadsheet or a more formal system like GitHub issues. The idea is to have a way to track the status of each RFC and to make sure each one is being reviewed and acted upon. Keep your RFCs organized and easy to find, whether that’s a folder in Google Drive or a GitHub repository. A simple set of statuses might be:

  • Draft
  • Review
  • Approved
  • Discarded
  • Deprecated

Sample RFC

Title: Using RFCs to Share Ideas
Authors:
John Costa

1 Executive Summary
The primary problem this RFC proposal is solving is how to arrive at a consensus. Documenting architecture decisions
would be done elsewhere.

2 Motivation
Oftentimes there are a lot of great ideas that come up in discussions. Unfortunately, without a semi-formal process,
these ideas never get documented and are lost. An RFC is a great way to document your thought process and to share it
with others.

3 Proposed Implementation
The following proposal is a simplified version of a Request for Comment process based on the following published
resource. Much inspiration for this proposal and in some cases whole segments have been drawn from these resources:

* https://cwiki.apache.org/confluence/display/GEODE/Lightweight+RFC+Process
* https://philcalcado.com/2018/11/19/a_structured_rfc_process.html

Collaboration
Comments and feedback should be made in the RFC document itself. This way, all feedback is in one place and can be
easily referenced or referred to.

Authors must address all comments written by the deadline. This doesn't mean every comment and suggestion must be
accepted and incorporated, but they must be carefully read and responded to. Comments written after the deadline may be
addressed by the author, but they should be considered as a lower priority.

Every RFC has a lifecycle. The life cycle has the following phases:

* Draft: This is the initial state of the RFC, before the author(s) have started the discussion and are still working on the proposal.
* Review: This is the state where the RFC is being discussed and reviewed by the team.
* Approved: This is the state where the RFC has been approved and is ready to be implemented. It does not mean that the RFC is perfect, but that the team has reached a consensus that it is good enough to be implemented.
* Discarded: This is the state where the RFC has been discarded. This can happen for various reasons, such as the proposal being outdated, the team not reaching a consensus, or the proposal being too risky.
* Deprecated: This is the state where the RFC has been deprecated. This can happen when the proposal has been implemented and is no longer relevant, or when the proposal has been replaced by a better one.

Approval
The proposal should be posted with a date by which the author would like to see the approval decision to be made. How
much time is given to comment depends on the size and complexity of the proposed changes. Driving the actual decisions
should follow the lazy majority approach.

Blocking
If there are any blocking issues, the author should be able to escalate the issue to the team lead or the team. A block
should have a reason and, within a reasonable time frame, a solution should be proposed.

When to write an RFC?
Writing an RFC should be entirely voluntary. There is always the option of going straight to a pull request. However,
for larger changes, it might be wise to reduce the risk of the pull request being rejected by first gathering input
from the team.

Immutability
Once approved, the existing body of the RFC should remain immutable.

4 Metrics & Dashboards
There are no explicit metrics or dashboards for this proposal. The RFC process is a lightweight process that is meant to
be flexible and adaptable to the needs of the team.

5 Drawbacks
- Slow: The RFC process can take time
- Unpredictable: The rate of new RFCs is not controlled
- No backpressure: There is no mechanism to control the implementation of RFCs
- No explicit prioritization: RFCs are implicitly prioritized by teams, but this is not visible
- May clash with other processes: RFCs may not be needed for smaller things
- In corporate settings, the RFC process should have a decision-making process that is clear and transparent

6 Alternatives
- ADRs (Architecture Decision Records)
- Design Docs
- Hierarchical, democratic, or consensus-driven decision-making

7 Potential Impact and Dependencies
The desired impact of this proposal is to have a more structured way to share ideas and to build consensus around them.

8 Unresolved questions
- None

9 Conclusion
This RFC is a proposal for a lightweight RFC process and can be used for remote teams looking to build consensus around
ideas.

References

The following are some references that I’ve found useful in my research:

by John M Costa, III

Reviewing Code

Overview

Code reviews are a critical part of the software development process. They help to ensure that the code is of high quality, that it’s maintainable, and that it’s secure. They also help to ensure that the code is in line with the company’s goals and values. Code reviews are also a great way to learn from your peers and to share your knowledge with them.

Knowledge Sharing

The code review should be a learning opportunity for everyone involved, this could mean as part of the review or historically when looking back at motivations and decisions.

Higher Quality

The code review should ensure that the code is of high quality. This means that it should be free of errors and warnings, that it should run properly, and that it should accomplish the feature(s) it was designed to accomplish.

Better Maintainability

The code review should ensure that the code is maintainable. This means that it should be easy to read and understand, that it should be well-documented, and that it should follow coding and technical standards.

Increased Security

The code review should ensure that the code is secure. This means that it should be free of security vulnerabilities, that it should not introduce any new security vulnerabilities, and that it should follow security best practices.

Optimization Opportunities

The code review should consider whether the code is efficient, doesn’t waste resources, and is scalable.

Assumptions

After the first pass through this blog post, I realized while writing this, there’s a few assumptions about the environment that I’m making.

One is that a version control system is being used and that the code is being reviewed in a pull request. This assumes healthy use of a version control system.

Another is that the code is being reviewed by teammates who you work closely with and that you trust to give and receive feedback and with positive intent.

Priorities

To follow the principles above, I try to review code with the following priorities in mind:

  1. Is the code functional?

The first thing I try to do is understand if it accomplishes the feature(s) it was designed to accomplish. As a reviewer, this could mean reading a README and running the code. When running the code, I try to capture not only the happy path but also the edge cases and error handling. As a submitter, this could mean providing these tools for the reviewer, ideally as unit tests and README documentation.

  2. Is the code clean and maintainable?

Secondly, I try to look at the code from a cleanliness and maintainability perspective. To avoid as much subjectivity as possible, automated linters and static analysis tools should be used. In addition to these tools, the code should be well-documented, considering CSI (Comment Showing Intent)1 standards. The CSI Standard should exist alongside Self-Commenting2 Code practices, not instead of them. The code should also have binaries and unnecessary cruft removed.

  3. Is the code secure?

Thirdly, I try to look at the code from a security perspective. Admittedly, this is an area I’m learning more about. With that said, I delegate much of this to automated tools which cover things like OWASP® Top 10 and CWE/SANS Top 25.

  4. Can the code be optimized?

Lastly, I try to look at the code from an optimization perspective. This means that the code should be efficient and not waste resources. It should also be scalable.

Design and architecture

Something I’ve been trying to do more of is writing an RFC (Request for Comments) ahead of writing code for larger changes, to think through the design and architecture of the code. This is a great way to get feedback on the design and approach well before the code is written. It’s also a great way to get buy-in from the team on the approach.

Additional Considerations

Google’s Standard of Code Review mentions that the primary goal of the code review is to ensure that “the overall code health of Google’s codebase is improving over time”. This might be good for a big company like Google, but I feel that if you prioritize people over code, the code will naturally improve over time. This is why I like the idea of using code reviews as a learning and knowledge sharing opportunity.

Additionally, something that resonated with me from How to Do Code Reviews Like a Human (Part One) is that code reviews should be about the code, not the person. To help avoid some pitfalls, use these techniques mentioned in the post:

  1. Never say “you”
  2. Frame feedback as requests
  3. Tie notes to principles, not opinions

Checklist

The following is a checklist that’s hopefully useful for pull requests. The idea is to use it to keep the process consistent, and it should be applicable to both openers and reviewers.

Checklist:

  • How

    • Does the code comply with the RFC (Request for Comments), if one exists?
    • Does the code accomplish the feature(s) it was designed to accomplish?
    • Is there documentation? README, CSI, Self-Commenting Code?
  • What

    • Are there tests? Do they cover the happy path, edge cases, and error handling?
    • Are linting and static analysis tools being used? If so, are they passing?
    • Are there any security vulnerabilities? Is the project up to date?
    • Are there any optimization opportunities?
      • Are there opportunities to reduce redundant code (DRY)?
      • Does it follow SOLID principles?
  • Follow-Up/TODOs

    • Are there any follow-up items that could be addressed?
  • Feedback

    • Is the feedback framed as a request?
    • Is the feedback tied to principles, not opinions?
    • Does the feedback avoid using “you”?

References

by John M Costa, III

5x15 Reports to Advocate for the Work of Yourself, Project, or Team

Overview

I’ve found this less in smaller companies, but sometimes in larger companies, colleagues will take credit for the work of others. This is a toxic behavior that can lead to a lack of trust and a lack of collaboration. It’s important to recognize the work of others and to give credit where credit is due.

While this wouldn’t be the only reason for doing so, one of the solutions I’ve found to help erode the toxic behavior is to use 5x15 reports to advocate for the work of one’s self, project, or team.

5x15

The 5x15 report is a weekly report that is sent to your manager. Yvon Chouinard, founder and CEO of outdoor equipment company Patagonia®, devised the 5-15 Report in the 1980s.1 As the name implies, it should take no longer than 5 minutes to read and no more than 15 minutes to write.

If you get a chance to read about Yvon Chouinard, you’ll find that he’s a very interesting person. He’s a rock climber, environmentalist, and a billionaire. He’s also the founder of Patagonia, a company that is known for its environmental advocacy.2

How to Write a 5x15 Report

The 5x15 report is a simple report that is sent to your manager. As mentioned, it should be no longer than 5 minutes to read and no more than 15 minutes to write. The report should include the following:

  1. Accomplishments: What you’ve accomplished in the past week.
  2. Priorities: What you plan to accomplish in the next week.
  3. Challenges: Any challenges you’re facing.
  4. Stats: Your personal stats for the week.

Accomplishments

The accomplishments section is where you can advocate for the work of yourself, your project, or your team. This is one of the most critical sections of the 5x15 report. It’s important to recognize the work of others and to give credit where credit is due, which includes crediting yourself, your project, or your team.

Depending on the size of your accomplishments, try to frame them in terms of the impact they’ve had on the company. For example, if you’ve saved the company $100,000, then you should mention that in your report. If you don’t know the impact of your accomplishments, then you should try to find out. Perhaps put this in the challenges section of your report.

In addition to charting the impact of your accomplishments, you could also frame them in terms of the company’s goals and values. For example, if your company values Execution, then you could frame your accomplishments in terms of how they’ve helped the company execute on its goals. As an engineer, I sometimes forget about the soft skills that are required to be successful in the workplace. The 5x15 is a great way to highlight where you’ve used these soft skills to be successful.

Priorities

This should be a list of your priorities for the next week. This is a great way to set expectations with your manager and provide an opportunity to change the priorities if they’re not aligned with the company’s goals and values at that time.

In general, priorities shouldn’t change too much from week to week. If they do, then you should try to find out why.

Challenges

This isn’t a section to complain about your job. This is a section to highlight the challenges you’re facing and to give your manager an opportunity to help you overcome them. If you’re not facing any challenges, perhaps you’re not pushing yourself, your project, or your team hard enough. Try to provide potential solutions along with the challenges themselves. This shows that you’re proactive and that you’re thinking about how to overcome the challenges you’re facing without needing to be told what to do.

Stats

This is a section for your personal stats for the week. These stats should be meaningful to you and your manager: something like Energy, Credibility, Quality of Life, Skills, and Social Capital, treated as resources. Think of this as the dashboard you’d have if you were to have one.3

Something new I’m thinking about trying is including more insight into how I’m pacing myself: Stretching, Executing, and Coasting could be three states of flow for a given week.4 Personally, I would find it useful to know whether my reports were Stretching or Coasting, as that would be a cue for whether they have additional capacity to take on more.

Follow-up

Once you’ve established the process of sending 5x15 reports, you should check in with your manager to see if they’re finding them useful. If they’re not, then you should try to find out why. If they are, then you should try to find out how you can make them more useful. Depending on the feedback, you might need to adjust the format or the content of the report.

Summary

The 5x15 report is a great way to communicate with your manager and can be used to set the agenda for your 1:1s. If it’s not already part of your company’s culture, then I would recommend trying to introduce it. It’s a great way to advocate for the work of yourself, your project, or your team.


  1. https://www.mindtools.com/aog8dj2/5-15-reports ↩︎

  2. https://en.wikipedia.org/wiki/Yvon_Chouinard ↩︎

  3. The Staff Engineer’s Path, Tanya Reilly, 2022, O’Reilly Media, Inc. p.121 ↩︎

  4. The Software Engineer’s Guidebook, Gergely Orosz, 2023, Pragmatic Engineer, p.32 ↩︎

by John M Costa, III

Kubernetes on DigitalOcean

Overview

Recently, I’ve been working on a project, a part of which is to deploy a Kubernetes cluster. I was hoping to document the process so that it could save some time for my future self and maybe others.

This post is the first in a series of posts which will document the process I went through to get a Kubernetes cluster up and running. In addition to documenting the process, I’ll be creating a repository which will contain the code I used to create the cluster. The repository is available here.

TLDR;

I’m using:

  • DigitalOcean to host my Kubernetes cluster
  • Terraform to manage the infrastructure
  • Spaces for object storage
  • tfenv to manage terraform versions
  • tgenv to manage terragrunt versions

Hosting Platform

Based on the cost estimates for what I was looking to do, I decided to go with DigitalOcean. I’ve used DigitalOcean in the past and have been happy with the service. I also like the simplicity of the platform and the user interface. More importantly, I like that they have a managed Kubernetes offering.

If you’d like to read more about the cost estimates for my project, you can read more about it here.

Kubernetes Cluster

Building up a Kubernetes cluster is documented pretty thoroughly in the tutorials on DigitalOcean’s site1. After working through some of the setup steps, I realized there could be a quicker way to get a cluster up and running with Terraform: deferring the control plane setup to DigitalOcean’s managed offering. This would let me get a cluster up and running quickly, and then, if it made sense, I could work on automating the setup of the control plane later. It helps that they don’t charge for the control plane.

Infrastructure Management

Terraform is my go-to tool for infrastructure management. I’ve used it in the past to manage infrastructure on AWS, GCP, and DigitalOcean. Given my familiarity with the tool, I decided to use it to manage the infrastructure for my Kubernetes cluster.

Though there’s been a kerfuffle over HashiCorp’s open source licensing2, I still decided to use Terraform, at least to start. I assume there will eventually be a migration path to OpenTofu, but again, I’d like to get up and running as fast as is reasonable.

Spaces

One of the requirements for using Terraform is that there needs to be a way to manage the state of the remote objects it creates. Keeping the state locally is not a good idea, as it can be lost or corrupted. Keeping the state in the cloud is a better option.

Terraform keeps track of the state of the infrastructure it manages in a file, usually named terraform.tfstate. This file is used to determine what changes need to be made to the infrastructure to bring it in line with the desired state.

Some resources already exist which walk through the setup of Spaces.34

Spaces Setup

DigitalOcean has a pretty good tutorial on how to set up Spaces. I’ll walk through the steps I took to get it set up, but if you’re new to DigitalOcean I’d recommend following their tutorial.5

As a quick overview, the steps are:

  1. Create a Space bucket in the console. This is typically a one-time step, depending on how you want to scale your projects. It’s as straightforward as setting the region and name of the Space; I chose to use the default region of nyc3. (If you’d rather manage the bucket with Terraform instead, see the sketch after these steps.)

  2. Create a new Spaces access key and secret. This is also a one-time step, assuming you back up your key. The access key and secret are used to authenticate with the Space.
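As an aside, the Space itself can also be managed with Terraform once you have somewhere else to keep state, since the state bucket can’t store its own state before it exists. This is a minimal sketch, assuming the digitalocean_spaces_bucket resource from the provider docs and the same placeholder naming used below; verify the arguments against the provider version you pin:

# Optional: manage the Space with Terraform after bootstrapping state elsewhere.
# Assumes the provider is configured with Spaces credentials, as shown later.
resource "digitalocean_spaces_bucket" "terraform_state" {
  name   = "<SPACES BUCKET>"
  region = "nyc3"
}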

Configuring Terraform to use Spaces

Once the Space is set up, you’ll need to configure Terraform to use it. This is done by adding configuration to the provider.tf file: a provider block that authenticates with DigitalOcean and Spaces, and a backend block that tells Terraform to store its state file in the Space we created earlier. A simple version of the provider configuration looks like this:

terraform {
  required_version = "~> v1.6.0"

  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
      version = "2.32.0"
    }
  }
}

variable "do_token" {}

provider "digitalocean" {
  token = var.do_token
  spaces_access_id  = "<access key>"
  spaces_secret_key = "<access key secret>"
}
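Hard-coding the Spaces credentials works for a first pass, but I’d treat it as a placeholder. Here’s a hedged variant of the same provider block that takes the credentials as input variables instead; per the provider docs, the SPACES_ACCESS_KEY_ID and SPACES_SECRET_ACCESS_KEY environment variables should also work if you’d rather not pass them explicitly:

variable "spaces_access_id" {}
variable "spaces_secret_key" {}

# Same provider configuration, with credentials supplied as variables
# (for example via TF_VAR_spaces_access_id / TF_VAR_spaces_secret_key).
provider "digitalocean" {
  token             = var.do_token
  spaces_access_id  = var.spaces_access_id
  spaces_secret_key = var.spaces_secret_key
}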

In addition to the provider configuration, we also need to configure the backend. Because Spaces is S3-compatible, Terraform’s s3 backend is used, and the Spaces access key and secret are what authenticate it with the Space (supplied through the backend’s own credential settings, for example the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables).

terraform {
    backend "s3" {
      key      = "<SPACES KEY>"
      bucket   = "<SPACES BUCKET>"
      region   = "nyc3"
      endpoints = { s3 = "https://nyc3.digitaloceanspaces.com" }

      encrypt                     = true

      # The following are currently required for Spaces
      # See: hashicorp/terraform#33983 and hashicorp/terraform#34086
      skip_region_validation      = true
      skip_credentials_validation = true
      skip_metadata_api_check     = true
      skip_requesting_account_id  = true
      skip_s3_checksum            = true
  }
}

Creating the cluster

Once the backend is configured, we can create the cluster. The cluster is created using the digitalocean_kubernetes_cluster resource. You’ll note that I’m glossing over some of the details in the configuration. I’ll go into more detail in a later post.

If you’re looking for a working example, you can find one in the terraform-digitalocean-kubernetes repository.

resource "digitalocean_kubernetes_cluster" "cluster" {
  name    = "<NAME>"
  region  = "<REGION>"
  version = "<VERSION>"

  # fixed node size
  node_pool {
    name       = "<POOL NAME>"
    size       = "<INSTANCE SIZE>"
    node_count = "<NODE COUNT>"
  }
}
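Once the cluster resource is applied, you’ll want a kubeconfig to talk to it. One option is to export it as a Terraform output; this is a minimal sketch using the kube_config attribute the DigitalOcean provider exposes (check the exact attribute path against the provider version you’re using):

# Expose the generated kubeconfig so it can be written to a file and used
# with kubectl, e.g. `terraform output -raw kubeconfig > kubeconfig.yaml`.
output "kubeconfig" {
  value     = digitalocean_kubernetes_cluster.cluster.kube_config[0].raw_config
  sensitive = true
}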