Sunday, 26 April 2026
Oracle 26ai New DB Parameter: CALENDAR_FISCAL_YEAR_START
Some teams maintain calendar tables. Others push the logic into BI tools. In many cases, it ends up duplicated across ETL pipelines, reports, and applications. And sooner or later, something goes out of sync.
Oracle 26ai introduces a small but very practical fix for this: CALENDAR_FISCAL_YEAR_START.
Checking the Parameter
show parameter CALENDAR_FISCAL_YEAR_START
NAME TYPE VALUE
-------------------------- ------ -----
calendar_fiscal_year_start string
At this point it’s unset, which means Oracle is still operating on the standard calendar year.
Set the start of the fiscal year to June 1:
ALTER SESSION SET CALENDAR_FISCAL_YEAR_START = '01-JUN-2026', 'DD-MON-YYYY';
Only the month and day really matter, so this works as well:
ALTER SESSION SET CALENDAR_FISCAL_YEAR_START = '01-JUN', 'DD-MON';
Now let’s see how Oracle interprets dates once this is set.
Check June 15, 2026:
SELECT FISCAL_QUARTER('15-JUN-2026');
FISCAL_QUARTER
-------------
Q1-FY2027
And May 15, 2026:
SELECT FISCAL_QUARTER('15-MAY-2026');
FISCAL_QUARTER
-------------
Q4-FY2026
This is exactly how most organizations expect fiscal periods to behave when the year starts in June.
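For comparison, here is a minimal Python sketch of the kind of hand-written fiscal logic that usually gets duplicated across pipelines and reports. It is not Oracle's implementation. The function name fiscal_quarter and its start_month argument are made up for illustration, and it assumes the fiscal year starts on the first day of a month and is labelled by the calendar year in which it ends, which matches the Q1-FY2027 / Q4-FY2026 output above.

```python
from datetime import date


def fiscal_quarter(d: date, start_month: int = 6) -> str:
    """Return a label like 'Q1-FY2027' for a fiscal year starting on day 1 of start_month."""
    # Months elapsed since the fiscal year began (0..11).
    offset = (d.month - start_month) % 12
    quarter = offset // 3 + 1
    # Assumption: the fiscal year is named after the calendar year in which it ends.
    if start_month > 1 and d.month >= start_month:
        fiscal_year = d.year + 1
    else:
        fiscal_year = d.year
    return f"Q{quarter}-FY{fiscal_year}"


print(fiscal_quarter(date(2026, 6, 15)))
print(fiscal_quarter(date(2026, 5, 15)))
```

Every system that needs fiscal periods ends up carrying some version of this function, and that is where the copies start to drift apart.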
Why This Actually Matters:
This parameter removes a lot of quiet complexity that has been sitting in systems for years.
First, it cleans up SQL. You don’t need CASE expressions or custom logic just to determine fiscal quarters. The database understands it natively.
Second, it brings consistency. Instead of every layer calculating fiscal periods differently, the logic lives in one place. That alone eliminates a lot of subtle reporting issues.
Third, it simplifies data pipelines. There’s no need to maintain fiscal calendar tables or transformation logic in ETL jobs. Less code, fewer moving parts, fewer things to break.
Saturday, 18 April 2026
What is Ollama Serve (REST API)
Running LLMs locally is becoming very common, and tools like Ollama make it extremely simple.
But one feature that really unlocks its power is ollama serve.
This turns your local machine into a REST API server for AI models.
When we run:
ollama serve
It starts a local web server. This server allows other applications to talk to your AI models using HTTP requests.
Without serve → you manually run prompts in the terminal
With serve → your apps can call the model like an API
Default API Endpoint: once the server starts, http://localhost:11434 becomes the base URL.
Example API Call
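Here’s a request. Setting "stream": false asks for the whole answer as a single JSON object; by default this endpoint streams the answer back as a series of partial JSON objects.
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "Explain cloud computing",
"stream": false
}'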
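The same call can be made from application code. Below is a minimal sketch using only the Python standard library. The helper names build_generate_request and generate are made up for illustration, and it assumes the server is running on the default port with the llama3 model already pulled. It sends "stream": false so the reply arrives as one JSON object.

```python
import json
import urllib.request

# Assumption: ollama serve is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs a running server):
#   result = generate("llama3", "Explain cloud computing")
#   print(result["response"])
```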
Sample API Output (With Metrics):
{
"model": "llama3",
"created_at": "2026-04-18T12:10:00Z",
"response": "Cloud computing is the delivery of computing services over the internet...",
"done": true,
"total_duration": 2450000000,
"load_duration": 800000000,
"prompt_eval_count": 12,
"prompt_eval_duration": 200000000,
"eval_count": 65,
"eval_duration": 1450000000
}
Now let’s understand this response.
Basic Response Fields
model ===> llama3. The model that produced the response.
created_at ===> 2026-04-18T12:10:00Z. The timestamp at which the response was created.
response ===> "Cloud computing is the delivery of computing services over the internet...". The generated text.
done ===> true. The response is complete and no more data is coming.
Performance Metrics:
1. total_duration ===> 2450000000 ns → ~2.45 seconds
The total time taken, from the request being received to the final response being sent.
2. load_duration ===> 800000000 ns → ~0.8 seconds
Time taken to load the model into memory. This usually happens on the first request, when the model is not already loaded.
3. prompt_eval_count ===> 12 tokens
Number of tokens in your input.
4. prompt_eval_duration ===> 200000000 ns → ~0.2 seconds
Time the model spent processing your prompt.
5. eval_count ===> 65 tokens
Number of tokens generated in the response. This directly affects response size, cost (in cloud scenarios), and latency.
6. eval_duration ===> 1450000000 ns → ~1.45 seconds
Time spent generating the response. This is the actual "thinking + answering" time.
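These numbers become more useful once you turn them into a speed. Here is a small sketch that converts the counts and nanosecond durations from the sample output into tokens per second. The helper name tokens_per_second is made up for illustration, and the sample dictionary simply copies the values shown above.

```python
NS_PER_SECOND = 1_000_000_000


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds into tokens per second."""
    return count / duration_ns * NS_PER_SECOND


# Metrics copied from the sample response above.
sample = {
    "prompt_eval_count": 12,
    "prompt_eval_duration": 200_000_000,
    "eval_count": 65,
    "eval_duration": 1_450_000_000,
}

generation_speed = tokens_per_second(sample["eval_count"], sample["eval_duration"])
prompt_speed = tokens_per_second(sample["prompt_eval_count"], sample["prompt_eval_duration"])

print(f"generation: {generation_speed:.1f} tokens/s")
print(f"prompt:     {prompt_speed:.1f} tokens/s")
```

For the sample above, that works out to roughly 44.8 tokens per second for generation and about 60 tokens per second for reading the prompt.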
Friday, 17 April 2026
Understanding LLM Models: Basics That Help You Choose the Right One
LLMs are everywhere now. Every tool, every platform, every new feature seems to be powered by them.
But when it comes to actually choosing a model, things quickly get confusing.
You start seeing terms like parameters, quantization, context length… and it all feels a bit heavy.
This post will help you understand the key basics in a simple way.
Model Architecture – How the Model Thinks
At a high level, architecture is just how the model is designed to process information.
Most modern LLMs use something called a Transformer. You don’t need to go deep into it — just know this:
It helps the model understand relationships between words.
Instead of reading text strictly word by word like older systems, it looks at the whole input at once and works out which words matter most to each other (a mechanism called attention).
That’s how it understands meaning, tone, and context.
Why should you care?
Because better architecture usually means:
More accurate responses
Better understanding of complex inputs
Smarter outputs overall
Parameters – How Big the Model Is
This is the one you’ll hear the most.
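Parameters are the numeric weights the model learns during training. Their count (7B, 70B, and so on) is the usual measure of model size.
More parameters = more capacity for “learned knowledge”.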
Think of it like this:
Small models are quick and efficient
Large models are more knowledgeable but heavier
But bigger isn’t always better.
Yes, large models can reason better and handle complex tasks.
But they also:
Cost more
Need more compute
Can be slower
So the real question is not “What’s the biggest model?”
It’s “What’s enough for my use case?”
Quantization – Making Models Practical
Quantization is simply a way to make models smaller and faster. Without it, most large language models would be too heavy to run outside of high-end infrastructure.
What “Quantization” Really Means
LLMs normally store weights in high precision like:
FP32 (32-bit float)
FP16 (16-bit float)
Quantization reduces that to:
8-bit (Q8)
6-bit (Q6)
5-bit (Q5)
4-bit (Q4)
So instead of each weight taking 16–32 bits, it might take just 4 bits.
Result:
Much smaller model size
Faster inference
Can run on CPU or smaller GPUs
But:
Slight loss in quality (depends on method)
And honestly, in many real-world cases, that quality drop is barely noticeable. Especially for things like chat, summaries, or general-purpose usage.
You’re basically making a smart trade:
a tiny bit of precision for a huge gain in usability
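To put rough numbers on that trade, here is a back-of-the-envelope sketch of how much space the weights alone take at different precisions. The 7-billion-parameter model is a hypothetical example and the function name weight_size_gb is made up for illustration. Real quantized files come out somewhat larger because they also store scaling factors and metadata.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 7-billion-parameter model.
NUM_PARAMS = 7e9

for name, bits in [("FP32", 32), ("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{name}: {weight_size_gb(NUM_PARAMS, bits):.1f} GB")
```

Going from FP16 to 4-bit cuts the weights from about 14 GB to about 3.5 GB, which is the difference between needing a large GPU and fitting on a laptop.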
Where It Gets Slightly Confusing (But Important)
Once you start using quantized models, you’ll see names like:
Q4_0
Q4_1
Q4_K_M
Q4_K_S
At first, it looks like random naming. But there’s actually a simple idea behind it.
Q4 → means 4-bit quantization
The part after _ → tells you how the compression is done
Not All Q4 Are Equal
Older versions like:
Q4_0 → more aggressive, lower quality
Q4_1 → slightly better
Smarter Quantization (The K Family)
Newer variants such as Q4_K_M and Q4_K_S use better techniques (you’ll often see them in tools like llama.cpp).
Instead of compressing everything the same way, they:
Work in small blocks
Apply smarter scaling
Keep important information more intact
Same 4-bit size, but noticeably better quality.
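The idea of working in small blocks, each with its own scale, can be shown with a toy example. This is a simplified illustration only and not the actual storage format used by llama.cpp. The function names and sample weights are made up, and it uses a plain symmetric 4-bit scheme with integer codes from -8 to 7.

```python
def quantize_block(block):
    """Map floats to 4-bit signed codes (-8..7) that share one scale."""
    max_abs = max(abs(w) for w in block)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in block]
    return scale, codes


def dequantize_block(scale, codes):
    """Turn the 4-bit codes back into approximate floats."""
    return [scale * c for c in codes]


def roundtrip(weights, block_size):
    """Quantize and restore the weights, one block at a time."""
    restored = []
    for i in range(0, len(weights), block_size):
        scale, codes = quantize_block(weights[i:i + block_size])
        restored.extend(dequantize_block(scale, codes))
    return restored


# Made-up weights: four tiny values followed by four large ones.
weights = [0.01, -0.02, 0.015, 0.005, 5.0, -3.0, 2.0, 1.0]

one_scale = roundtrip(weights, block_size=8)  # a single scale for everything
per_block = roundtrip(weights, block_size=4)  # one scale per block of four

print("one scale:", one_scale[:4])
print("per block:", per_block[:4])
```

With a single scale, the four tiny weights all collapse to 0.0 because the large values dominate the scale. With one scale per block of four, they survive as small non-zero values. The bit width is the same in both cases, but far less information is lost.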
Picking the Right One (Simple Rule)
Q4_K_M → best balance (default choice)
Q4_K_S → slightly smaller and faster, slightly less accurate
If you don’t want to overthink it, just go with Q4_K_M.
Context Length – How Much It Can Keep in Mind
Context length is like the model’s short-term memory.
It decides how much text, measured in tokens, the model can look at in one go.
Short context:
Faster
Cheaper
But forgets earlier parts quickly
Long context:
Can handle long documents
Better for conversations and analysis
More expensive (more memory and compute per request)
If your work involves long PDFs, logs, or conversations — this matters a lot.
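Here is a minimal sketch of what “forgetting earlier parts” looks like in practice: when a conversation no longer fits, the oldest messages are dropped. The function names are made up for illustration, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_to_context(messages: list[str], context_length: int) -> list[str]:
    """Keep the most recent messages whose total token count fits the context length."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop once the budget is spent.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > context_length:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


history = ["my name is Ada", "I like databases", "what is my name"]

print(trim_to_context(history, context_length=8))
print(trim_to_context(history, context_length=16))
```

With a budget of 8 tokens the first message no longer fits, so the model never sees the name it is later asked about. With 16 tokens the whole conversation fits.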
Embedding Length – How Well It Understands Meaning
This one is less talked about, but very important.
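Before a model can work with text, it converts each token into a vector of numbers. These vectors are called embeddings.
Embedding length is the number of dimensions in each vector, which is how detailed that representation is.
Higher dimension → room for finer distinctions in meaning, at the cost of more storage and compute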
This becomes critical when you're building things like:
Search systems
Recommendations
RAG (retrieval-based AI apps)
If your use case involves “finding similar things” — embeddings matter more than you think.
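“Finding similar things” usually comes down to comparing embedding vectors, most often with cosine similarity. Here is a minimal sketch using hand-made 3-dimensional vectors. The values are invented for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy embeddings, not produced by any real model.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

query = embeddings["cat"]
for word in ("kitten", "invoice"):
    print(word, round(cosine_similarity(query, embeddings[word]), 3))
```

Here “kitten” scores far higher against “cat” than “invoice” does. A search or RAG system does the same comparison, just against thousands or millions of stored vectors.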
So, How Do You Choose?
Instead of chasing the biggest or newest model, think in terms of your actual need.
If you need deep reasoning → go for larger models
If you need speed and cost efficiency → smaller + quantized models
If you deal with long inputs → prioritize context length
If you're building search or RAG → focus on embedding quality
It’s always a trade-off. There’s no perfect model.