Sunday, 26 April 2026

Oracle 26ai New DB Parameter CALENDAR_FISCAL_YEAR_START

Many businesses don’t follow the January-to-December calendar, but the database does. Every time you build a report, a dashboard, or even a simple query, you end up rewriting the same fiscal logic again and again.
Some teams maintain calendar tables. Others push the logic into BI tools. In many cases, it ends up duplicated across ETL pipelines, reports, and applications. And sooner or later, something goes out of sync.

Oracle 26ai introduces a small but very practical fix for this: CALENDAR_FISCAL_YEAR_START.

Checking the Parameter
show parameter CALENDAR_FISCAL_YEAR_START
NAME                       TYPE   VALUE 
-------------------------- ------ ----- 
calendar_fiscal_year_start string       

At this point it’s unset, which means Oracle is still operating on the standard calendar year.

Set the start of the fiscal year to June 1:
ALTER SESSION SET CALENDAR_FISCAL_YEAR_START = '01-JUN-2026', 'DD-MON-YYYY';

Only the month and day really matter, so this works as well:
ALTER SESSION SET CALENDAR_FISCAL_YEAR_START = '01-JUN', 'DD-MON';

Now let’s see how Oracle interprets dates once this is set.
Check June 15, 2026:
SELECT FISCAL_QUARTER('15-JUN-2026');
FISCAL_QUARTER
-------------
Q1-FY2027

And May 15, 2026:
SELECT FISCAL_QUARTER('15-MAY-2026');
FISCAL_QUARTER
-------------
Q4-FY2026

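This follows the common convention of naming a fiscal year after the calendar year in which it ends. With a June 1 start, FY2027 runs from June 1, 2026 to May 31, 2027, so June 15, 2026 lands in its first quarter and May 15, 2026 in the last quarter of FY2026.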
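For comparison, the kind of hand-written logic this parameter replaces can be sketched in a few lines of Python. This is only an illustration, not Oracle's implementation: the function name fiscal_quarter and its start_month argument are made up here, and the sketch assumes the fiscal year starts on the first day of a month and is named after the calendar year in which it ends, which is what the two outputs above suggest.

```python
from datetime import date


def fiscal_quarter(d: date, start_month: int = 6) -> str:
    """Return a label like 'Q1-FY2027' for a fiscal year starting on day 1 of start_month."""
    # Whole months elapsed since the start of the current fiscal year (0..11).
    months_in = (d.month - start_month) % 12
    quarter = months_in // 3 + 1
    # Assumption: the fiscal year is named after the calendar year in which it ends.
    if start_month > 1 and d.month >= start_month:
        fiscal_year = d.year + 1
    else:
        fiscal_year = d.year
    return f"Q{quarter}-FY{fiscal_year}"


print(fiscal_quarter(date(2026, 6, 15)))  # Q1-FY2027
print(fiscal_quarter(date(2026, 5, 15)))  # Q4-FY2026
```

Every report, ETL job, and BI layer that carries its own copy of a function like this is one more place where the fiscal calendar can drift.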

Why This Actually Matters:
This parameter removes a lot of quiet complexity that has been sitting in systems for years.
First, it cleans up SQL. You don’t need CASE expressions or custom logic just to determine fiscal quarters. The database understands it natively.
Second, it brings consistency. Instead of every layer calculating fiscal periods differently, the logic lives in one place. That alone eliminates a lot of subtle reporting issues.
Third, it simplifies data pipelines. There’s no need to maintain fiscal calendar tables or transformation logic in ETL jobs. Less code, fewer moving parts, fewer things to break.


Saturday, 18 April 2026

What is Ollama Serve (REST API)

Running LLMs locally is becoming very common, and tools like Ollama make it extremely simple.
But one feature that really unlocks its power is:
ollama serve
This turns your local machine into a REST API server for AI models.

When we run:
ollama serve

It starts a local web server. This server allows other applications to talk to your AI models using HTTP requests.

Without serve → you run prompts manually in the terminal
With serve → your apps can call the model like an API

Default API Endpoint: Once the server starts, http://localhost:11434 becomes the base URL.

Example API Call
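Here’s a sample request. By default the API streams the answer back as a series of JSON objects; "stream": false asks for a single JSON object instead:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain cloud computing",
  "stream": false
}'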
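The same call can be made from application code. Below is a minimal Python sketch using only the standard library. The helper names build_generate_request and generate are made up for this post, and the sketch assumes the server is running on the default port with the llama3 model already pulled. It sets "stream" to false so the whole answer arrives as one JSON object.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default base URL of `ollama serve`


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build (but do not send) a POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the request to a running Ollama server and return the parsed JSON reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server running, generate("llama3", "Explain cloud computing")["response"] would hold the generated text.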

Sample API Output (With Metrics):
{
  "model": "llama3",
  "created_at": "2026-04-18T12:10:00Z",
  "response": "Cloud computing is the delivery of computing services over the internet...",
  "done": true,

  "total_duration": 2450000000,
  "load_duration": 800000000,
  "prompt_eval_count": 12,
  "prompt_eval_duration": 200000000,
  "eval_count": 65,
  "eval_duration": 1450000000
}


Now let’s understand this response
Basic Response Fields
model  ===>  llama3 (the model that generated the response)
created_at  ===>  2026-04-18T12:10:00Z (when the response was created, in UTC)
response  ===>  "Cloud computing is the delivery of computing services over the internet..." (the generated text)
done  ===>  true (the response is complete; no more data is coming)

Performance Metrics (all durations are in nanoseconds):
1. total_duration  ===>  2450000000 ns → ~2.45 seconds
Total time taken, from the request being received to the final response being sent.

2. load_duration  ===>  800000000 ns → ~0.8 seconds
Time taken to load the model into memory. This usually happens on the first request, when the model is not already loaded.

3. prompt_eval_count  ===>  12 tokens
Number of tokens in your input prompt.

4. prompt_eval_duration  ===>  200000000 ns → ~0.2 seconds
Time the model spent processing your prompt.

5. eval_count  ===>  65 tokens
Number of tokens generated in the response. This directly affects response size, cost (in cloud scenarios), and latency.

6. eval_duration  ===>  1450000000 ns → ~1.45 seconds
Time spent generating the response. This is the actual thinking + answering time.
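These numbers are most useful when combined. The Python sketch below turns the sample response into seconds and tokens per second. The function name summarize_metrics is made up for illustration; the field names and values are taken from the sample output above.

```python
NS_PER_SECOND = 1_000_000_000  # the API reports durations in nanoseconds


def summarize_metrics(resp):
    """Convert the raw duration and count fields into seconds and tokens per second."""
    accounted = resp["load_duration"] + resp["prompt_eval_duration"] + resp["eval_duration"]
    return {
        "total_seconds": resp["total_duration"] / NS_PER_SECOND,
        "prompt_tokens_per_second": resp["prompt_eval_count"] * NS_PER_SECOND / resp["prompt_eval_duration"],
        "output_tokens_per_second": resp["eval_count"] * NS_PER_SECOND / resp["eval_duration"],
        # Time not covered by load, prompt evaluation, or generation.
        "other_seconds": (resp["total_duration"] - accounted) / NS_PER_SECOND,
    }


sample = {
    "total_duration": 2450000000,
    "load_duration": 800000000,
    "prompt_eval_count": 12,
    "prompt_eval_duration": 200000000,
    "eval_count": 65,
    "eval_duration": 1450000000,
}

summary = summarize_metrics(sample)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For the sample response this works out to 60 prompt tokens per second and roughly 44.8 generated tokens per second, and the three component durations add up to the full 2.45 seconds.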

Friday, 17 April 2026

Understanding LLM Models: Basics That Help You Choose the Right One

LLMs are everywhere now. Every tool, every platform, every new feature seems to be powered by them.

But when it comes to actually choosing a model, things quickly get confusing.
You start seeing terms like parameters, quantization, context length… and it all feels a bit heavy.

This blog will help you understand the key basics in a simple way.

Model Architecture – How the Model Thinks

At a high level, architecture is just how the model is designed to process information.  
Most modern LLMs use something called a Transformer. You don’t need to go deep into it — just know this:
It helps the model understand relationships between words.
Instead of reading text strictly word-by-word like older systems, it looks at the whole input at once and works out which words matter most to each other (this is called attention).
That’s how it understands meaning, tone, and context.

Why should you care?
Because better architecture usually means:
More accurate responses
Better understanding of complex inputs
Smarter outputs overall

Parameters – How Big the Model Is
This is the one you’ll hear the most.

Parameters are the numeric weights the model learns during training, and their count (7B, 70B, and so on) is what people mean by model size.
More parameters = more “learned knowledge”.

Think of it like this:
Small models are quick and efficient
Large models are more knowledgeable but heavier

But bigger isn’t always better.

Yes, large models can reason better and handle complex tasks.
But they also:
Cost more
Need more compute
Can be slower

So the real question is not “What’s the biggest model?”
It’s “What’s enough for my use case?”

Quantization – Making Models Practical
Quantization is simply a way to make models smaller and faster. Without it, most large language models would be too heavy to run outside of high-end infrastructure.

What “Quantization” Really Means
LLMs normally store weights in high precision like:
FP32 (32-bit float)
FP16 (16-bit float)

Quantization reduces that to:
8-bit (Q8)
6-bit (Q6)
5-bit (Q5)
4-bit (Q4)

So instead of each weight taking 16–32 bits, it might take just 4 bits.
Result:
Much smaller model size
Faster inference
Can run on CPU or smaller GPUs

But:
Slight loss in quality (depends on method)

And honestly, in many real-world cases, that quality drop is barely noticeable. Especially for things like chat, summaries, or general-purpose usage.

You’re basically making a smart trade:
a tiny bit of precision for a huge gain in usability
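The size saving is simple arithmetic: number of weights × bits per weight ÷ 8 gives bytes. Here is a rough Python sketch for a hypothetical 7-billion-parameter model. The function name weight_size_gb is made up, and the estimate covers the raw weights only; real quantized files come out somewhat larger because they also store scaling data and metadata.

```python
def weight_size_gb(num_parameters, bits_per_weight):
    """Approximate storage for the raw weights, in gigabytes (10^9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


PARAMS_7B = 7_000_000_000  # hypothetical 7B model, for illustration only

for label, bits in [("FP32", 32), ("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{label}: {weight_size_gb(PARAMS_7B, bits):.1f} GB")
```

Going from FP16 to 4-bit takes the weights from about 14 GB to about 3.5 GB, which is the difference between needing a data-center GPU and fitting on a laptop.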

Where It Gets Slightly Confusing (But Important)
Once you start using quantized models, you’ll see names like:
Q4_0
Q4_1
Q4_K_M
Q4_K_S

At first, it looks like random naming. But there’s actually a simple idea behind it.
Q4 → means 4-bit quantization
The part after _ → tells you how the compression is done

Not All Q4 Are Equal

Older versions like:
Q4_0 → more aggressive, lower quality
Q4_1 → slightly better

Smarter Quantization (The K Family)
Newer variants like:
Q4_K_M
Q4_K_S

use better techniques (you’ll often see them in tools like llama.cpp).

Instead of compressing everything the same way, they:
Work in small blocks
Apply smarter scaling
Keep important information more intact

Roughly the same 4-bit size, but noticeably better quality.
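To see why per-block scaling helps, here is a deliberately simplified Python sketch. It is not the actual K-quant format used by llama.cpp: the function names, the block size of 32, and the weights are all made up for illustration. It quantizes the same values to 4 bits twice, once with a single scale for everything and once with a separate scale per block, and compares the error.

```python
def quantize_block(values, bits=4):
    """Map one block of floats to signed integer codes plus one scale for the block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return scale, codes


def roundtrip(values, block_size):
    """Quantize block by block, then convert the codes back to floats."""
    restored = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        restored.extend(code * scale for code in codes)
    return restored


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Made-up weights: 32 small values, then 32 values that include one large outlier.
weights = [0.01 * i for i in range(-16, 16)] + [8.0] + [0.02 * i for i in range(-15, 16)]

one_scale_error = mean_abs_error(weights, roundtrip(weights, len(weights)))
per_block_error = mean_abs_error(weights, roundtrip(weights, 32))
print(f"single scale: {one_scale_error:.4f}, per-block scales: {per_block_error:.4f}")
```

With a single scale, the one outlier stretches the range so far that every small weight rounds to zero. With per-block scales, only the block containing the outlier suffers, so the overall error drops.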

Picking the Right One (Simple Rule)
Q4_K_M (medium) → best balance (default choice)
Q4_K_S (small) → slightly smaller and faster, slightly less accurate

If you don’t want to overthink it, just go with Q4_K_M.

 
Context Length – How Much It Can Keep in Mind

Context length is like the model’s short-term memory.

It decides how much text the model can look at in one go.
Short context:
Faster
Cheaper
But forgets earlier parts quickly

Long context:
Can handle long documents
Better for conversations and analysis
More expensive (more memory and compute per request)

If your work involves long PDFs, logs, or conversations — this matters a lot.
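A quick way to check whether a document fits is to estimate its token count and split it when it doesn't. The Python sketch below uses a rough rule of thumb of about 4 characters per token for English text. That figure is an assumption, not a property of any particular model, and the function names are made up; a real application would count tokens with the model's own tokenizer and split on sentence or paragraph boundaries.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers vary by model and language."""
    return -(-len(text) // chars_per_token)  # ceiling division


def split_to_fit(text, context_tokens, chars_per_token=4):
    """Cut text into pieces that each stay within the estimated token budget."""
    max_chars = context_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


document = "log line\n" * 5000  # stand-in for a long log file (45,000 characters)
chunks = split_to_fit(document, context_tokens=4096)

print(f"estimated tokens: {estimate_tokens(document)}")
print(f"chunks needed for a 4096-token window: {len(chunks)}")
```

A model with a longer context window would take the same document in fewer pieces, or in one go.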

Embedding Length – How Well It Understands Meaning
This one is less talked about, but very important.

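Before a model understands text, it converts words into lists of numbers (vectors). These are called embeddings.

Embedding length is the number of values in each vector, which sets how detailed that representation can be.
Higher dimension → richer representation of meaning, at the cost of more storage and compute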

This becomes critical when you're building things like:
Search systems
Recommendations
RAG (retrieval-augmented generation apps)

If your use case involves “finding similar things” — embeddings matter more than you think.
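“Finding similar things” usually means comparing embedding vectors with cosine similarity. The Python sketch below shows the idea with tiny hand-written 3-value vectors. They are toy numbers invented for illustration, not output from a real embedding model, which would produce hundreds or thousands of values per text.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings, hand-written for illustration only.
documents = {
    "refund policy": [0.8, 0.2, 0.1],
    "pasta recipe": [0.0, 0.1, 0.9],
}
query = [0.9, 0.1, 0.0]  # pretend embedding of "how do I get my money back?"

best_match = max(documents, key=lambda name: cosine_similarity(query, documents[name]))
print(best_match)
```

Search, recommendations, and RAG all build on this same step: embed the query, embed the candidates, and rank by similarity.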

So, How Do You Choose?

Instead of chasing the biggest or newest model, think in terms of your actual need.

If you need deep reasoning → go for larger models
If you need speed and cost efficiency → smaller + quantized models
If you deal with long inputs → prioritize context length
If you're building search or RAG → focus on embedding quality

It’s always a trade-off. There’s no perfect model.