Production-grade inference engineering
Our founding background is silicon: ASIC design, FPGA development, SerDes, and high-speed mixed-signal systems. We apply that same discipline to AI inference, focusing on the techniques that actually move p95 and p99 latency at production volume: constrained decoding for valid structured outputs, prefix and KV-cache reuse on shared context, speculative decoding for throughput, quantization-aware deployment, batching strategy tuned to traffic shape, and serving topology designed against a roofline rather than guessed at.

The result is workflows that typically run 5 to 10 times cheaper per query than the equivalent on frontier APIs, with predictable tail latency and a fixed cost curve instead of a usage-linear bill. None of this is exposed by a closed API. All of it is the difference between a demo and a system that operates for years.
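To make "designed against a roofline" concrete, here is a minimal back-of-envelope sketch. Every figure in it is an illustrative assumption, not a measurement: a 7B-parameter FP16 model on an A100-class accelerator with roughly 312 TFLOP/s of dense compute and 2 TB/s of memory bandwidth. Because each decode step streams the full weight set from HBM regardless of batch size, arithmetic intensity grows linearly with the batch, and decode stays memory-bound until the batch approaches the hardware's ridge point.

```python
# Back-of-envelope roofline check: is LLM decode compute-bound or
# memory-bound at a given batch size? All numbers are illustrative
# assumptions (7B FP16 model, A100-class part); substitute your own.

PARAMS = 7e9          # model parameters (assumed)
BYTES_PER_PARAM = 2   # FP16 weights
PEAK_FLOPS = 312e12   # peak dense compute, FLOP/s (assumed)
MEM_BW = 2.0e12       # HBM bandwidth, bytes/s (assumed)

# Ridge point: arithmetic intensity at which compute and memory balance.
ridge = PEAK_FLOPS / MEM_BW  # ~156 FLOP/byte with these numbers

def decode_intensity(batch: int) -> float:
    """Arithmetic intensity of one decode step, ignoring KV-cache traffic.

    Each generated token costs ~2 FLOPs per parameter per sequence,
    while the full weight set is streamed from HBM once per step
    no matter how many sequences share it.
    """
    flops = 2 * PARAMS * batch
    bytes_moved = PARAMS * BYTES_PER_PARAM
    return flops / bytes_moved

for b in (1, 8, 64, 256):
    ai = decode_intensity(b)
    bound = "memory-bound" if ai < ridge else "compute-bound"
    print(f"batch={b:>4}  intensity={ai:6.1f} FLOP/B  ridge={ridge:.0f}  -> {bound}")
```

Under these assumed numbers, batches of 1 to 64 sit far below the ~156 FLOP/byte ridge point, leaving the compute units mostly idle; that headroom is exactly what batching strategy and cache reuse convert into lower cost per query.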