SheikhLM

SheikhLM Architecture Specifications

This document outlines the architectural decisions and technical specifications for the SheikhLM family of models.

Design Philosophy

SheikhLM is designed for efficiency, speed, and deployment in resource-constrained environments. The architecture incorporates modern best practices from the Llama and Mistral families while maintaining a compact footprint.

Core Architectural Components

Model Variants

Feature                SheikhLM-135M   SheikhLM-360M   SheikhLM-1.7B
Parameters             ~135M           ~360M           ~1.7B
Hidden Size            768             1024            2048
Layers                 12              24              24
Attention Heads        12              16              16
Intermediate Size      2944            3072            8384
Vocab Size             32,000          32,000          32,000
Max Context (tokens)   2048            2048            2048
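
As a rough illustration, each column of the variants table can be written as a small configuration object. The class name SheikhLMConfig, its field names, and the VARIANTS dictionary are hypothetical and are not taken from the SheikhLM codebase; only the numeric values come from the table. The derived per-head dimension assumes the hidden size is split evenly across the attention heads, which is the usual convention but is not stated in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SheikhLMConfig:
    """Hypothetical container for one column of the variants table."""

    hidden_size: int
    num_layers: int
    num_heads: int
    intermediate_size: int
    vocab_size: int = 32_000
    max_context: int = 2048

    @property
    def head_dim(self) -> int:
        # Assumption: the hidden size is divided evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


# Values copied from the variants table.
VARIANTS = {
    "SheikhLM-135M": SheikhLMConfig(768, 12, 12, 2944),
    "SheikhLM-360M": SheikhLMConfig(1024, 24, 16, 3072),
    "SheikhLM-1.7B": SheikhLMConfig(2048, 24, 16, 8384),
}

if __name__ == "__main__":
    for name, cfg in VARIANTS.items():
        print(f"{name}: head_dim={cfg.head_dim}")
```

Under that assumption the per-head dimension works out to 64 for the 135M and 360M variants and 128 for the 1.7B variant.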

Parameter Calculation Verification

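Parameter counts include the token embeddings, every transformer layer (attention, MLP, and norms), and the output head. Because the output head is tied to the embedding matrix, it shares those weights and adds no parameters of its own.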
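
The count can be reproduced with a short script. This is a sketch under assumptions that this document does not spell out: full multi-head attention with four hidden-by-hidden projections (query, key, value, output), a gated Llama-style MLP with three projections, no bias terms, two weight-only norms per layer, and one final norm. The function name count_parameters is hypothetical.

```python
def count_parameters(hidden, layers, intermediate, vocab=32_000):
    """Estimate the parameter count for one variant.

    Assumptions (not confirmed by the spec): no biases, full multi-head
    attention, a three-matrix gated MLP, and weight-only norms.
    """
    embeddings = vocab * hidden          # shared with the tied output head
    attention = 4 * hidden * hidden      # Q, K, V, and output projections
    mlp = 3 * hidden * intermediate      # gate, up, and down projections
    norms = 2 * hidden                   # pre-attention and pre-MLP norms
    final_norm = hidden                  # norm applied after the last layer
    return embeddings + layers * (attention + mlp + norms) + final_norm


# (hidden size, layers, intermediate size) from the variants table.
VARIANT_SHAPES = {
    "SheikhLM-135M": (768, 12, 2944),
    "SheikhLM-360M": (1024, 24, 3072),
    "SheikhLM-1.7B": (2048, 24, 8384),
}

if __name__ == "__main__":
    for name, (hidden, layers, intermediate) in VARIANT_SHAPES.items():
        total = count_parameters(hidden, layers, intermediate)
        print(f"{name}: {total:,} parameters")
```

Under these assumptions the totals come to 134,302,464, 359,973,888, and 1,704,560,640, which agree with the nominal ~135M, ~360M, and ~1.7B figures. A different attention or MLP layout (for example, grouped-query attention) would change the totals.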