=============
Wizard LangID
=============

.. figure:: _static/img/WizardLangIDBanner.png
   :alt: WizardLangID Banner
   :width: 800
   :height: 300
   :align: center

.. image:: https://img.shields.io/pypi/v/wizardlangid.svg
   :target: https://pypi.org/project/wizardlangid/
   :alt: PyPI - Version

.. image:: https://img.shields.io/pypi/dm/wizardlangid.svg?label=PyPI%20downloads
   :target: https://pypistats.org/packages/wizardlangid
   :alt: PyPI - Downloads/month

.. image:: https://img.shields.io/pypi/l/wizardlangid.svg
   :target: https://github.com/textwizard-dev/wizardlangid/blob/main/LICENSE
   :alt: License


**WizardLangID** is a Python library for language identification based on character n-gram profiles. It first narrows the candidate set using priors and linguistic cues, then estimates a probability for each remaining language. It supports 161 languages and returns either the top-1 ISO code or a probability-ordered list.

Installation
============

Requires Python 3.9+.

.. code-block:: bash

   pip install wizardlangid


Quick start
===========

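.. code-block:: python

   import wizardlangid as wl

   # Detect the most likely language of a short text.
   print(wl.lang_detect("Ciao, come stai oggi?", return_top1=True))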

==================
Language Detection
==================

Parameters
==========

- ``text``: Input string (Unicode).
- ``top_k``: How many candidates to return (default ``3``).
- ``profiles_dir``: Optional path overriding the bundled language profiles.
- ``use_mmap``: If ``True``, memory-map the profile tries (lower RAM; slightly slower first access).
- ``return_top1``: If ``True``, return only the best language code; otherwise a list of ``(lang, prob)``.

Return value
============

- ``str`` when ``return_top1=True`` (e.g., ``"it"``).  
- ``list[tuple[str, float]]`` when ``return_top1=False`` (sorted by probability).
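
Because the return type depends on ``return_top1``, calling code that may receive either shape might want to normalise it first. The helper below is a minimal sketch: ``best_code`` is a hypothetical name, not part of WizardLangID, and it assumes the list form is sorted by descending probability as described above.

.. code-block:: python

   from typing import List, Tuple, Union

   # Either a single code or a list of (lang, prob) pairs.
   Result = Union[str, List[Tuple[str, float]]]


   def best_code(result: Result) -> str:
       """Return the top language code from either return shape."""
       if isinstance(result, str):
           return result
       if not result:
           raise ValueError("empty candidate list")
       # First entry is the most probable candidate.
       return result[0][0]


   print(best_code("it"))
   print(best_code([("en", 0.9), ("fy", 0.1)]))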

Examples
========

Top-1 (single code)
-------------------

.. code-block:: python

   import wizardlangid as wl

   text = "Ciao, come stai oggi?"
   lang = wl.lang_detect(text, return_top1=True)
   print(lang) 
   
**Output**  

   .. code-block:: text

      it

Top-k distribution
------------------

.. code-block:: python

   import wizardlangid as wl

   text = "The quick brown fox jumps over the lazy dog."
   langs = wl.lang_detect(text, top_k=5, return_top1=False)
   print(langs)  
   
**Output**  

   .. code-block:: text

      [('en', 0.9999376335362183), ('mg', 4.719212057614953e-05), ('fy', 1.4727973350205069e-05), ('rm', 2.8718519851832537e-07), ('la', 1.5918465665694727e-07)]
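
One way to act on such a distribution is to accept the top candidate only when its probability clears a threshold. The sketch below is an illustration, not part of the library: ``accept_top`` and the ``0.9`` cut-off are assumptions chosen for the example, and the input reuses the first two pairs from the output above.

.. code-block:: python

   from typing import List, Optional, Tuple


   def accept_top(
       langs: List[Tuple[str, float]], min_prob: float = 0.9
   ) -> Optional[str]:
       """Return the best code if it is confident enough, else None."""
       if not langs:
           return None
       code, prob = langs[0]  # list is sorted by probability
       return code if prob >= min_prob else None


   langs = [("en", 0.9999376335362183), ("mg", 4.719212057614953e-05)]
   print(accept_top(langs))                         # confident: prints en
   print(accept_top([("en", 0.5), ("fy", 0.5)]))    # ambiguous: prints None

A caller can treat ``None`` as "undetermined" and fall back to a default language or request a longer sample.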

Batch examples
--------------

.. code-block:: python

   import wizardlangid as wl

   for s in [
       "これは日本語のテスト文です。",
       "Alex parle un peu français, aber nicht so viel.",
       "¿Dónde está la estación de tren?",
   ]:
       print("TOP1:", wl.lang_detect(s, return_top1=True))
        
**Output**  

   .. code-block:: text

    TOP1: ja
    TOP1: fr
    TOP1: es

Profiles directory & mmap
-------------------------

.. code-block:: python

   from pathlib import Path
   import wizardlangid as wl

   langs = wl.lang_detect(
       "Buongiorno a tutti!",
       profiles_dir=Path("/opt/WizardLangID/profiles"),  # custom profiles
       use_mmap=True,                                   # lower RAM
       top_k=3,
   )
   print(langs)

Operational notes
=================

- **Lazy loading**: the model loads on first call and is cached for reuse.  
- **Short/ASCII texts**: ambiguity is common; provide longer samples for better confidence.  
- **Profiles**: if you keep profiles outside the package, pass ``profiles_dir``.  
- **Probabilities**: softmax-normalised over the candidates returned by the gate.
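
The softmax normalisation mentioned in the last note can be illustrated in isolation. This is a generic sketch of softmax over a handful of candidate scores, not the library's actual code; the ``softmax`` helper and the example scores are made up for the illustration.

.. code-block:: python

   import math
   from typing import Dict


   def softmax(scores: Dict[str, float]) -> Dict[str, float]:
       """Turn raw per-language scores into probabilities that sum to 1."""
       if not scores:
           return {}
       peak = max(scores.values())  # subtract the max for numerical stability
       exps = {lang: math.exp(s - peak) for lang, s in scores.items()}
       total = sum(exps.values())
       return {lang: e / total for lang, e in exps.items()}


   # Hypothetical scores for three candidates that passed the gate.
   probs = softmax({"it": -10.0, "es": -14.0, "pt": -15.0})
   ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
   print(ranked)

Since the normalisation covers only the gated candidates, the resulting values are best read as relative confidence among those candidates.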

Supported languages (161)
=========================

.. csv-table::
   :header-rows: 0
   :widths: 33,33,34

   "aa — Afar","ab — Abkhazian","af — Afrikaans"
   "am — Amharic","an — Aragonese","ar — Arabic"
   "as — Assamese","av — Avaric","ay — Aymara"
   "az — Azerbaijani","ba — Bashkir","be — Belarusian"
   "bg — Bulgarian","bm — Bambara","bn — Bengali"
   "bo — Tibetan","br — Breton","bs — Bosnian"
   "ca — Catalan","ce — Chechen","ch — Chamorro"
   "cs — Czech","cv — Chuvash","cy — Welsh"
   "da — Danish","de — German","dz — Dzongkha"
   "ee — Ewe","el — Greek","en — English"
   "eo — Esperanto","es — Spanish","et — Estonian"
   "eu — Basque","fa — Persian","ff — Fula"
   "fi — Finnish","fj — Fijian","fo — Faroese"
   "fr — French","fy — Western Frisian","ga — Irish"
   "gd — Scottish Gaelic","gl — Galician","gn — Guarani"
   "gu — Gujarati","gv — Manx","ha — Hausa"
   "he — Hebrew","hi — Hindi","hr — Croatian"
   "ht — Haitian Creole","hu — Hungarian","hy — Armenian"
   "id — Indonesian","ig — Igbo","io — Ido"
   "is — Icelandic","it — Italian","iu — Inuktitut"
   "ja — Japanese","jv — Javanese","ka — Georgian"
   "kg — Kongo","ki — Kikuyu","kk — Kazakh"
   "kl — Kalaallisut","km — Khmer","kn — Kannada"
   "ko — Korean","kr — Kanuri","ks — Kashmiri"
   "ku — Kurdish","kv — Komi","kw — Cornish"
   "ky — Kyrgyz","la — Latin","lb — Luxembourgish"
   "lg — Ganda","li — Limburgan","ln — Lingala"
   "lo — Lao","lt — Lithuanian","lu — Luba-Kasai"
   "lv — Latvian","mg — Malagasy","mh — Marshallese"
   "mi — Māori","mk — Macedonian","ml — Malayalam"
   "mn — Mongolian","mr — Marathi","ms — Malay"
   "mt — Maltese","my — Burmese","ne — Nepali"
   "nl — Dutch","nn — Norwegian Nynorsk","no — Norwegian"
   "nv — Navajo","ny — Chichewa / Nyanja","oc — Occitan"
   "om — Oromo","or — Odia","os — Ossetian"
   "pa — Punjabi","pl — Polish","ps — Pashto"
   "pt — Portuguese","qu — Quechua","rm — Romansh"
   "rn — Kirundi","ro — Romanian","ru — Russian"
   "rw — Kinyarwanda","sa — Sanskrit","sc — Sardinian"
   "sd — Sindhi","se — Northern Sami","sg — Sango"
   "si — Sinhala","sk — Slovak","sl — Slovenian"
   "sm — Samoan","sn — Shona","so — Somali"
   "sq — Albanian","sr — Serbian","ss — Swati"
   "st — Sotho","su — Sundanese","sv — Swedish"
   "sw — Swahili","ta — Tamil","te — Telugu"
   "tg — Tajik","th — Thai","ti — Tigrinya"
   "tk — Turkmen","tl — Tagalog","tn — Tswana"
   "to — Tonga","tr — Turkish","ts — Tsonga"
   "tt — Tatar","tw — Twi","ty — Tahitian"
   "ug — Uyghur","uk — Ukrainian","ur — Urdu"
   "uz — Uzbek","ve — Venda","vi — Vietnamese"
   "vo — Volapük","wa — Walloon","wo — Wolof"
   "xh — Xhosa","yi — Yiddish","yo — Yoruba"
   "zh — Chinese","zu — Zulu"


License
=======

`AGPL-3.0-or-later <_static/LICENSE>`_.

Resources
=========

- `PyPI Package <https://pypi.org/project/wizardlangid/>`_
- `Documentation <https://wizardlangid.readthedocs.io/en/latest/>`_
- `GitHub Repository <https://github.com/textwizard-dev/WizardLangID>`_

.. _contact_author:

Contact & Author
================

:Author: Mattia Rubino
:Email: `textwizard.dev@gmail.com <mailto:textwizard.dev@gmail.com>`_