Model Validation & Testing
==========================

.. meta::
   :llm-description: Automatic testing system that runs on model load. Learn what nnterp guarantees, trade-offs with model dispatch and attention implementation, and manual testing commands.

``nnterp`` includes automatic validation to prevent silent failures and ensure model correctness. When you load a model, a series of fast tests runs automatically to verify that the model works as expected.

Automatic Testing System
------------------------

When loading a ``StandardizedTransformer``, ``nnterp`` automatically runs tests to ensure:

- **Model renaming correctness**: All modules are properly renamed to the standardized interface
- **Module output shapes**: Layer outputs have the expected shape ``(batch_size, seq_len, hidden_size)``
- **Attention probabilities**: If enabled, attention probabilities have shape ``(batch_size, num_heads, seq_len, seq_len)``, sum to 1 for each token, and modifying them changes the model output

.. code-block:: python

    from nnterp import StandardizedTransformer

    # Automatic tests run during model loading
    model = StandardizedTransformer("gpt2")
    # Tests passed: model is ready to use

    # If tests fail, you'll see detailed error messages
    model = StandardizedTransformer("unsupported-model")
    # Error: Could not find layers module...

What ``nnterp`` Guarantees
~~~~~~~~~~~~~~~~~~~~~~~~~~

``nnterp`` guarantees that:

- All models follow the standardized naming convention
- ``model.layers_output[i]`` returns tensors with the expected shapes
- ``model.attention_probabilities[i]`` (if enabled) returns properly normalized attention matrices

What ``nnterp`` Cannot Guarantee
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``nnterp`` cannot guarantee:

- **Attention probabilities remain unmodified**: The model might apply additional transformations after the attention probabilities are computed but before they are used. Check ``model.attention_probabilities.print_source()`` to see the exact hook location in the HuggingFace implementation.
- **Perfect HuggingFace compatibility**: While ``nnterp`` uses the original HuggingFace implementations, some edge cases might behave differently due to the renaming process.

.. code-block:: python

    # Check where attention probabilities are hooked
    model.attention_probabilities.print_source()

Trade-offs and Configuration
----------------------------

The automatic testing system comes with some trade-offs:

Model Dispatch
~~~~~~~~~~~~~~

``nnterp`` automatically dispatches your model to the available devices (``device_map="auto"``) during loading. This can be inconvenient if you don't want to load the model weights immediately. You can set ``allow_dispatch=False`` to disable this, but some tests won't be run.

Attention Implementation
~~~~~~~~~~~~~~~~~~~~~~~~

By default, ``nnterp`` uses the default HuggingFace attention implementation. To access attention probabilities, you must explicitly set ``enable_attention_probs=True``, which automatically configures the model to use ``attn_implementation="eager"``, since other implementations (such as SDPA or flash attention) don't support attention pattern tracing.

.. code-block:: python

    # To access attention probabilities (slower but traceable)
    model = StandardizedTransformer(
        "gpt2",
        enable_attention_probs=True,
    )

    # Use the default HuggingFace implementation (faster, no attention tracing)
    model = StandardizedTransformer("gpt2")
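Once a model is loaded with ``enable_attention_probs=True``, the probabilities can be inspected inside a trace. The following is a minimal sketch, assuming the standard ``nnsight`` tracing API (``model.trace`` and ``.save()``) and an arbitrary example prompt; it only reads the probabilities and checks that they are normalized, matching the shape listed above.

.. code-block:: python

    from nnterp import StandardizedTransformer

    # Assumes attention probabilities were enabled at load time, as above
    model = StandardizedTransformer("gpt2", enable_attention_probs=True)

    with model.trace("The quick brown fox"):
        # Attention probabilities of layer 0:
        # expected shape (batch_size, num_heads, seq_len, seq_len)
        probs = model.attention_probabilities[0].save()

    print(probs.shape)
    # Each query position's distribution over key positions should sum to ~1
    print(probs.sum(dim=-1))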
If you try to use both ``enable_attention_probs=True`` and a non-eager ``attn_implementation``, ``nnterp`` will raise an error:

.. code-block:: python

    # This will raise an error
    model = StandardizedTransformer(
        "gpt2",
        enable_attention_probs=True,
        attn_implementation="sdpa",  # Conflicts with enable_attention_probs
    )

Manual Testing
--------------

You can run the tests manually for specific models or architectures:

.. code-block:: bash

    # Test specific models
    python -m nnterp run_tests --model-names "gpt2" "meta-llama/Llama-2-7b-hf"

    # Test using toy models of specific architectures (faster/cheaper)
    python -m nnterp run_tests --class-names "LlamaForCausalLM" "GPT2LMHeadModel"

This is useful when:

- You're using a different version of ``transformers`` or ``nnsight`` than the officially tested ones
- You want to test a new model architecture before using it in research

Version Compatibility
---------------------

``nnterp`` checks whether its tests were run for your current ``nnsight`` and ``transformers`` versions. If not, it warns you and suggests running the manual tests. The automatic testing system ensures that even if an architecture hasn't been officially tested, a model that loads successfully is probably working correctly.
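Beyond ``python -m nnterp run_tests``, you can also sanity-check the guarantees yourself on your own version combination. This is a minimal sketch, assuming the standard ``nnsight`` tracing API (``model.trace`` and ``.save()``); the expected shape is the one listed under the guarantees above.

.. code-block:: python

    from nnterp import StandardizedTransformer

    model = StandardizedTransformer("gpt2")

    with model.trace("Hello world"):
        # Output of the first transformer layer:
        # expected shape (batch_size, seq_len, hidden_size)
        hidden = model.layers_output[0].save()

    print(hidden.shape)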