This notebook shows how to reuse a nlp.networks.BertEncoder from TensorFlow Model Garden to power three tasks: (1) pretraining with nlp.models.BertPretrainer (masked-LM + next-sentence), (2) span labeling with nlp.models.BertSpanLabeler (start/end logits for SQuAD-style QA), and (3) classification with nlp.models.BertClassifier ([CLS] head). You install tf-models-official (or tf-models-nightly for latest), import tensorflow_models.nlp, build small dummy examples, run each model forward pass, and compute losses (weighted sparse CE for MLM/NSP; CE for span start/end; CE for classification). Result: a clear pattern for wrapping one encoder into multiple BERT task heads with concise, production-friendly APIs.

TensorFlow Models NLP Library for Beginners

2025/09/08 17:40

Content Overview

  • Learning objectives

  • Install and import

  • Install the TensorFlow Model Garden pip package

  • Import TensorFlow and other libraries

  • BERT pretraining model

  • Build a BertPretrainer model wrapping BertEncoder

  • Compute loss

  • Span labeling model

  • Build a BertSpanLabeler wrapping BertEncoder

  • Compute loss

  • Classification model

  • Build a BertClassifier model wrapping BertEncoder

  • Compute loss

Learning objectives

In this Colab notebook, you will learn how to build transformer-based models for common NLP tasks, including pretraining, span labeling and classification, using the building blocks from the NLP modeling library.

Install and import

Install the TensorFlow Model Garden pip package

  • tf-models-official is the stable Model Garden package. It may not include the latest changes in the tensorflow_models GitHub repo. To get the latest changes, install tf-models-nightly, the nightly Model Garden package that is built automatically every day.
  • pip will install all models and dependencies automatically.

pip install tf-models-official

Import TensorFlow and other libraries

import numpy as np
import tensorflow as tf

from tensorflow_models import nlp


BERT pretraining model

BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) introduced the method of pre-training language representations on a large text corpus and then using that model for downstream NLP tasks.

In this section, we will learn how to build a model to pretrain BERT on the masked language modeling task and the next sentence prediction task. For simplicity, we show only a minimal example and use dummy data.

Build a BertPretrainer model wrapping BertEncoder

The nlp.networks.BertEncoder class implements the Transformer-based encoder described in the BERT paper. It includes the embedding lookups and transformer layers (nlp.layers.TransformerEncoderBlock), but not the masked language model or classification task networks.

The nlp.models.BertPretrainer class allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives.

# Build a small transformer network.
vocab_size = 100
network = nlp.networks.BertEncoder(
    vocab_size=vocab_size,
    # The number of TransformerEncoderBlock layers
    num_layers=3)


Inspecting the encoder, we see that it contains a few embedding layers and stacked nlp.layers.TransformerEncoderBlock layers, and is connected to three input layers:

input_word_ids, input_type_ids and input_mask.

tf.keras.utils.plot_model(network, show_shapes=True, expand_nested=True, dpi=48) 
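
To make the three encoder inputs more concrete, here is a minimal NumPy sketch of how they are conventionally filled for a sentence pair. It assumes the usual BERT convention (token ids, a padding mask, and segment ids). The special-token ids and the pack_pair helper are hypothetical and not part of the Model Garden API; a real pipeline would take the ids from a tokenizer.

```python
import numpy as np

# Hypothetical special-token ids, chosen only for illustration.
CLS_ID, SEP_ID, PAD_ID = 1, 2, 0


def pack_pair(first_ids, second_ids, seq_length):
    """Pack two token-id lists into (word ids, mask, type ids) arrays."""
    tokens = [CLS_ID] + first_ids + [SEP_ID] + second_ids + [SEP_ID]
    # Segment 0 covers [CLS] + first sentence + [SEP]; segment 1 covers the rest.
    type_ids = [0] * (len(first_ids) + 2) + [1] * (len(second_ids) + 1)
    if len(tokens) > seq_length:
        raise ValueError("pair is longer than seq_length")
    pad = seq_length - len(tokens)
    input_word_ids = np.array(tokens + [PAD_ID] * pad, dtype=np.int32)
    # 1 marks a real token, 0 marks padding.
    input_mask = np.array([1] * len(tokens) + [0] * pad, dtype=np.int32)
    input_type_ids = np.array(type_ids + [0] * pad, dtype=np.int32)
    return input_word_ids, input_mask, input_type_ids


word_ids, mask, type_ids = pack_pair([11, 12, 13], [21, 22], seq_length=10)
print(word_ids)
print(mask)
print(type_ids)
```

The dummy data used in this notebook replaces all three arrays with random integers, which is enough to exercise the forward pass but does not reflect this structure.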

# Create a BERT pretrainer with the created network.
num_token_predictions = 8
bert_pretrainer = nlp.models.BertPretrainer(
    network, num_classes=2, num_token_predictions=num_token_predictions, output='predictions')

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/official/nlp/modeling/models/bert_pretrainer.py:112: Classification.__init__ (from official.nlp.modeling.networks.classification) is deprecated and will be removed in a future version. Instructions for updating: Classification as a network is deprecated. Please use the layers.ClassificationHead instead. 

Inspecting the bert_pretrainer, we see it wraps the encoder with additional MaskedLM and nlp.layers.ClassificationHead heads.

tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, expand_nested=True, dpi=48) 

# We can feed some dummy data to get masked language model and sentence output.
sequence_length = 16
batch_size = 2

word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))
mask_data = np.random.randint(2, size=(batch_size, sequence_length))
type_id_data = np.random.randint(2, size=(batch_size, sequence_length))
masked_lm_positions_data = np.random.randint(2, size=(batch_size, num_token_predictions))

outputs = bert_pretrainer(
    [word_id_data, mask_data, type_id_data, masked_lm_positions_data])
lm_output = outputs["masked_lm"]
sentence_output = outputs["classification"]
print(f'lm_output: shape={lm_output.shape}, dtype={lm_output.dtype!r}')
print(f'sentence_output: shape={sentence_output.shape}, dtype={sentence_output.dtype!r}')

lm_output: shape=(2, 8, 100), dtype=tf.float32
sentence_output: shape=(2, 2), dtype=tf.float32
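
The (2, 8, 100) shape of lm_output can be understood with a small NumPy sketch. It assumes the masked-LM head first gathers the encoder output at the requested positions and then projects each gathered vector onto the vocabulary; that is an assumption about the general technique, not a copy of the library code. The hidden_size value and the random arrays are stand-ins for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, sequence_length, hidden_size = 2, 16, 4  # hidden_size is illustrative
vocab_size, num_token_predictions = 100, 8

# Stand-in for the encoder's per-token output.
sequence_output = rng.normal(size=(batch_size, sequence_length, hidden_size))
positions = rng.integers(sequence_length, size=(batch_size, num_token_predictions))

# Pick the hidden vector at each requested position, separately per example.
gathered = np.take_along_axis(sequence_output, positions[:, :, None], axis=1)

# Project each gathered vector onto the vocabulary (random stand-in weights).
projection = rng.normal(size=(hidden_size, vocab_size))
lm_scores = gathered @ projection

print(gathered.shape)   # (2, 8, 4)
print(lm_scores.shape)  # (2, 8, 100)
```

Only the num_token_predictions selected positions receive a vocabulary-sized score vector, which is why the second dimension is 8 rather than the sequence length of 16.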

Compute loss

Next, we can use lm_output and sentence_output to compute the loss.

masked_lm_ids_data = np.random.randint(vocab_size, size=(batch_size, num_token_predictions))
masked_lm_weights_data = np.random.randint(2, size=(batch_size, num_token_predictions))
next_sentence_labels_data = np.random.randint(2, size=(batch_size))

mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(
    labels=masked_lm_ids_data,
    predictions=lm_output,
    weights=masked_lm_weights_data)
sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(
    labels=next_sentence_labels_data,
    predictions=sentence_output)
loss = mlm_loss + sentence_loss

print(loss)

tf.Tensor(5.2983174, shape=(), dtype=float32)
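
To illustrate what a weighted sparse cross-entropy computes, here is a minimal NumPy sketch. It assumes the inputs are probabilities over the last axis and that "weighted" means a weight-normalized average of the per-position losses; the function name weighted_sparse_ce is hypothetical, and the library function may expect a different input form depending on its version.

```python
import numpy as np


def weighted_sparse_ce(labels, probs, weights=None):
    """Weighted mean of -log p[label]; probs holds probabilities on the last axis."""
    labels = np.asarray(labels)
    # Probability assigned to the true label at every position.
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    losses = -np.log(picked)
    if weights is None:
        return losses.mean()
    weights = np.asarray(weights, dtype=probs.dtype)
    total = weights.sum()
    # Positions with weight 0 (for example, unused prediction slots) are ignored.
    return (losses * weights).sum() / total if total > 0 else 0.0


probs = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]])  # (batch=1, positions=2, vocab=3)
labels = np.array([[0, 2]])
print(weighted_sparse_ce(labels, probs, weights=[[1, 0]]))  # only position 0 counts
print(weighted_sparse_ce(labels, probs))                    # plain mean over both
```

In the pretraining setting, the weights play the role of masked_lm_weights_data: they let a batch carry fewer real masked tokens than num_token_predictions without the empty slots affecting the loss.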

With the loss, you can optimize the model. After training, you can save the weights of the BertEncoder for downstream fine-tuning tasks. Please see run_pretraining.py for the full example.

Span labeling model

Span labeling is the task of assigning a label to a span of text, for example, marking a span of text as the answer to a given question.

In this section, we will learn how to build a span labeling model. Again, we use dummy data for simplicity.

Build a BertSpanLabeler wrapping BertEncoder

The nlp.models.BertSpanLabeler class implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.

Note that nlp.models.BertSpanLabeler wraps an nlp.networks.BertEncoder, the weights of which can be restored from the pretraining model above.

network = nlp.networks.BertEncoder(
    vocab_size=vocab_size, num_layers=2)

# Create a BERT span labeler with the created network.
bert_span_labeler = nlp.models.BertSpanLabeler(network)

Inspecting the bert_span_labeler, we see it wraps the encoder with an additional SpanLabeling head that outputs start_position and end_position.

tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, expand_nested=True, dpi=48) 

# Create a set of 2-dimensional data tensors to feed into the model.
word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))
mask_data = np.random.randint(2, size=(batch_size, sequence_length))
type_id_data = np.random.randint(2, size=(batch_size, sequence_length))

# Feed the data to the model.
start_logits, end_logits = bert_span_labeler([word_id_data, mask_data, type_id_data])

print(f'start_logits: shape={start_logits.shape}, dtype={start_logits.dtype!r}')
print(f'end_logits: shape={end_logits.shape}, dtype={end_logits.dtype!r}')

start_logits: shape=(2, 16), dtype=tf.float32
end_logits: shape=(2, 16), dtype=tf.float32
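
At inference time, the two logit vectors have to be turned into a single span. The following NumPy sketch shows one common way to do this for a single example: pick the (start, end) pair with the highest summed score, subject to start <= end. The best_span helper and its max_answer_length parameter are hypothetical and are not part of the Model Garden API.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_length=8):
    """Return (start, end) maximizing start_logit + end_logit with start <= end."""
    seq_len = len(start_logits)
    # scores[i, j] is the score of a span starting at i and ending at j.
    scores = start_logits[:, None] + end_logits[None, :]
    i, j = np.indices((seq_len, seq_len))
    # Keep only spans that end at or after their start and are not too long.
    valid = (j >= i) & (j - i < max_answer_length)
    scores = np.where(valid, scores, -np.inf)
    start, end = np.unravel_index(np.argmax(scores), scores.shape)
    return int(start), int(end)


start_logits_1d = np.array([0.1, 2.0, 0.3, 0.2, 5.0])
end_logits_1d = np.array([3.0, 0.1, 1.5, 0.4, 0.2])
print(best_span(start_logits_1d, end_logits_1d))
```

Taking the argmax of each vector independently would give start 4 and end 0 here, which is not a valid span; scoring the pairs jointly avoids that case.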

Compute loss

With start_logits and end_logits, we can compute the loss:

start_positions = np.random.randint(sequence_length, size=(batch_size))
end_positions = np.random.randint(sequence_length, size=(batch_size))

start_loss = tf.keras.losses.sparse_categorical_crossentropy(
    start_positions, start_logits, from_logits=True)
end_loss = tf.keras.losses.sparse_categorical_crossentropy(
    end_positions, end_logits, from_logits=True)

total_loss = (tf.reduce_mean(start_loss) + tf.reduce_mean(end_loss)) / 2
print(total_loss)

tf.Tensor(5.3621416, shape=(), dtype=float32)

With the loss, you can optimize the model. Please see run_squad.py for the full example.

Classification model

In the last section, we show how to build a text classification model.

Build a BertClassifier model wrapping BertEncoder

nlp.models.BertClassifier implements a [CLS] token classification model containing a single classification head.

network = nlp.networks.BertEncoder(
    vocab_size=vocab_size, num_layers=2)

# Create a BERT classifier with the created network.
num_classes = 2
bert_classifier = nlp.models.BertClassifier(
    network, num_classes=num_classes)

Inspecting the bert_classifier, we see it wraps the encoder with an additional classification head.

tf.keras.utils.plot_model(bert_classifier, show_shapes=True, expand_nested=True, dpi=48) 

# Create a set of 2-dimensional data tensors to feed into the model.
word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))
mask_data = np.random.randint(2, size=(batch_size, sequence_length))
type_id_data = np.random.randint(2, size=(batch_size, sequence_length))

# Feed the data to the model.
logits = bert_classifier([word_id_data, mask_data, type_id_data])
print(f'logits: shape={logits.shape}, dtype={logits.dtype!r}')

logits: shape=(2, 2), dtype=tf.float32
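
The logits are unnormalized scores with one row per example and one column per class. As a minimal sketch, and using made-up logit values rather than the model's actual output, they can be turned into class probabilities and predicted labels like this:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Illustrative logits for batch_size=2 and num_classes=2.
example_logits = np.array([[2.0, 0.0], [-1.0, 1.0]])
probs = softmax(example_logits)
predicted = probs.argmax(axis=-1)

print(probs)      # each row sums to 1
print(predicted)  # index of the most likely class per example
```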

Compute loss

With logits, we can compute the loss:

labels = np.random.randint(num_classes, size=(batch_size))

loss = tf.keras.losses.sparse_categorical_crossentropy(
    labels, logits, from_logits=True)
print(loss)

tf.Tensor([0.7332015 1.3447659], shape=(2,), dtype=float32)
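
Note that the printed loss has shape (2,): one value per example rather than a single scalar. An optimizer step usually works on a scalar, so a mean over the batch is a common next step. The NumPy sketch below shows the per-example cross-entropy from logits followed by that reduction; the helper name and the logit values are illustrative assumptions, and only the two printed loss values come from the output above.

```python
import numpy as np


def sparse_ce_from_logits(labels, logits):
    """Per-example -log softmax(logits)[label], computed in a stable way."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


# Illustrative logits and labels, not the model's actual values.
example_logits = np.array([[2.0, 0.0], [-1.0, 1.0]])
example_labels = np.array([0, 0])
per_example = sparse_ce_from_logits(example_labels, example_logits)
scalar_loss = per_example.mean()
print(per_example.shape, scalar_loss)

# The same reduction applied to the two values printed above.
printed = np.array([0.7332015, 1.3447659])
print(printed.mean())
```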

With the loss, you can optimize the model. Please see the Fine tune_bert notebook or the model training documentation for the full example.

:::info Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.

:::
