BayesianBLP

class pymc_marketing.customer_choice.bayesian_blp.BayesianBLP(market_data, *, characteristics, product_col='product', market_col='market', region_col=None, share_col='share', market_size_col='n', price_col='price', instruments=None, outside_good='outside', time_col=None, n_mc_draws=None, random_coef_on=None, product_fixed_effects=True, likelihood='normal_logshare', min_share=0.0001, track_delta=False, hierarchical_parameterisation='centered', model_config=None, sampler_config=None, random_seed=None)

Bayesian random-coefficients logit on aggregate market-share panels.

Parameters:
market_data : pd.DataFrame

Long-format panel. Each (region, market, product) cell is one row. Every market must contain exactly one row per inside product plus a single outside-good row whose product_col value matches outside_good. Outside-good rows should have price, characteristics, and instruments all set to 0.

product_col, market_col, region_col, share_col, market_size_col, price_col : str

Column names. region_col=None (default) collapses the region hierarchy to a single bucket. market_col must uniquely identify a (region, period) cell.

characteristics : list of str

Columns holding product characteristics x_jt.

instruments : list of str, optional

Columns holding instruments z_jt for the price-endogeneity block. If None, no first-stage price equation is built and the price coefficient is not identified under endogeneity — a warning is raised.

outside_good : str

Row label of the outside good in product_col.

time_col : str, optional

Column holding the period (time) coordinate. When set, every (region, period) cell must appear exactly once and the panel must be rectangular (every region has every period). The period coordinate is then exposed on the InferenceData, and counterfactual_shares() / elasticities() accept periods= and regions= coord-label arguments. With the default None, the model treats markets as unstructured and the graph is bit-identical to the one built before time-aware support was added.

n_mc_draws : int, optional

Number of Owen-scrambled Halton draws used to integrate the share equation over consumer heterogeneity. Defaults to max(200, 100 * n_random_coefs) and warns when the chosen value looks too small for the integration dimension.

random_coef_on : list of str, optional

Names of dimensions that receive consumer-level random coefficients. Use the literal string "price" for the price coefficient and any characteristic name for that characteristic. Defaults to ["price"].

product_fixed_effects : bool

If True (default), the structural error decomposes as ξ_jt = ξ_j + ξ̃_jt with a product fixed effect. If False, per-product alternative-specific intercepts are used instead and ξ_jt = ξ̃_jt. Only one of the two is included, never both. False is not supported in this v1 release; pass True.

likelihood : {"normal_logshare"}

Aggregate-share likelihood. Currently only the Berry (1994) heteroskedastic Normal-on-log-share-ratio formulation is wired up.

min_share : float

Floor applied to observed shares to avoid log(0). A warning is emitted when the floor is hit.

track_delta : bool

If True, store the mean-utility tensor δ_jt as a pm.Deterministic (memory-heavy on large panels). Default False.

hierarchical_parameterisation : {"centered", "noncentered"}

Parameterisation of the region-level hierarchy on α_r and β_r. Default "centered". Use "noncentered" only when per-region data is sparse and the prior dominates the likelihood (e.g. many regions with very few markets each). For typical scanner panels — a handful of regions, each with informative per-region data — the centered form has a cleaner posterior geometry and avoids the Neal’s-funnel pathology that otherwise biases τ_α low and over-shrinks per-region coefficients.

model_config, sampler_config : dict, optional

Standard ModelBuilder overrides. The default sampler configuration targets numpyro at target_accept=0.95 because the ξ̃_jt block is funnel-prone.
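The panel layout described for market_data, together with the min_share floor and the log-share-ratio quantity named under likelihood, can be sketched in plain Python. This is an illustrative sketch only, not the library's validation code: the helper names check_panel and log_share_ratios, the "sugar" characteristic, and every number are assumptions made up for the example. The check also assumes that "one row per inside product" means every market carries the same set of inside products.

```python
import math
from collections import defaultdict

OUTSIDE = "outside"  # documented default for outside_good
MIN_SHARE = 1e-4     # documented default for min_share

# Toy long-format panel: one row per (market, product) plus one outside-good
# row per market with price and characteristics set to 0.
rows = [
    {"market": "m1", "product": "A", "share": 0.30, "n": 1000, "price": 2.5, "sugar": 1.0},
    {"market": "m1", "product": "B", "share": 0.00, "n": 1000, "price": 3.0, "sugar": 0.5},
    {"market": "m1", "product": OUTSIDE, "share": 0.70, "n": 1000, "price": 0.0, "sugar": 0.0},
    {"market": "m2", "product": "A", "share": 0.25, "n": 1200, "price": 2.7, "sugar": 1.0},
    {"market": "m2", "product": "B", "share": 0.15, "n": 1200, "price": 2.9, "sugar": 0.5},
    {"market": "m2", "product": OUTSIDE, "share": 0.60, "n": 1200, "price": 0.0, "sugar": 0.0},
]


def check_panel(rows, outside=OUTSIDE):
    """Raise ValueError unless each market has one outside row and the same inside products once each."""
    by_market = defaultdict(list)
    for r in rows:
        by_market[r["market"]].append(r["product"])
    inside_sets = set()
    for market, products in by_market.items():
        if products.count(outside) != 1:
            raise ValueError(f"market {market!r} needs exactly one outside-good row")
        if len(products) != len(set(products)):
            raise ValueError(f"market {market!r} repeats a product")
        inside_sets.add(frozenset(p for p in products if p != outside))
    if len(inside_sets) != 1:
        raise ValueError("markets do not share the same inside products")


def log_share_ratios(rows, outside=OUTSIDE, min_share=MIN_SHARE):
    """Return {(market, product): log s_jm - log s_0m} for inside products, flooring shares first."""
    s0 = {r["market"]: max(r["share"], min_share) for r in rows if r["product"] == outside}
    return {
        (r["market"], r["product"]): math.log(max(r["share"], min_share)) - math.log(s0[r["market"]])
        for r in rows
        if r["product"] != outside
    }


check_panel(rows)
for key, value in sorted(log_share_ratios(rows).items()):
    print(key, round(value, 3))
```

Product B in market m1 has an observed share of 0, so the floor is what keeps its log share finite. The same rows can be handed to pd.DataFrame(rows) to obtain a frame with the documented default column names.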

Notes

Identification. Endogeneity correction uses the conditional decomposition of the joint (η_jt, ξ̃_jt) Normal: the price equation p_jt = π_0j + π_z · z_jt + η_jt is fit as a marginal likelihood, η_jt is the price residual, and ξ̃_jt | η_jt is parameterised on the slope-residual coordinates γ = ρ · σ_ξ and ω = σ_ξ · sqrt(1 − ρ²) so that ξ̃_jt = (γ/σ_η) · η_jt + ω · ε_jt. The marginal scale σ_ξ and correlation ρ_price_xi are exposed as Deterministics for downstream summaries. This is mathematically equivalent to a joint MvNormal in (ρ, σ_ξ) coordinates but the conditional likelihood depends on ρ · σ_ξ only through γ, so the slope-residual basis avoids the multiplicative ridge that pinned diagonal-mass NUTS at the depth cap.
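The mapping between (ρ, σ_ξ) and the slope-residual coordinates (γ, ω) can be checked numerically. The sketch below is a standalone illustration using only the standard library, not code from the model: the function names and the parameter values are assumptions chosen for the example. It converts in both directions and then simulates ξ̃ = (γ/σ_η) · η + ω · ε to confirm that the draws have marginal scale close to σ_ξ and correlation with η close to ρ.

```python
import math
import random


def to_slope_residual(rho, sigma_xi):
    """(rho, sigma_xi) -> (gamma, omega) with gamma = rho*sigma_xi, omega = sigma_xi*sqrt(1 - rho^2)."""
    return rho * sigma_xi, sigma_xi * math.sqrt(1.0 - rho**2)


def to_marginal(gamma, omega):
    """(gamma, omega) -> (rho, sigma_xi), using sigma_xi^2 = gamma^2 + omega^2."""
    sigma_xi = math.hypot(gamma, omega)
    return gamma / sigma_xi, sigma_xi


def sample_sd(x):
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def sample_corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Illustrative values only.
rho, sigma_xi, sigma_eta = 0.6, 0.5, 2.0
gamma, omega = to_slope_residual(rho, sigma_xi)

rng = random.Random(0)
n = 100_000
eta = [rng.gauss(0.0, sigma_eta) for _ in range(n)]
# xi_tilde = (gamma / sigma_eta) * eta + omega * eps, with eps ~ N(0, 1)
xi = [(gamma / sigma_eta) * e + omega * rng.gauss(0.0, 1.0) for e in eta]

print("gamma, omega:", gamma, omega)
print("recovered (rho, sigma_xi):", to_marginal(gamma, omega))
print("simulated sd and corr:", sample_sd(xi), sample_corr(xi, eta))
```

The round trip works because Var(ξ̃) = γ² + ω² = σ_ξ² and Cov(ξ̃, η) = γ · σ_η, so the implied correlation is γ / σ_ξ = ρ.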

Sampler geometry. The ξ̃_jt and random-coefficient raw blocks are non-centered. The region-level hierarchy on α_r / β_r is centered by default, which is counterintuitive but standard advice (Betancourt & Girolami 2015): centered is preferable when per-group data is informative, while non-centered helps in sparse-data regimes. The default sampler runs numpyro NUTS with target_accept=0.95. When residual correlations between variance components push tree depth toward the cap, prefer nutpie (low-rank modified mass matrix by default) or, with numpyro, pass nuts_sampler_kwargs={"nuts_kwargs": {"dense_mass": True}} to fit(). Set track_delta=True only if you actually need the per-cell mean utility in the trace: on a typical 100-week × 10-SKU panel this is ~7 MB per chain.
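The order of magnitude of the track_delta memory figure can be reproduced with simple arithmetic. The sketch below assumes 8-byte floats and 1,000 retained draws per chain; neither assumption is stated above, and the helper name delta_trace_bytes is hypothetical.

```python
def delta_trace_bytes(n_markets, n_products, n_draws, bytes_per_value=8):
    """Bytes needed to store one delta value per (draw, market, product) cell."""
    return n_markets * n_products * n_draws * bytes_per_value


# 100 weeks x 10 SKUs, assuming 1,000 draws per chain stored as float64.
size_bytes = delta_trace_bytes(n_markets=100, n_products=10, n_draws=1000)
delta_trace_mib = size_bytes / 2**20
print(f"{size_bytes} bytes per chain, about {delta_trace_mib:.1f} MiB")
```

Under these assumptions the cost scales linearly in each of the three dimensions, so a panel with ten times as many markets needs roughly ten times the memory per chain.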

Notation glossary. Variable names in the trace and posterior summaries map to the model symbols as follows. See the synthetic notebook for the full index conventions and a more detailed table.

| Code name | Math | Role |
| --- | --- | --- |
| alpha / alpha_r | α, α_r | Price coefficient (population, per-region) |
| beta / beta_r | β, β_r | Characteristic utility weights |
| alpha_pop, tau_alpha, beta_pop, tau_beta | α_pop, τ_α, β_pop, τ_β | Cross-region hyperparameters (only when region_col is set) |
| sigma_random | σ_d | Consumer heterogeneity scale per random-coefficient dimension |
| model._halton[:, d] | ν_id | Consumer i’s standardised N(0,1) taste shock on dimension d, drawn from the Halton grid (fixed data, not sampled) |
| internal mu_dev | μ_ijm | Consumer-level utility deviation Σ_d σ_d · ν_id · c_jmd; not exposed as a posterior variable |
| xi / xi_j / xi_tilde | ξ_jm, ξ_j, ξ̃_jm | Product-market quality shock decomposed as product fixed effect + centered residual |
| sigma_xi, sigma_xi_j | σ_ξ, σ_{ξ_j} | Marginal scales of ξ_jm and ξ_j |
| eta / sigma_eta | η_jm, σ_η | First-stage price residual and its scale |
| pi_0 / pi_z | π_0, π_z | First-stage intercepts / instrument coefficients |
| rho_price_xi | ρ | Endogeneity correlation between ξ and η |
| gamma_xi_eta, omega_xi | γ, ω | Slope-residual coordinates the sampler uses; (ρ, σ_ξ) are derived Deterministics |
| delta | δ_jm | Mean utility (only if track_delta=True) |
| s_inside / s_outside | ŝ_jm, ŝ_0m | Halton-averaged predicted shares |
| log_share_ratio | log s_jm − log s_0m | Likelihood’s observed quantity |
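The glossary symbols fit together in the share calculation roughly as sketched below for a single market. This is a simplified standalone illustration, not the model's implementation. It assumes the standard random-coefficients logit share formula with the outside good's utility normalised to 0, uses a plain unscrambled Halton sequence rather than the Owen-scrambled one the model uses, and treats c_jd as the value of random-coefficient dimension d (here, price) for product j. All function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def halton(index, base):
    """Radical inverse of a positive integer in the given base; lies strictly in (0, 1)."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def normal_draws(n_draws, n_dims, primes=(2, 3, 5, 7, 11, 13)):
    """Map Halton points to standardised N(0, 1) shocks nu[i][d] via the inverse CDF."""
    if n_dims > len(primes):
        raise ValueError("this sketch supports at most 6 dimensions")
    inv = NormalDist().inv_cdf
    return [[inv(halton(i, primes[d])) for d in range(n_dims)] for i in range(1, n_draws + 1)]


def predicted_shares(delta, c, sigma, nu):
    """Average logit shares over draws. delta: [J], c: [J][D], sigma: [D], nu: [N][D]."""
    n_products = len(delta)
    inside = [0.0] * n_products
    for nu_i in nu:
        # utility = delta_j + mu_ij, with mu_ij = sum_d sigma_d * nu_id * c_jd
        util = [
            delta[j] + sum(s * v * c_jd for s, v, c_jd in zip(sigma, nu_i, c[j]))
            for j in range(n_products)
        ]
        m = max(0.0, max(util))  # shift for numerical stability; outside utility is 0
        expu = [math.exp(u - m) for u in util]
        denom = math.exp(-m) + sum(expu)
        for j in range(n_products):
            inside[j] += expu[j] / denom
    inside = [s / len(nu) for s in inside]
    return inside, 1.0 - sum(inside)


delta = [-1.0, -0.5]  # mean utilities of two inside products (made up)
c = [[2.5], [3.0]]    # one random-coefficient dimension: price
sigma = [0.3]         # heterogeneity scale on price (made up)

n_draws = max(200, 100 * len(sigma))  # documented default rule for n_mc_draws
nu = normal_draws(n_draws, len(sigma))
s_inside, s_outside = predicted_shares(delta, c, sigma, nu)
print(s_inside, s_outside)
```

With every σ_d set to 0 the consumer-level deviation vanishes and the calculation collapses to the plain logit, in which case log s_j − log s_0 equals δ_j exactly. That identity is a convenient sanity check on any share implementation.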

Methods

BayesianBLP.__init__(market_data, *, ...[, ...])

Initialize model configuration and sampler configuration for the model.

BayesianBLP.attrs_to_init_kwargs(attrs)

Convert the model configuration and sampler configuration from the attributes to keyword arguments.

BayesianBLP.batch_shares(alpha_M, beta_M, ...)

Numpy-evaluate the share equation for a batch of posterior samples.

BayesianBLP.build_from_idata(idata)

Not implemented for v1.

BayesianBLP.build_model(**kwargs)

Construct the PyMC model and attach it to self.model.

BayesianBLP.counterfactual_shares([...])

Posterior shares under a counterfactual price intervention.

BayesianBLP.create_idata_attrs()

Serialise scalar constructor arguments onto InferenceData.attrs.

BayesianBLP.elasticities(*[, at, periods, ...])

Posterior price elasticities ε[market, share, price].

BayesianBLP.fit([progressbar, random_seed])

Fit by sampling the joint posterior with NUTS.

BayesianBLP.graphviz(**kwargs)

Get the graphviz representation of the model.

BayesianBLP.idata_to_init_kwargs(idata)

Create the model configuration and sampler configuration from the InferenceData to keyword arguments.

BayesianBLP.iterate_posterior_samples(n_samples)

Stack chain × draw and (optionally) subsample posterior arrays.

BayesianBLP.load(fname[, check])

Not implemented for v1.

BayesianBLP.load_from_idata(idata[, check])

Create a ModelBuilder instance from an InferenceData object.

BayesianBLP.sample_prior_predictive([...])

Draw from the prior predictive distribution.

BayesianBLP.save(fname, **kwargs)

Persist the fitted InferenceData (model graph is not saved).

BayesianBLP.set_idata_attrs([idata])

Set attributes on an InferenceData object.

BayesianBLP.table(**model_table_kwargs)

Get the summary table of the model.

BayesianBLP.xi_as_grid()

Reshape the posterior xi to (region, period, inside_product).

Attributes

default_model_config

Default priors for every univariate / vector parameter in the model.

default_sampler_config

Default sampler kwargs: numpyro NUTS at target_accept=0.95.

fit_result

Get the posterior fit_result.

id

Generate a unique hash value for the model.

output_var

Name of the observed variable (the log-share-ratio likelihood).

posterior

Access the 'posterior' attribute of the InferenceData object.

posterior_predictive

Access the 'posterior_predictive' attribute of the InferenceData object.

predictions

Access the 'predictions' attribute of the InferenceData object.

prior

Access the 'prior' attribute of the InferenceData object.

prior_predictive

Access the 'prior_predictive' attribute of the InferenceData object.

version

idata

sampler_config

model_config