BayesianBLP

class pymc_marketing.customer_choice.bayesian_blp.BayesianBLP(market_data, *, characteristics, product_col='product', market_col='market', region_col=None, share_col='share', market_size_col='n', price_col='price', instruments=None, outside_good='outside', time_col=None, n_mc_draws=None, random_coef_on=None, product_fixed_effects=True, likelihood='normal_logshare', min_share=0.0001, track_delta=False, hierarchical_parameterisation='centered', model_config=None, sampler_config=None, random_seed=None)

Bayesian random-coefficients logit on aggregate market-share panels.

Parameters:

market_data (pd.DataFrame): Long-format panel. Each (region, market, product) cell is one row. Every market must contain exactly one row per inside product plus a single outside-good row whose product_col value matches outside_good. Outside-good rows should have price, characteristics, and instruments all set to 0.
product_col, market_col, region_col, share_col, market_size_col, price_col (str): Column names. region_col=None (default) collapses the region hierarchy to a single bucket. market_col must uniquely identify a (region, period) cell.
characteristics (list of str): Columns holding product characteristics x_jt.
instruments (list of str, optional): Columns holding instruments z_jt for the price-endogeneity block. If None, no first-stage price equation is built, the price coefficient is not identified under endogeneity, and a warning is raised.
outside_good (str): Row label of the outside good in product_col.
time_col (str, optional): Column holding the period (time) coordinate. When set, every (region, period) cell must appear exactly once and the panel must be rectangular (every region has every period). The period coordinate is then exposed on the InferenceData, and counterfactual_shares() and elasticities() accept periods= and regions= coordinate-label arguments. Default None: markets are treated as unstructured, and the model graph is bit-identical to the one built before time support was added.
n_mc_draws (int, optional): Number of Owen-scrambled Halton draws used to integrate the share equation over consumer heterogeneity. Defaults to max(200, 100 * n_random_coefs); a warning is raised when the chosen value looks too small for the integration dimension.
random_coef_on (list of str, optional): Names of dimensions that receive consumer-level random coefficients. Use the literal string "price" for the price coefficient and any characteristic name for that characteristic. Defaults to ["price"].
product_fixed_effects (bool): If True (default), the structural error decomposes as ξ_jt = ξ_j + ξ̃_jt, with ξ_j a product fixed effect. If False, per-product alternative-specific intercepts are used instead and ξ_jt = ξ̃_jt. The model includes one of the two, never both. False is not supported in this v1 release, so pass True.
likelihood ({"normal_logshare"}): Aggregate-share likelihood. Only the Berry (1994) heteroskedastic Normal likelihood on the log share ratio is currently implemented.
min_share (float): Floor applied to observed shares to avoid log(0). A warning is emitted when the floor is hit.
track_delta (bool): If True, store the mean-utility tensor δ_jt as a pm.Deterministic (memory-heavy on large panels). Default False.
hierarchical_parameterisation ({"centered", "noncentered"}): Parameterisation of the region-level hierarchy on α_r and β_r. Default "centered". Use "noncentered" only when per-region data is sparse and the prior dominates the likelihood (e.g. many regions with very few markets each). For typical scanner panels, with a handful of regions and informative data in each, the centered form has cleaner posterior geometry and avoids the Neal's funnel pathology that would otherwise bias τ_α low and over-shrink the per-region coefficients.
model_config, sampler_config (dict, optional): Standard ModelBuilder overrides. The default sampler configuration uses numpyro with target_accept=0.95 because the ξ̃_jt block is funnel-prone.
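A minimal sketch of the market_data layout, assuming the default column names (product, market, share, n, price) and outside_good="outside". The columns size and z_cost, the helpers check_panel, log_share_ratio and default_n_mc_draws, and all values are hypothetical. The log-share-ratio helper mirrors the Berry (1994) quantity that the likelihood observes, with shares floored at min_share; it is an illustration, not the library's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel using the default column names.
# "size" (characteristic) and "z_cost" (instrument) are made-up columns.
rows = []
for market, shares, prices in [
    ("m1", [0.30, 0.20], [1.0, 1.5]),
    ("m2", [0.25, 0.35], [1.1, 1.4]),
]:
    for j, (s, p) in enumerate(zip(shares, prices)):
        rows.append({"market": market, "product": f"sku{j}", "share": s,
                     "n": 1000, "price": p, "size": float(j + 1), "z_cost": 0.5 * p})
    # Outside good: remaining share; price, characteristics and instruments are 0.
    rows.append({"market": market, "product": "outside", "share": 1.0 - sum(shares),
                 "n": 1000, "price": 0.0, "size": 0.0, "z_cost": 0.0})
market_data = pd.DataFrame(rows)


def check_panel(df, outside_good="outside"):
    """Each market needs one row per inside product plus exactly one outside row."""
    all_inside = sorted(set(df.loc[df["product"] != outside_good, "product"]))
    for market, g in df.groupby("market"):
        n_outside = int((g["product"] == outside_good).sum())
        inside = sorted(g.loc[g["product"] != outside_good, "product"])
        if n_outside != 1 or inside != all_inside:
            raise ValueError(f"market {market!r} does not match the required layout")


def log_share_ratio(df, min_share=1e-4, outside_good="outside"):
    """log s_jm - log s_0m for inside rows, with shares floored at min_share."""
    log_s = np.log(df["share"].clip(lower=min_share))
    is_out = df["product"] == outside_good
    # One outside-good log share per market, looked up by market label.
    log_s0 = log_s[is_out].groupby(df.loc[is_out, "market"]).first()
    inside = df.loc[~is_out].copy()
    inside["log_share_ratio"] = log_s[~is_out] - inside["market"].map(log_s0)
    return inside


def default_n_mc_draws(n_random_coefs):
    # Documented default: max(200, 100 * n_random_coefs).
    return max(200, 100 * n_random_coefs)


check_panel(market_data)
lsr = log_share_ratio(market_data)
```

Assuming the same columns, the constructor call would be `BayesianBLP(market_data, characteristics=["size"], instruments=["z_cost"])`. A two-market toy panel like this one is far too small to identify the model; it only illustrates the layout.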

Notes

Identification. Endogeneity correction uses the conditional decomposition of the joint Normal on (η_jt, ξ̃_jt). The price equation p_jt = π_0j + π_z · z_jt + η_jt is fit as a marginal likelihood, with η_jt the price residual. The conditional ξ̃_jt | η_jt is parameterised in the slope-residual coordinates γ = ρ · σ_ξ and ω = σ_ξ · sqrt(1 − ρ²), so that ξ̃_jt = (γ/σ_η) · η_jt + ω · ε_jt with ε_jt ~ N(0, 1). The marginal scale σ_ξ and the correlation ρ_price_xi are exposed as Deterministics for downstream summaries. This is mathematically equivalent to a joint MvNormal in (ρ, σ_ξ) coordinates. However, the conditional mean depends on ρ and σ_ξ only through their product γ, which creates a multiplicative ridge in (ρ, σ_ξ) space; the slope-residual basis removes that ridge, which had pinned diagonal-mass NUTS at the maximum tree depth.
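The reparameterisation can be checked numerically. The sketch below, with illustrative parameter values and hypothetical helper names, maps (ρ, σ_ξ) to (γ, ω) and back using σ_ξ = sqrt(γ² + ω²) and ρ = γ / σ_ξ, then simulates ξ̃ = (γ/σ_η) · η + ω · ε to confirm that the implied marginal scale and correlation with η are σ_ξ and ρ.

```python
import numpy as np


def to_slope_residual(rho, sigma_xi):
    """(rho, sigma_xi) -> (gamma, omega) = (rho * sigma_xi, sigma_xi * sqrt(1 - rho^2))."""
    return rho * sigma_xi, sigma_xi * np.sqrt(1.0 - rho**2)


def from_slope_residual(gamma, omega):
    """(gamma, omega) -> (rho, sigma_xi), using sigma_xi^2 = gamma^2 + omega^2."""
    sigma_xi = np.hypot(gamma, omega)
    return gamma / sigma_xi, sigma_xi


rng = np.random.default_rng(0)
rho, sigma_xi, sigma_eta = -0.4, 0.8, 0.5    # illustrative values only
gamma, omega = to_slope_residual(rho, sigma_xi)

n = 200_000
eta = sigma_eta * rng.standard_normal(n)     # first-stage price residual
eps = rng.standard_normal(n)                 # standard-normal innovation
xi_tilde = (gamma / sigma_eta) * eta + omega * eps

print(np.corrcoef(eta, xi_tilde)[0, 1], xi_tilde.std())  # close to rho and sigma_xi
```

The ridge is visible directly: (ρ, σ_ξ) = (−0.4, 0.8) and (−0.64, 0.5) give the same γ = −0.32 and therefore the same conditional slope γ/σ_η; only ω tells them apart.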

Sampler geometry. The ξ̃_jt and random-coefficient raw blocks are non-centered. The region-level hierarchy on α_r / β_r is centered by default. This is counterintuitive but standard advice (Betancourt & Girolami 2015): the centered form is preferable when per-group data is informative, while the non-centered form helps in sparse-data regimes. The default sampler runs numpyro NUTS with target_accept=0.95. When residual correlations between variance components push the tree depth toward the cap, prefer nutpie (low-rank modified mass matrix by default), or, for numpyro, pass nuts_sampler_kwargs={"nuts_kwargs": {"dense_mass": True}} to fit. Set track_delta=True only if you actually need the per-cell mean utility in the trace; on a typical 100-week × 10-SKU panel it costs about 7 MB per chain.
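A back-of-envelope check of the track_delta figure, assuming δ is stored as float64 for every (market, inside product) cell and each chain keeps 1,000 draws (PyMC's default draw count; the model's own sampler_config may differ). The helper name is hypothetical.

```python
def delta_trace_bytes(n_markets, n_inside_products, n_draws=1000, bytes_per_value=8):
    """Approximate per-chain storage for a tracked delta of shape (draw, market, product)."""
    # One float64 value per (market, inside product) cell per retained draw.
    return n_markets * n_inside_products * n_draws * bytes_per_value


per_chain = delta_trace_bytes(n_markets=100, n_inside_products=10)
print(per_chain, per_chain / 2**20)  # 8000000 bytes, about 7.63 MiB
```

Storage grows linearly in markets, products and draws, so a 500-week × 50-SKU panel under the same assumptions would need 25 times as much per chain.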

Notation glossary. Variable names in the trace and posterior summaries map to the model symbols as follows. See the synthetic notebook for the full index conventions and a more detailed table.

Code name	Math	Role
`alpha` / `alpha_r`	α, α_r	Price coefficient (population, per-region)
`beta` / `beta_r`	β, β_r	Characteristic utility weights
`alpha_pop`, `tau_alpha`, `beta_pop`, `tau_beta`	α_pop, τ_α, β_pop, τ_β	Cross-region hyperparameters (only when `region_col` is set)
`sigma_random`	σ_d	Consumer heterogeneity scale per random-coefficient dimension
`model._halton[:, d]`	ν_id	Consumer i’s standardised N(0,1) taste shock on dimension d, drawn from the Halton grid (fixed data, not sampled)
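internal `mu_dev`	μ_ijm	Consumer-level utility deviation Σ_d σ_d · ν_id · c_jmd, where c_jmd is product j's value on random-coefficient dimension d (price or a characteristic) in market m; not exposed as a posterior variable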
`xi` / `xi_j` / `xi_tilde`	ξ_jm, ξ_j, ξ̃_jm	Product-market quality shock decomposed as product fixed effect + centered residual
`sigma_xi`, `sigma_xi_j`	σ_ξ, σ_{ξ_j}	Marginal scales of ξ̃_jm and ξ_j
`eta` / `sigma_eta`	η_jm, σ_η	First-stage price residual and its scale
`pi_0` / `pi_z`	π_0, π_z	First-stage intercepts / instrument coefficients
`rho_price_xi`	ρ	Endogeneity correlation between ξ̃ and η
`gamma_xi_eta`, `omega_xi`	γ, ω	Slope-residual coordinates the sampler uses; (ρ, σ_ξ) are derived Deterministics
`delta`	δ_jm	Mean utility (only if `track_delta=True`)
`s_inside` / `s_outside`	ŝ_jm, ŝ_0m	Halton-averaged predicted shares
`log_share_ratio`	log s_jm − log s_0m	Likelihood’s observed quantity
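The glossary entries for ν_id, μ_ijm, ŝ_jm and ŝ_0m fit together as a simulated logit: each draw i gets inside utilities δ_jm + μ_ijm, the outside good's utility is assumed normalised to 0, and the predicted shares are averages of the per-draw logit shares. The sketch below shows that computation for one market under stated assumptions: it uses a plain, unscrambled Halton sequence (the model uses Owen-scrambled Halton draws), and all function names and values are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i >= 1 in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def halton_normal(n_draws, n_dims):
    """Unscrambled Halton points mapped to N(0, 1) taste shocks nu[i, d]."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    if n_dims > len(primes):
        raise ValueError("sketch supports at most 10 dimensions")
    inv_cdf = NormalDist().inv_cdf
    # Start the index at 1 so no point lands on u = 0 (inv_cdf(0) is undefined).
    return np.array([[inv_cdf(radical_inverse(i, b)) for b in primes[:n_dims]]
                     for i in range(1, n_draws + 1)])


def predicted_shares(delta, x_rc, sigma, nu):
    """Draw-averaged inside and outside shares for one market.

    delta: (J,) mean utilities; x_rc: (J, D) values on the random-coefficient
    dimensions; sigma: (D,) scales sigma_d; nu: (I, D) standard-normal shocks.
    """
    mu = (nu * sigma) @ x_rc.T                          # (I, J): sum_d sigma_d * nu_id * x_jd
    u = delta[None, :] + mu                             # inside utilities; outside is 0
    m = np.maximum(u.max(axis=1, keepdims=True), 0.0)   # stabilise exp()
    e_in, e_out = np.exp(u - m), np.exp(-m)
    denom = e_out + e_in.sum(axis=1, keepdims=True)
    return (e_in / denom).mean(axis=0), float((e_out / denom).mean())


nu = halton_normal(500, 2)
delta = np.array([0.5, -0.2, 0.1])
x_rc = np.array([[1.0, 0.3], [1.5, 0.1], [2.0, 0.7]])  # columns: price, one characteristic
sigma = np.array([0.4, 0.8])
s_inside, s_outside = predicted_shares(delta, x_rc, sigma, nu)
```

With sigma set to zeros, μ vanishes and the function reduces to the plain multinomial logit exp(δ_j) / (1 + Σ_k exp(δ_k)).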

Methods

`BayesianBLP.__init__`(market_data, *, ...[, ...])	Initialize model configuration and sampler configuration for the model.
`BayesianBLP.attrs_to_init_kwargs`(attrs)	Convert the model configuration and sampler configuration from the attributes to keyword arguments.
`BayesianBLP.batch_shares`(alpha_M, beta_M, ...)	Numpy-evaluate the share equation for a batch of posterior samples.
`BayesianBLP.build_from_idata`(idata)	Not implemented for v1.
`BayesianBLP.build_model`(**kwargs)	Construct the PyMC model and attach it to `self.model`.
`BayesianBLP.counterfactual_shares`([...])	Posterior shares under a counterfactual price intervention.
`BayesianBLP.create_idata_attrs`()	Serialise scalar constructor arguments onto `InferenceData.attrs`.
`BayesianBLP.elasticities`(*[, at, periods, ...])	Posterior price elasticities `ε[market, share, price]`.
`BayesianBLP.fit`([progressbar, random_seed])	Fit by sampling the joint posterior with NUTS.
`BayesianBLP.graphviz`(**kwargs)	Get the graphviz representation of the model.
`BayesianBLP.idata_to_init_kwargs`(idata)	Create the model configuration and sampler configuration from the InferenceData to keyword arguments.
`BayesianBLP.iterate_posterior_samples`(n_samples)	Stack `chain × draw` and (optionally) subsample posterior arrays.
`BayesianBLP.load`(fname[, check])	Not implemented for v1.
`BayesianBLP.load_from_idata`(idata[, check])	Create a ModelBuilder instance from an InferenceData object.
`BayesianBLP.sample_prior_predictive`([...])	Draw from the prior predictive distribution.
`BayesianBLP.save`(fname, **kwargs)	Persist the fitted InferenceData (model graph is not saved).
`BayesianBLP.set_idata_attrs`([idata])	Set attributes on an InferenceData object.
`BayesianBLP.table`(**model_table_kwargs)	Get the summary table of the model.
`BayesianBLP.xi_as_grid`()	Reshape the posterior `xi` to `(region, period, inside_product)`.
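For orientation on what elasticities() reports (ε[market, share, price]), here is the textbook random-coefficients logit formula for one market with a random coefficient on price only: ε_jk = (p_k / s_j) · mean_i[α_i · s_ij · (1{j = k} − s_ik)], with α_i = α + σ_α · ν_i. This is an assumption-laden sketch, not the library's implementation: it uses plain pseudo-random draws instead of Halton draws, and all function names and values are hypothetical. The accompanying check compares it against central finite differences of the same simulated shares.

```python
import numpy as np


def rc_logit_shares(prices, delta0, alpha, sigma_alpha, nu):
    """Simulated inside shares when only price carries a random coefficient.

    prices, delta0: (J,) prices and non-price mean utility for one market.
    nu: (I,) standard-normal taste draws. Outside-good utility is 0.
    """
    alpha_i = alpha + sigma_alpha * nu                          # (I,) consumer price coefficients
    u = delta0[None, :] + alpha_i[:, None] * prices[None, :]    # (I, J) inside utilities
    e = np.exp(u)
    s_ij = e / (1.0 + e.sum(axis=1, keepdims=True))             # (I, J) consumer-level shares
    return s_ij.mean(axis=0), s_ij, alpha_i


def price_elasticities(prices, delta0, alpha, sigma_alpha, nu):
    """eps[j, k] = (ds_j / dp_k) * p_k / s_j, with shares averaged over the draws."""
    s, s_ij, alpha_i = rc_logit_shares(prices, delta0, alpha, sigma_alpha, nu)
    J = prices.shape[0]
    # ds_ij/dp_k = alpha_i * s_ij * (1{j == k} - s_ik), then average over draws i.
    inner = np.eye(J)[None, :, :] - s_ij[:, None, :]            # (I, J, J)
    jac = np.einsum("i,ij,ijk->jk", alpha_i, s_ij, inner) / nu.shape[0]
    return jac * prices[None, :] / s[:, None]


rng = np.random.default_rng(1)
nu = rng.standard_normal(2000)
prices = np.array([1.0, 1.5, 2.0])
delta0 = np.array([1.0, 1.2, 1.5])
eps = price_elasticities(prices, delta0, alpha=-1.5, sigma_alpha=0.5, nu=nu)
```

With a negative mean price coefficient, the diagonal (own-price) entries come out negative and the off-diagonal (cross-price) entries positive.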

Attributes

`default_model_config`	Default priors for every univariate / vector parameter in the model.
`default_sampler_config`	Default sampler kwargs: `numpyro` NUTS at `target_accept=0.95`.
`fit_result`	Get the posterior fit_result.
`id`	Generate a unique hash value for the model.
`output_var`	Name of the observed variable (the log-share-ratio likelihood).
`posterior`	Access the 'posterior' attribute of the InferenceData object.
`posterior_predictive`	Access the 'posterior_predictive' attribute of the InferenceData object.
`predictions`	Access the 'predictions' attribute of the InferenceData object.
`prior`	Access the 'prior' attribute of the InferenceData object.
`prior_predictive`	Access the 'prior_predictive' attribute of the InferenceData object.
`version`
`idata`
`sampler_config`
`model_config`