Marc Hadley edited this page Jun 23, 2022 · 5 revisions

The BFD RIF exporter produces files that conform to the following specifications:

Configuration

The exporter is configured with the following properties, shown here with their default values:

  • exporter.bfd.bene_id_start = -1000000 defines the start value of BENE_ID. The first exported patient gets this value, and subsequent IDs decrease monotonically from it.
  • exporter.bfd.clm_id_start = -100000000 defines the start value of CLM_ID. The first exported claim gets this value, and subsequent IDs decrease monotonically from it.
  • exporter.bfd.clm_grp_id_start = -100000000 defines the start value of CLM_GRP_ID. The first exported claim group gets this value, and subsequent IDs decrease monotonically from it.
  • exporter.bfd.pde_id_start = -100000000 defines the start value of PDE_ID. The first exported PDE (prescription drug event) claim gets this value, and subsequent IDs decrease monotonically from it.
  • exporter.bfd.mbi_start = 1S00-E00-AA00 defines the start value of MBI_NUM. The first exported patient uses this value, and subsequent values increase monotonically from it.
  • exporter.bfd.hicn_start = T01000000A defines the start value of BENE_CRNT_HIC_NUM. The first exported record uses this value, and subsequent values increase monotonically from it.
  • exporter.bfd.partc_contract_start = Y0001 defines the first Part C contract ID used in PTC_CNTRCT_JAN_ID through PTC_CNTRCT_DEC_ID. Subsequent contract IDs increase monotonically from it.
  • exporter.bfd.partc_contract_count = 10 defines the number of Part C contracts that Synthea uses in exports. Each year, each patient is randomly assigned to one of these contracts or to no contract.
  • exporter.bfd.partd_contract_start = Z0001 defines the first Part D contract ID used in PLAN_CNTRCT_REC_ID. Subsequent contract IDs increase monotonically from it.
  • exporter.bfd.partd_contract_count = 10 defines the number of Part D contracts that Synthea uses in exports. Each year, each patient is randomly assigned to one of these contracts or to no contract.
  • exporter.bfd.plan_benefit_package_start = 800 defines the start value of the plan benefit package identifiers.
  • exporter.bfd.plan_benefit_package_count = 5 defines the number of plan benefit package identifiers. Part C and Part D plans all share the same set of plan benefit package identifiers.
  • exporter.bfd.clia_labs_start = 00A0000000 defines the first CLIA lab number used to populate CARR_LINE_CLIA_LAB_NUM.
  • exporter.bfd.clia_labs_count = 10 defines the number of CLIA lab numbers used.
  • exporter.bfd.cutoff_date = 20140529 defines the earliest date of any exported claim.
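The numbering rules above can be illustrated with a short sketch. It assumes that each counter changes by exactly one per beneficiary and that the eight digits in the middle of the HICN are the part that increments. Both assumptions are consistent with the example end-state file below, where BENE_ID moves from -1000000 to -1000020 and the HICN from T01000000A to T01000020A. The helper names are hypothetical and are not Synthea functions.

```shell
#!/bin/bash
# Sketch only: illustrates the documented numbering rules, assuming a step of
# one per beneficiary. bene_ids, next_bene_id_start and next_hicn are
# hypothetical helpers, not part of Synthea.

# bene_ids START COUNT: print the BENE_ID given to each of COUNT patients,
# starting at START and decreasing by one each time
bene_ids() {
  local start=$1 count=$2 n
  for (( n = 0; n < count; n++ )); do
    echo $(( start - n ))
  done
}

# next_bene_id_start START COUNT: counter value after COUNT patients,
# i.e. the value a later run would resume from
next_bene_id_start() {
  echo $(( $1 - $2 ))
}

# next_hicn HICN COUNT: HICN after COUNT increments, assuming the format is
# one letter, eight digits, then a suffix (e.g. T01000000A)
next_hicn() {
  local hicn=$1 count=$2
  local prefix=${hicn:0:1} digits=${hicn:1:8} suffix=${hicn:9}
  # 10# forces base 10 so the leading zero is not read as octal
  printf '%s%08d%s\n' "$prefix" $(( 10#$digits + count )) "$suffix"
}

bene_ids -1000000 3              # -1000000, -1000001, -1000002 (one per line)
next_bene_id_start -1000000 20   # -1000020
next_hicn T01000000A 20          # T01000020A
```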

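At the end of a Synthea run, the exporter writes an end_state.properties file (to output/bfd in the script below). For each setting above that is monotonically increased or decreased per beneficiary or claim, the file records the value at which the run stopped, that is, the next value to use. It can also contain counters not listed above, such as exporter.bfd.fi_doc_cntl_num_start and exporter.bfd.carr_clm_cntl_num_start. When this file is supplied to a later run as a configuration file (the script below does this with the -c option), its values override the configured ones so that the new run starts where the prior run ended. An example file is shown below.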

exporter.bfd.hicn_start=T01000020A
exporter.bfd.mbi_start=1S00E00AA20
exporter.bfd.clm_grp_id_start=-100003266
exporter.bfd.pde_id_start=-100000996
exporter.bfd.fi_doc_cntl_num_start=-100000575
exporter.bfd.bene_id_start=-1000020
exporter.bfd.carr_clm_cntl_num_start=-100001695
exporter.bfd.clm_id_start=-100002270
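Because the end-state file records where each counter stopped, comparing it with the defaults listed above shows how many values earlier runs consumed. The following is a minimal sketch. It assumes the counters began at their default values and that the file contains plain key=value lines, as in the example. prop_value and ids_consumed are hypothetical helpers, and they only handle the decrementing numeric counters.

```shell
#!/bin/bash
# Sketch: read counters from an end_state.properties-style file.
# prop_value and ids_consumed are hypothetical helpers, not Synthea code.

# prop_value FILE KEY: print the value of the first line that starts with "KEY="
prop_value() {
  awk -v key="$2" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }' "$1"
}

# ids_consumed FILE KEY DEFAULT_START: number of values a decrementing counter
# has used, assuming it started at DEFAULT_START
ids_consumed() {
  local end
  end=$(prop_value "$1" "$2")
  if [[ -z $end ]]; then
    echo "no value for $2 in $1" >&2
    return 1
  fi
  echo $(( $3 - end ))
}
```

With the example file above, ids_consumed output/bfd/end_state.properties exporter.bfd.bene_id_start -1000000 prints 20.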

Random and Fixed Values

Synthea does not model values for every RIF field. Each such field is instead assigned either a fixed value or a value chosen at random from a set of allowed values. These values are configured in the tab-separated file bfd_field_values.tsv, where each cell specifies the allowed values of one field (row) in one RIF file (column). A set of allowed values is written as a comma-separated list, and a field that is always empty is written as [Blank].
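The cell conventions just described can be sketched as a small shell function that turns one cell into a field value. This illustrates the documented semantics only; it is not Synthea's implementation. pick_value is a hypothetical name, and trimming spaces after commas is an assumption about how lists might be written.

```shell
#!/bin/bash
# Sketch of the documented bfd_field_values.tsv cell semantics:
#   [Blank]  -> the field is always empty
#   a,b,c    -> one value chosen at random from the list
#   other    -> a fixed value (a list with a single entry)
# pick_value is a hypothetical helper, not Synthea's implementation.
pick_value() {
  local cell=$1
  # Quoting "[Blank]" makes this a literal match rather than a bracket pattern
  if [[ -z $cell || $cell == "[Blank]" ]]; then
    echo ""
    return
  fi
  local -a options
  IFS=',' read -r -a options <<< "$cell"
  local choice=${options[RANDOM % ${#options[@]}]}
  # Trim surrounding whitespace in case a list is written as "a, b"
  choice=${choice#"${choice%%[![:space:]]*}"}
  choice=${choice%"${choice##*[![:space:]]}"}
  echo "$choice"
}
```

For example, pick_value '1,2,3' prints one of 1, 2 or 3, pick_value 'A' always prints A, and pick_value '[Blank]' prints an empty line.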

Generating a National Set of Records

The following shell script generates records for beneficiaries in all 50 states and Washington, DC. It takes the desired total population size as its only command-line argument and allocates beneficiaries to each location in proportion to that location's share of U.S. residents aged 62 or older (based on 2019 census data). California is generated last and receives whatever remains after the other locations' counts are rounded down.

#!/bin/bash

if [[ $# -ne 1 || ! $1 =~ ^[1-9][0-9]*$ ]]; then
  echo "Usage: $0 size"
  echo "where 'size' is a positive integer specifying the target population size"
  exit 1
fi

# Weights are based on 2019 census data:
#
# https://data.census.gov/cedsci/table?q=Total%20Population&g=0400000US01,02,04,05,06,08,09,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,44,45,46,47,48,49,50,51,53,54,55,56&tid=ACSDP1Y2019.DP05&hidePreview=true&moe=false
#
# Each value is the number of the state's residents aged 62 or older, expressed as a
# percentage of all U.S. residents aged 62 or older.
#
states=( ); weights=( )
states+=( "Alabama" ); weights+=( "1.578" )
states+=( "Alaska" ); weights+=( "0.178" )
states+=( "Arizona" ); weights+=( "2.357" )
states+=( "Arkansas" ); weights+=( "0.958" )
# states+=( "California" ); weights+=( "10.801" ) # California is handled separately at the end and is used to absorb any rounding errors
states+=( "Colorado" ); weights+=( "1.586" )
states+=( "Connecticut" ); weights+=( "1.170" )
states+=( "Delaware" ); weights+=( "0.351" )
states+=( "District of Columbia" ); weights+=( "0.161" )
states+=( "Florida" ); weights+=( "8.044" )
states+=( "Georgia" ); weights+=( "2.836" )
states+=( "Hawaii" ); weights+=( "0.492" )
states+=( "Idaho" ); weights+=( "0.536" )
states+=( "Illinois" ); weights+=( "3.796" )
states+=( "Indiana" ); weights+=( "2.016" )
states+=( "Iowa" ); weights+=( "1.016" )
states+=( "Kansas" ); weights+=( "0.891" )
states+=( "Kentucky" ); weights+=( "1.401" )
states+=( "Louisiana" ); weights+=( "1.399" )
states+=( "Maine" ); weights+=( "0.530" )
states+=( "Maryland" ); weights+=( "1.801" )
states+=( "Massachusetts" ); weights+=( "2.179" )
states+=( "Michigan" ); weights+=( "3.288" )
states+=( "Minnesota" ); weights+=( "1.712" )
states+=( "Mississippi" ); weights+=( "0.905" )
states+=( "Missouri" ); weights+=( "1.963" )
states+=( "Montana" ); weights+=( "0.382" )
states+=( "Nebraska" ); weights+=( "0.580" )
states+=( "Nevada" ); weights+=( "0.916" )
states+=( "New Hampshire" ); weights+=( "0.472" )
states+=( "New Jersey" ); weights+=( "2.753" )
states+=( "New Mexico" ); weights+=( "0.698" )
states+=( "New York" ); weights+=( "6.092" )
states+=( "North Carolina" ); weights+=( "3.210" )
states+=( "North Dakota" ); weights+=( "0.220" )
states+=( "Ohio" ); weights+=( "3.804" )
states+=( "Oklahoma" ); weights+=( "1.175" )
states+=( "Oregon" ); weights+=( "1.406" )
states+=( "Pennsylvania" ); weights+=( "4.413" )
states+=( "Rhode Island" ); weights+=( "0.351" )
states+=( "South Carolina" ); weights+=( "1.713" )
states+=( "South Dakota" ); weights+=( "0.285" )
states+=( "Tennessee" ); weights+=( "2.098" )
states+=( "Texas" ); weights+=( "7.031" )
states+=( "Utah" ); weights+=( "0.686" )
states+=( "Vermont" ); weights+=( "0.234" )
states+=( "Virginia" ); weights+=( "2.523" )
states+=( "Washington" ); weights+=( "2.247" )
states+=( "West Virginia" ); weights+=( "0.679" )
states+=( "Wisconsin" ); weights+=( "1.903" )
states+=( "Wyoming" ); weights+=( "0.185" )

END_STATE_PROPS_FILE="./output/bfd/end_state.properties"

total_generated=0
for i in "${!states[@]}"
do
  state=${states[$i]}
  weight=${weights[$i]}
  # bc's default scale of 0 truncates the division, so counts are rounded down
  count=$(echo "${1}*${weight}/100" | bc)
  total_generated=$(( total_generated + count ))

  if [[ $count -eq "0" ]]
  then
    echo "Skipping generating ${state}, requested patients is ${count} "
    continue
  fi

  if [[ -f "${END_STATE_PROPS_FILE}" ]]
  then
    load_props="-c ${END_STATE_PROPS_FILE}"
  else
    load_props=
  fi

  echo "Generating ${count} patients for ${state}"
  ./run_synthea -s ${i} -cs ${i} -r 20211020 ${load_props} -p ${count} --exporter.fhir.export=false --exporter.fhir.transaction_bundle=false --exporter.hospital.fhir.export=false --exporter.practitioner.fhir.export=false --exporter.bfd.export=true --exporter.years_of_history=10 --generate.only_alive_patients=true -a 70-80 "${state}"
done

# Generate remaining requested population for California to handle any rounding errors
if [[ -f "${END_STATE_PROPS_FILE}" ]]
then
  load_props="-c ${END_STATE_PROPS_FILE}"
else
  load_props=
fi

remaining=$(( $1 - total_generated ))
echo "Generating ${remaining} patients for California"
total_generated=$(( total_generated + remaining ))
./run_synthea -s 51 -cs 51 -r 20211020 ${load_props} -p ${remaining} --exporter.fhir.export=false --exporter.fhir.transaction_bundle=false --exporter.hospital.fhir.export=false --exporter.practitioner.fhir.export=false --exporter.bfd.export=true --exporter.years_of_history=10 --generate.only_alive_patients=true -a 70-80 California
echo "Finished generating ${total_generated} of ${1} requested patients"
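To preview the allocation before starting a long run, the script's arithmetic can be reproduced without calling run_synthea. In the minimal dry-run sketch below, plan_counts is a hypothetical helper. It takes the target size followed by state/weight pairs and prints the count each state would receive, with California absorbing the remainder. It uses integer bash arithmetic instead of bc. For weights with exactly three decimal places, as in the table above, this gives the same truncated result as the script's bc call.

```shell
#!/bin/bash
# Dry-run sketch: prints per-state counts without invoking run_synthea.
# plan_counts is a hypothetical helper; pass it the target size followed by
# "State" weight pairs taken from the script's arrays.
plan_counts() {
  local size=$1; shift
  local total=0 state weight milli count
  while (( $# >= 2 )); do
    state=$1 weight=$2
    shift 2
    if [[ ! $weight =~ ^[0-9]+\.[0-9]{3}$ ]]; then
      echo "weight for ${state} must have three decimal places: ${weight}" >&2
      return 1
    fi
    milli=$(( 10#${weight/./} ))        # 1.578 -> 1578 (thousandths of a percent)
    count=$(( size * milli / 100000 ))  # truncates, like bc with scale=0
    total=$(( total + count ))
    echo "${state}: ${count}"
  done
  # California receives whatever the listed states did not use
  echo "California: $(( size - total ))"
}

plan_counts 1000 "Alabama" 1.578 "Alaska" 0.178
```

Passing all the state/weight pairs from the script reproduces its per-state numbers. With only two states, as in this example, California receives everything else (984 of 1000).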