Data Importers¶
Populating the applications in the suite with data requires a number of import tasks, outlined below.
X-Drive Importer¶
Before Scout, City of Pittsburgh commodity contracts lived in a variety of spreadsheets on a shared drive in the City known as the X-Drive. A consolidated final form of this spreadsheet lives in the app’s source under purchasing/data/importer/files/2015-08-13-contractlist.csv
. This was the switchover date for the City to Scout, meaning that after this date all contract updates were made through Scout.
The X-Drive importer (also known as the old contract importer) is tasked with performing the following actions:
- Creating new
Company
objects - Creating new
CompanyContact
objects - Creating new
ContractBase
objects - Linking all of these entities together as is appropriate
- Handling spec numbers where applicable and linking them as properties to new Contract objects
The way the importer handles this is roughly as follows:
For each row in a given csv file:
- Look up or create (via
get_or_create()
) companies based on their names. - Look up or create (via the same function as above) company contacts based on their names/addresses/phone/email/etc., linking them with the found or created company from the above step
- Convert expiration dates, financial ids, and contract types to meaningful information for the data model, including looking up or creating new
ContractType
objects as necessary - Use the converted data to look up or create new
ContractBase
, with the linkedCompany
from the first step
COSTARS Importer¶
Contracts under the Cooperative Sourcing to Achieve Reductions in Spend (COSTARS) agreement servicing Allegheny County (and thus available to City of Pittsburgh purchasers) can be found as numerous downloadable spreadsheets via the COSTARS website, but not linked to directly.
The COSTARS Importer is tasked with the following actions:
- Creating new
Company
objects - Creating new
CompanyContact
objects - Creating new
ContractBase
objects - Creating new
LineItem
objects - Linking all of these entities together as is appropriate
- Handling manufacturer lists and linking them as properties to new Contract objects
The way the importer handles this is roughly as follows:
For each row in COSTARS spreadsheet file:
- Look up or create (via
get_or_create()
) companies based on their names - Look up or create (via the same function as above) company contacts based on their names/addresses/phone/email/etc., linking them with the found or created company from the above step
- Convert expiration dates, financial ids, contract types to meaningful information for the data model, including looking up or creating new
ContractType
objects as necessary - Use the converted data to look up or create new
ContractBase
, with the linkedCompany
from the first step - Check to see if contract has matching PDF file and build URL for View Contract link
NIGP Importer¶
Categories for commodities and services used in Beacon are based on the National Institute for Governmental Purchasing (NIGP) codes. In order to facilitate a more straightforward signup process, codes and descriptions were combined, grouped, and consolidated in purchasing/data/importer/files/2015-07-01-nigp-cleaned.csv
.
The NIGP Importer is tasked with the following actions:
- Creating new
Category
objects
The way the importer handles this is roughly as follows:
For each row in NIGP category file:
- Look up or create (via
get_or_create()
) subcategories based on their names - Split up cases of multiple NIGP code or examples into separate rows
State Contract Importer¶
State contracts adopted by the City of Pittsburgh were collected via the State of Pennsylvania eMarketplace search tool and live in the app’s source under purchasing/data/importer/files/2015-10-27-state-contracts.csv
The State Contract Importer is tasked with the following actions:
- Creating new
Company
objects - Creating new
CompanyContact
objects - Creating new
ContractBase
objects - Linking all of these entities together as is appropriate
- Handling contract numbers as well as parent contract numbers and linking them as properties to new Contract objects
The way the importer handles this is roughly as follows:
For each row in a given csv file:
- Look up or create (via
get_or_create()
) companies based on their names. - Look up or create (via the same function as above) company contacts based on their names/addresses/phone/email/etc., linking them with the found or created company from the above step
- Convert expiration dates, financial ids, and contract types to meaningful information for the data model, including looking up or creating new
ContractType
objects as necessary - Use the converted data to look up or create new
ContractBase
, with the linkedCompany
from the first step
County Scraper¶
- See Also:
CountyScrapeJob
,scrape_county_task()
The County Scraper attempts to scrape line item information to build LineItem
objects to link with ContractBase
objects. It does this by generating links to all un-scraped contracts and trying to hit those links and parse out the information contained there. The HTML generated on the site is not particularly good (for example, “checked” radio boxes are, in fact, not radio boxes but images of radio boxes and tables abound), so the process is a bit brittle.
The scraping is divided into two distinct steps:
From the main page of all contracts, build links to all of the individual contract pages. Because these follow a regular server-generated pattern, it is much faster to build them internally instead of trying to scrape them out of the HTML.
For each of the generated links, go through and try to parse out individual line items:
Get the contract object to append line items to based on the contract description and the IFB (spec) number (see
get_spec_number()
for more information on spec numbers).Using Beautiful Soup, we go through an individual page of awards and pull out the line items. Note that we have to deal with some pretty non-compliant HTML (including unclosed table tags and an early-terminating form tag), which makes this a bit trickier. The basic method is:
- Get all of the tables on the page
- Exclude “metadata” tables (the first five and last table)
- From this point, the tables alternate: “item” tables are tables that contain line items. “Award” tables are tables that contain information about the awarded company of the previous item. This mean that we need to parse two tables to get the full information about a single line item.
Once we have both the line item information and the award information for each line item contained in an individual page, we are able to create the
LineItem
object and try to find the relevant company for that line item as well.
Stages and Flows¶
Contracts in Conductor are moved through stages according to the flows they are a part of. These are created via the admin interface, but are seeded for development with this importer.
The Stages and Flows seed task is tasked with the following actions:
- Creating three new
Stage
objects - Creating one new
Flow
object - Linking these entities together as is appropriate
The way the seed task handles this is roughly as follows:
- Create three new stages
- Create one new flow and seed with the new stages arranged in order
Importer Utilities¶
All of the importers described above share some common utilities, which are discussed here:
-
convert_empty_to_none
(val)[source]¶ Converts empty or “None” strings to None Types
Parameters: val – The field to be converted Returns: The passed value if the value is not an empty string or ‘None’, None
otherwise.
-
extract
(file_target, first_row_headers=[])[source]¶ Pulls csv data out of a file target.
Parameters: file_target – a file object Keyword Arguments: first_row_headers – An optional list of headers that can be used as the keys in the returned DictReader Returns: A DictReader
object.
-
determine_company_contact
(row)[source]¶ Convert input data to
Parameters: row – An input row of data from an input spreadsheet Returns: A dict object which can be used to create a new CompanyContact
object