Run the Matching Process
========================

.. toctree::
   :maxdepth: 2
   :hidden:

Setup
-----

In order to run the matching process a python 3  environment is required with the following packages (as well as their dependancies) installed:

* Pandas
* GDAL
* Shapely
* Fiona
* Geopandas
* Swifter
* python-dotenv
* pathlib

It is reccomended to start with a fresh venv and install the modules using the conda-forge channel on anaconda for the smoothest installation 
experience. The requirement.txt file found in the github repository will allow for a streamlined method to setup the environment. Running an install 
of the requirements.txt would be as follows:

In order to run the process effectively an environments file needs to be created. This file will contain the key information that the process
needs to run. The environments file should be made up of the following variables (this list may vary based on data availability.):

.. code-block:: markdown
   
   # Varibles for cleaned layer names 
   CLEANED_BF_LYR_NAME = footprints_cleaned  
   CLEANED_AP_LYR_NAME = addresses_cleaned
   CLEANED_RD_LYR_NAME = roads_cleaned
   CLEANED_SP_LYR_NAME = parcels_cleaned
   UNLINKED_BF_LYR_NME = unmatched_bfs
   FLAGGED_AP_LYR_NME = ap_full
   
   # The layer name in for the civic address field in the civic address data
   AP_CIVIC_ADDRESS_FIELD_NAME = NUMBER

   # The root directory to which all data will be saved
   BASE_PATH = Z:\working\NWT_data

   # The path to the geopackage where the initial cleaned data will be saved
   DATA_GPKG = ${BASE_PATH}\working\data.gpkg

   # The path to the geopackage where the final output will be saved
   OUTPUT_GPKG = C${BASE_PATH}\working\output.gpkg

   # The CRS for the projection to be used for all layers must be projected
   PROJ_CRS = 26911
   
   # The initial path and layer name for the linking data (parcel data) 
   LINKING_PATH = ${BASE_PATH}\merged_parcels.gpkg
   LINKING_LYR_NME = merged_parcels

   # The initial path where the building polygon data is located
   BF_PATH = ${BASE_PATH}\ATLAS_extract.gdb
   BF_LYR_NME = Building_Footprints

   # The initial path to where the address point data is located
   ADDRESS_PATH = ${BASE_PATH}\yk_ap.gdb
   ADDRESS_LAYER = yk_Address_Points

   # If subsetting the data by a specific geographic region point these variables to the boundary file here
   AOI_MASK = ${BASE_PATH}\yk_Municipal_Boundary_gdb.gdb
   AOI_LYR_NME = yk_municipal_boundary


   MATCHED_OUTPUT_GPKG =  ${BASE_PATH}\working\matched_output.gpkg
   MATCHED_OUTPUT_LYR_NME = point_linkages
   UNMATCHED_OUTPUT_LYR_NME = unlinked_points
   UNMATCHED_POLY_LYR_NME = unlinked_polygons

   MATCH_ACC_GPKG = ${BASE_PATH}\working

   # variables for setting the thresholds used by the BP process at the mathcing stage
   BP_THRESHOLD = 10
   BP_AREA_THRESHOLD = 175

   QA_GPKG = ${BASE_PATH}\qa_qc_files.gpkg
   ST_MUN_CIVICS = clean_mun_civs

   FLAGGED_ADP_LYR_NME = flagged_adp

Initiate Process
----------------

Once setup above is complete the address matching process can be intiated. To do this we
will run the files from inside of the IDE of your choice.

1. Navigate to the scripts folder

2. Create the environments file at that level. Ensure that all paths are accurate

3. Before running each script change the following line of code so that the file name matches the name of your environments file:

   .. code-block:: python
      
      load_dotenv(os.path.join(os.path.dirname(__file__), 'NB_environments.env'))

4. Run the scripts in the following order from an IDE
   
   a. clean_data.py
   b. issue_flagging.py
   c. matching_master.py
   d. qa_qc.py
   e. match_confidence_calc.py

5. Examine results and make changes to the process if needed