Which Cycle Frontier Jobs Are Worth Doing? - Part Four.

Time To Deal With the Missing Values
game
python
data
science
exploration
cycle
frontier
Published

November 4, 2022

Introduction

Welcome back to the fourth post in the series. Last time, we were having issues with the data since there were missing values - including the Job that prompted the initial investigation. We’ll jump right into it picking up with the mismatch between naming in tables.

While working through the list of problems I found reviewing the data, there is clearly an error in the Regex built before. The symptom of this was the Autoloader which has a hanging number when the group is pulled out.

tasksSubset = siteJobs[1][["Name", "Description", "Tasks"]].copy()
tasksSubset = tasksSubset[ ~tasksSubset.Tasks.isna()]
tasksSubset = tasksSubset[ ~tasksSubset.Tasks.str.contains("Kill")]

regex = r"(\d+\s[\w]+\s[\w]+)"
tmp = tasksSubset.Tasks.str.extractall(regex)
tmp[tmp.reset_index()[0].str.contains('Autoloader').values]
0
match
8 0 1 Autoloader 5
32 0 2 Autoloader 8

After some work, this was because the hanging [\w]+ was too greedy and had to be toned down.

newRegex = r"(\d+\s[\w]+\s?[a-zA-Z]+)"
tmp = tasksSubset.Tasks.str.extractall(newRegex)
tmp[tmp.reset_index()[0].str.contains('Autoloader').values]
0
match
8 0 1 Autoloader
32 0 2 Autoloader

This didn’t end up being the only correction to the Regex; in fact, the Co-Tool Multitool was being broken apart since it has a - in the name. So, we had to update this to check and include the - on the split.

newRegex = r"((\d+\s[\w\-]+\s?[a-zA-Z]+))"

Now we’ll have our updated version of the breakLoot function we write before.

def breakLoot(taskString, index=0):
    parts = taskString.split(' ', maxsplit=1)
    if index == 0:
        return int(parts[index])
    elif index == 1:
        return parts[index]
    else:
        # This shouldn't be called.
        return None

tasks = []

for index in range(0,3):
    tasksSubset = siteJobs[index][["Name", "Description", "Tasks"]].copy()
    tasksSubset = tasksSubset[ ~tasksSubset.Tasks.isna()]
    tasksSubset = tasksSubset[ ~tasksSubset.Tasks.str.contains("Kill")]

    newRegex = r"((\d+\s[\w\-]+\s?[a-zA-Z]+))"
    tmp = tasksSubset.Tasks.str.extractall(newRegex)

    count = tmp.reset_index()[0].apply(breakLoot).values
    aLoot = tmp.reset_index()[0].apply(breakLoot, index=1).values

    tmp = tmp.assign(
        count = count,
        loot = aLoot
    )

    nameDescriptSlice = tasksSubset.loc[tmp.reset_index()["level_0"], ['Name', 'Description']]

    tmp = tmp.assign(
        name = nameDescriptSlice.Name.values,
        description = nameDescriptSlice.Description.values
    )

    taskSlice = tmp.reset_index().drop([
        'level_0',
        'match',
        0
    ], axis =1 )

    taskSlice = taskSlice[['name', 'count', 'loot', 'description']]
    tasks.append(taskSlice)
tasks = pd.concat([*tasks])

Fix Item Names

Now we can continue and move to fixing the name mismatches like intended. There was quite a few of these so I’ll only show the process for a few of them - and then the fix for all of them. We’ll pull the code from the previous post so we can start by using the 0 values and work backwards.

allJobs = pd.concat([korolevRewards, icaRewards, osirisRewards])
lootKMarks = loot.query('Name == "K-Marks"')
tasks = tasks.copy()
taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']

# How many are there?
len(results.query("Cost == 0"))
30

There are 31 rows with a Cost of 0 which we’ll need to investigate. The first one which I had noticed and corrected in the previous post was the CPU’s so we’ll address this one first. We’ll need to find the values in the tasks table and the lootKMarks table so we can figure out where the disconnect is.

tasks.loc[tasks.loot.str.contains("Master|Zero")]
name count loot description
3 Mining Bot 2 Zero Systems Our engineers have designed an autonomous mini...
6 Insufficient Processing Power 1 Master Unit Prospector! The Zero Systems CPU you brought u...
12 Automated Security 5 Zero Systems We will have to build new turrets to help prot...
16 Classified I 2 Master Unit Prospector! We need Derelict Explosives, Maste...
47 Upgrades 1 Master Unit We want to take over an ICA Data Center. Stash...
9 Sabotage 4 Zero Systems Prospector! Those Korolev simpletons are drill...
34 Spare Parts 4 Zero Systems One of the Servers at Starport has been failin...
40 NEW-Hard-ICA-Gather-6 1 Zero Systems DESCRIPTION MISSING
11 Data Center Upgrades 2 Master Unit Turns out our data center was not powerful eno...
lootKMarks.loc[lootKMarks.Loot.str.contains("CPU")]
Unit Name Rarity Personal Quarters Campaigns Jobs Printing Loot
131 3845.0 K-Marks Epic Yes x3 Yes x3 Yes No Master Unit CPU
321 506.0 K-Marks Rare Yes x37 Yes x6 Yes No Zero Systems CPU

The tasks table has a shortened version of its name so the easiest way to fix this would be to append CPU to the loot name in the tasks table.

tasks.loc[ tasks.loot == "Master Unit", 'loot'] = 'Master Unit CPU'
tasks.loc[ tasks.loot == "Master Unit CPU", 'loot']
6     Master Unit CPU
16    Master Unit CPU
47    Master Unit CPU
11    Master Unit CPU
Name: loot, dtype: object

Another was the Pure Focus Crystals which has the same problem of having the name truncated; this is a trend among all the affected materials with higher tiered materials. Those will all be included in the final correction and will skip showing the process; it’s literally just the same and tedious.

# Missing the Crystal part
tasks.loc[tasks.loot.str.contains("Pure Focus")]
name count loot description
25 Geologist 1 Pure Focus You got time for a job, Prospector? The sample...
26 Industry Secret 3 Pure Focus Hm. Interesting. The Pure Focus Crystals you b...
28 Laser Rifles 3 Pure Focus We are prototyping a new laser rifle that can ...
16 Arms Race 4 Pure Focus Prospector. It seems like Korolev is working o...
lootKMarks.loc[lootKMarks.Loot.str.contains("Crystal")]
Unit Name Rarity Personal Quarters Campaigns Jobs Printing Loot
56 961.0 K-Marks Rare Yes x19 Yes x10 Yes No Focus Crystal
61 2883.0 K-Marks Epic Yes x6 Yes x9 Yes No Pure Focus Crystal
66 38924.0 K-Marks Legendary Yes x6 Yes x9 Yes No Polirium Crystal
81 1709.0 K-Marks Epic Yes x36 Yes x9 Yes No Teratomorphic Crystal Core

This one was a surprise since truncating it doesn’t make any sense and gains nothing but the Magnetic Field Stabilizer is being truncated as well.

# Missing the Stabilizer:
tasks.loc[tasks.loot.str.contains("Magnetic")]
name count loot description
42 Stability is Key 1 Magnetic Field Our Comms are jammed. We need you to stash a M...
43 Stability is Key 1 Magnetic Field Our Comms are jammed. We need you to stash a M...
44 Stability is Key 1 Magnetic Field Our Comms are jammed. We need you to stash a M...
28 Storm Interference 2 Magnetic Field Prospector, the Storm has been distorting all ...
2 Surveillance Center 2 Magnetic Field We want to expand our surveillance operations ...
lootKMarks.loc[lootKMarks.Loot.str.contains("Magnetic")]
Unit Name Rarity Personal Quarters Campaigns Jobs Printing Loot
171 338.0 K-Marks Uncommon Yes x71 Yes x25 Yes No Magnetic Field Stabilizer

Lastly, we’ll show the NiC Oil which was a problem from the previous post which is missing the Cannister at the end again.

tasks.loc[tasks.loot.str.contains("Oil")]
name count loot description
41 Striking Big 10 NiC Oil Damn it! We ran out of Fuel for our Radiation ...
lootKMarks.loc[lootKMarks.Loot.str.contains("Oil")]
Unit Name Rarity Personal Quarters Campaigns Jobs Printing Loot
121 20183.0 K-Marks Epic Yes x103 Yes x10 Yes No NiC Oil Cannister

There we go! Now, we’ll update the data again and run the merge once more.

# Most corrections:
tasks.loc[ tasks.loot == "Master Unit", 'loot'] = 'Master Unit CPU'
tasks.loc[ tasks.loot == "Zero Systems", 'loot'] = 'Zero Systems CPU'
tasks.loc[ tasks.loot == "Pure Focus", 'loot'] = 'Pure Focus Crystal'
tasks.loc[ tasks.loot == "Heavy Mining", 'loot'] = 'Heavy Mining Tool'
tasks.loc[ tasks.loot == "Magnetic Field", 'loot'] = 'Magnetic Field Stabilizer'
tasks.loc[ tasks.loot == "Brittle Titan", 'loot'] = 'Brittle Titan Ore'
tasks.loc[ tasks.loot == "NiC Oil", 'loot'] = 'NiC Oil Cannister'
tasks.loc[ tasks.loot == "Charged Spinal", 'loot'] = 'Charged Spinal Base'
tasks.loc[ tasks.loot == "Hardened Bone", 'loot'] = 'Hardened Bone Plates'
tasks.loc[ tasks.loot == "Pale Ivy", 'loot'] = 'Pale Ivy Blossom'
tasks.loc[ tasks.loot == "Glowy Brightcap", 'loot'] = 'Glowy Brightcap Mushroom'
tasks.loc[ tasks.loot == "Blue Runner", 'loot'] = 'Blue Runner Egg'
tasks.loc[ tasks.loot == "Magic", 'loot'] = 'Magic-GROW Fertilizer'
tasks.loc[ tasks.loot == "Letium", 'loot'] = 'Letium Clot'
tasks.loc[ tasks.loot == "Azure Tree", 'loot'] = 'Azure Tree Bark'
tasks = tasks.copy()
taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']

len(results.query("Cost == 0"))
13

We’ve made some good progress so far but we’ve still got quite a few jobs to update. Let’s check the list of missing valuse once more and see if there is a new pattern here.

tmp = tasks.merge(results[['Job', 'Cost']], left_on="name", right_on="Job", how='left').query("Cost == 0")
tmp
name count loot description Job Cost
40 Excavation Gear 1 Heavy Mining Tool For excavations, we need you to stash a Heavy ... Excavation Gear 0.0
46 And two smoking Barrels 1 PKR Maelstrom Prospector. Get down there and stash a PKR Mae... And two smoking Barrels 0.0
47 And two smoking Barrels 200 Shotgun Ammo Prospector. Get down there and stash a PKR Mae... And two smoking Barrels 0.0
74 Data Drive II 1 Rare Data The Data you brought us was helpful, but we ne... Data Drive II 0.0
75 Data Drive III 2 Rare Data Good work last time, Prospector. The Data was ... Data Drive III 0.0
76 Data Drive IV 1 Epic Data In order to be able to predict Storm Behaviour... Data Drive IV 0.0
77 Data Drive V 1 Legendary Data Yes! The more precise Data you brought us was ... Data Drive V 0.0
78 Data Drive VI 3 Legendary Data We're finding more than just Storm data now...... Data Drive VI 0.0
82 Grenadier 1 Frag Grenade Prospector. You have heard of Badum's Dead Dro... Grenadier 0.0
86 Ammo Supplies 1000 Medium Ammo Our Field Agents need more Ammo if they are to... Ammo Supplies 0.0
87 Data Drop 2 Rare Data Prospector. One of our Scientists is convinced... Data Drop 0.0
90 Provide an Advocate 1 Advocate at Our field agent requested better gear to take ... Provide an Advocate 0.0
129 Loadout Drop 1 Rare Shield One of our more... lethal assets on Fortuna re... Loadout Drop 0.0
130 Loadout Drop 1 Rare Helmet One of our more... lethal assets on Fortuna re... Loadout Drop 0.0
131 Loadout Drop 1 Rare Backpack One of our more... lethal assets on Fortuna re... Loadout Drop 0.0
138 NEW-Hard-Osiris-EliteCrusher-1 1 Alpha Crusher DESCRIPTION MISSING NEW-Hard-Osiris-EliteCrusher-1 0.0

The biggest standout problem here now is all those data drive quests so lets resolve those next.

Data Drives:

We have another problem though since none of the tables we have actually contains the data we’re after. So, we’re back to the wiki to do some more scraping work. If we also look at the names of the data drives we’re going to have another problem soon - which you’ll see when we download the data.

# Missing the Data Drive
tasks.loc[tasks.loot.str.contains("Data")]
name count loot description
19 Data Drive II 1 Rare Data The Data you brought us was helpful, but we ne...
20 Data Drive III 2 Rare Data Good work last time, Prospector. The Data was ...
21 Data Drive IV 1 Epic Data In order to be able to predict Storm Behaviour...
22 Data Drive V 1 Legendary Data Yes! The more precise Data you brought us was ...
23 Data Drive VI 3 Legendary Data We're finding more than just Storm data now......
32 Data Drop 2 Rare Data Prospector. One of our Scientists is convinced...

Like the previous posts, scraping is a tedious process of matching keywords and pulling the right tables so that’s getting skipped; it’s just mostly trial and error.

# game taken down
# urlDataDrives = 'https://thecyclefrontier.wiki/wiki/Utilities#Data_Drives-0'
# urlDataDrives = 'https://thecyclefrontier.wiki/wiki/Data_Drive'
urlDataDrives = 'https://archive.ph/zG8Q2'
siteDrive = pd.read_html(urlDataDrives, attrs={"class":"zebra"})[2]

We’ll use a modified version of the code we wrote before for parsing the loot table for this.

driveSubset = siteDrive[
    ['Image', 'Name', 'Rarity', 'Weight']
    ].copy()

driveSubset = driveSubset.assign(
    Loot = np.NaN
)

index = range( 0, len(driveSubset) - 2, 3)
offset = np.array([1, 2])

for i in index:
    
    aLoot = driveSubset.iloc[i, 1]
    indexes = i + offset
    driveSubset.iloc[indexes, 4] = aLoot


tmp = driveSubset.iloc[:, 1:4]
tmp = tmp.fillna(method="ffill")
driveSubset.iloc[:, 1:4] = tmp

cutNA = driveSubset.Loot.isna()
driveData = driveSubset[ ~cutNA ]
driveData = driveData.rename(columns={'Image':'Unit', 'Name':'Reward'})
driveData['Rarity'] = pd.Categorical(
    driveData.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
)
driveData['Label'] = driveData['Rarity'].astype('str') + ' Data'
drives = driveData

drives
Unit Reward Rarity Weight Loot Label
1 30.0 K-Marks Common 15.0 Data Drive Tier 1 Common Data
2 0.0 Reputation Common 15.0 Data Drive Tier 1 Common Data
4 1013.0 K-Marks Uncommon 15.0 Data Drive Tier 2 Uncommon Data
5 1.0 Reputation Uncommon 15.0 Data Drive Tier 2 Uncommon Data
7 2531.0 K-Marks Rare 15.0 Data Drive Tier 3 Rare Data
8 3.0 Reputation Rare 15.0 Data Drive Tier 3 Rare Data
10 6075.0 K-Marks Epic 15.0 Data Drive Tier 4 Epic Data
11 6.0 Reputation Epic 15.0 Data Drive Tier 4 Epic Data
13 10252.0 K-Marks Legendary 15.0 Data Drive Tier 5 Legendary Data
14 10.0 Reputation Legendary 15.0 Data Drive Tier 5 Legendary Data

Since this is the second time that we’ve needed this - and we’re going to need this again - we should write a function to wrap this whole process.

# this is the function, where:
## siteData: the table from the scraped site
## columns: the columns you want to keep from the scraped data
## adjust: the count from the bottom containing an index
## step: how many rows between values we care about
## offset: how many rewards are there?
def extractSite(siteData, columns, adjust, step,  offset):
    if not isinstance(columns, list):
        print("Columns argument must be a list.")
        return None
    siteSubset = siteData[columns].copy()

    siteSubset = siteSubset.assign(
        Loot = np.NaN
    )

    # Some extra error handling 
    if not isinstance(adjust, int):
        print("adjust argument must be an int.")
        return None
    if not isinstance(step, int):
        print("step argument must be an int.")
        return None
    if not isinstance(offset, list):
        print("offset argument must be a list.")
        return None
    
    index = range( 0, len(siteSubset) - adjust, step)
    offset = np.array(offset)

    for i in index:
        aLoot = siteSubset.iloc[i, 1]
        indexes = i + offset
        siteSubset.iloc[indexes, len(siteSubset.columns)-1] = aLoot

    tmp = siteSubset.iloc[:, 1:len(siteSubset.columns)]
    tmp = tmp.fillna(method="ffill")
    siteSubset.iloc[:, 1:len(siteSubset.columns)-1] = tmp

    cutNA = siteSubset.Loot.isna()
    returnData = siteSubset[ ~cutNA ]
    returnData = returnData.rename(columns={'Image':'Unit', 'Name':'Reward'})

    return returnData

Now we’ll sanity check this to make sure it works.

drives = extractSite(siteDrive, ['Image', 'Name', 'Rarity', 'Weight'], 2, 3, [1, 2])
drives['Rarity'] = pd.Categorical(
        drives.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
    )
drives
Unit Reward Rarity Weight Loot
1 30.0 K-Marks Common 15.0 Data Drive Tier 1
2 0.0 Reputation Common 15.0 Data Drive Tier 1
4 1013.0 K-Marks Uncommon 15.0 Data Drive Tier 2
5 1.0 Reputation Uncommon 15.0 Data Drive Tier 2
7 2531.0 K-Marks Rare 15.0 Data Drive Tier 3
8 3.0 Reputation Rare 15.0 Data Drive Tier 3
10 6075.0 K-Marks Epic 15.0 Data Drive Tier 4
11 6.0 Reputation Epic 15.0 Data Drive Tier 4
13 10252.0 K-Marks Legendary 15.0 Data Drive Tier 5
14 10.0 Reputation Legendary 15.0 Data Drive Tier 5

Perfect! Now we just add this to our list of adjustments right? Sadly no. If you look at the names of the drives you’ll find that we’re not quite there. The names of the drives were renamed in Season 2 but the values in our tasks were not updated from their previous values. This is not too hard to update since the old drive names were just the Rarity + Data and we have that so we’ll just need a new column.

drives['Loot'] = drives['Rarity'].astype('str') + ' Data'
drives
Unit Reward Rarity Weight Loot
1 30.0 K-Marks Common 15.0 Common Data
2 0.0 Reputation Common 15.0 Common Data
4 1013.0 K-Marks Uncommon 15.0 Uncommon Data
5 1.0 Reputation Uncommon 15.0 Uncommon Data
7 2531.0 K-Marks Rare 15.0 Rare Data
8 3.0 Reputation Rare 15.0 Rare Data
10 6075.0 K-Marks Epic 15.0 Epic Data
11 6.0 Reputation Epic 15.0 Epic Data
13 10252.0 K-Marks Legendary 15.0 Legendary Data
14 10.0 Reputation Legendary 15.0 Legendary Data
# drives
pd.concat(
    [lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1)])
Unit Reward Rarity Loot
1 150.0 K-Marks Common Flawed Veltecite
6 570.0 K-Marks Uncommon Cloudy Veltecite
11 854.0 K-Marks Rare Clear Veltecite
16 1922.0 K-Marks Epic Pure Veltecite
21 6487.0 K-Marks Legendary Veltecite Heart
... ... ... ... ...
1 30.0 K-Marks Common Common Data
4 1013.0 K-Marks Uncommon Uncommon Data
7 2531.0 K-Marks Rare Rare Data
10 6075.0 K-Marks Epic Epic Data
13 10252.0 K-Marks Legendary Legendary Data

99 rows × 4 columns

Add that to our pipeline and let’s see where we are now

allJobs = pd.concat([korolevRewards, icaRewards, osirisRewards])
lootKMarks = loot.query('Name == "K-Marks"')

# Most corrections:
tasks.loc[ tasks.loot == "Master Unit", 'loot'] = 'Master Unit CPU'
tasks.loc[ tasks.loot == "Zero Systems", 'loot'] = 'Zero Systems CPU'
tasks.loc[ tasks.loot == "Pure Focus", 'loot'] = 'Pure Focus Crystal'
tasks.loc[ tasks.loot == "Heavy Mining", 'loot'] = 'Heavy Mining Tool'
tasks.loc[ tasks.loot == "Magnetic Field", 'loot'] = 'Magnetic Field Stabilizer'
tasks.loc[ tasks.loot == "Brittle Titan", 'loot'] = 'Brittle Titan Ore'
tasks.loc[ tasks.loot == "NiC Oil", 'loot'] = 'NiC Oil Cannister'
tasks.loc[ tasks.loot == "Charged Spinal", 'loot'] = 'Charged Spinal Base'
tasks.loc[ tasks.loot == "Hardened Bone", 'loot'] = 'Hardened Bone Plates'
tasks.loc[ tasks.loot == "Pale Ivy", 'loot'] = 'Pale Ivy Blossom'
tasks.loc[ tasks.loot == "Glowy Brightcap", 'loot'] = 'Glowy Brightcap Mushroom'
tasks.loc[ tasks.loot == "Blue Runner", 'loot'] = 'Blue Runner Egg'
tasks.loc[ tasks.loot == "Magic", 'loot'] = 'Magic-GROW Fertilizer'
tasks.loc[ tasks.loot == "Letium", 'loot'] = 'Letium Clot'
tasks.loc[ tasks.loot == "Azure Tree", 'loot'] = 'Azure Tree Bark'
tasks = tasks.copy()

# Extract Stuff here for now:
drives = extractSite(siteDrive, ['Image', 'Name', 'Rarity', 'Weight'], 2, 3, [1, 2])
drives['Rarity'] = pd.Categorical(
        drives.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
)
drives['Loot'] = drives['Rarity'].astype('str') + ' Data'

lootKMarks = pd.concat(
    [lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1)])

taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']

len(results.query("Cost == 0"))
7

That’s much better! So, what’s left to do?

tmp = tasks.merge(results[['Job', 'Cost']], left_on="name", right_on="Job", how='left').query("Cost == 0")
tmp
name count loot description Job Cost
40 Excavation Gear 1 Heavy Mining Tool For excavations, we need you to stash a Heavy ... Excavation Gear 0.0
46 And two smoking Barrels 1 PKR Maelstrom Prospector. Get down there and stash a PKR Mae... And two smoking Barrels 0.0
47 And two smoking Barrels 200 Shotgun Ammo Prospector. Get down there and stash a PKR Mae... And two smoking Barrels 0.0
82 Grenadier 1 Frag Grenade Prospector. You have heard of Badum's Dead Dro... Grenadier 0.0
86 Ammo Supplies 1000 Medium Ammo Our Field Agents need more Ammo if they are to... Ammo Supplies 0.0
90 Provide an Advocate 1 Advocate at Our field agent requested better gear to take ... Provide an Advocate 0.0
129 Loadout Drop 1 Rare Shield One of our more... lethal assets on Fortuna re... Loadout Drop 0.0
130 Loadout Drop 1 Rare Helmet One of our more... lethal assets on Fortuna re... Loadout Drop 0.0
131 Loadout Drop 1 Rare Backpack One of our more... lethal assets on Fortuna re... Loadout Drop 0.0
138 NEW-Hard-Osiris-EliteCrusher-1 1 Alpha Crusher DESCRIPTION MISSING NEW-Hard-Osiris-EliteCrusher-1 0.0

Add the Guns

We’re going to move to getting the gun data included since we already actualy have it. This was part of another post which was done - not included in the series.

# game taken down
# url = "https://thecyclefrontier.wiki/wiki/Weapons"
url = 'https://archive.ph/pM11n'

siteGun = pd.read_html(url, attrs={"class":"zebra"})[0]

gunData = siteGun[~siteGun.Type.isna()]
indx = gunData['Proj. Speed'] == 'Hitscan'
gunData.loc[indx, 'Proj. Speed'] = np.NaN

gunData = gunData.assign(
    Unit = gunData['Sell Value'].str.replace(' K-Marks', '').astype('float'),
    Reward = "K-Marks",
    Loot = gunData['Name']
)

# # This removes the legendary weapons
# data = data.query('Faction != "Printing"')

guns = gunData
guns[['Unit', 'Reward', 'Rarity', 'Loot']].head(15)
Unit Reward Rarity Loot
0 17429.0 K-Marks Epic Advocate
3 524.0 K-Marks Common AR-55 Autorifle
6 12341.0 K-Marks Epic Asp Flechette Gun
9 371.0 K-Marks Common B9 Trenchgun
12 63080.0 K-Marks Exotic Basilisk
15 1918.0 K-Marks Uncommon Bulldog
18 2052.0 K-Marks Common C-32 Bolt Action
21 17429.0 K-Marks Epic Gorgon
24 16805.0 K-Marks Exotic Hammer
27 7143.0 K-Marks Rare ICA Guarantee
30 228.0 K-Marks Common K-28
33 325513.0 K-Marks Legendary KARMA-1
36 22781.0 K-Marks Epic KBR Longshot
39 94459.0 K-Marks Exotic Kinetic Arbiter
42 1918.0 K-Marks Uncommon KM-9 'Scrapper'

Now we have the gun data to be added to the loot table but if you were paying attention you may have noticed something is wrong with one of our values.

# Not sure what this is from.
tasks.loc[tasks.loot.str.contains(' at')]
name count loot description
35 Provide an Advocate 1 Advocate at Our field agent requested better gear to take ...

The Advocate in this row is labeled as Advocate at which is another problem. In this instance, instead of trying to deal with the regex we’re just going to update that single value.

tasks.loc[tasks.loot.str.contains('Advocate at'), 'loot'] = 'Advocate'
tasks.loc[tasks.loot.str.contains('Advocate')]
name count loot description
35 Provide an Advocate 1 Advocate Our field agent requested better gear to take ...

And, make sure that the append works as intended:

pd.concat(
    [lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1),
    guns[['Unit', 'Reward', 'Rarity', 'Loot']]]
)
Unit Reward Rarity Loot
1 150.0 K-Marks Common Flawed Veltecite
6 570.0 K-Marks Uncommon Cloudy Veltecite
11 854.0 K-Marks Rare Clear Veltecite
16 1922.0 K-Marks Epic Pure Veltecite
21 6487.0 K-Marks Legendary Veltecite Heart
... ... ... ... ...
63 371.0 K-Marks Common S-576 PDW
66 1179.0 K-Marks Uncommon Scarab
69 12341.0 K-Marks Epic Shattergun
72 34172.0 K-Marks Exotic Voltaic Brute
75 270540.0 K-Marks Legendary Zeus Beam

125 rows × 4 columns

Deal With Backpacks

We’re going to try to fix the Backpacks, the Shields and the Helmets toegther since they’re all on the same page of the wiki. Again, this data was not part of the previous tables that we had so we’ll need to go get it.

# game taken down:
# gearUrl = 'https://thecyclefrontier.wiki/wiki/Gear'
gearUrl = 'https://archive.ph/qeeTm'
site = pd.read_html(gearUrl)

This was tricky to collect - which is only why I’m pointing out how it was done. While working out how to collect the data from the website, there was no good strategy to collect just the data that I wanted. What you will see is that I use hard coded values to pull out where each table is.

siteBackPacks = site[0]
siteHelmet = site[10]
siteShield = site[22]

What I did was enumerate throught all the tables collected to find which results had outlier sizes. I then pulled those to make sure they were what I was after like this:

# 10
# 22
list(enumerate(map(len, site)))
[(0, 22),
 (1, 1),
 (2, 1),
 (3, 3),
 (4, 1),
 (5, 4),
 (6, 1),
 (7, 4),
 (8, 1),
 (9, 1),
 (10, 46),
 (11, 1),
 (12, 1),
 (13, 3),
 (14, 4),
 (15, 4),
 (16, 3),
 (17, 4),
 (18, 4),
 (19, 3),
 (20, 4),
 (21, 4),
 (22, 43),
 (23, 1),
 (24, 1),
 (25, 3),
 (26, 4),
 (27, 4),
 (28, 4),
 (29, 4),
 (30, 4),
 (31, 4),
 (32, 4)]

It is not pretty but web scraping rarely is. And, it works. Next we’ll start with the backup packs; with some tweaking of the values to the extractSite function which was defined earlier, this becomes really easy.

backpacks = extractSite(
    siteData = siteBackPacks.loc[siteBackPacks.Name.str.contains("Backpack|K-Marks")],
    columns = ['Image', 'Name', 'Rarity', 'Space', 'Sale Price'],
    adjust = 2, 
    step = 3,
    offset = [1, 2])

backpacks
Unit Reward Rarity Space Sale Price Loot
1 600.0 K-Marks Common 200.0 180 K-Marks Small Backpack
2 180.0 K-Marks Common 200.0 180 K-Marks Small Backpack
4 2700.0 K-Marks Uncommon 250.0 810 K-Marks Medium Backpack
7 810.0 K-Marks Uncommon 250.0 810 K-Marks Medium Backpack
9 6100.0 K-Marks Rare 300.0 1,830 K-Marks Large Backpack
13 1830.0 K-Marks Rare 300.0 1,830 K-Marks Large Backpack
15 12000.0 K-Marks Epic 350.0 4,000 K-Marks Heavy Duty Backpack
19 4000.0 K-Marks Epic 350.0 4,000 K-Marks Heavy Duty Backpack

Looking at these results though, what we’re really after is the Sale Price and so we’ll need to move some stuff around.

backpacks = backpacks.assign(
    Unit = backpacks['Sale Price'].str.replace(' K-Marks', '').str.replace(',', '').astype('float')
)

backpacks = backpacks.drop(["Sale Price", "Space"], axis=1)
backpacks
Unit Reward Rarity Loot
1 180.0 K-Marks Common Small Backpack
2 180.0 K-Marks Common Small Backpack
4 810.0 K-Marks Uncommon Medium Backpack
7 810.0 K-Marks Uncommon Medium Backpack
9 1830.0 K-Marks Rare Large Backpack
13 1830.0 K-Marks Rare Large Backpack
15 4000.0 K-Marks Epic Heavy Duty Backpack
19 4000.0 K-Marks Epic Heavy Duty Backpack

We have the same problem which we had for the Data Drives: the the old name Rare Backpack is in the tasks table but is not how they are named here. Again, we’re just going to steal and modify the solution we had before and apply it here.

backpacks['Loot'] = backpacks['Rarity'].astype('str') + ' Backpack'

Time to add them to the pipeline.

pd.concat([
    lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1),
    guns[['Unit', 'Reward', 'Rarity', 'Loot']],
    backpacks
])
Unit Reward Rarity Loot
1 150.0 K-Marks Common Flawed Veltecite
6 570.0 K-Marks Uncommon Cloudy Veltecite
11 854.0 K-Marks Rare Clear Veltecite
16 1922.0 K-Marks Epic Pure Veltecite
21 6487.0 K-Marks Legendary Veltecite Heart
... ... ... ... ...
7 810.0 K-Marks Uncommon Uncommon Backpack
9 1830.0 K-Marks Rare Rare Backpack
13 1830.0 K-Marks Rare Rare Backpack
15 4000.0 K-Marks Epic Epic Backpack
19 4000.0 K-Marks Epic Epic Backpack

133 rows × 4 columns

This is much better. So, how many are missing now?

len(results.query("Cost == 0"))
4

Much better. But, there is actually a new problem here: masking missing values. If we look back at our data from before we’ll see that there was Helment and Shield information missing and now it’s gone! Since we’ve fixed some of the values for those jobs the Balance is no longer 0 and thefore we’ve lost track of it! Let’s step back and look at the merged result and look for mistakes. And, upon doing this we run into our first masked problem.

tmp = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
tmp
name count loot description Loot Unit
0 New Mining Tools 2.0 Hydraulic Piston We are producing new Mining Tools for new Pros... Hydraulic Piston 338.0
1 Excavator Improvements 3.0 Hydraulic Piston The suspension on our mining excavators need i... Hydraulic Piston 338.0
2 New Mining Tools 10.0 Hardened Metals We are producing new Mining Tools for new Pros... Hardened Metals 150.0
3 Automated Security 16.0 Hardened Metals We will have to build new turrets to help prot... Hardened Metals 150.0
4 Air Lock Upgrades 12.0 Hardened Metals Our engineers designed a safer airlock system ... Hardened Metals 150.0
... ... ... ... ... ... ...
211 NaN NaN NaN NaN Common Backpack 180.0
212 NaN NaN NaN NaN Uncommon Backpack 810.0
213 NaN NaN NaN NaN Uncommon Backpack 810.0
214 NaN NaN NaN NaN Epic Backpack 4000.0
215 NaN NaN NaN NaN Epic Backpack 4000.0

216 rows × 6 columns

There are all these NaN values at the bottom which were attached and included due to the outer including all the rows. Can we simply change this over to a left?

tmp = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='left')
tmp.loc[tmp.Loot.isna()]
name count loot description Loot Unit
39 Excavation Gear 1 Heavy Mining Tool For excavations, we need you to stash a Heavy ... NaN NaN
46 And two smoking Barrels 200 Shotgun Ammo Prospector. Get down there and stash a PKR Mae... NaN NaN
79 Grenadier 1 Frag Grenade Prospector. You have heard of Badum's Dead Dro... NaN NaN
83 Ammo Supplies 1000 Medium Ammo Our Field Agents need more Ammo if they are to... NaN NaN
126 Loadout Drop 1 Rare Shield One of our more... lethal assets on Fortuna re... NaN NaN
127 Loadout Drop 1 Rare Helmet One of our more... lethal assets on Fortuna re... NaN NaN
136 NEW-Hard-Osiris-EliteCrusher-1 1 Alpha Crusher DESCRIPTION MISSING NaN NaN

This is more in line what I’d have expected. We’ll temporarily keep this - but mostly for discussion purposes. The problem here is that when you look through the wiki there are no sell values for the Shields and Helmets. We cannot get these from the wiki which means we cannot automate it. We could do this manually but then I’d have to constantly update this value when it changes - which I’m trying to avoid. This looks to be a case where we’re going to need to remove this job for now.

# Adding this to the pipeline:
idx = tmp.name.str.contains("Loadout Drop")
tmp = tmp.loc[-idx]
tmp
name count loot description Loot Unit
0 New Mining Tools 2 Hydraulic Piston We are producing new Mining Tools for new Pros... Hydraulic Piston 338.0
1 New Mining Tools 10 Hardened Metals We are producing new Mining Tools for new Pros... Hardened Metals 150.0
2 Explosive Excavation 4 Derelict Explosives One of our mines collapsed with valuable equip... Derelict Explosives 1709.0
3 Mining Bot 2 Zero Systems CPU Our engineers have designed an autonomous mini... Zero Systems CPU 506.0
4 Mining Bot 3 Ball Bearings Our engineers have designed an autonomous mini... Ball Bearings 338.0
... ... ... ... ... ... ...
136 NEW-Hard-Osiris-EliteCrusher-1 1 Alpha Crusher DESCRIPTION MISSING NaN NaN
137 Flexible Sealant 8 Resin Gun A number of our weather balloons took damage f... Resin Gun 759.0
138 Indigenous Fruit 4 Indigenous Fruit Ah, Prospector, have you come across any Indig... Indigenous Fruit 759.0
139 Indigenous Fruit 4 Biological Sampler Ah, Prospector, have you come across any Indig... Biological Sampler 1139.0
140 Don't get crushed 3 Crusher Hide Gear up, Prospector! We need Crusher Skins for... Crusher Hide 11533.0

137 rows × 6 columns

Ok, so it looks like next will be solving the ammo listings.

tmp.loc[tmp.Loot.isna()]
name count loot description Loot Unit
39 Excavation Gear 1 Heavy Mining Tool For excavations, we need you to stash a Heavy ... NaN NaN
46 And two smoking Barrels 200 Shotgun Ammo Prospector. Get down there and stash a PKR Mae... NaN NaN
79 Grenadier 1 Frag Grenade Prospector. You have heard of Badum's Dead Dro... NaN NaN
83 Ammo Supplies 1000 Medium Ammo Our Field Agents need more Ammo if they are to... NaN NaN
136 NEW-Hard-Osiris-EliteCrusher-1 1 Alpha Crusher DESCRIPTION MISSING NaN NaN

Deal with Ammo

# game taken down:
# ammoUrl = "https://thecyclefrontier.wiki/wiki/Ammo"
ammoUrl = 'https://archive.ph/Xacnz'
ammo = pd.read_html(ammoUrl)[0]

We’ve already done this before so this is simply here to show it was done. And, you add it to the pipeline just the same.

ammo = ammo.rename(
    {"Item Name":"Loot", "Sell Value":"Unit"}, axis=1
    ).assign(
        Reward = "K-Marks",
        Rarity = pd.Categorical(
            ammo.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary'])
    )[['Unit', 'Reward', 'Rarity', 'Loot']]

Deal with The Heavy Mining Tool

Sadly, this tool is on its own page so we’ll need something custom again. There is a table here we can pull but it’s oriented incorrectly for our use.

# game taken down:
# site = pd.read_html("https://thecyclefrontier.wiki/wiki/Heavy_Mining_Tool")
site = pd.read_html("https://archive.ph/c6QCk")

minerData = site[0]
minerData
0 1
0 Description Allows faster mining of materials.
1 Rarity Common
2 Weight 30
3 Buy Value 600
4 Sell Value 180
5 Faction Points 2

Thankfully, a data frame has the Transpose function that a matrix does. A simple explanation of this function is that it swaps the rows to columns and columns to rows.

minerData[0].T
0       Description
1            Rarity
2            Weight
3         Buy Value
4        Sell Value
5    Faction Points
Name: 0, dtype: object

We’re going to extract the rows values to create a new dataframe object. Since the Heaving Mining Tool designation is missing from the data then we’ll need to add it ourself.

row = minerData[1].T.to_list() + ['Heavy Mining Tool']
row
['Allows faster mining of materials.',
 'Common',
 '30',
 '600',
 '180',
 '2',
 'Heavy Mining Tool']
columns = minerData[0].to_list() + ['Loot']
columns
['Description',
 'Rarity',
 'Weight',
 'Buy Value',
 'Sell Value',
 'Faction Points',
 'Loot']

And, join them together.

mineTool = pd.DataFrame(columns = columns)
mineTool.loc[0] = row
mineTool = mineTool.assign(
    Reward = "K-Marks",
    Rarity = pd.Categorical(
        mineTool.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
    ),
    Unit = mineTool['Sell Value'].astype(int),
)[['Unit', 'Reward', 'Rarity', 'Loot']]
mineTool
Unit Reward Rarity Loot
0 180 K-Marks Common Heavy Mining Tool

And, add it to the pipeline.

lootKMarks = pd.concat([
    lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1),
    guns[['Unit', 'Reward', 'Rarity', 'Loot']],
    backpacks,
    ammo,
    mineTool
])

# Drop this quest
idx = tasks.name.str.contains("Loadout Drop")
tasks = tasks.loc[-idx]

taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']
results.query("Cost == 0")
Units Rewards Job Cost Balance
77 2400 K-Marks Grenadier 0.0 2400.0
138 155000 K-Marks NEW-Hard-Osiris-EliteCrusher-1 0.0 155000.0
tmp = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='left')
tmp.loc[tmp.Loot.isna()]
name count loot description Loot Unit
93 Grenadier 1 Frag Grenade Prospector. You have heard of Badum's Dead Dro... NaN NaN
151 NEW-Hard-Osiris-EliteCrusher-1 1 Alpha Crusher DESCRIPTION MISSING NaN NaN

Conclusions

This is the best we’ll get I suppose. Now, back to our check: How bad is our good old job And two smoking Barrels?

results.loc[results.Job.str.contains("Barrel")]
Units Rewards Job Cost Balance
39 19000 K-Marks And two smoking Barrels 40716.0 -21716.0

And, there we go! So, how many jobs have a negative balance?

results.query("Balance < 0 ")
Units Rewards Job Cost Balance
39 19000 K-Marks And two smoking Barrels 40716.0 -21716.0
70 9500 K-Marks Data Drive III 10124.0 -624.0
73 56000 K-Marks Data Drive VI 61512.0 -5512.0
81 17000 K-Marks Ammo Supplies 69000.0 -52000.0

That’s not as bad as I expected. What is the average job balance?

results.Balance.mean(), results.Balance.median()
(13300.702127659575, 11256.0)

So, what jobs are above the mean?

results.query(f"Balance >= {results.Balance.mean()}")
Units Rewards Job Cost Balance
5 25000 K-Marks Excavator Improvements 4306.0 20694.0
6 37000 K-Marks A new type of Alloy 15468.0 21532.0
7 27000 K-Marks Automated Security 4930.0 22070.0
9 52000 K-Marks Classified I 27030.0 24970.0
15 27000 K-Marks Geologist 6727.0 20273.0
16 29000 K-Marks Industry Secret 8649.0 20351.0
17 34000 K-Marks Veltecite Hearts 12974.0 21026.0
18 40000 K-Marks Laser Rifles 18259.0 21741.0
19 47000 K-Marks Classified II 23922.0 23078.0
20 41000 K-Marks Unlimited Power 17377.0 23623.0
32 27000 K-Marks Meteor Core 7594.0 19406.0
33 30000 K-Marks Shards of pure Power 8104.0 21896.0
34 48000 K-Marks Battery Day 22782.0 25218.0
41 28000 K-Marks Upgrades 3845.0 24155.0
43 140000 K-Marks Exclusive Drilling Rights 115330.0 24670.0
44 32000 K-Marks Stocking Up 9120.0 22880.0
45 26000 K-Marks Shock and Awe 11385.0 14615.0
50 37000 K-Marks Sabotage 15696.0 21304.0
51 28000 K-Marks Old Currency 6000.0 22000.0
52 26000 K-Marks Recoil Compensation 9390.0 16610.0
54 45000 K-Marks New 3D Printer 21362.0 23638.0
55 33000 K-Marks Arms Race 11532.0 21468.0
56 35000 K-Marks Energy Crisis 18984.0 16016.0
71 26000 K-Marks Data Drive IV 12150.0 13850.0
76 33000 K-Marks A Solution 10125.0 22875.0
82 25000 K-Marks Data Drop 10124.0 14876.0
83 26000 K-Marks Field Supplies 3000.0 23000.0
84 25000 K-Marks Spare Parts 2024.0 22976.0
90 29000 K-Marks NEW-Hard-ICA-Gather-6 9142.0 19858.0
91 227000 K-Marks Striking Big 201830.0 25170.0
93 25000 K-Marks A Craving 4048.0 20952.0
97 26000 K-Marks Cretins 6300.0 19700.0
98 27000 K-Marks Sensor Array Repairs 6908.0 20092.0
99 32000 K-Marks A new Energy Source 11289.0 20711.0
100 30000 K-Marks Data Center Upgrades 7690.0 22310.0
101 30000 K-Marks Keep the Experiment running 7200.0 22800.0
102 61000 K-Marks Safety Measures 36309.0 24691.0
122 31000 K-Marks Harvest Day 6583.0 24417.0
125 37000 K-Marks Core Values 15188.0 21812.0
126 45000 K-Marks Promising Results 21266.0 23734.0
130 24000 K-Marks Stormy Samples 2700.0 21300.0
131 25000 K-Marks Stormy Samples II 2024.0 22976.0
132 33000 K-Marks Loadout Drop 7320.0 25680.0
136 33000 K-Marks NEW-Hard-Osiris-Gather-7 12816.0 20184.0
138 155000 K-Marks NEW-Hard-Osiris-EliteCrusher-1 0.0 155000.0

Perfect! Next we’ll loop back around to automating the pipeline and uploading the data to Kaggle.