Which Cycle Frontier Jobs Are Worth Doing? - Part Four.

Introduction

Welcome back to the fourth post in the series. Last time, we were having issues with the data since there were missing values - including the Job that prompted the initial investigation. We’ll jump right into it picking up with the mismatch between naming in tables.

While working through the list of problems I found reviewing the data, there is clearly an error in the Regex built before. The symptom of this was the Autoloader which has a hanging number when the group is pulled out.

tasksSubset = siteJobs[1][["Name", "Description", "Tasks"]].copy()
tasksSubset = tasksSubset[ ~tasksSubset.Tasks.isna()]
tasksSubset = tasksSubset[ ~tasksSubset.Tasks.str.contains("Kill")]

regex = r"(\d+\s[\w]+\s[\w]+)"
tmp = tasksSubset.Tasks.str.extractall(regex)
tmp[tmp.reset_index()[0].str.contains('Autoloader').values]

		0
	match
8	0	1 Autoloader 5
32	0	2 Autoloader 8

After some work, this was because the hanging [\w]+ was too greedy and had to be toned down.

newRegex = r"(\d+\s[\w]+\s?[a-zA-Z]+)"
tmp = tasksSubset.Tasks.str.extractall(newRegex)
tmp[tmp.reset_index()[0].str.contains('Autoloader').values]

		0
	match
8	0	1 Autoloader
32	0	2 Autoloader

This didn’t end up being the only correction to the Regex; in fact, the Co-Tool Multitool was being broken apart since it has a - in the name. So, we had to update this to check and include the - on the split.

newRegex = r"((\d+\s[\w\-]+\s?[a-zA-Z]+))"

Now we’ll have our updated version of the breakLoot function we write before.

def breakLoot(taskString, index=0):
    parts = taskString.split(' ', maxsplit=1)
    if index == 0:
        return int(parts[index])
    elif index == 1:
        return parts[index]
    else:
        # This shouldn't be called.
        return None

tasks = []

for index in range(0,3):
    tasksSubset = siteJobs[index][["Name", "Description", "Tasks"]].copy()
    tasksSubset = tasksSubset[ ~tasksSubset.Tasks.isna()]
    tasksSubset = tasksSubset[ ~tasksSubset.Tasks.str.contains("Kill")]

    newRegex = r"((\d+\s[\w\-]+\s?[a-zA-Z]+))"
    tmp = tasksSubset.Tasks.str.extractall(newRegex)

    count = tmp.reset_index()[0].apply(breakLoot).values
    aLoot = tmp.reset_index()[0].apply(breakLoot, index=1).values

    tmp = tmp.assign(
        count = count,
        loot = aLoot
    )

    nameDescriptSlice = tasksSubset.loc[tmp.reset_index()["level_0"], ['Name', 'Description']]

    tmp = tmp.assign(
        name = nameDescriptSlice.Name.values,
        description = nameDescriptSlice.Description.values
    )

    taskSlice = tmp.reset_index().drop([
        'level_0',
        'match',
        0
    ], axis =1 )

    taskSlice = taskSlice[['name', 'count', 'loot', 'description']]
    tasks.append(taskSlice)
tasks = pd.concat([*tasks])

Fix Item Names

Now we can continue and move to fixing the name mismatches like intended. There was quite a few of these so I’ll only show the process for a few of them - and then the fix for all of them. We’ll pull the code from the previous post so we can start by using the 0 values and work backwards.

allJobs = pd.concat([korolevRewards, icaRewards, osirisRewards])
lootKMarks = loot.query('Name == "K-Marks"')
tasks = tasks.copy()
taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']

# How many are there?
len(results.query("Cost == 0"))

There are 31 rows with a Cost of 0 which we’ll need to investigate. The first one which I had noticed and corrected in the previous post was the CPU’s so we’ll address this one first. We’ll need to find the values in the tasks table and the lootKMarks table so we can figure out where the disconnect is.

tasks.loc[tasks.loot.str.contains("Master|Zero")]

	name	count	loot	description
3	Mining Bot	2	Zero Systems	Our engineers have designed an autonomous mini...
6	Insufficient Processing Power	1	Master Unit	Prospector! The Zero Systems CPU you brought u...
12	Automated Security	5	Zero Systems	We will have to build new turrets to help prot...
16	Classified I	2	Master Unit	Prospector! We need Derelict Explosives, Maste...
47	Upgrades	1	Master Unit	We want to take over an ICA Data Center. Stash...
9	Sabotage	4	Zero Systems	Prospector! Those Korolev simpletons are drill...
34	Spare Parts	4	Zero Systems	One of the Servers at Starport has been failin...
40	NEW-Hard-ICA-Gather-6	1	Zero Systems	DESCRIPTION MISSING
11	Data Center Upgrades	2	Master Unit	Turns out our data center was not powerful eno...

lootKMarks.loc[lootKMarks.Loot.str.contains("CPU")]

	Unit	Name	Rarity	Personal Quarters	Campaigns	Jobs	Printing	Loot
131	3845.0	K-Marks	Epic	Yes x3	Yes x3	Yes	No	Master Unit CPU
321	506.0	K-Marks	Rare	Yes x37	Yes x6	Yes	No	Zero Systems CPU

The tasks table has a shortened version of its name so the easiest way to fix this would be to append CPU to the loot name in the tasks table.

tasks.loc[ tasks.loot == "Master Unit", 'loot'] = 'Master Unit CPU'
tasks.loc[ tasks.loot == "Master Unit CPU", 'loot']

6     Master Unit CPU
16    Master Unit CPU
47    Master Unit CPU
11    Master Unit CPU
Name: loot, dtype: object

Another was the Pure Focus Crystals which has the same problem of having the name truncated; this is a trend among all the affected materials with higher tiered materials. Those will all be included in the final correction and will skip showing the process; it’s literally just the same and tedious.

# Missing the Crystal part
tasks.loc[tasks.loot.str.contains("Pure Focus")]

	name	count	loot	description
25	Geologist	1	Pure Focus	You got time for a job, Prospector? The sample...
26	Industry Secret	3	Pure Focus	Hm. Interesting. The Pure Focus Crystals you b...
28	Laser Rifles	3	Pure Focus	We are prototyping a new laser rifle that can ...
16	Arms Race	4	Pure Focus	Prospector. It seems like Korolev is working o...

lootKMarks.loc[lootKMarks.Loot.str.contains("Crystal")]

	Unit	Name	Rarity	Personal Quarters	Campaigns	Jobs	Printing	Loot
56	961.0	K-Marks	Rare	Yes x19	Yes x10	Yes	No	Focus Crystal
61	2883.0	K-Marks	Epic	Yes x6	Yes x9	Yes	No	Pure Focus Crystal
66	38924.0	K-Marks	Legendary	Yes x6	Yes x9	Yes	No	Polirium Crystal
81	1709.0	K-Marks	Epic	Yes x36	Yes x9	Yes	No	Teratomorphic Crystal Core

This one was a surprise since truncating it doesn’t make any sense and gains nothing but the Magnetic Field Stabilizer is being truncated as well.

# Missing the Stabilizer:
tasks.loc[tasks.loot.str.contains("Magnetic")]

	name	count	loot	description
42	Stability is Key	1	Magnetic Field	Our Comms are jammed. We need you to stash a M...
43	Stability is Key	1	Magnetic Field	Our Comms are jammed. We need you to stash a M...
44	Stability is Key	1	Magnetic Field	Our Comms are jammed. We need you to stash a M...
28	Storm Interference	2	Magnetic Field	Prospector, the Storm has been distorting all ...
2	Surveillance Center	2	Magnetic Field	We want to expand our surveillance operations ...

lootKMarks.loc[lootKMarks.Loot.str.contains("Magnetic")]

	Unit	Name	Rarity	Personal Quarters	Campaigns	Jobs	Printing	Loot
171	338.0	K-Marks	Uncommon	Yes x71	Yes x25	Yes	No	Magnetic Field Stabilizer

Lastly, we’ll show the NiC Oil which was a problem from the previous post which is missing the Cannister at the end again.

tasks.loc[tasks.loot.str.contains("Oil")]

	name	count	loot	description
41	Striking Big	10	NiC Oil	Damn it! We ran out of Fuel for our Radiation ...

lootKMarks.loc[lootKMarks.Loot.str.contains("Oil")]

	Unit	Name	Rarity	Personal Quarters	Campaigns	Jobs	Printing	Loot
121	20183.0	K-Marks	Epic	Yes x103	Yes x10	Yes	No	NiC Oil Cannister

There we go! Now, we’ll update the data again and run the merge once more.

# Most corrections:
tasks.loc[ tasks.loot == "Master Unit", 'loot'] = 'Master Unit CPU'
tasks.loc[ tasks.loot == "Zero Systems", 'loot'] = 'Zero Systems CPU'
tasks.loc[ tasks.loot == "Pure Focus", 'loot'] = 'Pure Focus Crystal'
tasks.loc[ tasks.loot == "Heavy Mining", 'loot'] = 'Heavy Mining Tool'
tasks.loc[ tasks.loot == "Magnetic Field", 'loot'] = 'Magnetic Field Stabilizer'
tasks.loc[ tasks.loot == "Brittle Titan", 'loot'] = 'Brittle Titan Ore'
tasks.loc[ tasks.loot == "NiC Oil", 'loot'] = 'NiC Oil Cannister'
tasks.loc[ tasks.loot == "Charged Spinal", 'loot'] = 'Charged Spinal Base'
tasks.loc[ tasks.loot == "Hardened Bone", 'loot'] = 'Hardened Bone Plates'
tasks.loc[ tasks.loot == "Pale Ivy", 'loot'] = 'Pale Ivy Blossom'
tasks.loc[ tasks.loot == "Glowy Brightcap", 'loot'] = 'Glowy Brightcap Mushroom'
tasks.loc[ tasks.loot == "Blue Runner", 'loot'] = 'Blue Runner Egg'
tasks.loc[ tasks.loot == "Magic", 'loot'] = 'Magic-GROW Fertilizer'
tasks.loc[ tasks.loot == "Letium", 'loot'] = 'Letium Clot'
tasks.loc[ tasks.loot == "Azure Tree", 'loot'] = 'Azure Tree Bark'

tasks = tasks.copy()
taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']

len(results.query("Cost == 0"))

We’ve made some good progress so far but we’ve still got quite a few jobs to update. Let’s check the list of missing valuse once more and see if there is a new pattern here.

tmp = tasks.merge(results[['Job', 'Cost']], left_on="name", right_on="Job", how='left').query("Cost == 0")
tmp

	name	count	loot	description	Job
40	Excavation Gear	1	Heavy Mining Tool	For excavations, we need you to stash a Heavy ...	Excavation Gear
46	And two smoking Barrels	1	PKR Maelstrom	Prospector. Get down there and stash a PKR Mae...	And two smoking Barrels
47	And two smoking Barrels	200	Shotgun Ammo	Prospector. Get down there and stash a PKR Mae...	And two smoking Barrels
74	Data Drive II	1	Rare Data	The Data you brought us was helpful, but we ne...	Data Drive II
75	Data Drive III	2	Rare Data	Good work last time, Prospector. The Data was ...	Data Drive III
76	Data Drive IV	1	Epic Data	In order to be able to predict Storm Behaviour...	Data Drive IV
77	Data Drive V	1	Legendary Data	Yes! The more precise Data you brought us was ...	Data Drive V
78	Data Drive VI	3	Legendary Data	We're finding more than just Storm data now......	Data Drive VI
82	Grenadier	1	Frag Grenade	Prospector. You have heard of Badum's Dead Dro...	Grenadier
86	Ammo Supplies	1000	Medium Ammo	Our Field Agents need more Ammo if they are to...	Ammo Supplies
87	Data Drop	2	Rare Data	Prospector. One of our Scientists is convinced...	Data Drop
90	Provide an Advocate	1	Advocate at	Our field agent requested better gear to take ...	Provide an Advocate
129	Loadout Drop	1	Rare Shield	One of our more... lethal assets on Fortuna re...	Loadout Drop
130	Loadout Drop	1	Rare Helmet	One of our more... lethal assets on Fortuna re...	Loadout Drop
131	Loadout Drop	1	Rare Backpack	One of our more... lethal assets on Fortuna re...	Loadout Drop
138	NEW-Hard-Osiris-EliteCrusher-1	1	Alpha Crusher	DESCRIPTION MISSING	NEW-Hard-Osiris-EliteCrusher-1

The biggest standout problem here now is all those data drive quests so lets resolve those next.

Data Drives:

We have another problem though since none of the tables we have actually contains the data we’re after. So, we’re back to the wiki to do some more scraping work. If we also look at the names of the data drives we’re going to have another problem soon - which you’ll see when we download the data.

# Missing the Data Drive
tasks.loc[tasks.loot.str.contains("Data")]

	name	count	loot	description
19	Data Drive II	1	Rare Data	The Data you brought us was helpful, but we ne...
20	Data Drive III	2	Rare Data	Good work last time, Prospector. The Data was ...
21	Data Drive IV	1	Epic Data	In order to be able to predict Storm Behaviour...
22	Data Drive V	1	Legendary Data	Yes! The more precise Data you brought us was ...
23	Data Drive VI	3	Legendary Data	We're finding more than just Storm data now......
32	Data Drop	2	Rare Data	Prospector. One of our Scientists is convinced...

Like the previous posts, scraping is a tedious process of matching keywords and pulling the right tables so that’s getting skipped; it’s just mostly trial and error.

# game taken down
# urlDataDrives = 'https://thecyclefrontier.wiki/wiki/Utilities#Data_Drives-0'
# urlDataDrives = 'https://thecyclefrontier.wiki/wiki/Data_Drive'
urlDataDrives = 'https://archive.ph/zG8Q2'
siteDrive = pd.read_html(urlDataDrives, attrs={"class":"zebra"})[2]

We’ll use a modified version of the code we wrote before for parsing the loot table for this.

driveSubset = siteDrive[
    ['Image', 'Name', 'Rarity', 'Weight']
    ].copy()

driveSubset = driveSubset.assign(
    Loot = np.NaN
)

index = range( 0, len(driveSubset) - 2, 3)
offset = np.array([1, 2])

for i in index:
    
    aLoot = driveSubset.iloc[i, 1]
    indexes = i + offset
    driveSubset.iloc[indexes, 4] = aLoot


tmp = driveSubset.iloc[:, 1:4]
tmp = tmp.fillna(method="ffill")
driveSubset.iloc[:, 1:4] = tmp

cutNA = driveSubset.Loot.isna()
driveData = driveSubset[ ~cutNA ]
driveData = driveData.rename(columns={'Image':'Unit', 'Name':'Reward'})
driveData['Rarity'] = pd.Categorical(
    driveData.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
)
driveData['Label'] = driveData['Rarity'].astype('str') + ' Data'
drives = driveData

drives

	Unit	Reward	Rarity	Weight	Loot	Label
1	30.0	K-Marks	Common	15.0	Data Drive Tier 1	Common Data
2	0.0	Reputation	Common	15.0	Data Drive Tier 1	Common Data
4	1013.0	K-Marks	Uncommon	15.0	Data Drive Tier 2	Uncommon Data
5	1.0	Reputation	Uncommon	15.0	Data Drive Tier 2	Uncommon Data
7	2531.0	K-Marks	Rare	15.0	Data Drive Tier 3	Rare Data
8	3.0	Reputation	Rare	15.0	Data Drive Tier 3	Rare Data
10	6075.0	K-Marks	Epic	15.0	Data Drive Tier 4	Epic Data
11	6.0	Reputation	Epic	15.0	Data Drive Tier 4	Epic Data
13	10252.0	K-Marks	Legendary	15.0	Data Drive Tier 5	Legendary Data
14	10.0	Reputation	Legendary	15.0	Data Drive Tier 5	Legendary Data

Since this is the second time that we’ve needed this - and we’re going to need this again - we should write a function to wrap this whole process.

# this is the function, where:
## siteData: the table from the scraped site
## columns: the columns you want to keep from the scraped data
## adjust: the count from the bottom containing an index
## step: how many rows between values we care about
## offset: how many rewards are there?
def extractSite(siteData, columns, adjust, step,  offset):
    if not isinstance(columns, list):
        print("Columns argument must be a list.")
        return None
    siteSubset = siteData[columns].copy()

    siteSubset = siteSubset.assign(
        Loot = np.NaN
    )

    # Some extra error handling 
    if not isinstance(adjust, int):
        print("adjust argument must be an int.")
        return None
    if not isinstance(step, int):
        print("step argument must be an int.")
        return None
    if not isinstance(offset, list):
        print("offset argument must be a list.")
        return None
    
    index = range( 0, len(siteSubset) - adjust, step)
    offset = np.array(offset)

    for i in index:
        aLoot = siteSubset.iloc[i, 1]
        indexes = i + offset
        siteSubset.iloc[indexes, len(siteSubset.columns)-1] = aLoot

    tmp = siteSubset.iloc[:, 1:len(siteSubset.columns)]
    tmp = tmp.fillna(method="ffill")
    siteSubset.iloc[:, 1:len(siteSubset.columns)-1] = tmp

    cutNA = siteSubset.Loot.isna()
    returnData = siteSubset[ ~cutNA ]
    returnData = returnData.rename(columns={'Image':'Unit', 'Name':'Reward'})

    return returnData

Now we’ll sanity check this to make sure it works.

drives = extractSite(siteDrive, ['Image', 'Name', 'Rarity', 'Weight'], 2, 3, [1, 2])
drives['Rarity'] = pd.Categorical(
        drives.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
    )
drives

	Unit	Reward	Rarity	Weight	Loot
1	30.0	K-Marks	Common	15.0	Data Drive Tier 1
2	0.0	Reputation	Common	15.0	Data Drive Tier 1
4	1013.0	K-Marks	Uncommon	15.0	Data Drive Tier 2
5	1.0	Reputation	Uncommon	15.0	Data Drive Tier 2
7	2531.0	K-Marks	Rare	15.0	Data Drive Tier 3
8	3.0	Reputation	Rare	15.0	Data Drive Tier 3
10	6075.0	K-Marks	Epic	15.0	Data Drive Tier 4
11	6.0	Reputation	Epic	15.0	Data Drive Tier 4
13	10252.0	K-Marks	Legendary	15.0	Data Drive Tier 5
14	10.0	Reputation	Legendary	15.0	Data Drive Tier 5

Perfect! Now we just add this to our list of adjustments right? Sadly no. If you look at the names of the drives you’ll find that we’re not quite there. The names of the drives were renamed in Season 2 but the values in our tasks were not updated from their previous values. This is not too hard to update since the old drive names were just the Rarity + Data and we have that so we’ll just need a new column.

drives['Loot'] = drives['Rarity'].astype('str') + ' Data'
drives

	Unit	Reward	Rarity	Weight	Loot
1	30.0	K-Marks	Common	15.0	Common Data
2	0.0	Reputation	Common	15.0	Common Data
4	1013.0	K-Marks	Uncommon	15.0	Uncommon Data
5	1.0	Reputation	Uncommon	15.0	Uncommon Data
7	2531.0	K-Marks	Rare	15.0	Rare Data
8	3.0	Reputation	Rare	15.0	Rare Data
10	6075.0	K-Marks	Epic	15.0	Epic Data
11	6.0	Reputation	Epic	15.0	Epic Data
13	10252.0	K-Marks	Legendary	15.0	Legendary Data
14	10.0	Reputation	Legendary	15.0	Legendary Data

# drives
pd.concat(
    [lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1)])

	Unit	Reward	Rarity	Loot
1	150.0	K-Marks	Common	Flawed Veltecite
6	570.0	K-Marks	Uncommon	Cloudy Veltecite
11	854.0	K-Marks	Rare	Clear Veltecite
16	1922.0	K-Marks	Epic	Pure Veltecite
21	6487.0	K-Marks	Legendary	Veltecite Heart
...	...	...	...	...
1	30.0	K-Marks	Common	Common Data
4	1013.0	K-Marks	Uncommon	Uncommon Data
7	2531.0	K-Marks	Rare	Rare Data
10	6075.0	K-Marks	Epic	Epic Data
13	10252.0	K-Marks	Legendary	Legendary Data

99 rows × 4 columns

Add that to our pipeline and let’s see where we are now

allJobs = pd.concat([korolevRewards, icaRewards, osirisRewards])
lootKMarks = loot.query('Name == "K-Marks"')

# Most corrections:
tasks.loc[ tasks.loot == "Master Unit", 'loot'] = 'Master Unit CPU'
tasks.loc[ tasks.loot == "Zero Systems", 'loot'] = 'Zero Systems CPU'
tasks.loc[ tasks.loot == "Pure Focus", 'loot'] = 'Pure Focus Crystal'
tasks.loc[ tasks.loot == "Heavy Mining", 'loot'] = 'Heavy Mining Tool'
tasks.loc[ tasks.loot == "Magnetic Field", 'loot'] = 'Magnetic Field Stabilizer'
tasks.loc[ tasks.loot == "Brittle Titan", 'loot'] = 'Brittle Titan Ore'
tasks.loc[ tasks.loot == "NiC Oil", 'loot'] = 'NiC Oil Cannister'
tasks.loc[ tasks.loot == "Charged Spinal", 'loot'] = 'Charged Spinal Base'
tasks.loc[ tasks.loot == "Hardened Bone", 'loot'] = 'Hardened Bone Plates'
tasks.loc[ tasks.loot == "Pale Ivy", 'loot'] = 'Pale Ivy Blossom'
tasks.loc[ tasks.loot == "Glowy Brightcap", 'loot'] = 'Glowy Brightcap Mushroom'
tasks.loc[ tasks.loot == "Blue Runner", 'loot'] = 'Blue Runner Egg'
tasks.loc[ tasks.loot == "Magic", 'loot'] = 'Magic-GROW Fertilizer'
tasks.loc[ tasks.loot == "Letium", 'loot'] = 'Letium Clot'
tasks.loc[ tasks.loot == "Azure Tree", 'loot'] = 'Azure Tree Bark'
tasks = tasks.copy()

# Extract Stuff here for now:
drives = extractSite(siteDrive, ['Image', 'Name', 'Rarity', 'Weight'], 2, 3, [1, 2])
drives['Rarity'] = pd.Categorical(
        drives.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
)
drives['Loot'] = drives['Rarity'].astype('str') + ' Data'

lootKMarks = pd.concat(
    [lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1)])

taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']

len(results.query("Cost == 0"))

That’s much better! So, what’s left to do?

tmp = tasks.merge(results[['Job', 'Cost']], left_on="name", right_on="Job", how='left').query("Cost == 0")
tmp

	name	count	loot	description	Job
40	Excavation Gear	1	Heavy Mining Tool	For excavations, we need you to stash a Heavy ...	Excavation Gear
46	And two smoking Barrels	1	PKR Maelstrom	Prospector. Get down there and stash a PKR Mae...	And two smoking Barrels
47	And two smoking Barrels	200	Shotgun Ammo	Prospector. Get down there and stash a PKR Mae...	And two smoking Barrels
82	Grenadier	1	Frag Grenade	Prospector. You have heard of Badum's Dead Dro...	Grenadier
86	Ammo Supplies	1000	Medium Ammo	Our Field Agents need more Ammo if they are to...	Ammo Supplies
90	Provide an Advocate	1	Advocate at	Our field agent requested better gear to take ...	Provide an Advocate
129	Loadout Drop	1	Rare Shield	One of our more... lethal assets on Fortuna re...	Loadout Drop
130	Loadout Drop	1	Rare Helmet	One of our more... lethal assets on Fortuna re...	Loadout Drop
131	Loadout Drop	1	Rare Backpack	One of our more... lethal assets on Fortuna re...	Loadout Drop
138	NEW-Hard-Osiris-EliteCrusher-1	1	Alpha Crusher	DESCRIPTION MISSING	NEW-Hard-Osiris-EliteCrusher-1

Add the Guns

We’re going to move to getting the gun data included since we already actualy have it. This was part of another post which was done - not included in the series.

# game taken down
# url = "https://thecyclefrontier.wiki/wiki/Weapons"
url = 'https://archive.ph/pM11n'

siteGun = pd.read_html(url, attrs={"class":"zebra"})[0]

gunData = siteGun[~siteGun.Type.isna()]
indx = gunData['Proj. Speed'] == 'Hitscan'
gunData.loc[indx, 'Proj. Speed'] = np.NaN

gunData = gunData.assign(
    Unit = gunData['Sell Value'].str.replace(' K-Marks', '').astype('float'),
    Reward = "K-Marks",
    Loot = gunData['Name']
)

# # This removes the legendary weapons
# data = data.query('Faction != "Printing"')

guns = gunData
guns[['Unit', 'Reward', 'Rarity', 'Loot']].head(15)

	Unit	Reward	Rarity	Loot
0	17429.0	K-Marks	Epic	Advocate
3	524.0	K-Marks	Common	AR-55 Autorifle
6	12341.0	K-Marks	Epic	Asp Flechette Gun
9	371.0	K-Marks	Common	B9 Trenchgun
12	63080.0	K-Marks	Exotic	Basilisk
15	1918.0	K-Marks	Uncommon	Bulldog
18	2052.0	K-Marks	Common	C-32 Bolt Action
21	17429.0	K-Marks	Epic	Gorgon
24	16805.0	K-Marks	Exotic	Hammer
27	7143.0	K-Marks	Rare	ICA Guarantee
30	228.0	K-Marks	Common	K-28
33	325513.0	K-Marks	Legendary	KARMA-1
36	22781.0	K-Marks	Epic	KBR Longshot
39	94459.0	K-Marks	Exotic	Kinetic Arbiter
42	1918.0	K-Marks	Uncommon	KM-9 'Scrapper'

Now we have the gun data to be added to the loot table but if you were paying attention you may have noticed something is wrong with one of our values.

# Not sure what this is from.
tasks.loc[tasks.loot.str.contains(' at')]

	name	count	loot	description
35	Provide an Advocate	1	Advocate at	Our field agent requested better gear to take ...

The Advocate in this row is labeled as Advocate at which is another problem. In this instance, instead of trying to deal with the regex we’re just going to update that single value.

tasks.loc[tasks.loot.str.contains('Advocate at'), 'loot'] = 'Advocate'
tasks.loc[tasks.loot.str.contains('Advocate')]

	name	count	loot	description
35	Provide an Advocate	1	Advocate	Our field agent requested better gear to take ...

And, make sure that the append works as intended:

pd.concat(
    [lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1),
    guns[['Unit', 'Reward', 'Rarity', 'Loot']]]
)

	Unit	Reward	Rarity	Loot
1	150.0	K-Marks	Common	Flawed Veltecite
6	570.0	K-Marks	Uncommon	Cloudy Veltecite
11	854.0	K-Marks	Rare	Clear Veltecite
16	1922.0	K-Marks	Epic	Pure Veltecite
21	6487.0	K-Marks	Legendary	Veltecite Heart
...	...	...	...	...
63	371.0	K-Marks	Common	S-576 PDW
66	1179.0	K-Marks	Uncommon	Scarab
69	12341.0	K-Marks	Epic	Shattergun
72	34172.0	K-Marks	Exotic	Voltaic Brute
75	270540.0	K-Marks	Legendary	Zeus Beam

125 rows × 4 columns

Deal With Backpacks

We’re going to try to fix the Backpacks, the Shields and the Helmets toegther since they’re all on the same page of the wiki. Again, this data was not part of the previous tables that we had so we’ll need to go get it.

# game taken down:
# gearUrl = 'https://thecyclefrontier.wiki/wiki/Gear'
gearUrl = 'https://archive.ph/qeeTm'
site = pd.read_html(gearUrl)

This was tricky to collect - which is only why I’m pointing out how it was done. While working out how to collect the data from the website, there was no good strategy to collect just the data that I wanted. What you will see is that I use hard coded values to pull out where each table is.

siteBackPacks = site[0]
siteHelmet = site[10]
siteShield = site[22]

What I did was enumerate throught all the tables collected to find which results had outlier sizes. I then pulled those to make sure they were what I was after like this:

# 10
# 22
list(enumerate(map(len, site)))

[(0, 22),
 (1, 1),
 (2, 1),
 (3, 3),
 (4, 1),
 (5, 4),
 (6, 1),
 (7, 4),
 (8, 1),
 (9, 1),
 (10, 46),
 (11, 1),
 (12, 1),
 (13, 3),
 (14, 4),
 (15, 4),
 (16, 3),
 (17, 4),
 (18, 4),
 (19, 3),
 (20, 4),
 (21, 4),
 (22, 43),
 (23, 1),
 (24, 1),
 (25, 3),
 (26, 4),
 (27, 4),
 (28, 4),
 (29, 4),
 (30, 4),
 (31, 4),
 (32, 4)]

It is not pretty but web scraping rarely is. And, it works. Next we’ll start with the backup packs; with some tweaking of the values to the extractSite function which was defined earlier, this becomes really easy.

backpacks = extractSite(
    siteData = siteBackPacks.loc[siteBackPacks.Name.str.contains("Backpack|K-Marks")],
    columns = ['Image', 'Name', 'Rarity', 'Space', 'Sale Price'],
    adjust = 2, 
    step = 3,
    offset = [1, 2])

backpacks

	Unit	Reward	Rarity	Space	Sale Price	Loot
1	600.0	K-Marks	Common	200.0	180 K-Marks	Small Backpack
2	180.0	K-Marks	Common	200.0	180 K-Marks	Small Backpack
4	2700.0	K-Marks	Uncommon	250.0	810 K-Marks	Medium Backpack
7	810.0	K-Marks	Uncommon	250.0	810 K-Marks	Medium Backpack
9	6100.0	K-Marks	Rare	300.0	1,830 K-Marks	Large Backpack
13	1830.0	K-Marks	Rare	300.0	1,830 K-Marks	Large Backpack
15	12000.0	K-Marks	Epic	350.0	4,000 K-Marks	Heavy Duty Backpack
19	4000.0	K-Marks	Epic	350.0	4,000 K-Marks	Heavy Duty Backpack

Looking at these results though, what we’re really after is the Sale Price and so we’ll need to move some stuff around.

backpacks = backpacks.assign(
    Unit = backpacks['Sale Price'].str.replace(' K-Marks', '').str.replace(',', '').astype('float')
)

backpacks = backpacks.drop(["Sale Price", "Space"], axis=1)
backpacks

	Unit	Reward	Rarity	Loot
1	180.0	K-Marks	Common	Small Backpack
2	180.0	K-Marks	Common	Small Backpack
4	810.0	K-Marks	Uncommon	Medium Backpack
7	810.0	K-Marks	Uncommon	Medium Backpack
9	1830.0	K-Marks	Rare	Large Backpack
13	1830.0	K-Marks	Rare	Large Backpack
15	4000.0	K-Marks	Epic	Heavy Duty Backpack
19	4000.0	K-Marks	Epic	Heavy Duty Backpack

We have the same problem which we had for the Data Drives: the the old name Rare Backpack is in the tasks table but is not how they are named here. Again, we’re just going to steal and modify the solution we had before and apply it here.

backpacks['Loot'] = backpacks['Rarity'].astype('str') + ' Backpack'

Time to add them to the pipeline.

pd.concat([
    lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1),
    guns[['Unit', 'Reward', 'Rarity', 'Loot']],
    backpacks
])

	Unit	Reward	Rarity	Loot
1	150.0	K-Marks	Common	Flawed Veltecite
6	570.0	K-Marks	Uncommon	Cloudy Veltecite
11	854.0	K-Marks	Rare	Clear Veltecite
16	1922.0	K-Marks	Epic	Pure Veltecite
21	6487.0	K-Marks	Legendary	Veltecite Heart
...	...	...	...	...
7	810.0	K-Marks	Uncommon	Uncommon Backpack
9	1830.0	K-Marks	Rare	Rare Backpack
13	1830.0	K-Marks	Rare	Rare Backpack
15	4000.0	K-Marks	Epic	Epic Backpack
19	4000.0	K-Marks	Epic	Epic Backpack

133 rows × 4 columns

This is much better. So, how many are missing now?

len(results.query("Cost == 0"))

Much better. But, there is actually a new problem here: masking missing values. If we look back at our data from before we’ll see that there was Helment and Shield information missing and now it’s gone! Since we’ve fixed some of the values for those jobs the Balance is no longer 0 and thefore we’ve lost track of it! Let’s step back and look at the merged result and look for mistakes. And, upon doing this we run into our first masked problem.

tmp = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
tmp

	name	count	loot	description	Loot	Unit
0	New Mining Tools	2.0	Hydraulic Piston	We are producing new Mining Tools for new Pros...	Hydraulic Piston	338.0
1	Excavator Improvements	3.0	Hydraulic Piston	The suspension on our mining excavators need i...	Hydraulic Piston	338.0
2	New Mining Tools	10.0	Hardened Metals	We are producing new Mining Tools for new Pros...	Hardened Metals	150.0
3	Automated Security	16.0	Hardened Metals	We will have to build new turrets to help prot...	Hardened Metals	150.0
4	Air Lock Upgrades	12.0	Hardened Metals	Our engineers designed a safer airlock system ...	Hardened Metals	150.0
...	...	...	...	...	...	...
211	NaN	NaN	NaN	NaN	Common Backpack	180.0
212	NaN	NaN	NaN	NaN	Uncommon Backpack	810.0
213	NaN	NaN	NaN	NaN	Uncommon Backpack	810.0
214	NaN	NaN	NaN	NaN	Epic Backpack	4000.0
215	NaN	NaN	NaN	NaN	Epic Backpack	4000.0

216 rows × 6 columns

There are all these NaN values at the bottom which were attached and included due to the outer including all the rows. Can we simply change this over to a left?

tmp = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='left')
tmp.loc[tmp.Loot.isna()]

	name	count	loot	description	Loot	Unit
39	Excavation Gear	1	Heavy Mining Tool	For excavations, we need you to stash a Heavy ...	NaN	NaN
46	And two smoking Barrels	200	Shotgun Ammo	Prospector. Get down there and stash a PKR Mae...	NaN	NaN
79	Grenadier	1	Frag Grenade	Prospector. You have heard of Badum's Dead Dro...	NaN	NaN
83	Ammo Supplies	1000	Medium Ammo	Our Field Agents need more Ammo if they are to...	NaN	NaN
126	Loadout Drop	1	Rare Shield	One of our more... lethal assets on Fortuna re...	NaN	NaN
127	Loadout Drop	1	Rare Helmet	One of our more... lethal assets on Fortuna re...	NaN	NaN
136	NEW-Hard-Osiris-EliteCrusher-1	1	Alpha Crusher	DESCRIPTION MISSING	NaN	NaN

This is more in line what I’d have expected. We’ll temporarily keep this - but mostly for discussion purposes. The problem here is that when you look through the wiki there are no sell values for the Shields and Helmets. We cannot get these from the wiki which means we cannot automate it. We could do this manually but then I’d have to constantly update this value when it changes - which I’m trying to avoid. This looks to be a case where we’re going to need to remove this job for now.

# Adding this to the pipeline:
idx = tmp.name.str.contains("Loadout Drop")
tmp = tmp.loc[-idx]
tmp

	name	count	loot	description	Loot	Unit
0	New Mining Tools	2	Hydraulic Piston	We are producing new Mining Tools for new Pros...	Hydraulic Piston	338.0
1	New Mining Tools	10	Hardened Metals	We are producing new Mining Tools for new Pros...	Hardened Metals	150.0
2	Explosive Excavation	4	Derelict Explosives	One of our mines collapsed with valuable equip...	Derelict Explosives	1709.0
3	Mining Bot	2	Zero Systems CPU	Our engineers have designed an autonomous mini...	Zero Systems CPU	506.0
4	Mining Bot	3	Ball Bearings	Our engineers have designed an autonomous mini...	Ball Bearings	338.0
...	...	...	...	...	...	...
136	NEW-Hard-Osiris-EliteCrusher-1	1	Alpha Crusher	DESCRIPTION MISSING	NaN	NaN
137	Flexible Sealant	8	Resin Gun	A number of our weather balloons took damage f...	Resin Gun	759.0
138	Indigenous Fruit	4	Indigenous Fruit	Ah, Prospector, have you come across any Indig...	Indigenous Fruit	759.0
139	Indigenous Fruit	4	Biological Sampler	Ah, Prospector, have you come across any Indig...	Biological Sampler	1139.0
140	Don't get crushed	3	Crusher Hide	Gear up, Prospector! We need Crusher Skins for...	Crusher Hide	11533.0

137 rows × 6 columns

Ok, so it looks like next will be solving the ammo listings.

tmp.loc[tmp.Loot.isna()]

	name	count	loot	description	Loot	Unit
39	Excavation Gear	1	Heavy Mining Tool	For excavations, we need you to stash a Heavy ...	NaN	NaN
46	And two smoking Barrels	200	Shotgun Ammo	Prospector. Get down there and stash a PKR Mae...	NaN	NaN
79	Grenadier	1	Frag Grenade	Prospector. You have heard of Badum's Dead Dro...	NaN	NaN
83	Ammo Supplies	1000	Medium Ammo	Our Field Agents need more Ammo if they are to...	NaN	NaN
136	NEW-Hard-Osiris-EliteCrusher-1	1	Alpha Crusher	DESCRIPTION MISSING	NaN	NaN

Deal with Ammo

# game taken down:
# ammoUrl = "https://thecyclefrontier.wiki/wiki/Ammo"
ammoUrl = 'https://archive.ph/Xacnz'
ammo = pd.read_html(ammoUrl)[0]

We’ve already done this before so this is simply here to show it was done. And, you add it to the pipeline just the same.

ammo = ammo.rename(
    {"Item Name":"Loot", "Sell Value":"Unit"}, axis=1
    ).assign(
        Reward = "K-Marks",
        Rarity = pd.Categorical(
            ammo.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary'])
    )[['Unit', 'Reward', 'Rarity', 'Loot']]

Deal with The Heavy Mining Tool

Sadly, this tool is on its own page so we’ll need something custom again. There is a table here we can pull but it’s oriented incorrectly for our use.

# game taken down:
# site = pd.read_html("https://thecyclefrontier.wiki/wiki/Heavy_Mining_Tool")
site = pd.read_html("https://archive.ph/c6QCk")

minerData = site[0]
minerData

	0	1
0	Description	Allows faster mining of materials.
1	Rarity	Common
2	Weight	30
3	Buy Value	600
4	Sell Value	180
5	Faction Points	2

Thankfully, a data frame has the Transpose function that a matrix does. A simple explanation of this function is that it swaps the rows to columns and columns to rows.

minerData[0].T

0       Description
1            Rarity
2            Weight
3         Buy Value
4        Sell Value
5    Faction Points
Name: 0, dtype: object

We’re going to extract the rows values to create a new dataframe object. Since the Heaving Mining Tool designation is missing from the data then we’ll need to add it ourself.

row = minerData[1].T.to_list() + ['Heavy Mining Tool']
row

['Allows faster mining of materials.',
 'Common',
 '30',
 '600',
 '180',
 '2',
 'Heavy Mining Tool']

columns = minerData[0].to_list() + ['Loot']
columns

['Description',
 'Rarity',
 'Weight',
 'Buy Value',
 'Sell Value',
 'Faction Points',
 'Loot']

And, join them together.

mineTool = pd.DataFrame(columns = columns)
mineTool.loc[0] = row
mineTool = mineTool.assign(
    Reward = "K-Marks",
    Rarity = pd.Categorical(
        mineTool.Rarity, categories = ['Common', 'Uncommon', 'Rare', 'Epic', 'Exotic', 'Legendary']
    ),
    Unit = mineTool['Sell Value'].astype(int),
)[['Unit', 'Reward', 'Rarity', 'Loot']]
mineTool

	Unit	Reward	Rarity	Loot
0	180	K-Marks	Common	Heavy Mining Tool

And, add it to the pipeline.

lootKMarks = pd.concat([
    lootKMarks.rename({"Name":"Reward"}, axis=1)[['Unit', 'Reward', 'Rarity', 'Loot']],
    drives.query('Reward == "K-Marks"').drop("Weight", axis=1),
    guns[['Unit', 'Reward', 'Rarity', 'Loot']],
    backpacks,
    ammo,
    mineTool
])

# Drop this quest
idx = tasks.name.str.contains("Loadout Drop")
tasks = tasks.loc[-idx]

taskLoot = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='outer')
taskLoot['Cost'] = taskLoot['count'] * taskLoot['Unit']
results = allJobs.query('Rewards == "K-Marks"').merge(
    taskLoot[['name', 'Cost']].groupby('name').sum(), left_on="Job", right_on="name", how='left')
results['Balance'] = results['Units'] - results['Cost']

results.query("Cost == 0")

	Units	Rewards	Job	Cost	Balance
77	2400	K-Marks	Grenadier	0.0	2400.0
138	155000	K-Marks	NEW-Hard-Osiris-EliteCrusher-1	0.0	155000.0

tmp = tasks.merge(lootKMarks[['Loot', 'Unit']], left_on='loot', right_on='Loot', how='left')
tmp.loc[tmp.Loot.isna()]

	name	count	loot	description	Loot	Unit
93	Grenadier	1	Frag Grenade	Prospector. You have heard of Badum's Dead Dro...	NaN	NaN
151	NEW-Hard-Osiris-EliteCrusher-1	1	Alpha Crusher	DESCRIPTION MISSING	NaN	NaN

Conclusions

This is the best we’ll get I suppose. Now, back to our check: How bad is our good old job And two smoking Barrels?

results.loc[results.Job.str.contains("Barrel")]

	Units	Rewards	Job	Cost	Balance
39	19000	K-Marks	And two smoking Barrels	40716.0	-21716.0

And, there we go! So, how many jobs have a negative balance?

results.query("Balance < 0 ")

	Units	Rewards	Job	Cost	Balance
39	19000	K-Marks	And two smoking Barrels	40716.0	-21716.0
70	9500	K-Marks	Data Drive III	10124.0	-624.0
73	56000	K-Marks	Data Drive VI	61512.0	-5512.0
81	17000	K-Marks	Ammo Supplies	69000.0	-52000.0

That’s not as bad as I expected. What is the average job balance?

results.Balance.mean(), results.Balance.median()

(13300.702127659575, 11256.0)

So, what jobs are above the mean?

results.query(f"Balance >= {results.Balance.mean()}")

	Units	Rewards	Job	Cost	Balance
5	25000	K-Marks	Excavator Improvements	4306.0	20694.0
6	37000	K-Marks	A new type of Alloy	15468.0	21532.0
7	27000	K-Marks	Automated Security	4930.0	22070.0
9	52000	K-Marks	Classified I	27030.0	24970.0
15	27000	K-Marks	Geologist	6727.0	20273.0
16	29000	K-Marks	Industry Secret	8649.0	20351.0
17	34000	K-Marks	Veltecite Hearts	12974.0	21026.0
18	40000	K-Marks	Laser Rifles	18259.0	21741.0
19	47000	K-Marks	Classified II	23922.0	23078.0
20	41000	K-Marks	Unlimited Power	17377.0	23623.0
32	27000	K-Marks	Meteor Core	7594.0	19406.0
33	30000	K-Marks	Shards of pure Power	8104.0	21896.0
34	48000	K-Marks	Battery Day	22782.0	25218.0
41	28000	K-Marks	Upgrades	3845.0	24155.0
43	140000	K-Marks	Exclusive Drilling Rights	115330.0	24670.0
44	32000	K-Marks	Stocking Up	9120.0	22880.0
45	26000	K-Marks	Shock and Awe	11385.0	14615.0
50	37000	K-Marks	Sabotage	15696.0	21304.0
51	28000	K-Marks	Old Currency	6000.0	22000.0
52	26000	K-Marks	Recoil Compensation	9390.0	16610.0
54	45000	K-Marks	New 3D Printer	21362.0	23638.0
55	33000	K-Marks	Arms Race	11532.0	21468.0
56	35000	K-Marks	Energy Crisis	18984.0	16016.0
71	26000	K-Marks	Data Drive IV	12150.0	13850.0
76	33000	K-Marks	A Solution	10125.0	22875.0
82	25000	K-Marks	Data Drop	10124.0	14876.0
83	26000	K-Marks	Field Supplies	3000.0	23000.0
84	25000	K-Marks	Spare Parts	2024.0	22976.0
90	29000	K-Marks	NEW-Hard-ICA-Gather-6	9142.0	19858.0
91	227000	K-Marks	Striking Big	201830.0	25170.0
93	25000	K-Marks	A Craving	4048.0	20952.0
97	26000	K-Marks	Cretins	6300.0	19700.0
98	27000	K-Marks	Sensor Array Repairs	6908.0	20092.0
99	32000	K-Marks	A new Energy Source	11289.0	20711.0
100	30000	K-Marks	Data Center Upgrades	7690.0	22310.0
101	30000	K-Marks	Keep the Experiment running	7200.0	22800.0
102	61000	K-Marks	Safety Measures	36309.0	24691.0
122	31000	K-Marks	Harvest Day	6583.0	24417.0
125	37000	K-Marks	Core Values	15188.0	21812.0
126	45000	K-Marks	Promising Results	21266.0	23734.0
130	24000	K-Marks	Stormy Samples	2700.0	21300.0
131	25000	K-Marks	Stormy Samples II	2024.0	22976.0
132	33000	K-Marks	Loadout Drop	7320.0	25680.0
136	33000	K-Marks	NEW-Hard-Osiris-Gather-7	12816.0	20184.0
138	155000	K-Marks	NEW-Hard-Osiris-EliteCrusher-1	0.0	155000.0

Perfect! Next we’ll loop back around to automating the pipeline and uploading the data to Kaggle.