09-01-2008, 08:23 PM
Alright, here you go. Some basic enhanced encoding of the tiles. I could have done better if I had my other tools like my bit stream or knew more information about the data, but this is good enough.
I lacked the give-a-shit to actually test it with some real maps or to do the decoding, but its pretty simple. The header is quite basic and explanatory. The tile system I used two different encoding techniques on, first being the same as the header's, second being a recursive updated library.
A little explanation:
Dynamic non-guessable variable bit flags:
This is what the first tile encoding and header uses. It holds a bit for every variable. If the variable does not equal the default value (I just used 0 for everything since I have no idea what the common values for them are), it has to be written. A 1 in the bit states the value was written, while a 0 states it was not. This allows us to eliminate having to write any value that is equal to the default, and provides us a foundation of understanding for the next method...
Recursive non-guessable variable bit flags:
This is the same as above, where a value is written when we can not 100% guess the value. But unlike above, instead of using a constant dictionary (in my example, it was just all 0's) for assumed values, we use a dynamic dictionary, where the assumed values are equal to that of the previous tile. The bit flag meaning changes a little bit, but the output is still the same. The values we are writing out are what is different from the dynamic diction, resulting in a recursive updating. For example, if we had the theoretical data set {A, B, C} and we wanted to write {A, B, D}, we would write the bits 001 stating that A and B were equal, but C was not, then we would write D out so we know what to replace C with. When decoding, we'd just update the dictionary then the dictionary would now be equal to the tile.
Some of you may be now thinking, "Well Spodi, what if I made it smarter at guessing? Could I theoretically make the map take up less than, say, 10 bytes?" Yup. Totally plausible. The catch is that you have to be able to guess it. The recursive updating I used just assumes that the last tile will be somewhat close to the current tile. Seeing as that this scans on a horizontal line left to right, vertical top to bottom, the tiles will vary a bit since a tile often has similarities to other tiles near it, so this is far from optimal. Most likely you will never be able to get such accurate guessing in less than 10 bytes of data unless your map is very predictable, but you can make it take a lot less than I have had it use up.
The results:
Now for the interesting part - the results. Like I said earlier, lacked the give-a-shit to do much more than the basics or to test this on anything but the auto-generated maps, but these took up 218 bytes while the default maps took up 2925 bytes per map, resulting in a file 13.5x less than the original size. Sounds good enough to me. You'll actually be able to fit that whole map into a single packet. Hell, you can fit almost 7 maps into a single packet.
How to improve:
Like I said earlier, one way to improve things would be to improve the guessing system. Though unfortunately, besides altering the order of tiles it reads to a more optimal approach such as crawling the tiles from a center point and reading outwards in a node-based system, using the parent node to retrieve the source dictionary to allow for less variation from the dictionary, guessing is pretty much at a peak on a algorithmic standpoint. You could devise a routine to read all your maps, create a dictionary that holds the percent change of finding tile X when surrounded by tiles A, B, C, D, E, F, G, and H, but that would require quite a bit of effort. In the end its just not worth it except for extreme cases.
Another improvement would be smashing down those goddamn bit flag sizes. How might we do this? Professor David A. Huffman may have an idea. Huffman encoding is a variable-length entropy encoding using a weighted binary tree generated by the Huffman algorithm (yes, the encoding and algorithm are different things, but the encoding is just the application of the algorithm). Its a very simple encoding algorithm, but it is going to require a bit stream to make use of. This will work on randomly placed sets of values as long as we know the start of the value because Huffman encoding has a definitive end-point. When you crawl the tree, you always hit the end (unless you **** up something), and you just read until you hit the end. It is practically a given you won't use up every one of the 256 different bit combinations for the tile bit flags. In fact, you'll probably be using around 20 or so at most. How you would do this is go through as normal and generate all the bit flags for the tiles, but instead of writing them out, you will store them in an array. This is your first pass. Generate a binary tree with the Huffman algorithm out of this. Write out a sequence of bits to generate the binary tree from so the decoder can understand the values. On the second pass, you will go through the exact same (though won't have to regenerate the bit flags again if you stored them in an ordered fashion), but instead of storing the bit flags, you will write out the Huffman encoding for that series of bits. Then just write out the other values as normal. Optimally, you'll be using probably around 1.5-2.5 bits per tile header instead of 8 (depends on how much variation there is). If so, that'd be about an extra 300%+ compression for the default map size. I won't bother elaborating since I doubt anyone cares enough to do this.
I obviously had nothing to do this morning... and this probably would've been best in the knowledge base.
Code:
Sub SaveMapSpodi(ByVal MapNum As Long)
Dim f As Long
Dim HeaderBitFlags As Integer
Dim MapNPCCount As Byte
Dim i As Long
Dim x As Long
Dim y As Long
Dim TileBitFlags As Byte
Dim LastTile As TileRec
f = FreeFile
Open App.Path & "\maps\map" & MapNum & ".dat.spodi" For Binary As #f
With Map(MapNum)
' Static header
Put #f, , .Name
Put #f, , .Revision
' Dynamic header
For i = 1 To MAX_MAP_NPCS
If .Npc(i) 0 Then MapNPCCount = MapNPCCount + 1
Next i
If .Moral 0 Then HeaderBitFlags = HeaderBitFlags Or 1
If .Up 0 Then HeaderBitFlags = HeaderBitFlags Or 2
If .Down 0 Then HeaderBitFlags = HeaderBitFlags Or 4
If .Left 0 Then HeaderBitFlags = HeaderBitFlags Or 8
If .Right 0 Then HeaderBitFlags = HeaderBitFlags Or 16
If .Music 0 Then HeaderBitFlags = HeaderBitFlags Or 32
If .BootMap 0 Then HeaderBitFlags = HeaderBitFlags Or 64
If .BootX 0 Then HeaderBitFlags = HeaderBitFlags Or 128
If .BootY 0 Then HeaderBitFlags = HeaderBitFlags Or 256
If .Shop 0 Then HeaderBitFlags = HeaderBitFlags Or 512
If .Indoors 0 Then HeaderBitFlags = HeaderBitFlags Or 1024
If MapNPCCount 0 Then HeaderBitFlags = HeaderBitFlags Or 2048
Put #f, , HeaderBitFlags
If HeaderBitFlags And 1 Then Put #f, , .Moral
If HeaderBitFlags And 2 Then Put #f, , .Up
If HeaderBitFlags And 4 Then Put #f, , .Down
If HeaderBitFlags And 8 Then Put #f, , .Left
If HeaderBitFlags And 16 Then Put #f, , .Right
If HeaderBitFlags And 32 Then Put #f, , .Music
If HeaderBitFlags And 64 Then Put #f, , .BootMap
If HeaderBitFlags And 128 Then Put #f, , .BootX
If HeaderBitFlags And 256 Then Put #f, , .BootY
If HeaderBitFlags And 512 Then Put #f, , .Shop
If HeaderBitFlags And 1024 Then Put #f, , .Indoors
If HeaderBitFlags And 2048 Then
Put #f, , MapNPCCount
For i = 1 To MAX_MAP_NPCS
If .Npc(i) 0 Then Put #f, , .Npc(i)
Next i
End If
' Tile encoding (pick one)
' Tiles with independant encoding
'For y = 0 To MAX_MAPY
' For x = 0 To MAX_MAPX
' TileBitFlags = 0
' With .Tile(x, y)
' If .Anim 0 Then TileBitFlags = TileBitFlags Or 1
' If .Data1 0 Then TileBitFlags = TileBitFlags Or 2
' If .Data2 0 Then TileBitFlags = TileBitFlags Or 4
' If .Data3 0 Then TileBitFlags = TileBitFlags Or 8
' If .Fringe 0 Then TileBitFlags = TileBitFlags Or 16
' If .Ground 0 Then TileBitFlags = TileBitFlags Or 32
' If .Mask 0 Then TileBitFlags = TileBitFlags Or 64
' If .Type 0 Then TileBitFlags = TileBitFlags Or 128
'
' Put #f, , TileBitFlags
'
' If TileBitFlags And 1 Then Put #f, , .Anim
' If TileBitFlags And 2 Then Put #f, , .Data1
' If TileBitFlags And 4 Then Put #f, , .Data2
' If TileBitFlags And 8 Then Put #f, , .Data3
' If TileBitFlags And 16 Then Put #f, , .Fringe
' If TileBitFlags And 32 Then Put #f, , .Ground
' If TileBitFlags And 64 Then Put #f, , .Mask
' If TileBitFlags And 128 Then Put #f, , .Type
' End With
' Next x
'Next y
' Tiles with delta encoding
' Set up the first LastTile with the default constants
LastTile.Anim = 0
LastTile.Data1 = 0 '... I'm just using all 0's because I have no idea what the fuck these typically are
For y = 0 To MAX_MAPY
For x = 0 To MAX_MAPX
TileBitFlags = 0
With .Tile(x, y)
If .Anim LastTile.Anim Then TileBitFlags = TileBitFlags Or 1
If .Data1 LastTile.Data1 Then TileBitFlags = TileBitFlags Or 2
If .Data2 LastTile.Data2 Then TileBitFlags = TileBitFlags Or 4
If .Data3 LastTile.Data3 Then TileBitFlags = TileBitFlags Or 8
If .Fringe LastTile.Fringe Then TileBitFlags = TileBitFlags Or 16
If .Ground LastTile.Ground Then TileBitFlags = TileBitFlags Or 32
If .Mask LastTile.Mask Then TileBitFlags = TileBitFlags Or 64
If .Type LastTile.Type Then TileBitFlags = TileBitFlags Or 128
Put #f, , TileBitFlags
If TileBitFlags And 1 Then Put #f, , .Anim
If TileBitFlags And 2 Then Put #f, , .Data1
If TileBitFlags And 4 Then Put #f, , .Data2
If TileBitFlags And 8 Then Put #f, , .Data3
If TileBitFlags And 16 Then Put #f, , .Fringe
If TileBitFlags And 32 Then Put #f, , .Ground
If TileBitFlags And 64 Then Put #f, , .Mask
If TileBitFlags And 128 Then Put #f, , .Type
End With
LastTile = .Tile(x, y)
Next x
Next y
End With
Close #f
End Sub
I lacked the give-a-shit to actually test it with some real maps or to do the decoding, but its pretty simple. The header is quite basic and explanatory. The tile system I used two different encoding techniques on, first being the same as the header's, second being a recursive updated library.
A little explanation:
Dynamic non-guessable variable bit flags:
This is what the first tile encoding and header uses. It holds a bit for every variable. If the variable does not equal the default value (I just used 0 for everything since I have no idea what the common values for them are), it has to be written. A 1 in the bit states the value was written, while a 0 states it was not. This allows us to eliminate having to write any value that is equal to the default, and provides us a foundation of understanding for the next method...
Recursive non-guessable variable bit flags:
This is the same as above, where a value is written when we can not 100% guess the value. But unlike above, instead of using a constant dictionary (in my example, it was just all 0's) for assumed values, we use a dynamic dictionary, where the assumed values are equal to that of the previous tile. The bit flag meaning changes a little bit, but the output is still the same. The values we are writing out are what is different from the dynamic diction, resulting in a recursive updating. For example, if we had the theoretical data set {A, B, C} and we wanted to write {A, B, D}, we would write the bits 001 stating that A and B were equal, but C was not, then we would write D out so we know what to replace C with. When decoding, we'd just update the dictionary then the dictionary would now be equal to the tile.
Some of you may be now thinking, "Well Spodi, what if I made it smarter at guessing? Could I theoretically make the map take up less than, say, 10 bytes?" Yup. Totally plausible. The catch is that you have to be able to guess it. The recursive updating I used just assumes that the last tile will be somewhat close to the current tile. Seeing as that this scans on a horizontal line left to right, vertical top to bottom, the tiles will vary a bit since a tile often has similarities to other tiles near it, so this is far from optimal. Most likely you will never be able to get such accurate guessing in less than 10 bytes of data unless your map is very predictable, but you can make it take a lot less than I have had it use up.
The results:
Now for the interesting part - the results. Like I said earlier, lacked the give-a-shit to do much more than the basics or to test this on anything but the auto-generated maps, but these took up 218 bytes while the default maps took up 2925 bytes per map, resulting in a file 13.5x less than the original size. Sounds good enough to me. You'll actually be able to fit that whole map into a single packet. Hell, you can fit almost 7 maps into a single packet.
How to improve:
Like I said earlier, one way to improve things would be to improve the guessing system. Though unfortunately, besides altering the order of tiles it reads to a more optimal approach such as crawling the tiles from a center point and reading outwards in a node-based system, using the parent node to retrieve the source dictionary to allow for less variation from the dictionary, guessing is pretty much at a peak on a algorithmic standpoint. You could devise a routine to read all your maps, create a dictionary that holds the percent change of finding tile X when surrounded by tiles A, B, C, D, E, F, G, and H, but that would require quite a bit of effort. In the end its just not worth it except for extreme cases.
Another improvement would be smashing down those goddamn bit flag sizes. How might we do this? Professor David A. Huffman may have an idea. Huffman encoding is a variable-length entropy encoding using a weighted binary tree generated by the Huffman algorithm (yes, the encoding and algorithm are different things, but the encoding is just the application of the algorithm). Its a very simple encoding algorithm, but it is going to require a bit stream to make use of. This will work on randomly placed sets of values as long as we know the start of the value because Huffman encoding has a definitive end-point. When you crawl the tree, you always hit the end (unless you **** up something), and you just read until you hit the end. It is practically a given you won't use up every one of the 256 different bit combinations for the tile bit flags. In fact, you'll probably be using around 20 or so at most. How you would do this is go through as normal and generate all the bit flags for the tiles, but instead of writing them out, you will store them in an array. This is your first pass. Generate a binary tree with the Huffman algorithm out of this. Write out a sequence of bits to generate the binary tree from so the decoder can understand the values. On the second pass, you will go through the exact same (though won't have to regenerate the bit flags again if you stored them in an ordered fashion), but instead of storing the bit flags, you will write out the Huffman encoding for that series of bits. Then just write out the other values as normal. Optimally, you'll be using probably around 1.5-2.5 bits per tile header instead of 8 (depends on how much variation there is). If so, that'd be about an extra 300%+ compression for the default map size. I won't bother elaborating since I doubt anyone cares enough to do this.
I obviously had nothing to do this morning... and this probably would've been best in the knowledge base.