on 09-23-2013 9:56 PM
Hi.
I know if I perform a DUMP DATABASE in an actively modified database, the pages might be written out in an essentially random order, because ASE will use an algorithm that gives priority to a page that a user is wanting to modify just then. Or something.
But what if the database is idle throughout the whole DUMP operation? Will ASE tend to write the pages out in a fixed order, like in sequential order by page number?
I ask because I'm wondering if it's technically feasible to reduce the file transfer size for a full dump, through use of a diff-type algorithm such as rsync. It seems like if the content of one full dump is wildly different from the content of the next one, due to ASE shuffling the order of the pages each time, then such a diff algorithm could not possibly work. The entire dump file would have to be transferred off each time. But what if one dump file was only slightly different from the last one? Would rsync be able to take advantage of that? And I realize that this could possibly make sense only with UNcompressed dumps.
Thanks.
- John.
John
I know if I perform a DUMP DATABASE in an actively modified database, the pages might be written out in an essentially random order, because ASE will use an algorithm that gives priority to a page that a user is wanting to modify just then. Or something.
Yes. And Backup Server is much more intelligent than that.
Think about the dump file written in, say, eight stripes.
But what if the database is idle throughout the whole DUMP operation? Will ASE tend to write the pages out in a fixed order, like in sequential order by page number?
AFAIK no order can be determined. BackupServer maps onto the Shared Memory Segment; it can read the caches directly. So if any order can be identified (noting that it would not be a programmable or predictable one), it would be a list of caches, and then physical order (not MRU-LRU, which is not a physical ordering); plus all the pages that are not in memory, read directly from disk, and there would be order in those; all spliced with its internal read/write extents algorithm, plus another level of splicing if stripes are used.
I ask because I'm wondering if it's technically feasible to reduce the file transfer size for a full dump, through use of a diff-type algorithm such as rsync. It seems like if the content of one full dump is wildly different from the content of the next one, due to ASE shuffling the order of the pages each time, then such a diff algorithm could not possibly work. The entire dump file would have to be transferred off each time. But what if one dump file was only slightly different from the last one? Would rsync be able to take advantage of that?
Now we get to the intent of the question, rather than the question itself. So what you want is really an efficient way of not storing pages that do not change, in the context of the large volume of full db dump files that are stored, archived, etc. Is that correct? And you are researching methods outside ASE that will reduce that volume.
Bit of background.
The next thing I knew, ASE 15.7 SP100 was delivered, a few months ago. I always read the New Features Guide for every release, and this one is 154 pages. There is no reference to the series of previous minor releases, so this is not a minor release. It is packed with a number of large new features. One of them is the promised Incremental Backup, with a full set of features around it. They have called it Cumulative Backup. I have not tested that release yet, but the doco is complete (still not synchronised, but let's not beat a dead horse).
This release is not a minor, or even minor-minor, release. AFAIC it should have been named 15.8, if not ASE 16. Since there is no list of what ASE 16 is, we can't state that it is, or is not, ASE 16. And they keep changing the internal and external release names. If I were to go by promises made five years ago, and the few posts about the subject in between, which is all I have, this is ASE 16. (Note that I am not a Sybase employee; I do not speak for Sybase, or wish to appear to do so. I am speaking for myself, as an old, battle-scarred Sybase hand.)
Please look into ASE 15.7 SP100, the New Features Guide. What you are seeking, at least on the face of it and unconfirmed, has been delivered, inside ASE. No need for anything outside ASE, or for post-dump processing.
Cheers
Derek
Hi Derek,
Thanks. I should have added that I'm using ASE 15.0.3. The Cumulative Backup feature sounds cool though.
Now we get to the intent of the question, rather than the question itself. So what you want is really an efficient way of not storing pages that do not change, in the context of the large volume of full db dump files that are stored, archived, etc. Is that correct? And you are researching methods outside ASE that will reduce that volume.
Yes, I think that is basically correct. I want to copy a periodic dump file over a slow network, and I want the receiving end to always hold the most recent dump file. The worst case is that I copy the entire dump file each time. Better would be to rely on rsync's ability to copy only the parts of a file that have changed since the last time, thereby minimizing the size of the network transfer.
Thanks.
- John.
I'm not sure if anyone will ever read this, but the new ASE roadmap seems to be here:
https://roadmaps.sap.com/board?range=CURRENT-LAST&PRODUCT=67837800100800005166#Q4%202024
(requires an SAP support login to view)
I'm using rsync with rsyncable gzip on 8 stripes. It works well for me on ASE 15.0.3.
rsync sends around 200M of a 2G rsyncable gzip file.
Further to my previous post, I've decided to give an example of how I do the dump, as that might be of assistance.
Example:
$ mkfifo sysprocs.1of2.fifo
$ mkfifo sysprocs.2of2.fifo
$ /usr/local/bin/gzip -1c --rsyncable <sysprocs.2of2.fifo >sysprocs.2of2.dgz &
$ /usr/local/bin/gzip -1c --rsyncable <sysprocs.1of2.fifo >sysprocs.1of2.dgz &
1> dump database sybsystemprocs to 'compress::0::/backup/sysprocs.1of2.fifo'
2> stripe on 'compress::0::/backup/sysprocs.2of2.fifo'
3> go
...
Backup Server: 4.188.1.1: Database sybsystemprocs: 126318 kilobytes (100%) DUMPED.
Backup Server: 3.42.1.1: DUMP is complete (database sybsystemprocs).
1>
[2] + Done /usr/local/bin/gzip -1c --rsyncable <sysprocs.1of2.fifo >sysprocs.1of2.dgz &
[1] + Done /usr/local/bin/gzip -1c --rsyncable <sysprocs.2of2.fifo >sysprocs.2of2.dgz &
Files at remote:
$ ll sysprocs*dgz
-rw-r----- 1 syb1503 sybase 8983974 Sep 25 14:18 sysprocs.1of2.dgz
-rw-r----- 1 syb1503 sybase 9010398 Sep 25 14:18 sysprocs.2of2.dgz
New local files:
$ ll sysprocs*dgz
-rw-r----- 1 syb1503 sybase 8983209 Sep 25 14:44 sysprocs.1of2.dgz
-rw-r----- 1 syb1503 sybase 9010320 Sep 25 14:44 sysprocs.2of2.dgz
Sending file 1 to remote with rsync:
building file list ...
done
delta-transmission enabled
sysprocs.1of2.dgz
total: matches=2995 hash_hits=3904 false_alarms=0 data=22169
sent 34294 bytes received 18060 bytes 34902.67 bytes/sec
total size is 8983209 speedup is 171.59
Sending file 2 to remote with rsync:
building file list ...
done
delta-transmission enabled
sysprocs.2of2.dgz
total: matches=2996 hash_hits=3928 false_alarms=0 data=22320
sent 34445 bytes received 18066 bytes 35007.33 bytes/sec
total size is 9010320 speedup is 171.59
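As a side note on reading those stats: the "speedup" figure rsync prints is simply the total file size divided by the bytes actually exchanged in both directions. A quick check against the file 2 figures above:

```python
# rsync's reported speedup = total size / (bytes sent + bytes received),
# using the stats printed above for file 2.
sent, received, total = 34445, 18066, 9010320
print(round(total / (sent + received), 2))  # -> 171.59, matching rsync's own figure
```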
pd123456 dreyer wrote:
I'm using rsync with rsyncable gzip on 8 stripes. It works well for me on ASE 15.0.3.
rsync sends around 200M of a 2G rsyncable gzip file.
That's great! It mystifies me how this could work so well though. Doesn't rsync have to depend on an assumption that the source file is sufficiently similar to the destination file? Otherwise it would just have to send the entire file each time. It seems like each of your 8 stripe files would tend to be radically different from one dump to the next, given the "randomness" of how ASE sends pages to the dump output.
Thanks.
- John.
John
That's great! It mystifies me how this could work so well though. Doesn't rsync have to depend on an assumption that the source file is sufficiently similar to the destination file? Otherwise it would just have to send the entire file each time. It seems like each of your 8 stripe files would tend to be radically different from one dump to the next, given the "randomness" of how ASE sends pages to the dump output.
I don't know about mystical, but it certainly brings what I posted into question.
My info is from a reputable Sybase employee in the very old days. Anyone with access to the codeline used to, and still should, correct incorrect posts. Backup Server has changed a lot in two decades. I used to say that the only value in a dump file is to load it, and fast. Some engineer may have come upon the idea themselves. AFAIC, the structure of the dump files should be identical to that of the devices we are loading into. It should rely on the issuer identifying the number of dump threads for the best load into production.
From the evidence (the predictability), it looks like it is doing that, or close to that.
But I have no idea what Backup Server does these days.
Cheers
Derek
rsyncable gzip info: http://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/
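To see why delta transfer can work at all on these files, here is a toy sketch (my own illustration, not rsync's actual code) of the block-matching idea rsync uses. The receiver's old copy is summarised as per-block checksums, and the sender scans the new file for blocks that already exist anywhere in the old file, so only unmatched bytes cross the network. The --rsyncable gzip option complements this by keeping a small change in the input from rippling through the whole compressed stream.

```python
# A toy version of rsync's delta algorithm (illustrative only; real
# rsync also uses a cheap rolling weak checksum so the sliding window
# costs O(1) per byte, which is skipped here for clarity).
import hashlib

BLOCK = 64  # real rsync chooses a block size based on file size

def signatures(old):
    """Map each aligned BLOCK-sized chunk of the old file to its offset."""
    return {hashlib.md5(old[i:i + BLOCK]).digest(): i
            for i in range(0, len(old), BLOCK)}

def make_delta(new, sigs):
    """Encode `new` as ('copy', old_offset) refs plus ('data', bytes) literals."""
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + BLOCK]
        off = sigs.get(hashlib.md5(chunk).digest()) if len(chunk) == BLOCK else None
        if off is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", off))   # receiver already has these bytes
            i += BLOCK
        else:
            literal.append(new[i])      # unmatched byte must be transmitted
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def apply_delta(old, ops):
    """Rebuild the new file at the receiver from the old file plus the delta."""
    return b"".join(old[arg:arg + BLOCK] if kind == "copy" else arg
                    for kind, arg in ops)

old = bytes(range(256)) * 16                    # 4 KiB "previous dump"
new = old[:1000] + b"CHANGED" + old[1000:]      # a small local edit
ops = make_delta(new, signatures(old))
sent = sum(len(arg) for kind, arg in ops if kind == "data")
assert apply_delta(old, ops) == new
print(f"{sent} of {len(new)} bytes sent as literal data")
# -> 71 of 4103 bytes sent as literal data
```

Because matching is by content rather than position, even blocks that have shifted within the file are found again, which is why the dump files only need to be "mostly similar", not byte-for-byte aligned.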
I'm curious as to why you're using rsync/fifo/gzip instead of NFS or some other net file system (or clustered file system). What benefit do you receive over them?
jason
I create rsyncable gzip files on the local server and then use rsync to send the files to our remote site. The resulting compressed files are almost 17G in size, of which rsync only needs to send about 1.8G, taking a tenth of the time it would take to send the entire compressed file.
The network to the remote site is too slow to consider clustering and replication could not keep up during the end-of-day's bulk processing.
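A rough sanity check of those figures (overheads ignored, and assuming transfer time is proportional to bytes sent):

```python
# 17G full file vs ~1.8G of delta traffic is roughly a 9-10x reduction,
# consistent with the quoted "a 10th of the time" over the same link.
full_gb, delta_gb = 17, 1.8
print(round(full_gb / delta_gb, 1))  # -> 9.4
```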