Package edu.isi.pegasus.planner.refiner
Class DataReuseEngine
- java.lang.Object
  - edu.isi.pegasus.planner.refiner.Engine
    - edu.isi.pegasus.planner.refiner.DataReuseEngine
- All Implemented Interfaces:
Refiner
public class DataReuseEngine extends Engine implements Refiner
The data reuse engine reduces the workflow on the basis of existing output files of the workflow found in the Replica Catalog. The algorithm works in two passes. In the first pass, we determine all the jobs whose output files exist in the Replica Catalog. An output file with the transfer flag set to false is treated as equivalent to the file existing in the Replica Catalog, if
- the output file is not an input to any of the children of the job
In the second pass, we remove the jobs whose output files exist in the Replica Catalog and try to cascade the deletion upwards to the parent jobs. We start a breadth first traversal of the workflow bottom up. A node is marked for deletion if ( it is already marked for deletion in pass 1 ) OR ( ALL of its children have been marked for deletion AND the node's output files have their transfer flags set to false ).
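The following is a minimal, self-contained sketch of the two passes described above, written against a simplified job model rather than the planner's Job and GraphNode classes; every name in it is an illustrative assumption. Pass 1 here uses plain Replica Catalog membership (the transfer-flag refinement is sketched under getJobsInRC below), and pass 2 is expressed as a fixed-point cascade rather than the breadth first traversal the implementation uses.

import java.util.*;

public class DataReuseSketch {

    // Simplified stand-in for a workflow job; not the Pegasus Job/GraphNode classes.
    static final class SimpleJob {
        final String id;
        final Set<String> outputs;          // logical file names produced by the job
        final boolean transfersOutput;      // true if any output is flagged for transfer
        final List<SimpleJob> children = new ArrayList<>();
        boolean deleted = false;

        SimpleJob(String id, Set<String> outputs, boolean transfersOutput) {
            this.id = id;
            this.outputs = outputs;
            this.transfersOutput = transfersOutput;
        }
    }

    // Pass 1: mark jobs all of whose outputs already exist in the Replica Catalog.
    static void markJobsWithOutputsInRC(List<SimpleJob> jobs, Set<String> filesInRC) {
        for (SimpleJob j : jobs) {
            if (!j.outputs.isEmpty() && filesInRC.containsAll(j.outputs)) {
                j.deleted = true;
            }
        }
    }

    // Pass 2: cascade deletion upwards until no further jobs can be marked.
    static void cascadeDeletionUpwards(List<SimpleJob> jobs) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (SimpleJob j : jobs) {
                boolean allChildrenDeleted =
                        !j.children.isEmpty()
                                && j.children.stream().allMatch(c -> c.deleted);
                if (!j.deleted && allChildrenDeleted && !j.transfersOutput) {
                    j.deleted = true;
                    changed = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        SimpleJob analyze = new SimpleJob("analyze", Set.of("f.c"), true);
        SimpleJob preprocess = new SimpleJob("preprocess", Set.of("f.b"), false);
        preprocess.children.add(analyze);

        markJobsWithOutputsInRC(List.of(preprocess, analyze), Set.of("f.c"));
        cascadeDeletionUpwards(List.of(preprocess, analyze));

        System.out.println("preprocess deleted: " + preprocess.deleted); // true (cascaded)
        System.out.println("analyze deleted:    " + analyze.deleted);    // true (output in RC)
    }
}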
- Version:
- $Revision$
- Author:
- Karan Vahi
-
-
Nested Class Summary
Nested Classes
class DataReuseEngine.BooleanBag
A bag implementation that can be used to hold a boolean value associated with a graph node.
-
Field Summary
Fields
private java.util.List<Job> mAllDeletedJobs
List of all deleted jobs during workflow reduction.
private java.util.List<GraphNode> mAllDeletedNodes
List of all deleted nodes during workflow reduction.
private ADag mWorkflow
The workflow object being worked upon.
private XMLProducer mXMLStore
The XML Producer object that records the actions.
-
Fields inherited from class edu.isi.pegasus.planner.refiner.Engine
mBag, mLogger, mLogMsg, mOutputPool, mPoolFile, mPOptions, mProps, mRLIUrl, mSiteStore, mTCFile, mTCHandle, mTCMode, REGISTRATION_UNIVERSE, TRANSFER_UNIVERSE
-
-
Constructor Summary
Constructors
DataReuseEngine(ADag orgDag, PegasusBag bag)
The constructor.
-
Method Summary
Methods
protected Graph cascadeDeletionUpwards(Graph workflow, java.util.List<GraphNode> originalJobsInRC)
Cascade the deletion of the jobs upwards in the workflow.
java.util.List<Job> getDeletedJobs()
This returns all the jobs deleted from the workflow after the reduction algorithm has run.
java.util.List<Job> getDeletedLeafJobs()
This returns all the deleted jobs that happen to be leaf nodes.
private java.util.List<GraphNode> getJobsInRC(Graph workflow, java.util.Set filesInRC)
Returns all the jobs whose output files exist in the Replica Catalog.
ADag getWorkflow()
Returns a reference to the workflow that is being refined by the refiner.
XMLProducer getXMLProducer()
Returns a reference to the XMLProducer that generates the XML fragment capturing the actions of the refiner.
ADag reduceWorkflow(ADag workflow, ReplicaCatalogBridge rcb)
Reduces the workflow on the basis of the existence of lfn's in the replica catalog.
Graph reduceWorkflow(Graph workflow, ReplicaCatalogBridge rcb)
Reduces the workflow on the basis of the existence of lfn's in the replica catalog.
protected boolean transferOutput(GraphNode node)
Returns whether a user wants output transferred for a node or not.
-
Methods inherited from class edu.isi.pegasus.planner.refiner.Engine
addVector, appendArrayList, complainForHeadNodeURLPrefix, complainForHeadNodeURLPrefix, loadProperties, printVector, stringInList, stringInPegVector, stringInVector, vectorToString
-
-
-
-
Field Detail
-
mAllDeletedJobs
private java.util.List<Job> mAllDeletedJobs
List of all deleted jobs during workflow reduction.
-
mAllDeletedNodes
private java.util.List<GraphNode> mAllDeletedNodes
List of all deleted nodes during workflow reduction.
-
mXMLStore
private XMLProducer mXMLStore
The XML Producer object that records the actions.
-
mWorkflow
private ADag mWorkflow
The workflow object being worked upon.
-
-
Constructor Detail
-
DataReuseEngine
public DataReuseEngine(ADag orgDag, PegasusBag bag)
The constructor.
- Parameters:
orgDag - the original Dag object
bag - the bag of initialization objects
-
-
Method Detail
-
getWorkflow
public ADag getWorkflow()
Returns a reference to the workflow that is being refined by the refiner.
- Specified by:
getWorkflow in interface Refiner
- Returns:
- ADAG object.
-
getXMLProducer
public XMLProducer getXMLProducer()
Returns a reference to the XMLProducer that generates the XML fragment capturing the actions of the refiner. This is used for provenance purposes.
- Specified by:
getXMLProducer in interface Refiner
- Returns:
- XMLProducer
-
reduceWorkflow
public ADag reduceWorkflow(ADag workflow, ReplicaCatalogBridge rcb)
Reduces the workflow on the basis of the existence of lfn's in the replica catalog. The existence of files is determined via the bridge.
- Parameters:
workflow - the workflow to be reduced
rcb - instance of the replica catalog bridge
- Returns:
- the reduced dag
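A hedged usage sketch of this entry point follows. The constructor and method signatures are the ones documented on this page; the import paths for ADag, Job, PegasusBag and ReplicaCatalogBridge are assumptions, and obtaining populated instances of them is left to the surrounding planner code.

import edu.isi.pegasus.planner.classes.ADag;                 // assumed package
import edu.isi.pegasus.planner.classes.Job;                  // assumed package
import edu.isi.pegasus.planner.classes.PegasusBag;           // assumed package
import edu.isi.pegasus.planner.refiner.DataReuseEngine;
import edu.isi.pegasus.planner.refiner.ReplicaCatalogBridge; // assumed package

import java.util.List;

public class DataReuseUsageSketch {

    // Reduces the given workflow and reports how many jobs were pruned.
    public static ADag reduce(ADag dag, PegasusBag bag, ReplicaCatalogBridge rcb) {
        DataReuseEngine engine = new DataReuseEngine(dag, bag);

        // Prune jobs whose outputs are already registered in the Replica Catalog.
        ADag reduced = engine.reduceWorkflow(dag, rcb);

        // Deleted leaf jobs still need their outputs staged to the output site.
        List<Job> deleted = engine.getDeletedJobs();
        List<Job> deletedLeaves = engine.getDeletedLeafJobs();
        System.out.println(deleted.size() + " job(s) deleted, "
                + deletedLeaves.size() + " of them leaf job(s)");

        return reduced;
    }
}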
-
reduceWorkflow
public Graph reduceWorkflow(Graph workflow, ReplicaCatalogBridge rcb)
Reduces the workflow on the basis of the existence of lfn's in the replica catalog. The existence of files is determined via the bridge.
- Parameters:
workflow - the workflow to be reduced
rcb - instance of the replica catalog bridge
- Returns:
- the reduced dag. The input workflow object is returned reduced.
-
getDeletedJobs
public java.util.List<Job> getDeletedJobs()
This returns all the jobs deleted from the workflow after the reduction algorithm has run.
- Returns:
- List containing the Job objects of the deleted jobs.
-
getDeletedLeafJobs
public java.util.List<Job> getDeletedLeafJobs()
This returns all the deleted jobs that happen to be leaf nodes. This entails that the output files of these jobs be transferred from the location returned by the Replica Catalog to the specified pool. This is a subset of mAllDeletedJobs. Also, to determine the deleted leaf jobs it refers to the original dag, not the reduced dag.
- Returns:
- List containing the Job objects of the deleted leaf jobs.
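A minimal sketch of the selection this method describes, on a simplified model: a deleted job counts as a leaf if it has no children in the original, unreduced workflow. The map and method below are illustrative assumptions, not the planner's implementation.

import java.util.*;

final class DeletedLeafSketch {

    // Deleted jobs that are leaves of the ORIGINAL dag (child counts taken from the unreduced graph).
    static List<String> deletedLeafJobs(Map<String, Integer> childCountInOriginalDag,
                                        Set<String> deletedJobIds) {
        List<String> leaves = new ArrayList<>();
        for (String id : deletedJobIds) {
            if (childCountInOriginalDag.getOrDefault(id, 0) == 0) {
                leaves.add(id);
            }
        }
        return leaves;
    }
}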
-
getJobsInRC
private java.util.List<GraphNode> getJobsInRC(Graph workflow, java.util.Set filesInRC)
Returns all the jobs whose output files exist in the Replica Catalog. An output file with the transfer flag set to false is treated as equivalent to the file being in the Replica Catalog, if
- the output file is not an input to any of the children of the job
- Parameters:
workflow - the workflow object
filesInRC - Set of String objects corresponding to the logical filenames of files that are found to be in the Replica Catalog
- Returns:
- a List of GraphNodes with their Boolean bag value set to true
- See Also:
org.griphyn.cPlanner.classes.Job
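The per-job check described here can be sketched as follows on a simplified file model. The OutputFile record, its fields, and the treatment of a job with no outputs are illustrative assumptions; the exact rule in the planner may differ.

import java.util.*;

final class JobInRCSketch {

    // An output file: its logical name, its transfer flag, and whether any child job consumes it.
    record OutputFile(String lfn, boolean transfer, boolean inputToSomeChild) {}

    // Pass-1 check for a single job: every output must either already be in the Replica Catalog,
    // or have its transfer flag set to false while not being an input to any child of the job.
    static boolean outputsSatisfied(List<OutputFile> outputs, Set<String> filesInRC) {
        if (outputs.isEmpty()) {
            return false; // assumption: a job with no outputs is not pruned on this basis
        }
        for (OutputFile f : outputs) {
            boolean inRC = filesInRC.contains(f.lfn());
            boolean treatedAsInRC = !f.transfer() && !f.inputToSomeChild();
            if (!inRC && !treatedAsInRC) {
                return false;
            }
        }
        return true;
    }
}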
-
cascadeDeletionUpwards
protected Graph cascadeDeletionUpwards(Graph workflow, java.util.List<GraphNode> originalJobsInRC)
Cascade the deletion of the jobs upwards in the workflow. We start a breadth first traversal of the workflow bottom up. A node is marked for deletion if ( it is already marked for deletion ) OR ( ALL of its children have been marked for deletion AND the node's output files have their transfer flags set to false ).
- Parameters:
workflow - the workflow to be reduced
originalJobsInRC - list of nodes whose output files were found to be in the Replica Catalog
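The cascade rule can also be applied in a single bottom-up sweep, complementing the fixed-point sketch given with the class description: if nodes are visited children-before-parents, each child's final mark is known when its parent is examined. The Node type and the precomputed ordering below are illustrative assumptions, not the planner's Graph or GraphNode classes.

import java.util.*;

final class CascadeSketch {

    static final class Node {
        final String id;
        final boolean transfersOutput;
        final List<Node> children = new ArrayList<>();
        boolean deleted;

        Node(String id, boolean transfersOutput) {
            this.id = id;
            this.transfersOutput = transfersOutput;
        }
    }

    // Applies the cascade rule in one sweep over nodes given in reverse topological
    // order (leaves first), so every child's final mark precedes its parent's visit.
    static void cascade(List<Node> reverseTopologicalOrder) {
        for (Node n : reverseTopologicalOrder) {
            if (n.deleted) {
                continue; // already marked in pass 1
            }
            boolean allChildrenDeleted =
                    !n.children.isEmpty()
                            && n.children.stream().allMatch(c -> c.deleted);
            if (allChildrenDeleted && !n.transfersOutput) {
                n.deleted = true;
            }
        }
    }
}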
-
transferOutput
protected boolean transferOutput(GraphNode node)
Returns whether a user wants output transferred for a node or not. If no output files are associated, true will be returned.
- Parameters:
node - the GraphNode
- Returns:
- boolean
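A plausible reading of this check, sketched on a simplified file model: with no output files the answer is true, and otherwise the answer is true as soon as any output file is flagged for transfer. The exact rule in the planner may differ; the record and method below are illustrative assumptions.

import java.util.List;

final class TransferOutputSketch {

    record OutputFile(String lfn, boolean transfer) {}

    // True if the node has no outputs, or if any output is flagged for transfer.
    static boolean transferOutput(List<OutputFile> outputs) {
        if (outputs.isEmpty()) {
            return true;
        }
        return outputs.stream().anyMatch(OutputFile::transfer);
    }
}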
-
-