Using machine learning for intelligent shard sizing on the cloud
Abstract
This paper proposes an algorithm for shard sizing that eliminates the need for conservative approximations and reduces the need for reactive refinement. A machine learning algorithm based on multiple linear regression predicts the request latency of a given application deployed on a cloud machine. The predicted latency is used to decide accurately whether the capacity of the cloud machine will satisfy the service level agreement (SLA) required for effective operation of the application. Applying the proposed method to a popular database schema deployed on the cloud produced highly accurate predictions. The deployment, and the tests performed to establish this accuracy, are presented in detail as evidence for these claims.
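The abstract does not give the model's features or fitting procedure. The following is therefore only a minimal sketch, assuming an ordinary least squares fit of a multiple linear regression, of how predicted latency could drive an SLA check. The function names `fit_ols`, `predict` and `meets_sla`, the two features (shard size in millions of rows and number of concurrent clients), the sample values and the 50 ms SLA threshold are all hypothetical illustrations, not the authors' method or data.

```python
"""Sketch: multiple linear regression for latency prediction plus an SLA check.

Hypothetical assumptions (not from the paper): each sample is
(shard size in millions of rows, concurrent clients), latency is in ms.
"""
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y with Gauss-Jordan
    elimination, where A is X with a leading column of ones (intercept).
    """
    A = [[1.0, *map(float, row)] for row in X]
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(n)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Predicted latency (ms) for one feature vector x."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


def meets_sla(coef: Sequence[float], x: Sequence[float], sla_ms: float) -> bool:
    """True if the predicted latency stays within the SLA threshold."""
    return predict(coef, x) <= sla_ms


if __name__ == "__main__":
    # Hypothetical measurements: (shard size, clients) -> latency in ms.
    samples = [(1, 10), (2, 20), (3, 5), (4, 40), (5, 15)]
    latencies = [10.0, 18.0, 13.5, 34.0, 24.5]
    coef = fit_ols(samples, latencies)
    print(meets_sla(coef, (6, 30), 50.0))    # True  (predicted ~35 ms)
    print(meets_sla(coef, (10, 100), 50.0))  # False (predicted ~82 ms)
```

The sample latencies were chosen to lie exactly on 2 + 3·size + 0.5·clients, so the fit recovers those coefficients. Real measurements would leave residuals, and a practical check would likely compare against the SLA with some safety margin instead of the raw point prediction.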
DOI: http://dx.doi.org/10.21533/pen.v7i1.332
Copyright (c) 2019 Narayanan Venkateswaran, Anurag Shekhar, Suvamoy Changder, Narayan C Debnath

This work is licensed under a Creative Commons Attribution 4.0 International License.
ISSN: 2303-4521
Digital Object Identifier DOI: 10.21533/pen